For a couple of months now I’ve been corresponding with Etienne Racine and Pierre Racine out of Montreal Laval University in Quebec City. They decided to take on the problem of finding the speed and accuracy of a number of different techniques for extracting canopy height from LiDAR data. They have been kind enough to allow me to post the results here. This will be a multipart series. We’ll start with the punchline, and then build out the code behind each of the queries/calculations in turn (category will be “LiDAR Shootout“) with separate posts.
In the hopper for testing were essentially 4 different ways to extract canopy height from a LiDAR point cloud and (in three of the cases) a DEM extracted from the LiDAR point cloud. The 4 techniques were as follows:
- Use ArcGIS ExtractValuesToPoints.
- Use R using the raster and rgdal packages with raster::extract() function with in-memory results.
- Use postgis (2.0 svn) Raster datatype (formerly WKT Raster)
- And use a simple nearest neighbor approach with ST_DWithin within PostGIS.
The data are from the Ohio Statewide Imagery Program (OSIP) program, run by Ohio Geographically Referenced Information Program (OGRIP). Ohio is blessed with an excellent 2006-2008 LiDAR dataset and statewide DEM derived from the (effectively) 3.5 foot horizontal posting data (specs may say 5-ft, I can’t remember…).
Speeeed:
So, first to speed, initial results. Who wins in the speed category? Measured as points per second (on consistently the same hardware), nearest neighbor wins by a landslide (bigger is better here):
Application: Points per Second
ArcGIS: 4504
R: 3228
PostGIS Raster: 1451 (maybe as high as 4500)
Postgis nn: 18208
Now, later on, Pierre discovered changes to indexing may help an the EXPLAIN query analyzer optimization which tripled the PostGIS Raster query speed, making it about the same speed as ArcGIS. More on that later.
Figure removed– to be replaced in another post
Accuracy:
A fast calculation is always better, unless you trade off accuracy in some detrimental way. With PostGIS NN, we’re not interpolating our LiDAR ground point cloud before calculating the difference, so relative to the other three techniques, we could be introducing some bias/artifacts here, a point Jeff Evans makes here. Overall error relative to the interpolated solution introduced by using an NN technique on this dataset: 0.268 feet. Whew! Sigh of relief for me! We may spend some time looking at the geographic distribution of that error to see if it’s random or otherwise, but that’s very near the error level for the input data.
Addendum:
8 days ago, Martin Isenburg’s let the lasheight tool drop. Lastools is impressively fast. Preliminary tests by Etienne: 140450 points per second.
Oy! A factor of 8 faster than PostGIS NN And it uses an on-the-fly calculated DEM using delaunay triangulation. Preliminary results indicate a very different DEM than the one calculated by the state:
RMSE: 25.4227446600656
More research will need done to find out the source of the differences: they are not random.
In other news:
PostGIS will get a true Knn technique in the near future. Such a project is funded, so stay tuned for faster results:
http://www.postgis.org/pipermail/postgis-devel/2011-June/013931.html
Also, index changes to PostGIS could come down the pike, if funded, that will result in bounding box queries that are 6-times faster, ala:
http://blog.opengeo.org/2011/05/27/pgcon-notes-3/
Between the two optimizations, we could give Lastools a run for its money on speed
. In the meantime, preliminary RMSE aside, Lastools (the 5th tool not in this test) may win hands down.