I finally got PDAL properly compiled with the Point Cloud Library (PCL) baked in. Word to the wise: Clang is what the PDAL developers use to compile. The PDAL crew were kind enough to revert the commit that broke GCC support, but why swim upstream? If you are compiling PDAL yourself, use Clang. (Side note: the revert to support GCC was really helpful for ensuring we could embed PDAL into OpenDroneMap without any compiler changes for that project.)
With a compiled version of PDAL that has the PCL dependencies built in, I can bypass the Docker instance. When I was spawning tens of Docker threads and then killing them, recovery was a problem (it would often hose my Docker install completely). I’m sure there’s some bug to report there, or perhaps spawning 40 Docker threads is ill-advised for some grander reason, but regardless, running PDAL outside a container has many benefits, including simpler code. If you recall our objectives with this script, we want to:
- Calculate relative height of LiDAR data
- Slice that data into bands of heights
- Load the data into a PostgreSQL/PostGIS/pgPointCloud database
The control script without docker becomes as follows:
# readlink gets us the full path to the file. This is necessary for docker
readlinker=$(readlink -f "$1")
# returns just the directory name
# basename will strip off the directory name and the extension
name=$(basename "$1" .las)
# PDAL must be built with PCL.
# See http://www.pdal.io/tutorial/calculating-normalized-heights.html
pdal translate "$name".las "$name".bpf height --writers.bpf.output_dims="X,Y,Z,Intensity,ReturnNumber,NumberOfReturns,ScanDirectionFlag,EdgeOfFlightLine,Classification,ScanAngleRank,UserData,PointSourceId,HeightAboveGround"
# Now we split the lidar data into slices of heights, from 0-1.5 ft, etc.
# on up to 200 feet. We're working in the Midwest, so we don't anticipate
# trees much taller than ~190 feet
for START in 0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200
do
    # We'll use the height classes to name our output files and tablename.
    # A little cleanup is necessary, so we're removing the colon ":".
    nameend=$(echo "$START" | sed 's/:/-/g')
    # Name our output (file name pattern assumed: base name plus height class)
    bpfname="${name}_${nameend}.bpf"
    # Implement the height range filter
    pdal translate "$name".bpf "$bpfname" -f range --filters.range.limits="HeightAboveGround[$START)"
    # Now we put our data in the PostgreSQL database.
    pdal pipeline -i pipeline.xml --writers.pgpointcloud.table="pa_layer_$nameend" --readers.bpf.filename="$bpfname" --writers.pgpointcloud.overwrite='false'
done
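Before turning this loose on real tiles, it can help to see which file and table names the loop will produce. The following is a minimal dry-run sketch that only prints names and never calls pdal. The base name "example", the function name slice_names, and the base-name-plus-height-class file pattern are assumptions for illustration, not taken from the original script.

```shell
#!/bin/bash
# Dry run: print the slice file name, table name, and range filter for each
# height class without calling pdal.
# "example" is a hypothetical base name standing in for a real .las tile.
name="example"

slice_names() {
    local range nameend
    for range in 0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200
    do
        # Same cleanup as the control script: swap the colon for a hyphen.
        nameend=$(echo "$range" | sed 's/:/-/g')
        # The .bpf file name pattern here is an assumption for illustration.
        echo "${name}_${nameend}.bpf -> pa_layer_${nameend} HeightAboveGround[${range})"
    done
}

slice_names
```

Each printed line pairs a slice file with the table it would load into, so a typo in a height class shows up before any data is written.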
We still require our pipeline XML to set our default options, as follows:
<?xml version="1.0" encoding="utf-8"?>
host='localhost' dbname='user' user='user' password='password'
And as before, we can use parallel to make this run a lot faster:
find . -name '*.las' | parallel -j20 ./pdal_processor.sh
For the record, I found out through testing that my underlying host only has 20 processors (though more cores). No point in running more processes than that.
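Rather than finding the right job count by trial and error, one option is to ask the system. This is a minimal sketch assuming GNU coreutils' nproc is available; the variable name njobs is made up for illustration, and the number nproc reports may not match what testing shows on a virtualized host. The sketch prints the command instead of running it.

```shell
#!/bin/bash
# nproc (GNU coreutils) prints the number of processing units available
# to the current process.
njobs=$(nproc)

# Print the command we would run, rather than running it.
echo "find . -name '*.las' | parallel -j${njobs} ./pdal_processor.sh"
```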
So, when point clouds get loaded, they’re broken up into “chips”, or collections of points. How many chips do we have so far?
user=# SELECT COUNT(*) FROM "pa_layer_0-1.5";
Now, how many rows is too many in a PostgreSQL database? Answer:
In other words, your typical state full of LiDAR (Pennsylvania or Ohio, for example) is not too large to store, retrieve, and analyze. If you’re in California or Texas, or have super dense stuff that’s been flown recently, you will have to provide some structure by partitioning your data into separate tables based on, e.g., geography. You could also modify your “chipper” size in the XML file. I have used the default 400 points per patch (for about 25,765,414,000 points total), which is fine for my use case, as I then do not exceed 100 million rows once the points are chipped:
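The row estimate is simple arithmetic: total points divided by points per patch. A rough sketch using the figures above follows; the variable names are made up for illustration, and it assumes every patch is filled to capacity, which the chipper does not guarantee, so treat the result as a lower bound.

```shell
#!/bin/bash
points=25765414000   # approximate total point count quoted above
capacity=400         # points per patch (the chipper size)

# Ceiling division: a partly filled patch still occupies a row.
patches=$(( (points + capacity - 1) / capacity ))
echo "$patches"   # prints 64413535
```

That is roughly 64 million rows, comfortably under the 100 million mark.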