Smathermather's Weblog

Remote Sensing, GIS, Ecology, and Oddball Techniques

KNN with FLANN and laspy, a starting place

Posted by smathermather on August 8, 2014

FLANN is the Fast Library for Approximate Nearest Neighbors, a purportedly wicked-fast nearest-neighbor library for comparing multi-dimensional points. I only say purportedly because I haven’t verified it myself, but I assume it to be quite true. I’d like to move some (all) of my KNN calculations outside the database.

I’d like to do the following with FLANN: take a LiDAR point cloud and change it into a LiDAR height-above-ground point cloud. What follows are my explorations so far.

In a previous series of posts (e.g. http://smathermather.wordpress.com/2014/07/14/lidar-and-pointcloud-extension-pt-6/), I have been using the point cloud extension in PostGIS. I like the 2-D chipping, but I think I should segregate my data into height classes before sending it into the database. In this way, I can query my data by height class and by location efficiently, taking full advantage of the efficiency of storing all those little points in chips, while also being able to query the data in any of the dimensions I need in the future. Enter FLANN.

I haven’t gotten far. To use FLANN with LiDAR through Python, I’m also using laspy.  There’s a great tutorial here: http://laspy.readthedocs.org/en/latest/tut_part_1.html


I make one change to the tutorial section using FLANN. The code as written is:

import laspy
import pyflann as pf
import numpy as np

# Open a file in read mode:
inFile = laspy.file.File("./laspytest/data/simple.las")
# Grab a numpy dataset of our clustering dimensions:
dataset = np.vstack([inFile.X, inFile.Y, inFile.Z]).transpose()

# Find the nearest 5 neighbors of point 100.

neighbors = flann.nn(dataset, dataset[100,], num_neighbors = 5)
print("Five nearest neighbors of point 100: ")
print(neighbors[0])
print("Distances: ")
print(neighbors[1])

To make this example work with the current version of pyflann, we need to make sure we import all of pyflann (or at least the FLANN class), and also instantiate flann = FLANN(), as follows:

import laspy
import numpy as np
from pyflann import *

# Open a file in read mode:
inFile = laspy.file.File("simple.las")
# Grab a numpy dataset of our clustering dimensions:
dataset = np.vstack([inFile.X, inFile.Y, inFile.Z]).transpose()

# Find the nearest 5 neighbors of point 100.
flann = FLANN()
neighbors = flann.nn(dataset, dataset[100,], num_neighbors = 5)
print("Five nearest neighbors of point 100: ")
print(neighbors[0])
print("Distances: ")
print(neighbors[1])
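As an aside, if you’re going to run many queries against the same cloud, pyflann can also build a reusable index rather than re-indexing on every call to nn(). A minimal sketch, reusing dataset from above (the algorithm and trees parameters are plausible defaults, not tuned values):

# Build the index once, then query it repeatedly
flann = FLANN()
params = flann.build_index(dataset, algorithm="kdtree", trees=4)

# Query a batch of points against the prebuilt index
indices, distances = flann.nn_index(dataset[100:110], num_neighbors=5)
print(indices)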

Finally, a small note on installing pyflann on Ubuntu. What I’m about to document is undoubtedly not the recommended way to get pyflann working. But it worked…

Installation instructions for FLANN on Ubuntu can be found here: http://www.pointclouds.org/downloads/linux.html

But this does not seem to install pyflann. That said, it installs all our dependencies plus FLANN, so…

I cloned, compiled, and installed the FLANN repo: https://github.com/mariusmuja/flann

git clone git://github.com/mariusmuja/flann.git
cd flann
mkdir BUILD
cd BUILD
cmake ../.
make
sudo make install

This gets pyflann where it needs to go, and voilà! We can now do nearest-neighbor searches within Python.

Next step: turn my LiDAR xyz point cloud into an xy-height point cloud, then dump it, height class by height class, into PostgreSQL. Wish me luck!
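For the curious, here is a rough, untested sketch of how that height calculation might look with the tools above. It assumes the LAS file carries ground classifications (class 2 in the ASPRS spec) and uses laspy’s scaled (lowercase) dimension accessors:

import laspy
import numpy as np
from pyflann import *

# Sketch only: height above ground via the nearest ground-classified point
inFile = laspy.file.File("simple.las")

# Scaled, real-world coordinates
xy = np.vstack([inFile.x, inFile.y]).transpose()
z = np.array(inFile.z)

# Class 2 is ground in the ASPRS LAS spec (assumes the file is classified)
ground = inFile.classification == 2

# For every point, find its nearest ground point in XY...
flann = FLANN()
idx, dist = flann.nn(xy[ground], xy, num_neighbors=1)

# ...and subtract that ground elevation to get height above ground
height = z - z[ground][idx]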

Posted in 3D, Database, FLANN, Other, pointcloud, PostGIS, PostgreSQL

Drivetime analyses, pgRouting

Posted by smathermather on August 6, 2014

Map of overlapping 5-minute drive times from park access points

We’ve got some quick-and-dirty pgRouting-based code up on GitHub. I say quick and dirty because it directly references the table names in both of the functions. I hope to fix this in the future.

The objective with this code is to input a point, use a nearest neighbor search to find the nearest intersection, and from that calculate a drive time alpha shape.

First, the nearest neighbor search:

CREATE OR REPLACE FUNCTION pgr.nn_search (geom geometry) RETURNS
int8 AS $$

SELECT id FROM pgr.roads_500_vertices_pgr AS r
ORDER BY geom <#> r.the_geom
LIMIT 1;

$$ LANGUAGE SQL VOLATILE;

This function takes one argument: pass it a point geometry, and it will do a KNN search for the nearest point. Since we are leveraging pgRouting, it simply returns the id of the nearest intersection point. We wrote this as a function in order to run it, e.g., on all the points in a table. As I stated earlier, it directly references a table name, which is a little hackish, but we’ll patch that faux pas later.
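For example (the coordinates here are made up, and we assume EPSG:3734, the projection used elsewhere on this blog; any point in the same projection as the vertices table will do):

-- Find the id of the road intersection nearest an arbitrary point
SELECT pgr.nn_search(ST_SetSRID(ST_MakePoint(2179000, 592000), 3734));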

Now that we have the ID of the point in question, we can do a driving distance calculation, wrap that up in an alpha shape, and return our polygon. Again, we write this as a function:

CREATE OR REPLACE FUNCTION pgr.alpha_shape (id integer, minutes integer) RETURNS
geometry AS $$

WITH alphashape AS(SELECT pgr_alphaShape('WITH
DD AS (
SELECT seq, id1 AS node, cost
FROM pgr_drivingDistance(''SELECT id, source, target, cost FROM pgr.roads_500'',' || id || ', ' || minutes || ', false, false)),
dd_points AS (
SELECT id_ AS id, x, y
FROM pgr.roads_500_vertices_pgr v, DD d
WHERE v.id = d.node)
SELECT * FROM dd_points')),

alphapoints AS (
SELECT ST_Makepoint((pgr_alphashape).x, (pgr_alphashape).y) FROM alphashape),

alphaline AS (
SELECT ST_MakeLine(ST_MakePoint) FROM alphapoints)

SELECT ST_MakePolygon(ST_AddPoint(ST_MakeLine, ST_StartPoint(ST_MakeLine))) AS the_geom FROM alphaline

$$ LANGUAGE SQL VOLATILE;

Finally, we’ll use these functions in conjunction with a set of park feature access points to map out our 5-minute drive times. Overlaps between 5-minute zones are rendered as a brighter green in the image above.

CREATE TABLE pgr.alpha_test AS
WITH dest_ids AS (
SELECT pgr.nn_search(the_geom) AS id FROM pgr.dest_pts
)
SELECT pgr.alpha_shape(id::int, 5)::geometry, id FROM dest_ids;

Oh, and I should point out that for the driving distance calculation, we have pre-calculated the costs of the roads based on the speed limit, so cost is really travel time in minutes.
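That pre-calculation might look something like the following sketch. It is illustrative only: the speed_limit column is hypothetical, and we assume segment lengths come back in feet (as they would in an Ohio state plane projection):

-- Travel time in minutes = length in miles / speed in mph * 60
UPDATE pgr.roads_500
	SET cost = (ST_Length(the_geom) / 5280) / speed_limit * 60;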

Posted in Analysis, Database, pgRouting, PostGIS, PostgreSQL, SQL

Using Spatial Data in R to Estimate Home Ranges (guest blog post)

Posted by smathermather on August 5, 2014

Overlay of daytime and nighttime home ranges and coyote points

(A guest blog post from Dakota Benjamin today, a Case Western Reserve University senior and three-year summer intern we’ve been lucky enough to recruit.)

This post has two big take-aways: using spatial data in R, and applying that knowledge to estimating home ranges. If you are not familiar with the R environment, there are many great resources for familiarizing yourself with this powerful language, such as [here](http://cran.r-project.org/doc/manuals/R-intro.pdf) and [here](http://www.statmethods.net/index.html). You will need a basic understanding of R before proceeding.

The package [rgdal](http://cran.r-project.org/web/packages/rgdal/) is essential for utilizing spatial data in R. There are two functions from it that we will use. The first is readOGR(). This function imports a shape file into a special type of data frame, in this case a SpatialPointsDataFrame, which contains the data, coordinates, bounding box, etc., making it all accessible in R.

 coyote <- readOGR("nc_coyote_centroids_dn","nc_coyote_centroids_dn") 

The second function we will use is writeOGR(), which, as you can probably guess, will write a Spatial*DataFrame back out to a shape file (or another OGR-supported format).

 writeOGR(ver, "hr_coyote_centroids_dn", "hr_centroids_dn", "ESRI Shapefile") 

Now we can work with the coyote data in R and use some tools to estimate the home range. The previous blog post, which shows how to clean up the data using PostGIS (http://smathermather.wordpress.com/2014/07/31/cleaning-animal-tracking-data-throwing-away-extra-points/), will be useful for cleaning up and framing the data in a useful way. What if we want to look at how the home range changes between night and day? I wrote a quick function to add a column to the data. calcDayNight() uses a function from the [RAtmosphere package](http://cran.r-project.org/web/packages/RAtmosphere/) called suncalc(), which calculates the sunrise and sunset for a given date and location. calcDayNight() compares each data point to the sunrise and sunset times and determines whether the point was taken during the day or at night. Here’s the code:

calcDayNight <- function(x) {
  # Label a point DAY if its time falls between sunrise and sunset, NIGHT otherwise
  suntime <- suncalc(as.numeric(as.Date(x["LMT_DATE"], format="%m-%d-%y") - as.Date("2013-01-01")), Lat=41.6, Long=-81.4)
  coytime <- as.numeric(strptime(x["LMT_TIME"], "%T") - strptime("00:00:00", "%T"))

  if(coytime > suntime$sunrise & coytime < suntime$sunset){
    "DAY"
  } else {
    "NIGHT"
  }
}
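One way to apply it row-wise (a sketch; it assumes the attribute table carries LMT_DATE and LMT_TIME columns, as our collar data does):

# Adds the new column by applying calcDayNight to each row of the attribute table
coyote$dayornight <- apply(coyote@data, 1, calcDayNight)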

After we get the day and night centroids (from the previous post), we can do our home range estimation. The package we need is [adehabitatHR](http://cran.r-project.org/web/packages/adehabitatHR/). There are two steps here: create the utilization distribution, and produce a home range contour from that distribution.

The function for creating the distribution is kernelUD(). More information about this model can be found [here](http://cran.r-project.org/web/packages/adehabitatHR/adehabitatHR.pdf) on page 33.

ud <- kernelUD(coyote[,2], h="href")

coyote[,2] refers to the “dayornight” column that contains the “DAY” and “NIGHT” values, so two different distributions will be calculated, one for day and one for night. Then we will produce a SpatialPolygonsDataFrame using getverticeshr(), which will give the 95% estimate of the home range:

ver <- getverticeshr(ud, 95)

Export that as I showed above, and we get a shape file with two polygons. Above is the home range estimation with the day/night centroids overlaid (blue for day and purple for night).

Posted in Analysis, R

Bicycling for Mapillary

Posted by smathermather on August 2, 2014

We’ve got two wheels now, baby! (And I’ve refined this a bit from the bungee-only setup.) Here’s my rig for bicycling for Mapillary photos. It uses the same gimbal for stabilizing and leveling the photos, so it should have similarly great output.

Mapillary output is yet to come for this one. In the meantime, you can watch the camera gimbal come alive:

Also you can look at the stabilized unicycle versions here: https://mapillary.github.io/mapillary_examples/seven_hills.html and here: http://www.mapillary.com/map/im/16/41.38742655884223/-81.69359564781189


Posted in Mapillary

Unicycling for Mapillary

Posted by smathermather on August 1, 2014

I really dig the crowd-sourced street-view service Mapillary, and like to contribute to it, but doing a good job of getting blur-free images is a little tricky. This is doubly true on dark trails under forest canopy. So, we solved it with some technology. Enter a GoPro camera, a gimbal for an RC copter (drone), and some old bicycle parts:

The gimbal is a two-axis gimbal which keeps the roll and pitch constant to 0.01 degrees, so it eliminates shake and keeps the orientation of the photos consistent. The one we got costs around $180, although now I find them for as low as $130 (not my video):

And then we attached it to a makeshift unicycle:

Some post-processing to match it to our GPS and voila!

http://www.mapillary.com/map/im/rX7zM6I3VdDjSAAPBhPZ-g

Posted in Mapillary

Plugin-free QGIS TMS tiles via GDAL

Posted by smathermather on July 31, 2014

Want to load your favorite tiles into QGIS? How about plugin-free TMS tiles in QGIS via GDAL:

http://www.3liz.com/blog/rldhont/index.php?post/2012/07/17/OpenStreetMap-Tiles-in-QGIS

Really awesome…

It needs but one change: epsg:900913 should be epsg:3857, or QGIS (GDAL?) throws an error. Presumably you could also define epsg:900913 in some config file, but barring that, create an XML file as follows and load it as a raster in QGIS:

<GDAL_WMS>
    <Service name="TMS">
        <ServerUrl>http://maps1.clemetparks.com/tilestache/tilestache.cgi/basemap/${z}/${x}/${y}.jpg</ServerUrl>
    </Service>
    <DataWindow>
        <UpperLeftX>-20037508.34</UpperLeftX>
        <UpperLeftY>20037508.34</UpperLeftY>
        <LowerRightX>20037508.34</LowerRightX>
        <LowerRightY>-20037508.34</LowerRightY>
        <TileLevel>18</TileLevel>
        <TileCountX>1</TileCountX>
        <TileCountY>1</TileCountY>
        <YOrigin>top</YOrigin>
    </DataWindow>
    <Projection>EPSG:3857</Projection>
    <BlockSizeX>256</BlockSizeX>
    <BlockSizeY>256</BlockSizeY>
    <BandsCount>3</BandsCount>
    <Cache />
</GDAL_WMS>

Now I can use my pretty TileStache maps everywhere!

Image of coyote points overlaid on a TileStache-rendered hillshade map (screenshot of map from PostGIS)

Edit: forgot the hat tip for the lead on this, Thomas Gratier (@ThomasG77).

Posted in Analysis, Database, PostGIS, PostgreSQL, QGIS, SQL, TileStache

Cleaning animal tracking data — throwing away extra points

Posted by smathermather on July 31, 2014

Much the problem of the modern era: too much data, uneven data. And yet, should we keep it all?

Here’s the problem space: attach a GPS collar to a coyote, send that data home, and you have a recipe for gleaning a lot of information about the movement of that animal across the landscape. In order to maximize the data collected while also maximizing the battery life of the device, you might (according to a preset scheme) sometimes sample every hour or every 15 minutes, and then switch back to once a day.

When you get the data back in, you get something like this:

Map of raw coyote data-- all points, showing all available collar GPS points

Already, we see some pretty interesting patterns. The first thing that I notice is clusters of activity. Are these clusters related to our uneven sampling, sometimes every 15 minutes, sometimes once or twice a day? Or are those really important clusters in the overall activity of the animal?

The next thing I notice is how perfectly the range of movement of the coyote is bounded by expressways to the west and north, and by the river to the east. Those seem to be pretty impermeable boundaries.

So, as does a man who only has a hammer, we clean the data with PostGIS. In this case, it’s an ideal task. First, metacode:

Task one: union / dissolve our points by day — for some days this will give us a single point, for other days a multi-point for the day.

Task two: find the centroid of our union. This will grab a centroid for each clump of points (a weighted spatial mean, if you will) and return that single point per day. There are some assumptions here that might not bear out under scrutiny, but it’s not a bad first approximation.

Now, code:

CREATE TABLE centroids AS
-- union / dissolve our points by day
WITH clusterp AS (
	SELECT ST_Union(geom) AS geom, gmt_date FROM nc_clipped_data_3734
		GROUP BY gmt_date
),
-- find the centroid of our union
centroids AS (
	SELECT ST_Centroid(geom) AS geom, gmt_date FROM clusterp
)
SELECT * FROM centroids;

Boom! One record per day:

Map showing much smaller home range, one record per day clustered around a single core zone

A different pattern now emerges. We lose the hard boundaries of the use area, but now the core portion of the home range stands out.
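As a side note, since we only need the centroid, ST_Collect would do the same job a bit more cheaply. Just be aware that ST_Union also removes duplicate points, so the centroid weighting can differ slightly between the two approaches. A sketch of the alternative (under a new table name):

CREATE TABLE centroids_alt AS
-- ST_Collect skips the (unneeded) geometric union and keeps duplicate points
SELECT ST_Centroid(ST_Collect(geom)) AS geom, gmt_date
	FROM nc_clipped_data_3734
	GROUP BY gmt_date;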

Upcoming (guest) blog post on the use of these data in home range calculations in R. Stay tuned.

Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under CC BY SA.

Posted in Analysis, Database, PostGIS, PostgreSQL, SQL

LiDAR and pointcloud extension pt 6

Posted by smathermather on July 14, 2014

There’s all sorts of reasons why what I’ve been doing so far is wrong (see the previous 5 posts). More on that later. But in case you needed a 3D visualization of the chipping I’ve been working through, look no further:

Isometric 3D visualization of 3D chipping

Posted in 3D, Database, Other, pointcloud, PostGIS, PostgreSQL

LiDAR and pointcloud extension pt 5

Posted by smathermather on July 11, 2014

Now for the crazy stuff:

Plaid-like overlay of vertical patches

The objective is to allow us to do vertical and horizontal summaries of our data. To do this, we’ll take chipped LiDAR input and further chip it vertically by classifying it.

First a classifier for height that we’ll use to do vertical splits on our point cloud chips:

CREATE OR REPLACE FUNCTION height_classify(numeric)
  RETURNS integer AS
$BODY$

SELECT
	CASE WHEN $1 < 10 THEN 1
	     WHEN $1 >= 10 AND $1 < 20 THEN 2
	     WHEN $1 >= 20 AND $1 < 30 THEN 3
	     WHEN $1 >= 30 AND $1 < 40 THEN 4
	     WHEN $1 >= 40 AND $1 < 50 THEN 5
	     WHEN $1 >= 50 AND $1 < 60 THEN 6
	     WHEN $1 >= 60 AND $1 < 70 THEN 7
	     WHEN $1 >= 70 AND $1 < 80 THEN 8
	     WHEN $1 >= 80 AND $1 < 90 THEN 9
	     WHEN $1 >= 90 AND $1 < 100 THEN 10
	     WHEN $1 >= 100 AND $1 < 110 THEN 11
	     WHEN $1 >= 110 AND $1 < 120 THEN 12
	     WHEN $1 >= 120 AND $1 < 130 THEN 13
	     WHEN $1 >= 130 AND $1 < 140 THEN 14
	     WHEN $1 >= 140 AND $1 < 150 THEN 15
	     WHEN $1 >= 150 AND $1 < 160 THEN 16
	     WHEN $1 >= 160 AND $1 < 170 THEN 17
	     WHEN $1 >= 170 AND $1 < 180 THEN 18
	     ELSE 0
	END;

$BODY$
  -- a pure function of its argument, so IMMUTABLE
  LANGUAGE sql IMMUTABLE
  COST 100;
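As an aside, PostgreSQL’s built-in width_bucket() can express the same ladder much more compactly. A sketch (not quite identical: this version bins negative heights to 0 rather than 1):

CREATE OR REPLACE FUNCTION height_classify(numeric)
  RETURNS integer AS
$BODY$

-- 18 equal-width buckets from 0 to 180; out-of-range heights get class 0
SELECT CASE WHEN $1 >= 180 THEN 0
	    ELSE width_bucket($1, 0, 180, 18)
	END;

$BODY$
  LANGUAGE sql IMMUTABLE;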

And now, let’s pull apart our point cloud, calculate heights from approximate ground, then put it all back together in chips grouped by height classes and aggregates of our original chips. Yes, I will explain more later. And don’t do this at home; I’m not even sure if it makes sense yet:

/*
First we explode the patches into points,
 and along the way grab the minimum value
for the patch, and the id
*/
WITH lraw AS (
    SELECT PC_Explode(pa) AS ex,  PC_PatchMin(pa, 'Z') AS min, id
    FROM test1
    ),
/*
Calculate our height value ( Z - min )
*/
heights AS (
    SELECT PC_Get(ex, 'X') || ',' || PC_Get(ex, 'Y') || ',' -- grab X and Y
    || PC_Get(ex, 'Z') - min                -- calculate height
    || ',' ||
-- And return our remaining dimensions
    PC_Get(ex, 'Intensity') || ',' || PC_Get(ex, 'ReturnNumber') || ',' || PC_Get(ex, 'NumberOfReturns') || ',' ||
    PC_Get(ex, 'ScanDirectionFlag') || ',' || PC_Get(ex, 'EdgeOfFlightLine') || ',' ||
    PC_Get(ex, 'Classification') || ',' || PC_Get(ex, 'ScanAngleRank') || ',' || PC_Get(ex, 'UserData') || ',' ||
    PC_Get(ex, 'PointSourceId') || ',' || PC_Get(ex, 'Time') || ',' || PC_Get(ex, 'PointID') || ',' || PC_Get(ex, 'BlockID')
    AS xyheight, PC_Get(ex, 'Z') - min AS height, id FROM lraw
    ),
-- Now we can turn this aggregated text into an array and then a set of points
heightcloud AS (
    SELECT PC_MakePoint(1, string_to_array(xyheight, ',')::float8[]) pt, height_classify(height) heightclass, id FROM heights
    )
-- Finally, we bin the points back into patches grouped by our height class and original ids.
SELECT PC_Patch(pt), id / 20 AS id, heightclass FROM heightcloud GROUP BY id / 20, heightclass;

The resultant output should allow us to query our database by height above ground and patch. Now we can generate vertical summaries of our point cloud. Clever or dumb? It’s too early to tell.
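For instance, if we store that output in a table, say chips_by_height (both the table name and the id below are hypothetical), a vertical summary of one aggregated chip might look like:

-- Point counts per height class for one aggregated chip
SELECT heightclass, sum(PC_NumPoints(pc_patch)) AS points
	FROM chips_by_height
	WHERE id = 42
	GROUP BY heightclass
	ORDER BY heightclass;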

Posted in 3D, Database, Other, pointcloud, PostGIS, PostgreSQL

LiDAR and pointcloud extension pt 4

Posted by smathermather on July 10, 2014

Height cloud overlaid with patches

Now we navigate into unknown (and perhaps silly) territory: point-level calculations of heights, dropping the newly calculated values back into the point cloud as though they were Z values. I don’t recommend this code as a practice; this is as much me thinking aloud as anything. Caveat emptor:

/*
First we explode the patches into points,
 and along the way grab the minimum value
for the patch, and the id
*/
WITH lraw AS (
	SELECT PC_Explode(pa) AS ex,  PC_PatchMin(pa, 'Z') AS min, id
	FROM test1
	),
/*
Calculate our height value ( Z - min ) 
*/
heights AS (
	SELECT PC_Get(ex, 'X') || ',' || PC_Get(ex, 'Y') || ',' -- grab X and Y
	|| PC_Get(ex, 'Z') - min				-- calculate height
	|| ',' ||
-- And return our remaining dimensions
	PC_Get(ex, 'Intensity') || ',' || PC_Get(ex, 'ReturnNumber') || ',' || PC_Get(ex, 'NumberOfReturns') || ',' ||
	PC_Get(ex, 'ScanDirectionFlag') || ',' || PC_Get(ex, 'EdgeOfFlightLine') || ',' ||
	PC_Get(ex, 'Classification') || ',' || PC_Get(ex, 'ScanAngleRank') || ',' || PC_Get(ex, 'UserData') || ',' ||
	PC_Get(ex, 'PointSourceId') || ',' || PC_Get(ex, 'Time') || ',' || PC_Get(ex, 'PointID') || ',' || PC_Get(ex, 'BlockID')
	AS xyheight, id FROM lraw
	),
-- Now we can turn this aggregated text into an array and then a set of points
heightcloud AS (
	SELECT PC_MakePoint(1, string_to_array(xyheight, ',')::float8[]) pt, id FROM heights
	)
-- Finally, we bin the points back into patches
SELECT PC_Patch(pt) FROM heightcloud GROUP BY id / 20;

Posted in 3D, Database, Other, pointcloud, PostGIS, PostgreSQL

 