<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: PostGIS Gripe—Limits to PostgreSQL&#8217;s B-tree indexing</title>
	<atom:link href="http://smathermather.wordpress.com/2008/11/27/postgis-gripe%e2%80%94limits-to-postgres-b-tree-indexing/feed/" rel="self" type="application/rss+xml" />
	<link>http://smathermather.wordpress.com/2008/11/27/postgis-gripe%e2%80%94limits-to-postgres-b-tree-indexing/</link>
	<description>Remote Sensing, GIS, Ecology, and Oddball Techniques</description>
	<lastBuildDate>Mon, 08 Dec 2008 20:34:40 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: smathermather</title>
		<link>http://smathermather.wordpress.com/2008/11/27/postgis-gripe%e2%80%94limits-to-postgres-b-tree-indexing/#comment-19</link>
		<dc:creator>smathermather</dc:creator>
		<pubDate>Fri, 05 Dec 2008 04:29:49 +0000</pubDate>
		<guid isPermaLink="false">http://smathermather.wordpress.com/?p=119#comment-19</guid>
		<description>I had to do some reading in Wikipedia to make sure I understood this response, but I think it will work, just so long as there aren&#039;t hash collisions (a term which I&#039;d never heard before but now understand...  shouldn&#039;t confess to such Nube-ness).

So, my interpretation of Abe&#039;s comment is that an alternative approach is to create that unique index on a proxy for the geometry field, without having to index something as long as the geometry field.  So the idea is to turn the geometry field into a hash representation to compress it, hope there aren&#039;t any false duplicates when I do inserts later (hash collisions), and index and constrain that.   Brilliant approach, and a new tool in my toolbox.  So, maybe something like this:

So first binary straight from the database...

ST_AsBinary(the_geom)

then encode to text:

encode()

and then convert this to an MD5 hash

MD5()

for a complete command like this:

MD5(encode(ST_AsBinary(the_geom), &#039;hex&#039;))

resulting in something like this:
900150983cd24fb0d6963f7d28e17f72

Which is much shorter than the original values, and might be index-able.

To be tested ASAP (next week most likely...).</description>
		<content:encoded><![CDATA[<p>I had to do some reading in Wikipedia to make sure I understood this response, but I think it will work, just so long as there aren&#8217;t hash collisions (a term which I&#8217;d never heard before but now understand&#8230;  shouldn&#8217;t confess to such Nube-ness).</p>
<p>So, my interpretation of Abe&#8217;s comment is that an alternative approach is to create that unique index on a proxy for the geometry field, without having to index something as long as the geometry field.  So the idea is to turn the geometry field into a hash representation to compress it, hope there aren&#8217;t any false duplicates when I do inserts later (hash collisions), and index and constrain that.   Brilliant approach, and a new tool in my toolbox.  So, maybe something like this:</p>
<p>So first binary straight from the database&#8230;</p>
<p>ST_AsBinary(the_geom)</p>
<p>then encode to text:</p>
<p>encode()</p>
<p>and then convert this to an MD5 hash</p>
<p>MD5()</p>
<p>for a complete command like this:</p>
<p>MD5(encode(ST_AsBinary(the_geom), 'hex'))</p>
<p>resulting in something like this:<br />
900150983cd24fb0d6963f7d28e17f72</p>
<p>Which is much shorter than the original values, and might be index-able.</p>
<p>To be tested ASAP (next week most likely&#8230;).</p>
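<p>The hashing steps described above can be sketched outside the database. This is only an illustration in Python, not the PostGIS call chain itself: <code>geom_hash</code> is a hypothetical helper name, <code>point_wkb</code> is a hand-written well-known binary for POINT(1 2) standing in for the output of ST_AsBinary, and <code>long_bytes</code> is an arbitrary long byte string (not valid WKB) standing in for a large geometry.</p>

```python
import hashlib


def geom_hash(wkb: bytes) -> str:
    """Hex-encode the bytes to text, then return the MD5 digest of that text."""
    hex_text = wkb.hex()
    return hashlib.md5(hex_text.encode("ascii")).hexdigest()


# Assumption: hand-written little-endian WKB for POINT(1 2), used as sample input.
point_wkb = bytes.fromhex("0101000000000000000000f03f0000000000000040")

# Assumption: an arbitrary long byte string (not valid WKB) standing in for a big geometry.
long_bytes = point_wkb * 10000

print(len(point_wkb), len(geom_hash(point_wkb)))
print(len(long_bytes), len(geom_hash(long_bytes)))
print(geom_hash(point_wkb) == geom_hash(bytes(point_wkb)))
```

<p>Whatever the input length, the digest is 32 hexadecimal characters, which is what makes it a compact proxy for indexing. Two different inputs can in principle share a digest, which is the hash-collision caveat already mentioned.</p>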
]]></content:encoded>
	</item>
	<item>
		<title>By: Abe</title>
		<link>http://smathermather.wordpress.com/2008/11/27/postgis-gripe%e2%80%94limits-to-postgres-b-tree-indexing/#comment-18</link>
		<dc:creator>Abe</dc:creator>
		<pubDate>Fri, 05 Dec 2008 00:18:16 +0000</pubDate>
		<guid isPermaLink="false">http://smathermather.wordpress.com/?p=119#comment-18</guid>
		<description>Why not hash the geometry value and place that into another column, say &quot;the_geom_hash&quot;.  Then put a unique constraint on the_geom_hash.  Now create an insert trigger that does the hashing and sets the value of the_geom_hash column.

I realize it&#039;s possible to have hash collisions, but they should generally be unique.  If your data can tolerate this very slim possibility of error then it should work fine.</description>
		<content:encoded><![CDATA[<p>Why not hash the geometry value and place that into another column, say &#8220;the_geom_hash&#8221;.  Then put a unique constraint on the_geom_hash.  Now create an insert trigger that does the hashing and sets the value of the_geom_hash column.</p>
<p>I realize it&#8217;s possible to have hash collisions, but they should generally be unique.  If your data can tolerate this very slim possibility of error then it should work fine.</p>
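<p>A rough sketch of the hash column, unique constraint and insert trigger, under loud assumptions: it uses Python's built-in SQLite as a stand-in for PostgreSQL, plain byte strings as a stand-in for geometry binaries, and an <code>md5</code> SQL function registered by hand. The trigger syntax is SQLite's, not PL/pgSQL, and <code>insert_geom</code> is a hypothetical helper, so treat it as a picture of the idea rather than something to paste into PostGIS.</p>

```python
import hashlib
import sqlite3


def md5_hex(blob: bytes) -> str:
    """MD5 digest of a byte string, as 32 hex characters."""
    return hashlib.md5(blob).hexdigest()


conn = sqlite3.connect(":memory:")
# Assumption: SQLite has no built-in md5(), so register one for the trigger to call.
conn.create_function("md5", 1, md5_hex)

conn.executescript("""
CREATE TABLE contours (
    id INTEGER PRIMARY KEY,
    the_geom BLOB NOT NULL,      -- stand-in for a geometry column
    the_geom_hash TEXT UNIQUE    -- uniqueness is enforced on the hash only
);

-- Fill in the hash for every inserted row; a duplicate hash aborts the insert.
CREATE TRIGGER contours_set_hash AFTER INSERT ON contours
BEGIN
    UPDATE contours SET the_geom_hash = md5(NEW.the_geom) WHERE id = NEW.id;
END;
""")


def insert_geom(wkb: bytes) -> bool:
    """Insert a row; return False if the unique hash constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO contours (the_geom) VALUES (?)", (wkb,))
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_geom(b"geometry A"))  # first copy is accepted
print(insert_geom(b"geometry A"))  # exact duplicate is rejected
print(insert_geom(b"geometry B"))  # different bytes are accepted
print(conn.execute("SELECT COUNT(*) FROM contours").fetchone()[0])
```

<p>The constraint only ever sees the short hash, so the index stays small however long the geometry is. The cost is that a hash collision between two different geometries would be rejected as a duplicate.</p>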
]]></content:encoded>
	</item>
	<item>
		<title>By: Regina</title>
		<link>http://smathermather.wordpress.com/2008/11/27/postgis-gripe%e2%80%94limits-to-postgres-b-tree-indexing/#comment-17</link>
		<dc:creator>Regina</dc:creator>
		<pubDate>Mon, 01 Dec 2008 20:00:40 +0000</pubDate>
		<guid isPermaLink="false">http://smathermather.wordpress.com/?p=119#comment-17</guid>
		<description>Hmm I suppose that would work.   Though may not work for large geometries.    Not sure about the floating point case.  You could also try this trick I mentioned - which would also use a spatial index so might perform better.  Though admittedly I haven&#039;t tried it.  Also check out other people&#039;s suggestions.

http://postgis.refractions.net/pipermail/postgis-users/2008-November/021891.html

This is not something I run into often so haven&#039;t really stress tested any of these tricks as far as geometries go.

hope that helps,
Regina</description>
		<content:encoded><![CDATA[<p>Hmm I suppose that would work.   Though may not work for large geometries.    Not sure about the floating point case.  You could also try this trick I mentioned &#8211; which would also use a spatial index so might perform better.  Though admittedly I haven&#8217;t tried it.  Also check out other people&#8217;s suggestions.</p>
<p><a href="http://postgis.refractions.net/pipermail/postgis-users/2008-November/021891.html" rel="nofollow">http://postgis.refractions.net/pipermail/postgis-users/2008-November/021891.html</a></p>
<p>This is not something I run into often so haven&#8217;t really stress tested any of these tricks as far as geometries go.</p>
<p>hope that helps,<br />
Regina</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: smathermather</title>
		<link>http://smathermather.wordpress.com/2008/11/27/postgis-gripe%e2%80%94limits-to-postgres-b-tree-indexing/#comment-16</link>
		<dc:creator>smathermather</dc:creator>
		<pubDate>Sun, 30 Nov 2008 23:59:58 +0000</pubDate>
		<guid isPermaLink="false">http://smathermather.wordpress.com/?p=119#comment-16</guid>
		<description>So, ignorant question, why wouldn&#039;t this work (other than my table would be too large to index):
&lt;b&gt; alter table base.contours add constraint unique_geom unique ST_AsEWKB(the_geom);&lt;/b&gt;</description>
		<content:encoded><![CDATA[<p>So, ignorant question, why wouldn&#8217;t this work (other than my table would be too large to index):<br />
<b> alter table base.contours add constraint unique_geom unique ST_AsEWKB(the_geom);</b></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: smathermather</title>
		<link>http://smathermather.wordpress.com/2008/11/27/postgis-gripe%e2%80%94limits-to-postgres-b-tree-indexing/#comment-15</link>
		<dc:creator>smathermather</dc:creator>
		<pubDate>Sun, 30 Nov 2008 14:31:53 +0000</pubDate>
		<guid isPermaLink="false">http://smathermather.wordpress.com/?p=119#comment-15</guid>
		<description>This is very helpful, thank you.  I can&#039;t offhand think of any case where bbox instead of a full comparison would get me in trouble, but it&#039;d always be out there lurking as a possibility, and there&#039;s nothing worse than a subtle lurking problem.

Now a nagging question about floating point values:  if the comparison is done on well known binary, do we have to worry about comparison of floating point values?  What I mean is, can we be certain that each time a float value is represented, it will be represented identically to the last time, and thus we can detect when two values are equal?  How does PostGIS handle this?</description>
		<content:encoded><![CDATA[<p>This is very helpful, thank you.  I can&#8217;t offhand think of any case where bbox instead of a full comparison would get me in trouble, but it&#8217;d always be out there lurking as a possibility, and there&#8217;s nothing worse than a subtle lurking problem.</p>
<p>Now a nagging question about floating point values:  if the comparison is done on well known binary, do we have to worry about comparison of floating point values?  What I mean is, can we be certain that each time a float value is represented, it will be represented identically to the last time, and thus we can detect when two values are equal?  How does PostGIS handle this?</p>
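<p>As a general illustration of the floating point concern (this says nothing about PostGIS internals, and <code>double_bytes</code> is a hypothetical helper): a given double always serializes to the same 8 bytes, but two values that are equal on paper can be different doubles, and then their bytes differ too. The sketch uses Python's <code>struct</code> module with the machine's native byte order.</p>

```python
import struct


def double_bytes(value: float) -> bytes:
    """Serialize a float as an 8-byte IEEE 754 double (native byte order)."""
    return struct.pack("d", value)


literal = 0.3
computed = 0.1 + 0.2  # equal to 0.3 on paper, but a different double

# The same double serializes identically every time.
print(double_bytes(literal) == double_bytes(0.3))

# A value reached by arithmetic may not be the same double as the literal.
print(double_bytes(computed) == double_bytes(literal))
print(double_bytes(literal).hex(), double_bytes(computed).hex())
```

<p>So a byte-for-byte comparison is repeatable for stored coordinates, but it will treat coordinates that differ only by rounding error as distinct.</p>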
]]></content:encoded>
	</item>
	<item>
		<title>By: Regina</title>
		<link>http://smathermather.wordpress.com/2008/11/27/postgis-gripe%e2%80%94limits-to-postgres-b-tree-indexing/#comment-14</link>
		<dc:creator>Regina</dc:creator>
		<pubDate>Sat, 29 Nov 2008 05:10:07 +0000</pubDate>
		<guid isPermaLink="false">http://smathermather.wordpress.com/?p=119#comment-14</guid>
		<description>Actually using DISTINCT the_geom would use the same bbox =  logic that Vincent mentioned, so that won&#039;t work either.

If you want to go the DISTINCT route, you&#039;ll need to do a  
INSERT INTO newtable(the_geom)
SELECT DISTINCT ST_AsEWKB(the_geom)
FROM oldtable

that will force it into binary mode for the distinct thus preventing a bbox =  thingy, but on insert into the table it will be cast back to a geometry.</description>
		<content:encoded><![CDATA[<p>Actually using DISTINCT the_geom would use the same bbox =  logic that Vincent mentioned, so that won&#8217;t work either.</p>
<p>If you want to go the DISTINCT route, you&#8217;ll need to do a<br />
INSERT INTO newtable(the_geom)<br />
SELECT DISTINCT ST_AsEWKB(the_geom)<br />
FROM oldtable</p>
<p>that will force it into binary mode for the distinct thus preventing a bbox =  thingy, but on insert into the table it will be cast back to a geometry.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vincent</title>
		<link>http://smathermather.wordpress.com/2008/11/27/postgis-gripe%e2%80%94limits-to-postgres-b-tree-indexing/#comment-13</link>
		<dc:creator>Vincent</dc:creator>
		<pubDate>Thu, 27 Nov 2008 18:21:06 +0000</pubDate>
		<guid isPermaLink="false">http://smathermather.wordpress.com/?p=119#comment-13</guid>
		<description>Hi,
Be careful that when you declare a unique constraint on the_geom, you are actually not eliminating exact geometry duplicates, but geometries with the same bounding box.
As a matter of fact, it uses the = operator, which in postgis is defined on geometries as a bbox comparison.
bye,
Vincent</description>
		<content:encoded><![CDATA[<p>Hi,<br />
Be careful that when you declare a unique constraint on the_geom, you are actually not eliminating exact geometry duplicates, but geometries with the same bounding box.<br />
As a matter of fact, it uses the = operator, which in postgis is defined on geometries as a bbox comparison.<br />
bye,<br />
Vincent</p>
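<p>A small Python sketch of why bounding-box equality is weaker than exact equality. It only illustrates the concept described above and is not PostGIS's implementation; <code>bbox</code> and the two sample lines are made up for the example.</p>

```python
def bbox(coords):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


# Two different lines that cross the same unit square along opposite diagonals.
rising = [(0.0, 0.0), (1.0, 1.0)]
falling = [(0.0, 1.0), (1.0, 0.0)]

print(bbox(rising) == bbox(falling))  # the bounding boxes match
print(rising == falling)              # the geometries do not
```

<p>A uniqueness check based on the box alone would treat the second line as a duplicate of the first.</p>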
]]></content:encoded>
	</item>
</channel>
</rss>
