Nearest Neighbor problem in PostGIS 2.0 using GiST Index (<-> function)

Geographic Information Systems Asked by Alexandre Neto on April 5, 2021

I'm trying to use the new PostGIS 2.0 operator <-> (Geometry Distance Centroid) to calculate, for each row of my table (cosn1), the distance to the nearest polygon of the same class.

I was trying to use the following code:

WITH index_query AS (
  SELECT g1.gid As ref_gid, ST_Distance(g1.the_geom,g2.the_geom) As ENN    
    FROM "cosn1" As g1, "cosn1" As g2   
    WHERE g1.gid <> g2.gid AND g1.class = g2.class
    ORDER BY g1.gid, g1.the_geom <-> g2.the_geom) 
SELECT DISTINCT ON (ref_gid) ref_gid, ENN 
    FROM index_query
ORDER BY ref_gid, ENN;

But then I noticed this warning in the documentation:

Note: Index only kicks in if one of the geometries is a constant (not in a subquery/cte). e.g. 'SRID=3005;POINT(1011102 450541)'::geometry instead of a.geom

This means the index won't be used at all, and the query will take almost as long as the version I used before:

SELECT DISTINCT ON(g1.gid)  g1.gid As ref_gid, ST_Distance(g1.the_geom,g2.the_geom) As ENN    
    FROM "cosn1" As g1, "cosn1" As g2   
    WHERE g1.gid <> g2.gid AND g1.class = g2.class
    ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom)

Can anyone point me to a workaround that improves the performance of my query?

Thank you very much.

One Answer

Some tests on my machine suggest that the <-> operator is not working properly. I am not sure it is a bug, but it reported zero distance for geometries that do not overlap.

I tried the usual SQL query optimizations. Given those unexpected results with the <-> operator, I replaced it with st_centroid and got much better speed.

I hope the semantics stay the same with st_overlaps; at least, that is what I understood from the documentation about <->.

From the PostGIS docs on <->:

For other geometry types the distance between the floating point bounding box centroids is returned.
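To see what that sentence means in practice, here is a minimal sketch that compares the operator with the exact distance for two made-up polygons (the coordinates are only an illustration, not taken from cosn1). On PostGIS 2.0 I would expect the first two columns to be close to each other, allowing for the float-precision bounding boxes, and both to differ from the third; later PostGIS versions may behave differently, so treat this as a way to check your own install rather than a statement of results.

```sql
-- Two hypothetical, non-overlapping polygons used only for illustration.
SELECT a <-> b AS knn_operator_distance,
       -- Distance between the centroids of the two bounding boxes.
       ST_Distance(ST_Centroid(ST_Envelope(a)),
                   ST_Centroid(ST_Envelope(b))) AS bbox_centroid_distance,
       -- Exact minimum distance between the two geometries.
       ST_Distance(a, b) AS exact_distance
FROM (SELECT 'POLYGON((0 0, 10 0, 10 1, 0 1, 0 0))'::geometry AS a,
             'POLYGON((11 0, 12 0, 12 10, 11 10, 11 0))'::geometry AS b) AS t;
```

If the first column does not match the exact distance, ordering by <-> alone can pick a neighbor that is not the truly nearest one.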

On my test data of ~5.5k polygons, the run time dropped from ~1000 seconds to ~5 seconds without spatial indexing.

I see some people using DISTINCT ON to do grouping instead of GROUP BY, which exists to eliminate duplicates.

Here is your query with standard SQL optimizations, without the error introduced by st_centroid:

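-- For each polygon, the minimum distance to another polygon of the same class.
-- Note: && keeps only pairs whose bounding boxes intersect, so polygons with
-- no such same-class neighbor drop out of the result.
SELECT g1.gid, min(ST_Distance(g1.the_geom, g2.the_geom)) AS enn
FROM
  "cosn1" AS g1, "cosn1" AS g2
WHERE
  g1.gid <> g2.gid
  AND g1.class = g2.class
  AND g1.the_geom && g2.the_geom
GROUP BY
  g1.gid;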
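
If you still want the <-> operator to drive the search, one pattern that may be worth trying is a correlated subquery. For each outer row it takes a handful of candidates ordered by <-> and then keeps the smallest exact distance among them. This is only a sketch under assumptions: the index name cosn1_the_geom_gist and the candidate count of 10 are my own placeholders. Whether the planner actually uses the GiST index here depends on your PostgreSQL and PostGIS versions, so check with EXPLAIN.

```sql
-- Hypothetical index name; skip this if the_geom already has a GiST index.
CREATE INDEX cosn1_the_geom_gist ON "cosn1" USING gist (the_geom);

SELECT g1.gid AS ref_gid,
       (SELECT min(ST_Distance(g1.the_geom, c.the_geom))
          FROM (SELECT g2.the_geom
                  FROM "cosn1" AS g2
                 WHERE g2.gid <> g1.gid
                   AND g2.class = g1.class
                 -- Approximate ordering (bounding box centroids in PostGIS 2.0).
                 ORDER BY g1.the_geom <-> g2.the_geom
                 -- Assumed candidate count; raise it if neighbors are missed.
                 LIMIT 10
               ) AS c
       ) AS enn
  FROM "cosn1" AS g1;
```

Because the candidates are ranked by an approximate distance, the result is exact only when the true nearest neighbor falls inside the candidate set. A larger LIMIT trades speed for safety. Rows with no other polygon of the same class come back with a NULL enn instead of disappearing.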

Answered by cavila on April 5, 2021
