Geographic Information Systems Asked on November 9, 2021
My ultimate goal is to fetch all points from a 4 million record table that fit within a given rectangle and group the results in clusters. All of the queries I have tried so far take too long: upwards of 15 seconds. We are aiming for a few hundred milliseconds at most.
I obtained the fastest result from the following approach…
Created a second column named “snapped_geometry” that is the location_point snapped to a PostGIS grid.
update tablename set snapped_geometry = ST_SnapToGrid(location_point, 0.2);
Indexed this snapped_geometry column using GIST
create index on tablename using GIST (snapped_geometry)
Ran this query…
explain (analyze) select count(snapped_geometry) as count, snapped_geometry from contacts_80 where st_contains(st_MakeEnvelope(-95, 30.5, -80, 45, 4326), snapped_geometry)
group by snapped_geometry
Some things I learned from researching the terms in this explain response…
1. The sort information is related to the “group by” clause.
2. The heap scan is related to the “where” clause.
3. Limited work_mem is not the reason for our query taking so long. Initially, there was not enough work_mem to execute the sort in working memory. As a result, the sort spilled to disk. See here. I increased work_mem with set work_mem = '800MB'. This fixed the issue as confirmed by the line “Sort Method: quicksort” in the Explain response.
4. The bitmap heap scan was not lossy. We were initially concerned that our query was lossy because a row in the Explain response displays a “recheck condition”. I later learned that this line is in all explain responses even when the bitmap heap scan does not need to recheck the index conditions (i.e. even when the scan is not lossy). The scan only rechecks the condition when the scan is lossy. The absence of the word “lossy” in line 9 of the explain response indicates the scan was not lossy. See here.
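The checks in points 3 and 4 can be reproduced with something like the following sketch. The query is the one above; the BUFFERS option and the RESET at the end are additions for illustration, and the exact wording of the plan output depends on the PostgreSQL version.

```sql
-- Sketch only: raise work_mem for this session, then re-run the plan.
SHOW work_mem;            -- current value, for comparison
SET work_mem = '800MB';   -- session-level setting; value taken from point 3

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(snapped_geometry) AS count, snapped_geometry
FROM contacts_80
WHERE ST_Contains(ST_MakeEnvelope(-95, 30.5, -80, 45, 4326), snapped_geometry)
GROUP BY snapped_geometry;

-- What to look for in the output:
--   "Sort Method: external merge  Disk: ..."  -> the sort spilled to disk
--   "Sort Method: quicksort  Memory: ..."     -> the sort fit in work_mem
--   "Heap Blocks: exact=N"                    -> the bitmap was not lossy
--   "Heap Blocks: exact=N lossy=M"            -> some pages had to be rechecked

RESET work_mem;           -- back to the server default
```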
That still leaves me wondering how I can speed up this query.
Am I using ST_SnapToGrid incorrectly?
Is there an error in how I created and used the GIST index?
Is it impossible to speed up this query?
I have also experimented with PostGIS’s ST_ClusterKMeans, ST_ClusterDBSCAN and ST_ClusterWithin with no speed advantages.
Other links I used to learn about heap scans and sorting methods…
Did you run "ANALYZE contacts_80;" after creating your indexes, so the planner has up-to-date statistics to plan with? It seems to plan for 4000 rows, and ends up with 4000000...
Also, ST_Within is usually used instead of ST_Contains. The two are inverses of each other: ST_Within(A, B) is the same test as ST_Contains(B, A).
What I would try is:
ANALYZE tablename;
explain analyze
select count(snapped_geometry) as count, snapped_geometry
from contacts_80
where ST_Within(snapped_geometry, st_MakeEnvelope(-95, 30.5, -80, 45, 4326))
group by snapped_geometry;
Note the inversion of the parameters for ST_Within.
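A quick way to convince yourself of the argument order is a one-row check like the sketch below. The test point (-90, 40) is a made-up value chosen to fall inside the envelope; both columns should come back with the same boolean.

```sql
-- Sketch: ST_Contains(A, B) and ST_Within(B, A) test the same relationship.
-- The point (-90, 40) is a hypothetical value inside the envelope.
SELECT
  ST_Contains(env, pt) AS contains_result,
  ST_Within(pt, env)   AS within_result
FROM (
  SELECT
    ST_MakeEnvelope(-95, 30.5, -80, 45, 4326) AS env,
    ST_SetSRID(ST_MakePoint(-90, 40), 4326)   AS pt
) AS t;
```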
Also, the geometry comparison in your group by seems really slow, so maybe you can try grouping on ST_GeoHash of the geometry instead of on the geometry itself?
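As a rough sketch of that idea, assuming snapped_geometry holds lon/lat points in SRID 4326 (which ST_GeoHash needs), the grouping key becomes a short text value. Whether this is actually faster on your data is something only EXPLAIN ANALYZE can tell you.

```sql
-- Sketch only: group on a text geohash instead of the geometry itself.
-- With no precision argument, ST_GeoHash uses full precision for points,
-- so identical snapped points get identical keys.
EXPLAIN ANALYZE
SELECT
  ST_GeoHash(snapped_geometry) AS geohash_key,
  count(*)                     AS count
FROM contacts_80
WHERE ST_Within(snapped_geometry, ST_MakeEnvelope(-95, 30.5, -80, 45, 4326))
GROUP BY ST_GeoHash(snapped_geometry);
```

If this helps, the hash could also be stored in its own column when snapped_geometry is populated, so it does not have to be recomputed on every query.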
Answered by robin loche on November 9, 2021