Geographic Information Systems Asked on June 30, 2021
We have two virtual servers on the same hardware (identical specs, 6 core Intel Xeon E5-2643), one is running Postgres 9.6 with PostGIS 2.4, and the other is running Postgres 12.4 with PostGIS 3.0. Both are using GEOS.
We are using a custom PostGIS function which can be found here: https://github.com/iboates/ST_RayCast
Using a simple dataset, we can compute the result instantly on the 9.6/2.4 server, but it takes 8 seconds or more on the 12.4/3.0 server.
You can test it out using this data:
CREATE TABLE public.canyon_points (
id integer NOT NULL,
geom public.geometry(Point,23947)
);
INSERT INTO public.canyon_points (id, geom) VALUES (1, '01010000208B5D00007AE68ED83ED111412DDA68187C884141');
INSERT INTO public.canyon_points (id, geom) VALUES (2, '01010000208B5D0000D7D006A6E5D1114171D906BB81884141');
INSERT INTO public.canyon_points (id, geom) VALUES (3, '01010000208B5D0000371DD89F68D211419981959D85884141');
INSERT INTO public.canyon_points (id, geom) VALUES (4, '01010000208B5D00002CC0B22166D21141C656944C96884141');
INSERT INTO public.canyon_points (id, geom) VALUES (5, '01010000208B5D0000C2C09A9F61D2114109D4646EAB884141');
INSERT INTO public.canyon_points (id, geom) VALUES (6, '01010000208B5D0000E1BFBE6268D21141DB50C7F1C3884141');
INSERT INTO public.canyon_points (id, geom) VALUES (7, '01010000208B5D00002CC0B22166D21141A34D4BBDDC884141');
INSERT INTO public.canyon_points (id, geom) VALUES (8, '01010000208B5D00000DC18E5E5FD21141594A1219F6884141');
INSERT INTO public.canyon_points (id, geom) VALUES (9, '01010000208B5D0000EDC16A9B58D2114151C7EE7B0D894141');
INSERT INTO public.canyon_points (id, geom) VALUES (10, '01010000208B5D0000D1B63243AED2114195C68C1E13894141');
INSERT INTO public.canyon_points (id, geom) VALUES (11, '01010000208B5D0000E3A0B65157D31141D746A22511894141');
ALTER TABLE ONLY public.canyon_points
ADD CONSTRAINT canyon_points_pkey PRIMARY KEY (id);
CREATE TABLE public.canyon_building (
id integer NOT NULL,
geom public.geometry(Polygon,23947),
height double precision
);
INSERT INTO public.canyon_building (id, geom, height) VALUES (1, '01030000208B5D0000010000000500000093B51104F2D01141088E0BF70A894141FB185D700AD21141695CE44F0B89414111FFE9FD0FD21141E3F796FEE9884141A99B9E91F7D011416931FA61EB88414193B51104F2D01141088E0BF70A894141', 7);
INSERT INTO public.canyon_building (id, geom, height) VALUES (2, '01030000208B5D00000100000005000000A99B9E91F7D011416931FA61EB884141A99B9E91F7D01141B1DC2489CD884141FB185D700AD21141CAD4E8CCCB88414111FFE9FD0FD21141E3F796FEE9884141A99B9E91F7D011416931FA61EB884141', 6);
INSERT INTO public.canyon_building (id, geom, height) VALUES (3, '01030000208B5D00000100000005000000A99B9E91F7D01141B1DC2489CD88414193B51104F2D0114136EB9DFEAE88414175B40F1FE9D11141ED1489E9AC884141FB185D700AD21141CAD4E8CCCB884141A99B9E91F7D01141B1DC2489CD884141', 8);
INSERT INTO public.canyon_building (id, geom, height) VALUES (4, '01030000208B5D000001000000050000007CCF8476ECD011411FDA26838988414193B51104F2D0114136EB9DFEAE88414175B40F1FE9D11141ED1489E9AC884141535BBCCAE0D11141115529068E8841417CCF8476ECD011411FDA268389884141', 8);
INSERT INTO public.canyon_building (id, geom, height) VALUES (5, '01030000208B5D00000100000005000000DCC186AFC1D21141CB2ABDA80B894141709B175BA5D3114145F159450A8941418681A4E8AAD311418A8A4DB6F3884141098EA0CACCD21141D36062CBF5884141DCC186AFC1D21141CB2ABDA80B894141', 4);
INSERT INTO public.canyon_building (id, geom, height) VALUES (6, '01030000208B5D00000100000005000000098EA0CACCD21141D36062CBF5884141E7344D76C4D21141A01A0668B9884141385C377997D3114101E9DEC0B98841418681A4E8AAD311418A8A4DB6F3884141098EA0CACCD21141D36062CBF5884141', 5);
INSERT INTO public.canyon_building (id, geom, height) VALUES (7, '01030000208B5D00000100000005000000E7344D76C4D21141A01A0668B9884141F2A7133DC7D21141E8C5308F9B884141DFC3034381D3114101BEF4D299884141385C377997D3114101E9DEC0B9884141E7344D76C4D21141A01A0668B9884141', 8);
ALTER TABLE ONLY public.canyon_building
ADD CONSTRAINT canyon_building_pkey PRIMARY KEY (id);
CREATE OR REPLACE FUNCTION ST_RayCast(
in_point GEOMETRY,
in_boundaries GEOMETRY,
out_geom_type TEXT,
num_rays INTEGER,
max_ray_dist FLOAT
)
RETURNS GEOMETRY AS
$$
DECLARE
adj FLOAT;
opp FLOAT;
theta FLOAT = 0;
srid INTEGER = ST_SRID(in_point);
candidate_geom GEOMETRY;
return_geom GEOMETRY;
BEGIN
/* Do preliminary checks for common problems */
IF ST_SRID(in_boundaries) != srid THEN
RAISE EXCEPTION 'SRID of input points (%) does not match input boundaries SRID (%)', ST_SRID(in_point), ST_SRID(in_boundaries);
END IF;
IF out_geom_type NOT IN ('MULTIPOINT', 'MULTILINESTRING') THEN
RAISE EXCEPTION 'Output geometry type (''%'') must be one of (''%'', ''%'')', out_geom_type, 'MULTIPOINT', 'MULTILINESTRING';
END IF;
/* Cast rays over the specified angle window */
WHILE theta < 2*pi() LOOP
candidate_geom = (
/* Make a CTE for the casted ray endpoint so we only have to query it once */
WITH
ray
AS MATERIALIZED (
SELECT
/* Make a ray */
ST_MakeLine(
ST_Transform(
ST_Project(
/* PostGIS only allows projecting points in geographical CRS, so we have to do some transforming here */
ST_Transform(
in_point,
4326
)::geography,
max_ray_dist,
theta
)::geometry,
srid
),
in_point
) AS geom
)
SELECT
/* Intersect this ray with the input boundaries, ignore empty results (no ray intersection) */
ST_Intersection(
ST_MakeLine(
ray.geom,
in_point
),
in_boundaries
) AS geom
FROM
ray
WHERE
NOT ST_IsEmpty(
ST_Intersection(
ST_MakeLine(
ray.geom,
in_point
),
in_boundaries
)
)
);
/* In the case of multiple ray intersections, take the closest one */
IF ST_NumGeometries(candidate_geom) > 1 THEN
candidate_geom = (
SELECT
dp.geom
FROM
ST_DumpPoints(candidate_geom) AS dp
ORDER BY
ST_Distance(
in_point,
dp.geom
) ASC
LIMIT 1
);
END IF;
/* Either prep the point for return or make a line out of it for return depending on user input */
IF out_geom_type = 'MULTIPOINT' THEN
return_geom = ST_Collect(ST_CollectionExtract(return_geom, 1), candidate_geom);
ELSIF out_geom_type = 'MULTILINESTRING' THEN
return_geom = ST_Collect(
ST_CollectionExtract(
return_geom,
2
),
ST_MakeLine(in_point, candidate_geom)
);
END IF;
theta = theta + 2*pi() / num_rays;
END LOOP;
/* Return all the points or lines created */
IF out_geom_type = 'MULTIPOINT' THEN
RETURN ST_Multi(ST_CollectionExtract(return_geom, 1));
ELSIF out_geom_type = 'MULTILINESTRING' THEN
RETURN ST_Multi(ST_CollectionExtract(return_geom, 2));
END IF;
END
$$
LANGUAGE plpgsql;
COMMENT ON FUNCTION ST_RayCast(
GEOMETRY,
GEOMETRY,
TEXT,
INTEGER,
FLOAT
) IS 'Created by Isaac Boates. Use of this software is at the user''s own risk, and no responsibility is claimed by the creator in the event of damages, whether tangible or financial caused directly or indirectly by the use of this software.';
with buildings as (
select height, geom as geom from canyon_building
),
street_points as (
select
s.id as id,
s.geom as geom
from
canyon_points as s
)
select
ST_RayCast(
sp.geom,
ST_CollectionExtract(ST_Collect(ST_ExteriorRing(b.geom)), 2),
out_geom_type := 'MULTIPOINT',
num_rays := 16,
max_ray_dist := 50
) as geom
from
street_points as sp,
buildings as b
where
ST_DWithin(sp.geom, b.geom, 50)
group by
sp.geom
As I said, in Postgres 9.6 / PostGIS 2.4, it executes almost instantaneously. In Postgres 12.4 / PostGIS 3.0, it takes 8 seconds or more.
I am actually the author of this Raycast function and I was able to run it on my home computer with much lower specs (back then on 9.6), and it was also fast.
I have verified that there is no VACCUM or other process happening, and the CPU usage stays close to 0 (until we execute the query)
Does anybody have any idea what is going on?
If it helps, here is the result of EXPLAIN ANALYZE
(can be visualized at https://tatiyants.com/pev):
9.6/2.4:
[
{
"Plan": {
"Node Type": "Aggregate",
"Strategy": "Sorted",
"Partial Mode": "Simple",
"Parallel Aware": false,
"Startup Cost": 28419.14,
"Total Cost": 28891.15,
"Plan Rows": 200,
"Plan Width": 64,
"Actual Startup Time": 404.149,
"Actual Total Time": 4241.192,
"Actual Rows": 11,
"Actual Loops": 1,
"Group Key": ["s.geom"],
"Plans": [
{
"Node Type": "Sort",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Startup Cost": 28419.14,
"Total Cost": 28422.73,
"Plan Rows": 1435,
"Plan Width": 64,
"Actual Startup Time": 0.391,
"Actual Total Time": 0.426,
"Actual Rows": 77,
"Actual Loops": 1,
"Sort Key": ["s.geom"],
"Sort Method": "quicksort",
"Sort Space Used": 45,
"Sort Space Type": "Memory",
"Plans": [
{
"Node Type": "Nested Loop",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Join Type": "Inner",
"Startup Cost": 25.88,
"Total Cost": 28343.90,
"Plan Rows": 1435,
"Plan Width": 64,
"Actual Startup Time": 0.220,
"Actual Total Time": 0.297,
"Actual Rows": 77,
"Actual Loops": 1,
"Inner Unique": false,
"Plans": [
{
"Node Type": "Nested Loop",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Join Type": "Inner",
"Startup Cost": 25.88,
"Total Cost": 28308.50,
"Plan Rows": 1,
"Plan Width": 32,
"Actual Startup Time": 0.205,
"Actual Total Time": 0.233,
"Actual Rows": 7,
"Actual Loops": 1,
"Inner Unique": false,
"Join Filter": "st_dwithin((st_union(s_1.geom)), canyon_building.geom, '50'::double precision)",
"Rows Removed by Join Filter": 0,
"Plans": [
{
"Node Type": "Aggregate",
"Strategy": "Plain",
"Partial Mode": "Simple",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Startup Cost": 25.88,
"Total Cost": 25.89,
"Plan Rows": 1,
"Plan Width": 32,
"Actual Startup Time": 0.185,
"Actual Total Time": 0.186,
"Actual Rows": 1,
"Actual Loops": 1,
"Plans": [
{
"Node Type": "Seq Scan",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Relation Name": "canyon_points",
"Alias": "s_1",
"Startup Cost": 0.00,
"Total Cost": 22.70,
"Plan Rows": 1270,
"Plan Width": 32,
"Actual Startup Time": 0.009,
"Actual Total Time": 0.012,
"Actual Rows": 11,
"Actual Loops": 1
}
]
},
{
"Node Type": "Seq Scan",
"Parent Relationship": "Inner",
"Parallel Aware": false,
"Relation Name": "canyon_building",
"Alias": "canyon_building",
"Startup Cost": 0.00,
"Total Cost": 21.30,
"Plan Rows": 1130,
"Plan Width": 32,
"Actual Startup Time": 0.006,
"Actual Total Time": 0.007,
"Actual Rows": 7,
"Actual Loops": 1
}
]
},
{
"Node Type": "Seq Scan",
"Parent Relationship": "Inner",
"Parallel Aware": false,
"Relation Name": "canyon_points",
"Alias": "s",
"Startup Cost": 0.00,
"Total Cost": 22.70,
"Plan Rows": 1270,
"Plan Width": 32,
"Actual Startup Time": 0.003,
"Actual Total Time": 0.005,
"Actual Rows": 11,
"Actual Loops": 7
}
]
}
]
}
]
},
"Planning Time": 0.261,
"Triggers": [
],
"Execution Time": 4241.271
}
]
12.4/3.0:
[
{
"Plan": {
"Node Type": "Aggregate",
"Strategy": "Sorted",
"Partial Mode": "Simple",
"Parallel Aware": false,
"Startup Cost": 435.72,
"Total Cost": 435.99,
"Plan Rows": 1,
"Plan Width": 64,
"Actual Startup Time": 7.349,
"Actual Total Time": 57.570,
"Actual Rows": 11,
"Actual Loops": 1,
"Group Key": ["sp.geom"],
"Plans": [
{
"Node Type": "Seq Scan",
"Parent Relationship": "InitPlan",
"Subplan Name": "CTE buildings",
"Parallel Aware": false,
"Relation Name": "canyon_building",
"Alias": "canyon_building",
"Startup Cost": 0.00,
"Total Cost": 21.30,
"Plan Rows": 1130,
"Plan Width": 40,
"Actual Startup Time": 0.006,
"Actual Total Time": 0.009,
"Actual Rows": 7,
"Actual Loops": 1
},
{
"Node Type": "Seq Scan",
"Parent Relationship": "InitPlan",
"Subplan Name": "CTE street_points",
"Parallel Aware": false,
"Relation Name": "canyon_points",
"Alias": "s",
"Startup Cost": 0.00,
"Total Cost": 22.70,
"Plan Rows": 1270,
"Plan Width": 36,
"Actual Startup Time": 0.002,
"Actual Total Time": 0.004,
"Actual Rows": 11,
"Actual Loops": 1
},
{
"Node Type": "Aggregate",
"Strategy": "Plain",
"Partial Mode": "Simple",
"Parent Relationship": "InitPlan",
"Subplan Name": "CTE street_points_union",
"Parallel Aware": false,
"Startup Cost": 25.88,
"Total Cost": 25.89,
"Plan Rows": 1,
"Plan Width": 32,
"Actual Startup Time": 0.216,
"Actual Total Time": 0.217,
"Actual Rows": 1,
"Actual Loops": 1,
"Plans": [
{
"Node Type": "Seq Scan",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Relation Name": "canyon_points",
"Alias": "s_1",
"Startup Cost": 0.00,
"Total Cost": 22.70,
"Plan Rows": 1270,
"Plan Width": 32,
"Actual Startup Time": 0.011,
"Actual Total Time": 0.014,
"Actual Rows": 11,
"Actual Loops": 1
}
]
},
{
"Node Type": "Sort",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Startup Cost": 365.83,
"Total Cost": 365.83,
"Plan Rows": 1,
"Plan Width": 64,
"Actual Startup Time": 0.416,
"Actual Total Time": 0.436,
"Actual Rows": 77,
"Actual Loops": 1,
"Sort Key": ["sp.geom"],
"Sort Method": "quicksort",
"Sort Space Used": 45,
"Sort Space Type": "Memory",
"Plans": [
{
"Node Type": "Nested Loop",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Join Type": "Inner",
"Startup Cost": 0.00,
"Total Cost": 365.82,
"Plan Rows": 1,
"Plan Width": 64,
"Actual Startup Time": 0.261,
"Actual Total Time": 0.319,
"Actual Rows": 77,
"Actual Loops": 1,
"Plans": [
{
"Node Type": "Nested Loop",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"Join Type": "Inner",
"Startup Cost": 0.00,
"Total Cost": 327.72,
"Plan Rows": 1,
"Plan Width": 32,
"Actual Startup Time": 0.258,
"Actual Total Time": 0.296,
"Actual Rows": 7,
"Actual Loops": 1,
"Join Filter": "((spu.geom && st_expand(b.geom, '50'::double precision)) AND (b.geom && st_expand(spu.geom, '50'::double precision)) AND _st_dwithin(spu.geom, b.geom, '50'::double precision))",
"Rows Removed by Join Filter": 0,
"Plans": [
{
"Node Type": "CTE Scan",
"Parent Relationship": "Outer",
"Parallel Aware": false,
"CTE Name": "street_points_union",
"Alias": "spu",
"Startup Cost": 0.00,
"Total Cost": 0.02,
"Plan Rows": 1,
"Plan Width": 32,
"Actual Startup Time": 0.218,
"Actual Total Time": 0.219,
"Actual Rows": 1,
"Actual Loops": 1
},
{
"Node Type": "CTE Scan",
"Parent Relationship": "Inner",
"Parallel Aware": false,
"CTE Name": "buildings",
"Alias": "b",
"Startup Cost": 0.00,
"Total Cost": 22.60,
"Plan Rows": 1130,
"Plan Width": 32,
"Actual Startup Time": 0.006,
"Actual Total Time": 0.012,
"Actual Rows": 7,
"Actual Loops": 1
}
]
},
{
"Node Type": "CTE Scan",
"Parent Relationship": "Inner",
"Parallel Aware": false,
"CTE Name": "street_points",
"Alias": "sp",
"Startup Cost": 0.00,
"Total Cost": 25.40,
"Plan Rows": 1270,
"Plan Width": 32,
"Actual Startup Time": 0.000,
"Actual Total Time": 0.002,
"Actual Rows": 11,
"Actual Loops": 7
}
]
}
]
}
]
},
"Planning Time": 0.249,
"Triggers": [
],
"Execution Time": 57.667
}
]
I can't see your explains apparently, but one of the major changes that slow a lot of my queries in term of performance is the CTE optimization.
In postgres 12, the CTE is now, if possible, treated as a subquery, and not as a temporary table. You should try to change WITH ray AS (
to WITH ray AS MATERIALIZED (
to be sure that the behaviour is similar as the old version (compute the CTE first), otherwise the planner may want to compute only on demand during the rest of the query.
You should also do that to the query that you use for your tests.
EDIT
I reproduced the example and I don't have any issue with it (postgresql 12.3 / 3.0):
Maybe a problem with the install, or the parameters of postgres?
EDIT2
I forgot that I also have an old postgres (10.10/2.4) running, so I tested this on it too and still no problem (150ms)
EDIT3
I don't really understand what's goin on, but as I said, on a big server with comfortable parameter for postgres there is no performance issue, no matter the version. So I tried to look with default postgres (I just ran some dockers) with differents versions of postgres and postgis and here is what I got:
My guess is that there is a link to the parallelization (lot of things parallelized in 10) and the memory bottlenecks that are easy to come by when you manipulate relatively big data (geom) in parallel. The defaults parameters of postgres are really too much conservatives for postgis know that a lot of things are parallelized and the typical computer have 8 thread at its disposal. You can look at this answer that I made where I list the parameters (more explanation for the parameters here) that I usually change (some needs to restart the server).
EDIT4
I checked to set all the parameters that I had set for the big database that worked on all the postgis docker, It didn't make any difference.
So I checked the libs in the differents versions, here is what I have:
My guess is that there is something with the GEOS version (I don't think proj have something to do with this problem). I could be wrong and have missed something in the parameters. I don't have time to test to change the GEOS lib on the 12 docker, but I think it is worth to check.
EDIT5
I've seen that ther was a docker postgis with postgresql 10 and postgis 2.5, so I tested with it
PostGIS_full_version: POSTGIS="2.5.5" [EXTENSION] PGSQL="100" GEOS="3.7.1-CAPI-1.11.1 27a5e771" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.1.2, released 2016/10/24" LIBXML="2.9.4" LIBJSON="0.12.1" LIBPROTOBUF="1.2.1" TOPOLOGY RASTER
To launch the docker: docker run --name test_postgis_10_st_25 -e POSTGRES_PASSWORD=mysecretpassword --shm-size=40g --memory="100g" --memory-swap="150g" --memory-swappiness=0 --ulimit memlock=81920000000:81920000000 -d postgis/postgis:10-2.5 -c shared_buffers='25GB' -c effective_cache_size='70GB' -c work_mem='256MB' -c maintenance_work_mem='5GB' -c autovacuum_work_mem='5GB' -c max_connections='1000' -c effective_io_concurrency='200' -c max_worker_processes='24' -c max_parallel_workers='24' -c max_parallel_workers_per_gather='12' -c wal_buffers='16MB' -c min_wal_size='1GB' -c max_wal_size='2GB'
It didn't have the bug (it works in 150ms). Same request with docker postgis/postgis:10-3.0-alpine
finished in 10s
I really think that there is a problem with GEOS 3.8.1 for this request
Answered by robin loche on June 30, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP