Geographic Information Systems Asked by minus34 on September 28, 2021
The following PostGIS query executes in parallel using Postgres 9.6’s new parallel processing feature:
SELECT pnts.mb_code11, bdys.mb_code11
FROM testing.mb_random_points AS pnts
INNER JOIN testing.mb_2011_aust AS bdys ON ST_Contains(bdys.geom, pnts.geom);
Here’s the query plan:
Run the same query as an insert, and it only runs on a single CPU:
INSERT INTO test
SELECT pnts.mb_code11, bdys.mb_code11
FROM testing.mb_random_points AS pnts
INNER JOIN testing.mb_2011_aust AS bdys ON ST_Contains(bdys.geom, pnts.geom);
Here’s it’s query plan:
How can I get the INSERT to run in parallel?
Writing data and reading data is a different story. Modifying data from different parallel resources always needs to be serialized and prepared for changes in source data while writing. It is therefore more complicated and needs to be treated carefully while reading data in parallel is no trouble.
Also see the wiki page for parallel queries.
Even when parallel query is enabled in general, the query planner will never generate a parallel plan if any of the following are true:
- The query writes data. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. This is a limitation of the current implementation which could be lifted in a future release.
- [...]
Update for PostgreSQL >= 11
The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. As an exception, the commands CREATE TABLE ... AS, SELECT INTO, and CREATE MATERIALIZED VIEW which create a new table and populate it can use a parallel plan."
https://postgresql.org/docs/11/when-can-parallel-query-be-used.html
Thanks @chrismarx
Correct answer by Matthias Kuhn on September 28, 2021
It took me a while to figure this out and I want to share the solution so that others can real the benefits. Indeed, @chrismarx mentioned the basis for the answer in the last comment above: CREATE TABLE AS
Here is the process that has worked out very well for me:
EXPLAIN
-- Insert query
SELECT
...
FROM...
;
Do whatever you need to do to get that to trigger parallel workers (doing that is a different subject).
CREATE TEMP TABLE temp_table_name
-- Insert query
SELECT
...
FROM...
;
INSERT INTO destination_table_name
SELECT * FROM temp_table_name
;
-- optionally:
DROP TABLE temp_table_name;
I have been using this on a large data warehousing project with great success.
Answered by Alexi Theodore on September 28, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP