Database Administrators Asked by Dmitriy Grankin on September 11, 2020
I join listings_tail
table and materialised view cals_reviewed_wide_scaled
on listing_id
column of type integer.
The table has around 1,5 billion rows and 2 mln unique listing ids. The MV has been tested with a range from 500 to 500 000 rows. Both columns are indexed with btree, hash, and brin.
explain
always returns Seq Scan
despite of the indexes available.
How to optimize the indexing to make them work?
SELECT * from listings_tail L
INNER JOIN cals_reviewed_wide_scaled R
ON L.listing_id = R.listing_id;
d listings_tail
id | bigint | not null default nextval('listings_tail_id_seq'::regclass)
listing_id | integer | not null
parsing_timestamp | timestamp without time zone | not null
value | character varying | not null
homes_details_inside | integer | not null
tags | character varying[] | not null
Indexes:
"listings_tail_pkey" PRIMARY KEY, btree (id)
"listings_tail_dd_unique" UNIQUE CONSTRAINT, btree (listing_id, value, homes_details_inside, tags)
"listing_id_tail" btree (listing_id)
"tail_listing_id_brin" brin (listing_id)
"tail_listing_id_hash" hash (listing_id)
d cals_reviewed_wide
id | integer | not null default nextval('cals_reviewed_wide_id_seq'::regclass)
listing_id | integer | not null
date | timestamp without time zone | not null
price | double precision |
parsing_timestamp | timestamp without time zone | not null
min_nights | double precision |
max_nights | double precision |
available | integer | not null
created_at | timestamp without time zone | not null
revs_id | integer | not null
Indexes:
"cals_reviewed_wide_pkey" PRIMARY KEY, btree (id)
"cals_reviewed_wide_unique" UNIQUE CONSTRAINT, btree (listing_id, date)
"cals_reviewed_wide_revs_id_ixs" btree (revs_id)
"ix_cals_reviewed_wide_listing_id" btree (listing_id)
Foreign-key constraints:
"cals_reviewed_wide_revs_id_fkey" FOREIGN KEY (revs_id) REFERENCES reviews(id)
d random_cals_reviewed_wide_scaled
View "public.random_cals_reviewed_wide_scaled"
Column | Type | Modifiers
---------+--------+-----------
revs_id | bigint |
d cals_reviewed_wide_scaled
Materialized view "public.cals_reviewed_wide_scaled"
-------------------+-----------------------------+-----------
listing_id | integer |
date | timestamp without time zone |
price | double precision |
parsing_timestamp | timestamp without time zone |
min_nights | double precision |
max_nights | double precision |
created_at | timestamp without time zone |
revs_id | integer |
Indexes:
"cals_reviewed_wide_scaled_listing_id_brin" brin (listing_id)
"cals_reviewed_wide_scaled_listing_id_hash" hash (listing_id)
"cals_reviewed_wide_scaled_listing_id_ixs" btree (listing_id)
CREATE MATERIALIZED VIEW cals_reviewed_wide_scaled
as
SELECT
C.listing_id,
C.date,
C.price,
C.parsing_timestamp,
C.min_nights,
C.max_nights,
C.created_at,
C.revs_id
FROM cals_reviewed_wide as C
JOIN random_cals_reviewed_wide_scaled as V
ON C.revs_id = V.revs_id;
EXPLAIN (ANALYZE, BUFFERS)
:
Both tables are not small, and you have no WHERE
condition that restricts the number of rows in either table.
work_mem
is big enough to contain the hash of cals_reviewed_wide_scaled
, so the most efficient plan is a hash join.
With a hash join, no index can be used, and a sequential scan is the most efficient access method.
You can try if a nested loop join might be faster.
For that, make sure that there is an index on listings_tail.listing_id
, then try the following to nudge PostgreSQL in the right direction:
Try increasing effective_cache_size
to the amount of RAM you have got.
Try lowering random_page_cost
to 1.0
.
If all fails, experiment with setting the following in your database session:
SET enable_hashjoin = off;
SET enable_mergejoin = off;
Then try again, and you should see a nested loop join. That way you can test if that is actually faster and PostgreSQL is wrong.
Answered by Laurenz Albe on September 11, 2020
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP