Database Administrators Asked by Rainer Yuan on October 28, 2021
I’m running an experiment with the following query:
SELECT DISTINCT table_1.id1, table_2.id1
FROM image AS table_1, image AS table_2
WHERE table_1.id2 = table_2.id2
LIMIT K;
Without the LIMIT clause, the query finishes after 8 hours. With LIMIT 80000, however, it does not finish at all.
I’m running this on CloudLab with 100 GB of RAM. I attached my my.cnf as an image; the query plan is below. I’m not sure what the bottleneck is for this query. How can I solve this problem?
Query plan as text:
-> Limit: 80000 row(s)
    -> Table scan on <temporary>
        -> Temporary table with deduplication
            -> Limit table size: 80000 unique row(s)
                -> Inner hash join (table_2.id2 = table_1.id2)  (cost=48913990425076.15 rows=48913988916980)
                    -> Table scan on table_2  (cost=0.01 rows=22116507)
                    -> Hash
                        -> Table scan on table_1  (cost=2224194.70 rows=22116507)
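The plan estimates about 4.9 × 10^13 rows (rows=48913988916980) coming out of the hash join, even though each input has roughly 22 million rows. For a self-join on id2, the number of joined rows is the sum, over every distinct id2 value, of the square of that value's row count, so a few very frequent id2 values can dominate. A sketch along these lines, assuming the image table and columns from the question (the aliases cnt, joined_rows and per_value are hypothetical), measures that figure before DISTINCT is applied and lists the heaviest values:

```sql
-- Size of the self-join on id2 before DISTINCT:
-- an id2 value that appears cnt times yields cnt * cnt joined rows.
SELECT SUM(cnt * cnt) AS joined_rows
FROM (
    SELECT id2, COUNT(*) AS cnt
    FROM image
    WHERE id2 IS NOT NULL          -- NULLs never match in an equality join
    GROUP BY id2
) AS per_value;

-- The id2 values that contribute the most joined rows.
SELECT id2, COUNT(*) AS cnt
FROM image
WHERE id2 IS NOT NULL
GROUP BY id2
ORDER BY cnt DESC
LIMIT 10;
```

A joined_rows result of the same order of magnitude as the plan's estimate would suggest that the sheer volume of joined pairs, rather than the server configuration, dominates the run time.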
The comma between your two tables is an implicit join: logically it produces the cross join of both tables and then keeps only the rows that satisfy the WHERE condition.
Write a proper JOIN with an ON clause instead, like this:
SELECT DISTINCT t1.id1, t2.id1
FROM image AS t1
INNER JOIN image AS t2
    ON t1.id2 = t2.id2
LIMIT K;
Also, make sure the image table has an index on id2.
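A minimal sketch of adding that index in MySQL (the index name idx_image_id2 is hypothetical):

```sql
-- Index on the join column id2, which lets the optimizer look up
-- matching rows by id2 instead of scanning the whole table.
CREATE INDEX idx_image_id2 ON image (id2);
```

As a variant beyond the suggestion above, an index on (id2, id1) would be covering for this query, since id1 and id2 are the only columns it reads from image.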
Answered by nbk on October 28, 2021