Database Administrators Asked by James Randall on January 2, 2022
I have been iterating on an idea presented by Rob Conery in his excellent post to generate monthly reports using PostgreSQL Views.
My version required the View to be refactored into a Function because I needed input parameters. I recently received a request to add filtering so that specific products and locations could be searched as well, but I found myself executing this function N times, which caused a significant performance bottleneck. I figured that putting all of these conditions in one query would alleviate the performance issues.
I made a bit of progress after following the (very well-written) answers to some questions here, but I am still stuck wrapping my head around how to generate WHERE clauses for each input array element.
Essentially my desired "output" SQL would look something like this:
select sum(total) as total_activity,
count(1) as sales_event_count,
created_at::date as sales_event_date,
date_part('year',created_at at time zone 'hst') as year,
date_part('quarter',created_at at time zone 'hst') as quarter,
date_part('month',created_at at time zone 'hst') as month,
date_part('day',created_at at time zone 'hst') as day
from locations loc
left outer join sales_events se ON loc.id = se.location_id
left outer join junction_products jp ON jp.sales_event_id = se.id
left outer join products p ON p.id = jp.product_id
where (p.sku = '12345' and p.manufacturer = 'CompanyA' and location_id = 'LocationA') or
(p.sku = '09876' and p.manufacturer = 'CompanyA' and location_id = 'LocationA') or
(p.sku = '10293' and p.manufacturer = 'CompanyB' and location_id = 'LocationA')
group by se.created_at
order by se.created_at
I have explored several related pages here to help tackle this problem.
After picking and choosing from each of these, I have come up with the following:
create type product_type as (sku character varying(100), manufacturer character varying(200));
create or replace function find_sales_location_activity(
_products_arr product_type[],
_location_id bigint
)
returns table (total_activity bigint, sales_event_count bigint, sales_event_date date, "year" double precision, quarter double precision, "month" double precision, "day" double precision) as
$func$
select sum(total) as total_activity,
count(1) as sales_event_count,
created_at::date as sales_event_date,
date_part('year',created_at at time zone 'hst') as year,
date_part('quarter',created_at at time zone 'hst') as quarter,
date_part('month',created_at at time zone 'hst') as month,
date_part('day',created_at at time zone 'hst') as day
from locations loc
left outer join sales_events se ON loc.id = se.location_id
left outer join junction_products jp ON jp.sales_event_id = se.id
left outer join products p ON p.id = jp.product_id
where (p.sku = ($1[1]).sku and p.manufacturer = ($1[1]).manufacturer and location_id = $2) or
(p.sku = ($1[2]).sku and p.manufacturer = ($1[2]).manufacturer and location_id = $2) or
(p.sku = ($1[3]).sku and p.manufacturer = ($1[3]).manufacturer and location_id = $2)
group by se.created_at
order by se.created_at
$func$
language sql;
…but obviously this isn’t looping over anything. I have experimented with replacing the FROM locations loc clause with FROM generate_subscripts($1, 1) and attempting to loop through that way, but replacing the table name causes my left outer joins to fail.
Clearly I’m a bit out of my depth here, but I’d really, really appreciate it if anyone could point me in the right direction. Thanks in advance!
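The parameters in the array may be transformed into a table to be joined against the rest. This transformation is done with the unnest function: because product_type is a composite type, unnest($1) in the FROM clause returns one row per array element, with sku and manufacturer as its columns. The WHERE conditions can then be expressed as JOIN clauses.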
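The idea of turning the array into a table and joining against it can be tried with a small sketch. It uses SQLite through Python's standard sqlite3 module, which has no arrays and no unnest(), so a VALUES list inside a CTE stands in for unnest($1); that substitution, the toy products table and its rows are assumptions for illustration only. It checks that the join returns the same products as the hand-written chain of ORs from the question.

```python
import sqlite3

# Hypothetical (sku, manufacturer) pairs standing in for the product_type[] argument.
PARAMS = [("12345", "CompanyA"), ("09876", "CompanyA"), ("10293", "CompanyB")]


def make_db() -> sqlite3.Connection:
    """In-memory toy products table; the rows are invented for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, sku TEXT, manufacturer TEXT)")
    conn.executemany(
        "INSERT INTO products (id, sku, manufacturer) VALUES (?, ?, ?)",
        [
            (1, "12345", "CompanyA"),
            (2, "09876", "CompanyA"),
            (3, "10293", "CompanyB"),
            (4, "10293", "CompanyA"),  # right sku, wrong manufacturer: must not match
            (5, "55555", "CompanyC"),
        ],
    )
    return conn


def flatten(pairs):
    """[(a, b), (c, d)] -> [a, b, c, d], in placeholder order."""
    return [value for pair in pairs for value in pair]


def ids_via_or_chain(conn, pairs):
    """The question's approach: one OR'd condition per array element."""
    where = " OR ".join("(sku = ? AND manufacturer = ?)" for _ in pairs)
    sql = f"SELECT id FROM products WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, flatten(pairs))]


def ids_via_join(conn, pairs):
    """The answer's approach: turn the pairs into a table, then JOIN against it.
    SQLite has no unnest(), so a VALUES list in a CTE plays that role here."""
    values = ", ".join("(?, ?)" for _ in pairs)
    sql = (
        f"WITH params(sku, manufacturer) AS (VALUES {values}) "
        "SELECT p.id FROM products p "
        "JOIN params ON p.sku = params.sku AND p.manufacturer = params.manufacturer "
        "ORDER BY p.id"
    )
    return [row[0] for row in conn.execute(sql, flatten(pairs))]


conn = make_db()
print(ids_via_or_chain(conn, PARAMS))  # [1, 2, 3]
print(ids_via_join(conn, PARAMS))      # [1, 2, 3]
```

One behavioral difference worth knowing: if the same pair appears twice in the input, the join returns the matching product twice while the OR chain returns it once. In the PostgreSQL query that would also inflate sum(total), so deduplicating the input first may be worth considering.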
This could also be done by joining with generate_subscripts($1, 1), but that leads to more complicated syntax for no apparent benefit.
select ...<same as your query>...
from locations loc
left outer join sales_events se ON loc.id = se.location_id
left outer join junction_products jp ON jp.sales_event_id = se.id
left outer join products p ON p.id = jp.product_id
join unnest($1) params ON (p.sku=params.sku AND p.manufacturer=params.manufacturer)
WHERE location_id = $2
group by se.created_at
order by se.created_at
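The full shape of the query above can be exercised the same way. This is a sketch under the same assumptions: SQLite instead of PostgreSQL, a VALUES CTE standing in for unnest($1), and a hypothetical schema and rows that mirror the table and column names in the question (location ids are integers, matching the function's bigint parameter). The helper name sales_location_activity is also hypothetical. It shows that a single query handles however many (sku, manufacturer) pairs are passed, without looping over them.

```python
import sqlite3

# Hypothetical schema and data mirroring the question's tables.
SCHEMA = """
CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, sku TEXT, manufacturer TEXT);
CREATE TABLE sales_events (id INTEGER PRIMARY KEY, location_id INTEGER,
                           total INTEGER, created_at TEXT);
CREATE TABLE junction_products (sales_event_id INTEGER, product_id INTEGER);

INSERT INTO locations VALUES (1, 'LocationA'), (2, 'LocationB');
INSERT INTO products VALUES (1, '12345', 'CompanyA'), (2, '09876', 'CompanyA'),
                            (3, '10293', 'CompanyB'), (4, '77777', 'CompanyC');
INSERT INTO sales_events VALUES
    (1, 1,  40, '2022-01-01 09:00:00'),
    (2, 1,  50, '2022-01-01 10:00:00'),
    (3, 1,  70, '2022-01-02 11:00:00'),
    (4, 2, 999, '2022-01-02 12:00:00'),
    (5, 1,  30, '2022-01-02 11:00:00');
INSERT INTO junction_products VALUES (1, 1), (2, 4), (3, 3), (4, 1), (5, 2);
"""


def sales_location_activity(conn, products, location_id):
    """Run the answer's query shape for any number of (sku, manufacturer) pairs."""
    if not products:
        # An empty VALUES list is not valid SQL; unnest of an empty array matches nothing.
        return []
    values = ", ".join("(?, ?)" for _ in products)
    sql = f"""
        WITH params(sku, manufacturer) AS (VALUES {values})
        SELECT se.created_at, SUM(se.total) AS total_activity,
               COUNT(1) AS sales_event_count
        FROM locations loc
        LEFT OUTER JOIN sales_events se ON loc.id = se.location_id
        LEFT OUTER JOIN junction_products jp ON jp.sales_event_id = se.id
        LEFT OUTER JOIN products p ON p.id = jp.product_id
        JOIN params ON p.sku = params.sku AND p.manufacturer = params.manufacturer
        WHERE se.location_id = ?
        GROUP BY se.created_at
        ORDER BY se.created_at
    """
    args = [v for pair in products for v in pair] + [location_id]
    return conn.execute(sql, args).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
pairs = [("12345", "CompanyA"), ("09876", "CompanyA"), ("10293", "CompanyB")]
print(sales_location_activity(conn, pairs, 1))
# [('2022-01-01 09:00:00', 40, 1), ('2022-01-02 11:00:00', 100, 2)]
```

Event 2 drops out because its product is not in the list, and event 4 drops out because it belongs to the other location. Events 3 and 5 share a timestamp, so grouping by se.created_at folds them into one row.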
Note: this query leaves the left outer joins as they are in the original query, because the question is not about them, but it seems they should be inner joins instead.
The reason is that WHERE conditions like p.sku = '12345' and p.manufacturer = 'CompanyA' reject every row in which these columns of p are NULL. That negates the reason to use an outer join rather than an inner join, which is to keep non-matching rows, with NULLs filled in for the columns of the table on the right side of each left join.
In short, if you're not sure why this query uses left outer join, consider replacing each of them with a plain join.
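A minimal illustration of that point, again using SQLite through Python's sqlite3 module with invented rows (standard outer join semantics, so PostgreSQL behaves the same way for this case): a sales event with no products survives the left outer joins as a NULL-extended row, but a WHERE condition on p.sku and p.manufacturer removes it, so the left and inner join versions return the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_events (id INTEGER PRIMARY KEY);
CREATE TABLE products (id INTEGER PRIMARY KEY, sku TEXT, manufacturer TEXT);
CREATE TABLE junction_products (sales_event_id INTEGER, product_id INTEGER);
INSERT INTO sales_events VALUES (1), (2), (3);  -- event 3 has no products
INSERT INTO products VALUES (1, '12345', 'CompanyA'), (2, '09876', 'CompanyA');
INSERT INTO junction_products VALUES (1, 1), (2, 2);
""")


def run(join_kind, where=""):
    """Same join chain, using either LEFT OUTER JOIN or plain JOIN."""
    sql = f"""
        SELECT se.id, p.sku FROM sales_events se
        {join_kind} junction_products jp ON jp.sales_event_id = se.id
        {join_kind} products p ON p.id = jp.product_id
        {where}
        ORDER BY se.id
    """
    return conn.execute(sql).fetchall()


condition = "WHERE p.sku = '12345' AND p.manufacturer = 'CompanyA'"

print(run("LEFT OUTER JOIN"))             # [(1, '12345'), (2, '09876'), (3, None)]
print(run("LEFT OUTER JOIN", condition))  # [(1, '12345')]
print(run("JOIN", condition))             # [(1, '12345')]
```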
Answered by Daniel Vérité on January 2, 2022