Stack Overflow Asked by sudhishkr on January 25, 2021
I am experimenting with Airflow, which is backed by a PostgreSQL database. I am trying to run an audit and create an aggregation report from two SQL tables, but I am thoroughly confused by joins and subqueries as I re-learn them.
Details around my problem:
Table1
select dag_id, owners from aaf.public.dag order by dag_id;
sample result =>
dag_id                 | owners
-----------------------+-------
aa_example_hello_world | owner1
aa_example_sud_test    | owner2
Table2
select dag_id, state from aaf.public.dag_run;
sample result =>
dag_id                 | state
-----------------------+--------
aa_example_hello_world | success
aa_example_hello_world | failed
aa_example_hello_world | running
aa_example_sud_test    | failed
aa_example_hello_world | success
aa_example_sud_test    | failed
aa_example_hello_world | failed
What I want to achieve =>
dag_id                 | owners | run_percentage_success
-----------------------+--------+-----------------------
aa_example_hello_world | owner1 | 40   # 2 success / 5 total from Table2 * 100
What I have tried so far =>
1st attempt (trying to see if I can get a summary for an individual dag_id)
select
(select cast(COUNT(id) as FLOAT) from aaf.public.dag_run where dag_id = 'aa_example_hello_world' and state = 'success' group by dag_id order by dag_id) /
(select cast(COUNT(id) as FLOAT) from aaf.public.dag_run where dag_id = 'aa_example_hello_world' group by dag_id order by dag_id)*100
2nd attempt (trying to generalize attempt1 for all dag_id)
select
(select cast(COUNT(id) as FLOAT) from aaf.public.dag_run where dag_id in (select dag_id from aaf.public.dag order by dag_id) and state = 'failed' group by dag_id order by dag_id) /
(select cast(COUNT(id) as FLOAT) from aaf.public.dag_run where dag_id in (select dag_id from aaf.public.dag order by dag_id) group by dag_id order by dag_id)*100
^ this fails because each subquery returns one row per dag_id, and a multi-row result cannot be used as a single value in the division
3rd attempt
select a.dag_id, a.owners, cast(count(b.dag_id) as Float) from aaf.public.dag as a, aaf.public.dag_run as b where b.state = 'success' and b.dag_id = a.dag_id group by a.dag_id;
^ BUT I am not able to compute the `divisions` for my expected result
You can join and aggregate; avg() comes in handy to compute the success rate:
select d.dag_id, d.owners, avg( (dr.state = 'success')::int ) avg_success
from aaf.public.dag d
inner join aaf.public.dag_run dr
on dr.dag_id = d.dag_id
group by d.dag_id
order by d.dag_id;
This gives you a decimal value between 0 and 1 for the success rate; you can multiply that by 100 if you want a percentage.
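To sanity-check the averaging idea against the sample rows from the question, here is a minimal Python sketch. It loads the data into an in-memory SQLite database, which is an assumption made only to keep the example self-contained; the real tables live in PostgreSQL. SQLite already evaluates a comparison to 0 or 1, so the `::int` cast is not needed there, and the table layouts are reduced to the columns shown in the question.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL tables (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table dag (dag_id text primary key, owners text);
    create table dag_run (id integer primary key, dag_id text, state text);
""")
conn.executemany("insert into dag values (?, ?)", [
    ("aa_example_hello_world", "owner1"),
    ("aa_example_sud_test", "owner2"),
])
conn.executemany("insert into dag_run (dag_id, state) values (?, ?)", [
    ("aa_example_hello_world", "success"),
    ("aa_example_hello_world", "failed"),
    ("aa_example_hello_world", "running"),
    ("aa_example_sud_test", "failed"),
    ("aa_example_hello_world", "success"),
    ("aa_example_sud_test", "failed"),
    ("aa_example_hello_world", "failed"),
])

# The comparison yields 1 for success rows and 0 otherwise,
# so its average is the fraction of successful runs per dag_id.
rows = conn.execute("""
    select d.dag_id, d.owners,
           avg(dr.state = 'success') * 100 as run_percentage_success
    from dag d
    inner join dag_run dr on dr.dag_id = d.dag_id
    group by d.dag_id, d.owners
    order by d.dag_id
""").fetchall()

for row in rows:
    print(row)
```

With the sample rows this reports about 40 percent for aa_example_hello_world (2 successes out of 5 runs) and 0 for aa_example_sud_test, whose two runs both failed.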
Correct answer by GMB on January 25, 2021
Use conditional aggregation (i.e. a CASE WHEN expression inside the aggregation function):
select
dag_id,
d.owners,
count(case when dr.state = 'success' then 1 end)::float / count(*)::float * 100.0 as run_percentage_success
from aaf.public.dag_run dr
join aaf.public.dag d using(dag_id)
group by dag_id, d.owners
order by dag_id;
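The counting trick can be tried on the question's sample data with a small Python sketch. It assumes an in-memory SQLite table as a stand-in for PostgreSQL and leaves out the join to the dag table to focus on the counting; the helper name `success_percentages` is hypothetical. The point it illustrates is that a CASE without ELSE yields NULL for non-matching rows, and count() skips NULLs.

```python
import sqlite3

def success_percentages(runs):
    """Return {dag_id: percentage of runs whose state is 'success'}."""
    # In-memory SQLite stand-in for aaf.public.dag_run (assumption for illustration).
    conn = sqlite3.connect(":memory:")
    conn.execute("create table dag_run (dag_id text, state text)")
    conn.executemany("insert into dag_run values (?, ?)", runs)
    query = """
        select dag_id,
               -- CASE without ELSE is NULL for non-success rows; count() ignores NULLs
               count(case when state = 'success' then 1 end) * 100.0 / count(*)
        from dag_run
        group by dag_id
        order by dag_id
    """
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result

# Sample rows copied from Table2 in the question.
sample_runs = [
    ("aa_example_hello_world", "success"),
    ("aa_example_hello_world", "failed"),
    ("aa_example_hello_world", "running"),
    ("aa_example_sud_test", "failed"),
    ("aa_example_hello_world", "success"),
    ("aa_example_sud_test", "failed"),
    ("aa_example_hello_world", "failed"),
]

percentages = success_percentages(sample_runs)
print(percentages)
```

Multiplying by 100.0 before dividing keeps the arithmetic in floating point, which plays the same role as the `::float` casts in the PostgreSQL query.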
Answered by Thorsten Kettner on January 25, 2021