SQL create aggregated result from 2 queries

Stack Overflow Asked by sudhishkr on January 25, 2021

I am mucking around with Airflow, which is backed by a PostgreSQL database. I am trying to run an audit and create an aggregation report from 2 SQL tables. I am thoroughly confused by joins and subqueries as I re-learn them.

Details around my problem:

Table1

select dag_id, owners from aaf.public.dag order by dag_id;

sample result =>

dag_id                  | owners
-------------------------------
aa_example_hello_world     owner1
aa_example_sud_test        owner2

Table2

select dag_id, state from aaf.public.dag_run;

sample result =>
dag_id                  | state
-------------------------------
aa_example_hello_world    success
aa_example_hello_world    failed
aa_example_hello_world    running
aa_example_sud_test       failed
aa_example_hello_world    success
aa_example_sud_test       failed
aa_example_hello_world    failed

What I want to achieve =>

dag_id                  | owners    |    run_percentage_success
------------------------------------------------------------
aa_example_hello_world    owner1          40 #->(which is 2 success / 5 total from Table2 * 100)

What I have tried so far =>

1st attempt (trying to see if I can get a summary for an individual dag_id)

select
    (select cast(COUNT(id) as FLOAT) from aaf.public.dag_run where dag_id = 'aa_example_hello_world' and state = 'success' group by dag_id order by dag_id) /
    (select cast(COUNT(id) as FLOAT) from aaf.public.dag_run where dag_id = 'aa_example_hello_world' group by dag_id order by dag_id)*100

2nd attempt (trying to generalize the 1st attempt to all dag_id values)

select
    (select cast(COUNT(id) as FLOAT) from aaf.public.dag_run where dag_id in (select dag_id from aaf.public.dag order by dag_id) and state = 'failed' group by dag_id order by dag_id) /
    (select cast(COUNT(id) as FLOAT) from aaf.public.dag_run where dag_id in (select dag_id from aaf.public.dag order by dag_id) group by dag_id order by dag_id)*100

^ this fails because each subquery now returns one row per dag_id instead of a single value, so the two results cannot be divided by each other
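
For reference, the subquery style of the first two attempts can be kept if each subquery is correlated with the outer row, so that it returns exactly one value per DAG. The following is only a sketch against the tables named in the question; the aliases `d` and `r` and the `nullif` guard (which yields NULL rather than a division-by-zero error for a DAG with no runs) are additions made here for illustration.

```sql
select
    d.dag_id,
    d.owners,
    -- successful runs for this DAG only (correlated on d.dag_id)
    (select count(*)
       from aaf.public.dag_run r
      where r.dag_id = d.dag_id
        and r.state = 'success')::float
    /
    -- total runs for this DAG; nullif turns 0 into NULL to avoid dividing by zero
    nullif((select count(*)
              from aaf.public.dag_run r
             where r.dag_id = d.dag_id), 0)
    * 100 as run_percentage_success
from aaf.public.dag d
order by d.dag_id;
```

This scans dag_run twice per DAG, so a join with a single aggregation is usually the simpler choice.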

3rd attempt

select a.dag_id, a.owners, cast(count(b.dag_id) as Float)  from aaf.public.dag as a, aaf.public.dag_run as b where b.state = 'success' and b.dag_id = a.dag_id group by a.dag_id;

^ BUT I am not able to compute the `divisions` for my expected result

2 Answers

You can join and aggregate; avg() comes in handy to compute the success rate:

select d.dag_id, d.owners, avg( (dr.state = 'success')::int ) avg_success
from aaf.public.dag d
inner join aaf.public.dag_run dr
    on dr.dag_id = d.dag_id
group by d.dag_id
order by d.dag_id;

This gives you a decimal value between 0 and 1 for the success rate; you can multiply that by `100` if you want a percentage.
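
To try the avg() approach without touching the real tables, the sample rows from the question can be inlined as CTEs. This is a sketch, not part of the original answer: the CTE names simply mirror the question's tables, the column is taken to be `state` as in the question, and `round(..., 2)` and the `run_percentage_success` alias are added here. Because the inlined CTEs have no primary key, `owners` has to be listed in the group by as well.

```sql
with dag (dag_id, owners) as (
    values ('aa_example_hello_world', 'owner1'),
           ('aa_example_sud_test',    'owner2')
),
dag_run (dag_id, state) as (
    values ('aa_example_hello_world', 'success'),
           ('aa_example_hello_world', 'failed'),
           ('aa_example_hello_world', 'running'),
           ('aa_example_sud_test',    'failed'),
           ('aa_example_hello_world', 'success'),
           ('aa_example_sud_test',    'failed'),
           ('aa_example_hello_world', 'failed')
)
select d.dag_id,
       d.owners,
       -- the boolean cast to int is 1 for success and 0 otherwise, so avg() is the success ratio
       round(100 * avg((dr.state = 'success')::int), 2) as run_percentage_success
from dag d
inner join dag_run dr on dr.dag_id = d.dag_id
group by d.dag_id, d.owners
order by d.dag_id;
```

With this sample data the query should report 40.00 for aa_example_hello_world (2 of 5 runs) and 0.00 for aa_example_sud_test (0 of 2 runs).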

Correct answer by GMB on January 25, 2021

Use conditional aggregation (i.e. a CASE WHEN expression inside the aggregation function):

select
  dag_id,
  d.owners,
  count(case when dr.state = 'success' then 1 end)::float / count(*)::float * 100.0
from aaf.public.dag_run dr
join aaf.public.dag d using(dag_id)
group by dag_id
order by dag_id;

Answered by Thorsten Kettner on January 25, 2021
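
PostgreSQL (9.4 and later) also accepts a FILTER clause on aggregates, which expresses the same conditional count without CASE. The sketch below is a variation rather than either answerer's query: it additionally assumes that DAGs with no runs should still appear in the report, so it uses a left join and a `nullif` guard, and such DAGs get NULL as their percentage.

```sql
select
    d.dag_id,
    d.owners,
    -- FILTER restricts which rows this one aggregate counts
    (count(*) filter (where dr.state = 'success'))::float
      -- count(dr.dag_id) ignores the NULL row a left join produces for a DAG with no runs
      / nullif(count(dr.dag_id), 0) * 100 as run_percentage_success
from aaf.public.dag d
left join aaf.public.dag_run dr on dr.dag_id = d.dag_id
group by d.dag_id, d.owners
order by d.dag_id;
```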
