How to use SQL PARTITION BY GROUPS?

Question

I'm working with PostgreSQL 12, but the question is standard SQL.
I have a table like this:
| timestamp                | raw_value |
| ------------------------ | --------- |
| 2015-06-27T03:52:50.000Z | 0         |
| 2015-06-27T03:53:00.000Z | 0         |
| 2015-06-27T03:53:10.000Z | 1         |
| 2015-06-27T03:53:20.000Z | 1         |
| 2015-06-27T04:22:20.000Z | 1         |
| 2015-06-27T04:22:30.000Z | 0         |
| 2015-06-27T05:33:40.000Z | 1         |
| 2015-06-27T05:33:50.000Z | 1         |

I need to get the first and last timestamp of each group with raw_value = 1, i.e. needed result :
| start_time               | end_time                 |
| ------------------------ | ------------------------ |
| 2015-06-27T03:53:10.000Z | 2015-06-27T04:22:20.000Z |
| 2015-06-27T05:33:40.000Z | 2015-06-27T05:33:50.000Z |

My best effort so far looks like this:
SELECT timestamp, raw_value, row_number() over w as rn, first_value(obt) OVER w AS start_time, last_value(obt) OVER w AS end_time
FROM mytable
WINDOW w AS (PARTITION BY raw_value ORDER BY timestamp GROUPS CURRENT ROW )
ORDER BY timestamp;

Google doesn't have much info about it, but according to the docs the "GROUPS" clause is exactly what I need, but the end result is wrong, because window functions simply copy value from the timestamp column:
| timestamp                | raw_value | rn  | start_time               | end_time                 |
| ------------------------ | --------- | --- | ------------------------ | ------------------------ |
| 2015-06-27T03:52:50.000Z | 0         | 1   | 2015-06-27T03:52:50.000Z | 2015-06-27T03:52:50.000Z |
| 2015-06-27T03:53:00.000Z | 0         | 2   | 2015-06-27T03:53:00.000Z | 2015-06-27T03:53:00.000Z |
| 2015-06-27T03:53:10.000Z | 1         | 1   | 2015-06-27T03:53:10.000Z | 2015-06-27T03:53:10.000Z |
| 2015-06-27T03:53:20.000Z | 1         | 2   | 2015-06-27T03:53:20.000Z | 2015-06-27T03:53:20.000Z |
| 2015-06-27T04:22:20.000Z | 1         | 3   | 2015-06-27T04:22:20.000Z | 2015-06-27T04:22:20.000Z |
| 2015-06-27T04:22:30.000Z | 0         | 3   | 2015-06-27T04:22:30.000Z | 2015-06-27T04:22:30.000Z |
| 2015-06-27T05:33:40.000Z | 1         | 4   | 2015-06-27T05:33:40.000Z | 2015-06-27T05:33:40.000Z |
| 2015-06-27T05:33:50.000Z | 1         | 5   | 2015-06-27T05:33:50.000Z | 2015-06-27T05:33:50.000Z |

At line#6 I'd expect the row number to reset to 1, but it doesn't! I tried using BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING as well without luck.
I have created a DB Fiddle link for your convenience as well.
If there is any other way to achieve the same result in SQL (ok to be PG-specific) without window functions, I'd like to know.

Tomáš Záluský · Accepted Answer

Identify groups using row_number() - sum() trick, then choose min and max time for each identified group.
with grp as (
  select obt, raw_value
       , row_number() over w - sum(raw_value) over w as g
  from tm_series
  window w as (order by obt)
)
select min(obt), max(obt)
from grp
where raw_value = 1
group by g;

DB fiddle here.
(The GROUPS clause depends on window ordering and seems to have nothing common with your problem.)

Mike Organek · Answer

Your updated fiddle here.
For an gaps and islands approach, first mark your transitions from raw_value = 0 to raw_value = 1
with mark_changes as (
  select obt, raw_value,
         case
           when raw_value = 0 then 0
           when raw_value = lag(raw_value) over (order by obt) then 0
           else 1
         end as transition
    from tm_series
),

Keep only the raw_value = 1 rows, and sum() the preceding transition markers to place each row into a group.
id_groups as (
  select obt, raw_value, 
         sum(transition) over (order by obt) as grp_num
    from mark_changes
   where raw_value = 1
)

Use group by on these grp_num values to get your desired result.
select min(obt) as start_time, 
       max(obt) as end_time
  from id_groups
 group by grp_num
 order by min(obt);

How to use SQL PARTITION BY GROUPS?

2 Answers

Add your own answers!

Ask a Question