Find timestamp for highest number of concurrent users by date

Database Administrators Asked by Sharvari on November 14, 2021

I have a table with these columns: concurrent_users, concurrent_timestamp, and concurrent_date.

Sample data:

concurrent_users  concurrent_timestamp       concurrent_date
3                 2020-06-24 07:20:00 UTC    2020-06-24
7                 2020-06-24 08:20:00 UTC    2020-06-24
8                 2020-07-05 06:20:00 UTC    2020-07-05
2                 2020-07-05 03:20:00 UTC    2020-07-05

For each date, I want to list the timestamp at which concurrent_users is highest.

Expected output:

concurrent_users  concurrent_timestamp       concurrent_date
7                 2020-06-24 08:20:00 UTC    2020-06-24
8                 2020-07-05 06:20:00 UTC    2020-07-05

I tried the following queries:

Error 1

select concurrent_timestamp, max(concurrent_users)
from sample_data
group by concurrent_date;

But this gives an error because the concurrent_timestamp column is neither aggregated nor included in the GROUP BY clause.

When I put concurrent_timestamp in group by, I get …

Error 2

select concurrent_timestamp, max(concurrent_users)
from sample_data
group by concurrent_timestamp ;

This does not give the desired output; instead, it lists more than one record for each date.

How can I solve this?

4 Answers

Just apply the DATE() function to concurrent_timestamp and group by the result:

select
    DATE(concurrent_timestamp) timestamp_date
   ,max(concurrent_users) max_concurrent_users
from sample_data
group by DATE(concurrent_timestamp);

Answered by RolandoMySQLDBA on November 14, 2021

In PostgreSQL you can simply use DISTINCT ON:

SELECT DISTINCT ON (concurrent_date) *
FROM   sample_data
ORDER  BY concurrent_date, concurrent_users DESC;

We don't need the redundant column concurrent_date for this at all. Casting concurrent_timestamp on the fly is very cheap - overall cheaper than storing the functionally dependent value redundantly.

SELECT DISTINCT ON (concurrent_timestamp::date) *
FROM   sample_data
ORDER  BY concurrent_timestamp::date, concurrent_users DESC;

DISTINCT ON is typically fastest for few rows per group. Depending on table definition and data distribution, there may be (much) faster solutions.
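For engines without DISTINCT ON, a window function can produce the same one-row-per-date result. The following is only a minimal sketch and not part of the original answer: it runs the query through Python's built-in sqlite3 module so that it is self-contained, and it assumes a SQLite build of 3.25 or newer (the first release with window functions). The tie-breaker on concurrent_timestamp is an added assumption; without it, which row wins when two timestamps share the same maximum is arbitrary.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption: the real data
# lives in PostgreSQL or MySQL; SQLite is only used to make this runnable).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_data ("
    " concurrent_users INTEGER,"
    " concurrent_timestamp TEXT,"
    " concurrent_date TEXT)"
)
conn.executemany(
    "INSERT INTO sample_data VALUES (?, ?, ?)",
    [
        (3, "2020-06-24 07:20:00 UTC", "2020-06-24"),
        (7, "2020-06-24 08:20:00 UTC", "2020-06-24"),
        (8, "2020-07-05 06:20:00 UTC", "2020-07-05"),
        (2, "2020-07-05 03:20:00 UTC", "2020-07-05"),
    ],
)

# Number the rows within each date, highest user count first, and keep
# only the first row of every date.
query = """
SELECT concurrent_users, concurrent_timestamp, concurrent_date
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (
               PARTITION BY concurrent_date
               ORDER BY concurrent_users DESC, concurrent_timestamp
           ) AS rn
    FROM sample_data AS s
) AS ranked
WHERE rn = 1
ORDER BY concurrent_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The SQL inside the string should also run unchanged on PostgreSQL and on MySQL 8.0 or later, which both support ROW_NUMBER().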

Answered by Erwin Brandstetter on November 14, 2021

You can do something like this in MySQL:

CREATE TABLE sample_data (
  `concurrent_users` INTEGER,
  `concurrent_timestamp` VARCHAR(30),
  `concurrent_date` DATE
);

INSERT INTO sample_data
  (`concurrent_users`, `concurrent_timestamp`, `concurrent_date`)
VALUES
  (3, '2020-06-24 07:20:00 UTC', '2020-06-24'),
  (7, '2020-06-24 08:20:00 UTC', '2020-06-24'),
  (8, '2020-07-05 06:20:00 UTC', '2020-07-05'),
  (2, '2020-07-05 03:20:00 UTC', '2020-07-05');

-- Join each row to the maximum of its own date. Matching on the date as
-- well as the maximum keeps a row from pairing with another date's maximum.
SELECT s.concurrent_timestamp, s1.maxid
FROM sample_data s
INNER JOIN (
  SELECT concurrent_date, MAX(concurrent_users) maxid
  FROM sample_data
  GROUP BY concurrent_date
) s1 ON s.concurrent_date = s1.concurrent_date
    AND s.concurrent_users = s1.maxid;

concurrent_timestamp    | maxid
:---------------------- | ----:
2020-06-24 08:20:00 UTC |     7
2020-07-05 06:20:00 UTC |     8

Answered by nbk on November 14, 2021

This is a "groupwise-max" problem. You can't do it with a simple query; you need a subquery. There are many examples under the tag I added. Or you can read my blog: http://mysql.rjweb.org/doc.php/groupwise_max
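As a rough illustration of the subquery approach (a sketch, not taken from the blog post), the derived table below computes the maximum per date and the outer query joins back to pick up the matching timestamp. It is wrapped in Python's built-in sqlite3 module only to make it self-contained. The fifth row (7 users on 2020-07-05) is hypothetical, added to show why the join has to match on the date as well as on the maximum: joined on the user count alone, that row would wrongly pair with the 2020-06-24 maximum of 7.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server
# (assumption: the production table lives in MySQL or PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_data ("
    " concurrent_users INTEGER,"
    " concurrent_timestamp TEXT,"
    " concurrent_date TEXT)"
)
conn.executemany(
    "INSERT INTO sample_data VALUES (?, ?, ?)",
    [
        (3, "2020-06-24 07:20:00 UTC", "2020-06-24"),
        (7, "2020-06-24 08:20:00 UTC", "2020-06-24"),
        (8, "2020-07-05 06:20:00 UTC", "2020-07-05"),
        (2, "2020-07-05 03:20:00 UTC", "2020-07-05"),
        # Hypothetical extra row, not in the question's sample data.
        (7, "2020-07-05 05:20:00 UTC", "2020-07-05"),
    ],
)

# Derived table m holds one row per date with that date's maximum;
# joining on BOTH columns returns the full row(s) that reach it.
query = """
SELECT s.concurrent_users, s.concurrent_timestamp, s.concurrent_date
FROM sample_data AS s
JOIN (
    SELECT concurrent_date, MAX(concurrent_users) AS max_users
    FROM sample_data
    GROUP BY concurrent_date
) AS m
  ON m.concurrent_date = s.concurrent_date
 AND m.max_users = s.concurrent_users
ORDER BY s.concurrent_date
"""
groupwise_max = conn.execute(query).fetchall()
for row in groupwise_max:
    print(row)
```

Unlike a ROW_NUMBER() or DISTINCT ON solution, this form returns every row that ties for the maximum on a given date, which may or may not be what you want.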

Answered by Rick James on November 14, 2021
