TransWikia.com

Limiting join to top 1 row for each row

Database Administrators Asked by dstr on January 25, 2021

I’m trying to join two tables and filter the join results, but I couldn’t get it to work. I’d appreciate any help.

I have these row sets:

Table A
id | userid | targetid | start_date
-----------------------------------
11 | user1  | 123      | 22/10/2019
22 | user1  | 123      | 02/10/2019
33 | user1  | 123      | 04/10/2019
44 | user1  | 456      | 02/10/2019
55 | user1  | 123      | 13/11/2020

Table B
id | targetid | start_date
---------------------------
66 | 123      | 21/10/2019
77 | 456      | 11/11/2020
88 | 123      | 11/11/2020
99 | 123      | 12/11/2020

What I’m trying to do is find the most recent Table B row for each Table A row, using targetid as the foreign key and start_date as the filter/order value. Here’s the result I’m looking for:

id | userid | targetid | b_id
-----------------------------
11 | user1  | 123      | 66
22 | user1  | 123      | 66
33 | user1  | 123      | 66
44 | user1  | 456      | 77
55 | user1  | 123      | 99

I tried an inner join, limiting the join condition with on ... and a.start_date <= b.start_date, but that made Table A row 55 join with each row in Table B.

This could be any type of SQL (join, cursor, loop); the method and performance don’t matter.

SQL Fiddle

2 Answers

There are two ways to do this:

  1. If you have keys that guarantee that combinations of (targetid, start_date) are unique, a correlated subquery matches at most one Table B row per Table A row, so it returns consistent results.
  2. If you cannot guarantee uniqueness, you will have to use a window function such as ROW_NUMBER() with a tie-breaking sort, so that the same result is returned each time the query is executed.
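
To make the distinction concrete, here is a minimal sketch in Python using the standard-library sqlite3 module (an assumption; the question does not name a database engine). Row 67 is hypothetical and duplicates the (targetid, start_date) of row 66, and dates are stored as ISO strings so that string comparison follows date order. With the duplicate in place, joining on MAX(start_date) matches both Table B rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, userid TEXT, targetid INTEGER, start_date TEXT);
    CREATE TABLE table_b (id INTEGER, targetid INTEGER, start_date TEXT);
    INSERT INTO table_a VALUES (11, 'user1', 123, '2019-10-22');
    INSERT INTO table_b VALUES (66, 123, '2019-10-21');
    -- Hypothetical duplicate of row 66's (targetid, start_date).
    INSERT INTO table_b VALUES (67, 123, '2019-10-21');
""")

rows = conn.execute("""
    SELECT a.id, b.id AS b_id
    FROM table_a a
    LEFT JOIN table_b b
      ON b.targetid = a.targetid
     AND b.start_date = (
           SELECT MAX(b2.start_date)
           FROM table_b b2
           WHERE b2.targetid = a.targetid
             AND b2.start_date <= a.start_date
         )
    ORDER BY b.id
""").fetchall()

# Both Table B rows share the maximum start_date, so row 11 appears twice.
print(rows)
```

Table A row 11 comes back once per tied Table B row, which is why a tie-breaker is needed whenever uniqueness is not guaranteed.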

If a unique index exists:

SELECT
  a.id
 ,a.userid
 ,a.targetid
 ,b.id AS b_id
 /* And other columns from Table B */
FROM
  table_a a
LEFT JOIN
  table_b b
    ON b.targetid = a.targetid
        AND b.start_date =
          (
            SELECT
              MAX(b2.start_date)
            FROM
              table_b b2
            WHERE
              b2.targetid = a.targetid
                AND b2.start_date <= a.start_date
          )
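
As a rough check, the correlated-subquery join can be run against the sample data from the question. This is only a sketch: it assumes SQLite through Python's standard-library sqlite3 module, stores the dates as ISO strings (2019-10-22 rather than 22/10/2019) so they compare correctly, and creates the unique index ux_b, a hypothetical name, that this variant depends on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, userid TEXT, targetid INTEGER, start_date TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, targetid INTEGER, start_date TEXT);
    -- Assumed unique key; ux_b is an illustrative name.
    CREATE UNIQUE INDEX ux_b ON table_b (targetid, start_date);
    INSERT INTO table_a VALUES
        (11, 'user1', 123, '2019-10-22'),
        (22, 'user1', 123, '2019-10-02'),
        (33, 'user1', 123, '2019-10-04'),
        (44, 'user1', 456, '2019-10-02'),
        (55, 'user1', 123, '2020-11-13');
    INSERT INTO table_b VALUES
        (66, 123, '2019-10-21'),
        (77, 456, '2020-11-11'),
        (88, 123, '2020-11-11'),
        (99, 123, '2020-11-12');
""")

result = conn.execute("""
    SELECT a.id, b.id AS b_id
    FROM table_a a
    LEFT JOIN table_b b
      ON b.targetid = a.targetid
     AND b.start_date = (
           SELECT MAX(b2.start_date)
           FROM table_b b2
           WHERE b2.targetid = a.targetid
             AND b2.start_date <= a.start_date
         )
    ORDER BY a.id
""").fetchall()

# Rows 22, 33 and 44 have no Table B row on or before their start_date,
# so the LEFT JOIN keeps them with b_id = None.
for row in result:
    print(row)
```

Rows 11 and 55 pick up 66 and 99 as in the question's expected output, while 22, 33 and 44 come back with no match because every candidate Table B row starts later than they do.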

If no unique index exists:

For this I'm assuming id on Table B is a row identifier and unique. If it isn't, you will need to add additional columns to the ORDER BY clause of the window function (inside OVER) to guarantee the query is deterministic.

SELECT
  id
 ,userid
 ,targetid
 ,b_id
 /* Other columns from table b */
FROM
  (
    SELECT
      a.id
     ,a.userid
     ,a.targetid
     ,b.id AS b_id
     /* Other columns from Table B */
     ,ROW_NUMBER()
        OVER
          (
            PARTITION BY
              a.id
            ORDER BY
              b.start_date DESC
             ,b.id DESC
          ) AS rownum
    FROM
      table_a a
    LEFT JOIN
      table_b b
        ON b.targetid = a.targetid
            AND b.start_date <= a.start_date
  ) x
WHERE
  rownum = 1
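
The effect of the tie-breaker can be seen with a small experiment. The sketch below assumes SQLite 3.25 or later (for window function support) through Python's sqlite3 module, uses ISO date strings, and adds a hypothetical Table B row 100 that duplicates the (targetid, start_date) of row 99.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, userid TEXT, targetid INTEGER, start_date TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, targetid INTEGER, start_date TEXT);
    INSERT INTO table_a VALUES
        (11, 'user1', 123, '2019-10-22'),
        (22, 'user1', 123, '2019-10-02'),
        (33, 'user1', 123, '2019-10-04'),
        (44, 'user1', 456, '2019-10-02'),
        (55, 'user1', 123, '2020-11-13');
    INSERT INTO table_b VALUES
        (66, 123, '2019-10-21'),
        (77, 456, '2020-11-11'),
        (88, 123, '2020-11-11'),
        (99, 123, '2020-11-12'),
        (100, 123, '2020-11-12');  -- hypothetical duplicate of row 99's key
""")

ranked = conn.execute("""
    SELECT id, b_id
    FROM (
        SELECT a.id, b.id AS b_id,
               ROW_NUMBER() OVER (
                   PARTITION BY a.id
                   ORDER BY b.start_date DESC, b.id DESC
               ) AS rownum
        FROM table_a a
        LEFT JOIN table_b b
          ON b.targetid = a.targetid
         AND b.start_date <= a.start_date
    ) x
    WHERE rownum = 1
    ORDER BY id
""").fetchall()

# Exactly one row per Table A id; the tie between 99 and 100 goes to the higher b.id.
for row in ranked:
    print(row)
```

Each Table A row appears exactly once, and row 55 resolves to 100 on every run because b.id DESC breaks the tie on start_date.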

Notes:

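As I mentioned in my earlier comment, the desired results you provided are inconsistent with the requirement table_b.start_date <= table_a.start_date, specifically for ids 22, 33, and 44: no Table B row with the same targetid starts on or before those Table A rows, so both queries above return NULL for b_id there.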

Answered by bbaird on January 25, 2021

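-- For each row in a, pick the earliest row in b that starts on or after it.
-- The inner join drops rows in a that have no such match.
WITH CTE AS (
SELECT a.id, a.userid, a.targetid, b.id AS b_id,
    ROW_NUMBER() OVER (PARTITION BY a.id, a.userid, a.targetid ORDER BY b.start_date) AS rn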
FROM a
JOIN b ON a.targetid = b.targetid AND b.start_date >= a.start_date
)
SELECT  id, userid, targetid, b_id
FROM CTE WHERE rn = 1
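
For comparison, this query can be run on the question's sample data in the same way. The sketch assumes SQLite 3.25 or later through Python's sqlite3 module, tables named a and b as in the query, and ISO date strings; the outer ORDER BY is added only to make the printed order stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, userid TEXT, targetid INTEGER, start_date TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, targetid INTEGER, start_date TEXT);
    INSERT INTO a VALUES
        (11, 'user1', 123, '2019-10-22'),
        (22, 'user1', 123, '2019-10-02'),
        (33, 'user1', 123, '2019-10-04'),
        (44, 'user1', 456, '2019-10-02'),
        (55, 'user1', 123, '2020-11-13');
    INSERT INTO b VALUES
        (66, 123, '2019-10-21'),
        (77, 456, '2020-11-11'),
        (88, 123, '2020-11-11'),
        (99, 123, '2020-11-12');
""")

next_rows = conn.execute("""
    WITH CTE AS (
        SELECT a.id, a.userid, a.targetid, b.id AS b_id,
               ROW_NUMBER() OVER (
                   PARTITION BY a.id, a.userid, a.targetid
                   ORDER BY b.start_date
               ) AS rn
        FROM a
        JOIN b ON a.targetid = b.targetid AND b.start_date >= a.start_date
    )
    SELECT id, b_id
    FROM CTE
    WHERE rn = 1
    ORDER BY id
""").fetchall()

# Row 55 has no row in b on or after its start_date, so the inner join omits it.
for row in next_rows:
    print(row)
```

Under these assumptions the query reproduces the expected b ids for rows 22, 33 and 44 (66, 66 and 77), but it returns 88 rather than 66 for row 11 and leaves out row 55 entirely.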

Answered by NikitaSerbskiy on January 25, 2021
