Database Administrators Asked by ATMdev on January 2, 2022
In SQL 2017, I have a ‘food’ table as follows:
ID item Ingredients
---- ----- -----------
1 item1 flour,sugar,cocoa,butter
2 item2 flour,sugar,butter,water
3 item3 sugar,cocoa,water
4 item4 sugar,cocoa,butter
5 item5 flour,water
Although the "Ingredients" are a CSV in a single field, I have no problem moving them to a related table:
ID Ingredient
--- ----------
1 flour
1 sugar
1 cocoa
1 butter
2 flour
etc
or just select them with a CROSS APPLY string_split
select ID, item, ingredient
from food
CROSS APPLY string_split(food.ingredients,',') as ing
Now here is what I am trying to attempt:
For any given row ex:(where id = 1)
I want to return the top n rows (other than id = 1
) that have the most common ingredients with id = 1
.
So the top 2 with the most common ingredients with id = 1
should be:
ID item Ingredients
---- ----- -----------
2 item2 flour,sugar,butter,water ( 3 in common)
3 item3 sugar,cocoa,water ( 2 in common)
4 item4 sugar,cocoa,butter ( 3 in common)
5 item5 flour,water ( 1 in common)
The result should return rows with id = 2
and id = 4
as both have 3 ingredients in common with row id = 1
This is my first question on StackExchange, and I hope it is clear.
I can write this as a C# or Python batch program to create a table of ‘closest ingredients’, but I’m hoping I can do this in SQL.
Thanks !
Please store the ingredients in a table not a list. So let's call that ItemIngredient
with two columns ItemId
and Ingredient
and leave it at that. The primary key will be (ItemId, Ingredient)
. Because searches by Ingredient
will probably happen a lot, let's put a non-unique index on Ingredient
(which is necessary for below to work the most efficiently).
SELECT
ItemId
,MatchItemId
,IngredientMatchCount
,MatchRank
FROM
(
SELECT
Base.ItemId
,Match.ItemId AS MatchItemId
,COUNT(*) AS IngredientMatchCount
,RANK() OVER (PARTITION BY Base.ItemId ORDER BY COUNT(*) DESC, Match.ItemId) AS MatchRank
FROM
ItemIngredient Base
INNER JOIN
ItemIngredient Match
ON Match.Ingredient = Base.Ingredient
AND Match.ItemId <> Base.ItemId
-- Put your WHERE clause here if you want to only fine matches for certain Ids
GROUP BY
Base.ItemId
,Match.ItemId
) matched
WHERE
MatchRank <= 1 --Or whatever
ORDER BY
ItemId
You can experiment with RANK
vs DENSE_RANK
at some point to see what provides the best results for you. If you need details about the Item
you can join back in the outer query.
Answered by bbaird on January 2, 2022
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP