Stack Overflow Asked by yponde on December 30, 2021
I have df1, which contains a set of particular IDs in a column, and df2, which contains a mix of comma-separated IDs in each row (example data shown below). I want to create a data frame that lists, for each row of df2, the combination of df1 IDs present in that row, together with a count of how often each distinct combination occurs.
import pandas as pd
df1 = pd.DataFrame({'Id': ["181", "456", "235", "653", "987", "5", "300"]})
df2 = pd.DataFrame({'Tag Id': ["213,435,181,954,987", "456", "215,435,181,754,987", "213,12,432,300,653,987"]})
Here is a faster approach using a list comprehension and collections.Counter -
from collections import Counter
import pandas as pd
# Get the vocabulary of IDs as a set for fast membership tests
vocab = set(df1['Id'].astype(int))
# For each row of df2, keep only the IDs that appear in df1 (as a hashable tuple)
filtered = [tuple(int(j) for j in i.split(',') if int(j) in vocab) for i in df2['Tag Id']]
# Count the combinations and display them as a dataframe
counts = Counter(filtered)
pd.DataFrame(sorted(counts.items()), columns=['Combinations', 'Counts'])
Combinations Counts
0 (181, 987) 2
1 (300, 653, 987) 1
2 (456,) 1
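The filter-and-count idea above can also be wrapped in a small helper. The following is only a sketch under a few assumptions that the question does not state: the function name count_combinations is made up for illustration, IDs within a row are sorted so that "987,181" and "181,987" count as the same combination, and rows that contain no df1 ID are skipped. It takes plain iterables, so df1['Id'] and df2['Tag Id'] could be passed in directly.

```python
from collections import Counter


def count_combinations(ids, tag_strings):
    """Count which subsets of `ids` occur in each comma-separated tag string.

    Hypothetical helper: the name and the skipping of empty rows are
    assumptions made for this sketch.
    """
    vocab = {int(i) for i in ids}
    combos = []
    for row in tag_strings:
        # Keep only known IDs; sort so the order inside a row does not matter
        kept = tuple(sorted(int(t) for t in row.split(',') if int(t) in vocab))
        if kept:  # assumption: rows without any matching ID are ignored
            combos.append(kept)
    return Counter(combos)


ids = ["181", "456", "235", "653", "987", "5", "300"]
tags = ["213,435,181,954,987", "456",
        "215,435,181,754,987", "213,12,432,300,653,987"]

counts = count_combinations(ids, tags)
for combo, n in counts.most_common():
    print(combo, n)
```

Using a set for the vocabulary keeps each membership test constant-time, and tuples are needed because lists cannot be used as Counter keys.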
Answered by Akshay Sehgal on December 30, 2021
Let's try explode to separate the Tag Ids in df2, then merge with df1 and count:
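s = (df2['Tag Id'].str.split(',')
     .explode()        # one row per individual tag
     .reset_index()    # keep the original df2 row number in 'index'
)
(df1.merge(s, left_on='Id', right_on='Tag Id')   # keep only tags listed in df1
    .sort_values('Tag Id')                        # string sort gives each combination one canonical order
    .groupby('index')
    .agg(Combination=('Id', ','.join))
    ['Combination']
    .value_counts()
    .rename_axis('Combination')
    .reset_index(name='Count')                    # same column names on pandas 1.x and 2.x
)
Output (rows with equal counts may appear in either order):
Combination Count
0 181,987 2
1 300,653,987 1
2 456 1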
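One possible variation, sketched here as an assumption rather than as part of the approach above, filters the exploded tags with isin instead of merging, and sorts the IDs numerically within each row so that an ID such as "5" is ordered before "181". The variable names tags, kept, combos and result are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'Id': ["181", "456", "235", "653", "987", "5", "300"]})
df2 = pd.DataFrame({'Tag Id': ["213,435,181,954,987", "456",
                               "215,435,181,754,987", "213,12,432,300,653,987"]})

# One row per tag; the index still holds the original df2 row number
tags = df2['Tag Id'].str.split(',').explode()

# Keep only the tags that are listed in df1
kept = tags[tags.isin(df1['Id'])]

# Rebuild one combination string per df2 row, sorting the IDs numerically
combos = kept.groupby(level=0).agg(lambda x: ','.join(sorted(x, key=int)))

# Count how often each combination occurs
result = (combos.value_counts()
                .rename_axis('Combination')
                .reset_index(name='Count'))
print(result)
```

Rows of df2 without any df1 ID simply drop out after the isin filter, so they do not show up in the counts.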
Answered by Quang Hoang on December 30, 2021