Data Science Asked by Remus Raphael on December 21, 2020
I have a pandas dataframe df1
:
Now, I want to filter the rows in df1
based on unique combinations of (Campaign, Merchant)
from another dataframe, df2
, which look like this:
What I tried is using .isin
, with a code similar to the one below:
df1.loc[df1['Campaign'].isin(df2['Campaign']) &
df1['Merchant'].isin(df2['Merchant'])]
The problem here is that the conditions are independent eg : I want to check if (A,1)
from df2
is in df1
, but with the above condition, since I am checking all the list, not row by row, it would return all rows in df1
where Campaign
column is A
OR Merchant
column is 1
.
Do you have any suggestion for this multiple pandas filtering?
import pandas as pd
df1 = pd.DataFrame({"Random numbers 1": pd.np.random.randn(6),
"Campaign": ["A"] * 5 + ["B"],
"Merchant": [1, 1, 1, 2, 3, 1]})
df2 = pd.DataFrame({"Random numbers 2": pd.np.random.randn(6),
"Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
"Merchant": [1, 2, 1, 2, 1, 2]})
columns_consider = ["Campaign", "Merchant"]
combined = pd.concat((df1[columns_consider].drop_duplicates(),
df2[columns_consider].drop_duplicates()), ignore_index=True)
identical = combined[combined.duplicated()]
print(identical)
Output:
Campaign Merchant
4 A 1
5 A 2
6 B 1
Answered by tuomastik on December 21, 2020
Bit late but my preferred solution to this is
# verbetim from @tuomastik
import pandas as pd
df1 = pd.DataFrame({"Random numbers 1": pd.np.random.randn(6),
"Campaign": ["A"] * 5 + ["B"],
"Merchant": [1, 1, 1, 2, 3, 1]})
df2 = pd.DataFrame({"Random numbers 2": pd.np.random.randn(6),
"Campaign": ["A"] * 2 + ["B"] * 2 + ["C"] * 2,
"Merchant": [1, 2, 1, 2, 1, 2]})
# modification
def pair_columns(df, col1, col2):
return df[col1] + df[col2]
def paired_mask(df1, df2, col1, col2):
return pair_columns(df1, col1, col2).isin(pair_columns(df2, col1, col2))
identical = df1.loc[paired_mask(df1, df2, "Campaign", "Merchant")]
Answered by user91338 on December 21, 2020
The way I always go about it is by creating a lookup column:
df1['lookup'] = df1['Campaign'] + "_" + df1['Merchant'].astype(str)
df2['lookup'] = df2['Campaign'] + "_" + df2['Merchant'].astype(str)
Then use loc to filter and drop the lookup columns:
df1.loc[df1['lookup'].isin(df2['lookup'])]
df1.drop(columns='lookup', inplace=True)
I'm still looking for a better solution.
Answered by Jason on December 21, 2020
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP