Data Science Asked by Jsmoka on June 21, 2021
I have this DataFrame:
CL1 CL2 CL3 CL4
0 a a b f
1 b y c d
2 c x d s
3 x s x a
4 s dx s s
5 a c d d
6 s dx f d
7 d dc g g
8 f x s t
9 c x a d
10 x y y a
11 c a x y
12 f s d s
13 d d w a
Intention:
1- I wanna search and find the similar values between all columns (CL1-CL4) and sort in a new column (SIM).
2- I wanna find the non-similar values between columns and sort in another column (NON-SIM).
How can i do that? With df.pivot_table
i was not successful.
Given your input data is saved in a variable df
, I count the values which occur in all 4 unique columns as follows:
import pandas as pd
import numpy as np
output = (
df
.melt()
.drop_duplicates()
.groupby("value")
.agg(count=("value", "count"))
.reset_index()
)
output["SIM"] = np.where(output["count"] == 4, "SIM", "NON-SIM")
output = output.pivot(columns="similarity", values="value")
print(output)
similarity NON-SIM SIM
0 NaN a
1 b NaN
2 c NaN
3 NaN d
4 dc NaN
5 dx NaN
6 f NaN
7 g NaN
8 NaN s
9 t NaN
10 w NaN
11 x NaN
12 y NaN
Correct answer by Oxbowerce on June 21, 2021
This may be simpler using sets. Assuming your dataframe is df
, first get each column's unique values as a set:
import pandas as pd
from functools import reduce
# df = pd.read_clipboard()
cols = df.agg(set)
print(cols)
This gives a pandas Series
of python set
objects:
CL1 {c, s, d, a, x, f, b}
CL2 {c, dc, y, s, d, a, x, dx}
CL3 {c, s, y, d, a, x, g, f, w, b}
CL4 {s, y, d, a, g, t, f}
dtype: object
Then you can combine these using set
operations to get your desired results:
sim = reduce(set.intersection, cols)
non_sim = reduce(set.union, cols) - sim
print(sim, non_sim, sep='n')
This gives the results as two set
objects:
{'d', 'a', 's'}
{'c', 'dc', 'y', 'x', 'g', 't', 'f', 'dx', 'w', 'b'}
Answered by GeoMatt22 on June 21, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP