Data Science Asked by interstar on May 23, 2021
I’m a n00b in Python Pandas.
I want to do a particular aggregation / grouping / cross-tab but I know so little of the terminology that I don’t even know how to look this up.
But here’s what I want.
Say I have a table like this
Bob, Oranges, 5
Bob, Apples, 10
Bob, Bananas, 12
Tim, Oranges, 3
Tim, Apples, 20
Tim, Bananas, 5
I want to group by fruit to find the total for each type of fruit, but also produce another field containing a string that lists the details, sorted by their value in another column.
So I want an output something like this.
Oranges, 8, "Bob(5), Tim(3)"
Apples, 30, "Tim(20), Bob(10)"
Bananas, 17, "Bob(12), Tim(5)"
The string aggregates the values from the names column into a list sorted by the associated numeric value.
I know that there isn’t something out of the box to do this, but what is this kind of operation or aggregation or pivoting (where you take one of the columns and turn it to be "horizontal" if that makes any sense) actually called?
How would I go about implementing it with Pandas functions?
Say you have a dataframe of your data in this format:
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Bob', 'Bob', 'Tim', 'Tim', 'Tim'],
    'fruit': ['Oranges', 'Apples', 'Bananas', 'Oranges', 'Apples', 'Bananas'],
    'num': [5, 10, 12, 3, 20, 5]
})
You can perform a groupby on fruit and aggregate the sum of the num field. After that you can apply a function on your aggregated dataframe that leverages values from your non-aggregated data like so:
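# Total per fruit; select 'num' explicitly so the text column 'name' is left out of the sum.
df_agg = df.groupby('fruit')[['num']].sum()
df_agg.reset_index(inplace=True)

def desc(x):
    # Collect "name(num)" for every row of fruit x, largest num first.
    d = []
    for _, row in df.loc[df['fruit'] == x].sort_values(['num'], ascending=False).iterrows():
        d.append(f"{row['name']}({row['num']})")
    return ', '.join(d)

df_agg['desc_str'] = df_agg['fruit'].apply(desc)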
This gives you the following df_agg that you are looking for:
     fruit  num          desc_str
0   Apples   30  Tim(20), Bob(10)
1  Bananas   17   Bob(12), Tim(5)
2  Oranges    8    Bob(5), Tim(3)
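The same result can probably be had without the per-fruit lookup, by sorting once and letting groupby build both columns. This is only a minimal sketch, not the answerer's code; the names ordered, label and df_agg2 are made up for illustration, and it relies on groupby keeping the row order inside each group.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Bob', 'Bob', 'Tim', 'Tim', 'Tim'],
    'fruit': ['Oranges', 'Apples', 'Bananas', 'Oranges', 'Apples', 'Bananas'],
    'num': [5, 10, 12, 3, 20, 5],
})

# Sort once, so the rows inside every fruit group are already in descending order of num.
ordered = df.sort_values('num', ascending=False)

# Build the "name(num)" text per row ('label' is a hypothetical helper column).
ordered = ordered.assign(label=ordered['name'] + '(' + ordered['num'].astype(str) + ')')

# Named aggregation: sum the numbers and join the labels for each fruit.
df_agg2 = (
    ordered.groupby('fruit', as_index=False)
    .agg(num=('num', 'sum'), desc_str=('label', ', '.join))
)
print(df_agg2)
```

If two people have the same num for a fruit, their relative order in the string is not guaranteed unless the sort is made stable, for example with kind='stable' in sort_values.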
Correct answer by Oliver Foster on May 23, 2021
Does that work for you?
df.groupby(["fruit","name"]).num.sum().unstack().assign(suma = lambda x: x.sum(axis = 1))
Returns:
name     Bob  Tim  suma
fruit
Apples    10   20    30
Bananas   12    5    17
Oranges    5    3     8
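Turning one column "horizontal" like this is usually called pivoting or reshaping from long to wide (unstack and pivot_table both do it). As a rough sketch of how the wide table could also yield the sorted string from the question: the helper describe and the names wide and result are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Bob', 'Bob', 'Tim', 'Tim', 'Tim'],
    'fruit': ['Oranges', 'Apples', 'Bananas', 'Oranges', 'Apples', 'Bananas'],
    'num': [5, 10, 12, 3, 20, 5],
})

# Long-to-wide reshape: one row per fruit, one column per name.
wide = df.pivot_table(index='fruit', columns='name', values='num', aggfunc='sum')

def describe(row):
    # Hypothetical helper: rank one fruit's values, skipping people with no entry.
    ranked = row.dropna().sort_values(ascending=False)
    return ', '.join(f"{name}({int(value)})" for name, value in ranked.items())

result = pd.DataFrame({
    'total': wide.sum(axis=1),
    'desc_str': wide.apply(describe, axis=1),
})
print(result)
```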
Answered by Julio Jesus on May 23, 2021