TransWikia.com

Building a summary string in a Pandas groupby (Possibly cross-tab or pivot-table question)

Data Science Asked by interstar on May 23, 2021

I’m a n00b in Python Pandas.

I want to do a particular aggregation / grouping / cross-tab but I know so little of the terminology that I don’t even know how to look this up.

But here’s what I want.

Say I have a table like this:

Bob, Oranges,  5
Bob, Apples,  10
Bob, Bananas, 12
Tim, Oranges,  3
Tim, Apples,  20
Tim, Bananas,  5

I want to group by fruit to find the total for each type of fruit, but also produce another field containing a string that lists the details, sorted by their value in another column.

So I want output something like this:

Oranges,  8, "Bob(5), Tim(3)"
Apples,  30, "Tim(20), Bob(10)"
Bananas, 17, "Bob(12), Tim(5)"

Here the string aggregates the values from the names column into a list sorted by the associated numeric value.

I know that there isn’t something out of the box to do this, but what is this kind of operation or aggregation or pivoting (where you take one of the columns and turn it to be "horizontal" if that makes any sense) actually called?

How would I go about implementing it with Pandas functions?

2 Answers

Say you have a dataframe of your data in this format:

import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Bob', 'Bob', 'Tim', 'Tim', 'Tim'],
    'fruit': ['Oranges', 'Apples', 'Bananas', 'Oranges', 'Apples', 'Bananas'],
    'num': [5, 10, 12, 3, 20, 5]
})

You can perform a groupby on fruit and aggregate the sum of the num field. After that you can apply a function on your aggregated dataframe that leverages values from your non-aggregated data like so:

# Select 'num' before summing so the string column 'name' is not aggregated too
df_agg = df.groupby('fruit', as_index=False)['num'].sum()

def desc(x):
    # Build "name(num)" entries for fruit x, largest num first
    d = []
    rows = df.loc[df['fruit'] == x].sort_values('num', ascending=False)
    for idx, row in rows.iterrows():
        d.append(f"{row['name']}({row['num']})")
    return ', '.join(d)

df_agg['desc_str'] = df_agg['fruit'].apply(desc)

This gives you the following df_agg that you are looking for:

     fruit  num          desc_str
0   Apples   30  Tim(20), Bob(10)
1  Bananas   17   Bob(12), Tim(5)
2  Oranges    8    Bob(5), Tim(3)
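
If you would rather avoid the row-by-row loop, the same table can probably be built with one sort followed by a single groupby using named aggregation. This is only a sketch of an alternative, not a replacement for the code above; the helper column `label` and the function name `summarise_fruit` are made up for illustration. It relies on groupby keeping the rows of each group in their existing order.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Bob', 'Bob', 'Tim', 'Tim', 'Tim'],
    'fruit': ['Oranges', 'Apples', 'Bananas', 'Oranges', 'Apples', 'Bananas'],
    'num': [5, 10, 12, 3, 20, 5],
})


def summarise_fruit(frame):
    # Hypothetical helper: total per fruit plus a "name(num)" summary string,
    # with the names ordered by num, largest first.
    labelled = (
        frame.sort_values('num', ascending=False, kind='stable')
             .assign(label=lambda d: d['name'] + '(' + d['num'].astype(str) + ')')
    )
    # Rows keep their sorted order inside each group, so joining the labels
    # yields the names from largest to smallest num.
    return (
        labelled.groupby('fruit', as_index=False)
                .agg(num=('num', 'sum'), desc_str=('label', ', '.join))
    )


df_agg = summarise_fruit(df)
print(df_agg)
```

The sort happens once for the whole frame instead of once per fruit, which should matter more as the number of distinct fruits grows.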

Correct answer by Oliver Foster on May 23, 2021

Does that work for you?

df.groupby(["fruit","name"]).num.sum().unstack().assign(suma = lambda x: x.sum(axis = 1))

Returns:

name     Bob  Tim  suma
fruit
Apples    10   20    30
Bananas   12    5    17
Oranges    5    3     8
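
This wide table has the totals but not the summary string from the question. As a rough sketch, the string could be derived from the same unstacked table by sorting each row. The function name `row_to_string` and the variable names `wide` and `result` are assumptions made for this example, and `dropna()` is there only in case some name has no entry for a fruit.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Bob', 'Bob', 'Tim', 'Tim', 'Tim'],
    'fruit': ['Oranges', 'Apples', 'Bananas', 'Oranges', 'Apples', 'Bananas'],
    'num': [5, 10, 12, 3, 20, 5],
})

# One row per fruit, one column per name (the "horizontal" layout).
wide = df.groupby(['fruit', 'name'])['num'].sum().unstack()


def row_to_string(row):
    # Hypothetical helper: order one fruit's per-name values from largest
    # to smallest and format them as "name(value)".
    ordered = row.dropna().sort_values(ascending=False)
    return ', '.join(f"{name}({int(value)})" for name, value in ordered.items())


result = pd.DataFrame({
    'num': wide.sum(axis=1),                        # total per fruit
    'desc_str': wide.apply(row_to_string, axis=1),  # summary string per fruit
})
print(result)
```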

Answered by Julio Jesus on May 23, 2021
