Building a summary string in a Pandas groupby (Possibly cross-tab or pivot-table question)

Question

I'm a n00b in Python Pandas.
I want to do a particular aggregation / grouping / cross-tab but I know so little of the terminology that I don't even know how to look this up.
But here's what I want.
Say I have a table like this:
Bob, Oranges,  5
Bob, Apples,  10
Bob, Bananas, 12
Tim, Oranges,  3
Tim, Apples,  20
Tim, Bananas,  5

I want to group by fruit to find the total for each type of fruit, and also produce another field containing a string that lists the details, sorted by their value in another column.
So I want output something like this:
Oranges,  8, "Bob(5), Tim(3)"
Apples,  30, "Tim(20), Bob(10)"
Bananas, 17, "Bob(12), Tim(5)"

Here the string aggregates the values from the names column into a list sorted by the associated numeric value, largest first.
I know that there isn't something out of the box to do this, but what is this kind of operation or aggregation or pivoting (where you take one of the columns and turn it to be "horizontal", if that makes any sense) actually called?
How would I go about implementing it with pandas functions?

Oliver Foster · Accepted Answer

Say you have a dataframe of your data in this format:
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Bob', 'Bob', 'Tim', 'Tim', 'Tim'],
    'fruit': ['Oranges', 'Apples', 'Bananas', 'Oranges', 'Apples', 'Bananas'],
    'num': [5, 10, 12, 3, 20, 5]
})

You can perform a groupby on fruit and aggregate the sum of the num field. After that you can apply a function on your aggregated dataframe that leverages values from your non-aggregated data like so:
# Sum only the numeric column; summing the whole frame can also concatenate the name strings
df_agg = df.groupby('fruit')['num'].sum().reset_index()

def desc(x):
    # Rows for this fruit, largest num first
    rows = df.loc[df['fruit'] == x].sort_values('num', ascending=False)
    d = []
    for idx, row in rows.iterrows():
        d.append(f"{row['name']}({row['num']})")
    return ', '.join(d)

# Build the summary string for each fruit in the aggregated frame
df_agg['desc_str'] = df_agg['fruit'].apply(desc)

This gives you the df_agg you are looking for:
     fruit  num          desc_str
0   Apples   30  Tim(20), Bob(10)
1  Bananas   17   Bob(12), Tim(5)
2  Oranges    8    Bob(5), Tim(3)
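
If the per-fruit lookup inside desc gets slow on larger data, one alternative is to sort once and let a single groupby build both columns. This is only a sketch: the label column and the df_single name are made up for illustration, and it relies on groupby keeping rows in their existing order within each group.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Bob', 'Bob', 'Tim', 'Tim', 'Tim'],
    'fruit': ['Oranges', 'Apples', 'Bananas', 'Oranges', 'Apples', 'Bananas'],
    'num': [5, 10, 12, 3, 20, 5],
})

# Sort once so that each group later sees its rows largest-first
ordered = df.sort_values('num', ascending=False)

# Hypothetical helper column holding the "Name(num)" text for each row
ordered['label'] = ordered['name'] + '(' + ordered['num'].astype(str) + ')'

# Named aggregation: sum the numbers and join the labels in one groupby
df_single = ordered.groupby('fruit', as_index=False).agg(
    num=('num', 'sum'),
    desc_str=('label', ', '.join),
)
```

On the sample data this should give the same three rows as df_agg above.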

Julio Jesus · Answer

Does that work for you?
df.groupby(["fruit","name"]).num.sum().unstack().assign(suma = lambda x: x.sum(axis = 1))

Returns:
name     Bob  Tim  suma
fruit
Apples    10   20    30
Bananas   12    5    17
Oranges    5    3     8
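
The wide table does not contain the summary string itself, but it could be derived from each row. A rough sketch, assuming one column per name as above; row_to_str and wide_desc are hypothetical names, not pandas functions:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Bob', 'Bob', 'Tim', 'Tim', 'Tim'],
    'fruit': ['Oranges', 'Apples', 'Bananas', 'Oranges', 'Apples', 'Bananas'],
    'num': [5, 10, 12, 3, 20, 5],
})

# One row per fruit, one column per name
wide = df.groupby(['fruit', 'name'])['num'].sum().unstack()

def row_to_str(row):
    # Drop names that have no entry for this fruit, then rank largest-first
    ranked = row.dropna().sort_values(ascending=False)
    # int() because a column with missing values would be stored as floats
    return ', '.join(f"{name}({int(value)})" for name, value in ranked.items())

# Series indexed by fruit, holding the summary string for each row
wide_desc = wide.apply(row_to_str, axis=1)
```

The result can be attached next to the totals with wide.assign(desc_str=wide_desc) if both are wanted in one frame.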
