TransWikia.com

Using user defined function in groupby

Data Science Asked on August 3, 2021

I am trying to use the groupby functionality in order to do the following given this example dataframe:

import pandas as pd

dates = ['2020-03-01','2020-03-01','2020-03-01','2020-03-01','2020-03-01',
        '2020-03-10','2020-03-10','2020-03-10','2020-03-10','2020-03-10']
values = [1,2,3,4,5,10,20,30,40,50]
d = {'date': dates, 'values': values}
df = pd.DataFrame(data=d)

I want to take the n largest values within each date and sum them. As I understand it, I should group by date and then define my own function that takes each grouped dataframe and returns the value I need:

def myfunc(df):
    
    a = df.nlargest(3, 'values')['values'].sum()
    
    return a

data_agg = df.groupby('date').agg({'relevant_sentiment':myfunc})

However, I am getting various errors, for example that the value of keep is not set, or that it is not set correctly even when I specify it in myfunc.

I would expect a dataframe with the two dates 03-01 and 03-10 and the values 12 and 120, respectively.

Any help/insights/remarks will be appreciated.

One Answer

You can keep it simple; something like this should work:

def myfunc(df):
    return df.nlargest(3, 'values')[['values']].sum()

and then:

data_agg = df.groupby('date', as_index=False).apply(myfunc)

Whether "data_agg" is the right name for the result is up to you. Good luck!
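As a possible explanation for the errors in the question: when agg is given a dict, the function is called with each group's column as a Series rather than a dataframe, so in nlargest(3, 'values') the string 'values' is likely read as the keep argument. The dict in the question also refers to a 'relevant_sentiment' column that the example dataframe does not contain. The sketch below is one alternative under those assumptions; it selects the 'values' column used in the example data, and the helper name top3_sum is made up for illustration.

```python
import pandas as pd

# Example data from the question
dates = ['2020-03-01'] * 5 + ['2020-03-10'] * 5
values = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
df = pd.DataFrame({'date': dates, 'values': values})

def top3_sum(s):
    # s is a Series holding the 'values' of a single date group,
    # so nlargest needs only n (no column name)
    return s.nlargest(3).sum()

# Select the column first, aggregate per date, then turn the
# 'date' index back into a regular column
data_agg = df.groupby('date')['values'].agg(top3_sum).reset_index()
print(data_agg)
```

For the example data this should give 12 for 2020-03-01 and 120 for 2020-03-10, which is the result the question asks for.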

Correct answer by Nikonation on August 3, 2021
