Data Science Asked on August 3, 2021
I am trying to use the groupby functionality in order to do the following given this example dataframe:
import pandas as pd

dates = ['2020-03-01', '2020-03-01', '2020-03-01', '2020-03-01', '2020-03-01',
         '2020-03-10', '2020-03-10', '2020-03-10', '2020-03-10', '2020-03-10']
values = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
d = {'date': dates, 'values': values}
df = pd.DataFrame(data=d)
I want to take the largest n values grouped by date and take the sum of these values. This is how I understand I should do this: I should use groupby date, then define my own function that takes the grouped dataframes and spits out the value I need:
def myfunc(df):
    a = df.nlargest(3, 'values')['values'].sum()
    return a
data_agg = df.groupby('date').agg({'relevant_sentiment':myfunc})
However, I am getting various errors, for example that the keep argument is not set, or that it is not set correctly even when I do specify it in myfunc.
I would hope to get a dataframe with the two dates 03-01 and 03-10 with respectively the values 12 and 120.
Any help/insights/remarks will be appreciated.
You can keep it simple; something like this should work:
def myfunc(df):
    return df.nlargest(3, 'values')[['values']].sum()
and then:
data_agg = df.groupby('date', as_index=False).apply(myfunc)
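As an alternative sketch (not the approach from the answer above), the same result can be obtained by selecting the column before aggregating, so the function receives a Series. This likely also explains the keep-related errors in the question: when agg is given a column dictionary, the function is called with a Series, and Series.nlargest accepts only n and keep, so a second positional argument such as 'values' would be read as keep. The helper name top_n_sum is made up for this illustration.

```python
import pandas as pd

# Example data from the question
dates = ['2020-03-01'] * 5 + ['2020-03-10'] * 5
values = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
df = pd.DataFrame({'date': dates, 'values': values})


def top_n_sum(series, n=3):
    # Hypothetical helper: sum of the n largest entries of a Series.
    # Series.nlargest takes no column name, only n (and optionally keep).
    return series.nlargest(n).sum()


# Select the column first so each group is passed to the helper as a Series,
# then turn the date index back into a regular column.
data_agg = df.groupby('date')['values'].agg(top_n_sum).reset_index()
print(data_agg)
```

For the example data this yields one row per date, with 12 for 2020-03-01 and 120 for 2020-03-10.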
Whether "data_agg" is the right name for the result is up to you. Good luck!
Correct answer by Nikonation on August 3, 2021