TransWikia.com

pandas count values for last 7 days from each date

Data Science Asked by Artem Betley on August 14, 2020

There are two pandas DataFrames. The first looks like this:

print(df1)

        id        date    month  is_buy
     0  17  2015-01-16  2015-01       1
     1  17  2015-01-26  2015-01       1
     2  17  2015-01-27  2015-01       1
     3  17  2015-02-11  2015-02       1
     4  17  2015-03-14  2015-03       1
     5  18  2015-01-28  2015-01       1
     6  18  2015-02-12  2015-02       1
     7  18  2015-02-25  2015-02       1
     8  18  2015-03-04  2015-03       1

The second data frame holds monthly aggregates of the first:

df2 = df1[df1['is_buy'] == 1].groupby(['id', 'month'], as_index=False)['is_buy'].sum().rename(columns={'is_buy': 'buys'})

print(df2)

        id    month       buys
     0  17  2015-01          3
     1  17  2015-02          1
     2  17  2015-03          1
     3  18  2015-01          1
     4  18  2015-02          2
     5  18  2015-03          1

I’m trying to add a new column to df2 named 'last_week_buys' that counts, for each id, the buys made during the 7 days before the first day of each month in df1['month']. In other words, I want to get this:

        id    month       buys    last_week_buys
     0  17  2015-01          3               NaN
     1  17  2015-02          1                 2
     2  17  2015-03          1                 0
     3  18  2015-01          1               NaN
     4  18  2015-02          2                 1
     5  18  2015-03          1                 1

Any ideas on how to compute this column?

2 Answers

The main obstacle is figuring out whether a date is within the last 7 days of the month. I'd recommend something hacky like the following:

from datetime import datetime, timedelta

def last7(datestr):
    # True if the date falls within the last 7 days of its month
    orig = datetime.strptime(datestr, '%Y-%m-%d')
    plus7 = orig + timedelta(days=7)
    return plus7.month != orig.month
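
A vectorized variant of the same check may be handier when the dates live in a pandas column. This is only a sketch: the name in_last7_days and the sample dates are made up for illustration, and it assumes the values can be parsed by pd.to_datetime.

```python
import pandas as pd

def in_last7_days(dates: pd.Series) -> pd.Series:
    """True where a date falls in the last 7 days of its calendar month."""
    dates = pd.to_datetime(dates)
    shifted = dates + pd.Timedelta(days=7)
    # If adding 7 days changes the month number, the date is in the final week.
    return shifted.dt.month != dates.dt.month

# Hypothetical dates around the boundary for a 31-day and a 28-day month.
sample = pd.Series(["2015-01-24", "2015-01-25", "2015-01-31",
                    "2015-02-21", "2015-02-22"])
flags = in_last7_days(sample)
print(flags.tolist())
```

The boundary falls on the 25th in January and on the 22nd in February 2015, so the window always covers exactly seven calendar days.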

Once you have that, it's relatively simple to adapt your previous code:

# last7 takes a single date string, so apply it element-wise and combine the masks with &
df3 = df1[(df1['is_buy'] == 1) & df1['date'].apply(last7)].groupby(['id', 'month']).agg({'is_buy': 'sum'})

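Now we just join df2 and df3 and we're done. Note that df3 is keyed by the month in which the buys happened, so its month has to be shifted forward by one before joining for the counts to line up with the following month.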
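
For what it's worth, here is one way the whole thing might fit together on the question's sample data. It is a sketch rather than the exact code above: it uses a vectorized form of the +7 days check, labels each late-month buy with the month it spills into (the month of date + 7 days), and assumes the first month of each id should stay NaN because there is no earlier data for it.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "id": [17] * 5 + [18] * 4,
    "date": pd.to_datetime([
        "2015-01-16", "2015-01-26", "2015-01-27", "2015-02-11", "2015-03-14",
        "2015-01-28", "2015-02-12", "2015-02-25", "2015-03-04",
    ]),
    "is_buy": [1] * 9,
})
df1["month"] = df1["date"].dt.strftime("%Y-%m")
buys = df1[df1["is_buy"] == 1]

# Monthly totals (df2 in the question).
df2 = (buys.groupby(["id", "month"], as_index=False)["is_buy"].sum()
           .rename(columns={"is_buy": "buys"}))

# Buys in the last 7 days of a month: date + 7 days lands in the next month,
# which is also the month the count should be attached to.
shifted = buys["date"] + pd.Timedelta(days=7)
late = buys[shifted.dt.month != buys["date"].dt.month]
late = late.assign(month=shifted.loc[late.index].dt.strftime("%Y-%m"))
df3 = (late.groupby(["id", "month"], as_index=False)["is_buy"].sum()
           .rename(columns={"is_buy": "last_week_buys"}))

result = df2.merge(df3, on=["id", "month"], how="left")

# Assumption: a missing count means 0, except for the first month of each id.
# Rows are sorted by id and month because groupby sorts its keys by default.
is_first = ~result["id"].duplicated()
result["last_week_buys"] = result["last_week_buys"].fillna(0).mask(is_first)
print(result)
```

On the sample data this gives last_week_buys of NaN, 2, 0 for id 17 and NaN, 1, 1 for id 18, which matches the table in the question. If an id skips a month entirely, the fill-with-0 step would need more care.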

Correct answer by Matthew Graves on August 14, 2020

You can also do something like this:

# assumes 'Date' is a datetime column and 'Total' holds the amounts
patterns = df[['Total', 'Date']]
patterns = patterns.set_index('Date')
resample = patterns.resample('D').sum()

# extract the last 7 daily rows

last_7 = resample.iloc[-7:]

# and get their total
last_7 = resample.iloc[-7:].sum()
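
A small runnable sketch of this idea on made-up data (the 'Date' and 'Total' column names follow the snippet above; the values are invented). Note that it totals the final 7 days of the whole series, so it would still need to be applied per id and per month to answer the question.

```python
import pandas as pd

# Hypothetical purchase log; two purchases fall on the same day.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-20", "2015-01-26",
                            "2015-01-26", "2015-01-31"]),
    "Total": [5, 2, 3, 4],
})

# One row per calendar day; days without purchases sum to 0.
daily = df.set_index("Date")["Total"].resample("D").sum()

# The last 7 rows are the last 7 calendar days covered by the data.
last_7_total = daily.iloc[-7:].sum()
print(last_7_total)
```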

A reference for data slicing is here.

Answered by rainer on August 14, 2020
