Python Pandas cumsum with shift of n

Stack Overflow Asked by Gonzalo Polo on October 10, 2020

I would like to know if there is an efficient way (avoiding for loops) to do a series.cumsum() but with a shift of n.

Just as series.cumsum() can be seen as the inverse of series.diff(1), I am looking for an inverse of diff(n), which could be called cumsum_shift. (I know that a proper inverse also needs the initial values, but for simplicity I ignore them here.)
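
In symbols (notation introduced here for illustration): write d for the differenced series s.diff(n), y for the result and c_0, ..., c_{n-1} for the initial values. Inverting diff(n) means solving the recurrence on the first line; unrolling it gives the second line:

```latex
y_r = c_r \quad (0 \le r < n), \qquad y_i = y_{i-n} + d_i \quad (i \ge n)

\Longrightarrow \quad y_{r + kn} = c_r + \sum_{j=1}^{k} d_{r + jn}, \qquad 0 \le r < n,\; k \ge 0
```

So cumsum_shift amounts to an ordinary cumulative sum applied separately to each residue class i mod n, which is what makes a loop-free version possible.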

To be more explicit, here it is implemented with a for loop (which I would like to avoid):

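import numpy as np
import pandas as pd

def cumsum_shift(s, shift=1, init_values=(0,)):
    # Result has the same index as s; start from zeros
    s_cumsum = pd.Series(np.zeros(len(s)), index=s.index)
    # The first `shift` positions are the initial values
    for i in range(shift):
        s_cumsum.iloc[i] = init_values[i]
    # Afterwards: y[i] = y[i - shift] + s[i]
    for i in range(shift, len(s)):
        s_cumsum.iloc[i] = s_cumsum.iloc[i - shift] + s.iloc[i]
    return s_cumsum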

With shift = 1 this code does the same as the pandas s.cumsum() method: exactly the same with init_values = [s.iloc[0]], and s.cumsum() - s.iloc[0] with the default init_values. The pandas method runs in compiled code (I guess), so it is much faster, and of course you should always use s.cumsum() rather than implement it yourself with a for loop.

So my question is: how can cumsum_shift be implemented with pandas methods, without a for loop?
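
One possible loop-free approach, sketched under stated assumptions: overwrite the first shift values with the initial values, then group positions by i % shift and let pandas' groupby().cumsum() compute one running sum per residue class. The name cumsum_shift_vec is hypothetical; the sketch assumes init_values has shift entries and keeps the input's index.

```python
import numpy as np
import pandas as pd


def cumsum_shift_vec(s, shift=1, init_values=None):
    """Loop-free version of y[i] = y[i - shift] + s[i] (hypothetical helper)."""
    values = s.to_numpy(dtype=float).copy()
    if init_values is None:
        init_values = np.zeros(shift)
    # The first `shift` positions hold the initial values; NaNs there
    # (e.g. from s.diff(shift)) are simply overwritten.
    n0 = min(shift, len(values))
    values[:n0] = np.asarray(init_values, dtype=float)[:n0]
    # Positions with the same remainder modulo `shift` form one chain
    # y[r], y[r + shift], y[r + 2*shift], ..., which is a plain cumsum.
    groups = np.arange(len(values)) % shift
    return pd.Series(values, index=s.index).groupby(groups).cumsum()


s = pd.Series([1, 10, 100, 2, 20, 200, 5, 50, 500])
print(cumsum_shift_vec(s.diff(3), shift=3, init_values=[1, 10, 100]).tolist())
# [1.0, 10.0, 100.0, 2.0, 20.0, 200.0, 5.0, 50.0, 500.0]
```

With shift=1 and init_values=[s.iloc[0]] this reduces to s.cumsum().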

Edit 1

Here is an example of input and output.

Take this series and its order-3 difference:

s = pd.Series([1,10,100,2,20,200,5,50,500])
s.diff(3)
0      NaN
1      NaN
2      NaN
3      1.0
4     10.0
5    100.0
6      3.0
7     30.0
8    300.0
dtype: float64

Note the shift of 3: a plain cumsum(), e.g. s.diff(3).cumsum(), would not recover the original s. With the initial values [1,10,100], cumsum_shift returns the original series s again:

cumsum_shift(s.diff(3), shift = 3, init_values = [1,10,100])
0      1.0
1     10.0
2    100.0
3      2.0
4     20.0
5    200.0
6      5.0
7     50.0
8    500.0
dtype: float64
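
Another loop-free sketch, in plain NumPy: pad the values to a multiple of shift, reshape so that each column holds one residue class, and take the cumulative sum down the rows. The name cumsum_shift_reshape is hypothetical, and the sketch assumes init_values has one entry per residue class (length shift).

```python
import numpy as np
import pandas as pd


def cumsum_shift_reshape(s, shift, init_values):
    """y[i] = y[i - shift] + s[i], computed with a reshape instead of a loop."""
    values = s.to_numpy(dtype=float).copy()
    n = len(values)
    n0 = min(shift, n)
    values[:n0] = np.asarray(init_values, dtype=float)[:n0]
    n_rows = -(-n // shift)  # ceiling division
    padded = np.zeros(n_rows * shift)
    padded[:n] = values
    # Row k holds positions k*shift ... k*shift + shift - 1, so column r is the
    # chain r, r + shift, r + 2*shift, ...; cumsum down the rows sums each chain.
    out = padded.reshape(n_rows, shift).cumsum(axis=0).ravel()[:n]
    return pd.Series(out, index=s.index)


s = pd.Series([1, 10, 100, 2, 20, 200, 5, 50, 500])
print(cumsum_shift_reshape(s.diff(3), shift=3, init_values=[1, 10, 100]).tolist())
# [1.0, 10.0, 100.0, 2.0, 20.0, 200.0, 5.0, 50.0, 500.0]
```

The zero padding only touches trailing positions that are cut off again, so it does not change the result. Like the for loop, np.cumsum propagates a NaN to all later entries of its chain.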

Let me emphasize, though, that the initial values are not a big deal: an offset coming from them is not a problem. What I would like to know is how to take the cumsum of a shift-differenced series without having to use a for loop.

In the same way, if you do a diff() and then a cumsum() you get back the original series up to the initial value (here every entry is lower by s.iloc[0] = 1 and the first one is NaN):

s = pd.Series([1,10,100,2,20,200,5,50,500])
s.diff().cumsum()
0      NaN
1      9.0
2     99.0
3      1.0
4     19.0
5    199.0
6      4.0
7     49.0
8    499.0
dtype: float64

I would like to know if there is some clever way of doing something like a hypothetical s.diff(n).cumsum(n) that returns the original series up to some constant initial values.
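
A sketch of that "diff(n) then cumsum(n)" behaviour (Series.cumsum itself has no shift parameter, so cumsum_n below is a hypothetical helper): fill the leading NaNs of s.diff(n) with 0 and take a cumsum within each residue class i % n. The result equals s minus its first n values repeated with period n, the shifted analogue of s.diff().cumsum() being s - s.iloc[0].

```python
import numpy as np
import pandas as pd


def cumsum_n(d, n):
    """Hypothetical 'cumsum with shift n': running sums within each class i % n."""
    groups = np.arange(len(d)) % n
    # Leading NaNs from diff(n) become 0, so each chain starts from 0.
    return d.fillna(0).groupby(groups).cumsum()


s = pd.Series([1, 10, 100, 2, 20, 200, 5, 50, 500])
n = 3
recovered = cumsum_n(s.diff(n), n)
# The offset is the first n values of s, repeated every n positions.
offset = np.resize(s.to_numpy()[:n], len(s))
print((recovered + offset).tolist())
# [1.0, 10.0, 100.0, 2.0, 20.0, 200.0, 5.0, 50.0, 500.0]
```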

Edit 2 – Reverse a moving average

While thinking about applications of this "shifted cumsum", I found another SO question on how to reverse a moving average. I answered it using my cumsum_shift function, and I think that answer clarifies what I am asking here.
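
The linked answer is not reproduced here, but the connection can be sketched, assuming a simple rolling mean m = x.rolling(n).mean() and known first n values of x. For i >= n, n * (m[i] - m[i-1]) = x[i] - x[i-n], so n * m.diff() equals x.diff(n) from position n onwards, and undoing the moving average is a shifted cumsum. The helper name undo_rolling_mean is hypothetical.

```python
import numpy as np
import pandas as pd


def undo_rolling_mean(m, window, first_values):
    """Recover x from m = x.rolling(window).mean(), given x's first `window` values."""
    # For i >= window: x[i] = x[i - window] + window * (m[i] - m[i - 1])
    values = (window * m.diff()).to_numpy(dtype=float).copy()
    n0 = min(window, len(values))
    values[:n0] = np.asarray(first_values, dtype=float)[:n0]
    # Shifted cumsum: one running sum per residue class i % window.
    groups = np.arange(len(values)) % window
    return pd.Series(values, index=m.index).groupby(groups).cumsum()


x = pd.Series([1, 10, 100, 2, 20, 200, 5, 50, 500], dtype=float)
m = x.rolling(3).mean()
print(np.round(undo_rolling_mean(m, 3, x.iloc[:3]).to_numpy(), 6))
```

Floating-point error in m accumulates along each chain, so the result should be compared with x using np.allclose rather than exact equality.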

One Answer

You can use the pandas rolling().sum() method along with sum:


However, you may want to fill the leading NaN values (up to the shift) with the values of the original df.

Answered by Elif on October 10, 2020
