Stack Overflow Asked by Gonzalo Polo on October 10, 2020
I would like to know if there is an efficient way (avoiding for loops) of doing a serie.cumsum() but with a shift of n.
In the same way that serie.cumsum() can be seen as the inverse of serie.diff(1), I am looking for an inverse of diff(n), which could be called cumsum_shift. (I know that a proper inverse needs the initial values, but for simplicity I ignore them here.)
More explicitly implementing it with a for loop (which I would like to avoid):
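import numpy as np
import pandas as pd

def cumsum_shift(s, shift=1, init_values=(0,)):
    # Result buffer; the first `shift` entries are seeded from init_values.
    s_cumsum = pd.Series(np.zeros(len(s)))
    for i in range(shift):
        s_cumsum.iloc[i] = init_values[i]
    # Each later entry adds s[i] to the result `shift` positions back.
    for i in range(shift, len(s)):
        s_cumsum.iloc[i] = s_cumsum.iloc[i - shift] + s.iloc[i]
    return s_cumsum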
With shift = 1, this code does the same thing as the pandas method s.cumsum() (up to the initial value), but the pandas method is presumably implemented in C, so it is much faster. Of course, you should always use s.cumsum() rather than implementing it yourself with a for loop.
My question, then, is: what would be the way of implementing cumsum_shift with pandas methods, avoiding a for loop?
Adding an example of input and output
If you call it with:
s = pd.Series([1,10,100,2,20,200,5,50,500])
s.diff(3)
out[26]
0 NaN
1 NaN
2 NaN
3 1.0
4 10.0
5 100.0
6 3.0
7 30.0
8 300.0
dtype: float64
Note that a plain cumsum(), e.g. s.diff(3).cumsum(), would not recover the original s; the shift of 3 is what matters. With this input, the output of cumsum_shift(s.diff(3), shift = 3, init_values = [1,10,100]) is again the original series s:
cumsum_shift(s.diff(3), shift = 3, init_values= [1,10,100])
out[27]
0 1.0
1 10.0
2 100.0
3 2.0
4 20.0
5 200.0
6 5.0
7 50.0
8 500.0
dtype: float64
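For comparison, here is a short check of the s.diff(3).cumsum() expression mentioned above, using the same example series. It is only an illustration of why the plain cumulative sum does not undo diff(3): it accumulates over every consecutive row instead of every third one.

```python
import pandas as pd

s = pd.Series([1, 10, 100, 2, 20, 200, 5, 50, 500])

# Plain cumsum adds up every consecutive row, not every 3rd row,
# and leaves the leading NaN values produced by diff(3) in place.
plain = s.diff(3).cumsum()
print(plain.tolist())
# [nan, nan, nan, 1.0, 11.0, 111.0, 114.0, 144.0, 444.0]
```

The result differs from s by more than a constant offset, so no choice of initial values fixes it.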
But let me emphasize that the initial values are not a big deal; a constant difference is not a problem. I would like to know how to perform a cumsum of a shifted, differenced series without having to use a for loop.
In the same way, if you do a diff() and then a cumsum(), you get back the original series up to the initial value:
s = pd.Series([1,10,100,2,20,200,5,50,500])
s.diff().cumsum()
out[28]
0 NaN
1 9.0
2 99.0
3 1.0
4 19.0
5 199.0
6 4.0
7 49.0
8 499.0
dtype: float64
I would like to know if there is some clever way of doing something like s.diff(n).cumsum(n) that returns something correct up to some constant initial values.
EDIT 2 – Reverse a Moving Average
Thinking of an application of the "shifted cumsum", I found another question on SO about how to reverse a moving average, which I have answered using my cumsum_shift function. I think it clarifies what I am asking here.
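The linked question is not reproduced here, so the following is only a sketch of the connection, assuming a simple trailing moving average with window n (the window length and variable names are chosen for illustration). It checks that n times the first difference of the moving average equals diff(n) of the original series, which is why a shifted cumsum can reverse the average once the first n values are known.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 10, 100, 2, 20, 200, 5, 50, 500], dtype=float)
n = 3  # window length (assumed for illustration)

ma = s.rolling(n).mean()  # trailing moving average; first n - 1 rows are NaN

# Consecutive windows differ by one entering and one leaving value, so
# n * (ma[i] - ma[i - 1]) == s[i] - s[i - n] for every i >= n.
lhs = n * ma.diff()
rhs = s.diff(n)
print(np.allclose(lhs.iloc[n:], rhs.iloc[n:]))  # True
```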
You can use the pandas rolling method along with sum:
s.rolling(shift).sum()
However, you may want to fill the NaN values that appear before position shift with values from the original data.
Answered by Elif on October 10, 2020
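The rolling sum above adds up the last shift consecutive values, whereas the loop in the question adds up every shift-th value. One possible loop-free sketch of the latter, not taken from the answer above, is to group positions by their remainder modulo shift and take a cumulative sum within each group. The function name cumsum_shift_grouped is hypothetical, and the sketch assumes the series has at least shift rows.

```python
import numpy as np
import pandas as pd

def cumsum_shift_grouped(s, shift=1, init_values=(0,)):
    """Hypothetical loop-free variant of cumsum_shift (name is illustrative)."""
    out = s.astype(float).copy()
    # Seed the first `shift` positions, as the loop version does.
    out.iloc[:shift] = np.asarray(init_values[:shift], dtype=float)
    # Positions i, i + shift, i + 2*shift, ... share a group label,
    # so a per-group cumsum adds each value to the result `shift` rows back.
    groups = np.arange(len(out)) % shift
    return out.groupby(groups).cumsum()

s = pd.Series([1, 10, 100, 2, 20, 200, 5, 50, 500])
restored = cumsum_shift_grouped(s.diff(3), shift=3, init_values=[1, 10, 100])
print(restored.tolist())
# [1.0, 10.0, 100.0, 2.0, 20.0, 200.0, 5.0, 50.0, 500.0]
```

Unlike the loop version, this keeps the index of the input series, and a NaN after the first shift rows is skipped by cumsum rather than propagated.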