Stack Overflow на русском Asked on December 26, 2021
for i in range(600, len(Dts), 1):
    Dts['Av sales D'][i] = Dts['Sales'][i-600:i][Dts['D']==Dts['D'][i]].mean()
This computes the mean sales for the current day of the month over a rolling window of 600 days.
There is a lot of data, so it takes more than 10 seconds. I tried Pandarallel, but it will not install on my machine.
Sample data: D is the day of the month (1-31), Sales is the sales value. A sample calculation is given by the formula in the "average 2" column:
D    Sales    average 2
1    15       na
2    1        na
3    9        na
.    .        .
.    .        .
.    .        .
1    17       na
2    28       na
3    6        na
.    .        .
.    .        .
.    .        .
1    20       (15+17)/2
2    13       (1+28)/2
3    2        (9+6)/2
.    .        .
.    .        .
.    .        .
1    1        (20+1)/2
2    0        (13+0)/2
3    7        (2+7)/2
Example solution for the clarified question:
Source DataFrame:
In [20]: df = pd.DataFrame({'Date':pd.date_range('2020-01-01', '2020-02-29'), 'Sales':np.random.randint(30, size=60)})
In [21]: df
Out[21]:
Date Sales
0 2020-01-01 3
1 2020-01-02 16
2 2020-01-03 24
3 2020-01-04 14
4 2020-01-05 23
5 2020-01-06 0
6 2020-01-07 29
7 2020-01-08 16
8 2020-01-09 24
9 2020-01-10 8
10 2020-01-11 12
11 2020-01-12 23
12 2020-01-13 23
13 2020-01-14 28
14 2020-01-15 11
15 2020-01-16 6
16 2020-01-17 0
17 2020-01-18 17
18 2020-01-19 12
19 2020-01-20 14
20 2020-01-21 8
21 2020-01-22 29
22 2020-01-23 29
23 2020-01-24 20
24 2020-01-25 3
25 2020-01-26 3
26 2020-01-27 16
27 2020-01-28 8
28 2020-01-29 14
29 2020-01-30 13
30 2020-01-31 11
31 2020-02-01 7
32 2020-02-02 1
33 2020-02-03 6
34 2020-02-04 26
35 2020-02-05 18
36 2020-02-06 26
37 2020-02-07 1
38 2020-02-08 29
39 2020-02-09 10
40 2020-02-10 3
41 2020-02-11 18
42 2020-02-12 22
43 2020-02-13 24
44 2020-02-14 26
45 2020-02-15 14
46 2020-02-16 22
47 2020-02-17 18
48 2020-02-18 25
49 2020-02-19 23
50 2020-02-20 16
51 2020-02-21 25
52 2020-02-22 11
53 2020-02-23 27
54 2020-02-24 8
55 2020-02-25 1
56 2020-02-26 3
57 2020-02-27 16
58 2020-02-28 14
59 2020-02-29 28
Solution:
df["val_roll_avg"] = (df
                      .groupby(df["Date"].dt.day)
                      ["Sales"]
                      .rolling(2)
                      .mean()
                      .reset_index(level=0, drop=True))
Result:
In [23]: df
Out[23]:
Date Sales val_roll_avg
0 2020-01-01 3 NaN
1 2020-01-02 16 NaN
2 2020-01-03 24 NaN
3 2020-01-04 14 NaN
4 2020-01-05 23 NaN
5 2020-01-06 0 NaN
6 2020-01-07 29 NaN
7 2020-01-08 16 NaN
8 2020-01-09 24 NaN
9 2020-01-10 8 NaN
10 2020-01-11 12 NaN
11 2020-01-12 23 NaN
12 2020-01-13 23 NaN
13 2020-01-14 28 NaN
14 2020-01-15 11 NaN
15 2020-01-16 6 NaN
16 2020-01-17 0 NaN
17 2020-01-18 17 NaN
18 2020-01-19 12 NaN
19 2020-01-20 14 NaN
20 2020-01-21 8 NaN
21 2020-01-22 29 NaN
22 2020-01-23 29 NaN
23 2020-01-24 20 NaN
24 2020-01-25 3 NaN
25 2020-01-26 3 NaN
26 2020-01-27 16 NaN
27 2020-01-28 8 NaN
28 2020-01-29 14 NaN
29 2020-01-30 13 NaN
30 2020-01-31 11 NaN
31 2020-02-01 7 5.0
32 2020-02-02 1 8.5
33 2020-02-03 6 15.0
34 2020-02-04 26 20.0
35 2020-02-05 18 20.5
36 2020-02-06 26 13.0
37 2020-02-07 1 15.0
38 2020-02-08 29 22.5
39 2020-02-09 10 17.0
40 2020-02-10 3 5.5
41 2020-02-11 18 15.0
42 2020-02-12 22 22.5
43 2020-02-13 24 23.5
44 2020-02-14 26 27.0
45 2020-02-15 14 12.5
46 2020-02-16 22 14.0
47 2020-02-17 18 9.0
48 2020-02-18 25 21.0
49 2020-02-19 23 17.5
50 2020-02-20 16 15.0
51 2020-02-21 25 16.5
52 2020-02-22 11 20.0
53 2020-02-23 27 28.0
54 2020-02-24 8 14.0
55 2020-02-25 1 2.0
56 2020-02-26 3 3.0
57 2020-02-27 16 16.0
58 2020-02-28 14 11.0
59 2020-02-29 28 21.0
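The solution above uses a window of 2 observations per day of month, while the question asks for a window of 600 days. The following is only a sketch of how the same groupby-plus-rolling idea might be adapted, and it is not part of the original answer. It assumes one row per calendar day with no gaps and uses synthetic data. The `Av sales D` column name comes from the question; `sales` and `rolled` are illustrative names.

```python
import numpy as np
import pandas as pd

# Synthetic daily data (an assumption; the real frame is not shown in the thread).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Date": pd.date_range("2018-01-01", periods=1500, freq="D"),
    "Sales": rng.integers(0, 30, size=1500),
})

# Index by date so that rolling() can take a time-based window.
sales = df.set_index("Date")["Sales"]

# Group by day of month, then average over the preceding 600 days.
# closed="left" excludes the current row, like the slice [i-600:i] in the question.
rolled = (sales
          .groupby(sales.index.day)
          .rolling("600D", closed="left")
          .mean())

# Drop the group level and restore the original row order.
df["Av sales D"] = rolled.droplevel(0).reindex(df["Date"]).to_numpy()
```

Unlike the loop in the question, this also fills rows before position 600, using whatever shorter history is available. A row with no earlier same-day observation gets NaN.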
Answered by MaxU on December 26, 2021
Use the Series.rolling() method.
Example:
In [22]: df = pd.DataFrame({"val": np.arange(20)})
In [23]: df["val_roll_avg"] = df["val"].rolling(3, min_periods=1).mean()
In [24]: df
Out[24]:
val val_roll_avg
0 0 0.0
1 1 0.5
2 2 1.0
3 3 2.0
4 4 3.0
5 5 4.0
6 6 5.0
7 7 6.0
8 8 7.0
9 9 8.0
10 10 9.0
11 11 10.0
12 12 11.0
13 13 12.0
14 14 13.0
15 15 14.0
16 16 15.0
17 17 16.0
18 18 17.0
19 19 18.0
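The first two rows of the output above are not NaN because of `min_periods=1`. A small sketch on a made-up five-element series shows the difference; the names `strict` and `relaxed` are illustrative.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4])

# Default: a result is produced only once the window holds 3 values,
# so the first two entries are NaN.
strict = s.rolling(3).mean()

# min_periods=1: average whatever is available, even a single value.
relaxed = s.rolling(3, min_periods=1).mean()

print(pd.DataFrame({"val": s, "strict": strict, "relaxed": relaxed}))
```

Whether partial windows at the start are acceptable depends on the task. Where they would skew the averages, the default may be the safer choice.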
Answered by MaxU on December 26, 2021