
How can I speed up computing a mean over a rolling window?

Stack Overflow in Russian. Asked on December 26, 2021

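# Assumes Dts has a default RangeIndex, so position i is also the row label.
for i in range(600, len(Dts)):
    same_day = Dts['D'][i-600:i] == Dts['D'][i]  # window rows with the same day of month
    Dts.loc[i, 'Av sales D'] = Dts['Sales'][i-600:i][same_day].mean()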

This computes the mean of sales for the current day of the month over a rolling window of 600 days.

There is a lot of data, and the loop takes more than 10 seconds. I tried Pandarallel, but it will not install on my machine.

Sample data: D is the day of the month (1-31), Sales is the sales figure. Sample calculation: the formula in the "average 2" column:

D       Sales   average 2
1       15      na
2       1       na
3       9       na
.       .       .
.       .       .
.       .       .
1       17      na
2       28      na
3       6       na
.       .       .
.       .       .
.       .       .
1       20      (15+17)/2
2       13      (1+28)/2
3       2       (9+6)/2
.       .       .
.       .       .
.       .       .
1       1       (20+1)/2
2       0       (13+0)/2
3       7       (2+7)/2

2 Answers

An example solution for the clarified question:

Source DataFrame:

In [19]: import numpy as np; import pandas as pd

In [20]: df = pd.DataFrame({'Date':pd.date_range('2020-01-01', '2020-02-29'), 'Sales':np.random.randint(30, size=60)})

In [21]: df
Out[21]:
         Date  Sales
0  2020-01-01      3
1  2020-01-02     16
2  2020-01-03     24
3  2020-01-04     14
4  2020-01-05     23
5  2020-01-06      0
6  2020-01-07     29
7  2020-01-08     16
8  2020-01-09     24
9  2020-01-10      8
10 2020-01-11     12
11 2020-01-12     23
12 2020-01-13     23
13 2020-01-14     28
14 2020-01-15     11
15 2020-01-16      6
16 2020-01-17      0
17 2020-01-18     17
18 2020-01-19     12
19 2020-01-20     14
20 2020-01-21      8
21 2020-01-22     29
22 2020-01-23     29
23 2020-01-24     20
24 2020-01-25      3
25 2020-01-26      3
26 2020-01-27     16
27 2020-01-28      8
28 2020-01-29     14
29 2020-01-30     13
30 2020-01-31     11
31 2020-02-01      7
32 2020-02-02      1
33 2020-02-03      6
34 2020-02-04     26
35 2020-02-05     18
36 2020-02-06     26
37 2020-02-07      1
38 2020-02-08     29
39 2020-02-09     10
40 2020-02-10      3
41 2020-02-11     18
42 2020-02-12     22
43 2020-02-13     24
44 2020-02-14     26
45 2020-02-15     14
46 2020-02-16     22
47 2020-02-17     18
48 2020-02-18     25
49 2020-02-19     23
50 2020-02-20     16
51 2020-02-21     25
52 2020-02-22     11
53 2020-02-23     27
54 2020-02-24      8
55 2020-02-25      1
56 2020-02-26      3
57 2020-02-27     16
58 2020-02-28     14
59 2020-02-29     28

Solution:

df["val_roll_avg"] = (df
                      .groupby(df["Date"].dt.day)
                      ["Sales"]
                      .rolling(2)
                      .mean()
                      .reset_index(level=0, drop=True))
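
To make the chain above easier to follow, here is a minimal sketch that splits it into two steps on a toy frame. The six sales values are taken from the first two blocks of the question's sample table; the dates are an assumption added only so that `dt.day` has something to work on.

```python
import pandas as pd

# Toy data: two months with three days each (dates are illustrative).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03",
                            "2020-02-01", "2020-02-02", "2020-02-03"]),
    "Sales": [15, 1, 9, 17, 28, 6],
})

# Step 1: group rows by day of month and roll a 2-row window inside each group.
rolled = df.groupby(df["Date"].dt.day)["Sales"].rolling(2).mean()
# `rolled` now has a two-level index: (day of month, original row label).

# Step 2: drop the group level so the result aligns with df's own index again.
df["val_roll_avg"] = rolled.reset_index(level=0, drop=True)
print(df)
```

Without the `reset_index(level=0, drop=True)` step the assignment could not align on the original row labels, because the rolled Series is indexed by the (day, row) pair rather than by the row alone.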

Result:

In [23]: df
Out[23]:
         Date  Sales  val_roll_avg
0  2020-01-01      3           NaN
1  2020-01-02     16           NaN
2  2020-01-03     24           NaN
3  2020-01-04     14           NaN
4  2020-01-05     23           NaN
5  2020-01-06      0           NaN
6  2020-01-07     29           NaN
7  2020-01-08     16           NaN
8  2020-01-09     24           NaN
9  2020-01-10      8           NaN
10 2020-01-11     12           NaN
11 2020-01-12     23           NaN
12 2020-01-13     23           NaN
13 2020-01-14     28           NaN
14 2020-01-15     11           NaN
15 2020-01-16      6           NaN
16 2020-01-17      0           NaN
17 2020-01-18     17           NaN
18 2020-01-19     12           NaN
19 2020-01-20     14           NaN
20 2020-01-21      8           NaN
21 2020-01-22     29           NaN
22 2020-01-23     29           NaN
23 2020-01-24     20           NaN
24 2020-01-25      3           NaN
25 2020-01-26      3           NaN
26 2020-01-27     16           NaN
27 2020-01-28      8           NaN
28 2020-01-29     14           NaN
29 2020-01-30     13           NaN
30 2020-01-31     11           NaN
31 2020-02-01      7           5.0
32 2020-02-02      1           8.5
33 2020-02-03      6          15.0
34 2020-02-04     26          20.0
35 2020-02-05     18          20.5
36 2020-02-06     26          13.0
37 2020-02-07      1          15.0
38 2020-02-08     29          22.5
39 2020-02-09     10          17.0
40 2020-02-10      3           5.5
41 2020-02-11     18          15.0
42 2020-02-12     22          22.5
43 2020-02-13     24          23.5
44 2020-02-14     26          27.0
45 2020-02-15     14          12.5
46 2020-02-16     22          14.0
47 2020-02-17     18           9.0
48 2020-02-18     25          21.0
49 2020-02-19     23          17.5
50 2020-02-20     16          15.0
51 2020-02-21     25          16.5
52 2020-02-22     11          20.0
53 2020-02-23     27          28.0
54 2020-02-24      8          14.0
55 2020-02-25      1           2.0
56 2020-02-26      3           3.0
57 2020-02-27     16          16.0
58 2020-02-28     14          11.0
59 2020-02-29     28          21.0
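
Note that the solution above averages a fixed number of same-day observations (2) and includes the current row, whereas the loop in the question looks at the previous 600 rows and excludes the current one. One possible way to reproduce that row-based window without a Python loop is to keep per-day prefix sums. The following is only a sketch under stated assumptions: days are integers 1-31, rows are in chronological order, and the function name `same_day_window_mean` and its parameters are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def same_day_window_mean(days, sales, window=600):
    """Mean of `sales` over the previous `window` rows whose day equals the row's day.

    Hypothetical helper: rows before position `window` are left as NaN,
    matching the loop in the question, which starts at i = window.
    """
    days = np.asarray(days, dtype=int)
    sales = np.asarray(sales, dtype=float)
    n = len(days)

    # One-hot encode the day of month: onehot[i, d - 1] is True where days[i] == d.
    onehot = days[:, None] == np.arange(1, 32)[None, :]

    # Prefix sums with a leading zero row, so csum[k] covers rows 0 .. k-1.
    zeros = np.zeros((1, 31))
    csum_sales = np.vstack([zeros, np.cumsum(onehot * sales[:, None], axis=0)])
    csum_count = np.vstack([zeros, np.cumsum(onehot, axis=0)])

    out = np.full(n, np.nan)
    idx = np.arange(window, n)   # rows that have a full window behind them
    col = days[idx] - 1          # prefix-sum column for each row's own day

    # Sum and count over rows i-window .. i-1 (the current row is excluded).
    total = csum_sales[idx, col] - csum_sales[idx - window, col]
    count = csum_count[idx, col] - csum_count[idx - window, col]

    valid = count > 0            # keep NaN where the window has no matching day
    out[idx[valid]] = total[valid] / count[valid]
    return out


# Usage on synthetic data shaped like the question's frame (column names from the question).
dates = pd.date_range("2018-01-01", periods=900)
Dts = pd.DataFrame({
    "D": dates.day.to_numpy(),
    "Sales": np.random.default_rng(0).integers(0, 30, size=900),
})
Dts["Av sales D"] = same_day_window_mean(Dts["D"], Dts["Sales"], window=600)
```

The trade-off is memory: the one-hot and prefix-sum arrays hold 31 values per row, which is usually acceptable for daily data but worth keeping in mind for very long series.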

Answered by MaxU on December 26, 2021

Use the Series.rolling() method.

Example:

In [22]: df = pd.DataFrame({"val": np.arange(20)})

In [23]: df["val_roll_avg"] = df["val"].rolling(3, min_periods=1).mean()

In [24]: df
Out[24]:
    val  val_roll_avg
0     0           0.0
1     1           0.5
2     2           1.0
3     3           2.0
4     4           3.0
5     5           4.0
6     6           5.0
7     7           6.0
8     8           7.0
9     9           8.0
10   10           9.0
11   11          10.0
12   12          11.0
13   13          12.0
14   14          13.0
15   15          14.0
16   16          15.0
17   17          16.0
18   18          17.0
19   19          18.0
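
The first two rows in the output above are not NaN because of `min_periods=1`. As a small illustration of that parameter, the sketch below compares the default behaviour with `min_periods=1` on a shorter Series; the variable names `strict` and `lenient` are made up for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5))

# Default: a window of 3 needs 3 observations, so the first two rows are NaN.
strict = s.rolling(3).mean()

# min_periods=1: average whatever is available at the start of the series.
lenient = s.rolling(3, min_periods=1).mean()

print(pd.DataFrame({"val": s, "strict": strict, "lenient": lenient}))
```

Which variant fits depends on whether a mean over a partially filled window is meaningful for the data at hand.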

Answered by MaxU on December 26, 2021
