Backtesting short-selling strategy using pandas dataframe

Question

I would like to make a simple backtest for one of my short-selling strategies. I am using pandas dataframes.
So I have a dataframe like the following, that indicates how many positions to open/close every day.
                position_change position_total
2018-01-03      1               1
2018-01-04      0               1
2018-01-05      0               1
2018-01-08      0               1
2018-01-09      0               1
2018-01-10      1               2
2018-01-11      0               2
2018-01-12      0               2
2018-01-16      0               2

I also have a dataframe with the prices of the asset:
            price       short_sell_change       accum_change
2018-01-03  10          1                       1
2018-01-04  9           1,1111111111            1,1111111111
2018-01-05  8           1,125                   1,25
2018-01-08  7           1,1428571429            1,4285714286
2018-01-09  6           1,1666666667            1,6666666667
2018-01-10  5           1,2                     2
2018-01-11  4           1,25                    2,5
2018-01-12  3           1,3333333333            3,3333333333
2018-01-16  2,5         1,2                     4

The final (net amount) dataframe should be:
            net_amount
2018-01-03  10
2018-01-04  11,1111111111
2018-01-05  12,5
2018-01-08  14,2857142857
2018-01-09  16,6666666667
2018-01-10  25
2018-01-11  31,25
2018-01-12  41,6666666667
2018-01-16  50

This is easy to do with Excel accumulating the previous net_amount with a reference to the above cell and adding the position_chage info:

How can this be done in a pandas-way? (Unfortunately I guess the only possible way is iterating over the rows)

amdopt · Answer

Assuming your dataframe looks something like this and the name of the dataframe is your_dataframe (I left out one of your columns that wasn't needed for this):
   position  pos_total  price  ss_change
0         1          1   10.0   1.000000
1         0          1    9.0   1.111111
2         0          1    8.0   1.125000
3         0          1    7.0   1.142857
4         0          1    6.0   1.166667
5         1          2    5.0   1.200000
6         0          2    4.0   1.250000
7         0          2    3.0   1.333333
8         0          2    2.5   1.200000
9         0          2    1.0   2.500000

Next, to replicate your Excel formula I defined a function that takes a dataframe and your initial value as inputs and returns the dataframe with a new column and the values you are looking for:
def net_amount(df, initial_value):
    df['net_amount'] = float(initial_value)
    for row in range(1, len(df)):
        df['net_amount'][row] = (df['net_amount'][row-1] *
                                 df['ss_change'][row] +
                                 df['position'][row] *
                                 df['price'][row])
    return df

This is an iterative way of doing it and with large amounts of data it will take a long time.  If you have a ton of data there are better ways.
Next just call the function:
net_amount(your_dataframe, 10)

This is what is returned:
   position  pos_total  price  ss_change  net_amount
0         1          1   10.0   1.000000   10.000000
1         0          1    9.0   1.111111   11.111111
2         0          1    8.0   1.125000   12.500000
3         0          1    7.0   1.142857   14.285714
4         0          1    6.0   1.166667   16.666667
5         1          2    5.0   1.200000   25.000000
6         0          2    4.0   1.250000   31.250000
7         0          2    3.0   1.333333   41.666667
8         0          2    2.5   1.200000   50.000000
9         0          2    1.0   2.500000  125.000000

There are plenty of ways to improve upon this and to make it more dynamic if needed.  The way it sits relies upon the columns being named a specific way but you can use it as a template and make changes.
Edit:
For additional speed you can incorporate the use numba like this:
import pandas as pd
from numba import jit

@jit
def numba_calc(net, ss, pos, price):
    for i in range(1, len(net)):
        net[i] = (net[i-1] * ss[i] + pos[i] * price[i])
    return net

def net_amount_numba(df, initial_value):
    df['net_amount'] = float(initial_value)
    net = df['net_amount'].to_numpy()
    ss = df['ss_change'].to_numpy()
    pos = df['position'].to_numpy()
    price = df['price'].to_numpy()

df['net_amount'] = numba_calc(net, ss, pos, price)

return df

On my machine, this improves performance by ~8x from ~2.4 milliseconds down to ~300 microseconds. Pandas/Numba documentation

Backtesting short-selling strategy using pandas dataframe

One Answer

Add your own answers!

Ask a Question