Quantitative Finance Asked by Alfonso_MA on December 10, 2021
I would like to make a simple backtest for one of my short-selling strategies. I am using pandas dataframes.
So I have a dataframe like the following, that indicates how many positions to open/close every day.
position_change position_total
2018-01-03 1 1
2018-01-04 0 1
2018-01-05 0 1
2018-01-08 0 1
2018-01-09 0 1
2018-01-10 1 2
2018-01-11 0 2
2018-01-12 0 2
2018-01-16 0 2
I also have a dataframe with the prices of the asset:
price short_sell_change accum_change
2018-01-03 10 1 1
2018-01-04 9 1,1111111111 1,1111111111
2018-01-05 8 1,125 1,25
2018-01-08 7 1,1428571429 1,4285714286
2018-01-09 6 1,1666666667 1,6666666667
2018-01-10 5 1,2 2
2018-01-11 4 1,25 2,5
2018-01-12 3 1,3333333333 3,3333333333
2018-01-16 2,5 1,2 4
The final (net amount) dataframe should be:
net_amount
2018-01-03 10
2018-01-04 11,1111111111
2018-01-05 12,5
2018-01-08 14,2857142857
2018-01-09 16,6666666667
2018-01-10 25
2018-01-11 31,25
2018-01-12 41,6666666667
2018-01-16 50
This is easy to do with Excel accumulating the previous net_amount with a reference to the above cell and adding the position_chage info:
How can this be done in a pandas-way? (Unfortunately I guess the only possible way is iterating over the rows)
Assuming your dataframe looks something like this and the name of the dataframe is your_dataframe
(I left out one of your columns that wasn't needed for this):
position pos_total price ss_change
0 1 1 10.0 1.000000
1 0 1 9.0 1.111111
2 0 1 8.0 1.125000
3 0 1 7.0 1.142857
4 0 1 6.0 1.166667
5 1 2 5.0 1.200000
6 0 2 4.0 1.250000
7 0 2 3.0 1.333333
8 0 2 2.5 1.200000
9 0 2 1.0 2.500000
Next, to replicate your Excel formula I defined a function that takes a dataframe and your initial value as inputs and returns the dataframe with a new column and the values you are looking for:
def net_amount(df, initial_value):
df['net_amount'] = float(initial_value)
for row in range(1, len(df)):
df['net_amount'][row] = (df['net_amount'][row-1] *
df['ss_change'][row] +
df['position'][row] *
df['price'][row])
return df
This is an iterative way of doing it and with large amounts of data it will take a long time. If you have a ton of data there are better ways.
Next just call the function:
net_amount(your_dataframe, 10)
This is what is returned:
position pos_total price ss_change net_amount
0 1 1 10.0 1.000000 10.000000
1 0 1 9.0 1.111111 11.111111
2 0 1 8.0 1.125000 12.500000
3 0 1 7.0 1.142857 14.285714
4 0 1 6.0 1.166667 16.666667
5 1 2 5.0 1.200000 25.000000
6 0 2 4.0 1.250000 31.250000
7 0 2 3.0 1.333333 41.666667
8 0 2 2.5 1.200000 50.000000
9 0 2 1.0 2.500000 125.000000
There are plenty of ways to improve upon this and to make it more dynamic if needed. The way it sits relies upon the columns being named a specific way but you can use it as a template and make changes.
Edit:
For additional speed you can incorporate the use numba like this:
import pandas as pd
from numba import jit
@jit
def numba_calc(net, ss, pos, price):
for i in range(1, len(net)):
net[i] = (net[i-1] * ss[i] + pos[i] * price[i])
return net
def net_amount_numba(df, initial_value):
df['net_amount'] = float(initial_value)
net = df['net_amount'].to_numpy()
ss = df['ss_change'].to_numpy()
pos = df['position'].to_numpy()
price = df['price'].to_numpy()
df['net_amount'] = numba_calc(net, ss, pos, price)
return df
On my machine, this improves performance by ~8x from ~2.4 milliseconds down to ~300 microseconds. Pandas/Numba documentation
Answered by amdopt on December 10, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP