TransWikia.com

Machine Learning algorithms and Panel data

Data Science Asked on January 29, 2021

I have a large panel dataset composed of $N$ stocks, $T$ quarterly dates and $K$ features for each stock. The dataset looks like the following:

            symbol  stockPriceD numberOfShares  marketCapitalization   ...    label
2002-06-01  A       -4.91       1000000.0       -2.254640e+09          ...    0
2002-09-01  A       -9.08       1000000.0       -4.203510e+09          ...    1
2002-12-01  A       4.27        0.0             1.985550e+09           ...    1 
...
2009-06-01  BA      3.19        732600000.0     3.167762e+10           ...    1
...
2019-12-01  ZTS     10.43       -700000.0       4.896220e+09           ...    0 
2020-03-01  ZTS     -8.72       -2400000.0      -4.478504e+09          ...    0

I would like to run a forecasting task on this dataset, but I cannot assume that the observations are independent (almost all of the features are autocorrelated), and splitting the dataset into $N$ separate ones, one per stock, would leave me with very small datasets (at most 72 rows each).

How can I handle this problem? Is it ever acceptable to assume independence among the instances and ignore the autocorrelation? Are there machine learning algorithms that can handle this kind of problem (panel data)?
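To make the pooled alternative concrete, here is a minimal sketch of one common way to keep all $N$ stocks in a single dataset while still respecting time order: add per-stock lagged features, then split chronologically rather than at random. It is only an illustration under assumptions, not a recommended method. The `date` column (the real data carries dates in the index), the helper names `add_lags` and `chronological_split`, the cutoff date, and the toy values are all hypothetical.

```python
import numpy as np
import pandas as pd


def add_lags(panel, feature_cols, n_lags=2):
    """Append per-stock lagged copies of feature_cols (hypothetical helper)."""
    out = panel.sort_values(["symbol", "date"]).copy()
    lag_cols = []
    for col in feature_cols:
        for lag in range(1, n_lags + 1):
            name = f"{col}_lag{lag}"
            # Shift inside each symbol so one stock's history never
            # leaks into the rows of another stock.
            out[name] = out.groupby("symbol")[col].shift(lag)
            lag_cols.append(name)
    # The first n_lags rows of every stock have no complete history.
    return out.dropna(subset=lag_cols).reset_index(drop=True)


def chronological_split(panel, cutoff):
    """Train on rows before cutoff, test on rows from cutoff onwards."""
    cutoff = pd.Timestamp(cutoff)
    return panel[panel["date"] < cutoff], panel[panel["date"] >= cutoff]


# Toy panel (made-up values): 3 stocks x 8 quarterly dates.
symbols = ["A", "BA", "ZTS"]
dates = pd.date_range("2002-06-01", periods=8, freq=pd.DateOffset(months=3))
rng = np.random.default_rng(0)
n_rows = len(symbols) * len(dates)
panel = pd.DataFrame({
    "date": np.tile(dates.to_numpy(), len(symbols)),
    "symbol": np.repeat(symbols, len(dates)),
    "stockPriceD": rng.normal(size=n_rows),
    "marketCapitalization": rng.normal(size=n_rows),
    "label": rng.integers(0, 2, size=n_rows),
})

lagged = add_lags(panel, ["stockPriceD", "marketCapitalization"], n_lags=2)
train, test = chronological_split(lagged, "2003-09-01")
```

Any tabular learner could then be fit on `train` and evaluated on `test`. The lags are meant to carry some of the autocorrelation as explicit inputs, and the date-based split keeps future quarters out of the training set. Whether that is enough to justify treating the rows as independent is exactly what I am unsure about.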

I have read about using RNNs and LSTMs to address these issues, but how should the data be prepared for them?
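For the recurrent-network route, my understanding is that the panel would have to be reshaped into fixed-length sequences built separately for each stock. The sketch below shows that reshaping only, with no model. It assumes a `date` column, a hypothetical helper `make_windows`, an arbitrary window of 4 quarters, and that the `label` on the last row of each window is the target for that window. All of these are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def make_windows(panel, feature_cols, label_col="label", window=4):
    """Build (samples, timesteps, features) arrays, one stock at a time."""
    X, y = [], []
    ordered = panel.sort_values(["symbol", "date"])
    for _, grp in ordered.groupby("symbol"):
        feats = grp[feature_cols].to_numpy(dtype=float)
        labels = grp[label_col].to_numpy()
        # Slide the window within a single stock only, so no sequence
        # mixes rows from two different symbols.
        for end in range(window, len(grp) + 1):
            X.append(feats[end - window:end])
            y.append(labels[end - 1])  # assumed: label of the last row is the target
    return np.stack(X), np.asarray(y)


# Toy panel (made-up values): 3 stocks x 8 quarterly dates.
symbols = ["A", "BA", "ZTS"]
dates = pd.date_range("2002-06-01", periods=8, freq=pd.DateOffset(months=3))
rng = np.random.default_rng(0)
n_rows = len(symbols) * len(dates)
panel = pd.DataFrame({
    "date": np.tile(dates.to_numpy(), len(symbols)),
    "symbol": np.repeat(symbols, len(dates)),
    "stockPriceD": rng.normal(size=n_rows),
    "marketCapitalization": rng.normal(size=n_rows),
    "label": rng.integers(0, 2, size=n_rows),
})

feature_cols = ["stockPriceD", "marketCapitalization"]
X, y = make_windows(panel, feature_cols, window=4)
print(X.shape, y.shape)
```

With 8 quarters per stock and a window of 4, each stock yields 5 overlapping sequences, so the pooled array has 15 samples of shape (4, 2). The (samples, timesteps, features) layout is the one recurrent layers usually expect, for example batch-first LSTM layers. Overlapping windows from the same stock are still correlated with each other, so a chronological train/test split would presumably still be needed.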
