Modeling time series data with a large number of variables

Question

I want to model time series data with 52 dependent variables using neural networks in order to forecast these series into the future.
I have tried several LSTM and CNN (Conv1D) architectures, but my models always overfit and fail to generalize.

Does the number of features impact the results of the models? If so, how should I deal with data that has a large number of variables? Are there any models preferred for this task?

MXK · Accepted Answer

Start by studying the covariance of your features. If you find that hard to interpret, use Pearson's correlation, which will help you detect correlated features, and follow up with Spearman's or Kendall's correlation just to be sure.
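
As a rough sketch of that first step, assuming the 52 series sit in a pandas DataFrame with one column per series: the data below is synthetic, the column names `s0` to `s51` and the helper `correlated_pairs` are made up for illustration, and the 0.9 threshold is an arbitrary choice you would tune.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 500 time steps, 52 series.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(500, 52)),
                    columns=[f"s{i}" for i in range(52)])
# Make s1 a noisy copy of s0 so there is one strongly correlated pair.
data["s1"] = data["s0"] + 0.05 * rng.normal(size=500)


def correlated_pairs(df, method="pearson", threshold=0.9):
    """Return (col_a, col_b, corr) for every pair with |corr| > threshold."""
    corr = df.corr(method=method)
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            value = corr.iloc[i, j]
            if abs(value) > threshold:
                pairs.append((cols[i], cols[j], float(value)))
    return pairs


# Pearson captures linear dependence; Spearman is rank based and acts as
# a cross-check. DataFrame.corr also accepts method="kendall".
pearson_pairs = correlated_pairs(data, method="pearson")
spearman_pairs = correlated_pairs(data, method="spearman")
print(pearson_pairs)
print(spearman_pairs)
```

Pairs that show up under both measures are the most likely candidates for dropping or merging.
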
Then you can proceed with dimensionality reduction using PCA, which projects high-dimensional data onto a lower-dimensional space, or you can embed some of your features, but for this you need a good data analysis to choose the right ones.
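
A minimal PCA sketch with scikit-learn, under the assumption that the series are stacked in a `(time, 52)` array. The data is synthetic (5 hidden factors plus noise), and the 95% explained-variance target and the 400/100 split are placeholder choices, not recommendations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 52 series driven by 5 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 52))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 52))

# Chronological split: fit the scaler and PCA on the past only, so no
# information from the forecast period leaks into the projection.
split = 400
scaler = StandardScaler().fit(X[:split])
pca = PCA(n_components=0.95, svd_solver="full").fit(scaler.transform(X[:split]))

Z_train = pca.transform(scaler.transform(X[:split]))
Z_test = pca.transform(scaler.transform(X[split:]))

print("components kept:", pca.n_components_)
print("explained variance:", pca.explained_variance_ratio_.sum())
```

Passing a float to `n_components` keeps the smallest number of components whose cumulative explained variance exceeds that fraction. Note that if you forecast in the reduced space, you need `pca.inverse_transform` to map predictions back to the original 52 series.
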
You can also use feature selection to get the importance of every feature, choose features based on their scores, and then remove those that are correlated. This should improve your results.
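
One possible way to do this, sketched with a random forest as the scoring model (my choice for illustration, any importance measure would do). The data, the target `y_next`, the helper `select_features`, and the values `k=5` and `max_corr=0.9` are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: 52 series, with s8 a noisy copy of s7.
rng = np.random.default_rng(0)
cols = [f"s{i}" for i in range(52)]
data = pd.DataFrame(rng.normal(size=(500, 52)), columns=cols)
data["s8"] = data["s7"] + 0.05 * rng.normal(size=500)

# Hypothetical target: the next-step value of one series, driven here by
# the current values of s3 and s7.
y_next = 0.8 * data["s3"] - 0.5 * data["s7"] + 0.1 * rng.normal(size=500)

# Score every feature by its importance in a fitted random forest.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(data, y_next)
scores = pd.Series(model.feature_importances_, index=data.columns)


def select_features(X, scores, k=5, max_corr=0.9):
    """Walk features from best to worst score and keep up to k of them,
    skipping any that is highly correlated with one already kept."""
    corr = X.corr().abs()
    kept = []
    for name in scores.sort_values(ascending=False).index:
        if all(corr.loc[name, other] <= max_corr for other in kept):
            kept.append(name)
        if len(kept) == k:
            break
    return kept


selected = select_features(data, scores)
print(selected)
```

With 52 target series you would repeat the scoring per target, or aggregate the scores across targets, before choosing the final feature set.
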
Now for overfitting, you need to figure out the source of the low bias and high variance in your model. I suggest using Keras Tuner if you implemented your LSTM and CNN with TensorFlow, or Ray Tune if you're using PyTorch, to tune the hyperparameters, which will help you reduce the overfitting. You can also rely on early stopping, regularization techniques, dropout, and so on.
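
To make the last part concrete, here is a small Keras sketch that combines dropout, L2 regularization, and early stopping in a single LSTM. It assumes TensorFlow 2.x. The random data, the window length of 24, the layer size, and all rates are placeholders that a tuner would normally search over, and `make_windows` is a helper written for this example.

```python
import numpy as np
import tensorflow as tf


def make_windows(series, window):
    """Turn a (time, features) array into (samples, window, features)
    inputs and the matching next-step targets."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y


# Random stand-in for the real data: 300 time steps, 52 series.
rng = np.random.default_rng(0)
series = rng.normal(size=(300, 52)).astype("float32")
X, y = make_windows(series, window=24)
split = int(0.8 * len(X))  # chronological split: validate on the most recent part

model = tf.keras.Sequential([
    tf.keras.Input(shape=(24, 52)),
    tf.keras.layers.LSTM(
        32,
        dropout=0.2,  # dropout on the layer inputs
        kernel_regularizer=tf.keras.regularizers.l2(1e-4),
    ),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(52),  # one output per series
])
model.compile(optimizer="adam", loss="mse")

# Stop once validation loss has not improved for 2 epochs and roll back
# to the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True
)
history = model.fit(
    X[:split], y[:split],
    validation_data=(X[split:], y[split:]),
    epochs=5, batch_size=32, callbacks=[early_stop], verbose=0,
)
print(len(history.history["val_loss"]), "epochs run")
```

Comparing `history.history["loss"]` with `history.history["val_loss"]` is a quick way to see how large the gap between training and validation error is.
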
