TransWikia.com

How to split data into training, validation, test data sets if the data is non-stationary?

Data Science Asked by user781486 on May 24, 2021

When splitting data into training, validation, and test sets to feed a machine learning model, the data is ideally expected to be stationary. However, in the real world, some data is non-stationary. Financial time series are a typical example.

So, for this kind of non-stationary data, how do you split it into training, validation, and test sets?

One Answer

In general, for time series data, split without shuffling so that temporal order is preserved in every split: the training set comes first, then the validation set, then the test set.
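A minimal sketch of such a chronological split, in plain Python. The function name `chronological_split` and the 70/15/15 proportions are illustrative assumptions, not something prescribed by the question or by any library:

```python
def chronological_split(series, train_frac=0.7, val_frac=0.15):
    """Split a time-ordered sequence into train/validation/test without shuffling.

    The fractions are illustrative defaults; the remainder goes to the test set.
    """
    n = len(series)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    # Slicing keeps the original order, so no future value leaks into an earlier split.
    return series[:train_end], series[train_end:val_end], series[val_end:]


# Stand-in for 100 observations already sorted by time.
observations = list(range(100))
train, val, test = chronological_split(observations)
print(len(train), len(val), len(test))
```

Because the splits are contiguous slices, every training observation is older than every validation observation, and every validation observation is older than every test observation.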

Once you have made the series stationary (by differencing, for instance), you can use scikit-learn's TimeSeriesSplit cross-validator. It yields time-ordered indices for each train-validation split, with every validation block placed after its training block, so you can reuse those indices with whatever validation strategy you choose.
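The differencing and TimeSeriesSplit steps described above might look roughly like this. The synthetic random walk, the variable names, and `n_splits=5` are assumptions made for illustration. Only `np.diff` and `TimeSeriesSplit` come from the libraries themselves:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Synthetic non-stationary series (a random walk), used purely as example data.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(size=200))

# First difference: one common way to remove a stochastic trend.
returns = np.diff(prices)

# Each split trains on an initial segment and validates on the block that follows it.
tscv = TimeSeriesSplit(n_splits=5)
splits = list(tscv.split(returns))

for fold, (train_idx, val_idx) in enumerate(splits):
    print(
        f"fold {fold}: train {train_idx[0]}..{train_idx[-1]}, "
        f"validation {val_idx[0]}..{val_idx[-1]}"
    )
```

With the default settings the training window expands from fold to fold, and no validation index ever precedes a training index in the same fold.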

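One possible validation strategy is walk-forward validation: fit the model on the data up to a given point, evaluate it on the next block of observations, then move the cutoff forward and repeat.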
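A minimal sketch of walk-forward validation with an expanding training window follows. The helper `walk_forward_errors` is hypothetical, and the "model" is only a placeholder that predicts the training mean. In practice you would fit your own model inside the loop:

```python
import numpy as np


def walk_forward_errors(series, initial_train_size, horizon=1):
    """Expanding-window walk-forward validation.

    Returns the mean absolute error of each step. The forecast is a
    placeholder (the training mean), standing in for a real model.
    """
    errors = []
    for end in range(initial_train_size, len(series) - horizon + 1, horizon):
        train = series[:end]                   # everything observed so far
        actual = series[end:end + horizon]     # the next block, unseen during fitting
        forecast = np.full(horizon, train.mean())
        errors.append(np.mean(np.abs(actual - forecast)))
    return np.array(errors)


# Synthetic stationary series, used purely as example data.
rng = np.random.default_rng(0)
returns = rng.normal(size=120)

errors = walk_forward_errors(returns, initial_train_size=100, horizon=5)
print(len(errors), errors.mean())
```

Averaging the per-step errors gives a single validation score, while the individual values show whether performance drifts over time, which is often the more useful signal with non-stationary data.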

Answered by German C M on May 24, 2021
