Data Science Asked by user781486 on May 24, 2021
When splitting data into training, validation, and test sets to feed a machine learning model, the data is ideally stationary. In the real world, however, some data is non-stationary; financial time series are a common example.
So, for this kind of non-stationary data, how do you split the data into training, validation, and test sets?
In general, for time series data, you should split it so that temporal order is preserved across all your splits (training, validation, and test sets). That way the model is never evaluated on observations older than the ones it was trained on.
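A minimal sketch of such an order-preserving split, in plain Python. The function name `chronological_split`, the 70/15/15 proportions, and the toy `observations` list are assumptions for illustration, not something the question prescribes:

```python
def chronological_split(series, train_frac=0.7, val_frac=0.15):
    """Split a time-ordered sequence into train/validation/test without shuffling."""
    n = len(series)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    # Slicing keeps the original order: oldest data trains, newest data tests.
    return series[:train_end], series[train_end:val_end], series[val_end:]


# Stand-in for 100 observations already sorted by time (hypothetical data).
observations = list(range(100))
train, val, test = chronological_split(observations)
print(len(train), len(val), len(test))
```

The key point is only that nothing gets shuffled: every validation sample comes after every training sample, and every test sample comes after every validation sample.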
Once you make your series stationary (via differencing, for instance), you can use scikit-learn's TimeSeriesSplit cross-validator, which gives you the time-ordered indices of each train-validation split. You can then use those indices with whatever validation strategy you choose.
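As a rough sketch of that idea (not necessarily the exact snippet originally intended here): the synthetic random-walk `prices` series, the choice of `n_splits=5`, and first-order differencing via `np.diff` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Synthetic random walk standing in for a non-stationary price series (assumption).
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(size=200))

# First-order differencing; the result has one fewer observation than prices.
returns = np.diff(prices)

# TimeSeriesSplit yields index arrays where validation always follows training.
tscv = TimeSeriesSplit(n_splits=5)
splits = list(tscv.split(returns.reshape(-1, 1)))

for fold, (train_idx, val_idx) in enumerate(splits):
    print(f"fold {fold}: train 0..{train_idx[-1]}, "
          f"validation {val_idx[0]}..{val_idx[-1]}")
```

Each fold's training window grows while the validation block moves forward in time, so no fold ever validates on data that precedes its training data. Whether a single difference is enough to make a real series stationary has to be checked case by case.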
As a possible validation strategy, you could use walk-forward validation.
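Walk-forward validation could look roughly like this, assuming an expanding training window and a one-step horizon. The helper `walk_forward_splits`, the toy `series`, and the naive "repeat the last value" forecast are hypothetical placeholders; in practice you would refit your own model at each step:

```python
def walk_forward_splits(n_samples, initial_train_size, horizon=1):
    """Yield (train_indices, test_indices) with an expanding training window."""
    start = initial_train_size
    while start + horizon <= n_samples:
        yield range(0, start), range(start, start + horizon)
        start += horizon


# Toy, already-ordered series (hypothetical data).
series = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 8.0, 7.0]

errors = []
for train_idx, test_idx in walk_forward_splits(len(series), initial_train_size=4):
    # Naive stand-in model: forecast the last value seen in the training window.
    forecast = series[train_idx[-1]]
    for t in test_idx:
        errors.append(abs(series[t] - forecast))

# Mean absolute error averaged over all walk-forward steps.
mae = sum(errors) / len(errors)
print(f"walk-forward MAE: {mae}")
```

At every step the model only sees the past, predicts the next block, and then that block is added to the training window before moving on.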
Answered by German C M on May 24, 2021