Cross Validated Asked on December 23, 2021
I have a time series binary classification dataset. I am implementing an online learning Logistic Regression algorithm in Sklearn and am cross-validating with Sklearn's TimeSeriesSplit method.
I am wondering whether it makes sense to perform the nested CV on the training set only, after splitting into training and test sets, as is normally done with batch learning algorithms.
Alternatively, couldn't one simply perform the nested cross-validation on the entire available dataset?
The reason I ask is that I noticed that to implement classification in an online learning fashion in Sklearn, one should use the SGDClassifier model; then, based on the choice of loss function (hinge loss for a linear SVM vs. log loss for Logistic Regression), the model behaves as either an SVM or a Logistic Regression, but the cost function is still minimized by Stochastic Gradient Descent.
So with that in mind, one could simply include the two loss options among the hyperparameters to grid-search over and obtain a single grid-search best estimator with a single loss function, which would seem to make a separate evaluation on the test set unnecessary.
Or am I just completely off track???
Nested CV is not needed with online learning. One is better off using https://github.com/creme-ml/creme (since merged into the River library) than sklearn with online optimizers.
Answered by Odisseo on December 23, 2021