TransWikia.com

Time series - is it necessary to retrain the model when new time series data is present

Data Science Asked by da4l on July 10, 2021

Say you’re building a sales prediction model to predict tomorrow’s sales value, as well as the next 2 weeks of daily sales. The model is being trained using daily data for the previous 1.5 years, and it follows a strong weekly seasonality pattern.

Once you are happy with model performance, is it necessary to re-train the model every day to capture data up to and including yesterday, in order to get the most accurate prediction for tomorrow’s sales value? Essentially, the model would be trained on a rolling 1.5-year data set that always includes yesterday’s sales value.

Or does it depend on the type of model being used, i.e. whether you need to completely re-train when new data is present?

I can see why it would make sense to re-train your standard time series forecasting models (ARIMA etc.), but I can also understand that more sophisticated models (neural networks etc.) may generalize well enough not to need re-training every day.

I am looking for an explanation of models where you would and wouldn’t re-train when new time series data is present.

4 Answers

You need to retrain your model every time you want to generate a new prediction:

For most ML models, you train a model, test it, retrain it if necessary until you’ve gotten satisfactory results, and then evaluate it on a hold-out data set. After you’re satisfied with the model’s performance, you deploy it to production. Once in production, you score new data as it comes in. Eventually, after a few months, you might want to update your model if a significant amount of new training data has come in. Model training is a one-time activity, or at most something done at periodic intervals to maintain the model’s performance and take new information into account. For instance, the visual properties of cats are stable over time. We don’t expect cats to look different next week, or next year, or even ten years from now. Given enough data, the model we trained this week is good enough for the foreseeable future as well.

For time series models, this is not the case. Instead, we have to retrain our model every time we want to generate a new forecast. To understand why, consider that trends, and the related seasonal variations, change over time.
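To make the difference concrete, here is a minimal sketch on made-up data, not a recommendation of any particular model. The "model" is just a per-weekday average, and the names `fit_weekly_means` and `rolling_forecasts` are hypothetical helpers invented for this illustration. It compares a model trained once with one that is refit on a rolling window before every forecast, after the level of the series shifts.

```python
from statistics import mean

def fit_weekly_means(series, start, end):
    """Toy 'model': the average value for each weekday slot in series[start:end]."""
    buckets = [[] for _ in range(7)]
    for day in range(start, end):
        buckets[day % 7].append(series[day])
    return [mean(bucket) for bucket in buckets]

def rolling_forecasts(series, window, first_day):
    """Refit on the latest `window` days before every one-day-ahead forecast.

    Assumes window >= 7 and first_day >= window so every weekday slot has data.
    """
    forecasts = {}
    for day in range(first_day, len(series)):
        model = fit_weekly_means(series, day - window, day)  # includes yesterday
        forecasts[day] = model[day % 7]
    return forecasts

# Synthetic data (an assumption): four weeks of a weekly pattern,
# followed by five weeks of the same pattern shifted up by 100.
base = [10, 20, 30, 40, 50, 60, 70]
series = base * 4 + [v + 100 for v in base] * 5

frozen = fit_weekly_means(series, 0, 28)  # trained once, never updated
rolling = rolling_forecasts(series, window=28, first_day=28)

last_day = len(series) - 1
frozen_error = abs(series[last_day] - frozen[last_day % 7])
rolling_error = abs(series[last_day] - rolling[last_day])
print("frozen model error:", frozen_error)
print("rolling model error:", rolling_error)
```

In this toy setting the frozen model never recovers from the shift, while the rolling model catches up once its window contains only post-shift data.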

Answered by ASH on July 10, 2021

You are right: as usual in ML, it depends entirely on your data and on your model's ability to generalise.

There is no general rule. Ideally, models should be recalibrated often, and what 'often' means depends entirely on your problem. In practice, there are statistical and operational considerations that should be taken into account.

Statistical considerations revolve around the availability of new data (if you have a yearly data collection process, it would be a bad idea to recalibrate mid-year) and the evolution of the underlying data-generating process. You can try to observe the evolution of the underlying process through basic statistics on the distribution of your variables or of your output, but this will only give you a partial answer. The only real answer to those statistical considerations is to recalibrate the model and compare the results with the previous one... so at that point you might as well use the new model (if you don't have any operational constraints).
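As a rough illustration of the "basic statistics" check, the sketch below compares the mean of a recent window with that of a reference window, scaled by the reference standard deviation. The function name `mean_shift_score`, the sample values and the alert threshold of 3 are all assumptions made for this example, and, as noted above, such a check only gives a partial answer.

```python
from statistics import mean, stdev

def mean_shift_score(reference, recent):
    """Shift of the recent mean, in units of the reference standard deviation."""
    return abs(mean(recent) - mean(reference)) / stdev(reference)

# Made-up daily values: a reference period and two candidate recent periods.
reference = [10, 12, 11, 9, 10, 11, 12, 9]
stable_recent = [10, 11, 10, 11]
shifted_recent = [20, 21, 19, 20]

THRESHOLD = 3.0  # arbitrary cut-off chosen for illustration only

stable_score = mean_shift_score(reference, stable_recent)
shifted_score = mean_shift_score(reference, shifted_recent)

print("stable period flagged:", stable_score > THRESHOLD)
print("shifted period flagged:", shifted_score > THRESHOLD)
```

A flagged period is only a hint that recalibration may be worthwhile; comparing the recalibrated model with the previous one remains the more reliable test.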

Operational considerations revolve around the difficulty of the recalibration process and of putting the new model into production. Is this process automated? Does it require a specific data extraction? Is the recalibration easy to do? Are the resources (human / hardware) available to do it?

Note that some constraints may also be regulatory: in some domains, internal guidelines or external regulations set a minimum frequency for model recalibration.

Answered by lcrmorin on July 10, 2021

It is good practice to retrain the model once in a while. What "once in a while" means depends on the specific task, of course. This is because countless unmeasurable things change continuously.

Think for example about a model to forecast companies' revenues: the rules of the game change continuously, the market today is not like the market one year ago, not to mention five years ago. This is the first example that I thought about, but it's true for pretty much any possible application of ML.

That is why models' performance is observed to deteriorate over time. For that reason, it is good practice to keep your models updated with fresh data.

Answered by Leevo on July 10, 2021

In some cases where you know that the underlying process changes slightly, a good course of action is to have an adaptive forecasting model. That is, model parameters are slightly re-calibrated at each new observation (analogous to online learning). An example would be forecasting power output for a wind turbine: due to factors such as blades getting dirty and the weather slowly changing, it is actually in your best interest to have a time-adaptive model.
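A minimal sketch of such an adaptive model, under strong simplifying assumptions: power is modelled as a single gain times wind speed, and the gain is nudged after every observation with a normalised least-mean-squares step. The class `AdaptiveGain`, the step size and the synthetic data are all hypothetical, and a real turbine model would be far richer.

```python
class AdaptiveGain:
    """One-parameter linear model y = gain * x, updated after each observation."""

    def __init__(self, initial_gain=0.0, step=0.5):
        self.gain = initial_gain
        self.step = step  # fraction of the error corrected per update

    def predict(self, x):
        return self.gain * x

    def update(self, x, y):
        """Normalised LMS step: move the gain toward the value that explains (x, y)."""
        error = y - self.predict(x)
        self.gain += self.step * error * x / (x * x + 1e-8)
        return error

# Synthetic, noise-free data: the true gain drops from 2.0 to 1.5
# halfway through (think of blades getting dirty).
wind_speeds = [4.0, 6.0, 8.0, 10.0] * 25
true_gains = [2.0] * 50 + [1.5] * 50

model = AdaptiveGain()
for speed, true_gain in zip(wind_speeds, true_gains):
    model.update(speed, true_gain * speed)

print("final estimated gain:", round(model.gain, 4))
```

No explicit retraining step is ever run here; each new observation adjusts the parameter slightly, so the estimate follows the drift.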

Also, local learning algorithms are essentially re-trained at each new observation, since kernel weights are re-estimated. This includes algorithms like k-NN, kernel regression, local linear regression, etc. In this case the training is always conducted when the prediction is needed; the only requirement is that you update your historical observations.
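The following sketch shows the idea with a plain k-NN forecaster on lagged values, using uniform weights rather than kernel weights. The function `knn_forecast` and its parameters are hypothetical names for this example. There is no separate training step: all the work happens at prediction time, and "updating the model" only means appending the newest observation to the history.

```python
def knn_forecast(history, lags=3, k=2):
    """Forecast the next value as the mean of what followed the k past
    windows most similar to the latest `lags` observations."""
    query = history[-lags:]
    candidates = []
    # Stop before the final window so every candidate has a known next value.
    for start in range(len(history) - lags):
        window = history[start:start + lags]
        distance = sum((a - b) ** 2 for a, b in zip(window, query))
        candidates.append((distance, history[start + lags]))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(next_value for _, next_value in nearest) / len(nearest)

# Made-up periodic history.
history = [1, 2, 3, 4] * 5

first = knn_forecast(history)   # latest window is [2, 3, 4]
history.append(1)               # a new observation arrives: just store it
second = knn_forecast(history)  # latest window is now [3, 4, 1]

print(first, second)
```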

Regarding retraining batch models, I have not seen a general consensus, only empirical evidence. In some applications, such as electricity price forecasting, many researchers re-train models using a rolling approach. The length of the rolling window is usually 2 times the length of the longest seasonal period. Sometimes this approach is coupled with exponentially decaying weights. However, I have noticed that this usually applies to researchers from an econometrics background rather than a machine learning / computational intelligence one.
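As a rough sketch of the rolling approach with decaying weights, the example below forecasts the next value as a weighted average of the same seasonal slot over a window of twice the period, with weights shrinking by a factor `decay` per day of age. The function `weighted_seasonal_forecast`, the decay values and the data are assumptions for illustration; published studies typically fit a proper regression or time series model on the window instead.

```python
def weighted_seasonal_forecast(series, period, decay):
    """One-step-ahead forecast from a rolling window of 2 * period points,
    averaging past values of the same seasonal slot with weights decay ** age."""
    if len(series) < period:
        raise ValueError("need at least one full seasonal period")
    window = 2 * period
    start = max(0, len(series) - window)
    target_slot = len(series) % period  # seasonal slot of the day being forecast
    weighted_sum = 0.0
    weight_total = 0.0
    for t in range(start, len(series)):
        if t % period != target_slot:
            continue
        age = len(series) - t  # how many days before the forecast day
        weight = decay ** age
        weighted_sum += weight * series[t]
        weight_total += weight
    return weighted_sum / weight_total

# Made-up data: one week at level 10, then one week at level 20.
series = [10] * 7 + [20] * 7

equal_weights = weighted_seasonal_forecast(series, period=7, decay=1.0)
decayed = weighted_seasonal_forecast(series, period=7, decay=0.9)

print("no decay:", equal_weights)
print("decay 0.9:", round(decayed, 2))
```

With no decay both weeks count equally, while a decay below 1 pulls the forecast toward the more recent week.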

Answered by Akylas Stratigakos on July 10, 2021
