Normalization in production

Question

I am currently writing a machine learning pipeline for my time series application. At the end of each month, I collect the new data, normalize it to [0, 1], retrain the ML model on the new observations only, and predict future values.

Question

Should I read the entire dataset each time I get a new observation, normalize all of it, fit the ML model again, and then predict?

How I got stuck:

Let's say I have one feature, and at t-1 all of its values lie in the range [min, max] = [0, 1000].
At t, a new observation comes in with value = 1001.
How should I normalize the new value, given that the ML model was trained with a different min/max?
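
To make the issue concrete, here is a minimal sketch of the two options in plain Python. The helper `min_max_scale` is a hypothetical name used only for illustration, not a library function, and the value 500 is an assumed example of an older observation.

```python
def min_max_scale(value, lo, hi):
    """Scale value relative to the range [lo, hi]; assumes hi > lo."""
    return (value - lo) / (hi - lo)


old_lo, old_hi = 0.0, 1000.0   # range seen up to t-1
new_value = 1001.0             # observation arriving at t

# Option A: keep the parameters the model was trained with.
# The new value lands slightly outside [0, 1].
scaled_fixed = min_max_scale(new_value, old_lo, old_hi)

# Option B: refit the range so that it includes the new value.
# The new value maps to exactly 1.0, but every older value shifts,
# so the model no longer sees the inputs it was trained on.
new_hi = max(old_hi, new_value)
scaled_refit = min_max_scale(new_value, old_lo, new_hi)
old_point_before = min_max_scale(500.0, old_lo, old_hi)
old_point_after = min_max_scale(500.0, old_lo, new_hi)

print(scaled_fixed, scaled_refit, old_point_before, old_point_after)
```

With option A the scaled value is about 1.001; with option B an older value such as 500 no longer maps to 0.5.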

Thank you

vienna_kaggling · Answer

It really depends.

Why? Updating everything in production (pre-processing, fitting, etc.) can get extremely expensive. If you have a complex architecture, redoing all of it for every new observation is not worth it.

Alternatives

Approximate the covariate shift: if you know the distribution of your future data, you can adjust your parameters (for example, the normalisation parameters) in advance.
Save the incoming data every time you make a prediction: it can be cheaper to quickly store the data in a database and, depending on your system, run the updates on a schedule (weekly, monthly).
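
The second alternative could be sketched roughly as follows. This is only an illustration under stated assumptions: `FrozenMinMaxScaler` is a hypothetical class, a plain list stands in for the database, and the three historical values are made up. The idea is to freeze the scaling parameters together with the model, scale new observations with those frozen parameters, buffer the raw values, and refit only on the scheduled update.

```python
import json


class FrozenMinMaxScaler:
    """Hypothetical min-max scaler whose parameters are stored with the model."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def fit(self, values):
        # Assumes values contains at least two distinct numbers.
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, value, clip=False):
        scaled = (value - self.lo) / (self.hi - self.lo)
        if clip:
            # Optionally force out-of-range values back into [0, 1].
            scaled = min(1.0, max(0.0, scaled))
        return scaled

    def to_json(self):
        return json.dumps({"lo": self.lo, "hi": self.hi})

    @classmethod
    def from_json(cls, payload):
        params = json.loads(payload)
        scaler = cls()
        scaler.lo, scaler.hi = params["lo"], params["hi"]
        return scaler


# Training time: fit once and persist the parameters next to the model.
history = [0.0, 250.0, 1000.0]          # assumed example data
saved_params = FrozenMinMaxScaler().fit(history).to_json()

# Prediction time: reuse the frozen parameters and buffer the raw value.
buffer = []                              # stand-in for a database table
scaler = FrozenMinMaxScaler.from_json(saved_params)
new_value = 1001.0
buffer.append(new_value)
scaled_now = scaler.transform(new_value)                 # slightly above 1
scaled_clipped = scaler.transform(new_value, clip=True)  # capped at 1.0

# Scheduled update (weekly, monthly): refit on everything collected so far.
# The model would need to be retrained with the new parameters as well.
history.extend(buffer)
buffer.clear()
scaler = FrozenMinMaxScaler().fit(history)
scaled_after_refit = scaler.transform(new_value)
```

Whether to clip or to let values drift slightly outside [0, 1] between updates depends on how the model handles inputs beyond its training range.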
