TransWikia.com

Normalization in production

Data Science Asked on June 16, 2021

I am currently writing a machine learning pipeline for my time series application. At the end of each month, I gather the new data, normalize it to [0, 1], retrain the ML model with the new observation only, and predict future values.

Question

Each time I get a new observation, should I read the entire dataset, normalize all of it, rebuild the ML model, and then predict?

How I got stuck:

  • Let’s say I have 1 feature, and at t-1 all of its values lie in [min, max] = [0, 1000]
  • At t, a new observation comes in with value = 1001
  • How should I normalize the new value, given that the ML model was trained with a different min/max?
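
To make the situation concrete, here is a minimal sketch of what happens if the min/max from t-1 is simply reused at t. It assumes plain min-max scaling; the function name `min_max_scale` is only illustrative.

```python
def min_max_scale(value, lo, hi):
    """Scale value using a previously fitted minimum (lo) and maximum (hi)."""
    return (value - lo) / (hi - lo)


# Range observed up to t-1, as in the example above.
train_min, train_max = 0.0, 1000.0

# New observation arriving at t.
new_value = 1001.0
scaled = min_max_scale(new_value, train_min, train_max)

# The result is slightly above 1.0, i.e. outside the [0, 1] range
# the model saw during training.
print(scaled)
```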

Thank you

One Answer

It really depends.

Why? Updating everything in production (preprocessing, fitting, etc.) on every new observation can get extremely expensive. If you have a complex architecture, it is usually not worth it.

Alternatives

  1. Approximate the covariate shift: if you know the distribution of your future data, you can adjust your parameters (for example, the normalisation parameters) in advance.

  2. Save your new data every time you make a prediction. It can be cheaper to quickly store the data in a DB and, depending on your system, run the updates on a schedule (weekly, monthly).
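
The second alternative could look roughly like the sketch below. It is only an illustration under stated assumptions: `FrozenMinMaxScaler`, `on_new_observation` and `scheduled_refit` are hypothetical names, a plain Python list stands in for the DB table, and clipping out-of-range values to [0, 1] between refits is one possible choice, not something the approach requires.

```python
class FrozenMinMaxScaler:
    """Hypothetical helper: min-max scaling with parameters fixed at fit time."""

    def fit(self, values):
        # Assumes the values are not all identical (hi != lo).
        self.lo = min(values)
        self.hi = max(values)
        return self

    def transform(self, value, clip=False):
        scaled = (value - self.lo) / (self.hi - self.lo)
        if clip:
            # Optional assumption: keep inputs inside the trained range.
            scaled = min(1.0, max(0.0, scaled))
        return scaled


# Data available when the model was last trained.
history = [0.0, 250.0, 1000.0]
scaler = FrozenMinMaxScaler().fit(history)

# Stand-in for a DB table that collects observations between refits.
pending = []


def on_new_observation(value):
    """Cheap path: store the raw value, scale it with the frozen parameters."""
    pending.append(value)
    return scaler.transform(value, clip=True)


def scheduled_refit():
    """Expensive path, run weekly or monthly: refit the scaler on all data."""
    history.extend(pending)
    pending.clear()
    scaler.fit(history)
    # The model would be retrained here on the re-normalized history.


model_input = on_new_observation(1001.0)  # scaled with the old min/max, clipped
scheduled_refit()                         # the scaler now covers the new value
```

The design choice is to keep the per-prediction path cheap (one append and one scaling step) and to pay the cost of renormalizing and retraining only at the scheduled update.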

Answered by vienna_kaggling on June 16, 2021
