Data Science Asked by scipilot on January 21, 2021
I am building something very similar to this BigQuery ML example project.
My system is different in two ways:
Firstly, it will need several thousand time series, so I would prefer to use the multiple-series feature rather than having thousands of individual models.
Secondly, the data is more unpredictable in the long run (rather than periodic or seasonal), so the model needs retraining quite often, with only local trends being detected.
The data comes from monitoring voltages in battery-operated devices. The voltage usually drops at a linear rate, but it can sometimes drop much faster depending on usage, and the devices get recharged at random times. I am forecasting the future voltage and predicting when a critical level will be reached. I’ve tested one model and the predictions are impressively good: ARIMA seems to balance long-term typical behaviour with recent local changes, even though the data is aperiodic.
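To make the "when will a critical level happen" step concrete, here is a minimal sketch of how the crossing time could be read off a forecast. It assumes the forecast has already been fetched as a list of (timestamp, voltage) pairs in time order; the function name `first_critical_time` and the linear interpolation between forecast points are my own illustration, not something BigQuery ML provides.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

# (forecast timestamp, forecast voltage), assumed sorted by timestamp
ForecastPoint = Tuple[datetime, float]


def first_critical_time(
    forecast: Sequence[ForecastPoint], critical_voltage: float
) -> Optional[datetime]:
    """Estimate when the forecast first reaches critical_voltage or below.

    Returns None if the forecast never gets that low within its horizon.
    """
    previous: Optional[ForecastPoint] = None
    for ts, value in forecast:
        if value <= critical_voltage:
            if previous is None:
                # Already at or below the critical level at the first point
                return ts
            prev_ts, prev_value = previous
            # prev_value > critical_voltage >= value, so the divisor is positive.
            # Interpolate linearly between the two neighbouring forecast points.
            fraction = (prev_value - critical_voltage) / (prev_value - value)
            return prev_ts + (ts - prev_ts) * fraction
        previous = (ts, value)
    return None
```

If the forecast comes from `ML.FORECAST`, the pairs would presumably be built from its `forecast_timestamp` and `forecast_value` columns for one device. Using the lower prediction bound instead of the point forecast would give a more cautious estimate.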
Concerning the data-pipeline, I receive new data from each device individually at random intervals. So I need to push this new data into the model, retrain it and refresh the forecast for that device’s time-series.
I can limit how often this refresh happens, e.g. to once per hour, rather than doing it event-based in real time. Near real time would be great, but it’s not essential.
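As a rough sketch of the "at most once per hour" idea: incoming device data only marks the forecast as stale, and a periodic check triggers the retrain if something changed and enough time has passed. The class name `ThrottledRefresh` and the `retrain` callback are hypothetical, and the whole thing assumes a single long-running process; a scheduled job would do the same with less code.

```python
import time
from typing import Callable, Optional


class ThrottledRefresh:
    """Run `retrain` at most once per `min_interval_seconds`, and only if new data arrived."""

    def __init__(
        self,
        retrain: Callable[[], None],
        min_interval_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._retrain = retrain
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._dirty = False
        self._last_run: Optional[float] = None

    def mark_dirty(self) -> None:
        """Call whenever any device pushes new data."""
        self._dirty = True

    def maybe_run(self) -> bool:
        """Call periodically; returns True if a retrain was triggered."""
        if not self._dirty:
            return False
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self._min_interval:
            return False
        self._retrain()
        # Only clear the flag after a successful retrain
        self._last_run = now
        self._dirty = False
        return True
```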
If I had thousands of single models, this would be trivial to design: I’d update the data (BigQuery table) and then re-create that specific model, which takes some 30 seconds.
However, with all the series in one model, it seems (from the docs) that I’d have to throw away the entire model of all the thousands of series and retrain the whole thing. I don’t currently know how long this would take or how it scales, but I assume it will take significantly longer and presumably be more expensive. All but one of the time series are unchanged, so this seems a very wasteful operation.
So is there a way to retrain just one data-series in a BigQuery ARIMA model?
I have read the interesting notes under the limitations section for the time-series specific CREATE MODEL syntax which discusses processing thousands of series.
I have read this question, but I wasn’t convinced by the assertion about using warm start, as the answer gives no doc references and warm start doesn’t seem to be available for the ARIMA model.
I have also read this question which is more about the general necessity of retraining.
After implementing and training this model and understanding the details better, it seems the short answer is NO, I have to remake and train the entire model.
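For reference, the full rebuild amounts to re-running a single `CREATE OR REPLACE MODEL` statement over the whole table. Below is a minimal sketch of a helper that assembles that statement. The option names are the time-series options as I understand them from the BigQuery ML docs; the helper name `build_retrain_sql` and all table, model and column names in the usage example are hypothetical. The default model type here is `ARIMA`; newer BigQuery ML versions call it `ARIMA_PLUS`, so it is left as a parameter.

```python
def build_retrain_sql(
    model: str,
    table: str,
    timestamp_col: str,
    value_col: str,
    id_col: str,
    model_type: str = "ARIMA",
) -> str:
    """Build the statement that rebuilds one multi-series model from scratch."""
    return f"""
CREATE OR REPLACE MODEL `{model}`
OPTIONS(
  model_type = '{model_type}',
  time_series_timestamp_col = '{timestamp_col}',
  time_series_data_col = '{value_col}',
  time_series_id_col = '{id_col}'
) AS
SELECT {timestamp_col}, {value_col}, {id_col}
FROM `{table}`
""".strip()


# Hypothetical names, for illustration only
sql = build_retrain_sql(
    model="my_dataset.voltage_forecast",
    table="my_dataset.device_voltages",
    timestamp_col="reading_time",
    value_col="voltage",
    id_col="device_id",
)
```

The resulting string can be run as a normal query job, e.g. from a scheduled query or from the Python client. Because `time_series_id_col` is set, every device is refitted on each run, which is exactly the behaviour described above.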
It takes about 7 minutes to train about 3,500 series with about 100 datapoints each from a seed dataset. This may grow as the incremental ETL adds data indefinitely.
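A quick back-of-the-envelope check on those numbers, using only the figures quoted in this post (about 7 minutes for about 3,500 series, about 30 seconds for one standalone model). These are single rough observations, and the comparison assumes the standalone models would be trained one after another, so treat the result as an order-of-magnitude guide only.

```python
# Figures quoted in the question and answer (all approximate)
full_retrain_seconds = 7 * 60        # one multi-series model, ~3,500 series
series_count = 3500
single_model_seconds = 30            # one standalone single-series model

# Amortized training time per series inside the multi-series model
per_series_seconds = full_retrain_seconds / series_count

# Time to rebuild every standalone model sequentially, for comparison
sequential_hours = series_count * single_model_seconds / 3600

# Share of an hourly refresh window spent retraining everything
hourly_window_fraction = full_retrain_seconds / 3600

print(f"{per_series_seconds:.2f} s per series")          # 0.12 s per series
print(f"{sequential_hours:.1f} h for all single models")  # 29.2 h for all single models
print(f"{hourly_window_fraction:.0%} of an hourly window")  # 12% of an hourly window
```

So although retraining everything for one changed series feels wasteful, at this size a full rebuild still fits comfortably inside an hourly refresh.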
Answered by scipilot on January 21, 2021