Cross Validated Asked on December 6, 2021
I have a series of $n$ machines that are going to emit some sensor data. Each machine is started at some point, telemetry is collected every minute for some time, and then it is stopped. So, instead of one long time series, I have many short time series. Note that some machines might run for an hour, others for 50 minutes, others for 40 minutes, and so on. Within the hour, there will be some seasonal patterns.
Now, I want to fit a time-series model that gives me some 95% confidence bands for each time instant since a machine starts (for the sensor value). Also, tomorrow I will get a new machine I haven't seen before, but one that is expected to behave like the $n$ machines I trained on. For each minute and the sensor value it produces, I want to estimate a p-value: the probability of seeing an observation that extreme if the new machine were no different from the $n$ machines I saw in the past.
What time series models can be good fits for this use-case? As a stretch goal, each of the $n$ machines might have a feature vector associated with it. Is it possible to take these features into account?
Some thoughts: perhaps we can combine the $n$ time series into one long time series and consider the intervals for which the shorter ones don't have values as missing? But then the question becomes: in what order should we combine them?
Some examples of the panel data: [panel data table omitted]
When dealing with large numbers of time series, you need to look at the difference between local time series models (a separate model fitted to each individual series, as in classical ARIMA or exponential smoothing workflows) and global time series models (a single model fitted jointly across all of the series).
For the global model to work, though, especially with your new-machine cold-start requirement, you definitely need the additional feature vectors, not just as a stretch goal, because those features are what allow the global model to match the new machine to the historical series of similar machines.
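To make the "global model" idea concrete, here is a minimal sketch of how the $n$ series can be stacked into one pooled training table, keyed by minutes since start, with the per-machine feature vectors joined on. The column names and values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical raw data: one row per machine per minute, plus a separate table
# of static per-machine features (names and values are made up for illustration).
readings = pd.DataFrame({
    "machine_id": ["m1", "m1", "m1", "m2", "m2"],
    "minutes_since_start": [0, 1, 2, 0, 1],
    "sensor_value": [5.1, 5.4, 5.9, 4.8, 5.0],
})
features = pd.DataFrame({
    "machine_id": ["m1", "m2"],
    "motor_power_kw": [15.0, 11.0],
    "firmware": ["a", "b"],
})

# A global model sees every machine in one stacked table (not concatenated end
# to end): each row is (minutes since start, machine features) -> sensor value.
panel = readings.merge(features, on="machine_id")
panel = pd.get_dummies(panel, columns=["firmware"])

X = panel.drop(columns=["machine_id", "sensor_value"])
y = panel["sensor_value"]
# Any regression model fitted on (X, y) is "global": a brand-new machine is
# handled by building the same rows from its feature vector and elapsed minutes.
```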
What time series models can be good fits for this use-case?
This is a perfect use case for a global deep learning/seq2seq model. Both empirical results and theoretical analysis show that seq2seq models work best with large groups of short time series, while local models work better with small numbers of long time series.
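As a rough illustration of what such a global seq2seq model can look like, here is a minimal encoder-decoder sketch assuming TensorFlow/Keras. The window lengths, layer sizes, and the way the static machine features are injected into the decoder state are arbitrary choices for the sketch, not a prescription.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Hypothetical sizes: the encoder sees the first 20 minutes of a machine's
# sensor readings plus a 4-dimensional static feature vector, and the decoder
# predicts the next 10 minutes.
ENC_LEN, DEC_LEN, N_STATIC = 20, 10, 4

# Encoder: summarise the observed part of the series.
series_in = layers.Input(shape=(ENC_LEN, 1), name="observed_series")
_, h, c = layers.LSTM(32, return_state=True)(series_in)

# Static machine features are mixed into the decoder's initial state; this is
# what lets the model generalise to an unseen machine (cold start).
static_in = layers.Input(shape=(N_STATIC,), name="machine_features")
h = layers.Dense(32)(layers.Concatenate()([h, static_in]))
c = layers.Dense(32)(c)

# Decoder: unroll a fixed horizon from the repeated context vector.
dec = layers.RepeatVector(DEC_LEN)(h)
dec = layers.LSTM(32, return_sequences=True)(dec, initial_state=[h, c])
out = layers.TimeDistributed(layers.Dense(1))(dec)

model = Model([series_in, static_in], out)
model.compile(optimizer="adam", loss="mse")

# Synthetic arrays just to show the expected shapes of the pooled training data.
Xs = np.random.rand(64, ENC_LEN, 1).astype("float32")
Xf = np.random.rand(64, N_STATIC).astype("float32")
y = np.random.rand(64, DEC_LEN, 1).astype("float32")
model.fit([Xs, Xf], y, epochs=1, verbose=0)
```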
perhaps we can combine the n time series into one long time series and consider the intervals for which the shorter ones don't have values as missing?
Your idea is actually a good one in spirit, because it effectively tries to reproduce the idea behind global time series models. However, it won't work, because (a) it would introduce spurious patterns and correlations into the data regardless of the order in which you stitch the series together, and (b) it doesn't solve your new-machine cold-start problem.
Now, I want to fit a time-series model that gives me some 95% confidence bands for each time instant since a machine starts (for the sensor value).
If you only ever need one confidence interval (e.g. 95%), then just train your global model using the pinball (quantile) loss with quantile level $a=0.95$.
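For instance, scikit-learn's gradient boosting exposes the pinball loss directly (its "quantile" loss), so a pooled table like the one sketched above can be turned into per-minute quantile predictions without any custom code. The same idea carries over to a neural model with a custom pinball loss; gradient boosting just keeps the sketch short, and the toy data below is made up.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy pooled design matrix: minutes since start plus one static machine feature.
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(0, 60, 500), rng.normal(10.0, 2.0, 500)])
y = 5.0 + 0.05 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0.0, 0.5, 500)

# Pinball loss at quantile a: max(a * (y - q), (a - 1) * (y - q)).
# scikit-learn's loss="quantile" is exactly this; alpha is the quantile level.
q95 = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y)

# For a two-sided 95% band you would fit alpha=0.025 and alpha=0.975 instead.
print(q95.predict(X[:5]))  # predicted 95th-percentile sensor value per row
```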
If you need arbitrary confidence intervals, then you will need to forecast full densities, which is also possible using seq2seq and deep learning models.
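One simple way to approximate a full predictive density, and hence the per-observation p-value the question asks for, is to fit the same model at a grid of quantile levels and read off where a new reading falls in the predicted CDF. The sketch below uses gradient boosting rather than a neural network, uses made-up data, and sorts the predicted quantiles to paper over quantile crossing; it is an illustration of the idea, not a definitive recipe.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Made-up pooled data: (minutes since start, one machine feature) -> sensor value.
rng = np.random.default_rng(1)
X = np.column_stack([rng.integers(0, 60, 500), rng.normal(10.0, 2.0, 500)])
y = 5.0 + 0.05 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0.0, 0.5, 500)

# One model per quantile level approximates the full predictive distribution.
taus = np.linspace(0.05, 0.95, 19)
models = [GradientBoostingRegressor(loss="quantile", alpha=t).fit(X, y) for t in taus]

def approx_p_value(x_new, y_new):
    """Two-sided tail probability of y_new under the predicted quantile grid."""
    q = np.array([m.predict(x_new.reshape(1, -1))[0] for m in models])
    q = np.sort(q)  # enforce monotonicity in case predicted quantiles cross
    cdf = np.interp(y_new, q, taus, left=0.0, right=1.0)
    return 2.0 * min(cdf, 1.0 - cdf)

# A reading far above what similar machines produced at that minute -> p-value near zero.
print(approx_p_value(np.array([30.0, 10.0]), y_new=12.0))
```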
Answered by Skander H. on December 6, 2021