Machine Learning based Multivariate Time Series Prediction - How to create supervised data format

Question

Q1:

I have a multivariate time series dataset. For each timestep, there are 11 features and 1 output. I am going to use supervised ML to predict the output. I understand that in the univariate case, if I use the past 3 days to predict the t-th day, the dataset is formatted as

x(t-3) | x(t-2) | x(t-1) | x(t)

, where x(t) is the output to predict.

How should I format the dataset when it is a multivariate problem?

I saw that in some kernels, the problem is formatted as

x12(t-3) | x12(t-2) | x12(t-1) | x1(t), x2(t), ..., x12(t)

, where x12(t) is the output to predict.

In this case, variables x1 to x11 for the past 3 days are ignored.

However, these variables may be important in my case. Can I format the problem as

x1(t-3),...,x12(t-3) | x1(t-2), ..., x12(t-2) | x1(t-1),..., x12(t-1) | x1(t), x2(t), ..., x12(t)

?

(some of the features are just day, month, day of week, etc. created from the datetime index)

Q2:

With only 11 features, is it necessary to conduct feature selection?

vipin bansal · Answer

Regarding how to handle a multivariate time series problem, I believe the GitHub link "Timeseries multivariate" will be helpful.

You have to set n_inputs and n_outputs to 12 and 1, respectively.
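
For reference, the layout asked about in Q1 (all 12 variables at t-3, t-2 and t-1, plus x1(t) to x11(t), with x12(t) as the output) could be built roughly as below. This is only a minimal sketch assuming pandas and a DataFrame with columns x1 to x12. The function name make_supervised and its parameters target and n_lags are hypothetical and are not taken from the linked repository. Using x1(t) to x11(t) as inputs also assumes those values are already known at prediction time, which holds for calendar features but may not hold for the others.

```python
import numpy as np
import pandas as pd


def make_supervised(df, target, n_lags=3):
    """Hypothetical helper: lag every column, keep current-step features,
    and return the current-step target separately."""
    parts = []
    # Lagged copies of all columns: x1(t-3)..x12(t-3), ..., x1(t-1)..x12(t-1)
    for lag in range(n_lags, 0, -1):
        shifted = df.shift(lag)
        shifted.columns = [f"{c}(t-{lag})" for c in df.columns]
        parts.append(shifted)
    # Current-step features x1(t)..x11(t); the target is excluded from X
    current = df.drop(columns=[target])
    current.columns = [f"{c}(t)" for c in current.columns]
    parts.append(current)
    X = pd.concat(parts, axis=1)
    y = df[target].rename(f"{target}(t)")
    # The first n_lags rows have no complete history, so drop them
    valid = X.notna().all(axis=1)
    return X[valid], y[valid]


# Toy data standing in for the real dataset (assumption: 10 days, 12 columns)
rng = np.random.default_rng(0)
cols = [f"x{i}" for i in range(1, 13)]
df = pd.DataFrame(rng.normal(size=(10, 12)), columns=cols)

X, y = make_supervised(df, target="x12", n_lags=3)
print(X.shape, y.shape)  # (7, 47) (7,)
```

With 3 lags, X has 3 * 12 lagged columns plus 11 current-step columns, and the first 3 rows are lost because they lack a full history. When splitting into train and test sets, keep the rows in time order rather than shuffling them.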
