Data Science Asked by jkjsdf fod on September 11, 2020
Q1:
I have a multivariate time series dataset. For each timestep, there are 11 features and 1 output. I am going to use supervised ML to predict the output. I understand that in the univariate case, if I use the past 3 days to predict the t-th day, the dataset will be formatted as
x(t-3) | x(t-2) | x(t-1) | x(t)
, where x(t) is the output to predict.
How should I format the dataset when it is a multivariate problem?
I saw that in some kernels, the problem is formatted as
x12(t-3) | x12(t-2) | x12(t-1) | x1(t), x2(t), ..., x12(t)
, where x12(t) is the output to predict.
In this case, variables x1 to x11 for the past 3 days are ignored.
However, these variables may be important in my case. Can I format the problem into
x1(t-3),...,x12(t-3) | x1(t-2), ..., x12(t-2) | x1(t-1),..., x12(t-1) | x1(t), x2(t), ..., x12(t)
?
(some of the features are just day, month, day of week, etc. created from the datetime index)
Q2:
With only 11 features, is it necessary to conduct feature selection?
Regarding how to handle a multivariate time series problem, I believe the GitHub link "Timeseries multivariate" will be helpful.
You have to set n_inputs and n_outputs to 12 and 1, respectively.
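To illustrate the framing asked about in Q1, here is a minimal sketch (not the code from the linked repository) that keeps every variable from the past 3 steps as input and uses only the output variable at step t as the label. The function name make_supervised, its parameters, and the toy data are hypothetical and only serve the illustration.

```python
def make_supervised(rows, n_lags, target_idx):
    """Turn a multivariate series into (X, y) pairs for supervised learning.

    rows       : list of timesteps, each a list of all variables at that step
    n_lags     : number of past timesteps used as input (3 in the question)
    target_idx : position of the output variable inside each row
    """
    X, y = [], []
    for t in range(n_lags, len(rows)):
        # Flatten all variables from steps t-n_lags .. t-1 into one input row.
        window = [value for step in rows[t - n_lags:t] for value in step]
        X.append(window)
        # The label is the output variable at step t only.
        y.append(rows[t][target_idx])
    return X, y


# Toy data (made up): 6 timesteps, 3 variables per step, the last is the output.
series = [[t, 10 * t, 100 * t] for t in range(6)]
X, y = make_supervised(series, n_lags=3, target_idx=2)

print(len(X), len(X[0]))  # number of samples, inputs per sample
print(X[0], y[0])         # first window and its label
```

With 12 variables per step and 3 lags, each input row would have 36 values. Features that are already known for step t, such as the calendar fields mentioned in the question, could be appended to the window as well; that choice is an assumption about what is available at prediction time.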
Answered by vipin bansal on September 11, 2020