Data Science Asked on January 19, 2021
Let’s say I have a time series data which is a bunch of observations that occur at different time stamps and intervals. For example, my observations come from a camera located at a traffic intersection. It only records when something occurs, like a car passes, a pedestrian crosses, etc… But otherwise doesn’t record information.
I want to produce a LTSM NN (or some other memory based NN for time series data), but since my features don’t occur at even time intervals, I am not sure how having a memory would help. For example let’s consider two sequences:
Scenario 1:
Scenario 2:
In the first scenario, the last car passed 3 observations ago. In the second scenario, the last car passed 1 observation ago. Yet in both scenarios, the last car passed 1 hour ago. I am afraid that any model would treat the last car passing in scenario 1 as 4 time periods ago and the last car passing in scenario 2 as 1 time period ago, even though in reality, the time difference is the same. My hypothesis is that the time difference is a very important feature, probably more so than the intermediate activity between the two car passing. In other words, knowing that the last car passed 1 hour ago is equal or likely more important than knowing that there were some people crossing in the last hour. With that said, knowing that people crossed is important too so I can’t just remove that feature.
Another example of my issue can be scene below:
Scenario 1
Scenario 2
Once again, in my data set, this would be treated as adjacent observations, but in reality, the time gap is vastly different and thus, the two scenarios should be viewed as very dissimilar.
What are some ways to solve these issues?
Any ideas on ways to solve this problem while keeping in mind that the goal is to build a NN? Another way to think about it is, the time when a data point occurred relative to when the prediction will be made, in my situation, is a crucial piece of information for prediction.
So, the question essentially asks how to model sequential data with inconsistent time intervals.
I would say option 2 would be more logical in my mind.
I would propose to have the timestamp when the image was taken from the camera as an input feature for each example. Here, it would still be in a format which you could model this using an RNN/LSTM (any sequential model architecture).
By adding the timestamp feature to your input, you at least have given the model the time representation of the images. You could represent the timestamp fo the image as number of seconds (minutes/hours, whichever is more appropriate to the dataset you have) from the earliest image you have in the dataset.
Of course, it's best to normalise this so you don't end up with very large values, which could reduce the overall generalisation performance of the model.
Answered by shepan6 on January 19, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP