
What are some solutions for dealing with time series data that are recorded at uneven intervals?

Data Science · Asked on January 19, 2021

Let's say I have time series data consisting of observations that occur at different timestamps and intervals. For example, my observations come from a camera located at a traffic intersection. It only records when something happens, like a car passing or a pedestrian crossing, but otherwise records nothing.

I want to build an LSTM NN (or some other memory-based NN for time series data), but since my features don't occur at even time intervals, I am not sure how having a memory would help. For example, let's consider two sequences:

Scenario 1:

  • At 1:00 PM, I recorded a car passing.
  • At 1:05 PM, some people cross.
  • At 1:50 PM, some more people cross.
  • At 2:00 PM, another car passes.

Scenario 2:

  • At 1:00 PM, a car passes.
  • At 2:00 PM, a car passes.

In the first scenario, the last car passed 3 observations ago. In the second scenario, the last car passed 1 observation ago. Yet in both scenarios, the last car passed 1 hour ago. I am afraid that any model would treat the last car passing in scenario 1 as 3 time steps ago and the last car passing in scenario 2 as 1 time step ago, even though in reality the time difference is the same. My hypothesis is that the time difference is a very important feature, probably more so than the intermediate activity between the two cars passing. In other words, knowing that the last car passed 1 hour ago is equally important as, and likely more important than, knowing that some people crossed during that hour. That said, knowing that people crossed matters too, so I can't just remove that feature.

Another example of my issue can be seen below:

Scenario 1:

  • At 1:00 PM, a car passes.
  • At 2:00 PM, a car passes.

Scenario 2:

  • At 1:00 PM, a car passes.
  • At 10:00 PM, a car passes.

Once again, in my dataset the two events would be treated as adjacent observations in both scenarios, but in reality the time gaps (1 hour versus 9 hours) are vastly different, so the two scenarios should be viewed as very dissimilar.

What are some ways to solve these issues?

  1. I've thought of just expanding the dataset by creating a row for every possible timestamp, but I don't think this is the right choice: it would make my dataset humongous, and most rows would have 0s across the board. Some of my observations occur microseconds apart, so it would become a very sparse dataset.
  2. It would be nice to include the time difference as a feature, but I am not sure there's a way to include a dynamic feature in a dataset (see the sketch after this list). For example, in the first scenario, at 1:05 PM, the 1:00 PM observation needs a feature that says it occurred 5 minutes ago. But at 1:50 PM, that feature needs to change to say it occurred 50 minutes ago, and then at 2:00 PM, it needs to say it occurred 1 hour ago.
  3. Would the solution be to just give the NN the raw data and not worry about it? When building a NN for word prediction, I guess that with enough data the model will learn the language even if the relevant word appeared 10 paragraphs ago. However, I am not sure there are enough examples of the exact same sequences (even with the amount of data I have) for it to reach the predictive power I want.
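As a minimal sketch of option 2 above: the elapsed time since the previous event can be precomputed for each row and fed to the network alongside the event type, so the model sees the real gap rather than a step count. The event log, column names, and values below are all hypothetical, and the log transform is just one way to tame large gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical event log: one row per detection, uneven timestamps.
events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-01-19 13:00", "2021-01-19 13:05",
        "2021-01-19 13:50", "2021-01-19 14:00",
    ]),
    "event": ["car", "pedestrian", "pedestrian", "car"],
})

# Minutes elapsed since the previous event (0 for the first one).
events["delta_t_min"] = (
    events["timestamp"].diff().dt.total_seconds().div(60).fillna(0.0)
)

# One-hot the event type and append the (log-scaled) time gap.
features = pd.get_dummies(events["event"]).astype(float)
features["log_delta_t"] = np.log1p(events["delta_t_min"])

# Shape as (batch, time steps, feature dim) for an RNN/LSTM.
sequence = features.to_numpy(dtype=np.float32)[None, :, :]
print(sequence.shape)  # (1, 4, 3)
```

With this framing, the "dynamic feature" problem from option 2 disappears: each row carries only the gap to its predecessor, which never needs updating, and the recurrent state is what accumulates how long ago earlier events happened.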

Any ideas on how to solve this problem, keeping in mind that the goal is to build a NN? Another way to think about it: in my situation, the time when a data point occurred, relative to when the prediction will be made, is a crucial piece of information for prediction.

One Answer

So, the question essentially asks how to model sequential data with inconsistent time intervals.

I would say option 2 is the more logical choice in my mind.

I would propose using the timestamp at which the image was taken by the camera as an input feature for each example. The data would still be in a format you could model with an RNN/LSTM (or any sequential model architecture).

By adding the timestamp feature to your input, you have at least given the model a time representation of the images. You could represent the timestamp of an image as the number of seconds (or minutes/hours, whichever is more appropriate for your dataset) since the earliest image in the dataset.

Of course, it's best to normalise this feature so you don't end up with very large values, which could hurt the model's overall generalisation performance.
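As an illustration of that suggestion (the timestamps and the min-max scaling below are my own assumptions, not part of the answer):

```python
import pandas as pd

# Hypothetical capture times for four frames.
timestamps = pd.to_datetime([
    "2021-01-19 13:00", "2021-01-19 13:05",
    "2021-01-19 13:50", "2021-01-19 14:00",
])

# Seconds elapsed since the earliest image in the dataset.
elapsed = (timestamps - timestamps.min()).total_seconds()

# Scale to [0, 1] so the network never sees huge raw values.
time_feature = elapsed / elapsed.max()
print(list(time_feature))  # [0.0, 0.083..., 0.833..., 1.0]
```

Standardising to zero mean and unit variance would be an equally reasonable choice of normalisation.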

Answered by shepan6 on January 19, 2021
