How to predict/forecast a street's traffic based on previous values?

Question

I have a dataset which has the following 5 columns:

date, hour, day_of_week, street_id, counts

My dataset records the number of cars on each street (all in the same city) in a given hour of a given date, and I want to predict the traffic count for a certain street in a given hour of a given date.

I think I could use certain variables depending on the day and hour that I want to predict. For example, if I want to predict the traffic count of a working Wednesday:

Results of other working days
Results of other Wednesdays
...

I want to use Spark MLlib to perform the prediction because I have experience with Spark and I have large datasets.

How do you deal with this kind of problem?

Any ideas?

Vivek Kalyanarangan · Answer

This looks like a time series problem: you try to predict a variable's future values based on its past values.
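To make "predict from past values" concrete, here is a minimal baseline sketch in plain Scala (no Spark), built on the "results of other Wednesdays" idea from the question. It is only one possible starting point, not a recommended final model. The `Row` case class, the `String` type for `street_id`, the `predict` helper and the default of four weeks are all assumptions made for illustration.

```scala
import java.time.LocalDate

object SeasonalBaseline {
  // One row of the dataset from the question (column types are assumed).
  final case class Row(date: LocalDate, hour: Int, streetId: String, counts: Double)

  // Predict the count for (streetId, date, hour) as the mean of the counts seen
  // on the same street, at the same hour, on the same day of the week, using at
  // most the `weeks` most recent earlier dates. Returns None without history.
  def predict(history: Seq[Row], streetId: String, date: LocalDate,
              hour: Int, weeks: Int = 4): Option[Double] = {
    val past = history
      .filter(r => r.streetId == streetId && r.hour == hour &&
        r.date.getDayOfWeek == date.getDayOfWeek && r.date.isBefore(date))
      .sortBy(_.date.toEpochDay) // oldest first
      .takeRight(weeks)          // keep only the most recent matches
      .map(_.counts)
    if (past.isEmpty) None else Some(past.sum / past.size)
  }
}
```

A baseline like this gives you a number to compare any fitted model against: if the model cannot beat it, the model is probably not capturing the weekly pattern.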

Time series forecasting is usually an "unheard of" problem with Spark, but you are in luck: the spark-ts library seems to do what you need, so you don't have to code your own solution with MLlib. I recommend trying it first and circling back to MLlib if it doesn't work out.

It introduces a TimeSeriesRDD. Once you have encoded your data in this structure (note that it still behaves like a normal RDD), you can play around with the available models. For example, fitting an ARIMA model is as simple as:

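import com.cloudera.sparkts.models.ARIMA
val arimaModel = ARIMA.fitModel(1, 0, 1, ts) // ARIMA(p = 1, d = 0, q = 1) fitted to a single series ts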
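Before fitting, the rows (date, hour, street_id, counts) have to be reshaped into one evenly spaced series per street. The sketch below shows that reshaping in plain Scala so the idea is easy to follow. It is not the spark-ts API; `Row`, `toSeries` and the choice to mark missing hours as NaN are assumptions made for illustration.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.temporal.ChronoUnit

object HourlySeries {
  // One row of the dataset from the question (column types are assumed).
  final case class Row(date: LocalDate, hour: Int, streetId: String, counts: Double)

  // Build one hourly series per street covering [start, end] inclusive.
  // Hours without an observation stay NaN so gaps remain visible and can be
  // imputed before a model is fitted.
  def toSeries(rows: Seq[Row], start: LocalDateTime,
               end: LocalDateTime): Map[String, Array[Double]] = {
    val n = ChronoUnit.HOURS.between(start, end).toInt + 1
    rows.groupBy(_.streetId).map { case (street, rs) =>
      val series = Array.fill(n)(Double.NaN)
      rs.foreach { r =>
        // Position of this observation, in hours since `start`.
        val idx = ChronoUnit.HOURS.between(start, r.date.atTime(r.hour, 0)).toInt
        if (idx >= 0 && idx < n) series(idx) = r.counts
      }
      street -> series
    }
  }
}
```

Each resulting array could then be wrapped with MLlib's `Vectors.dense` and used as the `ts` argument above, presumably after the NaN gaps have been filled. On a large dataset you would do the same grouping in Spark instead of on local collections.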

Hope that helps!
