Data Science Asked by AngryCoder on December 18, 2020
I have a dataset which has the following 5 columns:
date, hour, day_of_week, street_id, counts
It records the number of cars on each street (all in the same city) for each hour of each date, and I want to predict the traffic count for a given street at a given hour on a given date.
I think I could use different variables depending on the day and hour I want to predict. For example, if I want to predict the traffic count on a working Wednesday, I could use:
Results of other working days
Results of other Wednesdays
…
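The two feature ideas listed above can be sketched in plain Scala over an in-memory collection, before scaling them up with Spark. This is a minimal sketch under stated assumptions: the names `Obs`, `HistoricalFeatures`, `workingDayMean` and `sameWeekdayMean` are hypothetical, a "working day" is assumed to be Monday to Friday (holidays ignored), and only observations strictly before the target date are used so the features cannot leak the value being predicted.

```scala
import java.time.{DayOfWeek, LocalDate}

object HistoricalFeatures {
  // Hypothetical record type mirroring the question's columns;
  // day_of_week is derived from `date` instead of being stored.
  final case class Obs(date: LocalDate, hour: Int, streetId: Int, count: Double)

  // Assumption: a "working day" is Monday to Friday; public holidays are ignored.
  def isWorkingDay(d: LocalDate): Boolean =
    d.getDayOfWeek != DayOfWeek.SATURDAY && d.getDayOfWeek != DayOfWeek.SUNDAY

  private def mean(xs: Seq[Double]): Option[Double] =
    if (xs.isEmpty) None else Some(xs.sum / xs.size)

  // Same street and hour, strictly before the target date, so a feature never
  // uses the value it is meant to predict (or anything later).
  private def history(data: Seq[Obs], street: Int, date: LocalDate, hour: Int): Seq[Obs] =
    data.filter(o => o.streetId == street && o.hour == hour && o.date.isBefore(date))

  // "Results of other working days": mean count at this hour on earlier working days.
  def workingDayMean(data: Seq[Obs], street: Int, date: LocalDate, hour: Int): Option[Double] =
    mean(history(data, street, date, hour).filter(o => isWorkingDay(o.date)).map(_.count))

  // "Results of other Wednesdays", generalised to any target: mean count at this
  // hour on earlier days that share the target date's day of week.
  def sameWeekdayMean(data: Seq[Obs], street: Int, date: LocalDate, hour: Int): Option[Double] =
    mean(
      history(data, street, date, hour)
        .filter(_.date.getDayOfWeek == date.getDayOfWeek)
        .map(_.count)
    )
}
```

Both functions return `None` when there is no history yet, so a downstream model would need a rule for cold-start streets or hours.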
I want to use Spark MLlib to perform the prediction because I have experience with Spark and I have large datasets.
How do you deal with this kind of problem?
Any ideas?
This looks like a time series problem: based on a variable's past values, you try to predict its future values.
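To treat the data as a time series, each street's rows first have to be laid out as one series on a regular hourly index. The sketch below does that in plain Scala and adds the simplest forecast from past values, a seasonal-naive baseline (same hour one week earlier). This baseline is a swapped-in illustration, not ARIMA and not part of spark-ts; the names `HourlySeries`, `Row`, `toSeries` and `seasonalNaive` are hypothetical. It assumes local clock hours (`LocalDateTime`, so daylight-saving shifts are ignored) and marks missing hours as `NaN`; such gaps would need filling or imputation before fitting a model like ARIMA.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.temporal.ChronoUnit

object HourlySeries {
  // Hypothetical record type mirroring the question's columns.
  final case class Row(date: LocalDate, hour: Int, streetId: Int, count: Double)

  val HoursPerWeek: Int = 7 * 24

  // One dense hourly series per street, all aligned on the same index:
  // position i holds the count for the hour `start` plus i hours.
  // Hours with no observation stay NaN.
  def toSeries(rows: Seq[Row], start: LocalDateTime, nHours: Int): Map[Int, Array[Double]] =
    rows.groupBy(_.streetId).map { case (street, rs) =>
      val values = Array.fill(nHours)(Double.NaN)
      rs.foreach { r =>
        val i = ChronoUnit.HOURS.between(start, r.date.atTime(r.hour, 0)).toInt
        if (i >= 0 && i < nHours) values(i) = r.count // rows outside the window are dropped
      }
      street -> values
    }

  // Seasonal-naive baseline (not ARIMA): forecast the hour at index `target` with
  // the value observed exactly one week earlier; None if that value is unavailable.
  def seasonalNaive(series: Array[Double], target: Int): Option[Double] = {
    val i = target - HoursPerWeek
    if (i >= 0 && i < series.length && !series(i).isNaN) Some(series(i)) else None
  }
}
```

A baseline like this is also a useful yardstick: a fitted model that cannot beat "same hour last week" on held-out data is probably not worth its extra complexity.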
This kind of problem is rarely tackled with Spark, but you are in luck: the spark-ts library seems to do what you need, so you don't have to code your own solution with MLlib. I recommend trying it first and circling back to MLlib if it doesn't work out.
The library introduces a TimeSeriesRDD, and once you have encoded your data in this data structure (note that it still behaves like a normal RDD), you can experiment with the available models. For example, fitting an ARIMA model is as simple as:
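import com.cloudera.sparkts.models.ARIMA
val arimaModel = ARIMA.fitModel(1, 0, 1, ts) // ts: one street's series as an MLlib Vector; orders (p, d, q) = (1, 0, 1)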
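For reference, the three integer arguments of `fitModel(1, 0, 1, ts)` are the standard ARIMA orders (p, d, q): one autoregressive term, no differencing (d = 0, so the model is applied to the series itself, i.e. an ARMA(1,1)), and one moving-average term. In the usual textbook form, with c an optional intercept and white-noise errors, the model is roughly the following; libraries differ in the sign convention for the moving-average coefficient, so treat this as a sketch rather than the exact spark-ts parameterisation:

```latex
y_t = c + \phi_1 \, y_{t-1} + \varepsilon_t + \theta_1 \, \varepsilon_{t-1},
\qquad \varepsilon_t \sim \text{white noise}
```

A low-order model like this mainly captures short-range dependence between neighbouring hours; the daily and weekly patterns described in the question would still have to be handled separately, for example with the hand-built features from the question.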
Hope that helps!
Answered by Vivek Kalyanarangan on December 18, 2020