Data Science Asked on May 2, 2021
There is a need to estimate Annual Average Daily Traffic Volume (AADT).
We have a large amount of data on vehicle speeds collected over several years. We have noticed that AADT depends on the average number of such samples over a given period, so a regression model $Y = f(x_1)$ could help estimate the AADT.
The problem is that other features also affect this dependency, both numerical $(x_2, \dots, x_k)$ and categorical $(c_1 = \text{data provider}, c_2 = \text{road class}, \dots, c_m)$.
We believe that $x_1$ affects the AADT much more than any of the other features, and that $x_1$ itself may also depend on those features.
That is why we would like to get a set of regressions $Y = f(x_1)$ that depend on $(x_2, \dots, x_k, c_1, \dots, c_m)$.
Both $k$ and $m$ are small.
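One way to write this goal down, purely as an illustrative assumption (the linear form and the symbols $z$, $\beta_0$, $\beta_1$ and $\varepsilon$ are not part of the problem statement), is a regression on $x_1$ whose coefficients vary with the remaining features:

$$Y = \beta_0(z) + \beta_1(z)\,x_1 + \varepsilon, \qquad z = (x_2, \dots, x_k, c_1, \dots, c_m)$$

Under this reading, fitting a separate regression per group of similar $z$ values would correspond to treating $\beta_0$ and $\beta_1$ as constant within each group.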
—
Is it reasonable to first cluster the dataset by the features $(x_2, \dots, x_k, c_1, \dots, c_m)$ and then fit a regression $Y = f(x_1)$ within each cluster?
Or is it better to consider all the features $(x_1, x_2, \dots, x_k, c_1, \dots, c_m)$ together, with $x_1$ given more weight than the others?
Also note that the multiple-variable regression would involve a mix of numerical and categorical features.
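To make the two options concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the data is generated, not real traffic counts; there is only one extra numerical feature `x2` and one categorical feature `c1`; the fits are linear; and the groups in option A are simply the levels of `c1`, a stand-in for a real clustering step. The function names `fit_per_group` and `fit_pooled` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical, not real traffic counts).
n = 600
x1 = rng.uniform(10, 100, n)   # average number of speed samples
x2 = rng.uniform(0, 1, n)      # one extra numerical feature
c1 = rng.integers(0, 3, n)     # one categorical feature, coded 0..2
true_slope = np.array([5.0, 8.0, 12.0])[c1]
y = 50.0 + true_slope * x1 + 20.0 * x2 + rng.normal(0, 5.0, n)


def fit_per_group(x1, y, groups):
    """Option A: fit a separate line y = a + b * x1 inside each group."""
    params = {}
    for g in np.unique(groups):
        mask = groups == g
        A = np.column_stack([np.ones(mask.sum()), x1[mask]])
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        params[int(g)] = coef  # (intercept, slope)
    return params


def fit_pooled(x1, x2, c1, y, n_levels):
    """Option B: one model with one-hot categories and per-category x1 slopes."""
    onehot = np.eye(n_levels)[c1]  # shape (n, n_levels)
    # Columns: per-category intercepts, per-category x1 slopes, shared x2 effect.
    A = np.column_stack([onehot, onehot * x1[:, None], x2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    intercepts = coef[:n_levels]
    slopes = coef[n_levels:2 * n_levels]
    x2_coef = coef[-1]
    return intercepts, slopes, x2_coef


per_group = fit_per_group(x1, y, c1)
intercepts, slopes, x2_coef = fit_pooled(x1, x2, c1, y, n_levels=3)

for g in sorted(per_group):
    print(f"group {g}: per-group slope {per_group[g][1]:.2f}, pooled slope {slopes[g]:.2f}")
print(f"pooled x2 coefficient: {x2_coef:.2f}")
```

In this sketch the pooled model handles the numerical/categorical mix through one-hot encoding, and the `onehot * x1` interaction columns let the slope on $x_1$ differ by category, which is what the per-group fits do implicitly. The pooled version also estimates the `x2` effect from all rows, whereas the per-group fits ignore `x2` entirely.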