Data Science Asked by Simi on December 30, 2020
I am new to Machine Learning. I am trying to build a model to predict the sales of a particular automobile make and model at a dealership, and the location they are sold to. The data given to me includes the dealer name, state, and zip code; the owner state and zip code; and the vehicle make, model, year, segment, body type, odometer reading, and new/used indicator. How do I build a model to do predictive analysis?
Welcome to the forum (and to machine learning). Choosing an estimator/model can be one of the hardest parts of a problem, and there are many different opinions on how to do it.
The first thing I notice in your problem statement is that you want to predict sales (either in $ or quantity, I would assume). Because you're not trying to predict a category or binary label, we can rule out classification models and look at regression-style models instead.
A logical starting point for this would be to explore using a Linear Regression. This assumes that you have some existing data with the price, quantity, or target you're trying to predict.
Linear Regression is a logical starting point because it is simple to implement, yet can give some pretty good results. Scikit-learn, a Python package, has a nice implementation of linear regression.
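As a rough illustration, here is a minimal sketch of fitting scikit-learn's LinearRegression. The data is synthetic and noise-free, and the column meanings (vehicle age, odometer in thousands of miles) and the coefficients are assumptions made up for the example, not anything from your dataset. It only shows the fit/predict workflow.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, made-up data: two numeric features and a target that is an
# exact linear function of them, so the fit is easy to sanity-check.
rng = np.random.default_rng(0)
vehicle_age = rng.uniform(0, 10, size=50)    # hypothetical feature, in years
odometer_k = rng.uniform(0, 150, size=50)    # hypothetical feature, in 1000s of miles
X = np.column_stack([vehicle_age, odometer_k])
y = 20.0 - 1.5 * vehicle_age - 0.05 * odometer_k  # hypothetical target

# Fit an ordinary least squares model.
model = LinearRegression()
model.fit(X, y)

# Predict the target for a new, unseen row (age 3 years, 40k miles).
new_row = np.array([[3.0, 40.0]])
prediction = model.predict(new_row)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("prediction:", prediction)
```

On real data you would hold out a test set and compare predictions against known sales before trusting the model.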
If you don't get satisfactory results with linear regression, you may wish to explore decision trees, using either scikit-learn's implementation or xgboost.
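A tree-based regressor can be tried with very little extra code. The sketch below uses scikit-learn's DecisionTreeRegressor on a tiny, invented table; the column names (make, new_used, units_sold) and all the numbers are hypothetical placeholders. The categorical columns are one-hot encoded inside a pipeline so the tree receives numeric input.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Tiny made-up dataset; names and values are illustrative only.
df = pd.DataFrame({
    "make": ["Toyota", "Toyota", "Ford", "Ford", "Honda", "Honda"],
    "new_used": ["New", "Used", "New", "Used", "New", "Used"],
    "units_sold": [12, 7, 9, 4, 10, 6],
})
X = df[["make", "new_used"]]
y = df["units_sold"].to_numpy()

# One-hot encode the categorical columns, then fit a regression tree.
tree_model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    DecisionTreeRegressor(random_state=0),
)
tree_model.fit(X, y)

# An unrestricted tree memorises training rows that have unique features,
# so these predictions match y; that is a sign of overfitting, not accuracy.
train_predictions = tree_model.predict(X)
print(train_predictions)
```

In practice you would limit the tree (for example with max_depth) and evaluate on held-out data, since a fully grown tree reproduces its training set.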
Briefly, I think it's important to note that the features for your model are nearly all categorical, e.g. make, model, etc. To handle them, you'll need to encode these features; a guide on that is here.
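One common encoding is one-hot encoding, where each category becomes its own 0/1 column. Here is a minimal sketch with scikit-learn's OneHotEncoder; the two columns and their values are made up for illustration, and other encodings may suit high-cardinality columns such as zip code better.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up categorical data; column names and values are illustrative only.
df = pd.DataFrame({
    "make": ["Toyota", "Ford", "Toyota", "Honda"],
    "body_type": ["Sedan", "Truck", "SUV", "Sedan"],
})

# handle_unknown="ignore" encodes categories unseen during fit as all zeros
# instead of raising an error at prediction time.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(df).toarray()

# Each row has exactly one 1 per original column.
print(encoder.categories_)
print(encoded)

# A row with a make the encoder has never seen ("Tesla" here).
unseen = pd.DataFrame({"make": ["Tesla"], "body_type": ["Sedan"]})
encoded_unseen = encoder.transform(unseen).toarray()
print(encoded_unseen)
```

The encoder is fitted once on training data and then reused with transform on new data, so the columns stay consistent between training and prediction.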
Take your time, explore your data a lot and start with a simple model. The more you understand your data, the more you'll understand the model. Enjoy and have fun! :)
Answered by James C on December 30, 2020