Data Science Asked by Srule on September 5, 2021
I’m playing with regression models in scikit-learn. The goal is to predict how much inventory we should purchase for the next 90 days. My data set has hundreds of product categories. Each category has many unique features that do not apply to every category.
For example, the Shirts category could have “size” and “color” features, whereas the Razors category could have a “number of blades” feature.
Should I split my data up by category and make a different model for each? Or is it sufficient to have one model in which I keep the product’s category as one of the features?
You should split them by category, since the features of one category do not apply to the others.
If you can group some categories together based on business logic (for example, because they share the same features), then you can build fewer models.
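A minimal sketch of the one-model-per-category approach is shown here. The answer does not name an estimator, so LinearRegression is only a placeholder; the column names, the toy values and the features_by_category mapping are all made up for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data; every column name and value here is hypothetical.
df = pd.DataFrame({
    "category": ["shirt", "shirt", "shirt", "razor", "razor", "razor"],
    "size": [1.0, 2.0, 3.0, None, None, None],        # applies to shirts only
    "num_blades": [None, None, None, 2.0, 3.0, 5.0],  # applies to razors only
    "units_sold_90d": [10.0, 20.0, 30.0, 7.0, 9.0, 13.0],
})

# Hypothetical mapping from each category to the features that apply to it.
features_by_category = {"shirt": ["size"], "razor": ["num_blades"]}

# Fit one model per category, using only that category's own features.
models = {}
for category, group in df.groupby("category"):
    cols = features_by_category[category]
    models[category] = LinearRegression().fit(group[cols], group["units_sold_90d"])


def predict_units(category, feature_values):
    """Predict 90-day demand with the model that belongs to `category`."""
    cols = features_by_category[category]
    X = pd.DataFrame([feature_values], columns=cols)
    return float(models[category].predict(X)[0])
```

Categories that share the same feature set could be given a common key in features_by_category and fitted together, which is one way to end up with fewer models.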
Answered by Siong Thye Goh on September 5, 2021
First question: what is the motivation?
If you got this question in real life and had to solve it for your own business, how would you tackle it?
For sure, you should split the data into categories. This is similar to feature engineering: you want the best data to predict each category.
Create a model for each of the labels you want to predict and make sure to choose the best predictive features.
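As a rough illustration of choosing the best predictive features for a single category's model, the sketch below uses univariate selection (SelectKBest with f_regression) in front of a regressor. This is only one of several possible selection techniques and is not something the answer prescribes; the feature names and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic data for one hypothetical category: only "size" drives demand.
rng = np.random.default_rng(0)
feature_names = ["size", "color_code", "noise"]
X = rng.normal(size=(200, 3))
y = 5.0 * X[:, 0] + 0.1 * rng.normal(size=200)

# Keep the single feature with the highest univariate F-score, then regress on it.
pipeline = make_pipeline(
    SelectKBest(score_func=f_regression, k=1),
    LinearRegression(),
)
pipeline.fit(X, y)

# Recover the names of the features the selector kept.
mask = pipeline.named_steps["selectkbest"].get_support()
selected = [name for name, keep in zip(feature_names, mask) if keep]
print(selected)
```

In practice k would be tuned per category, for example with cross-validation, rather than fixed at 1.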
Answered by ombk on September 5, 2021