When to split data into multiple regression models instead of one model?

Question

I'm experimenting with regression models in scikit-learn. The goal is to predict how much inventory we should purchase for the next 90 days. My data set has hundreds of product categories, and each category has many unique features that do not apply to every other category.

For example, the Shirts category could have "size" and "color" features, whereas the Razors category could have a "number of blades" feature.

Should I split my data up by category and build a different model for each? Or is it sufficient to have one model in which I keep the product's category as one of the features?

Siong Thye Goh · Answer

You should split them by category, since many of their features do not apply to every category.
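A per-category split could be sketched as follows, assuming a single table in which inapplicable features are stored as `NaN`. The toy data, the column names, the hypothetical `predict_demand` helper, and the use of `LinearRegression` are illustrative assumptions; any scikit-learn regressor could be substituted. Each category's model only sees the columns that actually apply to it.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy catalogue: columns that don't apply to a category are NaN.
rng = np.random.default_rng(0)
shirts = pd.DataFrame({
    "category": "shirt",
    "size": rng.integers(1, 5, size=50).astype(float),
    "num_blades": np.nan,
})
razors = pd.DataFrame({
    "category": "razor",
    "size": np.nan,
    "num_blades": rng.integers(2, 6, size=50).astype(float),
})
df = pd.concat([shirts, razors], ignore_index=True)
# Made-up demand signal, for illustration only
df["demand_90d"] = df["size"].fillna(0) * 10 + df["num_blades"].fillna(0) * 5 + 20

models = {}
for cat, group in df.groupby("category"):
    # Keep only the feature columns that have at least one value in this category
    X = group.drop(columns=["category", "demand_90d"]).dropna(axis=1, how="all")
    models[cat] = (LinearRegression().fit(X, group["demand_90d"]), list(X.columns))

def predict_demand(row: pd.Series) -> float:
    """Hypothetical helper: route a product to its category's model."""
    model, cols = models[row["category"]]
    return float(model.predict(row[cols].to_frame().T.astype(float))[0])

print(predict_demand(df.iloc[0]))
```

Storing the column list alongside each model keeps prediction consistent with training: a new product is routed to its category's model and only that category's features are passed in.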

If you can group some categories together based on business logic, you may be able to build fewer models.
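One way to act on such a grouping is a lookup table from category to group, then one model per group with the original category kept as a one-hot feature inside each group. This is a sketch only: the `CATEGORY_GROUP` mapping, the `price` feature, the numbers, and the use of `Ridge` are hypothetical choices made for illustration.

```python
import pandas as pd
from sklearn.linear_model import Ridge

# Hypothetical business-logic mapping from fine-grained categories to broader groups
CATEGORY_GROUP = {
    "t_shirt": "apparel",
    "dress_shirt": "apparel",
    "razor": "grooming",
    "shaving_cream": "grooming",
}

# Made-up numbers, for illustration only
df = pd.DataFrame({
    "category": ["t_shirt", "dress_shirt", "t_shirt", "dress_shirt",
                 "razor", "shaving_cream", "razor", "shaving_cream"],
    "price": [10.0, 30.0, 12.0, 28.0, 8.0, 4.0, 9.0, 5.0],
    "demand_90d": [120.0, 40.0, 110.0, 45.0, 80.0, 150.0, 75.0, 140.0],
})
df["group"] = df["category"].map(CATEGORY_GROUP)

group_models = {}
for group_name, part in df.groupby("group"):
    # Within a group, the original category becomes a one-hot feature
    X = pd.get_dummies(part[["category", "price"]], columns=["category"], dtype=float)
    group_models[group_name] = (Ridge(alpha=1.0).fit(X, part["demand_90d"]), list(X.columns))

print(len(group_models), "models for", df["category"].nunique(), "categories")
# prints: 2 models for 4 categories
```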

ombk · Answer

The first question is: what is the motivation?
If you faced this problem in real life, for your own business, how would you tackle it?
You should definitely split the data by category. This is similar to feature engineering: you want the best data for predicting each category.
Create a model for each of the labels you want to predict, and make sure to choose the most predictive features for each one.
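Choosing the most predictive features can be done inside each model's pipeline, so that every per-category model ends up with its own subset. The sketch below assumes univariate selection with `SelectKBest` and `f_regression`, which is only one of several possible selection methods. The feature names and the synthetic data are hypothetical. In a per-category setup, the same pipeline would be fit separately on each category's rows.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Hypothetical feature names for one category's data
feature_names = np.array(["price", "promo_weeks", "color_count", "noise"])
X = rng.normal(size=(300, 4))
# Synthetic target that depends only on the first two columns
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Keep the 2 features with the highest univariate F-scores, then fit the regressor
pipe = make_pipeline(SelectKBest(score_func=f_regression, k=2), LinearRegression())
pipe.fit(X, y)

# make_pipeline names steps after the lowercased class names
selected = feature_names[pipe.named_steps["selectkbest"].get_support()]
print(selected)
```

Because selection happens inside the pipeline, it is refit whenever the pipeline is, which keeps it from seeing validation data during cross-validation.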
