Data Science Asked by Alex Dore on April 15, 2021
I am creating a few models based on service requests. The requested services are not evenly distributed: some are used sparingly, whereas others are quite common.
I had these services as categorical variables and built pipelines to incorporate them through one-hot encoding. I got to thinking that it may make more sense to train a model per service (at least for the common ones). Or does it make more sense to lump the less common ones into a special category?
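For illustration only, here is a minimal sketch of the "lump the rare ones together" option, written in plain Python rather than with any particular pipeline library. The service names, the `min_count` threshold, and the helper names `lump_rare` and `one_hot` are all made up for this example; they are not from my actual data or code.

```python
from collections import Counter


def lump_rare(values, min_count):
    """Replace every label seen fewer than min_count times with 'other'."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else "other" for v in values]


def one_hot(values):
    """One-hot encode labels; columns follow sorted label order."""
    columns = sorted(set(values))
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


# Hypothetical service labels, invented for illustration.
services = ["pothole"] * 5 + ["graffiti"] * 4 + ["noise"] + ["tree"]

# "noise" and "tree" each appear once, so both fall below the threshold.
lumped = lump_rare(services, min_count=3)
columns, rows = one_hot(lumped)

print(columns)
print(rows[-1])
```

With the threshold set this way, the two rare services share a single "other" column instead of each getting a column backed by one row.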
I am struggling with the regression model, which comes in at an R² of 0.41.
Is there a fundamental difference between a single model that uses the category as a feature and a separate model for each value of that category?
Yes, there is.
If a model is trained for each specific value of a variable (a category), then only the subset of data for this category can be used to train and test the model. As a consequence, each model has fewer instances to train on. Consequences:
In conclusion, the choice often depends on:
Correct answer by Erwan on April 15, 2021
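The answer's point that a per-category model only sees its own subset of the data can be sketched roughly as follows. This is not the answerer's code: the rows, the `service` and `hours` field names, and the helper `split_by_category` are hypothetical, chosen only to show how the training set shrinks once the data is partitioned.

```python
from collections import defaultdict


def split_by_category(rows, key):
    """Group rows by the value of one categorical field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Hypothetical service requests, invented for illustration.
rows = (
    [{"service": "pothole", "hours": 10.0 + i} for i in range(6)]
    + [{"service": "graffiti", "hours": 4.0 + i} for i in range(3)]
    + [{"service": "noise", "hours": 2.0}]
)

# A single model with the category as a feature can train on every row.
pooled_size = len(rows)

# One model per category can only train on that category's rows.
per_model_size = {
    name: len(group) for name, group in split_by_category(rows, "service").items()
}

print(pooled_size)
print(per_model_size)
```

In this toy data the pooled model has 10 rows to learn from, while the model for the rarest service would have a single row, which is too few to train and test on.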