Data Science Asked by David Masip on September 5, 2021
I am working on a ML model to be deployed in a product operating in many countries.
The issue that I am having is the following: should I train a single model with the data from all the countries, or should I train one model per country?
I've faced this problem several times, and to me there's a trade-off in the learning: in the first case, the model has more data to learn from and will be more robust (also, the solution is simpler). In the second case, I'll have a model more tailored to each country, and it will be able to capture effects that are specific to that country.
I'm very interested in knowing if there's an intermediate solution – a general model with some country-specific fine-tuning that can see all the data but also specialize in each specific country. If I were to use neural networks, this fine-tuning is natural – you train some epochs with all the data, then the last epochs with each specific country. I am wondering if something similar can be done with linear regression models and XGBoost, which are the models I generally use.
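For concreteness, the neural-network version of that fine-tuning would look roughly like the minimal sketch below (Keras, with made-up data, architecture and hyperparameters; in practice I would clone the general model once per country before fine-tuning):

    import numpy as np
    from tensorflow import keras

    rng = np.random.default_rng(0)

    # Made-up stand-in data: 5 features, 3 countries, a numeric target.
    n, n_features = 3000, 5
    X = rng.normal(size=(n, n_features)).astype("float32")
    country = rng.integers(0, 3, size=n)
    y = (X @ rng.normal(size=n_features) + 0.5 * country).astype("float32")

    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mse")

    # 1) General phase: some epochs on the data from every country.
    model.fit(X, y, epochs=10, batch_size=64, verbose=0)

    # 2) Fine-tuning phase: a few more epochs, with a lower learning rate,
    #    on the rows of one specific country only.
    model.compile(optimizer=keras.optimizers.Adam(1e-4), loss="mse")
    mask = country == 0
    model.fit(X[mask], y[mask], epochs=3, batch_size=64, verbose=0)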
Is there any literature on this? I think it is kind of a generic topic and there should be some.
I don't think there is a unique rule to answer that. It strongly depends on how pertinent the country information is relative to the other input data and to what you want to predict.
It is possible to face cases where similar input data in different countries lead to different outputs. In that case, it would be necessary either to add the country as an input feature or to create one model per country.
In other cases, the country information would not lead to any improvement in the model (so there is no need to build a specific model per country).
Finally, there are cases in which you will find both global information (whatever the country) and information specific to each country. In that case, there are multiple approaches to deal with it. The first and most common is to include the country as an input of your global model. As @Fnguyen mentioned, why deal with the country differently from the other inputs?
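For instance, a minimal sketch of that first approach, with the country one-hot encoded as just another input of a single global model (scikit-learn; the data and column names are entirely made up):

    import numpy as np
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder

    rng = np.random.default_rng(0)

    # Made-up frame: two numeric features plus the country label.
    df = pd.DataFrame({
        "x1": rng.normal(size=1000),
        "x2": rng.normal(size=1000),
        "country": rng.choice(["ES", "FR", "DE"], size=1000),
    })
    y = df["x1"] - 2 * df["x2"] + df["country"].map({"ES": 0.0, "FR": 1.0, "DE": -1.0})

    # One global model; the country enters as a one-hot encoded input.
    preprocess = ColumnTransformer(
        [("country", OneHotEncoder(handle_unknown="ignore"), ["country"])],
        remainder="passthrough",
    )
    model = make_pipeline(preprocess, Ridge(alpha=1.0))
    model.fit(df, y)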
Update
If you think that the country has a specific impact on the prediction, here is a non-exhaustive list of ways you could build models that reflect that assumption:
The list is non-exhaustive, and you must have a good reason to use such approaches, which give more weight to the country information. Normally, the machine learning algorithms would do this on their own.
Answered by etiennedm on September 5, 2021
I think the only objective criterion to decide this is to simply compare the performance of the candidate approaches over the validation data.
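For example, a rough sketch of such a comparison (scikit-learn; the data, model class and metric are arbitrary placeholders):

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Made-up data: three numeric features, a country column, a numeric target.
    df = pd.DataFrame(rng.normal(size=(2000, 3)), columns=["x1", "x2", "x3"])
    df["country"] = rng.choice(["ES", "FR", "DE"], size=len(df))
    df["y"] = df["x1"] - df["x2"] + df["country"].map({"ES": 0, "FR": 1, "DE": -1})

    train, valid = train_test_split(df, test_size=0.3, random_state=0)
    features = ["x1", "x2", "x3"]

    # Candidate A: one global model (a country-aware variant would also
    # one-hot encode the country, as suggested in the other answers).
    global_model = GradientBoostingRegressor().fit(train[features], train["y"])
    pred_global = global_model.predict(valid[features])

    # Candidate B: one model per country, applied to the matching validation rows.
    pred_local = pd.Series(index=valid.index, dtype=float)
    for c, part in train.groupby("country"):
        m = GradientBoostingRegressor().fit(part[features], part["y"])
        rows = valid["country"] == c
        pred_local[rows] = m.predict(valid.loc[rows, features])

    print("global      MAE:", mean_absolute_error(valid["y"], pred_global))
    print("per-country MAE:", mean_absolute_error(valid["y"], pred_local))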
That being said, if I were to blindly choose the approach upfront without any other information, I would choose a single model, where the model is aware of the country of each piece of data. This would let it model the peculiarities of each country while profiting from the combined training data.
If you have reasons to believe this is harming the global performance because of the intrinsic differences of some countries, you can apply boosting and let the performance of the classifiers speak for itself.
Answered by noe on September 5, 2021
First of all, in this use case we want the model to learn, or understand, the data first. This is similar to natural language processing, where one always tries to make the model learn a representation from the data. Here we can do something slightly tricky: declare the country as the target variable and the rest of the features as the inputs. We can at least train a model to learn the mapping from the input features to the country, so that the model has an understanding of how the inputs relate to the country. We can then use this model as part of the ensemble modeling suggested above. I think it would give a small accuracy increment and also be a cost-effective solution.
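If I understand it correctly, a minimal sketch of this idea could look as follows (scikit-learn, made-up data; in practice the auxiliary predictions should come from out-of-fold predictions to avoid leakage):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Made-up data: 5 features, 3 countries, a numeric target.
    X = rng.normal(size=(2000, 5))
    country = rng.integers(0, 3, size=2000)
    y = X[:, 0] + 0.5 * country + rng.normal(scale=0.1, size=2000)

    # 1) Auxiliary model: learn the mapping from the input features to the country.
    country_clf = LogisticRegression(max_iter=1000).fit(X, country)

    # 2) Feed its predicted country probabilities to the main model as extra
    #    inputs, a simple form of the ensembling mentioned above.
    X_aug = np.hstack([X, country_clf.predict_proba(X)])
    main_model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_aug, y)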
Answered by Gaurav Koradiya on September 5, 2021
In Hinton et al.'s paper "Distilling the Knowledge in a Neural Network", the following is mentioned (Section 5) when defining specialist models:
When the number of classes is very large, it makes sense for the cumbersome model to be an ensemble that contains one generalist model trained on all the data and many “specialist” models, each of which is trained on data that is highly enriched in examples from a very confusable subset of the classes (like different types of mushroom).
What they do is use a general model first, and then specialist models, each focusing on a different subset of the classes.
You could treat your problem similarly: instead of specialists for subsets of classes, specialists for countries. This way you could build an ensemble of models specific to each country (or cluster of countries).
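A loose transposition of that generalist/specialist idea to countries might look like the sketch below (scikit-learn, made-up data; the equal-weight blend at prediction time is an arbitrary choice, not what the paper does):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)

    # Made-up data: 5 features, 3 countries, a numeric target.
    X = rng.normal(size=(3000, 5))
    country = rng.choice(["ES", "FR", "DE"], size=3000)
    y = X[:, 0] - X[:, 1] + (country == "FR") * 1.0

    # One generalist trained on everything, one specialist per country
    # (or per cluster of countries).
    generalist = GradientBoostingRegressor().fit(X, y)
    specialists = {
        c: GradientBoostingRegressor().fit(X[country == c], y[country == c])
        for c in np.unique(country)
    }

    def predict(X_new, country_new):
        """Blend the generalist with the matching specialist for each row."""
        preds = 0.5 * generalist.predict(X_new)
        for c, m in specialists.items():
            rows = country_new == c
            if rows.any():
                preds[rows] += 0.5 * m.predict(X_new[rows])
        return preds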
Answered by Carlos Mougan on September 5, 2021
I think the most important thing you can do to bridge both assumptions is to include the country as a variable in the global model.
Should there be any country-specific effects, they will simply be modeled as interactions in the global model. This is how the model deals with any other variable anyway, so why should country be any different?
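For a tree ensemble those interactions are picked up automatically; for a linear model you would add them explicitly, roughly as in this sketch (made-up data and column names):

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    # Made-up frame: one numeric feature whose effect differs per country.
    df = pd.DataFrame({
        "x": rng.normal(size=1000),
        "country": rng.choice(["ES", "FR", "DE"], size=1000),
    })
    y = df["country"].map({"ES": 1.0, "FR": 2.0, "DE": -1.0}) * df["x"]

    # Country dummies plus explicit country-by-feature interaction columns.
    dummies = pd.get_dummies(df["country"], prefix="country").astype(float)
    interactions = dummies.mul(df["x"], axis=0).add_suffix("_x")
    X = pd.concat([df[["x"]], dummies, interactions], axis=1)

    model = LinearRegression().fit(X, y)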
I think the problem is much more complicated if the data is heavily imbalanced, e.g. some products are only sold in one country. However, this only becomes a problem at the point where training a global model is infeasible anyway.
Answered by Fnguyen on September 5, 2021
I think that @mirimo's idea of having a regularized model as an offset is very interesting.
My proposal is a slight variation where you ensure you don't overfit.
The idea is, to obtain the model for group $j$, train a model with all groups except $j$ and use that model as an offset to the model for group $j$. This way, we can have a complex model for the general behavior and still not train on the same target twice, thus having a more stable model.
The downside is that this is much slower: if there are $J$ groups, it takes around $J$ times longer than a single regular training.
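A minimal sketch of that leave-one-group-out scheme, for a squared-error loss where using an offset amounts to fitting the group model on the residuals (scikit-learn, made-up data; the model classes are arbitrary choices):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)

    # Made-up data: 5 features, 3 groups (countries), a numeric target.
    X = rng.normal(size=(3000, 5))
    group = rng.integers(0, 3, size=3000)
    y = X[:, 0] + 0.5 * group + rng.normal(scale=0.1, size=3000)

    models = {}
    for j in np.unique(group):
        in_j = group == j
        # General model trained on every group except j ...
        general = GradientBoostingRegressor().fit(X[~in_j], y[~in_j])
        # ... used as an offset: the group-j model only fits what is left over.
        residual = y[in_j] - general.predict(X[in_j])
        local = Ridge(alpha=1.0).fit(X[in_j], residual)
        models[j] = (general, local)

    def predict(X_new, j):
        """Prediction for group j: leave-j-out general model plus local correction."""
        general, local = models[j]
        return general.predict(X_new) + local.predict(X_new)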
Edit
On top of @Carlos Mougan's proposal, we can:
I think this shrinkage is very relevant to the problem.
Answered by David Masip on September 5, 2021
I don't have the theoretical resources to confirm this, but I think it's possible to train a first model on the whole dataset with a limited number of degrees of freedom (high regularization) and with the common features, which will allow you to capture the global trends, and then train local models on the residuals.
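A minimal sketch of this two-stage idea: a strongly regularised global linear model on the common features, then one local model per country trained on its residuals (made-up data, arbitrary model choices):

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)

    # Made-up data: 4 common features, 3 countries, a numeric target.
    X = rng.normal(size=(3000, 4))
    country = rng.integers(0, 3, size=3000)
    y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + 0.3 * country

    # 1) Heavily regularised global model capturing the global trends.
    global_model = Ridge(alpha=100.0).fit(X, y)
    residual = y - global_model.predict(X)

    # 2) One local model per country, trained on the residuals only.
    local_models = {
        c: LinearRegression().fit(X[country == c], residual[country == c])
        for c in np.unique(country)
    }

    def predict(X_new, country_new):
        """Global trend plus the local, country-specific correction."""
        preds = global_model.predict(X_new)
        for c, m in local_models.items():
            rows = country_new == c
            if rows.any():
                preds[rows] += m.predict(X_new[rows])
        return preds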
Answered by mirimo on September 5, 2021