Data Science Asked on May 10, 2021
I am solving a problem with machine learning. I have data with two integer-valued independent variables and a continuous dependent variable, and I am optimising for RMSE. I had a fairly large RMSE on my validation data. I learned that my model didn’t do well on larger values of the target, so I tried removing rows with larger target values, but that didn’t help. So, in the process of understanding the mistakes, I calculated the error for each ground-truth value and its prediction on the validation set and plotted it to see where the big mistakes happened. Apparently, my model still doesn’t do well at larger values of the target.
Here is the plot showing the relationship between ground-truth values and predictions:
As you can see, my model’s predictions get worse as the target values get larger. How do I prevent this?
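For reference, a minimal sketch of this kind of diagnostic (assuming a fitted `model` and validation arrays `X_val`, `y_val`, which are placeholders here) might look like:

```python
import numpy as np
import matplotlib.pyplot as plt

# Predict on the validation set and compute the per-point squared error
y_pred = model.predict(X_val)
per_point_error = (y_val - y_pred) ** 2

# Scatter plot of ground truth vs. prediction, with a perfect-prediction reference line
plt.scatter(y_val, y_pred, s=10, alpha=0.5)
plt.plot([y_val.min(), y_val.max()],
         [y_val.min(), y_val.max()], "r--")
plt.xlabel("Ground truth")
plt.ylabel("Prediction")
plt.show()
```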
Some information about my data (only what I can reveal):
What can I do to decrease my RMSE?
If my understanding is right, you have a regression problem with high-cardinality categorical features and "outliers" (or just large values).
How have you encoded the categories? Target encoding? Another option is to encode not with the mean but with the median, which in some cases can perform better.
In this notebook, you can see an implementation and the results of this method.
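A minimal sketch of median-based target encoding (not the notebook's exact code; column and frame names are hypothetical):

```python
import pandas as pd

def fit_median_encoding(train: pd.DataFrame, col: str, target: str):
    """Learn a category -> median(target) mapping from the training data."""
    medians = train.groupby(col)[target].median()
    fallback = train[target].median()  # used for categories unseen in training
    return medians, fallback

def apply_median_encoding(df: pd.DataFrame, col: str, medians, fallback):
    """Replace the categorical column with its learned median encoding."""
    return df[col].map(medians).fillna(fallback)

# Hypothetical usage: fit on the training split, apply to both splits
# medians, fallback = fit_median_encoding(train, "category_col", "target")
# train["category_col_enc"] = apply_median_encoding(train, "category_col", medians, fallback)
# valid["category_col_enc"] = apply_median_encoding(valid, "category_col", medians, fallback)
```

Fitting the encoding only on the training split avoids leaking validation targets into the features.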
Answered by Carlos Mougan on May 10, 2021
It looks like your predictions are clamping at 750.
Be mindful that a tree-based model can't predict a regression value outside the range of targets it was trained on.
So, first of all, please ensure that your data doesn't have a trend.
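As an illustration (a synthetic sketch, not the asker's data), a tree regressor trained on targets up to roughly 750 will never predict beyond that range:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 100, size=(500, 1))
y_train = 7.5 * X_train.ravel()        # training targets span roughly 0..750

tree = DecisionTreeRegressor().fit(X_train, y_train)

X_new = np.array([[150.0], [200.0]])   # inputs beyond the training range
print(tree.predict(X_new))             # predictions stay capped near 750
```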
Answered by 10xAI on May 10, 2021