Data Science — Asked on December 30, 2021
Hi all,
I am training a LightGBM model and have set all of the usual parameters that help guard against overfitting. I plotted the distribution (histogram/KDE) of the model's predicted probabilities (i.e. the probability that a case has cancer), after calibrating with a calibrated classifier. As you can see below, the probabilities for class 1 are concentrated at the upper and lower ends.
I have also tried playing with the KDE bandwidth, and it doesn't smooth the bumps much. What do you think this shows about my model? Isn't it a good thing that the model assigns higher probabilities to class 1 (has cancer)?
I am unsure how to interpret this, or where I could be going wrong.
The red curve is the positive class (has cancer) and the blue curve is the negative class. Below is the code used to generate the plot.
import matplotlib.pyplot as plt

# keep just the true label and the calibrated probability
results = df[['label', 'predicted_prob']]
colors = ['b', 'r']  # blue = negative class, red = positive class (has cancer)
for label in [0, 1]:
    results[results['label'] == label]['predicted_prob'].plot.kde(bw_method=0.35, color=colors[label])
plt.xlim(0, 1)
plt.show()
Such a plot doesn't really tell you much about overfitting.
First, check that your calibration has worked well; it's possible that an incorrect calibration has pushed the probabilities to the extremes. Otherwise, the distribution of probabilities being so extreme suggests the data just naturally separates into a segment of easy-to-detect cancers and the rest. Among the latter, it looks like you get reasonably good but not great rank-ordering of cases.
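One quick way to check both points is sketched below, under the assumption that your held-out labels and calibrated scores live in the same results frame as in the question's snippet. scikit-learn's calibration_curve gives a reliability diagram (a well-calibrated model tracks the diagonal), the Brier score summarizes calibration quality, and ROC AUC measures rank-ordering independently of calibration.

from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss, roc_auc_score
import matplotlib.pyplot as plt

# assumes 'results' holds held-out labels and calibrated probabilities,
# as in the question's snippet
y_true = results['label']
y_prob = results['predicted_prob']

# reliability diagram: bin the predictions and compare the mean predicted
# probability in each bin to the observed fraction of positives
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
plt.plot(prob_pred, prob_true, marker='o', label='model')
plt.plot([0, 1], [0, 1], linestyle='--', label='perfectly calibrated')
plt.xlabel('mean predicted probability')
plt.ylabel('observed fraction of positives')
plt.legend()
plt.show()

# Brier score (lower is better) reflects calibration and sharpness;
# ROC AUC reflects rank-ordering and is unaffected by monotone calibration
print('Brier score:', brier_score_loss(y_true, y_prob))
print('ROC AUC:', roc_auc_score(y_true, y_prob))

If the reliability curve hugs the diagonal but the AUC is only middling, the extreme bimodal shape reflects the data itself rather than a calibration problem.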
Answered by Ben Reiniger on December 30, 2021