Data Science Asked on December 9, 2021
I have an imbalanced data set where positives are just 10% of the whole sample. I am using logistic regression and random forest for classification. While comparing the results of these models, I have found that the probability output of the logistic regression spans the full [0, 1] range, while that of the random forest stays within [0, 0.6].
I cannot share the data set, but my doubt is about how these algorithms work: why does the random forest never produce a probability above 0.6?
For a random forest to output a probability of 1, every tree has to place the sample in a leaf containing only positive samples, because the forest's probability is the average of the per-tree leaf class fractions. Since that never happens here, either your features do not explain the variance of the output well or your model is under-fitted.
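A minimal sketch of that averaging behaviour, assuming the scikit-learn implementation and a synthetic imbalanced data set (the 90/10 class weights are illustrative, not your data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic imbalanced problem: ~10% positives, purely for illustration.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The forest's predict_proba is the mean of the per-tree class probabilities,
# i.e. the leaf class fractions of each tree.
per_tree = np.stack([tree.predict_proba(X) for tree in rf.estimators_])
print(np.allclose(per_tree.mean(axis=0), rf.predict_proba(X)))  # True

# So the maximum positive-class probability is capped by how "pure" the
# positive leaves are across trees.
print("max P(y=1):", rf.predict_proba(X)[:, 1].max())
```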
I suggest optimizing the hyper-parameters of your RF with cross-validation, and applying some oversampling to reduce the class imbalance in your data set, as sketched below.
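A hedged sketch of that suggestion, again assuming scikit-learn; the parameter grid, the naive random oversampling, and the ROC-AUC scoring are illustrative choices, not a prescription:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Naive random oversampling: repeat minority-class rows until both classes
# are the same size (a library such as imbalanced-learn could do this too).
pos = np.where(y_train == 1)[0]
neg = np.where(y_train == 0)[0]
extra = np.random.default_rng(0).choice(pos, size=len(neg) - len(pos))
X_bal = np.vstack([X_train, X_train[extra]])
y_bal = np.concatenate([y_train, y_train[extra]])

# Cross-validated hyper-parameter search for the forest.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"max_depth": [None, 5, 10], "min_samples_leaf": [1, 5, 20]},
    scoring="roc_auc",
    cv=5)
grid.fit(X_bal, y_bal)

print(grid.best_params_)
print("max P(y=1):", grid.best_estimator_.predict_proba(X_test)[:, 1].max())
```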
Answered by mirimo on December 9, 2021