Data Science Asked by Jonathan Ng on March 13, 2021
I am interested in finding the OOB score for a random forest in sklearn when it is used for a binary classification task with imbalanced classes. What does the OOB decision function (oob_decision_function_) mean in a random forest, and how do I get class predictions from it?
I read about the RandomForestClassifier OOB scoring method but am still not clear. Does the OOB decision function provide class probabilities, and if so, do I get the class predictions by taking whichever number is higher (e.g. by doing something like pred_train = np.argmax(forest.oob_decision_function_, axis=1))?
Since my classes are imbalanced, would it be correct to say that I can't use sklearn's default OOB score here, and that I should instead do the above to compute some kind of F1 score from the OOB predictions, to get a better estimate of my random forest's error?
Every tree gets its own OOB sample: the training points that were not drawn into that tree's bootstrap sample.
So a single data point can be in the OOB sample of multiple trees.
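To make this concrete, here is a small NumPy simulation of bootstrap sampling (the sample and tree counts are illustrative assumptions, not sklearn internals). With n points drawn with replacement, each point is left out of a given tree's bootstrap with probability (1 - 1/n)^n, roughly 0.37, so with 100 trees a typical point is OOB for about 37 of them.

```python
import numpy as np

# Illustrative sizes (assumptions, not taken from the question)
n_samples, n_trees = 1000, 100
rng = np.random.default_rng(0)

# oob_mask[t, i] is True when sample i was NOT drawn into tree t's bootstrap sample
oob_mask = np.zeros((n_trees, n_samples), dtype=bool)
for t in range(n_trees):
    boot = rng.integers(0, n_samples, size=n_samples)  # draw n indices with replacement
    in_bag = np.zeros(n_samples, dtype=bool)
    in_bag[boot] = True
    oob_mask[t] = ~in_bag

oob_counts = oob_mask.sum(axis=0)  # number of trees for which each point is OOB
print("fraction of points OOB per tree:", oob_mask.mean())
print("min / max trees per point:", oob_counts.min(), oob_counts.max())
```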
oob_decision_function_
calculates, for each data point, the aggregate (mean) predicted class probability across only those trees for which that data point is in the OOB sample.
The reason for pointing this out is that the OOB decision function gives you the mean of the probabilities, but it tells you nothing about their standard deviation across trees.
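The following is a hand-rolled sketch of the same idea, not sklearn's own code: it bags DecisionTreeClassifier models on bootstrap samples, records each tree's P(class 1) only for its OOB points, and then takes the per-point mean (the analogue of oob_decision_function_[:, 1]) alongside the standard deviation that the mean hides. The synthetic 90/10 data, tree count, and max_features="sqrt" are assumptions chosen for illustration; the numbers will not match a RandomForestClassifier exactly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced binary data (assumption: roughly a 90% / 10% split)
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0)
classes = np.unique(y)
n_samples, n_trees = X.shape[0], 200
rng = np.random.default_rng(0)

# oob_p1[t, i] holds tree t's P(class 1) for sample i, or NaN if i was in-bag for tree t
oob_p1 = np.full((n_trees, n_samples), np.nan)
for t in range(n_trees):
    boot = rng.integers(0, n_samples, size=n_samples)       # bootstrap indices
    oob_idx = np.setdiff1d(np.arange(n_samples), boot)      # points this tree never saw
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=t).fit(X[boot], y[boot])
    # Align probability columns in case a bootstrap sample happened to miss a class
    proba = np.zeros((len(oob_idx), len(classes)))
    proba[:, np.searchsorted(classes, tree.classes_)] = tree.predict_proba(X[oob_idx])
    oob_p1[t, oob_idx] = proba[:, 1]

n_oob = np.sum(~np.isnan(oob_p1), axis=0)  # how many trees treated each point as OOB
mean_p1 = np.nanmean(oob_p1, axis=0)       # mean OOB probability of class 1 per point
std_p1 = np.nanstd(oob_p1, axis=0)         # spread of that probability across trees

for i in range(5):
    print(f"sample {i}: n_oob={n_oob[i]}, mean={mean_p1[i]:.2f}, std={std_p1[i]:.2f}")
```

Two points can share the same mean while one has every OOB tree agreeing and the other has trees split down the middle; only the standard deviation distinguishes them.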
Does the oob decision function provide class probabilities,
Yes
and if so, do I get the class predictions by taking whichever number is higher (e.g. by doing something like pred_train = np.argmax(forest.oob_decision_function_,axis=1))?
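Yes. Note that np.argmax returns column indices (0, 1, ...), not labels; to get class labels, index forest.classes_ with them, e.g. forest.classes_[np.argmax(forest.oob_decision_function_, axis=1)]. For labels 0 and 1 the two coincide.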
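A minimal sketch of turning the OOB decision function into class predictions, assuming synthetic imbalanced data with hypothetical string labels "neg"/"pos" so that the difference between a column index and a class label is visible:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic imbalanced data; the string labels are hypothetical, for illustration only
X, y_int = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
y = np.where(y_int == 1, "pos", "neg")

forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)

proba = forest.oob_decision_function_  # shape (n_samples, n_classes), columns ordered as classes_
col = np.argmax(proba, axis=1)         # column index of the larger OOB probability
pred_train = forest.classes_[col]      # map column index -> actual class label

labels, counts = np.unique(pred_train, return_counts=True)
print(dict(zip(labels, counts)))
```

For a binary problem this is the same as predicting the second class whenever its OOB probability exceeds 0.5; np.argmax resolves an exact 0.5/0.5 tie in favour of the first column.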
Since my classes are unbalanced, would it be correct to say I can't use sklearn's default OOB score here
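The OOB score (oob_score_) is still computed with the default metric, i.e. accuracy, so it will not help much with imbalanced classes; an imbalance-aware metric such as F1 computed from the OOB predictions is more informative.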
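A sketch of scoring the OOB predictions with imbalance-aware metrics, assuming a synthetic 95/5 binary dataset. It also checks that oob_score_ is just the accuracy of the argmax OOB predictions, which is why it can look good on imbalanced data even when the minority class is handled poorly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score, roc_auc_score

# Synthetic imbalanced binary data (assumption: about 5% positives, labelled 1)
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
forest = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0).fit(X, y)

p_pos = forest.oob_decision_function_[:, 1]  # OOB probability of the minority class
pred = forest.classes_[np.argmax(forest.oob_decision_function_, axis=1)]

acc = accuracy_score(y, pred)
bal_acc = balanced_accuracy_score(y, pred)
f1 = f1_score(y, pred)          # F1 for the positive (minority) class
auc = roc_auc_score(y, p_pos)   # threshold-free ranking quality

print("oob_score_ (accuracy):", forest.oob_score_)
print("OOB accuracy, recomputed:", acc)
print("OOB balanced accuracy:", bal_acc)
print("OOB F1 (minority class):", f1)
print("OOB ROC AUC:", auc)
```

If you also choose a decision threshold other than 0.5 on p_pos, keep in mind that tuning it on the same OOB predictions you then score makes the resulting F1 optimistic.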
Correct answer by 10xAI on March 13, 2021