Data Science Asked on July 14, 2021
Is there an efficient way to handle pruning in a decision tree with Python?
Currently I'm doing this:
from sklearn.tree import DecisionTreeClassifier
from tqdm import tqdm

def do_best_tree(Xtrain, ytrain, Xtest, ytest):
    # Fit an unpruned tree and compute its cost-complexity pruning path
    clf = DecisionTreeClassifier()
    clf.fit(Xtrain, ytrain)
    path = clf.cost_complexity_pruning_path(Xtrain, ytrain)
    ccp_alphas = path.ccp_alphas
    # Refit one tree per candidate alpha
    clfs = []
    for ccp_alpha in tqdm(ccp_alphas):
        clf = DecisionTreeClassifier(ccp_alpha=ccp_alpha)
        clf.fit(Xtrain, ytrain)
        clfs.append(clf)
    # Keep the tree that scores best on the held-out set
    return max(clfs, key=lambda x: x.score(Xtest, ytest))
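For reference, a minimal way to exercise the function above (the load_iris data and train_test_split split are placeholders, not part of the question):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Hypothetical data purely to demonstrate the call
X, y = load_iris(return_X_y=True)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0)

best = do_best_tree(Xtrain, ytrain, Xtest, ytest)
print(best.ccp_alpha, best.score(Xtest, ytest))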
But it's super slow (as it creates and fits a lot of trees).
Is there a more efficient way to do this with scikit-learn, or another library that handles this?
You might benefit from random forests instead, which aim at the same objective you are pursuing: better generalization by limiting overfitting, which is what pruning does for a single tree.
scikit-learn's random forest lets you specify how many features, or what proportion of them, each split is allowed to consider (the max_features parameter); the results of the many trees are then averaged for even better generalization performance. A minimal sketch of that suggestion follows.
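In this sketch, the n_estimators and max_features values are illustrative choices, and Xtrain/ytrain/Xtest/ytest are assumed to be the same arrays as in the question:

from sklearn.ensemble import RandomForestClassifier

# Each tree is grown on a bootstrap sample and, at every split,
# considers only a random subset of the features (here the square
# root of the feature count); averaging many such decorrelated
# trees curbs overfitting without an explicit pruning step.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt")
rf.fit(Xtrain, ytrain)
print(rf.score(Xtest, ytest))

RandomForestClassifier also accepts a ccp_alpha parameter if you still want cost-complexity pruning applied to each tree in the ensemble.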
Answered by Nitin on July 14, 2021