Data Science Asked by frigidelirium on July 30, 2021
I am trying to predict whether or not a customer will cancel a booking, based on several parameters such as trip type, car type, source of booking, start time, lead time (start time minus booking time) and a few others. In the code below, default.ct, the first classification tree I fit, gives an accuracy of 75%. deeper.ct, the deeper tree, gives an accuracy of 70%. As I prune progressively, the accuracy of the pruned tree stays about the same. Boosting with the adabag package takes far too long because I have nearly 500,000 observations across 19 variables. xgboost gives me the best mlogloss value, at about 0.43.
What can I do to improve the accuracy of the model?
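# Load the packages used below
library(rpart)       # rpart(), prune(), printcp()
library(rpart.plot)  # prp()
library(caret)       # confusionMatrix()
# Generate classification tree
default.ct <- rpart(tag ~ ., data = train.df, method = "class",
                    control = rpart.control(minsplit = 2, minbucket = 1, cp = 0.001))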
summary(default.ct)$used
printcp(default.ct)
# plot the tree
prp(default.ct, type = 1, extra = 1, under = TRUE, split.font = 1, varlen = -10)
# generate confusion matrix for training data
default.ct.point.pred.train <- predict(default.ct, train.df, type = "class")
confusionMatrix(default.ct.point.pred.train, train.df$tag)
# grow a deeper tree with no complexity penalty
deeper.ct <- rpart(tag ~ ., data = train.df, method = "class", cp = 0,
                   minsplit = 1)
# count number of leaves
sum(deeper.ct$frame$var == "<leaf>")
## Use cross-validation to prune the tree
cv.ct <- rpart(tag ~ ., data = train.df, method = "class", cp = 0, minsplit = 5,
               xval = 5)
# use printcp() to print the table.
printcp(cv.ct)
# Store validation accuracy for each cp in acc and print it out
# (named acc rather than c to avoid masking base R's c() function)
acc <- numeric(nrow(cv.ct$cptable))
for (i in seq_len(nrow(cv.ct$cptable))) {
  pruned.ct <- prune(cv.ct, cp = cv.ct$cptable[i, "CP"])
  pruned.ct.point.pred.valid <- predict(pruned.ct, valid.df, type = "class")
  acc[i] <- confusionMatrix(pruned.ct.point.pred.valid, valid.df$tag)$overall["Accuracy"]
}
acc
# prune the tree with the second-largest cp and use it to predict validation data
pruned.ct <- prune(cv.ct, cp = cv.ct$cptable[2, "CP"])
sum(pruned.ct$frame$var == "<leaf>")
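As a rough sketch of the cross-validation pruning step above: instead of fixing the second-largest cp, a common convention is to take the cp whose cross-validated error (the "xerror" column of the cptable) is lowest, then compare training and validation accuracy of the pruned tree. This is a swapped-in selection rule, not the one used in the question. The data frame toy, its columns lead_time and trip_type, and the helper acc.of are all hypothetical stand-ins, since train.df and valid.df are not available here.

```r
library(rpart)  # recommended package bundled with standard R installations

set.seed(1)

# Hypothetical stand-in data (invented for illustration only)
n <- 2000
toy <- data.frame(
  lead_time = runif(n, 0, 30),
  trip_type = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
p.cancel <- plogis(-1 + 0.08 * toy$lead_time + ifelse(toy$trip_type == "c", 1, 0))
toy$tag <- factor(ifelse(runif(n) < p.cancel, "cancel", "keep"))

# 70/30 train/validation split
idx <- sample(n, 0.7 * n)
toy.train <- toy[idx, ]
toy.valid <- toy[-idx, ]

# Grow a large tree; rpart runs 5-fold cross-validation internally (xval = 5)
cv.toy <- rpart(tag ~ ., data = toy.train, method = "class",
                control = rpart.control(cp = 0, minsplit = 5, xval = 5))

# Pick the cp with the lowest cross-validated error
cpt <- cv.toy$cptable
best.cp <- cpt[which.min(cpt[, "xerror"]), "CP"]
pruned.toy <- prune(cv.toy, cp = best.cp)

# Accuracy helper using base R only (hypothetical name)
acc.of <- function(model, df) {
  mean(predict(model, df, type = "class") == df$tag)
}

# A large gap between the two numbers suggests the tree is overfitting
result <- c(train = acc.of(pruned.toy, toy.train),
            valid = acc.of(pruned.toy, toy.valid))
print(result)
```

On the real data, the same pattern would apply with train.df and valid.df in place of the toy frames. Reporting both numbers side by side makes it easier to tell whether a deeper tree is actually better or only memorizing the training set.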
You have mentioned time issues when you tried ensemble methods. To solve that:
Answered by Rohan on July 30, 2021