Data Science Asked by znoris007 on April 27, 2021
I need to try out a few different machine learning methods (SVM, logistic regression, etc.), predict a value that is either true or false, and record the AUC and accuracy of these predictions.
I have already done that successfully. Now I have two matrices, one for AUC and one for accuracy, each filled with the results from SVM and logistic regression (one row).
Now I need to build the SVM and logistic regression models 10 more times (I should use bootstrap sampling), so that I end up with 10 rows of AUC and accuracy data. I have read multiple articles, guides, and tutorials, but I can't figure out how to achieve this. I also found and tried a few libraries (one is ROSE and the other is boot), and neither worked for me. If I understand the assignment correctly, I need to draw 10 different samples from my dataset and then split each one into training and test sets, so I can compare the models' AUC and accuracy and see how good those models actually are.
As I said, I found multiple sources, and the best thing I came up with is this:
for (i in 1:10) {
  set.seed(123)
  ##########################
  ##########################
  boot.sample = sample(n, 1000, replace = TRUE)
  bootSample = dataset[boot.sample, ]
  bootSample
  split = sample.split(bootSample$blueWins, SplitRatio = 0.80)
  training = subset(bootSample, split == TRUE, replace = TRUE)
  test = subset(bootSample, split == FALSE, replace = TRUE)
  print(training)
}
With this approach, though, I think set.seed messes everything up, because every iteration works with the same data. However, I think the assignment wants me to use the same seed for every machine learning model.
I may have overcomplicated the whole thing; I am new to R.
I hope someone can clear this up.
Thanks
Try using a different seed for each iteration of the loop. You can do it like this:
library(caTools)  # provides sample.split()
my_seeds <- 1:10  # Ten seeds: 1, 2, 3, ..., 10. Change to whatever you like.
for (i in 1:10) {
  set.seed(my_seeds[i])  # a different seed on each iteration
  ##########################
  ##########################
  # n is assumed to be the number of rows in dataset
  boot.sample = sample(n, 1000, replace = TRUE)  # row indices drawn with replacement
  bootSample = dataset[boot.sample, ]
  split = sample.split(bootSample$blueWins, SplitRatio = 0.80)
  training = subset(bootSample, split == TRUE)   # subset() has no use for a replace argument
  test = subset(bootSample, split == FALSE)
  print(training)
}
Answered by bstrain on April 27, 2021
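To see why the fixed seed in the question keeps producing the same data, here is a minimal sketch in base R. The sample sizes (5 draws from 1:100, 3 iterations) and the names same_seed and diff_seed are illustrative assumptions, not part of the original code.

```r
# Resetting the SAME seed on every iteration: each draw starts from the
# same generator state, so the sampled indices repeat exactly.
same_seed <- lapply(1:3, function(i) {
  set.seed(123)
  sample(100, 5, replace = TRUE)
})

# Using a DIFFERENT seed per iteration: each draw starts from its own
# generator state, and each one can be reproduced later from its seed.
diff_seed <- lapply(1:3, function(i) {
  set.seed(i)
  sample(100, 5, replace = TRUE)
})

identical(same_seed[[1]], same_seed[[2]])  # TRUE: the "bootstrap" samples are copies
identical(diff_seed[[1]], diff_seed[[2]])  # compare two differently seeded draws

# Any single iteration can be recreated from its seed alone.
set.seed(2)
redo_second <- sample(100, 5, replace = TRUE)
identical(redo_second, diff_seed[[2]])
```

The same reasoning applies to sample.split: it also consumes random numbers, so with a repeated seed the train/test split repeats as well.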
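You can set the seed once, outside the loop. The random number generator then advances on every iteration, so each bootstrap sample is different while the whole run stays reproducible: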
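library(caTools)  # provides sample.split()
set.seed(123)     # set once; the generator state carries over between iterations
for (i in 1:10) {
  ##########################
  ##########################
  # n is assumed to be the number of rows in dataset
  boot.sample = sample(n, 1000, replace = TRUE)  # row indices drawn with replacement
  bootSample = dataset[boot.sample, ]
  split = sample.split(bootSample$blueWins, SplitRatio = 0.80)
  training = subset(bootSample, split == TRUE)   # subset() has no use for a replace argument
  test = subset(bootSample, split == FALSE)
  print(training)
}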
Answered by Ruin Donas on April 27, 2021
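Putting the pieces together, the loop could fill the 10-row AUC and accuracy matrices roughly as follows. This is only a sketch under stated assumptions: the data are synthetic, the predictors x1 and x2 and the helper auc_rank are hypothetical, the 80/20 split uses base sample() rather than the stratified sample.split() from the question, and only logistic regression is fitted. A second column for SVM (for example via the e1071 package) could be filled the same way.

```r
# Synthetic stand-in for the real data. The column name blueWins follows the
# question; the predictors x1 and x2 are hypothetical.
set.seed(123)  # set once, before the loop
n <- 500
dataset <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dataset$blueWins <- rbinom(n, 1, plogis(1.5 * dataset$x1 - dataset$x2))

# AUC from the rank (Mann-Whitney) formula, so no extra package is needed.
auc_rank <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

n_boot <- 10
auc_mat <- matrix(NA_real_, nrow = n_boot, ncol = 1,
                  dimnames = list(NULL, "logistic"))
acc_mat <- matrix(NA_real_, nrow = n_boot, ncol = 1,
                  dimnames = list(NULL, "logistic"))

for (i in seq_len(n_boot)) {
  # Bootstrap sample: row indices drawn with replacement.
  boot_idx <- sample(n, n, replace = TRUE)
  boot_sample <- dataset[boot_idx, ]

  # Simple (unstratified) 80/20 split of the bootstrap sample.
  train_idx <- sample(nrow(boot_sample), size = round(0.8 * nrow(boot_sample)))
  training <- boot_sample[train_idx, ]
  test <- boot_sample[-train_idx, ]

  # Fit logistic regression and predict class-1 probabilities on the test rows.
  fit <- glm(blueWins ~ x1 + x2, data = training, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")

  # Row i of each matrix holds the metrics for bootstrap sample i.
  auc_mat[i, "logistic"] <- auc_rank(prob, test$blueWins)
  acc_mat[i, "logistic"] <- mean((prob > 0.5) == (test$blueWins == 1))
}

print(cbind(AUC = auc_mat[, "logistic"], Accuracy = acc_mat[, "logistic"]))
```

One caveat with this design: because a bootstrap sample contains duplicated rows, copies of the same observation can land in both the training and the test set, which tends to make the scores look better than they are. Evaluating on the rows that were not drawn into the bootstrap sample is a common alternative.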