Resampling train and test data in R

Question

I need to try out a few different machine learning methods (SVM, logistic regression, etc.), predict a value that is either true or false, and record the AUC and accuracy of these predictions.
I have already done that successfully. I now have two matrices, one for AUC and one for accuracy, each filled with one row of results from SVM and logistic regression.
Now I need to fit the SVM and logistic regression models 10 more times (using bootstrap sampling), so that I end up with 10 rows of AUC and accuracy data. I have read multiple articles and tutorials, but I can't figure out how to achieve this. I also tried a few libraries (ROSE and boot), and neither worked for me. If I understand the assignment correctly, I need to draw 10 different samples from my dataset and then split each one into train and test sets, so I can compare the models' AUC and accuracy and see how good the models actually are.
As I said, I found multiple sources, and the best I came up with is this:
    for (i in 1:10){
      set.seed(123)
      ##########################
      ##########################
      boot.sample = sample(n, 1000, replace = TRUE)
      bootSample = dataset[boot.sample, ]
      bootSample

      split = sample.split(bootSample$blueWins, SplitRatio= 0.80)
      training = subset(bootSample, split == TRUE,  replace=TRUE)
      test = subset(bootSample, split == FALSE,  replace=TRUE)
      print(training)
    }

But with this approach I think set.seed messes everything up, because the loop works with the same data every time. However, I think the assignment wants me to use the same seed for every machine learning model.
Maybe I have overcomplicated the whole thing; I am new to R.
I hope someone can clear this up.
Thanks

bstrain · Answer

Try using a different seed for each iteration of the loop. You can do it like this:
    library(caTools)  # provides sample.split()

    my_seeds <- 1:10    # one seed per iteration: 1, 2, 3, ..., 10. Change to whatever.
    n <- nrow(dataset)  # assumption: n is the number of rows in the dataset
    for (i in 1:10) {
      set.seed(my_seeds[i])
      # Bootstrap sample: draw 1000 row indices with replacement
      boot.sample <- sample(n, 1000, replace = TRUE)
      bootSample <- dataset[boot.sample, ]

      # 80/20 split that keeps the class ratio of blueWins
      split <- sample.split(bootSample$blueWins, SplitRatio = 0.80)
      training <- subset(bootSample, split == TRUE)
      test <- subset(bootSample, split == FALSE)
      print(training)
    }
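To see why the seed placement matters, here is a small base-R sketch (the list names `same` and `varied` are made up for illustration). Resetting the same seed inside the loop restarts the random number generator each time, so every iteration draws identical indices; a per-iteration seed gives different draws that are still reproducible.

```r
# Same seed reset in every iteration: every draw is identical.
same <- vector("list", 3)
for (i in 1:3) {
  set.seed(123)
  same[[i]] <- sample(100, 5, replace = TRUE)
}

# A different seed per iteration: the draws differ, but rerunning
# the loop reproduces exactly the same three samples.
varied <- vector("list", 3)
for (i in 1:3) {
  set.seed(i)
  varied[[i]] <- sample(100, 5, replace = TRUE)
}

print(identical(same[[1]], same[[2]]))      # TRUE: same seed, same sample
print(identical(varied[[1]], varied[[2]]))  # FALSE: different seeds
```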

Ruin Donas · Answer

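You can set the seed once, outside the loop. Each iteration then continues the same random stream and draws a different sample, while the run as a whole stays reproducible:
    library(caTools)  # provides sample.split()

    set.seed(123)
    n <- nrow(dataset)  # assumption: n is the number of rows in the dataset
    for (i in 1:10) {
      # Bootstrap sample: draw 1000 row indices with replacement
      boot.sample <- sample(n, 1000, replace = TRUE)
      bootSample <- dataset[boot.sample, ]

      # 80/20 split that keeps the class ratio of blueWins
      split <- sample.split(bootSample$blueWins, SplitRatio = 0.80)
      training <- subset(bootSample, split == TRUE)
      test <- subset(bootSample, split == FALSE)
      print(training)
    }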
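To get the 10 rows of AUC and accuracy the question asks for, the loop can write one row per iteration into a preallocated matrix. The following is only a sketch under several assumptions: the data frame is synthetic (only the column name `blueWins` comes from the question; `x1` and `x2` are invented predictors), it uses base R only, it fits just a logistic regression with `glm`, and `auc_rank` is a hypothetical helper that computes AUC from ranks. An SVM would get its own matrix, or extra columns, filled in the same way.

```r
# Synthetic stand-in for the asker's `dataset` (x1, x2 are invented).
set.seed(123)  # set once, before the loop
n_obs <- 500
dataset <- data.frame(x1 = rnorm(n_obs), x2 = rnorm(n_obs))
dataset$blueWins <- rbinom(n_obs, 1, plogis(1.5 * dataset$x1 - dataset$x2))

# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) formula.
auc_rank <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

n_rep <- 10
results <- matrix(NA_real_, nrow = n_rep, ncol = 2,
                  dimnames = list(NULL, c("AUC", "Accuracy")))

for (i in seq_len(n_rep)) {
  # Bootstrap sample: draw row indices with replacement.
  idx <- sample(nrow(dataset), nrow(dataset), replace = TRUE)
  boot <- dataset[idx, ]

  # 80/20 train/test split of the bootstrap sample.
  train_idx <- sample(nrow(boot), floor(0.8 * nrow(boot)))
  training <- boot[train_idx, ]
  test <- boot[-train_idx, ]

  # Logistic regression; predicted probabilities on the test set.
  fit <- glm(blueWins ~ x1 + x2, data = training, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")

  results[i, "AUC"] <- auc_rank(prob, test$blueWins)
  results[i, "Accuracy"] <- mean((prob > 0.5) == (test$blueWins == 1))
}

print(results)
```

One caveat with this design: because a bootstrap sample contains repeated rows, splitting it afterwards can put copies of the same row in both the training and the test set, which tends to make the scores look better than they are. If the assignment allows it, an alternative is to train on the bootstrap sample and test on the rows that were not drawn (`dataset[-unique(idx), ]`).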
