Data Science Asked by J1J1P13 on May 7, 2021
While studying scikit-learn's kNN algorithm, I realized that if I use sklearn.model_selection.train_test_split, the provided data is automatically split into a training set and a test set, according to the proportions passed as parameters.
Then, based on the training data, the algorithm looks at the k nearest neighbors of each test point to decide which class that test point belongs to.
I was wondering whether there is a way to predict the class NOT for the test set, which was already part of the provided data, but for brand-new data that was never provided during the whole process.
Is there a way to do that using scikit-learn?
KNN is not fitted to "the k-nearest neighbor points closest to the test data points". You fit the model explicitly on training data, like:
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
Usually this will be xtrain, ytrain, while you test the model's performance using "new" (unseen) data and compare the true targets to the predictions:
neigh.predict(xtest)
or
neigh.predict_proba(xtest)
See docs: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
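The fit/predict workflow described above can be sketched end to end as follows. The toy two-blob dataset and the choice of n_neighbors=3 are assumptions for illustration; any labeled X, y would work:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy data: two well-separated blobs of 2-D points (assumption for this sketch)
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Split the provided data into training and test sets
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the classifier on the training data only
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(xtrain, ytrain)

# Predict labels and class probabilities for the held-out test set
pred = neigh.predict(xtest)
proba = neigh.predict_proba(xtest)
print(pred.shape, proba.shape)
```

The same predict call works on any array of new points with the same number of features, which is what the question is asking for.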
Correct answer by Peter on May 7, 2021
After your initial validation of the model using the train-test split, if you are satisfied with the performance, you can create a final model by training on the entire dataset. That way you put all available labeled data to use for running inference on brand-new data.
You would simply perform:
model = KNeighborsClassifier()
model.fit(X, y)
where X, y represent your entire labeled dataset.
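A minimal sketch of this final step, assuming the same kind of toy two-blob data as before (the data and the unseen query points are assumptions for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# All available labeled data (assumption: two blobs around (0, 0) and (4, 4))
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Final model trained on the ENTIRE labeled dataset
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Brand-new points that were never part of X
X_new = np.array([[0.0, 0.0], [4.0, 4.0]])
labels = model.predict(X_new)  # one predicted class per new point
print(labels)
```

Note that the new points only need the same number of feature columns as X; they never have to appear in the data used for fitting.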
Answered by Jayaram Iyer on May 7, 2021