Data Science Asked by tristar8 on September 4, 2021
I’m working on a machine learning problem involving inventory (i.e. physical retail stock). However, during the cleaning (outlier removal) process, some of the items will be removed because their corresponding transactions are dropped. Therefore, I thought of using KNN to group similar items into categories.
There are 1245 items
The info for each item is
Am I right in thinking that KNN is a good option – and if so, how do I decide on the number of clusters?
Training: You can use a distance metric to compute the distance between all observations along the dimensions of your observed variables (Avg. Weight, Price, Tot. Quant. Sold, etc.). For each observation (row, or sample) i, the point with the smallest distance from that observation is its nearest neighbor. The point with the second smallest distance is the 2nd nearest neighbor, and so on.
Prediction: You can find the nearest neighbors for new data by calculating the distance from each new point to every point in the training data, as above. A predicted label is then assigned, usually by taking the most common label among each new point's k nearest neighbors in the training data. Hence k-NN classification:
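As a rough illustration of the distance step, here is a minimal sketch using scikit-learn's `NearestNeighbors`. The four rows and the three feature columns are made-up values, not the asker's data, and scaling the features first is an assumption on my part (without it, the column with the largest numbers tends to dominate the distance).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Hypothetical item features: [avg. weight, price, total quantity sold]
items = np.array([
    [0.5, 2.0, 120.0],
    [0.6, 2.2, 110.0],
    [5.0, 40.0, 8.0],
    [5.5, 42.0, 6.0],
])

# Put all features on a comparable scale so no single column dominates
scaled = StandardScaler().fit_transform(items)

# Ask for 2 neighbors: when querying the training points themselves,
# each point's closest match is itself (distance 0)
nn = NearestNeighbors(n_neighbors=2, metric="euclidean").fit(scaled)
distances, indices = nn.kneighbors(scaled)

# Column 0 is the point itself; column 1 is its nearest *other* item
nearest_other = indices[:, 1]
print(nearest_other)
```

Each entry of `nearest_other` is the row index of the item closest to that row, which is the "nearest neighbor" described above.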
from sklearn.neighbors import KNeighborsClassifier

# train_data, train_labels and testset_data are placeholders for your own arrays
knn = KNeighborsClassifier(
    algorithm='auto',
    metric='minkowski',   # pick a distance metric
    metric_params=None,
    n_neighbors=5,        # take the majority label from the 5 nearest neighbors
    p=2,                  # power parameter for the Minkowski metric; p=2 is Euclidean distance
    weights='uniform',
)
knn.fit(train_data, train_labels)

# Find the predicted class of the test data:
knn.predict(testset_data)
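The value `n_neighbors=5` above is just a default. One common way to pick it, assuming you do have labeled items, is cross-validation. The sketch below uses synthetic data from `make_blobs` as a stand-in for real item features and labels, and the candidate values for k are arbitrary.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for labeled item data (3 made-up categories, 4 features)
X, y = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)

# Scale inside the pipeline so each CV fold is scaled using its own training split
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Candidate neighbor counts to compare (arbitrary choices for illustration)
param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9, 11]}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
best_score = search.best_score_  # mean cross-validated accuracy for best_k
print(best_k, best_score)
```

Note that this tunes the number of neighbors used for voting, not a number of clusters; k-NN classification does not create clusters.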
Answered by Dij on September 4, 2021
So your question is about how effective KNN is at categorizing items based on the features you listed above.
One thing to clear up first: the unsupervised clustering algorithm that creates K clusters with minimal intra-cluster variation is k-means, not KNN. KNN (k-nearest neighbors) is a supervised method that needs labeled examples, and its k is the number of neighbors consulted rather than a number of clusters. K-means is particularly useful when you know the number of groups K you need, and it is handy if you do not have category labels for your examples.
At the same time, k-means isn’t deterministic by default: it starts from randomly initialized centroids, so the groupings can vary between runs unless you fix the random seed.
From this information, you might get a better idea for yourself as to whether KNN would be useful for this task.
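Clustering of the kind described above, K groups with minimal intra-cluster variation and no labels required, is what scikit-learn's `KMeans` implements. As a hedged sketch of one way to choose the number of clusters, the code below compares silhouette scores for several values of K. The data is synthetic (`make_blobs` standing in for 1245 items with four numeric features), and the range of K values tried is an arbitrary assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 1245 unlabeled items with 4 numeric features
X, _ = make_blobs(n_samples=1245, centers=4, n_features=4, random_state=0)
X = StandardScaler().fit_transform(X)

scores = {}
for k in range(2, 9):
    # Fixing random_state makes the otherwise random initialization repeatable
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = model.fit_predict(X)
    # Silhouette score lies in [-1, 1]; higher means tighter, better separated clusters
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

The silhouette score is only one heuristic; an elbow plot of `model.inertia_` against K is another common choice, and domain knowledge about how many item categories make sense should weigh in as well.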
Answered by shepan6 on September 4, 2021