Using KNN to categorise inventory (physical stock items) - is it the best way?

Question

I'm working on a machine learning problem involving inventory (i.e. physical retail stock). During the cleaning (outlier removal) process, some of the items will be removed, via the removal of their corresponding transactions. I therefore thought of using KNN to group similar items into their respective categories.
There are 1245 items.
The info for each item is:

Average Weighted Price
Total Quantity Sold
Total Revenue Achieved
Min Sold per Transaction
Max Sold per Transaction
Min Sell Price
Max Sell Price
Number of Unique Transactions

Am I right in thinking that KNN is a good option - and if so, how do I decide on the number of clusters?
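For context, the per-item features listed above could be aggregated from transaction-level data roughly as follows. This is a sketch only: the transaction table, its column names (transaction_id, item_id, quantity, unit_price) and its values are hypothetical. It assumes one row per item per transaction and takes "Average Weighted Price" to mean the quantity-weighted average (total revenue divided by total quantity).

```python
import pandas as pd

# Hypothetical transaction-level data: one row per item per transaction.
# Column names and values are made up for illustration.
transactions = pd.DataFrame({
    "transaction_id": [1, 1, 2, 3, 3, 4],
    "item_id":        ["A", "B", "A", "A", "B", "B"],
    "quantity":       [2, 1, 5, 1, 3, 2],
    "unit_price":     [10.0, 4.0, 9.0, 11.0, 4.5, 3.5],
})

# Revenue of each transaction line.
transactions["revenue"] = transactions["quantity"] * transactions["unit_price"]

# Aggregate to one row per item, mirroring the features in the question.
items = transactions.groupby("item_id").agg(
    total_quantity_sold=("quantity", "sum"),
    total_revenue=("revenue", "sum"),
    min_sold_per_transaction=("quantity", "min"),
    max_sold_per_transaction=("quantity", "max"),
    min_sell_price=("unit_price", "min"),
    max_sell_price=("unit_price", "max"),
    n_unique_transactions=("transaction_id", "nunique"),
)

# Quantity-weighted average price = total revenue / total quantity sold.
items["avg_weighted_price"] = items["total_revenue"] / items["total_quantity_sold"]
print(items)
```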

Dij · Answer

Training: Use a distance metric to compute the distance between every pair of observations along the dimensions of your observed variables (Avg. Weighted Price, Total Quantity Sold, etc.). For each observation (row, sample) i, the point with the smallest distance to it is its nearest neighbor, the point with the second smallest distance is its 2nd nearest neighbor, and so on.
Prediction: For new data, find the nearest neighbors by calculating each new point's distance to every point in the training data, as above. A predicted label is then assigned, usually the most common label among that point's k nearest neighbors. Note that this requires labelled training data, i.e. items whose category is already known. Hence k-NN classification:
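The neighbour ranking described above can be sketched directly in NumPy. This is a toy illustration: the two columns stand in for two of the question's features, and the values are made up. It uses Euclidean distance.

```python
import numpy as np

# Toy feature matrix: rows are items, columns are two features
# (e.g. average weighted price, total quantity sold). Values are made up.
X = np.array([
    [10.0, 100.0],
    [11.0, 105.0],
    [50.0, 20.0],
    [52.0, 18.0],
])

# Pairwise Euclidean distances: D[i, j] = ||X[i] - X[j]||.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

# Exclude each point from its own neighbour list.
np.fill_diagonal(D, np.inf)

# Column 0 of each row is the nearest neighbour, column 1 the 2nd nearest, ...
neighbour_order = np.argsort(D, axis=1)
print(neighbour_order[:, 0])  # nearest neighbour of each item
```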
from sklearn.neighbors import KNeighborsClassifier

# train_data: array of shape (n_items, n_features)
# train_labels: the known category of each training item
knn = KNeighborsClassifier(algorithm='auto',
                           metric='minkowski', # pick a distance metric
                           metric_params=None,
                           n_neighbors=5, # take the majority label from the 5 nearest neighbors
                           p=2, # Minkowski power parameter; p=2 gives Euclidean distance
                           weights='uniform') # each neighbor's vote counts equally

knn.fit(train_data, train_labels)

# Find the predicted class of each row of the test data:
knn.predict(testset_data)
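Because KNN relies on raw distances, features on very different scales (total revenue versus per-transaction minimums, say) can let the largest-valued feature dominate the metric. Standardising the features first is a common remedy. The following is a runnable sketch of the classifier above wrapped in a scikit-learn pipeline with StandardScaler; the data and the two category names (cheap_bulk, premium) are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Made-up items with two features (avg weighted price, total quantity sold)
# for two hypothetical categories whose features differ greatly in scale.
cheap_bulk = np.column_stack([rng.normal(2, 0.5, 50), rng.normal(5000, 500, 50)])
premium = np.column_stack([rng.normal(200, 20, 50), rng.normal(300, 50, 50)])
train_data = np.vstack([cheap_bulk, premium])
train_labels = np.array(["cheap_bulk"] * 50 + ["premium"] * 50)

# Standardise each feature to zero mean and unit variance before the
# distance computation, so no single feature dominates the Euclidean metric.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(train_data, train_labels)

testset_data = np.array([[2.5, 4800.0], [190.0, 320.0]])
print(knn.predict(testset_data))
```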

shepan6 · Answer

So your question is about how effective KNN is at categorising items based on the features you listed above.
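Note that what you are describing is k-means rather than KNN. k-means is an unsupervised clustering algorithm that partitions the data into K clusters so as to minimise intra-cluster variation (the sum of squared distances from each point to its cluster's centroid), whereas KNN (k-nearest neighbours) is a supervised method that needs labelled examples. k-means is particularly useful when you know how many groups K you need, and it is handy when you do not have category labels for your examples.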
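The clustering approach described above (K clusters with minimal intra-cluster variation) is what scikit-learn implements as KMeans. One common way to choose K, which is what the question asks about, is to fit several values and compare the inertia (the "elbow" method) and the silhouette score. The following is a sketch of that approach. It assumes a standardised feature matrix and uses make_blobs as a synthetic stand-in for the 1245 x 8 item data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 1245 items x 8 features (4 true groups).
X, _ = make_blobs(n_samples=1245, n_features=8, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)

scores = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    # inertia_ = within-cluster sum of squares; look for the "elbow"
    print(k, round(km.inertia_, 1), round(scores[k], 3))

# Higher silhouette means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
print("best k by silhouette:", best_k)
```

On real inventory data the curves are rarely this clean, so it is worth treating these scores as a guide alongside business knowledge of how many categories make sense.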
At the same time, k-means isn't deterministic: it starts from randomly chosen centroids, so the groupings (and the arbitrary cluster numbers) can vary between runs unless you fix the random seed.
With this in mind, you can judge for yourself whether k-means (or KNN, if you do have labelled items) is suitable for this task.
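The run-to-run variation mentioned above comes from random initialisation. The following is a small sketch on synthetic data showing that fixing scikit-learn's random_state makes KMeans reproducible. It also uses adjusted_rand_score to compare two partitions while ignoring the arbitrary cluster numbering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=1)

# Same seed -> same initial centroids -> identical cluster labels.
a = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
b = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print(np.array_equal(a, b))

# A different seed may permute the cluster IDs; the adjusted Rand index
# compares the groupings themselves (1.0 = identical partitions).
c = KMeans(n_clusters=3, n_init=10, random_state=7).fit_predict(X)
ari = adjusted_rand_score(a, c)
print(ari)
```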
