Calculating distance between data points when there are more than 3 features in KNN algorithm

Asked on Data Science, December 25, 2020

I’ve been reading about the k-nearest neighbors (KNN) algorithm and want to clarify a few things.

If we have 2 features, we can simply plot the data points on a 2-D plane and calculate the distance between them using the Euclidean or Manhattan distance.
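For concreteness, a minimal sketch of both distance calculations (the helper functions here are only illustrations; note that the same formulas apply unchanged to any number of features):

import math

def euclidean(a, b):
    # square root of the sum of squared per-feature differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # sum of absolute per-feature differences
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (1, 2), (4, 6)              # 2 features
print(euclidean(p, q))             # 5.0
print(manhattan(p, q))             # 7

p, q = (1, 2, 3, 4), (4, 6, 3, 0)  # 4 features: same formulas
print(euclidean(p, q))             # ~6.40
print(manhattan(p, q))             # 11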

When there are more than 3 features (4-D and beyond, which we can no longer plot), I’ve read that we use PCA to reduce the dimensionality to 2-D and then calculate distances on the PCA plot.

But my question is: is that the only way? In order to use KNN with more than 3 features, must we use PCA?

One Answer

No, you can definitely search for k nearest neighbors in data with more than 2 dimensions. Here is an example based on sklearn:

from sklearn.neighbors import NearestNeighbors

# three training points, each with 3 features
X = [[0, 0, 0], [3, 3, 3], [1, 2, 3]]

# find the 2 nearest neighbors of a 3-dimensional query point
neigh = NearestNeighbors(n_neighbors=2)
neigh.fit(X)
print(neigh.kneighbors([[2, 2, 2]]))  # returns (distances, indices)

PCA is used to reduce the input dimensionality, but it is not mandatory before searching for the k nearest neighbors (it is often used in tutorials so that the data can be visualized on a 2-D plot).
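To make that concrete, a minimal sketch comparing a direct neighbor search against one run after an (optional, purely illustrative) PCA projection:

from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

X = [[0, 0, 0], [3, 3, 3], [1, 2, 3]]
query = [[2, 2, 2]]

# k-NN directly on the 3-dimensional data: no PCA required
neigh = NearestNeighbors(n_neighbors=2).fit(X)
print(neigh.kneighbors(query))

# k-NN after projecting onto 2 principal components; the reduced
# space may rank neighbors differently from the original space
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
neigh_2d = NearestNeighbors(n_neighbors=2).fit(X_2d)
print(neigh_2d.kneighbors(pca.transform(query)))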

One thing to know about k-NN is that, if you plan to use it for classification, it treats features that carry a lot of information exactly the same way as features that carry none (assuming the features are normalized): every feature contributes equally to the distance. PCA can be used to mitigate this problem (it is not the only way and does not always work, but that is another question :) ).
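A minimal sketch with hand-picked numbers (a hypothetical two-point training set, not from the answer above) showing how an uninformative feature, once it contributes to the distance on the same scale, can flip which point is the nearest neighbor:

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Both features are already on a comparable (normalized) scale.
# Feature 0 is informative; feature 1 is pure noise.
X = np.array([[0.0, 1.0],    # point A
              [1.0, 0.0]])   # point B
query = np.array([[0.1, -0.1]])

# Using only the informative feature, A is the nearest neighbor
print(NearestNeighbors(n_neighbors=1).fit(X[:, :1]).kneighbors(query[:, :1]))

# Including the noise feature, B becomes the nearest neighbor
print(NearestNeighbors(n_neighbors=1).fit(X).kneighbors(query))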

Correct answer by etiennedm on December 25, 2020
