TransWikia.com

How to find nearest neighbors in SMOTE

Data Science Asked on December 1, 2020

I am reading the original paper by Chawla and others for SMOTE. I am trying to understand how to generate these synthetic examples for over-sampling the minority class. The paper says:

“Synthetic samples are generated in the following way: Take the difference between the feature vector (sample) under consideration and its nearest neighbor. Multiply this difference by a random number between 0 and 1, and add it to the feature vector under consideration. This causes the selection of a random point along the line segment between two specific features”.

I understand the idea: take your sample and its nearest neighbor, then pick a random point in between. What I don’t understand is how these nearest neighbors are defined.

2 Answers

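You need to compute the Euclidean distance between each minority-class sample and every other minority-class sample, then sort these distances. The k samples with the smallest distances are the k nearest neighbors.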

That said, you can find an implementation in the imbalanced-learn toolbox. More precisely, you can see the different steps of the implementation that you mentioned: (i) fit a k-nearest-neighbors (KNN) model, (ii) find the nearest neighbors of each sample, (iii) generate new samples.
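To make the three steps concrete, here is a minimal pure-Python sketch. It is not the imbalanced-learn code: the function names `k_nearest_neighbors` and `smote_sketch`, the toy data, and the choice to pick the base sample at random are all assumptions made for illustration. It uses a brute-force distance sort instead of a fitted KNN model.

```python
import math
import random


def k_nearest_neighbors(samples, index, k):
    """Return indices of the k samples closest (Euclidean) to samples[index]."""
    distances = [
        (math.dist(samples[index], other), j)
        for j, other in enumerate(samples)
        if j != index  # a point is not counted as its own neighbor
    ]
    distances.sort()  # smallest distance first
    return [j for _, j in distances[:k]]


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class samples."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # sample under consideration
        j = rng.choice(k_nearest_neighbors(minority, i, k))  # one of its k neighbors
        gap = rng.random()  # random number in [0, 1)
        # sample + gap * (neighbor - sample): a point on the connecting segment
        synthetic.append(
            [x + gap * (nx - x) for x, nx in zip(minority[i], minority[j])]
        )
    return synthetic


# Hypothetical toy minority class with four 2-D samples.
minority = [[1.0, 1.0], [1.5, 1.0], [1.0, 2.0], [5.0, 5.0]]
print(k_nearest_neighbors(minority, 0, k=2))  # -> [1, 2]
new_points = smote_sketch(minority, n_new=3, k=2)
```

Only minority-class samples are passed in, so the neighbors are searched within the minority class. For real data a library neighbor search would be much faster than this O(n²) sort.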

Correct answer by glemaitre on December 1, 2020

The nearest neighbor will be the sample with the smallest Euclidean distance.

Answered by Paulo on December 1, 2020
