Identify same product

Question

I am new to ML and still learning it.

My problem is to identify duplicate products. I have a dataset containing product details such as name, colour, size, description, features, etc. (there are roughly 70 columns).

I need to remove duplicate products.

I have just completed some supervised ML models (classification and regression) and unsupervised clustering (K-means and hierarchical clustering). I am also in the process of learning w2v and d2v.

But due to time constraints, I need to deliver a solution to the above problem statement. I am unsure how to proceed.

Any help and guidance would be appreciated.

Erwan · Answer

This problem is called record linkage. Various techniques can be used, usually involving a distance measure between records and/or approximate string matching between string fields.

FYI, it's quite a complex problem, especially if high-quality deduplication is expected and the volume of data is high.
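
As a minimal sketch of the idea above, the snippet below scores every pair of records with a weighted average of approximate string similarities per field, using only Python's standard library (`difflib.SequenceMatcher`). The sample products, the field weights, the 0.85 threshold and the helper names are all assumptions made for illustration; they would need tuning on real data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(str(value).lower().split())


def field_similarity(a, b):
    """Approximate string similarity in [0, 1] between two field values."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def record_similarity(rec_a, rec_b, weights):
    """Weighted average of per-field similarities (weights are assumed)."""
    total = sum(weights.values())
    score = sum(
        w * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, w in weights.items()
    )
    return score / total


def find_duplicate_pairs(records, weights, threshold=0.85):
    """Return (i, j, score) for every pair scoring at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = record_similarity(a, b, weights)
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical sample data with three of the ~70 columns.
products = [
    {"name": "Acme Running Shoe", "colour": "Blue", "size": "42"},
    {"name": "ACME running shoes", "colour": "blue", "size": "42"},
    {"name": "Zenith Leather Wallet", "colour": "Brown", "size": "One size"},
]
weights = {"name": 0.6, "colour": 0.2, "size": 0.2}

duplicate_pairs = find_duplicate_pairs(products, weights)
print(duplicate_pairs)
```

Comparing every pair grows quadratically with the number of records, which is one reason the problem gets hard at high volume. A common workaround is to compare only records that share some cheap key (for example the same brand or category), often called blocking.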

SrJ · Answer

You can run K-means clustering on your products and check whether some of them lie very close together (in the same cluster). Products in the same cluster can then be treated as similar. You will have to find the optimal number of clusters, k.
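
A rough sketch of this approach with scikit-learn is shown below. It assumes the text columns of each product have been joined into one string, turns them into character n-gram TF-IDF vectors, and picks k by the silhouette score. The sample rows, the n-gram range, the choice of silhouette score and the `pick_k` helper are assumptions for illustration, not part of the original suggestion.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical rows: each product's text columns joined into one string.
products = [
    "acme running shoe blue size 42",
    "acme running shoes blue 42",
    "zenith leather wallet brown",
    "zenith wallet leather brown",
    "orbit steel water bottle 750ml",
    "orbit water bottle steel 750 ml",
]

# Character n-grams tolerate small spelling and word-order differences.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vectorizer.fit_transform(products).toarray()


def pick_k(X, candidates):
    """Fit K-means for each candidate k and keep the best silhouette score."""
    best_k, best_score, best_labels = None, -1.0, None
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels


# The silhouette score needs between 2 and n_samples - 1 clusters.
best_k, labels = pick_k(X, range(2, len(products)))

# Group products by cluster so each group can be reviewed for duplicates.
clusters = {}
for text, label in zip(products, labels):
    clusters.setdefault(int(label), []).append(text)

for label, members in sorted(clusters.items()):
    print(label, members)
```

Sharing a cluster only indicates similarity, not identity, so the members of each cluster would still need a pairwise check before anything is removed.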

Answered by SrJ on November 30, 2020
