TransWikia.com

Are there deduplication algorithms that do not work on a metric space?

Data Science Asked by Imago on May 5, 2021

Recently I got interested in the process of data cleansing and specifically in record linkage.

Thus far I have read about deterministic and probabilistic approaches to deduplicating data sets and, to a lesser degree, about machine learning methods. It struck me that the key part of all these algorithms is that they introduce a metric space. Through the metric space, any two data points can be assigned a distance. The distance is then a measure of how closely the two data points are related to one another.
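To make the distance idea concrete, here is a minimal sketch of a distance-based comparison. Levenshtein edit distance is only one possible metric, and the function names and the threshold of 2 are assumptions chosen for illustration, not taken from any particular record linkage tool.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def is_probable_duplicate(a: str, b: str, max_distance: int = 2) -> bool:
    """Hypothetical rule: treat two values as duplicates if they are close.

    The threshold is an arbitrary assumption and would need tuning.
    """
    return levenshtein(a.casefold(), b.casefold()) <= max_distance
```

With this rule, "Ada Lovelace" and "ada lovelase" would be treated as probable duplicates, while "Ada Lovelace" and "Alan Turing" would not.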

However, I wonder whether there are also algorithms that do not work on this principle.

One Answer

One option is fingerprinting. If two objects have the same fingerprint, they are probably the same object. Depending on the technique used, the fingerprint may not be able to detect approximate duplicates.
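As a rough illustration, fingerprinting could look like the sketch below. It assumes records are plain dictionaries and that a SHA-256 hash over lightly normalized field values is an acceptable fingerprint. The helper names `fingerprint` and `group_duplicates` and the sample records are made up for this example.

```python
import hashlib
import unicodedata
from collections import defaultdict


def fingerprint(record: dict, fields: tuple) -> str:
    """Return a SHA-256 hex digest of the normalized values of `fields`."""
    parts = []
    for field in fields:
        value = str(record.get(field, ""))
        # Normalize Unicode, ignore case, and collapse runs of whitespace.
        value = unicodedata.normalize("NFKC", value).casefold()
        parts.append(" ".join(value.split()))
    # A separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def group_duplicates(records: list, fields: tuple) -> list:
    """Group record indices that share a fingerprint (no pairwise comparison)."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[fingerprint(record, fields)].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


# Hypothetical sample data.
records = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "  ada   LOVELACE ", "email": "ADA@example.com"},
    {"name": "Ada Lovelase", "email": "ada@example.com"},  # typo in name
]
duplicates = group_duplicates(records, ("name", "email"))
```

Here the first two records end up in the same group because normalization removes the case and whitespace differences. The third record has a one-letter typo, gets a completely different hash, and is not grouped, which is the limitation with approximate duplicates mentioned above.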

Answered by Brian Spiering on May 5, 2021
