Mapping a set of corrupted strings to the correct ones

Question

I am fairly new to data science, although I have encountered it before. The following problem is troubling me, and I hope you can point me in the right direction.

The input is a set of strings, some of which carry the same information and some of which do not. An unknown number of these strings are crooked* to a varying degree, from only one letter off to complete garbage. The output should be the corrected versions of the input strings. The catch is that only certain, already known, combinations of valid strings are possible.

In a naive approach I chained some fuzzy searches and already got some promising results. Now I don't know where to go next, or whether similar problems have already been solved.

* (are we still allowed to say this?)

Brian Spiering · Answer

That problem is called approximate string matching or fuzzy string searching. It has been well studied in computer science.
If there is a limited, known collection of valid strings (i.e., a dictionary), then the problem can be framed as spelling correction.
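
As a rough illustration of the dictionary framing, here is a minimal sketch using `difflib.get_close_matches` from Python's standard library. The list of valid strings, the sample corrupted inputs, the helper name `correct`, and the `cutoff` of 0.6 are all assumptions made up for this example, not details from the question.

```python
from difflib import get_close_matches

# Hypothetical dictionary of known valid strings (assumed for illustration).
VALID = ["amsterdam", "rotterdam", "utrecht", "eindhoven"]


def correct(raw, valid=VALID, cutoff=0.6):
    """Return the most similar valid string, or None if nothing is close enough."""
    # Normalise whitespace and case before comparing against the dictionary.
    cleaned = raw.strip().lower()
    # n=1 keeps only the best candidate; cutoff is the minimum similarity ratio.
    matches = get_close_matches(cleaned, valid, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical corrupted inputs, from one letter off to complete garbage.
corrupted = ["Amsterdm", "roterdam", "utrecht", "xq9#"]
mapping = {s: correct(s) for s in corrupted}
```

Strings that fall below the cutoff map to `None`, so complete garbage can be set aside for manual review instead of being forced onto the nearest valid entry. The right cutoff depends on the data and would need tuning against a sample of already corrected strings.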
