Wednesday, February 02, 2022

Fuzzy matching of Strings

Document and email processing tasks often require comparing strings or searching for keywords. The traditional way of doing this is exact string comparison or regular expressions (RegEx), but a number of other techniques are available.

Fuzzy matching is an approximate string matching technique that is typically used to identify typos or spelling mistakes. Fuzzy matching algorithms measure how close two strings are to one another using a concept called 'edit distance'. In simple words, the edit distance is the number of edits (such as inserting, deleting or replacing a character) required to make the two strings identical. There are different types of edit distance measurements, as described in the Wikipedia article on edit distance.
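
To make the idea of edit distance concrete, here is a minimal sketch of the Levenshtein distance in plain Python. It assumes the classic definition, where each insertion, deletion or substitution of one character costs 1. The function name `levenshtein` is my own and is not taken from any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop

    # previous[j] is the distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning i characters into an empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))   # 3
print(levenshtein("recieve", "receive"))  # 2
```

Note that the swapped letters in "recieve" count as two edits here, because plain Levenshtein distance has no single "swap" operation.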

TheFuzz is a cool Python library that uses the Levenshtein distance to measure how similar two sequences are. The following articles will help you quickly grasp the basics of using the library.

https://www.activestate.com/blog/how-to-implement-fuzzy-matching-in-python/

https://www.analyticsvidhya.com/blog/2021/07/fuzzy-string-matching-a-hands-on-guide/

https://towardsdatascience.com/fuzzy-string-matching-in-python-68f240d910fe
