Code snippets 4U - Approximate Matching search using RegEx

Allowing your users to search for a specific string criteria within a text is a common task. However, you might want to add Approximate matching / Fuzzy matching to the search. In our App, the user can search for a criteria within a patient’s electronic health records (huge set of text). Our search feature attempts an Approximate Matching and then Fuzzy matching approaches to find results, in order to get results for near matches / misspelled words etc.

The following snippet uses Regular Expression for Approximate matching, depending on Levenshtein distance method for approximation / distancing from criteria vs. source.

Var regexBegin as new regex
Var regexEnd as new RegEx
Var beginMatch as RegExMatch
Var endMatch as RegExMatch
Var BeginMatchPosition as integer = -1
Var EndMatchPosition as integer = -1

if BeginMatchPosition = -1 or EndMatchPosition = -1 then
  Var approxBegin as string = ""
  Var approxEnd as string = ""
  Var minDist as integer = 99999

  For i as integer = 0 to MatchArrayBegin.LastIndex
    
     for j as integer = 0 to MatchArrayEnd.LastIndex

     Var dist as integer = LevenshteinDistance(MatchArrayBegin(i), MatchArrayEnd(j))

      if dist < minDist then
        minDist = dist
        approxBegin = MatchArrayBegin(i)
        approxEnd = MatchArrayEnd(j)

      end if

    next

  next

  regexBegin.SearchPattern = approxBegin
  regexEnd.SearchPattern = approxEnd
  beginMatch = regexBegin.Search(stringSource)
  endMatch = regexEnd.Search(stringSource)

end if

And the LevenshteinDistance method used. See more at (Levenshtein distance - Wikipedia)

Method LevenshteinDistance(string1 As String, string2 As String) As Integer

Var s1len as integer = string1.Length
Var s2len as integer = string2.Length

Var matrix(0,0) as Integer
Redim matrix(string1.Length,string2.Length)

for i as integer = 0 to s1len
  matrix(i,0) = i

next

for j as integer = 0 to s2len
  matrix(0,j) = j
next

for j as integer = 1 to s2len

  for i as integer = 1 to s1len

    Var cost as integer
    Var s as string 
    Var v as string

    s = string1.Middle(i-1,1)
    v = string2.Middle(j-1,1)

    if string1.Middle(i-1,1) = string2.Middle(j-1,1) then
      cost = 0

    else 
      cost = 1

    end if

    matrix(i,j) = Min(matrix(i-1,j)+1,matrix(i,j-1)+1,matrix(i-1,j-1)+cost)

  next

next

return matrix(s1len,s2len)
4 Likes