Allowing your users to search for a specific string criteria within a text is a common task. However, you might want to add Approximate matching / Fuzzy matching to the search. In our App, the user can search for a criteria within a patient’s electronic health records (huge set of text). Our search feature attempts an Approximate Matching and then Fuzzy matching approaches to find results, in order to get results for near matches / misspelled words etc.
The following snippet uses Regular Expression for Approximate matching, depending on Levenshtein distance method for approximation / distancing from criteria vs. source.
Var regexBegin as new regex
Var regexEnd as new RegEx
Var beginMatch as RegExMatch
Var endMatch as RegExMatch
Var BeginMatchPosition as integer = -1
Var EndMatchPosition as integer = -1
if BeginMatchPosition = -1 or EndMatchPosition = -1 then
Var approxBegin as string = ""
Var approxEnd as string = ""
Var minDist as integer = 99999
For i as integer = 0 to MatchArrayBegin.LastIndex
for j as integer = 0 to MatchArrayEnd.LastIndex
Var dist as integer = LevenshteinDistance(MatchArrayBegin(i), MatchArrayEnd(j))
if dist < minDist then
minDist = dist
approxBegin = MatchArrayBegin(i)
approxEnd = MatchArrayEnd(j)
end if
next
next
regexBegin.SearchPattern = approxBegin
regexEnd.SearchPattern = approxEnd
beginMatch = regexBegin.Search(stringSource)
endMatch = regexEnd.Search(stringSource)
end if
And the LevenshteinDistance method used. See more at (Levenshtein distance - Wikipedia)
Method LevenshteinDistance(string1 As String, string2 As String) As Integer
Var s1len as integer = string1.Length
Var s2len as integer = string2.Length
Var matrix(0,0) as Integer
Redim matrix(string1.Length,string2.Length)
for i as integer = 0 to s1len
matrix(i,0) = i
next
for j as integer = 0 to s2len
matrix(0,j) = j
next
for j as integer = 1 to s2len
for i as integer = 1 to s1len
Var cost as integer
Var s as string
Var v as string
s = string1.Middle(i-1,1)
v = string2.Middle(j-1,1)
if string1.Middle(i-1,1) = string2.Middle(j-1,1) then
cost = 0
else
cost = 1
end if
matrix(i,j) = Min(matrix(i-1,j)+1,matrix(i,j-1)+1,matrix(i-1,j-1)+cost)
next
next
return matrix(s1len,s2len)