Public Function levenshtein(a as string,b as string) as integer
Dim i As Integer
Dim j As Integer
Dim cost As Integer
Dim d(-1,-1) As Integer
' Avoid calculations where there there are empty words
If Len(a) = 0 Then Return Len(b)
If Len(b) = 0 Then Return Len(a)
' Array initialization
Redim d(Len(a),Len(b))
For i = 0 To Len(a)
d(i,0) = i
Next
For j = 0 To Len(b)
d(0,j) = j
Next
' Actual calculation
For i = 1 To Len(a)
For j = 1 To Len(b)
If Mid(a,i,1) = Mid(b,j,1) Then
cost = 0
Else
cost = 1
End If
d(i,j)=Min((d(i-1,j)+1),(d(i,j-1)+1),(d(i-1,j-1)+cost))
Next j
Next i
Return d(Len(a),Len(b))
End Function
msgbox str(levenshtein(“S”,“S”)) = 0
msgbox str(levenshtein(“S”,“S”)) = 1
msgbox str(levenshtein(“S”,“FRED”)) = 7
so even here there isn’t a lot of granulatirty
Len is actually kind of expensive. For short strings it probably won’t matter, but for longer ones, you might feel it. Instead, I recommend caching Len or perhaps splitting each string into arrays first.
No, but (my guess is) it has to account for multi-byte characters each time. Unless they’ve changed it behind the scenes recently, it was definitely slower for large strings, but I haven’t tested in the last couple of years.
that doen’t seem to be a true statement…
I timed taking the LEN(s) where s got progressively longer, and the time to calculate it… got progressively longer
a Million calculations for a string 9k long was 69.7 seconds for 10k it was 77.9,
1000 char:91 sec.
2000 char:88.5 sec.
3000 char:87.66667 sec.
4000 char:87 sec.
5000 char:86.6 sec.
6000 char:86.16667 sec.
7000 char:85.71429 sec.
8000 char:85.375 sec.
9000 char:85.22222 sec.
10000 char:85.1 sec.
No you are right it is not true, I was thinking it from the Plugin SDK but thats the raw bytes that has counter. So I am guessing that LenB would be no op while Len not.
Nothing is “no-op”… the operation still takes time to execute… its just that the amount of time is not dependant on the length of the string (I tested with single byte data)…