Fastest case sensitive equality test

Off the top of my head:

var chars() as string = s.Split( "" )
var upperChars() as string = s.Uppercase.Split( "" )

for index as integer = 0 to chars.LastIndex
  if chars( index ).Asc = upperChars( index ).Asc then
    // uppercase
  else
    // lowercase
  end if
next

Thanks @Kem_Tekinay. Interesting, using

Private Function IsLowercase(value As String) As Boolean
  Return value.Asc = value.Lowercase.Asc
End Function

instead of

Private Function IsLowercase(value As String) As Boolean
  Var iResult As Integer = value.Compare(value.Lowercase, ComparisonOptions.CaseSensitive)
  Return iResult = 0
End Function

makes a difference of 20 ms in my test case. Any further suggestions for optimization?

Note that s.Asc will return the code point for the first character only. Is that what you want?

Are you measuring the calling function too? If so, are you using Split to parse the characters?

It doesn’t matter in my case, because I call the function inside a loop for single characters:

For Each char As String In source.Characters
  If IsLowerCase(char) Then ...
Next

That’s why it works wonderfully in my case. I have also tested it with Cyrillic letters, etc.


The only optimization I see is to inline the test, as function calls have overhead.

Edit: But that affects readability.


Cool, thanks. However, this doesn’t make sense in my case, because I call the function in other methods as well and don’t want duplicate code :wink:

I agree.

But in thinking about it more, where speed is important, you are calling the Lowercase function for each character of the string instead of just once on the whole thing. If your string is 100 characters, it probably doesn’t make a difference. If it’s 100k, you might feel that.

In other words, instead of converting each character to lowercase before the comparison, convert the string to lowercase, then compare each character to the corresponding character of the original.
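A minimal sketch of that idea, assuming lowercasing leaves the number of characters unchanged (the helper name CountUnchangedByLowercase is hypothetical and not from this thread). Note that, like the Asc test above, it also counts digits, spaces and other characters that have no uppercase form:

```xojo
// Hypothetical helper: counts the characters that lowercasing leaves unchanged.
Private Function CountUnchangedByLowercase(source As String) As Integer
  Var original() As String = source.Split("")
  Var lowered() As String = source.Lowercase.Split("") // Lowercase is called once

  // Assumption: both arrays have the same length; use the shorter one to be safe.
  Var last As Integer = original.LastIndex
  If lowered.LastIndex < last Then last = lowered.LastIndex

  Var count As Integer
  For index As Integer = 0 To last
    // Compare the code point of each character with its lowercased counterpart.
    If original(index).Asc = lowered(index).Asc Then
      count = count + 1
    End If
  Next

  Return count
End Function
```

Whether this is actually faster than the per-character function depends on the cost of the two Split calls, so it is worth measuring with a realistic string length.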


After reading the “Leaking Locale and core.Local Objects?” thread, I started thinking about string comparison performance and found the current thread.

After reading through it, I was interested in how NSStringCompareMBS would perform if included as another test case (based on the code in the original post).

So I added the following two test runs:

dblSeconds = Xojo.Core.Date.Now.SecondsFrom1970
intRounds = 0
Do Until intRounds = 10000
  intRounds = intRounds + 1
  If NSStringCompareMBS(strTest1, strTest2, 0) <> 0 Then
    Break
  End If
Loop
strMessage = strMessage + Chr(13) + Str(Xojo.Core.Date.Now.SecondsFrom1970 - dblSeconds) + " seconds for 10,000 NSStringCompareMBS case-sensitive string comparisons"

dblSeconds = Xojo.Core.Date.Now.SecondsFrom1970
intRounds = 0
Do Until intRounds = 10000
  intRounds = intRounds + 1
  If NSStringCompareMBS(strTest1, strTest2, 1) <> 0 Then
    Break
  End If
Loop
strMessage = strMessage + Chr(13) + Str(Xojo.Core.Date.Now.SecondsFrom1970 - dblSeconds) + " seconds for 10,000 NSStringCompareMBS case-insensitive string comparisons"

And got the following results:

0.0206921 seconds for 10,000 String.Compare
0.0049689 seconds for 10,000 HexEncoding comparisons
0.5892110 seconds for 10,000 Hashing comparisons
0.0012398 seconds for 10,000 case-insensitive string comparisons
0.0022290 seconds for 10,000 NSStringCompareMBS case-sensitive string comparisons
0.0021710 seconds for 10,000 NSStringCompareMBS case-insensitive string comparisons

Note: I included the results of all tests because my setup differs: I’m using Xojo 2021r1.1 on a 2018 Mac Mini (macOS 10.15.7) with a 3.2 GHz 6-core Intel Core i7 and 32 GB RAM.

My conclusion was, for case-insensitive string comparisons use the = or <> operators. And for case-sensitive matches, use NSStringCompareMBS - if available to you and appropriate.

I hope that is useful to someone. Thanks.


There must be a way to efficiently use MemoryBlocks for this, although I can’t think of one at the moment. I use MemoryBlocks for case-sensitive “select case”.

If you have to use MemoryBlocks for standard operations then Xojo is doing something wrong.

Thanks for the comparison.

For String.Compare I filed feedback case 64647, as it converts all Strings to Text and then does the compare, which makes it slower than it needs to be. On macOS, the compare may also create a CFString internally (another copy in addition to the Text).

For NSStringCompareMBS you similarly have the overhead of a plugin function call, which is not as efficient as it could be (see 62010). Our plugin then does the CFString/NSString comparison for you.


String.Compare may be slower than other methods, but in one project it was the only way for me to get a decent ordering of a ListBox containing “Umlaute”. (StrComp didn’t work properly.) I don’t know how else I could have done it, so I am glad I have this option. :thinking:

Unicode normalization issue?

I fear I don’t know what that is. :see_no_evil:


You will learn the first time strings that should be the same don’t match. I thought I was going crazy.

I will look into this as soon as it bugs me. (Fortunately, I haven’t had to deal with that sort of problem before.)

In a nutshell, some characters can be represented in two different ways: one code point that represents the character, or a series of two or more code points.

To see this in action in Xojo, try this:

MessageBox "e" + &u0300

(I am greatly simplifying the issue here.)

Normalization is the process of getting all the characters represented in the same way, either Composed (one code point) or Decomposed (two or more code points). Once you have achieved that consistency, things like sorts and searches will work properly.
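To make the two representations concrete, here is a small sketch that builds “è” both ways and shows the byte count of each. It assumes both strings are UTF-8, as Xojo literals normally are, and the variable names are only for illustration:

```xojo
// Two spellings of the same visible character "è".
Var composed As String = &u00E8          // one code point
Var decomposed As String = "e" + &u0300  // "e" followed by a combining grave accent

// The byte counts differ, so a byte-wise comparison cannot see them as equal
// until both strings have been normalized to the same form.
MessageBox("Composed bytes: " + composed.Bytes.ToString + EndOfLine + _
  "Decomposed bytes: " + decomposed.Bytes.ToString)
```

Both strings look the same on screen, which is why a mismatch between them is so confusing when it first shows up.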

My M_String project includes normalization code, as do the MBS plugins.

http://www.mactechnologies.com/index.php?dowloads

That is why there are flags like

const NSDiacriticInsensitiveSearch = 128
const NSWidthInsensitiveSearch = 256
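As a rough sketch of how such flags might be combined, assuming NSStringCompareMBS takes them in its third parameter, as the 0 and 1 in the benchmark above suggest. The value 1 for NSCaseInsensitiveSearch comes from Apple's NSStringCompareOptions. This requires the MBS plugin on macOS:

```xojo
// Values from Apple's NSStringCompareOptions.
Const NSCaseInsensitiveSearch = 1
Const NSDiacriticInsensitiveSearch = 128

// The flags are bit values, so they can be combined with a bitwise Or.
Var options As Integer = NSCaseInsensitiveSearch Or NSDiacriticInsensitiveSearch

// Assumption: a result of 0 means the strings compare as equal,
// as in the benchmark loops earlier in this thread.
If NSStringCompareMBS("Uber", "über", options) = 0 Then
  MessageBox("Treated as equal")
Else
  MessageBox("Treated as different")
End If
```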