Sometimes you have to loop over the characters of a string in Xojo. Whether you count something, search for patterns, or replace some characters, the reasons are diverse, but performance may matter.
Let’s try four different ways and report how much time each one needs.
My test file is about 300,000 characters long, stored as UTF-8, and contains various German umlauts, so we have a couple of two-byte characters. All tests run with the DisableBackgroundTasks pragma set to reduce background activity. For the timing we run each block 10 times to get an average duration.
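The averaging described above could be sketched roughly like this. It is only an illustration, not the exact harness used for the numbers in this post: the names runs, total and averageMs are made up here, and t is assumed to be a String that already holds the test text.

```xojo
#Pragma DisableBackgroundTasks

// Hypothetical timing harness; variable names are illustrative only.
// Assumes t is a String holding the text to scan.
Dim runs As Integer = 10
Dim total As Double = 0

For r As Integer = 1 To runs
  Dim n As Integer = 0
  Dim startTime As Double = System.Microseconds
  
  // block under test: count carriage returns (code 13)
  For Each c As String In t.Characters
    If Asc(c) = 13 Then
      n = n + 1
    End If
  Next
  
  // add the elapsed microseconds of this run
  total = total + (System.Microseconds - startTime)
Next

// average duration per run, converted to milliseconds
Dim averageMs As Double = total / runs / 1000.0
System.DebugLog("Average: " + Format(averageMs, "0.0") + " ms")
```

Swapping the inner loop for one of the other variants gives comparable numbers, as long as the setup work of each variant (like splitting or converting) stays inside the timed region.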
The first way uses String.Characters, which creates an iterator over the characters. Basically, it creates an Iterable object and converts the String to Text internally. It then creates an iterator object, on which the For Each loop internally calls the MoveNext and Value functions; this includes wrapping the string for each character in a Variant. Here is the loop:
Dim m1 As Double = System.Microseconds

// using String.Characters
For Each c As String In t.Characters
  If Asc(c) = 13 Then
    n1 = n1 + 1
  End If
Next

Dim m2 As Double = System.Microseconds
In my test this takes about 550 ms to run over 300,000 characters of text. Let’s see if we can do better.
The second way is to call String.Split with an empty string as the delimiter to split by characters. The function walks over the text, finds where characters begin and end, copies them into new strings, and adds them to an array. Then we traverse that array with a For Each loop:
Dim chars() As String = t.Split("")

// using Split
For Each c As String In chars
  If Asc(c) = 13 Then
    n2 = n2 + 1
  End If
Next

chars = Nil // free memory

Dim m3 As Double = System.Microseconds
In our test this takes about 110 ms on the same text.
For version 21.3 of the MBS Xojo DataTypes Plugin we add StringCodePointsMBS. This function returns an array of UInt32 values representing the code points. We skip creating the string objects to save some time, and we still correctly handle Unicode characters above 65535, which would not fit in 16-bit integers.
// using new StringCodePointsMBS function in 21.3
Dim values() As UInt32 = StringCodePointsMBS(t)

For Each codePoint As UInt32 In values
  If codePoint = 13 Then
    n3 = n3 + 1
  End If
Next

values = Nil // free memory

Dim m4 As Double = System.Microseconds
In our test this takes about 51 ms per run.
The fastest way is to not bother about Unicode characters and just look at the bytes. By converting the string to a MemoryBlock, the bytes are copied and you can traverse the new memory block like this:
// using Memoryblock
Dim mem As MemoryBlock = t
Dim u As Integer = mem.Size - 1

For i As Integer = 0 To u
  If mem.UInt8Value(i) = 13 Then
    n4 = n4 + 1
  End If
Next

mem = Nil // free memory

Dim m5 As Double = System.Microseconds
This takes about 50 ms, just a bit faster than our plugin function. But please try it with a character outside the Basic Multilingual Plane, where you would get a 4 byte memory block for one character.
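Searching for byte 13 is safe in UTF-8, because every byte of a multi-byte sequence has its high bit set, so values below 128 never appear inside one. If you need to count characters rather than bytes, though, a byte loop has to skip the continuation bytes. The following is a minimal sketch of that idea, assuming t holds valid UTF-8; the variable name codePoints is made up for this example, and the loop counts code points, not user-visible characters made of several code points.

```xojo
// Sketch: count code points in a UTF-8 string by looking at bytes only.
// Assumes t is a String holding valid UTF-8 text.
Dim mem As MemoryBlock = t
Dim u As Integer = mem.Size - 1
Dim codePoints As Integer = 0

For i As Integer = 0 To u
  // continuation bytes have the bit pattern 10xxxxxx,
  // so every other byte starts a new code point
  If (mem.UInt8Value(i) And &hC0) <> &h80 Then
    codePoints = codePoints + 1
  End If
Next

mem = Nil // free memory
```

With this check, a character stored in four bytes is counted once, while mem.Size would report four.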