String.Characters is needlessly slow

Feedback Case Number: 76078

I’ve been writing about this issue for years (most recently in 2021), but a good solution still has not been proposed.

Correctly splitting a String into characters is vastly slower using String than Text. Since Text is deprecated, better functionality needs to be added to String.

I think String.Characters() was added in 2021. It correctly returns the characters, but as an Iterable. This means that to get a String array from it, you have to loop over each character and add it to a String array.

There needs to be a method added to String that simply returns a String array containing the characters. This could definitely be done faster internally in Xojo’s C++ framework than by pushing it up the stack to Xojo code.

Put this code in the Opening() event of a new window in a desktop project. I also added two constants to the window, SOURCE_STRING and SOURCE_TEXT, of type String and Text respectively, which contain identical characters:

// There are 1507 characters in SOURCE_STRING and SOURCE_TEXT.

// This gives 1508 characters because it splits the first character (an emoji) into two strings.
Var stringSplit() As String
Var t1 As Double = System.Microseconds
stringSplit = SOURCE_STRING.Split("")
t1 = System.Microseconds - t1 // About 70 microseconds.

// This gives the correct 1507 characters.
Var textSplit() As Text
Var t2 As Double = System.Microseconds
textSplit = SOURCE_TEXT.Split
t2 = System.Microseconds - t2 // About 270 microseconds.

// This also gives the correct 1507 characters.
Var iteration() As String
Var t3 As Double = System.Microseconds
For Each character As String In SOURCE_STRING.Characters
  iteration.Add(character)
Next character
t3 = System.Microseconds - t3 // About 800 microseconds.

Break

Conclusions from the example:

  1. You cannot use String.Split() if the source contains emoji or other characters in the extended plane.
  2. Text.Split() works as expected but is deprecated, and using Text within a project alongside String leads to hard-to-find bugs (trust me!).
  3. Using String.Characters() gives the correct result, but because it returns an Iterable and not a simple String array (why?!!), you need to loop through it needlessly, which adds time.

Xojo needs to provide a faster way to get a String array of characters from a String. Since the underlying Text (ICU?) framework can get a Text array in this example in 270 microseconds (about 3x faster than the 800 microseconds iterating takes), it strikes me that it would be easy to add a new method to String that just returns a simple String array.

Obviously, dealing with emoji and other more complex characters takes longer than a simple String.Split call, but returning an Iterable is needlessly slow.
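
In the meantime, the loop can at least be hidden behind an extension method so call sites stay tidy. This is only a sketch: ToCharacterArray is a name I made up (it is not a framework method), and it assumes the function is placed in a module so that Extends is allowed. It wraps the same iteration as the third test, so it should give the correct characters but will not be any faster.

```xojo
// Hypothetical helper, not part of the Xojo framework.
// Assumes it lives in a module so that Extends can be used.
Public Function ToCharacterArray(Extends source As String) As String()
  Var result() As String
  
  // Same loop as the third test: String.Characters yields one
  // complete character per iteration, so emoji are not split.
  For Each character As String In source.Characters
    result.Add(character)
  Next character
  
  Return result
End Function
```

With that in place, the call would look like `Var chars() As String = SOURCE_STRING.ToCharacterArray`, which is roughly the shape of the built-in method I am asking for.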
