Feedback Case Number: 76078
I’ve been writing about this issue for years (most recently in 2021), but a good solution still has not been proposed.
Correctly splitting a String into characters is vastly slower using String than Text. Since Text is deprecated, better functionality needs to be added to String.
I think String.Characters() was added in 2021. It correctly returns the characters, but as an Iterable. This means that to get a String array from it you have to loop over each character and add it to a String array.
There needs to be a method added to String that simply returns a String array containing the characters. This could definitely be done faster internally in Xojo’s C++ framework than by pushing it up the stack to Xojo code.
Put this code in the Opening() event of a new window in a desktop project (I also added two constants to the window, SOURCE_STRING and SOURCE_TEXT, of type String and Text respectively, which contain identical characters):
// There are 1507 characters in SOURCE_STRING and SOURCE_TEXT.
// This gives 1508 elements because it splits the first character (an emoji) into two strings.
Var stringSplit() As String
Var t1 As Double = System.Microseconds
stringSplit = SOURCE_STRING.Split("")
t1 = System.Microseconds - t1 // About 70 microseconds.
// This gives the correct 1507 characters.
Var textSplit() As Text
Var t2 As Double = System.Microseconds
textSplit = SOURCE_TEXT.Split
t2 = System.Microseconds - t2 // About 270 microseconds.
// This also gives the correct 1507 characters.
Var iteration() As String
Var t3 As Double = System.Microseconds
For Each character As String In SOURCE_STRING.Characters
  iteration.Add(character)
Next character
t3 = System.Microseconds - t3 // About 800 microseconds.
Break
Conclusions from the example:
- You cannot use String.Split() if the source contains emoji or other characters in the extended plane. Text.Split() works as expected but is deprecated, and using Text within a project alongside String leads to hard-to-find bugs (trust me!).
- Using String.Characters() gives the correct result, but because it returns an Iterable and not a simple String array (why?!!) you need to loop through it needlessly, which adds time.
Xojo needs to provide a faster way to get a String array of characters from a String. Since the underlying Text (ICU?) framework can get a Text array in this example in about 270 microseconds (roughly 3x faster than the 800 microseconds for iterating), it strikes me that it would be easy to add a new method to String that just returns a simple String array.
Obviously dealing with emoji and other more complex characters takes longer than a simple String.Split call, but returning an Iterable is needlessly slow.
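In the meantime, the loop from the third test can at least be hidden behind an extension method so that call sites read like the method being requested. This is only a rough sketch of a workaround: the name ToCharacterArray is hypothetical (it is not an existing Xojo method), and it assumes the function is placed in a module, since that is where Extends methods have to live. It still iterates in Xojo code, so it will not be any faster than the timing above.

```xojo
// Hypothetical helper, assumed to live in a module of the project.
// It wraps the String.Characters loop; it does not speed it up.
Public Function ToCharacterArray(Extends source As String) As String()
  Var result() As String
  For Each character As String In source.Characters
    result.Add(character)
  Next character
  Return result
End Function
```

With that in place, the third test becomes a single call: Var chars() As String = SOURCE_STRING.ToCharacterArray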