Fast character access of a string

The problem is that Middle() is called many times in the loop, and each call has to start searching from the beginning of the string again. There is not much to optimize, unless you want to build a lookup table.

Interesting discussion. The blog article on the speed was nice. I have a recursive regex that isn’t very fast. Unfortunately, not even the regex from MBS made a difference.

Oh, I didn’t know that, because I haven’t had to deal with those details until now.
That makes these things a lot easier, though. :blush:

I’m mostly still using API1 - in API2, does String have a ForEach iterator? That would be a nice addition and much faster when you are iterating through the entire set of characters.

Yes, a lookup table would be a good solution to prevent the call overhead of the Middle() function as well.
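A minimal sketch of what such a lookup table could look like, assuming it simply means splitting the string into a character array once and then indexing into that array. The function name `CountChar` and its parameters are made up for illustration and are not from the original code.

```xojo
// Hypothetical helper: counts how often `target` occurs in `source`.
// The string is split into characters once, so the loop only does
// array lookups instead of calling Middle() for every position.
Function CountChar(source As String, target As String) As Integer
  // Split with an empty delimiter returns one element per character
  Var chars() As String = source.Split("")
  Var hits As Integer
  For i As Integer = 0 To chars.LastIndex
    If chars(i) = target Then hits = hits + 1
  Next
  Return hits
End Function
```

The trade-off is the extra memory for the array, which holds one String per character of the source.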

Look at String.Characters

Cool! In theory, that should be a lot faster than repeated calls to String.Middle()

@Christian_Schmitz - can you add this to your benchmark suite and update the blog post?

Yes, there is. If it’s a fast implementation, it could do the job as well.

Strange, in my case it’s much slower when I use a ‘for each’ loop (6275 ms vs 3173 ms).

var cnt,len as integer
var arr() as String
var str,ch as string
var instr as Boolean
instr=false

len=source.Length-1
//for cnt=0 to len
for each ch in source.Characters
  //ch=source.middle(cnt,1)
  if ch<>delimiter and ch<>"""" then
    str=str+ch
    Continue
  end if
  if ch=delimiter and instr=false then
    arr.Add(str)
    str=""
    Continue
  end if
  if ch="""" and instr=false then
    instr=true
  elseif ch="""" and instr=true then
    instr=false
  end if
next
arr.Add(str) 
return arr()

how about:

var cnt,len as integer
var arr() as String
var str,ch as string
var instr as Boolean
instr=false

len=source.Length-1
Var sourceChars() As String = Source.characters
for each ch in sourceChars // <- to see if the function recalling is slowing down
  //ch=source.middle(cnt,1)
  if ch<>delimiter and ch<>"""" then
    str=str+ch
    Continue
  end if
  if ch=delimiter and instr=false then
    arr.Add(str)
    str=""
    Continue
  end if
  if ch="""" and instr=false then
    instr=true
  elseif ch="""" and instr=true then
    instr=false
  end if
next

Odd, it should be faster.
A few general suggestions:

  • add #pragma DisableBackgroundTasks and #pragma NilObjectChecking False
  • run tests in a built app (not in the IDE) to get the fastest (and most consistent) measurements
  • the equality operator (=) is probably doing a slow, unicode-savvy case-insensitive string comparison.
  • It should be much faster to use String.Compare caseinsensitive, or even just use the old StrComp in binary mode?
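As a rough sketch of how the pragma and comparison suggestions could be combined: the helper name `IsSameChar` is hypothetical, and the case-sensitive option is an assumption here, chosen because it avoids the case folding that the = operator performs.

```xojo
// Hypothetical helper, not part of the original code.
Function IsSameChar(ch As String, other As String) As Boolean
  // Skip background task switching and nil checks inside this method
  #Pragma DisableBackgroundTasks
  #Pragma NilObjectChecking False

  // Compare returns 0 when both strings are equal under the given options
  Return ch.Compare(other, ComparisonOptions.CaseSensitive) = 0
End Function
```

In a hot loop it is probably better to write the Compare call inline, since the extra method call has a cost of its own.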

with this code I get a Type Mismatch error…

I see, it returns an iterable, my mistake…:wink:

how about String.SplitBytes("")
http://documentation.xojo.com/api/data_types/string.html#string-splitBytes

Perhaps that one could be faster if you read it into a property first?

We are just curious why the ‘for each’ version is slower. I already have a much faster solution for my original problem. But thank you for your suggestion.

Perhaps because it’s a class interface that may be re-creating everything on every iteration.
We are aware you got a fast solution. It may still be interesting to get more results, as @Christian_Schmitz may add it to his blog post.

All I can say is that if I replace these lines of my code:

for cnt=0 to len
    ch=source.middle(cnt,1)

with

for each ch in source.Characters

…it is much slower than before.

@Rainer_Hofmann I’m getting very different results. What version of Xojo are you using? What OS?

I created 3 tests.

  • “Middle()” was by far the slowest
  • for each ch in source.characters was about 40x as fast
  • ch = MiddleBytes() was about 118x as fast as Middle()

Test 1: for i = 0 to u ; ch = source.middle(i,1)
Took 6.952 seconds

Test 2: for each ch in source.characters
Took 0.180 seconds

Test 3: for i = 0 to u ; ch = MiddleBytes(i,1)
Took 0.058 seconds

Using Xojo 2021 R 2.1 on macOS 11.5.2 Big Sur (Intel)

Project file: https://xochi.com/xojo/unicode/characters1.xojo_binary_project
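For anyone who does not want to download the project, here is a rough sketch of how the three loops can be timed. This is not the code from the project file; the function name `RunTests` and the output format are made up for illustration.

```xojo
// Hypothetical timing harness for the three variants.
Function RunTests(source As String) As String
  #Pragma DisableBackgroundTasks
  Var ch As String
  Var t0, t1, t2, t3 As Double

  // Test 1: Middle() by character index
  Var u As Integer = source.Length - 1
  t0 = System.Microseconds
  For i As Integer = 0 To u
    ch = source.Middle(i, 1)
  Next

  // Test 2: iterate over Characters
  t1 = System.Microseconds
  For Each c As String In source.Characters
    ch = c
  Next

  // Test 3: MiddleBytes() by byte index (bytes, not characters)
  Var ub As Integer = source.Bytes - 1
  t2 = System.Microseconds
  For i As Integer = 0 To ub
    ch = source.MiddleBytes(i, 1)
  Next
  t3 = System.Microseconds

  // Convert microseconds to seconds
  Var s1 As Double = (t1 - t0) / 1000000.0
  Var s2 As Double = (t2 - t1) / 1000000.0
  Var s3 As Double = (t3 - t2) / 1000000.0
  Return "Middle: " + s1.ToString + " s" + EndOfLine + _
    "Characters: " + s2.ToString + " s" + EndOfLine + _
    "MiddleBytes: " + s3.ToString + " s"
End Function
```

Note that MiddleBytes works on bytes, so it only returns whole characters when the text is single-byte (for example plain ASCII in UTF-8).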

Interesting results! I am using Xojo 2021 R 2.1 on macOS 11.5.2 Big Sur (M1).
I will have more time to look into it in the evening (European time).

I was able to use your project, but I am running it with Xojo 2021 R 2.1 on Linux, Pop!_OS (Intel).
I had to reduce the field size because it’s running on a slow notebook. By the way, you swapped the labels Test1 and Test2 when writing to TextArea1 (not important, though).

Interestingly enough, I got very different results again:
Test 1: for i = 0 to u ; ch = source.middle(i,1)
Fields=1000
Took 0.219 seconds

Test 2: for each ch in source.characters
Fields=1000
Took 8.619 seconds

Test 3: for i = 0 to u ; ch = MiddleBytes(i,1)
Fields=1000
Took 0.021 seconds

So, here again, the solution with for each is much slower!

Further analysis of the ‘for each’ solution shows strongly non-linear, roughly quadratic timing behavior when increasing the number of fields (doubling the fields makes it take more than three times as long):

Test 2: for each ch in source.characters
Fields=100
Took 0.119 seconds

Test 2: for each ch in source.characters
Fields=200
Took 0.386 seconds

Test 2: for each ch in source.characters
Fields=300
Took 0.828 seconds

Test 2: for each ch in source.characters
Fields=400
Took 1.437 seconds

Test 2: for each ch in source.characters
Fields=500
Took 2.225 seconds
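A quick way to check how the time grows with the number of fields is to estimate the exponent k in t ~ n^k from two measurements. The helper below is only a sketch and the name `GrowthExponent` is made up. For the 100 and 500 field results above it comes out at about 1.8, which is closer to quadratic growth than to exponential growth.

```xojo
// Hypothetical helper: slope between two points on a log-log plot.
// If t grows like n^k, then k = ln(t2 / t1) / ln(n2 / n1).
Function GrowthExponent(n1 As Double, t1 As Double, n2 As Double, t2 As Double) As Double
  // Log is the natural logarithm
  Return Log(t2 / t1) / Log(n2 / n1)
End Function

// Usage with the measurements for 100 and 500 fields
Var k As Double = GrowthExponent(100, 0.119, 500, 2.225)
```

An exponent near 1 would mean linear time. A value near 2 would fit the guess that something is being redone from the start of the string on every iteration.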