Fast character access of a string

Thank you very much! However, how do I use code tags in the forum?

There is an icon in the toolbar, but the easiest way is to use three backticks on their own lines before and after the code.

Ok, thanks!

Using Ptr instead of MemoryBlock will be even faster, e.g. a Ptr pointing to the bytes in a MemoryBlock.

But using string + array is safer, as a wrong index with a Ptr can easily cause a crash.

Can you tell us what the task is with this 5,000,000-row CSV file?
Maybe you can import the file into an SQLite database first.
I remember people writing that it is very fast.

https://www.sqlite.org/download.html
https://www.sqlite.org/cli.html
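
To make the SQLite suggestion concrete, here is a minimal sketch of loading CSV rows into a database. It is written in Python rather than Xojo, purely as an illustration, and uses only the standard `csv` and `sqlite3` modules. The table name `csv_rows`, the helper `import_csv`, and the sample data are made up for this example; the real file's columns and delimiter are not known from this thread.

```python
import csv
import io
import sqlite3


def import_csv(conn, csv_file, table="csv_rows"):
    """Create `table` from the CSV header and bulk-insert all rows.

    Assumes the first line is a header and every row has the same
    number of columns (hypothetical layout, not the poster's file).
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    # One transaction for all inserts; committing per row would be far slower.
    with conn:
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Tiny in-memory stand-in for the large file.
sample = io.StringIO("id,name\n1,Anna\n2,Émile\n3,Zoë\n")
conn = sqlite3.connect(":memory:")
row_count = import_csv(conn, sample)
print(row_count)
```

Once the data is in a table, filtering and lookups can be left to SQL instead of scanning strings by hand. Whether that fits depends on the actual task, which is why the question above matters.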

It depends. This character isn’t available on my layout, so the button is preferable. And who knows how many layouts don’t have it?

backticks = alt-shift-1 in my fr macOS…

Well, I don’t think it’s worth enumerating all the existing layouts, as I rather think this character isn’t part of the “standard set”. But I can be wrong, of course.

Emile, you have it directly under the £ key on your French keyboard!

Sorry, I was not allowed to write more messages on my first day in the forum. :blush: By the way, performance is better when using String.FromArray, but unfortunately only by about 4.1%. I was hoping for more, because your explanation makes a lot of sense to me. However, I guess the gain should be much larger if the strings are much longer.
Thank you! There have been a lot of good suggestions. I will play with it a little and see what I can do.

Now I have had time to look into more performance optimisation. My goal was to avoid string concatenation as much as possible. By using String.IndexOf to find the right positions and then building new strings only where needed, I got code that is 6 times faster than my first solution.

Thanks a lot to everybody who was interested in helping me improve my first steps in Xojo. :blush:
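
For readers who want to see the idea in code, here is a minimal sketch of the two approaches. It is in Python rather than Xojo, only to illustrate the technique, and it is not the poster's actual code. `str.find` plays the role of String.IndexOf; the function names, the `;` separator, and the sample line are assumptions made for this example.

```python
def fields_by_index(line, sep=";"):
    """Find separator positions first, then cut one substring per field."""
    fields = []
    start = 0
    while True:
        pos = line.find(sep, start)  # counterpart of an index-of search
        if pos == -1:
            fields.append(line[start:])  # last field: the rest of the line
            return fields
        fields.append(line[start:pos])
        start = pos + len(sep)


def fields_by_concat(line, sep=";"):
    """Character-by-character version that grows a string on every step.

    Assumes a single-character separator.
    """
    fields = []
    current = ""
    for ch in line:
        if ch == sep:
            fields.append(current)
            current = ""
        else:
            current += ch  # one concatenation per character
    fields.append(current)
    return fields


line = "42;Émile;Paris;;end"
print(fields_by_index(line))
```

Both functions return the same fields. The first creates one new string per field, the second performs one concatenation per character, which is the work the post above set out to avoid.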

I wrote you a blog post:

Optimizing a Xojo function

Nice work, Christian, and very helpful!
As I already wrote, I have looked into a different approach: simply avoiding as many string concatenations as possible. That way, I don’t have to take care of UTF-8 strings either.

My first attempt was to use String.FromArray for concatenation, but the timings improved by only a small percentage.

Of note, Christian’s tests showed a potential 24000x (yes, twenty-four-thousand fold) speedup when using Ptr compared to using the Unicode-aware Xojo string functions such as Middle().

I wonder how much of this is overhead in the libraries that Xojo is using, vs. overhead in the Xojo framework itself?

A kind of pointer access was the first thing I was looking for, because I know similar operations from other languages, like string[ptr]. That is really fast, but it takes additional effort to handle Unicode characters of different byte lengths.

Not necessarily. If you are splitting strings and the separator is always ASCII, then the beauty of UTF-8 is that you can just search for the separators as UInt8 (byte) values, and not worry about whether the strings between the separators are Unicode or not. It “just works”.
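
A minimal sketch of that byte-level split, in Python rather than Xojo and only as an illustration: the raw bytes are scanned for the separator value, and the fields are decoded afterwards. The function name `split_utf8_bytes` and the sample text are made up for this example.

```python
def split_utf8_bytes(data, sep=ord(",")):
    """Split UTF-8 encoded bytes on a single ASCII separator byte."""
    if sep >= 128:
        raise ValueError("separator must be an ASCII byte (0..127)")
    fields = []
    start = 0
    for i, b in enumerate(data):  # iterating over bytes yields integers
        if b == sep:
            fields.append(data[start:i])
            start = i + 1
    fields.append(data[start:])
    return fields


raw = "Zoë,日本語,€5,plain".encode("utf-8")
parts = [field.decode("utf-8") for field in split_utf8_bytes(raw)]
print(parts)
```

The scan never looks at character boundaries, yet every field decodes cleanly, because the comma byte cannot occur inside a multi-byte sequence.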

Very interesting results. I moved to using split() to make an array of characters when looking through a text with a view to replacing some items. That ended up being fast enough for me in practice.

Your complete set of tests is a useful reference - thanks for that.

However, I guess that works only if the second or third byte of a UTF-8 sequence is not, by accident, the same as the delimiter you are using.

UTF-8 guarantees it is not. (The short answer: all extra bytes have the high bit set, so they are in the range 128…255, and thus outside the ASCII character set.)
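
This is easy to check on a few characters. The sketch below is in Python, only as an illustration; it encodes some non-ASCII characters and confirms that every byte of each multi-byte sequence is 0x80 or above. The helper name and the sample characters are arbitrary choices for this example.

```python
def non_ascii_bytes_have_high_bit(text):
    """Return True if every byte of every multi-byte character is >= 0x80."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


samples = "é£€日😀"  # 2-, 2-, 3-, 3- and 4-byte sequences
byte_table = {ch: [hex(b) for b in ch.encode("utf-8")] for ch in samples}

for ch, encoded in byte_table.items():
    print(ch, encoded)
print(non_ascii_bytes_have_high_bit(samples))
```

A sample check like this is not a proof; the guarantee itself comes from how UTF-8 defines lead and continuation bytes.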

That’s not possible. See:

and look at the Encoding section.
