str and text split performance

Hello,
I have a big parenthesis-delimited text-file imported into my project, and I have to make an array out of it.
After replacing the old way with the new-framework way, I notice a pretty big difference in speed:

dim myStr as string = DefineEncoding(myTxtFile, Encodings.UTF8)
// kParenthesis is a constant: "]"

dim tt as Double = Ticks
dim mArray() as string = myStr.splitB(kParenthesis)//resulting ubound = 35500
label1.text = Format(Ticks - tt, "#") // = 1 or 2 ticks

and

dim tt as Double = Ticks
dim mArray() as text = myStr.toText.split(kParenthesis, 1)
label1.text = Format(Ticks - tt, "#") // = more or less 20 ticks

Any suggestion how to speed up the process? Thanks.

I think – in terms of benchmarking – you would need to compare Text.Split with String.Split (and not with String.SplitB).

Text.Split and String.Split will compare each character with the separator.
String.SplitB will compare each byte with the separator, which is much faster, but will not find multi-byte characters.
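A rough sketch of a like-for-like timing, assuming myStr and kParenthesis are defined as in the first post. It uses Microseconds rather than Ticks for finer resolution (a tick is 1/60 of a second), and it converts to Text before starting the clock, since in the original measurement the ToText call was inside the timed region and may account for part of the difference. The variable names and the DebugLog output are illustrative only.

```xojo
// Assumes myStr is a String with a defined UTF-8 encoding and
// kParenthesis is the String constant "]" from the first post.
Dim t0 As Double

// 1. Byte-wise split
t0 = Microseconds
Dim byByte() As String = myStr.SplitB(kParenthesis)
Dim msSplitB As Double = (Microseconds - t0) / 1000

// 2. Character-wise split on String
t0 = Microseconds
Dim byChar() As String = myStr.Split(kParenthesis)
Dim msSplit As Double = (Microseconds - t0) / 1000

// 3. Split on Text. The conversion happens before the clock starts,
// so only Text.Split itself is measured.
Dim myText As Text = myStr.ToText
Dim sepText As Text = "]" // same separator as kParenthesis
t0 = Microseconds
Dim byText() As Text = myText.Split(sepText)
Dim msText As Double = (Microseconds - t0) / 1000

System.DebugLog("SplitB: " + Format(msSplitB, "0.00") + " ms")
System.DebugLog("Split: " + Format(msSplit, "0.00") + " ms")
System.DebugLog("Text.Split: " + Format(msText, "0.00") + " ms")
```

Timing the ToText conversion separately would show how much of the 20 ticks comes from the conversion and how much from Text.Split itself.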

I thought that myStr.toText.split(kParenthesis, 1) was the equivalent of splitB.

Anyway, using string.split instead of string.splitB still returns 2 ticks:
dim tt as double = Ticks
dim mArray() as String = split(mBible, kParenthesis)
label71.text = Format(Ticks - tt, "#")

In iOS, where String is not available, using a Xojo.Core.MemoryBlock together with a combination of IndexOf and Mid would probably be very fast. Xojo.Core.TextEncoding (http://developer.xojo.com/xojo-core-textencoding) will enable going to and from that MemoryBlock to Text.

Could be a bit heavy to construct, but once wrapped into a function, it can be made rather simple to use.
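A minimal sketch of what such a wrapper could look like, under a few assumptions: the separator is a single ASCII character (as with "]" here), the helper name SplitOnByte is made up for illustration, and the scan walks the bytes with UInt8Value instead of calling IndexOf, whose exact signature should be checked in the MemoryBlock documentation. Scanning UTF-8 for an ASCII byte is safe because bytes below &h80 never occur inside a multi-byte sequence. Whether this beats Text.Split on a given file has not been measured here.

```xojo
// Hypothetical helper (not from the thread): splits a Text on one
// ASCII separator byte by scanning its UTF-8 representation.
Function SplitOnByte(source As Text, separatorByte As UInt8) As Text()
  Dim utf8 As Xojo.Core.TextEncoding = Xojo.Core.TextEncoding.UTF8
  Dim data As Xojo.Core.MemoryBlock = utf8.ConvertTextToData(source)
  Dim size As Integer = data.Size
  Dim result() As Text
  Dim start As Integer = 0 // byte offset where the current piece begins

  For i As Integer = 0 To size - 1
    If data.UInt8Value(i) = separatorByte Then
      If i > start Then
        // Copy the bytes of this piece and decode them back to Text.
        result.Append(utf8.ConvertDataToText(data.Mid(start, i - start)))
      Else
        result.Append("") // two separators in a row: empty piece
      End If
      start = i + 1
    End If
  Next

  // Whatever follows the last separator is the final piece.
  If size > start Then
    result.Append(utf8.ConvertDataToText(data.Mid(start, size - start)))
  Else
    result.Append("")
  End If

  Return result
End Function
```

It would be called along the lines of Dim parts() As Text = SplitOnByte(myText, &h5D), where &h5D is the byte value of "]".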

In Desktop or Web, I would simply use String for the splitting and then use ToText. Unless of course the file contains characters that require the Text datatype, such as composite characters.
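As a sketch of that approach, assuming the source String has a defined encoding (ToText needs one, hence the DefineEncoding call in the first post). The function name SplitToText is hypothetical.

```xojo
// Hypothetical helper: split with the classic String API, then
// convert each piece to Text.
Function SplitToText(source As String, separator As String) As Text()
  // String.Split was the fast path in the timings reported above.
  Dim parts() As String = source.Split(separator)

  // Size the result once instead of appending piece by piece.
  Dim result() As Text
  Redim result(parts.Ubound)

  For i As Integer = 0 To parts.Ubound
    // Each piece is assumed to keep the encoding of the source String.
    result(i) = parts(i).ToText
  Next

  Return result
End Function
```

Usage would be something like Dim mArray() As Text = SplitToText(myStr, kParenthesis). The per-piece ToText calls add their own cost, so this is worth timing against Text.Split on the real file.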

Actually I hoped to refactor the existing code in my apps with the new framework, but it seems that, at least for text.split, the time has not yet come to refactor.
As for using Xojo.Core.MemoryBlock together with a combination of IndexOf and Mid, well, I’m not fluent enough with memoryblocks.

In iOS, that would be the only solution to speed up the process I guess. In Desktop and Web, we fortunately still have String for a long time to come.

Since I develop only for Desktop, String is my friend.

Careful with 64-bit builds though.
Split and Join can give very weird results on strings.

[quote=235093:@Marco Hof]Careful with 64-bit builds though.
Split and Join can give very weird results on strings.[/quote]

What “weird results”? There does not seem to be any bug report.

It took me two days of struggling with a perfectly working 32-bit version while the 64-bit version had serious issues. The “weird” part was that it happened at random points. The split and join operations themselves don’t crash, but the results are inconsistent: at random times, characters got chopped off or garbage was added. And I only found out way after the split/joins (no 64-bit debugger).

I couldn’t use Text because of another 64-bit issue (with encoding; I filed a bug report for that), but finally I saw the very last line at the bottom, which I had totally overlooked: http://developer.xojo.com/64-bit-guidelines

You are right. I did not notice either.

And that is the original reason why I intended to “refactor” strings into text. Compiling for 64 bits, splitting Bengali strings would return messed-up chunks of text, while with String.SplitB the output is OK. Text.Split is OK too, but it is too slow with big texts, as I mentioned above.

Right.
I can’t remember exactly because I stumbled on two issues in 64 bits. So when trying to work around the first, I ran into the second. Both were a pita because I was not able to use the debugger.

Using Text, above in 32 bits. Same Text below in 64 bits.

Other issues as well like crashing when doing ConvertEncoding or DefineEncoding.

Even with standard string. Different output with same encoding in 64 bits plus the join/split issues.

I know, 64 bits is Beta. I tried but had to give up.

These examples brilliantly demonstrate the advantages of Text over String.

Sure, Text is slower, but in some cases, vastly superior.