.Split on 64-bit

Hi All,
I used the Somestring.Split(";") function, and it seems there is a bug in it with 64-bit compilation. I tested it on macOS. Can anyone confirm this for other platforms and file a Feedback report?

Thanks.

The string was a CSV string, and the same code worked when compiled for 32-bit (no changes made).

Xojo 2017 R1.1

You could try SplitMBS or better SplitCommaSeparatedValuesMBS, if you use MBS Plugins.

<https://xojo.com/issue/40961>

Have a look at the Folder ‘Documentation’ of your Xojo Installation. There’s a file ‘64bitGuidelines.pdf’.
Quoted from that PDF:

[quote]Current Known Issues
Long Split/Join operations on String may have issues. If you do not need multibyte encodings, you can use
SplitB. Otherwise, convert the String to Text and then use the Text methods to Split or Join.[/quote]

Geoff said it’s scheduled to be fixed soon.
Other things are just more urgent.

So for the time being, use a plugin, write your own, or just wait.

What are the parameters that make this happen? I use split quite a lot on shorter strings to get a list of things separated by newline characters and then join them back together for saving and I haven’t had any of them fail or produce any garbage yet. Does it have to be very large strings? Or strings with strange or non-encodings? Or what exactly?

I’m using a UTF-8 encoded CSV file generated by Numbers on macOS. When I split it by “;” in a 64-bit build, I get a split that is offset by 1 or 2 characters to the left, e.g.:
If I have column1;column2;column3, I get
colum n1;colum n2;colum n3;
The “;” is not removed, and some other character is removed or offset instead.

On 32-bit it works fine.
Strangely, only the first line, which is my header, seems to be affected.

Hi Derk

If you can 100% guarantee that your data is valid UTF-8, you might be able to use the B functions (SplitB / MidB / LeftB etc…).

The string functions currently have lots of issues under 64 bit so fingers crossed they all get addressed in 2017r2

Ah, it’s broken for multibyte characters, but not for ASCII. It’s as if it actually were the SplitB function. With a listbox and a button in the window, if I do this it works fine:

Listbox1.DeleteAllRows

Dim s As String = "one;two;three;four;five"

Dim t() As String = Split(s, ";")

For Each u As String In t
  Listbox1.AddRow("(" + u + ")")
Next

but if I do this then it’s broken:

Listbox1.DeleteAllRows

Dim s As String = "oné;two;three;four;five"

Dim t() As String = Split(s, ";")

For Each u As String In t
  Listbox1.AddRow("(" + u + ")")
Next

That explains why I’m not seeing it: all the data I’m testing with is probably UTF-8, but with no multibyte characters. I think MOST of my use of it does not involve user-entered data, but rather re-parsing strings of known provenance. I will verify to be sure, though, because it would be very hard to debug a strange problem with one of our French users, for instance.

That’s misleading: I’ve seen the bug with only a few kilobytes of text - which is not what I would consider “long”. Super dangerous bug in my opinion and that feature should not have shipped in that state.

The good news is that if your data is UTF8 then SplitB works fine.
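For anyone who wants to try the SplitB route, here is a minimal sketch. The helper name SplitUTF8B is made up for illustration and is not part of Xojo. It assumes the source is valid UTF-8 and the delimiter is a plain ASCII character such as ";". Under that assumption a byte-wise split is safe, because in UTF-8 the byte of an ASCII character never occurs inside a multibyte sequence. I have not verified this against every affected release, so treat it as a starting point.

```xojo
' Hypothetical helper (name is an assumption, not a Xojo API).
' Assumes: source is valid UTF-8 and delimiter is plain ASCII, e.g. ";".
Public Function SplitUTF8B(source As String, delimiter As String) As String()
  ' Byte-wise split; no character counting is involved.
  Dim parts() As String = SplitB(source, delimiter)

  ' Re-tag each piece as UTF-8 in case the byte-wise split left it without an encoding.
  For i As Integer = 0 To parts.Ubound
    parts(i) = DefineEncoding(parts(i), Encodings.UTF8)
  Next

  Return parts
End Function
```

Usage would look like Dim fields() As String = SplitUTF8B(line, ";"). If the delimiter itself can contain non-ASCII characters, it has to be in the same encoding as the source for the bytes to match.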

As far as I understand, the bug does not show up for Text. So it is possible to have a workaround.

Yes, as a workaround I wrote my own split method, which converts the string to Text and then splits it. Just be aware that when you convert a String to Text, the string’s encoding must be known (defined).

I split on Text, not on a String. I didn’t mention that; I just checked it.

Your first post was specifying string.

So, you mean the issue occurs with Text?

Can’t reproduce this. Only the debugger hangs when viewing a longer text property (200k). But the split looks okay.

[quote=331616:@Michel Bujardet]Your first post was specifying string.

So, you mean the issue occurs with Text?[/quote]

Well yes. I had string in use, it didn’t work.

Dim Line As String
Dim Fields() As String

Fields = Line.Split(";") 'Doesn't work on 64-bit build (no error, just strange data, seems offset).

Then i had this:

Dim Fields() As Text
Dim Field As String
Fields = Line.Split(";")

For each f As Text in Fields

field = f
system.debuglog("field: " + field)

Next

I had some offset output, off by about one character. I changed everything to “As Text” and now it gives zero problems.

Dim t As Text = TextArea1.Text.ToText
Dim s(-1) As Text = t.Split("das")

works fine.

I’ll be the bad guy here, but I think that since Xojo knows that Split doesn’t work on 64-bit, it should display a warning.

I’m still trying to wrap my head around the “why” between “string” and “text”… but for now it is what it is…

That being said… can anyone confirm or deny that this would work to provide a “split” STRING (not ‘text’) result in both 32-bit and 64-bit builds? I just don’t want to waste a few hours working through use cases if someone already has a definitive answer :)

Public Function Split64(source as string,delimiter as string) as string()
  Return Split(source.ToText,delimiter)
End Function

Does JOIN have a similar issue? Or can I leave that code in my app as-is?

Dave

A simple observation: the .ToText function will raise a BadDataException if the string has no encoding defined. Perhaps you should add an encoding line:

Public Function Split64(source as string,delimiter as string) as string()
  source = DefineEncoding(source, Encodings.UTF8)
  Return Split(source.ToText,delimiter)
End Function
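One thing worth checking with this approach: passing a Text value to the global Split function may simply convert it back to a String and run the same String code path. A variant that calls Split on the Text value itself, and then copies the pieces back into a String array, might be safer. The sketch below is untested on the affected releases; the name SplitViaText is made up, and falling back to UTF-8 when no encoding is set is an assumption that may not suit your data.

```xojo
' Hypothetical helper (name is an assumption, not a Xojo API).
Public Function SplitViaText(source As String, delimiter As String) As String()
  ' ToText needs a known encoding. ASSUMPTION: treat untagged strings as UTF-8.
  If source.Encoding Is Nil Then
    source = DefineEncoding(source, Encodings.UTF8)
  End If
  If delimiter.Encoding Is Nil Then
    delimiter = DefineEncoding(delimiter, Encodings.UTF8)
  End If

  ' Call Split on the Text value so the Text implementation does the work.
  Dim pieces() As Text = source.ToText.Split(delimiter.ToText)

  ' Copy each Text element back into a String array (Text converts to String implicitly).
  Dim result() As String
  For Each piece As Text In pieces
    result.Append(piece)
  Next

  Return result
End Function
```

The existing encoding is only overridden when it is missing, so strings that already carry a different encoding are converted according to their own tag rather than being relabelled as UTF-8.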