.Split on 64-bit

DerkJ · May 15, 2017, 11:35am

Hi All,
I used the Somestring.Split(";") function, and there seems to be a bug in it with 64-bit compilation. I tested it on macOS. Can anyone confirm this for other platforms and file a Feedback report?

Thanks.

The string was a CSV string that worked when compiled as 32-bit (no changes made).

Xojo 2017 R1.1

Christian_Schmitz · May 15, 2017, 11:38am

You could try SplitMBS or better SplitCommaSeparatedValuesMBS, if you use MBS Plugins.

Michel_Bujardet · May 15, 2017, 12:02pm

<https://xojo.com/issue/40961>

Jürg_Otter · May 15, 2017, 12:17pm

Have a look at the Folder ‘Documentation’ of your Xojo Installation. There’s a file ‘64bitGuidelines.pdf’.
Quoted from that PDF:

[quote]Current Known Issues
Long Split/Join operations on String may have issues. If you do not need multibyte encodings, you can use
SplitB. Otherwise, convert the String to Text and then use the Text methods to Split or Join.[/quote]

Christian_Schmitz · May 15, 2017, 12:19pm

Geoff said it’s scheduled to be fixed soon.
Other things are just more urgent.

So for the time being, use a plugin, write your own, or just wait.

James_Sentman · May 15, 2017, 11:06pm

What are the parameters that make this happen? I use Split quite a lot on shorter strings to get a list of things separated by newline characters, and then join them back together for saving, and I haven't had any of them fail or produce any garbage yet. Does it have to be very large strings? Or strings with strange or missing encodings? Or what exactly?

DerkJ · May 15, 2017, 11:56pm

I’m using a UTF-8 encoded CSV file generated by Numbers on macOS. When I split it by “;” in a 64-bit build, I get a split that is offset 1 or 2 characters to the left, e.g.:
If I have column1;column2;column3, I get
colum n1;colum n2;colum n3;
The “;” is not removed, and some other character is removed or offset instead.

On 32 bit it works fine.
Only the first line, which is my header, seems to be affected… strangely.

kevin_g · May 16, 2017, 6:01am

Hi Derk

If you can 100% guarantee that your data is valid UTF8, you might be able to use the B functions (SplitB / MidB / LeftB etc…).

The string functions currently have lots of issues under 64-bit, so fingers crossed they all get addressed in 2017r2.

James_Sentman · May 16, 2017, 12:43pm

Ah, it's broken for multibyte characters, but not for ASCII. It's as if it actually is the SplitB function. With a listbox and a button in the window, if I do this it works fine:

Listbox1.DeleteAllRows
Dim s As String = "one;two;three;four;five"
Dim t() As String = Split(s, ";")
For Each u As String In t
  Listbox1.AddRow("(" + u + ")")
Next

But if I do this, then it's broken:

Listbox1.DeleteAllRows
Dim s As String = "oné;two;three;four;five"
Dim t() As String = Split(s, ";")
For Each u As String In t
  Listbox1.AddRow("(" + u + ")")
Next

That explains why I'm not seeing it: all the data I'm testing with is probably UTF8, but with no multibyte characters. I think MOST of my use of it does not involve user-entered data, but rather re-parsing strings of known provenance. I will verify that for sure, though, because it would be very hard to debug a strange problem with one of our French users, for instance.

Mike_D · May 18, 2017, 4:02pm

That’s misleading: I’ve seen the bug with only a few kilobytes of text - which is not what I would consider “long”. Super dangerous bug in my opinion and that feature should not have shipped in that state.

The good news is that if your data is UTF8 then SplitB works fine.
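
For anyone wondering why the byte-level function is safe here: in UTF-8, every byte of a multibyte character has its high bit set, so an ASCII delimiter such as ";" can never appear inside one. A minimal sketch, assuming the input really is valid UTF-8. The sample string and variable names are only for illustration, and re-tagging each piece with DefineEncoding is a precaution, not something confirmed in this thread:

```xojo
// Sample data only; assumed to be valid UTF-8.
Dim line As String = "oné;two;three"

// SplitB compares bytes instead of characters.
Dim fields() As String = SplitB(line, ";")

For Each f As String In fields
  // Precaution (assumption): make sure each piece is marked as UTF-8 again.
  Dim piece As String = DefineEncoding(f, Encodings.UTF8)
  System.DebugLog("(" + piece + ")")
Next
```

This only holds for single-byte ASCII delimiters on UTF-8 data; with other encodings (UTF-16, for example) a byte-level split can cut characters in half.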

Michel_Bujardet · May 18, 2017, 4:24pm

As far as I understand, the bug does not show up for Text. So it is possible to have a workaround.

Thomas_ROBISSON · May 18, 2017, 6:03pm

Yes, as a workaround I wrote my own split method, which converts the string to Text and then splits it. Just be aware that when you convert a String to Text, the string's encoding must be known (defined).

DerkJ · May 18, 2017, 9:18pm

I split on Text, not on a String. I didn’t mention that; I just checked it.

Michel_Bujardet · May 19, 2017, 7:06am

Your first post was specifying string.

So, you mean the issue presents with Text ?

Beatrix_Willius · May 19, 2017, 7:33am

Can’t reproduce this. Only the debugger hangs when viewing a longer text property (200k). But the split looks okay.

DerkJ · May 19, 2017, 7:49am

[quote=331616:@Michel Bujardet]Your first post was specifying string.

So, you mean the issue presents with Text ?[/quote]

Well yes. I had string in use, it didn’t work.

Dim Line As String
Dim Fields() As String

Fields = Line.Split(";") 'Doesn't work on 64-bit build (no error, just strange data, seems offset).

Then I had this:

Dim Fields() As Text
Dim Field As String
Fields = Line.Split(";")

For Each f As Text In Fields
  Field = f
  System.DebugLog("field: " + Field)
Next

The output was offset by about one character. I changed everything to “As Text” and now it gives zero problems.

Beatrix_Willius · May 19, 2017, 7:52am

dim t as Text = TextArea1.Text.ToText
dim s(-1) as text = t.Split("das")

works fine.

Thomas_ROBISSON · May 19, 2017, 12:37pm

I will be the bad guy here, but I think that since Xojo knows that Split doesn’t work on 64-bit, it should display an alert.

DaveS · June 8, 2017, 3:21pm

I’m still trying to wrap my head around the “why” between “string” and “text”… but for now it is what it is…

That being said… can anyone confirm or deny, that this would work to provide a “split” STRING (not ‘text’) result in both 32bit and 64bit compile? I just don’t want to waste a few hours doing use cases if someone already has a definitive answer

Public Function Split64(source as string,delimiter as string) as string()
  Return Split(source.ToText,delimiter)
End Function

Does JOIN have a similar issue, or can I leave that code in my app as-is?

Simon_Berridge · June 8, 2017, 3:32pm

Dave

Simple observation: the .ToText function will raise a BadDataException if the string has no encoding defined. Perhaps you should add an encoding line:

Public Function Split64(source as string,delimiter as string) as string()
  source = DefineEncoding(source, Encodings.UTF8)
  Return Split(source.ToText,delimiter)
End Function
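
A possible variant that calls the Text version of Split explicitly and copies the pieces back into a String array. This is only a sketch: SplitViaText is a hypothetical name, the UTF-8 fallback is an assumption about the data, and nobody in this thread has verified it on a 64-bit build:

```xojo
Public Function SplitViaText(source As String, delimiter As String) As String()
  // Hypothetical helper, not part of Xojo.
  // Assumption: a string with no encoding actually contains UTF-8.
  If Encoding(source) Is Nil Then
    source = DefineEncoding(source, Encodings.UTF8)
  End If
  If Encoding(delimiter) Is Nil Then
    delimiter = DefineEncoding(delimiter, Encodings.UTF8)
  End If

  // Split with the Text method rather than the String one.
  Dim parts() As Text = source.ToText.Split(delimiter.ToText)

  // Copy the Text pieces back into a String array.
  Dim result() As String
  For Each part As Text In parts
    Dim s As String = part
    result.Append(s)
  Next
  Return result
End Function
```

It would be called like the built-in function, e.g. Dim fields() As String = SplitViaText(line, ";"). The encoding is only defined when it is missing, so strings that already carry a different encoding are not re-tagged as UTF-8.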