Hi All,
I used the Somestring.Split(";") function, and it seems there is a bug in it with 64-bit compilation. I tested it on macOS. Can anyone confirm this for other platforms and file a Feedback report?
Thanks.
The string was a CSV string that splits correctly when compiled for 32-bit (no changes made to the code).
Have a look at the Folder ‘Documentation’ of your Xojo Installation. There’s a file ‘64bitGuidelines.pdf’.
Quoted from that PDF:
[quote]Current Known Issues
Long Split/Join operations on String may have issues. If you do not need multibyte encodings, you can use SplitB. Otherwise, convert the String to Text and then use the Text methods to Split or Join.[/quote]
What are the parameters that make this happen? I use Split quite a lot on shorter strings to get a list of things separated by newline characters, and then Join them back together for saving, and I haven’t had any of them fail or produce any garbage yet. Does it have to be very large strings? Or strings with strange or missing encodings? Or what exactly?
I’m using a UTF-8 encoded CSV file generated by Numbers on macOS. When I split it by “;” in a 64-bit build, the split points are offset 1 or 2 characters to the left. For example:
If I have column1;column2;column3, I get
colum n1;colum n2;colum n3;
The “;” is not removed, and some other character is removed or offset instead.
On 32-bit it works fine.
Only the first line, which is my header, seems to be affected… strangely.
Ah, it’s broken for multibyte characters, but not for ASCII. It’s as if it were actually the SplitB function. With a Listbox and a button in the window, if I do this it works fine:
Listbox1.DeleteAllRows
Dim s As String = "one;two;three;four;five"
Dim t() As String = Split(s, ";")
For Each u As String In t
  Listbox1.AddRow("(" + u + ")")
Next
But if I do this, then it’s broken:
Listbox1.DeleteAllRows
Dim s As String = "oné;two;three;four;five"
Dim t() As String = Split(s, ";")
For Each u As String In t
  Listbox1.AddRow("(" + u + ")")
Next
That explains why I’m not seeing it: all the data I’m testing with is probably UTF-8, but contains no multibyte characters. I think MOST of my use of it does not involve user-entered data, but rather re-parsing strings of known provenance. I will verify to be sure, though, because this would make it very hard to debug a strange problem reported by one of our French users, for instance.
That’s misleading: I’ve seen the bug with only a few kilobytes of text - which is not what I would consider “long”. Super dangerous bug in my opinion and that feature should not have shipped in that state.
The good news is that if your data is UTF8 then SplitB works fine.
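For anyone who wants to try that, here is a minimal sketch of the SplitB route, assuming the data really is UTF-8. The reason it is safe with a “;” delimiter is that in UTF-8 the bytes of a multibyte character never collide with an ASCII byte, so a byte-wise search for “;” cannot land in the middle of a character. The DefineEncoding call in the loop is a precaution on my part, not something confirmed in this thread: I have not checked whether SplitB hands back pieces with their encoding intact.

```xojo
Dim s As String = "oné;two;three;four;five"

// Byte-wise split; safe for UTF-8 data when the delimiter is plain ASCII.
Dim parts() As String = SplitB(s, ";")

For i As Integer = 0 To parts.Ubound
  // Assumption: re-tag each piece as UTF-8 in case it comes back without an encoding.
  parts(i) = DefineEncoding(parts(i), Encodings.UTF8)
Next
```

This would not be safe for encodings such as UTF-16, where the bytes of an ASCII delimiter can also appear inside other characters.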
Yes. As a workaround I wrote my own split method, which converts the String to Text and then splits it. Just be aware that when you convert a String to Text, the string’s encoding must be known (defined).
I’m still trying to wrap my head around the “why” between “string” and “text”… but for now it is what it is…
That being said… can anyone confirm or deny, that this would work to provide a “split” STRING (not ‘text’) result in both 32bit and 64bit compile? I just don’t want to waste a few hours doing use cases if someone already has a definitive answer
Public Function Split64(source as string,delimiter as string) as string()
Return Split(source.ToText,delimiter)
End Function
Does Join have a similar issue, or can I leave that code in my app as-is?
Simple observation: the .ToText function will raise a BadDataException if the string has no encoding defined. Perhaps you should add an encoding line:
Public Function Split64(source as string,delimiter as string) as string()
source = DefineEncoding(source, Encodings.UTF8)
Return Split(source.ToText,delimiter)
End Function
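One thing that may be worth checking in the Split64 above: it passes a Text value to the global Split function, which takes a String. If Xojo converts the Text back to String implicitly at that point (I believe it does, but have not verified it), the call would end up in the same String-based Split it was meant to avoid. A sketch that calls the Text method explicitly and copies the pieces back into a String array might look like this. It is untested in a 64-bit build, and the UTF-8 fallback is an assumption about the data:

```xojo
Public Function Split64(source As String, delimiter As String) As String()
  // Assumption: data without an encoding is UTF-8. Strings that already
  // carry an encoding are left alone.
  If source.Encoding Is Nil Then
    source = DefineEncoding(source, Encodings.UTF8)
  End If
  If delimiter.Encoding Is Nil Then
    delimiter = DefineEncoding(delimiter, Encodings.UTF8)
  End If

  // Call the Text method explicitly instead of the global Split.
  Dim parts() As Text = source.ToText.Split(delimiter.ToText)

  // Copy the pieces back into a String array (Text converts to String implicitly).
  Dim result() As String
  For Each part As Text In parts
    result.Append(part)
  Next
  Return result
End Function
```

The guideline quoted earlier in the thread lists Join alongside Split, so the same approach (convert to Text, use the Text method, convert back) would presumably apply there as well.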