My app stores the names of FolderItems, such as PDFs, and then finds and displays them when the user selects the name from a popup menu. I ran into a subtle change in 2019R3 that broke existing code and may affect others.
Although the file name is UTF-8 encoded in both versions of Xojo, the bytes differ (precomposed vs. decomposed?). Here’s an example for the file name
Krämer 2013.pdf
In R1.1, the bytes are
4B72 C3A4 6D65 7220 3230 3133 2E70 6466
In R3 they are
4B72 61CC 886D 6572 2032 3031 332E 7064 66
The good news is that when I search for f.Child("Krämer 2013.pdf") with the first byte sequence, it finds the file whose name has the second byte sequence (in the IDE).
Not surprising, since the underpinnings of FolderItem were replaced with much more up-to-date API calls, particularly on macOS.
The difference could be down to how those macOS APIs return the names.
You may not be surprised, but I certainly didn’t expect this change in behavior (or I wouldn’t have spent hours tracking down the problem). I understand why it happened, and I didn’t post here because I thought it was a bug. I posted so that others who encounter this might figure it out more quickly than I did.
Knowing what the side effects of that may be is just too much to ask. FolderItem behaves differently, and it is documented why.[/quote]
Release Notes are interesting, but shouldn’t be a source of documentation, any more than threads here should be. I think it’s entirely reasonable for the change in OS APIs on macOS to be mentioned on the doc page for FolderItem, along with any side-effects that people can report as and when they find them.
In fact, more generally, one might copy what the PHP pages do: each PHP documentation page ends with user-contributed notes, which would be ideal for this sort of thing.
In Unicode, different (i.e. decomposed and precomposed) sequences of code points and thus bytes may represent the same character and are to be treated as canonically equivalent. Apple appears to prefer fully decomposed file names but thankfully normalizes names received from foreign file systems so on macOS there cannot be two files with visually identical names that are yet considered distinct.
The upshot is that one cannot rely on some text to always be represented by the same sequence of bytes, even when the encoding is the same.
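For anyone who wants to see this concretely, here is a minimal sketch in Python (not Xojo), using the standard unicodedata module and the two byte sequences for the example file name in this thread. The helper name canonically_equal is made up for illustration; it is not part of any library.

```python
import unicodedata

# The two UTF-8 byte sequences for "Krämer 2013.pdf" discussed in this thread.
precomposed_bytes = bytes.fromhex("4B72C3A46D657220323031332E706466")
decomposed_bytes = bytes.fromhex("4B7261CC886D657220323031332E706466")

precomposed = precomposed_bytes.decode("utf-8")  # ä stored as U+00E4
decomposed = decomposed_bytes.decode("utf-8")    # a followed by U+0308


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(len(precomposed), len(decomposed))           # 15 16
print(precomposed == decomposed)                   # False
print(canonically_equal(precomposed, decomposed))  # True
```

Same encoding, same visible text, different code points and bytes; only a comparison that normalizes first sees the two names as equal.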
[quote=471357:@Michael Hußmann]In Unicode, different (i.e. decomposed and precomposed) sequences of code points and thus bytes may represent the same character and are to be treated as canonically equivalent. Apple appears to prefer fully decomposed file names but thankfully normalizes names received from foreign file systems so on macOS there cannot be two files with visually identical names that are yet considered distinct.
The upshot is that one cannot rely on some text to always be represented by the same sequence of bytes, even when the encoding is the same.[/quote]
Except Xojo does NOT behave this way, although maybe it should?
// Precomposed form: ä is the two bytes C3 A4 (U+00E4)
Dim mb1 As New MemoryBlock(16)
mb1.UInt8Value(0) = &h4B
mb1.UInt8Value(1) = &h72
mb1.UInt8Value(2) = &hC3
mb1.UInt8Value(3) = &hA4
mb1.UInt8Value(4) = &h6D
mb1.UInt8Value(5) = &h65
mb1.UInt8Value(6) = &h72
mb1.UInt8Value(7) = &h20
mb1.UInt8Value(8) = &h32
mb1.UInt8Value(9) = &h30
mb1.UInt8Value(10) = &h31
mb1.UInt8Value(11) = &h33
mb1.UInt8Value(12) = &h2E
mb1.UInt8Value(13) = &h70
mb1.UInt8Value(14) = &h64
mb1.UInt8Value(15) = &h66
// Decomposed form: a (61) followed by a combining diaeresis, bytes CC 88 (U+0308)
Dim mb2 As New MemoryBlock(17)
mb2.UInt8Value(0) = &h4B
mb2.UInt8Value(1) = &h72
mb2.UInt8Value(2) = &h61
mb2.UInt8Value(3) = &hCC
mb2.UInt8Value(4) = &h88
mb2.UInt8Value(5) = &h6D
mb2.UInt8Value(6) = &h65
mb2.UInt8Value(7) = &h72
mb2.UInt8Value(8) = &h20
mb2.UInt8Value(9) = &h32
mb2.UInt8Value(10) = &h30
mb2.UInt8Value(11) = &h31
mb2.UInt8Value(12) = &h33
mb2.UInt8Value(13) = &h2E
mb2.UInt8Value(14) = &h70
mb2.UInt8Value(15) = &h64
mb2.UInt8Value(16) = &h66
Dim str1 As String = DefineEncoding(mb1, Encodings.UTF8)
Dim str2 As String = DefineEncoding(mb2, Encodings.UTF8)
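// The raw bytes differ, so this Break should not be hit
If mb1 = mb2 Then
  Break
End If
// The strings are canonically equivalent, yet = does not report them as equal either
If str1 = str2 Then
  Break
End If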
I have to thank Jonathon for starting this thread. I ran into this and just chalked it up to one more reason for me to completely ignore r2.1+ for real work.
This seems like something that would be easy for Xojo to implement, so we don’t have to wade through miles of code looking for places where strings are compared using =. Replacing = with a function would be tedious in the extreme, to say nothing of prone to mistakes.
Or is there some existing easy way to implement this?
Testing for equality with = is much faster than using String.Compare, and I suppose we wouldn’t want = to become less efficient when comparing strings. I have no idea what fixing = would entail with regard to speed. It might be preferable to normalize strings internally so any code could rely on canonically equivalent text to be represented by the same sequence of code points and bytes (always precomposed or always decomposed, whatever).
Until canonical equivalence is properly implemented throughout Xojo, using String.Compare for any text received from the outside world may yet be your best bet.
I have tried converting the encoding from UTF8 to some other variant of Unicode and back, hoping the text would be normalized in the process, but to no avail.
[quote=471614:@Michael Hußmann]Testing for equality with = is much faster than using String.Compare, and I suppose we wouldn’t want = to become less efficient when comparing strings. I have no idea what fixing = would entail with regard to speed. It might be preferable to normalize strings internally so any code could rely on canonically equivalent text to be represented by the same sequence of code points and bytes (always precomposed or always decomposed, whatever).
[/quote]
We wouldn’t want strings to be normalized automatically or the performance of = to be reduced.
Maybe strings should have a Normalize method for people who need it; otherwise, just use the String.Compare method that is available today.
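To make the "normalize only where you need it" idea concrete, here is a rough sketch in Python (again, not Xojo) of normalizing names once at the boundary, so that ordinary equality and dictionary lookups keep working at full speed afterwards. The function names normalize_name, build_index and lookup are hypothetical, and the choice of NFC over NFD is arbitrary; what matters is picking one form and using it consistently.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Return the NFC (precomposed) form of a file name."""
    return unicodedata.normalize("NFC", name)


def build_index(names_from_disk):
    """Map each normalized name to the name exactly as the file system reported it."""
    return {normalize_name(n): n for n in names_from_disk}


def lookup(index, stored_name):
    """Return the on-disk name matching a stored name, or None if there is none."""
    return index.get(normalize_name(stored_name))


# The file system reports a decomposed name; the app stored a precomposed one.
on_disk = ["Kra\u0308mer 2013.pdf", "Notes.txt"]
stored = "Kr\u00e4mer 2013.pdf"

index = build_index(on_disk)
print(stored in on_disk)                    # False: plain equality misses it
print(lookup(index, stored) == on_disk[0])  # True: normalized lookup finds it
```

The cost of normalization is paid once per name when it enters the program, rather than on every comparison.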
That = doesn’t treat a precomposed ü and a decomposed ü (u followed by a combining diaeresis) as “equal” is really a problem.
I would expect that most users would be VERY surprised that it doesn’t, and it would be the unusual case where you would NOT want it to do that.