I had written some code to deal with this, and it seems to work fine in Mojave (10.14), but the same code seems to break in High Sierra (10.13). Both systems are using APFS.
Is anyone aware of a change between 10.13 and 10.14?
Have you read the article above?
I can easily imagine a situation like this: depending on how you create a file “Jürg.txt”, you may or may not be able to access it via e.g. aFolder.Child("Jürg.txt"). You can even have two files named “Jürg.txt” in the very same folder. They look identical, but their names have different binary representations, so APFS treats them as two valid, distinct files (though not all APIs do).
It all boils down to UTF-8 and precomposed vs. decomposed Unicode representations.
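To make the “two identical-looking names” point concrete, here is a small Python sketch (Python is used only as a neutral illustration, not as Xojo code). It builds the composed (NFC) and decomposed (NFD) spellings of “Jürg.txt”, shows that their UTF-8 bytes differ, and shows that they compare equal only after both are normalized to the same form.

```python
import unicodedata

# Two spellings of the same visible name "Jürg.txt":
composed = "J\u00fcrg.txt"     # NFC: "ü" is a single code point, U+00FC
decomposed = "Ju\u0308rg.txt"  # NFD: "u" followed by COMBINING DIAERESIS, U+0308

# They render identically but are different strings with different UTF-8 bytes,
# so a file system that compares names byte by byte can store both side by side.
print(composed == decomposed)             # False
print(composed.encode("utf-8").hex(" "))    # 4a c3 bc 72 67 2e 74 78 74
print(decomposed.encode("utf-8").hex(" "))  # 4a 75 cc 88 72 67 2e 74 78 74

# Normalizing both sides to the same form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```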
One situation we’ve run into was on iOS (when Apple switched all iOS devices to APFS with iOS 10.3). A document saved with the name “Zürich” could no longer be opened, because the iOS API (UIManagedDocument) always used the decomposed string for the filename when opening the document. So if the file had been saved with a composed filename, you were (and maybe still are) out of luck.
I just hope Xojo keeps this in mind while rewriting the FolderItem framework, and thoroughly tests files and folders with diacritics (including how the strings assigned to a FolderItem get composed or decomposed), so that everything works on APFS.
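One defensive pattern for the “Zürich” problem is a normalization-insensitive child lookup: list the folder’s real entries and pick the one whose name matches after normalizing both sides. The Python sketch below is hypothetical (`match_name` and `find_child` are illustrative names, not a Xojo or Apple API); it prefers an exact match first, in case both spellings really exist in the same folder.

```python
import os
import unicodedata
from typing import Iterable, Optional


def match_name(entries: Iterable[str], wanted: str) -> Optional[str]:
    """Return the entry that equals `wanted` after NFC normalization, else None.

    Hypothetical helper: a name stored with a decomposed "ü" is found when
    asked for the composed spelling, and vice versa.
    """
    candidates = list(entries)
    # Prefer a byte-identical match in case both spellings exist on disk.
    if wanted in candidates:
        return wanted
    target = unicodedata.normalize("NFC", wanted)
    for name in candidates:
        if unicodedata.normalize("NFC", name) == target:
            return name
    return None


def find_child(directory: str, wanted: str) -> Optional[str]:
    """Return the full path of the matching entry in `directory`, or None."""
    name = match_name(os.listdir(directory), wanted)
    return os.path.join(directory, name) if name is not None else None


# Usage with a plain list, so the result does not depend on the file system:
on_disk = ["Zu\u0308rich.doc", "Bern.doc"]  # name saved in decomposed form
print(match_name(on_disk, "Z\u00fcrich.doc") == on_disk[0])  # True
print(match_name(on_disk, "Genf.doc"))                       # None
```

Returning the entry’s actual on-disk spelling, rather than the caller’s string, means any later open or rename uses exactly the bytes the file system stored.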
Michael said he had a Xojo solution that works, so my question was about the Xojo aspects of “it seems to break in High Sierra”. There are many different ways for things to break.
Does it switch to wrong characters?
Does it crash hard and die?
Does it raise an exception?
Does it create nil folder items?
Does it …
Still investigating, but it looks like this is what I’m seeing:
Given a filename such as ä.pdf
get the file path using Xojo’s FolderItem.ShellPath (edit: FolderItem.NativePath)
on 10.13: \U00e4.pdf
on 10.14: a\U0308.pdf
and I’m passing this path to an API where it fails with the \U00e4 version but works with the a\U0308 version
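The two paths above are the composed and decomposed spellings of the same name, so one way to make the call OS-independent is to normalize the path to NFD before passing it on. The Python sketch below assumes, as observed here, that the receiving API only accepts the decomposed form; `to_api_form` is a hypothetical name. On macOS the equivalent conversion exists natively (e.g. NSString’s `decomposedStringWithCanonicalMapping`), but how to reach it from Xojo is not shown here.

```python
import unicodedata

path_1013 = "\u00e4.pdf"   # what NativePath returned on 10.13 (composed)
path_1014 = "a\u0308.pdf"  # what it returned on 10.14 (decomposed)


def to_api_form(path: str) -> str:
    # Assumption: the receiving API only accepts the decomposed (NFD)
    # spelling, so normalize before handing the path over.
    return unicodedata.normalize("NFD", path)


print(path_1013 == path_1014)                            # False
print(to_api_form(path_1013) == to_api_form(path_1014))  # True
print(path_1013.encode("utf-8").hex(" "))                # c3 a4 2e 70 64 66
print(to_api_form(path_1013).encode("utf-8").hex(" "))   # 61 cc 88 2e 70 64 66
```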
What I’m not sure about is whether this is an OS change, a Xojo bug, or something else. I’m using Xojo 2019 R1.
This may be unrelated, but what is String.Right(1) supposed to do with decomposed Unicode?
dim filename as string = "ä"
dim f as FolderItem = GetFolderItem("").Child(filename)
dim ta as TextArea = TextArea1
// EndOfLine returns the platform's line ending as a String
dim CR as String = EndOfLine
// these two lines behave as expected: the UTF-8 string for ä is C3 A4
ta.AppendText "name=" + filename + " : " + EncodeHex(filename, true) + CR
ta.AppendText "f.name=" + f.Name + " : " + EncodeHex(f.Name, true) + CR
// ShellPath appears to use the decomposed form, where the ä is encoded as 61 CC 88
dim sp as string = f.ShellPath
ta.AppendText "f.ShellPath= " + sp + " : " + EncodeHex(sp, true) + CR
// Here is where it gets wacky: taking the right-most character should give us ä,
// but instead it displays as a lone combining diaeresis (U+0308) and the hex representation is CC 88
dim sp1 as string = sp.Right(1)
ta.AppendText "sp1= " + sp1 + " : " + EncodeHex(sp1, true) + CR
// if we grab the right-most 3 bytes, it displays properly as ä (61 CC 88)
dim sp3 as string = sp.RightB(3)
ta.AppendText "sp3= " + sp3 + " : " + EncodeHex(sp3, true) + CR
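The CC 88 result is consistent with Right(1) counting Unicode code points rather than user-perceived characters: in the decomposed form, “ä” is two code points (“a” plus U+0308), and the last one is the combining mark. Python’s `s[-1]` behaves the same way, as this sketch shows. `last_grapheme` is a hypothetical, simplified helper that steps back over combining marks only; it is not a full implementation of the Unicode grapheme-cluster rules (UAX #29). Normalizing to NFC first is the other easy fix for this particular case.

```python
import unicodedata


def last_grapheme(s: str) -> str:
    """Return the last base character plus any combining marks after it.

    Simplified: only handles combining marks (unicodedata.combining != 0),
    not the full Unicode grapheme-cluster rules.
    """
    if not s:
        return s
    i = len(s) - 1
    # Step back over trailing combining marks such as U+0308.
    while i > 0 and unicodedata.combining(s[i]):
        i -= 1
    return s[i:]


shell_path = "/Users/me/a\u0308"  # hypothetical decomposed path, like ShellPath above

print(shell_path[-1].encode("utf-8").hex(" "))             # cc 88
print(last_grapheme(shell_path).encode("utf-8").hex(" "))  # 61 cc 88
print(unicodedata.normalize("NFC", shell_path)[-1] == "\u00e4")  # True
```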
FolderItem.Name returns the composed Unicode filename, but FolderItem.NativePath and .ShellPath return decomposed Unicode. Submitted as <https://xojo.com/issue/55652>
The behavior also seems to be OS-dependent: in macOS 10.11, FolderItem.Name and .NativePath use the same form, but in 10.14 they differ. I’m not sure at which OS version this changed. Maybe with APFS in 10.13?
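Because Name and NativePath may use different forms depending on the OS version, code that checks whether a path ends with a given name should normalize both sides first. The Python sketch below is hypothetical (`name_matches_path` is an illustrative name) and assumes POSIX-style paths with “/” separators, as on macOS.

```python
import posixpath
import unicodedata


def name_matches_path(name: str, native_path: str) -> bool:
    """True if the last component of native_path is `name`, ignoring NFC/NFD.

    Hypothetical check; assumes a POSIX-style path with "/" separators.
    """
    last = posixpath.basename(native_path.rstrip("/"))
    return unicodedata.normalize("NFC", last) == unicodedata.normalize("NFC", name)


name = "\u00e4"                 # FolderItem.Name style: composed
path_old = "/Users/me/\u00e4"   # 10.11 style: same form as Name
path_new = "/Users/me/a\u0308"  # 10.14 style: decomposed

print(path_new.endswith(name))            # False: a naive comparison fails
print(name_matches_path(name, path_old))  # True
print(name_matches_path(name, path_new))  # True
```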