APFS and Xojo FolderItem issues - may be a very serious issue!

It is, but I always have my drives formatted as case insensitive.

I haven’t reformatted my drive for many years.

Default OS X installs have always used HFS+, which is case preserving but not case sensitive. Early versions were able to use Apple’s UFS and modern versions can use case-sensitive HFS+, but they aren’t the default.

OS X itself is neither case-sensitive nor case-insensitive; the type of volume it runs on is what matters. And Apple has made it clear to developers in the past that they must test their programs on case-sensitive volumes to make sure they do not mishandle case, e.g. when storing path and file references somewhere. When some well-established code libraries and samples were moved from OS X to iOS, which always uses a case-sensitively formatted volume, I remember seeing cases where they failed because names were hard-coded with the wrong case, for instance.

Now, we’ll have another problem soon with normalization-sensitive file systems, so we’ll have to double check our code once again. And Xojo has to make sure their own code doesn’t cause any problems for us either.

To those who still seem to think this is a user’s error (I got the impression from some comments):

It’s not the user’s fault for using something Apple provides in beta and then telling us developers that it breaks our apps. It’s our job to make sure that this doesn’t happen, unless it’s clearly a bug on Apple’s side. Often enough the fault is not in the beta software but on our side, just like in this case.

Telling the customer not to use APFS with your problematic program is not a solution; it’s only a temporary workaround until you, its developer, fix your code. (Also see Jeff Tulin’s post, which I agree with.)

Xcode now warns if an #include uses a name with different case.
I fixed thousands of #includes here, just in case I someday use a case-sensitive system.

For testing, it should be sufficient to add a new partition or an external drive, format it as APFS (preferably case-sensitive), and then use your software on it. That includes:

• Run your app from it.
• If it can open or write files at the user’s choice, put those files onto the APFS volume.
• If your app reads and saves data automatically, e.g. to ~/Library/Application Support/YourAppName, then let it create that folder in its usual place first, move the entire folder to the APFS volume, and then add a symlink from its original location to the new place on the APFS volume. That procedure should, in most cases, simulate running OS X from such an APFS volume.
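One cross-platform way to check how the volume under test treats filenames is a small probe: create a file with a composed (NFC) name and see which form the directory listing returns. This is a Python sketch rather than Xojo (the filesystem behavior it reveals is the same either way); the `.probe` suffix and the function name are my own choices.

```python
import os
import tempfile
import unicodedata

def probe_normalization(dirpath):
    """Create a file whose name contains a composed (NFC) 'ü' and
    check which form the filesystem returns when listing the directory."""
    nfc_name = unicodedata.normalize("NFC", "u\u0308") + ".probe"  # composed "ü.probe"
    open(os.path.join(dirpath, nfc_name), "w").close()
    on_disk = next(n for n in os.listdir(dirpath) if n.endswith(".probe"))
    os.remove(os.path.join(dirpath, on_disk))
    if on_disk == nfc_name:
        return "name preserved as-is (APFS-like / typical Linux)"
    return "name was re-normalized (HFS+-like)"

with tempfile.TemporaryDirectory() as d:
    print(probe_normalization(d))
```

Run this on a folder on the volume you want to test: HFS+ will hand the name back decomposed, while APFS (and most Linux filesystems) returns exactly the bytes you stored.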

Okay, I’ll try to explain better.

Well, you seemed to when you wrote this in your comment:

I even pointed out that they are already in the same encoding, yet you now say that you were not talking about encoding, even though you clearly did back there?

Anyway, let’s focus on what you really wanted to talk about:

and then again:

You are still hooked on encodings there. Yet, the encoding of all these compared strings is the SAME - as I pointed out before. Why do you keep referring to encoding differences when they’re the same here?

I wrote that these use different composition forms. Oddly, later in your reply you use the correct term, so I assume you do know the difference. This is really confusing me.

What I pointed out was that StrComp(… , 1) should be able to handle the different composition forms of strings with the same encoding.

Here’s an example why:

If a user enters a string into a TextField and types an “ü”, it may arrive in composed or decomposed form. Now, the program may also be reading text from a text file that contains an “ü”. Per your expressed rule, the program would make sure to convert BOTH these texts to UTF-8, so that they have the SAME encoding. Yet, if I then used “=” or StrComp(…, 1), they may turn out not to be equal, because they still have different byte representations.
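A minimal sketch of this in Python (standing in for Xojo here; the variable names are mine) shows how two strings in the same encoding can still differ byte-wise:

```python
import unicodedata

composed   = "\u00fc"   # "ü" as a single code point (what a text field may deliver)
decomposed = "u\u0308"  # "u" + combining diaeresis (what a text file may contain)

# Both are "the same encoding" once encoded as UTF-8...
print(composed.encode("utf-8"))    # b'\xc3\xbc'
print(decomposed.encode("utf-8"))  # b'u\xcc\x88'

# ...yet a byte-wise comparison (the analogue of "=" / StrComp) says they differ:
print(composed == decomposed)      # False

# Only after normalizing both to the same form do they compare equal:
print(unicodedata.normalize("NFC", composed) ==
      unicodedata.normalize("NFC", decomposed))  # True
```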

It’ll be difficult for a Xojo programmer, who is usually not that deep into such technical aspects, to understand why “ü” = “ü” is sometimes false when even “Ü” = “ü” can be true.

Or consider this: While “ü” = “ü” could be false in any Unicode encoding, it would always be true if I converted both strings to a non-Unicode encoding that supports this character, such as WinLatin1 or MacRoman. Is that something a user should expect - that the lesser encoding does the right thing but the richer one does not?
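For illustration, here is that round trip in Python. One assumption is made explicit: a converter targeting Latin-1 has to compose the character first, since Latin-1 has no combining marks. Python’s codec doesn’t do that implicitly (it would just raise an error on the decomposed form), so the sketch normalizes before encoding:

```python
import unicodedata

composed   = "\u00fc"   # "ü", single code point
decomposed = "u\u0308"  # "u" + combining diaeresis

# Latin-1 has exactly one slot for "ü" (0xFC) and no combining marks,
# so any conversion to it must compose the character first.
a = unicodedata.normalize("NFC", composed).encode("latin-1")
b = unicodedata.normalize("NFC", decomposed).encode("latin-1")
print(a, b, a == b)   # b'\xfc' b'\xfc' True
```

Once both strings are forced through the "lesser" encoding, there is only one possible byte representation left, which is exactly why they suddenly compare equal.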

I could agree with you if you said that adding a normalization call to every string comparison would slow down practically all string operations, and might slow down entire string-intensive apps. But I don’t agree that the current behavior is correct, especially since it’s (a) hard to understand and (b) Xojo offers no solution to it for Strings (and no, equalizing the Encodings does NOT help; just try that with my sample code!).

I also didn’t request that StrComp and “=” get fixed to deal with this now. I only explained that they’re giving misleading and unexpected results. And my examples above clearly show why.

You, Norman, even acknowledge that by referring to the new “text” type, which solves this finally. But you kept going on about different encodings when it wasn’t about that, and that’s very irritating. It makes you sound as if you don’t really understand what you’re talking about.

We’ve run into this issue with APFS as well. Just yesterday I noticed that an iOS app of ours could no longer create/save/re-open a UIManagedDocument when running on iOS 10.3, in situations where the filename contained characters such as “ü” (e.g. “Zürich”). The fix was to use the decomposed form of “Zürich” when creating the filename.
This affected an iOS project created with Xcode, so it can happen to anyone: code that had worked since iOS 4 (when the app was first released) suddenly “broke” with iOS 10.3, just because the filesystem handles strings in a different way.

Interesting. I wonder how that came to happen.

My guess:

The file with the name “Zürich” had been created pre-iOS 10.3. First, the name was determined and stored in composed form in a file or a database. Then a file based on that name was created on the HFS+ volume. Per HFS+ behavior, the name was stored decomposed on-disk. Then the user installs iOS 10.3 and the volume gets converted to APFS, with the name preserved in its on-disk binary form. Now the same app runs again and wants to open the file, using the name it once created it with - and that will now fail, because even though the file was created with the composed name back then, it now has a slightly different name, i.e. the decomposed form.

The worst part of it is that you could think you did everything correctly pre-10.3. In fact, UIManagedDocument is an Apple-provided mechanism that is supposed to take care of this for you! But what you would have had to do, and which NO ONE would have thought of, was to fetch the actual name back from disk after creating the file, and store that in the database. Only then would you end up with the decomposed name that APFS now requires. (Well, actually, on the cocoa-dev list there are some people who seemed to know about this potential problem ahead of this issue, saying that it’s always good to call [NSString fileSystemRepresentation] to make sure the name is stored correctly and won’t get any more changes applied once stored on-disk. The current Apple docs for this function do not explain any use case for it, though.)
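The defensive pattern described here (create the file, then read the actual name back from the directory listing and store that) can be sketched roughly like this in Python. The function name and the matching-by-NFC heuristic are my own choices for the sketch, not an Apple or Xojo API:

```python
import os
import tempfile
import unicodedata

def create_and_record(dirpath, wanted_name):
    """Create a file, then read back the name the filesystem actually
    stored and return THAT for the app's database - not the name we asked for."""
    open(os.path.join(dirpath, wanted_name), "w").close()
    # On HFS+ the stored name may come back decomposed; on APFS/ext4 the
    # bytes are preserved. Match by normalized form to find our file either way.
    for actual in os.listdir(dirpath):
        if unicodedata.normalize("NFC", actual) == unicodedata.normalize("NFC", wanted_name):
            return actual
    raise FileNotFoundError(wanted_name)

with tempfile.TemporaryDirectory() as d:
    stored = create_and_record(d, "Z\u00fcrich.doc")  # composed "Zürich.doc"
    # 'stored' is what belongs in the database; reopening with it is safe
    # regardless of how the volume normalizes names.
    assert os.path.exists(os.path.join(d, stored))
```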

An impossible situation, created by Apple without any warning ahead of it.

In our situation, no. UIManagedDocument could not create/save/open a new document on iOS 10.3 - a “document bundle” that had never seen anything but APFS.
It may have to do with what our document subclasses do, or it may be a bug in UIManagedDocument.
Either way, I just wanted to say that every framework can face issues like this when users are (forced to) switch to APFS.

The “warning” has been that we have been aware that APFS is coming.
What I don’t like at all in the “new/current Apple world” is that every “dot release” may break one’s app (it used to be only with every “major release”). In an ideal world, one would have to check every aspect of every app about four times a year. That’s just too much effort for some of us. I won’t even compare this to Windows, where most (or even all) apps we’ve built for WinXP still run without issue on Win10.

Anyway, I don’t want to move the focus to such aspects. Let’s go back to the APFS topic.

I know what we’re going to do soon with our macOS apps:

And so should every one of us dealing with files. Even when Apple is not warning us, we somehow know that APFS will be on our users’ systems sooner or later.
At the end of the day, it has to work - no matter whether there are issues or inconsistencies in Apple’s, Xojo’s or our own framework. We will have to find our own way to work around them, so that what we are doing works both now and with APFS. And if it works today with APFS, the chances are probably higher that it will keep working once APFS arrives for everyone. And we know Apple - it may be sooner than expected.
At least it’s easier on macOS to test in advance (using an external disk or just a disk image).

Amen to that!

Oh, and those of you who have an iOS app dealing with files: better test your app (using characters such as “ü”, “ä”, “à” in filenames) now. And remember: run it on an iOS 10.3 device, not in the Simulator.
It may very well work as expected in the Simulator (which uses the filesystem of your macOS), but not on an iOS 10.3 device (using APFS). But of course I hope you won’t find any issues :slight_smile:

Two articles about APFS:
APFS’s “Bag of Bytes” Filenames
APFS to Add Case-Insensitive Variant for Mac

A lot of discussion here, but this is not specific to APFS. You can see all of this on a standard Mac OS Extended (Journaled, Case-sensitive) volume all the way back to 10.1.

I’ve always used a case-sensitive filesystem since that’s the way it works in real Unix. I had 3 games back in the 10.4 era that wouldn’t run because of that, but I just didn’t play those games.

@Tim Jones: What do umlauts have to do with case sensitivity? We were talking about encodings and normalization but not about case.

After reading the article by Michael Tsai: it’s encodings AND case. Shudder. This is just awful.

That’s not correct. This is about normalization forms, and those are handled differently between HFS+ and APFS, even if both are case-sensitive.
The normalization-form differences discussed here ARE specific to APFS.

While some of it also relates to general comparisons of strings, you cannot reproduce all the issues discussed here by using a case-sensitive HFS+ volume.

Nope, not encodings. I tried to explain that a while up there. The long posts that no one seems to read (or understand? am I writing so badly?) :slight_smile:

BTW, I’ll give a talk at the next Macoun conference in Frankfurt (in Oct 2017) about all this, with examples and such.

@Thomas Tempelmann:

Composition is not part of encodings?

I remember having fun with Mülleimer (trash can). A comparison “Mülleimer” = “Mülleimer” didn’t work. I really thought I was losing it.

Composition is part of Unicode, but it’s separate from encodings.

First, a definition: Code Points are the numeric codes that make up Unicode “characters”. For example, “u” uses Code Point 117 (hex 75), and the single-Code-Point representation of “ü” is 252 (hex FC).

Encodings are about the different byte representations of Unicode characters (more exactly: Code Points), as in UTF-8, UTF-16 and UTF-32, with little-endian and big-endian orders. Code Point 252 uses 2 bytes in UTF-8, 2 bytes in UTF-16 and 4 bytes in UTF-32.

Composition is about the same visible character, such as “ü”, being represented by different Code Point sequences: either as a single Code Point (252, hex FC) or as a combination of “u” and “¨” (chr(&h75) + chr(&h308)). In UTF-8, the former uses 2 bytes, the latter 3 bytes. So: same encoding, different composition.

Normalization is the attempt to make them equatable, so that “ü” = “ü” holds at the Code Point level (and, if the same encoding is used, also at the byte level).
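Every number in these definitions can be verified mechanically. Here is a short check in Python rather than Xojo, since its unicodedata module exposes the normalization forms directly (the variable names are mine):

```python
import unicodedata

# Code points: "u" is U+0075 (117), precomposed "ü" is U+00FC (252)
assert ord("u") == 117
assert ord("\u00fc") == 252

# Encodings: the same code point, different byte representations
assert len("\u00fc".encode("utf-8")) == 2
assert len("\u00fc".encode("utf-16-le")) == 2
assert len("\u00fc".encode("utf-32-le")) == 4

# Composition: same visible character, different code point sequences
composed, decomposed = "\u00fc", "u\u0308"
assert len(composed.encode("utf-8")) == 2    # 1 code point -> 2 UTF-8 bytes
assert len(decomposed.encode("utf-8")) == 3  # 2 code points -> 3 UTF-8 bytes

# Normalization makes them equatable at the code-point (and byte) level
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed

print("all definitions check out")
```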

Nope it was written clearly… and at least I’ve read it :wink:

Which the Text type manages brilliantly BTW.