Special Character folder item child encoding

If the accented character was created on Mac, for instance, it would not show as ü on a Windows system.

@Boudewijn_Krijger: Could you look at the file in Windows Terminal (Command Prompt in Windows 10 and earlier)? Does the ü show?

Yes, it would. It’s using a valid UTF-8 encoding. It just happens to be different from what Windows uses for the ü, which is also a valid UTF-8 encoding.

In the years before Unicode, there was ASCII (values 0 to 127), and each platform assigned its own characters to the values 128 to 255.

The classic Mac OS (MacRoman) assigned characters in that range in its own order.
When Microsoft created Windows, it assigned (more or less) the same characters to different values, also in the 128-255 range (Windows-1252 in Western locales).
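To make the code-page point concrete, here is a minimal sketch in Python rather than Xojo, so it can be run anywhere. It assumes Python's standard `mac_roman` and `cp1252` codecs as stand-ins for the two legacy character sets, and shows that ü sits at a different byte value in each, and that the MacRoman byte for ü reads as a different character under Windows-1252.

```python
# Compare where ü lives in two legacy 8-bit code pages.
# "mac_roman" and "cp1252" are codec names from Python's standard library.

def legacy_byte(ch: str, codec: str) -> int:
    """Return the single byte value that `codec` assigns to character `ch`."""
    encoded = ch.encode(codec)
    if len(encoded) != 1:
        raise ValueError("expected a single-byte code page")
    return encoded[0]

mac_u = legacy_byte("ü", "mac_roman")   # 0x9F in MacRoman
win_u = legacy_byte("ü", "cp1252")      # 0xFC in Windows-1252

# Reading the MacRoman byte as if it were Windows-1252 gives another glyph.
misread = bytes([mac_u]).decode("cp1252")

print(f"MacRoman ü = 0x{mac_u:02X}, Windows-1252 ü = 0x{win_u:02X}")
print(f"0x{mac_u:02X} read as Windows-1252 = {misread}")
```

This is the kind of mismatch the 8-bit era produced; it is a different mechanism from the UTF-8 issue discussed in the rest of the thread.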

That is what I think Michel was saying.

That is not the issue here, though.

Yes, the issue is a misrepresentation of a set of characters that also exist in the 128 to 255 range…

No, it’s the fact that ü (and similar characters) can be encoded in two different, but equally valid, ways in UTF-8. For ü, one is the two-byte sequence C3 BC (used on Windows); the other is a three-byte sequence: the lowercase u (one byte, hex 75) followed by a combining diaeresis (two bytes, CC 88), which is what the macOS file system uses.
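A minimal Python sketch of the two byte sequences described above, using the standard `unicodedata` module (Python is used here only because it makes the bytes easy to inspect; nothing Xojo-specific is implied):

```python
import unicodedata

composed = "\u00fc"                                   # ü as one code point (U+00FC)
decomposed = unicodedata.normalize("NFD", composed)   # u + COMBINING DIAERESIS (U+0308)

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(composed_bytes.hex(" ").upper())     # two bytes
print(decomposed_bytes.hex(" ").upper())   # three bytes
print(composed == decomposed)              # different strings...
print(unicodedata.normalize("NFC", decomposed) == composed)  # ...same after NFC
```

Both byte sequences are valid UTF-8; they only compare equal once both sides are normalized to the same form.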

@Michel_Bujardet: it shows as ü in the terminal.

@TimStreater: that is indeed the issue.

MsgBox "u umlaut: " + DefineEncoding(Chr(159),Encodings.MacRoman)

Returns:
u umlaut: Ÿ

Without the DefineEncoding, it is not displayed.

Xojo 2021r2.1 / Monterey.

Unfortunately, I cannot test that with, say, Xojo 2015r1.

That tells you nothing about how a filename containing a ü is represented in the directory blocks of the filesystem.
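One way to see how a name is actually represented is to dump its code points as the OS returns them. A minimal Python sketch, assuming Python 3.8+ for `unicodedata.is_normalized`; `describe_name` is a hypothetical helper. Note that `os.listdir` reports what the OS API hands back, which on some file systems may already be normalized, so it is close to, but not literally, the on-disk directory blocks.

```python
import os
import sys
import unicodedata

def describe_name(name: str) -> str:
    """Return the code points of `name` and which normalization form(s) it is in."""
    points = " ".join(f"{ord(c):04X}" for c in name)
    forms = [f for f in ("NFC", "NFD") if unicodedata.is_normalized(f, name)]
    return f"{points}  [{', '.join(forms) or 'neither'}]"

if __name__ == "__main__":
    # For each folder given on the command line, show its non-ASCII entries.
    for folder in sys.argv[1:]:
        for entry in os.listdir(folder):
            if not entry.isascii():
                print(entry, "->", describe_name(entry))
```

A name reported as `[NFD]` with `0075 0308` in it is the decomposed form; `[NFC]` with `00FC` is the composed one. Pure-ASCII names are in both forms at once, which is why they never cause this problem.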

Actually, if the terminal uses CP-1252 (Windows ANSI), then the fact that it shows ü means the character is encoded in Windows ANSI.
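A related detail, sketched in Python under the assumption that the display really is CP-1252 (Windows consoles often default to an OEM code page such as 437 or 850 instead): the decomposed form has no CP-1252 encoding at all, because CP-1252 has no combining diaeresis. Whatever renders the name must compose it first, so seeing ü on screen does not by itself reveal which form is stored. `to_cp1252` is a hypothetical helper.

```python
import unicodedata

def to_cp1252(name: str) -> bytes:
    """Encode `name` for a CP-1252 display, composing it (NFC) first."""
    return unicodedata.normalize("NFC", name).encode("cp1252")

decomposed = "fu\u0308rmich.txt"   # u + COMBINING DIAERESIS, as macOS tends to store it

try:
    decomposed.encode("cp1252")    # U+0308 has no CP-1252 code
    direct_ok = True
except UnicodeEncodeError:
    direct_ok = False

print(direct_ok)               # False
print(to_cp1252(decomposed))   # b'f\xfcrmich.txt'
```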

@Boudewijn_Krijger: Again, what do you get with SpecialFolder.Desktop.Child("fürmich.txt").ShellPath?

Emile and Michael, please. What we see here is the well-known case of different UTF binary representations of the same graphemes: composed vs. decomposed, both meaning the same thing. There is no old ASCII, MacRoman, or Windows code page translation involved; it’s pure Unicode, UTF-8 or UTF-16. Macs seem to opt for decomposed representations and Windows for composed ones when storing filenames. If you move a Mac file whose name contains accented letters in decomposed form to a Windows system, Windows receives the name AS IS. If you do a raw byte-by-byte compare of both filenames, they differ; if you normalize both to the same composition scheme, they match. If you store files on a system that accepts names in BOTH schemes, as Windows does, you will see what looks like THE SAME file TWICE! The names look identical, but their binary representations differ.
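The raw-versus-normalized comparison described above can be sketched in a few lines of Python; `same_name` is a hypothetical helper, and the two literals assume the typical Mac (NFD) and Windows (NFC) spellings.

```python
import unicodedata

def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two file names after normalizing both to the same Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

mac_name = "fu\u0308r mich.cfg"   # decomposed (NFD), as typically written on a Mac
win_name = "f\u00fcr mich.cfg"    # composed (NFC), as typically typed on Windows

print(mac_name == win_name)                                   # False
print(mac_name.encode("utf-8") == win_name.encode("utf-8"))   # False
print(same_name(mac_name, win_name))                          # True
print(same_name(mac_name, win_name, "NFD"))                   # True
```

Which form you normalize to does not matter for the comparison, as long as both sides use the same one.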

Mode  Code points (hex)                                                 Text
NFD   0066 0075 0308 0072 0020 006d 0069 0063 0068 002e 0063 0066 0067  für mich.cfg
NFC   0066 00fc 0072 0020 006d 0069 0063 0068 002e 0063 0066 0067       für mich.cfg
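The code points in the table can be reproduced with Python's `unicodedata` (a sketch; `code_point_row` is a hypothetical helper, and the hex is lowercase to match the table):

```python
import unicodedata

def code_point_row(mode: str, text: str) -> str:
    """Normalize `text` to `mode` and list its code points in hex."""
    normalized = unicodedata.normalize(mode, text)
    points = " ".join(f"{ord(c):04x}" for c in normalized)
    return f"{mode}  {points}  {normalized}"

for mode in ("NFD", "NFC"):
    print(code_point_row(mode, "f\u00fcr mich.cfg"))
```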

Windows Explorer, same name: [screenshot]

Console, Dir command: [screenshot]

The OP verified that fürmich was displaying correctly in the Windows terminal. Rick, please 🙂

I just did this test on the Mac:

I had a file abcü.txt on my Desktop. This was the one I created the other day, with the decomposed name. So I ran this code:

Var memfn As MemoryBlock, strfn, strdata As String, f As FolderItem, tmpof As TextOutputStream

strdata = "12345"

memfn = new MemoryBlock(9)

memfn.Byte(0) = &h61       // a
memfn.Byte(1) = &h62       // b
memfn.Byte(2) = &h63       // c
memfn.Byte(3) = &hC3       // C3 and BC give you the UTF8 *composed* ü
memfn.Byte(4) = &hBC
memfn.Byte(5) = &h2E       // .
memfn.Byte(6) = &h74       // t
memfn.Byte(7) = &h78       // x
memfn.Byte(8) = &h74       // t   (so memfn contains abcü.txt, composed)

strfn = memfn
strfn = strfn.DefineEncoding (Encodings.UTF8)

f = new FolderItem ("/Users/tim/Desktop/" + strfn, FolderItem.PathModes.Native)

tmpof = TextOutputStream.Create (f)    // First time, I stopped here
tmpof.Write (strdata)
tmpof.Close ()

On the first run, I stopped execution as per the comment above. In the debugger, I could see that the FolderItem f pointed at the existing file on the Desktop. I then quit the app and removed the existing file from the Desktop. Then I ran the code again and let it complete.

The file was written, but examination shows that it now has the decomposed name, even though the folderitem was given a composed name to use.

So, on the Mac at least, you can’t create a file with an un-normalised name (from macOS’s point of view).
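The same experiment can be sketched outside Xojo in Python: build the composed name from the same bytes as the MemoryBlock in the code above, create the file in a temporary folder, and ask the OS what name it hands back. `stored_form` is a hypothetical helper. The result depends on the file system (and possibly on the API layer in between), so treat the printed form as an observation for your own machine, not a prediction.

```python
import os
import tempfile
import unicodedata

# Same bytes as the MemoryBlock: "abc", C3 BC (composed ü), ".txt"
NAME_BYTES = bytes([0x61, 0x62, 0x63, 0xC3, 0xBC, 0x2E, 0x74, 0x78, 0x74])

def stored_form(name: str) -> str:
    """Create `name` in a temporary folder and report the form the OS returns."""
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
            f.write("12345")
        (stored,) = os.listdir(folder)   # exactly one entry was created
    for form in ("NFC", "NFD"):
        if unicodedata.is_normalized(form, stored):
            return form
    return "neither"

requested = NAME_BYTES.decode("utf-8")
print(unicodedata.is_normalized("NFC", requested))  # True: we asked for the composed name
print(stored_form(requested))                       # platform and file-system dependent
```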

This is a thread about Windows. Any experiment under Mac is irrelevant.

Michael, when the OP explained his problem, we, he and us, saw where the problem was. Take another look.

It is EXACTLY what I said. UTF-8 composed vs decomposed.

It’s a thread about what happens if you create a file on one system, then move it to another machine running a different OS.

So you should move the file to Windows, right?

That’s what people were doing at the top of the thread and then complaining about not finding the file. We’ve been discussing why, and what Xojo could do or could not do, ever since.
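As for what an application could do about not finding the file, one option is a normalization-insensitive lookup: try the exact name first, then scan the folder and compare NFC-normalized names. This is a Python sketch of that idea only; `find_child` is hypothetical and is not Xojo's `FolderItem.Child`, nor a claim about how Xojo behaves.

```python
import os
import tempfile
import unicodedata
from typing import Optional

def find_child(folder: str, name: str) -> Optional[str]:
    """Return the path of the entry in `folder` whose name matches `name`
    after NFC normalization, or None if nothing matches."""
    exact = os.path.join(folder, name)
    if os.path.exists(exact):              # fast path: the name already matches
        return exact
    wanted = unicodedata.normalize("NFC", name)
    for entry in os.listdir(folder):       # slow path: compare normalized names
        if unicodedata.normalize("NFC", entry) == wanted:
            return os.path.join(folder, entry)
    return None

# Demo: store a decomposed name, then look it up with the composed spelling.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "fu\u0308rmich.txt"), "w").close()
    hit = find_child(folder, "f\u00fcrmich.txt")
    found = hit is not None and os.path.exists(hit)
    missing = find_child(folder, "nothing.txt")

print(found, missing)   # True None
```

The scan costs a directory listing per miss, and if a folder really holds both spellings (possible on Windows, as noted earlier in the thread), this returns whichever one it meets first, so a real implementation would need a policy for that case.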

In the context of this thread, this may be of interest:

https://openradar.appspot.com/FB8957502