Confused about accented characters

Hello,

I’ve recently narrowed down a problem: displaying some file names in a WebListBox raises an exception. I iterate over a folder and add each item’s display name to a listbox. The files may have been uploaded by other users, so I can’t predict the names.

For instance, when a file name contains an “é”, the app may crash, but not always. It looks like the result differs depending on whether I upload the file via FTP or my web app renames the file (with the same “apparent” character).
I’ve stumbled upon the fact that, say, “à” might be treated as a single character (“à”) or as a sequence of characters (“`” and then “a”). I guess they are distinct characters; not sure…
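To illustrate the two forms mentioned above, here is a minimal sketch in Python (not Xojo, used only because it is easy to run). Unicode allows “à” to be stored either as one precomposed code point (U+00E0) or as the base letter “a” followed by a combining grave accent (U+0300); note that the combining mark comes after the letter. The two look identical on screen but compare as different strings until they are normalized. The variable names are illustrative only.

```python
import unicodedata

precomposed = "\u00e0"   # "à" as a single code point (NFC form)
decomposed = "a\u0300"   # "a" + COMBINING GRAVE ACCENT (NFD form)

# Same appearance, different lengths and different bytes.
print(len(precomposed), len(decomposed))   # 1 2
print(precomposed == decomposed)           # False
print(precomposed.encode("utf-8"))         # b'\xc3\xa0'
print(decomposed.encode("utf-8"))          # b'a\xcc\x80'

# Normalizing both to the same form makes them comparable.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed)                  # True
print(nfd == decomposed)                   # True
```

Both forms are valid UTF-8, so normalization form alone should not make a string "badly encoded"; it mainly matters when comparing or searching names.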

Handling the items in code seems to work fine, so I don’t want to physically rename the files once they are uploaded: if a file is downloaded later, I couldn’t easily recover the original name without an extra file linking the stored names to the original names. I just want to display the name in the listbox without crashing the app.

Ideally, I want accented characters to be displayed correctly; otherwise the words may mean something else. This rules out setting the encoding to ASCII or otherwise removing accents.

Is that doable somehow?

You simply want to make sure the string is UTF-8 before you assign it to the WebListBox cell.

As simple as that? Shouldn’t Folder.ChildAt return a UTF-8 string (unless in “odd” configurations)?

No, FolderItem.ChildAt returns a FolderItem.

FolderItem.Name returns a String; nowhere does it say it must be a UTF-8 string.

So, you have to make sure the string is UTF-8.

You have to understand that Xojo Web makes intensive use of JavaScript to do its magic. If for any reason you send strings with bad encoding or no encoding in there, chances are you will get the black JavaScript error box because it threw a wrench into the machine.

You’re right; I expressed myself badly. Of course, I was talking about the name (actually, the DisplayName).

That’s true, but sometimes not everything is mentioned. I’m not implying it “must” be a UTF-8 string, but that was my expectation, given that the name should have been saved that way.

Ok, thanks.

Ok, that makes sense. At any rate, it’s usually the way to handle strings in the desktop version as well. The only difference is that a web app crashes while a desktop app “continues” with malformed characters.
Still, I expected ChildAt to give me UTF-8 encoded names in my folders. Ok, I’ll be explicit…

Thank you.

That is because the web app is JavaScript underneath, which is more sensitive to malformed strings. Indeed, desktop can gobble up misencoded strings, and all it will do is display funny characters.

Ok, so I’ve tried defining the encoding. Same problem.
I tried these 2 versions:

l.AddRow(DefineEncoding(f.DisplayName, Encodings.UTF8))

or

l.AddRow(ConvertEncoding(f.DisplayName, Encodings.UTF8))

I’ve also implemented the UnhandledException event to clean up the design upon exceptions. With both of these attempts, I still get the exception with a file name that contains “é”.
Puzzled…
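One possible explanation, offered as an assumption rather than a diagnosis: if the bytes of the name are actually in a single-byte encoding such as CP-1252, merely labeling them as UTF-8 does not change the bytes, so the result is still not valid UTF-8. The Python sketch below (not Xojo; the helper name `is_valid_utf8` is hypothetical) shows that “é” stored as CP-1252 fails UTF-8 validation, and that it has to be decoded from its real encoding first.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

cp1252_bytes = "\u00e9".encode("cp1252")   # "é" as CP-1252: b'\xe9'
utf8_bytes = "\u00e9".encode("utf-8")      # "é" as UTF-8:   b'\xc3\xa9'

# Labeling the CP-1252 bytes as UTF-8 would leave them unchanged, and invalid.
print(is_valid_utf8(cp1252_bytes))   # False
print(is_valid_utf8(utf8_bytes))     # True

# A real conversion decodes from the true source encoding, then re-encodes.
converted = cp1252_bytes.decode("cp1252").encode("utf-8")
print(converted == utf8_bytes)       # True
```

In other words, a conversion to UTF-8 can only be correct when the source encoding is known; converting from a wrongly assumed encoding just produces different wrong bytes.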

You may want to file a feedback report with a small project demonstrating the issue attached.

Let me know when you do, I will have a look at the project and try to create a workaround for you.

Or post a link, so I can download the project, and see what I can do.

[quote=467477:@Michel Bujardet]You may want to file a feedback report with a small project demonstrating the issue attached.

Let me know when you do, I will have a look at the project and try to create a workaround for you.

Or post a link, so I can download the project, and see what I can do.[/quote]
Thank you, much appreciated; I’ll do that (later though, as I’m going to sleep and I don’t know how busy this weekend will be).

I’m currently wondering whether the FTP program I use to upload “dummy” files to test my app could mess up the file name encoding, or if it’s an OS issue (my server is running Debian).

Could very well be an OS thing, but I suspect it has more to do with the way ChildAt may be reporting the string. By default, most PC systems use CP-1252 encoding for the file names and shell, so it probably requires some work.

At any rate, I am confident we should find a solution.

Which FTP client are you using ?

[quote=467556:@Michel Bujardet]Could very well be an OS thing, but I suspect it has more to do with the way ChildAt may be reporting the string. By default, most PC systems use CP-1252 encoding for the file names and shell, so it probably requires some work.

At any rate, I am confident we should find a solution.

Which FTP client are you using ?[/quote]
Hello Michel,
Thank you, you sent me to the solution!

I’m using Transmit to send my “dummy” test folder. I remembered the “Text encoding” submenu and knew I had never changed anything there, but I went to check. It was set to “Default”, which I had assumed meant “UTF-8” (I’m so used to the default being UTF-8 in Xojo…). Just to check, I explicitly changed it to UTF-8 and saw the list refresh: names whose accents were previously correct became malformed, and vice versa. Now, files uploaded with Transmit and files created by my web app have the same encoding and no longer crash the app.

Thanks!

I am glad you found a solution. Though

I am still worried other users may be uploading as CP-1252 or something else, and will still have to contend with malformed characters.
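One common way to cope with names of unknown origin, sketched here in Python rather than Xojo and under the assumption that incoming names are either UTF-8 or CP-1252: try a strict UTF-8 decode first and fall back to CP-1252 only when that fails. The function name `decode_file_name` is hypothetical.

```python
def decode_file_name(raw: bytes) -> str:
    """Decode a file name, preferring UTF-8 and falling back to CP-1252."""
    try:
        # Strict decoding: any invalid byte sequence raises an error.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback encoding; bytes undefined in CP-1252
        # become U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace")

print(decode_file_name("caf\u00e9.txt".encode("utf-8")))    # café.txt
print(decode_file_name("caf\u00e9.txt".encode("cp1252")))   # café.txt
```

This is a heuristic: text in other encodings will decode without an error yet show the wrong characters, and a CP-1252 name can occasionally happen to be valid UTF-8, so testing with real uploads remains necessary.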

[quote=467640:@Michel Bujardet]I am glad you found a solution. Though

I am still worried other users may be uploading as CP-1252 or something else, and will still have to contend with malformed characters.[/quote]
What’s the proper way to deal with this?
Actually, I uploaded my dummy files only for testing; once the app is in use, the files and folders will be handled almost exclusively from the web app (creating folders, uploading files). Only uploading files would be a concern.

Maybe I could test with various configurations (several Mac/Windows/Linux versions with different browsers) and see if the encoding is “always good”? Or would the encoding be set by the OS and not modified upon upload from the web app?

Thanks.

I don’t know about Debian, but at least under Windows, anything about shell, which includes file names, is always CP-1252. As a result, French accents and probably all accented languages, get messed up when dealing with them from Xojo.

There are several threads in this forum I participated in about that.

Usually, Linux works like Windows. I suspect Mac is doing something else. Indeed, you want to test.

[quote=467712:@Michel Bujardet]I don’t know about Debian, but at least under Windows, anything about shell, which includes file names, is always CP-1252. As a result, French accents and probably all accented languages, get messed up when dealing with them from Xojo.

There are several threads in this forum I participated in about that.

Usually, Linux works like Windows. I suspect Mac is doing something else. Indeed, you want to test.[/quote]
You’re right. File names in shell under Windows are easily mangled (at least, when I tried).
However, I can’t see how uploading files using a web browser would use the shell (I’m basing this on my “assumption” that the shell clearly handles encoding differently than “GUI apps”, otherwise handling encoding in desktop apps for file names would be a nightmare too).
I’ll test later, when the program becomes “usable”. I’ll then have the necessary functions/tools to inspect files from various sources.

Thank you.