ReplaceAll knows nothing about the various ways a code point can be encoded into bytes, especially with something like an umlaut, which can be either a combining pair or a single character. The name from the file system is encoded one way; the name from your string literal uses a different one. That’s one of the advantages of the new Text class: it works with the original code points, not some arbitrary byte representation.
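To make the combining-pair point concrete, here is a minimal sketch in Python (not Xojo; Python is used only because its standard library ships a normalization routine). It assumes the file system returns the decomposed form, which is what the rest of this thread observes, and the variable names are made up for the illustration.

```python
import unicodedata

# Hypothetical names: one as typed in source code (precomposed),
# one as a file system might return it (decomposed).
literal = "gesch\u00e4ft"       # "ä" as the single code point U+00E4
from_disk = "gescha\u0308ft"    # "a" followed by combining diaeresis U+0308

# Both render the same, yet comparison and replace see different data.
print(literal == from_disk)                # False
print(from_disk.replace("\u00e4", "ae"))   # unchanged, nothing matched

# Normalizing to one form (NFC here) makes the two comparable.
nfc_disk = unicodedata.normalize("NFC", from_disk)
print(literal == nfc_disk)                 # True
print(nfc_disk.replace("\u00e4", "ae"))    # geschaeft
```

Normalizing both operands to the same form before searching or replacing is the general idea; whether NFC or NFD is the better target depends on where the string goes next.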
I hate it when something does not work as expected. It turns out that, for some reason, Xojo shows each accented character as a composite of 3 characters. As a result, Instr() does not find accented characters in a file name such as “a?e?e?c?a?u?a?e?i?o?u?.txt”. Looking at each byte in the string, an accented “a” is composed of “a”, then &uCC, then &u88.
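This will remove all accents:
[code] Dim f As FolderItem = GetOpenFolderItem("special/any")
If f <> Nil Then
  Dim rawname As String = f.Name
  Dim name As String
  // Keep only ASCII characters; the combining accent marks (code >= 128) are dropped.
  // This relies on the name being decomposed, as it is when read from the file system here.
  For i As Integer = 1 To Len(rawname)
    If Asc(Mid(rawname, i, 1)) < 128 Then
      name = name + Mid(rawname, i, 1)
    End If
  Next
  MsgBox name
End If[/code]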
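The ASCII filter above depends on the name already being decomposed; with a precomposed "ä" it would drop the whole letter. As a hedged alternative, here is a small Python sketch (again not Xojo) that decomposes first and then removes only the combining marks. The helper name `strip_accents` is hypothetical.

```python
import unicodedata

def strip_accents(name: str) -> str:
    """Return name with combining marks removed (hypothetical helper)."""
    # NFD splits precomposed letters such as U+00E4 into base letter + mark.
    decomposed = unicodedata.normalize("NFD", name)
    # unicodedata.combining() is non-zero for combining marks such as U+0308.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Works the same for precomposed and decomposed input.
print(strip_accents("gesch\u00e4ftsverlauf.pdf"))    # geschaftsverlauf.pdf
print(strip_accents("gescha\u0308ftsverlauf.pdf"))   # geschaftsverlauf.pdf
```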
I tried ReplaceAllB with “a”+&uCC+&u88, but it simply does not work. Looks like a bug, but at this point, I am not ready to continue with this charade.
I quickly looked at the content of the string; it is indeed something else.
What that says is that UTF-8 is not quite as convenient as I thought. It does not normalize code points. Neither does the Text type, which I tried as well.
I am not going to lose sleep over that, though. Filenames with no accents are just fine to me.
[quote=169184:@Michel Bujardet]I quickly looked at the content of the string; it is indeed something else.
What that says is that UTF-8 is not quite as convenient as I thought. It does not normalize code points. Neither does the Text type, which I tried as well.
I am not going to lose sleep over that, though. Filenames with no accents are just fine to me.[/quote]
I think you’re missing my point. Your use of &uXX is invalid in that context. The Text type should handle the filename, I would think, unless it doesn’t merge composite characters into a single codepoint. But it should.
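If I read the point about &uXX correctly, the issue is that a Unicode literal names a code point, not a byte. The bytes CC 88 seen in the file name are the UTF-8 encoding of the single code point U+0308 (combining diaeresis), whereas U+00CC and U+0088 are two unrelated characters. A short Python sketch (not Xojo) of that difference, with made-up variable names:

```python
# The combining diaeresis is ONE code point whose UTF-8 form is two bytes.
combining_diaeresis = "\u0308"
print(combining_diaeresis.encode("utf-8"))   # b'\xcc\x88'

# Treating those byte values as code points yields two different characters.
wrong = "\u00cc\u0088"
print(wrong.encode("utf-8"))                 # b'\xc3\x8c\xc2\x88'

name = "gescha\u0308ft"                      # decomposed, as read from disk
print(name.replace("a" + wrong, "ae") == name)          # True: nothing matched
print(name.replace("a" + combining_diaeresis, "ae"))    # geschaeft
```

In Xojo terms this suggests searching for “a” followed by the code point 0308 rather than CC and 88, though that is an inference from the thread and not something tested here.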
I tried EncodeURLComponent with my attachment.Name:
[code] Dim dlg As New OpenDialog
Dim f As FolderItem
Dim result As String
dlg.Title = "Bitte Anhang auswählen..."
f = dlg.ShowModal
if f <> NIL then
System.DebugLog("f.Name: " + f.Name)
attachment.LoadFromFile(GetFolderItem(f.NativePath, FolderItem.PathTypeNative))
attachment.Name = EncodeURLComponent(f.Name)
System.DebugLog("f.Name: " + f.Name)
mail.Attachments.Append(attachment)
end if[/code]
What now? How do I make a usable attachment.Name out of this encoded string?
Michael
Btw: regarding the ReplaceAll problem, should I open a new topic to show, with some simple code, that there seems to be a bug when using variables as OldString in this statement?
[quote=169263:@Michael Bzdega]I tried EncodeURLComponent with my attachment.Name:
[code] Dim dlg As New OpenDialog
Dim f As FolderItem
Dim result As String
dlg.Title = "Bitte Anhang auswählen..."
f = dlg.ShowModal
if f <> NIL then
System.DebugLog("f.Name: " + f.Name)
attachment.LoadFromFile(GetFolderItem(f.NativePath, FolderItem.PathTypeNative))
attachment.Name = EncodeURLComponent(f.Name)
System.DebugLog("f.Name: " + f.Name)
mail.Attachments.Append(attachment)
end if[/code][/quote]
Sorry, there is an error in the code I posted above, but I cannot edit the post.
In the 2nd System.DebugLog statement it should be System.DebugLog("attachment.Name: " + attachment.Name).
The result is attachment.Name: *=UTF-8gescha%CC%88ftsverlauf-06-09-2014.pdf and in my mail.app the name is also attachment.Name: *=UTF-8gescha%CC%88ftsverlauf-06-09-2014.pdf.
Yeah, the RFC is pretty suboptimal. As far as I can see, you are missing the quotes around the name, like in *=UTF-8"gescha%CC%88ftsverlauf-06-09-2014.pdf".
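For reference, the extended parameter syntax in RFC 2231 puts two single quotes between the charset and the percent-encoded value (charset, then an optional language, then the text). The Python sketch below (not Xojo) shows what that looks like for the file name from this thread; it only builds the parameter value and makes no claim about what Xojo's EmailAttachment does with it.

```python
from email.utils import encode_rfc2231
from urllib.parse import quote

# Decomposed "ä" (a + U+0308), matching the bytes shown in the thread.
name = "gescha\u0308ftsverlauf-06-09-2014.pdf"

# Percent-encoding the UTF-8 bytes gives the body seen in the thread.
print(quote(name, safe=""))
# gescha%CC%88ftsverlauf-06-09-2014.pdf

# RFC 2231 extended value: charset ' language ' percent-encoded text.
value = encode_rfc2231(name, charset="utf-8")
print("filename*=" + value)
# filename*=utf-8''gescha%CC%88ftsverlauf-06-09-2014.pdf
```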
[quote] dim tempFile as FolderItem = SpecialFolder.Desktop.Child("smörebröd.zip")
dim theAttachment as new EmailAttachment
theAttachment.LoadFromFile(tempFile)
theAttachment.Name = "=?utf-8?Q?smo=CC=88rebro=CC=88d=2Ezip?="
MailToCompany.Attachments.Append theAttachment[/quote]
And it works. Now the mail has both a Content-Disposition and a Content-Type header, each containing the smörebröd name in a different encoding. The encoding used in Content-Type is the one usually used for subjects.
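The name used above is an RFC 2047 encoded word with Q encoding, the form commonly seen in subject lines. As a rough check of what it contains, this Python sketch (not Xojo) decodes the exact value from the post and then rebuilds an encoded word from the raw bytes using the standard library's own encoder; the variable names are illustrative only.

```python
import unicodedata
from email import quoprimime
from email.header import decode_header

encoded = "=?utf-8?Q?smo=CC=88rebro=CC=88d=2Ezip?="   # value from the post

# decode_header returns (bytes, charset) pairs for encoded words.
raw, charset = decode_header(encoded)[0]
print(raw)       # b'smo\xcc\x88rebro\xcc\x88d.zip'
print(charset)   # utf-8

# The decoded text is decomposed; NFC turns it into the precomposed spelling.
decoded = raw.decode(charset)
print(unicodedata.normalize("NFC", decoded) == "sm\u00f6rebr\u00f6d.zip")  # True

# Rebuild a Q-encoded word from the UTF-8 bytes and confirm it round-trips.
rebuilt = quoprimime.header_encode(raw, charset="utf-8")
print(decode_header(rebuilt)[0][0] == raw)   # True
```

The same recipe, UTF-8 bytes of the file name turned into =XX escapes inside =?utf-8?Q?...?=, is what would have to be produced for any other attachment name.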