The normal situation for my app is to get data directly out of Mail from the hard disk, and the encodings are fine there. The fallback solution, needed for instance for Gmail, is to get the data via AppleScript. Here I noticed a nasty encoding problem: the bytes differ from the ones on disk, and when I assign the correct encoding I get garbage.
The script itself is very simple:
tell application "Mail"
    set theMailbox to mailbox ("xxx") of mailbox ("yyy")
    set theSource to source of message 1 of theMailbox
end tell
Let’s say that the mail contains the umlaut ü. In UTF-8 this is the byte sequence C3 BC. When I run the AppleScript, I get C3 82 C2 BC instead, which shows up as “Ã¼” instead of ü.
Does anyone have an idea how to get a nice ü out of this mess?
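The symptom looks like classic double encoding: the UTF-8 bytes of ü were read as two single-byte Latin-1 characters (Ã and ¼) and then encoded to UTF-8 a second time. Below is a minimal Python sketch of that round trip and its reversal, assuming the intermediate encoding was ISO-8859-1 (Latin-1); `double_encode` and `repair_double_encoding` are hypothetical helper names, not part of Mail or Xojo. Note that this round trip produces C3 83 C2 BC, which is the sequence that actually displays as “Ã¼”. If the dump really shows C3 82 C2 BC, the same repair yields ¼ rather than ü, so the bytes are worth re-checking.

```python
def double_encode(text: str) -> bytes:
    """Reproduce the bug: UTF-8 bytes misread as Latin-1, then UTF-8-encoded again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def repair_double_encoding(data: bytes) -> str:
    """Undo one level of UTF-8 -> Latin-1 -> UTF-8 double encoding.

    Assumption: the intermediate single-byte encoding was ISO-8859-1 (Latin-1),
    which maps every byte 0x00-0xFF to exactly one code point, so the round
    trip is lossless. Plain ASCII passes through unchanged; most other input
    that was not double-encoded this way raises a UnicodeError subclass.
    """
    misread = data.decode("utf-8")              # e.g. "Ã¼"
    original_bytes = misread.encode("latin-1")  # back to C3 BC
    return original_bytes.decode("utf-8")       # "ü"


if __name__ == "__main__":
    garbled = double_encode("ü")
    print(garbled.hex(" ").upper())          # C3 83 C2 BC
    print(repair_double_encoding(garbled))   # ü
```

In Xojo the equivalent would presumably be ConvertEncoding to Encodings.ISOLatin1 followed by DefineEncoding(Encodings.UTF8), but treat that as an untested assumption.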
You could use “content” instead of “source” in AppleScript.
source (text, r/o): Raw source of the message. I don’t know which encoding that uses.
tell application "Mail"
    set theMailbox to mailbox ("xxx") of account ("yyy")
    set theSender to sender of message 1 of theMailbox
    set theSubject to subject of message 1 of theMailbox
    set theSource to content of message 1 of theMailbox
    set theMail to theSender & return & return & theSubject & return & return & theSource
end tell
And if you need the headers too, you can add this line inside the tell block:
set theHeader to all headers of message 1 of theMailbox
Encodings out of Mail.app are, apparently, a mess. Mail.app handles the encoding correctly internally, and with drag and drop you can get text in the proper encoding (content always seems to be UTF-8). But if you save the text or transfer it via AppleScript, Mail goes on a wild randomization spree and you can’t be sure exactly what you will get.
I figured out a solution to my particular issue (with Wolfgang’s help), but since it uses the content of the message rather than the source, I don’t know whether it will work for you.
That said, while I was struggling with the Mail.app issue, I also looked at various other e-mail clients. Mac e-mail clients are pretty much all over the place, but one constant seems to be their lack of good AppleScript support.
The one exception I found was Outlook (Outlook 2011, not the subscription-only Outlook that was just released, which I haven’t tried). Say what you will about Outlook (it is overkill for what most people actually need, has several interface atrocities, and can be buggy in many different ways), but it has an incredibly rich AppleScript dictionary and its handling of encodings seems to be rock-solid. I was able to use it during my battles with Mail.app without changing my Xojo application at all, and it worked very well.
It’s entirely possible that Outlook 2011 is not an option for you, but I wanted to throw this out as a potential solution, just in case.
In recent OS versions AppleScript standardized itself, separately from the rest of the OS, on UTF-16 encoding throughout. If you're getting data from AppleScript, it may be UTF-16 encoded, which turns into garbage for high-order glyphs if you just define the encoding as UTF-8. Try DefineEncoding on the data you get back, with encodings.utf16, and see if that makes it all work. Once the encoding is set to UTF-16, you can also convert it to UTF-8 without getting garbage, but simply defining the proper encoding for the string from AppleScript may be enough. Or that might have nothing to do with it.
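To make the UTF-16 suggestion concrete: in UTF-16, ü is the single 16-bit unit 00FC, stored as FC 00 (little-endian) or 00 FC (big-endian), so UTF-16 data would be full of zero bytes rather than the C3-prefixed sequences reported in the first post. Here is a minimal Python sketch of the “define as UTF-16, then convert to UTF-8” step; `utf16_to_utf8` is a hypothetical helper, and treating data without a byte order mark (BOM) as little-endian is an assumption.

```python
def utf16_to_utf8(raw: bytes) -> bytes:
    """Interpret raw bytes as UTF-16 and re-encode them as UTF-8."""
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        # A BOM is present: the "utf-16" codec reads it, picks the
        # matching byte order and drops the BOM from the decoded text.
        text = raw.decode("utf-16")
    else:
        # Assumption: without a BOM, treat the data as little-endian.
        text = raw.decode("utf-16-le")
    return text.encode("utf-8")


if __name__ == "__main__":
    print(utf16_to_utf8(b"\xfc\x00").hex(" "))          # c3 bc
    print(utf16_to_utf8(b"\xfe\xff\x00\xfc").hex(" "))  # c3 bc
```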
Unfortunately, in my (admittedly limited) tests, what I got out of AppleScript from Mail.app was not UTF-16. It was UTF-8. AppleScript may use UTF-16 internally, but it appears to try to maintain the encoding of the source material when passing strings from one application to another.
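One way to tell these cases apart before picking a fix is to look at the raw bytes. Below is a rough Python heuristic under a hypothetical name, `guess_text_encoding`. It relies on three observations: mostly-ASCII UTF-16 text (such as mail headers) contains zero bytes, valid UTF-8 only contains them for actual NUL characters, and double-encoded UTF-8 survives a Latin-1 round trip with a different result. It is a heuristic only and can misclassify short or unusual input.

```python
def guess_text_encoding(raw: bytes) -> str:
    """Rough guess at how a block of text bytes is encoded (heuristic only)."""
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff") or b"\x00" in raw:
        # UTF-16 stores ASCII characters next to a zero byte.
        return "utf-16"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    try:
        # Double-encoded UTF-8 decodes to characters U+0000-U+00FF whose
        # byte values themselves form valid UTF-8.
        repaired = text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        return "utf-8"
    return "utf-8, double-encoded" if repaired != text else "utf-8"


if __name__ == "__main__":
    for sample in (b"\xc3\xbc", b"\xc3\x83\xc2\xbc", "Hü".encode("utf-16-le")):
        print(sample.hex(" "), "->", guess_text_encoding(sample))
```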
[quote=163295:@Beatrix Willius]Let’s say that the mail contains the umlaut ü. This is the byte sequence C3 BC in UTF8. When doing the AppleScript I get C3 82 C2 BC instead. Which is “Ã¼” instead of an ü.
Does anyone have an idea how to get a nice ü out of this mess?[/quote]