Encodings from Mail.app

I’m not sure this is exactly a Xojo issue, but it’s something I’ve run into while building a Xojo app, so here goes:

I am building an application that will process an e-mail from Apple’s Mail.app. The user can select all the text in a message in Mail.app and drag it into the Xojo application’s window. The application then processes the message’s text, and all is happy and good and birds sing and the sky is blue.

So, as part of the next version, I’ve decided that I want to automate this process a bit.

So, I’ve built an AppleScript dictionary into the app. This dictionary contains one command, which basically tells the application to process the passed text, exactly as if the text had been dropped instead.

Here’s where things go weird. When the text is passed through the AppleEvent (via AppleEvent.StringParam), it acts as if the text’s encoding is different from when the text is dropped. I’ve done a variety of tests, and it appears that text dropped from Mail.app is encoded as UTF-8. The strange thing is that the text found in AppleEvent.StringParam also appears to be encoded as UTF-8. Yet there are definite differences between the exact same text dropped from Mail.app and passed via AppleScript. Those differences appear only where space characters are, and it’s not always the same difference: one space character may be one code, but the next space character may be a completely different code. This is NOT happy and good. Birds do not sing and the sky is cloudy and stormy.

Converting the encoding to something different (like MacRoman) still ends up with different text between the two methods.

I’ve spent the better part of the afternoon trying to figure this out and I am out of ideas. As I said, I’m not sure this is a Xojo problem. It could very well be something that Mail.app is doing. But I can’t figure this out. I truly hate encodings and I feel that my frustrations are blinding me to something I am missing.

Has anyone else run into encoding oddities out of Mail.app? And, even if you haven’t, does anyone have any ideas of how to tackle this?

Any help would be greatly appreciated.

Do you have a small test project that you could upload somewhere?

Yes. I do.

Example Project

I made this example with Xojo 2014r2.1. This sample does not include the AppleScript stuff, but the problem can still be recreated. I’ve tested using Mail.app 8.1 (1993) on Mac OS X 10.10.1.

Run the project. You will get a window. In the textfield at the top of the window, enter some text that will appear in the e-mail. The text I’ve been testing with is already in that field: "Quantity:  " (note that the two spaces after the colon are important).

Using Mail.app, send yourself an e-mail that has some text, followed by the text you put in that textfield (i.e. "Quantity:  "; again, the spaces are important), followed by some more text. Make sure you format this e-mail as plain text.

When the e-mail arrives in your inbox, select all the text and drag it to the sample app’s window over the “Drop Text Here” area. The text in the Text Area will change to show the text you dragged, and a MsgBox will appear showing the result of String.CountFields, using the text in the textfield as the delimiter. Unless you put that text more than once in your e-mail, it should say “2”.

Then, back in Mail.app, save that sample message by selecting File->Save As. Make sure that “Plain Text” is selected as the file type. Then, back in the sample app, select File->Open and select the file you just saved. Most likely, the MsgBox will appear showing “1” (at least, that’s what happens on my system). The reason for this is that the spaces have changed to something other than a space.
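A quick way to see what those changed “spaces” actually are is to list every character outside printable ASCII together with its Unicode code point. The sketch below is in Python rather than Xojo, purely for illustration: `describe_unusual_chars` and the two sample strings are hypothetical, and the second string assumes the saved text contains an ordinary space followed by U+00A0, which is what the thread later identifies. Splitting on the delimiter stands in for String.CountFields (the number of pieces is the number of fields).

```python
import unicodedata


def describe_unusual_chars(text):
    """Return 'index: U+XXXX NAME' for each character outside printable ASCII."""
    report = []
    for index, ch in enumerate(text):
        code = ord(ch)
        if code < 0x20 or code > 0x7E:
            name = unicodedata.name(ch, "<unnamed>")
            report.append(f"{index}: U+{code:04X} {name}")
    return report


# Hypothetical samples: two ordinary spaces vs. space + non-breaking space
dropped = "Quantity:  12"
saved = "Quantity: \u00a012"
delimiter = "Quantity:  "

print(describe_unusual_chars(dropped))  # []
print(describe_unusual_chars(saved))    # ['10: U+00A0 NO-BREAK SPACE']

# Rough equivalent of CountFields: pieces after splitting on the delimiter
print(len(dropped.split(delimiter)))    # 2
print(len(saved.split(delimiter)))      # 1
```

Two strings that look identical on screen can differ in exactly one code point, which is enough to make the delimiter match fail. A similar loop in Xojo would walk the string with Mid and report Asc for each character.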

Whatever this is, it’s surely a Mail problem. Encodings in Mail are a mess because Mail usually reformats and re-encodes the message.

Which version of Mac OS/Mail do you use? Encodings are handled differently on Yosemite.

Hmm… are you mixing things up with your two methods? Is one method of your code getting the source of the mail and the other the rendered HTML text? I can’t get your example to work with drag-and-drop of text (using Mavericks at the moment).

Two other things to consider:

  • There is a command to get the mail source from a selected mail.
  • Everything you do with AppleScript in Mail is slloowwww.

Thanks for the response Beatrix!

As I indicated above, the e-mails that are involved here are simply plain text. No HTML. Yes, I am testing on Yosemite.

What do you mean by the example won’t work for drag-and-drop?

The two methods differ only in how you get the text into the application. The processing behind the scenes is identical, as it’s the same method. Saving as a plain text file and opening it in the example isn’t how it is done in the real application, but the end result is the same.

As for AppleScript in Mail being slow, that may very well be true, but it is perfectly fine for the use I need it for.

I’m sure it probably is an issue with Mail.app itself, but I really need to find some sort of solution. If I were only dealing with ASCII, this wouldn’t be an issue, as I could write a function to simply replace anything where String.asc < 32 or String.asc > 126 with a space character. The problem is that some of these e-mails may contain other characters, such as ü or é.
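Rather than mapping everything outside ASCII to a space, one option is to replace only the characters Unicode classifies as space separators (general category Zs, which includes both U+0020 and the non-breaking space U+00A0), leaving letters such as ü or é, tabs, and line breaks alone. This is a Python sketch, not Xojo, and `normalize_spaces` is a hypothetical helper name.

```python
import unicodedata


def normalize_spaces(text):
    """Replace every Unicode space separator (category Zs) with an ASCII space.

    Accented letters (category Ll/Lu) and control characters such as
    newline and tab (category Cc) are left untouched.
    """
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in text)


sample = "Menü: café\u00a0au lait\nQuantity: \u00a03"
print(repr(normalize_spaces(sample)))  # 'Menü: café au lait\nQuantity:  3'
```

Because the replacement is one-for-one, a space followed by a non-breaking space becomes two ordinary spaces, so a delimiter like "Quantity:  " matches again.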

How do I do the drag-and-drop out of Mail? I can get it working by dragging a mail from the message list, but not by selecting text in a mail.

I’ve done a lot of reverse engineering on Mail. What you see, even for a plain text mail, is a sort of HTML viewer. Plain text mails have line endings every 70 characters or so, and this is changed for the HTML representation. I would also guess that the encoding is changed for HTML.

I just select all the text (Command-A) and then click and drag.

Scott,

Apple Mail gives you back a “space” followed by a “non-breaking space”.

The “common” non-breaking space is encoded as U+00A0. As a workaround, you could perhaps use ReplaceAll before you do your processing.

http://en.wikipedia.org/wiki/Non-breaking_space

I have no idea why Apple Mail changes two spaces into one space and a non-breaking space.

Wolfgang

… and this works in your example. But I am not very familiar with encodings.

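  // Show the text exactly as it arrived
  TextArea1.Text = theText
  
  // Mail.app can deliver a space followed by a non-breaking space (U+00A0)
  // where the message shows two ordinary spaces
  dim vNonBreakingSpace as string = Encodings.UTF8.Chr(&h00A0)
  dim vSpace as string = Encodings.UTF8.Chr(&h20)
  
  // Turn non-breaking spaces into ordinary spaces so the delimiter can match
  theText = ReplaceAll( theText, vNonBreakingSpace, vSpace )
  
  msgBox str( theText.CountFields( TextField1.Text ) )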

Thanks Wolfgang! I will give your advice a try sometime tomorrow and let you know if it works for my needs.

Thank you again, Wolfgang! Looking for and replacing non-breaking spaces appears to have been the answer. Things are working just fine now that I understand what Mail.app is sending to my application!