ConvertEncoding eats text

My app parses mail: it gets ContentTransferEncoding and ContentType, decodes the mail, applies the encoding and converts to UTF-8. That works most of the time. Now I’ve got a Hebrew HTML mail where ConvertEncoding returns an empty string. Tested with Xojo 2014r2 and 2015r2 on Mac OS 10.10.5.


dim f as FolderItem = GetOpenFolderItem("")
dim b as BinaryStream = BinaryStream.Open(f)
dim s as String = b.Read(b.Length)
s = DecodeQuotedPrintable(s)
dim theEncoding as TextEncoding = GetInternetTextEncoding("iso-8859-8")
s = DefineEncoding(s, theEncoding) 'string shows okay here as hebrew
s = ConvertEncoding(s, encodings.UTF8) 'string empty



=F7. =F9=E9=F8=E5=FA = =EE=F1. SC5015010146 =EC=EC=F7=E5=E7


Does anyone have an idea what could cause this? Any idea for a workaround? I need the UTF8 because mails could have multiple parts that need to be added and then the data is written to Valentina.
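One way to narrow this down is to run the same steps outside Xojo and see whether the data itself converts. Below is a minimal sketch in Python, not Xojo code: `qp_to_utf8`, `SAMPLE` and `WORD` are names made up for this illustration, and it assumes Python's `iso-8859-8` codec interprets the bytes the same way the mail's charset intends.

```python
import quopri


def qp_to_utf8(raw: bytes, charset: str = "iso-8859-8") -> bytes:
    """Decode quoted-printable bytes, interpret them in `charset`, return UTF-8 bytes."""
    decoded = quopri.decodestring(raw)   # undo the =XX escapes
    text = decoded.decode(charset)       # bytes -> Unicode text
    return text.encode("utf-8")          # Unicode text -> UTF-8 bytes


# The sample line from the mail, and one word taken from it.
SAMPLE = b"=F7. =F9=E9=F8=E5=FA = =EE=F1. SC5015010146 =EC=EC=F7=E5=E7"
WORD = b"=F9=E9=F8=E5=FA"

utf8_word = qp_to_utf8(WORD)
utf8_sample = qp_to_utf8(SAMPLE)

print(utf8_word.decode("utf-8"))
print(len(utf8_sample), "bytes of UTF-8 for the full sample line")
```

If this produces non-empty UTF-8, the bytes themselves are valid ISO-8859-8, which would point at the conversion step in Xojo and not at the mail data.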

Are you sure that’s the right encoding?

I think you should use:

dim theEncoding as TextEncoding = Encodings.ISOLatinHebrew

ISO-8859-8-I and ISO-8859-8 are not the same.

The encoding is correct and comes from the mail as

Content-Type: text/html; charset="iso-8859-8"

Encodings.ISOLatinHebrew at least doesn’t eat my text. But what is the difference between ISO-8859-8-I and ISO-8859-8? Google wasn’t able to give me an answer.

The first two links in Google give you ISO-8859-8-I and ISO-8859-8. I just quickly read them (so I’m not really certain I understand it correctly), but ISO-8859-8 seems to be in logical order (left-to-right) and ISO-8859-8-I in visual order (right-to-left).

Overlooked it: ISO-8859-8-I… The characters are in logical order. … ISO-8859-8 is sometimes in logical order (HTML, XML), and sometimes in visual (left-to-right) order (plain text without any markup).

Still doesn’t explain the empty string after ConvertEncoding.

ISO-8859-8-I includes additional codes. If one of those were present in your text, ConvertEncoding would fail.

It fails even with ???.

I’ve been working with encodings (especially the messed-up sort) for a long time now and I’ve NEVER seen this behavior. Also, the Hebrew text is shown correctly in the debugger. Still confused…

Have you tried converting to UTF-16 instead? Perhaps UTF-8 doesn’t include the Hebrew glyphs?

Trying to convert to UTF-16 also gives empty text. The debugger shows the Hebrew glyphs before the conversion.

I wonder why it should make any difference whether the Unicode code point is represented as UTF-8, UTF-16, UTF-32 or whatever? These encodings are just different ways to represent the same Unicode character, or am I missing something?

It shouldn’t. I mistyped. I was wondering if the font being used doesn’t include the glyphs, but if the debugger shows the characters before the conversion and not after, that’s probably not the case.
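On the point that the Unicode encodings are interchangeable: a small sketch in Python (again not Xojo, and `roundtrip_sizes` is a made-up helper name) shows the same Hebrew word surviving a round trip through UTF-8, UTF-16 and UTF-32. Only the byte layout differs, so a conversion that fails for one target would be expected to fail for the others too.

```python
def roundtrip_sizes(text: str) -> dict:
    """Encode `text` in several Unicode encodings, verify the round trip,
    and return the encoded size in bytes for each encoding."""
    sizes = {}
    for codec in ("utf-8", "utf-16-le", "utf-32-le"):
        data = text.encode(codec)
        # Decoding must give back exactly the same code points.
        if data.decode(codec) != text:
            raise ValueError("round trip failed for " + codec)
        sizes[codec] = len(data)
    return sizes


# Five Hebrew letters, U+05E9 U+05D9 U+05E8 U+05D5 U+05EA.
WORD = "\u05e9\u05d9\u05e8\u05d5\u05ea"

print(roundtrip_sizes(WORD))
```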

@Beatrix Willius — you might try using the new Text type instead. You may have stumbled into an edge case that doesn’t work in String.

@Greg: will do some testing. The code is part of a large parsing algorithm. I get heartburn when I think about converting this to Text. Not going to happen soon. Also, the data comes in without an encoding and is given one only as the very last step.