ASCII to UTF8 Follies

Once again, I am wrestling with converting encodings. I thought it was working well, thanks to Joe Strout’s GuessEncoding function, but today a customer sent me a file that upends everything. I’m working in macOS.

The issue is the degree symbol, which is generated on Windows by pressing Alt+176 within Excel (or any other app, according to the user). When I read a file containing this, the content correctly shows a degree symbol when viewed in Xojo’s inspector with ASCII encoding. If I view the content as UTF8, then Xojo shows (on macOS) a diamond with a question mark inside it. When the content is printed, an infinity symbol is drawn instead of the degree symbol.

The same thing happens when I use .ConvertEncoding to convert the original ASCII string to UTF8. The degree symbol prints and displays onscreen as infinity. It does not convert to the Unicode degree symbol, which is what I was expecting.

Here’s the relevant code:

'Fi is the FolderItem passed to the method

Dim TextInput As TextInputStream
Dim FileChunk As String
Dim ResultStr As String

TextInput = TextInputStream.Open(Fi)    'Open the file
TextInput.Encoding = Encodings.ASCII    'Because I know the file is ASCII
FileChunk = TextInput.ReadAll    'Get the file content

'Look at FileChunk in the Inspector: it shows the degree symbol and says the encoding is ASCII.

ResultStr = FileChunk.ConvertEncoding(Encodings.UTF8)

Look at ResultStr in the Inspector: it says it’s UTF8 and has a black diamond instead of the degree symbol. Printing ResultStr or drawing it in a graphics instance shows it as an infinity symbol.

I also tried using a TextConverter:

Dim tc As TextConverter
tc = GetTextConverter(GetTextEncoding(&h0600),GetTextEncoding(&h0600))
Dim ResultStr As String
ResultStr = tc.convert(FileChunk)

ResultStr still shows as the black diamond and prints infinity, not the degree symbol. Clearly I don’t understand what ConvertEncoding and TextConverter are designed to do.

Thoughts? Suggestions?

  • John

It’s not ASCII, it’s WindowsLatin1.


This is not an ASCII character, so it’s unsurprising that it doesn’t convert. See the upper table at:

Actual ASCII, BTW, is already valid UTF8, since the two encodings are byte-for-byte identical for codes 0 through 127.
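
One way to check whether a file really is plain ASCII is to look for any byte above 127. The following is only a sketch; IsPureASCII is a hypothetical helper name, not part of Xojo or of GuessEncoding.

```xojo
' Hypothetical helper: returns True if every byte is in the 7-bit ASCII range (0 to 127).
Function IsPureASCII(s As String) As Boolean
  Dim mb As MemoryBlock = s   ' look at the string's raw bytes, ignoring its encoding
  For i As Integer = 0 To mb.Size - 1
    ' Any byte above 127 (for example &hB0) means the data is not plain ASCII.
    If mb.Byte(i) > 127 Then Return False
  Next
  Return True
End Function
```

If this returns False for the file content, then labelling the text as Encodings.ASCII is wrong, and some other single-byte or multi-byte encoding is in play.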


This reminds me… I ran across a bug in GuessEncoding. I think I got my copy of the method from @Kem_Tekinay’s website.

Look for this line:

elseif b0=&hEF and b1=&hBB and b1=&hBF then

It should be:

elseif b0=&hEF and b1=&hBB and b2=&hBF then

Umm, those two lines look the same…?

Look closer ;)

OMG, you’re right. I’m having eye problems these days, that one was tricky. Thanks!
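
For context, the line being fixed tests for the three-byte UTF-8 byte order mark, EF BB BF. A standalone version of that check might look like the sketch below. HasUTF8BOM is a hypothetical name and this is not the GuessEncoding source, just an illustration of why the third comparison has to use the third byte.

```xojo
' Hypothetical helper: True if the data starts with the UTF-8 byte order mark (EF BB BF).
Function HasUTF8BOM(s As String) As Boolean
  Dim mb As MemoryBlock = s   ' raw bytes of the string
  If mb.Size < 3 Then Return False
  ' Each comparison must test a different byte: 0, 1, then 2.
  Return mb.Byte(0) = &hEF And mb.Byte(1) = &hBB And mb.Byte(2) = &hBF
End Function
```

With the original typo, the second byte was compared against both &hBB and &hBF, which can never both be true, so that branch could never detect a BOM.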


In case I was too terse earlier, set the encoding to Encodings.WindowsLatin1 when you read it in (instead of Encodings.ASCII) and it should convert just fine.
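
A minimal sketch of that change, assuming Fi is a valid, readable FolderItem and the file really is Windows Latin-1 (the variable names simply mirror the original post):

```xojo
' Fi is the FolderItem passed to the method (assumed non-Nil and readable)
Dim TextInput As TextInputStream = TextInputStream.Open(Fi)

' Tell Xojo how to interpret the bytes in the file.
' Byte &hB0 is the degree symbol in Windows Latin-1; 7-bit ASCII has no such code.
TextInput.Encoding = Encodings.WindowsLatin1
Dim FileChunk As String = TextInput.ReadAll
TextInput.Close

' With a correct source encoding, the conversion can map &hB0 to the Unicode degree symbol.
Dim ResultStr As String = FileChunk.ConvertEncoding(Encodings.UTF8)
```

As a side note, byte &hB0 is the infinity symbol in MacRoman, which is probably why macOS draws infinity when the bytes are mislabelled.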


Having fixed that, unfortunately FNGuessEncoding doesn’t recognize the text as WindowsLatin1, which it turns out is the actual encoding. So I can either reinstate a user pref from several years ago where the user manually selected the encoding or figure out

Unfortunately there is no way to guess a single-byte encoding from the bytes alone.

Yeah, I decided to offer them a popup menu to choose an encoding and they can take it from there. Not as user friendly as I’d like, but it’ll get the data.
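
A rough sketch of how the popup choice could be combined with a UTF8 check. DecodeToUTF8 and userChoice are hypothetical names; the assumption is that the raw bytes were read without a trustworthy encoding and that userChoice comes from the popup menu.

```xojo
' Hypothetical helper: treat the bytes as UTF-8 when they form valid UTF-8,
' otherwise fall back to the encoding the user picked.
Function DecodeToUTF8(raw As String, userChoice As TextEncoding) As String
  If Encodings.UTF8.IsValidData(raw) Then
    ' Pure ASCII also lands here, since ASCII is valid UTF-8.
    Return raw.DefineEncoding(Encodings.UTF8)
  End If
  ' A lone &hB0 byte is not valid UTF-8, so a Windows Latin-1 file ends up here.
  Return raw.DefineEncoding(userChoice).ConvertEncoding(Encodings.UTF8)
End Function
```

The UTF8 test is a heuristic, not a guarantee: it only lets the app skip the question when the data is plainly UTF8, and the user's choice still decides between the single-byte encodings.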

