Once again, I am wrestling with converting encodings. I thought it was working well, thanks to Joe Strout’s GuessEncoding function, but today a customer sent me a file that upends everything. I’m working in macOS.
The issue is the degree symbol, which is generated on Windows by pressing Alt+176 within Excel (or any other app, according to the user). When I read a file containing this character, the content correctly shows a degree symbol when viewed in Xojo’s inspector with ASCII encoding. If I view the content as UTF8, Xojo (on macOS) shows a diamond with a question mark inside it. When the content is printed, an infinity symbol is drawn instead of the degree symbol.
The same thing happens when I use .ConvertEncoding to convert the original ASCII string to UTF8. The degree symbol prints and displays onscreen as infinity. It does not convert to the Unicode degree symbol, which is what I was expecting.
Here’s the relevant code:
'Fi is the FolderItem passed to the method
Dim TextInput As TextInputStream
Dim FileChunkStr As String
Dim ResultStr As String
TextInput = TextInputStream.Open(Fi) 'Open the file
TextInput.Encoding = Encodings.ASCII 'Because I know the file is ASCII
FileChunkStr = TextInput.ReadAll 'Get the file content
'Look at FileChunkStr in the Inspector, it shows the degree symbol and says encoding is ASCII.
ResultStr = FileChunkStr.ConvertEncoding(Encodings.UTF8)
Look at ResultStr in the Inspector, it says it’s UTF8, and has a black diamond instead of the degree symbol. Printing ResultStr or drawing it in a graphics instance shows it as an infinity symbol.
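One possibility worth checking is what the degree symbol looks like at the byte level. The sketch below is not from my project. It assumes, without a hex dump to confirm it, that the file stores the degree symbol as the single byte &hB0. That value is outside the 7-bit ASCII range; it is the degree sign in Windows-1252 (Encodings.WindowsANSI) and the infinity sign in MacRoman, and on its own it is not a valid UTF-8 sequence. The variable names are made up for illustration.

```xojo
' Assumption: the degree symbol is stored in the file as the single byte &hB0.
Dim raw As String = ChrB(&hB0)

' Tag the same byte with two different encodings. DefineEncoding only labels
' the bytes; it does not change them.
Dim asWindows As String = DefineEncoding(raw, Encodings.WindowsANSI)
Dim asMac As String = DefineEncoding(raw, Encodings.MacRoman)

' ConvertEncoding rewrites the bytes so the same character is kept in UTF-8.
Dim fromWindows As String = asWindows.ConvertEncoding(Encodings.UTF8)
Dim fromMac As String = asMac.ConvertEncoding(Encodings.UTF8)

' Log the code point each interpretation produces, for comparison.
System.DebugLog("As WindowsANSI: U+" + Hex(Asc(fromWindows)))
System.DebugLog("As MacRoman: U+" + Hex(Asc(fromMac)))
```

If the two logged code points differ, the byte itself is fine and the result depends only on which encoding the string is labeled with before conversion.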
I also tried using a TextConverter:
Dim tc As TextConverter
tc = GetTextConverter(GetTextEncoding(&h0600), GetTextEncoding(&h0600))
Dim ResultStr As String
ResultStr = tc.Convert(FileChunkStr)
ResultStr still shows as the black diamond and prints infinity, not the degree symbol. Clearly I don’t understand what ConvertEncoding and TextConverter are designed to do.
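For comparison, here is a minimal sketch of the variation I would expect to behave differently. It assumes the file was actually saved as Windows-1252 rather than 7-bit ASCII, which I have not verified. The only change from my code above is the encoding assigned to the stream. Note also that my TextConverter call passes the same encoding for both input and output, so it presumably has nothing to convert.

```xojo
' Fi is the FolderItem passed to the method, as in the code above.
' Assumption: the file is Windows-1252 (Encodings.WindowsANSI), not 7-bit ASCII.
Dim TextInput As TextInputStream = TextInputStream.Open(Fi)
TextInput.Encoding = Encodings.WindowsANSI ' Label the bytes as Windows-1252
Dim FileChunkStr As String = TextInput.ReadAll
TextInput.Close

' Convert the correctly labeled string to UTF-8.
Dim ResultStr As String = FileChunkStr.ConvertEncoding(Encodings.UTF8)

' The TextConverter equivalent would need different source and target encodings.
Dim tc As TextConverter = GetTextConverter(Encodings.WindowsANSI, Encodings.UTF8)
Dim ConvertedStr As String = tc.Convert(FileChunkStr)
```

If the assumption about the file's encoding is wrong, this will mislabel other characters, so it is only worth trying on a copy of the customer's file.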
Thoughts? Suggestions?
- John