Text Encoding

Good afternoon.

Some background: We have a questionnaire. It’s maintained in a spreadsheet. There’s a tool to import it into a database, which we do by saving the spreadsheet as a text file. We have had the questions translated into Spanish.

To preserve the accents and other special characters, I’m now exporting the spreadsheet as UTF16. The problem is that I can’t seem to read the UTF16 correctly inside Xojo. I did a DefineEncoding to UTF16, which makes the data readable. Then I did a ConvertEncoding to UTF8, but this did not seem to do anything (it still looks like UTF16).

My solution so far is to export as UTF16, open the resulting file in TextWrangler, and save it from there as UTF8. This works. Since we don’t do this very often, I can live with it, but I’m curious as to what I’ve missed.

Thank you.

-Bob Gordon

I didn’t understand the part where ConvertEncoding “did not seem to do anything.” Do you mean the bytes of the string do not change? Or did you expect some difference in the text? If the latter, there shouldn’t be any.
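
To illustrate the bytes-versus-text distinction, here is a minimal sketch in Python rather than Xojo, so treat it only as an analogy: `decode` plays roughly the role of DefineEncoding (saying what the bytes mean) and `encode` roughly the role of ConvertEncoding (producing new bytes). The sample question is made up, and the sketch assumes a UTF-16 little-endian export without a byte order mark.

```python
# Analogy only: this is Python, not Xojo.
text = "¿Cuántos años tiene?"  # hypothetical sample question

# Bytes as a UTF-16 (little-endian, no BOM) export might contain them.
utf16_bytes = text.encode("utf-16-le")

# Interpret the bytes as UTF-16: the text becomes readable.
decoded = utf16_bytes.decode("utf-16-le")

# Re-encode as UTF-8: the text is the same, the bytes are not.
utf8_bytes = decoded.encode("utf-8")

same_text = utf8_bytes.decode("utf-8") == text
bytes_differ = utf8_bytes != utf16_bytes

print("same text:", same_text)
print("bytes differ:", bytes_differ)
print("UTF-16 length:", len(utf16_bytes), "UTF-8 length:", len(utf8_bytes))
```

If the conversion worked, the string should look identical on screen and only the byte count should change, so comparing byte lengths before and after is one way to check.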

I don’t understand why you are using UTF16. If you don’t have Chinese or other complex Asian scripts, you should not do that. Just save as UTF-8, and make sure to call DefineEncoding with UTF-8 when you load the text back from the database.

Even if there are Chinese or Asian complex scripts, does it make a difference? Don’t all the UTF encodings cover the entire range of Unicode? (Asking because I’m suddenly unsure.)
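
For what it’s worth, UTF-8, UTF-16, and UTF-32 can each represent every Unicode character; what differs is how many bytes a given character takes. A quick sketch in Python (not Xojo; the three sample characters are arbitrary picks) that round-trips a Latin, a CJK, and a non-BMP character through each encoding:

```python
# Round-trip a few characters through the three UTF encodings and
# record the encoded length in bytes plus whether decoding restores them.
samples = ["ñ", "中", "😀"]  # Latin, CJK, and a character outside the BMP
encodings = ("utf-8", "utf-16-le", "utf-32-le")

round_trips = {}
for ch in samples:
    for enc in encodings:
        data = ch.encode(enc)
        round_trips[(ch, enc)] = (len(data), data.decode(enc) == ch)

for (ch, enc), (size, ok) in round_trips.items():
    print(f"U+{ord(ch):04X} {enc:10} {size} bytes, round trip ok: {ok}")
```

The CJK character takes 3 bytes in UTF-8 but 2 in UTF-16, which is the usual argument for UTF-16 with mostly Asian text; for Spanish, UTF-8 is the more compact of the two.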

Here is a very good description (I think)