Problems with encodings

Paul_Magnussen · July 1, 2022, 6:36pm

Hi,

I have been programming for some time, but I am new to Xojo and have just finished working through the Introduction.

I am now trying to write an app that will process legacy files, which means it has to deal with encodings other than UTF-8. However, I am having problems with the documentation.

The conversion from MacRoman seems simple enough:

InBuff = InBuff.DefineEncoding(Encodings.MacRoman) ' Tell the system the text is Mac Roman.
InBuff = InBuff.ConvertEncoding(Encodings.UTF8) ' Convert it to UTF-8.

This works fine.

But a problem arises when I try to find the complete list of the internal names for other encodings.

• A find on “MacGreek” locates one such list (“Codes for Base”).

• However, “UTF8” is not in the list. Why not? Is it the same as “UnicodeDefault”?

• Also, I need the names for the old DOS code pages 437 and 850.

Can anyone help? Where are these things to be found?

TIA.

Tom_Dixon · July 1, 2022, 7:15pm

I typed in the following:

Var InBuff As String
InBuff = InBuff.DefineEncoding(Encodings.[press tab key]
InBuff = InBuff.ConvertEncoding(Encodings.[press tab key]

Pressing the Tab key after "Encodings." will give you the autocomplete list of all the supported encodings. I'm not sure which of the DOS encodings are the ones you are looking for. Hopefully someone smarter than me knows.

Edit:
Update from a reliable source: the following should be the correct encodings for code pages 437 and 850.
437 → DOSLatinUS
850 → DOSLatin1
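
Combined with the MacRoman pattern from the first post, a minimal, untested sketch for a code page 437 file might look like the following. It assumes InBuff already holds the raw bytes read from the legacy file, and that the names above are correct for your Xojo version.

```xojo
' Sketch only: InBuff is assumed to already contain the raw bytes of the legacy file.
Var InBuff As String
InBuff = InBuff.DefineEncoding(Encodings.DOSLatinUS) ' Tell the system the bytes are code page 437.
InBuff = InBuff.ConvertEncoding(Encodings.UTF8) ' Convert the text to UTF-8.
```

For a code page 850 file, the same two lines should apply with Encodings.DOSLatin1 in place of Encodings.DOSLatinUS.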

Paul_Magnussen · July 1, 2022, 8:00pm

Many thanks, Tom!

But when I press the Tab key, the list only appears for an instant: then it disappears!

P.S. What about Windows-1252?

Tom_Dixon · July 1, 2022, 8:42pm

The list will stay open and scrollable until you click a selection or click outside the list to close it. I suspect you may be unintentionally clicking outside the list?

Per my reliable source, 1252 will give you some trouble since it's not a well-defined "standard".

en.wikipedia.org

Windows-1252

Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages including Spanish, French, and German. It is the most-used single-byte character encoding in the world (on websites at least). As of July 2022, 0.3% of all websites declared use of Windows-1252, but at the same time 1.2% used ISO 8859-1 (while only 4 of the top 1000 websites), which by HTM…

WindowsLatin1 is close, but as Wikipedia notes, it is not exactly the same.

Paul_Magnussen · July 1, 2022, 10:10pm

OK, found the problem: I was continuing to hold the Tab key down, instead of just tapping it once.

And I notice that UTF8 is indeed there.

Thanks again.