Reading old encoded files

Lee_Badham · July 1, 2021, 7:03pm

Hi,
We need to read files that have been encoded in what seems to be a very old Macintosh text format:

http://string-functions.com/encodingtable.aspx?encoding=65000&decoding=10000

The files we have contain +AF8- (an underscore in UTF-7) but may contain other encoded characters.

Is there a fast way of converting these to UTF-7/8?

So far we have only come across an underscore, but any of those encoded characters may appear in the text.

Regards,

Lee

Christian_Schmitz · July 1, 2021, 7:28pm

You could write a decoder function:

Read the text as ASCII.
Loop over all characters.
For every “+” you find, look up the following characters in a table to decode them.
Otherwise add the character to final result.
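
The loop described above could be sketched roughly as follows. This is Python rather than Xojo, purely for illustration, and it swaps the lookup table for a Base64 decode of each run, on the assumption that the text really is UTF-7 (where a run after "+" is modified Base64 of UTF-16BE code units). The function name decode_utf7_runs is made up for this example.

```python
import base64

# Characters allowed inside a UTF-7 Base64 run (modified Base64, no "=" padding).
B64_CHARS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)

def decode_utf7_runs(text: str) -> str:
    """Decode "+...-" runs in ASCII text; hypothetical helper, not a full validator."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch != "+":
            # Ordinary character: copy it to the result unchanged.
            out.append(ch)
            i += 1
            continue
        # Collect the Base64 characters that follow the "+".
        j = i + 1
        while j < n and text[j] in B64_CHARS:
            j += 1
        run = text[i + 1:j]
        if run == "":
            # "+-" stands for a literal plus; a lone "+" is kept as-is.
            out.append("+")
            i = j + 1 if j < n and text[j] == "-" else j
            continue
        # Restore the "=" padding that UTF-7 omits, then decode.
        # A malformed run makes b64decode raise ValueError (binascii.Error).
        padded = run + "=" * (-len(run) % 4)
        raw = base64.b64decode(padded)
        raw = raw[: len(raw) - len(raw) % 2]  # drop leftover padding bits
        out.append(raw.decode("utf-16-be"))
        # A "-" directly after the run terminates it and is swallowed.
        i = j + 1 if j < n and text[j] == "-" else j
    return "".join(out)
```

With this sketch, decode_utf7_runs("U401077+AF8-OP") gives "U401077_OP".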

Arnaud_N · July 1, 2021, 9:06pm

How would you tell whether the file actually contains a literal + followed by a valid character code (e.g. one that the user typed and saved, so it is now part of the file)?

Lee_Badham · July 1, 2021, 9:28pm

So long as the encoding is consistent, an actual + character is written as +- in the file.

Here’s an example line:

INKINFOS “Unit=4,InkName=Y,InkNameL=U401077+AF8-OP,DFilter=D_RED,Lab=20.64 -1.34 -24.62,Blocked=0”

In fact, only the InkNameL part of the line is encoded like that, rather than the whole file. The equals characters should be +AD0- according to the lookup table URL I posted.
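
If only that one field is encoded, one option is to decode just the InkNameL value and leave the rest of the line alone. A minimal sketch in Python (not Xojo), assuming the value never contains a comma or a double quote and that the file uses plain ASCII quotes; decode_inknamel and the regular expression are made up for this example, while the "utf-7" codec is Python's built-in one:

```python
import re

# Captures the key and everything up to the next comma or double quote.
INKNAMEL = re.compile(r'(InkNameL=)([^,"]*)')

def decode_inknamel(line: str) -> str:
    """Decode only the InkNameL value of one line; hypothetical helper."""
    def fix(match):
        # Assumption: the encoded value is pure ASCII, as UTF-7 requires.
        value = match.group(2).encode("ascii").decode("utf-7")
        return match.group(1) + value
    return INKNAMEL.sub(fix, line)

line = ('INKINFOS "Unit=4,InkName=Y,InkNameL=U401077+AF8-OP,'
        'DFilter=D_RED,Lab=20.64 -1.34 -24.62,Blocked=0"')
print(decode_inknamel(line))
```

The other fields, including the unencoded equals signs, pass through untouched.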

R!

TimStreater · July 1, 2021, 9:36pm

See the Wikipedia article on UTF-7, which is what your files seem to be. That gives an algorithm for decoding. Also it doesn’t look like this format has anything to do with the Mac, which isn’t mentioned in the article.

Arnaud_N · July 1, 2021, 9:45pm

Right. If users can’t write to the file themselves and you always know where to expect encoded or plain strings, you’re good to go. I thought you didn’t know the saved pattern either.

Lee_Badham · July 1, 2021, 9:53pm

I only mentioned Macintosh because the string-functions link mentioned it. It does just look like the text is UTF-7.

I was after a fast way of decoding it, but I can do loops and lookups fine. Maybe with a #pragma and indexofbytes as well to make it quicker.
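
The kind of thing I have in mind, sketched in Python rather than Xojo just to show the shape: search for the next "+" with a byte search (bytes.find here, standing in for indexofbytes) instead of testing every character, and only decode the short runs. It assumes each run is closed by "-" as in our files, and decode_fast is a made-up name. Whether it is actually quicker would need measuring.

```python
def decode_fast(data: bytes) -> str:
    """Jump from "+" to "+" and decode only the encoded runs; hypothetical helper."""
    if b"+" not in data:
        # Fast path: nothing is encoded, so the bytes are plain ASCII.
        return data.decode("ascii")
    parts = []
    pos = 0
    while True:
        plus = data.find(b"+", pos)
        if plus == -1:
            # No more runs: copy the remaining plain text.
            parts.append(data[pos:].decode("ascii"))
            break
        # Plain text before the run is copied without inspection.
        parts.append(data[pos:plus].decode("ascii"))
        end = data.find(b"-", plus + 1)
        if end == -1:
            # Unterminated run: hand the remainder to the codec.
            parts.append(data[plus:].decode("utf-7"))
            break
        # Python's built-in codec decodes the run, including "+-" as "+".
        parts.append(data[plus:end + 1].decode("utf-7"))
        pos = end + 1
    return "".join(parts)
```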

Regards,

Lee

Beatrix_Willius · July 2, 2021, 3:00am

Could you post your solution? I still use a php function that is going to need an update at some time in the future.