We need to read files that appear to be encoded in a very old Macintosh text format.
The files contain +AF8- (an underscore in UTF-7), but they may contain other encoded characters as well.
Is there a fast way of converting them to plain UTF-8 text?
So far we have only come across the underscore, but the text could contain any of the encoded characters.
You could write a decoder function:
- Read the text as ASCII.
- Loop over all characters.
- For every “+” you find, look up the characters that follow it in a table to decode them.
- Otherwise, add the character to the final result.
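The steps above could be sketched in Python roughly as follows. DECODE_TABLE and decode_with_table are hypothetical names, and the table holds only the sequences mentioned in this thread. The sketch also assumes each encoded character sits in its own +...- run and that the closing - is always present; real UTF-7 guarantees neither (several characters can share one run, and the - is optional).

```python
# Hypothetical lookup table: maps the text between "+" and "-" to the
# character it stands for. Only a few entries are filled in here; a real
# table would need every sequence that can occur in the files.
DECODE_TABLE = {
    "": "+",      # "+-" is how UTF-7 writes a literal plus sign
    "AF8": "_",   # underscore
    "AD0": "=",   # equals sign
}


def decode_with_table(text: str) -> str:
    """Decode "+XXX-" sequences in text using DECODE_TABLE."""
    result = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "+":
            end = text.find("-", i + 1)
            if end == -1:
                raise ValueError(f"unterminated '+' sequence at index {i}")
            code = text[i + 1:end]
            if code not in DECODE_TABLE:
                raise ValueError(f"unknown sequence +{code}- at index {i}")
            result.append(DECODE_TABLE[code])
            i = end + 1  # skip past the closing "-"
        else:
            result.append(ch)  # plain character, copied as-is
            i += 1
    return "".join(result)


print(decode_with_table("U401077+AF8-OP"))  # U401077_OP
```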
How would you tell whether the file actually contains a literal + followed by something that looks like a valid character code (e.g. text the user typed and saved, which is now part of the file)?
As long as the encoding is consistent, a literal + character is written as +- in the file.
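Python's built-in utf-7 codec can be used to check this escaping rule: +- comes back as a literal +, while a bare + followed by base64 letters is taken as the start of an encoded run, so text containing an unescaped + will typically fail to decode or decode to the wrong characters. The show helper is just a hypothetical name for this demonstration.

```python
def show(raw: bytes) -> None:
    """Print how Python's built-in utf-7 codec decodes raw."""
    try:
        print(raw, "->", repr(raw.decode("utf-7")))
    except UnicodeDecodeError as exc:
        print(raw, "-> error:", exc.reason)


show(b"1+-1")            # escaped plus sign
show(b"U401077+AF8-OP")  # encoded underscore
show(b"Unit+AD0-4")      # encoded equals sign
show(b"a+b")             # unescaped plus: "+b" is read as the start of a run
```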
Here’s an example line:
INKINFOS “Unit=4,InkName=Y,InkNameL=U401077+AF8-OP,DFilter=D_RED,Lab=20.64 -1.34 -24.62,Blocked=0”
In fact, only the InkNameL value is encoded like that, not the whole file: if the whole line were encoded, the equals signs would appear as +AD0- according to the lookup table at the URL I posted.
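Since only the InkNameL value is encoded, one option is to decode just that field. This sketch assumes straight double quotes, comma-separated key=value fields, and an InkNameL value with no raw comma or quote in it; decode_inknamel and INKNAMEL are hypothetical names. Limiting decoding to one field also keeps the other fields safe, where a plain + (for example, if a Lab value were ever written as +1.34) would otherwise be misread as the start of an encoded run.

```python
import re

# Example line from the thread, with straight quotes assumed.
line = ('INKINFOS "Unit=4,InkName=Y,InkNameL=U401077+AF8-OP,'
        'DFilter=D_RED,Lab=20.64 -1.34 -24.62,Blocked=0"')

# Matches "InkNameL=" and captures its value up to the next comma or quote.
INKNAMEL = re.compile(r'(InkNameL=)([^,"]*)')


def decode_inknamel(text: str) -> str:
    """Return text with only the InkNameL value decoded from UTF-7."""
    def repl(m: re.Match) -> str:
        # UTF-7 is pure ASCII, so encode to bytes and let the codec decode it.
        return m.group(1) + m.group(2).encode("ascii").decode("utf-7")
    return INKNAMEL.sub(repl, text)


print(decode_inknamel(line))
```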
See the Wikipedia article on UTF-7, which is what your files appear to use; it describes the decoding algorithm. Also, this format doesn't seem to have anything to do with the Mac, which the article doesn't mention.
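For reference, the UTF-7 decoding rules (RFC 2152, as summarized in that article) could be implemented by hand roughly like this. Each run between + and the optional - is base64 without = padding, and the decoded bytes are big-endian UTF-16. decode_utf7 is a hypothetical name, and this sketch raises on a bare + that is not followed by base64 characters rather than guessing what was meant.

```python
import base64

# The standard base64 alphabet; UTF-7 runs use it without "=" padding.
B64_CHARS = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def decode_utf7(text: str) -> str:
    """Decode UTF-7 text by hand, following the RFC 2152 rules."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch != "+":
            out.append(ch)  # directly encoded ASCII character
            i += 1
            continue
        if i + 1 < n and text[i + 1] == "-":
            out.append("+")  # "+-" is an escaped literal plus sign
            i += 2
            continue
        # Collect the run of base64 characters that follows "+".
        j = i + 1
        while j < n and text[j] in B64_CHARS:
            j += 1
        run = text[i + 1:j]
        if not run:
            raise ValueError(f"ill-formed '+' at index {i}")
        # Re-add the "=" padding UTF-7 omits; leftover bits are discarded.
        raw = base64.b64decode(run + "=" * (-len(run) % 4))
        # The bytes are UTF-16 big-endian code units (surrogate pairs included).
        out.append(raw.decode("utf-16-be"))
        # A "-" right after the run only terminates it and is dropped.
        i = j + 1 if j < n and text[j] == "-" else j
    return "".join(out)


print(decode_utf7("U401077+AF8-OP"))  # U401077_OP
print(decode_utf7("1+-1"))            # 1+1
```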
Right. If users can't write arbitrary text to the file themselves, and you always know where to expect encoded versus plain strings, you're good to go. I thought you didn't know the saved pattern either.
I only mentioned the Macintosh because the string-functions link did. It does look like the text is simply UTF-7.
I was hoping for a fast way of decoding it, but I can manage with loops and lookups. Maybe I'll add a #pragma and indexofbytes to make it quicker.
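Along the lines of the indexofbytes idea, the scan can jump straight to the next + with a search call instead of visiting every character. In this Python sketch, bytes.find plays that role, plain runs are copied as whole slices, and decoded runs are cached with functools.lru_cache because the same few sequences (so far only +AF8-) seem to repeat. fast_decode and decode_run are hypothetical names, and no benchmark backs the speed assumption, so it is worth measuring on real files.

```python
import base64
from functools import lru_cache

# Base64 alphabet as a set of byte values (indexing bytes yields ints).
B64 = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/")
MINUS = ord("-")


@lru_cache(maxsize=None)
def decode_run(run: bytes) -> str:
    """Decode one base64 run (text between '+' and '-'), caching the result."""
    return base64.b64decode(run + b"=" * (-len(run) % 4)).decode("utf-16-be")


def fast_decode(data: bytes) -> str:
    """Decode UTF-7 bytes, jumping between '+' markers with bytes.find."""
    parts = []
    start, n = 0, len(data)
    while True:
        plus = data.find(b"+", start)
        if plus == -1:
            parts.append(data[start:].decode("ascii"))  # rest is plain text
            break
        parts.append(data[start:plus].decode("ascii"))  # copy plain run as one slice
        j = plus + 1
        if j < n and data[j] == MINUS:  # "+-" means a literal "+"
            parts.append("+")
            start = j + 1
            continue
        while j < n and data[j] in B64:
            j += 1
        if j == plus + 1:
            raise ValueError(f"ill-formed '+' at offset {plus}")
        parts.append(decode_run(data[plus + 1:j]))
        start = j + 1 if j < n and data[j] == MINUS else j  # drop terminating "-"
    return "".join(parts)


print(fast_decode(b"InkNameL=U401077+AF8-OP"))  # InkNameL=U401077_OP
```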
Could you post your solution? I still use a PHP function that will need updating at some point.