Recognizing bad encoding data

Yes. If you recognize that the text has the wrong encoding and you know which encoding it is, then it’s easy enough to fix. I think Beatrix was trying to find an automatic way to detect when the encoding is wrong.
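For the "known encoding" case, the fix is usually a byte-level round trip. The sketch below is not Robert's code, just a minimal Python illustration under the assumption that UTF-8 text was mistakenly decoded as Latin-1; the helper name `reinterpret` is made up for this example.

```python
def reinterpret(text: str, wrong: str, right: str) -> str:
    """Undo a bad decode: turn the text back into the bytes it came from
    (using the encoding that was wrongly applied), then decode those bytes
    with the encoding they were really written in."""
    return text.encode(wrong).decode(right)


# "café" stored as UTF-8 but read as Latin-1 shows up as "cafÃ©".
garbled = "caf\u00c3\u00a9"
fixed = reinterpret(garbled, wrong="latin-1", right="utf-8")
print(fixed)
```

This only works when the wrong decode was lossless. If the bad characters were already replaced by ? on import, the original bytes are gone and the round trip cannot bring them back.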

@Robert Weaver: yep, you state the problem exactly. I have GBs of data and need to find out when I have a bad apple.

Well, at least you can see that you have a lot of ? and in text.
Maybe if they are over 20%, it’s probably wrong encoding.
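A rough sketch of that heuristic in Python, in case it helps. The function names are invented here, the 20% threshold is just the guess from the post above, and treating U+FFFD (the Unicode replacement character) as a marker alongside ? is my assumption about what bad data looks like after import.

```python
def suspicious_ratio(text: str, markers: str = "?\ufffd") -> float:
    """Fraction of characters that are '?' or U+FFFD (assumed markers)."""
    if not text:
        return 0.0
    hits = sum(1 for ch in text if ch in markers)
    return hits / len(text)


def looks_misencoded(text: str, threshold: float = 0.20) -> bool:
    """Flag text whose marker ratio exceeds the (guessed) threshold."""
    return suspicious_ratio(text) > threshold


samples = ["Hello, how are you?", "??? ?? ????", "\ufffd\ufffdabc"]
for sample in samples:
    print(repr(sample), looks_misencoded(sample))
```

The threshold probably needs tuning against real data: text with many genuine questions will score higher than average, and mojibake that decodes to valid but wrong letters contains no ? at all and will pass unnoticed.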

Statistical analysis. Why didn’t I think of that? :slight_smile:

Robert’s code works fine, so my immediate problem is solved. I’ll have a more thorough look at the Python library ftfy later. It shouldn’t be too difficult to translate the code into Xojo.