Recognizing bad encoding data

Robert_Weaver · January 30, 2018, 8:32am

Yes. If you recognize that the text has the wrong encoding and you know which encoding it is, then it’s easy enough to fix. I think Beatrix was trying to find an automatic way to detect when the encoding is wrong.
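For the easy case described above, where you already know which encoding was wrongly applied, a minimal Python sketch might look like this. The helper name `redecode` and the sample string are made up for illustration. It assumes the text was originally UTF-8 bytes that got decoded as Latin-1, which is only one of many possible mix-ups.

```python
def redecode(text: str, wrong: str, actual: str) -> str:
    """Undo a wrong decode: recover the original bytes, then decode them properly.

    Hypothetical helper for illustration; raises UnicodeError if the
    assumed encodings do not actually fit the text.
    """
    return text.encode(wrong).decode(actual)


# Assumed example: "café" stored as UTF-8 but read as Latin-1.
garbled = "caf\u00c3\u00a9"
fixed = redecode(garbled, wrong="latin-1", actual="utf-8")
```

This only works when the guess about both encodings is right, which is exactly why detecting the bad cases automatically is the harder part.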

Beatrix_Willius · January 30, 2018, 8:33am

@Robert Weaver: yep, you state the problem exactly. I have GBs of data and need to find out when I have a bad apple.

Christian_Schmitz · January 30, 2018, 8:50am

Well, at least you can see that you have a lot of ? and in text.
Maybe if they are over 20%, it’s probably wrong encoding.
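A rough sketch of that heuristic in Python, for illustration only. The function names are hypothetical, the set of characters to count (a literal ? plus the Unicode replacement character U+FFFD) is an assumption, and the 20% threshold is just the guess from the post above, not a tested value.

```python
# Characters treated as signs of a failed decode (an assumption, adjust as needed).
SUSPICIOUS = {"?", "\ufffd"}


def suspicious_ratio(text: str) -> float:
    """Return the fraction of characters in text that look like placeholders."""
    if not text:
        return 0.0
    hits = sum(1 for ch in text if ch in SUSPICIOUS)
    return hits / len(text)


def looks_misencoded(text: str, threshold: float = 0.20) -> bool:
    """Flag text whose placeholder ratio exceeds the threshold (20% by default)."""
    return suspicious_ratio(text) > threshold
```

Text that legitimately contains many question marks would be flagged too, so with GBs of data the threshold would probably need tuning against known good and bad samples.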

Robert_Weaver · January 30, 2018, 8:51am

Statistical analysis. Why didn’t I think of that?

Beatrix_Willius · January 30, 2018, 2:50pm

Robert’s code works fine, so my immediate problem is solved. I’ll have a more thorough look at the Python library ftfy later. It shouldn’t be too difficult to translate the code into Xojo.