Decoding Text Encodings

We very often have international users entering text in one of our apps, and it ends up as weird characters in our DB. Is there an easy way to filter out the garbage?

Have you read:

Search for "UTF" once you are on the page above.

Would this apply to MySQL?

No. Use DefineEncoding when you read data from the database. Unless you're using a third-party tool to view the database, it doesn't really matter what the encoding is set to in the database itself.
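DefineEncoding does not change any bytes. It only tells Xojo how to interpret the bytes it already has. The snippet below is a rough Python analogy, not Xojo code: decoding bytes with an explicit codec stands in for DefineEncoding. The function name `define_encoding` and the sample value are hypothetical.

```python
def define_encoding(raw: bytes, encoding: str) -> str:
    """Interpret raw bytes read from the database using the given encoding.

    Rough analogue of Xojo's DefineEncoding (hypothetical helper): the bytes
    are not modified, we only declare how they should be read.
    """
    return raw.decode(encoding)


# Hypothetical column value: "café" as it was written, in UTF-8.
raw = b"caf\xc3\xa9"

print(define_encoding(raw, "utf-8"))    # café   (correct label)
print(define_encoding(raw, "latin-1"))  # cafÃ©  (wrong label: garbled)
```

The second line is the kind of "weird characters" described in the question: the stored bytes are fine, but they were read with the wrong encoding.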

Make sure you send strings in the same encoding that MySQL uses, and before using a string from a RecordSet, define its encoding on the StringValue.

It sounds like the app isn't normalizing the input to UTF-8 (or your preferred encoding) before storing the data, or you are not defining it as UTF-8 when pulling it out.
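Normalizing means actually converting the bytes, unlike just labelling them. In Xojo this would be done with ConvertEncoding. The snippet below sketches the same idea in Python, assuming you know which encoding the input arrived in. The helper name `normalize_to_utf8` and the Windows-1252 sample input are hypothetical.

```python
def normalize_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert input bytes from a known source encoding to UTF-8.

    Hypothetical helper, roughly what ConvertEncoding does before storing:
    unlike only defining the encoding, the bytes themselves change here.
    """
    text = raw.decode(source_encoding)  # read with the encoding it was typed in
    return text.encode("utf-8")         # store consistently as UTF-8


# Hypothetical input: "café" typed on a Windows-1252 system.
cp1252_input = b"caf\xe9"

print(normalize_to_utf8(cp1252_input, "cp1252"))  # b'caf\xc3\xa9'
```

The key assumption is `source_encoding`. If it is guessed wrong, the conversion will happily produce valid UTF-8 that still contains the wrong characters.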

I have set MySQL to UTF-8, and the values from the fields are then stored in MySQL. Do I need to do something else?

Maybe. Do you have a current example of this? Can you get the data as hex so we can see the actual bytes that end up in the database?
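On the MySQL side, the built-in HEX() function returns a column value as hex, for example `SELECT HEX(name) FROM customers` (table and column names hypothetical). To know what to compare that output against, you can compute the expected bytes for a known string in each candidate encoding. A minimal Python sketch, with a hypothetical helper name:

```python
def to_hex(text: str, encoding: str = "utf-8") -> str:
    """Return the bytes of `text` in `encoding` as space-separated hex."""
    return text.encode(encoding).hex(" ")


print(to_hex("é"))             # c3 a9
print(to_hex("é", "latin-1"))  # e9
```

If HEX() shows `C3A9` where an "é" should be, the data was stored as UTF-8. If it shows `E9`, it was stored as Latin-1 or Windows-1252.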

The database just stores the bytes as-is. It’s not going to modify the data to match the encoding of the field. But you have to tell Xojo what the encoding is when you pull the data back out.

I've had issues like that when other clients (not Xojo), such as a website, update the database using a different encoding. In that case it doesn't matter that you set the database to UTF-8.
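When rows may have been written by different clients in different encodings, one common heuristic is to try strict UTF-8 first and fall back to a single-byte encoding if the bytes are not valid UTF-8. The Python sketch below assumes, hypothetically, that the other writer used Windows-1252. It is a heuristic, not a guarantee: Latin-1 or Windows-1252 text that happens to form valid UTF-8 will be misread, although that is rare for real text.

```python
def decode_mixed(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode a value that might not be valid UTF-8.

    Heuristic: try strict UTF-8 first; if the bytes are not valid UTF-8,
    assume (hypothetically) that another client wrote them in `fallback`.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" covers the few byte values cp1252 leaves undefined
        return raw.decode(fallback, errors="replace")


rows = [b"caf\xc3\xa9", b"caf\xe9"]  # same word, written by two different clients
print([decode_mixed(r) for r in rows])  # ['café', 'café']
```

Once decoded this way, the values can be re-encoded as UTF-8 and written back, so the column ends up with a single consistent encoding.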