mysql special characters UTF-8

Hi,

What mySQL character set are you using if you need ‘special’ characters, like , , etc.

UTF-8 works fine with mySQL, but - for example - ‘Mrz’ would be displayed in the App as ‘M?rz’ (but with a question mark inside that square).
Obviously, I have to change the charset but how and where?

Ah, and I am using a Mac of course, Mavericks, Firefox/Safari, and xojo 2014 R1

Any help much appreciated.

Andreas

  • Use DefineEncoding on everything that comes from the “outside” (Databases, Files, TCPSocket, etc.):

Dim rst As RecordSet = db.SQLSelect(...) MsgBox(rst.Field("aTextColumnNameHere").DefineEncoding(Encodings.UTF8))

  • Depending on the settings of the MySQL server and the settings of the database within it you are querying, you also need that:
db.SQLExecute("SET NAMES utf8;")

… immediately after Db.Connect() has succeeded.

Thanks, Eli.

I understand the theory behind your solution, but it doesn’t work on my system. I have tried the last couple hours and the only thing that works is to ‘define.encode’ every string I receive from mySQL into ‘ISOLatin1’.

More specifically, mysql seems to simply ignore the set names or set charset command, but still the encoding of the resultset is nil. DefineEncode to UTF will result in ’ M??rz’ , compared to ’ M?rz’ (without any encoding)

Well, ISOLatin1 would be suitable for me, but there is a parade of horribles connected with it, I suspect?

Best

That’s correct. The RecordSet class doesn’t know what comes from the db server. You define what the encoding is on each StringValue.

“SET NAMES utf8;” works and the MySQL server then converts the string to UTF8 before sending it to your app. So I assume the error is happening somewhere else.

Eli,
it is not the RecordSet I tested but the string I extracted from it. But I found the error/mistake. It’s not too obvious, but an interesting effect:

What I did was to manipulate the data in the table directly, i.e. using a software called Sequel Pro. It seems that this front-end is not really recognizing what character set to write to the database table. If I write the data through my application, then everything works fine, even without encoding (because xojo sends utf-8 and the database is configured for utf-8).

Why manipulating directly? Well, for testing purposes, of course. :slight_smile:

[quote=74510:@Andreas Leitzbach]Eli,
it is not the RecordSet I tested but the string I extracted from it. But I found the error/mistake. It’s not too obvious, but an interesting effect:

What I did was to manipulate the data in the table directly, i.e. using a software called Sequel Pro. It seems that this front-end is not really recognizing what character set to write to the database table. If I write the data through my application, then everything works fine, even without encoding (because xojo sends utf-8 and the database is configured for utf-8).

Why manipulating directly? Well, for testing purposes, of course. :)[/quote]
Which is why, as Eli said, you need to use DefineEncoding on everything that comes from outside of your application. At some point there will likely be other inputs to your database that will cause this problem again. To make it easier to implement and maintain, create a data access class that handles reading/writing to the database. Then you have only one place where you need to setup the DefineEncoding calls.

Jay,

I anyway have an application wide method to sanitize any string that comes in and goes out. You know, you also have to deal with special characters like ', /,\ and the like and these may (or may not) lead to exceptions somewhere in your code (or somewhere else).

But thanks for the hint, anyway. :slight_smile:

I ran across a problem with the MySQLCommunityPlugin relative to Japanese language because the database character encoding is not set if you use it to connect to a specific database rather than just the database server. I resolved the problem like this.

mysqldb.host = "hostname" mysqldb.port = port number //mysqldb.databaseName = This may result in encoding issues mmysqldb.userName = "user" mysqldb.Password = "password If mysqldb.Connect then mysqldb.SQLExecute("set names utf8 collate utf8_general_ci") mysqldb.SQLExecute("set character set utf8") mysqldb.SQLExecute("use dbname") End if

Carl

interesting… I will give it a try and report on the results.

Carl,

you’re right, after using this mechanism there is no further need to encode manually. However, since I anyway have to sanitize everything (masking all the '´`stuff) I keep encoding in place - doesn’t cost much and I am safe should mySQL changes their policy in another release… :slight_smile:

Could this be regarded a bug in mySQL or the related plug-in?

For optimisation you can combine the two commands:

mysqldb.SQLExecute(“set names utf8 collate utf8_general_ci; set character set utf8”)

optimisation comes last…

[quote=74728:@Andreas Leitzbach]Carl,

you’re right, after using this mechanism there is no further need to encode manually. However, since I anyway have to sanitize everything (masking all the '´`stuff) I keep encoding in place - doesn’t cost much and I am safe should mySQL changes their policy in another release… :slight_smile:

Could this be regarded a bug in mySQL or the related plug-in?[/quote]
I’d say it’s a problem with the plugin. There probably should be encoding properties in the plugin in order to avoid these encoding issues when connecting to a specific database.

This was a helpful thread. As of 2016 R3 this is still an issue.

Does anyone know if this has been reported?

[quote=288041:@Bob Keeney]This was a helpful thread. As of 2016 R3 this is still an issue.

Does anyone know if this has been reported?[/quote]
Norman did five years ago: 18906 - MySQL plugin should set using utf-8 by default
Christian did two years ago: 22836 - MySQLCommunityPlugin gives strings with no encoding set

Personally I don’t see this as an issue, as I think the developer should do all configurable database stuff explicitly in his application.

Why do this when you can use PreparedStatemets instead?

Some applications connect to databases not controlled by the developer.

The discussion is about the client side not the server side.

We discovered this in Shorts this week. The strings are NOT returned in UTF8 unless you go through the procedure listed above. Since Shorts, a reporting tool, has zero control over the input of the data it seems odd that we have to go through these hoops to get it to work properly.

From an end user standpoint it’s Shorts that’s wrong - not their database. This is why it’s important to fix it.

Any client can tell the server what encoding it prefers data be sent to it in.
So you could ask for data in utf-8 on one connection and utf-16 on another and so on.

Its plausible to have the mysql plugin ask for data to be in utf-8 by default but there still has to be a way for anyone to override this.
It is just a one liner at connection time.