String encoding with MySQL...

Sorry to ask this question for the XXXXth time on the forum… I read many answers but none of the proposed solutions solved my issue. Strings with accented characters like ", , "… dont show up as they should and are replaced by black diamonds with a question mark. I’m building a WEB app using a MySQL database. Here are some infos about my development environment…

  • XOJO release 4 on iMac (Maverick)
  • MySQL Database : Version 5.1.57 Community (I know it should be updated… :slight_smile: ); charset = UTF-8 Unicode tried general_ci and _bin collation but it didn’t help… I think anyway that the strings are transferred without any encoding anyway… The database has been created in UTF-8

My code is :

If RC <> Nil Then While Not RC.EOF ClientsListBox.AddRow(RC.Field("id").StringValue.DefineEncoding(Encodings.UTF8), RC.Field("nom_client").StringValue.DefineEncoding(Encodings.UTF8)) RC.MoveNext Wend End If

When I open my project in the Windows version of XOJO and I open it in Explorer, strings are OK!!!

What’s wrong with my setup???

Thanks!

Roger

On the web, it’s the browser that is displaying the data, not Xojo, so things work a little differently. Your code is correct for desktop, but I really don’t know how encodings play in WE. The browser has to interpret the string you’re sending it, and I don’t really know how much Xojo does to help it.

Are you sure the data you put into the database was UTF-8? Could it be Latin1? The fact that the windows side interprets it correctly indicates that it’s not an encoding the Mac browser is expecting.

Let’s do a little debugging to see where the issue is. Change your code to this:

If RC <> Nil Then
    While Not RC.EOF
      dim id as string = RC.Field("id").StringValue
      dim nom as string = RC.Field("nom_client").StringValue
      // Break here and see how the strings look in the debugger
      // Also check to see what bytes are being pulled from the database for each

      id = id.DefineEncoding( Encodings.UTF8 )
      nom = nom.DefineEncoding( Encodings.UTF8 )
      // Break again to see how it looks now

      ClientsListBox.AddRow(id, nom)
      RC.MoveNext
    Wend
  End If

This will tell you something about the encoding of the strings before they went into the database, and how to properly manage that encoding after pulling the data. If it all looks right, the problem is in how the browser is being informed about the encoding.

Thanks to you all!

I found kind of a solution… I can’t remember where though… (Sorry for the author but thanks a lot :slight_smile: …) I wish it would be easier to implement…
Here it is anyway…

I created two global properties called TC_IsoToUTF and TC_UTFToIso. They are instantiated in the open event of the session of my web app.

//Initialisation des convertisseurs de texte TC_IsoToUTF= GetTextConverter(encodings.WindowsLatin1 , encodings.UTF8) TC_UTFToIso= GetTextConverter(encodings.UTF8 , encodings.WindowsLatin1)

They allow to go back and forth between my Web App and the MySQL Database. Honestly, I can’t explain why I need to do that since my DB is in UTF-8 already. It seems that it keeps sending data in WindowsLatin1 encoding whatever the configuration of the character set might be.

It solved part of my problem but there is still one remaining :

I can’t retrieve info from the database if my WHERE Clause contains a string with accented characters even if I convert the encoding to WindowsLatin1. It works fine with “normal” strings though.


Dim strSQLClients As  String
Dim RCClient As RecordSet
Dim strIDClient As String
  
  strClientName = TC_UTFToIso.convert(strClientName)
  
  strSQLClients = "SELECT id FROM clients WHERE nom_client = '" + strClientName + "'"
  RCClient = Session.db.SQLSelect(strSQLClients)
  If RCClient <> Nil Then
    strIDClient = RCClient.Field("id")
  End If
  
  Return strIDClient

Any Idea?

Try this after connecting to the MySQL server:

Session.db.SQLExecute("set names utf8")

You need to DEFINE the encoding, not CONVERT. Converting an unknown encoding will only mangle the string. The string comes to you from the database with an unknown encoding - not WindowsLatin1, not UTF-8. You must tell Xojo what encoding to use to display the string. Use DefineEncoding for that, not ConvertEncoding or TextConverter.

Thanks again!

I found a «better» way to solve my difficulties… After reading this (in french :)), I understood that as long as I would keep working with MySQL, I’d have problems dealing with accented characters. So I decided to switch to a PostgreSQL Database and that solved all my problems!!!

Migration has been easy since my databases didn’t contain many records. I used commercial tools on a trial basis to achieve the migration.

Well, there is another problem that MySQL Workbench doesn’t play well with UTF8, meaning that if you edit data using it you might get weird results.

What is “Session”? A variable? Why not just:

db.SQLExecute("set names utf8")

From the OP

So each session may need to set this ash nit connects but beyond that should be no problem as long as each connection sets it & remembers to read data & define the encoding when it grabs the values from the record sets

So “Session” must be something for web app only? I’m working on a desktop app with encoding issues. So perhaps

db.SQLExecute("set names utf8")

would be what I’m looking for?

[quote=61494:@Duane Mitchell]So “Session” must be something for web app only? I’m working on a desktop app with encoding issues. So perhaps

db.SQLExecute("set names utf8")

would be what I’m looking for?[/quote]
Yes
And make sure that when you DO get StringValues out of recordset columns you use DEFINE ENCODING

Excellent, making progress. A couple of oddities left. For example. I have a record where the last name is Muoz. I get this data via XML calls to what I believe is an MS SQL Server database. It comes into the MySQL database as Muñoz. So there’s probably something wrong there. I have a listbox that is populated by the records in the MySQL db and in the listbox the name appears as Muñoz. However, when that row is double-clicked to get to a detailed view of that record it comes out correctly as Muoz. In the process of populating the text fields of that window I used the Define Encoding as suggested.

winContactData.txtLastName.Text=rs.Field("LastName").StringValue.DefineEncoding(Encodings.UTF8)

I tried using DEFINE ENCODING in the building of the listbox row but it’s not working.

winDataList.lbContactBrowser.Cell(winDataList.lbContactBrowser.LastIndex, i-1) = rs.IdxField(i).Value.DefineEncoding(Encodings.UTF8)

Could use a suggestion on that. Seemingly contrary to this behavior I have a record with the last name Gonzalez with an accent over the “a”. I can’t reproduce it here, I get . This name displays correctly in the listbox but with a black diamond in the detail view of the record. Puzzling.

[quote=61666:@Duane Mitchell]Excellent, making progress. A couple of oddities left. For example. I have a record where the last name is Muñoz. I get this data via XML calls to what I believe is an MS SQL Server database. It comes into the MySQL database as Muñoz. So there’s probably something wrong there.
[/quote]
You probably need to call DEFINE ENCODING on the “string”

Generally the rule is ANY DATA coming in from outside sources (files, sockets, XML, etc) has NO encoding information - its just a pile of BYTES
Right at the point you get the data call DEFINE ENCODING so Xojo knows what encoding it should be treated as
How you can determine the correct one will vary from use to use - some data transfer mechanism include a marker that says “the data is this encoding” and some don’t so you have to guess OR its something known as part of the data transfer protocol
Once you do this the “Muñoz” will go away and be “Muñoz” throughout

This issue gets more curious.

At the point of entry from the source I define the encoding as UTF8 and it populates the MySQL database correctly. Muoz is Muoz in the db. When I bring the data from MySQL to a listbox I “DefineEncoding(Encodings.UTF8” and the data displays a Muoz. However, when I display Muoz in a text field in another window it shows up as Mu oz. In Windows it will display with a black diamond. I use define encoding for the text field as well.

xtLastName.Text=rs.Field("LastName").StringValue.DefineEncoding(Encodings.UTF8)

I found this in the forum here:

Unfortunately there was no indication as to what page of what doc it was.

So I’m puzzled by why data that looks good in MySQL, looks good in listbox cells, does not look good in a text field.

The string field bit is not relevant
It refers to a string field in a structure which we’re not using here

The diamond usually means the data is NOT valid for the encoding
Which leads me to ask if you do what Eli ott suggested earlier

oh man…yes I did but not for this one instance. drove myself nuts for about 3 or 4 hours.

Thank you for the reminder. I’m going to work this a little more to see to it that it all works but if I’m not mistaken the “answer” for this thread might need to be changed. The answer is not that it won’t work in MySQL and to switch. I think this is going to work.

I think you can uncheck the “answered” then check a different one

That option does not appear to be available to me. Perhaps because I didn’t start the thread?

For me the “answer” could not be to switch to Postgres. The solution to the issue of accents on character in my case was not that complicated and lies within Xojo itself though I searched for it elsewhere. I read the MySQL docs for this.
http://dev.mysql.com/doc/refman/5.6/en/globalization.html
And I also read this excellent posting.
http://mysql.rjweb.org/doc.php/charcoll

In my case I’ve got an application that gets data from a remote database in XML form and then puts it into MySQL. The XML comes to my in “utf-8”.

<?xml version="1.0" encoding="utf-8"?>

But that isn’t enough to get it right in MySQL. At the point where the data enters Xojo I need to define the encoding of the XML as UTF8.

answerString=hsock.post("https://api.someURL.com", 10) answerString=answerString.DefineEncoding(Encodings.UTF8)
So I get a the XML in a variable and then redefine it in the proper encoding. This lets me parse the XML and put it into MySQL correctly.

I also need to set a couple of MySQL system variables for character sets and collation. That is done simply enough using the “SET NAMES” statement. I do it first thing after establishing the database connection.

If db.Connect Then db.SQLExecute("set names utf8")
This tells MySQL what the character set of the incoming SQL will be (UTF8) and what character set to use for the outgoing return data (UTF8).
For reference: http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html

Next I have a client app that is used to access the MySQL database by a group of users. This also needs to use the “SET NAMES” statement right after establishing the connection to the MySQL database in every instance where the character encoding is critical.

Lastly, when pulling individual fields of data from MySQL I needed to DEFINE ENCODING for each field as you put it into the text field.

txtLastName.Text=rs.Field("LastName").StringValue.DefineEncoding(Encodings.UTF8)

It was only with the support here on this forum that this solution was found at all. Thank you Norman for your support.

1 Like