Accents Issue

The words in my database contain accents, but they display differently in the Listbox; e.g., the database has ἀγάπη, whereas the Listbox has άγάπη (notice the difference in the first letter). This is creating a problem when attempting to retrieve the data associated with the word. Is there a trick to getting accents to display correctly in the Listbox?

Just to be clear, the ListBox is populated with the words from the database.

Thanks!

Are you setting the text encoding when you read values from the database?

I’m not. The database is set to UTF-8 but I didn’t know there was a way to set the encoding in a SELECT statement. Would that go something like:

SELECT CONVERT(myColumn using utf8) as myColumnUtf8 from myTable?

Check the strings that you are using for your Listbox; you may need to use DefineEncoding on them too.

I use a global method placed in a module so you don’t need to know the condition of the string first:

(Typed from memory, but it should be close)

Function UTF8(extends s as String) as String
  Select Case s.Encoding
  Case Is Encodings.UTF8
    // Don't modify if it's already UTF8
    Return s
  Case is Nil
    // Just set if it's not set
    Return DefineEncoding(s, Encodings.UTF8)
  Case Else
    // otherwise, convert
    Return ConvertEncoding(s, Encodings.UTF8)
  End Select
End Function

Usage:

Var name as String = rs.field("name").StringValue.UTF8

Updated slightly (the “is” keyword is not needed in the first two cases) to prevent syntax errors:

Function UTF8(extends s as String) as String
  Select Case s.Encoding
  Case Encodings.UTF8
    // Don't modify if it's already UTF8
    Return s    
  Case Nil
    // Just set if it's not set
    Return DefineEncoding(s, Encodings.UTF8)    
  Case Else
    // otherwise, convert
    Return ConvertEncoding(s, Encodings.UTF8)    
  End Select
End Function

If you use that, be careful not to treat it as a magic bullet that you can apply without thinking. If the encoding is Nil and you declare it to be UTF-8 without checking, the result may be wrong when the bytes actually represent something other than UTF-8, and in some cases you could even get a crash.

This is a good point. It would be a good idea to use the Encoding.IsValidData function before setting it.

Case Nil
  If Encodings.UTF8.IsValidData(s) Then
    Return s.DefineEncoding(Encodings.UTF8)
  Else
    // Raise an exception? Leave it alone?
  End If

Why is any of this necessary? If the database is set to UTF-8, and is read using Xojo methods (either direct for SQLite or via a plugin for the apparently-unspecified database type in question), then surely:

Var name as String = rs.field("name").StringValue

should be all that is needed.

Execute this command before your select to tell the database that this connection will be communicating via UTF-8 encoding.

db.SQLExecute("SET NAMES 'utf8'")

Edit to add:

If you are using the latest version of Xojo and the API 2 database methods, then your string will be UTF-8 by default, but if you want to play it safe, explicitly convert the encoding to UTF-8. If you are using API 1 and getting a RecordSet, then you would want to do this:

dim myValue as string = rs.Field("myfield").StringValue.DefineEncoding(Encodings.UTF8)
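
For comparison, here is a rough sketch of the same idea with the API 2 database methods. It assumes db is an already-connected MySQL database, reuses the placeholder names myfield and myTable from above, and has not been tested against a real server, so treat it as a starting point only.

```xojo
// Assumption: db is an already-connected database (MySQL in this sketch).
// SET NAMES is MySQL syntax; other databases will not accept it.
db.ExecuteSQL("SET NAMES 'utf8'")

// myfield and myTable are placeholder names.
Var rs As RowSet = db.SelectSQL("SELECT myfield FROM myTable")

While Not rs.AfterLastRow
  Var myValue As String = rs.Column("myfield").StringValue

  // Only declare the encoding when the plugin left it undefined.
  If myValue.Encoding Is Nil Then
    myValue = myValue.DefineEncoding(Encodings.UTF8)
  End If

  // ... use myValue here, e.g. add it to the Listbox ...

  rs.MoveToNextRow
Wend

rs.Close
```

Both ExecuteSQL and SelectSQL raise a DatabaseException on failure, so real code would probably want a Try/Catch around this.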

I adjusted the function (my version is based on @Greg_O's version).

Public Function as_utf8_string(extends s as String) As String
  Select Case s.Encoding
  Case Encodings.UTF8
    // Don't modify if it's already UTF8
    Return s
    
  Case Nil
    // Just set if it's not set; if the string is not actually UTF8, this may break things
    If Encodings.UTF8.IsValidData(s) Then
      Return s.DefineEncoding(Encodings.UTF8)
      
    Else
      // Raise an exception? Leave it alone?
      Return "FOUND NON-UTF8 STRING WITH NIL ENCODING. IS THIS UTF8 DATA?"
      
    End If
  Case Else
    // otherwise, convert
    Return ConvertEncoding(s, Encodings.UTF8)
    
  End Select
End Function

It returns an all-caps error string if “bad” data is passed to it. (I think this is the only viable route for an Extends function that returns a String without going into some strange and unknown territory?)

You really should throw an exception.

Otherwise, to have proper error handling, you would need to check for that long uppercase string every single time you call this, which would make no sense.

Not if you are using Xojo older than 2024r3.

This change was added to 2024r3:
#18906 - MySQL plugin should set using utf-8 by default

William Yu

[RN] MySQLCommunityServer: We now encode RowSet string values in UTF-8 if the database character set is set to UTF-8; otherwise, we leave the encoding as Unknown, as we did previously.

Edit: OP didn’t say if MySQL is used. This information is for those who use MySQL with Xojo < 2024r3 and assume that their db is UTF-8 or that the Xojo plugin returns UTF-8.

Really, really, really should do this.

  Else
    Raise New RuntimeException("Could not convert to UTF-8")

  End If

@Kem_Tekinay

I’m not sure I agree in the case of an Extends function, but I’m open to being persuaded as to why this is better.

That is the first time MySQL has been mentioned in this topic.

When there is the potential for a fundamental failure, you want that failure to get “loud”. Remember, you might return to this code in 6 months or 6 years and not remember that it is substituting the original string with something else.

If you don’t want to risk that disruption, then return the original string with its nil encoding. But don’t silently change your data.

Very good point. Updated function:

Public Function as_utf8_string(extends s as String) As String
  Select Case s.Encoding
  Case Encodings.UTF8
    // Don't modify if it's already UTF8
    Return s
    
  Case Nil
    // Just set if it's not set; if the string is not actually UTF8, this may break things
    If Encodings.UTF8.IsValidData(s) Then
      Return s.DefineEncoding(Encodings.UTF8)
      
    Else
      // Raise an exception
       Raise New RuntimeException("Could not conver stringt to UTF-8 at: " + CurrentMethodName)
      //And leave the string as is
      Return s
      //changing the string to indicate an error could cause hard-to-track-down bugs,
      //where the string is changed and doesn't break the program, but breaks *the functionality*
      
      //see discussion here:
      //https://forum.xojo.com/t/accents-issue/83014/14
      
    End If
  Case Else
    // otherwise, convert
    Return ConvertEncoding(s, Encodings.UTF8)
    
  End Select
End Function

The Exception will exit the function so you will never hit that “Return s”. It doesn’t hurt anything, but it won’t do anything either.

You also have typos in your exception message.
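
Putting the last two comments together, a cleaned-up version might look roughly like this. It is only a sketch assembled from the posts in this thread (unreachable Return dropped, message spelling fixed), typed here and not compiled, and it assumes a Xojo version where RuntimeException accepts a message in its constructor.

```xojo
Public Function as_utf8_string(Extends s As String) As String
  Select Case s.Encoding
  Case Encodings.UTF8
    // Already UTF-8: return unchanged
    Return s

  Case Nil
    // Encoding is undefined: only declare UTF-8 if the bytes are valid UTF-8
    If Encodings.UTF8.IsValidData(s) Then
      Return s.DefineEncoding(Encodings.UTF8)
    Else
      // Fail loudly instead of silently altering the data.
      // Raise exits the function, so nothing after it would run.
      Raise New RuntimeException("Could not convert string to UTF-8 at: " + CurrentMethodName)
    End If

  Case Else
    // Some other known encoding: convert the bytes to UTF-8
    Return ConvertEncoding(s, Encodings.UTF8)

  End Select
End Function
```

Usage stays the same as before, with the caller deciding how to handle the failure:

```xojo
Try
  Var name As String = rs.Field("name").StringValue.as_utf8_string
Catch e As RuntimeException
  // Log it, skip the row, or re-raise, depending on the app
End Try
```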