The words in my database contain accents, but they display differently in the Listbox; e.g., the database has ἀγάπη, whereas the Listbox has άγάπη (notice the difference in the first letter). This is creating a problem when attempting to retrieve the data associated with the word. Is there a trick to getting accents to display correctly in the Listbox?
Just to be clear, the ListBox is populated with the words from the database.
I use a global method placed in a module so you don’t need to know the condition of the string first:
(Typed from memory, but it should be close)
Function UTF8(extends s as String) as String
  Select Case s.Encoding
  Case Is Encodings.UTF8
    // Don't modify if it's already UTF8
    Return s
  Case is Nil
    // Just set if it's not set
    Return DefineEncoding(s, Encodings.UTF8)
  Case Else
    // otherwise, convert
    Return ConvertEncoding(s, Encodings.UTF8)
  End Select
End Function
Usage:
Var name as String = rs.field("name").StringValue.UTF8
Updated slightly (the “is” keyword is not needed in the first two cases) to prevent syntax errors:
Function UTF8(extends s as String) as String
  Select Case s.Encoding
  Case Encodings.UTF8
    // Don't modify if it's already UTF8
    Return s
  Case Nil
    // Just set if it's not set
    Return DefineEncoding(s, Encodings.UTF8)
  Case Else
    // otherwise, convert
    Return ConvertEncoding(s, Encodings.UTF8)
  End Select
End Function
I would say that if you use that, be careful not to treat it as a magic bullet that you can apply without thinking. If Encoding is Nil and you promise the string is UTF-8 without checking, the result might be wrong if the bytes actually represent something other than UTF-8, and you could even get a crash in some cases.
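To make that risk concrete, here is a minimal sketch (not from the thread). It assumes the incoming bytes are really ISO Latin-1, which is something you would have to know from your own data source; the variable names are made up for illustration. It builds a one-byte string with no declared encoding, checks it against UTF-8, and only then decides what to do:

```xojo
// Hypothetical data: the single byte &hE9, which is "é" in ISO Latin-1
// but is not a valid UTF-8 sequence on its own.
Var mb As New MemoryBlock(1)
mb.Byte(0) = &hE9
Var raw As String = mb.StringValue(0, 1) // no encoding passed, so none is declared

If Encodings.UTF8.IsValidData(raw) Then
  // The bytes are at least well-formed UTF-8, so relabelling them is plausible
  raw = raw.DefineEncoding(Encodings.UTF8)
Else
  // Declare what the bytes actually are (assumed Latin-1 here), then convert
  Var fixed As String = raw.DefineEncoding(Encodings.ISOLatin1).ConvertEncoding(Encodings.UTF8)
  System.DebugLog(fixed)
End If
```

DefineEncoding only relabels the bytes, while ConvertEncoding rewrites them, so calling DefineEncoding with the wrong encoding leaves the data untouched but misread.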
Why is any of this necessary? If the database is set to UTF-8, and is read using Xojo methods (either direct for SQLite or via a plugin for the apparently-unspecified database type in question), then surely:
Execute this command before your select to tell the database that this connection will be communicating via UTF-8 encoding.
db.SQLExecute("SET NAMES 'utf8'")
Edit to add:
If you are using the latest version of Xojo and the API 2 database methods, then your string will be UTF-8 by default, but if you want to play it safe, explicitly convert the encoding to UTF-8. If you are using API 1 and getting a RecordSet, then you would want to do this:
dim myValue as string = rs.Field("myfield").StringValue.DefineEncoding(Encodings.UTF8)
I adjusted the function (my version is based on @Greg_O's version):
Public Function as_utf8_string(extends s as String) As String
  Select Case s.Encoding
  Case Encodings.UTF8
    // Don't modify if it's already UTF8
    Return s
  Case Nil
    // Just set if it's not set; if the string is not actually UTF8, this may break things
    If Encodings.UTF8.IsValidData(s) Then
      Return s.DefineEncoding(Encodings.UTF8)
    Else
      // Raise an exception? Leave it alone?
      Return "FOUND NON-UTF8 STRING WITH NIL ENCODING. IS THIS UTF8 DATA?"
    End If
  Case Else
    // otherwise, convert
    Return ConvertEncoding(s, Encodings.UTF8)
  End Select
End Function
It returns an ALL-CAPS message if “bad” data is passed to it. (I think this is the only viable route for an Extends method that returns a String without going into some strange and unknown territory?)
Otherwise, to have proper error handling, you would need to check every single time you call this whether you got the long uppercase string back, which would make no sense.
[RN] MySQLCommunityServer: We now encode RowSet string values in UTF-8 if the database character set is set to UTF-8; otherwise, we leave the encoding as Unknown, as we did previously.
Edit: OP didn’t say whether MySQL is used. This information is for those who use MySQL with Xojo < 2024r3 and think their database is UTF-8 or that the Xojo plugin returns UTF-8.
When there is the potential for a fundamental failure, you want that failure to get “loud”. Remember, you might return to this code in 6 months or 6 years and not remember that it is substituting the original string with something else.
If you don’t want to risk that disruption, then return the original string with its nil encoding. But don’t silently change your data.
Public Function as_utf8_string(extends s as String) As String
  Select Case s.Encoding
  Case Encodings.UTF8
    // Don't modify if it's already UTF8
    Return s
  Case Nil
    // No encoding set: only declare UTF8 if the bytes are valid UTF8
    If Encodings.UTF8.IsValidData(s) Then
      Return s.DefineEncoding(Encodings.UTF8)
    Else
      // Raise an exception so the failure is loud.
      // Alternative: replace the Raise with "Return s" to leave the string as is
      // (a Return placed after the Raise would never execute).
      // Changing the string to indicate an error could cause hard-to-track-down bugs,
      // where the string is changed and doesn't break the program, but breaks *the functionality*.
      // See discussion here:
      // https://forum.xojo.com/t/accents-issue/83014/14
      Raise New RuntimeException("Could not convert string to UTF-8 at: " + CurrentMethodName)
    End If
  Case Else
    // otherwise, convert
    Return ConvertEncoding(s, Encodings.UTF8)
  End Select
End Function
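For completeness, here is a sketch of how a caller might handle that exception when filling the ListBox. It assumes the API 2 database classes, and the names rs, WordList, and the "word" column are hypothetical placeholders, not taken from the original poster's project:

```xojo
// rs is assumed to be a RowSet returned by an earlier SelectSQL call
For Each row As DatabaseRow In rs
  Try
    WordList.AddRow(row.Column("word").StringValue.as_utf8_string)
  Catch e As RuntimeException
    // The failure stays loud in the log, and the bad row is skipped
    // instead of being replaced with altered text
    System.DebugLog("Skipping row: " + e.Message)
  End Try
Next
```

Catching RuntimeException is deliberately broad here to keep the sketch short; a dedicated exception subclass would let the caller react only to encoding failures.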