Define Text Encodings from SQL Server Database

I am querying an SQL server database and then caching the data into an in-memory SQLite database.

I am having issues with encodings as certain characters e.g. vulgar fractions, are not displaying in the list box properly. The StringValue of the data returned from the SQL server query has Nil encoding.

I checked the SQL server character set name for nvarchar in the table of interest and it states ‘UNICODE’

So when defining an encoding should I be using Encodings.ASCII as I’m not sure what UNICODE should map to in Xojo?

After defining the encoding, I add data to a SQLite database as follows:

If rs <> Nil Then
  While Not rs.AfterLastRow
    dbRow = New DatabaseRow
    For col = 0 To rs.ColumnCount - 1
      ' create and populate row object
      ColName = cols(col)
      dbRow.Column(ColName).StringValue = rs.ColumnAt(col).StringValue.DefineEncoding(Encodings.ASCII)
    Next
    
    ' add row to db
    MySQLiteDB.AddRow("Parts", dbRow)
    
    rs.MoveToNextRow
  Wend
End If

Assuming I defined the correct Encoding, when setting DatabaseColumn.StringValue (for sqlite database), I think it think it converts the encoding to UTF8 according to the below:

DatabaseColumn — Xojo documentation

Is it possible to stop it from converting the encoding to UTF8 as this may be causing the issue when writing the data to a ListBox?

Have you tried:

 dbRow.Column(ColName).StringValue = rs.ColumnAt(col).StringValue.DefineEncoding(Encodings.UTF8)

The first problem is that if your database is already built and has data in a given encoding you likely want to keep it that way. Alternatively you will need to convert the whole thing at once. What you shouldn’t do is start adding things to a database and end up with some records in one encoding and others in another. Unless you are very careful and store the information on the encoding within the records themselves.

As for changing the encoding you are using you can’t use DefineEncoding to do it. There are two basic functions for encodings that matter here:

DefineEncoding - This sets an encoding on a given string. It does not change the underlying data in any way. It simply labels the data as having a given encoding. It is useful when you read data from a database or external source.

ConvertEncoding - This actually performs changes on the data you have stored. It attempts to change the coding from one form to another. For example if you have data from windows, it is quite possible it is stored in WindowsLatin1. You may wish for it to be covered to UTF-8 for use on another platform (such as Mac). To do this you would have to read the string, label it as WindowsLatin1 and then convert it to UTF-8.

MyString = dbRow.Column(ColName).StringValue.DefineEncoding( Encodings.WindowsLatin1 )
MyString = MyString.ConvertEncodings( Encodings.UTF8 )

Equally storing the data back you would have to do the opposite, this time the Xojo already knows it is using UTF-8 so you don’t need to define that:

dbRow.Column(ColName).StringValue = MyString.ConvertEncodings( Encodings,WindowLatin1 )

Not all encodings are created equal, ASCII for example only consists of 127 codes, it has no support for accented characters or other alphabets. It is unlikely you want to use it in the database.

UTF-8 is 100% compatible with ASCII, however, the reverse is not true. All ASCII codes have the same meaning in UTF-8, not all UTF-8 exist in ASCII.

UTF-8 is a version of Unicode. Along with UTF-16 and UTF-32. If it already is in UNICODE, then you would likely just need to define the correct version to make everything right:

' It is likely UTF-8
MyString = dbRow.Column(ColName).StringValue.DefineEncoding( Encodings.UTF8 )

' Perhaps it is UTF-16?
MyString = dbRow.Column(ColName).StringValue.DefineEncoding( Encodings.UTF16 )

' Very unlikely to be UTF-32 but it is possible
MyString = dbRow.Column(ColName).StringValue.DefineEncoding( Encodings.UTF32 )

This is what I tried first, but it made no difference to Nil Encoding.

Thanks for the explanation. For clarification, yes the SQL Server database is already built and the nvarchar type has a character set name of UNICODE. I am not writing any data back to this database, but simply querying the data and then copying it into an in-memory SQLite database (to use later when populating a ListBox as faster to load from in-memory database each time).

When retrieving data from the SQL Server database in Xojo, it has Nil encoding, therefore I must use DefineEncodings and not ConvertEncodings.
The trouble is UTF8, UTF16 and UT32 has made no difference. In the DesktopListBox some characters appear as a question mark inside a black diamond shape.

Just another thought, were did the data in the database come from?

It is quite possible that the database is configured to store UTF-8 but isn’t actually, UTF-8. Windows, for example, uses a different encoding. If it comes from Windows it would be with trying WindowsLatin1. In which case you would need both the “DefineEncoding( Encodings.WindowsLatin1 )” and the “ConvertEncodings( Encodings.UTF8 )”.

If it came from a web interface, it could be that it is in “ISO-8859-1”, in which case try

String = databasestring.DefineEncoding( Encodings.ISOLatin1 ).ConvertEncoding( Encodings.UTF8 )

The data in the SQL server database comes from a third party commercial application (a .net Windows desktop application).

The following made no difference when later populating a ListBox:

dbRow.Column(ColName).StringValue = rs.ColumnAt(col).StringValue.DefineEncoding(Encodings.WindowsLatin1)

I note the above where you mentiond:

MyString = dbRow.Column(ColName).StringValue.DefineEncoding( Encodings.UTF8 )

However I am not retrieving the string from a DatabaseRow. I am retrieving data from the SQL server database and defining an encoding before adding to DatabaseRow and to the in-memory SQLite database.

This worked! Thank you very much

This is what worked to confirm:

dbRow.Column(ColName).StringValue = rs.ColumnAt(col).StringValue.DefineEncoding(Encodings.WindowsLatin1).ConvertEncoding( Encodings.UTF8 )

The listbox is expecting UTF-8. You need to do the conversion. Defining the encoding just allows that conversion to work.

For now you need to keep things simple. Get a string from the source database. Define an encoding and then convert it to UTF-8.

Good, you need to be sure to store the data back into the database in the correct encoding. Assuming you need to change something. The data is already in UTF-8 and define that way.

orignaldatabasecolumn = SourceString.ConvertEncoding( Encodings.WindowsLatin1 )

As I said you don’t want to end up with some records containing UTF-8 and others WindowLatin1.

I do write data back to the database using a Stored Procedure that came with the third-party database which takes parameters as an XML string. So what I do is construct the XML string in Xojo and then execute the stored procedure on the database. In this case, is it still important to convert the XML string to WindowsLatin1 in Xojo before executing the procedure on the database?

Hmm… I don’t know. The stored procedure is likely dealing with the conversion. XML standard is for UTF-8 encoding, but they may have not dealt with that. You would be best to give it a try.

Assuming you are able try and store something and see if you newly working code still retrieves it correctly. If it doesn’t you then know you need to fix something.

Obviously the test need to include letters or symbols that would otherwise go wrong, “ABC123” isn’t ever going to go wrong with WindowsLatin1.