Weird characters in MSSQL Varchar(MAX)

Nathan_Wright · April 8, 2014, 8:44pm

I am reading data in from a table on an MSSQL 2012 server and all the content comes back fine except for one column that is a VARCHAR(MAX). For some reason the text is coming back as Chinese characters. I am sure this is something to do with encoding in the way I am getting the data (shown below). Anyone know how I can tell Xojo to use the right encoding etc for this field.

data.Field("Notes").StringValue

Kem_Tekinay · April 8, 2014, 8:46pm

Do you know what the encoding should be? (If it’s coming back as Chinese, it’s probably some form of UTF-16.)

You can always define the encoding of a string with DefineEncoding, and you can use the Encodings module for choices. In this case, this might work:

data.Field( "Notes" ).StringValue.DefineEncoding( Encodings.UTF16 )

Nathan_Wright · April 8, 2014, 9:00pm

No that didnt work so I have tried some of the others with no luck yet. I have looked in the DB and the field is defined as SQL_Latin1_General_CP1_CI_AS but cant find that exact match in the Xojo list.

Kem_Tekinay · April 8, 2014, 9:17pm

It would be helpful to see exactly what you’re getting back. Try this:

dim notesHex as string = EncodeHex( data.Field( "Notes" ).StringValue, true )
break

In the debugger, copy and paste the value in notesHex here.

Nathan_Wright · April 8, 2014, 9:29pm

Herewith the value of notesHex (didnt know you could get to the encoding stuff in the debugger)

E6 A1 94 E7 8D A9 E6 A4 A0 E2 81 B3 E2 81 A1 E6 95 B4 E7 91 B3 E6 98 A0 E7 89 AF E7 90 A0 E6 95 A8 E6 B8 A0 E7 91 AF E7 8D A5 2E 00

DaveS · April 8, 2014, 9:56pm

taking just the first two E6 A1 or reversed (A1 E6) both are UTF8 characters… but NOT any thing used in the ENGLISH language
neither is ASCII (as they exceed 0x7F) abd tge same holds true for UTF16

Do you have an idea of what it SHOULD be saying?
Or can you create a record with a KNOWN phrase and post ITS result?

Nathan_Wright · April 8, 2014, 10:01pm

Yes I put the text in the field using MS Management Studio and the text says “This is a test for the notes.”

Varchars work fine but this is a VARCHAR(MAX) which is the recommended these days instead of using a TEXT field type.

Nathan_Wright · April 8, 2014, 10:08pm

Have just done another test and created another field in the same table but this time created it as NVARCHAR(MAX) and it decodes perfectly in Xojo so it would appear that it is specifically an issues with VARCHAR(MAX) as VARCHAR and NVARCHAR without the MAX also work fine.

Norman_P · April 8, 2014, 11:01pm

Nvarchar is wide chars, I believe, capable of handling UTF (reading various MSDN docs it seems like its UCS2 - also usually referred to as UTF-16)
Varchar is single byte encodings only - I expect that’s probably latin1 (or windows 1252 )

Nathan_Wright · April 8, 2014, 11:06pm

I have tried both of those and in fact all of them in the debugger encoding options and non of them work, in fact and version of UTF-32 causes Xojo to crash under Windows 8.1 and trashes the plugin cache which needs to be deleted and rebuild.

Wayne_Golding · April 9, 2014, 12:12am

Known & verified bug. <https://xojo.com/issue/30780>

Nathan_Wright · April 9, 2014, 1:30pm

Brilliant, thanks Wayne. I did try looking in the Feedback system but obviously used wrong criteria. Do Xojo update the case when the issue has been fixed by the developers?

Eli_Ott · April 9, 2014, 2:36pm

SQL_Latin1_General_CP1_CI_AS means:

CP1: CodePage 1 -> Code page 1252
CI: Case insensitive
AS: Accent sensitive ( <> a)

Wayne_Golding · April 9, 2014, 7:37pm

Yes, but it won’t be available until it’s in a release note.