MSSQL Prepared Statements mess up Scandinavian characters

Jani_Klingberg · July 19, 2024, 11:29am

Building a web app with 2024r2.

When using Prepared Statements to select, create, or update rows, any Scandinavian characters (mostly ä, ö, å in my case) get garbled in MSSQL. I’ve tried defining and converting the string’s encoding to pretty much every reasonable option, but none have fixed the issue. Sometimes the way the characters garble does change a bit.

However, when using the input values in the SQL command directly, without Prepared Statements, they work exactly as they should, without needing to fiddle with the encoding at all. So it seems the problem is only within Prepared Statements.

Can I somehow alter how the Prepared Statements work, and/or do some additional string encoding in there?

Eric_Williams · July 19, 2024, 1:21pm

How can you tell it is garbled - via a DB tool, by reading the values back in your code, etc?

TimStreater · July 19, 2024, 2:38pm

Are you using SQLSelect and SQLExecute, or Select SQL and ExecuteSQL? If you use the latter, you don’t haver to bother preparing statements. Which is it?

Rick_Araujo · July 19, 2024, 3:46pm

Check the tables collation. Are they UTF8 ready? Is your MSSQL version 2019+ (UTF8 support started late like that)

Tim_Parnell · July 19, 2024, 5:08pm

If you’re doing something in a loop, using Prepared Statements can be far more efficient because using the API 2.0 functions means a new PreparedStatement is created internally for each query.

TimStreater · July 19, 2024, 6:02pm

Yeah, I know. But I thought it worth asking, as I found using explicit prepared statements to be a large overhead of code. In fact so much so, that before API2 I wrote a set of wrappers that encapsulated the statement preparation in just the same way that API2 does, although only for string arguments (but that was enough to oversome the SQL injection issue).

There are folk who have not converted to API2 and who may therefore be unaware of what I found to be a huge improvement when API2 database methods were introduced. These can be adopted without needing to convert everything else to API2. I’d need to check, whether or not I have any SQL usage in a loop which would justify explicit prepared statements.

Jani_Klingberg · July 22, 2024, 5:48am

I have access to the DB server so I can just look at the values directly in Server Management Studio.

Jani_Klingberg · July 22, 2024, 5:53am

I’ve been using SelectSQL for selection queries.

Jani_Klingberg · July 22, 2024, 5:58am

MSSQL is 2019, collation is the default SQL_Latin1_General_CP1_CI_AS.

I could understand having a character encoding issue if it happened always, but like mentioned before, this only happens when using Prepared Statements. Not when inserting the values into the query directly.

Christian_Schmitz · July 22, 2024, 6:54am

If you’d use MBS Xojo SQL Plugin, we’d connect with unicode mode and pass all strings correctly.

Maybe you like to check out SQLDatabaseMBS class?

(currently available in OmegaBundle)

Rick_Araujo · July 22, 2024, 7:54am

I’ve read that as: Xojo is bugged.

Eric_Williams · July 22, 2024, 12:34pm

That’s the problem right there - you are probably mismatching the encoding somewhere. I’m defer to MSSQL experts, but I would try to get that table switched over to UTF-8 if possible. Things will run much more smoothly.

Rick_Araujo · July 22, 2024, 1:53pm

Well, this collation is a one byte per char, case insentive (ö = Ö), accent sensitive (A <> Ä) but supporting only some chars as seem in this table:

I guess something may be broken with the storage data, and knowing some tables causing unexpected results, I would take a copy of some of these records into a compatible sample table, same collation, and would start to inspect it.

Something during the UTF8<->CP1252 may have it broken, or bytes may be there, but just not read in a way it should be expected.

At the end, when finding out what happened, and a way to read/convert to the expected values, I would consider other steps towards a solution, probably involving some conversion to a SQL_Latin1_General_CP1_CI_AS_UTF8 collation

Be warned that UTF8 expands some chars to multiple bytes, like the Scandinavian chars with codepoints > 127, so a varchar(14) may not hold things as “Asløg Ásgerður”, you may need a larger space declared for that column. Do some experiments.