DefineEncoding failing (not doing anything)

I have a Class with a String Property. This Property is set within the Constructor of this Class using the following line:

Self.Title = rs.Column("title").StringValue.DefineEncoding(Encodings.ISOLatin1)

But the Self.Title String has no Encoding (showing up as h201 in the debugger) after this line?

Edit: Using Xojo 2021 r2.1
Edit 2: Even defining the Encoding later in the Constructor fails too.

Can you try this:

Var title_value As String = rs.Column("title").StringValue // break here, what's the encoding?
title_value = title_value.DefineEncoding(Encodings.ISOLatin1)
Self.Title = title_value // Break this line check the debugger encodings
Var title_value As String = rs.Column("title").StringValue // break here, what's the encoding?
// Break: Encoding is NIL, because it's not yet defined by my code

title_value = title_value.DefineEncoding(Encodings.ISOLatin1)
// Break: Still has no encoding here

Self.Title = title_value // Break this line check the debugger encodings
// Break: Also no encoding... strange!

I think I’ve found the issue: if the encoding is wrong (e.g. Latin1 instead of UTF8), it does not define the desired encoding but an unknown/broken encoding?

DefineEncoding is of course in no way a conversion of any kind. It’s only a promise; in this case you are promising that the encoding is ISOLatin1. If the promise is false, then you can expect a crash or basically any undefined behavior.

Of course, there is no crash. The end user just gets some nice mojibake.

Trust me, you can get a crash, especially on Windows, if the encoding is other than promised.

Presumably it just sets a value indicating the encoding somewhere in the String object. It doesn’t do anything to the string’s data bytes. ConvertEncoding, OTOH, will assume your string has the encoding it is currently labeled with, convert the bytes to the new encoding, and change that encoding value to the new one.

So it’s up to us to ensure our text actually has the defined encoding.

DefineEncoding is for when there is no encoding set.
ConvertEncoding is for converting from one encoding to another.
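
To make the difference concrete, here is a minimal sketch. The string literal is only a stand-in for text that really is UTF-8 (Xojo string literals are UTF-8), and the variable names are made up for illustration:

```xojo
// Sketch only: "é" is stored as two bytes in UTF-8 and one byte in Latin1.
Var original As String = "é"

// DefineEncoding only changes the label. The bytes are untouched,
// so the byte count is the same as for the original string.
Var relabeled As String = original.DefineEncoding(Encodings.ISOLatin1)
System.DebugLog("Relabeled bytes: " + relabeled.Bytes.ToString)

// ConvertEncoding rewrites the bytes from the current encoding (UTF-8)
// into the target encoding, so the byte count can change.
Var converted As String = original.ConvertEncoding(Encodings.ISOLatin1)
System.DebugLog("Converted bytes: " + converted.Bytes.ToString)
```

The relabeled string is the mojibake case: UTF-8 bytes that are now promised to be Latin1.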

The really “bad” thing in my case seems to be that the DB, table and column are defined as latin1 (names + charset), but a PHP script (which I have no control over) saves its data UTF8 encoded, while my app uses the correct encoding (latin1) to save data into the DB.

When my app reads data from the DB, it can’t “know” if it’s latin1 or UTF8 encoded.

BTW: I use

Self.ExecuteSQL "SET NAMES 'latin1'"
Self.ExecuteSQL "SET CHARSET 'latin1'"

In my DB Connection Class.

Why don’t you switch it all to UTF-8? Do you need Latin1 for a specific thing?

Unfortunately I can’t alter the DB schema, and it’s a DB that many different systems have read/write access to.
I don’t want to “break” something I can’t handle afterwards. :wink:

I am not sure you can actually fix it when it’s been done fundamentally wrong under the hood. You might be able to get it somewhat OK, but the heart of the problem would always loom over you: what if you get a different combination of letters or a different symbol that you had not tested for? You will always have the problem that UTF8 was forced into ISOLatin1, and you’re at the mercy of the database and how badly or well it handles that.

That’s a mess. So you have a pseudo Latin1 encoded DB with UTF8 bytes inserted. Unknown things can happen at some point…

But ignoring this, the way to go is reading the Latin1 column and DEFINE its encoding as UTF8 just after, and process it as usual in Xojo.

When writing to the DB you must ensure that such a “bag of bytes” is still UTF8, even though the DB won’t know it and thinks it is receiving Latin1.

The bad side is that DB comparisons are broken, a SELECT * WHERE str_a > str_b may be incorrect as the DB is interpreting “alien chars” as Latin1.
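
The read side of this could look roughly like the sketch below. It assumes rs is a RowSet from a query on the latin1 connection and that the bytes in the column really are UTF-8; the write side is left out because it depends on how the connection handles the data:

```xojo
// Sketch only: rs is assumed to be a RowSet from the latin1 connection.
Var raw As String = rs.Column("title").StringValue

// The column claims latin1, but the bytes were written as UTF-8.
// Label them as UTF-8; do not convert, because the bytes are already right.
Self.Title = raw.DefineEncoding(Encodings.UTF8)
```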

Exactly. ATM I’m pulling those VarChars as UTF8 encoded, and if my code “detects” strange/alien chars, it pulls those chars again as latin1. Then it moves on to the next DB column.
Not perfect, but good enough for our use case.
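
For anyone with the same problem, one way to do that kind of fallback is sketched below. This is not the exact code from my app: LabelColumnText is a hypothetical helper name, and it swaps my "detect strange chars" step for a validity check with TextEncoding.IsValidData:

```xojo
// Hypothetical helper, not from the original app: label the raw column
// bytes as UTF-8 if they form valid UTF-8, otherwise fall back to Latin1.
Function LabelColumnText(raw As String) As String
  If Encodings.UTF8.IsValidData(raw) Then
    Return raw.DefineEncoding(Encodings.UTF8)
  Else
    Return raw.DefineEncoding(Encodings.ISOLatin1)
  End If
End Function

// Possible usage, assuming rs is a RowSet:
// Self.Title = LabelColumnText(rs.Column("title").StringValue)
```

It is still a heuristic. Pure ASCII text is valid in both encodings, and a Latin1 byte sequence can occasionally also be valid UTF-8, so a wrong guess is possible.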

Thank you all for your kind help. Case Closed :slight_smile:

You never did tell us which sort of database it is.

MySQL (I think v8.x). It’s an outdated OTRS ticket system DB.

I wrote software for our company which combines various systems like OTRS, selectLine, Zabbix and our own Wimax/Ubiquity network solutions with our fiber and copper DSL switches and servers. And soooo much more… :wink: