Encoding (again!)

Hi everybody.
I have a class that sets a string property containing accented characters (è, à, etc.).

When I put this property on a label, the accented letters are not displayed correctly: the debugger tells me that the class property has encoding = Nil.

Since I have always had problems with managing encodings, can anyone explain to me what to do?

Thanks to everybody.

Where did the text come from?
If you set the default value of the property in your class to è, à, does the IDE show it correctly?
And what happens if you then put the string into a label?
Does the label have a font that can show these characters?
In r3.1 I can’t reproduce your problem.

@Nedi Freguglia: Strings have a UTF8 encoding by default in Xojo, but only if you typed the string within the IDE. Strings read from other sources (MemoryBlocks, Stream, Serial etc.) do not get a default encoding, so their encoding is Nil.

When reading data from an external source, you should always use DefineEncoding to tell Xojo which encoding it should use. For example:

// Reading data from Serial1 (assuming that data are transferred as UTF8)
dim myString as String = DefineEncoding( Serial1.ReadAll, Encodings.UTF8 )

Once you have defined the proper encoding, you can convert from one to another, e.g.:

dim convertedString as String = ConvertEncoding( myString, Encodings.MacRoman )

NOTE that a String with no encoding (aka nil encoding) cannot be converted.
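To avoid calling ConvertEncoding on a string whose encoding is still Nil, a small guard can help. This is only a sketch: the helper name ToUTF8 and its assumedEncoding parameter are made up for illustration, and the caller still has to know which encoding the source really used.

```xojo
// Hypothetical helper (not part of Xojo): returns s as UTF8.
// assumedEncoding is what the caller believes the raw bytes are in.
Function ToUTF8(s As String, assumedEncoding As TextEncoding) As String
  If Encoding(s) Is Nil Then
    // No encoding yet: tag the bytes, do not change them
    s = DefineEncoding(s, assumedEncoding)
  End If
  // The encoding is known now, so a conversion is possible
  Return ConvertEncoding(s, Encodings.UTF8)
End Function
```

It could be used as `dim myString as String = ToUTF8(Serial1.ReadAll, Encodings.UTF8)`.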

Thank you, Markus.

I am using Xojo 2019r1.1.

In a class method I set a string property (an error message) to a text containing è and à. I noticed a few minutes ago that if I put the property value in a label, everything is OK.
The issue appears if I concatenate values coming from a Structure: in this case (and ONLY in this case), the property encoding becomes Nil.

I think the solution is this (but I am not sure it’s the best possible):

mPropErrMessage = mPropErrMessage.DefineEncoding(Encodings.UTF8)

where mPropErrMessage is the property which contains è, à

However, I still get confused by DefineEncoding, Encoding, ConvertEncoding etc.

[quote=479163:@Nedi Freguglia]I think the solution is this (but I am not sure it’s the best possible):

mPropErrMessage = mPropErrMessage.DefineEncoding(Encodings.UTF8)
where mPropErrMessage is the property which contains è, à[/quote]

While this is the correct approach in principle (whenever you get text from an outside source, you use DefineEncoding to tell Xojo which encoding it is in), you must first find out which encoding was actually used. It may be UTF8, but you need to make sure.

Thank you, Stéphane and Michael. I understand, as Stéphane tells me, that if the text comes from external sources it may have no encoding (or a different one), but what drives me crazy is that this happens with a class which handles a database table (SQL Server). There is a method of this class which verifies whether the barcode I assign to a record has already been assigned to another one (with a different key). To do this I use a structure:

Protected Structure EsisteBarCode
  Generazione As Integer
  Linea As String * 1
  Matricola As Integer
  Generazione1 As Integer
  Linea1 As String * 1
  Matricola1 As Integer
  Errore As Boolean
  Barcode As String * 5
End Structure

Generazione, Linea and Matricola are the unique key identifying the record; Generazione1, Linea1 and Matricola1 are the unique key of the record which the barcode is already assigned to.
If the barcode is already assigned, the class sets the string property “mPropErrMessage” with this value:

mPropErrMessage = "Il Bar Code è già stato assegnato a " + CStr(e.Generazione1) + " " + e.Linea1 + " " + CStr(e.Matricola1)

If the value were limited to "Il Bar Code è già stato assegnato a ", it would be displayed correctly; it is the string concatenation that causes the issue.
While debugging the code I noticed that the encoding of the string elements of the structure (coming from the database table) is always Nil, so I think that concatenating Nil-encoded strings makes the whole property Nil-encoded.

Am I right?

As long as the encodings of all the strings to concatenate are known, Xojo can convert them to a common encoding. When the encoding of just one string is unknown (Nil), this isn’t possible, and thus the encoding of the concatenated string is unknown. But if you have put the string values in the structure yourself, then you know which encoding they are in (probably UTF8, as that’s the default), and all you need to do is define the encoding as UTF8 once you have fetched the strings from the structure.

The encoding of a string from a structure cannot be known and will be Nil. Use DefineEncoding as you are doing. That is the correct solution.
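Applied to the code from the question, that could look like the sketch below. It reuses e and mPropErrMessage from the earlier post; the local variable linea is introduced here for illustration, and UTF8 is an assumption that has to match whatever was actually stored in the structure.

```xojo
// Tag the structure field before concatenating.
// ASSUMPTION: the bytes in e.Linea1 are UTF8 (plain ASCII letters would be too).
Dim linea As String = DefineEncoding(e.Linea1, Encodings.UTF8)

// Every operand now has a known encoding, so the result is not Nil-encoded
mPropErrMessage = "Il Bar Code è già stato assegnato a " _
  + CStr(e.Generazione1) + " " + linea + " " + CStr(e.Matricola1)
```

Defining the encoding on the field before concatenating, rather than on the finished message, means only the bytes of unknown origin get tagged.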

[quote=479175:@Nedi Freguglia] There is a method of this class which verify if the barcode I assign to a record has already been assigned to another (with different key). To do this I use a structure
[/quote]
Or don’t use a structure for this (personally, that would be my choice).
A class with properties only is functionally VERY similar to a structure,
except it CAN use strings and those strings WILL retain the encoding assigned.

Now my mind is clearer about encoding!
Your posts made me understand that I should use DefineEncoding with strings which have no encoding at all (and this is the case with my structure).
Now I know that I can use ConvertEncoding when I want to CHANGE the encoding of a string variable, and therefore the string variable MUST already have an encoding (I can’t convert a Nil encoding!).
However, taking a look at the Language Reference entry about Encoding, I read the following:

// If the file actually has text in a different encoding, then specify the
// encoding using DefineEncoding
source = source.DefineEncoding(Encodings.UTF16LE)

In this example, does source already have an encoding? And if the answer is yes, why use DefineEncoding instead of ConvertEncoding?

I have so many things to learn!!

However thank you all, you are fantastic! And I love this forum!

The string already has an encoding in the sense that there is a specific encoding it is in. It is just that Xojo doesn’t know which encoding that is. You use DefineEncoding to make it official, so to speak. DefineEncoding doesn’t alter the actual string data, it just adds metadata: interpret the string data as having this encoding. Once the encoding is known, you can freely convert to other encodings as necessary.

It’s like having a text written in a foreign language. Once you know which language it is written in you can look for a translator who speaks that language (and yours).
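A small experiment makes the difference visible. This is a sketch, with a MemoryBlock standing in for an outside source; the variable names are only illustrative. The two bytes C3 A8 are the UTF8 form of "è".

```xojo
// Build two raw bytes the way an external source would deliver them
Dim mb As New MemoryBlock(2)
mb.Byte(0) = &hC3
mb.Byte(1) = &hA8
Dim raw As String = mb.StringValue(0, 2)   // Encoding(raw) is Nil here

// DefineEncoding only adds the label; the bytes stay C3 A8
Dim tagged As String = DefineEncoding(raw, Encodings.UTF8)
// LenB(tagged) = 2 bytes, Len(tagged) = 1 character ("è")

// ConvertEncoding rewrites the bytes for the target encoding
Dim latin As String = ConvertEncoding(tagged, Encodings.ISOLatin1)
// LenB(latin) = 1 byte (E8), and it still displays as "è"
```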

If a string has encoding UTF-8, may I write myString = myString.DefineEncoding(Encodings.IsoLatin1) ?
And what happens if I do such a thing?

[quote=479256:@Nedi Freguglia]If a string has encoding UTF-8, may I write myString = myString.DefineEncoding(Encodings.IsoLatin1) ?
And what happens if I do such a thing?[/quote]
It depends…
If the characters are in the ASCII range (0 - 127) then apart from the encoding metadata changing there shouldn’t be any effect.
If the data was wrongly detected as UTF-8 and is actually ISOLatin1 then Xojo should start using the text correctly.
Otherwise expect all sorts of weird character corruption to occur.

[quote=479256:@Nedi Freguglia]If a string has encoding UTF-8, may I write myString = myString.DefineEncoding(Encodings.IsoLatin1) ?
And what happens if I do such a thing?[/quote]
As Kevin notes, you may be able to.
But CAN you is not the same as SHOULD you. You should not.
If the string HAS a valid encoding, then to give it a different one you should use ConvertEncoding.

DefineEncoding is for the times when you get data from outside the app.
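As a rough illustration of what goes wrong, here is a sketch with a string typed in the IDE (so it really is UTF8); the variable names are made up for the example.

```xojo
Dim s As String = "è"   // typed in the IDE: UTF8, bytes C3 A8

// Wrong tool: the bytes stay C3 A8 but are now read as ISO Latin 1,
// where C3 is "Ã" and A8 is "¨", so a label would show "Ã¨"
Dim relabelled As String = DefineEncoding(s, Encodings.ISOLatin1)

// Right tool: the bytes are rewritten to E8, and the text is still "è"
Dim converted As String = ConvertEncoding(s, Encodings.ISOLatin1)
```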

Ok, thank you all!
Now things are clearer.