When to define encodings?

Hi,
When do I need to define text encodings in my app?

Do I need to define the encoding type when I am saving an entry to a database or a text file, or should I define the encoding type when the data gets read by the database or text file?

In other words - define at the time of WRITING, or define at the time of READING?

Hope that made sense.

Thank you all in advance.

If you are working only with text created by and read by your app, you should not need to define the encodings at all. Xojo creates UTF-8 text. You should define encodings if the text comes from an outside source. Define the encoding when your app reads the source so the text will (hopefully) be read correctly.

Roger,
Thanks. My app writes to a database file, and a text file - and then later reads them both back in.

So if the encoding needs to be defined when parsing external text, can I set the encoding on the text area that displays the text, so that any text read or typed into it automatically becomes encoded? Or do I need to define the encoding of the text first, and THEN read it in?

I know what I am saying - just hope you do :slight_smile:

Forgive me if I’m wrong, but your question suggests that you still don’t have a firm grasp on the concept of encodings.

A string is just a series of bytes. The encoding tells the system how to interpret those bytes to convert them into characters.

When you create and manipulate strings entirely within Xojo code, each string will be UTF-8, the Xojo default, so each byte or series of bytes will be interpreted accordingly.

When you store a string to a database, you are storing the bytes as they exist, and you read back those bytes as they exist, so it’s up to you to make sure that the encoding of what you’ve read matches what was written. If it doesn’t, the bytes may be misinterpreted and may not look right to the end users.
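To make the "same bytes, different interpretation" point concrete, here is a small sketch in Python rather than Xojo (Python is used only because it lets the raw bytes be shown directly; the sample word is an assumption for illustration):

```python
# The same five bytes, interpreted under two different encodings.
raw = b"caf\xc3\xa9"  # the word "café" as a UTF-8 writer would store it

as_utf8 = raw.decode("utf-8")      # interpretation that matches the writer
as_cp1252 = raw.decode("cp1252")   # mismatched interpretation of the same bytes

print(as_utf8)    # café
print(as_cp1252)  # cafÃ©
```

Nothing was converted here. The bytes never changed; only the rule for reading them did, which is why a mismatch shows up as odd characters on screen.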

Does that clear things up?

To relate what Kem said to your situation, if your app WRITES the data and your app READS the data, you don’t need to worry about encodings. They will all be UTF-8.

Generally you should use ConvertEncoding for outgoing strings and DefineEncoding for incoming data. This could be from/to databases, tcp sockets, files, etc.

I find the names of the two methods a bit misleading:

Dim aString As String = aRecordSet.IdxField(1).StringValue.DefineEncoding(Encodings.WindowsANSI)

I think the term ConvertFromEncoding for DefineEncoding (and ConvertToEncoding for ConvertEncoding) would be more accurate.

Sorry - but I sometimes have trouble explaining what I mean.

I understand that text created in Xojo will be UTF-8.
I also understand that if my Xojo app reads that same data back in - it will know it is UTF-8.

What I was trying to say is:
If my Xojo app were to read a text file that was created elsewhere, for example on a Windows PC, is it possible to tell the text area that ANY text it displays should be displayed as UTF-8? Or do I have to read the external data in, convert the encoding to UTF-8, and THEN display it in the text field?

Hopefully, I don’t sound as dumb this time ??

I agree Eli - the names seem to be reversed (to me at least).

No, you cannot define the TextArea. If you are reading from a text file, you can define the TextInputStream, but you have to know what the encoding is.
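As a rough analogy in Python (not Xojo), the sketch below states the encoding on the stream that reads the file, which is the same idea as defining it on the TextInputStream. The file name and its Windows-1252 content are assumptions made up for the example:

```python
import os
import tempfile

# Create a sample file the way a hypothetical Windows-1252 program would.
path = os.path.join(tempfile.mkdtemp(), "legacy.txt")
with open(path, "wb") as f:
    f.write(b"na\xefve")  # "naïve" in Windows-1252

# Tell the reading stream what the encoding is; it cannot work that out alone.
with open(path, encoding="cp1252") as f:
    text = f.read()

print(text)  # naïve
```

The display control never enters into it: by the time the text reaches the UI it is already a correctly interpreted string.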

But DefineEncoding doesn’t perform any conversion; it only declares what the encoding is. After you have defined the encoding, you may use ConvertEncoding to convert the string to a different encoding (say, UTF-8, if that is what you want to standardise on), but you don’t have to. Xojo has no difficulty dealing with strings of different encodings and will always do the right thing, provided the encodings are correctly defined.
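The difference between defining and converting can be sketched in Python (not Xojo). Python has no string type that carries an encoding label, so `decode` stands in for "defining" and `encode` for "converting"; that mapping is an assumption of this illustration, not a description of how Xojo works internally:

```python
raw = b"caf\xe9"  # bytes from a hypothetical Windows-1252 source

# "Defining": state what the bytes already are. The source bytes are untouched.
text = raw.decode("cp1252")

# "Converting": produce different bytes that represent the same characters.
utf8_bytes = text.encode("utf-8")

print(raw)         # b'caf\xe9'
print(utf8_bytes)  # b'caf\xc3\xa9'
print(utf8_bytes.decode("utf-8") == text)  # True
```

The conversion step only works because the first step said what the input was. Skip it, and there is no way to know that `\xe9` was meant to be "é".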

Thanks Kem - I just thought maybe you could set the text area to always display as UTF-8.
I now understand that I need to define the TextInputStream and then display it.

Thank you for clearing that up.
I didn’t think it was possible - It was more of a hope :slight_smile:

So,

  1. If I know the encoding of a text file, but my app doesn’t - I use DefineEncoding.

  2. If I know the encoding of a text file and want to convert it to another encoding, I use ConvertEncoding.

  3. What do I use if I do not know the encoding of an external text file?

I didn’t know that a variable, property or parameter declared as String could have any encoding - I was under the impression that all Strings are UTF-8. Either I had a lot of luck the last six years or all incoming data in my two projects was UTF-8 anyway. Thank you!

Yes, but you need to call DefineEncoding first – ConvertEncoding wouldn’t know how to perform the conversion if it didn’t know which encoding the string is in to begin with.

You have to find out what it is, for example by looking for a BOM or an explicit declaration of the encoding (as in HTML or XML files). And you can use TextEncoding.IsValidData to check whether your assumption of a specific encoding might be correct.
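A minimal sketch of that approach in Python (not Xojo): look for a BOM first, then test whether the bytes are valid under each candidate encoding. The helper names `is_valid` and `guess_encoding` are hypothetical, and `is_valid` is only loosely analogous to TextEncoding.IsValidData:

```python
import codecs

# BOMs are checked longest-first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def is_valid(data, encoding):
    """Return True if the bytes decode without error under `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

def guess_encoding(data, candidates=("utf-8",)):
    """Return a codec name from a BOM or a validity check, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for name in candidates:
        if is_valid(data, name):
            return name
    return None
```

A successful validity check only shows that the assumption is plausible, not that it is right: single-byte encodings such as Windows-1252 accept almost any byte sequence, so they make poor candidates for this kind of test.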

My M_String package has a method to try to determine the encoding by analyzing the contents of the string. It’s not perfect, but an option if you simply don’t know.

http://www.mactechnologies.com/downloads

Thanks.
My only problem now is question 3.

I will look into that and also Kem’s M_String :slight_smile:

Just to be clear, this is incorrect: text that your app wrote does not come back already marked as UTF-8. You must still define the encoding when you read it back.

OK,
final question on this subject:

If I define the encoding, and then convert the encoding, do I then need to define the new encoding? Or does ConvertEncoding convert and define?

Converts and defines.

Phew - glad that’s over :slight_smile: