If you are working only with text created by and read by your app, you should not need to define the encodings at all. Xojo creates UTF-8 text. You should define encodings if the text comes from an outside source. Define it when your app reads the source so it will (hopefully) be correctly read.
Thanks. My app writes to a database file, and a text file - and then later reads them both back in.
So if it needs to be defined when / if parsing external text, can I set the encoding type to the actual text area which displays the text, so that any text read / typed into it automatically becomes encoded, OR, do I need to define the encoding to the text, and THEN read it in?
Forgive me if I’m wrong, but your question suggests that you still don’t have a firm grasp on the concept of encodings.
A string is just a series of bytes. The encoding tells the system how to interpret those bytes to convert them into characters.
When you create and manipulate strings entirely within Xojo code, each string will be UTF-8, the Xojo default, so each byte or series of bytes will be interpreted accordingly.
When you store a string to a database, you are storing the bytes as they exist, and you read back those bytes as they exist, so it’s up to you to make sure that the encoding of what you’ve read matches what was written. If it doesn’t, the bytes may be misinterpreted and may not look right to the end users.
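To make the "same bytes, different interpretation" point concrete, here is a small sketch. It is a Python analogy rather than Xojo code, and the sample word is only an illustration; the idea it shows is that the stored bytes never change, only the encoding used to read them.

```python
# Python analogy (not Xojo): the bytes are fixed, the encoding decides the characters.
raw = "café".encode("utf-8")      # the bytes a UTF-8 app would write: 63 61 66 C3 A9

as_utf8 = raw.decode("utf-8")     # read back with the encoding that was written
as_cp1252 = raw.decode("cp1252")  # read back with the wrong encoding (Windows-1252)

print(len(raw))    # number of bytes stored
print(as_utf8)     # the original text
print(as_cp1252)   # the same bytes, shown as different characters
```

Reading with the matching encoding gives back the original text; reading with Windows-1252 turns the two-byte "é" into two unrelated characters, which is the kind of garbage end users would see.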
Sorry - but I sometimes have trouble explaining what I mean.
I understand that text created in Xojo will be UTF-8.
I also understand that if my Xojo app reads that same data back in - it will know it is UTF-8.
What I was trying to say is:
If my Xojo app was to read a text file for example, which was created elsewhere, like on a Windows PC, - is it possible to tell the text area that ANY text displayed by it, should be displayed using UTF-8, OR, do I have to read the external data in, then convert the encoding to UTF-8, THEN display it in the text field.
But DefineEncoding doesn’t perform any conversion; it only defines what the encoding is. After you have defined the encoding, you may use ConvertEncoding to convert it to a different encoding, say to UTF-8 if that is what you want to standardise on, but you don’t have to. Xojo has no difficulty dealing with strings of different encodings and will always do the right thing, provided the encodings are (correctly) defined.
I didn’t know that a variable, property or parameter declared as String could have any encoding - I was under the impression that all Strings are utf8. Either I had a lot of luck the last six years or all incoming data in my two projects was utf8 anyway. Thank you!
Yes, but you need to call DefineEncoding first. ConvertEncoding wouldn’t know how to perform the conversion if it didn’t know which encoding the string is in to begin with.
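The define-then-convert order can be sketched as follows. Again this is a Python analogy, not Xojo: `decode` stands in roughly for DefineEncoding (stating what the bytes mean) and `encode` for ConvertEncoding (producing new bytes in the target encoding). The sample bytes and the assumption that they came from a Windows-1252 file are purely illustrative.

```python
# Python analogy (not Xojo). Assumption: these bytes came from a Windows-1252 file.
raw = b"caf\xe9"

# Step 1, "define": state which encoding the bytes are in. The bytes are untouched.
text = raw.decode("cp1252")

# Step 2, "convert": produce new bytes in the target encoding (UTF-8 here).
utf8_bytes = text.encode("utf-8")

print(raw)         # original bytes, unchanged by step 1
print(utf8_bytes)  # different bytes that represent the same characters
```

Skipping step 1 leaves nothing to convert from: without knowing the source encoding there is no way to say which characters the bytes stand for.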
You have to find out what it is by looking for a BOM, for example, or an explicit declaration of the encoding (as in HTML or XML files). And you can use TextEncoding.IsValidData to check whether your assumption of a specific encoding might be correct.
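A rough sketch of both checks, again as a Python analogy rather than Xojo code. The helper names `sniff_bom` and `is_valid` are hypothetical; `is_valid` only plays a role similar to TextEncoding.IsValidData and is not that method.

```python
import codecs

def sniff_bom(data: bytes):
    """Return a codec name if the data starts with a known BOM, else None."""
    # UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
    # begins with the same two bytes as the UTF-16 LE BOM.
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return None

def is_valid(data: bytes, encoding: str) -> bool:
    """True if the bytes can be decoded in the assumed encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

print(sniff_bom(codecs.BOM_UTF8 + b"hello"))  # BOM found
print(sniff_bom(b"hello"))                    # no BOM, encoding still unknown
print(is_valid(b"caf\xe9", "utf-8"))          # these bytes are not valid UTF-8
print(is_valid(b"caf\xe9", "cp1252"))         # but they decode as Windows-1252
```

A passing validity check only means the assumption might be correct: many byte sequences are valid in several encodings, so it can rule a guess out but not confirm it.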