I was reading the new Xojo release notes and found a bit about the String type being superseded by a new Text type. How are we supposed to use this new type, and how should we refactor in future versions if we have a string that includes both text and numbers?
So if I throw some Int32/Int64 values into a Text variable, everything should be fine? If I have a TextInputStream that is ANSI-encoded, or anything other than UTF-8, how can I pass the encoding to the Unicode-only Text variable?
Is this simply a rename, or are there some caveats we should be aware of compared to Ye Olde String?
[quote]The big difference is that String was really just a “bag of bytes”. Text is for, well, text and does not deal with bytes.[/quote]
That doesn’t sound very good. Is this a one-way street? Is String going away without a valid replacement? TextInputStream is still giving me a String, so I’ll have to keep my code as it is, but I don’t like the idea of refactoring the code without much of a reason to do so. Adding Text as an option and leaving String as it is would have been a tad less obnoxious.
Text has a mechanism to convert to & from raw bytes using a specific encoding.
The reason an encoding is required is that different encodings can produce different byte sequences for the same text.
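To make the point about byte sequences concrete, here is a small sketch. It is written in Python rather than Xojo, purely as a language-neutral illustration; the sample word and variable names are made up, and Xojo's own conversion calls are named differently.

```python
# Illustration only (Python, not Xojo): one piece of text, three encodings,
# three different byte sequences.
text = "caf\u00e9"  # "café", with the precomposed e-acute (U+00E9)

utf8_bytes = text.encode("utf-8")       # e-acute becomes two bytes
latin1_bytes = text.encode("latin-1")   # e-acute becomes one byte
utf16_bytes = text.encode("utf-16-le")  # every character becomes two bytes

for name, data in [("UTF-8", utf8_bytes),
                   ("Latin-1", latin1_bytes),
                   ("UTF-16LE", utf16_bytes)]:
    print(name, data.hex(" "))

# Decoding bytes with the wrong encoding does not fail here;
# it silently yields the wrong characters.
garbled = utf8_bytes.decode("latin-1")
print(garbled)
```

Without knowing which encoding produced the bytes, there is no way to tell which of these byte sequences means "café", which is why the conversion has to be told.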
[quote]Excuse my ignorance on the subject, but will I now have to declare strings as Text?
Like: dim t1 as string becomes dim t1 as text
Sorry if my question is very basic.[/quote]
Apparently it’s a prerequisite for iOS applications; the remaining platforms haven’t got any hard requirement yet (most of the desktop framework still outputs Strings, last time I checked). You must use Strings for now, but expect to switch them to Text in a few versions.
So, I need to write an ASCII string out a serial port, or receive the same on a serial port. What would I need to do to get that in or out of Text? Sorry for the noise; Paul’s post above covers this, I guess.
As of today: nothing.
You can just read & write String as always.
For now I wouldn’t change anything about how you use Strings with serial ports, as there are no advantages to doing so TODAY.
All you’ll end up doing is using the Text type and converting to / from String to read & write the serial port.
There are lots of different ways to encode characters into bytes. Most of them are very limited, only defining encodings for some characters, and even when they define encodings for the same characters, they often use different bytes.
The only encodings which can represent every character are the Unicode encodings: UTF-8, UTF-16, UTF-32.
The old String type tries to represent either text or bytes or both, and as a result it’s complicated and confusing. The new framework makes it very simple: text is characters, and if you want to convert to or from an array of bytes (or an old-fashioned String), you have to be clear about the encoding you intend to use.
When you say that you want to write an ASCII string to a serial port - well, you are actually writing bytes to the serial port, because you are doing something concrete, something that interchanges with other programs or machines. So you would convert the text to bytes, and you would do so using the ASCII encoding. Conversely, you can translate some bytes, contained in a string or a memoryblock, up to a Text value by specifying the encoding that was used to generate them.
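The round trip described above can be sketched roughly like this. Again this is Python rather than Xojo, and the helper names `text_to_wire` and `wire_to_text` and the sample command are hypothetical, not part of any serial library; a real program would hand the resulting bytes to whatever serial API it uses.

```python
# Illustration only (Python, not Xojo). Helper names are hypothetical.

def text_to_wire(text: str) -> bytes:
    """Encode text as ASCII bytes for the wire.

    Raises UnicodeEncodeError if the text contains a non-ASCII character.
    """
    return text.encode("ascii")


def wire_to_text(data: bytes) -> str:
    """Decode ASCII bytes received from the wire back into text.

    Raises UnicodeDecodeError if any byte is outside the ASCII range.
    """
    return data.decode("ascii")


command = "STATUS?\r\n"          # made-up device command
payload = text_to_wire(command)  # these bytes are what goes out the port
print(payload)

reply = wire_to_text(b"OK\r\n")  # bytes as they might arrive from the port
print(repr(reply))

# Text that ASCII cannot represent is rejected instead of being mangled.
try:
    text_to_wire("caf\u00e9")
except UnicodeEncodeError as err:
    print("cannot send as ASCII:", err.reason)
```

The useful property is that a character the chosen encoding cannot represent causes an explicit error at the conversion step, rather than wrong bytes going out silently.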
[quote=150042:@Mars Saxman]Text is abstract - a series of characters.
Bytes are concrete - a series of bits.[/quote]
Text is not String. The equivalent of String is MemoryBlock. Text is a completely different representation. Text is a sequence of code points, which have nothing to do with the byte values we’re used to thinking about.
Actually, UTF-8 strings are kind of a step on the way to Text. Unlike a MemoryBlock, or ol’ time Apple II strings, they contain Unicode characters that may each be represented by several bytes.
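A quick way to see the gap between code points and bytes, sketched in Python rather than Xojo (the sample string is arbitrary):

```python
# Illustration only (Python, not Xojo): counting code points vs. bytes.
text = "na\u00efve \u20ac5"  # "naïve €5"

code_points = [ord(ch) for ch in text]  # one integer per character
utf8_bytes = text.encode("utf-8")       # the concrete byte representation

print(len(code_points))  # number of code points
print(len(utf8_bytes))   # number of bytes in UTF-8

# Per-character byte cost: ASCII letters take 1 byte, the rest take more.
for ch in text:
    print(repr(ch), hex(ord(ch)), len(ch.encode("utf-8")))
```

Here the string has 8 code points but takes 11 bytes in UTF-8, because "ï" needs two bytes and "€" needs three. Indexing by byte and indexing by character are therefore not the same operation.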
The mess comes from the lack of an encoding when data from databases and sockets is implicitly converted. Text requires datatotext, so the encoding becomes mandatory, and the lozenges are gone.