I get a JSON string from a website. It has newline characters in it. I can ReplaceAll them with two vertical bars (||). I then ReplaceAll the || with EndOfLine and display the result in a TextArea. I don’t get line breaks; I get an unknown character.
If I create my own string (hello||goodbye) and use it instead of the string I received, things work as expected.
I checked the encoding of the received string: nil. Then I used DefineEncoding (to UTF8), and then things worked. Apparently ConvertEncoding does not work if there is no encoding on the string. I would say that would be a bug.
Thanks for the suggestions. They sent me in the right direction.
It’s not. ConvertEncoding is for changing between valid encodings. DefineEncoding is for setting one when it’s nil. They’re specifically different because the other case (calling DefineEncoding on an already encoded String) will corrupt the string in most cases.
No. Nil is the absence of an encoding. This might be time for a feature request that causes ConvertEncoding to throw an exception if the encoding is Nil.
I respectfully disagree. While technically Nil is an allowed value for a string’s encoding, it may well be a nonsensical value, and therefore invalid with respect to ConvertEncoding.
Take a Japanese Unicode string and set its encoding to nil. It becomes incomprehensible. There is nothing that ConvertEncoding can do with it. It will just mangle it further.
The correct course of action in that case is DefineEncoding, provided you know the correct encoding to apply.
You are making the assumption that all strings are for displaying text, but they’re not. They also function as a “bucket of bytes” when the encoding is nil, and so a Nil encoding is perfectly valid.
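For anyone landing here later, the distinction discussed in this thread can be sketched roughly as follows. This is only a minimal sketch using the classic framework functions; `PrepareForDisplay` is a made-up helper name, and the choice of UTF-8 is an assumption about what the server sends, not something Xojo can detect for you.

```xojo
// Hypothetical helper: make a received string safe to show in a TextArea.
// Assumption: the bytes from the server really are UTF-8.
Function PrepareForDisplay(received As String) As String
  Dim s As String = received

  If s.Encoding Is Nil Then
    // No encoding attached ("bucket of bytes"): label the bytes as UTF-8.
    // DefineEncoding does not change the bytes, it only tags them.
    s = DefineEncoding(s, Encodings.UTF8)
  Else
    // An encoding is already attached: transcoding to UTF-8 is meaningful.
    s = ConvertEncoding(s, Encodings.UTF8)
  End If

  // Normalize whatever line endings arrived to the platform's EndOfLine.
  Return ReplaceLineEndings(s, EndOfLine)
End Function
```

Used as something like `TextArea1.Text = PrepareForDisplay(response)`, where `response` stands for whatever string came back from the website. If the data is not actually UTF-8, the DefineEncoding branch will mislabel it, so substitute the encoding you know to be correct.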