String.replaceAll

Desktop, 2021.1.1

I get a JSON string from a website. It has newline characters in it. I can ReplaceAll them with two vertical bars (||). I then ReplaceAll the || with EndOfLine and display the result in a TextArea. I don’t get line breaks; I get an unknown character.

If I create my own string (hello||goodbye) and use it instead of the string I received, things work as expected.

???

Have you tried:

https://documentation.xojo.com/api/data_types/string.html#string-replaceLineEndings

Just did. No joy.

What’s the encoding on the string before you started replacing?

So they aren’t ‘just’ line breaks
ReplaceLineEndings should handle LF and CRLF
You need to replace them with ‘just’ LF

Have you looked at the string in a hex editor?

Hadn’t checked that. I did ConvertEncoding(Encodings.UTF8).

Yes.

It does successfully replace the LF with ||, but I can’t get from || to a line ending that actually breaks the line.

Solved:

I checked the encoding of the received string: nil. Then I used DefineEncoding (to UTF8), and then things worked. Apparently ConvertEncoding does not work if there is no encoding on the string. I would say that would be a bug.
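For anyone finding this later, here is a minimal sketch of that fix. PrepareForDisplay is a made-up name, and UTF-8 is an assumption about what the server actually sent; if the bytes are in some other encoding, defining UTF-8 would mislabel them.

```xojo
// Sketch only. PrepareForDisplay is a hypothetical helper, and UTF-8 is an
// assumption about the bytes the website returned.
Function PrepareForDisplay(received As String) As String
  Var s As String = received

  // Strings read from a socket or file can arrive with no encoding (Nil).
  If s.Encoding Is Nil Then
    // DefineEncoding only labels the existing bytes; it does not change them.
    s = s.DefineEncoding(Encodings.UTF8)
  End If

  // With an encoding defined, the placeholder swap from the first post
  // should now give line breaks the TextArea can display.
  Return s.ReplaceAll("||", EndOfLine)
End Function
```

The result can then be assigned to the TextArea as usual.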

Thanks for the suggestions. They sent me in the right direction.

It’s not. ConvertEncoding is for changing between valid encodings. DefineEncoding is for setting one when it’s nil. They’re specifically different because the other case (calling DefineEncoding on an already encoded String) will corrupt the string in most cases.

Greg,

But ConvertEncoding should know if there’s no encoding on the string and complain somehow.

-Bob

Nil is a valid encoding … :roll_eyes:

No. Nil is the absence of an encoding. This might be time for a feature request that causes ConvertEncoding to throw an exception if the encoding is Nil.

Of course, now that I know this, it would be easy enough to write a method (setEncoding) that uses DefineEncoding if the encoding is nil and ConvertEncoding otherwise.
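Something along these lines, as a rough sketch. The method name and signature are my own, not part of Xojo, and the Nil branch assumes the bytes are already in the encoding you pass in.

```xojo
// Hypothetical helper: not a built-in Xojo method.
Function SetEncoding(s As String, enc As TextEncoding) As String
  If s.Encoding Is Nil Then
    // No encoding yet: label the bytes. This assumes they really are in enc;
    // nothing is converted here.
    Return s.DefineEncoding(enc)
  Else
    // Encoding already known: convert the bytes to the requested encoding.
    Return s.ConvertEncoding(enc)
  End If
End Function
```

Usage would be something like `Var clean As String = SetEncoding(received, Encodings.UTF8)`. It does not guess anything: if the Nil-encoded bytes are not actually in enc, the text will still come out wrong.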

Just checked the Language Reference. ConvertEncoding is clear that the string must have the encoding defined. DefineEncoding is not as obvious.

An object can be nil, and that is a valid value for it, eg

dim f as folderItem

f has a value of Nil.

Encodings are classes; see the TextEncoding page in the Xojo documentation.

Consequently Nil is a valid value for it.

It’s unlikely that we’d do that because

A. It’s a behavior change
B. It would break peoples apps at runtime with no way to warn you.

The best we could do is come up with a new name/signature with the new behavior.

I respectfully disagree. While technically Nil is an allowed value for a string’s encoding, it may well be a nonsensical value, and therefore invalid with respect to ConvertEncoding.

Take a Japanese Unicode string and set its encoding to nil. It becomes incomprehensible. There is nothing that ConvertEncoding can do with it. It will just mangle it further.

The correct course of action in that case is DefineEncoding, provided you know the correct encoding to apply.

DefineEncoding can be used whether the string has an encoding or not. It doesn’t matter, so the LR doesn’t need to address it.

You are making the assumption that all strings are for displaying text, but they’re not. They also function as a “bucket of bytes” when the encoding is nil, and so a Nil encoding is perfectly valid.

Context, people. The subject was the use of ConvertEncoding. Even the language reference indicates that nil is not valid.

Oh, I’m aware of the context. Still doesn’t change that Nil is a valid encoding.

The question is NOT if Nil is a valid encoding or not, but how ConvertEncoding handles a Nil encoding - and that is quite a different matter.

Mangling the string (whatever that means) is probably the worst option.

It COULD throw an exception - my guess is you will see a lot of code “explode”.

Or it could output a warning and automatically try GuessEncoding and DefineEncoding.

There are probably more options, but I leave that to Greg …