String.replaceAll

Bob_Gordon · July 16, 2021, 6:28pm

Desktop, 2021.1.1

I get a JSON string from a website. It has newline characters in it. I can replaceAll them with two vertical bars (||). I then replaceAll the || with EndOfLine and display the result in a TextArea. I don’t get line breaks; I get an unknown character.

If I create my own string (hello||goodbye) and use it instead of the string I received, things work as expected.

???

Jeff_Tullin · July 16, 2021, 6:50pm

Have you tried:

https://documentation.xojo.com/api/data_types/string.html#string-replaceLineEndings

Bob_Gordon · July 16, 2021, 7:31pm

Just did. No joy.

Greg_O_Lone · July 16, 2021, 7:33pm

What’s the encoding on the string before you started replacing?

Jeff_Tullin · July 16, 2021, 8:05pm

So they aren’t ‘just’ line breaks.
ReplaceLineEndings should handle LF and CRLF.
You need to replace them with ‘just’ LF.

Have you looked at the string in a hex editor?

Bob_Gordon · July 16, 2021, 8:18pm

Hadn’t checked that. I did ConvertEncoding(Encodings.UTF8).

Bob_Gordon · July 16, 2021, 8:19pm

Yes.

It does successfully replace the LF with ||, but I can’t get from || to a line ending that actually breaks the line.

Bob_Gordon · July 16, 2021, 8:30pm

Solved:

I checked the encoding of the received string: nil. Then I used DefineEncoding (to UTF8), and then things worked. Apparently ConvertEncoding does not work if there is no encoding on the string. I would say that would be a bug.

Thanks for the suggestions. They sent me in the right direction.

Greg_O_Lone · July 16, 2021, 9:46pm

It’s not. ConvertEncoding is for changing between valid encodings. DefineEncoding is for setting one when it’s nil. They’re specifically different because the other case (calling DefineEncoding on an already encoded String) will corrupt the string in most cases.
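For anyone landing here later, a minimal sketch of that distinction as it applies to the original problem. It assumes `raw` is the string received from the website, that the server really sent UTF-8, and that the display control is named `TextArea1`; all three are placeholders for illustration, not details from this thread.

```xojo
// Assumption: raw holds the String received from the website and may
// arrive with no encoding attached (Encoding is Nil).
Var s As String = raw

If s.Encoding Is Nil Then
  // No encoding yet: label the existing bytes. The bytes are not changed.
  // UTF-8 is an assumption about what the server actually sent.
  s = s.DefineEncoding(Encodings.UTF8)
End If

// With an encoding defined, line endings can be normalized for display.
s = s.ReplaceLineEndings(EndOfLine)

// TextArea1 is a hypothetical control name.
TextArea1.Text = s
```

ConvertEncoding is deliberately left out here. It only makes sense once the string already carries a correct encoding and you want a different one.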

Bob_Gordon · July 16, 2021, 10:18pm

Greg,

But ConvertEncoding should know if there’s no encoding on the string and complain somehow.

-Bob

Markus_Winter · July 17, 2021, 12:18am

Nil is a valid encoding …

Tim_Hare · July 17, 2021, 12:49am

No. Nil is the absence of an encoding. This might be time for a feature request that causes ConvertEncoding to throw an exception if the encoding is Nil.

Bob_Gordon · July 17, 2021, 1:22am

Of course, now that I know this, it would be easy enough to write a method (setEncoding) that uses DefineEncoding if the encoding is nil and ConvertEncoding otherwise.

Just checked the Language Reference. ConvertEncoding is clear that the string must have the encoding defined. DefineEncoding is not as obvious.
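A rough sketch of what such a setEncoding method might look like. The name comes from the post above, but the signature and the `target` parameter are assumptions. In the Nil branch it takes on trust that the bytes are already in the target encoding, which the caller has to know from somewhere else.

```xojo
// Hypothetical helper: the signature is an assumption, not an existing Xojo API.
Function setEncoding(s As String, target As TextEncoding) As String
  If s.Encoding Is Nil Then
    // No encoding attached: label the bytes as target without changing them.
    // This is only correct if the bytes really are in that encoding.
    Return s.DefineEncoding(target)
  Else
    // Encoding is known: transcode the text into target.
    Return s.ConvertEncoding(target)
  End If
End Function
```

Called as `s = setEncoding(s, Encodings.UTF8)`, this should cover both cases discussed in this thread, with the caveat that a wrong guess in the Nil branch silently mislabels the data.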

Markus_Winter · July 17, 2021, 2:12am

An object can be nil, and that is a valid value for it, eg

dim f as folderItem

f has a value of Nil.

Encodings are classes - see TextEncoding in the Xojo documentation.

Consequently Nil is a valid value for it.

Greg_O_Lone · July 17, 2021, 2:14am

It’s unlikely that we’d do that because

A. It’s a behavior change
B. It would break people’s apps at runtime with no way to warn you.

The best we could do is come up with a new name/signature with the new behavior.

Tim_Hare · July 17, 2021, 3:22am

I respectfully disagree. While technically Nil is an allowed value for a string’s encoding, it may well be a nonsensical value, and therefore invalid with respect to ConvertEncoding.

Take a Japanese Unicode string and set its encoding to nil. It becomes incomprehensible. There is nothing that ConvertEncoding can do with it. It will just mangle it further.

The correct course of action in that case is DefineEncoding, provided you know the correct encoding to apply.

Tim_Hare · July 17, 2021, 3:24am

DefineEncoding can be used whether the string has an encoding or not. It doesn’t matter, so the LR doesn’t need to address it.

Greg_O_Lone · July 17, 2021, 3:25am

You are making the assumption that all strings are for displaying text, but they’re not. They also function as a “bucket of bytes” when the encoding is nil, and so a Nil encoding is perfectly valid.

Tim_Hare · July 17, 2021, 3:27am

Context, people. The subject was the use of ConvertEncoding. Even the language reference indicates that nil is not valid.

Markus_Winter · July 17, 2021, 3:52am

Oh, I’m aware of the context. Still doesn’t change that Nil is a valid encoding.

The question is NOT if Nil is a valid encoding or not, but how ConvertEncoding handles a Nil encoding - and that is quite a different matter.

Mangling the string (whatever that means) is probably the worst option.

It COULD throw an exception - my guess is you will see a lot of code “explode”.

Or it could output a warning and automatically try GuessEncoding and DefineEncoding.

There are probably more options, but I leave that to Greg …