Is this a bug or expected behaviour?

There is no such “culture” to my knowledge. But when you file Feedback, be sure to mention the updated docs in case that should be reversed or further clarified.

It's not broken.

Xojo has a culture of “if there’s a workaround, it’s not relevant”. Bugs stay there for years, and people on the forum say “It’s a known thing, we have a workaround for that”. It has already become a cultural thing.

Kindly keep this conversation on track.

Completely broken until redefined in docs to appear less broken.

The whole point of a high-level language is to make it easy for users to get things done. If the user now goes outside their “safe bubble” and gets some text in from a 3rd party, be that a web service or an untrained user’s input, there is a possibility that their code will not perform as expected.

If any of those inputs have a multibyte encoding, then any string manipulation will fragment the text, as happens just by selecting code in the IDE. Nice.

image

This makes the simplest of things like indexof, length, middle, left, right etc. pretty much pointless going forward, in a society where the use of diverse languages and pictures is only going to increase.

If people want speed, they should be using the Bytes variants of these, much like you would use a MemoryBlock for speed.

It isn’t broken though. The string functions have never supported grapheme clusters so expecting them to return a grapheme cluster length is wrong.

Right, which is why I said above “for better or worse”.

At any rate, I think @GarryPettet has his answer, and anyone who thinks this is the wrong behavior should file Feedback and post the link here.

You know that if we don’t process, split, etc. by chars (here meaning grapheme clusters), we will have numerous bugs from processing a pack of meaningless bytes instead of the expected chars (in byte terms, a grapheme cluster), don’t you?

Yes, this has been a challenge practically since Unicode was introduced in the REALbasic days. That’s why understanding how it works is important.

So, let’s fix the bugs.

Not a bug, and your continually calling it that doesn’t make it so.

But file Feedback so you can get the engineers involved. In the meantime, this is getting repetitive and unhelpful.

As I said, we have a sick cultural thing of tolerance of bugs, and rewriting the rules to make them “features” needing workarounds. I give up, again.

It all depends on how the 3rd party deals with text. For example, the 3rd party libraries we integrate with handle text the same way Xojo does.

That isn’t correct. The problem only occurs when you have characters comprised of multiple code points (for example, emoji & decomposed characters).
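To illustrate the distinction outside of Xojo, here is a small Python sketch (Python rather than Xojo is my choice, and the function name `code_point_lengths` is made up for the example). It shows that a decomposed character can be brought back to a single code point by NFC normalization, while an emoji joined with zero-width joiners stays at several code points whatever you do. Python's `len` counts code points, which is the same level the default string functions are said to work at in this thread.

```python
import unicodedata

def code_point_lengths(s: str) -> tuple[int, int]:
    """Return (code points as given, code points after NFC normalization)."""
    return len(s), len(unicodedata.normalize("NFC", s))

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
decomposed = "e\u0301"

# Man + ZWJ + woman + ZWJ + girl: one visible emoji, five code points.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

print(code_point_lengths(decomposed))  # NFC composes the pair into U+00E9
print(code_point_lengths(family))      # no precomposed form exists
```

So normalizing input helps with decomposed accents, but it does nothing for emoji sequences; those need grapheme-level handling.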

No. Using the Bytes variants would mean that you would have to write your own UTF-8 processing code which I imagine would be quite slow if 100% Xojo code.

I would say there are 3 levels of text processing:
a) Bytes
b) Code Points
c) Grapheme Clusters

Xojo provides functions for ‘a’ & ‘b’, with ‘b’ being the default. ‘c’ can be implemented via the Characters iterator.
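The three levels can be sketched as follows. This is Python, not Xojo, and only a rough illustration: the helper name `text_levels` is hypothetical, and the grapheme count is a deliberate simplification that only skips combining marks. It does not implement the full Unicode segmentation rules, so it will overcount emoji joined with zero-width joiners.

```python
import unicodedata

def text_levels(s: str) -> dict[str, int]:
    """Measure a string at the three levels: bytes, code points, graphemes."""
    return {
        # a) Bytes: size of the UTF-8 encoding.
        "bytes": len(s.encode("utf-8")),
        # b) Code points: what len() counts on a Python str.
        "code_points": len(s),
        # c) Approximate grapheme clusters: count only code points that are
        #    not combining marks. ASSUMPTION: a simplification, not UAX #29.
        "approx_graphemes": sum(
            1 for ch in s if unicodedata.combining(ch) == 0
        ),
    }

print(text_levels("e\u0301"))  # decomposed "é"
print(text_levels("\u00e9"))   # precomposed "é"
```

The same visible character gives three different answers depending on the level you ask at, which is the mismatch this thread keeps circling around.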

Saying something is a bug doesn’t make it a bug though.

Well, saying that a bug is not a bug doesn’t make it so, either.

It isn’t a bug. The OP is expecting Xojo to return the length of a string based on a different level of text handling.

@Rick_Araujo doesn’t appear to appreciate that the problem, if any, lies in Unicode’s acceptance of the notion of combining code points to give what to a human looks like a single character.

Code points combined into one char are one char for both humans and computers.
Not Rick’s opinion, just a fact.

You’ve all made your points, and this is now turning into an argument. I’m asking you all to stop here unless you have something new to contribute to @GarryPettet’s question.