Looks correct to me.
‘a’ consists of 2 UTF-8 sequences, E298BA and EFB88F (the code points U+263A and the variation selector U+FE0F), while ‘b’ consists of 1 UTF-8 sequence, F09F9880 (the single code point U+1F600).
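To make the byte breakdown above concrete, here is a small sketch in Python rather than Xojo (Python's built-in len() also counts code points, so it behaves like the string functions discussed here). The names a, b and describe are just for illustration; the strings are rebuilt from the hex bytes quoted above.

```python
# Rebuild the two strings from the UTF-8 bytes quoted in the post.
a = bytes.fromhex("E298BA" "EFB88F").decode("utf-8")  # U+263A followed by U+FE0F
b = bytes.fromhex("F09F9880").decode("utf-8")          # U+1F600


def describe(s):
    """Return (code point count, UTF-8 byte count, code points as U+XXXX)."""
    return len(s), len(s.encode("utf-8")), [f"U+{ord(c):04X}" for c in s]


print(describe(a))  # (2, 6, ['U+263A', 'U+FE0F'])
print(describe(b))  # (1, 4, ['U+1F600'])
```

Both strings display as a single emoji, but a code point count reports 2 for ‘a’ and 1 for ‘b’.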
The Xojo string functions have always operated on code points rather than user-perceived characters, which are a higher-level text concept.
If you want user-perceived characters, you will probably have to count them using the Characters iterator. I’m sure manipulating strings at the user-perceived character level would also be possible with a bit of work.
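As a rough idea of what counting at the user-perceived level involves, here is a deliberately simplified sketch, again in Python and not Xojo. It is not the full Unicode segmentation algorithm and not how Xojo's Characters iterator is implemented; it only assumes that variation selectors, combining marks, and anything joined with a zero-width joiner belong to the preceding character. The names is_extender and count_perceived are made up for this example.

```python
import unicodedata

VARIATION_SELECTORS = range(0xFE00, 0xFE10)  # U+FE00 through U+FE0F
ZWJ = 0x200D                                 # zero-width joiner


def is_extender(ch):
    """True if ch attaches to the previous character in this simplified model."""
    cp = ord(ch)
    return (
        cp in VARIATION_SELECTORS
        or cp == ZWJ
        or unicodedata.combining(ch) != 0  # combining marks such as U+0301
    )


def count_perceived(s):
    """Approximate the number of user-perceived characters in s."""
    count = 0
    prev_was_zwj = False
    for i, ch in enumerate(s):
        # The first code point always starts a character; afterwards a new
        # character starts only on a non-extender that is not glued on by a ZWJ.
        if i == 0 or (not is_extender(ch) and not prev_was_zwj):
            count += 1
        prev_was_zwj = ord(ch) == ZWJ
    return count


print(count_perceived("\u263A\uFE0F"))  # 1, although it is 2 code points
print(count_perceived("\U0001F600"))    # 1
```

Anything this simple will miscount some sequences (flags and Hangul syllables, for example), so a proper implementation of the Unicode text segmentation rules is the safer choice for real work.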
Is it a bug? I would say not. You can process strings either as code points or as user-perceived characters. Many years ago, Realbasic chose code points for some reason. Maybe the concept of user-perceived characters didn’t exist at the time, or maybe the string processing code just didn’t support them.
If Xojo changed how this worked, it would break lots of code. Some of our apps interact with third-party libraries that process text the same way, so changing this would be a showstopper for us.
From what I can remember, the Xojo Text data type did operate on user-perceived characters, so you could say that deprecating it was a step backwards. Maybe an enhancement to String would be to introduce an additional set of string methods (or maybe parameters on the existing methods) that specify code point mode or user-perceived character mode.