TextArea and Emoji

So I’ve started to use some emoji in my application, which uses a TextArea; however, this creates all kinds of weirdness with Xojo functions.

I’ve searched in Feedback and can only find my own case about emoji not displaying in a Popup menu. Before I take the time to create sample projects, has anyone else filed any cases that I can sign on to (which I simply can’t find)?

So far the issues I’ve found are:

  • len( textArea.text ) is incorrect (off by a varying amount).
  • textArea.charAtPos( x, y ) is incorrect after the emoji, again by a varying number of places.
  • This one occurs regardless of emoji: LineNumAtCharPos( index ) is off by two places.

I managed to get the correct length by using declares and can probably replace the others with declares too, but then I don’t know whether the declare results would be compatible with other Xojo functions, such as Mid or textArea.selStart and textArea.selLength.

Decided to file cases anyway.

LineNumAtCharPos is wrong:
<https://xojo.com/issue/43874>

CharAtPos is wrong:
<https://xojo.com/issue/43875>

len( string ) is wrong:
<https://xojo.com/issue/43876>

I simply cannot guarantee that emoji won’t get used.

Good catch. I’ve been thinking of building a chat app and this would seriously hinder all efforts.

Make sure you sign on to the cases :slight_smile:

They’ve all been verified, so hopefully a fix won’t be too far away, although I have no idea what the cause is.

[quote=267936:@Sam Rowlands]Make sure you sign on to the cases :slight_smile:

They’ve all been verified, so hopefully a fix won’t be too far away, although I have no idea what the cause is.[/quote]

Emojis seem to be encoded as two UTF-16 code units (a surrogate pair) rather than one. Looking at the RTFData, Grinning Face, which is normally &u1F600, comes up as \u-10179\u-8704, where I would have expected decimal \u128512. By comparison, “é” comes up as \u233, which is, as expected, the decimal for &uE9.

I guess the code behind Len and CharAtPos never expected to encounter two double-byte characters in place of one.
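The RTF values above can be reproduced with a little arithmetic. The sketch below is Python rather than Xojo and is purely illustrative: it splits a code point above U+FFFF into a UTF-16 surrogate pair and prints each unit the way RTF’s \uN control word writes it, as a signed 16-bit decimal. The function name rtf_unicode_escapes is hypothetical.

```python
def rtf_unicode_escapes(code_point: int) -> str:
    """Return RTF \\uN escapes for one code point (UTF-16 units, signed 16-bit N)."""
    if code_point > 0xFFFF:
        # Outside the BMP: UTF-16 needs a surrogate pair.
        v = code_point - 0x10000
        units = [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
    else:
        units = [code_point]
    # RTF's \uN takes a signed 16-bit value, so units >= 0x8000 become negative.
    signed = [u - 0x10000 if u >= 0x8000 else u for u in units]
    return "".join(f"\\u{n}" for n in signed)

print(rtf_unicode_escapes(0x1F600))  # \u-10179\u-8704
print(rtf_unicode_escapes(0xE9))     # \u233
```

So \u-10179\u-8704 is simply the surrogate pair D83D DE00 for U+1F600; a decimal \u128512 cannot fit in RTF’s 16-bit parameter.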

You can see the difference if you look at LenB instead of Len. Some emoji take 4 bytes (these are the ones where Len yields the correct result) while some take 6 (makes Len wrong).
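One plausible explanation for the 6-byte case, not confirmed anywhere in this thread, is that the emoji is first split into its two UTF-16 surrogates and each surrogate is then encoded as its own 3-byte UTF-8 sequence (the CESU-8 pattern), which would make Len see two characters. The Python sketch below, again only an illustration, compares that path with the normal 4-byte UTF-8 encoding.

```python
import struct

emoji = "\U0001F600"  # GRINNING FACE, U+1F600

# Normal UTF-8: a single 4-byte sequence for the single code point.
utf8 = emoji.encode("utf-8")
print(len(utf8))        # 4

# Hypothetical faulty path: split into UTF-16 surrogates first...
units = struct.unpack(">2H", emoji.encode("utf-16-be"))  # (0xD83D, 0xDE00)
surrogates = "".join(chr(u) for u in units)
# ...then encode each surrogate as if it were an ordinary character.
six = surrogates.encode("utf-8", "surrogatepass")
print(len(six))         # 6
print(len(surrogates))  # 2 "characters" instead of 1
```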

Len was the first one I found, and with some declares I was able to find a workaround; however, I then found that Mid and LineNumAtCharPos were also wrong, and these are all functions I need for my app to work.

I stopped at that point and haven’t tested anything else.

[quote=267988:@Sam Rowlands]Len was the first one I found, and with some declares I was able to find a workaround; however, I then found that Mid and LineNumAtCharPos were also wrong, and these are all functions I need for my app to work.

I stopped at that point and haven’t tested anything else.[/quote]

I bet putting TextArea.Text into a Text variable will fix Len, and Mid will be right too, since Text deals well with composited characters.
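Whether this helps depends on what each API counts. The Python sketch below is illustrative only and says nothing about how Xojo’s Text type counts internally; it shows how the same strings give different lengths in code points and in UTF-16 code units, the unit Cocoa’s NSString counts. The helper utf16_length is a hypothetical name. A ZWJ sequence such as the family emoji typically renders as a single glyph yet contains five code points, so counting user-perceived characters needs grapheme-cluster segmentation, which is not shown here.

```python
def utf16_length(s: str) -> int:
    # Number of UTF-16 code units; characters above U+FFFF count as 2.
    return len(s.encode("utf-16-le")) // 2

samples = {
    "plain": "abc",
    "accented": "\u00e9",            # é, one code point
    "grinning face": "\U0001F600",   # one code point outside the BMP
    "family (ZWJ sequence)": "\U0001F468\u200D\U0001F469\u200D\U0001F467",
}

for name, s in samples.items():
    print(f"{name}: code points={len(s)}, UTF-16 units={utf16_length(s)}")
```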

As for LineNumAtCharPos, short of temporarily replacing emojis with tokens, we’ll have to wait for a fix.

Interesting, I’ll have to try this.

I’m hoping they’ll fix this problem, as I suspect it affects text throughout the Xojo framework and not just the TextArea. I wonder if it’s related to my bug report whereby using emoji in a PopupMenu corrupts what gets displayed in the menu.

I’ll have to test this, but I wonder if converting to UTF-32 might be a workaround. That should be a large enough bit size to hold any current character or emoji.
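The arithmetic behind the UTF-32 idea: every code point, emoji included, takes exactly 4 bytes in UTF-32, so the byte length divided by 4 is the code-point count. The Python sketch below illustrates that (code_point_count_via_utf32 is a hypothetical name). It only fixes counting; it does not change the offsets a control such as the TextArea reports.

```python
def code_point_count_via_utf32(s: str) -> int:
    # UTF-32 stores every code point in exactly 4 bytes.
    # "utf-32-le" is used so no byte-order mark is added.
    return len(s.encode("utf-32-le")) // 4

s = "Hi \U0001F600!"
print(len(s.encode("utf-16-le")) // 2)  # 6 UTF-16 units
print(code_point_count_via_utf32(s))    # 5 code points
```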

I did try UTF-16, but didn’t think to try further. Let us know how you get on.

You won’t have any luck trying to work around CharAtPos or LineNumAtCharPos by changing string encodings.

Pah! Any pointers on what we might be able to use as a workaround?

The best bet is to use a shadow TextArea loaded with the same data, replace each and every emoji in it with a single- or double-byte character, and then get CharAtPos from the shadow.
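A minimal Python sketch of the shadow-text idea, assuming the offsets only go wrong for characters above U+FFFF (the ones UTF-16 stores as two units). shadow_text and PLACEHOLDER are hypothetical names. Emoji inside the BMP and ZWJ sequences are not handled, and because the placeholder glyph has a different width, pixel-based lookups such as CharAtPos can still disagree with the original.

```python
PLACEHOLDER = "?"  # hypothetical stand-in; any single BMP character works

def shadow_text(s: str) -> str:
    # Replace every code point outside the BMP (most emoji) with one BMP
    # character, so each occupies one code point and one UTF-16 unit.
    return "".join(PLACEHOLDER if ord(ch) > 0xFFFF else ch for ch in s)

original = "Hi \U0001F600 there"
shadow = shadow_text(original)
print(shadow)  # Hi ? there
```

Because each replaced character stays one code point, an index into the shadow text is also a valid index into the original.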

This won’t get you the same results.

I was talking about a workaround until a fix arrives, which, for all we know, could be a while away.

Joe seems aware of the problem… is there any fix coming?

I forgot to mention in this thread that SelStart is also incorrect; I reported this in the same case as CharAtPos: <https://xojo.com/issue/43875>
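If the wrong SelStart and CharAtPos values turn out to be UTF-16 code-unit offsets (an assumption suggested by the fact that macOS text views index text in UTF-16 units, not something Xojo has confirmed), they could be translated to and from character indices by walking the string. A Python sketch with hypothetical helper names:

```python
def utf16_to_cp_index(s: str, utf16_offset: int) -> int:
    """Convert a UTF-16 code-unit offset into a code-point index in s.
    An offset that lands inside a surrogate pair rounds up to the next character."""
    units = 0
    for i, ch in enumerate(s):
        if units >= utf16_offset:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(s)

def cp_to_utf16_index(s: str, cp_index: int) -> int:
    """Convert a code-point index in s into a UTF-16 code-unit offset."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s[:cp_index])

text = "a\U0001F600b"
# 'b' sits at code-point index 2 but at UTF-16 offset 3.
print(cp_to_utf16_index(text, 2))  # 3
print(utf16_to_cp_index(text, 3))  # 2
```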

Hopefully a fix isn’t too far off. The TextArea is a very powerful control and pretty much the basis for the application I was looking to create.

Crap. Just got bitten by this one too.

I’ve added to the case that I encountered the same problem when using IndexOf.

It’s probably all these functions… Right, Left, Mid, etc.

Anyone smarter than me with a workaround? Memblock conversion and a filter for emoji, or something?
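One memblock-style approach, sketched in Python purely as an illustration: scan the UTF-8 bytes and note where a 4-byte sequence starts (lead byte 0xF0 or higher), since those are exactly the characters above U+FFFF that UTF-16 stores as surrogate pairs. The function name is hypothetical and the scan assumes valid UTF-8; the resulting positions could then drive an offset correction or a filter.

```python
def astral_char_positions(data: bytes) -> list:
    """Return the character (code point) indices of every 4-byte UTF-8
    sequence in data, i.e. every character outside the BMP.
    Assumes data is valid UTF-8."""
    positions = []
    char_index = 0
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            width = 1          # 0xxxxxxx: ASCII
        elif lead >= 0xF0:
            width = 4          # 11110xxx: code point above U+FFFF
            positions.append(char_index)
        elif lead >= 0xE0:
            width = 3          # 1110xxxx
        else:
            width = 2          # 110xxxxx
        i += width
        char_index += 1
    return positions

data = "a\U0001F600b\u00e9\U0001F44D".encode("utf-8")
print(astral_char_positions(data))  # [1, 4]
```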