IndexOf with UTF-8

If I perform a case-sensitive search with the IndexOf function (ComparisonOptions.CaseSensitive) and the text contains characters such as üèö…, I get an incorrect value back. I understand that the problem is related to the UTF-8 encoding, but I don’t know how to fix it.

If I use a case-insensitive comparison instead, it works fine.

Please post your code, the text you are searching, what you are searching for, what you expect, and what you got.

The code

Var MyText As String = "Test öö IndexOf "
Var Position As Integer = MyText.IndexOf("Index", ComparisonOptions.CaseInsensitive)
' Returns 8

Var MyText As String = "Test öö IndexOf "
Var Position As Integer = MyText.IndexOf("Index", ComparisonOptions.CaseSensitive)
' Returns 10

Bug for sure - report it. I experimented with converting it to UTF-32 and UTF-16 and it completely fails:

Var MyText As String = "Test öö IndexOf "

MyText = MyText.ConvertEncoding(Encodings.UTF32)

Var Position As Integer = MyText.IndexOf("Index", ComparisonOptions.CaseSensitive)
// Position = -1

Change:

Var Position As Integer = MyText.IndexOf("Index", ComparisonOptions.CaseSensitive)

to

Var MyText As String = "Test öö IndexOf "
Var Position As Integer = MyText.IndexOf("Index", ComparisonOptions.CaseSensitive, Locale.Raw)

With Locale.Raw it should return 8 too. Info from Issue #65969 (note from Paul).

Confirmed here.

Works great for UTF-8, but fails (Position = -1) with UTF-16 and throws a RuntimeException with UTF-32.

That’s with 2020 r1.2.

This is the kind of bug that gives the language a bad reputation for being riddled with framework bugs. This is exactly the sort of bug a unit test would catch.

It’s returning the byte index instead of the character index. You see that if you search for “ö” instead. (It returns 5.)
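To make the byte-versus-character difference concrete, here is a small sketch in Python (not Xojo, and not how the framework is implemented) that computes both offsets for the same test string. The helper name byte_to_char_index is made up for this illustration.

```python
# Illustration only (Python, not Xojo): character offset vs. UTF-8 byte offset.
text = "Test öö IndexOf "
needle = "Index"

# str.find counts Unicode code points.
char_index = text.find(needle)

# bytes.find counts bytes; each "ö" takes two bytes in UTF-8.
byte_index = text.encode("utf-8").find(needle.encode("utf-8"))


def byte_to_char_index(s: str, byte_offset: int) -> int:
    """Hypothetical helper: turn a UTF-8 byte offset into a character offset
    by decoding the bytes before it and counting the characters."""
    return len(s.encode("utf-8")[:byte_offset].decode("utf-8"))


print(char_index, byte_index, byte_to_char_index(text, byte_index))  # 8 10 8
```

The two "ö" characters each add one extra byte, which accounts for the gap between 8 and 10. The first "ö" itself sits at offset 5 in both counts, because only ASCII precedes it; the two offsets only diverge after the first multi-byte character.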

Sign onto the Issue here:

https://tracker.xojo.com/xojoinc/xojo/-/issues/65969

I think this is a similar issue: #70437
Same kind of problem: #69674, which has been fixed.

Yeah - seems like there are plenty of issues with this method. Hopefully this will get fixed sooner rather than later.

Targeted to 2023r4 (Milestone).