Text and Locale

I have an issue when testing for letters in texts. There is a ligature in the German language called Eszett, written as “ß”. It can also be written as “ss”.

Now, when searching for that ligature, it finds all texts containing “ß” and all texts containing “ss”. This is correct when the comparison is done with the German locale, but it also finds all texts containing “ss” when I use a different locale such as “en-US”.

[code]Dim search As Text = "ß"
Dim txt As Text = "Zweitadresse" // Note the "ss" at positions 9 and 10 (0-based)

Dim result1 As Integer = txt.IndexOf(search, 0, New Xojo.Core.Locale("de-DE")) // result1 = 9, which is correct
Dim result2 As Integer = txt.IndexOf(search, 0, New Xojo.Core.Locale("en-US")) // result2 = 9, should be -1 (not found)
Dim result3 As Integer = txt.IndexOf(search) // result3 = 9, should be -1 (not found)[/code]

How can I get rid of this erroneous comparison?

With a case-sensitive comparison, “ss” no longer triggers the confusion.

[code]Dim result2 As Integer = txt.IndexOf(search, Text.CompareCaseSensitive, New Xojo.Core.Locale("en-US"))[/code]

That’s correct, but I do want to search case-insensitively…
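One possible way to keep a case-insensitive search while still using the case-sensitive comparison (an alternative not proposed in the thread) is to lowercase both texts first. Under Unicode full case folding “ß” (U+00DF) folds to “ss”, which is presumably why the case-insensitive search conflates them; plain lowercasing leaves “ß” unchanged, so the case-sensitive search that follows should no longer match “ss”. A minimal sketch, assuming `Text.Lowercase` from the Xojo.Core framework; `IndexOfIgnoringCase` is a hypothetical helper name. It also assumes lowercasing does not change the length of the text (true for German, not for every language), so the returned position is valid in the original text.

[code]// Hypothetical helper, not part of the Xojo framework.
// Approximates a case-insensitive search without folding "ß" to "ss".
Function IndexOfIgnoringCase(source As Text, search As Text, startPos As Integer = 0) As Integer
  // Lowercase both sides so "S" and "s" compare equal.
  // "ß" stays "ß" when lowercased; it is never expanded to "ss".
  Dim lowerSource As Text = source.Lowercase
  Dim lowerSearch As Text = search.Lowercase

  // Case-sensitive comparison: no case folding, so "ß" does not match "ss".
  // Assumes lowercasing kept the length, so the index also applies to source.
  Return lowerSource.IndexOf(startPos, lowerSearch, Text.CompareCaseSensitive)
End Function[/code]

With the thread's example, `IndexOfIgnoringCase("Zweitadresse", "ß")` should return -1, while `IndexOfIgnoringCase("STRAßE", "ß")` should return 4.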

It could be that Text, which works with Unicode grapheme clusters, treats “ss” as equivalent to “ß”, much as it treats “ü” as equivalent to “u” followed by a combining diaeresis. That would be a disadvantage of the type. Maybe it is a bug.

In the meantime, you should probably consider String with InStr, or RegEx.

I tested it with declares into the OS X Foundation framework and got the same result. There is also a discussion of this Eszett issue on CocoaBuilder.

It is an interesting case, because the result makes sense when sorting (“ß” should be “replaced” by “ss”). But for filtering it is wrong, because you should be able to search for “ß” without getting all texts containing “ss”. But since even Apple is doing it wrong, I’ll have to find a work-around.

As I suspected, RegEx saves the day:

[code]// http://developer.xojo.com/regex

// The classic RegEx class works with String, so no Using Xojo.Core is needed.
Dim re As New RegEx
re.SearchPattern = "[ß]"

Dim match As RegExMatch = re.Search("Zweitadresse")
If match Is Nil Then
  MsgBox("not found")
Else
  // Report every occurrence; Search without arguments continues after the last match.
  Do
    MsgBox(match.SubExpressionString(0))
    match = re.Search
  Loop Until match Is Nil
End If[/code]

This results in “not found”. If I add an Eszett to the searched text, it finds it.

Since I’m working with Text only in this project I won’t use RegEx.

The work-around is simple: I get the character at the position IndexOf returns and check if it is an “s” or a “ß”.
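A minimal sketch of that check (the poster's actual code is not shown), wrapped in a hypothetical helper named `IndexOfEszett`. Each position that `IndexOf` reports is verified by reading the character there with `Text.Mid`; if it is not a real “ß”, the hit came from the first “s” of an “ss” pair and the search resumes one character later. It assumes the Xojo.Core `Text.IndexOf(startPosition, other)`, `Text.Mid` and `Text.Compare` methods with their default comparison options.

[code]// Hypothetical helper sketching the workaround described above.
// Returns the 0-based position of the first genuine "ß" at or after startPos,
// or -1 if there is none.
Function IndexOfEszett(source As Text, startPos As Integer = 0) As Integer
  Dim pos As Integer = source.IndexOf(startPos, "ß")
  While pos >= 0
    // Look at the character IndexOf actually found. A case-sensitive
    // comparison avoids treating anything else as equal to "ß".
    If source.Mid(pos, 1).Compare("ß", Text.CompareCaseSensitive) = 0 Then
      Return pos
    End If
    // The hit was an "s" of an "ss" pair: resume one character later.
    pos = source.IndexOf(pos + 1, "ß")
  Wend
  Return -1
End Function[/code]

With the thread's example, `IndexOfEszett("Zweitadresse")` should return -1 (the hit at 9 is rejected and nothing follows), while `IndexOfEszett("Straße")` should return 4.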

[quote=230426:@Eli Ott]Since I’m working with Text only in this project I won’t use RegEx.

The work-around is simple: I get the character at the position IndexOf returns and check if it is an “s” or a “ß”.[/quote]

Good workaround.