Comparing a Name

Brian_O_Brien · March 21, 2019, 5:18am

I am trying to compare two names and figure out if they are the same, different, same but case different…
This code just doesn’t want to work… (Maybe it’s late…)

Test vs TEST … it says they are just different.

if StrComp( self.Name, rhs.Name, REALbasic.StrCompLexical) = 0 then
  System.DebugLog("Same Name but case may different")
  if StrComp( self.Name, rhs.Name, REALbasic.StrCompCaseSensitive) <> 0 then
    System.DebugLog("Same Name :" + self.Name + ": vs :" + rhs.Name + ": case different")
    return 1
  end if
else
  System.DebugLog("Different Name :" + self.Name + ": vs :" + rhs.Name + ":")
  return -1
end if

Kem_Tekinay · March 21, 2019, 5:28am

Change the first line to:

if self.Name = rhs.Name then

PaulS · March 21, 2019, 8:39am

You are using REALbasic.StrCompCaseSensitive, which means that the comparison is case sensitive
Therefore Test is indeed different in comparison with TEST.
Try REALbasic.StrCompLexical instead.

Brian_O_Brien · March 21, 2019, 2:19pm

[quote=429539:@Kem Tekinay]Change the first line to:

if self.Name = rhs.Name then[/quote]

I get that Kem… but this statement is the same isn’t it?

StrComp( self.Name, rhs.Name, REALbasic.StrCompLexical) = 0

PaulS · March 21, 2019, 2:38pm

I hadn’t read the documentation correctly.
The documentation of StrComp mentions that StrComp still returns -1 when comparing Test vs TEST:

[quote]The following code returns -1 because the two strings are the same in every way except in case

StrComp("Spam", "spam", REALbasic.StrCompLexical)

[/quote]

Kem’s method ignores the case.

Kem_Tekinay · March 21, 2019, 2:48pm

I never use it that way, so I couldn’t tell you. I only ever use StrComp to do a byte-level comparison of strings.

Thomas_Tempelmann · March 21, 2019, 2:49pm

The Lexical comparison option only affects how text gets ordered (i.e. it gets ordered differently than if you used “<” and “>” to compare strings).

If you want to actually ignore case, then convert both strings to lowercase first.

Jürg_Otter · March 21, 2019, 4:15pm

Good idea, but…
…there are some other things to consider when doing that.
First off: do both strings being compared have the same encoding?
And if so: which encoding?

WindowsANSI: Lowercase is kind of broken… especially with special chars (Umlaut), such as “Ü”. See <https://xojo.com/issue/54926>
UTF8: Converting to UTF8 is a workaround for the WindowsANSI issue. You might have UTF8 strings anyway. But… where do they come from? Are they both composed the same way (precomposed or decomposed)? If not, “ü = ü” might be false, since there are different binary representations for the very same “visual” character. So keep that in mind when doing binary comparisons of UTF8 strings (e.g. with StrComp).
I haven’t found much in Feedback regarding UTF8 and pre/decomposed Strings, and how to normalize them in Xojo… An old one talking about issues is <https://xojo.com/issue/19163>. Shouldn’t there be one requesting a way to normalize UTF8 Strings?
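
To make the precomposed/decomposed point concrete, here is a minimal sketch. It is written in Python rather than Xojo, purely because the behavior comes from Unicode itself and Python's standard `unicodedata` module makes it easy to run; the helper name `same_after_normalizing` is made up for this illustration and is not a Xojo API.

```python
import unicodedata

composed = "\u00fc"      # "ü" as a single precomposed code point
decomposed = "u\u0308"   # "u" followed by a combining diaeresis

def same_after_normalizing(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# The two strings look the same on screen but have different UTF-8 bytes.
composed_bytes = composed.encode("utf-8")      # b"\xc3\xbc"
decomposed_bytes = decomposed.encode("utf-8")  # b"u\xcc\x88"

raw_equal = composed == decomposed                               # False
normalized_equal = same_after_normalizing(composed, decomposed)  # True
```

A binary comparison sees two different strings here; only after both sides are normalized to the same form (NFC or NFD) do they compare equal.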

Brian_O_Brien · March 21, 2019, 4:59pm

I want to know that the strings are
a) Test <> Test (Different character set encoding)
b) Test = TEST (Lexically the same)
c) Test <> TEST (Different because of case)

i.e. I want to KNOW that they differ only by case.
Maybe that’s clearer.

if self.Name.Encoding <> rhs.Name.Encoding then
  System.DebugLog("Character encodings are different.")
  return -1
end if
if self.Name <> rhs.Name then
  System.DebugLog("The names are lexically different")
  return -1
end if
if StrComp(self.Name, rhs.Name, REALbasic.StrCompCaseSensitive) = 0 then
  System.DebugLog("The names are identical")
  return 0
end if
return -1
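
For comparison, the same three-way classification can be sketched outside Xojo. The following is a Python illustration under stated assumptions, not a translation of the code above: the function name `classify` and its labels are invented here, each name is passed as raw bytes plus an encoding name, and Python's `==` on strings is case-sensitive (unlike Xojo's `=`), so the order of the checks differs.

```python
def classify(a: bytes, a_encoding: str, b: bytes, b_encoding: str) -> str:
    """Report why two encoded names match or fail to match."""
    text_a = a.decode(a_encoding)
    text_b = b.decode(b_encoding)
    if text_a == text_b:
        # Same characters; check whether the stored bytes agree too.
        if a == b:
            return "identical"
        return "same text, different encoding"
    if text_a.casefold() == text_b.casefold():
        return "differs only by case"
    return "different"

case_only = classify(b"Test", "utf-8", b"TEST", "utf-8")
# "differs only by case"
encoding_only = classify("Ü".encode("latin-1"), "latin-1",
                         "Ü".encode("utf-8"), "utf-8")
# "same text, different encoding"
```

The idea is to test the strictest condition first and loosen it step by step, so that each log message names the first level at which the two names stop agreeing.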

Marius_Dieter_Noetzel · March 21, 2019, 5:41pm

dim a, b as string

if a = b then
'same
else if a.lowercase = b.lowercase then
'same but different case
end if

Norman_Palardy · March 21, 2019, 5:48pm

[quote=429605:@Thomas Tempelmann]
If you want to actually ignore case, then convert both strings to lowercase first.[/quote]

There are languages where this literally can’t work, oddly enough.
I think Turkish is one where there are some letters in upper case that have no lower case equivalent (maybe I have that backwards).
By lowercasing you actually change the “words”.

Brian_O_Brien · March 21, 2019, 5:51pm

I’m not trying to ignore the case; I’m trying to log why the comparison of two strings has failed.
Either the encodings don’t match, or the case doesn’t match, or it’s a match. But I have to log why…

Christian_Schmitz · March 21, 2019, 6:15pm

We have NSStringCompareMBS in the plugin, which uses Apple’s framework to compare and you can specify options like case insensitive, diacritics insensitive and width differences insensitive.

Christian_Schmitz · March 21, 2019, 6:15pm

LevenshteinDistanceMBS or JaroWinklerDistanceMBS may also help as it will tell you whether two strings are nearly equal.

Brian_O_Brien · March 21, 2019, 7:07pm

dim a, b as string
a = "Hello"
b = "Hello"
a = DefineEncoding(a, Encodings.UTF8)
b = DefineEncoding(b, Encodings.ASCII)

If I’m not mistaken, these are binary equivalent but will fail a string comparison.

I’m beginning to wonder if I can convert all my strings to UTF16 and then do the comparison.

Christian_Schmitz · March 21, 2019, 7:20pm

Those two strings should compare as equal!
They have same bytes and compatible encoding.

DaveS · March 21, 2019, 7:29pm

Between UTF8 and ASCII, an encoding difference will only be noticed for characters above 0x7F, including those that are multi-byte; all characters at or below 0x7F will compare identically in either encoding.
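
The byte-level claim is easy to check. A minimal sketch in Python (used only because it exposes the encoded bytes directly; nothing here is a Xojo call):

```python
# Every 7-bit value encodes to the same single byte in ASCII and UTF-8.
ascii_matches_utf8 = all(
    bytes([code]).decode("ascii").encode("utf-8") == bytes([code])
    for code in range(0x80)
)

hello_ascii = "Hello".encode("ascii")
hello_utf8 = "Hello".encode("utf-8")
same_bytes = hello_ascii == hello_utf8   # True

# Above 0x7F the encodings diverge: "é" is one byte in ISO Latin-1
# but two bytes in UTF-8.
e_latin1 = "é".encode("latin-1")         # b"\xe9"
e_utf8 = "é".encode("utf-8")             # b"\xc3\xa9"
```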

DerkJ · March 21, 2019, 9:38pm

Will = should? Since @Brian O’Brien just said it wasn’t comparing (in Xojo).

Brian_O_Brien · March 21, 2019, 9:43pm

Yes, they compare fine… but I thought it wasn’t a sure thing.
I am learning that some encodings are partially compatible with each other.
Anything 7-bit in ASCII will compare equal with UTF8, but not values of 128 and above.

I’m still trying to figure out if there is a way to promote all strings to UTF-16 or ISO Latin and then compare in that encoding…

DaveS · March 21, 2019, 9:48pm

If you “promote to UTF-16” then the encoding will be UTF-16… it will at that point have no bearing on what its previous encoding was

So if you take a string encoded as UTF-8 and a string encoded as ISO-Latin, and promote each to UTF-16, now you have TWO UTF-16 strings… and those ASCII characters will simply become 00xx where xx was the UTF-8/ASCII character
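
The "00xx" pattern can be seen directly. A minimal sketch in Python, assuming big-endian UTF-16 without a byte order mark so the bytes are easy to read (the variable names are illustrative only):

```python
# The same word arriving in two different source encodings.
from_utf8 = b"Test".decode("utf-8")
from_latin1 = b"Test".decode("latin-1")

utf16_a = from_utf8.encode("utf-16-be")
utf16_b = from_latin1.encode("utf-16-be")
# Both are b"\x00T\x00e\x00s\x00t": each ASCII byte xx became 00 xx.

# A non-ASCII character has different source bytes in the two encodings,
# but ends up as the same UTF-16 code unit.
u_from_latin1 = b"\xdc".decode("latin-1").encode("utf-16-be")    # b"\x00\xdc"
u_from_utf8 = b"\xc3\x9c".decode("utf-8").encode("utf-16-be")    # b"\x00\xdc"
```

After the conversion nothing records which encoding a string came from, so a check for "different source encoding" has to happen before promoting, not after.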