Comparing a Name

I am trying to compare two names and figure out if they are the same, different, same but case different…
This code just doesn’t want to work… (Maybe it’s late…)

Test vs TEST … says they are just different.

if StrComp( self.Name, rhs.Name, REALbasic.StrCompLexical) = 0 then
  System.DebugLog("Same Name but case may different")
  if StrComp( self.Name, rhs.Name, REALbasic.StrCompCaseSensitive) <> 0 then
    System.DebugLog("Same Name :" + self.Name + ": vs :" + rhs.Name + ": case different")
    return 1
  end if
else
  System.DebugLog("Different Name :" + self.Name + ": vs :" + rhs.Name + ":")
  return -1
end if

Change the first line to:

if self.Name = rhs.Name then

You are using REALbasic.StrCompCaseSensitive, which means that the comparison is case sensitive.
Therefore Test is indeed different in comparison with TEST.
Try REALbasic.StrCompLexical instead.

[quote=429539:@Kem Tekinay]Change the first line to:

if self.Name = rhs.Name then [/quote]

I get that Kem… but this statement is the same isn’t it?

StrComp( self.Name, rhs.Name, REALbasic.StrCompLexical) = 0

I hadn’t read the documentation correctly.
In the documentation of StrComp it is mentioned that StrComp still returns -1 when comparing Test vs TEST

[quote]The following code returns -1 because the two strings are the same in every way except in case

StrComp("Spam", "spam", REALbasic.StrCompLexical)

[/quote]

Kem’s method ignores the case.

I never use it that way, so I couldn’t tell you. I only ever use StrComp to do a byte-level comparison of strings.

The Lexical comparison option only affects how text gets ordered (i.e. it gets ordered differently than if you used “<” and “>” to compare strings).

If you want to actually ignore case, then convert both strings to lowercase first.

Good idea, but… :wink:
…there are some other things to consider when doing that.
First off: do both strings being compared have the same encoding?
And if so: which encoding?

  • WindowsANSI: Lowercase is kind of broken… especially with special chars (Umlaut), such as “Ü”. See <https://xojo.com/issue/54926>
  • UTF8: Converting to UTF8 is a workaround for the WindowsANSI issue. You might have UTF8 strings anyway. But… where do they come from? Are they both composed the same way (precomposed vs. decomposed)? If not, “ü = ü” might be false, since there are different binary representations for the very same “visual” character. So keep that in mind when doing binary comparisons of UTF8 strings (e.g. with StrComp).
    I haven’t found much in Feedback regarding UTF8 and pre/decomposed Strings, and how to normalize them in Xojo… An old one talking about issues is <https://xojo.com/issue/19163>. Shouldn’t there be one requesting a way to normalize UTF8 Strings?
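To make the precomposed/decomposed point concrete, here is a small sketch in Python rather than Xojo (Python is used only because it ships a normalizer in its standard library; it says nothing about what Xojo offers). The helper name `same_after_nfc` is made up for this illustration.

```python
import unicodedata

# Two ways to write the same visible character "ü".
precomposed = "\u00fc"    # single code point: LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"    # "u" followed by COMBINING DIAERESIS


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain comparison sees two different code point sequences.
raw_equal = precomposed == decomposed

# The UTF-8 bytes differ as well, so a byte-level comparison also fails.
bytes_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")

# After normalizing both sides to the same form, they match.
normalized_equal = same_after_nfc(precomposed, decomposed)
```

The takeaway is only that both strings have to be brought to the same normalization form before a binary comparison means anything.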

I want to know that the strings are
a) Test <> Test (Different character set encoding)
b) Test = TEST (Lexically the same)
c) Test <> TEST. (Different because of case)

i.e. I want to KNOW that they differ only by case.
Maybe that’s clearer.

if self.Name.Encoding <> rhs.Name.Encoding then
  System.DebugLog("Character encodings are different.")
  return -1
end if
if self.Name <> rhs.Name then
  System.DebugLog("The names are lexically different")
  return -1
end if
if StrComp(self.Name, rhs.Name, REALbasic.StrCompCaseSensitive) = 0 then
  System.DebugLog("The names are identical")
  return 0
end if
return -1

dim a, b as string

if a = b then
  'same
else if a.Lowercase = b.Lowercase then
  'same but different case
end if

[quote=429605:@Thomas Tempelmann]
If you want to actually ignore case, then convert both strings to lowercase first.[/quote]

There are languages where this literally can’t work, oddly enough.
I think Turkish is one where there are some letters in upper case that have no lower case equivalent (maybe I have that backwards).
By lowercasing you actually change the “words”.
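The Turkish case is usually explained with the dotted and dotless i. The sketch below uses Python’s default, locale-independent Unicode case mapping, purely as an illustration; Xojo’s Lowercase may behave differently, and the variable names are invented here.

```python
# Turkish has four distinct letters: I / ı (dotless) and İ / i (dotted).
dotted_capital_i = "\u0130"   # İ
dotless_small_i = "\u0131"    # ı

# Lowercasing İ without Turkish rules gives "i" plus a combining dot above,
# so a single character turns into two code points.
lowered = dotted_capital_i.lower()

# Uppercasing ı gives a plain "I", and lowercasing that gives a plain "i",
# so the round trip does not bring back the original letter.
round_trip = dotless_small_i.upper().lower()
```

So a comparison built on lowercasing both sides can treat different Turkish letters as equal, or change the length of a string along the way.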

I’m not trying to ignore the case; I’m trying to log why the comparison of two strings has failed.
Either the encodings don’t match, or the case doesn’t match, or it’s a match. But I have to log why…
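One way to structure that “log the reason” logic, sketched in Python rather than Xojo. Python strings carry no encoding of their own, so each name is modelled here as raw bytes plus an encoding name. The function `explain_comparison` and its return strings are hypothetical, not part of any library.

```python
def explain_comparison(a: bytes, a_enc: str, b: bytes, b_enc: str) -> str:
    """Return a short reason describing how two encoded names relate."""
    # Same bytes and same declared encoding: nothing differs at all.
    if a == b and a_enc == b_enc:
        return "identical"

    # Decode both to plain text so the encodings no longer matter.
    text_a = a.decode(a_enc)
    text_b = b.decode(b_enc)

    if text_a == text_b:
        return "same text, different encoding"

    # casefold() is Python's case-insensitive comparison form.
    if text_a.casefold() == text_b.casefold():
        return "differs only by case"

    return "different"


# Example: the Test vs TEST case from the question.
reason = explain_comparison(b"Test", "utf-8", b"TEST", "utf-8")
```

The order of the checks matters: the strictest test comes first, and each later test relaxes exactly one thing, so the first match names the reason.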

We have NSStringCompareMBS in the plugin, which uses Apple’s framework to compare and you can specify options like case insensitive, diacritics insensitive and width differences insensitive.

LevenshteinDistanceMBS or JaroWinklerDistanceMBS may also help as it will tell you whether two strings are nearly equal.

dim a, b as string
a = "Hello"
b = "Hello"
a = DefineEncoding(a, Encodings.UTF8)
b = DefineEncoding(b, Encodings.ASCII)

If I’m not mistaken, these are binary equivalent but will fail a string comparison.

I’m beginning to wonder if I can convert all my strings to UTF16 and then do the comparison.

Those two strings should compare as equal!
They have same bytes and compatible encoding.

Between UTF8 and ASCII, an encoding difference only shows up for characters above 0x7F, which are the ones that become multi-byte in UTF8. All characters from 0x00 through 0x7F compare identically in either encoding.
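The byte-level claim is easy to check outside Xojo. A minimal Python sketch (variable names are illustrative only):

```python
# Pure 7-bit text produces the same bytes in ASCII and UTF-8.
text = "Hello"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same_for_ascii = ascii_bytes == utf8_bytes

# A character above 0x7F is one byte in ISO Latin-1 but two bytes in UTF-8.
u_umlaut = "\u00dc"                        # Ü
latin1_bytes = u_umlaut.encode("latin-1")
utf8_umlaut = u_umlaut.encode("utf-8")
same_for_umlaut = latin1_bytes == utf8_umlaut
```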

Will = should? Since @Brian O’Brien just said it wasn’t comparing (in xojo)

Yes, they compare fine… but I thought it wasn’t a sure thing.
I am learning that some encodings are partially compatible with each other.
Anything 7-bit in ASCII will compare equal with UTF8, but nothing from 128 upward will.

I’m still trying to figure out if there is a way to promote all strings to UTF-16 or ISO Latin and then compare in that encoding…

If you “promote to UTF-16” then the encoding will be UTF-16… it will at that point have no bearing on what its previous encoding was

So if you take a string encoded as UTF-8 and a string encoded as ISO Latin, and promote each to UTF-16, you now have TWO UTF-16 strings… and the ASCII characters simply become 00xx, where xx was the UTF-8/ASCII byte.
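A sketch of that “promote both, then compare” idea in Python (not Xojo; in Xojo the conversion step would be an encoding conversion rather than decode/encode). The helper `to_utf16` is a made-up name, and big-endian UTF-16 without a byte order mark is assumed so the 00xx pattern is visible.

```python
def to_utf16(raw: bytes, encoding: str) -> bytes:
    """Decode bytes from their source encoding and re-encode as UTF-16 (big-endian, no BOM)."""
    return raw.decode(encoding).encode("utf-16-be")


# The same text "TÜR" stored in two different source encodings.
latin1_source = b"T\xdcR"        # ISO Latin-1: Ü is the single byte DC
utf8_source = b"T\xc3\x9cR"      # UTF-8: Ü is the two bytes C3 9C

a16 = to_utf16(latin1_source, "latin-1")
b16 = to_utf16(utf8_source, "utf-8")

# The source bytes differ, but the promoted forms are byte-for-byte equal.
sources_equal = latin1_source == utf8_source
promoted_equal = a16 == b16
```

Note that this only works if each string’s original encoding is known and correct; converting from a wrongly declared encoding produces different text, not an error.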