I am trying to compare two names and figure out whether they are the same, different, or the same but differing only in case…
This code just doesn’t want to work… (Maybe it’s late…)
Test vs TEST … says they are just different.
' Note: StrCompLexical still treats case as significant, so "Test" vs "TEST" ends up in the Else branch.
if StrComp(self.Name, rhs.Name, REALbasic.StrCompLexical) = 0 then
  System.DebugLog("Same Name but case may differ")
  if StrComp(self.Name, rhs.Name, REALbasic.StrCompCaseSensitive) <> 0 then
    System.DebugLog("Same Name :" + self.Name + ": vs :" + rhs.Name + ": case different")
  end if
else
  System.DebugLog("Different Name :" + self.Name + ": vs :" + rhs.Name + ":")
end if
UTF-8: Converting to UTF-8 is a workaround for the WindowsANSI issue. You might have UTF-8 strings anyway. But… where do they come from? Are both composed the same way (precomposed or decomposed)? If not, “ü = ü” might be false, because the very same visual character has more than one binary representation: a single code point U+00FC, or “u” followed by the combining diaeresis U+0308. So keep that in mind when doing binary comparisons of UTF-8 strings (e.g. with StrComp).
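A minimal sketch of that workaround, assuming both strings already carry a correct (non-Nil) encoding. `SameBytesAsUTF8` is a hypothetical helper name, not an existing Xojo function. It converts both sides to UTF-8 and then compares bytes with StrComp mode 0; it does not normalize precomposed versus decomposed characters.

```xojo
Function SameBytesAsUTF8(a As String, b As String) As Boolean
  ' Hypothetical helper. Assumes a.Encoding and b.Encoding are set correctly.
  ' Converting both sides to UTF-8 removes the WindowsANSI vs UTF-8 mismatch.
  Dim ua As String = ConvertEncoding(a, Encodings.UTF8)
  Dim ub As String = ConvertEncoding(b, Encodings.UTF8)
  ' Mode 0 (REALbasic.StrCompCaseSensitive) is a byte-for-byte comparison.
  ' No Unicode normalization happens here, so composed and decomposed forms still differ.
  Return StrComp(ua, ub, REALbasic.StrCompCaseSensitive) = 0
End Function
```

To see the pre/decomposed problem, build “ü” both ways. The composed form is the bytes C3 BC and the decomposed form is 75 CC 88, so the helper should report them as different:

```xojo
Dim composed As String = Encodings.UTF8.Chr(&hFC)            ' ü as one code point, U+00FC
Dim decomposed As String = "u" + Encodings.UTF8.Chr(&h0308)  ' u followed by combining diaeresis U+0308
If SameBytesAsUTF8(composed, decomposed) Then
  System.DebugLog("same bytes")
Else
  System.DebugLog("different bytes, even though both display as ü")
End If
```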
I haven’t found much in Feedback regarding UTF-8 and precomposed/decomposed strings, or how to normalize them in Xojo… An old report discussing related issues is feedback://showreport?report_id=19163. Shouldn’t there be one requesting a way to normalize UTF-8 strings?
I want to know whether the strings are:
a) Test <> Test (Different character set encoding)
b) Test = TEST (Lexically the same)
c) Test <> TEST. (Different because of case)
i.e. I want to KNOW that they differ only by case.
Maybe that’s clearer.
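if self.Name.Encoding <> rhs.Name.Encoding then
  System.DebugLog("Character encodings are different.")
end if
' String comparison with = and <> is case-insensitive in Xojo, so "Test" <> "TEST" is False.
if self.Name <> rhs.Name then
  System.DebugLog("The names are lexically different")
elseif StrComp(self.Name, rhs.Name, REALbasic.StrCompCaseSensitive) = 0 then
  System.DebugLog("The names are identical")
else
  System.DebugLog("The names differ only by case")
end if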
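Putting the three cases from the question (a, b and c) into one place, a minimal sketch. `EncodingsMatch` and `DescribeNames` are hypothetical names. The sketch relies on Xojo’s `=`/`<>` comparing strings case-insensitively and on StrComp mode 0 comparing bytes, and it assumes both strings have a defined encoding before ConvertEncoding is called.

```xojo
Function EncodingsMatch(a As String, b As String) As Boolean
  ' Hypothetical helper: True when both strings report the same TextEncoding.
  Dim ea As TextEncoding = a.Encoding
  Dim eb As TextEncoding = b.Encoding
  ' Two Nil encodings count as a match; one Nil and one set does not.
  If (ea Is Nil) Or (eb Is Nil) Then Return (ea Is eb)
  Return ea.Equals(eb)
End Function

Function DescribeNames(a As String, b As String) As String
  ' Hypothetical helper covering cases a), b) and c) from the question.
  Dim note As String
  If Not EncodingsMatch(a, b) Then note = " (encodings differ)"

  ' Bring both into one encoding so the byte comparison below is meaningful.
  Dim ua As String = ConvertEncoding(a, Encodings.UTF8)
  Dim ub As String = ConvertEncoding(b, Encodings.UTF8)

  ' <> ignores case, so this only fires for genuinely different text.
  If ua <> ub Then Return "different" + note

  ' Mode 0 compares bytes, so it does see a case difference.
  If StrComp(ua, ub, REALbasic.StrCompCaseSensitive) = 0 Then Return "identical" + note

  Return "same except for case" + note
End Function
```

With this, DescribeNames("Test", "TEST") should return "same except for case", which is the answer the StrCompLexical version could not give.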
If you want to actually ignore case, then convert both strings to lowercase first.
Oddly enough, there are languages where this literally can’t work.
I think Turkish is one: it has both a dotted and a dotless i, so an uppercase I lowercases to ı rather than i (maybe I have the details backwards).
By lowercasing, you can actually change the “words”.
Yes, they compare fine… but I thought it wasn’t a sure thing.
I am learning that some encodings are partially compatible with each other.
Anything in 7-bit ASCII (0–127) will compare equal with UTF-8, but characters at 128 and above will not.
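A quick way to see this in Xojo, as a sketch: build the same characters in WindowsANSI and in UTF-8 and compare their byte lengths with LenB. Code point 65 (“A”) is the single byte 41 in both encodings, while code point 252 (“ü”) is one byte (FC) in WindowsANSI but two bytes (C3 BC) in UTF-8.

```xojo
' "A" (code point 65) is the single byte &h41 in both encodings.
Dim ansiA As String = Encodings.WindowsANSI.Chr(65)
Dim utf8A As String = Encodings.UTF8.Chr(65)
' "ü" (code point 252) is one byte in WindowsANSI but two bytes in UTF-8.
Dim ansiU As String = Encodings.WindowsANSI.Chr(252)
Dim utf8U As String = Encodings.UTF8.Chr(252)
System.DebugLog("A: " + Str(LenB(ansiA)) + " vs " + Str(LenB(utf8A)) + " bytes")
System.DebugLog("ü: " + Str(LenB(ansiU)) + " vs " + Str(LenB(utf8U)) + " bytes")
```

The byte lengths (1 vs 1, then 1 vs 2) are why a raw byte comparison works for plain ASCII names but not for accented ones.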
I’m still trying to figure out if there is a way to promote all strings to UTF-16 or ISO Latin and then compare in that encoding…
If you “promote to UTF-16”, the encoding will be UTF-16… at that point it has no bearing on what the previous encoding was.
So if you take a string encoded in UTF-8 and a string encoded in ISO-Latin and promote each to UTF-16, you now have TWO UTF-16 strings… and the ASCII characters will simply become 00xx, where xx was the original UTF-8/ASCII byte.
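A minimal sketch of that promotion, assuming both strings have a defined encoding. `SameTextViaUTF16` is a hypothetical helper name. After ConvertEncoding both sides are UTF-16 regardless of where they started, so a byte comparison no longer depends on the source encodings.

```xojo
Function SameTextViaUTF16(a As String, b As String) As Boolean
  ' Hypothetical helper. Assumes a.Encoding and b.Encoding are set correctly.
  ' After conversion both sides are UTF-16, whatever they started as.
  Dim wa As String = ConvertEncoding(a, Encodings.UTF16)
  Dim wb As String = ConvertEncoding(b, Encodings.UTF16)
  ' Byte-for-byte, case-sensitive comparison of the UTF-16 data.
  Return StrComp(wa, wb, REALbasic.StrCompCaseSensitive) = 0
End Function
```

For example, SameTextViaUTF16(Encodings.ISOLatin1.Chr(252), Encodings.UTF8.Chr(252)) is expected to be True, since both become the same UTF-16 “ü”. As with UTF-8, this does not merge precomposed and decomposed forms, and it is still case-sensitive.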