I am confused…
Playing around with text encoding on a Mac.
I have two TextFields: one contains a character (ä, a with umlaut) typed via the interface, the other contains the same character copied and pasted from the name of a FolderItem. Held in variables, they do not compare as equal…
Pasted from the FolderItem name:
Length (Bytes): 3
Binary: 61 CC 88
Typed in TextField:
Length (Bytes): 2
Why is this so, and how can I make them comparable?
Read up on precomposed versus decomposed unicode. You need to normalize your strings.
Didn’t I write about this topic yesterday???
The first one seems to be a combined character:
U+0061 a 61 LATIN SMALL LETTER A
U+0308 ? cc 88 COMBINING DIAERESIS
The second one is just:
U+00E4 ä c3 a4 LATIN SMALL LETTER A WITH DIAERESIS
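If you want to reproduce that listing yourself, here is a small sketch in Python (not Xojo; I picked it only because its standard library includes `unicodedata`). It prints the code points, names and UTF-8 bytes of both strings. The helper name `describe` is made up for this illustration.

```python
import unicodedata

def describe(s):
    """Return (code point, Unicode name, UTF-8 bytes as hex) for each character in s."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex(" "))
        for ch in s
    ]

decomposed = "a\u0308"   # as pasted from the FolderItem name
precomposed = "\u00e4"   # as typed in the TextField

for label, s in (("decomposed", decomposed), ("precomposed", precomposed)):
    print(label, len(s.encode("utf-8")), "bytes")
    for row in describe(s):
        print("  ", *row)
```

Both strings render as the same glyph, which is why the difference only shows up when you look at the bytes.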
I’d like to know whether these (and similar pairs) can be made to compare as equal, and what happens (or is supposed to happen) with methods such as .mid, .left, and so on.
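I can't say for certain what Xojo's .mid and .left do here, but as a rough sketch of the problem, this is what happens in a language that counts code points (Python in this case, used only for illustration). The variable names are mine.

```python
decomposed = "a\u0308" + "b"    # "äb" written as a + combining diaeresis + b
precomposed = "\u00e4" + "b"    # "äb" written with a single precomposed ä

# Lengths are counted in code points, so the two spellings differ.
print(len(decomposed), len(precomposed))

# Taking the "first character" splits the decomposed ä in two.
left_decomposed = decomposed[:1]     # bare "a", the diaeresis is left behind
left_precomposed = precomposed[:1]   # the complete "ä"
print(left_decomposed == "a", left_precomposed == "\u00e4")
```

So any position-based slicing on a decomposed string can cut a base letter off from its combining mark, which is one more reason to normalize before doing anything else with the string.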
Normalize first, compare later. Code with original comment:
'string needs to be normalized or string comparison won't work, b.l.o.o.d.y Exchange!!!
dim theCFString as CFStringMBS = NewCFStringMBS(theName)
if theCFString <> Nil then
  ' 2 presumably selects normalization form C (precomposed), as in kCFStringNormalizationFormC
  dim theMutableString as CFMutableStringMBS = theCFString.Normalize(2)
  if theMutableString <> nil then theName = theMutableString.str
end if
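For anyone without the MBS plugin who just wants to see the idea in isolation, here is a minimal sketch in Python using the standard `unicodedata.normalize`. The helper name `same_text` is hypothetical, and NFC is simply my choice of form; any form should work as long as both sides use the same one.

```python
import unicodedata

def same_text(a, b, form="NFC"):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

pasted = "a\u0308"   # decomposed: a + COMBINING DIAERESIS
typed = "\u00e4"     # precomposed: LATIN SMALL LETTER A WITH DIAERESIS

print(pasted == typed)           # raw comparison of the two spellings
print(same_text(pasted, typed))  # comparison after normalizing both sides
```

Normalizing once, at the point where the string enters the program (for example right after reading the file name), avoids having to remember it at every comparison.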
I thought I was going nuts because the strings weren’t the same.
(well I see the forum interface mishandled that by posting twice and doing something odd to the COMBINING DIAERESIS).
Thank you Beatrix.
I did not find anything you wrote yesterday and I do not check every post every day here, sorry.
Am I right that you use MBS elements in your code sample?
[quote=394921:@Beatrix Willius]Read up on precomposed versus decomposed unicode. You need to normalize your strings.
Didn’t I write about this topic yesterday???[/quote]
I think you did in the topic on filenames in Mac/Win - “Encoding issue with Mac and Win filenames”.
I found that topic quite helpful for this link:
I didn’t find any documentation for normalize(), however.
[quote=394929:@Tim Streater]Oops, sorry:[/quote]
Just a reminder… when posting links, you don't need to wrap them in a url tag unless you want the clickable text to be something other than the link itself.
Maybe it would make sense to take care of this in XOJO.
Thank you Beatrix and Tim.
I wonder if converting each string to text and back would do the trick?
Right you are, even though comparing the Text values works.
My test code:
dim s1, s2 as string
s1 = DecodeHex( "61CC88" ).DefineEncoding( Encodings.UTF8 )
s2 = DecodeHex( "C3A4" ).DefineEncoding( Encodings.UTF8 )
AddToResult s1 = s2 // False
dim t1, t2 as text
t1 = s1.ToText
s1 = t1
t2 = s2.ToText
s2 = t2
AddToResult s1 = s2 // Still false
AddToResult t1 = t2 // True