A question about text encoding

I am confused…
I have been playing around with text encoding on a Mac.

I have two TextFields. One contains a character (an a-umlaut, ä) typed via the interface; the other contains the same character, copied and pasted from the name of a FolderItem. Stored in variables, the two are not equal…

From FolderItem:
Encoding: UTF-8
Length: 2
Length (Bytes): 3
Binary: 61 CC 88

Typed in TextField:
Encoding: UTF-8
Length: 1
Length (Bytes): 2
Binary: C3 A4

Why is this so, and how can I make the two comparable?

Read up on precomposed versus decomposed unicode. You need to normalize your strings.

Didn’t I write about this topic yesterday???

The first one seems to be a decomposed sequence, a base letter followed by a combining mark:

U+0061 a 61 LATIN SMALL LETTER A
U+0308 ? cc 88 COMBINING DIAERESIS

The second one is just:

U+00E4 ä c3 a4 LATIN SMALL LETTER A WITH DIAERESIS
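
For anyone who wants to reproduce this breakdown outside Xojo, here is a minimal sketch in Python (not Xojo; the variable and function names are made up for illustration). It decodes the two byte sequences from the original post and lists the code points in each. Note that Python's len() counts code points, which happens to match the "Length" values reported above.

```python
import unicodedata

# The two byte sequences reported in the original post, decoded as UTF-8.
from_folder_item = bytes.fromhex("61CC88").decode("utf-8")  # copied from the FolderItem name
typed = bytes.fromhex("C3A4").decode("utf-8")               # typed in the TextField


def describe(s):
    """Return a (code point, Unicode name) pair for every code point in s."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in s]


print(describe(from_folder_item))
print(describe(typed))

# Length in code points and length in UTF-8 bytes for each string.
print(len(from_folder_item), len(from_folder_item.encode("utf-8")))
print(len(typed), len(typed.encode("utf-8")))

# A plain comparison looks at code points, so the two strings differ.
print(from_folder_item == typed)
```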

I’d like to know whether these (and similar pairs) can compare to being equal and what happens (or is supposed to happen) with methods such as .mid, .left, and so on.
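
On the substring part of the question, a rough illustration in Python (not Xojo) of what can go wrong. Python slices by code point, so taking the first "character" of a decomposed string separates the base letter from its combining mark. Whether Xojo's .Left and .Mid behave the same way is an assumption here; the reported Length of 2 for the decomposed string suggests they might, but the thread does not confirm it.

```python
import unicodedata

# 'a' followed by COMBINING DIAERESIS, then "bc": the decomposed spelling of "äbc".
decomposed = "a\u0308bc"

# The same text with the letter and its mark merged into one code point (NFC).
composed = unicodedata.normalize("NFC", decomposed)

# Slicing by code point: the decomposed string loses its diaeresis.
first_decomposed = decomposed[:1]   # plain "a", the combining mark is left behind
orphaned_mark = decomposed[1:2]     # the combining mark on its own
first_composed = composed[:1]       # the precomposed U+00E4

print(repr(first_decomposed), repr(orphaned_mark), repr(first_composed))
print(len(decomposed), len(composed))
```

Normalizing to the composed form before slicing avoids the split for pairs like this one, though not every base-plus-mark combination has a precomposed code point.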

Normalize first, compare later. Code with original comment:

'string needs to be normalized or string comparison won't work, b.l.o.o.d.y Exchange!!!
dim theCFString as CFStringMBS = NewCFStringMBS(theName)
if theCFString <> Nil then
  dim theMutableString as CFMutableStringMBS = theCFString.Normalize(2)
  if theMutableString <> nil then theName = theMutableString.str
end if
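
The same "normalize first, compare later" idea as a small Python sketch (not Xojo). Python's unicodedata.normalize stands in for the MBS Normalize call, and the helper name equal_normalized is hypothetical. Which normalization form the argument 2 selects in the MBS code is not stated in this thread; the sketch defaults to NFC (composed), and any form works as long as both sides use the same one.

```python
import unicodedata


def equal_normalized(a, b, form="NFC"):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


from_folder_item = "a\u0308"  # decomposed: 'a' + COMBINING DIAERESIS
typed = "\u00e4"              # precomposed: LATIN SMALL LETTER A WITH DIAERESIS

print(from_folder_item == typed)                        # raw comparison
print(equal_normalized(from_folder_item, typed))        # both composed (NFC)
print(equal_normalized(from_folder_item, typed, "NFD")) # both decomposed (NFD)
```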

I thought I was going nuts because the strings weren’t the same.

(well I see the forum interface mishandled that by posting twice and doing something odd to the COMBINING DIAERESIS).

Thank you Beatrix.
I did not find anything you wrote yesterday and I do not check every post every day here, sorry.

Am I right that you use MBS classes in your code sample?

[quote=394921:@Beatrix Willius]Read up on precomposed versus decomposed unicode. You need to normalize your strings.

Didn’t I write about this topic yesterday???[/quote]
I think you did in the topic on filenames in Mac/Win - “Encoding issue with Mac and Win filenames”.

I found that topic quite helpful for this link:

https://forum.xojo.com/26510-two-identical-strings-not-identical

I didn’t find any documentation for normalize(), however.

Your link does not work.

Oops, sorry:

https://forum.xojo.com/26510-two-identical-strings-not-identical

[quote=394929:@Tim Streater]Oops, sorry:

https://forum.xojo.com/26510-two-identical-strings-not-identical[/quote]
Just a reminder… when posting links, you don’t need to wrap them in a url tag unless you want the clickable text to be something other than the link itself.

This helped…

https://forum.xojo.com/conversation/post/46322

Maybe it would make sense to take care of this in Xojo itself.
Thank you Beatrix and Tim.

I wonder if converting each string to text and back would do the trick?

It cannot…

Right you are, even though comparing the Text values works.

My test code:

dim s1, s2 as string

s1 = DecodeHex( "61CC88" ).DefineEncoding( Encodings.UTF8 )
s2 = DecodeHex( "C3A4" ).DefineEncoding( Encodings.UTF8 )

AddToResult s1
AddToResult s2

AddToResult s1 = s2 // False

dim t1, t2 as text

t1 = s1.ToText
s1 = t1

t2 = s2.ToText
s2 = t2

AddToResult s1 = s2 // Still false
AddToResult t1 = t2 // True