Bug in ReplaceAll?

Hi,

As already mentioned in another topic (https://forum.xojo.com/20139-mail-app-attachment-as-symbol-or-within-mail), there seems to be a general bug in ReplaceAll. I don’t want to mix too many things into the other post, so I made this one.

I use this code to convert a string:

    Dim dlg As New OpenDialog
    Dim f As FolderItem

    dlg.Title = "Bitte Anhang auswählen..."
    f = dlg.ShowModal

    If f <> Nil Then

      Dim before As String
      Dim after As String

      System.DebugLog("f.Name: " + f.Name)

      'before = f.Name
      before = "geschäftsverlauf.pdf"

      System.DebugLog("before: " + before)

      after = ReplaceAll(before, "ä", "ae")
      after = ReplaceAll(after, "ö", "oe")
      after = ReplaceAll(after, "ü", "ue")

      System.DebugLog("after: " + after)

    End If

That works; the last System.DebugLog shows this:

after: geschaeftsverlauf.pdf

But when I use an already filled string like f.Name, which comes from a file open dialog, the conversion does not work.

Can anyone confirm this?

Thanks

Michael

I’d guess that this is a problem of precomposed vs. decomposed UTF-8, which makes it a feature and not a bug. You have to make sure that your strings are normalized. I’ve had fun like this before, too, and thought I was going mad because the strings wouldn’t match in a comparison.

Noob question: how can I normalize my string?

E.g. with the MBS Plugin:

[code]Function Normalize(t As String) As String
  Const kCFStringNormalizationFormD = 0 // Canonical Decomposition
  Const kCFStringNormalizationFormKD = 1 // Compatibility Decomposition
  Const kCFStringNormalizationFormC = 2 // Canonical Decomposition followed by Canonical Composition
  Const kCFStringNormalizationFormKC = 3 // Compatibility Decomposition followed by Canonical Composition

  Dim s As CFStringMBS = NewCFStringMBS(t)
  // Form D returns the decomposed form, e.g. "a" followed by U+0308
  Dim m As CFMutableStringMBS = s.Normalize(kCFStringNormalizationFormD)

  Return m.str
End Function[/code]

Hi Christian,

I understand that you want to sell your plugins. That’s OK, I do this with my products as well. But I hope you understand that I want to solve this without plugins. Beatrix mentioned that this is not a bug. But when I look at the two variables f.Name and attachment.Name, they are both strings. Why is Xojo not able to compare two strings? What about the Text type? Can it help to get the strings “normalized”?

Mac OS X uses decomposed Unicode strings. So you have ä as two characters, the a and the dots. Normally in Xojo we use ä as one character.

There is no need for a plugin. Just search for “normalise” in the forum.
There is, e.g., a post from me some time ago in the thread “String comparison works on Windows but not on OSX”.

Oh, you could even do it manually with ReplaceAllB.

This code refers to the Carbon framework. Is it still present in Xojo, and will it also be there in the future?

When I read about ReplaceAll, I also found ReplaceAllB, but I didn’t get the difference between the two.

The declares into “Carbon.framework” are misleading. Those functions actually live in “CoreFoundation.framework” and the declare works only because Carbon re-exports the symbols from CoreFoundation. The “Normalize” function in the post will keep working in the future, even in 64-bit applications.

Michael, I don’t know if you even READ posts made for you. This post explains a big deal, as well as provides a cleaned up file name for you.

https://forum.xojo.com/20139-mail-app-attachment-as-symbol-or-within-mail?search=äéèçàùâêîôû

At some point, it is extremely frustrating to try to help just to see someone apparently intent on overlooking one’s efforts :confused:

I would ignore the suggestion to use ReplaceAllB. Operating on strings at the byte level is generally inadvisable unless you really know what you’re doing or your string consists of binary data (as opposed to something textual).

[quote=169288:@Michel Bujardet]Michael, I don’t know if you even READ posts made for you. This post explains a big deal, as well as provides a cleaned up file name for you.

https://forum.xojo.com/20139-mail-app-attachment-as-symbol-or-within-mail?search=äéèçàùâêîôû

At some point, it is extremely frustrating to try to help just to see someone apparently intent on overlooking one’s efforts :/[/quote]

Sorry Michel, I didn’t want to ignore you. Your suggestion works: it replaces ä with a, which is the best I have at the moment. But what I’m looking for is to replace ä with ae. In German you cannot simply replace ä with a, because the result becomes a totally different name. And to answer the question: I do read all posts made for me.

You know, what is most frustrating is that I tried painfully to help, attempted to explain what a composite character was, and you did not even make the elementary effort to ask anything.

If you paid attention to what I have been painfully trying to explain, there are two ways to represent accented characters in a UTF-8 string: as one character that has the accent, or as two characters, the letter itself followed by the accent.

Here is the solution:

    name = ReplaceAll(name, "a" + Chr(776), "ae")

Now, I am done. If you want to do the same for other characters, analyse the string character by character. That’s all I will share since apparently it does not get through.

Thanks Michel. I will try this.

This works nicely. Thanks again.

Last question: how did you find Chr(776)?

As Beatrix mentioned, it’s almost certainly due to precomposed versus decomposed characters. This is a quirk of Unicode where some characters can be represented multiple ways. For example, ä can be represented in two different ways:

  • U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
  • U+0061 LATIN SMALL LETTER A, U+0308 COMBINING DIAERESIS

Simplifying a small bit, string comparison is done by looking at Unicode codepoints. In this case, ReplaceAll is looking for the specific sequence U+00E4 but your string has U+0061, U+0308.
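To make the mismatch concrete, here is a minimal sketch in Python rather than Xojo, used only because its string comparison and replace also work on code points. The file name is the one from the first post; the variable names are made up for illustration.

```python
# The same visible file name in two Unicode representations.
precomposed = "gesch\u00e4ftsverlauf.pdf"  # ä as the single code point U+00E4
decomposed = "gescha\u0308ftsverlauf.pdf"  # "a" followed by U+0308 COMBINING DIAERESIS

# A code-point-based comparison treats them as different strings.
same_text = precomposed == decomposed

# Searching for the precomposed ä only matches the precomposed string.
replaced_precomposed = precomposed.replace("\u00e4", "ae")
replaced_decomposed = decomposed.replace("\u00e4", "ae")

print(same_text)                           # False
print(replaced_precomposed)                # geschaeftsverlauf.pdf
print(replaced_decomposed == decomposed)   # True: nothing was replaced
```

The decomposed string is one code point longer than the precomposed one, which is why a search for U+00E4 finds nothing in it.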

The Text type operates at a higher level than String and treats both representations the same. For example:

    Dim precomposedValue As Text = &u00E4
    Dim decomposedValue As Text = &u0061 + &u0308
    Dim result As Text = precomposedValue.ReplaceAll(decomposedValue, "foo")
    // result is now simply "foo"

Normalization refers to the act of taking a series of Unicode codepoints and bringing all of them into one specific form. For example, there’s a form of normalization that converts everything to decomposed characters. There’s no built-in way in Xojo to perform normalization, but the Text type renders it mostly unnecessary.

If you don’t want to use Text, the forum post with declares that was linked to earlier is a good approach. Once you’ve normalized your string, you only have to worry about doing a ReplaceAll on one specific form of your character.
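The normalize-then-replace idea can be sketched as follows. This is Python, not Xojo, and only an illustration under assumptions: `unicodedata.normalize` is the Python standard library call, and `transliterate_umlauts` is a hypothetical helper name, not something from the thread or from Xojo.

```python
import unicodedata


def transliterate_umlauts(name):
    """Bring name into composed form (NFC), then replace ä/ö/ü once each."""
    composed = unicodedata.normalize("NFC", name)
    for umlaut, replacement in (("\u00e4", "ae"), ("\u00f6", "oe"), ("\u00fc", "ue")):
        composed = composed.replace(umlaut, replacement)
    return composed


# Works regardless of which representation the input uses.
print(transliterate_umlauts("gescha\u0308ftsverlauf.pdf"))  # decomposed input
print(transliterate_umlauts("gesch\u00e4ftsverlauf.pdf"))   # precomposed input
```

Normalizing to the composed form first means only one search string per character is needed; normalizing to the decomposed form and searching for "a" plus U+0308 would work just as well.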

Thanks very much for the explanation, Joe. The solution Michel provided above works the way I need it. But I want to understand where I can find the character number, like the Chr(776) Michel posted for the second part of the ä. Where do I have to look to find out what an ö is made of (o and some Chr(?))?

Michael

[quote=169294:@Michael Bzdega]This works nicely. Thanks again.

Last question: how did you find Chr(776)?[/quote]

I told you: I looked at the character code of each character in the string. Use the Mid function for that in a loop.

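You also have a clue in the post I made in the other thread, explaining that ä was made of “a”+chr(&hCC). More precisely, &hCC &h88 are the two UTF-8 bytes of the combining diaeresis U+0308, which is decimal 776 (&hCC on its own is decimal 204).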
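For the question of what an ö is made of, the same character-by-character inspection can be sketched in Python (not Xojo). `code_points` is a hypothetical helper name; `unicodedata.normalize` and `unicodedata.name` are Python standard library calls.

```python
import unicodedata


def code_points(text):
    """Return (decimal, U+hex, Unicode name) for every code point in text."""
    return [(ord(ch), "U+%04X" % ord(ch), unicodedata.name(ch, "?")) for ch in text]


# Decompose a precomposed ö and list its parts.
decomposed_o = unicodedata.normalize("NFD", "\u00f6")
for entry in code_points(decomposed_o):
    print(entry)

# The combining diaeresis is one code point (776) but two bytes in UTF-8.
utf8_bytes = "\u0308".encode("utf-8")
print(utf8_bytes.hex())  # cc88
```

The list shows the base letter o (111) followed by the same combining diaeresis (776) that follows the a in a decomposed ä, so the second part is the same for ä, ö and ü.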

In the EncodeURLComponent result, you can see it as well:

Here is how to get the same result with EncodeURLComponent:

    name = EncodeURLComponent(name)
    name = ReplaceAll(name, "a%CC%88", "ae")
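The percent-encoding route can be checked with a small Python sketch (not Xojo). It assumes that `urllib.parse.quote` behaves comparably to EncodeURLComponent for this input, which is an assumption for illustration only.

```python
from urllib.parse import quote, unquote

name = "gescha\u0308ftsverlauf.pdf"  # decomposed ä: "a" + U+0308

# Percent-encoding exposes the two UTF-8 bytes of the combining diaeresis.
encoded_once = quote(name)
print(encoded_once)  # gescha%CC%88ftsverlauf.pdf

# Encoding an already encoded string escapes the "%" signs themselves,
# so the pattern "a%CC%88" no longer occurs in it.
encoded_twice = quote(encoded_once)
print(encoded_twice)  # gescha%25CC%2588ftsverlauf.pdf

# Replace on the singly encoded string, then decode back to plain text.
cleaned = unquote(encoded_once.replace("a%CC%88", "ae"))
print(cleaned)  # geschaeftsverlauf.pdf
```

The string has to be encoded exactly once before the replacement, and decoded afterwards if a plain file name is wanted.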