RegEx replace multiple characters

Of course, I supposed that :innocent:

Could my existing project handle RegEx differently from a new project?

Thanks for the effort, the result is what I need, but I have to make it work with a generic existing RegEx option in my app…

It was not my effort, but ChatGPT's…

Try it once more with “my code is faulty” in mind; if that is the case, you will now find the error…
You may start by replacing the ComboBox with a string containing a u-umlaut (ü)…
Then modify each and every line…

This one too I guess :slight_smile:

This is super crazy: if I type a ü in the code as part of the text to be processed, the code works. If I copy a file name containing a ü and paste it into the code, it fails. Maybe time for lunch.

Test confirms: a typed ü is different from a ü copied from the Finder. This blows my mind.
In a normal s.replace function, this does not matter. But in RegEx it does.

There is more than one “ü” in Unicode. Even within UTF-8 alone there are two.

ü as one code point (U+00FC), and ü as the u code point (U+0075) followed by the combining ¨ code point (U+0308).

They look the same, but internally they are different byte sequences.

Not crazy: there is a difference between the two…
A ü typed as an item name in the Finder (*) is different from a ü typed in Xojo, which is UTF-8 (AFAIK).
I do not know the “Finder Encoding”…

  • Read with FolderItem1.Display…

Go figure!

I’m pretty sure you can’t replace multiple characters with different replacements in ONE regex,
unless @Kem_Tekinay proves the contrary (and he is capable of it…).
You need one regex for each group of characters you want to replace,
e.g. [ÀÁÂÃÄÅ] → A, [ÈÉÊË] → E, etc…
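
For comparison, here is a minimal Python sketch of the same mapping done in one pass. It relies on Python's `re.sub` accepting a function as the replacement; whether Xojo's RegEx offers an equivalent is not assumed here. The names `GROUPS`, `CHAR_MAP` and `replace_accented` are made up for illustration.

```python
import re
import unicodedata

# Hypothetical groups for illustration: plain letter -> accented variants.
GROUPS = {"A": "ÀÁÂÃÄÅ", "E": "ÈÉÊË"}

# Map every accented character (forced to precomposed form) to its plain letter.
CHAR_MAP = {
    ch: plain
    for plain, chars in GROUPS.items()
    for ch in unicodedata.normalize("NFC", chars)
}

# One character class that matches any key of CHAR_MAP.
PATTERN = re.compile("[" + "".join(map(re.escape, CHAR_MAP)) + "]")

def replace_accented(text):
    # Unify the input first so decomposed characters match the class too.
    text = unicodedata.normalize("NFC", text)
    # The function looks up the replacement for each matched character.
    return PATTERN.sub(lambda m: CHAR_MAP[m.group(0)], text)

print(replace_accented("\u00c0\u00c9"))
```

In an engine without replacement functions, the one-regex-per-group approach described above still applies.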


What I found in Xojo: the text (• test ü ä 123) is UTF-8 both when typed and when copied from the Finder. But the length/length(bytes) is different: 14/18 vs 16/20. Does this information lead somehow to a solution?
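
Those numbers are what you would expect if the typed text is precomposed and the Finder text is decomposed. A small Python sketch to check the arithmetic, assuming the Finder hands over the decomposed (NFD) form; `lengths` is a made-up helper:

```python
import unicodedata

# "• test ü ä 123" written with escapes so the form is unambiguous.
sample = "\u2022 test \u00fc \u00e4 123"

typed = unicodedata.normalize("NFC", sample)   # precomposed: ü and ä are one code point each
finder = unicodedata.normalize("NFD", sample)  # assumption: decomposed, letter + U+0308

def lengths(s):
    # (number of code points, number of UTF-8 bytes)
    return len(s), len(s.encode("utf-8"))

print(lengths(typed))   # (14, 18)
print(lengths(finder))  # (16, 20)
```

Each of the two umlauts gains one code point (the combining diaeresis) and one byte, which accounts for the difference of 2 in both counts.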

The web shows these kinds of examples:

var article = a.replace(/ |\./g, "_").replace(/\r?\n|\r|\$|\#|\[|\]/g, "")

But I don’t know if Xojo RegEx can handle this; it appears not to be standard.

As already said, they look the same but they aren’t:

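// U+0308 is the combining diaeresis (the two dots on their own)
Const addsDiaeresis = &u0308
// ü as a single precomposed code point
Var uUmlautPreComposed As String = "ü"
// ü built from a plain u followed by the combining diaeresis
Var uUmlautComposite As String = "u"+addsDiaeresis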

Can I convert one into the other so I can use them as expected?
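
The general technique for this is Unicode normalization: bring all text to one form (for example NFC, precomposed) before matching. Below is a minimal Python sketch of the idea, not Xojo code; which call does this in Xojo is not shown here. The file name and the `unify` helper are hypothetical.

```python
import re
import unicodedata

def unify(text):
    """Return text with every character in precomposed (NFC) form."""
    return unicodedata.normalize("NFC", text)

# Hypothetical file name as it might arrive from the Finder:
# u followed by the combining diaeresis U+0308.
from_finder = "Mu\u0308ller.txt"

# Pattern written with the typed, single-code-point ü (U+00FC).
pattern = re.compile("\u00fc")

print(pattern.search(from_finder) is None)    # True: the raw text does not match
print(pattern.sub("ue", unify(from_finder)))  # Mueller.txt
```

Normalizing once at the point where the text enters the program means the existing patterns can stay as they are.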

Replace both.

User won’t :wink:

Your code can.

I am not going to code every single character in every format; the text must be unified.

Ok

I appreciate your effort!