Text and regex

Hi,

I want to remove a pattern found using regex. I was trying to use this code, but it still does not seem to work.
The goal is to remove the salutation found within the first 5 or fewer characters from the left.

dim rx1 as new RegEx
rx1.SearchPattern = "^.*\b(mr |Mr.|mrs |mrs.|ms|ms.|)\b.*$"
dim match1 as RegExMatch 
match1= rx1.Search((txtnama.text))

if match1 is nil then 
else
txtnama.text=ReplaceAll(txtnama.text), rx1, "")
end if

I am putting the code in the text field's LostFocus event, so that it runs when typing ends.

Any help?

thanks
arief

You can remove the code to do the search. That’s not needed for the replace function. But you do need to use the replace function that’s part of regex instead of the global ReplaceAll function.

rx1.ReplacementPattern=""
txtnama.text=Rx1.Replace(txtnama.text)

I have changed it to:

dim rx1 as new RegEx
rx1.SearchPattern = "^.*\b(mr |Mr.|mrs |mrs.|ms|ms.|)\b.*$"
rx1.ReplacementPattern=""
txtnama.text=rx1.Replace(txtnama.text)

but now all the text is gone.

What am I doing wrong?

thanks
regards,
arief

Your pattern is incorrect on many levels. The worst problem is the use of the special character dot “.”


Ah okay,
so regex will not work in this case,
and I have to use the Replace function.

thanks
regards,
arief

Why not fix the expression? Write and test live. https://regex101.com/

What about "^\s*(M|m)(rs\.?|r\.?|s\.?|)\s*"

Why use a regex when a simple function will work instead?

  1. Use NthField to get the text before the first space.
  2. Make an array out of your mr. and ms. values.
  3. Check the text before the first space against the values in the array.
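
The three steps above could look roughly like this. It is only a sketch: the list of salutations is my own illustrative assumption, `fullName` and `firstWord` are hypothetical variable names, and `txtnama` is the text field from the question. It uses the classic global string functions (`Trim`, `Mid`, `Len`, `Lowercase`).

```xojo
// Sketch: strip a leading salutation without RegEx.
Dim fullName As String = Trim(txtnama.Text)

// Step 1: the text before the first space.
Dim firstWord As String = NthField(fullName, " ", 1)

// Step 2: the salutations to look for (assumed list, extend as needed).
Dim salutations() As String = Array("mr", "mr.", "mrs", "mrs.", "ms", "ms.")

// Step 3: if the first word is a salutation, keep only what follows it.
If salutations.IndexOf(Lowercase(firstWord)) > -1 Then
  txtnama.Text = Trim(Mid(fullName, Len(firstWord) + 1))
End If
```

Because the check is against whole words, a name such as "Mrinal" is left alone, but anything not in the array (for example "Mr" followed by a tab instead of a space) is also left alone.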

If this is true, opt for the simplest or fastest one.
But sometimes a regex is the simplest option. It is easy to extend to more cases, and it can even be changed dynamically (for example, by keeping the expression in a variable) instead of being hardcoded and needing a recompile. It all depends on your use case.

Wouldn’t this even simpler pattern:

^([mM]?r?s?\.?) \w+$

work as well? It matches an upper or lower case M, followed by rs, r, or s and an optional period.

But will not catch things like “Dr” or “The Honorable” etc. Salutations come in lots of flavors, with or without a period. And that is just for common English ones.

Properly splitting a “full name” field into its component parts is actually very hard once you try to handle all the edge cases. Try “The Honorable Richard Van ■■■■ Sr” and give me the middle name…

But even splitting off the salutation is more complex than it would seem at first blush.

Edit: I guess the software does not like a certain last name. Think of a famous actor who normally went by ■■■■ Van D…

Edit 2: Guess it does not like the first name either…

From your example I thought you only cared about Mr/Mrs/Ms. If you care about Dr, The Honorable, etc., including stripping Sr and the like from the end of names, then you aren’t really dealing with “regular” inputs, so regular expressions might not be the approach to use.

The OP did not mention Dr or other salutations. I was merely pointing out that attempting to properly split off a salutation is more complex than it first seems. And in my experience, if you are trying to do that you may also be wanting to attempt to parse out the other parts of the name.

That may or may not be the case here – but my point was that this is actually harder than it seems, if you care about edge cases.

Thanks for all the help.

It's working now.

regards,
arief


As programmers, we can find multiple ways to solve a problem, but using the right tool is usually best, and in this case, RegEx is the right tool.

As Rick pointed out above, the pattern has issues, but those can be fixed.

As a reminder, here is the pattern:

"^.*\b(mr |Mr.|mrs |mrs.|ms|ms.|)\b.*$"

The first issue is one of overmatching. You are interested in certain strings, but the pattern matches everything that comes before and after them too, and that’s unnecessary.
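
To see why overmatching makes all the text disappear, here is a small sketch that runs the original pattern against a hypothetical sample name ("Mr. Arief" is my own example, not from the question).

```xojo
// Sketch: the original pattern replaces the whole line, not just the salutation.
Dim rx As New RegEx
rx.SearchPattern = "^.*\b(mr |Mr.|mrs |mrs.|ms|ms.|)\b.*$"
rx.ReplacementPattern = ""

Dim result As String = rx.Replace("Mr. Arief")
// result is now an empty string: the leading ^.* and trailing .*$
// let the match cover the entire line, so the entire line is replaced.
```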

The next issue is that there is an alternation with no pattern at all (the “|” followed by the closing parenthesis), which is also not needed.

The next is that, with alternations, order matters. If you have the pattern mat|matter, “matter” will never match because “mat” already did.
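
A quick way to see the ordering effect for yourself is the sketch below; the variable names are only for illustration.

```xojo
// Sketch: the first alternative that succeeds wins.
Dim rx As New RegEx
rx.SearchPattern = "mat|matter"

Dim m As RegExMatch = rx.Search("matter")
If m <> Nil Then
  // SubExpressionString(0) is the full match, which here is "mat".
  System.DebugLog(m.SubExpressionString(0))
End If
```

Putting the longer alternative first (matter|mat) gives "matter" a chance to match.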

The next is that you can’t have a word boundary after a space, and some of your parts have spaces.

The next is the use of the dot which is a wildcard, so Mr. will match “Mr.”, but would also match “Mrs”.
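
The difference between the bare dot and an escaped dot can be checked with a sketch like this one ("Mrs Smith" is a made-up test string).

```xojo
// Sketch: an unescaped dot matches any character, an escaped dot only a period.
Dim rx As New RegEx

rx.SearchPattern = "Mr."
Dim loose As RegExMatch = rx.Search("Mrs Smith")
// loose is not Nil: the dot matched the "s", so the match is "Mrs".

rx.SearchPattern = "Mr\."
Dim strict As RegExMatch = rx.Search("Mrs Smith")
// strict is Nil: there is no literal "Mr." in the text.
```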

So here is how I’d fix it:

\b(mrs?\.|mrs?\b|ms\.|ms\b)\x20*

Translated:

  • A word boundary
  • Either:
    • Mr. or Mrs.; or
    • Mr or Mrs with a trailing word boundary; or
    • Ms.; or
    • Ms with a trailing word boundary
  • Any trailing spaces

Plug that pattern into your code and it should work as expected.
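
For completeness, here is a sketch of how the LostFocus code from the question might look with that pattern. The two option lines are set explicitly so the sketch does not depend on the default settings; `txtnama` is the text field from the question.

```xojo
// Sketch: remove a Mr/Mrs/Ms salutation when the field loses focus.
Dim rx1 As New RegEx
rx1.SearchPattern = "\b(mrs?\.|mrs?\b|ms\.|ms\b)\x20*"
rx1.ReplacementPattern = ""
rx1.Options.CaseSensitive = False      // match MR, Mr, mr, ...
rx1.Options.ReplaceAllMatches = False  // remove only the first match

txtnama.Text = rx1.Replace(txtnama.Text)
```

Note that the pattern is not anchored to the start of the text, so it removes the first standalone Mr/Mrs/Ms wherever it appears. If you only want a leading salutation, prefixing the pattern with ^\s* is one option, though that is my own variation and I have not tested it against your data.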


D’oh!

Hi,
This is very clear.
Thanks for the detailed explanation.
It worked.

regards,
arief

@Kem_Tekinay is always great at RegEx.

Forum for Xojo Programming Language and IDE. Copyright © 2021 Xojo, Inc.