As programmers, we can find multiple ways to solve a problem, but using the right tool is usually best, and in this case, RegEx is the right tool.
As Rick pointed out above, the pattern has issues, but those can be fixed.
As a reminder, here is the pattern:
"^.*\b(mr |Mr.|mrs |mrs.|ms|ms.|)\b.*$"
The first issue is one of overmatching. You are interested in certain strings, but the pattern also matches everything that comes before and after them, and that’s unnecessary.
The next issue is that the alternation ends with an empty alternative (the “|” followed by the closing parenthesis). It matches the empty string, so the group can succeed without any title being present, and it is not needed.
The next is that, with alternations, order matters. If you have the pattern mat|matter, the “matter” alternative will never be used, because “mat” has already matched.
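To see the ordering effect in isolation, here is a minimal sketch using Python's re module (the language is my assumption, since the original code isn't shown; other backtracking engines behave the same way for this case):

```python
import re

# Alternatives are tried left to right; the first one that succeeds wins.
short_first = re.compile(r"mat|matter")
long_first = re.compile(r"matter|mat")

print(short_first.match("matter").group())  # mat
print(long_first.match("matter").group())   # matter
```

Putting the longer alternative first, or using an optional suffix as in mat(ter)?, avoids the problem.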
The next is that a word boundary after a space is unreliable: \b there only matches when the next character is a letter, digit, or underscore, and some of your alternatives end in a space.
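A small sketch of the space-then-boundary behavior, again assuming Python's re module and using just one alternative from the original pattern (the helper name has_title is only for illustration):

```python
import re

# One alternative from the original pattern: it ends in a space and is followed by \b.
title_then_boundary = re.compile(r"\b(mr )\b", re.IGNORECASE)

def has_title(text: str) -> bool:
    """Return True if the space-terminated alternative matches anywhere in text."""
    return title_then_boundary.search(text) is not None

print(has_title("Mr Smith"))    # True: the space is followed by a letter
print(has_title("Mr  Smith"))   # False: the space is followed by another space
print(has_title("Mr (Smith)"))  # False: the space is followed by "("
```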
The next is the use of the dot, which is a wildcard unless it is escaped, so Mr. will match “Mr.”, but would also match “Mrs”.
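The difference between the bare dot and the escaped dot can be checked like this (a sketch assuming Python's re module):

```python
import re

unescaped = re.compile(r"Mr.")   # "." matches any character except a newline
escaped = re.compile(r"Mr\.")    # "\." matches only a literal dot

print(bool(unescaped.fullmatch("Mr.")))  # True
print(bool(unescaped.fullmatch("Mrs")))  # True, probably not what was intended
print(bool(escaped.fullmatch("Mr.")))    # True
print(bool(escaped.fullmatch("Mrs")))    # False
```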
So here is how I’d fix it:
\b(mrs?\.|mrs?\b|ms\.|ms\b)\x20*
Translated:
- A word boundary
- Either:
  - Mr. or Mrs.; or
  - Mr or Mrs with a trailing word boundary; or
  - Ms.; or
  - Ms with a trailing word boundary
- Any trailing spaces
Plug that pattern into your code and it should work as expected, assuming the match is case-insensitive (the pattern is written in lowercase).
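Since the original code isn't shown here, the following is only a sketch of how the fixed pattern behaves, assuming Python's re module and case-insensitive matching. The helper strip_title is a made-up name for illustration, and removing the title is just one possible use of the match:

```python
import re

# The fixed pattern; IGNORECASE is an assumption because the pattern is lowercase.
TITLE = re.compile(r"\b(mrs?\.|mrs?\b|ms\.|ms\b)\x20*", re.IGNORECASE)

def strip_title(name: str) -> str:
    """Remove any matched title (and the spaces after it) from name."""
    return TITLE.sub("", name)

print(strip_title("Mr. John Smith"))  # John Smith
print(strip_title("Mrs   Jane Doe"))  # Jane Doe
print(strip_title("Ms. Lee"))         # Lee
print(strip_title("Msgr Brown"))      # Msgr Brown (no word boundary after "Ms")
```

Note that each alternative ends in either an escaped dot or a word boundary, so names that merely start with the same letters are left alone.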