RegEx Help Please

Hi. I’m trying to find the correct search pattern for a search word that includes a period, for example to replace the abbreviation “b.g.” with “background”.

If I use “\bb.g.\b” it does not work…probably because \b denotes a word boundary (which I need in general) but in this instance each period it encounters in the search text is also a word boundary so when it finds “b.” it just stops and starts again etc.

Can anybody help me out with this one?

Thanks.

For one thing, the dot has special meaning in RegEx - it means any character. You need to escape it.

“b\.g\.” or “\bb\.g\.” both work for me, but I think the period is not working with your second word boundary.
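To make the difference concrete, here is a minimal sketch in Python rather than the Xojo used in this thread (the sample text is made up for illustration). It compares the unescaped pattern with the escaped one:

```python
import re

# Hypothetical sample text, chosen so the unescaped dot has something to over-match.
text = "bigs and b.g. and bags"

# Unescaped: each "." matches any character, so "bigs" and "bags" match too.
loose = re.findall(r"b.g.", text)

# Escaped: "\." matches only a literal period.
strict = re.findall(r"b\.g\.", text)

print(loose)   # ['bigs', 'b.g.', 'bags']
print(strict)  # ['b.g.']
```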

Thanks Mark. That seems to work without the last \b but I need to test this a bit more to be sure with various string combinations.

With regard to escaping the dot, is there a RegEx search pattern I could use to search a string for all “special characters” and replace them with escaped alternatives? i.e. find ., *, $, +, ? etc in a string and return an escaped version? This is because the “b.g.” was just an example but in reality it could be any string that would be placed in the RegEx search pattern.

It looks like a word boundary (\b) only registers next to a word character, and a period is not one, so leaving it off the end is the right move. If you really want to make sure that what follows is either whitespace or the end of the document, you can use a lookahead:

\bb\.g\.(?=\s|$)
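As a rough illustration of why the trailing \b misbehaves and the lookahead does not, here is a small Python sketch (Python's `re` treats \b and lookaheads the same way for this case; the sample sentence is invented):

```python
import re

# Hypothetical sample sentence with three occurrences of "b.g.".
text = "The b.g. is blue, not the b.g.x or the b.g."

# Trailing \b: after the final "." a space or the end of the text gives no
# word/non-word transition, so \b fails there. It only succeeds when a word
# character follows, which is exactly the case we do not want.
with_boundary = re.sub(r"\bb\.g\.\b", "background", text)

# Lookahead: require whitespace or the end of the text after the final period.
with_lookahead = re.sub(r"\bb\.g\.(?=\s|$)", "background", text)

print(with_boundary)   # The b.g. is blue, not the backgroundx or the b.g.
print(with_lookahead)  # The background is blue, not the b.g.x or the background
```

The lookahead only checks what follows, so the whitespace itself is not consumed or replaced.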

I didn’t understand your second comment. Why would you need escaped alternatives of those characters? How would you use them?

You can include practically any characters within a character class (square brackets), and they will act as if escaped:

[$?^] 

You can also enclose any literal string between \Q and \E.

\Q\b.\s\E

That will match, literally, “\b.\s”.

Thanks Kem. I’ll try to explain. I will be looping through an array of user-generated words that may or may not contain “special characters” in order to perform replacements on some of those words according to my own needs. The user may create a word “b.g.” but not realise that the word contains the special character “dot”, so before I perform a RegEx replacement of that word (changing it to “background”) I have to escape it for the RegEx search pattern and change it to “\bb\.g\.” etc.

What I’m saying is that for each word it looks as if I will have to check for each form of “special character” and rather than running a ReplaceAll in a loop for each variation (i.e. looking for “*” and escaping, looking for “$” and escaping, looking for “.” and escaping etc) I was wondering whether there was a RegEx replace all search pattern I could perform in a single statement.
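For reference, the single-statement escape being asked about can be sketched as one substitution with a character class and a backreference. This is Python, not Xojo, and `escape_specials` is a hypothetical helper name; the set of metacharacters listed is an assumption that covers the usual ones:

```python
import re

def escape_specials(word):
    # Prefix every regex metacharacter with a backslash in one substitution.
    # The character class lists: \ ^ $ . | ? * + ( ) [ ] { }
    return re.sub(r"([\\^$.|?*+()\[\]{}])", r"\\\1", word)

word = "b.g."
escaped = escape_specials(word)            # b\.g\.
pattern = r"\b" + escaped + r"(?=\s|$)"
result = re.sub(pattern, "background", "a dark b.g. here")
print(result)
```

In Python specifically, the built-in `re.escape` does the same job, so the helper above is only there to show the idea in a form that carries over to other engines.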

Update: the following seems to be working…

reg.SearchPattern = "\b\Q" + word + "\E(?=\s|$)"

That is a fine solution as long as your user doesn’t enter “\E” in their word. If they did, e.g., if they entered “abc\Ed”, you would end up with a pattern like “\Qabc\Ed\E” and it wouldn’t work, because the first “\E” ends the quoted section early.

To be on the safe side, this should cover it:

word = word.ReplaceAllB( "\E", "\\EE\Q" )
reg.SearchPattern = "\b\Q" + word + "\E(?=\s|$)"

To be on the even safer side, you can convert each character into its “\x{NNNN}” form. That form will let you specify any character by its Unicode code point, and it will always be treated as a literal.

dim chars() as string = word.Split( "" )
for i as integer = 0 to chars.Ubound
  dim char as string = chars( i )
  if ( char >= "a" and char <= "z" )  or ( char >= "0" and char <= "9" ) then
    // Letters and digits are safe to leave as they are
  else
    // Everything else becomes its \x{NNNN} code point escape
    chars( i ) = "\x{" + Hex( char.Asc ) + "}"
  end if
next
word = Join( chars, "" )
reg.SearchPattern = "\b" + word + "(?=\s|$)"

That would ultimately turn the word “b.g.” into the pattern “\bb\x{2E}g\x{2E}(?=\s|$)”. Not easy to read, but it will absolutely work in all cases.

For what it’s worth, the latter is how I’d do it.
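The same per-character idea can be sketched in Python for comparison. One caveat: Python's `re` module does not accept the “\x{NNNN}” form, so this sketch swaps in Python's “\uNNNN” and “\UNNNNNNNN” escapes instead. `to_code_point_pattern` is a hypothetical helper name.

```python
import re

def to_code_point_pattern(word):
    parts = []
    for ch in word:
        if ch.isascii() and ch.isalnum():
            # ASCII letters and digits have no special meaning; keep them.
            parts.append(ch)
        else:
            # Everything else is written as a code point escape, which the
            # regex engine always treats as a literal character.
            cp = ord(ch)
            parts.append(f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}")
    return "".join(parts)

pattern = r"\b" + to_code_point_pattern("b.g.") + r"(?=\s|$)"
result = re.sub(pattern, "background", "dark b.g. here")
print(pattern)
print(result)
```

Because no character of the user's word reaches the engine as a metacharacter, there is no equivalent of the “\E” edge case to guard against.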

That’s fantastic. Thank you for the help and detailed explanation. I’ve learned a lot!