RegEx literal characters

Let’s say you have a RegEx pattern like “*** My stars ***” but you don’t want the *'s to be wildcards, you want them to be literal. How do you escape them? Better yet, how do you escape ALL characters?

Put a backslash in front of it, like \*, and then it’s escaped.

As Marc says, you escape them with a backslash. You don’t escape ALL characters, you escape each character.

That becomes a problem when there’s a RegEx special modifier. For example, what if you wanted to escape “b”? There’s already a \b that performs something special.

Would this be a good solution? Or, is there a better way to do this?

        Dim xRegEx As New RegEx
        Dim xMatch As RegExMatch
        Dim sPatternBits() As String = Split(this_bit,"")
        Dim m, mEnd As Integer
        mEnd = uBound(sPatternBits)
        For m = 0 to mEnd
          If sPatternBits(m) = "*" Then
            sPatternBits(m) = ".*"
          Else
            sPatternBits(m) = "\" + Oct(Asc(sPatternBits(m)))
          End If
        Next
        xRegEx.Options.CaseSensitive = False
        xRegEx.Options.TreatTargetAsOneLine = False
        xRegEx.Options.StringBeginIsLineBegin = True
        xRegEx.Options.StringEndIsLineEnd = True
        xRegEx.Options.MatchEmpty = False
        xRegEx.SearchPattern = Join(sPatternBits,"")
        xMatch = xRegEx.Search(FThisItem.Name)
        If xMatch = Nil or xMatch.SubExpressionCount = 0 Then Exit For i//Haven't handled partial wildcard matches yet

Why would you want to escape a b? A “b” is already a b – it doesn’t need to be escaped. You only need to escape RegEx-specific characters, such as . and * and \ (which you can do with a \\).
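To illustrate escaping only the RegEx-specific characters, here is a minimal sketch. The function name and the kMeta list are hypothetical, not from this thread. It relies on the PCRE rule that a backslash in front of a non-alphanumeric character always makes that character literal, so letters such as “b” are never touched.

```xojo
Function EscapeRegExMetaKSW(source As String) As String
  // Hypothetical helper: prefix each PCRE metacharacter with a backslash.
  // Letters and digits are left alone, so "b" never turns into \b.
  Const kMeta = "\.^$|?*+()[]{}"

  Dim chars() As String = Split(source, "")
  For i As Integer = 0 To UBound(chars)
    If InStr(kMeta, chars(i)) > 0 Then
      chars(i) = "\" + chars(i)
    End If
  Next

  Return Join(chars, "")
End Function
```

With this, “*** My stars ***” would become “\*\*\* My stars \*\*\*”, and the letters and spaces pass through unchanged.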

Unless I’m missing something?

I’m matching file names on the user’s computer, which could be anything.

Wouldn’t you just use a .+ then?

Not really, because the escape character is not supposed to be used to escape alphas (although it does work when the alpha in question is not part of an escape sequence).

http://www.pcre.org/pcre.txt

My understanding is that Xojo uses PCRE. You don’t escape low-ascii characters in regex. Alternatively, I believe you can do \Q…\E where everything between \Q and \E will be taken literally.
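A small sketch of the \Q…\E idea, assuming Xojo’s RegEx engine honors PCRE’s quoting syntax as suggested above. The variable names and the sample target string are made up for illustration.

```xojo
// Assumption: the engine treats everything between \Q and \E literally.
Dim rx As New RegEx
rx.SearchPattern = "\Q*** My stars ***\E"

Dim match As RegExMatch = rx.Search("Title: *** My stars ***")
If match <> Nil Then
  // SubExpressionString(0) is the whole matched text.
  MsgBox(match.SubExpressionString(0))
End If
```

Note that Xojo string literals have no backslash escape sequences, so a single backslash in the source is a single backslash in the pattern.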

I’m not matching everything. I’m matching the potential for characters.

For example, I want to have a query like:

Cachesdb*

But, with a bit more flexibility. I want to check the user’s home path so like:

/Users/This is my super star (*) home folder//Library/Caches/.db

Now you see those ***'s shouldn’t be matched dynamically but the other ones should be.

I think I’ve figured it out… does anyone notice any issues?

        Function MatchesWildcardKSW(Extends search_this As String, the_pattern As String) As Boolean
          Dim sPatternBits() As String = Split(the_pattern,"")
          Dim xRegEx As New RegEx
          Dim xMatch As RegExMatch
          Dim m, mEnd As Integer

          mEnd = uBound(sPatternBits)
          For m = 0 to mEnd
            If sPatternBits(m) = "*" Then
              sPatternBits(m) = ".*"
            Else
              sPatternBits(m) = "\" + Oct(Asc(sPatternBits(m)))
            End If
          Next

          xRegEx.Options.CaseSensitive = False
          xRegEx.Options.TreatTargetAsOneLine = False
          xRegEx.Options.StringBeginIsLineBegin = True
          xRegEx.Options.StringEndIsLineEnd = True
          xRegEx.Options.MatchEmpty = False

          xRegEx.SearchPattern = Join(sPatternBits,"")
          xMatch = xRegEx.Search(search_this)
          If xMatch = Nil or xMatch.SubExpressionCount = 0 Then Return False

          Return True
        End Function

It probably won’t be an issue, but if you get into Unicode characters with values > &o777, this code won’t work. As an alternative, you can use the hex code with the token \x{XXX}.
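A rough sketch of the hex-token variant, using the same loop as the function above but emitting \x{…} instead of an octal escape. The function name is hypothetical, and it assumes that Asc returns the full code point for the character and that the engine accepts \x{…} values above 255 (in PCRE that requires UTF-8 mode).

```xojo
Function WildcardToHexPatternKSW(the_pattern As String) As String
  // Hypothetical helper: "*" becomes ".*", every other character
  // becomes a \x{...} hex token so it is matched literally.
  Dim bits() As String = Split(the_pattern, "")
  For m As Integer = 0 To UBound(bits)
    If bits(m) = "*" Then
      bits(m) = ".*"
    Else
      bits(m) = "\x{" + Hex(Asc(bits(m))) + "}"
    End If
  Next

  Return Join(bits, "")
End Function
```

The braces delimit each token, so the number of hex digits can vary without running into the character that follows.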

Also as an alternative:

        the_pattern = the_pattern.ReplaceAllB( "\E", "\\EE\Q" )
        the_pattern = the_pattern.ReplaceAllB( "*", "\E.*\Q" )
        the_pattern = "\Q" + the_pattern + "\E"

So a pattern like “/this/path/*/to/n*thing” would become “\Q/this/path/\E.*\Q/to/n\E.*\Qthing\E”, and that will work.
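Put together as a function, the quoting approach might look like the sketch below. The function name and the local variable are hypothetical. Because Xojo string literals do not use backslash escapes, each backslash is written once here. The first replacement protects a literal \E that may already be in the pattern: inside a quoted run, “\\EE\Q” reads as a literal backslash, then \E to end quoting, then a literal E, then \Q to resume quoting.

```xojo
Function MatchesWildcardQuotedKSW(search_this As String, the_pattern As String) As Boolean
  // Hypothetical name. Quote the whole pattern with \Q...\E and step
  // out of the quoting only where a "*" wildcard should become ".*".
  Dim quoted As String = the_pattern.ReplaceAllB("\E", "\\EE\Q")
  quoted = quoted.ReplaceAllB("*", "\E.*\Q")
  quoted = "\Q" + quoted + "\E"

  Dim xRegEx As New RegEx
  xRegEx.Options.CaseSensitive = False
  xRegEx.SearchPattern = quoted

  // Search returns Nil when nothing matches.
  Return xRegEx.Search(search_this) <> Nil
End Function
```

Like the earlier function, this finds the pattern anywhere in the target; anchoring with ^ and $ outside the quoted runs would be needed for a whole-name match.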