RegEx Question : OH MIGHTY KEM!!!!

  Dim re As New RegEx
  Dim rm As RegExMatch
  re.SearchPattern="^[a-zA-Z][a-zA-Z_0-9]\\w*$"
  rm=re.search(keyword)
  If rm<>Nil Then Return tokenSYMBOL
  Return tokenUNKNOWN

I need to validate that the variable “keyword” contains a valid symbol name
(i.e. it starts with A-Z, case doesn’t matter, and is optionally followed by A-Z, 0-9 or _).
What I am missing is the “optionally” part… this fails if keyword is only a single character.

You don’t need the second character class since \w will match all of those characters anyway, and having it there makes a second character required. Try this:

^[a-z]\\w*$

(Unless you change the options, the patterns are case-insensitive.)
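To see the difference concretely, here is a minimal sketch that runs both patterns against a one-character name. It uses Python's `re` module as a stand-in for the Xojo RegEx class, which is an assumption on my part (Xojo's engine is PCRE), and the helper name `is_symbol` is made up for illustration. `re.IGNORECASE` imitates the case-insensitive default mentioned above, `re.ASCII` restricts `\w` to A-Z, 0-9 and _, and the patterns are written with single backslashes in raw strings.

```python
import re

# Assumption: IGNORECASE imitates Xojo's default; ASCII keeps \w to [A-Za-z0-9_].
FLAGS = re.IGNORECASE | re.ASCII

ORIGINAL = re.compile(r"^[a-zA-Z][a-zA-Z_0-9]\w*$", FLAGS)  # second class is mandatory
SUGGESTED = re.compile(r"^[a-z]\w*$", FLAGS)                # one letter, then 0..n word chars


def is_symbol(pattern, keyword):
    """Hypothetical helper: True when the whole keyword matches the pattern."""
    return pattern.search(keyword) is not None


for name in ["x", "x3", "x_3_4z", "3x"]:
    print(name, is_symbol(ORIGINAL, name), is_symbol(SUGGESTED, name))
```

The only input the two patterns disagree on is the single character "x": the original needs at least two characters, the suggested one does not.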

Ok… forgetting about the case sensitive part… doesn’t that say the string must ONLY be the characters a-z?

I need the FIRST character to be A-Z
and the 2nd to N characters to be A-Z, 0-9 or _ , but ONLY if the string > 1 character long
if it is only 1 character long then it must be A-Z
basically a string that matches the definition of a variable name, for the most part

x is valid
x3 is valid
x_3_4z is valid
3x is not valid

seems this works

re.SearchPattern="^[a-z][a-z_0-9]*$"

You just wrote the longer version of the same pattern. :slight_smile:

I see nothing in your pattern that checks for digits ONLY after the 1st character… now I’m not saying you are wrong… I’m trying to see what I am missing

OH!!!
the magic word here is the “\w”

start with A-Z, optionally followed by 0-n “word characters”, which are “A-Z”, “0-9” and “_”

NOW IT MAKES SENSE :slight_smile:
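For anyone who wants to check that claim rather than take it on faith, this sketch compares the short and the long pattern over a handful of inputs. It is Python, not Xojo (an assumption made so it can run anywhere), and the sample list is mine, not from the thread. `re.ASCII` matters here: without it, Python's `\w` also accepts non-ASCII letters and the two patterns would no longer be equivalent.

```python
import re

FLAGS = re.IGNORECASE | re.ASCII  # assumption: mimic a case-insensitive, ASCII-only engine

SHORT = re.compile(r"^[a-z]\w*$", FLAGS)
LONG = re.compile(r"^[a-z][a-z_0-9]*$", FLAGS)

# Illustrative inputs; the last three are extra edge cases.
samples = ["x", "x3", "x_3_4z", "3x", "_x", "x-1", ""]

results = {s: (bool(SHORT.search(s)), bool(LONG.search(s))) for s in samples}
for s, (short_ok, long_ok) in results.items():
    print(repr(s), short_ok, long_ok)
```

Both columns agree on every sample, which is what “the longer version of the same pattern” means in practice.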

Of all the elements of computer programming, RegEx is the one that I have never ever been able to make sense out of…

I THOUGHT I could take what I learned above and make a simple addition, but it’s not happening… :frowning:

I need a pattern that will match a string that meets these criteria:

  • Starts with a Letter (case is not important)
  • followed by [A-Z] (any case) , or [0-9], or “_”, or “.”
  • and in the case of the “.”, only ONE is allowed, and must be followed by at least one other character

basically a strict filename pattern

Do you have Kem’s RegExRX? Absolutely essential.

Does this do what you want?

A-Z

or, if you are not setting the case insensitive flag

A-Za-z

As for the second and third bullets, are we talking about a single character, or multiple? And the character following the period, are we talking any non-space character?

This RegEx needs to be “generic”: I have two projects to use it in, one in Xojo and one not, so I’d rather not use something like RegExRX.

  • Starts with a Letter (case is not important)
  • followed by 0 to n characters that must be [A-Z] (any case) , or [0-9], or “_”, or “.”
  • and in the case of the “.”, only ONE is allowed, and must be followed by at least one other character
$abc.3 is not valid (no $ allowed)
9.abc is not valid (must start with a letter)
A. is not valid (period must have at least one [A-Z,0-9] following it)
A..B is not valid (contains multiple ".")

as to “any non-space” character, no, not “any”… the entire string must be A-Z a-z 0-9 . or _ characters only… nothing else
no spaces, or non alphanumeric except “.” or “_”

^[A-Z][A-Z0-9_]*(\.[A-Z0-9]+)?$

or

^[A-Za-z][A-Za-z0-9_]*(\.[A-Za-z0-9]+)?$

if you are not setting the case insensitive flag
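Here is a small sketch that runs the explicit `[A-Za-z]` version of that pattern against the examples listed earlier in the thread. It is written in Python purely as a neutral test bed (an assumption; the second, non-Xojo project was never named), and the extra inputs "a.b.c" and "a.b_c" are mine.

```python
import re

# The case-explicit pattern from the post above, so no case-insensitive flag is needed.
FILENAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*(\.[A-Za-z0-9]+)?$")

cases = [
    "filename", "filename.ext", "r.r",   # expected to be accepted
    "$abc.3", "9.abc", "A.", "A..B",     # the invalid examples from the thread
    "a.b.c", "a.b_c",                    # extra edge cases (not from the thread)
]

verdicts = {c: FILENAME.search(c) is not None for c in cases}
for text, ok in verdicts.items():
    print(f"{text!r:16} {'valid' if ok else 'not valid'}")
```

One thing this surfaces: because the class after the dot is `[A-Za-z0-9]`, an underscore in the extension ("a.b_c") is rejected. If underscores should be allowed there too, that class would need a `_` added.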

this is close, but the dot and what follows it are required

"^[A-Za-z]\\w*+\\.+\\w*+$"

I would have thought this would make the dot etc optional, but it doesn’t work

"^[A-Za-z][\\w*+\\.+\\w*]?+$"

Try this:

^[A-Z]\\w*(\\.\\w+)?$

I will, but this is what I came up with via trial and error

^[A-Za-z]\\w*([\\.][\\w*]+)?$

Kem… your suggestion gave this

RegEx=^[A-Z]\\w*(\\.\\w+)?$  String=filename.ext result=false
RegEx=^[A-Z]\\w*(\\.\\w+)?$  String=filename result=false
RegEx=^[A-Z]\\w*(\\.\\w+)?$  String=filename..ext result=false
RegEx=^[A-Z]\\w*(\\.\\w+)?$  String=.ext result=false

mine gave this

RegEx=^[A-Za-z]\\w*([\\.][\\w*]+)?$  String=filename.ext result=true
RegEx=^[A-Za-z]\\w*([\\.][\\w*]+)?$  String=filename result=true
RegEx=^[A-Za-z]\\w*([\\.][\\w*]+)?$  String=filename..ext result=false
RegEx=^[A-Za-z]\\w*([\\.][\\w*]+)?$  String=.ext result=false

Your regex must have the case sensitive flag set, as yours and Kem’s are essentially the same (aside from yours having [A-Za-z] while Kem’s has [A-Z])
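The case-sensitivity theory is easy to try out. The sketch below compiles Kem’s pattern twice, once case-sensitive and once not, and feeds it the four strings from the results above. It is Python rather than Xojo (an assumption for the sake of a runnable example), so it only shows that the reported results are consistent with a case-sensitive match, not what the Xojo project actually had set.

```python
import re

KEM = r"^[A-Z]\w*(\.\w+)?$"

sensitive = re.compile(KEM, re.ASCII)                    # [A-Z] means uppercase only
insensitive = re.compile(KEM, re.ASCII | re.IGNORECASE)  # [A-Z] also accepts a-z

inputs = ["filename.ext", "filename", "filename..ext", ".ext"]
table = {s: (bool(sensitive.search(s)), bool(insensitive.search(s))) for s in inputs}
for s, (cs, ci) in table.items():
    print(f"String={s} case-sensitive={cs} case-insensitive={ci}")
```

Matched case-sensitively, all four strings fail, just like the first set of results. Matched case-insensitively, the same pattern gives true, true, false, false, the same as the second set.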

You don’t need the brackets around the \. and the \w (your code [\w*]+ is essentially saying that there are 1 or more instances of 0 or more words).

well I don’t know… but mine works and Kem’s (sorry) does not, even when I change it to [A-Za-z]

“r.r” is valid, but Kem’s regex says it is not

RegEx=^[A-Za-z]\\w*([\\.][\\w*]+)?$  String=r.r result=true
RegEx=^[A-Za-z]\\w(\\.\\w+)?$  String=r.r result=false

You’re missing the asterisk after the first \w in the second line.
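A quick sketch of what that one asterisk changes, again using Python's `re` as a stand-in engine (an assumption) and a couple of extra inputs of my own. Without the `*`, `\w` must match exactly one character right after the first letter, so the pattern only accepts names whose stem is exactly two characters long.

```python
import re

WITHOUT_STAR = re.compile(r"^[A-Za-z]\w(\.\w+)?$", re.ASCII)  # as typed in the failing test
WITH_STAR = re.compile(r"^[A-Za-z]\w*(\.\w+)?$", re.ASCII)    # as originally suggested

inputs = ["r.r", "ab.c", "abc.d"]
outcome = {s: (bool(WITHOUT_STAR.search(s)), bool(WITH_STAR.search(s))) for s in inputs}
for s, (no_star, star) in outcome.items():
    print(f"String={s} without*={no_star} with*={star}")
```

For "r.r" the version without the asterisk tries to match `\w` against the period and fails, which is the false result shown above.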

And I was mistaken in saying that yours and Kem’s were essentially the same; the ‘1 or more instances of 0 or more words’ should evaluate differently.

(\.\w+)? = an optional group consisting of a period followed by 1 or more words.
([\.][\w*]+)? = an optional group consisting of a period followed by one or more instances of 0 or more words.

I’m not sure why the second works; I would assume that this would evaluate true if the period is not followed by a character. I expected this regex to result true for string=r. (no character following the period) but it didn’t. Hopefully Kem can explain this one, as it’s beyond my understanding.
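One likely explanation, offered as a sketch rather than a definitive answer: inside square brackets the `*` is not a quantifier. `[\w*]` is a character class that matches exactly one character, either a word character or a literal asterisk, and the `+` after it then demands at least one such character. That is why "r." is still rejected. The example below is Python (an assumption, used as a stand-in for Xojo's PCRE engine, which treats `*` inside a class the same way), and the inputs containing `*` are mine.

```python
import re

BRACKETED = re.compile(r"^[A-Za-z]\w*([\.][\w*]+)?$", re.ASCII)  # the trial-and-error pattern
PLAIN = re.compile(r"^[A-Za-z]\w*(\.\w+)?$", re.ASCII)           # the pattern without brackets

inputs = ["r.r", "r.", "r.*", "r.a*b"]
compare = {s: (bool(BRACKETED.search(s)), bool(PLAIN.search(s))) for s in inputs}
for s, (bracketed, plain) in compare.items():
    print(f"String={s} bracketed={bracketed} plain={plain}")
```

So the two patterns agree on "r.r" and "r.", but the bracketed one also lets a literal asterisk through after the period ("r.*", "r.a*b"), which is probably not wanted in a strict filename check.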