Simple RegEx woes again

Uggh. I think I stand a better chance of getting a Doctorate in Nuclear Physics than of ever getting a grasp of RegEx. I’ve read the help, looked at examples on the web, and even went out to a RegEx tester, and I think I have the right pattern. But I still can’t figure out how to detect 3 characters that must each be in A-D or F to be valid. “ABD” would be accepted, “CCF” is OK. “AAE” is rejected. Obviously, “AMP” would be rejected. This is basically what I have:

MyTestString = "ABD"
rx.SearchPattern = "[A-D,F]"
myMatch = rx.Search(MyTestString)
intMatches = myMatch.SubExpressionCount

Can anyone tell me what the heck I am missing, or where I should be checking for 3 matches? (That’s what the RegEx tester shows: 3 matches on a valid 3-character grouping.)

Did you pick up RegExRX?
I don’t do anything RegEx without it.

Try this as your SearchPattern: code[/code]
The SubExpressions will be your matches.

Thanks Tim. I’m on something called regextester.com. It shows /[A-D,F]/g at the top and shows 3 matches for a test on ABF. I tried adding {3}, but I am not sure where I pick up that it is valid when I pass in ABF.

The {3} says this repeats 3 times, and the parentheses group it all together. The grouping also lets us use the subexpressions. I do have a habit of over-grouping (but it hasn’t failed me yet!). I’m not sure it’s entirely necessary in this case.
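
For anyone following along, here is a rough sketch of what the parentheses and {3} do to the subexpressions. The exact pattern suggested above is not shown in the thread, so both placements of {3} below are only guesses at the idea. The class is written as [A-DF] (A through D, plus F), and the variable names are made up for illustration.

```xojo
// Sketch only: the patterns and variable names are illustrative assumptions.
Dim rx As New RegEx
Dim match As RegExMatch

// {3} inside the parentheses: the group captures all three characters.
rx.SearchPattern = "([A-DF]{3})"
match = rx.Search("ABF")
If match <> Nil Then
  Dim whole As String = match.SubExpressionString(0)     // entire match: "ABF"
  Dim captured As String = match.SubExpressionString(1)  // the group: "ABF"
End If

// {3} outside the parentheses: the group itself repeats, and only the
// text from its last repetition is kept.
rx.SearchPattern = "([A-DF]){3}"
match = rx.Search("ABF")
If match <> Nil Then
  Dim whole As String = match.SubExpressionString(0)     // entire match: "ABF"
  Dim lastPass As String = match.SubExpressionString(1)  // last repetition: "F"
End If
```

Either way, SubExpressionString(0) is the whole match, so for a simple valid/invalid check the grouping is probably optional.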

Per your definition above, ABF would be valid - should it not be?

Check out RegExRX from Kem, the RegEx guy http://www.mactechnologies.com/index.php?page=downloads#regexrx

Here it is working on regextester: http://regextester.com/?fam=96440
Note, I switched the engine to PCRE, as Xojo uses the PCRE engine - but in this case, (again) I’m not sure it matters.

Yes, Kem helped me with (a more complex) RegEx about a year ago, very helpful. Yes, ABF is valid. I’ve got what I think is the working pattern at your suggestion, I am just not sure where to get the results out of the Xojo code.

strTemp = myMatch.SubExpressionString(0)

Returns the first character, and I can’t find anywhere that it says I had 3 matches. But I can see it on the regextester link you sent. That looks correct.

Tim, thanks for the help, {3} is what I was missing. I think this works:

rx.SearchPattern = "[A-D,F]{3}"
myMatch = rx.Search(strTemp)
If MyMatch = Nil Then
  intTemp = 0
Else
  intTemp = MyMatch.SubExpressionCount
End If

There is undoubtedly a better way of doing this, but this will work.

Maybe this demo project will help: regex.xojo_binary_project

Let me try it…

Yes, because without the {3} you’re telling it that any single letter that fits [A-D,F] is a match. The {3} says this must repeat 3 times to match.
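
To make the "3 matches" versus "1 match" difference concrete, here is a minimal sketch of counting matches in a loop. CountMatches is a hypothetical helper name, not something from the demo project. As far as I understand it, SubExpressionCount describes the groups inside one match rather than how many matches exist in the string, which would be why it never reported 3.

```xojo
// Hypothetical helper: counts how many times a pattern matches in a string.
Function CountMatches(source As String, pattern As String) As Integer
  Dim rx As New RegEx
  rx.SearchPattern = pattern

  Dim count As Integer = 0
  Dim match As RegExMatch = rx.Search(source)
  While match <> Nil
    count = count + 1
    match = rx.Search  // no argument: continue after the previous match
  Wend
  Return count
End Function

// Usage:
//   CountMatches("ABD", "[A-D,F]")     returns 3 (one match per letter)
//   CountMatches("ABD", "[A-D,F]{3}")  returns 1 (the three letters form one match)
```

With {3} in the pattern, a single Search that returns something other than Nil is enough to know the three letters were found, so no loop is needed for the validity check itself.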

Excellent, that seems to work better. I wasn’t thinking I needed to try this in a loop. {3} is the part I was missing; I need to read up on that. Thanks again!

A couple of things.

When you use a character class, anything within the square brackets may match. In your case, [A-D,F] will match the letters you specified or a comma, which is not what you want. Use [A-DF].
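
A quick way to see the comma problem for yourself is something like the sketch below. The variable names are made up for illustration, and the test string "A,F" is just a convenient example.

```xojo
// Sketch only: variable names and the test string are illustrative.
Dim withComma As New RegEx
withComma.SearchPattern = "[A-D,F]{3}"
// Not Nil: the comma counts as one of the allowed characters.
Dim m1 As RegExMatch = withComma.Search("A,F")

Dim noComma As New RegEx
noComma.SearchPattern = "[A-DF]{3}"
// Nil: there is no run of three allowed letters in "A,F".
Dim m2 As RegExMatch = noComma.Search("A,F")
```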

That pattern, [A-DF]{3}, will find a match in any of the following:

ABF
ABCF
AAABBBCCC

In other words, the overall length of the string will never be considered, so “AAA” will be matched in the last example. If you want to prevent that, use:

rx.SearchPattern = "\A[A-DF]{3}\z"

That will confirm that the entire string is nothing more than the three allowable letters.
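
Wrapped up as a function, the anchored pattern might look like the sketch below. IsValidCode is a hypothetical name. Setting CaseSensitive to True is my own assumption that only uppercase letters should pass, since Xojo's RegEx ignores case by default; drop that line if lowercase input is acceptable.

```xojo
// Hypothetical helper: True only when the whole string is exactly three
// characters, each one of A, B, C, D, or F.
Function IsValidCode(candidate As String) As Boolean
  Dim rx As New RegEx
  rx.SearchPattern = "\A[A-DF]{3}\z"
  rx.Options.CaseSensitive = True  // assumption: uppercase only
  Return rx.Search(candidate) <> Nil
End Function

// Usage:
//   IsValidCode("ABD")   returns True
//   IsValidCode("CCF")   returns True
//   IsValidCode("AAE")   returns False
//   IsValidCode("ABCF")  returns False (too long, even though "ABC" would fit)
```

Because the anchors pin the match to the start and end of the string, a separate Len() check should not be needed.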

Shoot, I thought the comma meant “additionally, this letter”

That’s what I thought too. I’m starting to think RegEx is sorcery, black magic… best not to go near it!

Thanks, Kem. I was checking that for 3 characters with a Len() statement, but it’s even better if I can have the RegEx do it as well. And thanks for the help, Tim; the project was very helpful in understanding this problem.

A great tutorial site is

http://regular-expressions.info

Regular expressions, like any other type of language, come with an “ah-HA” moment. Once you have it, everything else falls into place.

I think I had that in my 12th year of marriage.

Masterful, Kem, as always…

BTW, as far as “black magic” goes, consider this:

[A-D] means “any letter from A through D”, while [-AD] means “a hyphen, A, or D”, and [-A-D] and [A-D-] both mean “a hyphen or any letter from A through D”. []A-D] means “a closing square bracket or any letter from A through D”, same as [A-D\]].

:)
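
If you want to poke at those character classes yourself, a small helper along these lines may be handy. FullyMatches is a hypothetical name. It wraps whatever pattern you pass in \A and \z so the whole string has to fit, and it turns on case sensitivity as an assumption so that only uppercase letters count.

```xojo
// Hypothetical helper: True when the entire string fits the given pattern.
Function FullyMatches(source As String, pattern As String) As Boolean
  Dim rx As New RegEx
  rx.SearchPattern = "\A(?:" + pattern + ")\z"
  rx.Options.CaseSensitive = True  // assumption: uppercase only
  Return rx.Search(source) <> Nil
End Function

// Usage:
//   FullyMatches("A-D", "[-A-D]+")   returns True  (leading hyphen is literal)
//   FullyMatches("A-D", "[A-D-]+")   returns True  (trailing hyphen is literal)
//   FullyMatches("A-D", "[A-D]+")    returns False (hyphen only marks the range)
//   FullyMatches("B", "[-AD]+")      returns False (only hyphen, A, or D allowed)
//   FullyMatches("A]D", "[]A-D]+")   returns True  (leading ] is literal)
//   FullyMatches("A]D", "[A-D\]]+")  returns True  (escaped ] is literal)
```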

And after reading that, I just signed up for my first class in Nuclear Physics. My approach going forward will more likely be brute forcing it into RegExRX until it works. C++ was never this hard.

Sometimes that’s about the only way to do it.