Looking for working RegEx-Pattern

Hi,

Some tricky stuff. I have the following sample line:

ABC D E F G

Description

  • ABC (optional) - Characters
  • D (optional) should only match if E (optional) was matched - D = Digits, E = Characters
  • E (optional) should only match if F was matched - F = Digits
  • G (optional) - Characters
  • empty matches are allowed

I'm looking for a pattern that satisfies all of these rules. Valid matches are:

  • (ABC) F (G)
  • (ABC) E F (G)
  • (ABC) D E F (G)

Thank you all

Kem is THE regex guru here … :wink:
May I suggest downloading (and surely buying) his excellent software RegExRX to help you build regex queries?

…I have a Personal License of Kem’s tool. But this topic is not so easy to get working :smiley:

Hard to do without actual data but it sounds like you need to take advantage of lookaheads, something like…

^([a-z]+)?\b *(D(?= *E *F))? *(E(?= *F))? *(F) *(G)?

I used your sample matches without parens to test.

BTW, it’s helpful to show cases where it should NOT match for a proper test.

Actually, because it’s not real data, let me change the pattern for testing purposes:

^(ABC)? *(D(?= *E *F))? *(E(?= *F))? *(F) *(G)?

My sample data:

== Matches ==
ABC F G
ABC E F G
ABC D E F G
D E F G
D E F
E F

== Won't match ==
D F
D E
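
A quick way to check the simplified pattern against the sample lines is a small script. The following is a minimal sketch in Python, used here purely for illustration (the thread itself targets Xojo and RegExRX); the helper name `parse` is hypothetical. It also shows that an optional group that did not take part in the match still occupies its slot and comes back empty (`None` in Python).

```python
import re

# Kem's simplified test pattern, copied as posted.
PATTERN = re.compile(r"^(ABC)? *(D(?= *E *F))? *(E(?= *F))? *(F) *(G)?")


def parse(line):
    """Return the five captured groups as a tuple, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None


# Print the captured groups (or None) for each sample line.
samples = ["ABC F G", "ABC E F G", "ABC D E F G", "D E F G", "D E F", "E F", "F G", "D F", "D E"]
for line in samples:
    print(f"{line!r:15} -> {parse(line)}")
```

Adding the lines that must NOT match to `samples` makes regressions easy to spot when the pattern changes.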

Kem, you are a magician. A few things match that I don’t want: D F and ABC D/E/G. How do I fix that? And F G does not match.
OK, here is a full sample line with all the parts:

[code]ABC = Calendar
D = Day
E = Month
F = Year
G = B.C.

@#DGREGORIAN@ 24 MAR 1748 B.C.[/code]

I’ve done my best to make the structure of the date value clear. Please have a look.

[h]Structure of Date[/h]

[Calendar] [Day] [Month] [Year][B.C.]

valid Before Christ
B.C. - Pattern: (B.C.)?

valid Year
Can look like 1748 or 1748/49 - Pattern: (.*) -> will get a separate RegEx in a later method

valid Month (optional)
JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC,
NSN, IYR, SVN, TMZ, AAV, ELL, TSH, CSH, KSL, TVT, SHV, ADR, ADS,
VEND, BRUM, FRIM, NIVO, PLUV, VENT, GERM, FLOR, PRAI, MESS, THER, FRUC, COMP

valid Day (optional)
1…31 - Pattern: (\d*)?

valid Calendar (optional)
@#DGREGORIAN@, @#DJULIAN@, @#DHEBREW@, @#DFRENCH R@ - Pattern: (?:@#D(.*)@)?

[h]Matches[/h]

[code]1748/49 // Year
1748B.C. // Year, B.C.
JAN 1748 // Month, Year
1 JAN 1748 // Day, Month, Year

@#DJULIAN@ 1748 // Calendar, Year
@#DJULIAN@ 1748B.C. // Calendar, Year, B.C.
@#DJULIAN@ JAN 1748 // Calendar, Month, Year
@#DJULIAN@ 1 JAN 1748 // Calendar, Day, Month, Year.[/code]

[h]Won’t match[/h]

1 JAN // Day, Month
1 1748 // Day, Year
1 B.C. // Day, B.C.
JAN B.C. // Month, B.C.
@#DJULIAN@ 1 // Calendar, Day
@#DJULIAN@ JAN // Calendar, Month
@#DJULIAN@ B.C. // Calendar, B.C.
@#DJULIAN@ 1 JAN // Calendar, Day, Month
@#DJULIAN@ 1 1748 // Calendar, Day, Year
@#DJULIAN@ 1 JANB.C. // Calendar, Day, Month, B.C.
@#DJULIAN@ 1 1748B.C. // Calendar, Day, Year, B.C.
@#DJULIAN@ 1 JAN 1748B.C. // Calendar, Day, Month, Year, B.C.
@#DJULIAN@ JANB.C. // Calendar, Month, B.C.

Thanks for your Time :slight_smile:

Are these ones it SHOULD match but doesn’t?
Or ones it should not match?
That’s not clear to me.
The reason I ask is that some appear to be valid (1 B.C.) and some not (1 JAN 1748 B.C. in a Julian calendar).

Secondarily, it may be better to use a regex just to grab the chunks and then write a small if/then/else statement to validate, rather than trying to have the regex do the validation.
Trying to write a validating parser using only regexes is … well … problematic at best.
It’s like trying to have a regex validate a Xojo method.
I wouldn’t use it that way.
You can get regex to do a lot, but there are things it’s not well suited to.
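
The "grab the chunks, then validate in code" idea could look roughly like the sketch below. It is written in Python for illustration only, and every name in it (`CHUNKS`, `problems`, the error strings) is hypothetical. The rules it checks are one reading of the thread: the year is required, a day needs a month, and calendar and month must come from the posted lists. The regex is deliberately permissive, so it only splits the text and all decisions live in plain `if` statements.

```python
import re

MONTHS = set(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC "
    "NSN IYR SVN TMZ AAV ELL TSH CSH KSL TVT SHV ADR ADS "
    "VEND BRUM FRIM NIVO PLUV VENT GERM FLOR PRAI MESS THER FRUC COMP".split()
)
CALENDARS = {"GREGORIAN", "JULIAN", "HEBREW", "FRENCH R"}

# Every chunk is optional here; the regex splits, the code below validates.
CHUNKS = re.compile(
    r"^(?:@#D(?P<cal>[^@]*)@)?\s*"
    r"(?P<day>\d{1,2}(?!\d))?\s*"
    r"(?P<month>[A-Z]{3,4})?\s*"
    r"(?P<year>\d{4}(?:/\d{2})?)?\s*"
    r"(?P<bc>B\.C\.)?$"
)


def problems(text):
    """Return a list of rule violations; an empty list means the date is acceptable."""
    m = CHUNKS.match(text)
    if m is None:
        return ["unrecognised layout"]
    cal, day, month, year, bc = m.group("cal", "day", "month", "year", "bc")
    found = []
    if cal is not None and cal not in CALENDARS:
        found.append("unknown calendar")
    if year is None:
        found.append("year is required")
    if month is not None and month not in MONTHS:
        found.append("unknown month")
    if day is not None and month is None:
        found.append("day requires a month")
    if day is not None and not 1 <= int(day) <= 31:
        found.append("day out of range")
    return found


for sample in ["1 JAN 1748", "1 1748", "JAN B.C.", "@#DFRENCH R@ 1 VEND 1748"]:
    print(f"{sample!r}: {problems(sample) or 'ok'}")
```

One benefit of this split is that changing a rule (for example, allowing a day without a month for some calendar) means editing a single `if` statement rather than reworking lookaheads.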

* I’ll duck now before Kem tries to hit me *

I agree with Norman. Sometimes a regex should be one of many steps.

I interpreted the second part as stuff that should not match, so I wrote the pattern that way. I’m a bit confused as to why @#DJULIAN@ 1 JAN 1748B.C. was included among those though.

So I came up with this. Hopefully it’s self-explanatory.

(?x)
(?(DEFINE)
	(?<cal> (?:@\#D[a-z]+@))
	(?<day>\d{1,2})
	(?<month>JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC | NSN | IYR | SVN | TMZ | AAV | ELL | TSH | CSH | KSL | TVT | SHV | ADR | ADS | VEND | BRUM | FRIM | NIVO | PLUV | VENT | GERM | FLOR | PRAI | MESS | THER | FRUC | COMP)
	(?<year>\d{4}(?:/\d{2})?)
	(?<bc>B\.C\.)
)

^
# calendar (optional)
((?&cal) \x20)?

# day but only if followed by month and year
((?&day) \x20 (?=(?&month) \x20 (?&year)))?

# month but only if followed by year
((?&month) \x20 (?=(?&year)))?

# year
((?&year))

# B.C.
(\x20? (?&bc))?
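
For readers who want to try the same structure outside a PCRE engine, here is a rough Python equivalent. Python's `re` module has no `(?(DEFINE)...)` or `(?&name)`, so this sketch swaps in ordinary string fragments for the subroutines and uses named groups for the real captures. The function name `parse_date` is hypothetical, and the calendar fragment is widened to `[A-Z ]+` (an assumption, not Kem's original) so that `@#DFRENCH R@` fits.

```python
import re

# Reusable fragments, playing the role of the DEFINE block.
CAL = r"@\#D[A-Z ]+@"  # assumption: a space is allowed so "@#DFRENCH R@" fits
DAY = r"\d{1,2}"
MONTH = "(?:" + "|".join(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC "
    "NSN IYR SVN TMZ AAV ELL TSH CSH KSL TVT SHV ADR ADS "
    "VEND BRUM FRIM NIVO PLUV VENT GERM FLOR PRAI MESS THER FRUC COMP".split()
) + ")"
YEAR = r"\d{4}(?:/\d{2})?"
BC = r"B\.C\."

DATE = re.compile(rf"""
    ^
    (?: (?P<cal>{CAL}) \x20 )?                             # calendar (optional)
    (?: (?P<day>{DAY}) \x20 (?= {MONTH} \x20 {YEAR} ) )?   # day, only before month and year
    (?: (?P<month>{MONTH}) \x20 (?= {YEAR} ) )?            # month, only before year
    (?P<year>{YEAR})                                       # year (required)
    (?: \x20? (?P<bc>{BC}) )?                              # B.C. (optional)
""", re.VERBOSE)


def parse_date(text):
    """Return a dict of the named parts, or None if the text does not match."""
    m = DATE.match(text)
    return m.groupdict() if m else None


print(parse_date("@#DGREGORIAN@ 24 MAR 1748 B.C."))
print(parse_date("@#DJULIAN@ 1 1748"))
```

Because the captures are named, the calling code does not depend on how many groups appear earlier in the pattern.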

I’d also consider these legal
1 B.C. // Day, B.C. <<<< year 1 BC

If you use Julian as “strictly increasing day number from the date of introduction”, then day 1 has a meaning.
If you use Julian to mean just numbering days successively through the year, then 1 1748 is Jan 1 1748 and well defined.
Depending on how you use “julian calendar”, either of these should be legal.

@#DJULIAN@ 1 // Calendar, Day <<<< Julian calendar day 1 is legal (see above)
@#DJULIAN@ 1 1748 // Calendar, Day, Year <<<<< OR this is legal (see above)

Heho, thanks for your Wirk. I copied and paste it Kems to RegEsRx. It’s mit matching anstaunt. Why.
B.C. Is only Lloyd

Wondering if that last message was a product of auto-correct run amok or a liquid lunch… :wink:

Most of it I get but [quote=243813:@Martin Trippensee]B.C. Is only Lloyd[/quote]

I don’t recall anyone named Lloyd in BC

Thanks to both of you. Now I get the matches within RegExRX, but I would love to match everything, including Nil matches.
I should never answer by phone (autocorrect, ahhhrrrrr) :smiley:
Please have a look:

Why don’t I get the matches from $1 to $5, but only from $6 to $10?
I generated the Xojo code via RegExRX and then tried to output the matched strings via Match.SubExpressionString(1) (one to five), but I don’t see any results. Could you please create a Xojo demo project?
Thanks again. Greetings

The screen shot tells you what’s going on. The DEFINE block sets up subroutines, and each subroutine counts as a subgroup, even though it isn’t meant to match anything. It’s strange, I know, but that’s how it works. So subgroup 1 ($1) is where “cal” is defined, and so on. Your actual matches start at subgroup 6 ($6) so you should write your code accordingly.

dim cal as string = match.SubExpressionString( 6 )
dim day as integer = match.SubExpressionString( 7 ).Val
// and so on
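
The counting rule behind this (groups are numbered by their opening parenthesis, whether or not they ever match) can be illustrated in a few lines. This is only an analogy in Python, whose `re` module has no DEFINE block: an optional branch guarded by `(?!)`, which always fails, stands in for the definitions. The pattern and names are made up for the demonstration.

```python
import re

# Stand-in for a DEFINE block: "(?!)" always fails, so this optional branch is
# never taken, yet the two groups inside it still receive numbers 1 and 2.
PATTERN = re.compile(
    r"(?:(?!)(\d{1,2})(\d{4}))?"          # groups 1-2: defined, never matched
    r"(?P<day>\d{1,2}) (?P<year>\d{4})"   # groups 3-4: the real captures
)

m = PATTERN.match("24 1748")
print(m.groups())                       # positional view, including the unused groups
print(dict(PATTERN.groupindex))         # name -> group number
print(m.group("day"), m.group("year"))  # lookup by name, independent of position
```

The real captures land at positions 3 and 4 here, just as they land at 6 and up in the pattern above.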