RegEx - conditionals?

Hi everyone,

I have the following RegEx pattern, which works well except in one situation. If the input is a three- or four-digit number (e.g. the year “1800”), the match is wrong: 1800 is matched as day (18) and year2 (00). The day should only match if it is followed by a month and a year. A year like 1800 should match whenever the string contains a one- to four-digit year, optionally followed by a second year, as in 1800/01. How can I get the right results?

Maybe this is a topic for @Kem Tekinay :)

[code]
(?(DEFINE)
(?<calendar>@\#D[A-Z\s+]+@)
(?<day>\d{1,2})
(?<month>JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC|VEND|BRUM|FRIM|NIVO|PLUV|VENT|GERM|FLOR|PRAI|MESS|THER|FRUC|COMP|TSH|CSH|KSL|TVT|SHV|ADR|ADS|NSN|IYR|SVN|TMZ|AAV|ELL)
(?<year>\d{1,4})
(?<year2>\d{2})
(?<bc>B\.C\.|BC)
)
^
((?&calendar))?\s* # Calendar (optional)
((?&day))? \s*(?=(?&month)\s*(?&year))? # Day, only if followed by Month & Year
((?&month))? \s*(?=(?&year)) # Month, only if followed by Year
((?&year)) \s*(?:/((?&year2))\s*)? # Year (must), second Year (optional)
((?&bc))? # B.C. (optional)
$
[/code]

Thanks.

Some sample text showing both success and failure would help.

Success (e.g.):

  • MAY 1800
  • MAY 1800/01
  • 1 MAY 1800
  • 1 MAY 1800/01
  • 1 MAY 1800/01 B.C.
  • 1 MAY 1800 B.C.
  • @#DGREGORIAN@ MAY 1800/01 B.C.

Failure (e.g., these should also match):

  • 1800
  • 1800 B.C.
  • 1800/01
  • 1800/01 B.C.
  • @#DGREGORIAN@ 1800/01 B.C.

Try this:

(?xmi-Us)
(?(DEFINE)
(?<calendar>@\#D[A-Z\x20+]+@)
(?<month>  JAN|FEB|MAR|APR|MAY|JUN|JUL|
           AUG|SEP|OCT|NOV|DEC|VEND|BRUM|
           FRIM|NIVO|PLUV|VENT|GERM|FLOR|
           PRAI|MESS|THER|FRUC|COMP|TSH|
           CSH|KSL|TVT|SHV|ADR|ADS|NSN|IYR|
           SVN|TMZ|AAV|ELL)
(?<day>\d{1,2})
(?<year>\d{1,4})
(?<year2>\d{2})
(?<bc>B\.C\.|BC)
)
^
((?&calendar))? \x20*                                    # Calendar (optional)
(?'mday'(?&day) (?=\x20+ (?&month) \x20+ (?&year)))?   \x20*  # Day, only if followed by Month & Year
(?'mmonth'(?&month) (?=\x20+ (?&year)) )? \x20*          # Month, only if followed by Year
(?'myear'(?&year))    \x20* (?:/\d{2}\x20*)?             # Year (must), second Year (optional)
(?'mbc'(?&bc))?                                          # B.C. (optional)
$
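
A reduced sketch of why moving the lookahead inside the optional group helps. It uses Python's standard `re` module instead of PCRE, purely for illustration: Python has no `(?(DEFINE)...)`, the month list is replaced by a stand-in `[A-Z]+`, and the names `UNCONSTRAINED`, `CONSTRAINED` and `parts` are hypothetical. With an unconstrained optional day, the day greedily takes the first two digits of `1800`. With the lookahead inside the group, the day can only be consumed when a month and a year follow.

```python
import re

# Simplified stand-ins, not the full pattern from the thread:
# the month alternation is reduced to [A-Z]+ for brevity.
UNCONSTRAINED = re.compile(r"^(?P<day>\d{1,2})?\x20*(?P<year>\d{1,4})$")
CONSTRAINED = re.compile(
    r"^(?:(?P<day>\d{1,2})(?=\x20+[A-Z]+\x20+\d{1,4}))?\x20*"
    r"(?:(?P<month>[A-Z]+)(?=\x20+\d{1,4}))?\x20*"
    r"(?P<year>\d{1,4})$"
)

def parts(pattern, text):
    """Return the named groups as a dict, or None if the text does not match."""
    m = pattern.match(text)
    return m.groupdict() if m else None

print(parts(UNCONSTRAINED, "1800"))      # day takes "18", year is left with "00"
print(parts(CONSTRAINED, "1800"))        # day is skipped, year is "1800"
print(parts(CONSTRAINED, "1 MAY 1800"))  # day "1", month "MAY", year "1800"
```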

Thank you, @Kem Tekinay. Your pattern worked better, but not perfectly. I modified it to this:

[code]
(?(DEFINE)
(?<calendar>@\#D[A-Z\x20+]+@)
(?<month>  JAN|FEB|MAR|APR|MAY|JUN|JUL|
AUG|SEP|OCT|NOV|DEC|VEND|BRUM|
FRIM|NIVO|PLUV|VENT|GERM|FLOR|
PRAI|MESS|THER|FRUC|COMP|TSH|
CSH|KSL|TVT|SHV|ADR|ADS|NSN|IYR|
SVN|TMZ|AAV|ELL)
(?<day>\d{1,2})
(?<year>\d{1,4})
(?<year2>\d{2})
(?<bc>B\.C\.|BC)
)
^
(?'mcalendar'(?&calendar))? \x20* # Calendar (optional)
(?'mday'(?&day) (?=\x20+ (?&month) \x20+ (?&year)))? \x20* # Day, only if followed by Month & Year
(?'mmonth'(?&month) (?=\x20+ (?&year)) )? \x20* # Month, only if followed by Year
(?'myear'(?&year)) \x20* (?'myear2'(?:/\d{2})\x20*)? # Year (must), second Year (optional)
(?'mbc'(?&bc))? # B.C. (optional)
$
[/code]

Success:

  • 1800
  • 1800 B.C.
  • MAY 1800
  • 1 MAY 1800
  • 1 MAY 1800 B.C.

Failure (these also match, but the second-year capture always includes the leading “/”):

  • 1800/01
  • 1800/01 B.C.
  • MAY 1800/01
  • 1 MAY 1800/01
  • 1 MAY 1800/01 B.C.
  • @#DGREGORIAN@ 1800/01 B.C.
  • @#DGREGORIAN@ MAY 1800/01 B.C.
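
The leading slash comes down to where the named group's parentheses sit relative to the `/`. A small sketch in Python's standard `re` module, used here only for illustration (the group syntax is `(?P<name>...)` rather than the `(?'name'...)` form above, and the variable names are hypothetical):

```python
import re

# "/" inside the named group: the capture includes the slash.
slash_inside = re.compile(r"^(?P<myear>\d{1,4})(?P<myear2>/\d{2})?$")
# "/" outside the named group: only the two digits are captured.
slash_outside = re.compile(r"^(?P<myear>\d{1,4})(?:/(?P<myear2>\d{2}))?$")

print(slash_inside.match("1800/01").group("myear2"))   # prints /01
print(slash_outside.match("1800/01").group("myear2"))  # prints 01
```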

Yes, that’s what the pattern says. Try this instead:

(?xmi-Us)
(?(DEFINE)
(?<calendar>@\#D[A-Z\x20+]+@)
(?<month>  JAN|FEB|MAR|APR|MAY|JUN|JUL|
           AUG|SEP|OCT|NOV|DEC|VEND|BRUM|
           FRIM|NIVO|PLUV|VENT|GERM|FLOR|
           PRAI|MESS|THER|FRUC|COMP|TSH|
           CSH|KSL|TVT|SHV|ADR|ADS|NSN|IYR|
           SVN|TMZ|AAV|ELL)
(?<day>\d{1,2})
(?<year>\d{1,4})
(?<year2>\d{2})
(?<bc>B\.C\.|BC)
)
^
(?'mcalendar'(?&calendar))? \x20*                             # Calendar (optional)
(?'mday'(?&day) (?=\x20+ (?&month) \x20+ (?&year)))?   \x20*  # Day, only if followed by Month & Year
(?'mmonth'(?&month) (?=\x20+ (?&year)) )? \x20*               # Month, only if followed by Year
(?'myear'(?&year))    \x20* (?:/(?'myear2'\d{2})\x20*)?       # Year (must), second Year (optional)
(?'mbc'(?&bc))?                                               # B.C. (optional)
$
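
For readers without a PCRE engine at hand, the pattern above can be approximated with Python's standard `re` module. This is only a sketch under assumptions: `re` has no `(?(DEFINE)...)` or `(?&name)`, so the subpatterns are ordinary Python strings spliced into the pattern, named groups use `(?P<name>...)`, and the constant names and the `parse` helper are hypothetical.

```python
import re

# Subpatterns that the PCRE version keeps in (?(DEFINE)...).
CALENDAR = r"@\#D[A-Z\x20+]+@"
MONTH = (
    "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC|"
    "VEND|BRUM|FRIM|NIVO|PLUV|VENT|GERM|FLOR|PRAI|MESS|THER|FRUC|COMP|"
    "TSH|CSH|KSL|TVT|SHV|ADR|ADS|NSN|IYR|SVN|TMZ|AAV|ELL"
)
DAY = r"\d{1,2}"
YEAR = r"\d{1,4}"
YEAR2 = r"\d{2}"
BC = r"B\.C\.|BC"

DATE = re.compile(
    rf"""
    ^
    (?P<mcalendar>{CALENDAR})? \x20*                              # calendar (optional)
    (?P<mday>{DAY} (?=\x20+ (?:{MONTH}) \x20+ {YEAR}))? \x20*     # day, only before month and year
    (?P<mmonth>(?:{MONTH}) (?=\x20+ {YEAR}))? \x20*               # month, only before year
    (?P<myear>{YEAR}) \x20* (?:/(?P<myear2>{YEAR2}) \x20*)?       # year, optional second year
    (?P<mbc>{BC})?                                                # B.C. (optional)
    $
    """,
    re.VERBOSE | re.IGNORECASE | re.MULTILINE,
)

def parse(text):
    """Return the named captures as a dict, or None if the text does not match."""
    m = DATE.search(text)
    return m.groupdict() if m else None

for sample in ["1800", "1800/01 B.C.", "1 MAY 1800/01", "@#DGREGORIAN@ MAY 1800/01 B.C."]:
    print(sample, "->", parse(sample))
```

Running `parse` over the sample strings from this thread is a quick way to check that `mday` stays empty for a bare year and that `myear2` holds only the two digits.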

Perfect. Thanks again for your professional help.