I have the following RegEx pattern, which works well except in one situation. If the string is a three- or four-digit number (e.g. the year “1800”), the result is wrong: 1800 matches as day (18) and year2 (00). The day should only match if it is followed by a month and a year. A year like 1800 should only match if the given string has a one- to four-digit year number, or a year with a second year like this: 1800/01. How do I get the right results?
Maybe this is a topic for @Kem Tekinay
[code]
(?(DEFINE)
(?<calendar>@\#D[A-Z\s+]+@)
(?<day>\d{1,2})
(?<month>JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC|VEND|BRUM|FRIM|NIVO|PLUV|VENT|GERM|FLOR|PRAI|MESS|THER|FRUC|COMP|TSH|CSH|KSL|TVT|SHV|ADR|ADS|NSN|IYR|SVN|TMZ|AAV|ELL)
(?<year>\d{1,4})
(?<year2>\d{2})
(?<bc>B\.C\.|BC)
)
^
((?&calendar))?\s* # Calendar (optional)
((?&day))? \s*(?=(?&month)\s*(?&year))? # Day, only if followed by Month & Year
((?&month))? \s*(?=(?&year)) # Month, only if followed by Year
((?&year)) \s*(?:/((?&year2))\s*)? # Year (must), second Year (optional)
((?&bc))? # B.C. (optional)
$
[/code]
[code]
(?xmi-Us)
(?(DEFINE)
(?<calendar>@\#D[A-Z\x20+]+@)
(?<month> JAN|FEB|MAR|APR|MAY|JUN|JUL|
AUG|SEP|OCT|NOV|DEC|VEND|BRUM|
FRIM|NIVO|PLUV|VENT|GERM|FLOR|
PRAI|MESS|THER|FRUC|COMP|TSH|
CSH|KSL|TVT|SHV|ADR|ADS|NSN|IYR|
SVN|TMZ|AAV|ELL)
(?<day>\d{1,2})
(?<year>\d{1,4})
(?<year2>\d{2})
(?<bc>B\.C\.|BC)
)
^
((?&calendar))? \x20* # Calendar (optional)
(?'mday'(?&day) (?=\x20+ (?&month) \x20+ (?&year)))? \x20* # Day, only if followed by Month & Year
(?'mmonth'(?&month) (?=\x20+ (?&year)) )? \x20* # Month, only if followed by Year
(?'myear'(?&year)) \x20* (?:/\d{2}\x20*)? # Year (must), second Year (optional)
(?'mbc'(?&bc))? # B.C. (optional)
$
[/code]
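For anyone who wants to try the idea outside of PCRE, here is a minimal sketch in Python. It is not the pattern above: Python's standard `re` module has no `(?(DEFINE)...)` or `(?&name)`, so the subpatterns are simply interpolated as strings, the month list is shortened, and the calendar and B.C. parts are left out. The names `DATE` and `parse` are made up for this illustration. The point it shows is that the lookahead sits inside the optional day group, so the day can only match when a month and a year really follow.

```python
import re

# Assumption: shortened month list, for illustration only.
MONTH = r"(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)"
DAY = r"\d{1,2}"
YEAR = r"\d{1,4}"

# The lookahead is part of the optional day group. If no month and year
# follow, the whole group is skipped and the digits are left for the year.
DATE = re.compile(
    rf"^(?P<mday>{DAY}(?=\x20+{MONTH}\x20+{YEAR}))?\x20*"
    rf"(?P<mmonth>{MONTH}(?=\x20+{YEAR}))?\x20*"
    rf"(?P<myear>{YEAR})$",
    re.IGNORECASE,
)

def parse(text):
    """Return the named groups as a dict, or None if the text does not match."""
    m = DATE.match(text)
    return m.groupdict() if m else None

print(parse("1800"))        # day and month stay unmatched, year is "1800"
print(parse("1 MAY 1800"))  # day "1", month "MAY", year "1800"
```

With this arrangement "1800" can no longer be split into a day and a two-digit rest, because "18" is never followed by a space, a month and a year.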
Thank you, @Kem Tekinay. Your pattern worked better, but not perfectly. I modified it to this:
[code]
(?(DEFINE)
(?<calendar>@\#D[A-Z\x20+]+@)
(?<month> JAN|FEB|MAR|APR|MAY|JUN|JUL|
AUG|SEP|OCT|NOV|DEC|VEND|BRUM|
FRIM|NIVO|PLUV|VENT|GERM|FLOR|
PRAI|MESS|THER|FRUC|COMP|TSH|
CSH|KSL|TVT|SHV|ADR|ADS|NSN|IYR|
SVN|TMZ|AAV|ELL)
(?<day>\d{1,2})
(?<year>\d{1,4})
(?<year2>\d{2})
(?<bc>B\.C\.|BC)
)
^
(?'mcalendar'(?&calendar))? \x20* # Calendar (optional)
(?'mday'(?&day) (?=\x20+ (?&month) \x20+ (?&year)))? \x20* # Day, only if followed by Month & Year
(?'mmonth'(?&month) (?=\x20+ (?&year)) )? \x20* # Month, only if followed by Year
(?'myear'(?&year)) \x20* (?'myear2'(?:/\d{2})\x20*)? # Year (must), second Year (optional)
(?'mbc'(?&bc))? # B.C. (optional)
$
[/code]
Success:
1800
1800 B.C.
MAY 1800
1 MAY 1800
1 MAY 1800 B.C.
Failure (these also match, but the second-year match always includes the “/” at the beginning):
Yes, that’s what the pattern says. Try this instead:
[code]
(?xmi-Us)
(?(DEFINE)
(?<calendar>@\#D[A-Z\x20+]+@)
(?<month> JAN|FEB|MAR|APR|MAY|JUN|JUL|
AUG|SEP|OCT|NOV|DEC|VEND|BRUM|
FRIM|NIVO|PLUV|VENT|GERM|FLOR|
PRAI|MESS|THER|FRUC|COMP|TSH|
CSH|KSL|TVT|SHV|ADR|ADS|NSN|IYR|
SVN|TMZ|AAV|ELL)
(?<day>\d{1,2})
(?<year>\d{1,4})
(?<year2>\d{2})
(?<bc>B\.C\.|BC)
)
^
(?'mcalendar'(?&calendar))? \x20* # Calendar (optional)
(?'mday'(?&day) (?=\x20+ (?&month) \x20+ (?&year)))? \x20* # Day, only if followed by Month & Year
(?'mmonth'(?&month) (?=\x20+ (?&year)) )? \x20* # Month, only if followed by Year
(?'myear'(?&year)) \x20* (?:/(?'myear2'\d{2})\x20*)? # Year (must), second Year (optional)
(?'mbc'(?&bc))? # B.C. (optional)
$
[/code]
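To see why the “/” ended up in the second-year match, here is a small Python sketch of just the year part. It is only an illustration under assumptions: it uses Python's standard `re` module rather than PCRE, interpolates the subpatterns as strings instead of using `(?(DEFINE)...)`, and the names `SLASH_INSIDE`, `SLASH_OUTSIDE` and `second_year` are made up here. The only difference between the two patterns is whether the named group wraps the slash or sits behind it inside a non-capturing group.

```python
import re

YEAR = r"\d{1,4}"
YEAR2 = r"\d{2}"

# The named group wraps the slash, so the capture includes "/".
SLASH_INSIDE = re.compile(rf"^(?P<myear>{YEAR})\x20*(?P<myear2>/{YEAR2})?$")

# The slash lives in a non-capturing group; only the digits are captured.
SLASH_OUTSIDE = re.compile(rf"^(?P<myear>{YEAR})\x20*(?:/(?P<myear2>{YEAR2}))?$")

def second_year(pattern, text):
    """Return the 'myear2' capture, or None if it (or the whole text) did not match."""
    m = pattern.match(text)
    return m.group("myear2") if m else None

print(second_year(SLASH_INSIDE, "1800/01"))   # /01
print(second_year(SLASH_OUTSIDE, "1800/01"))  # 01
print(second_year(SLASH_OUTSIDE, "1800"))     # None
```

Whatever is inside the parentheses of a capturing group is what you get back, so anything you want to require but not capture has to sit outside of it.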