Regex problem

I've got a little regex problem. My app has a log that shows what was done. I want to use a bit of regex to correct the grammar when only one thing was done.

“xxx mail/mails was/were archived”

should become

“xxx mail was archived” with xxx = 1

and

“xxx mails were archived” with xxx > 1

This works nicely with a simple regex of “(\w*)/(\w*)” and replacement patterns of \1 and \2.

Enter Unicode with “xxx Anhang/Anhänge wurde/wurden archiviert”, where the ä isn’t matched by \w. Is there something like \w that works with Unicode characters?

What alternatives could work as a regex? “(\s)(.*)/(.*)(\s)” doesn’t work because the two matches can come directly after each other. Then I thought I could do the replacement in a loop, but this also fails for two matches because the second match includes the first one (it matches " mails was/were " instead of " was/were").
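
To make the failure concrete, here is a trace of the loop for theNumber > 1. It assumes the pattern is (\s)(.*)/(.*)(\s) with greedy matching switched off, and the input string is made up for illustration:

[code]
// start:  "5 mail/mails was/were archived"
// pass 1: matches " mail/mails "      -> "5 mails was/were archived"
// pass 2: matches " mails was/were "  -> "5 were archived"
// pass 3: no "/" left, Search returns Nil and the loop ends
[/code]

In pass 2 the match starts at the first whitespace it finds, so the non-greedy (.*) has to stretch across "mails was" to reach the slash.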

Any idea?

Latest code:

[code]dim theRegex as new RegEx
theRegex.options.greedy = False
theRegex.searchPattern = "(\s)(.*)/(.*)(\s)"
if theNumber = 1 then
  theRegex.replacementPattern = "\1\2\4"
elseif theNumber > 1 then
  theRegex.replacementPattern = "\1\3\4"
end if
dim theRegexMatch as RegExMatch = theRegex.Search(theText)

while theRegexMatch <> Nil
  theText = theRegex.Replace(theText)
  theRegexMatch = theRegex.Search(theText)
wend

Return theText[/code]

First thought is, can you have it written to the log correctly in the first place rather than going back and correcting it?

Second thought, do you need to rearrange the text or can you just create new text based on the number?

pseudocode:

[code]dim logText as String // named logText to avoid clashing with the built-in Log function

if messageCount = 1 then
  if language = "german" then
    logText = " Anhang wurde archiviert"
  elseif language = "english" then
    logText = " mail was archived"
  end if
else
  if language = "german" then
    logText = " Anhänge wurden archiviert"
  elseif language = "english" then
    logText = " mails were archived"
  end if
end if

// Prefix the count, e.g. "3 mails were archived"
logText = str(messageCount) + logText
[/code]

There are Unicode-aware tokens that will help you here. For example, if you were only concerned about letters, this would do the trick:

SearchPattern = "(\pL+)/(\pL+)" // Just letters

If you wanted to emulate \w, then:

SearchPattern = "([\pL\pN_]+)/([\pL\pN_]+)" // Letters, numbers, underscore

If you use RegExRX, take a look at the “Search Pattern” popup menu under “Unicode” and “Unicode Scripts & Properties”.

Edit: I forgot the trailing “+” in the last pattern.
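
Putting this together with the code from the question, a minimal, untested sketch might look like the following. It assumes theText and theNumber are the variables from the original post, and it swaps the search loop for Options.ReplaceAllMatches, which is my substitution rather than something the original code used:

[code]
dim theRegex as new RegEx
theRegex.SearchPattern = "(\pL+)/(\pL+)" // letters only on both sides of the slash
theRegex.Options.ReplaceAllMatches = True // handle every singular/plural pair in one pass

if theNumber = 1 then
  theRegex.ReplacementPattern = "\1" // keep the singular form
else
  theRegex.ReplacementPattern = "\2" // keep the plural form
end if

theText = theRegex.Replace(theText)
[/code]

Because each group can only match a run of letters, a match cannot stretch across a space into the neighbouring word, so “Anhang/Anhänge” and “wurde/wurden” should each be replaced on their own.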

@Scott: changing the API would be a major effort. All my localization constants are in the individual classes.

@Kem: thanks!