I’d like to be able to detect whether a character is a punctuation mark, ideally in a Unicode-savvy fashion so that smart quotes are recognized as punctuation. I can hard-code comparisons to code points, but I suspect this can be done more elegantly with a regex. \W looks like it might work, but it is ASCII only. Is there a regex guru out there who can point me to a better expression? Thanks.
\W works here. This code matches every character in the source text:
dim sourceText as string = """"
dim rx as new RegEx
rx.SearchPattern = "\W" // any non-word character
dim match as RegExMatch = rx.Search( sourceText )
while match <> nil
  AddToResult match.SubExpressionString( 0 ) // the full text of this match
  match = rx.Search // continue from the end of the previous match
wend
Thanks. But that picks up non-punctuation characters too, such as a space and an accented e.
Right. If you want to match only punctuation, you will have to create your own set. You can start with the POSIX class [:punct:] and add the Unicode characters you need to it. Something like this:
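For anyone following along without Xojo at hand, the "start with [:punct:] and add to it" idea can be sketched in Python. This is only an illustration, not the pattern from the post above. Python's re module has no POSIX classes, so string.punctuation (the 32 ASCII punctuation characters of the C locale) stands in for [:punct:]. The names EXTRA_PUNCT, PUNCT_RX and find_punctuation are hypothetical, and the extra characters are just an example selection.

```python
import re
import string

# ASCII punctuation: stands in for POSIX [:punct:] (assumption: C locale).
ASCII_PUNCT = string.punctuation

# Example extras (hypothetical selection): curly single and double quotes,
# en dash, em dash, horizontal ellipsis. Extend this to suit your text.
EXTRA_PUNCT = "\u2018\u2019\u201C\u201D\u2013\u2014\u2026"

# re.escape keeps characters such as ], \, ^ and - literal inside the class.
PUNCT_RX = re.compile("[" + re.escape(ASCII_PUNCT + EXTRA_PUNCT) + "]")


def find_punctuation(text):
    """Return every character in text that belongs to the custom set."""
    return PUNCT_RX.findall(text)
```

Unlike \W, this set skips spaces and accented letters, but it only knows about the characters you list, so anything outside the set goes unmatched.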
Got it. Thanks.
And if you switch to RegExMBS, you can use the Unicode-specific properties so you could do this:
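To show what a Unicode punctuation property covers, here is a rough Python sketch. It is not RegExMBS code. Python's standard re module does not support \p{...} properties, so this version asks unicodedata for the character's general category instead, on the assumption that "punctuation" means the P categories (Pc, Pd, Ps, Pe, Pi, Pf, Po). The function name is_punctuation is hypothetical.

```python
import unicodedata


def is_punctuation(ch):
    """True if ch is in a Unicode punctuation category (any category starting with P)."""
    return unicodedata.category(ch).startswith("P")
```

One difference from [:punct:] is worth knowing: characters such as $, +, < and ~ are classified as symbols (S categories) in Unicode, so a property-based test does not treat them as punctuation even though the POSIX class does.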
BTW, you can see that in action in RegExRX, since I used RegExMBS as its engine.