RegEx Pattern wanted: Apostrophe

Hi,

I'm once again looking for a correct RegEx pattern. I have the following sample string and pattern:

String: "I'm doing the loop, which probably has to be recursive." Pattern: "(\\s*\\S\\w*(?:[:.!?,;\\-\\_\\\\]+)?\\s*)"
It works fine, but I also want to match "I’m " as one match. How can I do that? And is the pattern Unicode-safe?
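To make the question concrete, here is a minimal sketch in Python rather than Xojo, so engine details may differ. It assumes the doubled backslashes in the posted pattern stand for single ones, and the helper name `pieces` is made up for illustration. It shows where the original pattern splits "I'm", and checks one non-ASCII input.

```python
import re

# The original pattern from the post, with the doubled backslashes read as
# single ones (an assumption about how the post was escaped).
ORIGINAL = re.compile(r"(\s*\S\w*(?:[:.!?,;\-\_\\]+)?\s*)")

SAMPLE = "I'm doing the loop, which probably has to be recursive."


def pieces(pattern, text):
    """Return every non-overlapping match of pattern in text, in order."""
    return [m.group(0) for m in pattern.finditer(text)]


parts = pieces(ORIGINAL, SAMPLE)
print(parts)

# \S happily starts a new piece at the apostrophe, which is why "I'm "
# comes back as two pieces instead of one.
print(parts[:2])

# In Python 3, \w on str patterns is Unicode-aware by default.
print(pieces(ORIGINAL, "Über größer"))
```

In Python 3, `\w` matches letters such as "Ü" and "ß" by default. PCRE-based engines only do so when Unicode character properties are enabled, so whether the pattern is Unicode-safe in Xojo depends on how its engine is configured.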

Are you trying to match any word including any trailing selected punctuation and whitespace? And can what constitutes a “word” include a single apostrophe?

I'm trying to find all words with their 1…* trailing spaces. However, as soon as a word is followed by a dot, exclamation mark, question mark, colon, comma, semicolon, hyphen or backslash, a new match should start after it. The characters just listed still belong to the previous match. An apostrophe inside a word should also be matched.

I crafted this based on your description, not the original pattern:

(?:\\w+')?\\w+[.!?:,;\\-\\\\]*\\x20*

Thanks Kem. It’s not exactly matching everything. Please take a look:

The pattern I gave you would only match an apostrophe within the word, but can it be at the end of the word too? That’s going to be tricky if you have quoted text. For example:

‘Away we go’ <-- go’ would match based on this criteria

Is that a concern?

Not every perversion needs to be treated with regex. How about you do a split(mytext, " ")? The split won’t get you 100% of what you need. But you can then do a much simpler regex on the result.

Dear Beatrix,
it is not a perversion; there is a reason why we need such a pattern. Splitting on spaces is out of the question for us (as described above) because we need every single space and don’t want to change the original text. But we need the individual parts.

Yes Kem, there could also be apostrophes at the end, like „Kem‘s“ or like the „Lorem‘“ in some languages.

So this?

(?:\\w+'\\w*|\\w+)[.!?:,;\\-\\\\]*\\x20*

Nice Kem, thanks. For now it’s working fine. Just found out that if the very first character of the string is „‘“ it won‘t match. But I can live with that at the moment.
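For reference, a small Python sketch of how the pattern above behaves, including the leading-apostrophe case just mentioned. It assumes the doubled backslashes stand for single ones and uses a straight ASCII apostrophe; `KEM_V2` and `pieces` are names made up for illustration, and Xojo's engine may behave differently.

```python
import re

# Kem's second pattern: a word with an optional inner or trailing apostrophe,
# then any of the listed punctuation characters, then trailing spaces.
KEM_V2 = re.compile(r"(?:\w+'\w*|\w+)[.!?:,;\-\\]*\x20*")


def pieces(pattern, text):
    """Return every non-overlapping match of pattern in text, in order."""
    return [m.group(0) for m in pattern.finditer(text)]


sample = "I'm doing the loop, which probably has to be recursive."
print(pieces(KEM_V2, sample))

# A trailing apostrophe stays attached to its word.
print(pieces(KEM_V2, "Lorem' ipsum"))

# A leading apostrophe is skipped, because both alternatives must start
# with at least one word character.
print(pieces(KEM_V2, "'Away we go'"))
```

On the sample string the pieces join back into the original text. In the quoted example the leading apostrophe is lost, so the joined pieces no longer reproduce the input.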

How about

\\b[\\w']+[ .!?:,;\\-\\\\]*

?

[quote=433183:@Kem Tekinay]So this?

(?:\\w+'\\w*|\\w+)[.!?:,;\\-\\\\]*\\x20* [/quote]
I made a small modification, and now it works fine. Thanks again.

(?:\\w*'\\w*|\\w*)[.!?:,;\\-\\\\]*\\x20*

\\b[\\w']+[ .!?:,;\\-\\\\]*?

Just for the record, the ? should have been part of the code. Glad you got what you needed from Kem and your own machinations. Not sure of the status of \b when using Xojo.
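As a rough check of the last two patterns in the thread, here is a Python sketch, not Xojo, so results there are untested. It assumes single backslashes; the names `MODIFIED`, `GREEDY`, `LAZY` and `pieces` are made up for illustration. It shows two things worth knowing: the modified pattern can match an empty string because every part of it is optional, and a trailing `*?` makes the punctuation class lazy.

```python
import re

# The modified pattern: \w* instead of \w+ lets a piece start with an apostrophe.
MODIFIED = re.compile(r"(?:\w*'\w*|\w*)[.!?:,;\-\\]*\x20*")

# The word-boundary pattern, without and with the trailing "?".
GREEDY = re.compile(r"\b[\w']+[ .!?:,;\-\\]*")
LAZY = re.compile(r"\b[\w']+[ .!?:,;\-\\]*?")


def pieces(pattern, text):
    """Return every non-overlapping match of pattern in text, in order."""
    return [m.group(0) for m in pattern.finditer(text)]


quoted = "'Away we go'"
all_pieces = pieces(MODIFIED, quoted)
print(all_pieces)

# Everything in MODIFIED is optional, so zero-length matches can appear;
# dropping them leaves the real pieces.
non_empty = [p for p in all_pieces if p]
print(non_empty)

short = "the loop, which"
print(pieces(GREEDY, short))

# With "*?" the trailing class matches as little as possible, which at the
# end of a pattern means nothing at all.
print(pieces(LAZY, short))
```

If the trailing punctuation and spaces should stay attached to each piece, as described earlier in the thread, the greedy form is the one that does so in this sketch. With the modified pattern, empty matches need to be skipped when looping over the results.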