RegEx Pattern wanted: Apostrophe

  1. 5 weeks ago

    Martin T

    Apr 17 Pre-Release Testers Germany

    Hi,

    I'm looking for a correct RegEx pattern again. I have the following sample string and pattern:

    String: "I'm doing the loop, which probably   has to be recursive."
    Pattern: "(\s*\S\w*(?:[:.!?,;\-\_\\]+)?\s*)"

    It works fine, but I also want to match "I'm " as one match. How can I do that? And is the pattern Unicode-safe?
    -image-

  2. Kem T

    Apr 17 Pre-Release Testers, Xojo Pro, XDC Speakers New York

    Are you trying to match any word, including any selected trailing punctuation and whitespace? And may a "word" include a single apostrophe?

  3. Martin T

    Apr 18 Pre-Release Testers Germany

    I am trying to find all words with 1..* trailing spaces. However, as soon as a word is followed by a dot, exclamation mark, question mark, colon, comma, semicolon, hyphen or backslash, a new match should begin after it. The characters just described still belong to the previous match. An apostrophe in a word should also be matched.

  4. Kem T

    Apr 18 Pre-Release Testers, Xojo Pro, XDC Speakers New York

    I crafted this based on your description, not the original pattern:

    (?:\w+')?\w+[.!?:,;\-\\]*\x20*
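    For readers who want to try this outside Xojo, here is a minimal sketch in Python. It assumes Python's re module behaves like Xojo's engine for these constructs, which has not been verified here. The names PATTERN_V1, matches_v1 and trailing are only illustrative.

```python
import re

# Pattern from the post above: an optional "word'" prefix, a word,
# optional trailing punctuation, then any number of plain spaces.
PATTERN_V1 = re.compile(r"(?:\w+')?\w+[.!?:,;\-\\]*\x20*")

# Sample string from the first post (note the three spaces after "probably").
sample = "I'm doing the loop, which probably   has to be recursive."

# findall returns the full match text because the only group is non-capturing.
matches_v1 = PATTERN_V1.findall(sample)
for m in matches_v1:
    print(repr(m))

# An apostrophe at the end of a word is not covered by this version:
# the apostrophe and the space after it are skipped.
trailing = PATTERN_V1.findall("Lorem' ipsum")
print(trailing)
```

    On the sample string "I'm " comes back as a single match, but a word ending in an apostrophe loses both the apostrophe and the following space.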

  5. Martin T

    Apr 18 Pre-Release Testers Germany

    Thanks Kem. It's not matching everything exactly. Please have a look:
    -image-

  6. Kem T

    Apr 18 Pre-Release Testers, Xojo Pro, XDC Speakers New York

    The pattern I gave you would only match an apostrophe within the word, but it may be at the end of the word too? That's going to be tricky if you have quoted text. For example:

    'Away we go' <-- go' would match based on these criteria

    Is that a concern?

  7. Beatrix W

    Apr 18 Pre-Release Testers Europe (Germany)

    Not every perversion needs to be treated with regex. How about you do a split(mytext, " ")? The split won't get you 100% of what you need. But you can then do a much simpler regex on the result.

  8. Martin T

    Apr 18 Pre-Release Testers Germany

    Dear Beatrix,
    it is not a perversion; there is a reason why we need such a pattern. Splitting on spaces is out of the question for us (as described above) because we need every single space and don't want to change the original text. But we need the individual parts.

    Yes Kem, there could also be apostrophes at the end, like „Kem‘s“ or „Lorem‘“ in some languages.

  9. Kem T

    Apr 18 Pre-Release Testers, Xojo Pro, XDC Speakers New York

    So this?

    (?:\w+'\w*|\w+)[.!?:,;\-\\]*\x20*
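    A quick way to check the "keep every single space" requirement is to join the matches and compare the result with the input. The sketch below does that in Python, on the assumption that its re module is close enough to Xojo's engine for this pattern. The helper name split_keep_spaces is hypothetical.

```python
import re

# Second pattern from the post above: either word + apostrophe + optional
# rest, or a plain word; then trailing punctuation and plain spaces.
PATTERN_V2 = re.compile(r"(?:\w+'\w*|\w+)[.!?:,;\-\\]*\x20*")


def split_keep_spaces(text):
    """Return all matches in order, each with its trailing spaces."""
    return PATTERN_V2.findall(text)


sample = "I'm doing the loop, which probably   has to be recursive."
parts = split_keep_spaces(sample)

# If nothing was skipped, the parts reassemble into the original text.
print("".join(parts) == sample)

# A trailing apostrophe now stays with its word.
print(split_keep_spaces("Lorem' ipsum"))

# Quoted text: the closing quote is taken as part of "go'",
# while the opening quote is skipped because a match must start with \w.
print(split_keep_spaces("'Away we go'"))
```

    The last call shows both edge cases discussed in this thread: go' is matched as one word, and an apostrophe at the very start of the string is left out.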

  10. Martin T

    Apr 19 Pre-Release Testers Germany

    Nice Kem, thanks. For now it’s working fine. I just found out that if the very first character of the string is „‘“, it won‘t match. But I can live with that at the moment.

  11. 4 weeks ago

    Robert L

    Apr 20 Federal Way, WA (Seattle Area)
    Edited 4 weeks ago

    How about

    \b[\w']+[ .!?:,;\-\\]*

    ?

  12. Martin T

    Apr 22 Pre-Release Testers Germany

    @Kem T So this?

    (?:\w+'\w*|\w+)[.!?:,;\-\\]*\x20*

    I made a small modification; now it works fine. Thanks again.

    (?:\w*'\w*|\w*)[.!?:,;\-\\]*\x20*
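    One thing to watch with the modified pattern: every part of it is now optional, so it can also produce zero-length matches. The Python sketch below filters those out. It assumes Python's re module as a stand-in for Xojo's engine, and the helper name tokens is hypothetical.

```python
import re

# Modified pattern from the post above: \w* instead of \w+, so a word
# may start with an apostrophe, but the whole pattern may match nothing.
PATTERN_V3 = re.compile(r"(?:\w*'\w*|\w*)[.!?:,;\-\\]*\x20*")


def tokens(text):
    """Return the non-empty matches, in order."""
    # Zero-length matches are possible because every part is optional.
    return [m.group(0) for m in PATTERN_V3.finditer(text) if m.group(0)]


quoted = "'Away we go'"
quoted_tokens = tokens(quoted)
print(quoted_tokens)

# The leading apostrophe is now kept, so the text reassembles exactly.
print("".join(quoted_tokens) == quoted)

# Only the ASCII apostrophe (U+0027) is listed in the pattern. A typographic
# quote such as U+2018 is skipped and splits the word in two.
print(tokens("Kem\u2018s"))
```

    In Python 3, \w matches Unicode letters for str patterns. Whether the same holds in Xojo depends on its engine settings, and any typographic apostrophes would still have to be added to the pattern explicitly.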

  13. Robert L

    Apr 23 Federal Way, WA (Seattle Area)

    \b[\w']+[ .!?:,;\-\\]*?

    Just for the record, the ? should have been part of the code. Glad you got what you needed from Kem and your own machinations. Not sure of the status of \b when using Xojo.
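    For comparison, here is a small Python sketch of the two variants, with and without the trailing ?. It assumes Python's re module, where \b is supported; its status in Xojo is not checked here. The names GREEDY and LAZY are only illustrative.

```python
import re

# Variant without the trailing "?": the punctuation/space class is greedy.
GREEDY = re.compile(r"\b[\w']+[ .!?:,;\-\\]*")

# Variant with the trailing "?": the same class becomes lazy. Nothing
# follows it in the pattern, so it matches as little as possible: nothing.
LAZY = re.compile(r"\b[\w']+[ .!?:,;\-\\]*?")

text = "the loop, which"

greedy_parts = GREEDY.findall(text)
lazy_parts = LAZY.findall(text)

print(greedy_parts)  # words with their trailing punctuation and spaces
print(lazy_parts)    # words only; punctuation and spaces are dropped
```

    In this engine, at least, the lazy form drops the trailing spaces and punctuation, so the greedy form is the one that keeps the original text intact.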
