New Regex/String Task

  1. Martin T

    10 Sep 2019 Pre-Release Testers Germany

    Hello,

    I'm looking for a RegEx pattern again. It should search a Unicode string for capital letters, whether the text is Russian, Arabic, or any other script, and split the string at each one. The source is always a single string with no spaces before or after it.

    Example: Dim s As String = "helloWorldisACoolThing"

    It should be split into: "hello", "Worldis", "A", "Cool", "Thing".

    How can you do that? Any ideas?

  2. Kem T

    10 Sep 2019 Pre-Release Testers, Xojo Pro, XDC Speakers, MVP Connecticut

    Coming up...

  3. Kem T

    10 Sep 2019 Pre-Release Testers, Xojo Pro, XDC Speakers, MVP Connecticut
    (?:^|\p{Lu})\p{Ll}*
  4. Kem T

    10 Sep 2019 Pre-Release Testers, Xojo Pro, XDC Speakers, MVP Connecticut

    \p matches based on Unicode properties. {Ll} is lowercase letters, {Lu} is uppercase. If you have RegExRX, you can see the complete list of scripts and properties.
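
    As a rough illustration of those two properties, here is a minimal sketch in Java, whose regex engine accepts the same \p{Lu} and \p{Ll} syntax. The class and method names are made up for this example, and the sample characters are written as Unicode escapes so the file compiles regardless of source encoding. Xojo's own RegEx class is not used here.

```java
import java.util.regex.Pattern;

public class UnicodeCaseClasses {
    // \p{Lu} = uppercase letter, \p{Ll} = lowercase letter (Unicode general categories).
    static final Pattern UPPER = Pattern.compile("\\p{Lu}");
    static final Pattern LOWER = Pattern.compile("\\p{Ll}");

    // Both helpers expect a one-character string.
    static boolean isUpper(String ch) { return UPPER.matcher(ch).matches(); }
    static boolean isLower(String ch) { return LOWER.matcher(ch).matches(); }

    public static void main(String[] args) {
        String[] samples = {
            "W", "w",   // Latin capital and small W
            "\u0416",   // Cyrillic capital ZHE
            "\u0436",   // Cyrillic small ZHE
            "\u0628"    // Arabic BEH: Arabic has no case, so it is neither Lu nor Ll
        };
        for (String ch : samples) {
            System.out.println(ch + " upper=" + isUpper(ch) + " lower=" + isLower(ch));
        }
    }
}
```

    Note that letters from scripts without case, such as Arabic, fall into neither class, so a pattern built only from \p{Lu} and \p{Ll} never splits inside such text.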

  5. Martin T

    10 Sep 2019 Pre-Release Testers Germany

    @Kem T (?:^|\p{Lu})\p{Ll}*

    It looks good for my sample, but if I change the source to "HhelloWorldisACoolThing", the pattern no longer works correctly. The same happens with "HHHhelloWorldisACoolThing".
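
    One likely reason, sketched here in Java rather than Xojo: at position 0 the first alternative, ^, succeeds without consuming anything, and \p{Ll}* then matches zero characters because the next character is uppercase. The first match is therefore empty. What happens after an empty match differs between regex engines, so this sketch (with a hypothetical class name) only inspects the first match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyFirstMatch {
    // The pattern quoted in the post above, unchanged.
    static final Pattern P = Pattern.compile("(?:^|\\p{Lu})\\p{Ll}*");

    // Returns only the first match, or null if nothing matches.
    static String firstMatch(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Brackets make an empty match visible in the output.
        System.out.println("[" + firstMatch("helloWorldisACoolThing") + "]");  // [hello]
        System.out.println("[" + firstMatch("HhelloWorldisACoolThing") + "]"); // []
    }
}
```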

  6. Kem T

    10 Sep 2019 Pre-Release Testers, Xojo Pro, XDC Speakers, MVP Connecticut
    (?:^\pL|\p{Lu})\p{Ll}*
  7. Kem T

    10 Sep 2019 Pre-Release Testers, Xojo Pro, XDC Speakers, MVP Connecticut

    At the start of a line, any letter will do.
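
    To see how that revised pattern behaves on the three sample strings from this thread, here is a small Java sketch. The class and method names are invented for the example, and Java's engine stands in for Xojo's, so treat the output as indicative rather than guaranteed for Xojo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapitalSplit {
    // Any letter at the very start, or an uppercase letter anywhere,
    // followed by a run of lowercase letters.
    static final Pattern P = Pattern.compile("(?:^\\pL|\\p{Lu})\\p{Ll}*");

    // Collects every match in order of appearance.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("helloWorldisACoolThing"));    // [hello, Worldis, A, Cool, Thing]
        System.out.println(split("HhelloWorldisACoolThing"));   // [Hhello, Worldis, A, Cool, Thing]
        System.out.println(split("HHHhelloWorldisACoolThing")); // [H, H, Hhello, Worldis, A, Cool, Thing]
    }
}
```

    Because the first alternative now consumes a letter, the first match can no longer be empty.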

  8. Martin T

    10 Sep 2019 Pre-Release Testers Germany

    Great, Kem, thank you so much. You should write a book about RegEx patterns :). BTW, do you think this is the fastest pattern we can get?

  9. Kem T

    10 Sep 2019 Pre-Release Testers, Xojo Pro, XDC Speakers, MVP Connecticut

    Considering your requirements, I think so.

  10. Martin T

    11 Sep 2019 Pre-Release Testers Germany

    Hi Kem,
    I have now tested the pattern in detail. Unfortunately, it ignores non-letter characters within the string, such as dots, commas, semicolons, hyphens, underscores, and digits (e.g. helloWorldisACoolThing1,;:^"&%). What is missing in the pattern?

  11. Kem T

    11 Sep 2019 Pre-Release Testers, Xojo Pro, XDC Speakers, MVP Answer Connecticut

    The pattern only looks for symbols that would be considered letters, and nothing else. If you only care about uppercase letters as a delimiter...

    (?:^.|\p{Lu})\P{Lu}*
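
    A minimal Java sketch of that pattern, run against the string with trailing digits and punctuation from the previous post. The class and method names are hypothetical, and Java's engine is used as a stand-in for Xojo's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UppercaseDelimiterSplit {
    // Any character at the very start, or an uppercase letter anywhere,
    // followed by a run of anything that is NOT an uppercase letter.
    static final Pattern P = Pattern.compile("(?:^.|\\p{Lu})\\P{Lu}*");

    // Collects every match in order of appearance.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        // Digits and punctuation stay attached to the preceding chunk.
        for (String part : split("helloWorldisACoolThing1,;:^\"&%")) {
            System.out.println(part);
        }
        // prints: hello / Worldis / A / Cool / Thing1,;:^"&%  (one per line)
    }
}
```

    Here \P{Lu} (capital P) is the negation of \p{Lu}, so everything up to the next uppercase letter is kept with the current chunk.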
