New Regex/String Task

  1. last week

    Martin T

    Sep 10 Pre-Release Testers Germany

    Hello,

    I'm looking for a RegEx pattern again. It should search a Unicode string, whether it contains Russian, Arabic, or other letters, for capital letters and split the string at each one. The source is always a single string with no spaces before or after it.

    Example: Dim s As String = "helloWorldisACoolThing"

    It should be split into: "hello", "Worldis", "A", "Cool", "Thing".

    How can you do that? Any ideas?

  2. Kem T

    Sep 10 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut

    Coming up...

  3. Kem T

    Sep 10 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut
    (?:^|\p{Lu})\p{Ll}*
  4. Kem T

    Sep 10 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut

    \p matches based on Unicode properties. {Ll} is lowercase letters, {Lu} is uppercase. If you have RegExRX, you can see the complete list of scripts and properties.
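
If you don't have RegExRX handy, the general categories behind `\p{...}` can be inspected in other ways. Below is a minimal sketch in Python, chosen only for illustration and not part of the thread. It uses the standard `unicodedata` module, and the sample characters are arbitrary picks:

```python
import unicodedata

# Arbitrary sample characters: Latin, Cyrillic, Arabic, a digit, and a comma.
samples = ["A", "a", "Ж", "ж", "ع", "1", ","]

# unicodedata.category() returns the two-letter general category,
# the same code that goes inside \p{...} in a pattern.
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, cat in categories.items():
    print(f"U+{ord(ch):04X} {cat}")
```

Note that the Arabic letter comes back as `Lo` (letter, other) because the script has no case, so neither `\p{Lu}` nor `\p{Ll}` will match it.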

  5. Martin T

    Sep 10 Pre-Release Testers Germany
    Edited last week

    @Kem T (?:^|\p{Lu})\p{Ll}*

    Looks good for my sample, but if I change my source to "HhelloWorldisACoolThing", the pattern doesn't work correctly. The same goes for "HHHhelloWorldisACoolThing".

  6. Kem T

    Sep 10 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut
    (?:^\pL|\p{Lu})\p{Ll}*
  7. Kem T

    Sep 10 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut

    At the start of a line, any letter will do.
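
To see what `(?:^\pL|\p{Lu})\p{Ll}*` picks out of a string, here is a rough sketch in Python. Python's built-in `re` module has no `\p{...}` escapes, so this is a hand-written emulation of the pattern using `unicodedata`, not the regex itself. The function name `split_letters` is made up for this example:

```python
import unicodedata

def split_letters(s):
    # Emulates repeated matches of (?:^\pL|\p{Lu})\p{Ll}* on s.
    parts = []
    in_match = False
    for i, ch in enumerate(s):
        cat = unicodedata.category(ch)
        if cat == "Lu" or (i == 0 and cat.startswith("L")):
            parts.append(ch)      # a match starts here
            in_match = True
        elif cat == "Ll" and in_match:
            parts[-1] += ch       # \p{Ll}* keeps extending the match
        else:
            in_match = False      # anything else ends the match and is skipped
    return parts

print(split_letters("helloWorldisACoolThing"))
print(split_letters("HHHhelloWorldisACoolThing"))
```

Under this reading, a run of capitals produces one piece per capital, and anything that is not an uppercase or lowercase letter is dropped from the result.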

  8. Martin T

    Sep 10 Pre-Release Testers Germany
    Edited last week

    Great, Kem, thank you so much. You should write a book about RegEx patterns :). BTW, do you think this is the fastest pattern we can get?

  9. Kem T

    Sep 10 Pre-Release Testers, Xojo Pro, XDC Speakers Connecticut

    Considering your requirements, I think so.

  10. Martin T

    Sep 11 Pre-Release Testers Germany

    Hi Kem,
    I have now tested the pattern in detail. Unfortunately, it ignores digits and punctuation marks within a string, such as dot, comma, semicolon, hyphen, and underscore (e.g. helloWorldisACoolThing1,;:^"&%). What is missing in the pattern?

  11. Kem T

    Sep 11 Pre-Release Testers, Xojo Pro, XDC Speakers Answer Connecticut

    The pattern only looks for symbols that would be considered letters, and nothing else. If you only care about uppercase letters as a delimiter...

    (?:^.|\p{Lu})\P{Lu}*
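
For comparison, here is a minimal sketch in Python of what the pattern above should return. As Python's built-in `re` module does not support `\p{...}`, the matching is emulated by hand with `unicodedata`. The function name `split_at_uppercase` is hypothetical, and the sketch assumes the input has no line breaks:

```python
import unicodedata

def split_at_uppercase(s):
    # Emulates repeated matches of (?:^.|\p{Lu})\P{Lu}* on s.
    # Assumes s contains no line breaks (the "." in the pattern would not match one).
    parts = []
    for ch in s:
        if not parts or unicodedata.category(ch) == "Lu":
            parts.append(ch)      # first character, or an uppercase letter: new piece
        else:
            parts[-1] += ch       # \P{Lu}* absorbs everything that is not uppercase
    return parts

print(split_at_uppercase('helloWorldisACoolThing1,;:^"&%'))
```

Digits and punctuation now stay attached to the piece they follow, since only an uppercase letter starts a new one.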
