New Regex/String Task

Hello,

I’m looking for a RegEx pattern again. It should search a Unicode string, whether it contains Russian or Arabic letters, for capital letters and split the string at each one. The source is always a single string with no leading or trailing spaces.

Example: Dim s As String = "helloWorldisACoolThing"

It should be split into: “hello”, “Worldis”, “A”, “Cool”, “Thing”.

How can you do that? Any ideas?

Coming up…

(?:^|\p{Lu})\p{Ll}*

\p matches based on Unicode properties. {Ll} is lowercase letters, {Lu} is uppercase. If you have RegExRX, you can see the complete list of scripts and properties.
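To make the two property classes a little more concrete, here is a minimal sketch in Java, chosen only because its regex engine understands the same \p{Lu} and \p{Ll} classes. The class and method names are made up for this illustration and are not from the thread. It also shows that Arabic letters belong to neither class, since Arabic has no upper or lower case.

```java
// Hypothetical demo class: checks single characters against Unicode properties.
final class UnicodePropertyDemo {
    // True when the whole string is exactly one uppercase letter (category Lu).
    static boolean isUpper(String ch) {
        return ch.matches("\\p{Lu}");
    }

    // True when the whole string is exactly one lowercase letter (category Ll).
    static boolean isLower(String ch) {
        return ch.matches("\\p{Ll}");
    }

    public static void main(String[] args) {
        System.out.println(isUpper("W"));      // true
        System.out.println(isLower("w"));      // true
        System.out.println(isUpper("\u0416")); // true: Cyrillic capital ZHE
        System.out.println(isLower("\u0436")); // true: Cyrillic small ZHE
        System.out.println(isUpper("\u0628")); // false: Arabic BEH has no case
        System.out.println(isLower("\u0628")); // false
    }
}
```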

That looks good for my sample, but if I change the source to “HhelloWorldisACoolThing”, the pattern doesn’t work correctly. The same goes for “HHHhelloWorldisACoolThing”.

(?:^\pL|\p{Lu})\p{Ll}*

At the start of a line, any letter will do.
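For anyone who wants to try this outside RegExRX, the following is a rough sketch of how the matches could be collected, written in Java on the assumption that its engine treats \pL, \p{Lu} and \p{Ll} the same way for these inputs. The class name CapitalSplitDemo and the split method are hypothetical helpers, not something from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: every match of the pattern becomes one piece.
final class CapitalSplitDemo {
    // Any letter at the very start, or an uppercase letter anywhere,
    // followed by a run of lowercase letters.
    private static final Pattern PIECE =
            Pattern.compile("(?:^\\pL|\\p{Lu})\\p{Ll}*");

    static List<String> split(String source) {
        List<String> pieces = new ArrayList<>();
        Matcher m = PIECE.matcher(source);
        while (m.find()) {
            pieces.add(m.group());
        }
        return pieces;
    }

    public static void main(String[] args) {
        // [hello, Worldis, A, Cool, Thing]
        System.out.println(split("helloWorldisACoolThing"));
        // [Hhello, Worldis, A, Cool, Thing]
        System.out.println(split("HhelloWorldisACoolThing"));
        // [H, H, Hhello, Worldis, A, Cool, Thing]
        System.out.println(split("HHHhelloWorldisACoolThing"));
    }
}
```

Note that each leading capital in the “HHH” case becomes its own piece, because an uppercase letter always starts a new match.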

Great, Kem, thank you so much. You should write a book about RegEx patterns :). BTW, do you think this is the fastest pattern we can get?

Considering your requirements, I think so.

Hi Kem,
I have now tested the pattern in detail. Unfortunately, it ignores non-letter characters within a string, such as periods, commas, semicolons, hyphens, underscores, and digits (e.g. helloWorldisACoolThing1,;:^"&%). What is missing in the pattern?

The pattern only looks for symbols that would be considered letters, and nothing else. If you only care about uppercase letters as a delimiter…

(?:^.|\p{Lu})\P{Lu}*
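Here is a small sketch of that last pattern in action, again in Java purely for illustration and assuming its engine handles \p{Lu} and \P{Lu} the same way for this input. The class name UppercaseDelimiterDemo and its split method are invented for the example. Because \P{Lu} matches anything that is not an uppercase letter, digits and punctuation stay attached to the piece they follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: only uppercase letters act as delimiters.
final class UppercaseDelimiterDemo {
    // Any first character, or an uppercase letter anywhere,
    // followed by everything up to the next uppercase letter.
    private static final Pattern PIECE =
            Pattern.compile("(?:^.|\\p{Lu})\\P{Lu}*");

    static List<String> split(String source) {
        List<String> pieces = new ArrayList<>();
        Matcher m = PIECE.matcher(source);
        while (m.find()) {
            pieces.add(m.group());
        }
        return pieces;
    }

    public static void main(String[] args) {
        // Five pieces: hello | Worldis | A | Cool | Thing1,;:^"&%
        for (String piece : split("helloWorldisACoolThing1,;:^\"&%")) {
            System.out.println(piece);
        }
    }
}
```

A string in a caseless script such as Arabic contains no uppercase letters, so with this pattern it should come back as a single piece.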