I’m looking for a pattern that matches everything from an uppercase letter up to the next uppercase letter. If possible, it should also work for letters of other writing systems (Cyrillic, Asian scripts, etc.).
(\\p{Lu}\\p{Ll}*)
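This pattern only matches an uppercase letter followed by lowercase letters. Each match stops at the first character that is neither, such as a digit, space or punctuation mark, instead of continuing up to the next uppercase letter.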
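One possible fix is to follow the uppercase letter with `\P{Lu}*`, i.e. any run of characters that are *not* uppercase letters, so each item extends to the next capital. The sketch below is in Java, chosen on the assumption that the doubled backslashes come from a Java-style string literal. The class `SplitAtUppercase` and its `split` method are hypothetical. Scripts without letter case (for example Chinese or Japanese ideographs, category `Lo`) are never matched by `\p{Lu}`, so such characters simply end up inside the preceding item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitAtUppercase {
    // One uppercase letter (any script), then everything up to the next uppercase letter.
    private static final Pattern ITEM = Pattern.compile("\\p{Lu}\\P{Lu}*");

    // Hypothetical helper: collects every match of ITEM in order.
    static List<String> split(String s) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(s);
        while (m.find()) {
            items.add(m.group());
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(split("HelloWorld 123Foo")); // [Hello, World 123, Foo]
        System.out.println(split("ПриветМир"));         // [Привет, Мир]
    }
}
```

Text before the first uppercase letter is not matched by this pattern at all.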
I was wrong about my change: the pattern does not work if a string starts with x…* whitespace. In that case, the text from the start of the string up to the next capital letter would also have to become an item.
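A sketch for the leading-text case, again in Java and under one possible reading of the requirement (an assumption). Leading whitespace is attached to the first item, and any text before the first uppercase letter forms an item of its own. Whitespace anywhere else is treated like any other non-uppercase character. The class `LeadingTextSplitter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeadingTextSplitter {
    // At the start of the input only: optional whitespace plus a capital-led item,
    // or else any run of non-uppercase characters. Elsewhere: a capital-led item.
    // Java's \s matches only ASCII whitespace unless UNICODE_CHARACTER_CLASS is set.
    private static final Pattern ITEM =
            Pattern.compile("^(?:\\s*\\p{Lu}\\P{Lu}*|\\P{Lu}+)|\\p{Lu}\\P{Lu}*");

    static List<String> split(String s) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(s);
        while (m.find()) {
            items.add(m.group());
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(split("  HelloWorld")); // [  Hello, World]
        System.out.println(split("  fooBar 1"));   // [  foo, Bar 1]
    }
}
```

Every alternative consumes at least one character, so the matcher never produces empty items.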
Thank you, Kem, the pattern is getting closer to the desired result, but it’s not perfect yet. My hint about whitespace should apply only to whitespace before the very first letter of a string. Your pattern currently returns the following:
I’m coming back to this subject, because the result is still not quite what I need.
I need to split a string into items according to letter case, ignoring punctuation and special characters. To find runs of lowercase Unicode letters in a string, I now use the following pattern:
(?:\\p{Ll}+)
However, what I need is an array of pairs covering all parts of the string, where the right value of each pair is a boolean indicating whether that part is a regex match. For my example string HHmmmHelloWorld123&???+*0815, the returned array should look like this:
HH : False
mmm : True
H : False
ello : True
W : False
orld : True
123&?? : False
???????? : True
+*0815 : False
How can I get exactly this array, given that the pattern currently finds only the values marked True?
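The usual technique is to loop over the matches and also record the text *between* them. Remember where the previous match ended and emit the gap before each new match as a non-match. After the loop, emit any trailing text as a non-match. A sketch in Java (16+, for `record`), assuming Java is the target language. The `Part` record, `splitWithFlags` method and the Greek letters in the sample input are illustrative, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseSplitter {
    // Hypothetical pair type: the text of a part and whether it was a regex match.
    record Part(String text, boolean isMatch) {
        @Override
        public String toString() {
            return text + " : " + (isMatch ? "True" : "False");
        }
    }

    // Runs of lowercase letters in any script (same idea as (?:\p{Ll}+)).
    static final Pattern LOWER = Pattern.compile("\\p{Ll}+");

    static List<Part> splitWithFlags(String s, Pattern pattern) {
        List<Part> parts = new ArrayList<>();
        Matcher m = pattern.matcher(s);
        int last = 0; // end index of the previous match
        while (m.find()) {
            if (m.start() > last) {
                // Text between the previous match and this one is a non-match.
                parts.add(new Part(s.substring(last, m.start()), false));
            }
            parts.add(new Part(m.group(), true));
            last = m.end();
        }
        if (last < s.length()) {
            // Trailing text after the final match.
            parts.add(new Part(s.substring(last), false));
        }
        return parts;
    }

    public static void main(String[] args) {
        for (Part p : splitWithFlags("HHmmmHelloWorld123&ωψ+*0815", LOWER)) {
            System.out.println(p);
        }
    }
}
```

For the sample input this prints `HH : False`, `mmm : True`, `H : False`, `ello : True`, `W : False`, `orld : True`, `123& : False`, `ωψ : True`, `+*0815 : False`, one per line. The helper works with any pattern that cannot match the empty string. A pattern such as `\p{Ll}*` would also yield empty True parts.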