This is a new, maybe easy, question for our RegEx pattern king @Kem_Tekinay.
I’m looking for a working pattern to match this:
- Name // variable length, no whitespaces
- | // OPTIONAL: including optional whitespace (any amount) before and after |
- [0-9]+ // OPTIONAL
- / abc // OPTIONAL: including optional whitespace (any amount) before / and the rest after /
Test | 100
Test | 1/follow us
Specifically what kind of white space? Just spaces, spaces and tabs, or linefeeds too?
Just normal spaces (ASCII 32).
If possible, could you please do it with subroutines, so we can get a better overview of the pattern?
Fyi, “whitespace” refers to spaces, tabs, vertical tabs, carriage returns, linefeeds, and some other invisible characters.
^\w+( *\| *\d+)?( */ *.+)?$
This assumes that each of these will be on a line by themselves.
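A quick way to check the pattern against the sample lines is a small script. This is only a sketch in Python rather than Xojo: the pattern itself is used unchanged, the samples list and its last three entries are my own assumptions, and re.MULTILINE stands in for the "each on a line by themselves" assumption.

```python
import re

# The pattern from the post, unchanged. MULTILINE lets ^ and $ anchor at
# every line, in line with the assumption that each entry is on its own line.
PATTERN = re.compile(r"^\w+( *\| *\d+)?( */ *.+)?$", re.MULTILINE)

# The first two samples come from the question; the rest are made up.
samples = ["Test | 100", "Test | 1/follow us", "Test", "Test |", "two words"]

# Map each sample to True (match) or False (no match).
results = {s: bool(PATTERN.match(s)) for s in samples}

for line, ok in results.items():
    print(f"{line!r}: {'match' if ok else 'no match'}")
```

In this pattern the separator and the number live in the same optional group, so a trailing | without a number is rejected.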
Thanks Kem. This works for me:
^(?'Name'\w+)(?'Separator' *\| *)?(?'Number'\d+)?(?'Slash' */ *.*)?$
That would match “name3”, fyi.
What do you mean? Since name3 will be a “single” word without whitespace, it’ll be valid.
Brain freeze. You’re right, of course.
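For anyone who wants to see what each named group captures, here is a rough sketch in Python. It is not Xojo code: Python spells named groups (?P<Name>...) instead of (?'Name'...), and the parse helper is a hypothetical name of my own.

```python
import re

# Same structure as the named-group pattern in the post, with Python's
# (?P<name>...) spelling in place of PCRE's (?'name'...).
NAMED = re.compile(
    r"^(?P<Name>\w+)(?P<Separator> *\| *)?(?P<Number>\d+)?(?P<Slash> */ *.*)?$"
)

def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = NAMED.match(line)
    return m.groupdict() if m else None

print(parse("Test | 1/follow us"))
print(parse("name3"))
print(parse("Test |"))
```

Because \w+ is greedy, name3 ends up entirely in Name and Number stays empty. Also note that every group is optional on its own here, so a separator without a number ("Test |") matches as well.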
This does not care about emojis or composite characters.
How can I add an alternative to /, so that, say, ++ will also be valid?
@Kem_Tekinay, how can the pattern also take care of a construction like this:
HumanOf🙅🏿♂️ should be matched as Name.
^(?'Name'(?:(?!\+\+|[\s/|[:punct:]]).)+)(?'Separator' *\| *)?(?'Number'\d+)?(?'Slash' *(?:/|\+\+) *.*)?$
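Here is a hedged sketch of how the negative lookahead in Name behaves, again in Python. Python's re module has no [:punct:] class, so I substitute the ASCII punctuation from string.punctuation; that substitution and the sample strings are assumptions of mine, and the result may differ from PCRE in edge cases.

```python
import re
import string

# Assumption: ASCII punctuation stands in for PCRE's [:punct:] class.
PUNCT = re.escape(string.punctuation)

# Name consumes one character at a time, as long as the next characters are
# not "++" and the character is not whitespace, "/", "|" or punctuation.
EXTENDED = re.compile(
    r"^(?P<Name>(?:(?!\+\+|[\s/|" + PUNCT + r"]).)+)"
    r"(?P<Separator> *\| *)?"
    r"(?P<Number>\d+)?"
    r"(?P<Slash> *(?:/|\+\+) *.*)?$"
)

# Emoji plus skin-tone modifier, written as escapes to avoid encoding issues.
emoji_line = "HumanOf\U0001F645\U0001F3FF | 5++more"
plus_line = "Test++follow us"

emoji_match = EXTENDED.match(emoji_line)
plus_match = EXTENDED.match(plus_line)

print(emoji_match.groupdict())
print(plus_match.groupdict())
```

Since the Name group excludes characters instead of listing allowed ones, emoji code points pass through without any special handling.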
It looks like the pattern works, but your RegExRX app doesn’t select the right range for the match. I think you use String.Length instead of String.Bytes to select the range.
I do some fancy things there to match bytes to characters, and it mostly works, but clearly failed here. I’ll have to see if a newer version of Xojo fixes those issues.
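To illustrate why character counts and byte counts drift apart with emoji, here is a small Python sketch. I don't know how RegExRX does its mapping internally, so this only shows the general mismatch; byte_to_char_offset is a hypothetical helper and it assumes the offset falls on a character boundary.

```python
# "HumanOf" followed by an emoji and a skin-tone modifier (two code points).
name = "HumanOf\U0001F645\U0001F3FF"
data = name.encode("utf-8")

char_count = len(name)   # number of code points
byte_count = len(data)   # number of UTF-8 bytes

def byte_to_char_offset(raw: bytes, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset (on a character boundary) to a code point offset."""
    return len(raw[:byte_offset].decode("utf-8"))

print(char_count, byte_count)
print(byte_to_char_offset(data, byte_count))
```

Each of the two emoji code points takes four bytes in UTF-8, so a selection computed from one count and applied with the other will be off by several positions.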
@Kem_Tekinay, for now it works great, but I need one more extension:
Test[acdc] should be matched as Name.
Let’s try it this way. This pattern specifies what’s permitted, including every Unicode character >= 128. It’s also in free-space mode for readability.
If you want to allow any other characters (or not allow ones I’ve allowed), modify the ‘Name’ character class.
(?'Separator' \x20* \| \x20*)?
(?'Slash' \x20* (?:/|\+\+) \x20* .*)?
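For readers who want to experiment with the whitelist idea in free-space mode, here is a rough Python sketch using re.VERBOSE. Important: the Name character class below is a placeholder of my own (ASCII letters, digits, square brackets, and every code point >= 128), not the class from the post, and the Number line is likewise my assumption. Only the Separator and Slash lines follow the post.

```python
import re

# Free-space (verbose) mode: whitespace in the pattern is ignored, which is
# why literal spaces are written as \x20.
FREE_SPACE = re.compile(r"""
    ^
    (?P<Name>      [A-Za-z0-9\[\]\u0080-\U0010FFFF]+ )   # hypothetical whitelist
    (?P<Separator> \x20* \| \x20* )?
    (?P<Number>    \d+ )?                                # assumed, as in earlier patterns
    (?P<Slash>     \x20* (?:/|\+\+) \x20* .* )?
    $
""", re.VERBOSE)

bracket_match = FREE_SPACE.match("Test[acdc] | 100")
plus_match = FREE_SPACE.match("Test[acdc]++go")

print(bracket_match.groupdict())
print(plus_match.groupdict())
```

To allow or forbid other characters, only the character class on the Name line needs to change.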
Thanks @Kem_Tekinay, looks good to me.