RegEx Pattern Question to Kem

This is a new, maybe easy, question for our RegEx pattern king, @Kem_Tekinay.
I’m looking for a working pattern to match this:

  1. Name // variable length, no spaces
  2. | // OPTIONAL: with any number of optional spaces before and after |
  3. [0-9]+ // OPTIONAL
  4. / abc // OPTIONAL: with any number of optional spaces before /, and the rest of the line after /

Examples:

Valid:

Test
Test/follow us
Test|1
Test | 100
Test | 1/follow us

Specifically what kind of white space? Just spaces, spaces and tabs, or linefeeds too?

Just normal spaces (ASCII 32).

If possible, could you please do it with subroutines, so we can get a better overview of the pattern?

(?(DEFINE)
...single parts
)

Fyi, “whitespace” refers to spaces, tabs, vertical tabs, carriage returns, linefeeds, and some other invisible characters.

Funny guy.

Try this:

^\w+( *\| *\d+)?( */ *.+)?$

This assumes that each of these will be on a line by themselves.
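
As a quick check of the "one per line" assumption, here is a minimal Python sketch that runs the same pattern over a multi-line text. Python's re module happens to accept this pattern unchanged, but it is not the engine used in the thread; the line "not valid here" is a made-up counter-example.

```python
import re

# The pattern as posted; re.MULTILINE makes ^ and $ anchor at each line.
PATTERN = re.compile(r"^\w+( *\| *\d+)?( */ *.+)?$", re.MULTILINE)

text = (
    "Test\n"
    "Test/follow us\n"
    "Test|1\n"
    "Test | 100\n"
    "Test | 1/follow us\n"
    "not valid here\n"  # hypothetical line: spaces in the name, no | or /
)

# Collect every line that matches as a whole.
matches = [m.group(0) for m in PATTERN.finditer(text)]
print(matches)
```

Only the five valid examples are collected; the last line is skipped because the name may not contain spaces.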

Right.

Thanks Kem. This works for me:

^(?'Name'\w+)(?'Separator' *\| *)?(?'Number'\d+)?(?'Slash' */ *.*)?$

That would match “name3”, fyi.

What do you mean? Since name3 will be a “single” word without whitespace, it’ll be valid.

Brain freeze. You’re right, of course.
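
To see what each named group captures, here is a small Python sketch. Python's re module does not accept the (?'Name'...) spelling, so the groups are written as (?P<Name>...); that translation and the helper name parse are assumptions for illustration only.

```python
import re

# Same pattern as above, with Python's (?P<name>...) group syntax.
NAMED = re.compile(
    r"^(?P<Name>\w+)(?P<Separator> *\| *)?(?P<Number>\d+)?(?P<Slash> */ *.*)?$"
)

def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = NAMED.match(line)
    return m.groupdict() if m else None

full = parse("Test | 1/follow us")
name3 = parse("name3")      # \w+ also matches digits, so this is all Name
dangling = parse("Test |")  # Separator and Number are optional independently
print(full, name3, dangling, sep="\n")
```

In this sketch "name3" is captured entirely as Name with no Number. Because Separator and Number are optional independently of each other, "Test |" also matches here, which the earlier single-group version ( *\| *\d+)? would reject.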

This does not account for emoji or composite characters.

How can I add an alternative, say ++, so that both / and ++ will be valid?

(\+\+|/)

@Kem_Tekinay, how can the pattern also handle a construction like this:

🙅🏿‍♂️/India
HumanOf🙅🏿‍♂️/India

🙅🏿‍♂️ and HumanOf🙅🏿‍♂️ should each be matched as Name.

Try this:

^(?'Name'(?:(?!\+\+|[\s/|[:punct:]]).)+)(?'Separator' *\| *)?(?'Number'\d+)?(?'Slash' *(?:/|\+\+) *.*)?$
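
The Name group here consumes one character at a time, and only while the lookahead confirms that the character is not ++, whitespace, /, | or punctuation. A rough Python sketch of the same idea follows. Python's re module has no [:punct:] class, so string.punctuation (ASCII punctuation only) stands in for it, and the groups use (?P<Name>...); both substitutions are assumptions, not what the original engine does.

```python
import re
import string

# Assumption: ASCII punctuation as a stand-in for PCRE's [:punct:].
PUNCT = re.escape(string.punctuation)

PATTERN = re.compile(
    r"^(?P<Name>(?:(?!\+\+|[\s" + PUNCT + r"]).)+)"  # any run of non-excluded characters
    r"(?P<Separator> *\| *)?"
    r"(?P<Number>\d+)?"
    r"(?P<Slash> *(?:/|\+\+) *.*)?$"
)

# The emoji from the thread, spelled out as code points:
# person gesturing "no" + dark skin tone + ZWJ + male sign + variation selector.
EMOJI = "\U0001F645\U0001F3FF\u200D\u2642\uFE0F"

emoji_match = PATTERN.match("HumanOf" + EMOJI + "/India")
plus_match = PATTERN.match("Test++follow us")
print(emoji_match.group("Name"), emoji_match.group("Slash"))
print(plus_match.group("Name"), plus_match.group("Slash"))
```

Because the Name group excludes characters rather than listing allowed ones, the whole emoji sequence, including the invisible joiner, ends up inside Name.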

Thanks @Kem_Tekinay,
it looks like the pattern works, but your RegExRX app doesn’t select the right range for the match. I think you use String.Length instead of String.Bytes to compute the selection.

I saw.

I do some fancy things there to match bytes to characters, and it mostly works, but clearly failed here. I’ll have to see if a newer version of Xojo fixes those issues.
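
To illustrate why byte and character positions drift apart on input like this, here is a small Python sketch. It assumes the regex engine reports UTF-8 byte offsets and the text view wants character offsets; byte_to_char_offset is a hypothetical helper and not how RegExRX does it.

```python
# The emoji from the thread, spelled out as five code points.
EMOJI = "\U0001F645\U0001F3FF\u200D\u2642\uFE0F"
subject = EMOJI + "/India"

char_len = len(subject)                  # counts code points
byte_len = len(subject.encode("utf-8"))  # counts UTF-8 bytes

def byte_to_char_offset(text, byte_offset):
    """Convert a UTF-8 byte offset into a code point offset.

    Raises UnicodeDecodeError if byte_offset falls inside a multi-byte character.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))

# The "/" sits after 17 bytes but after only 5 code points.
slash_char_offset = byte_to_char_offset(subject, 17)
print(char_len, byte_len, slash_char_offset)
```

Even the code point count is not what a user sees as "one character" here, since the five code points render as a single glyph, so a selection range can be off in more than one way.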

@Kem_Tekinay, for now it works great, but I need one more extension:

Test[acdc]/follow us

Test[acdc] should be matched as Name.

Let’s try it this way. This pattern specifies what’s permitted, including every Unicode character with a code point >= 128. It’s also in free-space mode for readability.

If you want to allow any other characters (or not allow ones I’ve allowed), modify the ‘Name’ character class.

(?x)
^
(?'Name' [\w[\]{}()\x{80}-\x{10ffff}]+)
(?'Separator' \x20* \| \x20*)?
(?'Number' \d+)?
(?'Slash' \x20* (?:/|\+\+) \x20* .*)?
$
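
For anyone who wants to try the final pattern outside Xojo, here is a hedged Python translation. The changes are assumptions needed for Python's re module: (?P<Name>...) instead of (?'Name'...), \x80 and \U0010ffff instead of \x{80} and \x{10ffff}, escaped brackets inside the class, and re.VERBOSE instead of (?x). The helper name parse is hypothetical.

```python
import re

FINAL = re.compile(
    r"""
    ^
    (?P<Name>      [\w\[\]{}()\x80-\U0010ffff]+ )   # allowed name characters
    (?P<Separator> \x20* \| \x20* )?
    (?P<Number>    \d+ )?
    (?P<Slash>     \x20* (?:/|\+\+) \x20* .* )?
    $
    """,
    re.VERBOSE,
)

def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = FINAL.match(line)
    return m.groupdict() if m else None

EMOJI = "\U0001F645\U0001F3FF\u200D\u2642\uFE0F"

bracket = parse("Test[acdc]/follow us")
emoji = parse("HumanOf" + EMOJI + "/India")
spaced = parse("Test name")  # hypothetical invalid input: space inside the name
print(bracket, emoji, spaced, sep="\n")
```

Since this version lists what is allowed, adding or removing a permitted character only means editing the Name character class, as described above.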

Thanks @Kem_Tekinay, looks good to me.