Martin_T
(Martin T)
September 30, 2021, 11:34pm
1
Here is a new, maybe easy, question for our RegEx pattern king @Kem_Tekinay.
I’m looking for a working pattern to match this:
Name // variable length, no whitespace
| // OPTIONAL: with any amount of optional whitespace before and after the |
[0-9]+ // OPTIONAL
/ abc // OPTIONAL: with any amount of optional whitespace before the /, and the rest of the line after the /
Examples:
Valid:
Test
Test/follow us
Test|1
Test | 100
Test | 1/follow us
Kem_Tekinay
(Kem Tekinay)
September 30, 2021, 11:39pm
2
Specifically what kind of white space? Just spaces, spaces and tabs, or linefeeds too?
Martin_T
(Martin T)
September 30, 2021, 11:41pm
3
Just normal whitespace (ASCII 32).
If possible, could you please do it with subroutines, so we get a better overview of the pattern?
(?(DEFINE)
...single parts
)
Kem_Tekinay
(Kem Tekinay)
September 30, 2021, 11:45pm
4
Fyi, “whitespace” refers to spaces, tabs, vertical tabs, carriage returns, linefeeds, and some other invisible characters.
Funny guy.
Kem_Tekinay
(Kem Tekinay)
September 30, 2021, 11:53pm
5
Try this:
^\w+( *\| *\d+)?( */ *.+)?$
This assumes that each of these is on a line by itself.
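To check this pattern against the examples from the first post, here is a minimal Python sketch (Python is only a convenient test harness here, not part of the original thread). It assumes Python's re module, whose \w matches Unicode word characters by default, whereas PCRE's \w is ASCII-only unless UCP mode is on, so results can differ for non-ASCII names. The invalid strings are hypothetical counter-examples.

```python
import re

# Kem's first pattern, unchanged; it is also valid Python re syntax.
PATTERN = re.compile(r"^\w+( *\| *\d+)?( */ *.+)?$")

# Valid examples from the first post.
valid = ["Test", "Test/follow us", "Test|1", "Test | 100", "Test | 1/follow us"]
# Hypothetical counter-examples: a space inside the name, and a "|" without a number.
invalid = ["Test Name", "Test|", "Test | "]

for s in valid + invalid:
    print(f"{s!r:24} -> {'match' if PATTERN.match(s) else 'no match'}")
```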
Martin_T
(Martin T)
October 1, 2021, 12:14am
6
Right.
Thanks Kem. This works for me:
^(?'Name'\w+)(?'Separator' *\| *)?(?'Number'\d+)?(?'Slash' */ *.*)?$
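The named groups make it easy to pull the parts apart after a match. A minimal Python sketch, assuming the PCRE-style (?'Name'...) groups are rewritten as (?P<Name>...), the spelling Python's re module accepts; parts() is a hypothetical helper. Because \w also matches digits and everything after Name is optional, a run like name3 ends up entirely in Name.

```python
import re

# Martin's pattern with PCRE-style (?'Name'...) groups rewritten as (?P<Name>...).
PATTERN = re.compile(
    r"^(?P<Name>\w+)(?P<Separator> *\| *)?(?P<Number>\d+)?(?P<Slash> */ *.*)?$"
)

def parts(s):
    """Return the named groups as a dict, or None if s does not match."""
    m = PATTERN.match(s)
    return m.groupdict() if m else None

print(parts("Test | 1/follow us"))
# {'Name': 'Test', 'Separator': ' | ', 'Number': '1', 'Slash': '/follow us'}
print(parts("name3"))
# {'Name': 'name3', 'Separator': None, 'Number': None, 'Slash': None}
```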
That would match “name3”, fyi.
Martin_T
(Martin T)
October 1, 2021, 12:29am
8
What do you mean? Since name3 will be a “single” word without whitespace, it’ll be valid.
Brain freeze. You’re right, of course.
Martin_T
(Martin T)
October 1, 2021, 12:55am
10
Kem_Tekinay:
\w+
This does not handle emojis or composite characters.
Martin_T:
(?'Slash' */ *.*)
How can I add an alternative, say ++, so that both / and ++ will be valid?
Martin_T
(Martin T)
October 1, 2021, 1:28pm
12
@Kem_Tekinay, how can the pattern also handle constructions like these:
🙅🏿‍♂️/India
HumanOf🙅🏿‍♂️/India
Here 🙅🏿‍♂️ and HumanOf🙅🏿‍♂️ should be matched as Name.
Try this:
^(?'Name'(?:(?!\+\+|[\s/|[:punct:]]).)+)(?'Separator' *\| *)?(?'Number'\d+)?(?'Slash' *(?:/|\+\+) *.*)?$
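A rough Python port of this pattern, for testing outside Xojo. It is a sketch under two assumptions: Python's re has no POSIX classes, so [:punct:] is approximated with the ASCII punctuation in string.punctuation, and named groups use the (?P<Name>...) spelling. EMOJI is a hypothetical test value spelling out the emoji from the question as code points. Note that ASCII punctuation includes _, [ and ], so names containing those characters do not match.

```python
import re
import string

# Python's re has no POSIX classes, so [:punct:] is approximated by the ASCII
# punctuation characters in string.punctuation (escaped for use inside [...]).
PUNCT = re.escape(string.punctuation)

PATTERN = re.compile(
    r"^(?P<Name>(?:(?!\+\+|[\s/|" + PUNCT + r"]).)+)"  # no "++", whitespace or punctuation
    r"(?P<Separator> *\| *)?"
    r"(?P<Number>\d+)?"
    r"(?P<Slash> *(?:/|\+\+) *.*)?$"                     # "/" or "++", then the rest
)

# Hypothetical test value: "man gesturing no, dark skin tone" as code points.
EMOJI = "\U0001F645\U0001F3FF\u200d\u2642\ufe0f"

for s in [EMOJI + "/India", "HumanOf" + EMOJI + "/India", "Test++follow us", "Test[acdc]"]:
    m = PATTERN.match(s)
    # ascii() keeps the output printable on consoles without UTF-8 support.
    print(ascii(s), "->", ascii(m.group("Name", "Slash")) if m else "no match")
```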
Martin_T
(Martin T)
October 1, 2021, 8:45pm
14
Thanks @Kem_Tekinay,
looks like the pattern works, but your RegExRX app doesn’t highlight the right range for the match. I think you use String.Length instead of String.Bytes to select the range.
I saw.
I do some fancy things there to match bytes to characters, and it mostly works, but clearly failed here. I’ll have to see if a newer version of Xojo fixes those issues.
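The mismatch comes from measuring positions in two different units: a regex engine working on UTF-8 data, such as PCRE, typically reports match offsets in bytes, while a string's character length counts code points. A small Python illustration (not how RegExRX works internally; byte_to_char_offset is a hypothetical helper) shows how far the two drift apart for the emoji from this thread, which is five code points but a single visible glyph.

```python
# Illustration only: one position in the string, measured in two units.
EMOJI = "\U0001F645\U0001F3FF\u200d\u2642\ufe0f"  # 5 code points, 1 visible glyph
s = "HumanOf" + EMOJI + "/India"
data = s.encode("utf-8")

def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Map a UTF-8 byte offset to a code-point offset.

    byte_offset must fall on a character boundary (match offsets always do);
    otherwise decoding the truncated prefix raises UnicodeDecodeError.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))

slash_byte = data.index(b"/")   # where a byte-based engine sees the "/"
slash_char = s.index("/")       # where a code-point-based string sees it
print(len(data), len(s))        # 30 18
print(slash_byte, slash_char)   # 24 12
print(byte_to_char_offset(s, slash_byte))  # 12
```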
Martin_T
(Martin T)
October 2, 2021, 3:51pm
16
@Kem_Tekinay, for now it works great, but I need one more extension:
Test[acdc]/follow us
Here Test[acdc] should be matched as Name.
Let's try it this way. This pattern specifies what’s permitted, including every Unicode character >= 128. It’s also in free-space mode for readability.
If you want to allow any other characters (or not allow ones I’ve allowed), modify the ‘Name’ character class.
(?x)
^
(?'Name' [\w[\]{}()\x{80}-\x{10ffff}]+)
(?'Separator' \x20* \| \x20*)?
(?'Number' \d+)?
(?'Slash' \x20* (?:/|\+\+) \x20* .*)?
$
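For trying the final pattern outside Xojo, a rough Python port. Assumptions of this sketch: free-space mode (?x) becomes re.VERBOSE, (?'Name'...) becomes (?P<Name>...), and \x{80}-\x{10ffff} is written as \u0080-\U0010ffff, which is how Python's re spells code points. The comments, EMOJI and the test strings are added for illustration only.

```python
import re

# Free-space pattern ported to Python: (?x) -> re.VERBOSE, (?'Name'...) -> (?P<Name>...),
# \x{80}-\x{10ffff} -> \u0080-\U0010ffff. Whitespace outside [...] is ignored, hence \x20.
PATTERN = re.compile(
    r"""
    ^
    (?P<Name> [\w\[\]{}()\u0080-\U0010ffff]+)   # word chars, brackets, any non-ASCII
    (?P<Separator> \x20* \| \x20*)?             # "|" with optional spaces around it
    (?P<Number> \d+)?                           # optional number
    (?P<Slash> \x20* (?:/|\+\+) \x20* .*)?      # "/" or "++", then the rest of the line
    $
    """,
    re.VERBOSE,
)

# Hypothetical test value: "man gesturing no, dark skin tone" as code points.
EMOJI = "\U0001F645\U0001F3FF\u200d\u2642\ufe0f"

for s in ["Test[acdc]/follow us", "Test[acdc]", "HumanOf" + EMOJI + "/India",
          "Test | 1 ++ more", "Test Name"]:
    m = PATTERN.match(s)
    # ascii() keeps the output printable on consoles without UTF-8 support.
    print(ascii(s), "->", ascii(m.groupdict()) if m else "no match")
```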
Martin_T
(Martin T)
October 3, 2021, 9:44am
18
Thanks @Kem_Tekinay , looks good to me.