Unable to find the right SearchPattern

Mariano_Poli · May 28, 2025, 8:15pm

I need to determine the right SearchPattern to find substrings in the following format:
plenus.ar
bmd.com.ar
plenus.com

I tried many, including simple ones like these:
rxLinks.SearchPattern = "[.?)"
rxLinks.SearchPattern = "\[.?\)"

and more complex ones suggested by ChatGPT, such as:
rxLinks.SearchPattern = "\[[^\]]+\]\(https://[a-zA-Z0-9./_-]+\)"
rxLinks.SearchPattern = "\[[^\]]+\]\(https://[^\)]+\)"
rxLinks.SearchPattern = "\[[^\]]+\]\(https:\/\/[^\)]+\)"

None of them work.

Can anyone with more RegEx experience help me?

Thanks

AlbertoD · May 28, 2025, 8:33pm

You may need to give more details to get the exact pattern you need, for example:

Do you need a limit on the number of dots (.) between elements?
Is there a maximum number of characters per element?
Do you only want valid domains such as .com, .net, .mx, .ar, or anything like .abcd (assuming that isn't already valid with all the new top-level domains)?
Does your text always have http:// or https:// before what you want to extract, or can you have text like "this is a test plenus.ar of something you want to extract"?

Maybe others.

Google suggests this: ((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,6}

Edit: the pattern should find plenus.ar from the quoted text above. If you need https:// too, then this won’t work.
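To see what this pattern actually returns, here is a quick check using Python's `re` module as a stand-in for Xojo's RegEx (an assumption for illustration: the lookahead and lookbehind used here are standard PCRE features, so Xojo should behave the same, but it's worth confirming there or on regex101). `find_domains` is a hypothetical helper name.

```python
import re

# Pattern from the post: one or more labels (1-63 chars, no leading or
# trailing hyphen), each followed by a dot, then a 2-6 letter top-level domain.
DOMAIN = re.compile(r"((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,6}")

def find_domains(text):
    # findall() would return only the capturing group (its last repetition),
    # so finditer() is used and group(0), the whole match, is collected.
    return [m.group(0) for m in DOMAIN.finditer(text)]

print(find_domains("this is a test plenus.ar of something you want to extract"))
print(find_domains("[plenus.ar](https://plenus.ar)"))
```

On the plain sentence it returns `plenus.ar`; on a markdown link it returns the bare domain twice (once from the brackets, once from the URL) and never the `https://` prefix, which is the limitation mentioned in the Edit.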

Kem_Tekinay · May 28, 2025, 8:34pm

I’m afraid your post is unclear because, I think, the forum software tried to interpret what you typed.

Please re-post using code tags (three backticks before and after the text, or the "</>" icon), along with exactly what you're hoping to extract from the source text.

Eric_Williams · May 28, 2025, 8:46pm

If that’s what I think it is…

Mariano_Poli · May 28, 2025, 9:39pm

Not bad… I got some results with your suggestion, but not exactly what I need, and I believe that's because I wasn't clear in my first post (as @Kem_Tekinay pointed out).

I need to find substrings matching the following pattern:

[plenus.ar](https://plenus.ar)
[bmd.com.ar](https://bmd.com.ar)
[plenus.com.ar](https://plenus.com.ar)

Each one starts with [
then a domain without https://
then ]
then (
then a domain with https://
and ends with )

It could also be:
starting with [
then anything else
and ending with )
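Both readings can be sketched and checked in Python's `re` module (used here only as a quick stand-in for Xojo's RegEx). The loose version is `\[.*?\)`: the lazy `.*?` stops at the first `)`, so two links on the same line are still matched separately. The stricter version is a hypothetical pattern that captures the bracketed domain and, through the backreference `\1`, requires the URL to be `https://` plus that same domain, as in all three samples; drop the backreference if the two can differ.

```python
import re

SAMPLE = """[plenus.ar](https://plenus.ar)
[bmd.com.ar](https://bmd.com.ar)
[plenus.com.ar](https://plenus.com.ar)"""

# Loose reading: "[", then anything (lazily, stopping at the first ")"), then ")".
LOOSE = re.compile(r"\[.*?\)")

# Stricter (hypothetical) reading: "[domain](https://domain)".
# Group 1 is the bracketed text, group 2 the URL; \1 forces the same domain.
STRICT = re.compile(r"\[([^\]\s]+)\]\((https://\1)\)")

print(LOOSE.findall(SAMPLE))
print(STRICT.findall(SAMPLE))
```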

Kem_Tekinay · May 29, 2025, 12:40pm

This pattern will match, with the parts captured as subgroups:

\[(?!https)([^\]]+)\]\((https[^)]+)\)
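Here is a quick check of this pattern using Python's `re` module as a stand-in for Xojo's RegEx (an assumption; these constructs are standard PCRE, so results should carry over). Group 1 holds the bracketed text and group 2 the URL; in Xojo these should correspond to `SubExpressionString(1)` and `SubExpressionString(2)` on the `RegExMatch`. The last line shows the `(?!https)` lookahead rejecting a link whose bracketed text starts with `https`.

```python
import re

# The (?!https) lookahead rejects bracketed text that starts with "https";
# group 1 is the bracketed text, group 2 the URL inside the parentheses.
LINK = re.compile(r"\[(?!https)([^\]]+)\]\((https[^)]+)\)")

SAMPLE = """[plenus.ar](https://plenus.ar)
[bmd.com.ar](https://bmd.com.ar)
[plenus.com.ar](https://plenus.com.ar)"""

for m in LINK.finditer(SAMPLE):
    print(m.group(1), m.group(2))

# Bracketed text starting with "https" is rejected by the lookahead.
print(LINK.search("[https://plenus.ar](https://plenus.ar)"))  # None
```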

Stam_Kapetanakis · May 29, 2025, 4:56pm

Another way to do this:

((\[\S+\])(\(https?:\/\/\S+\.\S+\)))

There aren't many invalid characters in URLs (I think only " ^ < > ` are completely invalid), so one or more non-whitespace characters (\S+) probably suffices to describe them.

Since the 'name' portion of the markdown link [...] can include http:// etc., or need not be a domain name at all, \[\S+\] suffices.

If you really don't want this to match names starting with https://, you can use the negative lookahead (?!https) that @Kem_Tekinay includes, but I'm not sure that's needed…

The 'url' portion of the markdown link (...) should start with either (http:// or (https:// (the optional s is written s?), followed by some characters that include at least one period '.'.
The latter part, \(https?:\/\/\S+\.\S+\), provides a rudimentary validation, nowhere near as comprehensive as the one @Eric_Williams links to, but I'm not sure more is needed.

Each match includes 3 groups: the full markdown link, the 'name' portion and the 'url' portion. Remove the subgroup parentheses if you don't want these.
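A quick check of this pattern in Python's `re` module (a stand-in for Xojo's RegEx, assumed to behave the same for these basic constructs). One sample link uses `http://` to exercise `https?`, and the last line shows the dot requirement rejecting a URL with no period. Note that groups 2 and 3 keep their `[ ]` and `( )` delimiters.

```python
import re

# Group 1 = whole markdown link, group 2 = "[name]", group 3 = "(url)".
# "https?" makes the "s" optional; "\S+\.\S+" requires a dot in the URL part.
LINK = re.compile(r"((\[\S+\])(\(https?:\/\/\S+\.\S+\)))")

SAMPLE = """[plenus.ar](https://plenus.ar)
[bmd.com.ar](http://bmd.com.ar)
[plenus.com.ar](https://plenus.com.ar)"""

for full, name, url in LINK.findall(SAMPLE):
    print(full, name, url)

# No period in the URL, so the rudimentary validation rejects it.
print(LINK.search("[home](https://localhost)"))  # None
```

Because `\S+` is greedy and can run across `]`, two links written back to back on one line with no space between them, such as `[a.ar](https://a.ar)[b.ar](https://b.ar)`, come back as a single match; the `[^\]]+` class in @Kem_Tekinay's pattern stops at the first `]` and avoids that.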

Not sure if that’s helpful… hopefully it is.

PS: https://regex101.com is a great resource and regex IDE/playground