Unable to find the right SearchPattern

I need to find the right SearchPattern to match substrings in the following format:
plenus.ar
bmd.com.ar
plenus.com

I tried many, simple ones like this:
rxLinks.SearchPattern = “[.?)"
rxLinks.SearchPattern = "\[.
?\)”

and more complex ones consulting ChatGPT like:
rxLinks.SearchPattern = "\[[^\]]+\]\(https://[a-zA-Z0-9./_-]+\)"
rxLinks.SearchPattern = "\[[^\]]+\]\(https://[^\)]+\)"
rxLinks.SearchPattern = "\[[^\]]+\]\(https:\/\/[^\)]+\)"

None of them work.

Could anyone with more RegEx experience help me out?

Thanks

You may need to specify more details to get the exact pattern you need, for example:

  • do you need to limit the number of dots between elements?
  • is there a maximum number of characters per element?
  • do you only want valid domains such as .com, .net, .mx, .ar, or anything like .abcd (if restricting is even an option, given all the new top-level domains)?
  • does your text always have http:// or https:// before what you want to extract, or can you have text like “this is a test plenus.ar of something you want to extract”?

There may be others.

Google suggests this: ((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,6}

Edit: the pattern should find plenus.ar from the quoted text above. If you need https:// too, then this won’t work.
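
For illustration, here is a minimal sketch of that pattern applied to plain text. It uses Python's `re` module rather than Xojo's RegEx class (an assumption made so the example is easy to run; the pattern itself is unchanged). The helper name `find_domains` is hypothetical.

```python
import re

# The pattern suggested above: one or more dot-terminated labels
# (1-63 characters, not starting or ending with a hyphen) followed
# by a 2-6 letter top-level domain.
DOMAIN_RE = re.compile(r"((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,6}")

def find_domains(text):
    """Return every bare domain-like substring found in text."""
    # finditer + group(0) yields the whole match; findall would only
    # return the last repetition of the capturing group.
    return [m.group(0) for m in DOMAIN_RE.finditer(text)]

print(find_domains("this is a test plenus.ar of something you want to extract"))
# prints ['plenus.ar']
```

Note that on text containing https://plenus.ar this would return only plenus.ar, without the scheme.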

I’m afraid your post is unclear because, I think, the forum software tried to interpret what you typed.

Please re-post using the code tags (three backticks before and after the text, or using the “</>” icon) along with what exactly you’re hoping to extract from the source text.


If that’s what I think it is…

Not so bad… I got some results with your suggestion, but not exactly what I need, and I believe that is because I was not clear in my first post (as @KemTekinay pointed out).

I need to find the following pattern of substrings:

[plenus.ar](https://plenus.ar)
[bmd.com.ar](https://bmd.com.ar)
[plenus.com.ar](https://plenus.com.ar)

starts with a [
then a domain without https://
then ]
then (
then the domain with https://
and ending with )

It could also be:
starting with [
then anything else
and ending with )

This pattern will match, capturing the name and the URL as subgroups:

\[(?!https)([^\]]+)\]\((https[^)]+)\)
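
As a rough sketch of how the two subgroups come out, here is that pattern run through Python's `re` module (again an assumption for easy testing, not Xojo code; the pattern is unchanged). The function name `extract_links` is hypothetical.

```python
import re

# Group 1: the name between [ ], rejected if it starts with "https".
# Group 2: the URL between ( ), which must start with "https".
LINK_RE = re.compile(r"\[(?!https)([^\]]+)\]\((https[^)]+)\)")

def extract_links(text):
    """Return a list of (name, url) tuples, one per markdown link."""
    # With two capturing groups, findall returns one tuple per match.
    return LINK_RE.findall(text)

sample = "[plenus.ar](https://plenus.ar) and [bmd.com.ar](https://bmd.com.ar)"
for name, url in extract_links(sample):
    print(name, url)
```

A link whose name itself starts with https, such as [https://plenus.ar](https://plenus.ar), is skipped because of the negative lookahead.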

Another way to do this:

((\[\S+\])(\(https?:\/\/\S+\.\S+\)))

There aren’t many invalid characters in URLs (I think only " ^ < > ` are completely invalid), so 1 or more of the ‘non-whitespace’ token \S+ probably suffices to describe any series of characters.

As the ‘name’ portion of the markdown URL [...] can include http:// etc or doesn’t have to be a domain name at all, \[\S+\] suffices.

If you really don’t want this to include names starting with https:// you can use the negative lookahead (?!https) @Kem_Tekinay includes, but not sure that’s needed…

The ‘url’ portion of the markdown URL (...) should start with either (http:// or (https:// - the optional s is s? - then some characters that should include at least 1 period ‘.’
The latter part \(https?:\/\/\S+\.\S+\) provides a rudimentary validation, nowhere near as comprehensive a validation as @Eric_Williams links to, but not sure that’s needed.

Each match includes 3 groups: the full markdown URL structure, the ‘name’ portion and the ‘url’ portion - or remove the subgroup parentheses if you don’t want these.

Not sure if that’s helpful… hopefully it is.
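
To make the three groups concrete, here is a small sketch using Python's `re` module (an assumption for the sake of a runnable example; the pattern is the one above, unchanged). The function name `split_links` is hypothetical.

```python
import re

# Group 1: the full markdown link, group 2: the [name] part,
# group 3: the (url) part. Brackets and parentheses are included
# in the captured text.
MD_LINK_RE = re.compile(r"((\[\S+\])(\(https?:\/\/\S+\.\S+\)))")

def split_links(text):
    """Return a (full, name, url) tuple for each markdown link."""
    return [m.groups() for m in MD_LINK_RE.finditer(text)]

sample = "[plenus.ar](https://plenus.ar) [bmd.com.ar](https://bmd.com.ar)"
for full, name, url in split_links(sample):
    print(full, name, url)
```

One caveat with this sketch: because \S+ is greedy, two links written back to back with no whitespace between them come out as a single match, and a name containing spaces is not matched at all.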

PS: https://regex101.com is a great resource and regex IDE/playground