Regex for a URL that is followed by a period

I have a working regex for finding URLs:

(^|>| )((http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?)($|<|\.)

I have some text with a URL that is followed by a period:

[quote]You received this message because you are subscribed to the Google Groups “gcd-main” group.

To unsubscribe from this group and stop receiving emails from it, send an email to gcd-main+unsubscribe@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/gcd-main/19d-052750c290dd%40comhem.se.[/quote]

I replace the URLs with <a href> links. The replacement pattern is:

\1<a href="\2">\2</a>\7

The trailing period is matched, too, so that I get the following result:

[quote]You received this message because you are subscribed to the Google Groups “gcd-main” group.

To unsubscribe from this group and stop receiving emails from it, send an email to gcd-main+unsubscribe@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/gcd-main/19e6290dd%40comhem.se.[/quote]
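A minimal Python sketch that reproduces the problem outside Xojo. It assumes Python's re module handles this pattern the same way as the regex engine used in the thread (greedy quantifiers, alternation and \1-style back-references behave alike here), and the URL is a shortened, made-up stand-in for the mangled one above.

```python
import re

# The original pattern, written with single backslashes in raw strings.
PATTERN = (
    r"(^|>| )((http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?"
    r"[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?)"
    r"($|<|\.)"
)
# Same \1 ... \7 back-references as the replacement pattern above.
REPLACEMENT = r'\1<a href="\2">\2</a>\7'

# Made-up sample: a URL followed by a sentence-ending period.
text = "visit https://groups.google.com/d/msgid/gcd-main/abc%40comhem.se."

linked = re.sub(PATTERN, REPLACEMENT, text)
print(linked)
```

The href ends in "se.", so the sentence's period has been pulled into the link.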

How do I remove the trailing period from the match?

Put the \. outside of the )?

That doesn’t fix the problem.

My original regex was

(^|>| )((http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?)($|<)

without the \. alternative. Adding \. doesn’t change the match.
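Adding \. makes no difference because (\/.*)? is greedy: .* runs to the end of the line and takes the period with it, and then the $ alternative succeeds, so the engine never has to back off and try the \. alternative. The Python sketch below (again assuming re behaves like the thread's engine, with a made-up sample URL) compares both patterns and then tries one possible fix that is not from the thread: a lazy path, (\/\S*?), that stops at the first point where an optional period is followed by the end of the line, a < or whitespace. That check is a lookahead, so the period is not consumed.

```python
import re

# Shared front part of the poster's pattern (single backslashes, raw strings).
HOST = (
    r"(^|>| )((http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?"
    r"[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?"
)
WITHOUT_DOT = HOST + r"(\/.*)?)($|<)"
WITH_DOT = HOST + r"(\/.*)?)($|<|\.)"

# One possible fix (an assumption, not from the thread): a lazy path that ends
# where an optional period is followed by end of line, "<" or whitespace.
FIXED = HOST + r"(\/\S*?)?)(?=\.?(?:$|<|\s))"

text = "visit https://groups.google.com/d/msgid/gcd-main/abc%40comhem.se."

for name, pattern in [("without dot", WITHOUT_DOT), ("with dot", WITH_DOT), ("fixed", FIXED)]:
    # Group 2 is the URL that would go into the href.
    print(f"{name:12} {re.search(pattern, text).group(2)}")

# The period is left in the text, so the replacement no longer needs \7.
print(re.sub(FIXED, r'\1<a href="\2">\2</a>', text))
```

Two trade-offs of this sketch: the path now ends at the first whitespace rather than at the end of the line, and a URL that genuinely ends in a period loses it.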

Would this work for you instead?

\b(https?://|www\.)\S+

Edit: Shorter pattern.

The pattern is much nicer than my own, but it also matches the final period, which isn’t part of the URL. Perhaps I need to restate my problem: I need to recognise the end of a URL. The forum mangled the URL in my sample. It ends with “comhem.se.”, not with “comhem.se .”; there is no space between se and the final period. The problem is that a period is legal in the middle of a URL but not at the end.

How about this?

\b(https?://|www)\S+(?<=[\w/])

This ensures that the last character is a word character or a slash.
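A quick Python check of the lookbehind pattern, assuming Python's re treats \S+ and (?<=[\w/]) the same way as the thread's engine; the sample strings are made up. When the greedy \S+ ends on a period or comma, the lookbehind fails and the engine backtracks one character at a time until the match ends on a word character or a slash.

```python
import re

# The pattern from the reply above: \S+ grabs everything up to whitespace,
# then the lookbehind (?<=[\w/]) forces the match to end on a word character
# or a slash, so a trailing period or comma is given back.
URL = re.compile(r"\b(https?://|www)\S+(?<=[\w/])")

samples = [
    "visit https://groups.google.com/d/msgid/gcd-main/abc%40comhem.se.",
    "see www.example.com/docs/, then reply.",
    "plain https://example.com/path/ here",
]
for s in samples:
    print(URL.search(s).group(0))

# Linking with the whole match (\g<0>); the period stays after </a>.
linked = URL.sub(r'<a href="\g<0>">\g<0></a>', samples[0])
print(linked)
```

One side effect worth knowing: a URL that really ends in some other character, such as the closing parenthesis in a link like .../Foo_(bar), is trimmed as well.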

Not RegEx, but it works :)

// Strip a single trailing period, if present
If RightB(myURL, 1) = "." Then
  myURL = LeftB(myURL, LenB(myURL) - 1)
End If

Many thanks, oh god of regexes.

@Tim Jones: I need a generic solution.

That’s pretty generic - you get the URL from your RegEx, assign it to “myURL” and then run my sample. If there’s no period, nothing happens; if there is, it’s removed.

Take what the RegEx gives you and run it through that If … Then:

myURL = "https://groups.google.com/d/msgid/gcd-main/19e66b5e-2c1c-a1fe-b4ad-052750c290dd%40comhem.se."
// Strip a single trailing period, if present
If RightB(myURL, 1) = "." Then
  myURL = LeftB(myURL, LenB(myURL) - 1)
End If

myURL ends up being:

https://groups.google.com/d/msgid/gcd-main/19e66b5e-2c1c-a1fe-b4ad-052750c290dd%40comhem.se
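The same check-and-trim idea can be applied to every match in one pass by giving re.sub a function instead of a replacement string. This is a minimal Python sketch that reuses the shorter \b(https?://|www\.)\S+ pattern from the thread. The name link is a hypothetical helper, and the sample text is made up. Like the Xojo snippet, it removes at most one trailing period, and that period is put back after the closing </a>.

```python
import re

# Shorter pattern from the thread; it still grabs a trailing period.
URL = re.compile(r"\b(https?://|www\.)\S+")


def link(match: re.Match) -> str:
    """Hypothetical helper: wrap one match in <a href>, keeping a trailing period outside."""
    url = match.group(0)
    tail = ""
    # Same test as the Xojo If ... Then: drop exactly one trailing period.
    if url.endswith("."):
        url, tail = url[:-1], "."
    return f'<a href="{url}">{url}</a>{tail}'


text = "visit https://groups.google.com/d/msgid/gcd-main/abc%40comhem.se."
print(URL.sub(link, text))
```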