Regex for a URL that is followed by a period

  1. 2 weeks ago

    Beatrix W

    Mar 22 Pre-Release Testers, Third Party Store Europe (Germany)

    I have a working regex for finding URLs:

    (^|>| )((http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?)($|<|\.)

    I have some text with a URL that is followed by a period:

    You received this message because you are subscribed to the Google Groups "gcd-main" group.<br/>
    To unsubscribe from this group and stop receiving emails from it, send an email to gcd-main+unsubscribe@googlegroups.com.<br/>
    To view this discussion on the web visit https://groups.google.com/d/msgid/gcd-main/19e66b5e-2c1c-a1fe-b4ad-052750c290dd%40comhem.se .

    I replace the URLs with a href links. The replacement pattern is:

    \1<a href="\2">\2</a>\7

    The trailing period is matched, too, so that I get the following result:

    You received this message because you are subscribed to the Google Groups "gcd-main" group.<br/>
    To unsubscribe from this group and stop receiving emails from it, send an email to gcd-main+unsubscribe@googlegroups.com.<br/>
    To view this discussion on the web visit <a href="https://groups.google.com/d/msgid/gcd-main/19e66b5e-2c1c-a1fe-b4ad-052750c290dd%40comhem.se.">https://groups.google.com/d/msgid/gcd-main/19e66b5e-2c1c-a1fe-b4ad-052750c290dd%40comhem.se.</a>

    How do I remove the trailing period from the match?
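
    A minimal sketch of why the period ends up inside \2, written with Python's re module rather than Xojo's RegEx class (an assumption made only so the snippet is easy to run; the variable names are illustrative). The optional path group (\/.*)? is greedy and consumes everything up to the end of the line, including the final period, so the closing alternation is satisfied by $ and the \. alternative is never needed.

```python
import re

# The pattern from the post, unchanged (split across lines for readability).
PATTERN = re.compile(
    r"(^|>| )((http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?"
    r"[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?)"
    r"($|<|\.)"
)

line = ("To view this discussion on the web visit "
        "https://groups.google.com/d/msgid/gcd-main/"
        "19e66b5e-2c1c-a1fe-b4ad-052750c290dd%40comhem.se.")

match = PATTERN.search(line)
captured_url = match.group(2)  # what \2 inserts into the href
path_part = match.group(6)     # the greedy (\/.*)? group
terminator = match.group(7)    # what ($|<|\.) matched

print(captured_url.endswith("."))  # True: the period is part of \2
print(path_part.endswith("."))     # True: (\/.*)? swallowed it
print(repr(terminator))            # '': the alternation matched $ only
```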

  2. Greg O

    Mar 22 Xojo Inc scout.galaxy.barn

    Put the \. outside of the )?

  3. Beatrix W

    Mar 22 Pre-Release Testers, Third Party Store Europe (Germany)

    That doesn't fix the problem.

    My original regex was

    (^|>| )((http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?)($|<)

    without the dot. Adding the \. alternative doesn't change the match.

  4. Kem T

    Mar 22 Pre-Release Testers, Xojo Pro, XDC Speakers, MVP Connecticut
    Edited 2 weeks ago

    Would this work for you instead?

    \b(https?://|www\.)\S+

    Edit: Shorter pattern.

  5. Beatrix W

    Mar 22 Pre-Release Testers, Third Party Store Europe (Germany)

    The pattern is much nicer than my own, but it also matches the last period, which isn't part of the URL. Perhaps I need to restate my problem: I need to recognise the end of a URL. The forum mangled the URL: it ends with "comhem.se." and not with "comhem.se ."; there is no space between "se" and the final period. The problem is that the period is legal in the middle of a URL but not at the end.

  6. Kem T

    Mar 22 Pre-Release Testers, Xojo Pro, XDC Speakers, MVP Answer Connecticut

    How about this?

    \b(https?://|www)\S+(?<=[\w/])

    This ensures that the last character is some word character or a slash.
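
    A small sketch of how the lookbehind changes the match, again in Python's re module rather than Xojo (an assumption for runnability; linkify_lookbehind and the sample strings are made up for illustration). \S+ first grabs the trailing period, the lookbehind (?<=[\w/]) rejects that end position, and the engine backtracks by one character.

```python
import re

# Earlier suggestion: \S+ runs up to the next whitespace, period included.
GREEDY = re.compile(r"\b(https?://|www\.)\S+")
# Suggestion from this post: the match must end in a word character or a slash.
TRIMMED = re.compile(r"\b(https?://|www)\S+(?<=[\w/])")

def linkify_lookbehind(text: str) -> str:
    """Wrap every match of TRIMMED in an anchor tag (\\g<0> is the whole match)."""
    return TRIMMED.sub(r'<a href="\g<0>">\g<0></a>', text)

sample = ("visit https://groups.google.com/d/msgid/gcd-main/"
          "19e66b5e-2c1c-a1fe-b4ad-052750c290dd%40comhem.se.")

print(GREEDY.search(sample).group(0).endswith("."))   # True
print(TRIMMED.search(sample).group(0).endswith("."))  # False
print(linkify_lookbehind("see https://example.com/path/, then"))
# see <a href="https://example.com/path/">https://example.com/path/</a>, then
```

    One consequence of this design: any trailing run of non-word characters is trimmed, not just a period, so a URL that legitimately ends in something like ")" would lose that character as well.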

  7. Tim J

    Mar 22 Pre-Release Testers, Xojo Pro N. Phoenix, AZ

    Not RegEx, but it works :)

    If RightB(myURL, 1) = "." Then
      myURL = LeftB(myURL, LenB(myURL) - 1)
    End If

  8. Beatrix W

    Mar 23 Pre-Release Testers, Third Party Store Europe (Germany)

    Many thanks, oh god of regexes.

    @Tim Jones: I need a generic solution.

  9. Tim J

    Mar 24 Pre-Release Testers, Xojo Pro N. Phoenix, AZ

    Beatrix W wrote: "@Tim Jones: I need a generic solution."

    That's pretty generic - you get the URL from your RegEx, assign it to "myURL", and then run my sample. If there's no period, nothing happens; if there is, it's removed.

    Take what the RegEx gives you and call that If ... Then:

    myURL = "https://groups.google.com/d/msgid/gcd-main/19e66b5e-2c1c-a1fe-b4ad-052750c290dd%40comhem.se."
    If RightB(myURL, 1) = "." Then
      myURL = LeftB(myURL, LenB(myURL) - 1)
    End If

    myURL ends up being:

    https://groups.google.com/d/msgid/gcd-main/19e66b5e-2c1c-a1fe-b4ad-052750c290dd%40comhem.se
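
    The same post-processing idea, sketched in Python instead of Xojo (an assumption for runnability; strip_trailing_period and linkify_then_trim are hypothetical helper names, and the URL pattern is the shorter one suggested earlier in the thread). The regex is allowed to over-match, and the period is cut off afterwards and placed back behind the link.

```python
import re

# Over-matching pattern: \S+ may include a sentence-ending period.
URL_PATTERN = re.compile(r"\b(https?://|www\.)\S+")

def strip_trailing_period(url: str) -> str:
    """Drop one trailing period, if present; otherwise return the URL unchanged."""
    return url[:-1] if url.endswith(".") else url

def linkify_then_trim(text: str) -> str:
    """Replace each matched URL with an anchor tag, keeping the period outside it."""
    def build_link(match: re.Match) -> str:
        raw = match.group(0)
        url = strip_trailing_period(raw)
        tail = raw[len(url):]  # "." if something was stripped, else ""
        return f'<a href="{url}">{url}</a>{tail}'
    return URL_PATTERN.sub(build_link, text)

print(linkify_then_trim("visit https://example.com/a.b."))
# visit <a href="https://example.com/a.b">https://example.com/a.b</a>.
```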
