RegEx just melts my brain :(

  1. Dave S

    Aug 8 San Diego, California USA

    I need a function that does the following:

    • Given a string, determine whether it contains a VALID URL (not that the URL exists, just that it meets the criteria)
    • The URL must NOT be enclosed in ( ) or quotes (single or double)
    • If such a URL is found, it needs to be "replaced" with

    [](the_found_url)

    i.e. an empty pair of square brackets followed by the URL enclosed in ()

    Extra sugar would be: if the URL does not start with HTTP:// or HTTPS://, prepend HTTP://

    Note: for this I don't need to worry about other URL types; the assumption is HTTP only.

    s="my favorite website is www.rdsisemore.com"
    s=FixTheURL(s)

    would become

    my favorite website is [](http://www.rdsisemore.com)

    EDIT: actually, it would need to become this:

    my favorite website is [http://www.rdsisemore.com](http://www.rdsisemore.com)
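    A minimal sketch of FixTheURL in Xojo (classic RegEx API, like the other code in this thread). It assumes a deliberately simple notion of a URL: an optional http:// or https://, dot-separated labels, a letters-only top-level domain of two or more characters, and an optional /path. The exact pattern and the set of characters treated as "enclosing" are assumptions, not a full URL validator. A candidate is skipped when it is preceded by ( [ a quote / @ . or a word character, or followed by ) ] a quote or a word character, and http:// is prepended when no scheme is present.

    ```xojo
    Function FixTheURL(s As String) As String
      ' Hypothetical sketch: a simplified URL shape, not a complete URL validator.
      Dim re As New RegEx
      re.Options.CaseSensitive = False
      ' (?<!...)  not preceded by ( [ " ' / @ . or a word character
      ' (?>...)   optional http(s)://, labels such as "www." and "example.", a TLD of 2+ letters,
      '           then an optional /path; atomic, so a failed check below cannot shorten the URL
      ' (?!...)   not followed by ) ] " ' or a word character
      re.SearchPattern = "(?<![(\[""'/@.\w])(?>(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/[^\s""')\]]*)?)(?![)\]""'\w])"
      
      Dim result As String
      Dim lastEnd As Integer = 0 ' byte offset just past the previous match
      Dim m As RegExMatch = re.Search(s)
      While m <> Nil
        Dim found As String = m.SubExpressionString(0)
        Dim startB As Integer = m.SubExpressionStartB(0) ' 0-based byte offset of this match
        Dim url As String = found
        ' "Extra sugar": prepend http:// when there is no scheme (Xojo string compares ignore case)
        If Left(url, 7) <> "http://" And Left(url, 8) <> "https://" Then
          url = "http://" + url
        End If
        ' Copy the text between the previous match and this one, then append the [url](url) link
        result = result + MidB(s, lastEnd + 1, startB - lastEnd) + "[" + url + "](" + url + ")"
        lastEnd = startB + LenB(found)
        m = re.Search ' continue after the previous match in the same target string
      Wend
      Return result + MidB(s, lastEnd + 1)
    End Function
    ```

    With this sketch, FixTheURL("my favorite website is www.rdsisemore.com") should return my favorite website is [http://www.rdsisemore.com](http://www.rdsisemore.com), while a string such as "[link text](http://dev.nodeca.com)" is left unchanged because that URL follows a (. Looping over the matches, rather than calling Replace once, is what allows the http:// prefix to be added only when it is missing.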

  2. Greg O

    Aug 8 Xojo Inc Somewhere near Raleigh, NC

    How about this...

    [^'"(]([a-z0-9]+\.[a-z0-9]+\.[a-z0-9]+)[^'")]

    Replaced with

    [http://\1](http://\1)
  3. Dave S

    Aug 8 San Diego, California USA

    OK, I guess... not sure how to turn that into the required function.

  4. Beatrix W

    Aug 8 Pre-Release Testers Europe (Germany)
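    Function FixTheURL(oldString As String) As String
      Dim theRegex As New RegEx
      theRegex.Options.Greedy = False
      theRegex.Options.TreatTargetAsOneLine = True
      theRegex.Options.ReplaceAllMatches = True ' otherwise Replace only changes the first match
      
      theRegex.SearchPattern = "your search pattern here"
      theRegex.ReplacementPattern = "your replacement pattern here"
      Return theRegex.Replace(oldString)
    End Function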

    You should also check whether Greg's search pattern meets your needs. As far as I can see at a glance, the regex is for three words with a dot in between. That wouldn't capture the top-level domains. It also wouldn't capture bare domains that don't have a www.
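    For illustration, a sketch of the template above with Greg's three-part pattern plugged in (the function name is hypothetical). One change is assumed: the leading [^'"(] and trailing [^'")] are replaced by a lookbehind and a lookahead. As posted, those two classes consume one character on each side, so the spaces around the URL vanish in the replacement, and when the URL ends the string the last letter of the domain is taken as the "trailing" character (with default greedy matching, www.rdsisemore.com would be linked as http://www.rdsisemore.co and the m dropped). Lookarounds test the neighbors without consuming them. The lookbehind here also rejects / . and word characters so a match cannot start in the middle of a name; the three-part limitation remains.

    ```xojo
    Function FixTheURLThreePart(oldString As String) As String
      ' Hypothetical name; the template above with Greg's pattern plugged in.
      Dim theRegex As New RegEx
      theRegex.Options.CaseSensitive = False
      theRegex.Options.ReplaceAllMatches = True ' link every URL, not just the first
      
      ' Greg's three dot-separated parts, but the neighbors are checked with (?<!...) and (?!...)
      ' instead of being consumed by [^'"(] and [^'")].
      ' Assumption: also skip parts preceded by / . or a word character, or followed by a word character.
      theRegex.SearchPattern = "(?<![(""'/.\w])([a-z0-9]+\.[a-z0-9]+\.[a-z0-9]+)(?![)""'\w])"
      theRegex.ReplacementPattern = "[http://\1](http://\1)"
      Return theRegex.Replace(oldString)
    End Function
    ```

    FixTheURLThreePart("my favorite website is www.rdsisemore.com") should then return my favorite website is [http://www.rdsisemore.com](http://www.rdsisemore.com), with the space before the link intact.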

  5. Greg O

    Aug 9 Xojo Inc Somewhere near Raleigh, NC

    If you use Kem’s excellent RegExRx product, he’s got a Copy as Xojo Code Function that does all that for you.

    FWIW, it’s also an excellent way to learn how to use regular expressions.

  6. Sascha S

    Aug 9 Pre-Release Testers, Xojo Pro Germany/W'haven

    @Greg OLone If you use Kem’s excellent RegExRx product, he’s got a Copy as Xojo Code Function that does all that for you.

    Link to RegExRX in the Mac App Store

    It's really worth every cent! :)

  7. Dave S

    Aug 9 San Diego, California USA

    And a RegEx tool doesn't help if you can't make heads or tails of the code to begin with.

    But thanks anyway.

  8. Sascha S

    Aug 9 Pre-Release Testers, Xojo Pro Germany/W'haven

    @Dave S And a RegEx tool doesn't help if you can't make heads or tails of the code to begin with..

    RegExRX generates easy-to-understand native Xojo code, and it helps with learning regular expressions.

  9. Dave S

    Aug 9 San Diego, California USA

    The "Xojo" part is not a huge issue... it's the RegEx part, and RegExRX won't magically produce that. It will just tell me what a supplied pattern might do. If it took in an English description and spat out a RegEx, that would be one thing, but it doesn't.

    No worries, this is a super minor part of my app, and I can come up with another way

  10. Kem T

    Aug 9 Pre-Release Testers, Xojo Pro, XDC Speakers New York

    I have made available a number of free templates with useful patterns, and most of those are commented and offer a description in the Source Text area. For example, the "Identify URL" pattern:

    (?xi-U) # FREE SPACING, case-insensitive, greedy
    
    # Define the prefix
    (?(DEFINE)(?<prefix>[A-Z]{3,}://))
    # Define a valid URL character
    (?(DEFINE)(?<valid>[A-Z0-9\-_~:/?\#[\]@!$&'()*+;=.,%]))
    
    # START
    \b # Word boundary
    (?: # Non-capturing group
    (?<=\<)(?&prefix)(?&valid)+(?=\>) # Anything between angle-brackets
    | # OR
    (?<=\[)(?&prefix)(?&valid)+(?=\]) # Anything between square-brackets
    | # OR
    (?<=\{)(?&prefix)(?&valid)+(?=\}) # Anything between curly-brackets
    | # OR
    (?&prefix)(?&valid)+(?<![\.,]) # Can't end on a dot or comma
    ) # End non-capturing group

    The description:

    This pattern will attempt to identify a URL. It contains four versions. The first three will attempt to identify and include any valid-looking URL between angle-, square-, or curly-brackets. The final one will match almost any valid-looking URL anywhere within text, but will exclude any trailing dot or comma.
    
    The benefit of this pattern is that it will include most URLs or attempted URLs. The drawback is, it will also include obviously invalid URLs.
    
    These are included: <http://www.something.com>, https://something.com?index=1&page=2, ftp://ftp.com/, httttp://blah.com, http://this.and.that/?s=%40, <http://www.something.com/?m=,,,>, [ftp://3.4.], {url://www.1223.com,} ssh://www.one%4t.com, http://a.
    
    This is not: htp3://www.something.com.
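    To use this template from Xojo, a minimal sketch that collects every match into a string array. Assumptions: the function name FindURLs is hypothetical; the pattern is the template above with the comments and free spacing removed (so the x flag is dropped and everything else is copied as-is); and it relies on the PCRE features (?(DEFINE)...) and (?&name), which are assumed to be supported by your Xojo version's RegEx engine. Every branch requires the prefix ([A-Z]{3,}://), so a bare www.rdsisemore.com with no scheme is not found.

    ```xojo
    Function FindURLs(s As String) As String()
      ' Hypothetical name. Kem's "Identify URL" template with the free-spacing comments
      ' stripped out so the whole pattern fits on one line.
      Dim re As New RegEx
      re.SearchPattern = "(?i-U)(?(DEFINE)(?<prefix>[A-Z]{3,}://))(?(DEFINE)(?<valid>[A-Z0-9\-_~:/?\#[\]@!$&'()*+;=.,%]))\b(?:(?<=\<)(?&prefix)(?&valid)+(?=\>)|(?<=\[)(?&prefix)(?&valid)+(?=\])|(?<=\{)(?&prefix)(?&valid)+(?=\})|(?&prefix)(?&valid)+(?<![\.,]))"
      
      Dim urls() As String
      Dim m As RegExMatch = re.Search(s)
      While m <> Nil
        urls.Append(m.SubExpressionString(0)) ' the whole match is the URL
        m = re.Search ' continue after the previous match
      Wend
      Return urls
    End Function
    ```

    For example, FindURLs("see <http://www.something.com> and https://something.com?index=1&page=2") should return http://www.something.com and https://something.com?index=1&page=2. Replace the Append with whatever you need to do with each match.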
  11. Dave S

    Aug 9 San Diego, California USA

    Thanks, Kem. I had seen that, but I have no idea how to make a Xojo function out of it.

    I found this elsewhere

    Function IsValidURL(url As String) As Boolean
      Dim r As New RegEx
      
      r.SearchPattern = "((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)"
      
      Return (r.Search(url) <> Nil)
      
    End Function

    but it is "wrong" for my needs: it reports that a string contains a URL, not that the string IS a URL.

    I was hoping for something that was the equivalent of InStr, since I have to replace any standalone URL with the string I described above.
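    RegEx.Search already provides the InStr part: the RegExMatch it returns carries the matched text (SubExpressionString(0)) and where the match starts (SubExpressionStartB(0), a 0-based byte offset). A minimal sketch, with a hypothetical helper name and a simplified stand-in pattern that does not check the ( / quote rule; swap in whichever pattern you settle on. To test whether a whole string IS a URL instead, anchor the pattern with ^ at the start and $ at the end.

    ```xojo
    Function URLInStr(s As String, ByRef foundURL As String) As Integer
      ' Hypothetical helper that behaves like InStr: returns the 1-based position of the first
      ' URL-looking run in s (0 if none) and passes the matched text back through foundURL.
      Dim re As New RegEx
      re.Options.CaseSensitive = False
      ' Simplified stand-in pattern (an assumption): optional http(s)://, dotted labels, a TLD, optional path
      re.SearchPattern = "(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/\S*)?"
      
      Dim m As RegExMatch = re.Search(s)
      If m = Nil Then
        foundURL = ""
        Return 0
      End If
      foundURL = m.SubExpressionString(0)
      ' SubExpressionStartB is a 0-based byte offset; +1 makes it 1-based like InStrB
      ' (the same as InStr when the text before the URL is plain ASCII).
      Return m.SubExpressionStartB(0) + 1
    End Function
    ```

    With Dim found As String, URLInStr("my favorite website is www.rdsisemore.com", found) should return 24 and set found to www.rdsisemore.com.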

  12. Alberto D

    Aug 9 Pre-Release Testers, Xojo Pro

    RegEx is one of those things that I want to learn, but I find it too difficult.

    What may help is to first define your criteria exactly:

    • Will all URLs end in .com, or can you have .net, .com.mx, .mx? (Yes, in México we have domain.mx and domain.com.mx.)
    • If you have something like 'domain.com.Hello', will you try to take the URL part, or only match when the URL is the last thing in the string or is followed by a space, like 'Something domain.com' or 'Something domain.com something else'?

    And whatever else you need to define.

    It is not an easy task. Take, for example, the automatic URL parser in this forum: it can link something like xojo.net.socket

    Note: I italicized some of the '.' characters to avoid the auto-link.

  13. Dave S

    Aug 9 San Diego, California USA

    @Alberto D;Poo What may help is to first define your criteria exactly:

    Not sure how much more I can define:
    Does the string contain a VALID URL, and at what location in the string is it?

    The code I posted above does in fact "return the URL", but it is wrong in some situations:

    • input String = "[link text](http://dev.nodeca.com)"
    • returned = text](http://dev.nodeca.com ) // the "text]" is not part of the URL for example, nor are the ()

    Obviously it can be done. Look at the line above: this forum's code did EXACTLY what I'm trying to do.

  14. Alberto D

    Aug 9 Pre-Release Testers, Xojo Pro

    Yes, my point is not whether it's possible; my point is that it is easy to have.it.wrong

    See what the forum did.

    Sorry that I can't help you do even what this forum does. Maybe some day I will learn the basics.

  15. Dave S

    Aug 9 San Diego, California USA

    This seems to work, though I have no freaking clue how or why (found elsewhere on this forum):

    Function IsValidURL(url As String) As Boolean
      Dim r As New RegEx
      
      ' Anchored with ^ and $, so the WHOLE string must be a URL, not merely contain one.
      ' A scheme (http, https or ftp) is required, and private, loopback and link-local
      ' IPv4 addresses (10.x, 127.x, 169.254.x, 192.168.x, 172.16-31.x) are rejected.
      r.SearchPattern = "^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z0-9]+-?)*[a-z0-9]+)(?:\.(?:[a-z0-9]+-?)*[a-z0-9]+)*(?:\.(?:[a-z]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$"
      
      Return (r.Search(url) <> Nil)
      
    End Function

    However, I have to split out the strings to test, which is fine.

    And Alberto, why is the link in your example "wrong"? It might not be a real URL, but technically it is valid.

  16. Dave S

    Aug 9 Answer San Diego, California USA

    Thank you all. I found a method that works and is 'fast enough':

    • Given a string that "might" contain a URL
    • Split the string on space boundaries (assumption: a URL has %20 instead of ' ')
    • Check each substring against the RegEx above
    • If TRUE, use InStr(original, substring) to locate it and replace it with the new pattern
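    The steps above can be sketched in Xojo as follows, assuming a hypothetical function name and a short anchored stand-in pattern (the IsValidURL pattern can be dropped in instead, but it requires http://, https:// or ftp://, so a bare www.… word would need the prefix added before testing). One deliberate difference from the last step: instead of locating each hit with InStr, the sketch rewrites the word in the array and joins the array back together, so a URL that appears twice, or that also occurs inside a longer word, cannot be replaced at the wrong spot. Words wrapped in ( ) or quotes fail the anchored pattern because those characters are not allowed at either end.

    ```xojo
    Function LinkifyWords(s As String) As String
      ' Hypothetical name. Splits on single spaces, so it assumes URLs contain %20, not ' '.
      Dim re As New RegEx
      re.Options.CaseSensitive = False
      re.Options.TreatTargetAsOneLine = True ' ^ and $ anchor to the whole word
      ' Anchored stand-in pattern (an assumption): the WHOLE word must look like a URL.
      re.SearchPattern = "^(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/\S*)?$"
      
      Dim words() As String = Split(s, " ")
      For i As Integer = 0 To UBound(words)
        If re.Search(words(i)) <> Nil Then
          Dim url As String = words(i)
          ' Prepend http:// when no scheme is present (string comparison ignores case)
          If Left(url, 7) <> "http://" And Left(url, 8) <> "https://" Then url = "http://" + url
          words(i) = "[" + url + "](" + url + ")"
        End If
      Next
      Return Join(words, " ")
    End Function
    ```

    LinkifyWords("my favorite website is www.rdsisemore.com") should return my favorite website is [http://www.rdsisemore.com](http://www.rdsisemore.com).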
  17. Dave S

    Aug 9 San Diego, California USA

    @Alberto D;Poo Just FYI
    https://regex101.com/: there is a pattern error in the one you posted. Also, doing some tests there (after fixing the pattern), it doesn't match 'google.com', but it does match 'http://google.com'.

    Thanks, and you are correct. However, in this app it's up to the user to take some responsibility.
    So if they type in google.com and the app does not make a link out of it, they just need to make it "more valid" :)

  18. Dave S

    Aug 9 San Diego, California USA

    This is what I would have liked to achieve, but what I have is good enough for this application:

    http://soapbox.github.io/linkifyjs/
