RegEx just melts my brain :(

I need a function that does the following

  • given a String, determine if that string contains a VALID URL (not that it exists, just meets the criteria)
  • it must NOT be enclosed in ( ) or Quotes (single or double)
  • If such a string is found it needs to be “replaced” with [](URL)

ie. two square brackets followed by the URL enclosed in ()

Extra sugar would be… if the URL does not start with HTTP:// or HTTPS:// then prepend HTTP://

Note for this, I don’t need to worry about other URL types… assumption will be HTTP only
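For the “extra sugar” part, a minimal sketch of the prefix check might look like the following. `EnsureHTTPPrefix` is a made-up name for illustration, and the sketch relies on Xojo's `=` comparing strings without regard to case.

```xojo
// Hypothetical helper: returns url unchanged if it already starts with
// http:// or https://, otherwise prepends http://
Function EnsureHTTPPrefix(url As String) As String
  // Xojo's = on strings is case-insensitive, so HTTP:// and http:// both pass
  If url.Left(7) = "http://" Or url.Left(8) = "https://" Then
    Return url
  End If
  
  Return "http://" + url
End Function
```

It only looks at the start of the string, so it should be called on the URL itself, not on the whole sentence.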

s="my favorite website is www.rdS.com"
s=FixTheURL(s)

would become

EDIT : actually… it would need to become this

How about this…

[^'"(]([a-z0-9]+\.[a-z0-9]+\.[a-z0-9]+)[^'")]

Replaced with

[http://\1](http://\1)

ok… I guess… not sure how to turn that into the required function

[code]Dim theRegex As New RegEx
theRegex.Options.Greedy = False
theRegex.Options.TreatTargetAsOneLine = True

theRegex.SearchPattern = "your search pattern here"
theRegex.ReplacementPattern = "your replacement pattern here"
Dim newString As String = theRegex.Replace(oldString)[/code]

You should also check whether the search pattern from Greg meets your needs. As far as I can see at a glance, the regex matches three words with a dot in between. That wouldn’t capture the top level domains. It also wouldn’t capture cruftless domains where you don’t have a www.

If you use Kem’s excellent RegExRx product, he’s got a Copy as Xojo Code Function that does all that for you.

FWIW, It’s also an excellent way to learn how to use Regular Expressions.

Link to RegExRX in the Mac App Store

It’s really worth every cent! :)

And a RegEx tool doesn’t help if you can’t make heads or tails of the code to begin with…

But thanks anyways.

RegExRX generates easy to understand native Xojo Code. And helps learning Regular Expressions.

The “Xojo” part is not a huge issue… it’s the RegEx part, and RegExRX won’t magically produce that… it will just tell me what a supplied pattern might do… If it took in an English description and spit out RegEx that would be one thing… but it doesn’t…

No worries, this is a super minor part of my app, and I can come up with another way

I have made available a number of free templates with useful patterns, and most of those are commented and offer a description in the Source Text area. For example, the “Identify URL” pattern:

(?xi-U) # FREE SPACING, case-insensitive, greedy

# Define the prefix
(?(DEFINE)(?<prefix>[A-Z]{3,}://))
# Define a valid URL character
(?(DEFINE)(?<valid>[A-Z0-9\-_~:/?\#[\]@!$&'()*+;=.,%]))

# START
\b # Word boundary
(?: # Non-capturing group
(?<=\<)(?&prefix)(?&valid)+(?=\>) # Anything between angle-brackets
| # OR
(?<=\[)(?&prefix)(?&valid)+(?=\]) # Anything between square-brackets
| # OR
(?<=\{)(?&prefix)(?&valid)+(?=\}) # Anything between curly-brackets
| # OR
(?&prefix)(?&valid)+(?<![\.,]) # Can't end on a dot or comma
) # End non-capturing group

The description:

This pattern will attempt to identify a URL. It contains four versions. The first three will attempt to identify and include any valid-looking URL between angle-, square-, or curly-brackets. The final one will match almost any valid-looking URL anywhere within text, but will exclude any trailing dot or comma.

The benefit of this pattern is that it will include most URLs or attempted URLs. The drawback is, it will also include obviously invalid URLs.

These are included: <http://www.something.com>, https://something.com?index=1&page=2, ftp://ftp.com/, httttp://blah.com, http://this.and.that/?s=%40, <http://www.something.com/?m=,,,>, [ftp://3.4.], {url://www.1223.com,} ssh://www.one%4t.com, http://a.

This is not: htp3://www.something.com.

Thanks Kem… I had seen that, but have no idea how to make a Xojo function out of it…
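One way to wrap a pattern like that in a Xojo function, as a rough sketch rather than a finished solution: pass the pattern text in as a string and loop over the matches. `FindAllMatches` is a hypothetical name, and the sketch assumes the pattern can never match an empty string (otherwise the loop could spin).

```xojo
// Hypothetical helper: returns every piece of source that matches pattern.
// pattern can be the "Identify URL" text above, stored e.g. in a String constant.
Function FindAllMatches(source As String, pattern As String) As String()
  Dim results() As String
  
  Dim r As New RegEx
  r.SearchPattern = pattern
  
  Dim m As RegExMatch = r.Search(source)
  While m <> Nil
    // Subexpression 0 is the whole match
    results.Append(m.SubExpressionString(0))
    
    // Search with no argument continues in the same text after the previous match
    m = r.Search()
  Wend
  
  Return results
End Function
```

Because the pattern starts with (?xi-U), the free spacing and comments can stay in the pattern text as they are.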

I found this elsewhere

Function IsValidURL(url As String) As Boolean
  Dim r As New RegEx
  
  r.SearchPattern = "((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)"
  
  Return (r.Search(url) <> Nil)
  
End Function

but it is “wrong” (for my needs), as it returns that a string “contains” a URL, not that it IS a URL

I was hoping for something that was the equivalent of InStr, since I have to replace any standalone URL with the string I described above
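For what it's worth, the RegExMatch returned by Search already carries the matched text and where it starts, so an InStr-like wrapper is possible. The sketch below is only an illustration: `FindPattern` is a made-up name, and it assumes the source string is UTF-8 so that the byte offset lines up with the string's own bytes.

```xojo
// Hypothetical helper: returns the 1-based character position of the first
// match of pattern in source (0 if there is none) and hands back the matched
// text through the ByRef parameter found.
Function FindPattern(source As String, pattern As String, ByRef found As String) As Integer
  Dim r As New RegEx
  r.SearchPattern = pattern
  
  Dim m As RegExMatch = r.Search(source)
  If m Is Nil Then
    found = ""
    Return 0
  End If
  
  // Subexpression 0 is the whole match
  found = m.SubExpressionString(0)
  
  // SubExpressionStartB is a zero-based BYTE offset, so convert it to the
  // 1-based character position that InStr would report
  Return source.LeftB(m.SubExpressionStartB(0)).Len + 1
End Function
```

To test that a string IS a URL rather than contains one, the pattern itself has to be anchored with ^ at the start and $ at the end.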

RegEx is one of those things that I want to learn but I found it too difficult.

What may help is to first define your criteria exactly:

  • Will all URLs end in .com, or can you have .net, .com.mx, .mx? (Yes, in Mexico we have domain.mx and domain.com.mx.)
  • If you have something like ‘domain.com.Hello’, will you try to take the URL part? Or only when it is the last thing in the text or there is a space next to the URL, like ‘Something domain.com’ or ‘Something domain.com something else’?
    And more things that you define.

It is not an easy task. Take for example the automatic URL parser in this forum: it can link something like xojo.net.socket

Note: I used italic option for some ‘.’ to avoid the auto link

not sure how much more I can define
Does the string contain a VALID URL, and at what location is it in the string

The code I posted above, does in fact “return the URL” but it is wrong in some situations

obviously it can be done… look at the line above… This Forum code did EXACTLY what I’m trying to do

Yes, my point is not if it’s possible, my point is that it is easy to have.it.wrong

See what the forum did

Sorry that I can’t help you to even do what this forum does. Maybe some day I will learn the basics.

This seems to work… no freaking clue how or why (found elsewhere on this forum)

Function IsValidURL(url As String) As Boolean
  Dim r As New RegEx
  
  r.SearchPattern = "^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z0-9]+-?)*[a-z0-9]+)(?:\.(?:[a-z0-9]+-?)*[a-z0-9]+)*(?:\.(?:[a-z]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$"
		
  
  Return (r.Search(url) <> Nil)
  
End Function

however I have to split out the strings to test… which is fine

and Alberto… why is the link in your example “wrong”… it might not be a real URL, but technically it is valid

Thank you all… I found a method that works, and is ‘fast enough’

  • Given a string that “might” contain a URL
  • split the string on “space” boundaries [assumption… URL has %20 instead of ' ']
  • check each sub string against above RegEx
  • if TRUE, using Instr(original,substring), replace it with new pattern
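The steps above could be sketched roughly like this. It assumes `IsValidURL` is the anchored (^…$) version posted earlier, and it swaps one detail: instead of InStr on the original string, each piece is replaced in place and the pieces are joined back together, so a URL that also appears earlier inside quotes or parentheses is not hit by mistake.

```xojo
// Sketch of the split-and-check approach; FixTheURL is the name from the
// first post, the body is only an illustration.
Function FixTheURL(s As String) As String
  Dim words() As String = Split(s, " ")
  
  For i As Integer = 0 To words.Ubound
    Dim w As String = words(i)
    
    // A piece like "http://x.com" (with the quotes) or (http://x.com) fails
    // the anchored test, so quoted or parenthesised URLs are left alone
    If IsValidURL(w) Then
      words(i) = "[" + w + "](" + w + ")"
    End If
  Next
  
  Return Join(words, " ")
End Function
```

Since that pattern insists on a scheme, a bare www.rdS.com would not be linked by this sketch.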

[quote=400012:@Alberto De Poo]Just FYI
https://regex101.com/ reports a pattern error on the one you posted. Also, doing some tests there (after fixing the pattern), it doesn’t match ‘google.com’ but it does match ‘http://google.com’[/quote]

Thanks, and you are correct… however, in this app, it’s up to the user to take some responsibility…
So if they type in google.com and the app does not make a link out of it, then they just need to make it “more valid” :)

this is what I would have liked to achieve, but what I have is good enough for this application

http://soapbox.github.io/linkifyjs/

[quote=400008:@Dave S]
and Alberto… why is the link in your example “wrong”… it might not be a real URL, but technically it is valid[/quote]

Because the forum sees .it and then creates a link, but not with other characters:
have.it.wrong
have.pp.wrong

Either the URL with pp should also be linked, or the process that creates the link should check if it is the last word and only then link it. If the first one is a valid URL (but wrong), then the second should be valid too.

Anyway, that doesn’t matter, it was just an observation on how the forum does things.

[quote=400014:@Dave S]Thanks, and you are correct… however, in this app, it’s up to the user to take some responsibility…
So if they type in google.com and the app does not make a link out of it, then they just need to make it “more valid” :)[/quote]

But you said:

[quote=399889:@Dave S]I need a function that does the following

s="my favorite website is www.rdS.com"
s=FixTheURL(s)

would become
my favorite website is "

EDIT : actually… it would need to become this
my favorite website is http://www.rdS.com"[/quote]

But it’s ok, it’s your program and you know what you want. I’m glad that you got the solution.