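[code]Dim oldString As String = "my favorite website is www.rdS.com" ' the text to fix (example from the question)
Dim theRegex As New RegEx
theRegex.Options.Greedy = False ' quantifiers match as little as possible
theRegex.Options.TreatTargetAsOneLine = True ' treat the whole target as a single line
theRegex.Options.ReplaceAllMatches = True ' replace every match, not only the first one
theRegex.SearchPattern = "your search pattern here"
theRegex.ReplacementPattern = "your replacement pattern here"
Dim newString As String = theRegex.Replace(oldString)[/code]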
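To make the skeleton above concrete, here is a minimal sketch of the FixTheURL function from the original question. It assumes the goal is only to put http:// in front of standalone www. host names. The search pattern is an assumption written for this example (it is not Greg's pattern), and it deliberately ignores bare domains without www.:

```xojo
Function FixTheURL(s As String) As String
  ' Sketch only: prepends "http://" to standalone "www." host names.
  ' The pattern is an illustrative assumption, not a complete URL matcher.
  Dim theRegex As New RegEx
  theRegex.Options.ReplaceAllMatches = True ' fix every URL in s, not just the first
  ' (?<!\S) = preceded by whitespace or the start of the string (a "standalone" word).
  ' After "www." the host needs at least two more dot-separated labels,
  ' so a trailing period such as "www.rdS.com." is left outside the match.
  theRegex.SearchPattern = "(?<!\S)(www\.[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
  theRegex.ReplacementPattern = "http://\1" ' \1 = the captured host name
  Return theRegex.Replace(s)
End Function
```

Because of the (?<!\S) lookbehind, a www. that already follows http:// is not matched, so running the function twice does not double the scheme.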
You should also check whether the search pattern from Greg meets your needs. As far as I can see at a glance, the regex is for three words separated by dots. That wouldn’t capture the top-level domains. It also wouldn’t capture bare domains that don’t have a www. prefix.
The “Xojo” part is not a huge issue… it’s the RegEx part, and RegExRX won’t magically produce that… it will just tell me what a supplied pattern might do… If it took in an English description and spat out a RegEx, that would be one thing… but it doesn’t…
No worries, this is a super minor part of my app, and I can come up with another way.
I have made available a number of free templates with useful patterns, and most of those are commented and offer a description in the Source Text area. For example, the “Identify URL” pattern:
[code](?xi-U) # FREE SPACING, case-insensitive, greedy
# Define the prefix
(?(DEFINE)(?<prefix>[A-Z]{3,}://))
# Define a valid URL character
(?(DEFINE)(?<valid>[A-Z0-9\-_~:/?\#[\]@!$&'()*+;=.,%]))
# START
\b # Word boundary
(?: # Non-capturing group
(?<=\<)(?&prefix)(?&valid)+(?=\>) # Anything between angle-brackets
| # OR
(?<=\[)(?&prefix)(?&valid)+(?=\]) # Anything between square-brackets
| # OR
(?<=\{)(?&prefix)(?&valid)+(?=\}) # Anything between curly-brackets
| # OR
(?&prefix)(?&valid)+(?<![\.,]) # Can't end on a dot or comma
) # End non-capturing group[/code]
The description:
This pattern will attempt to identify a URL. It contains four alternatives. The first three will attempt to identify and include any valid-looking URL between angle-, square-, or curly-brackets. The final one will match almost any valid-looking URL anywhere within text, but will exclude any trailing dot or comma.
The benefit of this pattern is that it will include most URLs or attempted URLs. The drawback is, it will also include obviously invalid URLs.
These are included: <http://www.something.com>, https://something.com?index=1&page=2, ftp://ftp.com/, httttp://blah.com, http://this.and.that/?s=%40, <http://www.something.com/?m=,,,>, [ftp://3.4.], {url://www.1223.com,}, ssh://www.one%4t.com, http://a.
This is not: htp3://www.something.com.
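Since the follow-up asks how to turn this pattern into a Xojo function, here is one possible sketch. It is an assumption, not Kem’s code: the free-spacing comments are dropped so the pattern fits in ordinary Xojo string literals, and FindURLs is a hypothetical name. It collects every match by calling Search again with no argument, which continues after the previous match:

```xojo
Function FindURLs(text As String) As String()
  ' Hypothetical helper: returns every URL-like substring found by the
  ' "Identify URL" pattern, rewritten without free-spacing or comments.
  Dim re As New RegEx
  re.SearchPattern = "(?i-U)" + _
  "(?(DEFINE)(?<prefix>[A-Z]{3,}://))" + _
  "(?(DEFINE)(?<valid>[A-Z0-9\-_~:/?\#[\]@!$&'()*+;=.,%]))" + _
  "\b(?:(?<=\<)(?&prefix)(?&valid)+(?=\>)" + _
  "|(?<=\[)(?&prefix)(?&valid)+(?=\])" + _
  "|(?<=\{)(?&prefix)(?&valid)+(?=\})" + _
  "|(?&prefix)(?&valid)+(?<![\.,]))"

  Dim results() As String
  Dim match As RegExMatch = re.Search(text)
  While match <> Nil
    results.Append(match.SubExpressionString(0)) ' the whole match
    match = re.Search() ' no argument: continue after the previous match
  Wend
  Return results
End Function
```

Usage would be along the lines of `Dim urls() As String = FindURLs(someText)`; each element can then be replaced in the original text.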
Thanks Kem… I had seen that, but I have no idea how to make a Xojo function out of it…
I found this elsewhere:
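[code]Function IsValidURL(url As String) As Boolean
  ' This pattern is not anchored with ^...$, so Search succeeds when a URL appears
  ' ANYWHERE in url: it tests "contains a URL", not "is a URL".
  ' The hyphen in [\+~%\/.\w_-] is placed last so it is a literal, not a range.
  Dim r As New RegEx
  r.SearchPattern = "((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w_-]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)"
  Return (r.Search(url) <> Nil)
End Function[/code]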
but it is “wrong” for my needs, as it reports whether a string “contains” a URL, not whether it IS a URL.
I was hoping for something that was the equivalent of InStr, since I have to replace every standalone URL with the string I described above.
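The difference between “contains” and “is” comes down to anchoring. A minimal sketch of both ideas, with hypothetical helper names: InStrRegEx returns the position of the first match (0 when there is none), the way InStr does for plain text, and MatchesWholeString wraps any pattern in \A(?:pattern)\z so the entire string must match. The position is computed from a byte offset, so it equals InStr’s character position only for ASCII text:

```xojo
Function InStrRegEx(text As String, pattern As String) As Integer
  ' Hypothetical InStr-style helper: 1-based position of the first match of
  ' pattern in text, or 0 when nothing matches (InStr's "not found" value).
  Dim re As New RegEx
  re.SearchPattern = pattern
  Dim match As RegExMatch = re.Search(text)
  If match = Nil Then Return 0
  ' SubExpressionStartB(0) is the 0-based BYTE offset of the whole match;
  ' adding 1 gives InStr's character position only for plain ASCII text.
  Return match.SubExpressionStartB(0) + 1
End Function

Function MatchesWholeString(text As String, pattern As String) As Boolean
  ' Hypothetical helper: \A and \z anchor to the very start and end of text,
  ' so this answers "IS a URL" rather than "CONTAINS a URL".
  Dim re As New RegEx
  re.SearchPattern = "\A(?:" + pattern + ")\z"
  Return (re.Search(text) <> Nil)
End Function
```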
RegEx is one of those things that I want to learn, but I have found it too difficult.
What may help is to first define your criteria exactly:
Will all URLs end in .com, or can you have .net, .com.mx, .mx? (Yes, in México we have domain.mx and domain.com.mx.)
If you have something like ‘domain.com.Hello’, will you try to take the URL part? Or only when it is the last thing in the string, or when there is a space after the URL, like ‘Something domain.com’ or ‘Something domain.com something else’?
And so on, for whatever else you need to define.
It is not an easy task. Take, for example, the automatic URL parser in this forum: it can link something like xojo.net.socket
Note: I used italics for some of the ‘.’ characters to avoid the auto-link.
This seems to work… no freaking clue how or why (found elsewhere on this forum)
[code]Function IsValidURL(url As String) As Boolean
  ' Anchored with ^...$, so the WHOLE string must be a URL. It requires an
  ' http, https or ftp scheme and rejects private/loopback IPv4 addresses
  ' (10.x, 127.x, 169.254.x, 192.168.x, 172.16-31.x).
  Dim r As New RegEx
  r.SearchPattern = "^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z0-9]+-?)*[a-z0-9]+)(?:\.(?:[a-z0-9]+-?)*[a-z0-9]+)*(?:\.(?:[a-z]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$"
  Return (r.Search(url) <> Nil)
End Function[/code]
However, I have to split the text into separate strings to test each one… which is fine.
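A minimal sketch of that split-and-test approach, assuming words are separated by single spaces. LinkifyWords is a hypothetical name, and wrapping each URL in an HTML <a> tag is only an assumed output format; substitute whatever link markup the app uses. To keep it self-contained, the URL pattern is passed in and anchored inside the function, so the pattern from the second IsValidURL can be passed as-is (its own ^...$ anchors are harmless here):

```xojo
Function LinkifyWords(s As String, urlPattern As String) As String
  ' Hypothetical sketch: split on single spaces, test each word against the
  ' anchored pattern, and wrap words that are entirely a URL in a link.
  ' A word with trailing punctuation (e.g. "http://x.com.") may fail the test.
  Dim re As New RegEx
  re.SearchPattern = "\A(?:" + urlPattern + ")\z" ' whole word must match
  Dim words() As String = Split(s, " ")
  For i As Integer = 0 To words.Ubound
    If re.Search(words(i)) <> Nil Then
      ' "" inside a Xojo string literal produces a single " character
      words(i) = "<a href=""" + words(i) + """>" + words(i) + "</a>"
    End If
  Next
  Return Join(words, " ")
End Function
```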
And Alberto… why is the link in your example “wrong”? It might not be a real URL, but technically it is valid.
[quote=400012:@Alberto De Poo]Just FYI, on https://regex101.com/ there is a pattern error in the one you posted. Also, doing some tests there (after fixing the pattern), it doesn’t match ‘google.com’, but it does match ‘http://google.com’.[/quote]
Thanks, and you are correct… however, in this app, it’s up to the user to take some responsibility…
So if they type in google.com and the app does not make a link out of it, then they just need to make it “more valid”
[quote=400008:@Dave S]
And Alberto… why is the link in your example “wrong”? It might not be a real URL, but technically it is valid.[/quote]
Because the forum sees .it and creates a link, but it doesn’t do the same with other characters:
have.it.wrong
have.pp.wrong
Either the one with .pp should also be linked, or the link-creation process should check whether it is the last word and only then link it. If the first one is a valid (but wrong) URL, then the second should be valid too.
Anyway, that doesn’t matter; it was just an observation on how the forum does things.
[quote=400014:@Dave S]Thanks, and you are correct… however, in this app, it’s up to the user to take some responsibility…
So if they type in google.com and the app does not make a link out of it, then they just need to make it “more valid” :)[/quote]
But you said:
[quote=399889:@Dave S]I need a function that does the following:
s="my favorite website is www.rdS.com"
s=FixTheURL(s)
would become
my favorite website is "
EDIT: actually… it would need to become this:
my favorite website is http://www.rdS.com"[/quote]
But it’s OK; it’s your program and you know what you want. I’m glad that you found a solution.