Test if a string is a valid URL

The PCRE version used in the native RegEx class does not support values of \x{NN} greater than FF, nor the Unicode tokens (not used here). The more modern implementation provided by the MBS plugins supports both.

It’s not a factor here, but the MBS implementation is also substantially faster for repeated matches and provides other features missing from the native class, which is why it will be the engine behind RegExRX starting with the next version.

Is RegExRX written in Xojo? Nice.

Perhaps I’ll look at using the MBS plugin until Xojo decide to utilise the newer PCRE version.

Not to detract from using a plugin, but are you testing URLs with high UTF characters? You can use:

^(?:(?:https?|ftp)://)(?:\\S+(?::\\S*)?@)?(?:(?!10(?:\\.\\d{1,3}){3})(?!127(?:\\.\\d{1,3}){3})(?!169\\.254(?:\\.\\d{1,3}){2})(?!192\\.168(?:\\.\\d{1,3}){2})(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\x{00a1}-\\x{ffff}0-9]+-?)*[a-z\\x{00a1}-\\x{ffff}0-9]+)(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}0-9]+-?)*[a-z\\x{00a1}-\\x{ffff}0-9]+)*(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}]{2,})))(?::\\d{2,5})?(?:/[^\\s]*)?$

Just change the UTF ranges to fit in the confines of Xojo’s lib.

That pattern won’t work with the native implementation. The maximum value for \x is FF.

Technically, written in Real Studio. :slight_smile: The next version (Cocoa) will be Xojo with RegExMBS as the engine.

Right, but I did say change the UTF ranges to fit. I did that, and the pattern seems to work just fine for all the URLs (and anti-URLs) I’ve tested it on.
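For anyone following along, the adjustment could be sketched roughly like this. `AdaptForNativeRegEx` is a hypothetical helper name, not something from this thread. It assumes the doubled backslashes in the pattern as posted are string-literal escaping carried over from another language (Xojo string literals don't need it), and that capping the upper bound at FF is all the native class requires.

```xojo
// Hypothetical helper (not from the thread): adapts the pattern posted above
// for the native RegEx class.
Function AdaptForNativeRegEx(pattern As String) As String
  // Assumption: "\\" in the posted pattern is an escaped "\" from another
  // language's string literal, so collapse it to a single backslash first.
  Dim result As String = ReplaceAll(pattern, "\\", "\")

  // Cap the upper bound of each Unicode range at FF, the largest \x value
  // the native class is said to accept.
  result = ReplaceAll(result, "\x{ffff}", "\x{ff}")

  Return result
End Function
```

The trade-off is that host names containing characters above U+00FF will no longer match, which is why I asked whether you are testing URLs with high UTF characters.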

Sorry, I thought you had posted what you thought was a “fixed” pattern. I see your meaning now.

I went through this thread and tested the suggestions. None of them validated URL syntax satisfactorily.
It’s on my list to learn regular expression syntax properly, but for now I just need a sample that works for at least 90% of cases.
I hope somebody can help me build a nearly perfect URL validation function.

I have:

[code] Function UrlCheck(Url As String) As Boolean
// Check URL syntax via a regular expression
Dim re As New RegEx
Dim rm As RegExMatch
Dim strUrl As String = Trim(Url)

// Match case-insensitively instead of lowercasing the input
re.Options.CaseSensitive = False
// \A and \z anchor the pattern so the whole string must be a URL,
// rather than merely contain one somewhere
re.SearchPattern = "\A(?:(?:https?|ftp)://|www\.)[-a-z0-9+&@#/%?=~|!:,.;]*[-a-z0-9+&@#/%=~|]\z"

rm = re.Search(strUrl)

Return (rm <> Nil)
End Function[/code]

URLs of every scheme conform to the generic URI form:

scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]

The problem is that the optional parts vary from scheme to scheme.
Here’s the IANA list of registered schemes:
http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
You can peruse the specifics of each from there.

HTTP allows one set: https://tools.ietf.org/html/rfc7230#section-2.7.1
FTP allows a very different one: https://tools.ietf.org/html/rfc1738#section-3.1

But both conform to the generic URI syntax.

I doubt you’ll get one regex that handles them all and is “readable”
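If a loose structural check is enough, one option is to split the string into the generic components first and then apply per-scheme rules to each piece. The sketch below uses the component-splitting expression from RFC 3986, Appendix B. `SplitUri` and `GroupOrEmpty` are hypothetical names of my own, and I haven't verified how the native class numbers unmatched trailing groups, hence the bounds check.

```xojo
// Hypothetical helper: returns capture group idx, or "" when that group
// is not available in the match.
Function GroupOrEmpty(rm As RegExMatch, idx As Integer) As String
  If rm <> Nil Then
    If idx < rm.SubExpressionCount Then
      Return rm.SubExpressionString(idx)
    End If
  End If
  Return ""
End Function

// Hypothetical helper: splits a URI into its generic components.
Function SplitUri(uri As String) As Dictionary
  Dim re As New RegEx
  // Expression from RFC 3986, Appendix B. Every group is optional, so this
  // splits a string into parts; it does not validate anything by itself.
  re.SearchPattern = "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"

  Dim rm As RegExMatch = re.Search(uri)

  Dim parts As New Dictionary
  parts.Value("scheme") = GroupOrEmpty(rm, 2)
  parts.Value("authority") = GroupOrEmpty(rm, 4) // [user:password@]host[:port]
  parts.Value("path") = GroupOrEmpty(rm, 5)
  parts.Value("query") = GroupOrEmpty(rm, 7)
  parts.Value("fragment") = GroupOrEmpty(rm, 9)
  Return parts
End Function
```

Per the RFC's group numbering, `http://example.com:8080/a/b?x=1#top` splits into scheme `http`, authority `example.com:8080`, path `/a/b`, query `x=1` and fragment `top`. Whether an empty authority or a missing path is acceptable is then a per-scheme decision, which keeps each individual check readable.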

This is too complex for me and I don’t understand it, but perhaps it helps you :sunglasses:

http://jmrware.com/articles/2009/uri_regexp/URI_regex.html