The method uses the RegEx pattern from here. Every time I try to use one of the RegEx patterns from StackOverflow (like the one you posted), I get a RegExSearchPatternException saying:
Here is what I use… without REGEX
FUNCTION isValidURL(URL as string) as boolean
  Dim err_flag As Boolean
  Dim i As Integer
  Dim s As String
  err_flag=False
  ' Strip the scheme (string comparison in Xojo is case-insensitive)
  If Left(url,7)="HTTP://" Then
    url=Mid(url,8)
  Elseif Left(url,8)="HTTPS://" Then
    url=Mid(url,9)
  End If
  ' Treat path separators like dots so only the character set is checked
  url=ReplaceAll(url,"\",".")
  url=ReplaceAll(url,"/",".")
  err_flag=(url.Len=0)
  If Not err_flag Then
    For i=1 To url.Len
      Select Case Mid(url,i,1)
      Case "A" To "Z","a" To "z","0" To "9","_","-",".","~"
        ' allowed character, nothing to do
      Case "%" ' hex char
        s="&H"+Mid(url,i+1,2)
        If (Val(s)=0 And s<>"&H00") Then err_flag=True
      Case Else
        err_flag=True
      End Select
      If err_flag=True Then Exit For
    Next i
  End If
  Return Not err_flag
END FUNCTION
It returns TRUE if the URL is VALID (not if the URL exists)… just that it meets the pattern requirement
Averaged across 100,000 iterations. So they are equal, well within the margin of error of benchmarking. The regex version, though, will test all sorts of URLs, such as ftp://, mailto:, etc… It will also handle URLs with ports, users and passwords, for example: https://john:doe@google.com:394/jack … So I believe it to be a bit more robust.
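For anyone who wants to reproduce this kind of timing, here is a rough sketch of a benchmark loop. It assumes an IsValidURL function with the signature used in this thread; the sample URL and the use of Microseconds for timing are assumptions for illustration, not necessarily how the figures in this thread were measured.

```xojo
' Rough timing sketch; assumes IsValidURL(url As String) As Boolean exists.
Dim iterations As Integer = 100000
Dim sample As String = "https://john:doe@google.com:394/jack"

Dim startTime As Double = Microseconds
For i As Integer = 1 To iterations
  Call IsValidURL(sample) ' result is discarded, only the time matters
Next
Dim elapsed As Double = Microseconds - startTime

' Microseconds -> milliseconds, averaged over all iterations.
Dim msPerCall As Double = (elapsed / 1000) / iterations
System.DebugLog("Average per call: " + Format(msPerCall, "0.0000") + " ms")
```

Running it a few times and comparing the spread gives a feel for how much of any difference is just noise.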
You know, if your app does a lot of IsURL checking, the regexp version can run in half the time. Simply make the RegEx itself a property of your application, window, class or module. That saves creating the object and assigning the pattern on every call. Then change the function to read:
Function IsValidURL(url As String) As Boolean
  Return (IsValidURL_RegEx.Search(url) <> Nil)
End Function
This reduces the time per iteration to 0.0096ms vs. 0.0198ms.
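The function above relies on IsValidURL_RegEx already being set up. A minimal sketch of that one-time setup might look like the following, assuming IsValidURL_RegEx is a property of type RegEx on a module. The Sub name is hypothetical, and the pattern is a deliberately simple placeholder, not the expression discussed in this thread.

```xojo
' Assumes a module property:  IsValidURL_RegEx As RegEx
' Call this once, e.g. from App.Open, before the first IsValidURL call.
Sub InitIsValidURLRegEx()
  If IsValidURL_RegEx Is Nil Then
    IsValidURL_RegEx = New RegEx
    ' Placeholder pattern for illustration only:
    ' a scheme, "://", then one or more non-space characters.
    IsValidURL_RegEx.SearchPattern = "^[a-z][a-z0-9+.\-]*://\S+$"
    IsValidURL_RegEx.Options.CaseSensitive = False
  End If
End Sub
```

The Nil check makes it harmless to call the Sub more than once.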
Again, thanks for this, Dave, but there are flaws with this method too. It incorrectly thinks the following are valid URLs:
Sometimes finding the right regex is the hard part :-/
I did find what appears to be the perfect one, but I receive the same error you do: character value in \x{…} sequence is too large… Not sure where that is coming from.
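One way to narrow down which part of a pattern Xojo rejects is to trap the exception and log its message while trimming the pattern piece by piece. The helper below is only a sketch: the function name and the test string are made up for illustration, and it wraps both the assignment and the search because it is an assumption on my part that the pattern is only compiled at search time.

```xojo
' Hypothetical helper: returns True if Xojo's RegEx accepts the pattern.
Function PatternCompiles(pattern As String) As Boolean
  Dim re As New RegEx
  Try
    re.SearchPattern = pattern
    ' The test string is arbitrary; we only care whether an exception is raised.
    Call re.Search("http://example.com")
    Return True
  Catch e As RegExSearchPatternException
    System.DebugLog("Pattern rejected: " + e.Message)
    Return False
  End Try
End Function
```

Feeding it the full expression first, then versions with individual character ranges removed, should show which range triggers the error.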
Does anyone have the MBS plugin installed? I think Christian makes a RegEx class. If that handles the search pattern correctly it would confirm a Xojo bug I guess?
I’m not certain; someone who knows the internals of Xojo’s regexp implementation should chime in on Unicode characters in the actual expression. Stripping the Unicode ranges from diegoperini’s expression makes it work in Xojo:
However, it does not do TLD validation. For example, www.googlecom is a semantically valid URL, the same as john.com, i.e. host name and TLD. Now, is com valid vs. googlecom? Or com vs. ca, or ca vs. biz, etc… What about us.edu vs. .kao.edu, etc…
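If TLD validation matters, it has to be a separate step on top of the pattern match. A minimal sketch, assuming you already have the host name on its own and are willing to maintain a list of accepted TLDs yourself (the function name and the short list here are purely illustrative):

```xojo
' Hypothetical helper: checks only the last label of a host name
' against a short, hand-maintained list of TLDs.
Function HasKnownTLD(host As String) As Boolean
  Dim labels() As String = Split(host, ".")
  ' Need at least "name.tld"
  If labels.Ubound < 1 Then Return False

  Dim tld As String = Lowercase(labels(labels.Ubound))
  Dim known() As String = Array("com", "org", "net", "edu", "ca", "biz")
  Return known.IndexOf(tld) >= 0
End Function
```

With this list, www.googlecom would be rejected because "googlecom" is not a known TLD, while john.com would pass. It says nothing about whether the host actually exists.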
Oh, you could give the regexes a go in another language, like Ruby, Python or Perl. That would be a good test also.
The RegEx class uses a rather old version of PCRE. There’s probably a Feedback case about upgrading PCRE that you can sign on to, but I don’t know it offhand.
If you allow the omission of the ‘http://’ or ‘https://’ in a URL, I can’t see why this one is not valid. I can obviously see that there is a mistake, but this one can be perfectly valid. What you’re expecting from your code is more than telling whether a URL is valid: you’re asking it to determine whether the URL looks valid… You won’t get that with a simple regex pattern. You may need a bit of A.I. to do this…