It’s not really moving the problem to a different place; it is adding functionality to RegEx that you will use. Make it generic and then reuse it everywhere. For example, create a module named “RegExHelpers” and then add this method to it:
Function FindAll(Extends r As RegEx, TargetString As String) As String()
Dim matches() As String
Dim match As RegExMatch = r.Search(TargetString)
Do
If match <> Nil Then
matches.Append match.SubExpressionString(0)
End If
match = r.Search
Loop Until match Is Nil
Return matches
End Function
Now, in your app, simply do:
Dim r As New RegEx
r.SearchPattern = "(?U:\(.*\))"
Dim matches() As String = r.FindAll(theText)
You can of course make as many of these helpers as you’d like, populate the array w/match objects instead of strings, etc…
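As a rough sketch of the match-object variant mentioned above: the name FindAllMatches is made up for this example, and it assumes the method lives in the same RegExHelpers module. The pattern is written with single backslashes because Xojo string literals do not treat the backslash as an escape character.

```xojo
// Hypothetical helper: returns the RegExMatch objects themselves
// so the caller can read any subexpression afterwards.
Function FindAllMatches(Extends r As RegEx, TargetString As String) As RegExMatch()
  Dim matches() As RegExMatch
  Dim match As RegExMatch = r.Search(TargetString)
  While match <> Nil
    matches.Append match
    match = r.Search // continue from the end of the previous match
  Wend
  Return matches
End Function

// Usage, assuming theText holds the text to scan:
Dim r As New RegEx
r.SearchPattern = "\(([^)]+)\)"
For Each m As RegExMatch In r.FindAllMatches(theText)
  // SubExpressionString(0) is the whole match,
  // SubExpressionString(1) is the text inside the parentheses
  System.DebugLog m.SubExpressionString(1)
Next
```

Keeping the match objects costs a little more memory than keeping strings, but it lets one helper serve patterns with any number of capture groups.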
Can’t edit… Forgot to mention that all “Extends” methods in a module have to be marked as Global. It’s not really a global method, as it is callable only as an extension of an already instantiated RegEx object.
Well, it’s really a “Globally” callable method from anywhere you place and handle a RegEx.
A little OCD optimization:
Function FindAll(Extends r As RegEx, TargetString As String) As String()
Dim matches() As String
Dim match As RegExMatch = r.Search(TargetString)
While match <> Nil
matches.Append match.SubExpressionString(0)
match = r.Search
Wend
Return matches
End Function
[quote=116831:@Markus Winter]
but is there a way to have RegEx return such a list directly?
[/quote]
What everyone’s saying is: no, not directly from the RegEx engine. You need a loop.
Well, except for the subtle “in one step directly using RegEx” part, it did exactly what you asked.
But I suspect you meant to ask something different.
You asked for a list of used values using RegEx.
But because you mentioned using Dictionaries instead of Arrays, I suspect you want a list of used values using RegEx with no repetition.
Oh, now I understand what you want! I thought you meant with just one method call, not each text just once. My bad.
Again, I would make things generic; you will likely find other uses for a method such as this. Doing it in a Dictionary is likely to be costly, because a key lookup takes place each time you add a value. I do not have large amounts of text to benchmark against, but here is a little method that you can change and rename to suit your needs, and again use on any array…
Function CountUnique(list() As String) As Pair()
Dim results() As Pair
// Catch a special condition where only 1 item exists. This is in place
// because it will run only once per call, other methods of looping through
// data would require an If to be inserted into the main for loop executed
// n times.
If list.Ubound = 0 Then
results.Append list(0) : 1
End If
// This will return on an empty list or on a list of 1
If list.Ubound < 1 Then
Return results
End If
list.Sort
list.Append "" // End Of List Marker, simply triggers a change on 'current'
Dim current As String = list(0)
Dim count As Integer = 1
For i As Integer = 1 To list.Ubound
If list(i) <> current Then
results.Append current : count
count = 1
current = list(i)
Else
count = count + 1
End If
Next
list.Remove list.Ubound // Drop the End Of List Marker again; the array was passed by reference
Return results
End Function
Then in your main code, you can do:
Dim r As New RegEx
r.SearchPattern = "(?U:\(.*\))"
Dim list() As String = r.FindAll(text)
Dim counts() As Pair = CountUnique(list)
For Each p As Pair In counts
Print p.Left.StringValue + " occurred " + Str(p.Right.IntegerValue) + " times"
Next
Here is a FindUnique (mark it Global and place it in the RegExHelpers module) that uses a Dictionary, so you can benchmark the two approaches. In my tests (with your sample text duplicated quite a few times) I see no noticeable difference between the Dictionary and the Array, but that text only contains two variants. I do not know how many possible variants there are, so it is best to test with real data.
Function FindUnique(Extends r As RegEx, TargetString As String) As Pair()
Dim matches As New Dictionary
Dim match As RegExMatch = r.Search(TargetString)
While match <> Nil
Dim v As String = match.SubExpressionString(0)
matches.Value(v) = matches.Lookup(v, 0) + 1
match = r.Search
Wend
Dim results() As Pair
ReDim results(matches.Count - 1)
For i As Integer = 0 To results.Ubound
Dim key As String = matches.Key(i)
results(i) = key : matches.Value(key)
Next
Return results
End Function
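If you want to time the two approaches yourself, something along these lines might do as a starting point. It assumes the FindAll, CountUnique and FindUnique methods from this thread are in place; sampleText is a placeholder you would replace with real data, and the pattern is written with single backslashes because Xojo string literals have no backslash escapes.

```xojo
// Placeholder input: swap in a realistically large text before trusting the numbers
Dim sampleText As String = "(a) x (b) y (a) z"

Dim r As New RegEx
r.SearchPattern = "(?U:\(.*\))"

// Microseconds returns the time since system start, so differences give elapsed time
Dim t0 As Double = Microseconds
Dim viaArray() As Pair = CountUnique(r.FindAll(sampleText))
Dim t1 As Double = Microseconds
Dim viaDictionary() As Pair = r.FindUnique(sampleText)
Dim t2 As Double = Microseconds

System.DebugLog "Sorted array: " + Format((t1 - t0) / 1000, "0.00") + " ms, " + _
  Str(viaArray.Ubound + 1) + " unique values"
System.DebugLog "Dictionary: " + Format((t2 - t1) / 1000, "0.00") + " ms, " + _
  Str(viaDictionary.Ubound + 1) + " unique values"
```

Both timings include the RegEx search itself, so what you compare is the whole pipeline rather than the counting step alone. Run it a few times and on large input, since a single pass over a short text mostly measures noise.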
I’m a little late to this party, and admittedly didn’t read every post here. Having said that, based on the OP, this will do what Markus wants by locating only the last occurrence of each matching string.
dim rx as new RegEx
rx.SearchPattern = "(?msi-U)\(([^)]+)\)(?!.*\(\g1\))"
dim matches() as string
dim match as RegExMatch = rx.Search( sourceText )
while match <> nil
matches.Append match.SubExpressionString( 1 )
match = rx.Search()
wend
It matches the string and puts it into subgroup 1, then uses a negative lookahead to make sure the same string doesn’t occur later. The mode (?s) is the same as DotMatchesAll.
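To see the lookahead at work, here is a small worked example on a made-up sample string (the sample text is an assumption for illustration only). The pattern is written with single backslashes, since Xojo string literals do not treat the backslash as an escape character.

```xojo
// Hypothetical sample text: "a" appears twice, "b" and "c" once each
Dim sourceText As String = "(a) x (b) y (a) z (c)"

Dim rx As New RegEx
rx.SearchPattern = "(?msi-U)\(([^)]+)\)(?!.*\(\g1\))"

Dim matches() As String
Dim match As RegExMatch = rx.Search(sourceText)
While match <> Nil
  matches.Append match.SubExpressionString(1)
  match = rx.Search
Wend

// The first "(a)" is rejected because another "(a)" follows it,
// so each value is reported once, at its last occurrence.
System.DebugLog Join(matches, ", ")
```

One thing to keep in mind: for every candidate match the lookahead scans the rest of the text, so on very long inputs with many matches this may be slower than collecting everything and de-duplicating afterwards.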