regEx - help with pattern

I’ve done some work and some of it actually works!

[code] '

About 40 800 results (0,19 seconds) 

'

About96 900 results
[/code]

It’s the HTML tag “<div id="resultStats">” that I’m looking for. It may vary in design…

I’ve tried different alternatives for the rg.SearchPattern= but so far, it’s failing!
Any ideas of what may be wrong!?

[code]
Dim rg as New RegEx
Dim myMatch as RegExMatch
DIM match as Integer

rg.SearchPattern= (<div.id"resultatStats">).(<\/div>) <--------------------THIS ONE!!

myMatch = rg.search(TextArea1.text)

if myMatch <> Nil then
MsgBox myMatch.SubExpressionString(0)
else
MsgBox "Text not found!"
End if
exception err as RegExException

MsgBox err.message
[/code]

Avoid .* whenever possible. It’s too imprecise and may include far more than you intended. You also want to control greediness when you want to make sure the closing tag matches the opening tag.
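To illustrate the point about “.*”, here is a minimal sketch in Python (not Xojo, only so it can be run anywhere). The sample markup is made up for illustration. It compares a greedy “.*” with the negated character class “[^<>]*”, which cannot run past the end of the tag.

```python
import re

# Hypothetical markup, invented for this example.
text = '<div id="a">one</div><p>two</p>'

# Greedy ".*" runs to the LAST ">" in the text, far beyond the opening tag.
too_much = re.search(r'<div.*>', text).group(0)

# "[^<>]*" cannot cross a "<" or ">", so the match stops at the end of the tag.
just_the_tag = re.search(r'<div[^<>]*>', text).group(0)

print(too_much)      # <div id="a">one</div><p>two</p>
print(just_the_tag)  # <div id="a">
```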

Try this:

rg.SearchPattern = "(?Umsi)<div\b[^<>]*\bid\b *= *""resultStats""[^<>]*>(.*)</div>"

The opening switches “(?Umsi)” tell the engine that you don’t want to be greedy (U), to treat the text as multiple lines (m), that the dot is allowed to match a newline character (s), and that case doesn’t matter (i).

It then looks for the opening “<div” followed by any amount of text that is not “<” or “>”, followed by id="resultStats". (There can be spaces around the “=” and it won’t matter.) Next comes any amount of text that is not “<” or “>” until it reaches the closing “>”.

Here is where we use “.*” to grab all text until the closing “</div>”, and this will work because the engine is not greedy.

The actual results will be in SubExpressionString( 1 ).
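For anyone who wants to try the pattern outside Xojo, here is a rough Python equivalent. It is a sketch under assumptions: the sample HTML is hypothetical, and because Python’s re module has no ungreedy switch like “(?U)”, the sketch writes the lazy quantifier “.*?” instead. Here group(1) plays the role of SubExpressionString(1).

```python
import re

# Hypothetical sample; the real page markup may differ.
html = ('<div class="sd" id="resultStats">'
        'About 40 800 results (0,19 seconds)</div><div>other</div>')

# Same structure as the Xojo pattern, with ".*?" standing in for "(?U)".
pattern = re.compile(
    r'<div\b[^<>]*\bid\b *= *"resultStats"[^<>]*>(.*?)</div>',
    re.IGNORECASE | re.DOTALL | re.MULTILINE,
)

match = pattern.search(html)
result = match.group(1) if match else None
print(result)  # About 40 800 results (0,19 seconds)
```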

Thank you!
For the code and for the excellent description!!

http://manual.macromates.com/en/regular_expressions

I read about “greedy” here. It’s a new expression for me.
In my previous life I was married to a woman… she told me I was greedy…

But when it comes to code, the expression is new to me.
I never found any example of non-greedy code!

I will try the code tomorrow! Lovely!! Thank you!!

With regular expressions, “greedy” means “match as much as you can”, as opposed to “as little as you can”. Another way to look at it: when greedy, the quantifier first grabs everything up to the end of the string and then works backwards until the closing token(s) match, whereas ungreedy starts with nothing and works forwards.

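Let’s take the example pattern <.*> and apply it to a string that contains an opening tag, some text, and a closing tag. With greedy turned on, that pattern will match everything from the first “<” to the last “>”, but with it off, it will only match the opening tag.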

Another example is the pattern .* applied to “12345”. With greedy turned on, the result will be “12345”, but with it off, it will match nothing (an empty string). Without greedy, the engine returns the minimum it needs to fulfill the pattern, and since “*” means “zero or more”, the minimum is zero.
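Both examples can be checked with a few lines of Python (again not Xojo, and the tagged sample string is an assumption made up for the demo). Python has no switch for turning greediness off, so the ungreedy versions are written with a “?” after the “*”.

```python
import re

tagged = "<b>bold</b>"  # hypothetical sample string

# Greedy: from the first "<" to the last ">".
greedy_tag = re.search(r"<.*>", tagged).group(0)

# Lazy: stops at the first ">" it can reach.
lazy_tag = re.search(r"<.*?>", tagged).group(0)

# ".*" on digits: greedy takes everything, lazy is satisfied with zero characters.
greedy_digits = re.search(r".*", "12345").group(0)
lazy_digits = re.search(r".*?", "12345").group(0)

print(repr(greedy_tag))     # '<b>bold</b>'
print(repr(lazy_tag))       # '<b>'
print(repr(greedy_digits))  # '12345'
print(repr(lazy_digits))    # ''
```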

Another great source is:

http://regular-expressions.info

Beware of Cthulhu:

http://blog.codinghorror.com/parsing-html-the-cthulhu-way/