HTML Parser

Has anyone here developed an HTML parser?
What I need to do is read an HTML page and extract the div elements with a given ID or class.
Does anyone know of a ready-made solution?
Thanks, all.

You could do this with a regular expression, something like:

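<div[^>]*\sid="put your id in here"[^>]*>(.*?)</div>

The first capture group, (.*?), then holds the contents of the div. Turn on dot-matches-newline (DotMatchAll in Xojo's RegEx options) so the contents can span several lines. Because the lazy (.*?) stops at the first </div>, a div that contains nested divs comes back truncated.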
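A minimal sketch of the regex approach using Xojo's built-in RegEx class, written in the classic API to match the NthField/MsgBox code in this thread. The function name ExtractDivByID is hypothetical. It assumes the ID contains no regex metacharacters and that the attribute is written with double quotes (id="...").

```xojo
// Hypothetical helper (not from this thread): returns the inner HTML of the
// first <div> whose id attribute equals divID, or "" if none is found.
// Assumes divID contains no regex metacharacters and the HTML uses double quotes.
Function ExtractDivByID(html As String, divID As String) As String
  Dim re As New RegEx
  // [^>]* skips other attributes, \s keeps "data-id" from matching,
  // and the lazy (.*?) captures up to the first </div>
  re.SearchPattern = "<div[^>]*\sid=""" + divID + """[^>]*>(.*?)</div>"
  re.Options.DotMatchAll = True  // let . also match line breaks inside the div

  Dim match As RegExMatch = re.Search(html)
  If match <> Nil Then
    Return match.SubExpressionString(1)
  End If
  Return ""
End Function
```

For example, in an HTMLViewer's StatusChanged event you could call MsgBox ExtractDivByID(newStatus, "xxxxx").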

Alternatively, if the page is loaded in an HTMLViewer, you can read its source text through JavaScript.

In the DocumentComplete event:

me.ExecuteJavaScript "window.status = document.getElementsByTagName('html')[0].innerHTML;"

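In the StatusChanged event (newStatus holds the HTML set above):

Dim s As String = newStatus
// Keep everything after the opening tag of the wanted div (replace xxxxx with your ID)
s = NthField(s, "<div id=""xxxxx"">", 2)
// Keep everything before the first closing tag
s = NthField(s, "</div>", 1)
MsgBox s

This only works if the opening tag is written exactly as <div id="xxxxx">, and it stops at the first </div>, so nested divs are cut short.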

Regex is the way to go; I scrape content from HTML that way all the time.

But remember not to overdo it: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

:)
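One place where both the regex and the NthField approaches go wrong is a div that contains nested divs: they stop at, or run past, the wrong </div>. A small sketch that instead counts opening and closing div tags to find the matching close. ExtractDivBalanced is a hypothetical name, and the sketch assumes the text "<div" or "</div>" never appears inside comments, scripts, or attribute values.

```xojo
// Hypothetical helper: given the exact opening tag (e.g. "<div id=""main"">"),
// returns the inner HTML up to the MATCHING </div>, skipping nested divs.
// Assumes no "<div" or "</div>" text inside comments, scripts, or attributes.
Function ExtractDivBalanced(html As String, openTag As String) As String
  Dim startPos As Integer = InStr(html, openTag)
  If startPos = 0 Then Return ""

  Dim contentStart As Integer = startPos + Len(openTag)
  Dim pos As Integer = contentStart
  Dim depth As Integer = 1  // we start inside the wanted div

  While depth > 0
    Dim nextOpen As Integer = InStr(pos, html, "<div")
    Dim nextClose As Integer = InStr(pos, html, "</div>")
    If nextClose = 0 Then Return ""  // unbalanced markup: give up

    If nextOpen > 0 And nextOpen < nextClose Then
      depth = depth + 1  // a nested div opens before the next close tag
      pos = nextOpen + 4
    Else
      depth = depth - 1  // this close tag ends the innermost open div
      If depth = 0 Then Return Mid(html, contentStart, nextClose - contentStart)
      pos = nextClose + 6
    End If
  Wend
  Return ""
End Function
```

The counter keeps track of how many divs are open, so only the </div> that brings it back to zero ends the extract.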

I’ve developed an HTML parser library in Xojo. It works like XMLDocument and can extract the contents of tags using XPath. But if you only need to analyze HTML pages that always have the same structure, it’s easier to use a regex.
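The library mentioned above isn't named, so its API can't be shown here. As a rough stand-in, Xojo's own XmlDocument supports XPath queries through XQL, but only when the page is well-formed XHTML; most real-world HTML will fail to parse. ExtractDivByXPath is a hypothetical name, and the sketch assumes the root element declares no default xmlns namespace (otherwise //div needs a namespace map).

```xojo
// Hypothetical helper: returns the first <div> with the given id (tags included)
// from a well-formed XHTML string, or "" if parsing fails or nothing matches.
// Assumes no default xmlns namespace on the root element.
Function ExtractDivByXPath(xhtml As String, divID As String) As String
  Dim doc As XmlDocument
  Try
    doc = New XmlDocument(xhtml)
  Catch err As XmlException
    Return ""  // not well-formed XML: typical real-world HTML ends up here
  End Try

  Dim nodes As XmlNodeList = doc.XQL("//div[@id='" + divID + "']")
  If nodes.Length = 0 Then Return ""
  Return nodes.Item(0).ToString
End Function
```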