HTML Parser

Leonardo_Capoccia1 · October 23, 2015, 5:17pm

Has anyone here developed an HTML parser?
I need to read an HTML page and extract the divs that have a given ID or class.
Does anyone know of something ready-made?
Thanks, all.

Greg_O_Lone · October 23, 2015, 5:46pm

You could do this with a regular expression, something like:

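<div\b[^>]*\bid="put your id in here"[^>]*>(.*?)</div>

The first subexpression (the capture group) of the match holds the contents of the div. Turn on the DotMatchAll option if those contents span several lines, because "." does not match line breaks otherwise.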
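A minimal Xojo sketch of the regex approach, wrapped in a hypothetical ExtractDivByID function. It assumes the id contains only letters, digits, "-" and "_" (no regex metacharacters), and that the target div has no nested divs, because the lazy (.*?) stops at the first </div> it finds.

```xojo
' Hypothetical helper: returns the inner HTML of the first <div> whose id
' attribute equals divID, or "" if no such div is found.
' Assumes divID contains no regex metacharacters.
Function ExtractDivByID(html As String, divID As String) As String
  Dim re As New RegEx
  ' [^>]* keeps the wildcards inside the opening tag; (.*?) is lazy,
  ' so the match ends at the first </div> after the opening tag.
  ' In a Xojo string literal, "" is one double quote, so [""'] is ["'].
  re.SearchPattern = "<div\b[^>]*\bid=[""']" + divID + "[""'][^>]*>(.*?)</div>"
  re.Options.CaseSensitive = False
  re.Options.DotMatchAll = True ' let "." cross line breaks
  
  Dim match As RegExMatch = re.Search(html)
  If match Is Nil Then Return "" ' no div with that id
  Return match.SubExpressionString(1) ' the captured contents
End Function
```

For example, ExtractDivByID(pageSource, "main") on `<div class="box" id="main"><p>Hi</p></div>` gives `<p>Hi</p>`. A div nested inside the target would cut the result short, which is exactly the kind of case where regex stops being enough.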

Axel_Schneider · October 23, 2015, 5:51pm

You can read the source text of the HTML page from an HTMLViewer.

In the DocumentComplete event:

Me.ExecuteJavaScript "window.status = document.getElementsByTagName('html')[0].innerHTML;"

In the StatusChanged event:

Dim s As String = newStatus
s = NthField(s, "<div id=""xxxxx"">", 2) ' everything after the opening tag
s = NthField(s, "</div>", 1) ' everything up to the next closing tag
MsgBox s
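One catch with the NthField version: when there is no closing tag, NthField(s, "</div>", 1) returns the whole remaining string, so a missing </div> silently gives you the rest of the page. A sketch of the same split using InStr and Mid that returns an empty string instead; TextBetween is a hypothetical helper name, and note that Xojo's InStr is case-insensitive.

```xojo
' Hypothetical helper: returns the text between the first occurrence of
' startMarker and the next occurrence of endMarker, or "" if either is missing.
Function TextBetween(source As String, startMarker As String, endMarker As String) As String
  Dim startPos As Integer = InStr(source, startMarker) ' 1-based; 0 means not found
  If startPos = 0 Then Return ""
  startPos = startPos + Len(startMarker) ' first character after the marker
  
  Dim endPos As Integer = InStr(startPos, source, endMarker) ' search from startPos on
  If endPos = 0 Then Return "" ' no closing marker: return nothing, not the rest
  
  Return Mid(source, startPos, endPos - startPos)
End Function
```

In StatusChanged you would then write MsgBox TextBetween(newStatus, "<div id=""xxxxx"">", "</div>").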

Ashot_Khachatryan · October 23, 2015, 8:27pm

Regex is the way to go; I scrape content from HTML that way all the time.

But remember not to overdo it: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Asis_Patisahusiwa · October 25, 2015, 12:23am

I’ve developed an HTML parser library in Xojo. It works like XMLDocument and can extract the contents of tags using XPath. But if you only need to analyze pages that always have the same structure, a regex is easier.
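For pages that happen to be well-formed XML (XHTML or a fragment with a single root element), the built-in XmlDocument can already run XPath queries through XQL, which also covers the "class" half of the original question. This is a sketch under those assumptions: the markup declares no default xmlns namespace (an unprefixed //div does not match namespaced elements), and FindDivByClass is a hypothetical name. Ordinary tag-soup HTML will fail to parse, which is where a dedicated HTML parser earns its keep.

```xojo
' Hypothetical helper. Assumes well-formed markup with no default xmlns.
' Returns the first <div> whose class list contains className, serialized
' as XML, or "" if there is none or the markup does not parse.
Function FindDivByClass(markup As String, className As String) As String
  Dim doc As XmlDocument
  Try
    doc = New XmlDocument(markup)
  Catch e As XmlException
    Return "" ' not well-formed XML: use a regex or an HTML parser instead
  End Try
  
  ' XPath 1.0 idiom for "the class attribute contains this whole word"
  Dim query As String = "//div[contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')]"
  Dim nodes As XmlNodeList = doc.XQL(query)
  If nodes.Length = 0 Then Return ""
  Return nodes.Item(0).ToString ' the matching div, including its tags
End Function
```

The concat/normalize-space wrapper makes "wide" match class="box wide" without also matching class="widescreen".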