Read HTMLviewer contents

Mark_Jordan · August 9, 2022, 4:14pm

I want to be able to grab a web page, copy its contents to the clipboard, and search for one item on that page (it’s from a music site that lists a song’s key). I can get the page up easily enough in an HTMLViewer, but I don’t know how to grab the text within.

Any help would be appreciated as it would allow me to get the key signatures of over 1000 songs.

Anthony_G_Cyphers · August 9, 2022, 4:19pm

Add this method to a module:

Public Function HTML(extends h as DesktopHTMLViewer) As String
  Return h.ExecuteJavaScriptSync( "document.documentElement.outerHTML;" )
End Function

Call as:

var result as String = HTMLViewer1.HTML

Mark_Jordan · August 9, 2022, 4:21pm

I’ll try it!

Anthony_G_Cyphers · August 9, 2022, 4:22pm

If you want the (more or less) plain-text content:

Public Function PlainText(extends h as DesktopHTMLViewer) As String
  Return h.ExecuteJavaScriptSync( "document.documentElement.textContent;" )
End Function
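
To search that text for a single item such as the song’s key, something along these lines might work. It is only a minimal sketch: TextAfterLabel is a hypothetical helper (not part of Xojo), and the "Key:" label is an assumption about how the page words it, so check the actual text first.

```xojo
// Hypothetical helper: returns the first line of text that follows the
// first occurrence of label in source, or "" if the label is not found.
Public Function TextAfterLabel(source As String, label As String) As String
  Var labelPos As Integer = source.IndexOf(label)
  If labelPos < 0 Then Return ""

  // Keep everything after the label and drop surrounding whitespace.
  Var rest As String = source.Middle(labelPos + label.Length).Trim

  // Normalize line endings to LF, then cut at the first line break.
  rest = rest.ReplaceLineEndings(Chr(10))
  Var cut As Integer = rest.IndexOf(Chr(10))
  If cut >= 0 Then rest = rest.Left(cut)

  Return rest.Trim
End Function
```

Call as:

```xojo
// "Key:" is an assumed label; adjust it to match the page.
Var pageText As String = HTMLViewer1.PlainText
Var songKey As String = TextAfterLabel(pageText, "Key:")
```

The page needs to have finished loading before the text is read, so the call probably belongs in the viewer’s DocumentComplete event rather than immediately after loading the URL.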

Anthony_G_Cyphers · August 9, 2022, 4:25pm

I do feel obligated to say that, instead of scraping from a web site, which may be a violation of that site’s licensing or otherwise questionable, you should see if there is an API available that you can use to get the information you need. Maybe something like this?

Mark_Jordan · August 9, 2022, 4:31pm

Worked! But I first had to update the HTMLViewer to the Desktop version.

Thanks. As for the comment below that scraping a site may be a violation of licensing, etc., I hadn’t thought of that. I highly doubt it’s a problem, since it’s a website of hymns I’m scouring, but I’ll look into that.

Again, thanks. This is the type of thing an amateur programmer like me would never have figured out on my own.

Anthony_G_Cyphers · August 9, 2022, 4:32pm

Happy to help!