Hello all!
I need to get the text inside an HTMLViewer as rendered (not the source code). Text formatting (bold, italics…) doesn't matter; I don't need it. I want exactly what I would get from reading the .Text property of a TextArea…
Any help?

Many thanks!


Dim f As FolderItem
HTMLViewer1.LoadPage(TextArea1.Text, f)

works fine for me

Yes, but that's not what I need.
I have an HTMLViewer with a webpage loaded in it, and I need to get (as a string) the text of that webpage (not the HTML source, but the text as displayed on the page). It's like loading a webpage in Safari, selecting all, copying, and pasting into a plain text editor.
I hope it's clearer now…

In our MBS Plugins we have methods on HTMLViewer for this. The IETextMBS method is for Windows and HTMLTextMBS for Mac.
You can try that for free if you like.

I have a license for your very nice MBS Plugins… but those methods return the HTML source and not the rendered text. My problem is that I need to read some information in a webpage that is generated by encrypted JavaScript… the web page displays the text I need, but it's not readable in the source code…

Oops. HTMLTextMBS on Mac is of course the wrong one.
Looks like we have no direct function, so please use this:

TextArea1.text = HTMLViewer1.EvaluateJavaScriptMBS("document.body.innerText")

and on Windows use IETextMBS.

It's better now, but still not sufficient.
I have this URL (for example)
that displays some email addresses. Those email addresses do not appear in the source code, and are not returned by your function.

Is it possible in any way to select all and copy the text in the HTMLViewer? I think that might be a good approach… but I don't know how…

Many thanks for your precious help.

No wonder, it's a frame embedded in the page. You need to load the frame alone.
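
To find which URL to load, one option is to ask the page for the addresses of its frames. This is only a minimal sketch: `collectFrameUrls` is a hypothetical helper name, not part of the MBS Plugins, and it assumes the frames are declared in the HTML with a `src` attribute (frames created later by script may not be present when it runs).

```javascript
// Sketch: list the URLs of all frames in a page so each can be loaded on its own.
// `doc` is the page's `document` object; collectFrameUrls is a hypothetical name.
function collectFrameUrls(doc) {
  // querySelectorAll returns a NodeList; Array.from turns it into a real array.
  const frames = Array.from(doc.querySelectorAll("iframe, frame"));
  return frames
    .map(function (frame) { return frame.src; })
    // Drop frames that have no usable src (empty or missing).
    .filter(function (url) { return typeof url === "string" && url.length > 0; });
}
```

Inside the viewer, evaluating something like `collectFrameUrls(document).join("\n")` should give one URL per line; each of those could then be loaded by itself and read with `document.body.innerText`.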

I need to do something similar with HTMLViewer on Windows: do a Select All and Copy. How do I accomplish that, Christian?

The content of the HTMLViewer consists of both images and text.

Never mind… I think I got it…

  #If TargetWin32 Then
  #elseIf TargetMacOS Then

Maybe I am raising the dead, but I have the same issue with a different twist. I am trying to gather status information from a site, and the information is in a CSS page that is not easy to read (not encrypted). The HTTPSocket is useless and the HTMLViewer add-ons from MBS don't seem to help either. The idea is to periodically, maybe once a month or so, read the status from the site and report any change. An example of such a page is here:
No frames, no encryption. The status is visible to the eye and to the HTMLViewer, but I can't seem to reach it programmatically.

I would recommend using the HTMLViewer and then accessing the DOM via JavaScript to get the specific node you are looking for… If you take a screen cap and highlight the data you want, I can whip something up for you…

Shao Sean, I greatly appreciate the help. Using the URL referenced before (I am just using Xojo as an example), here is a screenshot with the information I would like to capture circled:

Original URL:

Again, thank you for your assistance.

Hopefully it is documented enough to understand, otherwise just ask…

That is absolutely brilliant. While my understanding of JS is a bit weak, I get it in concept. You have given me something that works and something to study and learn in detail. Thank you!

Glad it suits your needs… Any questions, feel free to ask…

Basically you’d want to look at the HTML and find the name or ID of the HTML object that contains that text. Then use JavaScript to get it, using a technique like this:
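
A minimal sketch of that idea, assuming the element has an ID (the ID `"status"` in the usage note is purely hypothetical, as is the helper name `textOfElement`; the real ID has to be read from the page's HTML):

```javascript
// Sketch: return the visible text of one element, looked up by its ID.
// `doc` is the page's `document`; `id` is whatever ID the page actually uses.
function textOfElement(doc, id) {
  const node = doc.getElementById(id);
  if (!node) {
    return ""; // No element with that ID on this page.
  }
  // innerText reflects the rendered text; fall back to textContent if it is missing.
  const text = node.innerText !== undefined ? node.innerText : node.textContent;
  return String(text || "").trim();
}
```

In the HTMLViewer this would be evaluated as something like `textOfElement(document, "status")`, and the returned string handed back to the Xojo side. Returning an empty string for a missing element keeps the caller from having to deal with a JavaScript error when the page layout changes.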

Note that doing this via code may be considered “web scraping” and may violate the website’s TOS. Given that it’s the US Govt, it may be better to ask permission rather than forgiveness in this case :)

I could not find anything regarding a ToS for the site, but there is this information regarding copyright of the data:

[quote]Copyrights are administered by the Copyright Office, a division of the Library of Congress. Copyright law (17 U.S.C. § 105) states that all materials created by the United States government are in the public domain. However, there are restrictions on use.

Anyone incorporating a work of the U.S. Government into a copyrighted work should be aware of 17 U.S.C. § 403. This section requires a copyright notice to contain a statement identifying what portions of the work consist of a work of the U.S. Government. Failure to do so could result in loss of copyright protection for the entire work.[/quote]

Hi Shao Sean, I tried your file, but the result is ‘done’, not the HTML source code. Are there any settings I need to change? What must I do?