HTMLviewer.text?

Sergio_Tamborini · July 25, 2013, 3:54pm

Hello all!
I need to get the text inside an HTMLViewer as rendered (not the source code). Text formatting (bold, italics…) doesn’t matter; I don’t need it. Exactly like reading the .text property of a TextArea…
Any help?

Many thanks!

Nick_P · July 25, 2013, 3:59pm

Use



Dim f As FolderItem // never assigned, so f stays Nil
HTMLViewer1.LoadPage(TextArea1.Text, f)

works fine for me

Sergio_Tamborini · July 25, 2013, 4:06pm

Yes, but that’s not what I need.
I have an HTMLViewer with a webpage loaded in it, and I need to get (as a string) the text of that webpage: not the HTML source, but the text as displayed on the page. It would be like loading a webpage in Safari, selecting all, copying, and pasting into a simple text editor.
I hope it’s clearer now…

Christian_Schmitz · July 25, 2013, 4:19pm

In our MBS Plugins we have methods on HTMLViewer for this. The IETextMBS method is for Windows and HTMLTextMBS for Mac.
You can try that for free if you like.

Sergio_Tamborini · July 25, 2013, 5:27pm

I have a license for your very nice MBS Plugins… but those methods return the HTML source and not the rendered text. My problem is that I need to read some information in a webpage that is generated by encrypted JavaScript… the webpage displays the text I need, but it’s not readable in the source code…

Christian_Schmitz · July 25, 2013, 5:33pm

Oops. HTMLTextMBS on Mac is of course the wrong one.
Looks like we have no direct function, so please use this:

TextArea1.text = HTMLViewer1.EvaluateJavaScriptMBS("document.body.innerText")

and on Windows use IETextMBS.

Sergio_Tamborini · July 25, 2013, 5:44pm

It’s better now, but not yet sufficient.
I have this url (for example)
http://www2.ucsg.edu.ec/filosofia/index.php?option=com_wrapper&view=wrapper&Itemid=6
that displays some email addresses. Those email addresses do not appear in the source code, nor in the result of your function.

Is there any way to select all and copy the text in the HTMLViewer? I think that might be a good approach… but I don’t know how…

Many thanks for your precious help.

Christian_Schmitz · July 25, 2013, 6:59pm

No wonder, it’s a frame embedded in the page. You need to load the frame alone.
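
One way to find the address of the embedded frame is to ask the page for the `src` of each frame element and then load that URL in the HTMLViewer on its own. Below is a minimal JavaScript sketch, assuming the frame is an ordinary `iframe` or `frame` element. The name `frameSources` is made up for illustration; on a real page the argument would be `document`.

```javascript
// Hypothetical helper: returns the non-empty src URLs of all frames in a document.
function frameSources(doc) {
  // Matches both <iframe> and old-style <frame> elements.
  const frames = doc.querySelectorAll("iframe, frame");
  // Keep only frames that actually point at a URL.
  return Array.from(frames, (frame) => frame.src).filter((src) => src);
}
```

On the page itself this would be called as `frameSources(document)`. Each returned URL can then be loaded separately, and `document.body.innerText` read from that page instead.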

Richard_Duke · September 4, 2013, 11:43am

I need to do something similar with an HTMLViewer on Windows: do a Select All and Copy. How do I accomplish that, Christian?

Richard_Duke · September 4, 2013, 11:50am

The content of the HTMLViewer consists of both images and text.

Richard_Duke · September 4, 2013, 1:08pm

Never mind… I think I got it…

  #If TargetWin32 Then
    me.IEEditableMBS=True
  #elseIf TargetMacOS Then
    me.EditableMBS=True
  #EndIf

Greg_Gemignani · March 26, 2014, 1:24am

Maybe I am raising the dead, but I have the same issue with a different twist. I am trying to gather status information from a site, and the information is in a CSS page that is not easy to read (though not encrypted). HTTPSocket is useless here, and the HTMLViewer add-ons from MBS don’t seem to help either. The idea is to read the status from the site periodically, maybe once a month or so, and report any change. An example of such a page is here: http://tsdr.uspto.gov/#caseNumber=85723861&caseType=SERIAL_NO&searchType=statusSearch
No frames, no encryption. The status is visible to the eye and to the HTMLViewer, but I can’t seem to reach it programmatically.

shao_sean · March 26, 2014, 5:37am

I would recommend using the HTMLViewer and then accessing the DOM via JavaScript to get the specific node you are looking for… If you take a screen cap and highlight the data you want, I can whip something up for you…

Greg_Gemignani · March 26, 2014, 4:59pm

Shao Sean, I greatly appreciate the help. Using the URL referenced before (I am just using XOJO as an example), below is a screenshot with the information I would like to capture circled:

Original URL: http://tsdr.uspto.gov/#caseNumber=85723861&caseType=SERIAL_NO&searchType=statusSearch

Again, thank you for your assistance.

shao_sean · March 26, 2014, 8:13pm

http://shaosean.tk/xojo/uspto.zip

Hopefully it is documented enough to understand, otherwise just ask…

Greg_Gemignani · March 26, 2014, 11:13pm

That is absolutely brilliant. While my understanding of JS is a bit weak, I get it in concept. You have given me something that works and something to study and learn in detail. Thank you!

shao_sean · March 27, 2014, 4:14am

Glad it suits your needs… Any questions, feel free to ask…

Mike_D · March 28, 2014, 1:26am

Basically you’d want to look at the HTML and find the name or ID of the HTML object that contains that text. Then use JavaScript to get it, using a technique like this: http://stackoverflow.com/questions/8647216/get-content-of-a-div-using-javascript
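
As a rough illustration of that technique, here is a small JavaScript sketch that reads the rendered text of one element by its ID. The helper name `textOfElement` and the ID `"status"` in the usage note are placeholders, not taken from the USPTO page; the real ID has to be looked up in the page’s HTML first.

```javascript
// Hypothetical helper: returns the text of the element with the given id,
// or an empty string if no such element exists.
function textOfElement(doc, id) {
  const node = doc.getElementById(id);
  if (!node) {
    return "";
  }
  // Prefer innerText (text as rendered); fall back to textContent.
  const text = node.innerText !== undefined ? node.innerText : node.textContent;
  return (text || "").trim();
}
```

On a live page this would be called as `textOfElement(document, "status")`. If the content is filled in by script after loading, the call has to happen once the page has finished rendering.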

Note that doing this via code may be considered “web scraping” and may violate the website’s TOS. Given that it’s the US Govt, it may be better to ask permission rather than forgiveness in this case.

shao_sean · March 28, 2014, 4:10am

I could not find a ToS for the site, but there is this information regarding copyright of the data:

[quote]Copyrights are administered by the Copyright Office, a division of the Library of Congress. Copyright law (17 U.S.C. § 105) states that all materials created by the United States government are in the public domain. However, there are restrictions on use.

Anyone incorporating a work of the U.S. Government into a copyrighted work should be aware of 17 U.S.C. § 403. This section requires a copyright notice to contain a statement identifying what portions of the work consist of a work of the U.S. Government. Failure to do so could result in loss of copyright protection for the entire work.[/quote]

husnul_yaqin · August 22, 2014, 12:43pm

Hi Shao Sean, I tried your file, but the result is ‘done’, not the HTML source code. Are there any settings I need to change? What must I do?