Getting meta tags from a web page

I’m using WKWebViewMBS with 2020R1.1 to load web pages. I want to collect all meta tags in the downloaded page to look for embedded metadata such as author, title, etc., which Google uses to catalog sites for Google Scholar. Although it seems straightforward, my attempts have failed. Here are a few things I’ve tried, among many variants (all yield an empty string):

dim metaTags as string = me.evaluateJavaScript("document.getElementsByTagName(""meta"").innerText;", error)

dim metaTags as string = me.evaluateJavaScript("document.getElementsByTagName('meta').innerText;", error)

dim metaTags as string = me.evaluateJavaScript("document.getElementsByTagName('meta').content;", error)

Any suggestions?

First, try the expressions in a browser using its Web Inspector.

Second, check the quotes. Use straight quotes like ' or ", not the typographic ones.

The quotes and such are straight. They were made “smart” when pasted into the forum reply box.

I don’t know what you mean by trying the Web Inspector. I’m using Safari, and I see Develop > Web Inspector, but I don’t see a way to run JS.

You can write JS in the Console tab inside the inspector.

Thanks for the info, I didn’t know that. However, it doesn’t address the problem.

The problem isn’t that I can’t write JS in Xojo; I can. It’s that the JS posted above does not work, even though information from many websites says it should.

I’m asking if someone knows the correct syntax to achieve what I want: a list of all meta tags.

Do you need to use JavaScript? You can read the HTML from the HTMLViewer and parse it with Tidy.

Try ".innerHTML" instead of ".innerText". Also, JS is case-sensitive.

No, I just thought JS was the proper way to do this (the alternative being to parse the HTML myself, which, given how flexible HTML is, I figured would be error-prone). JS should work.

How would I use Tidy inside a Xojo app? Note that this is for a commercial product, so there can’t be any required 3rd party installs.

@DerkJ I’m afraid that’s not working either. I wonder if that’s because the meta tags are in the <head> element.
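A likely explanation for the empty results, offered for reference: getElementsByTagName returns an HTMLCollection (a live, array-like list of elements), and a collection has no innerText, innerHTML or content property. Each of the expressions above therefore evaluates to undefined, which apparently comes back to Xojo as an empty string. Even on a single element, innerText and innerHTML would be empty, because <meta> is a void element with no content; the data lives in its attributes. Being inside <head> should not matter, since getElementsByTagName on document searches the whole page. Below is a minimal sketch, assuming you want the raw tags as one block of text. The helper name metaTagsAsText is hypothetical, and it takes the document as a parameter so it can also be exercised outside a browser.

```javascript
// Hypothetical helper: returns the markup of every <meta> tag, one per line.
function metaTagsAsText(doc) {
  // getElementsByTagName returns an HTMLCollection (array-like), so read
  // outerHTML from each element rather than from the collection itself.
  const tags = doc.getElementsByTagName('meta');
  return Array.from(tags, (tag) => tag.outerHTML).join('\n');
}

// The same logic as a single expression that could be passed to
// evaluateJavaScript; the value of the final expression is what comes back.
const metaScript =
  "Array.from(document.getElementsByTagName('meta'), t => t.outerHTML).join('\\n')";
```

As a Xojo string literal, metaScript's text can be used unchanged, because Xojo does not interpret \n, so the JavaScript engine still sees the '\n' escape.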

As always, Tidy is part of the MBS plugin: https://www.monkeybreadsoftware.net/pluginpart-tidyplugin.shtml

It’s probably coming back empty because getElementsByTagName returns an HTMLCollection (an array-like list of elements), not a single element, and you’re trying to return that to a text-only interface. As an example, in vanilla Xojo, I just wrote this for the built-in HTMLViewer, and it works as expected:

var returnJSON as String = HTMLViewer1.ExecuteJavaScriptSync("var meta = document.getElementsByTagName('meta');var returnData = {};for (let node of meta) {if(node && node.getAttribute('name')) {returnData[node.getAttribute('name')] = node.getAttribute('content');}}JSON.stringify(returnData);")
var jsonMeta as new JSONItem(returnJSON)
var viewport as String = jsonMeta.Lookup("viewport", "") // empty if the page has no viewport meta tag
break // pause in the debugger to inspect jsonMeta

The JavaScript in that execute statement, when formatted, looks like this:

var meta = document.getElementsByTagName('meta');
var returnData = {};
for (let node of meta) {
  if (node && node.getAttribute('name')) {
    returnData[node.getAttribute('name')] = node.getAttribute('content');
  }
}
JSON.stringify(returnData);

This will return a JSON-formatted string. You should be able to adapt this to your MBS component easily.
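One caveat for the Google Scholar use case: the loop above stores a single value per name, so when a page repeats a name, later tags overwrite earlier ones. Scholar-style pages usually carry one citation_author tag per author, so only the last author would survive. Here is a minimal sketch of a variant that keeps every value in an array, assuming it is acceptable for every key to map to an array even when the name occurs only once. The function name collectMetaArrays is hypothetical.

```javascript
// Hypothetical variant of the loop above: every name maps to an array of
// contents, so repeated tags (e.g. several citation_author) are all kept.
function collectMetaArrays(doc) {
  // Null-prototype object: names such as "constructor" are not pre-populated.
  const result = Object.create(null);
  for (const node of doc.getElementsByTagName('meta')) {
    const name = node.getAttribute('name');
    if (!name) continue; // skip tags without a name, e.g. <meta charset="utf-8">
    if (!result[name]) result[name] = [];
    result[name].push(node.getAttribute('content'));
  }
  return JSON.stringify(result);
}
```

To run it in the web view, one option is to send the function text wrapped as an immediately invoked expression, (function collectMetaArrays(doc) { ... })(document), so that its return value is the script's result and no globals are left behind in the page. On the Xojo side, each value then arrives as a JSON array rather than a single string.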

@Anthony_G_Cyphers Thank you very much, that seems to have done the trick!


Happy to help! If you don’t mind, take a second to mark that as the solution.