DesktopHTMLViewer DocumentBegin & DocumentComplete not firing

In my application I have an in-app browser that opens the link https://music.amazon.com/search to let the user browse Amazon’s playlists.

I then need to detect when they’re on a playlist page (any URL that begins with “https://music.amazon.com/playlists/”) so I can give the user the option to tag that playlist in my application.

Problem is, on this site, the DocumentBegin and DocumentComplete events are not firing.

When I browse the same link in a regular browser I can see the URLs changing in the address bar, but the DesktopHTMLViewer does not detect when the user clicks on a link in the page.

Simple demo project illustrating the problem here.

What’s going on here?

Thanks in advance for any insight.

The demo project is coming up as “this file was deleted”, but taking a quick glance at the link you provided, the Amazon music player isn’t actually loading new pages; it’s using JavaScript to change the URL displayed in the browser. This is very common for web applications, as it provides a better, more seamless experience for the user.

You might be able to watch for TitleChanged, but you’ll likely have to do some JavaScript trickery to get more details from the HTMLViewer.

Thanks Tim,

I think I botched the demo project link, but you’re probably right; I’ve replicated the problem on literally every music streaming service site I’ve tried. TitleChanged is also a no-go because on some sites, like Deezer, the title remains the same across different pages.

For the time being, I’ve worked around the problem by having a timer track the result of HTMLViewer1.ExecuteJavaScriptSync("window.location.href").
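
Roughly, the Timer’s Run/Action event looks like this (a sketch; mLastURL is a String property I added to the window, and HandlePlaylistURL is just a placeholder for my tagging logic):

// Run/Action event of the polling Timer (sketch)
Var currentURL As String = HTMLViewer1.ExecuteJavaScriptSync("window.location.href")
If currentURL <> mLastURL Then
  mLastURL = currentURL
  If currentURL.BeginsWith("https://music.amazon.com/playlists/") Then
    HandlePlaylistURL(currentURL) // placeholder: offer the user the option to tag this playlist
  End If
End If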

But this causes a noticeable increase in CPU activity for my application, and I really would like to find a better long-term solution. If anyone has any ideas, I’m all ears.

Really? How often does that timer fire?

Every 50ms. But if I take the JavaScript out, the excess CPU activity goes away. This is on a 2013 MacBook Air, though, so on a newer machine it’d probably fare better.

Uh yeah, that’s way too fast.

Considering most people struggle to notice a 250ms delay, you should try higher numbers.

The other thing to try: you may be able to inject JavaScript code into the page to run that timer there and get rid of the Xojo overhead.
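
Something along these lines, for example (an untested sketch; 'urlChanged' is just a name I made up here, and you’d pick the value back up in the HTMLViewer’s JavaScriptRequest event):

// Inject once the page has loaded (e.g. from DocumentComplete).
// The page-side setInterval does the polling, and executeInXojo
// raises the JavaScriptRequest event back on the Xojo side.
HTMLViewer1.ExecuteJavaScript("var lastURL=window.location.href;" _
  + "setInterval(function(){" _
  + "if(window.location.href!==lastURL){lastURL=window.location.href;" _
  + "executeInXojo('urlChanged', lastURL);}},500);")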

@Tim_Parnell @Greg_O I’ve had the same problem with some sites, and now I realize why. I am using a WKWebViewControlMB, and in the didFinishNavigation event I scrape the HTML. I could do the same in the TitleChanged event (not a problem for me, but I don’t know if that’s foolproof). My question is: what would be the best practice for dealing with web pages loaded “normally” and those that use JavaScript to change the URL displayed without triggering the “document complete” event? This is an example of the latter that I’m trying to deal with: https://ui.adsabs.harvard.edu

The best solution is to not scrape info from sites without their explicit permission. Sites that do allow this will typically offer an API to access the data.

Barring that, you could try to inject some code onto the page to look for changes, but the technique to do so will vary from site to site.

These are sites that freely disseminate information. I’m simply scraping for DOIs so that I can query Crossref.org (via their API). I have used APIs for some of the most popular sites when they are available, but it’s impossible to do that for all.

I’ve actually been successful in using the TitleChanged event to get the updated HTML; I don’t need to inject code. My question is, can I safely use that event instead of didFinishNavigation (DocumentComplete) as a notification that the page has loaded? Empirically it seems to work well, but maybe there are cases in which it fails? So far I haven’t found any.

While I agree, the APIs for most of these sites are woefully inadequate.

I’m just saying though. There’s been a lot of litigation surrounding data scraping in recent years.

Not to mention that if they decide to change their website in any way, your app suddenly breaks.

There may be cases where the content providers object, but that’s not an issue here. In any case, this isn’t addressing the question. I’ve found, like the OP, that some sites don’t change the title when the content is downloaded, so TitleChanged isn’t reliable, either. The solution to use a timer to track window.location.href may have pitfalls too. You’d think there would be a simple standard way to handle cases like these, but it seems not.

This probably should be considered a bug in the HTMLViewer, because the .GoBack and .GoForward methods are indeed tracking the URL changes; the events just aren’t firing when those URLs change.

This post (HTMLViewer DocumentBegin - #6 by Michel_Bujardet) makes me think that once a page is loaded, and maybe cached, the event may not fire again.

I wouldn’t use a timer or anything on Xojo’s side to track this. I would use a MutationObserver added to the page once loaded to keep track of the URL and <title> changes then use executeInXojo to notify your Xojo app of the changes. This will reduce overhead significantly.

A fascinating approach, @Anthony_G_Cyphers! Thanks for sharing that as I would never have thought of it on my own.

I added this line to the DocumentComplete event:

HTMLViewer1.ExecuteJavaScript("(function(){let currentURL=window.location.href;" _
  + "function handleUrlChange(){const newURL=window.location.href;" _
  + "if(newURL!==currentURL){currentURL=newURL;executeInXojo('newURL', newURL);}}" _
  + "handleUrlChange();const observer=new MutationObserver(handleUrlChange);" _
  + "observer.observe(document.querySelector('head > title'),{childList:true});" _
  + "window.addEventListener('popstate',handleUrlChange);" _
  + "window.addEventListener('load',handleUrlChange);})();")

Unfortunately, JavaScriptRequest is never raised. The same JavaScript snippet works successfully when pasted into the address bar in an actual browser (replacing executeInXojo with alert). So I wonder if the HTMLViewer is using a similar method internally to trigger the DocumentBegin and DocumentComplete events and they are all failing for the same reason.

OK, I assume you were testing on Windows. It works as expected on macOS, but CEF on Windows seems to raise the DocumentComplete event before the page is actually rendered, so you’re creating the observer when there’s nothing to observe. I’ve created an example to address this that you can download here. Essentially it uses Timer.CallLater to install the observer outside the event loop so the page continues to load before the JavaScript for the observer is run.

EDIT: Actually, I’ve started seeing the same thing on macOS intermittently, so it looks like the behavior is consistent from a framework standpoint. I set the delay on the Timer.CallLater call to 2000ms and it seems to work everywhere, but slow networks/machines may experience issues.

The bad part about all of this is that we can’t use things like document.body.onload because of the same issue.
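
For reference, the core of it is just something like this (a sketch; InstallObserver is a parameterless method I added to the window, and kObserverScript stands in for the MutationObserver JavaScript shown above):

// In the DocumentComplete event: defer the injection so the page can finish rendering
Timer.CallLater(2000, AddressOf InstallObserver)

// InstallObserver, a method on the window, then does the actual injection:
Sub InstallObserver()
  HTMLViewer1.ExecuteJavaScript(kObserverScript) // the observer snippet from the post above
End Sub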

I would recommend you implement the Amazon Music API and build your own interface if you continue to see problems.

Perhaps a timer can poll for elements that we know will be present in the page before creating the MutationObserver. But if we’re back to using timers, we might as well just stick with monitoring window.location.href, as it’s a bit more straightforward.

I would recommend you implement the Amazon Music API and build your own interface if you continue to see problems.

The API does not offer an endpoint to browse playlists in the same way as the page linked above. This is a common issue across all of the music services. You can load a playlist if you know its ID, and you can get a list of the playlists a user has in their personal library, but you cannot browse publicly available playlists by genre, decade, or mood the way you can on the services’ publicly facing websites.

Amazon specifically offers a browse endpoint, but it yields completely different results than what you get on the website, and they don’t break it down into the same categories as on the website. This is similar to Spotify: they allow you to retrieve lists of “featured” playlists, but not by category the way they are displayed on the website. Other services, like Deezer and Apple Music, don’t offer a browse-playlists endpoint at all. YouTube Music doesn’t seem to have its own API whatsoever; the “unofficial” one relies on scraping.

Theoretically, you could use ExecuteJavaScriptSync in a timer to check for the existence of the body tag, have it return a value when the tag exists, and then install the observer. You should get nothing back if the page isn’t loaded, as the ExecuteJavaScriptSync code just silently fails. Just make sure you disable that timer before you install the observer, and use a timer period of, say, one second to reduce CPU usage and avoid the timer firing again during the check/install or any other async operation.
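
A rough sketch of what I mean (untested; InstallObserver would be whatever method injects the MutationObserver script):

// Run/Action event of a polling Timer with a period of about 1000ms (sketch)
Var bodyReady As String = HTMLViewer1.ExecuteJavaScriptSync("document.body ? 'ready' : ''")
If bodyReady = "ready" Then
  Me.Enabled = False // stop polling before installing the observer
  InstallObserver    // placeholder for the method that calls ExecuteJavaScript with the observer code
End If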
