Extracting Data from a Web Page

In a desktop application for personal use, I’ve been using a URLConnection to fetch a page from a public website and extract information from its content for additional calculations. Recently, though, the content returned said:

The browser you are using is no longer supported on this site. It is highly recommended that you use the latest versions of a supported browser in order to receive an optimal viewing experience. The following browsers are supported: Chrome, Edge (v80 and later), Firefox and Safari.

Does this mean I can no longer use that page with a URLConnection?

When I load the URL into an HTMLViewer, it works fine. But I don’t believe it’s possible to automate copying the data I need out of an HTMLViewer so that I can process it.

Must I now do this manually or am I missing something?

Change the User-Agent, for example to the one Safari sends.


Set the User-Agent header of your HTTP request to something like “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36”, which is what a Chrome browser on Windows sends, for example:

yourURLConnection.RequestHeader("User-Agent") =  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"

Is this the old “best viewed with IE9” nonsense? Why don’t these people learn to build their sites to agreed standards?

Tim Berners-Lee: anyone who puts “best viewed with” on their site obviously wants to go back to the old days before the Web, when it was difficult to access a document on another system, another computer, another network.

No, this is the opposite problem. The site is saying URLConnection isn’t a new enough browser.

Now, to be fair, if they’re just checking the User-Agent string, I’m sad for them, because User-Agent strings don’t really tell you anything. Modern sites should be using feature detection to either insert bridge code or turn certain features on or off based on the browser’s actual capabilities. That, unfortunately, would fail spectacularly for URLConnection, because it doesn’t have a JavaScript engine, much less a DOM.

You could try an off-screen HTMLViewer. It has a JavaScript engine and a DOM, so it is more likely to work. It’s heavier, I know… There are also cURL classes out there.

This seems to work for me. Thanks for the help.

URLConnection1.RequestHeader("User-Agent") = "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.3 Safari/605.1.15"

URLConnection1.Send("GET", url, 60)
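A variation on the code above, as a minimal sketch rather than a tested solution: it uses the blocking SendSync instead of Send, so the page content comes back as a return value, and it checks whether the site still answered with its “browser not supported” notice instead of the data. The method name FetchPage and the search phrase “no longer supported” are assumptions for illustration. Because SendSync blocks until the request finishes, in a desktop app you may prefer to keep the asynchronous Send and do the same check in the ContentReceived event.

```xojo
// Hypothetical helper method: FetchPage(url As String) As String
// Fetches a page synchronously while presenting a browser-like User-Agent.
Function FetchPage(url As String) As String
  Var conn As New URLConnection
  // The Safari User-Agent string that worked in this thread
  conn.RequestHeader("User-Agent") = "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.3 Safari/605.1.15"

  Var content As String
  Try
    // Blocks until the response arrives or 60 seconds pass
    content = conn.SendSync("GET", url, 60)
  Catch e As NetworkException
    // Network failure: no usable content
    Return ""
  End Try

  // Assumption: the site's rejection notice contains this phrase.
  // A non-200 status or the notice means the data we want is not in the content.
  If conn.HTTPStatusCode <> 200 Or content.IndexOf("no longer supported") >= 0 Then
    Return ""
  End If

  Return content
End Function
```

Returning an empty string keeps the caller simple: it can test for "" before parsing, rather than parsing the rejection notice as if it were data.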