URLConnection, me.send("get",s), returning nothing

I am using a URLConnection (named GetData) to retrieve data from a website, with the following User-Agent setting.

Window1.GetData.RequestHeader("User-Agent") = "Mozilla/5.0 (X11; U; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/123.0.6266.215 Chrome/123.0.6266.215 Safari/537.36"

However, when I query the site with

s = "https://marketwatch.com/investing/fund/FCNTX"
Me.Send("GET", s)

the ContentReceived event fires, but its content As String parameter is empty.

Does anyone know what the problem might be? I suspect it involves the User-Agent setting on my Mac (macOS 11.7.10).

Thank you in advance.

I should add that this was working fine until just recently.

Running that URL in Postman results in a 401 error with the following data:

<html lang="en"><head><title>marketwatch.com</title><style>#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}</style></head><body style="margin:0"><p id="cmsg">Please enable JS and disable any ad blocker</p><script data-cfasync="false">var dd={'rt':'c','cid':'AHrlqAAAAAMA6qtxW9gEyLsAVdDZgw==','hsh':'D428D51E28968797BC27FB9153435D','t':'bv','s':47891,'e':'a838ee8ef0155cf5058dd22d067e8199cf933dee2374f34045203d584bf24977','host':'geo.captcha-delivery.com','cookie':'ONU7iFCq~7~HogJJslwTzZxyXV2VIOttptL3aD0ImMpf~bHX3zGKvgpkmxWptkcxwHvQfEcEAo8xxVjZjZ0g6JHggcNSkHpzFf9JmN3VvziGJej5loXt3vec~y_SbxQK'}</script><script data-cfasync="false" src="https://ct.captcha-delivery.com/c.js"></script></body></html>

You might have been hitting that website pretty hard, and they decided to add a JavaScript captcha.
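A minimal sketch for confirming this from Xojo, assuming the code goes into the event handlers of the GetData URLConnection from the first post. The marker string captcha-delivery.com is taken from the Postman response above, and the log wording is only illustrative.

```xojo
// ContentReceived event handler of the URLConnection (GetData in the first post).
// Event signature: ContentReceived(URL As String, HTTPStatus As Integer, content As String)
System.DebugLog("Status " + HTTPStatus.ToString + ", " + content.Bytes.ToString + " bytes from " + URL)

If HTTPStatus = 401 Or HTTPStatus = 403 Then
  // Assumption: the challenge page references this host, as in the Postman output.
  If content.IndexOf("captcha-delivery.com") >= 0 Then
    System.DebugLog("Bot challenge page returned instead of the fund data.")
  End If
End If

// Error event handler of the same URLConnection.
// Event signature: Error(e As RuntimeException)
System.DebugLog("Request failed: " + e.Message)
```

If the log shows a 401 with an empty body, or only the Error event fires, that would help narrow down whether the server or the framework is dropping the content.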

It is not a single call; the web page loads other data and executes JavaScript.

You should look for a provider that offers a web API returning JSON or XML.

It’s also possible that accessing that page requires a cookie to be set, but I agree with Markus, if they offer an API, use that. Companies frown on people scraping data from their sites and will employ multiple tactics to prevent it.

The software I wrote is for my personal use and is basically like a little widget on the desktop that displays various data that you can get when you use a web browser to go to various websites. It updates every half hour. It was a little project to learn more about URL connections. I became curious why my web browser returned results from some sites but the get command in the URL connection would not. I was also curious why it was working fine for many months and then displayed various degrees of intermittent failure.

You may already know that a browser works differently and can do more than a URLConnection, such as following redirects and executing JavaScript.

Server configuration changes, security changes, and so on can explain why it stopped working. Some sites rely on the user-agent to return different information: if the site detects a mobile device, the mobile page is shown. I don’t remember whether it was MarketWatch, but in an earlier thread here in the forum one stock ticker page responded differently from another with the same user-agent (the Xojo default), so one page worked and the other did not for no obvious reason. If I remember correctly, CURL was used, and after changing the user-agent to what Chrome or Firefox normally send, both stock tickers returned the information the user wanted.

That said, if you want a long-term solution, you should see whether they offer an API for getting the information instead of pulling it directly from the page. My guess is that they may offer only a paid solution.

Thanks. However, some change was made to Xojo. When I run under Xojo 2024 Release 1, data is received, but when I run under Xojo 2024 Release 4.1, content is blank.

Create a sample project and open an issue. It could be as simple as a different user-agent.

The release notes for 2024r4.1 have this line:

  • URLConnection.FollowRedirects settable property.

https://documentation.xojo.com/resources/release_notes/2024r4.1.html

You might want to try changing that property to see if you get different results.
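A rough sketch of that experiment, assuming GetData is the URLConnection from the first post and that the property is set before the request is sent. Whether a 3xx response shows up in HeadersReceived when redirects are not followed is an assumption to verify, not documented behavior I can confirm.

```xojo
// Before sending: toggle the property named in the 2024r4.1 release notes.
Window1.GetData.FollowRedirects = False
Window1.GetData.Send("GET", "https://marketwatch.com/investing/fund/FCNTX")

// HeadersReceived event handler of the URLConnection.
// Event signature: HeadersReceived(URL As String, HTTPStatus As Integer)
System.DebugLog("Headers for " + URL + ": status " + HTTPStatus.ToString)
// Location is normally only present on redirect responses.
System.DebugLog("Location: " + Me.ResponseHeader("Location"))
```

Comparing the logged status under 2024r1 and 2024r4.1 would also make a useful attachment for an issue report.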

I will do that. Thanks.

I read that FollowRedirects defaults to True, so I didn’t originally experiment with changing it. I just tried with FollowRedirects set to True and to False; in both cases no content was returned using the latest Xojo release.