Read data from a website

Michael_Cebasek · April 11, 2025, 10:31am

Hello.

I need to read data from a website, and don’t even have an idea where to start.
Can anyone help?

I don’t know whether I should use a desktop app or the web framework.

That’s how stuck I am right now.

Or if you can point me to a tutorial, that would be great too.

Regards

MarkusR · April 11, 2025, 10:41am

Xojo has a URLConnection class, but it would just read the raw HTML page.

Usually there should be a web API that takes input and returns output as JSON or XML.
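For reference, a minimal sketch of the URLConnection route: a blocking GET that hands back whatever the server sends (raw HTML for a normal page, JSON or XML for an API). The function name FetchPage is hypothetical, and this assumes Xojo 2019r2 or later (API 2 URLConnection).

```xojo
' Hypothetical helper: GET a URL and return the raw response body, or "" on failure
Function FetchPage(url As String, timeoutSeconds As Integer = 30) As String
  Var conn As New URLConnection
  Try
    ' SendSync blocks until the response arrives or the timeout expires
    Var body As String = conn.SendSync("GET", url, timeoutSeconds)
    ' Only accept 2xx responses; error pages also come back as a body
    If conn.HTTPStatusCode >= 200 And conn.HTTPStatusCode < 300 Then
      Return body
    End If
  Catch err As RuntimeException
    ' Timeouts and connection failures end up here
  End Try
  Return ""
End Function
```

Because SendSync blocks, a desktop app will freeze while it waits; in a real app you would more likely call Send and handle the ContentReceived event instead.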

Christian_Wheel · April 11, 2025, 10:42am

There are a few ways you can go about this.

You could use the URLConnection class, connect to your website and make a GET/POST request with the necessary variables, then receive and parse the response.
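A rough sketch of the POST variant, sending form variables and reading the reply. The URL and the field names "from" and "amount" are made up for illustration; a real site or API defines its own. For a GET you would instead append the same encoded pairs to the URL after a "?".

```xojo
' Hypothetical sketch: POST URL-encoded form variables and read the reply
Var conn As New URLConnection
' Encode each value so spaces, "&", etc. survive; field names are made up
Var form As String = "from=" + EncodeURLComponent("CAD") + "&amount=" + EncodeURLComponent("100")
conn.SetRequestContent(form, "application/x-www-form-urlencoded")
Try
  Var reply As String = conn.SendSync("POST", "https://example.com/convert", 30)
  If conn.HTTPStatusCode = 200 Then
    ' reply now holds the raw response (HTML, JSON, XML, ...) to parse
    System.DebugLog(reply)
  End If
Catch err As RuntimeException
  ' network error or timeout
End Try
```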

Another way is to use an HTMLViewer and use JavaScript to extract the information you need.

You could also shell to curl and retrieve your page that way (or use one of the MBS curl classes).
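A minimal sketch of shelling out to curl with Xojo’s Shell class. It assumes curl is on the PATH (it ships with macOS, most Linux distributions, and current Windows 10/11), and example.com stands in for the real page.

```xojo
' Sketch: let the system curl fetch the page (assumes curl is installed and on the PATH)
Var sh As New Shell
' -s: silent, -L: follow redirects, -f: non-zero exit code on HTTP errors
sh.Execute("curl -sLf ""https://example.com""")
If sh.ExitCode = 0 Then
  Var html As String = sh.Result
  ' parse html here
  System.DebugLog("Fetched " + html.Length.ToString + " characters")
Else
  System.DebugLog("curl failed with exit code " + sh.ExitCode.ToString)
End If
```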

David_Cox · April 11, 2025, 1:14pm

Here’s what I use, assuming you want to download a file:

Protected Function getURLConnectionFolderItem(URLConnectionHost As String, f As FolderItem, URLConnectionTimeOut As Integer = 10) As FolderItem
  ' Downloads the resource at URLConnectionHost into f; returns f, or Nil on failure
  Var tempURLConnection As New URLConnection
  Var HTTPStatusCode As Integer
  
  ' Reject a missing destination or anything that isn't an http(s) URL
  If f = Nil Or URLConnectionHost = "" Or (URLConnectionHost.Left(7) <> "http://" And URLConnectionHost.Left(8) <> "https://") Then
    Return Nil
  End If
  
  Try ' a network timeout raises an exception
    #If TargetWindows Or TargetLinux Then
      ' Windows gets a security error if True. Note that False disables
      ' TLS certificate checks, so only do this for hosts you trust.
      tempURLConnection.AllowCertificateValidation = False
    #EndIf
    ' Blocking GET that writes the response body straight into f
    tempURLConnection.SendSync("GET", URLConnectionHost, f, URLConnectionTimeOut)
    HTTPStatusCode = tempURLConnection.HTTPStatusCode
    tempURLConnection.Disconnect
    
  Catch err As RuntimeException
    Return Nil
  End Try
  
  ' A 404/500 error page is also written to f, so check the status code too
  If f.Exists And HTTPStatusCode >= 200 And HTTPStatusCode < 300 Then
    Return f
  Else
    Return Nil
  End If
  
End Function

Michael_Cebasek · April 11, 2025, 8:42pm

Hi David.

Thanks for the response.

However, I don’t want to download a file. Let’s say I want to track the value of the Canadian dollar, and a banking website FREELY AND OPENLY (no bad stuff being done) posts it.

I want to get the value from the website, and plot it.

Yes, I am Canadian. And proud of it! But I just used that option as an example!

Regards.

Tim_Hare · April 11, 2025, 9:02pm

If it’s a banking website, they will certainly have an API you can use to get the information. Scraping a website (downloading it and picking out information) is generally frowned upon.

Michael_Cebasek · April 11, 2025, 10:03pm

Hi Tim.

I used the Canadian Dollar example as it was the first thing that came to mind. I stay far away from touching things that I know… don’t like being touched.

MarkusR · April 12, 2025, 6:45am

Using an API is not magic.
It’s like calling
https://api.frankfurter.app/latest?from=CAD

and you get a JSON structure back.

Some APIs require you to register, and you get an API key to use in the requests.
The requests usually use the GET or POST method.
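A sketch of calling that endpoint from Xojo and pulling one value out of the JSON. It assumes the reply has the shape {"base":"CAD","date":"...","rates":{"USD":...}} and uses the API’s to= parameter to limit the result to USD; check the Frankfurter documentation before relying on either.

```xojo
' Sketch: read the latest CAD -> USD rate from the Frankfurter API.
' Assumed reply shape: {"base":"CAD","date":"...","rates":{"USD":0.7,...}}
Var conn As New URLConnection
Try
  Var body As String = conn.SendSync("GET", "https://api.frankfurter.app/latest?from=CAD&to=USD", 30)
  If conn.HTTPStatusCode = 200 Then
    Var json As New JSONItem(body)          ' raises JSONException on malformed JSON
    Var rates As JSONItem = json.Child("rates")
    Var usd As Double = rates.Value("USD").DoubleValue
    Var asOf As String = json.Value("date").StringValue
    System.DebugLog("1 CAD = " + usd.ToString + " USD on " + asOf)
  End If
Catch err As RuntimeException
  ' network failure, timeout, malformed JSON, or a missing key
End Try
```

Each call gives one data point; to plot the rate over time you would store the date/value pairs and feed them to a chart.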

The other approach is JavaScript and the DOM (Document Object Model).

For example, in JS:

const element = document.getElementById("theelementid");

Tim_Hare · April 12, 2025, 11:12am

The point is, the sites that want you to have their data will provide an api.

Beatrix_Willius · April 12, 2025, 11:33am

Everyone should have the fun of parsing HTML at least once. Then they actually know why it is such a bad idea.

Michael_Hußmann · April 12, 2025, 12:15pm

Indeed it is, unless one wholeheartedly agrees with Camus’ famous dictum that “one must imagine Sisyphus happy”. The parsing part as such isn’t so bad, but going this route puts one on a treadmill: every so often the website’s code changes, and one has to catch up by fixing one’s broken parsing code.

TimStreater · April 12, 2025, 6:10pm

Trouble is, sometimes you can’t avoid it.

Christian_Wheel · April 12, 2025, 6:58pm

Indeed, scraping and parsing is sometimes necessary.

Perfect example: I place hundreds of Amazon orders a year for my business. Amazon used to offer a tool that let you download a year of order history as a CSV, which I need to file my taxes. They no longer offer this tool, and scraping is the only way to get this data efficiently.

There are libraries dedicated strictly to scraping HTML; BeautifulSoup in Python is particularly versatile. I’m not sure if anyone has written generic classes to abstract the legwork in Xojo, but I have personally written a ton of HTML parsing code over the years. Chilkat’s HTMLToText and HTMLToXML have also been helpful at times.
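As an illustration of the hand-written kind of extraction (and of why it is fragile), here is a small sketch using Xojo’s RegEx class to pull the text of one element by its id. The function name and the id "cad-rate" are hypothetical; it ignores nested tags and assumes the id contains no regex special characters.

```xojo
' Sketch: return the plain text inside <tag id="elementId">...</tag>, or ""
' Fragile by design: breaks on nested tags, single-quoted ids, or markup changes
Function ExtractElementText(html As String, elementId As String) As String
  Var re As New RegEx
  ' Group 1 = tag name (reused via \1 for the closing tag), group 2 = inner text
  re.SearchPattern = "<(\w+)[^>]*\sid=""" + elementId + """[^>]*>([^<]*)</\1>"
  Var m As RegExMatch = re.Search(html)
  If m <> Nil Then
    Return m.SubExpressionString(2).Trim
  End If
  Return ""
End Function
```

For example, ExtractElementText("<span id=""cad-rate"">0.7212</span>", "cad-rate") returns "0.7212".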

MarkusR · April 13, 2025, 10:35am

Isn’t it part of the Amazon API?

Christian_Wheel · April 14, 2025, 2:15pm

That only retrieves sales, not purchases. Also, regular Amazon customers without a Seller account can’t even register for access to that API.