HTTPSecureSocket Help Needed

I had a program that previously was able to access baseball standings but can no longer do so. I believe it began failing when the site switched to https. I’ve tried to fix my program to use HTTPSecureSocket but I am failing. When I go to the site with a normal browser and save the page returned, I am able to obtain the standings. When I try to do this programmatically with Xojo, I get and save the page returned, but it does not contain the standings. I know there are other sites where I can get the data without a problem, but I am hoping to gain a better understanding of Xojo and https by forging ahead with this. Am I programming this incorrectly, or is it something that is not possible/permissible to do?

In my button I have

HTTPSecureSocket1.Send("Get", "https://www.mlb.com/standings/")

In my socket’s page received I have

[code] textarea1.AppendText "Status: " + str(httpStatus) + EndOfLine

dim html as string
dim output as TextOutputStream
dim f as FolderItem

html = Xojo.Core.TextEncoding.UTF8.ConvertDataToText(content)

f = specialfolder.Desktop.Child("standings-test.html")
output = f.CreateTextFile
output.write html
output.close
me.disconnect[/code]

I get an httpStatus code of 200.

Thanks in advance for any advice.

Do they offer an API? I would be more than happy to help you with implementing interactions with an API. The purpose of APIs is so that interacting with data sources doesn’t break when a website design does.

Thanks Tim. I’ve been checking to see if MLB has an API but I can’t find one, although other sites have an API for the data.

But the purpose for my question isn’t just to work with an API, but to actually learn why my approach which worked in the past no longer works now that the site has shifted from http to https and why, when I load the site into an HTMLViewer with

HtmlViewer1.LoadURL("https://www.mlb.com/standings/")

I can view the standings, but when I try to obtain the exact same page with the approach in my original question, the data does not appear.

I can tell you that it’s because the page loads extra data after you load it into the browser. This is sometimes for the user experience, also sometimes as a digital rights management mechanism.

Ah, Thanks for the insight. So that means it’s likely impossible/impermissible. Right? I’m assuming there is no way for the socket to deal with extra data if it loads after the page is received although somehow browsers and the HTMLViewer are capable of doing it.

You would need to know how it’s loading the data and implement similar measures to load it. While not impossible, it would be very specific to the way the page is implemented; and if they change the page, you have to do it all over again. This is the cost of unauthorized third party works.

I’m not trying to say that’s something that’s not worth doing. I wrote an unauthorized third party app with Xojo to control my smart home lights :) Just that you have to be ready for the workload. The legal implications are also something to consider, which is why even though I built my controller with information already out there on the web, I haven’t released it.

To display the full page, the browser makes 73 requests (including images).

To do the same, you just have to parse the downloaded page to get the resources and download them.
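
As a rough sketch of that parsing step, assuming the page text is already in a String named html (as in the PageReceived code earlier in the thread) and that only script tags are of interest, the classic RegEx class can list the referenced URLs. The pattern is illustrative only and will miss resources that are referenced in other ways:

[code]' Sketch: list the script URLs referenced by the downloaded page.
' Assumes "html" holds the page text received in PageReceived.
Dim re As New RegEx
re.SearchPattern = "<script[^>]+src=""([^""]+)"""

Dim m As RegExMatch = re.Search(html)
While m <> Nil
  ' SubExpressionString(1) is the captured src value
  TextArea1.AppendText m.SubExpressionString(1) + EndOfLine
  m = re.Search ' continue after the previous match
Wend[/code]

Each URL found this way would then need its own request, and relative URLs would have to be resolved against the page address first.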

Actually, one of those 73 requests is a call to an API:

https://statsapi.mlb.com/api/v1/standings

You just need to provide the parameters to get the info as JSON.
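
A minimal sketch of such a call, reusing the socket from the original question. The query parameter shown (leagueId=103) is an assumption for illustration and is not confirmed anywhere in this thread; check what the browser actually sends in its network inspector.

In the button:

[code]' Sketch: request the standings JSON directly from the API endpoint.
' The leagueId parameter is an assumed example, not a confirmed value.
Dim url As Text = "https://statsapi.mlb.com/api/v1/standings?leagueId=103"
HTTPSecureSocket1.Send("GET", url)[/code]

In the PageReceived event:

[code]' Sketch: show the status and the raw JSON so its structure can be inspected
' before writing any parsing code.
Dim jsondata As Text = Xojo.Core.TextEncoding.UTF8.ConvertDataToText(content)
TextArea1.Text = "Status: " + Str(httpStatus) + EndOfLine + jsondata[/code]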

Well, I wasn’t able to resolve my problem of how to get baseball standings from MLB.com using HTTPSecureSocket since I couldn’t figure out how to obtain the full page, including data, so I didn’t really learn what I wanted to know. Instead, I obtained the standings using an API provided at: https://erikberg.com/mlb/standings.json

If you want to get the standings yourself, here is my code. The program has a button, Listbox1 to display the standings, Label1 to show the date, and HTTPSecureSocket1 to handle the call. Note the comment in the code, which shows that there are many more variables that could be displayed.

In the button, I have:

HTTPSecureSocket1.Send("Get", "https://erikberg.com/mlb/standings.json")

In the PageReceived of HTTPSecureSocket1 I have:

[code]dim jsondata as text
jsondata = Xojo.Core.TextEncoding.UTF8.ConvertDataToText(content)

dim data as Xojo.Core.Dictionary
data = Xojo.Data.ParseJSON(jsondata)
dim date as string = data.value("standings_date")
date = nthfield(date, "T", 1)
label1.text = "Date: " + date

Dim templates() As Auto = data.Value("standing")
listbox1.ColumnCount = 7
ListBox1.ColumnWidths = "10%, 10%, 20%, 15%, 15%, 15%, 15%"
listbox1.heading(0) = "League"
listbox1.heading(1) = "Division"
listbox1.heading(2) = "Team"
listbox1.heading(3) = "Wins"
listbox1.heading(4) = "Losses"
listbox1.heading(5) = "Pct"
listbox1.heading(6) = "GB"

for each d as Xojo.Core.Dictionary in templates
  'available variables: rank, won, lost, streak, ordinal_rank, last_name, team_id,
  'games_back, points_for, points_against, home_won, home_lost, away_won, away_lost,
  'conference_won, conference_lost, division_won, division_lost, lastfive,
  'last_ten, conference, division, points_scored_per_game, points_allowed_per_game,
  'win_percentage, point_differential, point_differential_per_game, streak_type,
  'streak_total, games_played
  dim league as string = d.Value("conference")
  dim division as string = d.Value("division")
  dim team as string = d.Value("first_name")
  dim wins as integer = d.value("won")
  dim losses as integer = d.value("lost")
  dim pct as string = d.value("win_percentage")
  dim gb as double = d.value("games_back")
  listbox1.AddRow league
  ListBox1.Cell(Listbox1.LastIndex, 1) = division
  ListBox1.Cell(Listbox1.LastIndex, 2) = team
  ListBox1.Cell(Listbox1.LastIndex, 3) = str(wins)
  ListBox1.Cell(Listbox1.LastIndex, 4) = str(losses)
  ListBox1.Cell(Listbox1.LastIndex, 5) = pct
  ListBox1.Cell(Listbox1.LastIndex, 6) = str(gb)
next[/code]

I suppose the code could be simpler, so comments are welcome. And I really wanted to better understand HTTPSecureSockets and learn how to handle more difficult web pages because it has wider applications for me, but c’est la vie.

Thanks to Tim Parnell and Ivan Tellez for your comments.

So the problem is that HTTPSecureSocket (and URLConnection for that matter) is not the equivalent of a web browser. Most notably, there’s no JavaScript engine, nor a DOM to represent all of the nodes of a web page so they can be manipulated. Sockets are simply for making requests and getting data back, just like you finally did with the api call. These days, anything beyond that will require use of a browser or an HTMLViewer.
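
As a rough sketch of the HTMLViewer route: one commonly used workaround is to have the viewer run a line of JavaScript that copies the rendered DOM into the page title, then read it back in the TitleChanged event. This is untested against mlb.com and rests on a few assumptions: the viewer is named HtmlViewer1, the output goes to a TextArea1, data loaded by scripts may arrive after DocumentComplete fires, and some platforms may truncate long titles.

[code]' DocumentComplete event of HtmlViewer1:
' copy the rendered markup into the title so Xojo can read it.
Me.ExecuteJavaScript("document.title = document.documentElement.outerHTML;")[/code]

[code]' TitleChanged event of HtmlViewer1:
' newTitle first holds the page's real title, then the rendered HTML.
TextArea1.Text = newTitle[/code]

If the data is not there yet, triggering the same JavaScript from a button after the page has visibly finished loading is one way to check whether timing is the issue.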

Thanks Greg. I’m trying to get my head around that.

When an HTMLViewer renders a page, is there any way to get the underlying information that I’m seeking in those instances when an API does not exist? I’m assuming (rightfully or wrongfully) that within the browser/viewer there is some “final” html that is ultimately rendered and that we see on screen after all the JavaScript and DOM manipulation occurs.

Sometimes, when I use a browser to save a page from a site that the HTTPSocket doesn’t seem to be able to handle, I can find the data in the saved .html file. At other times, I cannot, and I assume that is when the site is protecting its digital rights, which I understand and respect. But baseball standings, unlike more proprietary statistics, are available on many sites, including through 3rd party APIs. So it seemed strange that the HTTPSocket could obtain them from some sites but not from others, and I was trying to learn whether it’s my lack of understanding of HTTPSocket or whether it is just completely impossible on those sites because of how they are programmed. I did learn more about parsing JSON that contains both an object and an array when using the API, which was a plus.