HTTPSecureSocket Help Needed

  1. 2 weeks ago

    Chris M

    Apr 9 Pre-Release Testers, Xojo Pro

    I had a program that previously was able to access baseball standings but can no longer do so. I believe it began failing when the site switched to https. I've tried to fix my program to use HTTPSecureSocket but I am failing. When I go to the site with a normal browser and save the page returned I am able to obtain the standings, but when I try to do this programmatically with Xojo, I get and save the page returned, but it does not contain the standings. I know there are other sites where I can get the data without a problem, but I am hoping to gain a better understanding of Xojo and https by forging ahead with this. Am I programming this incorrectly or is it something that is not possible/permissible to do?

    In my button I have

    HTTPSecureSocket1.Send("Get", "https://www.mlb.com/standings/")


    In my socket's page received I have

    	textarea1.AppendText "Status: " + str(httpStatus) + EndOfLine
    
    	dim html as string
    	dim output as TextOutputStream
    	dim f as FolderItem
    
    	html = Xojo.Core.TextEncoding.UTF8.ConvertDataToText(content)
    
    	f = specialfolder.Desktop.Child("standings-test.html")
    	output = f.CreateTextFile
    	output.write html
    	output.close
    	me.disconnect


    I get an httpStatus code of 200.

    Thanks in advance for any advice.

  2. Tim P

    Apr 9 Pre-Release Testers Austin, TX

    Do they offer an API? I would be more than happy to help you with implementing interactions with an API. The purpose of APIs is so that interacting with data sources doesn't break when a website design does.

  3. Chris M

    Apr 9 Pre-Release Testers, Xojo Pro

    Thanks Tim. I've been checking to see if MLB has an API but I can't find one, although other sites have an API for the data.

    But the purpose for my question isn't just to work with an API, but to actually learn why my approach which worked in the past no longer works now that the site has shifted from http to https and why, when I load the site into an HTMLViewer with

    HtmlViewer1.LoadURL("https://www.mlb.com/standings/")

    I can view the standings, but when I try to obtain the exact same page with the approach in my original question, the data does not appear.

  4. Tim P

    Apr 9 Pre-Release Testers Austin, TX

    @Chris M But the purpose for my question isn't just to work with an API, but to actually learn why my approach which worked in the past no longer works now that the site has shifted from http to https and why, when I load the site into an HTMLViewer with

    HtmlViewer1.LoadURL("https://www.mlb.com/standings/")

    I can view the standings, but when I try to obtain the exact same page with the approach in my original question, the data does not appear.

    I can tell you that it's because the page loads extra data after you load it into the browser. This is sometimes for the user experience, also sometimes as a digital rights management mechanism.

  5. Chris M

    Apr 9 Pre-Release Testers, Xojo Pro

    Ah, Thanks for the insight. So that means it's likely impossible/impermissible. Right? I'm assuming there is no way for the socket to deal with extra data if it loads after the page is received although somehow browsers and the HTMLViewer are capable of doing it.

  6. Tim P

    Apr 9 Pre-Release Testers Austin, TX
    Edited 2 weeks ago

    You would need to know how it's loading the data and implement similar measures to load it. While not impossible, it would be very specific to the way the page is implemented; and if they change the page, you have to do it all over again. This is the cost of unauthorized third party works.

    I'm not trying to say that's something that's not worth doing. I wrote an unauthorized third party app with Xojo to control my smart home lights :) Just that you have to be ready for the workload. The legal implications are also something to consider, which is why even though I built my controller with information already out there on the web, I haven't released it.

  7. Ivan T

    Apr 9 Pre-Release Testers
    Edited 2 weeks ago

    To display the full page, the browser makes 73 requests (including images).

    To do the same, you just have to parse the downloaded page to get the resources and download them.
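
    A minimal sketch of that parsing step, assuming the page text is already in the html string from the PageReceived code earlier in this thread. It only collects src and href values that are written in double quotes in the downloaded markup; any URL that the page's JavaScript builds at run time will not appear in the list.

    ```xojo
    ' Collect the resource URLs referenced in the downloaded page.
    ' Assumes html holds the page text saved in PageReceived.
    Dim re As New RegEx
    re.SearchPattern = "(?:src|href)\s*=\s*""([^""]+)"""

    Dim urls() As String
    Dim m As RegExMatch = re.Search(html)
    While m <> Nil
      ' SubExpressionString(1) is the text captured between the quotes
      urls.Append m.SubExpressionString(1)
      m = re.Search ' continue searching after the previous match
    Wend

    ' List what was found so each URL can be requested separately
    For Each u As String In urls
      TextArea1.AppendText u + EndOfLine
    Next
    ```

    Relative URLs in the list would still need to be resolved against the page address before they can be requested.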

    Actually, one of those 73 requests is a call to an API:

    https://statsapi.mlb.com/api/v1/standings

    You just need to provide the parameters to get the info as JSON.
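
    A rough sketch of such a call, written in the same style as the other code in this thread. The leagueId parameter and its values, and the JSON keys used below (records, teamRecords, team, name, wins, losses), are assumptions about this endpoint that are not confirmed anywhere in this thread; compare them against the actual response before relying on them.

    In the button:

    ```xojo
    ' Assumption: leagueId=103,104 requests both leagues
    HTTPSecureSocket1.Send("GET", "https://statsapi.mlb.com/api/v1/standings?leagueId=103,104")
    ```

    In the PageReceived of HTTPSecureSocket1:

    ```xojo
    If HTTPStatus <> 200 Then Return

    Dim jsondata As Text = Xojo.Core.TextEncoding.UTF8.ConvertDataToText(Content)
    Dim data As Xojo.Core.Dictionary = Xojo.Data.ParseJSON(jsondata)

    ' Assumed layout: "records" has one entry per division,
    ' and each entry holds a "teamRecords" array
    If Not data.HasKey("records") Then Return
    Dim records() As Auto = data.Value("records")

    ListBox1.ColumnCount = 3
    For Each division As Xojo.Core.Dictionary In records
      Dim teams() As Auto = division.Value("teamRecords")
      For Each t As Xojo.Core.Dictionary In teams
        Dim team As Xojo.Core.Dictionary = t.Value("team")
        Dim teamName As Text = team.Value("name")
        Dim wins As Integer = t.Value("wins")
        Dim losses As Integer = t.Value("losses")
        ListBox1.AddRow teamName
        ListBox1.Cell(ListBox1.LastIndex, 1) = Str(wins)
        ListBox1.Cell(ListBox1.LastIndex, 2) = Str(losses)
      Next
    Next
    ```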

  8. Chris M

    Apr 10 Pre-Release Testers, Xojo Pro

    Well, I wasn't able to resolve my problem of how to get baseball standings from MLB.com using HTTPSecureSocket since I couldn't figure out how to obtain the full page, including data, so I didn't really learn what I wanted to know. Instead, I obtained the standings using an API provided at: https://erikberg.com/mlb/standings.json

    If you want to get the standings yourself, here is my code. The program has a button, Listbox1 to display the standings, Label1 to show the date and HTTPSecureSocket1 to handle the call. Note the comment in the code which shows that there are many more variables that could be displayed.

    In the button, I have:

    HTTPSecureSocket1.Send("Get", "https://erikberg.com/mlb/standings.json")

    In the PageReceived of HTTPSecureSocket1 I have:

    dim jsondata as text
    jsondata = Xojo.Core.TextEncoding.UTF8.ConvertDataToText(content)
    
    dim data as Xojo.Core.Dictionary
    data = Xojo.Data.ParseJSON(jsondata)
    dim date as string = data.value("standings_date")
    date = nthfield(date, "T", 1)
    label1.text = "Date: " + date
    
    Dim templates() As Auto = data.Value("standing")
    listbox1.ColumnCount = 7
    ListBox1.ColumnWidths = "10%, 10%, 20%, 15%, 15%, 15%, 15%"
    listbox1.heading(0) = "League"
    listbox1.heading(1) = "Division"
    listbox1.heading(2) = "Team"
    listbox1.heading(3) = "Wins"
    listbox1.heading(4) = "Losses"
    listbox1.heading(5) = "Pct"
    listbox1.heading(6) = "GB"
    
    for each d as Xojo.Core.Dictionary in templates
      'available variables: rank, won, lost, streak, ordinal_rank, last_name, team_id, 
      'games_back,  points_for, points_against, home_won, home_lost, away_won, away_lost, 
      'conference_won, conference_lost, division_won, division_lost, lastfive, 
      'last_ten, conference, division, points_scored_per_game, points_allowed_per_game, 
      'win_percentage, point_differential, point_differential_per_game, streak_type, 
      'streak_total, games_played
      dim league as string = d.Value("conference")
      dim division as string = d.Value("division")
      dim team as string = d.Value("first_name")
      dim wins as integer = d.value("won")
      dim losses as integer = d.value("lost")
      dim pct as string = d.value("win_percentage")
      dim gb as double = d.value("games_back")
      listbox1.AddRow league
      ListBox1.Cell(Listbox1.LastIndex, 1) = division
      ListBox1.Cell(Listbox1.LastIndex, 2) = team
      ListBox1.Cell(Listbox1.LastIndex, 3) = str(wins)
      ListBox1.Cell(Listbox1.LastIndex, 4) = str(losses)
      ListBox1.Cell(Listbox1.LastIndex, 5) = pct
      ListBox1.Cell(Listbox1.LastIndex, 6) = str(gb)
    next

    I suppose the code could be simpler, so comments are welcome. And I really wanted to better understand HTTPSecureSockets and learn how to handle more difficult web pages because it has wider applications to me but c'est la vie.

    Thanks to Tim Parnell and Ivan Tellez for your comments.

  9. Greg O

    Apr 10 Xojo Inc

    @Chris M And I really wanted to better understand HTTPSecureSockets and learn how to handle more difficult web pages because it has wider applications

    So the problem is that HTTPSecureSocket (and URLConnection for that matter) is not the equivalent of a web browser. Most notably, there’s no JavaScript engine, nor a DOM to represent all of the nodes of a web page so they can be manipulated. Sockets are simply for making requests and getting data back, just like you finally did with the api call. These days, anything beyond that will require use of a browser or an HTMLViewer.

  10. last week

    Chris M

    Apr 14 Pre-Release Testers, Xojo Pro

    Thanks Greg. I'm trying to get my head around that.

    When an HTMLViewer renders a page, is there any way to get the underlying information that I'm seeking in those instances when an API does not exist? I'm assuming (rightfully or wrongfully) that within the browser/viewer there is some "final" html that is ultimately rendered and that we see on screen after all the JavaScript and DOM manipulation occurs.

    Sometimes, when I use a browser to save a page from a site that the HTTPSocket doesn't seem to be able to handle, I can find the data in the saved .html file. At other times, I cannot, and I assume that is when the site is protecting its digital rights, which I understand and respect. But baseball standings, unlike more proprietary statistics, are available on many sites including through 3rd party APIs. So it seemed strange that the HTTPSocket could obtain them from some sites but not from others, and I was trying to learn whether it's my lack of understanding of HTTPSocket or whether it is just completely impossible to do on those sites based on how those sites are programmed. I did learn more about parsing some JSON which contained both an object and an array when using the API, which was a plus.
