Using HTMLViewer to retrieve a text file

Jon_Fitch · March 10, 2023, 7:23pm

I am attempting to use the HTMLViewer to access a website and download a text file. I get as far as viewing the text file, but am unsure how to move it to a TextField or some other structure so that I can operate on it. I can manually select all, then copy and paste, but I cannot seem to do that programmatically from the HTMLViewer. I would also like to upload a text file to a website.

Can anybody point me to examples of how to do this? Or is it not possible? Or should I not be using the HTMLViewer at all?

Kem_Tekinay · March 10, 2023, 7:38pm

Use URLConnection instead. HTMLViewer is a GUI control, whereas URLConnection is a class meant to retrieve data from or send data to web servers.
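A minimal sketch of that approach, for illustration only. The URL and the control name ResultArea are placeholders that do not come from this thread, and the code assumes the server returns UTF-8 text without requiring a login.

```xojo
// Minimal sketch: fetch a text file synchronously and show it in a TextArea.
// "https://example.com/data.txt" and ResultArea are placeholders.
Var conn As New URLConnection

Try
  // SendSync blocks until the response arrives or the 30-second timeout expires.
  Var content As String = conn.SendSync("GET", "https://example.com/data.txt", 30)

  If conn.HTTPStatusCode = 200 Then
    // Assumption: the server sends UTF-8 text.
    ResultArea.Text = content.DefineEncoding(Encodings.UTF8)
  Else
    MessageBox("Server returned HTTP status " + conn.HTTPStatusCode.ToString)
  End If
Catch e As NetworkException
  MessageBox("Request failed: " + e.Message)
End Try
```

SendSync keeps the example short, but it blocks the calling thread while it waits. In a desktop app the asynchronous Send method, with the result handled in the ContentReceived event, is usually the friendlier choice.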

Jon_Fitch · March 10, 2023, 7:50pm

Thanks, will take a look at that!

Tim_Parnell · March 10, 2023, 8:11pm

When you get to this part, you’ll need to tell us about which method of uploading you’re attempting. Explore URLConnection for downloading your text file, and when you’ve got a handle on that expand into uploading. Depending on what you’re doing, the processes could be different in meaningful ways.

Jon_Fitch · March 10, 2023, 8:19pm

I am looking at URLConnection; it seems to depend on passing an HTTP request to the webpage. In this case, the web page is just the rendered text I want to access. Obviously I am very new to this and the examples in the Xojo help are “brief”. Does the webserver need to cooperate in this transaction, or are there standard HTTP requests that would most likely be acknowledged? I tried a simple GET request, but clearly I don’t know what I am doing.

Tim_Parnell · March 10, 2023, 8:43pm

By responding, the webserver has cooperated in the transaction.

All of the technical stuff is part of the Xojo Framework so you don’t have to worry about it. If you want to download a text file, you get the contents of the text file.

There are three basic examples in Example Projects/Networking/URLConnection which should get you started downloading a text file and communicating with servers. There are more examples (that get into advanced things like communicating with a REST API) in Example Projects/Communication/Internet/Web Services.

You have to be more specific about what you’re getting stuck on to get answers. You haven’t described the problem.

Jon_Fitch · March 10, 2023, 10:13pm

OK, making progress now. It is a bit of the Dunning-Kruger effect: Asking the right questions presumes knowledge of the subject that I do not have.

The examples are what I was after, foolishly searching in the Documentation. Forgot about the examples in the IDE itself.

Now I am getting the wanted text (wrapped in some HTML), once I get past the login challenge. I tried the AuthenticationRequested event, but it is not being called. In a browser, the webpage brings up a login dialog box with (I think) some HTML method stuff. It looks like this:

Log In

Is there an easy way around this? If I add an HTMLViewer to the example app, and first go through the login, then the URLConnection fetches the data. Otherwise I get this.

hmm… the board is rendering the html rather than displaying it…

Jon_Fitch · March 11, 2023, 2:22am

OK, I’ve replaced the angle brackets with parentheses so that it can be seen. It looks like they are expecting a POST with the names and values for login and password. Also the token, which changes every time the page loads; an anti-robot thing, I think.

So I tried this (with the correct login and password):

Var postData As String
postData = "csrfmiddlewaretoken=" + TokenField.Text
postData = postData + "&login=loginname&password=password"
postData = postData + "&next=/url.txt"

WebConnection.SetRequestContent(postData, "application/x-www-form-urlencoded")
WebConnection.Send("POST", URLField.Text)

This is accepted without a bunch of errors, but the result is exactly the same login page with a new token. I think this is an HTML thing more than a Xojo thing, but I haven’t been able to get past it. If I log in by other means, it gets past the login and I can download the file.

(form action="" method="post" id="login-form")
    (input type="hidden" name="csrfmiddlewaretoken" value="Fyzvtx1PL4eyJKWxICcmWbVisJdZvNyNaUq10tvkrE0OPUOVy5Zsu0a14WoGKc5Y")
    (input type="text" name="login" placeholder="Email or Username" aria-label="Email or Username" title="Email or Username" value="" class="input name-input" /)(br /)
    (input type="password" name="password" placeholder="Password" aria-label="Password" title="Password" value="" class="input passwd-input" autocomplete="off" /)
    (input type="hidden" name="next" value="/url.txt" /)
    
    (p class="error hide")(/p)
    

    (label class="checkbox-label remember")
        (input type="checkbox" name="remember_me" class="vam remember-input" /)
        (span class="vam")Remember me for 7 days (/span)
    (/label)

    (a href="/accounts/password/reset/" class="normal forgot-passwd")Forgot password?(/a)


    (button type="submit" class="submit btn btn-primary btn-block")Log In(/button)
(/form)
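One possible reason the POST above bounces back to the login page: csrfmiddlewaretoken is the default field name used by Django's CSRF protection, which compares the posted token against a cookie issued together with the login page. If that is what this site runs (an assumption, nothing in the thread confirms it), the token has to come from a page fetched in the same session, and every form value should be URL-encoded. The sketch below shows only the token extraction and encoding steps. ExtractCSRFToken, LoginField and PasswordField are hypothetical names; WebConnection and URLField are taken from the code above.

```xojo
// Hypothetical helper: pull the hidden CSRF token out of the login page HTML.
// Returns an empty string when no token field is found.
Function ExtractCSRFToken(loginPageHTML As String) As String
  Var rx As New RegEx
  // Matches: name="csrfmiddlewaretoken" value="..."
  rx.SearchPattern = "name=""csrfmiddlewaretoken""\s+value=""([^""]+)"""
  Var match As RegExMatch = rx.Search(loginPageHTML)
  If match Is Nil Then Return ""
  Return match.SubExpressionString(1)
End Function

// Step 1: fetch the login page so the token is fresh for this session.
Var loginPageHTML As String = WebConnection.SendSync("GET", URLField.Text, 30)
Var token As String = ExtractCSRFToken(loginPageHTML)

// Step 2: build the form body, URL-encoding every value.
Var postData As String
postData = "csrfmiddlewaretoken=" + EncodeURLComponent(token)
postData = postData + "&login=" + EncodeURLComponent(LoginField.Text)
postData = postData + "&password=" + EncodeURLComponent(PasswordField.Text)
postData = postData + "&next=" + EncodeURLComponent("/url.txt")

// Step 3: post the form to the same URL (the form's action is empty).
WebConnection.SetRequestContent(postData, "application/x-www-form-urlencoded")
WebConnection.Send("POST", URLField.Text)
```

Whether URLConnection carries the matching cookie from the GET over to the POST on its own may depend on the platform, so this sketch may still be rejected until the cookie is sent explicitly.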

Greg_O · March 11, 2023, 12:05pm

Keep in mind that the fact that the site has a login page may indicate that the site doesn’t want to be “scraped” in the way you are attempting. You may also need to get permission from the provider of the data to do so. It is not uncommon for the policy to be hidden down in the fine print of the EULA that you agreed to when you first accessed the site. I’ve seen this behavior in the past depending on how valuable the data is.

That said, the site probably provides a session “cookie” or “token” of some sort after the login is complete and you will have to capture that if you can, because you’ll need to send it back with every request after that to identify who you are.

It is also possible that requests to pages once you have logged in are required to come from the same site or referrer, which may prevent you from accessing this data altogether using a simple URLConnection.
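A rough sketch of what capturing and returning the cookie could look like with URLConnection, under several assumptions: the site sets a single session cookie, the cookie is still visible on the final response (it may not be if the login answers with a redirect), and the platform's URLConnection does not already manage cookies itself. mSessionCookie is a hypothetical String property on the window and LoginURLField is a hypothetical field holding the login page address; WebConnection and URLField are the names used earlier in the thread.

```xojo
// Part 1: in WebConnection's ContentReceived event, after the login POST.
// Keep only the "name=value" pair and drop attributes such as Path or Expires.
Var rawCookie As String = Me.ResponseHeader("Set-Cookie")
mSessionCookie = rawCookie.NthField(";", 1).Trim

// Part 2: later, when requesting the protected file.
If mSessionCookie <> "" Then
  // Identify the session to the server.
  WebConnection.RequestHeader("Cookie") = mSessionCookie
End If
// Some sites also check where the request claims to come from.
WebConnection.RequestHeader("Referer") = LoginURLField.Text
WebConnection.Send("GET", URLField.Text)
```

If the server sends several cookies, all of them may need to be returned, joined with "; " in a single Cookie header. How multiple Set-Cookie headers are exposed through ResponseHeader is not confirmed here.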

Tim_Parnell · March 11, 2023, 4:12pm

To supplement what Greg mentions regarding accessing this data, websites that offer services usually offer a way to use that service programmatically. It’s called an API, and it exists to make interacting with the service easy for authenticated users.

I would suggest starting by looking for an API at whatever service you’re using. If you can’t find one, ask the website operator if they offer one (or if you’re even allowed to be programmatically taking the data you are). As a community, we’re happy to help people implement APIs when they get stuck and there are even Example Projects specifically for this already!

Jon_Fitch · March 11, 2023, 4:26pm

Not really the case here. It is simply an online file storage site, and the owner of the data is the one trying to access it. We just want to get the data off of it in an easy way, and into a Xojo project for some processing and analysis. The site provides a client to access files, but it requires a separate installation and several steps to get the data into the project. I’m trying to streamline that.