Using HTMLViewer to retrieve a text file

I am attempting to use the HTMLViewer to access a website and download a text file. I get as far as viewing the text file, but am unsure how to move it to a TextField or some other structure so that I can operate on it. I can manually select all, then copy and paste, but I cannot seem to do that programmatically from the HTMLViewer. I would also like to upload a text file to a website.

Can anybody point me to examples of how to do this? Or is it not possible? Or should I not be using the HTMLViewer at all?

Use URLConnection instead. HTMLViewer is a GUI control, whereas URLConnection is a class meant to retrieve data from or send data to web servers.
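A minimal sketch of that suggestion, not a definitive implementation. The URL and the TextArea named ResultArea are placeholders that are not from this thread, and the exception class is what I would expect a failed request to raise, so check it against the URLConnection documentation for your Xojo version. SendSync blocks until the server answers, which keeps a first test simple.

```xojo
// Sketch: fetch a text file synchronously and show it in a TextArea.
// The URL and the ResultArea control are placeholders for illustration.
Var connection As New URLConnection

Try
  // SendSync waits up to 30 seconds and returns the response body as a String
  Var content As String = connection.SendSync("GET", "https://example.com/url.txt", 30)
  ResultArea.Text = content
Catch e As NetworkException
  // Assumed to be raised when the request cannot be completed (no connection, timeout)
  MessageBox("Download failed: " + e.Message)
End Try
```

Once the text is in a String you can operate on it directly, with no select-all and copy step.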


Thanks, I will take a look at that!

When you get to the uploading part, you’ll need to tell us which method of uploading you’re attempting. Explore URLConnection for downloading your text file, and once you’ve got a handle on that, expand into uploading. Depending on what you’re doing, the processes could be different in meaningful ways.

I am looking at URLConnection; it seems to depend on passing an HTTP message to the webpage. In this case, the web page is just the rendered text I want to access. Obviously I am very new to this and the examples in the Xojo help are “brief”. Does the webserver need to cooperate in this transaction, or are there standard HTTP messages that would most likely be acknowledged? I tried a simple GET message, but clearly I don’t know what I am doing.

By responding, the webserver has cooperated in the transaction.

All of the technical stuff is part of the Xojo Framework so you don’t have to worry about it. If you want to download a text file, you get the contents of the text file.
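To illustrate "you get the contents of the text file", here is a rough sketch of the asynchronous route. It assumes a URLConnection placed on the window and named WebConnection, a TextField named URLField, and a TextArea named ResultArea; ResultArea is a made-up name, and the layout is only an assumption about the project.

```xojo
// ContentReceived event of the URLConnection on the window
// (called WebConnection here; ResultArea is a placeholder TextArea).
Sub ContentReceived(URL As String, HTTPStatus As Integer, content As String)
  If HTTPStatus = 200 Then
    // The body of the text file arrives as a plain String
    ResultArea.Text = content
  Else
    ResultArea.Text = "Server answered with status " + HTTPStatus.ToString
  End If
End Sub

// Elsewhere, e.g. in a button's Pressed event, start the request:
// WebConnection.Send("GET", URLField.Text)
```

Send returns immediately and the event fires when the response arrives, so the UI stays responsive during the download.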

There are three basic examples in Example Projects/Networking/URLConnection which should get you started downloading a text file and communicating with servers. There are more examples (that get into advanced things like communicating with a REST API) in Example Projects/Communication/Internet/Web Services.

You have to be more specific about what you’re getting stuck on to get answers. You haven’t described the problem.

OK, making progress now. It is a bit of the Dunning-Kruger effect: Asking the right questions presumes knowledge of the subject that I do not have.

The examples are what I was after, foolishly searching in the Documentation. Forgot about the examples in the IDE itself.

Now I am getting the wanted text (wrapped in some HTML), after I get by the login challenge. I tried the AuthenticationRequested event, but it is not being called. In a browser, the webpage brings up a login dialog box with (I think) some HTML method stuff. It looks like this:

Log In


Is there an easy way around this? If I add an HTMLViewer to the example app, and first go through the login, then the URLConnection fetches the data. Otherwise I get this.

hmm… the board is rendering the html rather than displaying it…

OK, I’ve replaced the angle brackets with parentheses so that it can be seen. It looks like they are expecting a POST with the names and values for login and password. Also the token, which changes every time the page loads; an anti-robot thing, I think.

So I tried this (with the correct login and password):

Var postData As String
postData = "csrfmiddlewaretoken=" + TokenField.Text
postData = postData + "&login=loginname&password=password"
postData = postData + "&next=/url.txt"

WebConnection.SetRequestContent(postData, "application/x-www-form-urlencoded")
WebConnection.Send("POST", URLField.Text)

This is accepted without a bunch of errors, but the result is exactly the same page with a new token. I think this is an HTML thing more than a Xojo thing, but I haven’t been able to get past it. If I log in by other means, it gets by this and I can download the file.

(form action="" method="post" id="login-form")(input type="hidden" name="csrfmiddlewaretoken" value="Fyzvtx1PL4eyJKWxICcmWbVisJdZvNyNaUq10tvkrE0OPUOVy5Zsu0a14WoGKc5Y")
(input type="text" name="login" placeholder="Email or Username" aria-label="Email or Username" title="Email or Username" value="" class="input name-input" /)(br /)
(input type="password" name="password" placeholder="Password" aria-label="Password" title="Password" value="" class="input passwd-input" autocomplete="off" /)

    (input type="hidden" name="next" value="/url.txt" /)
    
    (p class="error hide")(/p)
    

    (label class="checkbox-label remember")
        (input type="checkbox" name="remember_me" class="vam remember-input" /)
        (span class="vam")Remember me for 7 days (/span)
    (/label)

    (a href="/accounts/password/reset/" class="normal forgot-passwd")Forgot password?(/a)


    (button type="submit" class="submit btn btn-primary btn-block")Log In(/button)
(/form)
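Since the token changes on every page load, it has to be read from the login page fetched just before the POST. A hedged sketch of one way to do that with Xojo's RegEx class follows. ExtractCSRFToken and loginPageHTML are hypothetical names, and the pattern assumes the real page uses angle brackets and straight quotes with the name and value attributes in the order shown above.

```xojo
// Sketch: pull the csrfmiddlewaretoken value out of the login page HTML.
// loginPageHTML is the content String received from the GET of the login page.
Function ExtractCSRFToken(loginPageHTML As String) As String
  Var rx As New RegEx
  // Doubled quotes are literal quotes inside a Xojo string
  rx.SearchPattern = "name=""csrfmiddlewaretoken""\s+value=""([^""]+)"""

  Var match As RegExMatch = rx.Search(loginPageHTML)
  If match <> Nil Then
    // Subexpression 1 is the text captured by the parentheses
    Return match.SubExpressionString(1)
  End If

  Return "" // token not found
End Function
```

A regular expression is fragile against markup changes, but for a single known form it is usually enough.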

Keep in mind that the fact that the site has a login page may indicate that the site doesn’t want to be “scraped” in the way you are attempting. You may also need to get permission from the provider of the data to do so. It is not uncommon for the policy to be hidden down in the fine print of the EULA that you agreed to when you first accessed the site. I’ve seen this behavior in the past depending on how valuable the data is.

That said, the site probably provides a session “cookie” or “token” of some sort after the login is complete and you will have to capture that if you can, because you’ll need to send it back with every request after that to identify who you are.
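As a rough sketch of capturing and replaying such a cookie: SessionCookie is a hypothetical String property on the window, and WebConnection and URLField are the names used earlier in the thread. A server may send several Set-Cookie headers, and I am not certain how URLConnection exposes more than one, or whether it already stores cookies on its own on some platforms, so treat this as a starting point to verify. With a form like the one above, the hidden token usually has to be accompanied by a cookie that was issued with that same login page.

```xojo
// In the ContentReceived event of the URLConnection (Me is the connection).
// SessionCookie is a hypothetical String property on the window.
Var rawCookie As String = Me.ResponseHeader("Set-Cookie")
If rawCookie <> "" Then
  // Keep only "name=value"; drop attributes such as Path or Expires
  SessionCookie = rawCookie.NthField(";", 1).Trim
End If

// Later, before each further request, send the cookie back:
WebConnection.RequestHeader("Cookie") = SessionCookie
WebConnection.Send("GET", URLField.Text)
```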

It is also possible that requests to pages once you have logged in are required to come from the same site or referrer, which may prevent you from accessing this data altogether using a simple URLConnection.
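If the server does check the referrer, one thing to try is setting the Referer header yourself before posting. The csrfmiddlewaretoken field name suggests a Django site, and Django's CSRF check on HTTPS requests does look at the Referer, but that is an inference from the form, not something confirmed in this thread. The body below is a placeholder.

```xojo
// Sketch: send the login POST with a Referer naming the login page itself.
// postData is a placeholder body; URLField holds the login page URL.
Var postData As String = "login=loginname&password=password"

WebConnection.RequestHeader("Referer") = URLField.Text
WebConnection.SetRequestContent(postData, "application/x-www-form-urlencoded")
WebConnection.Send("POST", URLField.Text)
```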

To supplement what Greg mentions regarding accessing this data, websites that offer services usually offer a way to use that service programmatically. It’s called an API, and its purpose is to make interacting with the service easy for authenticated users.

I would suggest starting by looking for an API at whatever service you’re using. If you can’t find one, ask the website operator whether they offer one (or whether you’re even allowed to take the data programmatically). As a community, we’re happy to help people implement APIs when they get stuck, and there are even Example Projects specifically for this already!

Not really the case here: it is simply an online file storage site, and the owner of the data is the one trying to access it. We just want to get the data off of it in an easy way, and into a Xojo project for some processing and analysis. The site provides a client to access files, but it requires a separate installation and several steps to get the data into the project. I’m trying to streamline that.