I’m using an older version (REAL Studio 2011 r1).
The goal is to download image files from a site that requires a user name and password. The user name and password are submitted through a form on the site’s main page.
I was able to figure out how to log in by submitting the form with the HTTPSocket, but after logging in I am unsure how to proceed. I have used the HTTPSocket for this sort of thing before, but the method I used with other sites doesn’t work with this one. With the other sites, I know the path to the files on the web server in advance and can point directly to them using: http://thewebserver/thefolders/thefile. The user name and password are sent with each request.
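If the other sites accepted the credentials on every request, they were most likely using HTTP Basic authentication (an assumption; the post does not say which scheme they used). With Basic auth the client sends an `Authorization` header containing the base64-encoded `username:password` on each request, so the server needs no login state. A minimal sketch of that header value, in Python for brevity (on an HTTPSocket the same value would be set with `SetRequestHeader`):

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header (RFC 7617)."""
    # Basic auth sends "username:password", base64-encoded, with every request.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

A form-based login works differently: the credentials are posted once, and the server then has to recognize later requests some other way, which would explain why the direct-path method fails here.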
When I try this method with this site, even with a path I know to be valid, the server returns the login page instead.
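One likely explanation (an assumption, not confirmed for this site) is a session cookie: after a successful form login the server sends one or more `Set-Cookie` response headers, and every later request must send those values back in a `Cookie` header, or the server treats the request as logged out. As far as I know, the 2011 HTTPSocket does not manage cookies for you, so the values would have to be read from the login response headers (for example in the `PageHeaders` event) and sent again with `SetRequestHeader("Cookie", ...)`. The conversion itself is sketched below in Python; the cookie names in the test are hypothetical.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie_values: list[str]) -> str:
    """Build a Cookie request header from the server's Set-Cookie header values."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        # Parses "name=value; Path=/; HttpOnly", keeping attributes separate.
        jar.load(value)
    # Only the name=value pairs go back to the server; Path, HttpOnly, etc. are dropped.
    return "; ".join(f"{morsel.key}={morsel.value}" for morsel in jar.values())
```

For example, the headers `SESSIONID=abc123; Path=/; HttpOnly` and `lang=en; Path=/` become the single request header `Cookie: SESSIONID=abc123; lang=en`.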
I’m not sure it makes a difference, but on this site the path does not point directly to a file. Instead, the request is handled by a CGI script, using a URL something like this:
On this site, I can’t figure out how some of the values in this path are generated. I’m sure they aren’t random, but they look random to me. However, I can get these values by parsing the HTML of the pages as I navigate.
So, my idea was to:
1. Go to the standard URL that lists all of the links available that week.
2. Search the HTML of that page for the file name I need and grab the next URL, which points to where the file can be found.
3. Go to that file’s URL. That page contains the URL that actually initiates the download.
4. Parse that page for the download URL and fetch the file.
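Steps 2 and 4 both come down to "find a link in a page relative to some known text". A minimal sketch in Python using the standard library's `HTMLParser`, assuming (as step 2 describes) that the wanted link is the first `<a href>` appearing after the file name in the page; the file name, script names, and URLs in the test are hypothetical. Relative `href` values, which CGI links often are, are resolved against the page's own URL.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> tag that appears after the target text."""

    def __init__(self, target: str):
        super().__init__()
        self.target = target
        self.seen_target = False
        self.href: Optional[str] = None

    def handle_data(self, data):
        # Text between tags; note when the file name shows up.
        if self.target in data:
            self.seen_target = True

    def handle_starttag(self, tag, attrs):
        # Take only the first link after the target; attribute values arrive unescaped.
        if tag == "a" and self.seen_target and self.href is None:
            self.href = dict(attrs).get("href")


def next_link_after(html: str, target: str, page_url: str) -> Optional[str]:
    """Return the absolute URL of the first link after `target`, or None."""
    finder = NextLinkFinder(target)
    finder.feed(html)
    finder.close()
    return urljoin(page_url, finder.href) if finder.href else None
```

The same function could serve step 4 by passing whatever text sits just before the download link on the file's page; if the file name is itself the link text, the parser would need to track the current `<a>` instead.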
Now that I know how to submit the login form, how do I then “navigate” to, and retrieve, the pages I need to parse?
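If the session-cookie assumption holds, "navigating" is just a series of ordinary GET requests that all carry the cookie set at login. A minimal sketch in Python, where `urllib.request.HTTPCookieProcessor` stores cookies from every response in a `CookieJar` and resends them automatically; with an HTTPSocket the same effect requires copying the cookie values by hand into each request. The login URL and form field names are hypothetical and should be taken from the actual form's `action` and `name` attributes.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies from responses and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login(opener, login_url: str, fields: dict) -> bytes:
    """POST the login form; any session cookie the server sets lands in the jar."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    with opener.open(login_url, data=body) as response:
        return response.read()


def fetch(opener, url: str) -> bytes:
    """GET a page with the same opener, so stored cookies are sent automatically."""
    with opener.open(url) as response:
        return response.read()
```

Usage would be `login(opener, "https://thewebserver/login.cgi", {"username": "...", "password": "..."})` once, then `fetch(opener, url)` for each page in the four steps (all names hypothetical). libcurl, which CURLMBS wraps, has a comparable built-in cookie engine enabled through its `CURLOPT_COOKIEFILE` option; the matching CURLMBS property name should be checked in its documentation.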
I think I could do this with an HTMLViewer, but this process needs to run unattended, and the HTMLViewer generates too many pop-up script errors.
I also have CURLMBS, if that would be a better route. I didn’t get very far with it on this site, although I have used it with other sites without any problems.
Thanks for any help.