HTTPSocket with a redirected URL

Emile_Schwarz · January 9, 2017, 2:15pm

I am at an advanced stage of a prototype method that gets files from the internet.

Everything is (now) OK, except for some days (and some years, the older ones).

In a web browser, if the target date (the HTML URL ends with /1997/12/07/) does not exist on the web server, a redirection is done to today's page.
Today's page, as I write this, is /2017/01/09/.

I tried to check the HTTPStatusCode (below), but it does not work as expected: I get today's data.

If mySocket.HTTPStatusCode <> 308 Then

I tried 301, 302, 307 and 308 (in the If code above) with the same bad result: the code in the If block is executed (I have an empty Else for the “Nothing for me here” part).

BTW: the redirect done in the web browser is “silent”: the user (me, for now) only sees today's contents, but the page URL is still the one that was requested.

I found Wikipedia's List of HTTP status codes and tried the four that seemed to apply to my case.

Help is welcome.

Of course, I can live with the unwanted files, but this fills my hard disk with 52 / 53 (x 3) unwanted files and slows down the whole process. I will have 29 years to process across multiple base URLs. And I do not count some other things I have to do manually (because I cannot read the image contents to know what # the image is tagged with, when there is a tag). Complex to understand, but not the code AFAIK.

PS: the idea of this method took me around 16 years! Writing the method took me less than one hour (all included).

Greg_O_Lone · January 9, 2017, 2:49pm

I’m a little confused. You say this:

but then you say that you’re confused that you’re getting Today’s data?

Anyway… How you deal with this depends largely on whether you are using the Classic or New Framework. New Framework sockets take care of the redirection for you, whereas the Classic Framework does not. Are you using a plain HTTPSocket or is it a Xojo.Net.HTTPSocket?
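To picture what "not taking care of the redirection" means: a socket that does not follow redirects hands the caller the 3xx status code plus a Location header, and the caller decides what to do next. Below is a minimal sketch of that check, written in Python rather than Xojo. The helper name `redirect_target` and the sample values are hypothetical and are not part of any Xojo API.

```python
from typing import Mapping, Optional

# Status codes that announce a redirect to another URL.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the Location URL if the response is a redirect, else None."""
    if status not in REDIRECT_CODES:
        return None
    # Header names are case-insensitive, so compare them in lower case.
    for name, value in headers.items():
        if name.lower() == "location":
            return value
    return None
```

If this returns None for a response that still shows today's content, the server did not redirect at all; it answered the original URL directly.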

Marius_Dieter_Noetzel · January 9, 2017, 3:22pm

Perhaps there is no redirect? Could it be that the web server directly delivers today's page?
You could test that easily by opening an invalid URL (date) in your browser. After today's page is delivered, which URL is shown in the browser?

Marius

Emile_Schwarz · January 9, 2017, 3:23pm

Hi Greg,

Thank you for your answer.
I use Classic HTTPSocket (my license stops with Xojo 2015r1).

I will try to rephrase using examples:

http://www.emile.com/images/1999/01/01/
http://www.emile.com/images/1999/01/02/
http://www.emile.com/images/1999/01/03/
http://www.emile.com/images/1999/01/04/
http://www.emile.com/images/1999/01/05/
http://www.emile.com/images/1999/01/06/
http://www.emile.com/images/1999/01/07/

January 3rd, 1999 is a Sunday, and in that year there is nothing defined for this URL. The web server returns this image:

http://www.emile.com/images/2017/01/09/

When I pass http://www.emile.com/images/1999/01/03/ to a web browser, I get today's image. That is why I searched for the HTTP status codes. Of course, checking for 404 did not work.

My method returns one file for each URL sent, including the Sunday one, but for that one the image is today's image (the web server does that by itself, or is configured by its owner to do that; I am not the web owner).

That is why I added the If block If mySocket.HTTPStatusCode <> 308 Then.

I hope this is clear now.

The idea is to automate the download process instead of doing it manually (that is what computers were created for): it is a boring and very time-consuming process.

shao_sean · January 9, 2017, 4:06pm

Sounds like the server is handling the 404, so your client will not see it…

Marius_Dieter_Noetzel · January 9, 2017, 4:19pm

As I said, it sounds like the server is directly delivering today's page, perhaps with 404 or 200. You should check which status code you are getting. If it is 404, you can handle that. If it is 200, you could first load today's page, save the image, compute its MD5 hash or something like that, and check that against all the other pages.
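The hash comparison could look roughly like this. It is a sketch in Python, not Xojo, and it assumes the server returns byte-identical data whenever it falls back to today's image. The names `md5_hex` and `is_fallback` are hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of the downloaded bytes."""
    return hashlib.md5(data).hexdigest()


def is_fallback(candidate: bytes, today_digest: str) -> bool:
    """True if the downloaded image is byte-identical to today's image."""
    return md5_hex(candidate) == today_digest
```

The idea is to download today's image once, keep its digest, and skip saving any later download for which `is_fallback` returns true. MD5 is used here only to spot identical files, not for security.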

Emile_Schwarz · January 9, 2017, 4:21pm

Thank you Marius for your answers.

The original link. There is no visible feedback at all (no flickering, nothing). To understand what happens, you have to read the image date in the page (if you do not know what you are supposed to get).

Marius: you may be right. I did not think of that, but something is done, because a 404 File Not Found error is not returned. That was my first check. Redirection was the only thing I was thinking of.

Note: maybe the web server does not send anything to the web browser but a page (the requested one or the default one), leaving HTTPStatusCode empty.

PS: I have not tested 200 (success?).

That is an idea to explore.

Marius_Dieter_Noetzel · January 9, 2017, 4:24pm

There would be a script on the server that returns images. If there is no image, the script returns today's image.
So the request is handled and 200 could be sent.

Emile_Schwarz · January 9, 2017, 4:36pm

BTW: I start by asking for an HTML file, then parse it, and then ask for the image. That certainly changes nothing for the code.

It is possible that (quoting Marius Dieter Noetzel) “there would be a script on the server that returns a web page - the current day one - that holds today's data.”
Yes.

Marius_Dieter_Noetzel · January 9, 2017, 4:39pm

Perhaps the file URL is not script-based. In that case you could check whether you have already downloaded the current URL and skip the actual task…

Emile_Schwarz · January 15, 2017, 9:41am

In fact, when I get error code 302 and read “302 Found” in Wikipedia, I cannot understand it.

The error lies either in my understanding or in Wikipedia's code message.

3xx Redirection

302 Found <-- My understanding was: the server found the file !

I changed that in my newly created dictionary I use to report an error as:

  Err_Codes.Value(302) = "Redirection: Found"
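The same kind of lookup, with the category prefixed to the reason phrase, can be sketched in Python (not Xojo) using the standard `http.HTTPStatus` enum. The function name `describe` and the message format are only illustrative assumptions.

```python
from http import HTTPStatus

# First digit of the status code -> category name.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(status: int) -> str:
    """Build a log message such as '302 Redirection: Found'."""
    try:
        phrase = HTTPStatus(status).phrase
    except ValueError:
        # Code not known to the standard library.
        return f"{status} Unknown"
    return f"{status} {CATEGORIES[status // 100]}: {phrase}"
```

Putting the category first makes it clear that “Found” belongs to the redirection family and does not mean the requested file was delivered.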

I passed the method the URL of a current HTML page that I know does not exist (yet), and it correctly reported the error.

Since Jan 9, I have greatly expanded the method, and now I no longer download HTML in place of an image (I test for “Image” in the Content-Type), etc.
Now I also report the errors in a log, Log - Error (with the file name, the error code and message). Log - Download holds the list of downloaded items (images) with their properties and InternetHeaders.
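A Content-Type test of this kind might look like the following. This is a Python sketch, not the actual Xojo method, and the function name `is_image` is hypothetical.

```python
def is_image(content_type: str) -> bool:
    """True if a Content-Type header value names an image media type."""
    # Drop parameters such as '; charset=utf-8' and ignore letter case.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type.startswith("image/")
```

A download whose header fails this test (for example text/html) would go to the error log instead of being saved as an image.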

FWIW

Marius_Dieter_Noetzel · January 15, 2017, 11:47am

It is really easy. The 3xx status codes handle redirection. At first there were only 301 and 302.
301 is something like “it is not here; in the future you have to go there”.
302 is “it should be here but isn’t. Ohh, wait, I found it for you. In the future it should be here again, so don’t save that new URL”.

Emile_Schwarz · January 15, 2017, 4:26pm

Yes, Marius, that is what I understood earlier today ;).

Today's context allowed me to really understand where I was wrong.

Thanks all.

Greg_O_Lone · January 15, 2017, 11:04pm

You may find it easier to understand if you think of these as status codes instead of error codes.

Emile_Schwarz · January 16, 2017, 4:30pm

A brain exchange with a far younger one may be better.