because chrome automatically follows the redirects
an HTTPSocket is NOT a web browser and does NOT automatically follow redirects etc
it grabs whatever the server replies with from that URL and then YOU need to read that response and act accordingly - like a web browser would
[quote=455569:@Tomas Jakobs]which is absolutely okay. the HTTP status code 301 is important. if it is 301 then the body may be empty. it’s defined in RFC 2616
should… not must…[/quote]
well must, but trust me some servers don’t have it… this one has it, as the browser can follow it.
I think this has something to do with the server requiring either cookies or some other data?
This exact url “https://www.nytimes.com/es/” works 9/10 times in postman (if follow redirects = set to off)
so it should do the same with xojo.
Dim URL as string = "https://www.nytimes.com/es/"
Socket1.SetRequestHeader( "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36" )
Dim pageData As String
pageData = Socket1.Get(url, 10)
If you set a breakpoint @ pageData = Socket1.Get(url, 10)
you won’t see the data in the debugger until you step over it. so he might not see data while it’s actually there.
Dim URL As String = "https://www.nytimes.com/es/"
Dim s1 As New URLConnection
s1.RequestHeader("User-Agent") = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
Dim pageData As String
pageData = s1.SendSync("GET", URL, 10)
Break // inspect pageData and s1.HTTPStatusCode in the debugger
If s1.HTTPStatusCode = 301 Then
  If s1.ResponseHeader("Location") <> "" Then
    // Follow the redirect by hand, using the second connection
    Dim NewURL As String = s1.ResponseHeader("Location")
    Dim s2 As New URLConnection
    s2.RequestHeader("User-Agent") = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    Dim p2dat As String
    p2dat = s2.SendSync("GET", NewURL, 10)
    Break // inspect p2dat
  End If
End If
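The snippet above handles a single 301 hop. A browser also follows 302, 303, 307 and 308 and keeps going until it gets a non-redirect response, so here is a rough sketch of that loop. FetchFollowingRedirects and MaxHops are made-up names for illustration, the hop limit of 5 is arbitrary, and it assumes the Location header holds an absolute URL. Error handling is left out.

```xojo
// Sketch only: follow redirects by hand, up to MaxHops times.
// FetchFollowingRedirects and MaxHops are hypothetical names, not part of Xojo.
Function FetchFollowingRedirects(startURL As String) As String
  Const MaxHops = 5 // assumed limit, guards against redirect loops
  Dim currentURL As String = startURL
  Dim body As String

  For hop As Integer = 0 To MaxHops
    Dim conn As New URLConnection
    conn.RequestHeader("User-Agent") = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    body = conn.SendSync("GET", currentURL, 10)

    Select Case conn.HTTPStatusCode
    Case 301, 302, 303, 307, 308
      Dim location As String = conn.ResponseHeader("Location")
      // A redirect without a Location header gives us nothing to follow
      If location = "" Then Return body
      // Assumption: location is absolute. A relative value would have to be
      // resolved against currentURL before the next request.
      currentURL = location
    Case Else
      Return body // not a redirect, so this is the final response
    End Select
  Next

  Return body // still redirecting after MaxHops, return the last body as-is
End Function
```

Calling it would look like `pageData = FetchFollowingRedirects("https://www.nytimes.com/es/")`. If the server is filtering on something other than the User-Agent, this will still come back empty.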
FWIW, it’s not only the NYTimes… I’ve tried a couple of websites in different parts of the world, with the same result… (yeah, they are all newspapers and banking sites, so they probably care about intellectual property…) But Chrome can redirect…
[quote=455563:@Norman Palardy]because chrome automatically follows the redirects
an HTTPSocket is NOT a web browser and does NOT automatically follow redirects etc
it grabs whatever the server replies with from that URL and then YOU need to read that response and act accordingly - like a web browser would[/quote]
Yes. They do that on purpose. Website owners only have to respond if they want to, so many websites are designed to check for the facets of a browser. User-Agent is one. There are other ways.
You’re asking us to help you circumvent a specific server by guessing at its requirements.
@Norman Palardy : Since we’re repeating ourselves I’ll do it as well : I understand, Norman… but the reply is empty…
I am not comparing Xojo with Chrome. I am just trying to understand the logic, and where the data for the redirection comes from…
ok, Tim. This helps me understand better … “so many websites are designed to check for the facets of a browser. User-Agent is one. There are other ways.”
headers that indicate if things have moved, can’t be found, have been redirected etc
a properly written web client, like chrome, will know what to do with these and behave properly
the content - or “the reply”
for certain requests there will be headers but no reply
for some there will be headers and a reply
a properly written client, like chrome, knows the HTTP protocol and deals with all this accordingly
for YOUR application to do what it wants you will need to implement all that as well
and you may still get empty replies & headers because, as Tim notes, the NYT may in fact be determining that this “client” is just someone trying to scrape data because it doesn’t send the right headers, doesn’t support capability examination so the web site can tell if you are indeed a legitimate web client, or some other technique to stop scraping etc
without you properly determining what it is the NYT is looking for you’ll probably make no headway
And until your code replies in the way the NYT expects you’ll make no headway
I’d start by reading the HTTP protocol docs and implementing as much of that as you need
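To see the headers-versus-reply split described above for yourself, something like this quick check might help. It is only a sketch: it logs the status code, the Location header and the body size, so an empty body on a 301 is visible next to the header that says where to go. The URL and User-Agent are the ones from earlier in the thread, and the variable names are made up.

```xojo
// Sketch only: log what the server sent back, headers and body separately.
Dim conn As New URLConnection
conn.RequestHeader("User-Agent") = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
Dim body As String = conn.SendSync("GET", "https://www.nytimes.com/es/", 10)

// Status and headers: these can be present even when the body is empty
System.DebugLog("Status: " + Str(conn.HTTPStatusCode))
System.DebugLog("Location: " + conn.ResponseHeader("Location"))
System.DebugLog("Set-Cookie: " + conn.ResponseHeader("Set-Cookie"))

// The content, or "the reply": zero bytes is legitimate for a redirect
System.DebugLog("Body bytes: " + Str(LenB(body)))
```

Comparing that output with what Postman shows for the same request (with follow redirects off) should show which headers differ.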