I’m writing a small app that uses Wget.exe to pull html files from our library website. There are about 20,000 pages.
I have Wget.exe as a FolderItem and use launch(parameters) to get the html files (one at a time). Wget.exe opens in a command window, gets the file, writes it to disk, then closes the command window.
Problem:
Sometimes it takes a fraction of a second to get the file, sometimes it takes 15 seconds. So my question is: is there a way of just waiting until the command window is closed?
I don’t know that this will work either but it’s an easy fix if it works
(not the code, just the gist)
Do
Check to see if the file exists yet
read new html file
If it contains </html> then loopControl = True
Loop until loopControl = True
Any thoughts would be appreciated. I don’t need help with the above code, just want to know if that will work or if there’s a better technique. i.e. monitor the command window. Sleeping the thread for 20 seconds on every pull doesn’t make a lot of sense to me.
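For illustration only, here is a minimal sketch of the "wait until the downloader has exited" idea. It is written in Python rather than Xojo, and `fetch_and_wait` and its parameters are hypothetical names, not anything from the thread. The point is that a blocking process call returns only after the child process has finished, so neither a fixed sleep nor a polling loop is needed.

```python
import subprocess
from pathlib import Path


def fetch_and_wait(command, output_path, timeout_s=60):
    """Run a downloader command, block until it exits, then return the file text.

    `command` is an argument list such as ["wget", "-q", "-O", "page.html", url].
    """
    # subprocess.run() returns only after the child process has exited.
    # check=True raises CalledProcessError on a non-zero exit code, and
    # timeout raises TimeoutExpired if the download hangs.
    subprocess.run(command, check=True, timeout=timeout_s)
    # The process is finished, so the file is fully written by now.
    return Path(output_path).read_text(encoding="utf-8", errors="replace")
```

A call might look like `fetch_and_wait(["wget", "-q", "-O", "page.html", "https://example.org/"], "page.html")`, where the URL is a placeholder.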
TIA
Instead, use a Timer set to Multiple mode. Basically the same code: if the file contains </html>, stop the timer and return, then call a method containing the code you have after the loop…
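A rough sketch of that timer approach, in Python rather than Xojo and under my own assumptions: `page_is_complete`, `poll_until_complete`, the 0.25 second interval and the check limit are all hypothetical. Each tick checks the file and either hands off to a callback or re-arms the timer, so nothing blocks while waiting.

```python
import threading
from pathlib import Path


def page_is_complete(path):
    """True once the file exists and contains a closing </html> tag."""
    p = Path(path)
    if not p.exists():
        return False
    text = p.read_text(encoding="utf-8", errors="replace")
    return "</html>" in text.lower()


def poll_until_complete(path, on_done, interval_s=0.25, max_checks=80):
    """Check `path` on a repeating timer; call on_done(path) when it is complete.

    If the page never completes within max_checks ticks, on_done(None) is called.
    """
    def tick(remaining):
        if page_is_complete(path):
            on_done(path)          # continue with the "code after the loop"
        elif remaining > 0:
            # Re-arm the timer for the next check instead of looping or sleeping.
            timer = threading.Timer(interval_s, tick, args=(remaining - 1,))
            timer.daemon = True
            timer.start()
        else:
            on_done(None)          # gave up waiting

    tick(max_checks)
```

The check limit matters: without it, a download that fails outright would leave the timer firing forever.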
Couldn’t get the Secure socket to work properly. I’d download 10 pages then I’d get about 20 blank ones, then I’d get barred from the site for 30 mins.
Michael said last year: https://forum.xojo.com/28253-getting-the-html but as you see I didn’t get an answer to synchronous vs. asynchronous so I went to wget.exe
To pause code in general, one would not use a Timer but threads and semaphores. The thread, when it knows it has to wait, would put itself dormant by suspending itself using the semaphore, and then, when the signal appears that tells the app that new data has arrived and the paused code should continue, it would wake it thru the semaphore again.
That way, the paused code doesn’t have to be split up into multiple methods and doesn’t need to remember state in variables outside of that code.
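To make the thread-and-semaphore pattern concrete, here is a small sketch in Python rather than Xojo. `start_worker`, `inbox`, and the use of `None` as a stop signal are assumptions made for the example. The worker sleeps inside `acquire()` until another part of the program calls `release()`, and its counter lives in a local variable across every wait.

```python
import threading
from collections import deque


def start_worker(inbox, data_ready, results):
    """Start a thread that sleeps on `data_ready` until an item is available."""
    def run():
        count = 0                      # state survives across waits as a local
        while True:
            data_ready.acquire()       # dormant here until someone signals
            item = inbox.popleft()
            if item is None:           # assumed convention: None means "stop"
                break
            count += 1
            results.append((count, item))

    thread = threading.Thread(target=run)
    thread.start()
    return thread


if __name__ == "__main__":
    inbox, results = deque(), []
    data_ready = threading.Semaphore(0)    # starts at 0, so acquire() blocks
    worker = start_worker(inbox, data_ready, results)
    for name in ["a.html", "b.html", None]:
        inbox.append(name)                 # new data has arrived...
        data_ready.release()               # ...so wake the worker
    worker.join()
    print(results)
```

Note that the semaphore does not watch a file itself. Some other code has to notice that the data arrived and then signal it.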
I think you need to set the user agent of your socket. If it works with a browser and wget in quick succession, but fails via the socket, it’s probably because your socket looks like a bot (which it is), while wget does not. This is less about async vs sync than presenting yourself as a proper browser to the website.
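As a sketch of what setting the user agent looks like (Python standard library here, not the Xojo socket classes; `build_request` and the example URL are hypothetical), the header is simply attached to the request before it is sent. The string below is just one browser-style value and may not be what any particular site expects.

```python
import urllib.request

# Example browser-style User-Agent string; substitute whatever the site accepts.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request("https://example.org/")   # placeholder URL
    with urllib.request.urlopen(req, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    print(len(html))
```

A different user agent will not make a plain socket execute JavaScript, so pages that are built client-side may still come back incomplete.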
[quote=260375:@Thomas Tempelmann]To pause code in general, one would not use a Timer but threads and semaphores. The thread, when it knows it has to wait, would put itself dormant by suspending itself using the semaphore, and then, when the signal appears that tells the app that new data has arrived and the paused code should continue, it would wake it thru the semaphore again.
That way, the paused code doesn’t have to be split up into multiple methods and doesn’t need to remember state in variables outside of that code.[/quote]
That’s what I was hoping would be the answer as I saw Mutex and Semaphore in the LR but it really doesn’t do a very good job of explaining it. It was what gave me the idea of the Do Loop. To me the Semaphore in the LR is not even monitoring a file.
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246
Does that look right? I didn’t write it, I just found it on UserAgentString.com and clicked on the Edge Browser as I have no issue looking at the library on Edge. I’m looking for the pages that are javascript enabled. I see from last year’s pull a lot of the pages I posted for the students said “We’re sorry, some parts of the RyeLib website don’t work properly without JavaScript enabled.”
Now the LR is a bit confusing. Does the code (assuming this is correct)