Google TOS aside, and leaving aside the pretension of selling engineering that belongs to someone else, I find the concept intriguing. He pulled off driving a web site from a program that sends events, character strings and maybe clicks, which produce a result meant for human eyes, and repackaging that result into a text field.
There are uses for web robots: fetching information that is not necessarily accessible through APIs.
I want to explore the concept. For now, I will experiment with HTTPSocket, which is meant to send and receive HTTP.
OK. I started work on the web robot. I use an HTTPSocket to open the page, and an HTMLViewer to see the result of the incoming HTML. At first glance, translate.google.com seems like it can be automated with keystrokes.
The field to translate has focus by default, so a paste seems possible. Then Tab jumps between the pop-up menus, and Return clicks “Translate”.
I now have to figure out how to feed the keystrokes through the HTTPSocket.
Don’t use HTMLViewer for website automation. It’s far too limited in Xojo, and not even necessary for what you want to do. You have no access to the cookies or to the DOM of the website, and all you can do is execute JavaScript.
I use HTTPSocket for all website automation.
Install Fiddler, sniff the traffic of the website you want to automate and start coding it with HTTPSocket.
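The sniff-then-replay workflow above boils down to: capture the request the browser sends, then rebuild the same request in code. Here is a minimal sketch in Python for illustration (the thread’s tool is Xojo’s HTTPSocket, but the HTTP mechanics are identical); the URL and header values are placeholders for whatever the sniffer actually shows:

```python
import urllib.request

# Headers copied from what the sniffer (e.g. Fiddler) showed the browser
# sending. These values are placeholders; use the ones captured from the
# real session, including any session cookie.
sniffed_headers = {
    "User-Agent": "Mozilla/5.0 (compatible; MyBot/1.0)",
    "Accept": "text/html",
    "Cookie": "session=PLACEHOLDER",
}

req = urllib.request.Request("http://example.com/page", headers=sniffed_headers)

# Inspect what would go on the wire (urllib normalizes header names).
for name, value in req.header_items():
    print(f"{name}: {value}")
# urllib.request.urlopen(req) would then send it; omitted here to stay offline.
```

The point is that the bot never “uses the website” the way a human does; it only reproduces the raw requests the browser would have produced.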
Ask any specific questions you may have for web automation with HTTPSocket, and I will try to answer them!
Use HTTPSocket or HTTPSecureSocket to fetch the data. If the site has an API, use the API: APIs don’t change that much, and they are designed to be called by “bots” or programs. If there is no API, you can fetch the pages, then parse the source. The problem with the latter is that when the marketing dept changes the page, your program breaks. Some companies change their website layout (even if it is only under the hood) precisely to break page scraping.
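The fetch-then-parse approach can be sketched like this, in Python for illustration (the thread’s code is Xojo, but the parsing idea is the same); the page content here is invented and stands in for whatever the socket’s Get would return:

```python
from html.parser import HTMLParser

# A tiny screen-scraping parse: pull the <title> out of a fetched page.
# This sample HTML stands in for the string HTTPSocket.Get would return.
page = "<html><head><title>Exchange Rates</title></head><body>...</body></html>"

class TitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

p = TitleParser()
p.feed(page)
print(p.title)  # the scraped value; breaks the day the page layout changes
```

Note how the parser is welded to the page structure: exactly the fragility described above when the marketing department redesigns the site.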
Not sure if this will be of any help or not, but I’ve done some web bot work using cURL’s command line interface, SHELL and RealStudio. This is just a preference thing, but I much preferred working with cURL than HTTPSocket.
[quote]Not sure if this will be of any help or not, but I’ve done some web bot work using cURL’s command line interface, SHELL and RealStudio. This is just a preference thing, but I much preferred working with cURL than HTTPSocket.
Ben[/quote]
I started with the tool I am most comfortable with, but I am open to any solution that works. Basically, I want to see if it can be done.
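For the cURL-over-SHELL route mentioned above, the bot only needs to build a command line and hand it to a shell. A sketch in Python for illustration (in Xojo the string would go to a Shell object); the URL and form fields are placeholders, not a real endpoint, and the command is printed instead of executed:

```python
import shlex

# The cURL approach: build the command line a browser-equivalent POST needs.
# URL and field names are placeholders for the real form's values.
url = "http://example.com/translate"
fields = {"sl": "en", "tl": "fr", "text": "hello world"}

cmd = ["curl", "-s", url]
for name, value in fields.items():
    # --data-urlencode lets curl handle the percent-encoding of each field
    cmd += ["--data-urlencode", f"{name}={value}"]

# Dry run: show the command instead of spawning it.
print(shlex.join(cmd))
```

Passing the whole transaction to cURL sidesteps HTTPSocket entirely, at the cost of depending on an external binary.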
[quote=76898:@Alain Bailleul]I’ve ripped some webpages with AutoIt (Windows only I think) in the past. Very powerful scripting stuff. Maybe mac has a similar thing?
[/quote]
Actually, I would love to build such a tool. It may not be awfully powerful, but a small scripting language and some parsing of the results. Back in the eighties, I had devised a bot in QuickBasic for the French Minitel, and I have some ideas about what can be involved. For some reason, I thought HTTP transactions were more complicated than line-oriented terminals.
[quote=76898:@Alain Bailleul]The result I got back from the TCPSocket looked surprisingly easy. A simple parse gets the translation.
[/quote]
I see. Now it makes the packaged $40 class even less justified.
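That “simple parse” usually amounts to grabbing the text between two fixed markers in the raw response. A sketch in Python for illustration; the response string and the markers here are invented, and the real ones must be read from the actual page source:

```python
# Once the raw response is in a string, the wanted value often sits between
# two fixed markers. This response text is invented for the example.
response = '<div id="result"><span class="trans">bonjour le monde</span></div>'

start_marker = '<span class="trans">'
end_marker = "</span>"

start = response.find(start_marker) + len(start_marker)
end = response.find(end_marker, start)
translation = response[start:end]
print(translation)
```

A few lines of string searching, which is why a packaged class for this is hard to justify.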
QuickBasic! This brings back memories… As a young teenager I wrote my first memory optimization tool in QuickBasic so I could get the extra 5 KB I needed to play Police Quest 4 on my x86.
OK. Now I have a question: with HTMLViewer1.LoadPage(http.Get(url, 30), f) I get a page. Now, to simulate a user, I need to send keys to the server. It is not a URL. How do I do that?
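The answer implicit in the earlier replies: you don’t send keystrokes at all. When a user types into the field and presses Return, the browser turns that into a form POST, and the bot simply sends the same POST itself. A sketch in Python for illustration (in Xojo, HTTPSocket’s POST facilities play this role); the URL and field names are placeholders for whatever the real form uses, and nothing is actually sent:

```python
import urllib.parse
import urllib.request

# Typing into the field and pressing Return makes the browser submit a form.
# To "send keys" over HTTP you send that same POST yourself.
# Field names and URL are placeholders for the real form's values.
form = {"text": "hello world", "sl": "en", "tl": "fr"}
body = urllib.parse.urlencode(form).encode("ascii")

req = urllib.request.Request(
    "http://example.com/translate",
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(req.get_method())  # a Request carrying data defaults to POST
print(body.decode())     # the encoded "keystrokes" as the server sees them
```

The field names come from the page source (or from a sniffer like Fiddler, as suggested above): look at the form’s `<input name="…">` attributes.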
I’ll say you can most definitely make a bot to walk a set of pages that has little dynamic content (since with an HTTPSocket there’s no nice way to execute on-page JavaScript that may modify the DOM, compute URLs, etc.).
I’ve done it using just the HTTPSocket, for producing the internal documentation set from a private copy of the wiki (so we DON’T scrape the online one and cause performance issues).
But those pages have no dynamic content that is created or accessed using JavaScript; the URLs for images and references to other pages are all static tags in the HTML pages.
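For such static pages, a walking bot only has to collect the URLs sitting right in the markup. A sketch in Python for illustration (the wiki-walking described above was done in Xojo); the sample page and file names are invented:

```python
from html.parser import HTMLParser

# On static pages every link and image URL is literally in the markup,
# so a page-walking bot just harvests href/src attributes.
# This sample page is invented for the example.
page = (
    '<html><body>'
    '<a href="Page_One.html">one</a>'
    '<a href="Page_Two.html">two</a>'
    '<img src="diagram.png">'
    '</body></html>'
)

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in (("a", "href"), ("img", "src")):
                self.urls.append(value)

c = LinkCollector()
c.feed(page)
print(c.urls)
```

Each harvested URL is then fetched in turn, which is all a static-site crawler needs; the moment URLs are computed by JavaScript, this approach stops working.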
Most of the Sierra quests were awesome. Actually downloaded Hero’s Quest I (my favorite series of theirs) a few months ago and played it in DosBOX. Probably got most of my typing skills from these typing quests.