I’m looking to scrape some public websites, reformat the data, save it in a database, and then use it with other information on a website. I also want to create an iOS app that lets users input their data and get results. Well, that’s the plan.
I know that Xojo can help with most of it. I’m wondering whether Xojo is a solution for the web scrapping, or whether I should use something else.
a) You do the scraping at your own risk. Don’t try to sell an app with data that you don’t own. That’s why there are APIs which usually cost money.
b) Sockets, regex: everything you need is there. Oh scratch that, you want to do an iOS app. Those have sockets but no regex I think.
Leaving aside the legal issues for the moment, have you checked that the source code of the page actually contains the data you need? These days most web pages no longer contain the data but download it on the fly, so there is nothing to scrape. And those that don’t most likely will in the not too distant future.
I would not build this kind of app on such a weak foundation.
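A quick way to run that check is to download the raw source and look for a value you can see in the browser. This is only a minimal Python sketch: the function names and the User-Agent string are made up for illustration, and the URL is whatever page you are testing.

```python
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the raw page source, i.e. what a scraper sees before any JavaScript runs."""
    # The User-Agent value is a placeholder; use something that identifies you.
    request = urllib.request.Request(url, headers={"User-Agent": "label-check/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def appears_in_source(html: str, expected_text: str) -> bool:
    """True if text that is visible in the browser is also present in the raw source."""
    return expected_text.casefold() in html.casefold()
```

If `appears_in_source(fetch_html(url), "some ingredient you can see on the page")` comes back False, the page is probably filling in its data with JavaScript and a plain socket download will not be enough.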
I’ve done web scraping for my own personal use, but would never consider doing it for any kind of application that I would ever distribute, both for the legal ramifications already mentioned, and also for the fact that the target site’s HTML can and will change without warning, suddenly leaving you with a non-functional application until you figure out how to parse the new code.
[Also note, the correct term is “scraping”, not “scrapping”. They’re two different words that mean two totally different things.]
I also did web scraping with Xojo, though I found that a lot of trial and error is involved. Methods working on website A don’t work on website B with the same HTML code. With Xojo it is easy to create a basic setup, a socket, some parsing methods and off you go.
I scraped about 40,000 pages in a couple of hours, saving the parsed data in a SQLite database. Another approach is RPA, which is considered a tool for the ‘non programmer’, but I found it very complex and not easy to get stable results.
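For anyone who wants to see the shape of that setup (a parsing method plus a SQLite table), here is a rough sketch in Python rather than Xojo, using only the standard library. The markup it looks for (`<li class="ingredient">`), the table name, and the function names are all assumptions for illustration; a real site will need its own parsing rules.

```python
import sqlite3
from html.parser import HTMLParser


class IngredientParser(HTMLParser):
    """Collects the text of every <li class="ingredient"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.ingredients = []
        self._inside = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # An attribute without a value is reported as None, hence the "or".
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "ingredient" in classes:
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._inside:
            text = "".join(self._buffer).strip()
            if text:
                self.ingredients.append(text)
            self._inside = False


def parse_ingredients(html):
    """Return the ingredient names found in one page of HTML."""
    parser = IngredientParser()
    parser.feed(html)
    parser.close()
    return parser.ingredients


def save_ingredients(conn, product, ingredients):
    """Store the parsed names; re-scraping the same page does not create duplicates."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS ingredient ("
            "product TEXT NOT NULL, name TEXT NOT NULL, UNIQUE(product, name))"
        )
        conn.executemany(
            "INSERT OR IGNORE INTO ingredient (product, name) VALUES (?, ?)",
            [(product, name) for name in ingredients],
        )
```

The `UNIQUE` constraint together with `INSERT OR IGNORE` is one way to make repeated runs safe while you are still in the trial-and-error phase.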
Web scraping for an app is a recipe for unwanted litigation.
Just to clarify the scraping for an app…
I completely agree with all the warnings. What I am collecting are the ingredients of various products: the same information you can get by reading a product label. It would be used to decode a product name into its various ingredients.
I do very much appreciate the input. I intend to use Python for the scraping. The value-add information is mine and proprietary, and the database will be handled by Xojo and SQLite. At least that’s the plan, if I can figure out the database add/edit/delete in Xojo.
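On the add/edit/delete part: the SQL itself is the same whichever tool sends it to SQLite, so a sketch in Python may still help even though the Xojo side is not shown here. The `product` table, its columns, and the function names below are invented for the example.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the example table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, notes TEXT)"
    )
    return conn


def add_product(conn, name, notes=""):
    """Add: INSERT a row and return its new id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO product (name, notes) VALUES (?, ?)", (name, notes)
        )
    return cur.lastrowid


def edit_product(conn, product_id, notes):
    """Edit: UPDATE one row; returns False if the id does not exist."""
    with conn:
        cur = conn.execute(
            "UPDATE product SET notes = ? WHERE id = ?", (notes, product_id)
        )
    return cur.rowcount == 1


def delete_product(conn, product_id):
    """Delete: remove one row; returns False if the id does not exist."""
    with conn:
        cur = conn.execute("DELETE FROM product WHERE id = ?", (product_id,))
    return cur.rowcount == 1


def get_product(conn, product_id):
    """Read back (name, notes), or None if the row is gone."""
    return conn.execute(
        "SELECT name, notes FROM product WHERE id = ?", (product_id,)
    ).fetchone()
```

The `?` placeholders keep user input out of the SQL text, which matters once an app lets people type in their own data.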