Library for web scraping

I need to do web scraping. Is there a library for web scraping in Xojo?


FWIW, scraping is usually very specific to the site & pages you are extracting data from.

Perhaps an explanation of what you are trying to do would help us help you…

I’ve done quite a bit of scraping. My usual approach:

  1. Write the HTML page to disk using HTTPSocket and TextOutputStream
  2. Convert the HTML to text with ‘textutil’ (macOS only)
  3. Write specific handlers for the text you are looking for (e.g. looking for specific tags or markers)
  4. Validate the found text (make sure it is the text you want)
  5. Write the found text fragments to a table

In this fashion I managed to page through 40,000 bulletin board pages, which took about half a day on a fast Mac.
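
The five steps above could be sketched roughly as follows. This is only an illustration, written in Python rather than Xojo to keep it short; every function name here is hypothetical, the standard-library HTML parser stands in for ‘textutil’, and the "marker at the start of a line" rule and the 200-character limit are assumptions made up for the example, not what I actually used.

```python
import csv
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def fetch_to_disk(url, path):
    # Step 1: download the page and save the raw bytes to disk.
    with urllib.request.urlopen(url, timeout=30) as resp:
        raw = resp.read()
    with open(path, "wb") as f:
        f.write(raw)


def html_to_text(html):
    # Step 2: reduce HTML to plain text, one text run per line.
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def find_fragments(text, marker):
    # Step 3: site-specific handler; here, lines that start with a marker.
    return [line[len(marker):].strip()
            for line in text.splitlines() if line.startswith(marker)]


def is_valid(fragment):
    # Step 4: sanity check (assumed rule: non-empty and reasonably short).
    return 0 < len(fragment) <= 200


def write_table(path, fragments):
    # Step 5: store the validated fragments, one row each, as CSV.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["fragment"])
        for fragment in fragments:
            writer.writerow([fragment])
```

In practice step 3 is where nearly all the site-specific work goes, which is why a general-purpose scraping library only gets you so far.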

If your needs are simple, it’s quite easy to do. If RegEx will work and you only need certain pages, you can just keep a list of RegExes and URLs, then retrieve and search each page in a simple loop. I’ve done this before for basic notifications.
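
A minimal sketch of that loop, again in Python rather than Xojo just to show the shape of it. The watch list, the URL, the pattern, and the function names are all made-up placeholders; the same idea should map onto Xojo's own HTTP and RegEx classes.

```python
import re
import urllib.request

# Hypothetical watch list: one (URL, regular expression) pair per page.
WATCH_LIST = [
    ("https://example.com/news", r"release\s+\d+\.\d+"),
]


def fetch(url):
    # Download a page and decode it, falling back to UTF-8.
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def scan(watch_list, fetch_page=fetch):
    # Retrieve each page and collect every match of its pattern.
    hits = []
    for url, pattern in watch_list:
        try:
            page = fetch_page(url)
        except OSError as err:  # network errors (URLError is an OSError)
            print(f"skipping {url}: {err}")
            continue
        for match in re.finditer(pattern, page, re.IGNORECASE):
            hits.append((url, match.group(0)))
    return hits
```

Calling `scan(WATCH_LIST)` on a timer and comparing the hits with the previous run is enough for a basic notification. The `fetch_page` parameter is only there so the loop can be tried against canned pages without touching the network.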

If you want to be the next Google, then a lot more work will be required. :)