We’ve asked this before and someone already scraped the entire NEW wiki last night.
PLEASE DON’T DO THIS
It makes the entire thing run very slowly for everyone & IF you really want the whole thing locally we CAN provide you with a dump you can import
You just have to ask
Asking for a dump isn’t likely to happen if the scraper needs it immediately.
I’d suggest providing a weekly (or monthly) zipped dump of the wiki and publishing a link.
In order to generate it for Dash you’ll want the HTML, not the wiki markup (or you’d have to run the entire wiki locally, which you can do - but then you’ll want to scrape your local wiki)
You could run your own if we give you a dump + our extensions + images
Been reading about creating a docset and it doesn’t look like it should take me too long. I’m happy to do it (I use Dash a lot for PHP and Python work and love it) but I’m just trying to figure out the best way to get the HTML. What do you reckon?
Looks like there’s an extension for MediaWiki called DumpHTML. If I could get a dump of your docs I could add this extension, dump the HTML and then write a little Xojo program to parse it and add it to a Dash docset database. Unless you want to give me the dump and I’ll just knock up a Xojo app…
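For anyone curious what that last step involves: a Dash docset index is just a small SQLite database with a `searchIndex` table (`name`, `type`, `path`), one row per page. A minimal Python sketch of the parse-and-index step, assuming a directory of HTML files as DumpHTML would produce them (the `<title>` regex and the generic "Guide" entry type are illustrative stand-ins, not anything the Xojo docs prescribe):

```python
# Sketch: index a directory of dumped HTML pages into a Dash
# docset database. Assumes each page has a <title> element;
# real docsets would also sort entries into types like "Class"
# or "Method" instead of the generic "Guide" used here.
import os
import re
import sqlite3

def build_docset_index(html_dir: str, db_path: str) -> int:
    """Create the Dash searchIndex database and register every page."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    # This schema (searchIndex + unique "anchor" index) is the
    # standard Dash docset layout.
    cur.execute("CREATE TABLE IF NOT EXISTS searchIndex("
                "id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT)")
    cur.execute("CREATE UNIQUE INDEX IF NOT EXISTS anchor ON "
                "searchIndex (name, type, path)")
    count = 0
    for root, _dirs, files in os.walk(html_dir):
        for fname in files:
            if not fname.endswith(".html"):
                continue
            full = os.path.join(root, fname)
            with open(full, encoding="utf-8") as fh:
                match = re.search(r"<title>(.*?)</title>", fh.read(), re.S)
            # Fall back to the file name when no <title> is found.
            name = match.group(1).strip() if match else fname[:-5]
            rel = os.path.relpath(full, html_dir)
            cur.execute("INSERT OR IGNORE INTO searchIndex(name, type, path) "
                        "VALUES (?, ?, ?)", (name, "Guide", rel))
            count += 1
    conn.commit()
    conn.close()
    return count
```

The database would live at `Contents/Resources/docSet.dsidx` inside the docset bundle, next to the `Documents/` folder holding the HTML itself.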
OK I need a couple of guinea pigs to try out the extract & setup so you can run a local wiki
email me directly norman@xojo.com and we’ll get the files & instructions to you
Done a lot of data scraping before… but not sure what the purpose of scraping the xojo docs would be. Unless they think it is the only way they can get an offline version or something.
archive.org changed the way they scrape; they normally spread it out over a long period of time now.
Bing is notorious for ravaging a server though.
Baidu is annoying, but doesn’t usually get too bad.
You can throttle Google if you have webmaster tools, but you can’t throttle them any other way. Well, you can, but it isn’t great for SEO.
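For crawlers that honor it (Bing and Baidu do; Google ignores it, which is why its rate has to be set in webmaster tools as noted above), a `Crawl-delay` directive in `robots.txt` is the usual server-side throttle. A minimal sketch, with illustrative delay values:

```
# robots.txt - seconds between requests for crawlers that honor it
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Crawl-delay: 5
```

It won’t stop a custom scraper that ignores robots.txt, of course - for that you’re back to rate-limiting at the web server.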