I’m building a sample project in which I’m creating a web domain search engine. The user types a domain name into a text field (to check its availability) and then selects the extension (.com, .org, .net, etc.) from a popup menu. Clicking the search button searches GoDaddy.com to check the domain’s availability. While the search runs, a progress wheel spins, and the result (yes, it is available, or no, it’s not available) appears in a blank label next to the progress wheel.
I’ve looked at Xojo examples in the installation package, and also I’ve made some examples with HTML Viewer and TCP Socket. I’ve played around with a SOAPMethod example, but to be honest, I’m not sure which one is the best approach. I’ve also read about CURL as another option. Any suggestions?
I looked at the API for a bit, but I’m more interested in learning about web scraping, since I want to do that in the future. So I’ll take the web scraping route.
If you want your program to keep working for a long time, you should use the API if one exists.
Web scraping will stop working each time the webmaster redesigns his website.
That can happen (very) often, and then you have to modify your program just as often.
An API is less subject to change.
Start by getting HTTPSocket to download pages reliably. This also means learning how to send data to a web site that way, which is not obvious at all. Be ready for a lot of trial and error.
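To make the "download a page and send data to it" step concrete, here is a minimal sketch in Python rather than Xojo, using only the standard library. The endpoint URL and the form field names (`domain`, `tld`) are hypothetical placeholders, not anything GoDaddy actually exposes; a real page would need its own field names, found by inspecting its search form.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint, for illustration only.
SEARCH_URL = "https://example.com/domain-search"


def build_search_request(name: str, extension: str) -> Request:
    """Build a POST request carrying the form fields a search page might expect."""
    # The field names "domain" and "tld" are assumptions, not a real site's form.
    form = urlencode({"domain": name, "tld": extension}).encode("utf-8")
    return Request(
        SEARCH_URL,
        data=form,  # supplying data makes urllib send a POST instead of a GET
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def download(request: Request, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Only builds and prints the request; call download(req) to actually send it.
    req = build_search_request("mydomain", ".com")
    print(req.get_method(), req.full_url, req.data)
```

Keeping the request building separate from the sending makes it possible to check what would be sent before touching the network, which helps during the trial-and-error phase.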
Then you will need to learn how to extract the significant parts from the HTML code, which IMHO requires at least a superficial knowledge of how HTML works.
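As a rough illustration of pulling the significant part out of a page, here is a small Python sketch using the standard `html.parser` module. It assumes, purely for the example, that the result sits in an element with `class="availability"`; a real page will use different markup, and that markup is exactly what tends to change.

```python
from html.parser import HTMLParser


class AvailabilityParser(HTMLParser):
    """Collect the text inside the first element whose class list contains
    "availability" (a hypothetical class name chosen for this example)."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0  # greater than 0 while inside the target element
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Nested tag inside the target; assumes every tag has a closing tag.
            self._depth += 1
        elif not self.text:
            classes = (dict(attrs).get("class") or "").split()
            if "availability" in classes:
                self._depth = 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.text += data


def extract_availability(html: str) -> str:
    """Return the stripped text of the first matching element, or ""."""
    parser = AvailabilityParser()
    parser.feed(html)
    return parser.text.strip()


if __name__ == "__main__":
    page = '<div><span class="availability">Yes, it is <b>available</b></span></div>'
    print(extract_availability(page))
```

The depth counter is a simplification: void tags such as `<br>` inside the target element would throw it off, so a real scraper would need more careful handling or a dedicated HTML library.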
Don’t web scrape. They will change the page and your app will break. It’s a never-ending cycle and you will pull all your hair out. Trust me on this one.
There are various DNS APIs out there; I would find one and use it. APIs tend not to change very often, and changes are generally announced so you can prepare for them, whereas website changes are made whenever the webmaster/mistress desires, generally with zero announcement beforehand.
To be fair, web scraping can have its uses, for instance when one needs to retrieve a lot of data that exists only as web pages, say for an academic study or survey. Then it can greatly speed up the process compared with copy and paste. But that is usually a one-off circumstance.
For an app such as the one described by the OP, web scraping is a recipe for pain and suffering. Not to mention possible copyright infringement.
APIs are usually limited in a few ways: request limits, data restrictions, etc. So web scraping works better in most cases.
I don’t think most websites would take legal action if they detected someone scraping. Most probably they will use a firewall to block access to the website, or use reCAPTCHA.
Possible, until a change in the page format breaks the scrape. It is just like driving: keeping to the speed limit will get you there safely, whereas driving too fast will land you on the curb.
They will probably not go to court, but they may request that the app be removed from the MAS for copyright infringement, and most probably will get satisfaction. Or, as you say, just use technical countermeasures.
If an API is available, why chance it?
Once again, I am not against scraping per se. For an app that must be reliable, it is a bad idea IMHO.
Most websites don’t change their format for years. I made some scrapers in 2012 that still work perfectly!
Not everyone releases their app publicly, and most of the time, bypassing their technical countermeasures is easy: using deathbycaptcha for captchas, simulating a browser with the headers, or even using a browser like PhantomJS to execute JS.
Because sometimes the API doesn’t provide the info you need. For example, YouTube has a very well documented API with which you can search and get info about the resulting videos. But that part of the API doesn’t provide the full description, which you may need. So scraping the website is a better choice.
Ashot, you missed my point. I do not believe scraping should not be done, or that an API is always best. In fact, each has its own advantages. Neither should be dismissed outright without looking into these differences.
You also have to keep in mind that the OP does not have a lot of experience. A clearly documented API may be easier for her to implement.
Unless someone more acquainted with scraping, like you, creates a sample project for her to learn from…
I did not miss your point, I agree with you. Of course using the API is the best choice in most cases. I was just providing some more info about scraping without an API.
Kayla, just a suggestion: a simple nslookup shell command would be more reliable than any external API or web scraping technique. Keep in mind that a domain can be registered and still be unreachable as an HTTP website.
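For what it's worth, the same kind of lookup can be sketched without shelling out to nslookup. The Python example below uses the standard `socket.getaddrinfo` call; the `resolver` parameter is a hypothetical hook added only so the function can be exercised without network access. As noted above, a name that does not resolve is not proof that the domain is unregistered, so treat the result as a hint only.

```python
import socket


def resolves(hostname: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if the name resolves to at least one address.

    A False result only means no address was found; the domain may
    still be registered without any address records.
    """
    try:
        return bool(resolver(hostname, None))
    except socket.gaierror:
        # Raised by getaddrinfo when the name cannot be resolved.
        return False


if __name__ == "__main__":
    # Performs a real DNS lookup; the output depends on the network.
    print(resolves("example.com"))
```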