Is Xojo Web for web sites?

Thanks. After reading that, though, I am not certain it will enable Google to crawl a Xojo web app, as navigation inside will be carried out only through controls and not links.

In my proof-of-concept implementation…

A .htaccess file (using mod_rewrite) and a robots.txt file direct the crawler to HandleSpecialURL. For each URL it has been supplied, the application then outputs the raw HTML stored in an SQLite database in response to the crawler's request (where that output comes from is described below). The robots.txt file keeps private pages from being indexed, since the crawler control (built with WebControlWrapper) "indexes" (snapshots) every page that contains one; user-account pages can simply omit the control. When a page is viewed normally in a web browser, a background script inside the crawler control runs:

"var markup = [document.documentElement.innerHTML]; " + _
"Xojo.triggerServerEvent('" + Me.ControlID + "', 'CrawledPage', markup);"

This triggers the server-side ExecuteEvent, which writes the markup to the database. That keeps the stored content up to date for the crawler's next visit (important for dynamic content such as news on the front page; static pages only need the crawler control run once). Originally I thought I would need an HTMLViewer control to snapshot each page, but JavaScript alone can gather the page markup and pass it back to the application to store.
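For anyone curious what the server side of that looks like, here is a minimal sketch, assuming a WebControlWrapper subclass whose ExecuteEvent receives the CrawledPage event triggered above, and a hypothetical PageCache.sqlite file with a pages(path, html) table (names and schema are illustrative, not the poster's exact code):

' Inside the custom crawler control (a WebControlWrapper subclass).
' Handles the 'CrawledPage' event fired by Xojo.triggerServerEvent above.
Function ExecuteEvent(Name As String, Parameters() As Variant) As Boolean
  If Name = "CrawledPage" And Parameters.Ubound >= 0 Then
    Dim markup As String = Parameters(0) ' the page's innerHTML sent from the browser
    Dim db As New SQLiteDatabase
    db.DatabaseFile = GetFolderItem("PageCache.sqlite") ' hypothetical cache file
    If db.Connect Then
      Dim ps As SQLitePreparedStatement = _
        SQLitePreparedStatement(db.Prepare("INSERT OR REPLACE INTO pages (path, html) VALUES (?, ?)"))
      ps.BindType(0, SQLitePreparedStatement.SQLITE_TEXT)
      ps.BindType(1, SQLitePreparedStatement.SQLITE_TEXT)
      ps.SQLExecute(Self.Page.Title, markup) ' keyed by page title here; any per-page key works
    End If
    Return True ' event handled
  End If
End Function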

I've been using OpenWebSpider locally to crawl the web application and it works great; no bugs so far.

Matt, your above comments look interesting and I would love to understand more of what you are saying. Can you encapsulate your 'proof of concept implementation' in a few sentences for us dummies out there (me being the chief dummy)?

  1. Crawlers look for a robots.txt file in the root path of a domain (i.e. http://www.domain.com). A robots file is sort of like an .htaccess file for site crawlers/spiders: it contains allow/deny rules for certain paths (URLs) of the domain.

  2. mod_rewrite is an Apache module that can "rewrite" URLs. So http://www.domain.com/special/#page1 (which serves the raw HTML output of our crawled page) can be made synonymous with http://www.domain.com/#page1 (our actual un-crawlable URL, because it is created by JavaScript when our web app runs). The .htaccess file can specify that the rewritten URLs are only served to a crawler (e.g. Google).

  3. Since a Xojo web app is uncrawlable (its output is generated in the client browser), we grab the "formulated" HTML output, which contains the readable text a crawler looks for. That markup is passed back to the Xojo application, which saves it to an SQLite database file for the crawler. To generate each page's HTML output we literally have to visit each page of the Xojo web app so the raw HTML gets saved, and each page to be crawled must contain a script (I made a custom control) that grabs the page's HTML after all the JavaScript has rendered it.

  4. Use HandleSpecialURL to output the SQLite-stored HTML of the page to the crawler (a sketch of such a handler follows this list). Since mod_rewrite was used, the true URLs will appear in the search results the end user sees once the site has been crawled and added by the search provider. Using hash tags makes crawling easier, and each hash tag displays a page (domain.com/#home, domain.com/#products, etc.). These URLs are added to the robots.txt file as domain.com/special/#home, domain.com/special/#products, etc. After they have been crawled, mod_rewrite can redirect those links to the correct URLs so the true app page (created by JavaScript) loads for the user.
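To illustrate step 4, here is a minimal sketch of the kind of HandleSpecialURL handler described, assuming the same hypothetical PageCache.sqlite pages table; the exact path layout under /special/ is an assumption for the example:

' In App.HandleSpecialURL (Xojo Web). Serves the stored HTML snapshot
' for whichever page the crawler requests. Paths and table names are illustrative.
Function HandleSpecialURL(Request As WebRequest) As Boolean
  Dim pageKey As String = Request.Path ' e.g. "home" when /special/home is requested
  Dim db As New SQLiteDatabase
  db.DatabaseFile = GetFolderItem("PageCache.sqlite")
  If db.Connect Then
    Dim ps As SQLitePreparedStatement = _
      SQLitePreparedStatement(db.Prepare("SELECT html FROM pages WHERE path = ?"))
    ps.BindType(0, SQLitePreparedStatement.SQLITE_TEXT)
    Dim rs As RecordSet = ps.SQLSelect(pageKey)
    If rs <> Nil And Not rs.EOF Then
      Request.Status = 200
      Request.Print(rs.Field("html").StringValue) ' raw HTML snapshot for the crawler
      Return True
    End If
  End If
  Return False ' fall back to Xojo's default 404
End Function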

mod_rewrite:
http://httpd.apache.org/docs/trunk/rewrite/remapping.html

robots.txt:
http://www.robotstxt.org

For a quick example without writing any code, paste the following into your URL address bar and press Enter:

javascript:alert(document.documentElement.innerHTML);

Some browsers remove the javascript: prefix, so you may have to type it in manually after pasting.

Matt, good reply. You should write a book for all us dipsticks out there. Even I understood. Thanks for the response.

[quote=109822:@Matthew Combatti]1. Crawlers look for a robots.txt file in the root path of a domain (i.e. http://www.domain.com). […][/quote]

This is an astute way of having a set of pages indexed that represent the app. Congratulations. Now, let us imagine Google sends the user to a page that is, say, somewhere in the app which can only be reached by selecting a given value in a PopupMenu. How will the crawled page allow the visitor to get to that part of the app and use its navigation? You would have to use hash tags or something similar to prepare links to every WebPage in the app. When I look at Google statistics for my own web sites, a majority of users land on pages other than the index page, which here would be the app's location. In terms of user experience, if the visitor is not able to navigate the app, it can be counterproductive.

Let us admit the navigation issue is solved. Another major element of getting a good ranking from Google is that there must be more to crawl than 'OK' and 'Continue' button labels, general captions, or the limited view of a ListBox; that is largely irrelevant to the actual content of the app and therefore about as useful as a fifth wheel for indexing. What characterizes a static HTML page is that it contains the whole package. The more verbose the text, the happier Google is: it can index away and start weighting words by relevance. Then, as described in the PDF I linked above, it weighs the title against the meta keywords and description, plus how many times relevant terms appear in the page, not to mention how many links point to that particular site from other domains. All of which looks rather shaky for a web app.

Xojo web apps, unless specifically designed as open-book pages, are, like desktop apps, notoriously concise in terms of text. Making them index-worthy would mean adding a load of descriptions and sentences, for instance through PageSource controls that fill the bottom of the page with prose.

Altogether, and in spite of your generous efforts, one cannot in good conscience advise using a Xojo app in lieu of an HTML site and hope for any sort of valid ranking from Google or any major search engine. The best ranking is still achieved by HTML, maybe HTML+PHP. I would still advise a reasonable use of Xojo as a dynamic addition to well-built HTML sites rather than as a replacement.

Thanks Michel for elaborating on what Matt posted. Unfortunately it looks like I have blown $400 on the Web app version of Xojo
Maybe it was a case of RTFM-DH

[quote=110689:@Chris Benton]Thanks Michel for elaborating on what Matt posted. Unfortunately it looks like I have blown $400 on the Web app version of Xojo
Maybe it was a case of RTFM-DH[/quote]

Sorry about that. But still, my own experience shows Xojo is a wonderful asset to make an HTML site more attractive, by adding dynamic pages where they are needed, and carrying out some functions. For instance, I use Xojo to get input from the customer. Since I started using Xojo for that, all spam stopped. That is a great improvement.

Can you point me to a site where Xojo is integrated in this way?

I have successfully integrated Xojo into a WordPress website. It was so painless I will be doing it again shortly for another client.
All you need is your favorite CMS and an iframe plugin, with the Xojo app compiled as a standalone build. Your app and the CMS can easily run on the same server.

[quote=111290:@Chris Musty]I have successfully integrated Xojo into a wordpress website. It was so painless I will be doing it again shortly for another client.
All you need is your favorite CMS and an iframe plugin with standalone config. Your app and the cms can run easily on the same server.

[/quote]
Chris, any chance you can direct me to the site where I can see your work in action?

I developed my last one for a company intranet, so I can't show you that.

It's really quite easy though: just set up WordPress, Joomla, or Drupal (whatever your flavor is), then install an iframe plugin and point it at your app's address and port (the app compiled as a standalone build).
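In the page itself it boils down to something like the snippet below; the domain and port are placeholders, and most iframe plugins generate this markup for you from a shortcode:

<!-- Embed a standalone Xojo web app assumed to be running on port 8080 of the same server -->
<iframe src="http://www.example.com:8080/" width="800" height="600" frameborder="0" scrolling="auto"></iframe>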

I just put together a quick and dirty one for you - Try This!

It does not do much (just click Login and Exit), but it shows you the gist of what you can do.
The benefit is that the website is fully indexed by Google, even though the dynamic content can't be.
If you want stats, you can easily log some yourself to a database with Xojo.
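As a rough idea of that kind of self-logging, here is a sketch assuming a hypothetical visits.sqlite file with a visits(when_visited, remote_ip) table, placed in Session.Open:

' In Session.Open: record each new session to a local SQLite file.
' The file name and table are assumptions for the example.
Dim db As New SQLiteDatabase
db.DatabaseFile = GetFolderItem("visits.sqlite")
If db.Connect Then
  Dim d As New Date
  Dim ps As SQLitePreparedStatement = _
    SQLitePreparedStatement(db.Prepare("INSERT INTO visits (when_visited, remote_ip) VALUES (?, ?)"))
  ps.BindType(0, SQLitePreparedStatement.SQLITE_TEXT)
  ps.BindType(1, SQLitePreparedStatement.SQLITE_TEXT)
  ps.SQLExecute(d.SQLDateTime, Self.RemoteAddress) ' Self is the WebSession here
End If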

Other sample Xojo web apps were shared in this conversation:

Deployed Web App Examples

Thanks Paul and Chris. Chris, I am interested in how you implemented such great drop-down menus. I have been working on getting a nice drop-down for over a month now, and all I have produced is a clunky one that does not always hide the menu because the MouseExit event does not always fire. How did you do those menus?

The drop-down menus are not Xojo; that part is WordPress.
The Xojo app is the 800x600-pixel object under the title bar that says "Hey Xojoers!".
It's the best of both worlds!

I will do a video on how to do it one day…

Sorry, I was under the mistaken impression that the whole browsing environment was the launched app. I am still trying to get my head around how this thing works.

It's an iframe, like a window within the website.
Kinda similar to the Xojo HTMLViewer control.
If I put a box around it, you would see it clearly.

Incidentally, mouse-over events in Xojo Web are known to be flaky. I would steer clear of them.

That explains a lot. I thought I was going mental because the code wasn't being executed in the MouseExit event. I wonder if I can bill Xojo Inc. for the 400 bottles of whiskey and 100,000 strands of hair this has cost me.

You have more chance of getting MouseExit to work flawlessly.