CGI web app crashes randomly

Frustrating problem: I have 2 different apps running on the same server. Both apps are quite complex and use SQLite, SMTPSecureSocket, etc.

One of the apps has run very reliably 24/7 for over a year.

The other app crashes occasionally (a couple of times a week), and when a user attempts to access the crashed app, we see a “500 Server Error” message. I can then force the app to quit by deleting the AppName.cgi file (the app has a timer which checks for the existence of the .cgi file and calls App.quit if the file is missing).

Once the app is restarted (by replacing the .cgi file), it continues to run perfectly for a few days.

Obviously the unreliable app must have some sort of bug in it which I can’t find. I have a logging routine which logs User Events and Session Events, and all looks normal (i.e. every Session that opens eventually closes OK, and if all Sessions are closed for a while, the app quits OK). When the app crashes I do not see any sign of an error being raised.

Any ideas how I might go about troubleshooting this? Is there some way I can get the app to detect this “500 Server Error” condition, and have it restart automatically without manual intervention? It’s doing my head in!

AFAIK, the 500 Internal Server error happens way before the app itself is launched, in the cgi script that runs before the Xojo executable, so your logging routine will not help. It is usually an Apache issue. I have seen it happen with permission issues, as well as bugs in Perl scripts.

You will find the 500 error in the server log, and it should tell you more about what happened.
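To pin down when the 500s happen, it can help to pull them out of the access log and then look up the same timestamps in the error log, which is where the reason is normally recorded. A minimal sketch in Python, assuming the log uses Apache's common or combined format; the function name is illustrative and not something from this thread:

```python
import re

# Matches the timestamp, request line and status code of an Apache
# common/combined log entry. Assumption: the server uses one of these formats.
LOG_LINE = re.compile(r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) ')


def find_server_errors(lines, status="500"):
    """Return (timestamp, request) pairs for every log line with the given status."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == status:
            hits.append((match.group("time"), match.group("request")))
    return hits
```

Passing an open log file to `find_server_errors` lists the failing requests. The log location differs between hosts (something like `/var/log/apache2/access.log` is only a guess), so check where your server writes it.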

The strange thing is the random occurrence.

This summarizes the most common causes: http://www.serverschool.com/dedicated-servers/how-to-troubleshoot-an-internal-server-error/

I do not see how that can be detected from inside the app, since it has no chance to run.

Hi Michel, thanks for the info. Yet it is strange that forcing the App to quit, then re-starting seems to solve the issue.

Also, the fact that the App Quit timer code is still running suggests that the App itself is still running OK. Could it be that the App is somehow causing Apache to “lock” a file, which then prevents new connections from starting up?

How odd!

[quote=224707:@Tony Davies]Hi Michel, thanks for the info. Yet it is strange that forcing the App to quit, then re-starting seems to solve the issue.

Also, the fact that the App Quit timer code is still running suggests that the App itself is still running ok. Could it be that the App is somehow causing Apache to “lock” a file which is then preventing new connections to start up??

How odd![/quote]

A CGI web app is actually made of two things: a CGI script, and the executable that the script launches and directs the user to.

I believe the issue is with the CGI, which seems to lose its permissions or gets corrupted for some reason. But once the Linux executable has been launched, it does not need the cgi script to work. Hence the app still runs fine, and the timer ticks nicely.

If you check running processes with ps aux, you will see both myapp.cgi and myapp running when all goes well.
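If you want to run that check from a script rather than by eye, here is a rough Python sketch that parses `ps aux` output. The names `myapp.cgi` and `myapp` follow the example above, the helper names are made up for illustration, and it assumes the usual 11-column `ps aux` layout:

```python
import subprocess


def running_commands(ps_output, names):
    """Map each name to True if a command line in `ps aux` output mentions it."""
    found = {name: False for name in names}
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # the 11th column is the full command line
        if len(fields) < 11:
            continue
        # Compare basenames so "/var/www/cgi-bin/myapp.cgi" matches "myapp.cgi",
        # including when the script is an argument to an interpreter such as perl.
        tokens = {token.rsplit("/", 1)[-1] for token in fields[10].split()}
        for name in names:
            if name in tokens:
                found[name] = True
    return found


def check(names):
    """Run `ps aux` and report which of the given names are running."""
    output = subprocess.run(["ps", "aux"], capture_output=True,
                            text=True, check=True).stdout
    return running_commands(output, names)
```

`check(["myapp.cgi", "myapp"])` returns a dictionary with one True/False entry per name.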

I bet uploading just the cgi instead of the whole thing will temporarily fix the issue. Now, the challenge will be to understand how that cgi file gets stuck.

@Tony Davies, what are your CPU and memory usage like when you need to quit that app?

This sounds more like a hang than a crash…

@Tony Davies

Were you able to come up with a solution? I have about 10 different cgi apps running on our server, and every once in a while one exhibits the same behavior. I have to go into the server and terminate the app before it will start up again.

I thought I could create an app that would ping each app that registered a launch, then if it didn’t get a response would be presumed hung. However I’m not sure it’s possible with Xojo to kill another process on the web server.
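As a rough illustration of that watchdog idea, here it is sketched in Python rather than Xojo, run as a separate script on the server. All function names, the timeout and the threshold of three failed probes are assumptions made for the example, and the process ID would have to come from somewhere such as a PID file written by the app:

```python
import os
import signal
import urllib.error
import urllib.request


def probe(url, timeout=10):
    """Return the HTTP status for url, or None if no response arrived in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, e.g. with a 500
    except (urllib.error.URLError, OSError):
        return None  # timeout, refused connection, DNS failure


def should_restart(statuses, threshold=3):
    """True when the last `threshold` probes all failed (no answer or a 5xx)."""
    recent = statuses[-threshold:]
    if len(recent) < threshold:
        return False
    return all(status is None or status >= 500 for status in recent)


def terminate(pid):
    """Ask the process to exit; needs the rights to signal that process."""
    os.kill(pid, signal.SIGTERM)
```

Requiring several failed probes in a row keeps a single slow response from killing a healthy app.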

Right now I have all the apps that are running at 3 am terminate, via a cron script.

Thanks Rich

Probably a file rights problem: the server is trying to access something that isn’t there or can’t be referred to.

First, check your .htaccess file for possible errors.

Next, try setting the file rights as stated in the Xojo documentation (probably those of the .cgi file).

See if you have local caches (clear them to make sure).

Make sure you have uploaded everything in “binary transfer mode”; an “ASCII” or “TEXT” mode transfer could have corrupted something.

EDIT: The server’s memory can also be an issue.
So can concurrent (simultaneous) connections, WebSockets and so on, although I guess that shouldn’t be a problem since you use .cgi rather than standalone.
A server can be configured so that only a limited number of connections (typically 20-50) are available at the same time. If you hit that limit, it could show a 500 Internal Server Error. Check your logs, or contact your host.
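For the file rights check suggested in this post, here is a small Python sketch that compares a file's permission bits against an expected mode. The 755 default is the mode commonly quoted for CGI scripts and is an assumption here; confirm the required value in the Xojo deployment documentation. The function names are illustrative:

```python
import os
import stat


def file_mode(path):
    """Return only the permission bits of path, e.g. 0o755."""
    return stat.S_IMODE(os.stat(path).st_mode)


def mode_problems(mode, expected=0o755):
    """Return human-readable problems with a permission mode (empty list if none)."""
    problems = []
    if mode != expected:
        problems.append(f"mode is {mode:03o}, expected {expected:03o}")
    if not mode & stat.S_IXUSR:
        problems.append("owner cannot execute the file")
    return problems
```

Running `mode_problems(file_mode("myapp.cgi"))` right after a failure would show whether the script's rights have changed since it was uploaded; `myapp.cgi` is a placeholder path.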

These only crash/freeze and throw the internal server error once in a while. And it’s not always the same app.

If you are able to shut down apps with cron, use the same command from your monitoring app if you need to stop an app.

I also use the 3:00 AM shutdown routine and have not seen any app freezing since.

My web registration app has the exact same problem, but it’s something of a mystery. Philip Zedalis suggested that it was a problem that occurs when too many people try to use a web app at the same time. It appears that the app is still running, but unable to open new sessions. Restarting the app always solves the problem (it’s not necessary to restart Apache or the server)

I plan to switch to a standalone web app using load balancing, which Philip thinks will solve the problem, but it would be good to know for sure.

The best bet here would be concurrent connections, I think.

I wonder if a stopgap solution would be a shell app that auto-launches the core app if it’s not running, with the core app auto-quitting when there are no connections. It won’t stop a hang, but it will clear the memory when no one’s using it.

The problem is, if the last thing the app reported before it hung was that there were 9 connections, the helper can’t tell if it’s time to quit. You’d essentially be just adding another layer to the launch mechanism.

The fact that I’m having so much trouble reproducing this really makes me suspect that the app is internally running out of RAM somewhere, even if just for a critical moment (like when the server socket needs to allocate more connection sockets), and is left in a bad state. Sessions start dying off because the app can no longer respond to requests, and by the time you get to it, the memory footprint has dropped back down to a minimal level. But that’s just a theory.

That’s possible, but this app doesn’t use any significant graphics or upload/download files, it just collects info from a few fields and puts that info into a database. The DB is a global variable that stays available for all sessions. Maybe the prepared statements that query the database and send it data are failing to release ram (though all are local variables)? Is that even possible?

[quote=290591:@John McKernon]My web registration app has the exact same problem, but it’s something of a mystery. Philip Zedalis suggested that it was a problem that occurs when too many people try to use a web app at the same time. It appears that the app is still running, but unable to open new sessions. Restarting the app always solves the problem (it’s not necessary to restart Apache or the server)

I plan to switch to a standalone web app using load balancing, which Philip thinks will solve the problem, but it would be good to know for sure.[/quote]
Did switching to a standalone app solve your problem?

I wound up not making any changes, it still fails maybe once a month, but inertia is the strongest force in the universe. If you switch to standalone and web balancing, I’d love to hear if it solves the problem for you.

And with Web 2.0 (whenever it’s released), I’ll switch to standalone for sure.

[quote=439125:@John McKernon]I wound up not making any changes, it still fails maybe once a month, but inertia is the strongest force in the universe. If you switch to standalone and web balancing, I’d love to hear if it solves the problem for you.

And with Web 2.0 (whenever it’s released), I’ll switch to standalone for sure.[/quote]
My App timer script that allows me to kill the app if I rename the CGI file also kills the app when it sees no sessions (which the framework doesn’t always do on its own when there’s an Error 500 present, even if AutoQuit=True). So I never have to manually kill the app when there’s an Error 500 anymore. However my server logs show an Error 500 about every 2 days, and only when the app is hit when not running. No App or Session exceptions are ever logged during such crashes. I may switch to standalone if I am assured that the app can be automatically killed when Error 500 appears. At least, right now, killing an Error 500 is done automatically when all sessions die.

Have you tracked memory usage of this app? Failures this far apart are often the result of a small memory leak.
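One low-effort way to track it is to sample the app's resident memory from a cron job and look at the trend. A sketch in Python, assuming a `ps` that supports `-o rss= -p PID` (as on Linux and macOS); the function names are made up for this example:

```python
import subprocess


def rss_kib(pid):
    """Resident set size of pid in KiB, as reported by `ps -o rss= -p PID`."""
    output = subprocess.run(["ps", "-o", "rss=", "-p", str(pid)],
                            capture_output=True, text=True, check=True).stdout
    return int(output.strip())


def growth_per_hour(samples):
    """Average RSS growth in KiB/hour between the first and last sample.

    samples is a list of (unix_time, rss_kib) pairs in chronological order.
    """
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive amount of time")
    return (r1 - r0) * 3600 / (t1 - t0)
```

A figure that stays positive over several days while the session count returns to zero in between would point towards a leak.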

FWIW, cgi apps will quit on their own when the session count reaches zero. That makes me think that you do have a circular reference somewhere that’s holding onto sessions.

You wouldn’t see an error 500 on standalone. That is a response from the web server saying it cannot talk to the CGI app. In standalone if the app were to crash it would be restarted (in our environment or when installed as a service).

When I asked about this here, Thomas Hamann said he got an Error 500 when testing a standalone. Scroll down to May 15 (this month) there.