Thinking through CGIs and Internal Server Error 500

When I get an Internal Server Error 500 (hereafter “ISE 500”), I easily resolve it by renaming the CGI file, waiting 5 seconds, and renaming it back. This always resolves the problem, in that I can then load the app without the error. It works because I use Phillip Zedalis’ kill code in the App object. That code consists of a timer that looks for the CGI file every 5 seconds and, if it fails to find it, kills the app. I get an ISE 500 about every 2 to 6 weeks.

So, given that killing the app this way always resolves the situation, this tells me that, when the ISE 500 is present, the app is running. Perhaps in a corrupted state, but running. Otherwise, Phillip’s kill code would not execute when I rename the CGI file. But it always executes and kills the (albeit perhaps corrupted) app.

Suppose I place another condition for killing the app in Phillip’s kill code: App.SessionCount = 0. That is, the code would kill the app if either the CGI file is missing or it sees no sessions (checking every 5 seconds). I know that App.AutoQuit should take care of this, but is it possible that it’s not doing that when in this corrupted state? Might doing this greatly reduce or eliminate ISE 500 events, because it’s killing the errant app even when App.AutoQuit fails to do that?
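A minimal sketch of that combined check, written against the classic (API 1) framework. Phillip’s actual kill code is not shown in this thread, so this is only an assumption about its general shape; the property names KillTimer and CGIFile, the method name KillTimerAction, and the file name "myapp.cgi" are all hypothetical.

```xojo
' Hypothetical properties added to App:
'   KillTimer As Timer
'   CGIFile As FolderItem

' In the App.Open event:
' The CGI file name and location are assumptions; adjust to your deployment.
CGIFile = App.ExecutableFile.Parent.Child("myapp.cgi")
KillTimer = New Timer
KillTimer.Period = 5000 ' milliseconds, so the check runs every 5 seconds
KillTimer.Mode = Timer.ModeMultiple
AddHandler KillTimer.Action, AddressOf KillTimerAction

' Hypothetical method on App, invoked by the timer:
Sub KillTimerAction(sender As Timer)
  ' Treat an unresolved FolderItem the same as a missing file.
  Dim cgiMissing As Boolean = True
  If CGIFile <> Nil Then
    cgiMissing = Not CGIFile.Exists
  End If

  ' Quit if the CGI file is gone OR no sessions remain.
  If cgiMissing Or (App.SessionCount = 0) Then
    Quit
  End If
End Sub
```

One thing to watch with a check like this: a timer tick that lands after launch but before the first session has been created would also see zero sessions, so a short grace period after App.Open may be worth considering.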

If you are running as cgi, your app should already automatically quit once the session count reaches zero and 3 minutes have passed (default session timeout).

If it’s not doing that, it likely means that you’ve got a circular reference somewhere in your app. Usually this is caused by keeping hard references to sessions or webpages within their children.

There is a larger issue here. Even if the session count never reaches zero because of circular references, that does not explain the 500 issues and/or app unresponsiveness.

Having moved many, many apps to standalone mode, I now know that whatever the issue is, it has to be related to the CGI implementation. Standalone apps will keep chugging away forever until they either get too much traffic or a hard crash occurs. Standalone apps never find themselves running but not responding, as the CGI deployments do.

Now, whether that’s in the Xojo binary and it’s the parsing of the requests handed off to it from the web server, or whether it’s something to do with the session counter and auto-quitter, I don’t know. Maybe it’s even some poor optimization or socket limitation in the Perl script (the CGI file), but something is different between CGI and standalone that causes this behavior.

To illustrate an example: If it were something in the developer’s code itself then I would expect the standalone mode app to eventually stall out and stop responding to requests while still running. This behavior is never exhibited.

Greg,

My session count routinely drops to zero, all day. I check on it frequently. And since Phillip’s kill code successfully resolves an Error 500 when I trigger it with a CGI filename change, I can only assume it was caused by an app that was still running but not responding somehow. I thought I might be able to put Error 500 to bed by making sure such apps die via his kill code when it sees no sessions, since, in Error 500 moments, they don’t seem to be dying via App.AutoQuit. Unless, of course, during those rare moments when I get an Error 500 (once every 2 to 6 weeks), the session count actually rises sky high. But my frequent testing of the app many times a day never shows more than one or two sessions other than mine, and most of the time it shows only my solitary session (implying it was zero before I hit the app at that moment). So that’s hard to imagine. But, then again, I’m still a relative novice with Xojo.

I use CGI exclusively.

I observe exactly the same here. Once in a while, I have to kill the app because it became unresponsive. Yet the apps have autoquit on, and in the case of a mail program, I even explicitly kill the app once a day.

It is extremely annoying for the app delivering software by download to my customers. When that happens, I justifiably get angry calls.

I’d be curious if you could record the ticks in app.open and then have a place where you could check the difference:

str(ticks-app.startticks)

I’d love to know if when you log in and there is just one session, whether that number is effectively zero.

Another test you could try is to write the number of sessions and the ticks difference to a text file next to your app and then when the error 500s start, inspect the file. If the app has truly hung, that may tell us a little about what was going on.
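A rough sketch of that logging test, using the classic (API 1) framework. StartTicks is a hypothetical Integer property added to App, WriteHeartbeat is a hypothetical method you would call from a periodic timer, and the file name "heartbeat.txt" is an assumption.

```xojo
' In the App.Open event: remember when the app started.
' StartTicks is a hypothetical Integer property on App.
StartTicks = Ticks

' Hypothetical method on App; call it from a periodic timer.
Sub WriteHeartbeat()
  Try
    ' Write next to the app binary; the file name is an assumption.
    Dim f As FolderItem = App.ExecutableFile.Parent.Child("heartbeat.txt")
    Dim out As TextOutputStream
    If f.Exists Then
      out = TextOutputStream.Append(f)
    Else
      out = TextOutputStream.Create(f)
    End If
    ' One line per entry: session count, a tab, then ticks since App.Open.
    out.WriteLine(Str(App.SessionCount) + Chr(9) + Str(Ticks - StartTicks))
    out.Close
  Catch err As IOException
    ' The folder or file is not writable; skip this entry rather than fail.
  End Try
End Sub
```

If the app has truly hung, the last line in the file should show roughly when it stopped and how many sessions it believed it had at that point.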

Phillip, I’m not aware of this issue happening on Xojo Cloud, and since we’re exclusively cgi at the moment, it may actually point to a server config problem.

Can you tell if the 500 errors are coming from within the app or from the web server in front of it?

FWIW, over the years we’ve worked diligently to bring these two code paths together so that there won’t be differences like this in mode behavior. I would expect you to see this less with newer versions of Xojo.

Ok, so I just looked at the source and there’s only one place where we generate a 500 Server Error that you can get to, and that’s in response to a request that ends with an Unhandled Exception at the application level. That is, we only return 500 when you don’t return True in App.UnhandledException. Please check your code and see if you have implemented that and that you’ve coded it carefully enough that you won’t get another exception. The most common case of this is when trying to write to a log file using a TextOutputStream and the .Create or .Append methods raise an exception because the target directory or file is not writable.
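As a sketch of what a defensive handler might look like in the classic (API 1) framework: the logging is wrapped so that a failure to write cannot raise a second exception, and the handler returns True either way. The log file name "errors.txt" and its location are assumptions.

```xojo
' App.UnhandledException event handler
Function UnhandledException(error As RuntimeException) As Boolean
  Try
    ' The log location is an assumption; any writable folder will do.
    Dim f As FolderItem = App.ExecutableFile.Parent.Child("errors.txt")
    Dim out As TextOutputStream
    If f.Exists Then
      out = TextOutputStream.Append(f)
    Else
      out = TextOutputStream.Create(f)
    End If
    ' Record the exception class name and its message.
    out.WriteLine(Introspection.GetType(error).Name + ": " + error.Message)
    out.Close
  Catch ioErr As IOException
    ' The target file or directory is not writable; do not let this escape.
  Catch nilErr As NilObjectException
    ' A FolderItem could not be resolved; do not let this escape either.
  End Try

  ' Returning True marks the exception as handled, so no 500 is generated here.
  Return True
End Function
```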

On ServerWarp, Error 500 is the generic response to any error, from a 401 to an error in the app.

The 500 error comes from the web server, essentially because the Perl script stalls out. I have traced this before, and basically what happens is that the Perl script tries to query the running app and, if it doesn’t respond, attempts to start it. However, in this situation, which seems to afflict only apps running in CGI mode, the app is still running but no longer responding to the Perl script. The Perl script basically throws an error saying it can’t make contact with or start the app, and then the web server responds with a 500.

So my belief is that whatever is parsing the request from the Perl script gets ‘stuck’ somehow. Either it is waiting for a request to finish, or it stalls out, or who knows. Anyway, it stops responding to requests wholesale, and the Perl script does not know what to do.

I do not believe it’s server configuration, because I have hundreds of these things running and only a percentage of customers ever run into this problem. Some smaller percentage of them get it all the time. When the app is moved to standalone, it never happens, so I don’t think it’s the Xojo web request handler so much as the handoff from the Perl script to the request handler that is specific to the CGI implementation.

I’m familiar with this situation, but what the Perl script does here is write a single line of text saying that the mutex couldn’t be cleaned.

Next time you guys see one, take a screenshot. I’d really like to see it.

[quote=328090:@Phillip Zedalis]I don’t think it’s the Xojo web request handler so much as the handoff from the Perl script to the request handler that is specific to the CGI implementation.[/quote]
Conversely, I can’t say that we’ve seen this situation, and Xojo Cloud servers run exclusively as CGI.

I think it’s very easy to purposely cause an Internal Server Error 500 to see what it looks like. With a web app running, just upload a new binary and hit the CGI file from your web browser. As long as you didn’t kill the first instance of it before the “upgrade”, I’m pretty sure this will cause a 500.

By the way, for some reason my original post appeared as Answered when I logged on here this time. I must have accidentally pressed the “This answers my question” button. I removed that.

Replacing the binary on disk while it’s running could certainly cause bad things to happen…

I thought you just wanted to see what a 500 looks like. Another way to see it is to upload a copy of your running app to another folder. Rename it if you want, but don’t change its AppID. Now hit both apps with your browser. You should be able to see what the 500 looks like that way.

Well, I updated my main web app with the modified version of Phillip Zedalis’ kill code in the App object (that code was mentioned in my original post). Again, his kill code runs on a 5-second timer, killing the app if it sees the CGI file missing. My modification is that it kills the app if it sees the CGI file missing OR App.SessionCount=0.

Eventually I got an Internal Server Error 500 with that web app, but with different consequences. This time the Error 500 went away without my having to manually stop the app. When I hit the app and saw the Error 500, I hit it again and saw it one more time, but the third time, it was running again. This is not my usual experience with Xojo web app Error 500s. Prior to this, once an Error 500 occurred, it would persist until I manually stopped the app. I think that is the experience of others as well. Not this time, though. So I have hope that this modification will stop Error 500s from persisting beyond 5 seconds after the last session drops off, without my having to watch for them.

Well, I have had no persistent Error 500s since instituting this approach. So as long as your app regularly drops to 0 sessions, I think this will get rid of occasional persistent Error 500s.

What this experience tells me is two things. First, the app is still running when I get an Error 500 (otherwise the kill code would be ineffective). Second, the framework’s AutoQuit is failing when there’s a persistent Error 500.

Of course this doesn’t address why an Error 500 arises. It just makes sure that such errors disappear without manual intervention.

I have the same Error 500 problem others do, and have never found a solution. I tried having the app track when the number of sessions reached 0, but the app wouldn’t quit.

So Michel, how do you kill the app once a day? Is it possible to automate the kill? Any thoughts on what time of day is best (I have customers all around the world)?