Intermittent Server 500 error on a working application

Craig_Robinson · October 16, 2020, 1:47pm

Hi all. This might look the same as other topics I’ve found in a search, but the circumstances here are slightly different, and none of the solutions suggested in those topics either apply or work.

I have an otherwise perfectly working Web application on an Ionos server running CentOS. In general there are no problems, but intermittently and without warning the server starts returning a Server 500 error for new requests.

Again (as reported in other posts), judging from the logs, the fault seems to be in the cgi file, reported as:

i. Uninitialised variable $length on lines 130 and 131 (appearing once for each line)
ii. Many repeated reports of an undefined symbol reference on line 118

In this case, however, the problem is not with new loads, new apps, etc., but with an otherwise working application that may run for days or weeks without any problems at all before the error randomly appears.

The workaround is simply to run killall [appname], after which connections start working again.

Has anyone seen this one before, or have any ideas about the cause?

Mike_D · October 16, 2020, 2:42pm

This appears to be a Web 1 project (since Web 2 doesn’t support CGI builds). What Xojo version are you using?

When I build for Linux as CGI using 2019R1.1, here are lines 118-131 of the cgi script (which is Perl):

	print $sock $body;
	
	my $continue = 0;
	my $loopcount = 0;
	do {
		$sock->recv($type,1);
		$sock->recv($length,4);
		
		$type = unpack('C',$type);
		$length = unpack(PACK_KEY,$length);
		
		$response = '';
		do {
			my $chunk_size = $length - length($response);

The variable $sock is defined earlier, at line 47:

my $sock = create_socket($app_port);

My guess is that create_socket() is failing, but I have no idea why that would be…
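Both log messages fit that guess: printing to an undefined $sock gives the "undefined value as a symbol reference" error, and a recv() that gets nothing leaves $length undefined. A minimal defensive sketch, assuming the header is the 1-byte type plus 4-byte length shown in the excerpt; read_exact and read_header are hypothetical helpers, not part of the Xojo-generated script, and $pack_key stands in for PACK_KEY, whose value the excerpt does not show:

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $n bytes from $sock, looping over
# short reads. Returns undef on EOF or on a read error, so the caller
# can react instead of unpacking an empty or undefined buffer.
sub read_exact {
    my ($sock, $n) = @_;
    my $buf = '';
    while (length($buf) < $n) {
        my $got = sysread($sock, $buf, $n - length($buf), length($buf));
        return unless $got;    # 0 means EOF, undef means error
    }
    return $buf;
}

# Hypothetical stand-in for the two recv() calls in the excerpt:
# reads the 1-byte type and the 4-byte length and returns
# ($type, $length), or an empty list if the app sent nothing usable.
# $pack_key plays the role of the script's PACK_KEY constant.
sub read_header {
    my ($sock, $pack_key) = @_;
    my $hdr = read_exact($sock, 5);
    return unless defined $hdr;
    my $type   = unpack('C', substr($hdr, 0, 1));
    my $length = unpack($pack_key, substr($hdr, 1, 4));
    return ($type, $length);
}
```

In the CGI script, the caller would check $sock right after create_socket() and check `defined $length` after read_header(), printing a `Status: 503` header and exiting in either case, rather than carrying on into the read loop with undefined values.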

KevinW · October 16, 2020, 2:45pm

We have a couple of apps that run on the older web framework and see something similar, but it happens maybe 2 or 3 times a year. At one point a couple of years ago it was happening fairly regularly, and I think bad actors were hitting the apps and causing issues. I put the apps behind Apache basic authentication, and now it happens maybe once or twice a year, so I don’t really worry about it any more.

I’d like to upgrade those apps to Web 2 at some point, but I don’t think it’s quite ready yet, so I’m in a holding pattern.

Craig_Robinson · October 16, 2020, 2:52pm

Similar here. This app is quite large and uses a lot of internally created custom components. With all our other work, updating it could take us a couple of years.

Craig_Robinson · October 16, 2020, 2:56pm

Yes, it is Web 1, built with the latest version prior to Web 2.

That’s really the big puzzle: everything works fine until the sudden failure, with the same script and the same environment. On the one hand, I’m hoping it is purely external, something the host does that triggers it rather than the app; on the other hand, if it is, it will be a pain to track down and fix.

Greg_O_Lone · October 17, 2020, 1:39pm

The errors you listed come from the cgi script and point to one of two problems:

1. The app is not running and could not be started.
2. The app is running but is not responding.

I suspect it’s the latter, because we’ve seen this in many configurations over the years. The usual causes are:

1. The app is caught in a tight loop and can’t respond.
2. The app is running low on memory.
3. The app is running low on disk space.
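To tell "not running" apart from "running but not responding" when the 500s appear, a small probe can connect to the app's port and wait a bounded time for any reply. This is a sketch under assumptions: probe_app is a hypothetical helper, the port is whatever the CGI script passes to create_socket() as $app_port, and $payload must be a request the app actually answers. The real Xojo CGI framing is not reproduced here, so an invalid payload could make a healthy app look unresponsive.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Hypothetical probe. Returns one of:
#   'not running'    - the TCP connection could not be established
#                      (usually nothing listening; a hung app with a full
#                      listen backlog can also make connect time out)
#   'not responding' - connected, but no reply within $timeout seconds,
#                      or the app closed the connection without replying
#   'responding'     - at least one byte came back
sub probe_app {
    my ($port, $payload, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    ) or return 'not running';

    # Assumed request format; see the lead-in above.
    syswrite($sock, $payload) if length $payload;

    # Wait for the socket to become readable instead of blocking forever.
    my @ready = IO::Select->new($sock)->can_read($timeout);
    return 'not responding' unless @ready;

    my $got = sysread($sock, my $byte, 1);
    return $got ? 'responding' : 'not responding';
}
```

If the probe reports 'not responding' while the process still exists, that fits the second case above (running but stuck) rather than a crash.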

Craig_Robinson · October 17, 2020, 3:58pm

Thanks Greg.

I’m sorry to say that (1) seems to be the only likely option, since memory and disk space usage on the host both look pretty normal.

Right now, however, I’m having a hard time working out what the loop might be. The only large or tight loops we have in there are all handed off to threads, and because the failure is so random and often goes a while without occurring, tracking it down may be difficult, especially if it is being triggered by a single client rather than the general cohort.

Time to start digging.

Greg_O_Lone · October 17, 2020, 5:57pm

Make sure those threads yield time through the sleep method.