cgi app dies

I have a cgi app that ran stably for some time. Lately the app has been locking up, and I have to go to the server and kill it before anyone can reconnect. It’s a pretty basic app that has an sqlite db and one main search screen.

Looking through logs, I only see one real error.

[Fri Mar 10 07:21:34.232528 2017] [cgi:error] [pid 113596] [client 192.168.2.159:63248] AH01215: Can’t use an undefined value as a symbol reference at radiolist.cgi line 118., referer: https://myapp.com/cgi-bin/RadioList/radiolist.cgi
[Fri Mar 10 07:21:34.233137 2017] [cgi:error] [pid 113596] [client 192.168.2.159:63248] End of script output before headers: radiolist.cgi, referer: https://myapp.com/cgi-bin/RadioList/radiolist.cgi

Looking at the cgi script, line 118 is where the socket connection is made, so it appears that the app is already locked up at that point. I haven’t found any other indication of a problem. Any ideas where I should start looking to resolve this?

The error you are seeing is because the binary isn’t returning a valid response. Look in the kernel logs and see if there’s anything specific about your app.

I’d also check whether the app is using a lot of memory or CPU while it’s hung.

I can’t find any other logs. I’ll try and check the process itself next time I catch it in a bad state. Is it gdb on linux that lets you dump a stack trace? Maybe I’ll install that on the server.

How much traffic are you getting? Xojo CGI apps are notorious for locking up when they can’t handle their load.

Start the web app from an ssh session with ./MyAppname and see what it outputs.

That won’t help much in this case. CGI apps only run correctly when there’s a socket connection from the perl script.

Traffic is relatively light. Only about 5-6 people use this app, and I’d guess that we’re all using it at the same time very rarely.

5-6 people should not bring the app down.

So it happened again. The app was using very little CPU or Memory. I found the port it was running on and was able to telnet to that port. I tried a basic GET / HTTP/1.1, but got no response. Nothing I input seems to elicit a response. After killing the app and trying that again, it seems normal. The app must be looking for other specifics before responding, or filtering IPs or something.

I’m currently at a loss. Maybe I’ll recompile with a newer version and see how that goes.

I recompiled with the latest version. Working fine so far.

So it’s happening with the new compile as well. I found out that strace is the usual tool on Linux for tracing a process’s system calls (not quite a stack trace, but enough to see what the process is doing). I ran it on the hung process, which is still running, and got two traces, since it appears to have two threads.

The main thread is a repeating loop of the following:

select(9, [8], [], [8], {0, 0}) = 0 (Timeout)
ioctl(8, FIONREAD, [0]) = -1 EINVAL (Invalid argument)
futex(0x7f4daf7d8f24, FUTEX_WAIT_BITSET_PRIVATE, 47589, {3685633, 241799518}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f4daf7d8ee0, FUTEX_WAKE_PRIVATE, 1) = 0
poll([{fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI|POLLOUT}], 2, 0) = 0 (Timeout)
select(9, [8], [], [8], {0, 0}) = 0 (Timeout)
ioctl(8, FIONREAD, [0]) = -1 EINVAL (Invalid argument)
futex(0x7f4daf7d8f24, FUTEX_WAIT_BITSET_PRIVATE, 47591, {3685633, 341404280}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f4daf7d8ee0, FUTEX_WAKE_PRIVATE, 1) = 0
poll([{fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI|POLLOUT}], 2, 0) = 0 (Timeout)
select(9, [8], [], [8], {0, 0}) = 0 (Timeout)
ioctl(8, FIONREAD, [0]) = -1 EINVAL (Invalid argument)
futex(0x7f4daf7d8f24, FUTEX_WAIT_BITSET_PRIVATE, 47593, {3685633, 390989916}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f4daf7d8ee0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x1832164, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1832160, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x1832138, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1743514, FUTEX_WAIT_PRIVATE, 1419, NULL) = 0
futex(0x17434e8, FUTEX_WAKE_PRIVATE, 1) = 0
poll([{fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI|POLLOUT}], 2, 0) = 0 (Timeout)

The second thread is a repeating loop of this:

futex(0x7f4daf7d8f24, FUTEX_WAIT_BITSET_PRIVATE, 49801, {3685718, 470571646}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f4daf7d8ee0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x1743514, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1743510, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x1832164, FUTEX_WAIT_PRIVATE, 4607, NULL) = 0
futex(0x1832138, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x1832138, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x1832164, FUTEX_WAIT_PRIVATE, 4609, NULL) = 0
futex(0x1832138, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x1832138, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f4daf7d8f24, FUTEX_WAIT_BITSET_PRIVATE, 49813, {3685718, 968282750}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7f4daf7d8ee0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x1743514, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1743510, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x1832164, FUTEX_WAIT_PRIVATE, 4611, NULL) = 0
futex(0x1832138, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x1832138, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x1743514, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1743510, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x1832164, FUTEX_WAIT_PRIVATE, 4613, NULL) = 0
futex(0x1832138, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x1832138, FUTEX_WAKE_PRIVATE, 1) = 0

Anyone have any ideas how to interpret this?

The listening port on a cgi app is not http, so connecting that way gets you nothing. It’s expecting a binary stream of data.

I was wondering about that. So any ideas what I should look at next to try and figure out why my app keeps getting into this state. If I had to guess, I’d suspect that it’s getting some kind of bogus request data and can’t close the session so it never times out and quits.

When it hasn’t locked up and no one has connected for a while, the app exits on its own, so it isn’t running when I check. Once it gets into the locked state, it stays that way until we kill it.

When I run a strace on a freshly launched run of the app with no one connected, it looks similar, but there are read and write calls as well.

The only thing you’ve talked about that suggests a problem would be if a user were able to actually connect to your app from the outside world. You said you connected to it with telnet… from the server itself or was that from another computer entirely?

I just used telnet to localhost with the app’s port number. I was able to connect, but of course I couldn’t do anything once connected. While I was able to connect that way, requests through the web server produced the error messages from the first post in the server logs. So it seems like the app wasn’t responding to the cgi script.
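One thing worth noting about the telnet test: on Linux, the kernel completes the TCP handshake for a listening socket on its own, so a successful connect does not show that the app is still accepting or reading connections. A minimal Python sketch of that behavior follows; the function name is made up for illustration, and it uses a throwaway local listener rather than the real app.

```python
import socket

def connect_without_accept(timeout=2.0):
    """Return True if a client can connect to a listener that never calls accept()."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server.listen(1)                # listening, but accept() is never called
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        # The kernel queues the connection in the listen backlog,
        # so this succeeds even though nothing ever services it.
        client.connect(("127.0.0.1", port))
        return True
    except OSError:
        return False
    finally:
        client.close()
        server.close()

if __name__ == "__main__":
    print("connected to a listener that never accepts:", connect_without_accept())
```

So being able to telnet in is consistent with the app being hung; it only shows the listening socket is still open.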

Then when I ran strace to capture the system calls, there were no read/write calls like there are when the app is working. So I think it’s somehow getting into a deadlocked state or something like that, but I’m at a loss how to troubleshoot it further, as I can’t make it happen on demand. It does happen fairly often these days, though: sometimes multiple times in a day, sometimes it will go a week. There are two of us now who know how to reset it. I had to teach someone else for when I’m not around.
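To compare a hung capture against a healthy one without eyeballing it, the strace output can be tallied by system call name. This is a rough sketch, assuming one call per line in the capture; the function names and the list of calls treated as I/O (read, write, recvfrom, sendto) are my own choices, not anything specific to this app.

```python
import re
from collections import Counter

# Matches the syscall name at the start of a strace line,
# with an optional "[pid 1234]" prefix as printed by `strace -f`.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")

# Assumption: these are the calls that indicate actual socket/file I/O.
IO_CALLS = ("read", "write", "recvfrom", "sendto")

def tally_syscalls(strace_text):
    """Count how many times each system call appears in strace output."""
    counts = Counter()
    for line in strace_text.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

def has_io(counts):
    """True if any of the assumed I/O calls appear in the tally."""
    return any(counts[name] > 0 for name in IO_CALLS)

if __name__ == "__main__":
    import sys
    counts = tally_syscalls(sys.stdin.read())
    for name, n in counts.most_common():
        print(f"{n:6d}  {name}")
    print("I/O calls present:", has_io(counts))
```

Saved as, say, tally.py (a hypothetical name), it could be fed a capture with something like `python3 tally.py < hung.trace`. A tally with only futex, select, poll and ioctl would match the idle spinning described above.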

Maybe if I get some time I’ll implement some pervasive logging and just log every method and event and see if it gets stuck at any particular spot. If that doesn’t turn up anything I’ll probably go to a standalone app, but I’ll have to figure out where to run it since it needs to be accessible from the internet and it’s running behind our website through apache now. The cgi was very easy to just throw on the server and get working.
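The "log every method" idea boils down to recording an entry and an exit for each call, so that an entry with no matching exit points at where things stalled. The app itself is written in Xojo, so the Python below is only an illustration of the pattern, not code that would drop into the app; `traced`, `run_search` and the logger name are hypothetical.

```python
import functools
import logging

# Timestamps and thread names help line the log up with the two threads seen in strace.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(threadName)s %(message)s",
)
log = logging.getLogger("radiolist")

def traced(func):
    """Log entry to and exit from func, even if it raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("enter %s", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            log.debug("exit %s", func.__name__)
    return wrapper

@traced
def run_search(term):
    # Hypothetical stand-in for the app's search handler.
    return term.upper()

if __name__ == "__main__":
    run_search("kenwood")
```

After a lockup, the last "enter" line without an "exit" would be the first place to look.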

This matches behavior that has been going on for a while. Eventually the CGI app locks up and the only recourse is to kill all instances of it. It seems to follow a pattern of high usage or many concurrent users, but no real cause has been determined yet.

It does not happen in standalone mode which tells me it has something to do with how the CGI handler receives data from the underlying web server and parses it out. Somewhere on that track, maybe a broken request, maybe too many requests, some trigger causes the app to basically lock up. I wish I could pinpoint it but I have not spent much time on it. Standalone mode is simply much better all around.

I have put an .htaccess file in place so that every request has to pass password authentication first. So far there have been no lockups that I am aware of. I probably need to go a couple of weeks with no lockups to be sure that it isn’t just a coincidence.