OK, so I have this web application that reads the output from a Java application and displays it on a webpage. I’m having this random issue that I have no idea how to fix.
It seems like at random intervals (about 3 times a month) the program “hangs” (goes to 100% CPU and becomes unresponsive). All threads stop working and so do API requests; my only option is to kill and restart the application.
At first I thought it was a synchronous shell issue where it was blocking the event loop, but as I did some exploring I found out that shells appear in “htop” (as separate system threads) and the main thread goes to 0.0% CPU. When the app hangs I don’t see any extra threads or running shells. This leads me to the conclusion that it might be a never-ending While or For loop; however, in my testing I have found that neither For nor While loops freeze the UI.
What other ways might full app blocking occur? Reading files, synchronous shells and MsgBoxes in desktop apps are all the ways that come to my mind at the moment.
Is it possible to get an idea of what method it might be stalled on in a compiled running application (gdb)? If so, would I need to start the app with gdb or could I attach later?
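On the gdb question: gdb can attach to a process that is already running, so the app does not have to be started under it. Below is a minimal sketch in Python that builds such a command; the helper name and the PID are made up for illustration. The `-p`, `-batch` and `-ex` options are standard gdb options. On Ubuntu, attaching to a process you did not start may require root because of the default ptrace restrictions.

```python
import shlex

def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to a running process,
    prints a backtrace of every thread, then detaches and exits.

    The function name and the example PID are illustrative only.
    """
    return [
        "gdb",
        "-p", str(pid),                # attach to the already-running process
        "-batch",                      # run the commands below, then exit
        "-ex", "thread apply all bt",  # backtrace for every thread
    ]

if __name__ == "__main__":
    # Print a copy-and-paste friendly command line for a hypothetical PID.
    print(shlex.join(gdb_backtrace_command(12345)))
```

Because the command is non-interactive, it could also be run from a small watchdog script the moment the app stops answering, which helps when the hang only shows up every few weeks.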
Some more details:
It hangs, as I said, about 3 times a month, but it’s never spaced evenly. It’s happened 2 times back to back in less than a week and it’s gone 2 months with no issues.
I have a customer with a VPS running the same Linux distro that has had it happen 2 times a day for 3 days in a row, but as soon as I went to debug with gdb, it never happened again.
Current system: 14.04.1-Ubuntu x86_64 - i7 4790k, 32 GB 1866 MHz RAM, 250 GB SSD, 1 Gbps up/down - “I know, I know, it’s really overkill but it runs a game server”
VPS system: 14.04.1-Ubuntu x86_64 - Hardware unknown, 100mb connection.
I’ve seen this numerous times and my working theory at the moment is system backups - especially in a virtualized environment. KVM and other hypervisors typically do a short “pause” when snapshotting a virtual machine. I’ve correlated it a couple times to apps I’ve had running when this occurs and the app does not always fail or gracefully get restarted. It just gets stuck somewhere and stops responding to everything.
Yeah, when it’s happened to me none of my other apps had an issue. The best way/theory I have to describe it is that under certain circumstances the event loop just goes crazy and nothing else ever happens. The app is not dead but it is not responsive to anything.
Just to clarify, I said my app goes to 100% CPU and hangs (won’t load in the browser, doesn’t read from the async shell). However, in my testing I found that I can force the app to hang by doing a long operation in a synchronous shell (this is where the main thread being at 0.0% comes from). This way of forcing the app to hang does not have the same result as when it fails in production.
Even long operations in the app won’t cause the whole thing to freeze up.
Is it possible that you could do something that would cause the app to use 100% CPU (like a while or for loop) but also cause all threads (SpecialRequestAPI, webpages, webthreads) to be blocked?
I only use threads for doing a few things at runtime (sending an email, doing some file parsing). I have tested the methods that the threads use and completely removed them from the app but it still hangs.
My end plan is to have most of the parts separated into console apps, but I do have a question about console apps. Do I need to copy the libs for all of them or is there a way I can share them between the apps?
Thanks for the help, I will try DoEvents and sleep.
Question about the tight loop theory: what constitutes a tight loop and how long could it possibly block for? I assume that the thread scheduler would cause a context switch and it would allow time to the other threads?
A tight loop is often a While/Wend or a Do/Loop, or even a For/Next with a very high upper bound. Could even be (where is my crucifix?) a GOTO (thunder). You can have loops of any duration, as long as you put DoEvents inside. If you don’t put a DoEvents in, the whole app remains frozen until the end of the method or event containing the loop. As mentioned above, synchronous sockets also freeze the app.
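To make the DoEvents point concrete, here is a small analogy in Python's asyncio. It is not Xojo code and says nothing about how the Xojo scheduler works internally; all names are made up for illustration. A `ticker` task stands in for everything else the app has to service, and `await asyncio.sleep(0)` plays the role of DoEvents by handing control back to the event loop.

```python
import asyncio

async def ticker(counter):
    # Stands in for the rest of the app: sockets, timers, web requests.
    while True:
        counter["ticks"] += 1
        await asyncio.sleep(0)

async def busy_loop(iterations, yield_every=None):
    # A tight loop. With yield_every set, it periodically hands control
    # back to the event loop (the rough analogue of calling DoEvents).
    for i in range(iterations):
        if yield_every and i % yield_every == 0:
            await asyncio.sleep(0)

async def run_demo(yield_every=None):
    counter = {"ticks": 0}
    task = asyncio.create_task(ticker(counter))
    await busy_loop(100_000, yield_every)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    # Number of times the "rest of the app" got to run during the loop.
    return counter["ticks"]

if __name__ == "__main__":
    print("without yielding:", asyncio.run(run_demo()))
    print("with yielding:   ", asyncio.run(run_demo(yield_every=1000)))
```

Without yielding, the ticker never runs at all while the loop is busy (the count stays at zero); with periodic yielding it gets regular turns. A loop on a cooperative event loop can therefore block for exactly as long as it takes to finish, which is forever if its exit condition never becomes true.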
It hung again today, I started GDB and got this for “backtrace/bt”
#0  0xf71f8a87 in TCPSocketPosix::BytesLeftToSend() ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#1  0xf71ca395 in TCPSocketFlush ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#2  0xf4bf6b73 in ?? ()
#3  0xf4f8479d in ?? ()
#4  0xf71f525c in TCPSocket::FireEvents() ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#5  0xf71f8515 in TCPSocketPosix::DoAccept() ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#6  0xf71f7cbc in TCPSocketPosix::Poll() ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#7  0xf71e2b96 in DoNetIdle(unsigned char) ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#8  0xf71e3a6b in NetIdle(unsigned char) ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#9  0xf717ef1f in PollingTask::PerformTask() ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#10 0xf70e3fe1 in ?? ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#11 0xf6d27be1 in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#12 0xf6d270a7 in g_main_context_dispatch ()
   from /lib/i386-linux-gnu/libglib-2.0.so.0
#13 0xf6d27468 in ?? () from /lib/i386-linux-gnu/libglib-2.0.so.0
#14 0xf6d27528 in g_main_context_iteration ()
   from /lib/i386-linux-gnu/libglib-2.0.so.0
#15 0xf718e654 in RunLoopLinux::RunIteration() ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#16 0xf71c5c28 in ModalEvents(unsigned char) ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#17 0xf70eb704 in RuntimeDoEvents ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#18 0xf4bfa689 in ?? ()
#19 0xf4cd28c6 in ?? ()
#20 0xf70ecec7 in CallConsoleApplicationRunEvent() ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#21 0xf4bcc61e in ?? ()
#22 0xf4bfa8bb in ?? ()
#23 0xf71c5bc7 in CallFunctionWithExceptionHandling(void (*)()) ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#24 0xf71c4cde in RuntimeRun ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#25 0xf4e0799f in ?? ()
#26 0xf4bcc50a in ?? ()
#27 0xf4bcc024 in ?? ()
#28 0xf71c5897 in MainExport ()
   from /home/me/MyApp/MyApp Libs/XojoConsoleFramework32.so
#29 0x08048c1e in ?? ()
#30 0x72632f65 in ?? ()
#31 0x79746661 in ?? ()
#32 0x656e796d in ?? ()
#33 0x4d432f73 in ?? ()
#34 0x656e6150 in ?? ()
#35 0x4d432f6c in ?? ()
#36 0x656e6150 in ?? ()
#37 0x202d206c in ?? ()
#38 0x504d4554 in ?? ()
#39 0x62694c20 in ?? ()
#40 0x6f582f73 in ?? ()
#41 0x6f436f6a in ?? ()
#42 0x6c6f736e in ?? ()
#43 0x61724665 in ?? ()
#44 0x6f77656d in ?? ()
#45 0x32336b72 in ?? ()
#46 0x006f732e in ?? ()
#47 0x00000000 in ?? ()
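A side note on reading this backtrace: the frames from #30 onward do not look like return addresses. Every byte in them is printable ASCII, which most likely means gdb walked past the real bottom of the stack into string data. The Python sketch below (helper name is illustrative) packs those values as little-endian 32-bit words, the byte order on x86, and decodes them as text.

```python
import struct

# Frame "addresses" #30 to #46, copied from the backtrace above.
FRAMES = [
    0x72632f65, 0x79746661, 0x656e796d, 0x4d432f73, 0x656e6150,
    0x4d432f6c, 0x656e6150, 0x202d206c, 0x504d4554, 0x62694c20,
    0x6f582f73, 0x6f436f6a, 0x6c6f736e, 0x61724665, 0x6f77656d,
    0x32336b72, 0x006f732e,
]

def frames_to_text(frames):
    # x86 stores 32-bit words little-endian, so pack each value that way
    # to recover the bytes in the order they sit in memory.
    raw = b"".join(struct.pack("<I", value) for value in frames)
    return raw.decode("ascii", errors="replace")

if __name__ == "__main__":
    print(repr(frames_to_text(FRAMES)))
```

The decoded bytes spell out the tail of a file path ending in `XojoConsoleFramework32.so` followed by a NUL terminator, so those frames can probably be ignored. The useful part of the trace is frames #0 to #29.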
Yes, this has happened to me with other projects; it’s when the application is blocking. My issue is that, with the random, non-reproducible hang happening up to 2 months apart, it’s almost impossible to start it with some sort of debugger.
I believe I’ve seen this too (but with IPCSockets): if an app is in a While loop waiting on Socket.BytesLeftToSend() and the connected app dies, it will get stuck there. You would expect that the socket would error out and BytesLeftToSend would go to zero or something, but that’s not what happens.
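If the hang really is a wait loop on BytesLeftToSend like the one described above, one common defence is to give that loop a second way out. The sketch below is Python, not Xojo, and everything in it is hypothetical: `FakeSocket` only imitates a socket whose peer has died, and `wait_until_sent` is an illustrative helper, not a framework call. The same shape (check the connection, check a deadline, poll, yield) should carry over to a real socket class.

```python
import time

class FakeSocket:
    """Hypothetical stand-in for a socket whose peer has gone away:
    the pending byte count never drops on its own."""

    def __init__(self, pending, connected=True):
        self.bytes_left_to_send = pending
        self.is_connected = connected

    def poll(self):
        # A real socket would push queued data and update its state here.
        pass

def wait_until_sent(sock, timeout_s=2.0, sleep_s=0.01):
    """Wait for the send buffer to drain, but never wait forever.

    Returns True if everything was sent, False if the socket was
    disconnected or the deadline passed first.
    """
    deadline = time.monotonic() + timeout_s
    while sock.bytes_left_to_send > 0:
        if not sock.is_connected:
            return False              # peer is gone, stop waiting
        if time.monotonic() >= deadline:
            return False              # give up instead of spinning at 100% CPU
        sock.poll()
        time.sleep(sleep_s)           # yield the CPU between checks
    return True

if __name__ == "__main__":
    print(wait_until_sent(FakeSocket(pending=10), timeout_s=0.05))
    print(wait_until_sent(FakeSocket(pending=0)))
```

When the helper returns False the caller can close the socket and log the event, which turns a permanent hang into a recoverable error.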
I do see that it is a TCP socket, but I’m not sure if the webapp framework uses the same TCPSocket as the code version (for the HTTP front end). If it doesn’t, I have a few TCPSockets that have been subclassed. I’m worried though: if it’s not my socket, it might be a framework thing and that would be a pain to fix.
As we’re not getting lots of bug reports about this, I’d tend to believe that it’s not in the web framework. That said, we actually use SSLSocket behind the scenes so that we can serve secure data as well.
Are you periodically calling the Poll method on your socket?