TCPSocketFlush hanging?

Daniel_Bronson · June 15, 2016, 12:09pm

I’ve been using the HTTP server classes from Brandon Skrtich/Thom McGrath/TheZAZ in a project for literally years. All of a sudden, my application is hanging (not crashing; I have to force-quit it). The only change is that I built it with Xojo 2016r1.1. The last intelligible thing in the spindump/hang report involves TCPSocketFlush:

[code]Thread 0x43f69a priority 5-46 cpu time 1.151s
15 ??? (SpeechProxy + 1830017) [0x1bfc81]
15 ??? (SpeechProxy + 1813975) [0x1bbdd7]
15 ??? (SpeechProxy + 1815860) [0x1bc534]
15 ??? (SpeechProxy + 978862) [0xeffae]
15 RuntimeRun + 49 (XojoFramework) [0x3b3f7c]
15 ??? (XojoFramework + 1752540) [0x3b5ddc]
15 -[NSApplication run] + 727 (AppKit) [0x93a3715c]
15 ??? (XojoFramework + 272793) [0x24c999]
15 ??? (XojoFramework + 1752374) [0x3b5d36]
15 ??? (SpeechProxy + 399085) [0x626ed]
15 ??? (SpeechProxy + 1142630) [0x117f66]
15 ??? (XojoFramework + 273008) [0x24ca70]
15 ??? (XojoFramework + 272906) [0x24ca0a]
15 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 119 (AppKit) [0x93a44870]
15 _DPSNextEvent + 1602 (AppKit) [0x93a45349]
15 _BlockUntilNextEventMatchingListInModeWithFilter + 92 (HIToolbox) [0x931df6bd]
15 ReceiveNextEventCommon + 526 (HIToolbox) [0x931df8e2]
15 RunCurrentEventLoopInMode + 259 (HIToolbox) [0x931dfb5d]
15 CFRunLoopRunInMode + 123 (CoreFoundation) [0x95cce84b]
15 CFRunLoopRunSpecific + 394 (CoreFoundation) [0x95cce9ea]
15 __CFRunLoopRun + 1779 (CoreFoundation) [0x95ccf353]
15 __CFRunLoopDoTimers + 349 (CoreFoundation) [0x95d9206d]
15 __CFRunLoopDoTimer + 1395 (CoreFoundation) [0x95d17863]
15 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ + 22 (CoreFoundation) [0x95d17ea6]
15 ??? (XojoFramework + 806655) [0x2ceeff]
15 ??? (XojoFramework + 657239) [0x2aa757]
15 ??? (XojoFramework + 1977640) [0x3ecd28]
15 ??? (XojoFramework + 536559) [0x28cfef]
15 ??? (XojoFramework + 527059) [0x28aad3]
15 ??? (XojoFramework + 539639) [0x28dbf7]
15 ??? (SpeechProxy + 1389770) [0x1544ca]
15 ??? (SpeechProxy + 195760) [0x30cb0]
8 TCPSocketFlush + 142 (XojoFramework) [0x3c08cf]
2 ??? (XojoFramework + 526087) [0x28a707]
2 <executing in user space>
2 ??? (XojoFramework + 526057) [0x28a6e9]
*2 special_handler_continue + 0 (mach_kernel) [0xffffff80002489b0]
*2 <suspended>
2 ??? (XojoFramework + 527067) [0x28aadb]
2 <executing in user space>
1 ??? (XojoFramework + 526090) [0x28a70a]
1 <executing in user space>
1 ??? (XojoFramework + 527081) [0x28aae9]
1 <executing in user space>
3 TCPSocketFlush + 153 (XojoFramework) [0x3c08da]
3 <executing in user space>
2 TCPSocketFlush + 168 (XojoFramework) [0x3c08e9]
1 <executing in user space>
1 ??? (XojoFramework + 529773) [0x28b56d]
1 <executing in user space>
1 ??? (XojoFramework + 526053) [0x28a6e5]
1 <executing in user space>
1 TCPSocketFlush + 139 (XojoFramework) [0x3c08cc]
1 <executing in user space>
[/code]

Before I force-quit it, this appears in the logs:

[code]6/15/16 12:01:58.000 AM kernel[0]: process SpeechProxy[20706] thread 4454042 caught burning CPU! It used more than 50% CPU (Actual recent usage: 99%) over 180 seconds. thread lifetime cpu usage 340.237853 seconds, (241.541519 user, 98.696334 system) ledger info: balance: 90004526655 credit: 338080839156 debit: 248076312501 limit: 90000000000 (50%) period: 180000000000 time since last refill (ns): 90015230328
[/code]

This looks like a framework issue rather than something I can solve directly, but I could be wrong. Is anyone else having issues with sockets? Any suggestions on how to proceed? I’ve reached an impasse here.

Daniel_Bronson · June 15, 2016, 2:16pm

I got lucky stepping through it in the debugger. Right when it reached TCPSocket.Flush, the debugger went blank and the application hung. The next time through, it worked. It seems to hit randomly. Wonderful.

Greg_O_Lone · June 15, 2016, 5:18pm

I’ve seen this. It’s usually a re-entrancy issue. Make sure your routine for handling incoming data isn’t so slow that the next DataAvailable event fires before it has finished.
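A minimal sketch of a re-entrancy guard for the DataAvailable event of a TCPSocket subclass. The property names mBuffer and mBusy and the method ProcessBuffer are hypothetical; ProcessBuffer stands in for the existing parsing code, rewritten so that it consumes complete requests from mBuffer and leaves any incomplete tail in place.

```xojo
// DataAvailable event of a TCPSocket subclass (sketch).
// Hypothetical properties added to the subclass:
//   mBuffer As String   // received bytes not yet handled
//   mBusy As Boolean    // True while ProcessBuffer is running

// Always take the bytes off the socket, even if they can't be handled yet.
mBuffer = mBuffer + Me.ReadAll

// An earlier DataAvailable call is still inside ProcessBuffer (it yielded
// and this event fired again). Leave the new data in mBuffer and return.
If mBusy Then Return

mBusy = True
ProcessBuffer // hypothetical: consumes complete requests from mBuffer,
              // leaving any partial request for the next call
mBusy = False
```

With this guard, bytes that arrive while ProcessBuffer is running wait in mBuffer until the next DataAvailable event. If ProcessBuffer can raise an exception, make sure mBusy is reset anyway, or the socket will stop processing data.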

Greg_O_Lone · June 15, 2016, 5:19pm

One way to do that is to use DataAvailable only to append the new data to a buffer, and then use a Timer to periodically check the buffer and process it.
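The buffer-plus-Timer approach might look roughly like this in a TCPSocket subclass. This is a sketch, not code from the HTTP server classes: mBuffer, mTimer, DrainBuffer and ProcessBuffer are hypothetical names, and the 50 ms period is an arbitrary starting value.

```xojo
// Hypothetical properties on the TCPSocket subclass:
//   mBuffer As String   // received bytes not yet processed
//   mTimer As Timer     // drives processing outside DataAvailable

// --- DataAvailable event: only move bytes into the buffer ---
mBuffer = mBuffer + Me.ReadAll

// Create the Timer lazily the first time data arrives.
If mTimer = Nil Then
  mTimer = New Timer
  mTimer.Period = 50                 // milliseconds; tune for your traffic
  mTimer.Mode = Timer.ModeMultiple   // fire repeatedly
  AddHandler mTimer.Action, AddressOf DrainBuffer
End If

// --- Method: Sub DrainBuffer(sender As Timer) ---
If mBuffer = "" Then Return
ProcessBuffer // hypothetical: existing parsing code, consuming complete
              // requests from mBuffer and leaving any partial tail
```

The design choice is that DataAvailable does nothing but a quick ReadAll, so overlapping DataAvailable calls have no shared parsing state to collide over. When the connection closes, set mTimer.Mode = Timer.ModeOff and remove the handler with RemoveHandler.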

Daniel_Bronson · June 15, 2016, 6:19pm

Thanks for the insight. Using timers the way you describe would require a substantial rewrite of someone else’s code. What if I try disabling background tasks during the critical part of the DataAvailable handler? Could that prevent re-entrancy?

Daniel_Bronson · June 16, 2016, 12:58pm

It’s now been running for 19 hours without incident. Before this change, it was hanging within about 6 to 8 hours. I’ll keep an eye on it, but I’m going to call it solved. For the record, here’s what I did in the DataAvailable event. Where the original simply called Flush and then Disconnect, I added the pragmas and called Poll instead of Flush:

[code]#Pragma BackgroundTasks False

  Me.Poll
  'Me.Flush   // original call, replaced by Poll
  Me.Disconnect

#Pragma BackgroundTasks True
[/code]

I’m not sure whether replacing Flush with Poll contributed to this, but as long as it’s working I’m not going to mess with it.

Joe_Ranieri · June 16, 2016, 2:17pm

I wouldn’t ever bother calling Flush in this situation, so you should be fine.

Daniel_Bronson · June 16, 2016, 8:27pm

Presumably Thom McGrath put the Flush in there originally; it was his baby. It has been running like a champ for a few more hours. I’m happy with this unless it breaks again.