Mac daemon immediately closes new connections

HELP! I’ve been chasing a very elusive Xojo issue for nearly six months.

The Problem:
On a handful of customer Macs (OS X 10.6, 10.8 and 10.10), a daemon that runs perfectly all week starts closing new SSL connections (using ServerSocket) after sitting idle over a weekend. The client connects and immediately receives error 102 (lost connection).

Restarting the Mac or the daemon immediately restores normal behavior for the rest of the week.

Data points:
(1) Existing connections continue to work fine, proving the daemon is otherwise still running normally.

(2) Clients whose existing connections get disconnected cannot reconnect.

(3) ServerSocket.ActiveConnections stays well under the maximum (usually at 0 or 1), and I verified that prior sockets are being released properly.

(4) Memory, object count, threads and CPU are all normal when it starts rejecting new connections.

(5) We have zero problems with the same code running as a Windows service.

(6) We have no reports of this happening at any time other than after a 2-3 day weekend during which the server runs idle on a Mac that doesn’t sleep or spin down the hard drive.

(7) We’ve done our best to rule out external things like anti-virus software or scheduled tasks that could interfere with our daemon listening.

(8) In every case of an upgrade, reverting to our prior release resolves the issue.
Differences from the prior build: (A) it used TCPSocket instead of SSLSocket, and (B) it was compiled with Real Studio 2011 R3. The problem release was tested with Xojo 2013 R3.3 and 2015 R2.2, with the same results.

Latest Test:
Assuming it still must somehow be a ServerSocket or SSL socket issue, we just completed the following test:
Every night, a timer made the ServerSocket stop listening, set it to Nil, and reinitialized it, and we logged each time this happened. We also have logs from a client that was left in a loop attempting to connect overnight while the daemon was rejecting new connections. Existing clients were disconnected (as expected) when the ServerSocket reinitialized. However, the looping client continued to get an immediate 102 disconnect. Once again, restarting the daemon allowed the client to connect instantly.
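A Xojo-independent version of that looping-client check is a tiny TCP probe that records, per attempt, whether the connection was refused, closed from the server side right after connecting, or left open. The sketch below is Python, not Xojo; `probe_once` and `probe_loop` are hypothetical helper names, and the host, port and interval are placeholders. It does no TLS handshake, so against the SSL listener "open" only means the OS accepted the connection and nothing closed it within the timeout. In the failure state, "refused" would suggest nothing is listening on the port, while "closed" would suggest the connection is accepted and then dropped by something on the server.

```python
import socket
import time
from datetime import datetime


def probe_once(host, port, timeout=2.0):
    """Open one TCP connection and classify what happens to it.

    "refused"         -> nothing accepted the connection (no listener)
    "connect-timeout" -> no answer to the connection attempt
    "closed"          -> connected, then the peer closed/reset it within `timeout`
    "open"            -> connected and still open when `timeout` expired
    "error"           -> any other socket error while connecting
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "connect-timeout"
    except OSError:
        return "error"
    with sock:
        try:
            # Wait for the server to either send something or hang up.
            data = sock.recv(1)
        except socket.timeout:
            return "open"
        except ConnectionResetError:
            return "closed"
        # recv() returns b"" only when the peer has closed its side.
        return "closed" if data == b"" else "open"


def probe_loop(host, port, interval=60.0, count=None):
    """Probe repeatedly, printing one timestamped line per attempt.

    Runs forever when `count` is None; otherwise returns the `count` results.
    """
    results = []
    while count is None or len(results) < count:
        result = probe_once(host, port)
        results.append(result)
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} {host}:{port} {result}", flush=True)
        if count is None or len(results) < count:
            time.sleep(interval)
    return results
```

Left running over a weekend, e.g. `probe_loop("192.168.1.20", 9000)` with the daemon's real address and port substituted, this gives a timestamped log of when the behavior changes, independent of the Xojo client code.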

So it appears there has to be something external to the ServerSocket that is immediately closing new connections.

I’m totally stumped. Any ideas that could help us track this down?

Just an idea for a workaround: stop the daemon late at night when there are no connections, and restart it.

@Michel Beaucourt Thanks for the idea. A forced nightly restart is impractical since we have many situations where clients must remain connected 24x7. The ServerSocket reinitialization was run as a test to isolate/remove the ServerSocket as the essential problem.

@Keith DeLong, here’s where I’d start: in the sockets that are created in the AddSocket event of the ServerSocket, do you have any calls to Disconnect for any reason?

Also, are there any scenarios where the client might disconnect automatically? Perhaps the problem is actually there?

[quote=223476:@Keith DeLong]Differences with prior build: (A) Using TCPSocket instead of SSL and (B) prior build compiled with RealStudio 2011 R3. The problem release was tested using Xojo 2013 R3.3 & 2015R2.2 with the same results.

[/quote]

Have you tried narrowing it down to just the change from 2011r3 to 2015r2, or just the change to SSL? For example, what if you build with 2015r2 but disable SSL?

Thanks Greg. Nothing in the AddSocket event. I’ve looked pretty carefully for code where the daemon or client might trigger the disconnect.

Since it’s Mac only I also reviewed every TargetMacOS statement.

I haven’t yet. I’m thinking about disabling SSL first. Any suspicions? I am using a self-signed certificate. Any chance OS X runs a cron job that checks for suspicious listeners?

Since we cannot reproduce it in-house, each test takes a week or more, as we rely on customers to install the build before a weekend when it can run idle. Most have grown tired of the problem and the testing and have simply reverted to our prior release.

I don’t have any suspicions at this point, but narrowing it down to SSLSocket weirdness or some change we made would give me somewhere to start being suspicious.
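If plain TCP connections are still being accepted in the failure state, the next split is whether the close happens before or during the TLS handshake. Below is a hedged Python sketch (`tls_probe` is a hypothetical helper, not part of Xojo) that attempts a handshake with certificate verification turned off, because the daemon uses a self-signed certificate. Modern Python/OpenSSL defaults may refuse older protocol versions, so it is worth running it once against a healthy daemon to get a baseline to compare with the failing state.

```python
import socket
import ssl


def tls_probe(host, port, timeout=5.0):
    """Connect and attempt a TLS handshake, classifying the outcome."""
    ctx = ssl.create_default_context()
    # Self-signed certificate: skip verification. This probe only checks
    # whether the handshake completes, not whether the cert is trusted.
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"connect-error ({exc.__class__.__name__})"
    try:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return f"handshake-ok ({tls.version()})"
    except socket.timeout:
        return "handshake-timeout"  # connected, but the server never answered
    except ssl.SSLError as exc:
        return f"handshake-failed ({exc.reason})"  # includes EOF mid-handshake
    except OSError as exc:
        # e.g. ConnectionResetError: the TCP connection was dropped
        return f"closed-during-handshake ({exc.__class__.__name__})"
    finally:
        raw.close()  # harmless if wrap_socket already took over the socket
```

A "refused" or "closed-during-handshake" result while the daemon is failing would point below the TLS layer, whereas "handshake-ok" from this probe while Xojo clients still get 102 would shift suspicion back toward the client side.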

I’ll work on narrowing it down. I’m happy to pay for some extra attention at this point.