Mac daemon ServerSocket/SSLSocket is refusing new connections (102) after running idle for 2-3 days.

(Note: The prior conversation about this issue was incorrectly marked answered)

After months of searching and testing, I’ve narrowed an issue down to ServerSockets using SSLSocket. Prior releases of our Xojo networking code ran fine for years. Early this year, we updated the app to use SSLSockets and a new version of Xojo. Since then, a number of Mac users have suddenly been unable to connect clients to the app daemon running on their network.

Here’s what I did to narrow it down to an SSLSocket issue:

  1. I duplicated our SSLSocket and changed the Super of the duplicate to TCPSocket.

  2. On the new TCPSocket, I commented out the SSLSocket-specific .Secure and .ConnectionType property settings.

  3. I created a program preference for using SSL or TCP connections on both client and server.

  4. This test build has now run several weeks at a handful of customer sites experiencing this problem.

  5. When the daemon ServerSocket is using TCPSocket connections, the daemon runs normally for extended periods of time.

  6. When the daemon ServerSocket is switched to use SSLSockets, after several days of running idle (like over a weekend), new client connections fail. The new client’s TCPSocket.Connect succeeds, followed immediately by a 102 disconnect from the daemon.
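
A minimal sketch of how the SSL/TCP switch in steps 1 to 3 might look in the ServerSocket’s AddSocket event. The names DaemonServer, ClientSSLSocket, ClientTCPSocket and Prefs.UseSSL are hypothetical, the TLSv1 constant is only an example value, and setting the SSL properties in AddSocket (rather than in the socket’s own constructor) is an assumption. Certificate setup is omitted.

```xojo
// Hypothetical classes:
//   ClientSSLSocket  (Super = SSLSocket)  - the original socket
//   ClientTCPSocket  (Super = TCPSocket)  - the duplicate from step 1
// Prefs.UseSSL is the hypothetical program preference from step 3.

// DaemonServer.AddSocket event handler (DaemonServer is a ServerSocket subclass)
Function AddSocket() As TCPSocket
  If Prefs.UseSSL Then
    Dim s As New ClientSSLSocket
    // SSL-specific settings; these are the lines commented out
    // in the TCPSocket duplicate (step 2).
    s.ConnectionType = SSLSocket.TLSv1  // example value only
    s.Secure = True
    Return s
  Else
    // Plain TCP path used for the comparison builds.
    Return New ClientTCPSocket
  End If
End Function
```

Because SSLSocket is itself a subclass of TCPSocket, both branches satisfy the event’s return type, so the rest of the daemon does not need to know which kind of socket it is talking to.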

Observations:
(i) This connect-then-102-disconnect behavior in step 6 is the same as when a ServerSocket reaches MaxConnections. So I’ve logged MaxConnections, and the ServerSocket reports that this is not the issue.

(ii) When in this state, it is only new connections that fail. Existing clients with already established connections continue to communicate and operate with the daemon perfectly. It appears that each customer site with this issue is leaving clients connected all the time (this may or may not be related).

(iii) When in this state, closing an existing connected client and attempting to reconnect results in the same inability to reconnect.

(iv) Quitting and restarting a client does not change the issue.

(v) Reinitializing the ServerSocket within the daemon does not change the issue.

(vi) Restarting the daemon immediately resolves the issue until the daemon runs idle again for several days.

(vii) This issue appears to be Mac OS only. We’ve seen it on OS X 10.6, 10.8, 10.10 and 10.11. It has not been reported by any of our many Windows customers running the same app as a service.

(viii) When in this state, the ServerSocket reports normal function and listening. NetStat confirms the socket/port is listening.

(ix) Xojo objectcount, memory usage, CPU usage, and all other observable properties indicate that the daemon is functioning normally.
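
For anyone wanting to capture the same state, observations (i) and (viii) could be logged with something like the sketch below. It assumes a custom LogMsg function (not shown) and is not the logging code actually used in the test builds.

```xojo
// Assumes a custom LogMsg(msg As String) function that writes to a log file.
Sub LogServerState(server As ServerSocket)
  // ActiveConnections returns the sockets currently connected to clients.
  Dim active() As TCPSocket = server.ActiveConnections

  If server.IsListening Then
    LogMsg("IsListening: True")
  Else
    LogMsg("IsListening: False")
  End If

  LogMsg("Port: " + Str(server.Port))
  LogMsg("Active connections: " + Str(active.Ubound + 1))
  LogMsg("MaximumSocketsConnected: " + Str(server.MaximumSocketsConnected))
  LogMsg("MinimumSocketsAvailable: " + Str(server.MinimumSocketsAvailable))
End Sub
```

Comparing the active connection count against MaximumSocketsConnected is what rules the connection limit in or out as the cause of the 102.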

Reported here: <https://xojo.com/issue/41888>

32-bit or 64-bit build? Have you tried the other (assuming platform can run both) to see if it still happens?

These are all 32-bit builds. Switching to 64-bit would introduce a huge number of additional variables… code changes, plugins, etc.

Keith…

I finally gave up using Xojo Sockets in favor of using MBS CURL… They worked OK without SSL, but once I turned that option on my apps would leak objects and memory… and that was death to my Service app, which needs to run forever. See Feedback 36565. It sounds like your problem is similar, but you do not see the leaks that I did/do.

As a temporary workaround for whatever you’re encountering, I’d script the restarting of the daemon each night if/when no one is connected. That should make sure that it works each day without inconveniencing users.

I don’t suppose it’s a precise, reproducible amount of idle time?

If you’re able to reproduce this in a controlled environment, I’d try making a small iOS app that just connects to the server and see what it gives for error information (Xojo.Net.TCPSocket gives a lot more information about what goes wrong). It doesn’t need to run on the device, and you’ll probably need to disable certificate validation on the socket.

This isn’t practical for many of our users. When the daemon restarts it of course drops existing connections. Many of our installations are for non-technical users and/or kiosks where reestablishing a connection is a hassle. That is often why clients and servers run 24x7.

Oops, looks like I accidentally edited the post above. Yes, that’s why you’d do it at a time when no one was connected. The logic would be something like:

- Is it 2 AM (or whatever time)? If so, set a WantToRestart flag.
- Check in a loop or timer (depending on app construction) to see if that flag is set and the number of users connected is 0. Restart self if so.
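
The restart logic above could be sketched roughly as follows. The property names (Server, StartTime) are hypothetical, the one-hour uptime guard is an addition so a freshly relaunched daemon does not quit again during the same hour, and the sketch assumes something external such as launchd relaunches the daemon after it quits.

```xojo
// Hypothetical properties on the daemon's App class:
//   Server As ServerSocket       // the listening socket
//   StartTime As Date            // set to New Date when the daemon launches
//   WantToRestart As Boolean
// Call this from a Timer's Action event, for example once a minute.
Sub CheckForRestart()
  Dim now As New Date

  // Require an hour of uptime so a relaunch at 2 AM does not restart again.
  Dim uptime As Double = now.TotalSeconds - StartTime.TotalSeconds
  If now.Hour = 2 And uptime > 3600 Then
    WantToRestart = True
  End If

  If WantToRestart Then
    Dim active() As TCPSocket = Server.ActiveConnections
    If active.Ubound = -1 Then
      // No clients connected. Assumes launchd (KeepAlive) or a similar
      // supervisor starts the daemon again after it exits.
      Quit
    End If
  End If
End Sub
```

Because the flag stays set until the connection count reaches 0, the restart simply waits for the first quiet moment after 2 AM.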

Are you saying there isn’t really a time when the # of users connected is 0, and/or that the client doesn’t automatically reconnect?

No, it’s not… And as soon as you check the server, it restarts your multi-day wait time for the next check.

In the couple of dozen customer environments where it does occur, it is absolutely predictable: Always on Monday after the business has not been using the software (but it has been left running).

We have had multiple thousands of installs since March. We also rely on the app in house. No one reports the issue during the week, and the vast majority appear to never see it at all. About 90% of problem installs are Ortho/Dentists, who tend to not practice on Friday, so my hypothesis is that the idle time is actually more like 3 days.

Over the past months I have remotely connected to customers with the issue and set up extensive logging on both client and server in some test builds. A client synchronous .Connect that loops polling the socket until .IsConnected, and then evaluates the socket, finds everything normal except that it is no longer connected and LastErrorCode = 102. Do you think an iOS app using Xojo.Net.TCPSocket would tell us anything different?
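
For reference, the synchronous connect-and-poll test described above looks roughly like this. It is a sketch rather than the actual test-build code: LogMsg is assumed to be a custom logging function, the caller is assumed to have set Address, Port and any SSL properties on the socket, and the two-second wait for the disconnect is an arbitrary choice.

```xojo
// Assumes a custom LogMsg(msg As String) function.
// The caller sets sock.Address, sock.Port and any SSL settings beforehand.
Sub TestConnect(sock As TCPSocket)
  sock.Connect

  // Poll until the connection is established or an error is reported.
  While Not sock.IsConnected
    If sock.LastErrorCode <> 0 Then Exit
    sock.Poll
  Wend

  // Keep polling briefly so a disconnect sent right after the
  // connect has a chance to be noticed (2 seconds, arbitrary).
  Dim deadline As Double = Microseconds + 2000000
  While sock.IsConnected And Microseconds < deadline
    sock.Poll
  Wend

  If sock.IsConnected Then
    LogMsg("Still connected after 2 seconds")
  Else
    LogMsg("Not connected, LastErrorCode: " + Str(sock.LastErrorCode))
  End If
End Sub
```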

You’re correct on both. For historical/technical reasons we auto reconnect for dropped connections but not on daemon restart. We could change this behavior, but not super conveniently.

In an effort to preserve existing connections, we did a test build where we handed off management of existing connections and reinitialized the ServerSocket every morning.

Rather surprisingly, it made no difference. Our logs confirm that we reinitialized the ServerSocket every morning. The app runs perfectly all week and reinitializes the ServerSocket over the weekend on Saturday, Sunday, and Monday AM, yet it still fails just hours later when they return to use the software.

Yes. The existing TCPSocket/SSLSocket classes generally only give you a vague notion of what’s wrong since it’s a single integer. Xojo.Net.TCPSocket’s error event generally (always?) passes in a Xojo.Net.NetException (a subclass of ErrorException) that exposes a lot more information.

// Assumes a custom LogMsg function which does something useful.
Sub DumpError(exc As RuntimeException)
  Dim type As Xojo.Introspection.TypeInfo = Xojo.Introspection.GetType(exc)
  LogMsg(type.Name)
  LogMsg("ErrorNumber: " + exc.ErrorNumber.ToText)
  LogMsg("Reason: " + exc.Reason)
  If exc IsA Xojo.Core.ErrorException Then
    Dim err As Xojo.Core.ErrorException = Xojo.Core.ErrorException(exc)
    LogMsg("Domain: " + err.ErrorDomain)
    If err.UnderlyingError Is Nil Then
      LogMsg("UnderlyingError: Nil")
    Else
      LogMsg("UnderlyingError:")
      DumpError(err.UnderlyingError)
    End If
  End If
End Sub

I’ve gotten 102 SSL errors in one of my apps for a while.

<https://xojo.com/issue/31800>

At least Chrome doesn’t crash my app anymore, but Chrome still produces these errors while the other browsers don’t seem to, at least not in limited testing. This is probably a totally different issue, but I’ve never figured out why it does it.

102 and 22 are just the other side of the connection disconnecting.