TCPSocket reconnect issue on Mac

Here’s a link to a simple Xojo binary project file: Clear Core Connect Test.xojo_binary_project.zip (Dropbox)

This is just the networking part of a much larger application we’re working on. All it’s designed to do is connect to an MCU, send it commands, and receive responses. I built this version this afternoon to help our firmware developer debug a problem I’ve been seeing on my end. To wit:

ONLY on the Mac, whether I’m in the debugger or running a compiled app, the following happens:


SCENARIO 1:

  1. Power Off MCU
  2. Launch Application
  3. Initiate connection from application
  4. Power on MCU

After a few seconds, the Application connects to the MCU and the Application’s server begins receiving information from the MCU on a different port. This is what we expect. However, this is the only way we’re able to successfully initiate a connection to the MCU.

The problem:

• If at this point I quit the app and restart it, it receives a few sporadic messages on its server, then they stop completely (even though the MCU is sending periodic messages every few seconds), and it is totally unable to reconnect with the MCU’s server.

• If I quit the app again and relaunch it, nothing at all comes in on the local server, and it’s still unable to send any commands to the MCU.


SCENARIO 2:

  1. Power On the MCU
  2. Launch the app
  3. Initiate the connection from the application

In this case, on the Mac, I cannot send to the MCU, but I am receiving messages on the app’s local server. If I quit and relaunch the app, the behavior is similar to above, where eventually neither is responsive after subsequent restarts of the application while the MCU remains running.


On Windows, however, both SCENARIO 1 and SCENARIO 2 work fine. I can power cycle the MCU, or quit and relaunch the app all day long, and it always reconnects and just works. Ultimately, the target platform for this is Windows, so that’s good, but I do all my development on the Mac. I’d like to know why this isn’t working. I really hate working on Windows, so I don’t want to have to switch my dev platform for this.

Is there something I’m missing that’s causing this to happen on the Mac, but not on Windows?

bumping this.

Any thoughts on why this problem might only be appearing on the Mac? I need to decide whether to switch development over to Windows, because this is completely blocking development at the moment. If there’s a workaround, I’m happy to implement that conditionally for the Mac.

Additional data point: I installed Parallels on my Mac, with Windows 10 in it. The app seems to work fine there, so at least I can continue development at my desk. However, I would love to know what is different about the Mac that’s causing all this trouble.

I’m a bit confused about the data flow here. You mention servers and applications. What connects to what, who initiates the connection, who detects loss of connection, and what steps do they take to reconnect?

Yeah, I know, it’s confusing.

  • Our desktop application has a client and a server. Client connection initiated by the desktop app. We send commands to the MCU from here, and receive back acknowledgements on this channel.
  • The MCU has a client and a server as well. Client connection initiated by the MCU. The MCU sends us status updates, as well as alerts on this channel.

On the Mac, these all work as advertised but ONLY when you follow the exact steps outlined above in Scenario 1. Any other order and it doesn’t work. On Windows this doesn’t matter.

On Windows we don’t need to do anything to re-establish the connection when it’s lost. Let’s say you’ve got a successful connection going, so the Desktop app is talking to the MCU, and the MCU’s client is talking to the desktop app’s server. I can quit the desktop app and relaunch it as frequently as I’d like. Every time I relaunch, it connects normally to the MCU and begins listening on its server, so the MCU is connected back up with the desktop software. The app is doing nothing special to re-establish those connections; it just happens automatically.

When I do the same on the Mac (successful connection, then quit the desktop app and relaunch), I cannot reconnect with the MCU without going through the exact steps in Scenario 1.

Does that clarify it at all?

So each side only talks to the server at the remote end, presumably on well-known ports? And the server at each end is responsible for passing on what it receives to the client software?


The MCU Server’s port is 8888 (set by the manufacturer, a bit difficult to change so we left it as is), and the Desktop app’s server port is 8889 (chosen by us, just to differentiate).

A command sent from Desktop Client to MCU Server is echoed back to Desktop Client on 8888.

If the MCU has to do something that may take some time, and necessitates a report (new position, for example), then the MCU client communicates that to the Desktop Server on 8889 when that data is available.

Have you tried monitoring the network traffic? I found PacketPeeper on the Mac to be very handy for this purpose, for example. Other than that not sure I can offer any specific advice.

I have not, but will give that a try. Thanks!

I set up two filters, one to show all packets going from my desktop app to the MCU and one for all packets coming in (The App’s Client is at 8888, and the remote server is 8889)

This is the result in PacketPeeper for Outgoing packets:

Two interesting things here:

  1. Every single OUTGOING packet from my Xojo app is tagged in PacketPeeper as having a bad checksum.
  2. At 2021-12-06 14:09:20.327 the port number (locally, for my desktop app) changed to 60994. This is when I quit my app, relaunched, and re-initiated the connection.

After #2 happens, I no longer receive acknowledgements of commands from the MCU.
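A side note on the port change in #2: a new local port after a reconnect is normal TCP behaviour. Unless the client explicitly binds its socket, the OS assigns an ephemeral source port to every outgoing connection, so the change alone does not indicate a fault. Here is a minimal sketch of that, in Python rather than Xojo, run entirely on loopback. The echo server is a hypothetical stand-in for the MCU, and the helper names are made up for illustration.

```python
import socket
import threading

def start_echo_server():
    """Start a loopback echo server (stand-in for the MCU); return its port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen()

    def serve():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:  # listening socket was closed
                return
            with conn:
                data = conn.recv(1024)
                if data:
                    conn.sendall(data)  # echo the command back

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]

def connect_and_report(port, payload=b"ping"):
    """Connect, send one payload, and return (local_port, reply)."""
    with socket.create_connection(("127.0.0.1", port), timeout=2) as c:
        local_port = c.getsockname()[1]  # ephemeral port chosen by the OS
        c.sendall(payload)
        reply = c.recv(1024)
    return local_port, reply

server_port = start_echo_server()
first = connect_and_report(server_port)
second = connect_and_report(server_port)  # simulates quit + relaunch
print("server port:", server_port)
print("local port, first connection: ", first[0])
print("local port, second connection:", second[0])
```

The two local ports will usually differ from run to run and from connection to connection, while the server port stays fixed. The more interesting question is why the commands stop appearing on the wire afterwards.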

Here’s what the INCOMING packets look like:

No bad checksum errors on the incoming packets.

It’s possible the checksum business can be ignored. And you can click on a line, then do cmd-R to see the whole conversation.

When things are working properly, the TCP Stream (CMD-R) shows me exactly what I typed and what the response was:

But after the quit/restart, those come up blank.

What’s weird to me is that after the first quit/restart of my app, I make a connection with the remote server, and when I send a command it doesn’t appear as an outgoing packet, and subsequently it’s never received by the remote server. It’s like it never happened.

I am immediately checking ClearCoreServer.IsConnected upon connect, and it’s returning true. (ClearCoreServer is the TCPSocket for the client in my app)

Are you also watching the Error event for an error code 102 or 22 in case it immediately disconnects?

Yes, but in this case it’s not about the disconnection, it’s about not reconnecting after the Xojo app is quit and relaunched. That works perfectly on Windows, but not on Mac. That is, this happens even when I intentionally quit the app.

That may be because TCP will think it’s still connected if both sides did not get the disconnect. Say you unplug a cable or quit your app without calling .Disconnect first: one end will not know it’s disconnected until the system timeout (if any) of about 60-75 seconds.

In App.Close, make sure that if the socket was connected, you call .Disconnect first so the MCU knows it’s disconnected.
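To illustrate the difference between a clean disconnect and a peer that simply goes quiet, here is a rough sketch in Python (not Xojo), run on loopback. The server thread is a hypothetical stand-in for the MCU, and `peer_sees_close`, `clean_close`, and `go_silent` are names invented for this example. It only shows the general TCP behaviour; it does not reproduce the Mac-specific problem.

```python
import socket
import threading

def peer_sees_close(client_action):
    """Run client_action(sock) against a loopback server and report whether
    the server observed end-of-stream (recv returning b'')."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]
    saw_eof = threading.Event()

    def serve():
        conn, _ = srv.accept()
        with conn:
            conn.settimeout(1.0)  # stop waiting after 1 s of silence
            try:
                while conn.recv(1024):
                    pass  # discard incoming data
                saw_eof.set()  # recv returned b'': peer closed cleanly
            except socket.timeout:
                pass  # peer went quiet without closing

    t = threading.Thread(target=serve, daemon=True)
    t.start()
    client = socket.create_connection(("127.0.0.1", port), timeout=2)
    client.sendall(b"status?")
    client_action(client)
    t.join(3)
    srv.close()
    return saw_eof.is_set()

def clean_close(sock):
    sock.shutdown(socket.SHUT_RDWR)  # tell the peer we are done
    sock.close()

def go_silent(sock):
    pass  # keep the socket open but idle, like an unplugged cable

clean = peer_sees_close(clean_close)
silent = peer_sees_close(go_silent)
print("clean close seen by peer: ", clean)
print("silent peer seen as closed:", silent)
```

In the silent case the server has no way of telling an idle connection from a dead one, which is why the remote end can keep a stale connection around until some timeout fires.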

Thanks. I am, in App.close:

MainWindow.ClearCoreSend.Disconnect
MainWindow.ClearCoreReceive.Disconnect

The thing that makes no sense is that the exact same code works on Windows. This is just a Mac thing, far as I can tell.

(those TCPSockets have different names than above, I know. Sorry- copy/pasted that from the main application which I have open and the code examples above are from a simple test app that isolates the network stuff, but the call is there in both)

Can you try:

// Only disconnect sockets that report themselves as connected
If MainWindow.ClearCoreSend.IsConnected Then
  MainWindow.ClearCoreSend.Disconnect
End If
If MainWindow.ClearCoreReceive.IsConnected Then
  MainWindow.ClearCoreReceive.Disconnect
End If
// Sometimes Socket.Poll can help, but it may also cause a deadlock (hanging app)
// The same applies to Socket.Flush
// Maybe it's better to have a command that tells the MCU to close the connection (when possible)

The issue you describe is not something we’ve seen. It is possible, however, for a connection to linger for 60-75 seconds, which is why a heartbeat or polling (periodically sending a command and expecting a reply) is most commonly used.
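As a rough illustration of the heartbeat idea, here is a small Python sketch (not Xojo) that sends a probe and treats the link as dead if no reply arrives in time. It assumes the remote end answers every command, as the MCU is described as doing by echoing commands back. The `PING` probe, the timeout value, and the helper names are all hypothetical, and the loopback servers merely stand in for a responsive and a stalled peer.

```python
import socket
import threading

def start_server(echo):
    """Loopback stand-in for the remote end. With echo=False it accepts
    connections and reads data but never answers (a stalled peer)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen()

    def handle(conn):
        with conn:
            try:
                while True:
                    data = conn.recv(1024)
                    if not data:
                        return  # peer closed the connection
                    if echo:
                        conn.sendall(data)
            except OSError:
                return

    def serve():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]

def heartbeat_ok(sock, timeout=0.5, probe=b"PING\n"):
    """Send a probe and wait up to `timeout` seconds for any reply."""
    sock.settimeout(timeout)
    try:
        sock.sendall(probe)
        return len(sock.recv(1024)) > 0  # b'' means the peer closed
    except OSError:  # covers timeouts and connection resets
        return False

echo_port = start_server(echo=True)
mute_port = start_server(echo=False)

with socket.create_connection(("127.0.0.1", echo_port), timeout=2) as s:
    alive = heartbeat_ok(s)
with socket.create_connection(("127.0.0.1", mute_port), timeout=2) as s:
    stalled = heartbeat_ok(s)

print("responsive peer passes heartbeat:", alive)
print("stalled peer passes heartbeat:   ", stalled)
```

In an application, a check like this would run on a timer, and a failed heartbeat would trigger a disconnect followed by a fresh connection attempt.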

How do you connect? Please show the connection code (or reconnection code).

Yeah, I can give that a try. However, I’ve found that .IsConnected is virtually useless except for checking the initial connection. We tried using it to detect whether the connection was alive. But I’ll give that a shot now.

How come? It’s very informative, as in “we should expect it’s still connected”, not as in “it’s still connected” (as that’s a thing you may never know).