TCP Socket how to know when all data has been received

Tim_Seyfarth · February 16, 2021, 1:51am

Hello all,

First time working with TCPSockets. I am sending about 97K bytes of text. On the first send, all of the data shows up in seemingly one chunk. However, on subsequent sends, the data arrives in two chunks at different times: the first is about 65536 bytes and the second holds the balance.

My question is: how do I know how much data is coming over, and how do I join the data when there is more than one chunk? Do I need to send a small header first or something like that so the Server knows what to expect? Is there a more elegant method to do this? Any sample code? Also, why does it appear to arrive in one chunk on the first go, while subsequent transfers are always made up of two or more chunks?

One more question: should I close the connection after each use, or is it OK to leave it open and connected?

Thanks,
Tim

Ivan_Tellez · February 16, 2021, 3:44am

To send discrete packets easily, use the EasyTCPSocket instead of the normal TCPSocket. It handles all the low-level things for you (designing a protocol, headers, memory blocks, limits, sizes, etc.).

Wayne_Golding · February 16, 2021, 3:47am

If the data is truly text, you could terminate the send with ChrB(26) (Ctrl+Z) and have the server wait for that byte to appear, or you could send the size of the payload in the form of a MemoryBlock as a header.
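The terminator idea can be sketched as follows. This is a language-neutral illustration in Python rather than Xojo socket code; the function name `extract_messages` is hypothetical, and it assumes byte 26 never occurs inside the text itself.

```python
TERMINATOR = b"\x1a"  # byte 26 (Ctrl+Z), the terminator suggested above


def extract_messages(buffer: bytearray) -> list[bytes]:
    """Remove and return every complete, terminated message in buffer.

    A trailing partial message is left in buffer for the next call.
    """
    messages = []
    while True:
        end = buffer.find(TERMINATOR)
        if end == -1:
            break  # no terminator yet: wait for more data
        messages.append(bytes(buffer[:end]))
        del buffer[: end + 1]  # drop the message and its terminator
    return messages


# Simulate one payload arriving in two chunks.
buffer = bytearray()
buffer += b"first half, "
assert extract_messages(buffer) == []  # nothing complete yet
buffer += b"second half\x1a"
print(extract_messages(buffer))
```

The receiver appends each arriving chunk to the buffer and calls the function; it only acts once a terminator has shown up. If the payload could ever contain byte 26, a size header is the safer choice.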

Leaving the connection open is ok, but there could be errors if the network disconnects.

Tim_Hare · February 16, 2021, 3:59am

The only way to know if you’ve got all the data is to define a protocol that will tell you when you’ve got it all. All 3 suggestions are reasonable approaches.

As for joining packets, you can either ReadAll and append to a buffer until you’ve got it all, then process the buffer, or use LookAhead to see if you’ve got it all and, if not, just leave it in the socket until the next DataAvailable. The socket will then have the whole thing: the part that came in first plus the part that just arrived.

One more scenario to code for is that a single DataAvailable can deliver the last part of one message and the first part of another. Design your protocol so you can tell when that happens.
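Here is a rough sketch of the append-to-a-buffer approach with a size header, written in Python rather than Xojo to keep it self-contained. The 4-byte big-endian length prefix and the names `frame` and `FrameBuffer` are assumptions made for illustration, not part of any socket API. It covers both cases above: a message split across chunks, and a chunk that ends one message and starts the next.

```python
import struct

# Assumed wire format: 4-byte big-endian payload length, then the payload.
HEADER = struct.Struct(">I")


def frame(payload: bytes) -> bytes:
    """Sender side: prefix the payload with its length."""
    return HEADER.pack(len(payload)) + payload


class FrameBuffer:
    """Receiver side: accumulate chunks and hand back complete payloads."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Append a newly arrived chunk; return all payloads now complete."""
        self._buffer += chunk
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # payload not fully received yet
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]  # keep any bytes of the next message
        return messages


# Two messages back to back, delivered in two arbitrary chunks.
stream = frame(b"A" * 97000) + frame(b"hello")
receiver = FrameBuffer()
print(len(receiver.feed(stream[:65536])))  # first chunk alone
print(len(receiver.feed(stream[65536:])))  # rest of message 1 plus message 2
```

In a real receiver, `feed` would be called from whatever event fires when data arrives, with the bytes just read from the socket.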

Arnaud_N · February 16, 2021, 11:04am

It’s still less expensive to keep the connection open (and, in case the network fails, you reconnect) than re-connecting each time.

Kem_Tekinay · February 16, 2021, 4:29pm

All good suggestions.

We define a header that includes a 4-byte signature, the header version, the data size, and (for good measure) a hash or some other way to confirm the data content. (We use RSA signatures, but in most cases, an MD5 or SHA-256 hash would be enough.) This makes it easy to split the incoming data, and we know when the data is complete for processing. If the hash doesn’t match, we close the connection instantly.

In the future, if you have to add more to the header, the header version will let you continue to process older versions, if that’s a thing you need to worry about.
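A header along those lines might look like the sketch below. It is in Python, not Xojo, and the field widths, the `MSG1` signature value, and the function names are all hypothetical; the actual layout used in the post above is not given. It uses a SHA-256 hash rather than an RSA signature.

```python
import hashlib
import struct

SIGNATURE = b"MSG1"  # hypothetical 4-byte signature
VERSION = 1          # hypothetical header version
# Assumed layout, big-endian, no padding:
# signature (4 bytes), version (2), payload size (4), SHA-256 digest (32)
HEADER = struct.Struct(">4sHI32s")


def build_message(payload: bytes) -> bytes:
    """Prefix the payload with signature, version, size, and hash."""
    digest = hashlib.sha256(payload).digest()
    return HEADER.pack(SIGNATURE, VERSION, len(payload), digest) + payload


def parse_message(data: bytes) -> bytes:
    """Validate one complete message and return its payload.

    Raises ValueError on any mismatch; the caller would then close
    the connection.
    """
    if len(data) < HEADER.size:
        raise ValueError("incomplete header")
    signature, version, size, digest = HEADER.unpack_from(data)
    if signature != SIGNATURE:
        raise ValueError("bad signature")
    if version != VERSION:
        raise ValueError("unsupported header version")
    payload = data[HEADER.size:HEADER.size + size]
    if len(payload) != size:
        raise ValueError("incomplete payload")
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("hash mismatch")
    return payload


message = build_message(b"some text")
print(parse_message(message))
```

Supporting an older header format would mean branching on `version` before unpacking the remaining fields, instead of rejecting anything that is not the current version.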