UDPSocket Speed/Latency questions

Hey guys,

If you’ve read any of my posts in the last week or two, you’ll know I’ve been working on an application using TFTP. I managed to get some Xojo code for a TFTP server from a developer I found online. He was gracious enough to give me his source code, and I’ve been playing with it today. In general it seems pretty well written and makes sense. I’m only testing it on Windows, as OS X has a built-in TFTP server and pretty much locks out port 69, which TFTP uses.

I’m seeing some strange behavior, though, that I don’t quite understand. As written, his code transfers the data really, really slowly. Something that should take seconds takes minutes. However, if, in the method where he writes the datagram to the UDP socket, I also update a label in the app window with the current block count, the transfer runs dramatically faster. Something that took minutes before now takes about 7 seconds, which is what I’d expect - still a tiny bit slower than on the Mac, but not bad - doable.

So why would updating the UI from within socket operations end up speeding things up? Somewhere his code is getting bogged down, and the call to a UI function is helping to relieve that. I’m trying to understand what mechanism in the sockets is doing this. I don’t want to have to update my UI with the block count in my app, so there must be something else I can do that loosens up the bottleneck. And I think it could probably be optimized further. I’m just not sure what, as his code looks pretty good to me. I don’t see why it runs slow unless there’s something in the framework and how its sockets operate that is just making it slow…

Ideas? I don’t think this developer is active here and I am not comfortable spreading his source code around w/o his permission…

Thanks for any ideas…

I’ve got some more info about my issue here that should distill this down somewhat and generalizes the question some more.

So I’ve been working on this latency issue. In order to see how long each step in the packet transfer process is taking, I’ve started calculating the microseconds that it takes for each aspect of the packet transfer.

1.) When a packet is sent and the Write method of the UDP socket is called, I get the current value of Microseconds. I store that in an array property of the UDPSocket subclass.
2.) When the SendComplete event fires, I get the current value of Microseconds and subtract from it the value stored in step one. I store that in an array.
3.) TFTP requires the receiving host to send back an ACK packet. So in the DataAvailable event of the socket, I get the current value of Microseconds when the ACK packet is received. I then calculate the time from the Write call to the ACK packet’s reception, and I store this in an array.
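For anyone outside Xojo, the same three timestamps can be sketched with a plain UDP socket in Python. This is just an illustration under my own assumptions: a hypothetical loopback peer stands in for the receiving host, and `perf_counter` replaces Xojo’s Microseconds:

```python
import socket
import time

# Hypothetical loopback peer that ACKs each datagram, standing in for the
# receiving TFTP host so the round trip can be timed in one process.
peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer.bind(("127.0.0.1", 0))            # OS picks a free port

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

rtts_us = []
for block in range(1, 6):              # time a handful of 512-byte blocks
    t_write = time.perf_counter()      # step 1: timestamp at the write
    sender.sendto(bytes(512), peer.getsockname())

    data, src = peer.recvfrom(2048)    # peer receives the block...
    peer.sendto(b"ACK", src)           # ...and sends the ACK back

    ack, _ = sender.recvfrom(2048)     # step 3: the ACK arrives at the sender
    t_ack = time.perf_counter()
    rtts_us.append((t_ack - t_write) * 1_000_000)

print(f"mean write-to-ACK time: {sum(rtts_us) / len(rtts_us):.0f} microseconds")
```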

I then have code that breaks program execution as soon as 150 packets have been processed, so I can look at the values of the transfer times.

I’m doing a remote debug session using Parallels on my MacBookPro. From the Mac side I am connecting to the app and starting the transfer. So the transfer should be VERY fast. I have found some interesting numbers and something I don’t understand:

1.) The time for the SendComplete event to fire is about 30 to 40 microseconds on average.
2.) If I do no UI updates during the transfer process, it takes about 15,000 microseconds for the ACK to be received.
3.) If I update a label in the UI after each packet write, it takes only about 225 microseconds for the ACK to be received!

The protocol itself is event driven and lock-step: a 512-byte packet gets sent, then the code waits for the ACK packet to come back before sending the next 512-byte packet. So there’s no way the buffer is getting overrun or anything.
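That lock-step (stop-and-wait) flow can be sketched in Python, again with a hypothetical loopback peer ACKing each block; at no point is more than one 512-byte datagram in flight, which is why the socket buffers can’t be overrun:

```python
import socket

# Hypothetical loopback peer standing in for the receiving host.
peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer.bind(("127.0.0.1", 0))

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
data = bytes(1536)                         # three 512-byte blocks

blocks_acked = 0
for block, off in enumerate(range(0, len(data), 512), start=1):
    # send exactly one block, tagged with its 16-bit block number
    sender.sendto(block.to_bytes(2, "big") + data[off:off + 512],
                  peer.getsockname())

    pkt, src = peer.recvfrom(2048)         # peer receives the block...
    peer.sendto(pkt[:2], src)              # ...and ACKs its block number

    ack, _ = sender.recvfrom(512)          # wait for the ACK before the
    assert ack == block.to_bytes(2, "big") # next block goes out
    blocks_acked += 1

print("blocks acknowledged:", blocks_acked)
```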

But something is getting stuck in a buffer somewhere on either the transmit or receive end. And a UI update somehow makes this faster. But to have to rely on a UI update to speed up a network transfer is not an acceptable mode of operation!

So does anyone have any idea why the framework is holding on to packets in either the transmit or receive buffers? How can I push them out sooner? The UDPSocket class has no Flush method like TCPSocket does.

What about polling the udp socket in a while loop until the ACK packet is received?

Yeah, I’ve somewhat tried that. I even went so far as to have a ModeMultiple timer with a period of 0 continuously polling the socket. No difference. I’ve tried messing with the period of the timer, and that made no difference either. So I’ll try doing it just during the waiting period…

Well, Scott, you definitely had a point! I think I figured it out.

I created a thread that basically consists of a Do…Loop that continually polls the socket. I start the thread when the session is created and just let it run. BOOM! Transfer times are down to 3 seconds, which is what it should be!

In the past, performance of an app could be greatly improved (at some cost for idle CPU usage) by putting a high-frequency timer on the window that’s showing. Something about that seemed to “tickle” the event loop. Based on your experience that updating the UI helps performance, I would suggest giving this a try.

Well, I have solved my issue - when I start a session, I start a thread that’s nothing more than:

Do
   me.Poll
Loop Until Completed

// after the loop exits, fire a single-shot timer to close the socket
CloseTimer.Mode = Timer.ModeSingle
Where Completed is a Boolean that gets set when the transfer completes. Then we start a timer that closes the socket. The timer is needed because attempting to close the socket from within one of its own events results in the close never happening. It’s kind of strange, but using the timer to do the close was the only way I could make that work.

But now with this constant polling, I get great transfer speeds.

What’s funny is that putting a Thread.Sleep(1) between Do…Loop passes, to sleep the thread for 1 millisecond, causes the transfer speed to tank. There’s definitely some funky buffering going on in the Xojo sockets.

Maybe it would be better to run a timer with a 1 ms period and call Poll there. Otherwise you can easily block the whole app…

If that code is in a thread, then the framework will yield on each iteration of the Do…Loop, so it shouldn’t block the app. However, I bet it’s burning extra CPU by polling in such a tight loop.

Another way to do it would be:

#pragma DisableBackgroundTasks
Do
   me.Poll
   me.Sleep(5) // adjust this value for a good combination of speed/CPU usage
Loop Until Completed
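For comparison, outside Xojo the usual way to wait for the ACK without either busy-polling or a fixed sleep is to block on the socket with select until it becomes readable. A small Python sketch (the function name and timeout are my own invention):

```python
import select
import socket

def wait_for_ack(sock, timeout=1.0):
    """Block until the socket has a datagram to read (the ACK) or the
    timeout expires, without spinning the CPU in a tight poll loop."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if readable:
        return sock.recvfrom(2048)
    return None                      # caller decides whether to retransmit

# loopback demo: send ourselves a fake ACK, then wait for it
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 0))
sock.sendto(b"ACK", sock.getsockname())
result = wait_for_ack(sock)
print(result[0] if result else "timed out")
```

A second call with nothing queued would return None after the timeout instead of burning CPU, which is the behavior the sleep-in-a-loop version approximates.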