Multiple SSLSocket I/O in Threads: the slowest connection universally sets the pace for all connections

Yes, I’m aware of that. If the threading model plays a role in this case, I guess it’s for more complicated reasons. When multiple threads (I tried up to 5) are downloading a large file over a fast link, performance does noticeably degrade, but it’s still not that bad.
The problem is when one of the connections is over a slow link: that seems to affect all active transfers at that time.
The other thing I’ve noticed is that performance is slightly better when the socket threads have the lowest priority. In my example they are all assigned a priority of 1 in their constructor.
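For reference, the assignment happens in the thread subclass’s constructor, roughly like this (a minimal sketch, not the full constructor):

Sub Constructor(socket As SSLSocket)
  SocketRef = socket  // keep a reference to the accepted connection
  Me.Priority = 1     // lowest priority, as mentioned above
End Sub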

Something’s not right here. That code you posted above… is that being called from inside the thread?

Workers wouldn’t help because:

  1. Workers are Desktop-only. The server is a service application, as servers usually are.
  2. Even if Workers were available for console applications (which would be great), I don’t think the main application’s ServerSocket would be able to hand a new connection over to an SSLSocket running in the helper application. I haven’t tried it, but I’d instinctively say it can’t.

But generally, as an architecture, it’s not bad at all. PostgreSQL does this: fires up a new separate process for every session and lets the OS handle allocation of CPU resources.

What about forking?

The main application wouldn’t be handling the connections at all; you’d fire up the worker, give it the URL, and the worker (or fork) would then return the result.

The thread calls the following routing method, passing it a reference to itself:

Public Sub RouteRequest(WorkerThread As ipsc_ConnectionThread)

  Select Case WorkerThread.SocketRef.RequestPath.NthField("/", 2).Lowercase

  Case "files"
    Dim files As New endpoint_files(WorkerThread, RootFolder)

  Case "folders"
    Dim folders As New endpoint_folders(WorkerThread, RootFolder)

  Case "opensockets" // just for debugging, method is irrelevant
    Dim sockets() As TCPSocket = Server.ActiveConnections
    Dim socketHandles As String = "Active socket handles at " + DateTime.Now.SQLDateTime + EndOfLine + EndOfLine
    For i As Integer = 0 To sockets.Ubound
      socketHandles = socketHandles + sockets(i).Handle.ToString + EndOfLine
    Next i
    WorkerThread.SocketRef.RespondOK(socketHandles)

  Else
    WorkerThread.SocketRef.RespondInError(501) // not implemented

  End Select

End Sub

The actual processing happens in the endpoint_* classes, which are also passed a reference to the thread. And the thread holds a reference to the socket.
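To illustrate the shape (a hypothetical outline, not the actual class), an endpoint_* constructor would be along these lines:

// Hypothetical outline of an endpoint_* class constructor
Sub Constructor(WorkerThread As ipsc_ConnectionThread, RootFolder As FolderItem)
  // WorkerThread.SocketRef reaches the SSLSocket for this request
  // ...resolve the requested path under RootFolder and stream the response...
End Sub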

So it’s methods calling other methods with the starting point being the thread’s Run event.
That’s not a source of problems, right?

Ok, I’ve read and re-read your code, and I think you’re working against yourself. My guess is that your code itself is what’s slowing you down.

If this were me…

  1. Subclass thread
  2. Have a property on the thread for the file being downloaded
  3. Have the code above in the Run event (or a method called from run) which does this:
While Not stream.EndOfFile
  SocketRef.Write(stream.Read(ipsc_Lib.SocketChunkSize * n))
  Self.Sleep(10)
Wend
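Put together, those three pieces might look something like this (a rough sketch; File and SocketRef would be properties of the thread subclass, set before Run fires, and n is just a placeholder multiplier):

// Hypothetical thread subclass; File As FolderItem and SocketRef As SSLSocket
// are properties assigned by whoever starts the thread
Sub Run()
  Const n = 4 // example multiplier
  Dim stream As BinaryStream = BinaryStream.Open(File)
  While Not stream.EndOfFile
    SocketRef.Write(stream.Read(ipsc_Lib.SocketChunkSize * n))
    Self.Sleep(10) // yield so other threads and sockets get time
  Wend
  stream.Close
End Sub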

Yes, ultimately the data may be sent in a single chunk, but that’s up to the negotiation between the socket and the client. Calling Flush actually costs you speed, because all you’re doing is making the whole process stop while that one socket sends its data.

IMHO, the only reason for using Flush is if the client really needs to stream the data… like audio or video.

On putting the thread to sleep: this is among the first things I tried. While doing so, I was getting the exact same data transfer rate I would get if I did the whole thing in the Web Framework’s HandleURL event: ~4MB/sec. I wanted more, and I got more with the way I did it: >100MB/sec.

On chopping the data by calling Flush often: it makes the server more responsive when handling concurrent requests. If I don’t chop, especially in the case of large files (or large datasets in general), a new request isn’t handled until the current one finishes.

I can try your recommendations again and get back with the results, but the code you see is the result of a long process of trial and error.

Sam, I’m not sure I understand what you mean. It’s a server; it’s listening for incoming connections. It doesn’t have any URL to follow: that’s what the client does.
Am I missing something?

It was just an idea, but if it would / could work, I’d imagine that the main app would launch an instance of the worker, which listens for a connection.

When the worker receives a connection, it notifies the main application, which launches another instance of the worker, which in turn starts listening for the next connection.

It’s important to remember that sockets in general have buffers for incoming and outgoing data. That is, for incoming data, socket data is queued into a buffer before the DataAvailable event fires and is held there until you actually read from the socket. This is why you can use Lookahead and then read the data later.
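For example, a DataAvailable handler that peeks before reading might look like this (an illustrative sketch only):

Sub DataAvailable()
  Dim peeked As String = Me.Lookahead    // peek; the data stays in the buffer
  If peeked.IndexOf(EndOfLine) > -1 Then // wait until a complete line has arrived
    Dim request As String = Me.ReadAll   // now actually drain the buffer
    // ...parse and handle the request...
  End If
End Sub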

As for the outgoing buffer: any time you write data, it is also put into a buffer and fed to the client at a rate the client can handle, which keeps one socket from bogging down other sockets and threads. Calling Flush may make a single socket perform better, but it also pauses the rest of the app while doing so, and performance for the rest of the users will suffer.

Remember, sockets all run their code on the main thread, regardless of where you write to them, so they’re all sharing the main thread’s time. That main thread is also responsible for keeping everything else working cooperatively, including the app itself.

So anyway yes, you might get individual sockets to get 100Mbps locally, but once you have multiple sockets running, you’ll end up being limited by the slowest connection the way you’re doing it. Allowing the socket to send data off as it sees fit will get you much better concurrent performance overall.

…at least that’s what we found when updating the web framework for web 2.

And no, Workers don’t help you here. There’s currently no way to pass a socket generated by a ServerSocket off to a Worker so it can run independently of the main app.

This is a place where load balancing your app is really helpful.

On the subject of Flush, @Greg_O_Lone, I ended up with my own version, which goes like this:


While True

  Me.Poll // get the latest status from the socket

  If Not Me.IsConnected Then Return False
  If Me.LastErrorCode > 0 Then Return False

  If Me.BytesLeftToSend <= 0 Then Return True

Wend

Is this a waste of time or are there any other downsides? I have, or may have at times, multiple threads doing SSLSocket I/O.
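A possible variant (an untested sketch, assuming the loop runs inside a Thread) would be to sleep between polls, so the wait doesn’t monopolize the cooperative scheduler:

While True
  Me.Poll // get the latest status from the socket
  If Not Me.IsConnected Then Return False
  If Me.LastErrorCode > 0 Then Return False
  If Me.BytesLeftToSend <= 0 Then Return True
  If Thread.Current <> Nil Then Thread.Current.Sleep(5) // yield to other threads
Wend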

I would mark this post as the answer:


Okay, first of all, Greg thank you for taking the time, I appreciate it!
I tried the minimal setup you suggested. The code in the thread’s Run event has been updated in the GitHub repository and is as follows:

While Not stream.EndOfFile

  SocketRef.Write(stream.Read(ipsc_Lib.SocketChunkSize * n)) // n = 4, SocketChunkSize = 1048576

  If endpoint.IndexOf("flush") > 0 Then
    SocketRef.Flush
  End If

  Self.Sleep(10)

Wend

As you can see, this code allows for two modes of operation: one with Flush invoked and one without.
I ran these 7 test scenarios, all trying to download a 2GB file using Mozilla Firefox as the client:

  1. No flush, 1 local client: 4MB/sec (same throughput I’d get with Web1 or Web2 2021R2 and much better than the 700kb/sec when Web2 was first introduced, thanks for fixing that!)
  2. No flush, 4 local clients: all run at 4MB/sec
  3. No flush, 4 local clients, 2 remote clients over slow wifi: locals run at a steady 4MB/sec, remotes run at 1.5MB/sec. The issue of “SYnchronized THroughputs” does not manifest. (Shall we call it SYTH for short?)
  4. With flush, 1 local client: 100 MB/sec. Proud of Xojo!
  5. With flush, 5 local clients: declining from 50MB/sec to ~25MB/sec. Very proud of Xojo!
  6. With flush, 1 local client, 3 remote clients over gigabit ethernet: local at 40MB/sec and declining, remotes at ~30MB/sec. Enthusiastic about Xojo!
  7. With flush, 1 local client, 1 remote over slow wifi: local at 2MB/sec, remote at 2MB/sec. Clearly a bad case of SYTH.

Now, there are two ways of looking at the experimental data:

  1. I’m doing things wrong: If I don’t want any SYTH, I should not Flush. Period.
    The problem in this case is that I’m really not happy with the flat throughput of 4MB/sec. And it makes people who accuse Xojo of being a “toy language” happy. I’m not happy with that either.

  2. I’m not doing things wrong: and I’m getting tangible results I can rub in the face of Xojo detractors: Xojo can prove somewhat acceptably performant in the server game too.
    But deep inside the Framework, there’s something that’s causing SYTH. And it might be worth having someone take a closer look into it, because it’s going to make Xojo a better product.

The outline you presented (and Rick A suggests I mark as the answer) looks to me like a solid explanation of how performance degrades as connections increase, which is neither surprising nor, in my view, a problem: that’s life, and that’s what load balancers are for, as you correctly pointed out.
But, to my understanding at least, it does not account for SYTH: 5 clients over a fast, uniform link run beautifully at 25MB/sec, while 2 clients over links of different bandwidth both run 10 times slower. This is something I still find strange, and had Xojo been my product, I’d be curious to find out why.

Anyway, I think I can live with the possibility of SYTH much better than I can live with the certainty of 4MB/sec over optical fiber.
Shall I file a bug report just for the formality of it?


Slow transfer rates cause slow services and slow events. Any thread in a “busy wait” condition, not sleeping, causes a “global wait”, so the entire system degrades. So any time you add a slow transfer to the pool, that slow operation affects the whole system and the entire pool degrades.

I can’t see a cure under the current cooperative threading design. So it doesn’t need a bug report: it is “by design”, and a report wouldn’t trigger any action, as this is an already known fact, and requests for a preemptive model have been made in the past with no results.
A way to mitigate the effect would be heavy use of features that the Xojo framework and the language currently lack, like Futures, Async, and Await.

Now, that’s a potential explanation!
Then my question to Greg becomes: does the root of the SYTH evil lie in the threading architecture?

If you re-read just that part of the explanation he wrote, don’t you conclude that it does?

I’m curious… how are you measuring throughput?

The numbers I’m mentioning are per download. When I’m using Firefox, it’s what it reports in the active downloads status. I consider it valid once it has more or less stabilized around a certain level after a few refreshes.
In some other cases, I’m using another Xojo-made tool that downloads via URLConnection. There, I’m getting the duration of the download in milliseconds.
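For reference, converting that duration to a rate is just bytes over elapsed time, along these lines (a sketch; conn and url are placeholders):

Dim startMs As Double = System.Microseconds / 1000.0
Dim data As String = conn.SendSync("GET", url, 0) // blocking download via URLConnection
Dim elapsedMs As Double = (System.Microseconds / 1000.0) - startMs
Dim mbPerSec As Double = (data.Bytes / 1048576.0) / (elapsedMs / 1000.0)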

Are my figures dissimilar to the ones you’re seeing?

No. I just wanted to make sure you weren’t doing anything that would have skewed the results. Browsers will sometimes report speeds based on uncompressed sizes even when the content is compressed, so the reported download speeds can be falsely inflated. So I was just checking.