WebApp struggling under load

I have a WebApp which I use for collecting survey data, which has worked fine for a few years now, handling moderate loads of between 1 and 50 simultaneous connections).

Feeling pretty comfortable with it, I decided to try updating it to 2016R1, and it seemed OK in debug tests. Foolishly (in hindsight) I put this into production, and it immediately fell apart : CPU spiking to 100% and barely handling a dozen connections at once.

I took samples of the app under load and killed it. I recompiled in an older version of Xojo (2015 R4.1) but saw basically the same issues. I need to check my notes, but I think the earlier versions that handled high load better was possibly made with 2014 R2.1?

Anyone ran into similar issues: recompiling a web app with newer version of the IDE is giving much worse load-handling ability?

Here’s a sample of the app when it was getting hammered with about 40 connections and was spiking at 100% CPU. Seems like it’s hung up inside a socket trying to send data:

 + !                     958 RuntimeDoEvents  (in rbframework.dylib) + 20  [0x6b2faf]
    + !                     : 958 RuntimeTextFromOldString  (in rbframework.dylib) + 5415  [0x6a025f]
    + !                     :   521 RuntimeTextFromOldString  (in rbframework.dylib) + 5461  [0x6a028d]
    + !                     :   | 521 IPCSocketBytesLeftToSend  (in rbframework.dylib) + 2039  [0x5d00e3]
    + !                     :   |   521 ApplicationAddMenuHandler  (in rbframework.dylib) + 629  [0x6b451c]
    + !                     :   |     520 _HTTPServer.HTTPRequestSocket._Poll%%o<_HTTPServer.HTTPRequestSocket>  (in Curve) + 269  [0x9f1b1]
    + !                     :   |     + 519 SSLSocket._Poll%%o<SSLSocket>  (in Curve) + 48  [0x67675]
    + !                     :   |     + ! 513 REALPluginMain  (in SSLSocket.dylib) + 5400  [0x20d79db]
    + !                     :   |     + ! : 182 REALPluginMain  (in SSLSocket.dylib) + 3946  [0x20d742d]
    + !                     :   |     + ! : | 177 REALPluginMain  (in SSLSocket.dylib) + 11321  [0x20d90fc]
    + !                     :   |     + ! : | + 176 ???  (in SSLSocket.dylib)  load address 0x20d5000 + 0x1474  [0x20d6474]
    + !                     :   |     + ! : | + ! 150 VirtVolFlush  (in rbframework.dylib) + 69499  [0x659d97]
    + !                     :   |     + ! : | + ! : 103 VirtVolFlush  (in rbframework.dylib) + 66588  [0x659238]
    + !                     :   |     + ! : | + ! : | 68 VirtVolFlush  (in rbframework.dylib) + 125764  [0x667960]
    + !                     :   |     + ! : | + ! : | + 63 RandomInt  (in rbframework.dylib) + 4715  [0x6b633c]
    + !                     :   |     + ! : | + ! : | + ! 42 RuntimeOnesCompliment  (in rbframework.dylib) + 11260  [0x5f9310]
    + !                     :   |     + ! : | + ! : | + ! : 25 RandomInt  (in rbframework.dylib) + 3989  [0x6b6066]
    + !                     :   |     + ! : | + ! : | + ! : | 25 FolderItemIterator_Value  (in rbframework.dylib) + 4125  [0x6f9525]
    + !                     :   |     + ! : | + ! : | + ! : |   20 free  (in libsystem_malloc.dylib) + 301  [0x946c9ca9]
    + !                     :   |     + ! : | + ! : | + ! : |   + 9 szone_free_definite_size  (in libsystem_malloc.dylib) + 659,87,...  [0x946ca2b5,0x946ca079,...]
    + !                     :   |     + ! : | + ! : | + ! : |   + 5 szone_free_definite_size  (in libsystem_malloc.dylib) + 495  [0x946ca211]
    + !                     :   |     + ! : | + ! : | + ! : |   + ! 5 _os_lock_spin_lock  (in libsystem_platform.dylib) + 21,0,...  [0x92c38e83,0x92c38e6e,...]
    + !                     :   |     + ! : | + ! : | + ! : |   + 3 os_lock_unlock  (in libsystem_platform.dylib) + 9  [0x92c37948]
    + !                     :   |     + ! : | + ! : | + ! : |   + 3 szone_free_definite_size  (in libsystem_malloc.dylib) + 1356  [0x946ca56e]
    + !                     :   |     + ! : | + ! : | + ! : |   +   3 tiny_free_list_add_ptr  (in libsystem_malloc.dylib) + 156,60,...  [0x946cb621,0x946cb5c1,...]
    + !                     :   |     + ! : | + ! : | + ! : |   4 free  (in libsystem_malloc.dylib) + 60  [0x946c9bb8]
    + !                     :   |     + ! : | + ! : | + ! : |   + 4 szone_size  (in libsystem_malloc.dylib) + 0,183,...  [0x946c9d29,0x946c9de0,...]
    + !                     :   |     + ! : | + ! : | + ! : |   1 free  (in libsystem_malloc.dylib) + 0  [0x946c9b7c]
    + !                     :   |     + ! : | + ! : | + ! : 16 RandomInt  (in rbframework.dylib) + 4024  [0x6b6089]
    + !                     :   |     + ! : | + ! : | + ! : | 9 FolderItemIterator_Value  (in rbframework.dylib) + 1373  [0x6f8a65]
    + !                     :   |     + ! : | + ! : | + ! : | + 9 FolderItemIterator_Value  (in rbframework.dylib) + 1256  [0x6f89f0]
    + !                     :   |     + ! : | + ! : | + ! : | +   8 malloc  (in libsystem_malloc.dylib) + 52  [0x946c717e]
    + !                     :   |     + ! : | + ! : | + ! : | +   ! 8 malloc_zone_malloc  (in libsystem_malloc.dylib) + 75  [0x946c838c]
    + !                     :   |     + ! : | + ! : | + ! : | +   !   8 szone_malloc  (in libsystem_malloc.dylib) + 24  [0x946c83e1]
    + !                     :   |     + ! : | + ! : | + ! : | +   !     3 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 18,61,...  [0x946c83f9,0x946c8424,...]
    + !                     :   |     + ! : | + ! : | + ! : | +   !     2 os_lock_lock  (in libsystem_platform.dylib) + 9  [0x92c3795d]
    + !                     :   |     + ! : | + ! : | + ! : | +   !     2 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 102  [0x946c844d]
    + !                     :   |     + ! : | + ! : | + ! : | +   !     : 2 _os_lock_spin_lock  (in libsystem_platform.dylib) + 21  [0x92c38e83]
    + !                     :   |     + ! : | + ! : | + ! : | +   !     1 os_lock_unlock  (in libsystem_platform.dylib) + 9  [0x92c37948]
    + !                     :   |     + ! : | + ! : | + ! : | +   1 malloc  (in libsystem_malloc.dylib) + 23  [0x946c7161]
    + !                     :   |     + ! : | + ! : | + ! : | 4 FolderItemIterator_Value  (in rbframework.dylib) + 1409  [0x6f8a89]
    + !                     :   |     + ! : | + ! : | + ! : | + 2 RandomInt  (in rbframework.dylib) + 3548  [0x6b5ead]
    + !                     :   |     + ! : | + ! : | + ! : | + ! 2 _platform_memmove$VARIANT$sse3x  (in libsystem_platform.dylib) + 65,42  [0x92c391f1,0x92c391da]
    + !                     :   |     + ! : | + ! : | + ! : | + 2 RandomInt  (in rbframework.dylib) + 3523,3551  [0x6b5e94,0x6b5eb0]
    + !                     :   |     + ! : | + ! : | + ! : | 2 FolderItemIterator_Value  (in rbframework.dylib) + 1420  [0x6f8a94]
    + !                     :   |     + ! : | + ! : | + ! : | + 2 RandomInt  (in rbframework.dylib) + 3633  [0x6b5f02]
    + !                     :   |     + ! : | + ! : | + ! : | 1 FolderItemIterator_Value  (in rbframework.dylib) + 1409  [0x6f8a89]
    + !                     :   |     + ! : | + ! : | + ! : 1 RandomInt  (in rbframework.dylib) + 4001  [0x6b6072]
    + !                     :   |     + ! : | + ! : | + ! :   1 FolderItemIterator_Value  (in rbframework.dylib) + 29142  [0x6ff6de]

Here’s a sample of the app now sitting idle (with zero connections) yet still stuck at 30% CPU - looks very similar like it’s also hung up in socket sending data:

Call graph:
    2363 Thread_1328236   DispatchQueue_1: com.apple.main-thread  (serial)
    + 2272 start  (in Curve) + 53  [0x329b64]
    + ! 2272 main  (in Curve) + 36  [0x32702c]
    + !   2272 _Main  (in Curve) + 257  [0x327566]
    + !     2272 REALbasic._RuntimeRun  (in Curve) + 34  [0x1c6a37]
    + !       2272 RuntimeRun  (in rbframework.dylib) + 58  [0x64d04e]
    + !         2272 RuntimeTextFromOldString  (in rbframework.dylib) + 5330  [0x6af23a]
    + !           2272 ConsoleApplication._CallFunctionWithExceptionHandling%%o<ConsoleApplication>p  (in Curve) + 248  [0x2b8c5]
    + !             2272 Delegate.Invoke%%  (in Curve) + 34  [0x24ea96]
    + !               2272 RuntimeGetStdErr  (in rbframework.dylib) + 467  [0x5e6591]
    + !                 2272 WebApplication.Event_Run%i4%o<WebApplication>A1s  (in Curve) + 11556  [0xc6ac0]
    + !                   2272 ConsoleApplication.DoEvents%%o<ConsoleApplication>i4  (in Curve) + 44  [0x2b693]
    + !                     2195 RuntimeDoEvents  (in rbframework.dylib) + 20  [0x6c1fdf]
    + !                     : 2195 RuntimeTextFromOldString  (in rbframework.dylib) + 5415  [0x6af28f]
    + !                     :   1097 RuntimeTextFromOldString  (in rbframework.dylib) + 5451  [0x6af2b3]
    + !                     :   | 1097 UInt64Parse  (in rbframework.dylib) + 6898  [0x6d5cd2]
    + !                     :   |   1088 IPCSocketBytesLeftToSend  (in rbframework.dylib) + 2143  [0x5df1fb]
    + !                     :   |   + 1088 IPCSocketBytesLeftToSend  (in rbframework.dylib) + 2039  [0x5df193]
    + !                     :   |   +   1088 ApplicationAddMenuHandler  (in rbframework.dylib) + 629  [0x6c354c]
    + !                     :   |   +     1088 _HTTPServer.HTTPRequestSocket._Poll%%o<_HTTPServer.HTTPRequestSocket>  (in Curve) + 269  [0x9e97b]
    + !                     :   |   +       1088 SSLSocket._Poll%%o<SSLSocket>  (in Curve) + 48  [0x66e2e]
    + !                     :   |   +         1063 REALPluginMain  (in SSLSocket.dylib) + 5400  [0x20dd9db]
    + !                     :   |   +         ! 379 REALPluginMain  (in SSLSocket.dylib) + 3890  [0x20dd3f5]
    + !                     :   |   +         ! : 377 REALPluginMain  (in SSLSocket.dylib) + 11260  [0x20df0bf]
    + !                     :   |   +         ! : | 375 ???  (in SSLSocket.dylib)  load address 0x20db000 + 0x1474  [0x20dc474]
    + !                     :   |   +         ! : | + 300 VirtVolFlush  (in rbframework.dylib) + 69499  [0x668e27]
    + !                     :   |   +         ! : | + ! 218 VirtVolFlush  (in rbframework.dylib) + 66588  [0x6682c8]
    + !                     :   |   +         ! : | + ! : 160 VirtVolFlush  (in rbframework.dylib) + 125764  [0x6769f0]
    + !                     :   |   +         ! : | + ! : | 149 RandomInt  (in rbframework.dylib) + 4715  [0x6c536c]
    + !                     :   |   +         ! : | + ! : | + 92 RuntimeOnesCompliment  (in rbframework.dylib) + 11260  [0x6083a0]
    + !                     :   |   +         ! : | + ! : | + ! 46 RandomInt  (in rbframework.dylib) + 4024  [0x6c50b9]
    + !                     :   |   +         ! : | + ! : | + ! : 38 FolderItemIterator_Value  (in rbframework.dylib) + 1387  [0x706d95]
    + !                     :   |   +         ! : | + ! : | + ! : | 35 FolderItemIterator_Value  (in rbframework.dylib) + 1270  [0x706d20]
    + !                     :   |   +         ! : | + ! : | + ! : | + 32 malloc  (in libsystem_malloc.dylib) + 52  [0x946c717e]
    + !                     :   |   +         ! : | + ! : | + ! : | + ! 21 malloc_zone_malloc  (in libsystem_malloc.dylib) + 75  [0x946c838c]
    + !                     :   |   +         ! : | + ! : | + ! : | + ! : 20 szone_malloc  (in libsystem_malloc.dylib) + 24  [0x946c83e1]
    + !                     :   |   +         ! : | + ! : | + ! : | + ! : | 14 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 61,2158,...  [0x946c8424,0x946c8c55,...]
    + !                     :   |   +         ! : | + ! : | + ! : | + ! : | 4 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 102  [0x946c844d]
    + !                     :   |   +         ! : | + ! : | + ! : | + ! : | + 4 _os_lock_spin_lock  (in libsystem_platform.dylib) + 21  [0x92c38e83]
    + !                     :   |   +         ! : | + ! : | + ! : | + ! : | 1 DYLD-STUB$$os_lock_unlock  (in libsystem_malloc.dylib) + 0  [0x946dcc76]
    + !                     :   |   +         ! : | + ! : | + ! : | + ! : | 1 os_lock_lock  (in libsystem_platform.dylib) + 9  [0x92c3795d]
    + !                     :   |   +         ! : | + ! : | + ! : | + ! : 1 szone_malloc  (in libsystem_malloc.dylib) + 0  [0x946c83c9]
    + !                     :   |   +         ! : | + ! : | + ! : | + ! 11 malloc_zone_malloc  (in libsystem_malloc.dylib) + 4,14,...  [0x946c8345,0x946c834f,...]
    + !                     :   |   +         ! : | + ! : | + ! : | + 3 malloc  (in libsystem_malloc.dylib) + 75,52,...  [0x946c7195,0x946c717e,...]
    + !                     :   |   +         ! : | + ! : | + ! : | 2 FolderItemIterator_Value  (in rbframework.dylib) + 1282  [0x706d2c]
    + !                     :   |   +         ! : | + ! : | + ! : | 1 DYLD-STUB$$malloc  (in rbframework.dylib) + 0  [0x8c17c6]
    + !                     :   |   +         ! : | + ! : | + ! : 5 FolderItemIterator_Value  (in rbframework.dylib) + 1423  [0x706db9]
    + !                     :   |   +         ! : | + ! : | + ! : | 5 RandomInt  (in rbframework.dylib) + 3548  [0x6c4edd]
    + !                     :   |   +         ! : | + ! : | + ! : |   5 _platform_memmove$VARIANT$sse3x  (in libsystem_platform.dylib) + 6,65,...  

Checking my notes, it looks like the version that was working well was compiled with 2015R3.1.

2015R4.1 and later (including 2016R1) don’t seem to do well under even mild loads.

Bump: I think this is not alpha or beta-related so I moved this into the normal Web channel.

To summarize: I believe that WebApps built with older IDEs (such as 2015R3.1) perform much better under load than newer ones (2015R4.1 and later). It seems like possibly two separate issues:

  • Leftover sessions chewing CPU
  • Slow performance in socket sends (see samples above).

Has anyone else had this issue?

I have an opportunity today to re-test the 2015R3.1 version under load, so I’ll report back as to whether or not I’m correct about the older versions being more robust.

I have problems with SMTPSecureSockets in R4 and not R3 so I had to go back to that version. I really can’t upgrade until that bug is fixed as most of my projects send emails.

However, in your trace I see VirtVol Flush. I remember something about virtual volumes from the RealBasic days. I guess it’s still in there. Maybe it has something to do with that.

We need some kind of a test project or something for WebApps and other common functions. It seems that there is often a random network issue that comes up in new versions.

I ran my “standardized*” test against the 2015R3.1 build, and although it appeared to do a little better, it still fell apart, and is now sitting with zero live connections but using 70% cpu and a very similar stack trace.

I’m pretty sure this used to work just fine, but then something changed. Of course, I changed a few things at once:

  • New version of Xojo
  • Using SSL rather than not
  • New version of OS X (10.9 -> 10.10)

Your comment about SecureSockets is interesting. I am not using any of them in the app, but the whole web app is invoked with the “–secureport=xxxx” option.

So maybe the problem is not with the Xojo IDE version but running using SSL? I will do more tests…

As for VirtualVolume: I’m not using them in the app at all, so not clear why that ends up in the stack trace. Maybe WebFile uses them internally?

  • Where I ask a classroom of 45 people on computers to all hit the app at once.

I like using haproxy to handle SSL, and it’s also very easy to setup load balancing with it. I can’t remember who it was, but someone on the forum wrote up how to set it up. A search will probably turn it up.

[quote]Your comment about SecureSockets is interesting. I am not using any of them in the app, but the whole web app is invoked with the “–secureport=xxxx” option.

So maybe the problem is not with the Xojo IDE version but running using SSL? I will do more tests…[/quote]
All sockets in web apps are derived from SSLSocket, some with Secure=True, some with False.

Nope. My guess is this is the closest offset it was able to find in Folderitem.

That’s asking for trouble without a load balancer of some kind. When we did the first XDC app, we got hit by ~1000 users in the first 3 minutes and it brought the whole app down. Last year we used a load balancer to several instances on the same server and the problem was abated.

People ask me about this all the time. I have written about it in a few places but the info is a little dated now.

Start here http://john-joyce.com/xojo-and-load-balancing-with-haproxy/
There are links in there that have more info as well.

Also search for HAproxy in the forum and you will find the conversations. You may have to dig around.
https://forum.xojo.com/conversations/all?search=haproxy

I will do my best to help you if you need help.

Offloading SSL makes a big difference and balancing multiple instances makes an even bigger one. HAproxy is free and easy to set up and you don’t need any additional hardware - just maximizing the hardware you have.

Well, the thing is I’ve used this same app for a couple years and it’s always done OK with handling 40-50 users - the app itself degraded gracefully, just becoming slower, but speeding up once the load dropped off.

Behavior now is not the same : the app actually stops serving content and seems to get stuck with running threads even after all connections are terminated.

I’ll bet you are surpassing the maximum of what you can handle now that you turned SSL on. I’ll bet turning SSL back off (or handing it off to HAProxy) returns you to your prior performance. If there is absolutely no free overhead, the actual process of winding down dead sessions and threads itself could be impacted.

I’ve just discovered something really odd. Testing the app using only a few (from 1 to 4) simultaneous connections and watching Activity monitor, every time there is data being transmitted, mds_stores starts burning cpu, spiking to 50% or higher. That doesn’t seem right to me, and I wonder if that could be affecting performance.

I do have spotlight indexing turned on, and I am using two kinds of logging in the web app:

  • the app is launched with the --logging flag
  • I’m using a simple logging system in the app:
  // does rudimentary logging
  static logFile as FolderItem
  static tos as TextOutputStream
    
  if logFile = nil then
    logFile = GetSpecialFolder("logs", "events.log")
    tos = TextOutputStream.Append(logFile)
  end if
  
  dim d as new date
  
  tos.WriteLine("[" + d.SQLDateTime + "]" + " " + msg)
  tos.Flush  // so log updates in real time

I wonder if either of these logging systems is getting caught up with Spotlight indexing to cause a big slowdown? I’ll try some tests to narrow it down…

More info:

  • Disabling Spotlight indexing for the entire web app folder (including both Xojo’s log, and my own log file) completely resolves the issue with mds_stores burning 100% cpu.
  • Disabling Spotlight indexing for only the folder containing my log: doesn’t help.

Conclusion:
The --logging flag causes a serious performance hit when the webApp’s log file (output.log) is subject to Spotlight indexing.

I don’t know how much of the overall performance issues I’m having was due to this issue, but it surely doesn’t help things. Submitted as <https://xojo.com/issue/42632>

Could be. I will do another test with both Spotlight indexing off, and just using the regular (non SSL) version of the webApp and report back on Tuesday.

Update: though I’m not able to do the full load test with 45 people, I was able to do a small test running 4 simultaneous sessions (with 3 of them over a slow Tor network link to simulate low bandwidth / high latency connections). Even with this small test it’s apparent that performance is much better - the web apps load much more quickly in the browser AND the cpu usage of the web app on the server is extremely low (hovering below 5% even when serving 4 sessions).

So this is very promising, and suggests that the combination of SpotlightIndexing and SSL was causing a big performance hit.

Travis, I discovered a third possible problem: My server is behind a Dual WAN configuration, and I had a mistake in the firewall settings, which meant that if the server tried to make a socket to contact the client, sometimes it would go out over WAN IP #1, sometimes over WAN IP #2. I wonder if this behavior might have caused an unusually high # of sockets to get stuck and fail to connect, which could have led to additional performance problems?

Problem solved - I’m not sure which of these issues was the biggest issue, but after fixing all three the performance is literally 100x faster:

  • Turned off Spotlight Indexing for the web app’s folder (since the combination of the “–logging” flag and Spotlight Indexing caused a performance hit).
  • Turned off SSL, using only normal sockets
  • Set my firewall rules to ensure that all communication was happening on a single WAN circuit (I have a dual-WAN router that was misconfigured).