Web Load Testing Discussion

Recently I spent some time trying to determine how much load a Xojo Web app could sustain and figured I’d share my journey here with everyone. Let’s start with the setup…

DigitalOcean (DO) Droplet
Debian 12 x86
Premium AMD w/ NVMe SSD
1 GB / 1 AMD CPU
25 GB NVMe SSDs
Not the very bottom of the barrel ($4 as of this writing), but I moved up to the Premium tier ($7) for better specs and throughput
Web apps loaded via Lifeboat with an nginx backend and no load balancing

Testing Location
I’m just doing this from home
MacBook Pro M1 Pro
Gigabit fiber, although it's Wi-Fi to my Mac (802.11ac with a Tx rate of 780 Mbps)
I'm about 100 miles from the DO data center
Be aware that this is not how you'd normally do in-depth load testing, where you'd have many (hundreds or thousands of) geographically dispersed clients adding load to your web app.

Testing Product Selection
There are lots of tools out there, from JMeter to Locust, but ultimately I wanted something I didn't need to invest a lot of time in, even though I knew this would limit my results. I started with hey before I found its successor, oha. The big plus here is that there's a precompiled binary, so you don't have to go off running make or pulling in a bunch of dependencies for Java, Python, etc. Note for anyone who's not a terminal person: both hey and oha download as binaries that won't run on the Mac until you "chmod +x ./oha" to set the executable bit.
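
For anyone who wants to follow along, the whole setup on a Mac is roughly the commands below. The exact release asset name is an assumption on my part, so grab whatever binary matches your machine from the oha GitHub releases page:

curl -L -o oha https://github.com/hatoo/oha/releases/latest/download/oha-aarch64-apple-darwin   # hypothetical asset name, check the releases page
chmod +x ./oha    # set the executable bit so macOS will run it
./oha --version   # quick sanity check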

Testing Methodology
I ended up using the example Eddie's Electronics web app that comes with Xojo. Note that you'll find it within Sample Apps/Eddie's Electronics.

I'm not sure how much caching, if any, DO or nginx does out of the gate, so I wanted to run the test "warm," as this is the most likely real-world scenario. That means I did one test run that was discarded and took the results from a second one.
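
In practice that just means running the identical command twice and only keeping the second set of numbers, along these lines (myserver.com is a placeholder, and --no-tui just keeps the discarded run quiet):

./oha --no-tui -z 60s -c 100 --latency-correction --disable-keepalive https://myserver.com > /dev/null   # warm-up run, results discarded
./oha -z 60s -c 100 --latency-correction --disable-keepalive https://myserver.com                        # measured run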

Of course I was curious, so I ran this test across Xojo 2019r3.2 (Web 1.0), 2023r4, and 2024r1.62274. Be aware that only the latter two (Web 2.0) products are directly comparable, but it's nevertheless interesting to see the contrast with Web 1.0.

Per the oha recommendations, I ran the following command to test things out…

./Downloads/oha -z 60s -c 100 --latency-correction --disable-keepalive https://myserver.com

Note that they also recommend a -q option, but since that caps the request rate (it's queries per second) and I wanted maximum load, I disregarded it.
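
For what it's worth, if you did want a capped rate (say, to simulate a steady trickle of users instead of a flood), the same command with -q would look like this, where 20 is a number I've picked purely for illustration:

./oha -z 60s -c 100 -q 20 --latency-correction --disable-keepalive https://myserver.com   # cap the load at ~20 requests/sec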

Results
Here are the terminal results from each of the runs…


Xojo 2019r3.2 (Web 1.0)


Xojo 2023r4 (Web 2.0)


Xojo 2024r1.62274 (Web 2.0)

Discussion
The first thing to call out is probably the biggest deficiency of the testing: it just hits the home page and doesn't navigate through the interface at all. As a result, the findings aren't going to be conclusive beyond how quickly and reliably the home page loads.

The next thing that stands out is how different Web 1.0 is from Web 2.0. Not only is the size/request drastically lower under 2.0, but the size/sec is higher, which should mean much better efficiency and throughput versus 1.0. The caveat is that the total data size is greater under 2.0, but I think that comes down to the project itself being different from the 1.0 version.

Next, it looks like the requests/sec are about the same across all three tests, although 2023r4 did have slightly more timeouts. Note that "aborted due to deadline" isn't an error per se; it's the number of connections that were cut off once the 60-second test window was up.

It's also interesting to see the majority (75%) of all requests completing in the low 5-second range across all tests, with the slowest and fastest times about the same for all.

Next Steps
So on my side, I was more curious than anything else about how much load 2.0 could take. In this very rudimentary test, it appears that having 50 concurrent users hitting the home page of Eddie's Electronics at the same time is very doable. The next step for me might be to crank up the number of requests and concurrent connections against my own web app to gauge at what point I should move up to a higher tier of DO droplet, add load balancing, etc. At the very least, I now have a baseline to work against.
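
If you want to try something similar, a dumb-but-effective way to find that tipping point is a shell loop that steps up the concurrency. The levels below are arbitrary, and --no-tui keeps the output scriptable:

# step up the concurrency and watch for where requests/sec flattens or errors appear
for c in 10 25 50 100 200; do
  echo "=== $c concurrent connections ==="
  ./oha --no-tui -z 60s -c $c --latency-correction --disable-keepalive https://myserver.com
done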


This is really good stuff.

As a follow-up, @Ricardo_Cruz has some really great data using wrk.

Just for kicks, I ran wrk versus oha to see how they compare. One important thing to note is that oha is newer and in active development, with more of a UI than wrk.

./wrk -d 60s -c 100 --latency https://myServer.com
./oha -z 60s -c 100 --latency-correction --disable-keepalive https://myServer.com

There are some deltas since my initial posting in February…

DigitalOcean (DO) Droplet
Debian 12 x86
Premium AMD w/ NVMe SSD
2 GB / 1 AMD CPU
50 GB NVMe SSDs
Now at a higher premium tier ($14 as of this writing).
Web apps loaded via Lifeboat with an nginx backend and no load balancing

Xojo
Web 2.0 app compiled under 2024r1

Mirroring my prior test runs, here is what wrk versus oha looks like…

wrk

oha

What's particularly interesting here is how different the results are between the two tools. For example, wrk shows 154.47 requests per second versus oha at 20.0599. wrk also shows 469 timeouts versus 8 under oha. wrk shows 9276 requests versus 1120 responses under oha (which roughly matches 20.0599 requests/sec * 60 seconds). The latency and response times are all over the place between the two tools as well. Some of this might come down to what's being loaded, as wrk shows 41.20MB read versus oha at 1.60MiB, so maybe wrk is loading up the whole web app while oha isn't, although that alone wouldn't explain wrk showing better response times and throughput than oha. My best guess is connection handling: I ran oha with --disable-keepalive, so every request pays for a fresh TCP/TLS handshake, while wrk reuses connections by default, which would account for wrk pushing far more requests (and therefore reading far more total bytes) in the same 60 seconds.
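
One experiment that might narrow this down (again, just my assumption that keep-alive is the culprit) is to put both tools on the same connection behavior, either by forcing wrk to close connections per request or by letting oha keep them alive:

./wrk -d 60s -c 100 --latency -H "Connection: close" https://myServer.com   # force per-request connections in wrk
./oha -z 60s -c 100 --latency-correction https://myServer.com               # or let oha reuse connections instead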

For comparison with Ricardo's own data in the other thread, here are two more runs, this time with the duration reduced to 10 seconds and the connections reduced to 10…

./wrk -d 10s -c 10 --latency https://myServer.com
./oha -z 10s -c 10 --latency-correction --disable-keepalive https://myServer.com

wrk

oha

The same analysis as above appears to hold true here as well. I think the big takeaway isn't to try to reconcile the deltas between these two tools, but that you should pick a single tool and stick with it, so the methodology stays consistent and the positive or negative deltas you capture over time are meaningful.
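
If you go that route, it's worth capturing each run in a dated file so the deltas are easy to diff later. Something like the below, assuming your oha build supports JSON output via -j (double-check with ./oha --help):

ts=$(date +%Y-%m-%d-%H%M)   # timestamp each run so results can be compared over time
./oha --no-tui -j -z 60s -c 100 --latency-correction --disable-keepalive https://myServer.com > "oha-$ts.json"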

This does continue to raise the question: "What exactly is the throughput of Web 2.0, and how many users can a server sustain in burst as well as in normal operation?" Of course, lots of variables feed into these questions, and all of these load-testing tools are synthetic in nature, so there may never be a fully accurate number here.


I hadn't heard about oha before, but it looks really nice. Both oha and wrk are useful. It seems (without having played with it yet) that oha is more of a stress tool, while wrk focuses on throughput.

Thanks for sharing this.


Also, although I haven't posted a screenshot of this yet, oha has a very nice real-time "dashboard" so you can see the status while the test is progressing. All the screenshots above are from the "conclusion report"; here's an example of the "dashboard"…

oha
