Best web request generator for benchmarking

I needed to compare several Lua web server software packages for raw speed. This proved to be rather difficult because one web request generator after another had problems. In the end, I found what I needed, and produced an article benchmarking OpenResty vs Lua 5.4 web servers. But it took some real pain, which perhaps I can spare you with this write-up.

  • TLDR: Use wrk2, ignore its build warnings, and specify a huge -R10000000.

  • Update: More importantly, don't specify more than about 100 threads, and keep an eye on wrk2's maths: make sure the duration it reports matches the duration you asked for. Otherwise you may get "833071 requests in 1.25ms" when you told it to test for 5s, which produces a wildly high req/s. See the end of this article for more detail and for cross-checking with weighttp.

I should point out that I'm testing server software that is also running on localhost. This means I can speed-test the actual server software rather than the link speed. But it also means that the benchmark tool's overhead must be as low as possible, to interfere as little as possible with the server speed result.

I wanted speed, so I tried a range of C and Go benchmark tools. I started, of course, with ab (ApacheBench):

ab -k -c1000 -n50000 -S "http://localhost:8081/multiply?a=2&b=3"

That worked for most of my web servers (those based on OpenResty, Apache and Nginx), although I did have to manually tune the number of concurrent connections (-c) to get optimal performance on each server.

But when I added Redbean into the mix, it produced huge figures. Apparently, this is because ab only supports HTTP/1.0, not 1.1, and so its keep-alive doesn't work with Redbean. Why this didn't stymie it on other servers I still don't understand. Furthermore, as I discovered later, some of the other tools are actually faster than ab.
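The HTTP/1.0 detail matters because the two versions treat persistent connections differently. Here is a minimal Python sketch of the connection-persistence rule from RFC 7230 (section 6.3), simplified by ignoring proxies; the function names are my own. Under HTTP/1.1 a connection stays open unless `Connection: close` is sent, whereas under HTTP/1.0 it stays open only if a `keep-alive` option is explicitly present. So if a server answers an HTTP/1.0 request without echoing `Connection: keep-alive`, the client has to reconnect for every request. I haven't confirmed that this is exactly what goes wrong between ab and Redbean; it's just one plausible place to look.

```python
from typing import Optional


def _parse_version(version: str) -> tuple:
    # "HTTP/1.0" -> (1, 0); only HTTP/1.x version strings are handled here
    major, minor = version.split("/", 1)[1].split(".")
    return int(major), int(minor)


def connection_persists(version: str, connection_header: Optional[str] = None) -> bool:
    """Simplified RFC 7230 section 6.3 rule (proxy case ignored): does the
    connection stay open after a message with this version and Connection header?"""
    options = {
        token.strip().lower()
        for token in (connection_header or "").split(",")
        if token.strip()
    }
    if "close" in options:
        return False  # an explicit "close" always ends the connection
    if _parse_version(version) >= (1, 1):
        return True  # HTTP/1.1: persistent by default
    # HTTP/1.0: persistent only if keep-alive was explicitly signalled
    return "keep-alive" in options


print(connection_persists("HTTP/1.1"))                # True
print(connection_persists("HTTP/1.0"))                # False
print(connection_persists("HTTP/1.0", "Keep-Alive"))  # True
```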

So I started looking for a better tool. There are way too many benchmarking tools out there, but this list of load-testing tools was very helpful (though of course it didn't tell me which ones do what I needed).

I figured that Go tools might be good: Go has great concurrency support, so they should be able to fully load the server. However, the Go tools were noticeably slower than the C tools and generated fewer requests/second.

Here's a summary of all the tools I tried, and the problems I had with each:

  • ab: HTTP/1.0 only. Not the fastest (single thread). Redbean has trouble with its keep-alive

Go tools:

  • cassowary: Fails against OpenResty when concurrent users > 1 (e.g. -c2)

  • gobench: Only measures time to 1s resolution, which throws off all the other stats, too

  • gohttpbench: Works well, but half the speed of weighttp

  • vegeta: Very nice tool, but again, half the speed of weighttp

C tools:

  • htstress: Half the speed of weighttp

  • httperf: Fails against OpenResty. Also not the fastest

  • httpress: Even though it's HTTP/1.1, Redbean, again, doesn't recognize its keep-alive, so it takes 10x longer. (I dug into it and suspect a bug in http-parser, but I didn't reach a firm conclusion.) It also fails some requests if the number of concurrent connections (-c) is above about 10. Plus, its many build dependencies are horrendous.

  • hurl: Good tool, but you have to tune the number of parallel requests differently for each server, or there are slowdowns and failures.

  • weighttp: Runner-up to best tool.

  • wrk: Bogus results with short tests or a large number of threads. A 1s test with 2000 threads supposedly handles an unbelievable 15 million requests/sec. According to the wrk2 docs, it's a calibration issue for runs <10s.

  • wrk2: Best tool; slightly faster than weighttp. But see note below.
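To see why gobench's 1s resolution throws the other numbers off, consider a tool that only knows the elapsed time in whole seconds. This Python sketch (the function name is hypothetical, and I don't know whether gobench rounds up or truncates, so it computes both) shows how far the req/s figure can drift on a short run compared with a longer one.

```python
import math


def rate_bounds_at_1s_resolution(requests: int, elapsed_s: float) -> tuple:
    """Range of req/s a tool could report if it only knows the elapsed time
    in whole seconds: rounded up (low estimate) or truncated (high estimate).
    A minimum of 1s is assumed to avoid dividing by zero."""
    low = requests / max(1, math.ceil(elapsed_s))
    high = requests / max(1, math.floor(elapsed_s))
    return low, high


print(150_000 / 1.5)                                   # true rate: 100000.0
print(rate_bounds_at_1s_resolution(150_000, 1.5))     # (75000.0, 150000.0)
print(rate_bounds_at_1s_resolution(1_000_000, 10.5))  # roughly 90909 to 100000
```

On a 1.5s run the reported figure could be anywhere from 25% low to 50% high, whereas on a 10.5s run (true rate about 95238 req/s) it stays within about 5%.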

wrk2 -d5 -c10 -t10 -R1000000 "http://localhost:$i/multiply?a=2&b=3"

wrk2 seems to be the best tool. You can't specify the number of requests to perform, as you can with other tools. Instead, you specify a duration and an overly high rate to get maximum throughput (e.g. -R10000000). Also, it has a heavy build because it bundles LuaJIT (which I'm not using), and the LuaJIT version in the wrk2 repository is old enough to produce build warnings. Oh well, it works. :-)
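Why does an absurdly high -R give maximum throughput? wrk2 is a constant-rate generator: as I understand it, the total rate is split evenly across the connections and each connection schedules its requests at a fixed interval. The rough Python model below is my own simplification, not wrk2's code, and `server_capacity` is a hypothetical number. It shows that with -c10 the scheduled gap shrinks to a microsecond or so, far shorter than any real response time, so the generator is permanently behind schedule and the achieved rate is set by the server rather than by -R.

```python
def per_connection_interval_us(total_rate: float, connections: int) -> float:
    """Gap between scheduled requests on each connection, in microseconds,
    assuming the total -R rate is split evenly across all connections."""
    return connections * 1_000_000 / total_rate


def achieved_rate(total_rate: float, server_capacity: float) -> float:
    """A constant-rate generator can't exceed its target or what the server
    can sustain. server_capacity is hypothetical; wrk2 doesn't report it."""
    return min(total_rate, server_capacity)


# -c10 with the two rates used in this article
print(per_connection_interval_us(1_000_000, 10))   # 10.0
print(per_connection_interval_us(10_000_000, 10))  # 1.0
# With a hypothetical server that tops out at 150,000 req/s:
print(achieved_rate(10_000_000, 150_000))          # 150000
```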

If you're interested in the details of any of these, see the .sh files and Makefile in this branch of my lua-server-benchmark repository.

Update

Actually, it turns out wrk2 also has a bad bug that gives wildly high results when you specify more than about 100 threads. For example:

wrk2/wrk -d5 -c1000 -t250 -R 10000000 "http://localhost:8085/multiply?a=2&b=3"
Running 5s test @ http://localhost:8085/multiply?a=2&b=3
  250 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.97s     1.28s    4.93s    60.75%
    Req/Sec       -nan      -nan   0.00      0.00%
  833071 requests in 1.25ms, 223.25MB read
  Socket errors: connect 0, read 0, write 0, timeout 29
Requests/sec: 663801593.63
Transfer/sec:    173.72GB

See how I specified 250 threads, and a 5s test? Well, it did create 833071 requests in 5s, but as you see, it thinks it did it in 1.25ms, producing a ridiculous figure of 663 million requests/sec.
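Since the failure shows up as a mismatch between the requested -d duration and the duration wrk2 reports, it's easy to check mechanically. Here is a minimal Python sketch that parses the "N requests in T" line in the format shown above and flags the run if the two durations differ by more than 10% (an arbitrary threshold). The function name is hypothetical, and the full set of unit suffixes is an assumption about wrk's output formatting. For the run above, dividing by the requested 5s instead gives about 166,614 req/s, which is only a rough figure since it assumes the run really did last the full 5s.

```python
import re

# Unit suffixes as seen in summaries such as "1.25ms" or "5.00s".
# The full set (us, ms, s, m, h) is an assumption about wrk's formatting.
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
SUMMARY = re.compile(r"(\d+) requests in ([\d.]+)(us|ms|s|m|h)\b")


def check_wrk_summary(output: str, requested_s: float, tolerance: float = 0.1):
    """Return (duration_ok, reported_s, rate_over_requested_duration).

    duration_ok is False when the reported duration differs from the
    requested -d duration by more than `tolerance` (10% by default)."""
    match = SUMMARY.search(output)
    if match is None:
        raise ValueError("no 'N requests in T' line found in wrk output")
    requests = int(match.group(1))
    reported_s = float(match.group(2)) * UNIT_SECONDS[match.group(3)]
    duration_ok = abs(reported_s - requested_s) <= tolerance * requested_s
    return duration_ok, reported_s, requests / requested_s


line = "  833071 requests in 1.25ms, 223.25MB read"
ok, reported, rate = check_wrk_summary(line, requested_s=5)
print(ok, round(rate))  # False 166614
```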

Unfortunately, this bug means you should probably double-check your wrk2 results against weighttp. Note, though, that with weighttp the last request sometimes takes seconds to return for no apparent reason, giving you a low req/sec figure. I've found weighttp more likely to work properly with -c10 -t10.
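To bake the advice in this article into a script, you could build the wrk2 command line with guards. This is a minimal Python sketch: the function name and defaults are my own, the -d, -c, -t and -R flags are the ones used above, and the binary path depends on where you built wrk2. The connections-vs-threads check reflects wrk's own requirement of at least one connection per thread, as far as I know.

```python
def wrk2_args(url: str, duration_s: int = 5, connections: int = 10,
              threads: int = 10, rate: int = 10_000_000,
              binary: str = "wrk2") -> list:
    """Build a wrk2 argument list following the advice in this article."""
    if threads > 100:
        # Above roughly 100 threads, wrk2 reported bogus durations and req/s.
        raise ValueError("use at most about 100 threads with wrk2")
    if connections < threads:
        # wrk needs at least one connection per thread.
        raise ValueError("connections must be >= threads")
    return [binary, f"-d{duration_s}", f"-c{connections}", f"-t{threads}",
            f"-R{rate}", url]


print(" ".join(wrk2_args("http://localhost:8085/multiply?a=2&b=3")))
# wrk2 -d5 -c10 -t10 -R10000000 http://localhost:8085/multiply?a=2&b=3
```

Passing the list straight to `subprocess.run(args, capture_output=True, text=True)` avoids any shell, so the `&` in the query string needs no quoting.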