// Minimal HTTP server used for the benchmark
var http = require('http');
var fs = require('fs'); // required but unused in this minimal version

http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end("hello world");
}).listen(3000);
This is just run with node server.js
ab -n 100000 -c100 http://localhost:3000/
Requests per second: 1339.35 [#/sec] (mean)
Requests per second: 1019.17 [#/sec] (mean)
Requests per second: 2899.45 [#/sec] (mean)
Requests per second: 4085.26 [#/sec] (mean)
# simple CPU-bound busy loop, used to compare raw processor speed
for x in xrange(100000000):
    x + 1
time python test.py
real 0m13.531s user 0m13.487s sys 0m0.060s
real 0m10.337s user 0m10.302s sys 0m0.054s
real 0m9.177s user 0m9.120s sys 0m0.040s
real 0m8.909s user 0m8.897s sys 0m0.012s
The first iterations of this had node clustered with 2 forks, incrementing a remote Redis counter, with ab run from a remote machine. While the numbers were higher (by about 50%), they were proportionally the same. This version simply removes all of those variables.
I understand ab isn't exact, but its margin of error is tight enough for a ballpark comparison. I've worked with Joyent and Amazon in the past. In fact, I pulled 100 machines off AWS and put them on Joyent precisely because my own benchmarks, which were much more thorough than this, showed Joyent being 5 to 10 times faster.
When I ran the above I expected Joyent to completely demolish AWS.
I'm happy to be told I've done something terribly wrong. In fact I'm hoping for it.
I took some notes while debugging this on my own lab machine; if they are of interest, I've pasted them below:
ab performs one connection per request. This can be observed using:
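One way to confirm this (a sketch; it assumes the DTrace tcp provider and its accept-established probe are available on this system) is to count accepted connections per second on the server and compare that with the request rate ab reports:

# count inbound TCP connections accepted per second
dtrace -n 'tcp:::accept-established { @conns = count(); }
    tick-1sec { printa("accepted/s: %@d\n", @conns); clear(@conns); }'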
The output shows ab initially running fast and then slowing down.
Fast:
Slow:
Both node and ab are spending their time blocked (SLP). No CPU caps are being hit (LAT stays low). Note that ab is single-process and single-threaded.
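Those observations come from per-thread microstate accounting; a minimal sketch of how to watch them:

# per-thread microstates at 1-second intervals: SLP = time sleeping, LAT = time waiting for CPU
prstat -mLc 1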
This type of benchmarking can flood the server with TCP connections stuck in TIME_WAIT, since they are launched from a single client to a single server and port, and have only one 16-bit identifier – the ephemeral client port – to differentiate them. Checking for this:
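A rough check (a sketch): count the sockets currently sitting in TIME_WAIT on the server.

# count connections currently in TIME_WAIT
netstat -an | grep TIME_WAIT | wc -l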
11k isn't enough to exhaust the range; with the default tuning there should be about 33k slots, from tcp_smallest_anon_port to tcp_largest_anon_port (the configured range can be confirmed as sketched below). This looks like a different issue.
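A sketch of confirming the ephemeral port range with ndd (the defaults noted in the comment are the usual Solaris/illumos values, which is where the ~33k figure comes from):

# ephemeral port range; defaults are typically 32768 and 65535, i.e. ~33k slots
ndd /dev/tcp tcp_smallest_anon_port
ndd /dev/tcp tcp_largest_anon_port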
Checking syscall times for both ab and node:
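A sketch of one way to get those distributions: time every syscall made by ab and node and print power-of-two histograms of the durations in nanoseconds.

# syscall latency distributions (ns) for ab and node
dtrace -n 'syscall:::entry /execname == "ab" || execname == "node"/ { self->ts = timestamp; }
    syscall:::return /self->ts/ { @[execname, probefunc] = quantize(timestamp - self->ts); self->ts = 0; }'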
The distributions show the nanosecond durations of the system calls. The calls to portfs can block for tens of milliseconds. Why? I don't know. This type of syscall (event ports) basically means node is blocked waiting on someone else; we'll need to trace that someone else.
From the global zone, I'll investigate which code paths cause the node portfs to unblock, by tracing conditional variable wakeups:
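Rather than instrumenting cv_signal()/cv_broadcast() directly, one approximation (a sketch) is the sched provider's wakeup probe, filtered to wakeups of node threads and aggregated on the waker's kernel stack:

# who is waking node up, and from which kernel code path?
dtrace -n 'sched:::wakeup /args[1]->pr_fname == "node"/ { @[stack()] = count(); }'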
Ah. tcp_send_data() is being called by tcp_timer()? Is this a retransmit? On localhost??
Back to basic TCP counters:
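A sketch of checking the relevant MIB counters (run it a few seconds apart to see which ones are moving):

# TCP MIB counters of interest
netstat -s -P tcp | egrep 'tcpRetransSegs|tcpOutRsts|tcpListenDrop'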
Yes! So, when TCP retransmits begin to occur, performance drops.
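The per-second retransmit rate can be watched with the DTrace mib provider (a sketch, assuming the tcpRetransSegs probe is available):

# TCP retransmits per second
dtrace -n 'mib:::tcpRetransSegs { @ = count(); } tick-1sec { printa("retransmits/s: %@d\n", @); clear(@); }'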
The rate seems interesting: between about 70 and 100 retransmits per second, apparently hitting a ceiling of 100. The ab client concurrency is set to 100. As an experiment, this was adjusted to 333, which caused the following rate of retransmits:
This makes sense, as the TCP retransmit interval is 1000ms:
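The interval can be read with ndd (a sketch; parameter name as on Solaris/illumos, where the default is typically 1000 ms):

# initial retransmit timeout, in milliseconds
ndd /dev/tcp tcp_rexmit_interval_initial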
So, as each ab client hits a retransmit, it waits for one second; the maximum number of clients that can be retransmitting and waiting in any given second equals the ab client concurrency.
Getting some information about those retransmits:
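One quick way to characterize them (a sketch) is to see how many connections are sitting in SYN_SENT at any moment, alongside the retransmit counter:

# connections stuck waiting for their SYN to be answered
netstat -an | grep -c SYN_SENT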
So these are retransmitted SYNs, from connections sitting in SYN_SENT.
Listen drops?
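A sketch of checking the listen-drop counters via kstat:

# listen queue drops (backlog and incomplete-connection queue)
kstat -p tcp | grep -i listendrop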
These aren't listen drops (tcpListenDrop and tcpListenDropQ0 aren't incrementing), but the tcpOutRsts counter is increasing in step with tcpRetransSegs.
Some digging:
This doesn't tell me much; will need to trace the tcp functions further.
Trying snoop quickly, this time from a remote client.
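A sketch of the capture (the interface name e1000g0 here is a placeholder for whatever the server's NIC is called):

# watch the three-way handshakes on the benchmark port, without name resolution
snoop -r -d e1000g0 port 3000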
A successful connection, showing SYN, SYN-ACK, ACK:
And an unsuccessful connection, resulting in a RST:
That's a SYN, ACK, RST. Packet arrival during TIME_WAIT?
Back on the server, lots of packets are arriving during TIME_WAIT:
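A sketch of measuring that rate, assuming an fbt probe is available on the kernel's TIME_WAIT input path (tcp_time_wait_processing()):

# segments handled by the TIME_WAIT code path, per second
dtrace -n 'fbt::tcp_time_wait_processing:entry { @ = count(); } tick-1sec { printa("TIME_WAIT arrivals/s: %@d\n", @); clear(@); }'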
The rate is around 300 per second.
Checking where in the stack:
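A sketch of getting the kernel stacks responsible for the outbound RSTs, using the mib provider (assuming a tcpOutRsts probe is available):

# which code paths are generating the RSTs?
dtrace -n 'mib:::tcpOutRsts { @[stack()] = count(); }'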
tcp_xmit_ctl() is the routine that builds and sends TCP control segments such as these RSTs, and its first argument is a string describing why it was called:
So I can:
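A sketch: assuming, as in the illumos source, that tcp_xmit_ctl()'s first argument is a short reason string, aggregate on it to see why the RSTs are being sent.

# count control-segment reasons; arg0 is assumed to point at the reason string
dtrace -n 'fbt::tcp_xmit_ctl:entry { @[stringof(arg0)] = count(); }'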
So this is in tcp_input_data() (usr/src/uts/common/inet/tcp/tcp_input.c), where the code tests that the sequence number is sane, based on tcp_iss and tcp_snxt. It looks like this check is failing up to 100 times per second (once for each ab client), causing the RSTs, the retransmits, the blocking while waiting on those retransmits, and finally the SLP latency.