Conventional wisdom says that Linux has a better TCP stack than Windows. But with the latest Linux and the latest Windows (or even Vista), there is at least one aspect where this is not true. (My definition of better is simple: which one is fastest.)
Over the past year or so, researchers have proposed raising TCP's initial congestion window (initcwnd) from its current value (2 packets, or ~4KB) up to about 10 packets. These changes are still being debated, but it looks likely that a change will be ratified. Even without official ratification, many commercial sites and commercially available load-balancing software have already increased initcwnd on their systems in order to reduce latency.
Back to the matter at hand: when a client makes a connection to a server, there are two variables which dictate how quickly the server can send data to the client. The first variable is the client's "receive window". The client tells the server, "please don't exceed X bytes without my acknowledgement", and this is a fundamental part of how TCP controls information flow. The second variable is the server's cwnd, which, as stated previously, is generally the bottleneck and is usually initialized to 2 packets.
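If you want a feel for why these two numbers matter, here's a back-of-the-envelope sketch (mine, not a real TCP model). It assumes an MSS of 1460 bytes, that the sender can put min(cwnd, receive window) bytes on the wire each round trip, and that cwnd doubles every round trip with no losses or delayed-ACK effects; the round_trips helper and the 40KB response size are purely for illustration:

    MSS = 1460  # assumed maximum segment size, in bytes

    def round_trips(response_bytes, initcwnd_pkts, rwnd_bytes):
        """Count round trips to deliver a response, assuming the sender
        puts min(cwnd, rwnd) bytes on the wire per RTT and cwnd doubles
        each RTT (idealized slow start: no loss, no delayed ACKs)."""
        cwnd = initcwnd_pkts * MSS
        sent = 0
        rtts = 0
        while sent < response_bytes:
            sent += min(cwnd, rwnd_bytes)
            cwnd *= 2
            rtts += 1
        return rtts

    # A hypothetical 40KB web response to a client advertising a 64KB window:
    print(round_trips(40 * 1024, 2, 64 * 1024))   # initcwnd 2  -> 4 round trips
    print(round_trips(40 * 1024, 10, 64 * 1024))  # initcwnd 10 -> 2 round trips

Under those assumptions, a 40KB response takes 4 round trips with initcwnd=2 but only 2 with initcwnd=10, which is exactly the latency win the proposals are after.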
In the long-ago past, TCP clients (like web browsers) would specify receive-window buffer sizes manually. These days, all modern TCP stacks adjust the window size dynamically based on measurements from the network, and applications are advised to leave it alone, since the kernel can do a better job. Unfortunately, the defaults on Linux are too low.
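For the curious, "leave it alone" looks roughly like this at the socket API level. This is just a sketch using Python's standard socket module; the 256KB figure is an arbitrary example of the old manual approach, not a recommendation:

    import socket

    # Recommended: create the socket and leave SO_RCVBUF alone, so the
    # kernel can autotune the receive buffer as the connection progresses.
    auto = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("kernel default rcvbuf:", auto.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

    # Old-school: pin the buffer by hand. On Linux, explicitly setting
    # SO_RCVBUF also disables receive-buffer autotuning for that socket.
    # (Linux reports back double the requested value for bookkeeping.)
    manual = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    manual.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
    print("manually pinned rcvbuf:", manual.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))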
On my systems, with a 1Gbps network, here are the initial receive window sizes I observed. Keep in mind your results may vary, since each TCP stack adjusts the window size dynamically based on many factors.
Vista: 64KB
Mac: 64KB
Linux: 6KB
6KB! Yikes! Well, the argument can be made that there is no need for the Linux client to use a larger initial receive window, since servers are supposed to abide by RFC 2581. But there really isn't much downside to using a larger initial receive window, and we already know that many sites benefit from a larger cwnd. The net result is that when the server is legitimately trying to use a larger cwnd, web browsing on Linux will be slower than web browsing on Mac or Windows, which don't artificially constrain the initial receive window.
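Here is the clamp in numbers. The figures are hypothetical (a server using the proposed initcwnd of 10 and an MSS of 1460 bytes), but they show how the client's advertised window, not the server's cwnd, ends up limiting the first burst on Linux:

    MSS = 1460  # assumed maximum segment size, in bytes

    server_first_burst = 10 * MSS  # ~14.6KB a server with initcwnd=10 wants to send right away
    for client, rwnd in [("Linux (6KB rwnd)", 6 * 1024), ("Mac/Windows (64KB rwnd)", 64 * 1024)]:
        first_rtt_bytes = min(server_first_burst, rwnd)
        print(f"{client}: {first_rtt_bytes} bytes in the first round trip")

With a 6KB advertised window, the server's 10-packet burst gets cut to 6144 bytes and the rest waits a full round trip for window updates; with a 64KB window, the whole burst goes out at once.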
Some good news: a patch is in the works to allow users to change the default, but you'll need to be a TCP whiz and install a kernel change to use it. I don't know of any plans to change the default value on Linux yet. Certainly if the cwnd changes are approved, the default initial receive window must also be changed. I have yet to find any way to make Linux use a larger initial receive window without a kernel change.
Two last notes:
1) This isn't theoretical. It's very visible in network traces to existing servers on the web that use larger-than-2 cwnd values. And you don't hit the stall just once; you hit it on every connection that tries to send more than 6KB of data in its initial burst.
2) As we look to make HTTP more efficient by using fewer connections (SPDY), this limit becomes yet another factor that favors protocols which use many connections instead of just one. TCP implementors lament that browsers routinely open 20-40 concurrent connections as part of making sites load quickly. But if a connection has an initial window of only 6KB, the use of many connections is the only way to work around the artificially low throttle (the quick arithmetic below shows why).
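A rough, hypothetical calculation of the workaround: each connection gets its own first-burst allowance, so the aggregate first-round-trip payload scales with the connection count.

    # Hypothetical arithmetic: every connection gets its own ~6KB first-burst
    # allowance, so opening more connections multiplies the data that can
    # arrive in the first round trip.
    per_connection_window = 6 * 1024
    for connections in (1, 6, 20):
        kb = connections * per_connection_window // 1024
        print(f"{connections:2d} connections -> ~{kb}KB in the first round trip")

Six connections can deliver roughly 36KB in the first round trip, where a single connection is capped at 6KB.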
There is always one more configuration setting to tweak.
Watch the debate rage at the IETF:
Full speed ahead:
http://tools.ietf.org/html/draft-ietf-tcpm-initcwnd-00
A more cautious approach:
http://tools.ietf.org/html/draft-allman-tcpm-bump-initcwnd-00
An rtnetlink patch to add init_rcv_wnd is in 2.6.34 and later kernels:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.36.y.git;a=commit;h=31d12926e37291970dd4f6e9940df3897766a81d
But it looks like the sysctl patch didn't make it yet, so some manual patching is still required.