Performance and the TLS Record Size

Today I ran into a problem with TLS (SSL) record sizes making my site sluggish.  The server was doing a good job of sending large messages down to the client, and I am using a late-model version of the OpenSSL library, so why would this happen?

HTTP and TLS both seem like streaming protocols.  But with HTTP, the smallest “chunk” you can send is a single byte.  With TLS, the smallest chunk you can send is a TLS record.  When a TLS record arrives at the client, it cannot be passed up to the application layer until the full record has been received and its checksum verified.  So, if you send large SSL records, every packet that makes up that record must arrive before the browser can use any of the data.

In my case, the HTTP-to-SPDY proxy in front of my webserver was reading chunks of 40KB from the HTTP server, and then calling SSL_write() with all of that data over SPDY (which uses SSL for now).  This meant that the client couldn’t use any of the 40KB until all 40KB had been received.  And since 40KB of data will often incur round-trips, this is a very bad thing.
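A little arithmetic shows why a large record hurts.  This sketch assumes a typical ~1460-byte TCP payload per packet (real MSS values vary by path):

```python
# Rough arithmetic: how many TCP packets a single large TLS record spans.
# Assumes a typical ~1460-byte TCP payload (MSS); actual values vary.
MSS = 1460

def packets_per_record(record_bytes, mss=MSS):
    """Number of full-size TCP segments needed to carry one TLS record."""
    return -(-record_bytes // mss)  # ceiling division

# A 40KB record spans ~29 segments; losing or delaying any one of them
# stalls delivery of the *entire* record to the application layer.
print(packets_per_record(40 * 1024))   # 29
print(packets_per_record(1500))        # 2
```

At 29 segments, the record is all but guaranteed to straddle multiple congestion-window bursts, so round trips are baked in before the browser sees byte one.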

It turns out this problem surfaces more with time-to-first-paint than with overall page-load-time (PLT), because it has to do with the browser seeing data incrementally rather than in a big batch.  But it still can impact PLT because it can cause multi-hundred-millisecond delays before discovering sub-resources.

The solution is easy – on your server, don’t call SSL_write() with big chunks.  Chop the data down to something smallish – 1,500 to 3,000 bytes.  Here is a graph comparing the time-to-first-paint for my site with just this change.  It shaved over 100ms off the time-to-first-paint.

[chart: time-to-first-paint before and after the small-buffer change]
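Here is a minimal sketch of the idea in Python.  The send_record callback is a hypothetical stand-in for your real SSL write call (e.g. OpenSSL’s SSL_write), and the 2,900-byte limit is just an illustrative value in the 1,500–3,000-byte range:

```python
# Sketch: chop application writes into small pieces so each maps to a small
# TLS record. 'send_record' stands in for a real SSL write call (e.g.
# OpenSSL's SSL_write); the 2900-byte limit is illustrative.
MAX_TLS_CHUNK = 2900  # roughly two TCP segments per record

def chunked_ssl_write(data, send_record, chunk_size=MAX_TLS_CHUNK):
    """Write 'data' as a series of small TLS records via send_record()."""
    for start in range(0, len(data), chunk_size):
        send_record(data[start:start + chunk_size])

# Usage: collect the record sizes a 40KB write would produce.
records = []
chunked_ssl_write(b"x" * 40 * 1024, records.append)
print([len(r) for r in records][:3], len(records))  # [2900, 2900, 2900] 15
```

Each record now fits in a couple of packets, so the browser can start parsing as soon as the first few packets arrive instead of waiting for all 29.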

Gettys on Bufferbloat

Jim Gettys has a nice tale of what he calls ‘bufferbloat’.  Instinctively, it seems like bigger buffers should result in less packet loss.  As long as you can buffer it, the other guy doesn’t have to retransmit, right?  But that is not the way TCP works.  It’s going to retransmit if you don’t acknowledge fast enough.  And if you clog the buffers, it’s going to take a long time before the endpoint can acknowledge the data.

One interesting anecdote to me (and it isn’t really a conclusion) is that the world’s love affair with Windows XP (which has an ancient TCP stack) may actually be helping the internet at large, even though the Vista TCP stack is measurably a better stack:

The most commonly used system on the Internet today remains Windows XP, which does not implement window scaling and will never have more than 64KB in flight at once. But the bufferbloat will become much more obvious and common as more users switch to other operating systems and/or later versions of Windows, any of which can saturate a broadband link with but a single TCP connection.

Gettys did conclude that this was a problem for video downloads, which is something everyone is doing these days.  He’s not wrong, but real video services may not be as subject to this as it seems.  Video services live and die by bandwidth costs, so to keep those costs down they avoid simply transmitting the whole video – instead they dribble it out manually, at the application layer.  If they depended on TCP for throttling, he’d be right, but I don’t think many large-scale video services work this way.  Need more data! 🙂

Anyway, a great read.

Free SSL Certificates

Adam Langley slammed me today for using a self-signed cert on this site (https://www.belshe.com/), pointing out that there is no reason not to have a real certificate, especially when you can get them for free.

As usual, he is right, of course.  So I got myself a signed certificate from StartSSL.

Here are the step by step instructions.  You can do it too:

https://github.com/ioerror/duraconf/blob/master/startssl/README.markdown

Chrome Speeding up SSL with SSL FalseStart

The latest releases of Chrome now enable a feature called SSL False Start.  False Start is a client-side change which makes your SSL connections faster.  As of this writing, Chrome is the only browser implementing it.  Here is what it does.

In order to establish a secure connection, SSL uses a special handshake where the client and server exchange basic information to establish the secure connection.  The last messages exchanged have traditionally been implemented such that the client says “done”, waits for the server, and then the server says “done”.  However, this waiting-for-done is unnecessary, and SSL researchers discovered that we can remove one round trip from the process and allow the client to start sending data immediately after its own “done” message.

To visualize this, let’s look at some packet traces during the handshake sequence, comparing two browsers:

Chrome (with FalseStart):

  0ms SEND TCP SYN
 83ms RECV TCP SYN ACK
 83ms SEND TCP ACK
 83ms SEND Client Hello
175ms RECV Server Hello
           Certificate
           Server Hello Done
176ms SEND Client Key Exchange
           Change Cipher Spec
           Enc Handshake Msg
           HTTP Request
274ms RECV Enc Handshake Msg
           Change Cipher Spec
           Enc Handshake Msg
275ms RECV HTTP Response

Browser w/o FalseStart:

  0ms SEND TCP SYN
 84ms RECV TCP SYN ACK
 84ms SEND TCP ACK
 84ms SEND Client Hello
173ms RECV Server Hello
           Certificate
           Server Hello Done
176ms SEND Client Key Exchange
           Change Cipher Spec
           Enc Handshake Msg
269ms RECV Enc Handshake Msg
           Change Cipher Spec
           Enc Handshake Msg
269ms SEND HTTP Request
524ms RECV HTTP Response
These two traces are almost identical, with one subtle difference.  Notice that Chrome bundled the HTTP Request with its final handshake messages and sent it at time 176ms – a little more than one round-trip-time sooner than the other browser could send it.

(Note – it is unclear why the HTTP response for the non-FalseStart browser was ~250ms late; the savings should, in theory, be just one round trip, or 83ms.  There is always variance on the net, and I’ll attribute this to bad luck.)

Multiplicative Effect on Web Pages
Today, almost all web pages combine data from multiple sites.  For SSL sites, this means that the handshake must be repeated to each server that is referenced by the page.  In our tests, we see that there are often 2-3 “critical path” connections while loading a web page.  If your round-trip-time is 83ms, as in this example, that’s 249ms of savings – just for getting started with your page.  I hope to do a more thorough report on the effect of FalseStart on overall PLT in the future.

For more information on the topic, check out Adam Langley’s post on how Chrome deals with the very few sites that can’t handle FalseStart.

Linux Client TCP Stack Slower Than Windows

Conventional wisdom says that Linux has a better TCP stack than Windows.  But with the current latest Linux and the current latest Windows (or even Vista), there is at least one aspect where this is not true.  (My definition of “better” is simple – whichever is fastest.)

Over the past year or so, researchers have proposed to adjust TCP’s initial congestion window from its current value (2 packets, or ~4KB) up to about 10 packets.  These changes are still being debated, but it looks likely that a change will be ratified.  But even without official ratification, many commercial sites and commercially available load balancing software have already increased initcwnd on their systems in order to reduce latency.

Back to the matter at hand – when a client makes a connection to a server, there are two variables which dictate how quickly the server can send data to the client.  The first is the client’s “receive window”.  The client tells the server, “please don’t exceed X bytes without my acknowledgement”, and this is a fundamental part of how TCP controls information flow.  The second is the server’s congestion window (cwnd), which is usually initialized to 2 packets and, as noted above, is generally the bottleneck.
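To make the interplay concrete, here is a back-of-the-envelope sketch.  The 1460-byte MSS and the window values are illustrative, matching the numbers discussed in this post:

```python
# Sketch: the server's first burst is limited by whichever is smaller --
# its congestion window (cwnd) or the client's advertised receive window.
MSS = 1460

def initial_burst_bytes(init_cwnd_packets, client_rwnd_bytes, mss=MSS):
    """Bytes the server may send before the first acknowledgement."""
    return min(init_cwnd_packets * mss, client_rwnd_bytes)

# With initcwnd=10 the server could send ~14.6KB immediately, but a 6KB
# client receive window caps the burst regardless:
print(initial_burst_bytes(10, 64 * 1024))  # 14600 -- 64KB client window
print(initial_burst_bytes(10, 6 * 1024))   # 6144  -- 6KB client window
```

Whichever side advertises the smaller number wins, which is exactly why a stingy client-side default can erase a generous server-side cwnd.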

In the long-ago past, TCP clients (like web browsers) would specify receive-window buffer sizes manually.  But these days, all modern TCP stacks adjust window sizes dynamically based on measurements from the network, and applications are advised to leave them alone, since the stack can do it better.  Unfortunately, the defaults on Linux are too low.

On my systems, with a 1Gbps network, here are the initial window sizes.  Keep in mind your system may vary as each of the TCP stacks does dynamically change the window size based on many factors.

Vista:  64KB
Mac:    64KB
Linux:    6KB

6KB!  Yikes!  Well, the argument can be made that there is no need for the Linux client to use a larger initial receive window, since servers are supposed to abide by RFC 2581.  But there really isn’t much downside to using a larger initial receive window, and we already know that many sites benefit from a larger cwnd.  The net result is that when the server is legitimately trying to use a larger cwnd, web browsing on Linux will be slower than web browsing on Mac or Windows, which don’t artificially constrain the initial receive window.

Some good news – a patch is in the works to allow users to change the default, but you’ll need to be a TCP whiz and install a kernel change to use it.  I don’t know of any plans to change the default value on Linux yet.  Certainly if the cwnd changes are approved, the default initial receive window must also be changed.  I have yet to find any way to make Linux use a larger initial receive window without a kernel change.
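The old manual route from user space does still exist: an application can override the buffer on a single socket.  A minimal sketch – note that setting SO_RCVBUF disables the kernel’s receive-window autotuning for that socket, and Linux roughly doubles the value you request to account for bookkeeping overhead, so this is a blunt instrument, not a fix for the default:

```python
# Sketch: inspecting and overriding the receive buffer on a single socket.
# Caveat: setting SO_RCVBUF turns off receive-window autotuning for this
# socket, and Linux doubles the requested value for kernel overhead.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
default = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 128 * 1024)
adjusted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(default, adjusted)
s.close()
```

This only helps the one application that does it; the system-wide default is what needs fixing.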

Two last notes: 

1) This isn’t theoretical.  It’s very visible in network traces to existing servers on the web that use larger-than-2 cwnd values.  And you don’t hit the stall just once, you hit it for every connection which tries to send more than 6KB of data in the initial burst.

2) As we look to make HTTP more efficient by using fewer connections (SPDY), this limit becomes yet-another-factor which favors protocols that use many connections instead of just one.  TCP implementors lament that browsers open 20-40 concurrent connections routinely as part of making sites load quickly.  But if a connection has an initial window of only 6KB, the use of many connections is the only way to work around the artificially low throttle.

There is always one more configuration setting to tweak.

SSL, Compression, and You

One aspect of SSL which many people are not aware of is that SSL is capable of compressing the entire SSL stream.  The authors of SSL knew that if you’re going to encrypt data, you need to compress it before you encrypt it, since well-encrypted data tends to look pretty random and non-compressible. But even though SSL supports compression, no browsers support it.  Except Chrome 6 & later.

Generally, stream-level compression at the SSL layer is not ideal.  SSL doesn’t know what data it is transporting, and it could be carrying data which is already compressed, such as a JPG file or gzipped content from your web site – and double-compression is a waste of time.  Because of this, historically, no browsers compressed at the SSL layer – we all felt certain that our good brothers on the server side would solve this problem better, with more optimal compression.

But it turns out we were wrong.  The compression battle has been raging for 15 years now, and it is still not over.  Attendees of the Velocity conference each year lament that more than a third of the web’s compressible content remains uncompressed today.

When we started work on SPDY last year, we investigated what it would take to make SSL fast, and we noticed something odd.  It seemed that the SSL sites we tested (and these were common, Fortune-500 companies) were often not compressing the content from their web servers in SSL mode!  So we asked the Web Metrics team to break out compression statistics for SSL sites as opposed to insecure HTTP sites.  Sure enough, they confirmed what we had noticed anecdotally – a whopping 56% of content from secure web servers that could be compressed was sent uncompressed!

Saddened and dismayed, the protocol team at Chromium decided to reverse a decade long trend, and Chrome became the first browser to negotiate compression with SSL servers.  We still recognize that compression at the application (HTTP) layer would be better.  But with less than half of compressible SSL content being compressed, optimizing for the minority seems like the wrong choice.

So how do you know if your browser compresses content over SSL?  It’s not for the faint of heart.  All it takes is your friendly neighborhood packet tracer, and a little knowledge of the SSL protocol.  Both the client and the server must agree to use compression.  So if a server doesn’t want to use it (because it may be smart enough to compress at the application layer already), that is no problem.  But, if your server uses recent OpenSSL libraries, it can.  And you can detect this by looking at the SSL “Client Hello” message.  This is the first message sent from the client after the TCP connection is established.  Here is an example from Chrome, viewed with Wireshark.

[screenshot: Chrome’s SSL Client Hello, viewed in Wireshark]
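If you’d rather script the check than eyeball Wireshark, the compression_methods list can be pulled straight out of the raw Client Hello bytes.  A rough sketch, assuming the TLS 1.0–1.2 record layout; the sample bytes below are a hand-built, minimal ClientHello (no extensions), not a real Chrome capture:

```python
# Sketch: pull the compression_methods list out of a raw TLS ClientHello,
# the first message the client sends after the TCP connection opens.
# Offsets follow the TLS 1.0-1.2 record layout.

def client_hello_compression_methods(record):
    """Return the compression method IDs offered in a ClientHello.

    0x00 = null (no compression), 0x01 = DEFLATE.
    """
    assert record[0] == 0x16, "not a TLS handshake record"
    assert record[5] == 0x01, "not a ClientHello"
    i = 9 + 2 + 32                      # skip headers, version, random
    i += 1 + record[i]                  # skip session_id
    suites_len = int.from_bytes(record[i:i + 2], "big")
    i += 2 + suites_len                 # skip cipher suites
    comp_len = record[i]
    return list(record[i + 1:i + 1 + comp_len])

# Hand-built minimal ClientHello: empty session id, one cipher suite,
# offering both DEFLATE (0x01) and null (0x00) compression.
hello = bytes([0x16, 0x03, 0x01, 0x00, 0x2E,      # record header
               0x01, 0x00, 0x00, 0x2A,            # handshake header
               0x03, 0x01]) + bytes(32) + bytes([ # version + 32-byte random
               0x00,                              # session_id length: 0
               0x00, 0x02, 0x00, 0x2F,            # one cipher suite
               0x02, 0x01, 0x00])                 # compression: DEFLATE, null
print(client_hello_compression_methods(hello))    # [1, 0]
```

A DEFLATE (0x01) entry in that list is the tell that the client is offering SSL-level compression; the server’s Server Hello picks the method actually used.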

So, does this help?  Sites that use compression at the application layer don’t need this, and it has no effect (those are in the minority).  For servers that opt out, it has no effect.  But for servers that opt in, HTTP headers get compressed a little better, and any uncompressed data stream gets compressed much better.

I did find that https://www.apache.org/ is a showcase example for my cause. Apache.org runs a modern SSL stack and advertises compression support on the server, but they forgot to compress jquery.js.  If you load this site with Chrome, you’ll only download 77,128 bytes of data.  If you use IE8 (firefox will be similar), you’ll download 146,595 bytes of data.  Most of this difference is just plain old compression.

For the 50% of site owners out there that can’t configure their servers to do compression properly, don’t feel too bad – even apache.org can’t get it right!  (Sorry Apache, we still love you, but can we turn mod_gzip on by default now? 🙂)

No Political Efficiency since 1913?

We currently have 435 legislators in the House of Representatives.  This number has been fixed since 1913.  Question:  Do we need the same number of representatives that we had back then?

On one hand, you could argue that we need more seats in Congress.  After all, there were only 97M Americans in 1913.  Today we have 307M.  Surely more constituents require a larger Congress?

But think about the technology advancements since that time.  In 1913, if you wanted to communicate with your representative, what choices did you have?  He certainly didn’t visit his local district very often – the first commercial flight didn’t even take place until 1914.  Calling your representative was unlikely – there was no long distance service from California to Washington back then, and long distance calls from closer geographies were manual and time consuming.  And of course there was no internet, so real-time communication was impossible.  We did have the one-way megaphones of newspapers and magazines.  And of course, you could write a letter.

So, in 1913, maybe we needed 435 legislators.  Each had a significant job to do with just communicating, corresponding, traveling, and coordinating between Washington and his local region. 

But today, do we need so many?  With a single email, a legislator can reach far more than 225,000 people right from the comfort of his mistress’s bed.  Websites, telephones, television, and email combined certainly make the communication burden almost non-existent compared to 1913.

Obviously, there is more to legislation than just communication with constituents.  But, given the gridlock in Washington, the skyrocketing costs of Washington, and the increased dissatisfaction with the never-ending burden of an increasingly complex set of laws, maybe we should cut that 435 in half.  Any reason why not?  Or is that just the way we roll around here?

Firesheep, SPDY, and you

For the past year, the SPDY team has been advocating that SPDY only work over SSL.  Many pundits have asked why, citing that this is not in the best interest of performance.  Of course, that is true – security is not free.  But what if we can make it almost free? 

SPDY aims to give you the full security and privacy of SSL without the latency of SSL.  When you combine the improvements inherent in SPDY with an improved SSL, we believe we have a new protocol which is both significantly faster than HTTP, and yet also fully encrypted, private, and secure.  Sure, we could make SPDY without SSL.  But that would be insecure.  And is there any good argument for a protocol of the future that doesn’t embed security natively?

So, if you weren’t convinced before, you should be convinced today.  This weekend, Firesheep was unleashed.  It’s an extension for Firefox which leverages HTTP’s lack of security to allow any user to take over most of your social networking accounts – Facebook, Twitter, etc.  Of course, Firesheep isn’t doing anything that couldn’t be done yesterday – it’s just making it available to anyone.

As we move forward, all data communication needs to be secured and private.  It’s the only way.

Mike’s Voting Guide to the Propositions Nov ‘10

First, some guiding principles:

  1. Consider who is backing each bill and how much they’re spending to back it.  The more they are spending, the more valuable it is to them.  Ask yourself why.
  2. Remember that every law has overhead – a new commission, a new study, a new enforcement, etc.  Even if the burden is placed on existing agencies (like our police officers or our firefighters or our court systems), each law usually costs money.  Unions (teachers, firefighters, police) usually support more work, because they get bigger.
  3. Be skeptical.
  4. If everything looks equal, vote no.

Second, some resources:

  1. BallotPedia.  I have found this site to be pretty comprehensive, well organized, and fair.
  2. OpenSecrets.  OpenSecrets tracks political contributions and lobbying.  Their coverage is mostly at the federal level, however.

Finally, the votes!

yes Prop 19:  Legalize Marijuana.  As with alcohol, legalize it and deal with the consequences.  I won’t touch the stuff.

yes Prop 20: Redistricting of Congressional Districts via committee.  Committees are just as corrupt as Congress.  A computer should draw the lines, but this is better than today.

no Prop 21: Tax to fund state parks.  The park system is plagued with administrative overhead.  Supporters should donate to the parks rather than to this bill.

no Prop 22: Prohibit State Spending against Local Funds.  Our governors need to be able to legislate holistically.  This creates unnecessary boundaries.

yes Prop 23: Suspend the “Global Warming Act” until unemployment drops below 5.5%.  I don’t like putting California at an economic disadvantage in the global market.

no Prop 24: Increase business taxes in California.  Check out the Teacher’s Union support on this bill.  This is just a tax to prolong big government.

no Prop 25: State budget via simple majority.  This is the teachers’ union working to expand big government.  I don’t want the budget controlled by the ruling party.  This is downright scary.

yes Prop 26: Make “fees” require 2/3 vote since they are taxes.  The state doesn’t have a revenue problem, it has a spending problem.  Act now, or “fees” will cripple California.

no Prop 27: Abolish committee for State Legislature redistricting.  It’s either Prop 20 or Prop 27.  Prop 20 is better.