We need to talk about Session Tickets

More specifically, TLS 1.2 Session Tickets.

Session Tickets, specified in RFC 5077, are a technique to resume TLS sessions by storing key material encrypted on the clients. In TLS 1.2 they speed up the handshake from two to one round-trips.

Unfortunately, a combination of deployment realities and three design flaws makes them the weakest link in modern TLS, potentially turning limited key compromise into passive decryption of large amounts of traffic.

How Session Tickets work

A modern TLS 1.2 connection starts like this:

  • The client sends the supported parameters;
  • the server chooses the parameters and sends the certificate along with the first half of the Diffie-Hellman key exchange;
  • the client sends the second half of the Diffie-Hellman exchange, computes the session keys and switches to encrypted communication;
  • the server computes the session keys and switches to encrypted communication.

This involves two round-trips between client and server before the connection is ready for application data.

normal 1.2

The Diffie-Hellman key exchange is what provides Forward Secrecy: even if the attacker obtains the certificate key and a connection transcript after the connection ended they can't decrypt the data, because they don't have the ephemeral session key.

Forward Secrecy also translates into security against a passive attacker. An attacker that can wiretap but not modify the traffic has the same capabilities of an attacker that obtains a transcript of the connection after it's over. Preventing passive attacks is important because they can be carried out at scale with little risk of detection.

Session Tickets reduce the overhead of the handshake. When a client supports Session Tickets, the server will encrypt the session key with a key only the server has, the Session Ticket Encryption Key (STEK), and send it to the client. The client holds on to that encrypted session key, called a ticket, and to the corresponding session key. The server forgets about the client, allowing stateless deployments.

The next time the client wants to connect to that server it sends the ticket along with the initial parameters. If the server still has the STEK it will decrypt the ticket, extract the session key, and start using it. This establishes a resumed connection and saves a round-trip by skipping the key negotiation. Otherwise, client and server fallback to a normal handshake.

resumed 1.2

For a recap you can also watch the first part of my 33c3 talk.

Fatal flaw #1

The first problem with 1.2 Session Tickets is that resumed connections don't perform any Diffie-Hellman exchange, so they don't offer Forward Secrecy against the compromise of the STEK. That is, an attacker that obtains a transcript of a resumed connection and the STEK can decrypt the whole conversation.

How the specification solves this is by stating that STEKs must be rotated and destroyed periodically. I now believe this to be extremely unrealistic.

Session Tickets were expressly designed for stateless server deployments, implying scenarios where there are multiple servers serving the same site without shared state. These server must also share STEKs or resumption wouldn't work across them.

As soon as a key requires distribution it's exposed to an array of possible attacks that an ephemeral key in memory doesn't face. It has to be generated somewhere, and transmitted somehow between the machines, and that transmission might be recorded or persisted. Twitter wrote about how they faced and approached exactly this problem.

Moreover, an attacker that compromises a single machine can now decrypt traffic flowing through other machines, potentially violating security assumptions.

Finally, if a key is not properly rotated it allows an attacker to decrypt past traffic upon compromise.

TLS 1.3 solves this by supporting Diffie-Hellman along with Session Tickets, but TLS 1.2 was not yet structured to support one round trip Diffie-Hellman (because of the legacy static RSA structure).

These observations are not new, Adam Langley wrote about them in 2013 and TLS 1.3 was indeed built to address them.

Fatal flaw #2

Session Tickets contain the session keys of the original connection, so a compromised Session Ticket lets the attacker decrypt not only the resumed connection, but also the original connection.

This potentially degrades the Forward Secrecy of non-resumed connections, too.

The problem is exacerbated when a session is regularly resumed, and the same session keys keep getting re-wrapped into new Session Tickets (a resumed connection can in turn generate a Session Ticket), possibly with different STEKs over time. The same session key can stay in use for weeks or even months, weakening Forward Secrecy.

TLS 1.3 addresses this by effectively hashing (a one-way function) the current keys to obtain the keys for the resumed connection. While hashing is a pretty obvious solution, in TLS 1.2 there was no structured key schedule, so there was no easy agnostic way to specify how keys should be derived for each different cipher suite.

Fatal flaw #3

The NewSessionTicket message containing the Session Ticket is sent from the server to the client just before the ChangeCipherSpec message.

Client                                               Server

ClientHello                  -------->  
                             <--------      ServerHelloDone
Finished                     -------->  
                             <--------             Finished
Application Data             <------->     Application Data  

The ChangeCipherSpec message enables encryption with the session keys and the negotiated cipher, so everything exchanged during the handshake before that message is sent in plaintext.

This means that Session Tickets are sent in the clear at the beginning of the original connection.

(╯°□°)╯︵ ┻━┻

An attacker with the STEK doesn't need to wait until session resumption is attempted. Session Tickets containing the current session keys are sent at the beginning of every connection that merely supports Session Tickets. In plaintext on the wire, ready to be decrypted with the STEK, fully bypassing Diffie-Hellman.

TLS 1.3 solves this by... not sending them in plaintext. There is no strong reason I can find for why TLS 1.2 wouldn't wait until after the ChangeCipherSpec to send the NewSessionTicket. The two messages are sent back to back in the same flight. Someone suggested it might be not to complicate implementations that do not expect encrypted handshake messages (except Finished).

1 + 2 + 3 = dragnet surveillance

The unfortunate combination of these three well known flaws is that an attacker that obtains the Session Ticket Encryption Key can passively decrypt all connections that support Session Tickets, resumed and not.

It's grimly similar to a key escrow system: just before switching to encrypted communication, the session keys are sent on the wire encrypted with a (somewhat) fixed key.

Passive attacks are the enablers of dragnet surveillance, what HTTPS aims to prevent, and the same actors that are known to engage in dragnet surveillance have specialized in surgical key extraction attacks.

There is no proof that these attacks are currently performed and the aim of this post is not to spread FUD about TLS, which is still the most impactful security measure on the Internet today despite all its defects. However, war-gaming the most effective attacks is a valuable exercise to ensure we focus on improving the important parts, and Session Tickets are often the single weakest link in TLS, far ahead of the CA system that receives so much more attention.

Session Tickets in the real world

The likeliness and impact of the described attacks changes depending on how Session Tickets are deployed.

Drew Springall et al. made a good survey in "Measuring the Security Harm of TLS Crypto Shortcuts", revealing how many networks neglect to regularly rotate STEKs. Tim Taubert wrote about what popular software stacks do regarding key rotation. The landscape is bleak.

In some cases, the same STEK can be used across national borders, putting it under multiple jurisdictional threats. A single compromised machine then enables an attacker to decrypt traffic passively across the whole world by simply exfiltrating a short key every rotation period.

Mitigating this by using different STEKs across geographical locations involves a trade-off, since it disables session resumption for clients roaming across them. It does however increase the cost for what appears to be the easiest dragnet surveillance avenue at this time, which is always a good result.

In conclusion, I can't wait for TLS 1.3.