Bleichenbacher'06 signature forgery in python-rsa

While looking at the source of python-rsa (>100K daily downloads) I found it vulnerable to a straightforward variant of the Bleichenbacher'06 attack against RSA signature verification with low public exponent.

The bug allows us to forge signatures for arbitrary messages, as long as the public key has a low exponent (e), like 3. Thankfully keys generated with python-rsa have e=65537 hardcoded, but the library offers a lot of options to import existing keys, which might very well have e=3.

Update: the issue was assigned CVE-2016-1494.

I'm taking the occasion to explain the background and details of the attack, which happens to be one of my favorites because of its simplicity. If you already know about it or are just interested in the 'sploit, skip the next section.

Bleichenbacher'06

RSA is just math. When you generate a keypair you get three numbers, N, e and d. The cool thing is that if you take a number m and elevate it to both e and d modulus N, you get back m. If you keep d secret and publish e and N, it's extremely difficult for an attacker to do d's job (because root and logarithm are hard in modular arithmetics).

To use RSA as a public key encryption primitive, you give out (N, e), Alice sends you m ^ e mod N, and you decrypt performing (m ^ e) ^ d = m mod N. Signing is the other way around: to sign a message you generate m ^ d mod N, and the recipient uses (N, e) to perform (m ^ d) ^ e = m mod N. Only you, having d, can create a signature (m ^ d mod N) that elevated to e mod N gives exactly m.

You might have noticed that m is not a real arbitrary message, but actually a number, that needs to be smaller than N. So to encrypt arbitrary messages we usually encrypt a symmetric (like, AES) key, and to sign arbitrary messages we sign the hash (like, SHA2) of the message. Both are short enough to be converted to a number lower than N.

The most widely used scheme for RSA signing until ~10 years ago (and sadly still prevalent in legacy-ridden protocols like TLS and DNSSEC) is PKCS#1 1.5. PKCS#1 1.5 signing works like this: you take the hash of the message you want to sign, and then you encode it like this

00 01 FF FF ... FF FF 00 ASN.1 HASH  

where ASN.1 is a very complex binary encoding of the hash type and length. And then you sign that encoding with RSA. FF bytes provide padding to make the message exactly as long as the modulus N.

The intuition behind the BB'06 attack is that while it's impossible (without d) to find a number that elevated to e gives exactly the encoding above, we can get to an approximation, for example by taking the e-th root of the target message. If e is small enough, the approximation might be good enough to get a message like

00 01 FF 00 ASN.1 HASH GARBAGE  

If the verification function fails to check that the hash is aligned at the end of the message (i.e. that there are enough FF bytes), we can then forge signatures that will work with any public key using a certain small e. As you can see, N becomes completely irrelevant because exponentiation by e never wraps past the modulus. It helps that the second most used value for e (which is picked arbitrarily during key generation) is 3. =)

Bleichenbacher, while presenting the attack at the rump session of CRYPTO'06, provided a "pen and paper" method of generating such a number s that when elevated to a small e gives a message with the intended prefix, as documented by Hal Finney. However, I like the cube-root-and-round-up method for its simplicity.

At the time, both OpenSSL and NSS were vulnerable to a trivial version of the attack. Since then, variants of the vulnerability were found in all sort of libraries, making the attack one of the evergreens of offensive cryptography engineering. The recent BERserk attack against NSS was nothing else than BB'06 using an ASN.1 bug to hide the garbage in two parts in the middle of the message. The cryptopals level on BB'06 even has a "Crypto-tourism informational placard"!

The python-rsa vulnerability

The python-rsa bug is not a vanilla BB'06, because the hash is compared to all the data following the ASN.1 blob, but a simple variant.

Here is the source of the verify() function:

def verify(message, signature, pub_key):  
    blocksize = common.byte_size(pub_key.n)
    encrypted = transform.bytes2int(signature)
    decrypted = core.decrypt_int(encrypted, pub_key.e, pub_key.n)
    clearsig = transform.int2bytes(decrypted, blocksize)

    # If we can't find the signature  marker, verification failed.
    if clearsig[0:2] != b('\x00\x01'):
        raise VerificationError('Verification failed')

    # Find the 00 separator between the padding and the payload
    try:
        sep_idx = clearsig.index(b('\x00'), 2)
    except ValueError:
        raise VerificationError('Verification failed')

    # Get the hash and the hash method
    (method_name, signature_hash) = _find_method_hash(clearsig[sep_idx+1:])
    message_hash = _hash(message, method_name)

    # Compare the real hash to the hash in the signature
    if message_hash != signature_hash:
        raise VerificationError('Verification failed')

    return True

def _find_method_hash(method_hash):  
    for (hashname, asn1code) in HASH_ASN1.items():
        if not method_hash.startswith(asn1code):
            continue

        return (hashname, method_hash[len(asn1code):])

    raise VerificationError('Verification failed')

HASH_ASN1 = {  
    'MD5': b('\x30\x20\x30\x0c\x06\x08\x2a\x86'
             '\x48\x86\xf7\x0d\x02\x05\x05\x00\x04\x10'),
    'SHA-1': b('\x30\x21\x30\x09\x06\x05\x2b\x0e'
               '\x03\x02\x1a\x05\x00\x04\x14'),
    'SHA-256': b('\x30\x31\x30\x0d\x06\x09\x60\x86'
                 '\x48\x01\x65\x03\x04\x02\x01\x05\x00\x04\x20'),
    'SHA-384': b('\x30\x41\x30\x0d\x06\x09\x60\x86'
                 '\x48\x01\x65\x03\x04\x02\x02\x05\x00\x04\x30'),
    'SHA-512': b('\x30\x51\x30\x0d\x06\x09\x60\x86'
                 '\x48\x01\x65\x03\x04\x02\x03\x05\x00\x04\x40'),
}

You can see that this parser will accept anything of the form

00 01 XX XX ... XX XX 00 ASN.1 HASH  

where XX are non-00 bytes. The relevant line, which jumps to the next 00 after the 00 01 prefix is:

sep_idx = clearsig.index(b('\x00'), 2)  

There is no check that the padding bytes are actually FF bytes. This means we can hide imperfect cube-rooting garbage in the space between the 00 01 prefix and the 00 separator.

Such a variant is already well known in the literature and is actually a subset of the recent BERserk attack. There are many ways to efficiently exploit it; I used the method I implemented for BERserk, which I find pretty intuitive.

Our target is making a message that, when cubed, has a given prefix (00 01) and suffix (00 ASN.1 HASH). Prefix is easy (cube root will do), so let's start with the suffix. We will call the crafted message s, and its cube c. We will count bits from 0, where the 0th bit is the rightmost, least significant one.

The base intuition is that flipping the Nth bit in s causes the Nth bit in c to flip, and leaves all bits from 0 to N-1 unaffected. If you are having trouble imagining this try visualizing the pen-and-paper column operation. Using this property, we can build s simply by iterating over the bits in the suffix from the right, flipping the bit in s whenever the bit in c is not the one we want.

Let's see an example: searching for a s that cubes to a c ending in 1010101101.

       9876543210  <- Indexes

s:     0000000001  <- Initial s  
c:     0000000001  <- Bit 0 an 1 match, skip  
tgt:   1010101101  <- Target suffix of c

s:     0000000001  
c:     0000000001  <- Bit 2 doesn't match, flip in s  
tgt:   1010101101

s:     0000000101  
c:     0001111101  <- Bit 3 matches, skip  
tgt:   1010101101

s:     0000000101  
c:     0001111101  <- Bit 4 doesn't, flip in s  
tgt:   1010101101

s:     0000010101  
c: 10010000101101  <- Bit 7 doesn't match, flip in s  
tgt:   1010101101

s:     0010010101  
c: ...00110101101  <- Bit 8 doesn't match, flip in s  
tgt:   1010101101

s:     0110010101  
c: ...10010101101  <- Bit 9 doesn't match, flip in s  
tgt:   1010101101

s:     1110010101  
c: ...01010101101  <- Done!  
tgt:   1010101101

1110010101 ^ 3 = 101101111101011111101010101101  
                                     ^^^^^^^^^^

There we have it. If our signature ends in 1110010101, the cubed "clearsig" will end in 1010101101. If we apply the same process to the target suffix 00 ASN.1 HASH we get the final part of our fake signature.

The prefix is easier to craft: just take the cube root of 00 01 XX XX ... where XX are random bytes, as many as are needed to make the string as long as the public key. It won't be an exact cube root, but it will be precise enough to keep the first two bytes intact. (This step is pure BB'06).

Now just stitch the two parts together by truncating len(sig_suffix) bytes at the end of the prefix cube root and concatenating. The suffix is short enough that changing it probably won't affect the two high bytes of the cube, and we know that adding bits to the left does not corrupt the cube suffix.

Final problem: there must be no other 00 bytes in the cube. Lazy solution: repeat the prefix computation with different filler random bytes until there aren't. (Yeah, I know. So what? :P)

You can find an iPython notebook putting it all together and validating it against python-rsa below. (These things are amazing for presenting algorithms!)

Practice safe crypto

Reading about how often this vulnerability was found in the wild in the last 10 years, you might think that RSA signature verification is just a hard problem, and that we are bound to get it wrong over and over again. Or at least that PKCS#1 1.5 is.

Well, even if PKCS#1 1.5 is atrocious, it's perfectly possible to implement verification safely: instead of trying to parse the encoded signature, generate what you expect the signature to look like, complete of padding, ASN.1 and hash, and then simply compare the user/attacker's one. (Remember kids: parsing is dangerous!) AGL and tqbf explain it better than I do, anyway.

This entirely kills any opportunity for the attacker to sling garbage past you.

As usual, the fact that it's possible to implement a dangerous algorithm correctly, doesn't mean it's humanely possible, and safer algorithms (and primitives) must always be preferred. (See also: deterministic ECDSA.)

Moreover, there are extremely fast algorithms to perform RSA with e=65537, so there's no reason to expose yourself to vulnerable implementations by using keys with e=3.

Next steps

I applied for a CVE (CVE-2016-1494) and submitted a patch.

To prove once more how dangerous it is to hand-roll crypto, I found that also my old implementation of RSA verification in youtube-dl (used to verify updates) is vulnerable to a similar attack. Since youtube-dl can't import any dependencies, I replaced the verify function with a build-and-compare one. For the record, youtube-dl users were never at risk because I had hardcoded a public key with e=65537.

For more cryptography failures, follow me on Twitter.