How the new Gmail image proxy works and what this means for you

Google recently announced that images in emails will be displayed automatically by default to Gmail users, thanks to an anonymizing proxy that Google itself operates.

This, they say, will actually benefit users' privacy.

This might very well be true if images were prefetched when an email is received. The help page, however, does not suggest that this is the case (and it states that images are transcoded, which is interesting).

Since this feature has already been rolled out to me, I decided to check how it actually works.

So I set up a slightly modified Python SimpleHTTPServer that also logs request headers (I just added the line below):

print json.dumps(self.headers.dict, indent=4, separators=(',', ': '))  
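The line above relies on Python 2 and its `self.headers.dict`. For anyone repeating the experiment on Python 3, a roughly equivalent server could look like the sketch below. The names `format_headers`, `HeaderLoggingHandler` and `serve`, and the default port, are assumptions made for illustration and are not part of my original script.

```python
import json
from http.server import HTTPServer, SimpleHTTPRequestHandler


def format_headers(headers):
    """Return the request headers as indented JSON, with lowercased names."""
    as_dict = {name.lower(): value for name, value in headers.items()}
    return json.dumps(as_dict, indent=4, separators=(',', ': '))


class HeaderLoggingHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory, printing the headers first."""

    def do_GET(self):
        print(format_headers(self.headers))
        super().do_GET()


def serve(port=8000):
    """Block forever, serving on all interfaces (the port is an assumption)."""
    HTTPServer(('', port), HeaderLoggingHandler).serve_forever()
```

Calling `serve()` from the directory that contains `test.png` should then print one JSON object per GET request, followed by the usual access log line.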

I downloaded this image and exposed it at http://filosottile.info/test.png

the test image

Here is what a request from my browser looks like:

{
    "accept-language": "en-US,en;q=0.8,it-IT;q=0.6,it;q=0.4",
    "accept-encoding": "gzip,deflate,sdch",
    "cache-control": "max-age=0",
    "connection": "keep-alive",
    "accept": "image/webp,*/*;q=0.8",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36",
    "host": "filosottile.info",
    "if-modified-since": "Wed, 31 Oct 2012 23:52:07 GMT"
}
cpe-68-175-8-151.nyc.res.rr.com - - [12/Dec/2013 22:11:54] "GET /test.png HTTP/1.1" 200 -  

Then I sent the following HTML message to myself at 17:21:29 EST (this is the full email body as received):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">  
<html>  
<head><title></title></head>  
<body>

<img src="http://filosottile.info/test.png">

</body>  
</html>  

It immediately showed up on my phone. No requests. I waited a bit and opened my desktop inbox. Still no requests.

Then I opened the email. The image loaded automatically, and a request was immediately logged on my server:

{
    "host": "filosottile.info",
    "connection": "Keep-alive",
    "accept-encoding": "gzip,deflate",
    "user-agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 (via ggpht.com)"
}
google-proxy-66-249-88-131.google.com - - [12/Dec/2013 22:23:40] "GET /test.png HTTP/1.1" 200 -  
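Going only by the single request logged above, fetches that come through the proxy seem to be recognizable by the `(via ggpht.com)` suffix in the user-agent. Here is a sketch of the check a server could run. `is_gmail_proxy` and `GGPHT_MARKER` are names I made up, and the marker string is an assumption based on this one sample, so it may well change.

```python
# Assumption: marker copied from the one proxied request observed above.
GGPHT_MARKER = "(via ggpht.com)"


def is_gmail_proxy(headers):
    """Guess whether a request came through the Gmail image proxy.

    `headers` is a dict with lowercased header names, like the ones logged above.
    """
    return GGPHT_MARKER in headers.get("user-agent", "")
```

This says nothing about who the reader is; it only tells the server that the request was relayed.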

The image is indeed transcoded: exact same metadata (format, dimensions...) but different body. Here it is, as fetched from the URL https://ci6.googleusercontent.com/proxy/5YvKA8rt5kSAfWUwLZ1LfA_3fBdc2Qr5pHI-aWBr8fg0I27pvkXn5vljroVhYVWBHb5iCIIs=s0-d-e1-ft#http://filosottile.info/test.png

the test image

And here are the md5sum and identify outputs:

MD5 (unnamed.png) = ff614aa9214d23e6c292d357f043a7a5  
MD5 (test.png) = 5dfe622b1ce0d027e3918d601ff160d0  
unnamed.png PNG 568x63 568x63+0+0 8-bit sRGB 8.98KB 0.000u 0:00.000  
test.png PNG 568x63 568x63+0+0 8-bit sRGB 8.66KB 0.000u 0:00.009  
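The same comparison can be approximated without md5sum and identify. The sketch below hashes the raw bytes and reads the width and height from the PNG IHDR chunk. The function names are hypothetical, and unlike identify it only compares dimensions, not bit depth or colorspace.

```python
import hashlib
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def md5_hex(data):
    """Hex MD5 digest of the raw file bytes."""
    return hashlib.md5(data).hexdigest()


def png_size(data):
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers right after "IHDR".
    return struct.unpack(">II", data[16:24])


def same_image_different_bytes(a, b):
    """True if both PNGs share dimensions but differ byte for byte."""
    return png_size(a) == png_size(b) and md5_hex(a) != md5_hex(b)
```

Feeding it the contents of `test.png` and `unnamed.png` (read with `open(name, "rb").read()`) would be the way to reproduce the check on the two files above.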

Also, no caching is performed server-side: every time I downloaded that URL, a request showed up on my server.
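To make the distinction concrete, here is a toy model of a pass-through proxy next to a caching one. It is only an illustration of the two behaviors, not a description of how Google's infrastructure is built, and every class name in it is made up.

```python
class Origin:
    """Stand-in for the sender's server; counts the requests it receives."""

    def __init__(self):
        self.hits = 0

    def get(self, path):
        self.hits += 1
        return b"image bytes for " + path.encode()


class PassThroughProxy:
    """Forwards every request to the origin (the behavior I observed)."""

    def __init__(self, origin):
        self.origin = origin

    def get(self, path):
        return self.origin.get(path)


class CachingProxy:
    """Asks the origin once per path, then answers from memory."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, path):
        if path not in self.cache:
            self.cache[path] = self.origin.get(path)
        return self.cache[path]
```

With three loads of the same path, the origin behind `PassThroughProxy` counts three hits, while the one behind `CachingProxy` counts one. Only the second design would hide repeated opens from the sender.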

So, what's the issue?

The issue is that the single most useful piece of information a sender gets from you (or from the Google proxy) loading the image is that you read the email, and when. This is not mitigated at all by this system: it really is only a proxy, so when you open an email the sender's server sees a request. Mix that with the ubiquitous uniquely-named images (images with a name that is unique to an email) and you get read notifications.
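A minimal sketch of how uniquely-named images turn into read notifications, assuming a sender that controls the image host. `OpenTracker` and its methods are hypothetical names and `example.invalid` is a placeholder; this is not any particular marketer's system.

```python
import secrets


class OpenTracker:
    """Toy read-receipt tracker built on uniquely named images."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.tokens = {}  # image token -> recipient address
        self.opened = []  # recipients whose image was requested

    def image_tag(self, recipient):
        """Return an <img> tag whose file name is unique to this email."""
        token = secrets.token_hex(8)
        self.tokens[token] = recipient
        return '<img src="%s/%s.png">' % (self.base_url, token)

    def record_request(self, path):
        """Feed in the path of each logged GET; returns the reader, if known."""
        name = path.rsplit("/", 1)[-1]
        token = name[:-4] if name.endswith(".png") else name
        recipient = self.tokens.get(token)
        if recipient is not None:
            self.opened.append(recipient)
        return recipient
```

Since the token lives in the URL path, it survives the trip through a pass-through proxy: the request may come from a Google address, but the file name still says which email was opened.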

OK, they won't know my IP, and this is really good. They won't set tracking cookies to link my different email accounts, and they won't know what browser I'm running. They might even fail to exploit my machine thanks to transcoding (if they wanted to waste such a 0-day). But the default setting, which is what most users settle on, let's face it, just got weaker on privacy.

Now, Gmail has "✓ Seen".

Note: you can turn automatic loading off and gain the privacy benefits of the proxy anyway.


Bonus: the ArsTechnica article

ArsTechnica put out a terribly uninformed and unresearched article, so full of errors that I'm going to dissect it in reading order.

Starting from the title, "Gmail blows up e-mail marketing by caching all images on Google servers". As shown above, this might even benefit email marketing; it certainly does not blow it up.

"[...] it will cache all images for Gmail users. Embedded images will now be saved by Google, and the e-mail content will be modified to display those images from Google's cache, instead of from a third-party server."

Simply wrong.

"E-mail marketers will no longer be able to get any information from images—they will see a single request from Google, which will then be used to send the image out to all Gmail users. Unless you click on a link, marketers will have no idea the e-mail has been seen."

As verified above, this data is alive and kicking, and there is NOT a single request: my server saw a new one every time the image was loaded.

"While this means improved privacy from e-mail marketers, Google will now be digging deeper than ever into your e-mails and literally modifying the contents. If you were worried about e-mail scanning, this may take things a step further."

Google has always modified email contents to sanitize HTML and, guess what, to disable images. Also, nothing ever barred Google from fetching the images in your emails anyway.

"Google servers should also be faster than the usual third-party image host."

Quite the opposite: since it is a proxy server and NOT a caching server, it adds round trips to image loading.