Reproducing Go binaries byte-by-byte

Fully reproducible builds are important because they bridge the gap between auditable open source and convenient binary artifacts. Technologies like TUF and Binary Transparency provide accountability for what binaries are shipped to users, but that's of limited utility if there is no way (short of reverse engineering) of proving that the binary is in fact the result of compiling the intended source.

That's why the Debian project is putting tremendous effort into making packages reproducible. The good news is that Go builds are reproducible by default.

Prerequisites

There are a few common-sense requirements.

  • Of course, the builds must be reproducible in the weaker sense: that means the source code must match perfectly.
    • This includes dependencies, so the project has to vendor them strictly. This is important beyond binary reproducibility: you don't want "version 1.3" of a piece of software to mean different things depending on when it was built.
  • The compiler version must be the same.
  • GOPATH and GOROOT must match (#16860), annoyingly, as they are all over the binary in debug file paths.
    • EDIT: As Shawn Walker suggests on Twitter, you can strip the paths with -asmflags -trimpath. (Of course, this only works if you control the original build.)
    • Note: the default GOROOT, the one that the compiler will use if the environment variable is not set, must also match, since it will be copied into binaries (#17943). You can only change that by recompiling the toolchain in the right directory.
  • In cgo be dragons (#15405, #19964, #9206), meaning that it's possible to get reproducible builds since 1.7 but it depends on the C linker.

Interestingly, the build host architecture does not matter. In other words, cross-compiled builds are reproducible too: the same target comes out identical whichever host it was built on.

Reproducing rclone

I picked rclone for this exercise because it's a self-contained Go binary that vendors dependencies and offers binary installs.

Here are the binaries we will try to reproduce.

bfe0d7e041b4020001b6c48ff170e727243855cbb447f96d983e05b04c090ea8  rclone-v1.36-windows-386/rclone.exe
71827d554c5d860d302ec76d79dcd8433fe63065eac5df4d81b4d2bbefc760b3  rclone-v1.36-linux-amd64/rclone
61ab593c6a007e54c63e64ff2b6ee66dba77c40e12d8ca6b81cf50e8272f43b3  rclone-v1.36-openbsd-amd64/rclone

Detecting parameters

To start, we need to figure out the GOPATH and GOROOT values they were built with. This is easy with debug/gosym, which uses the binary's debug information to look up the file path of known functions. (PE support is... left as an exercise to the reader.)

$ go run gosym.go rclone-v1.36-linux-amd64/rclone
/home/ncw/go/src/github.com/ncw/rclone/rclone.go
/opt/go/go1.8/src/runtime/extern.go

So the GOPATH is /home/ncw/go and the GOROOT is /opt/go/go1.8.
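For reference, here is a minimal sketch of what a script along the lines of gosym.go might look like. It is not necessarily the script used above: the helper names loadTable and sourceRoot are made up for illustration, probing main.main and runtime.GOROOT is an assumption based on the two file names in the output, and it only handles ELF.

```go
package main

import (
	"debug/elf"
	"debug/gosym"
	"fmt"
	"log"
	"os"
	"strings"
)

// loadTable builds a gosym.Table from the .gopclntab section of an ELF binary.
func loadTable(path string) (*gosym.Table, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	text, pcln := f.Section(".text"), f.Section(".gopclntab")
	if text == nil || pcln == nil {
		return nil, fmt.Errorf("%s: missing .text or .gopclntab section", path)
	}
	pclnData, err := pcln.Data()
	if err != nil {
		return nil, err
	}
	// .gosymtab is optional and may be empty; the line table is what matters here.
	var symData []byte
	if s := f.Section(".gosymtab"); s != nil {
		if symData, err = s.Data(); err != nil {
			return nil, err
		}
	}
	return gosym.NewTable(symData, gosym.NewLineTable(pclnData, text.Addr))
}

// sourceRoot returns the part of file that precedes the first "/src/",
// i.e. the GOPATH or GOROOT the file was compiled from ("" if there is none).
func sourceRoot(file string) string {
	if i := strings.Index(file, "/src/"); i >= 0 {
		return file[:i]
	}
	return ""
}

func main() {
	if len(os.Args) != 2 {
		log.Fatalf("usage: %s <elf-binary>", os.Args[0])
	}
	table, err := loadTable(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	// One function from the main package (GOPATH) and one from the runtime (GOROOT).
	for _, name := range []string{"main.main", "runtime.GOROOT"} {
		fn := table.LookupFunc(name)
		if fn == nil {
			log.Printf("%s: symbol not found", name)
			continue
		}
		file, _, _ := table.PCToLine(fn.Entry)
		fmt.Printf("%s (root: %s)\n", file, sourceRoot(file))
	}
}
```

Cutting at the first "/src/" is a heuristic: it would give the wrong answer for a GOPATH that itself contains a src directory.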

For the compiler version I don't have a good solution (that will work even if DWARF is stripped), so I'll give you a bad one that relies on the global variable backing runtime.Version().

$ egrep -a -o 'go[0-9\.]+' rclone-v1.36-linux-amd64/rclone
go.
go1.8
go1.8

Yes, it's literally strings.
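If you'd rather stay in Go, the same crude scan can be sketched in a few lines. This is only an illustration that mirrors the egrep pattern; the name findVersions is made up, and it deduplicates matches, which egrep does not.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"regexp"
)

// versionRE mirrors the egrep pattern: "go" followed by digits and dots.
var versionRE = regexp.MustCompile(`go[0-9.]+`)

// findVersions returns the distinct matches in data, in order of first appearance.
func findVersions(data []byte) []string {
	seen := make(map[string]bool)
	var out []string
	for _, m := range versionRE.FindAll(data, -1) {
		s := string(m)
		if !seen[s] {
			seen[s] = true
			out = append(out, s)
		}
	}
	return out
}

func main() {
	if len(os.Args) != 2 {
		log.Fatalf("usage: %s <binary>", os.Args[0])
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	for _, v := range findVersions(data) {
		fmt.Println(v)
	}
}
```

Like the egrep version, this will happily report false positives such as "go."; it is a hint, not proof.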

You're also on your own for the compiler's default GOROOT, but strings will bring it up.

Finally, you might have to look at the project docs to find out what flags they use. rclone uses -s, -X and CGO_ENABLED=0.

Reproducing it

Since the host architecture does not matter but the environment does, we'll use Docker to do our build.

FROM debian:jessie

RUN apt-get update && apt-get install -y unzip wget tar ca-certificates git build-essential

RUN wget https://storage.googleapis.com/golang/go1.8.linux-amd64.tar.gz
RUN tar xvf go1.8.linux-amd64.tar.gz
RUN mkdir -p /opt/go && cp -r go /opt/go/go1.8
RUN cd /opt/go/go1.8/src && GOROOT_BOOTSTRAP=/go ./make.bash

ENV PATH "/opt/go/go1.8/bin:$PATH"

RUN mkdir -p /home/ncw/go/src/github.com/ncw/
RUN cd /home/ncw/go/src/github.com/ncw && git clone https://github.com/ncw/rclone
RUN cd /home/ncw/go/src/github.com/ncw/rclone && git checkout v1.36

ENV GOPATH /home/ncw/go

ENTRYPOINT ["go"]

$ docker run -it --rm -v $(pwd):$(pwd) -w $(pwd) -e CGO_ENABLED=0 4f6d1bc86d5e \
  build --ldflags "-s -X github.com/ncw/rclone/fs.Version=v1.36" \
  -o rclone-v1.36-linux-amd64/rclone.ours github.com/ncw/rclone

To cross-compile, I just added the GOOS and GOARCH environment variables with docker run -e.

Debugging

Reproducing someone else's build is not always easy. And indeed, my rclone build mismatched.

The first thing to look at is the Build ID. The Build ID is a hash of the filenames of the compiled files, plus the version of the compiler (and other things in zversion.go, like the default GOROOT). See pkg.go.

You can read it with readelf -x .note.go.buildid or by extracting it from the text section.
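As an alternative to readelf, here is a rough sketch that decodes the note with debug/elf. It assumes the section holds a single standard ELF note (name size, descriptor size and type as 32-bit words, then the padded name and descriptor) with the Build ID as the descriptor. As far as I know the Go linker uses the name "Go\x00\x00" and type 4, but the code only prints them rather than relying on that. parseNote is a made-up helper name.

```go
package main

import (
	"debug/elf"
	"encoding/binary"
	"errors"
	"fmt"
	"log"
	"os"
)

// parseNote decodes a single ELF note: three 32-bit words (name size,
// descriptor size, type) followed by the name and the descriptor, each
// padded to a 4-byte boundary. bigEndian must match the ELF file's byte order.
func parseNote(note []byte, bigEndian bool) (name string, typ uint32, desc []byte, err error) {
	var order binary.ByteOrder = binary.LittleEndian
	if bigEndian {
		order = binary.BigEndian
	}
	if len(note) < 12 {
		return "", 0, nil, errors.New("note too short")
	}
	nameSize := uint64(order.Uint32(note[0:4]))
	descSize := uint64(order.Uint32(note[4:8]))
	typ = order.Uint32(note[8:12])

	// The descriptor starts after the name, rounded up to a multiple of 4.
	descStart := 12 + (nameSize+3)&^3
	if descStart+descSize > uint64(len(note)) {
		return "", 0, nil, errors.New("note sizes exceed section length")
	}
	return string(note[12 : 12+nameSize]), typ, note[descStart : descStart+descSize], nil
}

func main() {
	if len(os.Args) != 2 {
		log.Fatalf("usage: %s <elf-binary>", os.Args[0])
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	sec := f.Section(".note.go.buildid")
	if sec == nil {
		log.Fatal("no .note.go.buildid section")
	}
	data, err := sec.Data()
	if err != nil {
		log.Fatal(err)
	}
	name, typ, desc, err := parseNote(data, f.Data == elf.ELFDATA2MSB)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("note name %q, type %d\nbuild ID: %s\n", name, typ, desc)
}
```

Run it on both the original and your rebuild and compare the printed IDs.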

If the build ID does not match, the first thing to compare is the paths of all symbols, again with gosym. Here's a slight patch to the gosym.go script we used above:

	for _, fu := range table.Funcs {
		path, _, _ := table.PCToLine(fu.Entry)
		fmt.Println(path)
	}

If the build ID matches, then you're looking at compiler flags.

Failing all that, strings and vbindiff are your friends.
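If vbindiff is not at hand, a tiny helper that reports where two files first diverge gives you an offset to start from. This is just a sketch, not a replacement for a real binary diff tool; firstDiff and window are made-up names.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// firstDiff returns the offset of the first byte at which a and b differ,
// or -1 if they are identical. A length mismatch counts as a difference at
// the end of the shorter input.
func firstDiff(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

// window returns up to n bytes of data starting at off, clamped to its length.
func window(data []byte, off, n int) []byte {
	if off > len(data) {
		off = len(data)
	}
	end := off + n
	if end > len(data) {
		end = len(data)
	}
	return data[off:end]
}

func main() {
	if len(os.Args) != 3 {
		log.Fatalf("usage: %s <theirs> <ours>", os.Args[0])
	}
	a, err := os.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	b, err := os.ReadFile(os.Args[2])
	if err != nil {
		log.Fatal(err)
	}
	off := firstDiff(a, b)
	if off < 0 {
		fmt.Println("files are identical")
		return
	}
	fmt.Printf("first difference at offset %#x\n", off)
	fmt.Printf("theirs: % x\n", window(a, off, 16))
	fmt.Printf("ours:   % x\n", window(b, off, 16))
}
```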

What got me with rclone was not rebuilding the compiler in the new location to get the right default GOROOT—the make.bash step of the Dockerfile. If you enjoy debugging, here's the tootstorm on Mastodon.

Result

bfe0d7e041b4020001b6c48ff170e727243855cbb447f96d983e05b04c090ea8  rclone-v1.36-windows-386/rclone.exe
bfe0d7e041b4020001b6c48ff170e727243855cbb447f96d983e05b04c090ea8  rclone-v1.36-windows-386/rclone.ours
71827d554c5d860d302ec76d79dcd8433fe63065eac5df4d81b4d2bbefc760b3  rclone-v1.36-linux-amd64/rclone
71827d554c5d860d302ec76d79dcd8433fe63065eac5df4d81b4d2bbefc760b3  rclone-v1.36-linux-amd64/rclone.ours
61ab593c6a007e54c63e64ff2b6ee66dba77c40e12d8ca6b81cf50e8272f43b3  rclone-v1.36-openbsd-amd64/rclone
61ab593c6a007e54c63e64ff2b6ee66dba77c40e12d8ca6b81cf50e8272f43b3  rclone-v1.36-openbsd-amd64/rclone.ours
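The digests above are 64 hex characters in sha256sum-style output, so I'm assuming SHA-256 here. If you want to script the comparison, a sketch like this prints both digests and exits non-zero on a mismatch; digest is a made-up helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"log"
	"os"
)

// digest returns the hex-encoded SHA-256 of data.
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	if len(os.Args) != 3 {
		log.Fatalf("usage: %s <theirs> <ours>", os.Args[0])
	}
	var sums [2]string
	for i, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			log.Fatal(err)
		}
		sums[i] = digest(data)
		fmt.Printf("%s  %s\n", sums[i], path)
	}
	if sums[0] != sums[1] {
		log.Fatal("digests differ: build is not reproduced")
	}
	fmt.Println("digests match")
}
```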

So good news, rclone is not backdoored!

If you enjoy these exercises, you can follow me on Twitter or Mastodon.