Analyzing Go Vendoring with BigQuery

GitHub published a snapshot of all the public open-source repositories to BigQuery and Francesc used it to draw some cool statistics about Go projects.

I used the same dataset to analyze how the Go ecosystem does vendoring. Disclosure: there's some ego stroking here, as I'm the author of gvt. (Try it! It's meant to be as easy and idiomatic as possible.)

First, I extracted the subset of Go repositories. Then, I used file path patterns to figure out which vendoring tool each one used and whether it checked in the source of its dependencies. Finally, I ranked the tools by popularity by adding together the stars of every repository using them (from the GitHubArchive dataset).
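The actual analysis ran as BigQuery queries, but the classify-then-sum logic is easy to sketch in Go. This is only an illustration under stated assumptions: the `Repo` and `Stats` types, the `classify` and `tally` helpers, the sample data, and the exact path patterns (`Godeps/Godeps.json`, `Godeps/_workspace/`, `vendor/`) are my guesses at reasonable rules, not the queries used for this article.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Repo is a hypothetical record: one repository, its stars and its file paths.
type Repo struct {
	Name  string
	Stars int
	Paths []string
}

// Stats accumulates the per-tool totals shown in the results table.
type Stats struct {
	Repos, CheckedIn, Stars int
}

// classify guesses the vendoring tool from file paths and reports whether
// dependency source looks checked in. The patterns are assumptions.
func classify(paths []string) (tool string, checkedIn bool) {
	var godepsJSON, workspaceDir, workspaceSrc, vendorSrc bool
	for _, p := range paths {
		isGo := strings.HasSuffix(p, ".go")
		if p == "Godeps/Godeps.json" {
			godepsJSON = true
		}
		if strings.HasPrefix(p, "Godeps/_workspace/") {
			workspaceDir = true
			if isGo && strings.HasPrefix(p, "Godeps/_workspace/src/") {
				workspaceSrc = true
			}
		}
		if isGo && strings.HasPrefix(p, "vendor/") {
			vendorSrc = true
		}
	}
	switch {
	case workspaceDir:
		return "Godeps _workspace", workspaceSrc
	case godepsJSON:
		return "Godeps vendor", vendorSrc
	default:
		return "unrecognized", false
	}
}

// tally groups repositories by detected tool and sums their stars.
func tally(repos []Repo) map[string]Stats {
	out := make(map[string]Stats)
	for _, r := range repos {
		tool, checkedIn := classify(r.Paths)
		s := out[tool]
		s.Repos++
		s.Stars += r.Stars
		if checkedIn {
			s.CheckedIn++
		}
		out[tool] = s
	}
	return out
}

func main() {
	// Made-up sample data, not taken from the dataset.
	repos := []Repo{
		{"a/one", 120, []string{"main.go", "Godeps/Godeps.json", "vendor/github.com/x/y/y.go"}},
		{"b/two", 40, []string{"Godeps/Godeps.json", "Godeps/_workspace/src/github.com/x/y/y.go"}},
		{"c/three", 7, []string{"Godeps/Godeps.json", "main.go"}},
	}
	stats := tally(repos)

	// Sort tools by total stars, most popular first.
	tools := make([]string, 0, len(stats))
	for t := range stats {
		tools = append(tools, t)
	}
	sort.Slice(tools, func(i, j int) bool {
		return stats[tools[i]].Stars > stats[tools[j]].Stars
	})
	for _, t := range tools {
		s := stats[t]
		fmt.Printf("%-18s repos=%d checked-in=%d stars=%d\n", t, s.Repos, s.CheckedIn, s.Stars)
	}
}
```

Summing stars rather than counting repositories keeps a pile of tiny projects from outweighing a handful of widely used ones, which is why the ranking and the repository count can disagree.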

Note: the dataset includes only public GitHub repositories with a LICENSE, and seems to be one or two months old.

Here are the results:

Tool               Repositories  Checking source in  Total stars
Godeps vendor               604        378 (62.58%)        76312
Godeps _workspace          1204       1149 (95.43%)        53263
None of the above           305                   -        42883

I'm happy to see that checking source in is an established practice (since it makes builds reproducible and packages go-get-able), with 78% of projects adopting it.

Below are some popular projects using each tool. (You can find the entire lists here.)

Funnily enough, the most popular gb project is my whosthere, but no public project of mine used gvt at the time of the snapshot. (All my private and CloudFlare ones do!) On the other hand, 10% of the gvt projects are by @jessfraz.

Figuring out what the median number of vendored dependencies is and what the most popular ones are is... left as an exercise to the reader.

All the queries used for this article are here.

For more random Go statistics, follow me on Twitter.


Godeps (vendor style)

Godeps (_workspace style)