Building Python modules with Go 1.5

tl;dr: with Go 1.5 you can build .so objects and import them as Python modules, running Go code (instead of C) directly from Python. Here's the code.

The Go 1.5 release brings a number of nifty changes. The one we will be playing with today is the ability of the standard toolchain to build libraries (.so, .a) exporting a C ABI. (This is just one of an exciting series of new and planned buildmodes.)

The ABI is the low-level binary interface that functions employ to call each other. It standardizes things like where and how to pass arguments and return values, what happens to CPU registers, etc. It's needed so that, for example, shared libraries (.so objects) built by one compiler can be loaded by executables built by a different compiler.
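To make this concrete, here is a minimal sketch (not part of the original demo) in which Python code calls strlen from the C library through ctypes. It assumes a Unix-like system, where ctypes.CDLL(None) exposes the symbols already loaded into the process. The call works only because both sides agree on the C ABI.

```python
import ctypes

# Assumption: a Unix-like system, where CDLL(None) gives access to the
# symbols already loaded into the process, including the C library.
libc = ctypes.CDLL(None)

# Describe the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
print(length)
```

Nothing here knows which compiler produced strlen; the argument and return types are all ctypes needs to make the call.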

Various languages can be compiled to libraries exporting a C-compatible ABI: C (obviously), C++, Rust... and now Go. These libraries can all be used just like normal C libraries. This way we can package Go functions and then run them from software that might have never heard of Go just by linking against the .so or building against the .a.

I got inspired by this post on calling Go shared libraries from Firefox Add-ons and decided to try the same with another .so-consuming interface: Python modules. In Python you can import a properly constructed .so just like you import a .py file. This way you can reuse your Go core or optimize hotspots without playing with fire (I mean, C).

In this post we'll go through building Go shared libraries, then C Python modules, and finally put the things together to build Go Python modules. Feel free to peek at the final result.

Running Go shared libraries from C

A first simple try: let's build a .so with Go and run it from a C binary.

The release notes above point us to go help buildmode:

$ go help buildmode
The 'go build' and 'go install' commands take a -buildmode argument which  
indicates which kind of object file is to be built. Currently supported values  
are:

[...]

-buildmode=c-shared
    Build the listed main packages, plus all packages that they
    import, into C shared libraries. The only callable symbols will
    be those functions exported using a cgo //export comment.
    Non-main packages are ignored.

[...]

So what we need is go build -buildmode=c-shared, cgo and a main package. Sounds good!

The cgo export command is documented in go doc cgo, section "C references to Go". Essentially, write //export FUNCNAME before the function definition.

Here is how our sum.go looks (the empty main function is just there to make the compiler happy; it's ignored with -buildmode=c-shared):

package main

import "C"

//export Sum
func Sum(a, b int) int {  
    return a + b
}

func main() {}

We build it into a shared library object:

$ go build -buildmode=c-shared -o sum.so sum.go

And we get a handy (and interesting!) header file sum.h for free with the definitions of all Go standard types and obviously our Sum function declaration.

Let's try it. Here's a banal main.c file:

#include "sum.h"
#include <stdio.h>

int main(int argc, char const *argv[])  
{
    // GoInt is 64 bits wide on 64-bit platforms, so print it as a long long.
    printf("%lld\n", (long long)Sum(2, 40));
    return 0;
}

$ gcc -Wall -o main main.c ./sum.so
$ ./main
42  

Olè!
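As an aside, since the library exports a plain C ABI, it can also be loaded through Python's ctypes, with no extension module at all. The following is a minimal sketch, not the approach this post builds toward. It assumes a 64-bit platform, where the generated header maps Go's int to a C long long, and load_go_sum is a hypothetical helper name.

```python
import ctypes
import os


def load_go_sum(path):
    """Load the c-shared library and describe Sum's C signature to ctypes."""
    lib = ctypes.CDLL(path)
    # Assumption: GoInt is a C long long, as on 64-bit platforms.
    lib.Sum.argtypes = [ctypes.c_longlong, ctypes.c_longlong]
    lib.Sum.restype = ctypes.c_longlong
    return lib.Sum


SO_PATH = os.path.abspath("sum.so")
if os.path.exists(SO_PATH):
    go_sum = load_go_sum(SO_PATH)
    print(go_sum(2, 40))
else:
    print("sum.so not found; build it first with go build -buildmode=c-shared")
```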

As a more complex example, to play with types, here's a function that takes a C string and returns a Go string. The generated boilerplate cgo header is pretty self-explanatory, but the full docs are again at go doc cgo.

package main

import "C"

//export AddDot
func AddDot(s *C.char) string {  
    return C.GoString(s) + "."
}

func main() {}

#include "dot.h"
#include <stdio.h>

int main(int argc, char const *argv[])  
{
    GoString res = AddDot("Hello, world");
    printf("%.*s\n", (int)res.n, res.p);
    return 0;
}

$ go build -buildmode=c-shared -o dot.so dot.go
$ gcc -Wall -o main main.c ./dot.so
$ ./main
Hello, world.  
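The C program prints the result with %.*s because a GoString is a pointer plus a length, not a NUL-terminated string. A rough sketch of that layout, written with ctypes so it runs without the shared library: the field names p and n come from the C code above, while the field types are an approximation of what the generated header declares.

```python
import ctypes


class GoString(ctypes.Structure):
    # Approximation of the struct in the generated header:
    # a pointer to the bytes and their count.
    _fields_ = [("p", ctypes.c_void_p), ("n", ctypes.c_ssize_t)]


# Backing bytes with extra data after the dot: nothing marks the end
# of the string except the length stored in n.
backing = ctypes.create_string_buffer(b"Hello, world.XXXX")
s = GoString(ctypes.addressof(backing), 13)

# Read exactly n bytes starting at p, as printf("%.*s") does in C.
text = ctypes.string_at(s.p, s.n).decode("utf-8")
print(text)
```

Reading up to the first NUL byte instead would also pick up the trailing XXXX.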

Note: you can also use -buildmode=c-archive to build .a objects which you can then link statically into your binaries. Also note that hardcoding the shared library path like ./dot.so is the Wrong (But Easy) Way™ here.

For more implementation details you might want to read the design document.

C Python extensions

CPython, the "main" implementation of the Python interpreter, exposes an extensive C API to interact with it and for dealing with Python types. Moreover, it can load .so objects as modules which can then be called from Python code.
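Which file names CPython accepts for such modules can be checked from the interpreter itself. A small sketch using the standard importlib.machinery module; the module name foo is only an illustration, and the exact list depends on the platform and Python build.

```python
import importlib.machinery

# File name suffixes this interpreter accepts for extension modules.
# On Linux and macOS the list typically includes a plain ".so".
suffixes = list(importlib.machinery.EXTENSION_SUFFIXES)

for suffix in suffixes:
    # Candidate file names that "import foo" would consider.
    print("foo" + suffix)
```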

You might already see where this is heading, but let's start by just making a regular C extension. We'll be using Python 3.4 because it's much more pleasant to work with[citation needed] and it has a stable ABI, meaning that we can compile extensions that work with any version >= 3.2 of CPython. Anyway, all the linked docs have a version dropdown, and it shouldn't be hard to adapt this guide to 2.7.

Following the official tutorial, we write this short C file:

#define Py_LIMITED_API
#include <Python.h>

static PyObject *  
sum(PyObject *self, PyObject *args)  
{
    long long a, b;

    if (!PyArg_ParseTuple(args, "LL", &a, &b))
        return NULL;

    return PyLong_FromLongLong(a + b);
}

static PyMethodDef FooMethods[] = {  
    {"sum", sum, METH_VARARGS, "Add two numbers."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef foomodule = {  
   PyModuleDef_HEAD_INIT, "foo", NULL, -1, FooMethods
};

PyMODINIT_FUNC  
PyInit_foo(void)  
{
    return PyModule_Create(&foomodule);
}

$ gcc -Wall -fPIC -shared -o foo.so `pkg-config --cflags --libs python3` foo.c
$ python3
Python 3.4.3 (default, Jul 13 2015, 12:18:23)  
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.  
>>> import foo
>>> foo.sum(40, 2)
42  
>>>

Basically, all we need to expose is a PyInit_foo function, which calls PyModule_Create passing in a pointer to a PyModuleDef metadata object. The metadata in turn includes a PyMethodDef array with a pointer to our PyObject * sum(PyObject *, PyObject *) function, which is the actual code we want foo.sum() to run.

Building a Go Python module

Nice! Now we just have to put the two pieces together: building a Go shared library and using a shared library as a Python module.

Here the reference documentation is go doc cgo. Give it a whole read; we'll use most of it anyway.

A good starting point is just slapping the whole foo.c file in the cgo preamble (the comment block before import "C") with a #cgo pkg-config: python3 line and an empty main function.

package main

/*

#cgo pkg-config: python3
#define Py_LIMITED_API
#include <Python.h>

static PyObject *  
sum(PyObject *self, PyObject *args)  
{
    long long a, b;

    if (!PyArg_ParseTuple(args, "LL", &a, &b))
        return NULL;

    return PyLong_FromLongLong(a + b);
}

static PyMethodDef FooMethods[] = {  
    {"sum", sum, METH_VARARGS, "Add two numbers."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef foomodule = {  
   PyModuleDef_HEAD_INIT, "foo", NULL, -1, FooMethods
};

PyMODINIT_FUNC  
PyInit_foo(void)  
{
    return PyModule_Create(&foomodule);
}

*/
import "C"

func main() {}  

It works, but there is very little "Go" in this module.

$ go build -buildmode=c-shared -o foo.so foo.go
$ python3 -c 'import foo; print(foo.sum(2, 40))'
42  

What we want is to write the sum function as an exported Go function. Since we will need to reference it in FooMethods anyway, we replace its implementation in the preamble with a declaration:

PyObject * sum(PyObject *, PyObject *);  

The docs tell us not to put definitions in the cgo preamble of files that include //export comments, because they will be included twice, in the header and in the cgo source:

Using //export in a file places a restriction on the preamble: since it is copied into two different C output files, it must not contain any definitions, only declarations. If a file contains both definitions and declarations, then the two output files will produce duplicate symbols and the linker will fail. To avoid this, definitions must be placed in preambles in other files, or in C source files.

So we create a new file sum.go and put the exported function there:

package main

// #cgo pkg-config: python3
// #define Py_LIMITED_API
// #include <Python.h>
import "C"

//export sum
func sum(self, args *C.PyObject) *C.PyObject {  
    return C.PyLong_FromLongLong(0)
}

$ go build -buildmode=c-shared -o foo.so foo.go sum.go
$ python3 -c 'import foo; print(foo.sum(2, 40))'
0  

Our first proper Go code ran from Python!

Now, in a pure exercise of cgo, we need to actually implement the sum function. This would be easy, but the PyArg_ParseTuple function is variadic (it takes a variable number of arguments), and cgo does not support calling variadic C functions.

To work around this, we write a short ad-hoc wrapper in the foo.go preamble:

// Workaround missing variadic function support
// https://github.com/golang/go/issues/975
int PyArg_ParseTuple_LL(PyObject * args, long long * a, long long * b) {  
    return PyArg_ParseTuple(args, "LL", a, b);
}

And put its declaration in sum.go:

// int PyArg_ParseTuple_LL(PyObject *, long long *, long long *);
import "C"  

We can now write a real sum function:

//export sum
func sum(self, args *C.PyObject) *C.PyObject {  
    var a, b C.longlong
    if C.PyArg_ParseTuple_LL(args, &a, &b) == 0 {
        return nil
    }
    return C.PyLong_FromLongLong(a + b)
}

✨🎉

$ go build -buildmode=c-shared -o foo.so foo.go sum.go
$ python3 -c 'import foo; print(foo.sum(2, 40))'
42  

As a final touch, we realize that foo.go amounts to just the cgo preamble, which we can just put in a .c file that go build will compile with the rest of the package.

When the Go tool sees that one or more Go files use the special import "C", it will look for other non-Go files in the directory and compile them as part of the Go package. Any .c, .s, or .S files will be compiled with the C compiler.

The complete demo source

$ go build -buildmode=c-shared -o foo.so
$ python3 -c 'import foo; print(foo.sum(2, 40))'
42  

sum.go

package main

// #cgo pkg-config: python3
// #define Py_LIMITED_API
// #include <Python.h>
// int PyArg_ParseTuple_LL(PyObject *, long long *, long long *);
import "C"

//export sum
func sum(self, args *C.PyObject) *C.PyObject {  
    var a, b C.longlong
    if C.PyArg_ParseTuple_LL(args, &a, &b) == 0 {
        return nil
    }
    return C.PyLong_FromLongLong(a + b)
}

func main() {}  

foo.c

#define Py_LIMITED_API
#include <Python.h>

PyObject * sum(PyObject *, PyObject *);

// Workaround missing variadic function support
// https://github.com/golang/go/issues/975
int PyArg_ParseTuple_LL(PyObject * args, long long * a, long long * b) {  
    return PyArg_ParseTuple(args, "LL", a, b);
}

static PyMethodDef FooMethods[] = {  
    {"sum", sum, METH_VARARGS, "Add two numbers."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef foomodule = {  
   PyModuleDef_HEAD_INIT, "foo", NULL, -1, FooMethods
};

PyMODINIT_FUNC  
PyInit_foo(void)  
{
    return PyModule_Create(&foomodule);
}

Concurrency

The runtime behavior when loaded as a plugin is not widely documented. All I could find is the design document linked above, which tells us that the runtime is initialized once when the first Go plugin is loaded. Apparently a full-fledged Go runtime runs alongside the main process.

Empirically, goroutines work naturally, and they keep running after the entry function returns.

//export tick
func tick(self, args *C.PyObject) *C.PyObject {  
    go func() {
        for range time.NewTicker(time.Second).C {
            log.Println("tick")
        }
    }()
    return C.PyLong_FromLong(0)
}

>>> import foo
>>> foo.tick()
0  
>>> 2015/08/25 22:50:16 tick
2015/08/25 22:50:17 tick  
2015/08/25 22:50:18 tick  

On the Python side, you manage the GIL just like you do in C. You release it before blocking or going busy so that other Python threads can run (similarly to how the goroutine scheduler works), and you don't interact with Python objects until you re-acquire it, which you can do even from a different goroutine.

//export gil
func gil(self, args *C.PyObject) *C.PyObject {  
    var res *C.PyObject

    tState := C.PyEval_SaveThread()

    var mu sync.Mutex
    mu.Lock()

    go func() {
        C.PyEval_RestoreThread(tState)
        res = C.PyLong_FromLong(1)
        mu.Unlock()
    }()

    mu.Lock()

    return res
}

>>> import foo
>>> foo.gil()
1  
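Why releasing the GIL around blocking work matters can be observed from pure Python, without the extension. In this sketch, time.sleep stands in for an extension function that releases the GIL before blocking (it does so internally); the thread count and durations are arbitrary choices for illustration.

```python
import threading
import time


def blocking_call():
    # time.sleep releases the GIL while it waits, like an extension
    # function that calls PyEval_SaveThread before blocking.
    time.sleep(0.3)


threads = [threading.Thread(target=blocking_call) for _ in range(4)]

start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# The four waits overlap instead of running one after another.
print("elapsed: %.2fs" % elapsed)
```

If the blocking function held the GIL for its whole duration, the four calls would run back to back and take about four times as long.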

Bonus: the needlessly hard way

For the sake of "showing the process", I'm posting here the stupidly complex way I first got this working. (Don't you hate when blog posts look like the author got everything right at the first try?)

I didn't realize I could just declare the sum function in the cgo preamble, put it in the PyMethodDef definition and then implement it in Go somewhere else. So I converted the whole thing to Go. It involved a lot of manual macro expansion.

package main

/*

#include <Python.h>

// Workaround missing variadic function support
// https://github.com/golang/go/issues/975

int PyArg_ParseTuple_LL(PyObject * args, long long * a, long long * b) {  
    return PyArg_ParseTuple(args, "LL", a, b);
}

*/
import "C"

package main

// #cgo pkg-config: python3
// #include <Python.h>
//
// int PyArg_ParseTuple_LL(PyObject *, long long *, long long *);
//
// PyObject * sum(PyObject *, PyObject *);
import "C"

//export sum
func sum(self, args *C.PyObject) *C.PyObject {  
    var a, b C.longlong
    if C.PyArg_ParseTuple_LL(args, &a, &b) == 0 {
        return nil
    }
    return C.PyLong_FromLongLong(a + b)
}

var FooMethods = []C.PyMethodDef{  
    {
        C.CString("sum"),
        C.PyCFunction(C.sum),
        C.METH_VARARGS, [4]byte{},
        C.CString("Add two numbers."),
    },
    {nil, nil, 0, [4]byte{}, nil},
}

var PyModuleDef_HEAD_INIT = C.PyModuleDef_Base{  
    C.PyObject{1, nil}, nil, 0, nil,
}
var foomodule = C.PyModuleDef{  
    PyModuleDef_HEAD_INIT,
    C.CString("foo"), nil, -1, &FooMethods[0],
    nil, nil, nil, nil,
}

//export PyInit_foo
func PyInit_foo() *C.PyObject {  
    return C.PyModule_Create2(&foomodule, 3)
}

func main() {}  

As you can see I ended up declaring sum in the preamble anyway and then referencing it as C.PyCFunction(C.sum). It's actually nice to know that it's possible to craft all the needed C objects with cgo.

go-pymodule, a small helper function

I'm writing a thin helper library to build Go Python modules. It's not meant to offer comprehensive Python API bindings (calling C.Py* is fine); rather, it seeks to ease the bits that are particularly annoying because of the C-Go bridge. It will support, for example:

  • initialization and method export, i.e. pass some py.Config object to a function and return that from an exported PyInit_foo
  • arguments parsing, to solve the PyArg_ParseTuple issue
  • GIL acquisition and release with global state

Ideally it will spare you all the foo.c boilerplate and the second file hack. I'm still working on it (and I'm still technically on vacation), so I guess watch this space :)

Anyway, for this sort of thing (whatever this is), you can follow me on Twitter.