# pybind11 -- Seamless operability between C++11 and Python

**pybind11** is a lightweight header library that exposes C++ types in Python
and vice versa, mainly to create Python bindings of existing C++ code. Its
goals and syntax are similar to the excellent
[Boost.Python](http://www.boost.org/doc/libs/1_58_0/libs/python/doc/) library
by David Abrahams: to minimize boilerplate code in traditional extension
modules by inferring type information using compile-time introspection.

The main issue with Boost.Python (and the reason for creating such a similar
project) is Boost. Boost is an enormously large and complex suite of utility
libraries that works with almost every C++ compiler in existence. This
compatibility has its cost: arcane template tricks and workarounds are
necessary to support the oldest and buggiest of compiler specimens. Now that
C++11-compatible compilers are widely available, this heavy machinery has
become an excessively large and unnecessary dependency.

Think of this library as a tiny self-contained version of Boost.Python with
everything stripped away that isn't relevant for binding generation. The whole
codebase requires just over 2000 lines of code and only depends on Python and
the C++ standard library. This compact implementation was possible thanks to
some of the new C++11 language features (tuples, lambda functions and variadic
templates), and by only targeting Python 3.x and higher.

## Core features
The following core C++ features can be mapped to Python:

- Functions accepting and returning custom data structures by value, reference, or pointer
- Instance methods and static methods
- Overloaded functions
- Instance attributes and static attributes
- Exceptions
- Enumerations
- Callbacks
- Custom operators
- STL data structures
- Smart pointers with reference counting like `std::shared_ptr`
- Internal references with correct reference counting

## Goodies
In addition to the core functionality, pybind11 provides some extra goodies:

- It's easy to expose the internal storage of custom data types through
  Python's buffer protocol. This is handy e.g. for fast conversion between
  C++ matrix classes (such as those from Eigen) and NumPy arrays without
  expensive copy operations.

- pybind11 can automatically vectorize functions so that they are transparently
  applied to all entries of one or more NumPy array arguments.

- Python's slice-based access and assignment operations can be supported with
  just a few lines of code.

- pybind11 uses C++11 move constructors and move assignment operators whenever
  possible to efficiently transfer custom data types.

- It is possible to bind C++11 lambda functions with captured variables. The
  lambda capture data is stored inside the resulting Python function object.

## What does the binding code look like?
Here is a simple example. The directory `example` contains many more.
```C++
#include <pybind/pybind.h>
#include <pybind/operators.h>
#include <iostream>  // std::cout
#include <string>    // std::string

namespace py = pybind;

/// Example C++ class which should be bound to Python
class Test {
public:
    Test();
    Test(int value);
    std::string toString();
    Test operator+(const Test &e) const;

    void print_dict(py::dict dict) {
        /* Easily interact with Python types */
        for (auto item : dict)
            std::cout << "key=" << item.first << ", "
                      << "value=" << item.second << std::endl;
    }

    int value = 0;
};


PYTHON_PLUGIN(example) {
    py::module m("example", "pybind example plugin");

    py::class_<Test>(m, "Test", "docstring for the Test class")
        .def(py::init<>(), "docstring for constructor 1")
        .def(py::init<int>(), "docstring for constructor 2")
        .def(py::self + py::self, "Addition operator")
        .def("__str__", &Test::toString, "Convert to a string representation")
        .def("print_dict", &Test::print_dict, "Print a Python dictionary")
        .def_readwrite("value", &Test::value, "An instance attribute");

    return m.ptr();
}
```

## A collection of specific use cases (mostly buffer-related for now)
For brevity, let's set
```C++
namespace py = pybind;
```
### Exposing buffer views
Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view which provides
fast direct access to the raw internal representation. Suppose we want to bind
the following simplistic Matrix class:

```C++
class Matrix {
public:
    Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
        m_data = new float[rows*cols];
    }
    float *data() { return m_data; }
    size_t rows() const { return m_rows; }
    size_t cols() const { return m_cols; }
private:
    size_t m_rows, m_cols;
    float *m_data;
};
```
The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast matrices into NumPy arrays. It is even possible to
completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy = False)``. The strides below assume that
``Matrix`` stores its entries in row-major order, so stepping to the next row
skips ``cols()`` entries.
```C++
py::class_<Matrix>(m, "Matrix")
    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                              /* Pointer to buffer */
            sizeof(float),                         /* Size of one scalar */
            py::format_descriptor<float>::value(), /* Python struct-style format descriptor */
            2,                                     /* Number of dimensions */
            { m.rows(), m.cols() },                /* Buffer dimensions */
            { sizeof(float) * m.cols(),            /* Strides (in bytes) for each index */
              sizeof(float) }
        );
    });
```
The snippet above binds a lambda function, which can create ``py::buffer_info``
description records on demand describing a given matrix. The contents of
``py::buffer_info`` mirror the Python buffer protocol specification:
```C++
struct buffer_info {
    void *ptr;
    size_t itemsize;
    std::string format;
    int ndim;
    std::vector<size_t> shape;
    std::vector<size_t> strides;
};
```
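To make the role of ``strides`` concrete, here is a minimal standalone sketch that needs no pybind headers. ``BufferInfo``, ``describe_row_major`` and ``element_at`` are hypothetical names introduced only for illustration; they mimic the fields listed above and are not part of the library. The sketch assumes a dense row-major array of floats and uses ``"f"``, the struct-style format code for a float.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in that mimics the fields of py::buffer_info.
struct BufferInfo {
    void *ptr;
    std::size_t itemsize;
    std::string format;
    int ndim;
    std::vector<std::size_t> shape;
    std::vector<std::size_t> strides;
};

// Describe a dense row-major rows x cols float array.
inline BufferInfo describe_row_major(float *data, std::size_t rows, std::size_t cols) {
    return BufferInfo{
        data, sizeof(float), "f", 2,
        { rows, cols },
        { sizeof(float) * cols, sizeof(float) }  // bytes per row step, bytes per column step
    };
}

// Locate element (i, j) purely from the byte strides.
inline float element_at(const BufferInfo &info, std::size_t i, std::size_t j) {
    assert(info.ndim == 2 && i < info.shape[0] && j < info.shape[1]);
    const char *base = static_cast<const char *>(info.ptr);
    return *reinterpret_cast<const float *>(base + i * info.strides[0] + j * info.strides[1]);
}
```

A consumer of the buffer never needs to know the storage order: the byte offset of entry (i, j) is always ``i * strides[0] + j * strides[1]``.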
### Taking Python buffer objects as arguments
To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body. Below, you can see a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy matrix).
```C++
py::class_<Eigen::MatrixXd>(m, "MatrixXd")
    .def("__init__", [](Eigen::MatrixXd &m, py::buffer b) {
        /* Request a buffer descriptor from Python */
        py::buffer_info info = b.request();

        /* Some sanity checks ... */
        if (info.format != py::format_descriptor<double>::value())
            throw std::runtime_error("Incompatible format: expected a double array!");

        if (info.ndim != 2)
            throw std::runtime_error("Incompatible buffer dimension!");

        if (info.strides[0] == sizeof(double)) {
            /* Buffer has the right layout -- directly copy. */
            new (&m) Eigen::MatrixXd(info.shape[0], info.shape[1]);
            memcpy(m.data(), info.ptr, sizeof(double) * m.size());
        } else {
            /* Oops -- the buffer is transposed */
            new (&m) Eigen::MatrixXd(info.shape[1], info.shape[0]);
            memcpy(m.data(), info.ptr, sizeof(double) * m.size());
            m.transposeInPlace();
        }
    });
```
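
The two branches above cover dense column-major and dense row-major input. As a hedged alternative for other layouts, the following standalone sketch copies element by element using both strides. ``to_column_major`` is a hypothetical helper, not a library function; it assumes non-negative byte strides and a buffer of doubles, and it writes into a plain ``std::vector`` in column-major order (the default storage order of ``Eigen::MatrixXd``) so that it runs without Eigen.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a 2-D buffer of doubles with arbitrary
// (non-negative) byte strides into a dense column-major vector.
inline std::vector<double> to_column_major(const void *ptr,
                                           std::size_t rows, std::size_t cols,
                                           std::size_t stride_row, std::size_t stride_col) {
    assert(ptr != nullptr);
    const char *base = static_cast<const char *>(ptr);
    std::vector<double> out(rows * cols);
    for (std::size_t j = 0; j < cols; ++j)        // walk the destination column by column
        for (std::size_t i = 0; i < rows; ++i)
            out[j * rows + i] = *reinterpret_cast<const double *>(
                base + i * stride_row + j * stride_col);
    return out;
}
```

This is slower than a single ``memcpy``, so it is best kept as a fallback for buffers that match neither fast path.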

### Taking NumPy arrays as arguments

By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
restrict the function so that it only accepts NumPy arrays (rather than any
type of Python object satisfying the buffer object protocol).

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_dtype<T>``
template. For instance, the following function requires the argument to be a
dense array of doubles in C-style ordering.
```C++
void f(py::array_dtype<double> array);
```
When it is invoked with a different type (e.g. an integer), the binding code
will attempt to cast the input into a NumPy array of the requested type.

### Auto-vectorizing a function over NumPy array arguments
Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:
```C++
double my_func(int x, float y, double z);
```
This is extremely simple to do!
```C++
m.def("vectorized_func", py::vectorize(my_func));
```
Invoking the function like below causes 4 calls to be made to ``my_func`` with
each of the array elements. The result is returned as a NumPy array of type
``numpy.float64``.
```Python
>>> x = np.array([[1, 3],[5, 7]])
>>> y = np.array([[2, 4],[6, 8]])
>>> z = 3
>>> result = vectorized_func(x, y, z)
```
The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
are of type ``numpy.int64`` but need to be ``numpy.int32`` and
``numpy.float32``, respectively).
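
As a rough mental model of what this vectorization does, the standalone sketch below applies a scalar function to every pair of array entries while reusing one scalar. It is not how ``py::vectorize`` is implemented: ``apply_elementwise`` is a hypothetical helper, the body of ``my_func`` is made up (only its signature comes from the text), and flat ``std::vector`` inputs stand in for N-D NumPy arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar function with the signature used in the text; the body is made up.
inline double my_func(int x, float y, double z) { return x + y * z; }

// Hypothetical helper: call f once per element pair, passing the same scalar z each time.
template <typename Func>
std::vector<double> apply_elementwise(Func f, const std::vector<int> &x,
                                      const std::vector<float> &y, double z) {
    assert(x.size() == y.size());
    std::vector<double> result(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        result[i] = f(x[i], y[i], z);  // z is "replicated" for every element
    return result;
}
```

With four entries in each input, ``f`` is called four times and the result holds four doubles, matching the behavior described above.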

Sometimes we might want to explicitly exclude an argument from the vectorization
because it makes little sense to wrap it in a NumPy array. For instance,
suppose the function signature was
```C++
double my_func(int x, float y, my_custom_type *z);
```
This can be done with a stateful lambda closure:
```C++
// Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
m.def("vectorized_func",
    [](py::array_dtype<int> x, py::array_dtype<float> y, my_custom_type *z) {
        auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
        return py::vectorize(stateful_closure)(x, y);
    }
);
```
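
The same capture pattern can be tried without pybind. In the sketch below, ``map_pairs`` is a hypothetical two-argument elementwise helper, the definition of ``my_custom_type`` and the body of ``my_func`` are made up for illustration, and flat ``std::vector`` inputs stand in for NumPy arrays. The point is only that ``z`` is captured once by the closure and never takes part in the elementwise loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the custom type from the text.
struct my_custom_type { double scale; };

// Made-up body; only the signature follows the text.
inline double my_func(int x, float y, my_custom_type *z) { return (x + y) * z->scale; }

// Hypothetical helper: apply a two-argument callable to each element pair.
template <typename Func>
std::vector<double> map_pairs(Func f, const std::vector<int> &x, const std::vector<float> &y) {
    assert(x.size() == y.size());
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        out[i] = f(x[i], y[i]);
    return out;
}

// z is bound by the closure, so only x and y are iterated over.
inline std::vector<double> vectorized_func(const std::vector<int> &x,
                                           const std::vector<float> &y,
                                           my_custom_type *z) {
    auto stateful_closure = [z](int xi, float yi) { return my_func(xi, yi, z); };
    return map_pairs(stateful_closure, x, y);
}
```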