# pybind11: Seamless operability between C++11 and Python

[Build status](https://travis-ci.org/wjakob/pybind11)

**pybind11** is a lightweight header-only library that exposes C++ types in Python
and vice versa, mainly to create Python bindings of existing C++ code. Its
goals and syntax are similar to the excellent
[Boost.Python](http://www.boost.org/doc/libs/1_58_0/libs/python/doc/) library
by David Abrahams: to minimize boilerplate code in traditional extension
modules by inferring type information using compile-time introspection.

The main issue with Boost.Python, and the reason for creating such a similar
project, is Boost. Boost is an enormously large and complex suite of utility
libraries that works with almost every C++ compiler in existence. This
compatibility has its cost: arcane template tricks and workarounds are
necessary to support the oldest and buggiest of compiler specimens. Now that
C++11-compatible compilers are widely available, this heavy machinery has
become an excessively large and unnecessary dependency.

Think of this library as a tiny self-contained version of Boost.Python with
everything stripped away that isn't relevant for binding generation. The whole
codebase requires fewer than 3000 lines of code and depends only on Python (2.7
or 3.x) and the C++ standard library. This compact implementation was possible
thanks to some of the new C++11 language features (tuples, lambda functions and
variadic templates).
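To make the role of variadic templates more concrete, here is a minimal sketch of compile-time signature introspection in plain C++11. It is not pybind11's actual implementation; `signature_info` and `scale` are hypothetical names chosen purely for illustration.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical helper (not part of pybind11): describes a function signature.
template <typename T> struct signature_info;

// Partial specialization for function pointers. The parameter pack Args
// captures every argument type without any hand-written boilerplate.
template <typename Return, typename... Args>
struct signature_info<Return (*)(Args...)> {
    using return_type = Return;
    using argument_types = std::tuple<Args...>;
    static constexpr std::size_t arity = sizeof...(Args);
};

// Example function whose signature is inspected (hypothetical).
inline double scale(int count, float factor) { return count * factor; }

using scale_info = signature_info<decltype(&scale)>;

// Everything below is checked by the compiler; nothing runs at run time.
static_assert(scale_info::arity == 2, "scale takes two arguments");
static_assert(std::is_same<scale_info::return_type, double>::value,
              "scale returns double");
```

A binding layer can use information of this kind to pick a conversion for each argument and for the return value without the user spelling the types out again.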

## Core features
The following core C++ features can be mapped to Python:

- Functions accepting and returning custom data structures by value, reference, or pointer
- Instance methods and static methods
- Overloaded functions
- Instance attributes and static attributes
- Exceptions
- Enumerations
- Callbacks
- Custom operators
- STL data structures
- Smart pointers with reference counting like `std::shared_ptr`
- Internal references with correct reference counting
- C++ classes with virtual (and pure virtual) methods can be extended in Python

## Goodies
In addition to the core functionality, pybind11 provides some extra goodies:

- It's easy to expose the internal storage of custom data types through
  Python's buffer protocol. This is handy e.g. for fast conversion between
  C++ matrix classes like Eigen and NumPy without expensive copy operations.

- pybind11 can automatically vectorize functions so that they are transparently
  applied to all entries of one or more NumPy array arguments.

- Python's slice-based access and assignment operations can be supported with
  just a few lines of code.

- pybind11 uses C++11 move constructors and move assignment operators whenever
  possible to efficiently transfer custom data types.

- It is possible to bind C++11 lambda functions with captured variables. The
  lambda capture data is stored inside the resulting Python function object.

## What does the binding code look like?
Here is a simple example. The directory `example` contains many more.
```C++
#include <pybind/pybind.h>
#include <pybind/operators.h>
#include <iostream>
#include <string>

namespace py = pybind;

/// Example C++ class which should be bound to Python
class Test {
public:
    Test();
    Test(int value);
    std::string toString();
    Test operator+(const Test &e) const;

    void print_dict(py::dict dict) {
        /* Easily interact with Python types */
        for (auto item : dict)
            std::cout << "key=" << item.first << ", "
                      << "value=" << item.second << std::endl;
    }

    int value = 0;
};


PYTHON_PLUGIN(example) {
    py::module m("example", "pybind example plugin");

    py::class_<Test>(m, "Test", "docstring for the Test class")
        .def(py::init<>(), "docstring for constructor 1")
        .def(py::init<int>(), "docstring for constructor 2")
        .def(py::self + py::self, "Addition operator")
        .def("__str__", &Test::toString, "Convert to a string representation")
        .def("print_dict", &Test::print_dict, "Print a Python dictionary")
        .def_readwrite("value", &Test::value, "An instance attribute");

    return m.ptr();
}
```

## A collection of specific use cases (mostly buffer-related for now)
For brevity, let's set
```C++
namespace py = pybind;
```
### Exposing buffer views
Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view which provides
fast direct access to the raw internal representation. Suppose we want to bind
the following simplistic Matrix class:

```C++
class Matrix {
public:
    Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
        m_data = new float[rows*cols];
    }
    float *data() { return m_data; }
    size_t rows() const { return m_rows; }
    size_t cols() const { return m_cols; }
private:
    size_t m_rows, m_cols;
    float *m_data;
};
```
The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast ``Matrix`` instances into NumPy arrays. It is even
possible to completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy = False)``.
```C++
py::class_<Matrix>(m, "Matrix")
    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                              /* Pointer to buffer */
            sizeof(float),                         /* Size of one scalar */
            py::format_descriptor<float>::value(), /* Python struct-style format descriptor */
            2,                                     /* Number of dimensions */
            { m.rows(), m.cols() },                /* Buffer dimensions */
            { sizeof(float) * m.cols(),            /* Strides (in bytes) for each index, */
              sizeof(float) }                      /* assuming row-major storage */
        );
    });
```
The snippet above binds a lambda function that creates a ``py::buffer_info``
record describing a given matrix on demand. The contents of
``py::buffer_info`` mirror the Python buffer protocol specification.
```C++
struct buffer_info {
    void *ptr;                    /* Pointer to buffer */
    size_t itemsize;              /* Size of one scalar */
    std::string format;           /* Python struct-style format descriptor */
    int ndim;                     /* Number of dimensions */
    std::vector<size_t> shape;    /* Buffer dimensions */
    std::vector<size_t> strides;  /* Strides (in bytes) for each index */
};
```
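As a rough illustration of how `shape`, `strides` and `itemsize` relate, the sketch below computes byte strides for a dense array and the byte offset of a single element. `row_major_strides` and `byte_offset` are hypothetical helpers, not part of pybind11, and the row-major (C-style) layout is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte strides of a dense, row-major (C-style) array with the given shape.
// The last index varies fastest, so its stride equals the item size.
inline std::vector<std::size_t> row_major_strides(
        const std::vector<std::size_t> &shape, std::size_t itemsize) {
    std::vector<std::size_t> strides(shape.size());
    std::size_t step = itemsize;
    for (std::size_t i = shape.size(); i-- > 0;) {
        strides[i] = step;
        step *= shape[i];
    }
    return strides;
}

// Byte offset of the element at `index`, relative to the buffer pointer:
// the sum of index[k] * strides[k] over all dimensions.
inline std::size_t byte_offset(const std::vector<std::size_t> &index,
                               const std::vector<std::size_t> &strides) {
    assert(index.size() == strides.size());
    std::size_t offset = 0;
    for (std::size_t k = 0; k < index.size(); ++k)
        offset += index[k] * strides[k];
    return offset;
}
```

For a 2 x 3 array of `float`, this gives strides of `3 * sizeof(float)` and `sizeof(float)`, so element (1, 2) sits `5 * sizeof(float)` bytes past `ptr`.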
### Taking Python buffer objects as arguments
To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body. Below, you can see a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy matrix).
```C++
py::class_<Eigen::MatrixXd>(m, "MatrixXd")
    .def("__init__", [](Eigen::MatrixXd &m, py::buffer b) {
        /* Request a buffer descriptor from Python */
        py::buffer_info info = b.request();

        /* Some sanity checks ... */
        if (info.format != py::format_descriptor<double>::value())
            throw std::runtime_error("Incompatible format: expected a double array!");

        if (info.ndim != 2)
            throw std::runtime_error("Incompatible buffer dimension!");

        if (info.strides[0] == sizeof(double)) {
            /* Buffer has the right layout -- directly copy. */
            new (&m) Eigen::MatrixXd(info.shape[0], info.shape[1]);
            memcpy(m.data(), info.ptr, sizeof(double) * m.size());
        } else {
            /* Oops -- the buffer is transposed */
            new (&m) Eigen::MatrixXd(info.shape[1], info.shape[0]);
            memcpy(m.data(), info.ptr, sizeof(double) * m.size());
            m.transposeInPlace();
        }
    });
```
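The two-branch copy above relies on the buffer being contiguous. A more general option is to honor the byte strides element by element. Below is a minimal sketch of that idea in standard C++, without Eigen; `to_column_major` is a hypothetical helper, and the column-major output order is chosen to match the default storage order of ``Eigen::MatrixXd``.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy a 2-D buffer of doubles with arbitrary byte strides into a dense,
// column-major vector (element (i, j) ends up at index j * rows + i).
inline std::vector<double> to_column_major(const void *ptr,
                                           std::size_t rows, std::size_t cols,
                                           std::size_t row_stride,
                                           std::size_t col_stride) {
    assert(ptr != nullptr || rows * cols == 0);
    const unsigned char *base = static_cast<const unsigned char *>(ptr);
    std::vector<double> out(rows * cols);
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < rows; ++i) {
            // memcpy makes no alignment assumptions about the source buffer
            std::memcpy(&out[j * rows + i],
                        base + i * row_stride + j * col_stride,
                        sizeof(double));
        }
    }
    return out;
}
```

This trades the single `memcpy` for one small copy per element, so the fast paths shown earlier remain preferable whenever the layout is known to be contiguous.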

### Taking NumPy arrays as arguments

By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
restrict the function so that it only accepts NumPy arrays (rather than any
type of Python object satisfying the buffer object protocol).

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_dtype<T>``
template. For instance, the following function requires the argument to be a
dense array of doubles in C-style ordering.
```C++
void f(py::array_dtype<double> array);
```
When it is invoked with a different type (e.g. an integer), the binding code
will attempt to cast the input into a NumPy array of the requested type.

### Auto-vectorizing a function over NumPy array arguments
Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:
```C++
double my_func(int x, float y, double z);
```
This is extremely simple to do!
```C++
m.def("vectorized_func", py::vectorize(my_func));
```
Invoking the function like below causes 4 calls to be made to ``my_func``, one
for each of the array elements. The result is returned as a NumPy array of type
``numpy.float64``.
```Python
>>> import numpy as np
>>> x = np.array([[1, 3],[5, 7]])
>>> y = np.array([[2, 4],[6, 8]])
>>> z = 3
>>> result = vectorized_func(x, y, z)
```
The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
are of type ``numpy.int64`` but need to be ``numpy.int32`` and
``numpy.float32``, respectively).
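Conceptually, the vectorized call behaves like the loop sketched below. This is only an illustration, not pybind11's implementation: flat `std::vector` inputs stand in for NumPy arrays, `vectorized_func` is an ordinary C++ function here, and the body of `my_func` is an assumption made for the sake of the example. With the flattened inputs `x = 1, 3, 5, 7`, `y = 2, 4, 6, 8` and `z = 3`, this particular `my_func` yields `7, 15, 23, 31`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the function being vectorized (the body is hypothetical).
inline double my_func(int x, float y, double z) { return x + y * z; }

// Apply my_func once per element. The scalar z is not an array, so the
// same value is reused in every call, which is what "replicated" means.
inline std::vector<double> vectorized_func(const std::vector<int> &x,
                                           const std::vector<float> &y,
                                           double z) {
    assert(x.size() == y.size());
    std::vector<double> result(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        result[i] = my_func(x[i], y[i], z);
    return result;
}
```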

Sometimes we might want to explicitly exclude an argument from the vectorization
because it makes little sense to wrap it in a NumPy array. For instance,
suppose the function signature was
```C++
double my_func(int x, float y, my_custom_type *z);
```
This can be done with a stateful lambda closure:
```C++
// Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
m.def("vectorized_func",
    [](py::array_dtype<int> x, py::array_dtype<float> y, my_custom_type *z) {
        auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
        return py::vectorize(stateful_closure)(x, y);
    }
);
```
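The same capture pattern can be tried out in plain C++ without any Python involvement. The sketch below is an assumption-laden stand-in: `map_pairs` is a hypothetical helper playing the role of the vectorizer, `my_custom_type` is given a single made-up field, and the body of `my_func` is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the type and function named in the text.
struct my_custom_type { double offset; };

inline double my_func(int x, float y, my_custom_type *z) {
    return x * y + z->offset;
}

// Apply a binary callable to matching elements of two equally sized inputs.
template <typename Func>
std::vector<double> map_pairs(Func f, const std::vector<int> &x,
                              const std::vector<float> &y) {
    assert(x.size() == y.size());
    std::vector<double> result(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        result[i] = f(x[i], y[i]);
    return result;
}

// Only x and y are iterated over; z is captured once by the closure and
// passed unchanged to every call of my_func.
inline std::vector<double> vectorized_func(const std::vector<int> &x,
                                           const std::vector<float> &y,
                                           my_custom_type *z) {
    assert(z != nullptr);
    auto stateful_closure = [z](int a, float b) { return my_func(a, b, z); };
    return map_pairs(stateful_closure, x, y);
}
```

Because the closure holds `z` by value as a pointer, the pointed-to object has to outlive the call, which is also worth keeping in mind for the bound version.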