blob: 111ff0e3cf376d89d5429b3bc1d0bdaeb401bd8c [file] [log] [blame]
Dean Moldovan67b52d82016-10-16 19:12:43 +02001.. _numpy:
2
3NumPy
4#####
5
6Buffer protocol
7===============
8
9Python supports an extremely general and convenient approach for exchanging
10data between plugin libraries. Types can expose a buffer view [#f2]_, which
11provides fast direct access to the raw internal data representation. Suppose we
12want to bind the following simplistic Matrix class:
13
14.. code-block:: cpp
15
16 class Matrix {
17 public:
18 Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
19 m_data = new float[rows*cols];
20 }
21 float *data() { return m_data; }
22 size_t rows() const { return m_rows; }
23 size_t cols() const { return m_cols; }
24 private:
25 size_t m_rows, m_cols;
26 float *m_data;
27 };
28
29The following binding code exposes the ``Matrix`` contents as a buffer object,
30making it possible to cast Matrices into NumPy arrays. It is even possible to
31completely avoid copy operations with Python expressions like
32``np.array(matrix_instance, copy = False)``.
33
34.. code-block:: cpp
35
Wenzel Jakob1d1f81b2016-12-16 15:00:46 +010036 py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
Dean Moldovan67b52d82016-10-16 19:12:43 +020037 .def_buffer([](Matrix &m) -> py::buffer_info {
38 return py::buffer_info(
39 m.data(), /* Pointer to buffer */
40 sizeof(float), /* Size of one scalar */
41 py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
42 2, /* Number of dimensions */
43 { m.rows(), m.cols() }, /* Buffer dimensions */
44 { sizeof(float) * m.rows(), /* Strides (in bytes) for each index */
45 sizeof(float) }
46 );
47 });
48
Wenzel Jakob1d1f81b2016-12-16 15:00:46 +010049Supporting the buffer protocol in a new type involves specifying the special
50``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
51``def_buffer()`` method with a lambda function that creates a
52``py::buffer_info`` description record on demand describing a given matrix
53instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
54specification.
Dean Moldovan67b52d82016-10-16 19:12:43 +020055
56.. code-block:: cpp
57
58 struct buffer_info {
59 void *ptr;
60 size_t itemsize;
61 std::string format;
62 int ndim;
63 std::vector<size_t> shape;
64 std::vector<size_t> strides;
65 };
66
67To create a C++ function that can take a Python buffer object as an argument,
68simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
69in a great variety of configurations, hence some safety checks are usually
70necessary in the function body. Below, you can see an basic example on how to
71define a custom constructor for the Eigen double precision matrix
72(``Eigen::MatrixXd``) type, which supports initialization from compatible
73buffer objects (e.g. a NumPy matrix).
74
75.. code-block:: cpp
76
77 /* Bind MatrixXd (or some other Eigen type) to Python */
78 typedef Eigen::MatrixXd Matrix;
79
80 typedef Matrix::Scalar Scalar;
81 constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;
82
Wenzel Jakob1d1f81b2016-12-16 15:00:46 +010083 py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
Dean Moldovan67b52d82016-10-16 19:12:43 +020084 .def("__init__", [](Matrix &m, py::buffer b) {
85 typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;
86
87 /* Request a buffer descriptor from Python */
88 py::buffer_info info = b.request();
89
90 /* Some sanity checks ... */
91 if (info.format != py::format_descriptor<Scalar>::format())
92 throw std::runtime_error("Incompatible format: expected a double array!");
93
94 if (info.ndim != 2)
95 throw std::runtime_error("Incompatible buffer dimension!");
96
97 auto strides = Strides(
98 info.strides[rowMajor ? 0 : 1] / sizeof(Scalar),
99 info.strides[rowMajor ? 1 : 0] / sizeof(Scalar));
100
101 auto map = Eigen::Map<Matrix, 0, Strides>(
102 static_cat<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);
103
104 new (&m) Matrix(map);
105 });
106
107For reference, the ``def_buffer()`` call for this Eigen data type should look
108as follows:
109
110.. code-block:: cpp
111
112 .def_buffer([](Matrix &m) -> py::buffer_info {
113 return py::buffer_info(
114 m.data(), /* Pointer to buffer */
115 sizeof(Scalar), /* Size of one scalar */
116 /* Python struct-style format descriptor */
117 py::format_descriptor<Scalar>::format(),
118 /* Number of dimensions */
119 2,
120 /* Buffer dimensions */
121 { (size_t) m.rows(),
122 (size_t) m.cols() },
123 /* Strides (in bytes) for each index */
124 { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
125 sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
126 );
127 })
128
129For a much easier approach of binding Eigen types (although with some
130limitations), refer to the section on :doc:`/advanced/cast/eigen`.
131
132.. seealso::
133
134 The file :file:`tests/test_buffers.cpp` contains a complete example
135 that demonstrates using the buffer protocol with pybind11 in more detail.
136
137.. [#f2] http://docs.python.org/3/c-api/buffer.html
138
139Arrays
140======
141
142By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
143restrict the function so that it only accepts NumPy arrays (rather than any
144type of Python object satisfying the buffer protocol).
145
146In many situations, we want to define a function which only accepts a NumPy
147array of a certain data type. This is possible via the ``py::array_t<T>``
148template. For instance, the following function requires the argument to be a
149NumPy array containing double precision values.
150
151.. code-block:: cpp
152
153 void f(py::array_t<double> array);
154
155When it is invoked with a different type (e.g. an integer or a list of
156integers), the binding code will attempt to cast the input into a NumPy array
157of the requested type. Note that this feature requires the
158:file:``pybind11/numpy.h`` header to be included.
159
160Data in NumPy arrays is not guaranteed to packed in a dense manner;
161furthermore, entries can be separated by arbitrary column and row strides.
162Sometimes, it can be useful to require a function to only accept dense arrays
163using either the C (row-major) or Fortran (column-major) ordering. This can be
164accomplished via a second template argument with values ``py::array::c_style``
165or ``py::array::f_style``.
166
167.. code-block:: cpp
168
169 void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);
170
171The ``py::array::forcecast`` argument is the default value of the second
172template parameter, and it ensures that non-conforming arguments are converted
173into an array satisfying the specified requirements instead of trying the next
174function overload.
175
176Structured types
177================
178
179In order for ``py::array_t`` to work with structured (record) types, we first need
180to register the memory layout of the type. This can be done via ``PYBIND11_NUMPY_DTYPE``
181macro which expects the type followed by field names:
182
183.. code-block:: cpp
184
185 struct A {
186 int x;
187 double y;
188 };
189
190 struct B {
191 int z;
192 A a;
193 };
194
195 PYBIND11_NUMPY_DTYPE(A, x, y);
196 PYBIND11_NUMPY_DTYPE(B, z, a);
197
198 /* now both A and B can be used as template arguments to py::array_t */
199
200Vectorizing functions
201=====================
202
203Suppose we want to bind a function with the following signature to Python so
204that it can process arbitrary NumPy array arguments (vectors, matrices, general
205N-D arrays) in addition to its normal arguments:
206
207.. code-block:: cpp
208
209 double my_func(int x, float y, double z);
210
211After including the ``pybind11/numpy.h`` header, this is extremely simple:
212
213.. code-block:: cpp
214
215 m.def("vectorized_func", py::vectorize(my_func));
216
217Invoking the function like below causes 4 calls to be made to ``my_func`` with
218each of the array elements. The significant advantage of this compared to
219solutions like ``numpy.vectorize()`` is that the loop over the elements runs
220entirely on the C++ side and can be crunched down into a tight, optimized loop
221by the compiler. The result is returned as a NumPy array of type
222``numpy.dtype.float64``.
223
224.. code-block:: pycon
225
226 >>> x = np.array([[1, 3],[5, 7]])
227 >>> y = np.array([[2, 4],[6, 8]])
228 >>> z = 3
229 >>> result = vectorized_func(x, y, z)
230
231The scalar argument ``z`` is transparently replicated 4 times. The input
232arrays ``x`` and ``y`` are automatically converted into the right types (they
233are of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
234``numpy.dtype.float32``, respectively)
235
236Sometimes we might want to explicitly exclude an argument from the vectorization
237because it makes little sense to wrap it in a NumPy array. For instance,
238suppose the function signature was
239
240.. code-block:: cpp
241
242 double my_func(int x, float y, my_custom_type *z);
243
244This can be done with a stateful Lambda closure:
245
246.. code-block:: cpp
247
248 // Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
249 m.def("vectorized_func",
250 [](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) {
251 auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
252 return py::vectorize(stateful_closure)(x, y);
253 }
254 );
255
256In cases where the computation is too complicated to be reduced to
257``vectorize``, it will be necessary to create and access the buffer contents
258manually. The following snippet contains a complete example that shows how this
259works (the code is somewhat contrived, since it could have been done more
260simply using ``vectorize``).
261
262.. code-block:: cpp
263
264 #include <pybind11/pybind11.h>
265 #include <pybind11/numpy.h>
266
267 namespace py = pybind11;
268
269 py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
270 auto buf1 = input1.request(), buf2 = input2.request();
271
272 if (buf1.ndim != 1 || buf2.ndim != 1)
273 throw std::runtime_error("Number of dimensions must be one");
274
275 if (buf1.size != buf2.size)
276 throw std::runtime_error("Input shapes must match");
277
278 /* No pointer is passed, so NumPy will allocate the buffer */
279 auto result = py::array_t<double>(buf1.size);
280
281 auto buf3 = result.request();
282
283 double *ptr1 = (double *) buf1.ptr,
284 *ptr2 = (double *) buf2.ptr,
285 *ptr3 = (double *) buf3.ptr;
286
287 for (size_t idx = 0; idx < buf1.shape[0]; idx++)
288 ptr3[idx] = ptr1[idx] + ptr2[idx];
289
290 return result;
291 }
292
293 PYBIND11_PLUGIN(test) {
294 py::module m("test");
295 m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
296 return m.ptr();
297 }
298
299.. seealso::
300
301 The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
302 example that demonstrates using :func:`vectorize` in more detail.