blob: 71123631ab4755d1e98965878204fe7893450f1b [file] [log] [blame]
Dean Moldovan67b52d82016-10-16 19:12:43 +02001.. _numpy:
2
3NumPy
4#####
5
6Buffer protocol
7===============
8
9Python supports an extremely general and convenient approach for exchanging
10data between plugin libraries. Types can expose a buffer view [#f2]_, which
11provides fast direct access to the raw internal data representation. Suppose we
12want to bind the following simplistic Matrix class:
13
14.. code-block:: cpp
15
16 class Matrix {
17 public:
18 Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
19 m_data = new float[rows*cols];
20 }
21 float *data() { return m_data; }
22 size_t rows() const { return m_rows; }
23 size_t cols() const { return m_cols; }
24 private:
25 size_t m_rows, m_cols;
26 float *m_data;
27 };
28
29The following binding code exposes the ``Matrix`` contents as a buffer object,
30making it possible to cast Matrices into NumPy arrays. It is even possible to
31completely avoid copy operations with Python expressions like
32``np.array(matrix_instance, copy = False)``.
33
34.. code-block:: cpp
35
Wenzel Jakob1d1f81b2016-12-16 15:00:46 +010036 py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
Dean Moldovan67b52d82016-10-16 19:12:43 +020037 .def_buffer([](Matrix &m) -> py::buffer_info {
38 return py::buffer_info(
39 m.data(), /* Pointer to buffer */
40 sizeof(float), /* Size of one scalar */
41 py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
42 2, /* Number of dimensions */
43 { m.rows(), m.cols() }, /* Buffer dimensions */
Jason Rhinelanderb68959e2017-04-06 18:16:35 -040044 { sizeof(float) * m.rows(), /* Strides (in bytes) for each index */
45 sizeof(float) }
Dean Moldovan67b52d82016-10-16 19:12:43 +020046 );
47 });
48
Wenzel Jakob1d1f81b2016-12-16 15:00:46 +010049Supporting the buffer protocol in a new type involves specifying the special
50``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
51``def_buffer()`` method with a lambda function that creates a
52``py::buffer_info`` description record on demand describing a given matrix
53instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
54specification.
Dean Moldovan67b52d82016-10-16 19:12:43 +020055
56.. code-block:: cpp
57
58 struct buffer_info {
59 void *ptr;
Cris Luengo30d43c42017-04-14 14:33:44 -060060 ssize_t itemsize;
Dean Moldovan67b52d82016-10-16 19:12:43 +020061 std::string format;
Cris Luengo30d43c42017-04-14 14:33:44 -060062 ssize_t ndim;
63 std::vector<ssize_t> shape;
Cris Luengod400f602017-04-05 16:13:04 -060064 std::vector<ssize_t> strides;
Dean Moldovan67b52d82016-10-16 19:12:43 +020065 };
66
67To create a C++ function that can take a Python buffer object as an argument,
68simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
69in a great variety of configurations, hence some safety checks are usually
70necessary in the function body. Below, you can see an basic example on how to
71define a custom constructor for the Eigen double precision matrix
72(``Eigen::MatrixXd``) type, which supports initialization from compatible
73buffer objects (e.g. a NumPy matrix).
74
75.. code-block:: cpp
76
77 /* Bind MatrixXd (or some other Eigen type) to Python */
78 typedef Eigen::MatrixXd Matrix;
79
80 typedef Matrix::Scalar Scalar;
81 constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;
82
Wenzel Jakob1d1f81b2016-12-16 15:00:46 +010083 py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
Dean Moldovan67b52d82016-10-16 19:12:43 +020084 .def("__init__", [](Matrix &m, py::buffer b) {
85 typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;
86
87 /* Request a buffer descriptor from Python */
88 py::buffer_info info = b.request();
89
90 /* Some sanity checks ... */
91 if (info.format != py::format_descriptor<Scalar>::format())
92 throw std::runtime_error("Incompatible format: expected a double array!");
93
94 if (info.ndim != 2)
95 throw std::runtime_error("Incompatible buffer dimension!");
96
97 auto strides = Strides(
Cris Luengo30d43c42017-04-14 14:33:44 -060098 info.strides[rowMajor ? 0 : 1] / (py::ssize_t)sizeof(Scalar),
99 info.strides[rowMajor ? 1 : 0] / (py::ssize_t)sizeof(Scalar));
Dean Moldovan67b52d82016-10-16 19:12:43 +0200100
101 auto map = Eigen::Map<Matrix, 0, Strides>(
102 static_cat<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);
103
104 new (&m) Matrix(map);
105 });
106
107For reference, the ``def_buffer()`` call for this Eigen data type should look
108as follows:
109
110.. code-block:: cpp
111
112 .def_buffer([](Matrix &m) -> py::buffer_info {
113 return py::buffer_info(
Cris Luengo30d43c42017-04-14 14:33:44 -0600114 m.data(), /* Pointer to buffer */
115 sizeof(Scalar), /* Size of one scalar */
116 py::format_descriptor<Scalar>::format(), /* Python struct-style format descriptor */
117 2, /* Number of dimensions */
118 { m.rows(), m.cols() }, /* Buffer dimensions */
Jason Rhinelanderb68959e2017-04-06 18:16:35 -0400119 { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
120 sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
Cris Luengo30d43c42017-04-14 14:33:44 -0600121 /* Strides (in bytes) for each index */
Dean Moldovan67b52d82016-10-16 19:12:43 +0200122 );
123 })
124
125For a much easier approach of binding Eigen types (although with some
126limitations), refer to the section on :doc:`/advanced/cast/eigen`.
127
128.. seealso::
129
130 The file :file:`tests/test_buffers.cpp` contains a complete example
131 that demonstrates using the buffer protocol with pybind11 in more detail.
132
133.. [#f2] http://docs.python.org/3/c-api/buffer.html
134
135Arrays
136======
137
138By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
139restrict the function so that it only accepts NumPy arrays (rather than any
140type of Python object satisfying the buffer protocol).
141
142In many situations, we want to define a function which only accepts a NumPy
143array of a certain data type. This is possible via the ``py::array_t<T>``
144template. For instance, the following function requires the argument to be a
145NumPy array containing double precision values.
146
147.. code-block:: cpp
148
149 void f(py::array_t<double> array);
150
151When it is invoked with a different type (e.g. an integer or a list of
152integers), the binding code will attempt to cast the input into a NumPy array
153of the requested type. Note that this feature requires the
Jason Rhinelander12494522017-01-31 11:28:29 -0500154:file:`pybind11/numpy.h` header to be included.
Dean Moldovan67b52d82016-10-16 19:12:43 +0200155
156Data in NumPy arrays is not guaranteed to packed in a dense manner;
157furthermore, entries can be separated by arbitrary column and row strides.
158Sometimes, it can be useful to require a function to only accept dense arrays
159using either the C (row-major) or Fortran (column-major) ordering. This can be
160accomplished via a second template argument with values ``py::array::c_style``
161or ``py::array::f_style``.
162
163.. code-block:: cpp
164
165 void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);
166
167The ``py::array::forcecast`` argument is the default value of the second
168template parameter, and it ensures that non-conforming arguments are converted
169into an array satisfying the specified requirements instead of trying the next
170function overload.
171
172Structured types
173================
174
Jason Rhinelanderf7f5bc82017-01-31 11:00:15 -0500175In order for ``py::array_t`` to work with structured (record) types, we first
176need to register the memory layout of the type. This can be done via
177``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
178expects the type followed by field names:
Dean Moldovan67b52d82016-10-16 19:12:43 +0200179
180.. code-block:: cpp
181
182 struct A {
183 int x;
184 double y;
185 };
186
187 struct B {
188 int z;
189 A a;
190 };
191
Jason Rhinelanderf7f5bc82017-01-31 11:00:15 -0500192 // ...
193 PYBIND11_PLUGIN(test) {
194 // ...
Dean Moldovan67b52d82016-10-16 19:12:43 +0200195
Jason Rhinelanderf7f5bc82017-01-31 11:00:15 -0500196 PYBIND11_NUMPY_DTYPE(A, x, y);
197 PYBIND11_NUMPY_DTYPE(B, z, a);
198 /* now both A and B can be used as template arguments to py::array_t */
199 }
Dean Moldovan67b52d82016-10-16 19:12:43 +0200200
201Vectorizing functions
202=====================
203
204Suppose we want to bind a function with the following signature to Python so
205that it can process arbitrary NumPy array arguments (vectors, matrices, general
206N-D arrays) in addition to its normal arguments:
207
208.. code-block:: cpp
209
210 double my_func(int x, float y, double z);
211
212After including the ``pybind11/numpy.h`` header, this is extremely simple:
213
214.. code-block:: cpp
215
216 m.def("vectorized_func", py::vectorize(my_func));
217
218Invoking the function like below causes 4 calls to be made to ``my_func`` with
219each of the array elements. The significant advantage of this compared to
220solutions like ``numpy.vectorize()`` is that the loop over the elements runs
221entirely on the C++ side and can be crunched down into a tight, optimized loop
222by the compiler. The result is returned as a NumPy array of type
223``numpy.dtype.float64``.
224
225.. code-block:: pycon
226
227 >>> x = np.array([[1, 3],[5, 7]])
228 >>> y = np.array([[2, 4],[6, 8]])
229 >>> z = 3
230 >>> result = vectorized_func(x, y, z)
231
232The scalar argument ``z`` is transparently replicated 4 times. The input
233arrays ``x`` and ``y`` are automatically converted into the right types (they
234are of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
235``numpy.dtype.float32``, respectively)
236
237Sometimes we might want to explicitly exclude an argument from the vectorization
238because it makes little sense to wrap it in a NumPy array. For instance,
239suppose the function signature was
240
241.. code-block:: cpp
242
243 double my_func(int x, float y, my_custom_type *z);
244
245This can be done with a stateful Lambda closure:
246
247.. code-block:: cpp
248
249 // Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
250 m.def("vectorized_func",
251 [](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) {
252 auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
253 return py::vectorize(stateful_closure)(x, y);
254 }
255 );
256
257In cases where the computation is too complicated to be reduced to
258``vectorize``, it will be necessary to create and access the buffer contents
259manually. The following snippet contains a complete example that shows how this
260works (the code is somewhat contrived, since it could have been done more
261simply using ``vectorize``).
262
263.. code-block:: cpp
264
265 #include <pybind11/pybind11.h>
266 #include <pybind11/numpy.h>
267
268 namespace py = pybind11;
269
270 py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
271 auto buf1 = input1.request(), buf2 = input2.request();
272
273 if (buf1.ndim != 1 || buf2.ndim != 1)
274 throw std::runtime_error("Number of dimensions must be one");
275
276 if (buf1.size != buf2.size)
277 throw std::runtime_error("Input shapes must match");
278
279 /* No pointer is passed, so NumPy will allocate the buffer */
280 auto result = py::array_t<double>(buf1.size);
281
282 auto buf3 = result.request();
283
284 double *ptr1 = (double *) buf1.ptr,
285 *ptr2 = (double *) buf2.ptr,
286 *ptr3 = (double *) buf3.ptr;
287
288 for (size_t idx = 0; idx < buf1.shape[0]; idx++)
289 ptr3[idx] = ptr1[idx] + ptr2[idx];
290
291 return result;
292 }
293
294 PYBIND11_PLUGIN(test) {
295 py::module m("test");
296 m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
297 return m.ptr();
298 }
299
300.. seealso::
301
302 The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
303 example that demonstrates using :func:`vectorize` in more detail.
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300304
305Direct access
306=============
307
308For performance reasons, particularly when dealing with very large arrays, it
309is often desirable to directly access array elements without internal checking
310of dimensions and bounds on every access when indices are known to be already
311valid. To avoid such checks, the ``array`` class and ``array_t<T>`` template
312class offer an unchecked proxy object that can be used for this unchecked
313access through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods,
314where ``N`` gives the required dimensionality of the array:
315
316.. code-block:: cpp
317
318 m.def("sum_3d", [](py::array_t<double> x) {
319 auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
320 double sum = 0;
Cris Luengo30d43c42017-04-14 14:33:44 -0600321 for (ssize_t i = 0; i < r.shape(0); i++)
322 for (ssize_t j = 0; j < r.shape(1); j++)
323 for (ssize_t k = 0; k < r.shape(2); k++)
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300324 sum += r(i, j, k);
325 return sum;
326 });
327 m.def("increment_3d", [](py::array_t<double> x) {
328 auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
Cris Luengo30d43c42017-04-14 14:33:44 -0600329 for (ssize_t i = 0; i < r.shape(0); i++)
330 for (ssize_t j = 0; j < r.shape(1); j++)
331 for (ssize_t k = 0; k < r.shape(2); k++)
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300332 r(i, j, k) += 1.0;
333 }, py::arg().noconvert());
334
335To obtain the proxy from an ``array`` object, you must specify both the data
336type and number of dimensions as template arguments, such as ``auto r =
337myarray.mutable_unchecked<float, 2>()``.
338
Jason Rhinelander773339f2017-03-20 17:48:38 -0300339If the number of dimensions is not known at compile time, you can omit the
340dimensions template parameter (i.e. calling ``arr_t.unchecked()`` or
341``arr.unchecked<T>()``. This will give you a proxy object that works in the
342same way, but results in less optimizable code and thus a small efficiency
343loss in tight loops.
344
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300345Note that the returned proxy object directly references the array's data, and
346only reads its shape, strides, and writeable flag when constructed. You must
347take care to ensure that the referenced array is not destroyed or reshaped for
348the duration of the returned object, typically by limiting the scope of the
349returned instance.
350
Jason Rhinelander773339f2017-03-20 17:48:38 -0300351The returned proxy object supports some of the same methods as ``py::array`` so
352that it can be used as a drop-in replacement for some existing, index-checked
353uses of ``py::array``:
354
355- ``r.ndim()`` returns the number of dimensions
356
357- ``r.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)``` returns a pointer to
358 the ``const T`` or ``T`` data, respectively, at the given indices. The
359 latter is only available to proxies obtained via ``a.mutable_unchecked()``.
360
361- ``itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.
362
363- ``ndim()`` returns the number of dimensions.
364
365- ``shape(n)`` returns the size of dimension ``n``
366
367- ``size()`` returns the total number of elements (i.e. the product of the shapes).
368
369- ``nbytes()`` returns the number of bytes used by the referenced elements
370 (i.e. ``itemsize()`` times ``size()``).
371
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300372.. seealso::
373
374 The file :file:`tests/test_numpy_array.cpp` contains additional examples
375 demonstrating the use of this feature.