blob: 0a81aa8fd54b94928d5748a04134967e8a5d8875 [file] [log] [blame]
Dean Moldovan67b52d82016-10-16 19:12:43 +02001.. _numpy:
2
3NumPy
4#####
5
6Buffer protocol
7===============
8
9Python supports an extremely general and convenient approach for exchanging
10data between plugin libraries. Types can expose a buffer view [#f2]_, which
11provides fast direct access to the raw internal data representation. Suppose we
12want to bind the following simplistic Matrix class:
13
14.. code-block:: cpp
15
16 class Matrix {
17 public:
18 Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
19 m_data = new float[rows*cols];
20 }
21 float *data() { return m_data; }
22 size_t rows() const { return m_rows; }
23 size_t cols() const { return m_cols; }
24 private:
25 size_t m_rows, m_cols;
26 float *m_data;
27 };
28
29The following binding code exposes the ``Matrix`` contents as a buffer object,
30making it possible to cast Matrices into NumPy arrays. It is even possible to
31completely avoid copy operations with Python expressions like
32``np.array(matrix_instance, copy = False)``.
33
34.. code-block:: cpp
35
Wenzel Jakob1d1f81b2016-12-16 15:00:46 +010036 py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
Dean Moldovan67b52d82016-10-16 19:12:43 +020037 .def_buffer([](Matrix &m) -> py::buffer_info {
38 return py::buffer_info(
39 m.data(), /* Pointer to buffer */
40 sizeof(float), /* Size of one scalar */
41 py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
42 2, /* Number of dimensions */
43 { m.rows(), m.cols() }, /* Buffer dimensions */
Ansgar Burchardta22dd2d2017-09-21 23:07:48 +020044 { sizeof(float) * m.cols(), /* Strides (in bytes) for each index */
Jason Rhinelanderb68959e2017-04-06 18:16:35 -040045 sizeof(float) }
Dean Moldovan67b52d82016-10-16 19:12:43 +020046 );
47 });
48
Wenzel Jakob1d1f81b2016-12-16 15:00:46 +010049Supporting the buffer protocol in a new type involves specifying the special
50``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
51``def_buffer()`` method with a lambda function that creates a
52``py::buffer_info`` description record on demand describing a given matrix
53instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
54specification.
Dean Moldovan67b52d82016-10-16 19:12:43 +020055
56.. code-block:: cpp
57
58 struct buffer_info {
59 void *ptr;
Henry Schreinerf2008322020-10-15 09:26:38 -040060 py::ssize_t itemsize;
Dean Moldovan67b52d82016-10-16 19:12:43 +020061 std::string format;
Henry Schreinerf2008322020-10-15 09:26:38 -040062 py::ssize_t ndim;
63 std::vector<py::ssize_t> shape;
64 std::vector<py::ssize_t> strides;
Dean Moldovan67b52d82016-10-16 19:12:43 +020065 };
66
67To create a C++ function that can take a Python buffer object as an argument,
68simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
69in a great variety of configurations, hence some safety checks are usually
Guilherme Dantas89085522020-04-27 17:19:17 -030070necessary in the function body. Below, you can see a basic example on how to
Dean Moldovan67b52d82016-10-16 19:12:43 +020071define a custom constructor for the Eigen double precision matrix
72(``Eigen::MatrixXd``) type, which supports initialization from compatible
73buffer objects (e.g. a NumPy matrix).
74
75.. code-block:: cpp
76
77 /* Bind MatrixXd (or some other Eigen type) to Python */
78 typedef Eigen::MatrixXd Matrix;
79
80 typedef Matrix::Scalar Scalar;
81 constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;
82
Wenzel Jakob1d1f81b2016-12-16 15:00:46 +010083 py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
Yannick Jadoul44937512020-08-23 18:35:51 +020084 .def(py::init([](py::buffer b) {
Dean Moldovan67b52d82016-10-16 19:12:43 +020085 typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;
86
87 /* Request a buffer descriptor from Python */
88 py::buffer_info info = b.request();
89
90 /* Some sanity checks ... */
91 if (info.format != py::format_descriptor<Scalar>::format())
92 throw std::runtime_error("Incompatible format: expected a double array!");
93
94 if (info.ndim != 2)
95 throw std::runtime_error("Incompatible buffer dimension!");
96
97 auto strides = Strides(
Cris Luengo30d43c42017-04-14 14:33:44 -060098 info.strides[rowMajor ? 0 : 1] / (py::ssize_t)sizeof(Scalar),
99 info.strides[rowMajor ? 1 : 0] / (py::ssize_t)sizeof(Scalar));
Dean Moldovan67b52d82016-10-16 19:12:43 +0200100
101 auto map = Eigen::Map<Matrix, 0, Strides>(
chenzy39b9e042017-05-26 16:46:00 +0800102 static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);
Dean Moldovan67b52d82016-10-16 19:12:43 +0200103
Yannick Jadoul44937512020-08-23 18:35:51 +0200104 return Matrix(map);
105 }));
Dean Moldovan67b52d82016-10-16 19:12:43 +0200106
107For reference, the ``def_buffer()`` call for this Eigen data type should look
108as follows:
109
110.. code-block:: cpp
111
112 .def_buffer([](Matrix &m) -> py::buffer_info {
113 return py::buffer_info(
Cris Luengo30d43c42017-04-14 14:33:44 -0600114 m.data(), /* Pointer to buffer */
115 sizeof(Scalar), /* Size of one scalar */
116 py::format_descriptor<Scalar>::format(), /* Python struct-style format descriptor */
117 2, /* Number of dimensions */
118 { m.rows(), m.cols() }, /* Buffer dimensions */
Jason Rhinelanderb68959e2017-04-06 18:16:35 -0400119 { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
120 sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
Cris Luengo30d43c42017-04-14 14:33:44 -0600121 /* Strides (in bytes) for each index */
Dean Moldovan67b52d82016-10-16 19:12:43 +0200122 );
123 })
124
125For a much easier approach of binding Eigen types (although with some
126limitations), refer to the section on :doc:`/advanced/cast/eigen`.
127
128.. seealso::
129
130 The file :file:`tests/test_buffers.cpp` contains a complete example
131 that demonstrates using the buffer protocol with pybind11 in more detail.
132
133.. [#f2] http://docs.python.org/3/c-api/buffer.html
134
135Arrays
136======
137
138By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
139restrict the function so that it only accepts NumPy arrays (rather than any
140type of Python object satisfying the buffer protocol).
141
142In many situations, we want to define a function which only accepts a NumPy
143array of a certain data type. This is possible via the ``py::array_t<T>``
144template. For instance, the following function requires the argument to be a
145NumPy array containing double precision values.
146
147.. code-block:: cpp
148
149 void f(py::array_t<double> array);
150
151When it is invoked with a different type (e.g. an integer or a list of
152integers), the binding code will attempt to cast the input into a NumPy array
Antony Leed068ab22020-12-09 00:07:36 +0100153of the requested type. This feature requires the :file:`pybind11/numpy.h`
154header to be included. Note that :file:`pybind11/numpy.h` does not depend on
155the NumPy headers, and thus can be used without declaring a build-time
156dependency on NumPy; NumPy>=1.7.0 is a runtime dependency.
Dean Moldovan67b52d82016-10-16 19:12:43 +0200157
158Data in NumPy arrays is not guaranteed to packed in a dense manner;
159furthermore, entries can be separated by arbitrary column and row strides.
160Sometimes, it can be useful to require a function to only accept dense arrays
161using either the C (row-major) or Fortran (column-major) ordering. This can be
162accomplished via a second template argument with values ``py::array::c_style``
163or ``py::array::f_style``.
164
165.. code-block:: cpp
166
167 void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);
168
169The ``py::array::forcecast`` argument is the default value of the second
170template parameter, and it ensures that non-conforming arguments are converted
171into an array satisfying the specified requirements instead of trying the next
172function overload.
173
174Structured types
175================
176
Jason Rhinelanderf7f5bc82017-01-31 11:00:15 -0500177In order for ``py::array_t`` to work with structured (record) types, we first
178need to register the memory layout of the type. This can be done via
179``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
180expects the type followed by field names:
Dean Moldovan67b52d82016-10-16 19:12:43 +0200181
182.. code-block:: cpp
183
184 struct A {
185 int x;
186 double y;
187 };
188
189 struct B {
190 int z;
191 A a;
192 };
193
Jason Rhinelanderf7f5bc82017-01-31 11:00:15 -0500194 // ...
Dean Moldovan443ab592017-04-24 01:51:44 +0200195 PYBIND11_MODULE(test, m) {
Jason Rhinelanderf7f5bc82017-01-31 11:00:15 -0500196 // ...
Dean Moldovan67b52d82016-10-16 19:12:43 +0200197
Jason Rhinelanderf7f5bc82017-01-31 11:00:15 -0500198 PYBIND11_NUMPY_DTYPE(A, x, y);
199 PYBIND11_NUMPY_DTYPE(B, z, a);
200 /* now both A and B can be used as template arguments to py::array_t */
201 }
Dean Moldovan67b52d82016-10-16 19:12:43 +0200202
Bruce Merryb82c0f02017-05-10 11:36:24 +0200203The structure should consist of fundamental arithmetic types, ``std::complex``,
204previously registered substructures, and arrays of any of the above. Both C++
205arrays and ``std::array`` are supported. While there is a static assertion to
206prevent many types of unsupported structures, it is still the user's
207responsibility to use only "plain" structures that can be safely manipulated as
208raw memory without violating invariants.
Bruce Merry8e0d8322017-05-10 10:21:01 +0200209
Dean Moldovan67b52d82016-10-16 19:12:43 +0200210Vectorizing functions
211=====================
212
213Suppose we want to bind a function with the following signature to Python so
214that it can process arbitrary NumPy array arguments (vectors, matrices, general
215N-D arrays) in addition to its normal arguments:
216
217.. code-block:: cpp
218
219 double my_func(int x, float y, double z);
220
221After including the ``pybind11/numpy.h`` header, this is extremely simple:
222
223.. code-block:: cpp
224
225 m.def("vectorized_func", py::vectorize(my_func));
226
227Invoking the function like below causes 4 calls to be made to ``my_func`` with
228each of the array elements. The significant advantage of this compared to
229solutions like ``numpy.vectorize()`` is that the loop over the elements runs
230entirely on the C++ side and can be crunched down into a tight, optimized loop
231by the compiler. The result is returned as a NumPy array of type
232``numpy.dtype.float64``.
233
234.. code-block:: pycon
235
236 >>> x = np.array([[1, 3],[5, 7]])
237 >>> y = np.array([[2, 4],[6, 8]])
238 >>> z = 3
239 >>> result = vectorized_func(x, y, z)
240
241The scalar argument ``z`` is transparently replicated 4 times. The input
242arrays ``x`` and ``y`` are automatically converted into the right types (they
243are of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
Jason Rhinelanderf3ce00e2017-03-26 00:51:40 -0300244``numpy.dtype.float32``, respectively).
Dean Moldovan67b52d82016-10-16 19:12:43 +0200245
Jason Rhinelanderf3ce00e2017-03-26 00:51:40 -0300246.. note::
Dean Moldovan67b52d82016-10-16 19:12:43 +0200247
Jason Rhinelanderf3ce00e2017-03-26 00:51:40 -0300248 Only arithmetic, complex, and POD types passed by value or by ``const &``
249 reference are vectorized; all other arguments are passed through as-is.
250 Functions taking rvalue reference arguments cannot be vectorized.
Dean Moldovan67b52d82016-10-16 19:12:43 +0200251
252In cases where the computation is too complicated to be reduced to
253``vectorize``, it will be necessary to create and access the buffer contents
254manually. The following snippet contains a complete example that shows how this
255works (the code is somewhat contrived, since it could have been done more
256simply using ``vectorize``).
257
258.. code-block:: cpp
259
260 #include <pybind11/pybind11.h>
261 #include <pybind11/numpy.h>
262
263 namespace py = pybind11;
264
265 py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
Thomas Hrabe534b7562018-06-24 06:41:27 -0700266 py::buffer_info buf1 = input1.request(), buf2 = input2.request();
Dean Moldovan67b52d82016-10-16 19:12:43 +0200267
268 if (buf1.ndim != 1 || buf2.ndim != 1)
269 throw std::runtime_error("Number of dimensions must be one");
270
271 if (buf1.size != buf2.size)
272 throw std::runtime_error("Input shapes must match");
273
274 /* No pointer is passed, so NumPy will allocate the buffer */
275 auto result = py::array_t<double>(buf1.size);
276
Thomas Hrabe534b7562018-06-24 06:41:27 -0700277 py::buffer_info buf3 = result.request();
Dean Moldovan67b52d82016-10-16 19:12:43 +0200278
Boris Staletic32bb9072020-09-14 20:07:29 +0200279 double *ptr1 = static_cast<double *>(buf1.ptr);
280 double *ptr2 = static_cast<double *>(buf2.ptr);
281 double *ptr3 = static_cast<double *>(buf3.ptr);
Dean Moldovan67b52d82016-10-16 19:12:43 +0200282
283 for (size_t idx = 0; idx < buf1.shape[0]; idx++)
284 ptr3[idx] = ptr1[idx] + ptr2[idx];
285
286 return result;
287 }
288
Dean Moldovan443ab592017-04-24 01:51:44 +0200289 PYBIND11_MODULE(test, m) {
Dean Moldovan67b52d82016-10-16 19:12:43 +0200290 m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
Dean Moldovan67b52d82016-10-16 19:12:43 +0200291 }
292
293.. seealso::
294
295 The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
296 example that demonstrates using :func:`vectorize` in more detail.
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300297
298Direct access
299=============
300
301For performance reasons, particularly when dealing with very large arrays, it
302is often desirable to directly access array elements without internal checking
303of dimensions and bounds on every access when indices are known to be already
304valid. To avoid such checks, the ``array`` class and ``array_t<T>`` template
305class offer an unchecked proxy object that can be used for this unchecked
306access through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods,
307where ``N`` gives the required dimensionality of the array:
308
309.. code-block:: cpp
310
311 m.def("sum_3d", [](py::array_t<double> x) {
312 auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
313 double sum = 0;
Henry Schreinerf2008322020-10-15 09:26:38 -0400314 for (py::ssize_t i = 0; i < r.shape(0); i++)
315 for (py::ssize_t j = 0; j < r.shape(1); j++)
316 for (py::ssize_t k = 0; k < r.shape(2); k++)
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300317 sum += r(i, j, k);
318 return sum;
319 });
320 m.def("increment_3d", [](py::array_t<double> x) {
321 auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
Henry Schreinerf2008322020-10-15 09:26:38 -0400322 for (py::ssize_t i = 0; i < r.shape(0); i++)
323 for (py::ssize_t j = 0; j < r.shape(1); j++)
324 for (py::ssize_t k = 0; k < r.shape(2); k++)
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300325 r(i, j, k) += 1.0;
326 }, py::arg().noconvert());
327
328To obtain the proxy from an ``array`` object, you must specify both the data
329type and number of dimensions as template arguments, such as ``auto r =
330myarray.mutable_unchecked<float, 2>()``.
331
Jason Rhinelander773339f2017-03-20 17:48:38 -0300332If the number of dimensions is not known at compile time, you can omit the
333dimensions template parameter (i.e. calling ``arr_t.unchecked()`` or
334``arr.unchecked<T>()``. This will give you a proxy object that works in the
335same way, but results in less optimizable code and thus a small efficiency
336loss in tight loops.
337
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300338Note that the returned proxy object directly references the array's data, and
339only reads its shape, strides, and writeable flag when constructed. You must
340take care to ensure that the referenced array is not destroyed or reshaped for
341the duration of the returned object, typically by limiting the scope of the
342returned instance.
343
Jason Rhinelander773339f2017-03-20 17:48:38 -0300344The returned proxy object supports some of the same methods as ``py::array`` so
345that it can be used as a drop-in replacement for some existing, index-checked
346uses of ``py::array``:
347
348- ``r.ndim()`` returns the number of dimensions
349
350- ``r.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)``` returns a pointer to
351 the ``const T`` or ``T`` data, respectively, at the given indices. The
352 latter is only available to proxies obtained via ``a.mutable_unchecked()``.
353
354- ``itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.
355
356- ``ndim()`` returns the number of dimensions.
357
358- ``shape(n)`` returns the size of dimension ``n``
359
360- ``size()`` returns the total number of elements (i.e. the product of the shapes).
361
362- ``nbytes()`` returns the number of bytes used by the referenced elements
363 (i.e. ``itemsize()`` times ``size()``).
364
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300365.. seealso::
366
367 The file :file:`tests/test_numpy_array.cpp` contains additional examples
368 demonstrating the use of this feature.
Wenzel Jakobd4b37a22018-08-28 00:23:59 +0200369
370Ellipsis
371========
372
373Python 3 provides a convenient ``...`` ellipsis notation that is often used to
374slice multidimensional arrays. For instance, the following snippet extracts the
375middle dimensions of a tensor with the first and last index set to zero.
Yannick Jadoul3e448c02020-08-04 14:45:55 +0200376In Python 2, the syntactic sugar ``...`` is not available, but the singleton
377``Ellipsis`` (of type ``ellipsis``) can still be used directly.
Wenzel Jakobd4b37a22018-08-28 00:23:59 +0200378
379.. code-block:: python
380
381 a = # a NumPy array
382 b = a[0, ..., 0]
383
384The function ``py::ellipsis()`` function can be used to perform the same
385operation on the C++ side:
386
387.. code-block:: cpp
388
389 py::array a = /* A NumPy array */;
390 py::array b = a[py::make_tuple(0, py::ellipsis(), 0)];
Kota Yamaguchie2488692020-07-16 00:50:43 +0900391
Henry Schreinera6887b62020-08-19 14:53:59 -0400392.. versionchanged:: 2.6
393 ``py::ellipsis()`` is now also avaliable in Python 2.
394
Kota Yamaguchie2488692020-07-16 00:50:43 +0900395Memory view
396===========
397
398For a case when we simply want to provide a direct accessor to C/C++ buffer
399without a concrete class object, we can return a ``memoryview`` object. Suppose
400we wish to expose a ``memoryview`` for 2x4 uint8_t array, we can do the
401following:
402
403.. code-block:: cpp
404
405 const uint8_t buffer[] = {
406 0, 1, 2, 3,
407 4, 5, 6, 7
408 };
409 m.def("get_memoryview2d", []() {
410 return py::memoryview::from_buffer(
411 buffer, // buffer pointer
412 { 2, 4 }, // shape (rows, cols)
413 { sizeof(uint8_t) * 4, sizeof(uint8_t) } // strides in bytes
414 );
415 })
416
417This approach is meant for providing a ``memoryview`` for a C/C++ buffer not
418managed by Python. The user is responsible for managing the lifetime of the
419buffer. Using a ``memoryview`` created in this way after deleting the buffer in
420C++ side results in undefined behavior.
421
422We can also use ``memoryview::from_memory`` for a simple 1D contiguous buffer:
423
424.. code-block:: cpp
425
426 m.def("get_memoryview1d", []() {
427 return py::memoryview::from_memory(
428 buffer, // buffer pointer
429 sizeof(uint8_t) * 8 // buffer size
430 );
431 })
432
433.. note::
434
435 ``memoryview::from_memory`` is not available in Python 2.
Henry Schreinera6887b62020-08-19 14:53:59 -0400436
437.. versionchanged:: 2.6
438 ``memoryview::from_memory`` added.