blob: d06370611704c8e53271bd133ecaaad37613eab0 [file] [log] [blame]
Dean Moldovan67b52d82016-10-16 19:12:43 +02001.. _numpy:
2
3NumPy
4#####
5
6Buffer protocol
7===============
8
9Python supports an extremely general and convenient approach for exchanging
10data between plugin libraries. Types can expose a buffer view [#f2]_, which
11provides fast direct access to the raw internal data representation. Suppose we
12want to bind the following simplistic Matrix class:
13
14.. code-block:: cpp
15
16 class Matrix {
17 public:
18 Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
19 m_data = new float[rows*cols];
20 }
21 float *data() { return m_data; }
22 size_t rows() const { return m_rows; }
23 size_t cols() const { return m_cols; }
24 private:
25 size_t m_rows, m_cols;
26 float *m_data;
27 };
28
29The following binding code exposes the ``Matrix`` contents as a buffer object,
30making it possible to cast Matrices into NumPy arrays. It is even possible to
31completely avoid copy operations with Python expressions like
32``np.array(matrix_instance, copy = False)``.
33
34.. code-block:: cpp
35
Wenzel Jakob1d1f81b2016-12-16 15:00:46 +010036 py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
Dean Moldovan67b52d82016-10-16 19:12:43 +020037 .def_buffer([](Matrix &m) -> py::buffer_info {
38 return py::buffer_info(
39 m.data(), /* Pointer to buffer */
40 sizeof(float), /* Size of one scalar */
41 py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
42 2, /* Number of dimensions */
43 { m.rows(), m.cols() }, /* Buffer dimensions */
Jason Rhinelanderb68959e2017-04-06 18:16:35 -040044 { sizeof(float) * m.rows(), /* Strides (in bytes) for each index */
45 sizeof(float) }
Dean Moldovan67b52d82016-10-16 19:12:43 +020046 );
47 });
48
Wenzel Jakob1d1f81b2016-12-16 15:00:46 +010049Supporting the buffer protocol in a new type involves specifying the special
50``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
51``def_buffer()`` method with a lambda function that creates a
52``py::buffer_info`` description record on demand describing a given matrix
53instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
54specification.
Dean Moldovan67b52d82016-10-16 19:12:43 +020055
56.. code-block:: cpp
57
58 struct buffer_info {
59 void *ptr;
60 size_t itemsize;
61 std::string format;
62 int ndim;
63 std::vector<size_t> shape;
Cris Luengod400f602017-04-05 16:13:04 -060064 std::vector<ssize_t> strides;
Dean Moldovan67b52d82016-10-16 19:12:43 +020065 };
66
67To create a C++ function that can take a Python buffer object as an argument,
68simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
69in a great variety of configurations, hence some safety checks are usually
70necessary in the function body. Below, you can see an basic example on how to
71define a custom constructor for the Eigen double precision matrix
72(``Eigen::MatrixXd``) type, which supports initialization from compatible
73buffer objects (e.g. a NumPy matrix).
74
75.. code-block:: cpp
76
77 /* Bind MatrixXd (or some other Eigen type) to Python */
78 typedef Eigen::MatrixXd Matrix;
79
80 typedef Matrix::Scalar Scalar;
81 constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;
82
Wenzel Jakob1d1f81b2016-12-16 15:00:46 +010083 py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
Dean Moldovan67b52d82016-10-16 19:12:43 +020084 .def("__init__", [](Matrix &m, py::buffer b) {
85 typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;
86
87 /* Request a buffer descriptor from Python */
88 py::buffer_info info = b.request();
89
90 /* Some sanity checks ... */
91 if (info.format != py::format_descriptor<Scalar>::format())
92 throw std::runtime_error("Incompatible format: expected a double array!");
93
94 if (info.ndim != 2)
95 throw std::runtime_error("Incompatible buffer dimension!");
96
97 auto strides = Strides(
98 info.strides[rowMajor ? 0 : 1] / sizeof(Scalar),
99 info.strides[rowMajor ? 1 : 0] / sizeof(Scalar));
100
101 auto map = Eigen::Map<Matrix, 0, Strides>(
102 static_cat<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);
103
104 new (&m) Matrix(map);
105 });
106
107For reference, the ``def_buffer()`` call for this Eigen data type should look
108as follows:
109
110.. code-block:: cpp
111
112 .def_buffer([](Matrix &m) -> py::buffer_info {
113 return py::buffer_info(
114 m.data(), /* Pointer to buffer */
115 sizeof(Scalar), /* Size of one scalar */
116 /* Python struct-style format descriptor */
117 py::format_descriptor<Scalar>::format(),
118 /* Number of dimensions */
119 2,
120 /* Buffer dimensions */
Jason Rhinelanderb68959e2017-04-06 18:16:35 -0400121 { m.rows(), m.cols() },
Dean Moldovan67b52d82016-10-16 19:12:43 +0200122 /* Strides (in bytes) for each index */
Jason Rhinelanderb68959e2017-04-06 18:16:35 -0400123 { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
124 sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
Dean Moldovan67b52d82016-10-16 19:12:43 +0200125 );
126 })
127
128For a much easier approach of binding Eigen types (although with some
129limitations), refer to the section on :doc:`/advanced/cast/eigen`.
130
131.. seealso::
132
133 The file :file:`tests/test_buffers.cpp` contains a complete example
134 that demonstrates using the buffer protocol with pybind11 in more detail.
135
136.. [#f2] http://docs.python.org/3/c-api/buffer.html
137
138Arrays
139======
140
141By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
142restrict the function so that it only accepts NumPy arrays (rather than any
143type of Python object satisfying the buffer protocol).
144
145In many situations, we want to define a function which only accepts a NumPy
146array of a certain data type. This is possible via the ``py::array_t<T>``
147template. For instance, the following function requires the argument to be a
148NumPy array containing double precision values.
149
150.. code-block:: cpp
151
152 void f(py::array_t<double> array);
153
154When it is invoked with a different type (e.g. an integer or a list of
155integers), the binding code will attempt to cast the input into a NumPy array
156of the requested type. Note that this feature requires the
Jason Rhinelander12494522017-01-31 11:28:29 -0500157:file:`pybind11/numpy.h` header to be included.
Dean Moldovan67b52d82016-10-16 19:12:43 +0200158
159Data in NumPy arrays is not guaranteed to packed in a dense manner;
160furthermore, entries can be separated by arbitrary column and row strides.
161Sometimes, it can be useful to require a function to only accept dense arrays
162using either the C (row-major) or Fortran (column-major) ordering. This can be
163accomplished via a second template argument with values ``py::array::c_style``
164or ``py::array::f_style``.
165
166.. code-block:: cpp
167
168 void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);
169
170The ``py::array::forcecast`` argument is the default value of the second
171template parameter, and it ensures that non-conforming arguments are converted
172into an array satisfying the specified requirements instead of trying the next
173function overload.
174
175Structured types
176================
177
Jason Rhinelanderf7f5bc82017-01-31 11:00:15 -0500178In order for ``py::array_t`` to work with structured (record) types, we first
179need to register the memory layout of the type. This can be done via
180``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
181expects the type followed by field names:
Dean Moldovan67b52d82016-10-16 19:12:43 +0200182
183.. code-block:: cpp
184
185 struct A {
186 int x;
187 double y;
188 };
189
190 struct B {
191 int z;
192 A a;
193 };
194
Jason Rhinelanderf7f5bc82017-01-31 11:00:15 -0500195 // ...
196 PYBIND11_PLUGIN(test) {
197 // ...
Dean Moldovan67b52d82016-10-16 19:12:43 +0200198
Jason Rhinelanderf7f5bc82017-01-31 11:00:15 -0500199 PYBIND11_NUMPY_DTYPE(A, x, y);
200 PYBIND11_NUMPY_DTYPE(B, z, a);
201 /* now both A and B can be used as template arguments to py::array_t */
202 }
Dean Moldovan67b52d82016-10-16 19:12:43 +0200203
204Vectorizing functions
205=====================
206
207Suppose we want to bind a function with the following signature to Python so
208that it can process arbitrary NumPy array arguments (vectors, matrices, general
209N-D arrays) in addition to its normal arguments:
210
211.. code-block:: cpp
212
213 double my_func(int x, float y, double z);
214
215After including the ``pybind11/numpy.h`` header, this is extremely simple:
216
217.. code-block:: cpp
218
219 m.def("vectorized_func", py::vectorize(my_func));
220
221Invoking the function like below causes 4 calls to be made to ``my_func`` with
222each of the array elements. The significant advantage of this compared to
223solutions like ``numpy.vectorize()`` is that the loop over the elements runs
224entirely on the C++ side and can be crunched down into a tight, optimized loop
225by the compiler. The result is returned as a NumPy array of type
226``numpy.dtype.float64``.
227
228.. code-block:: pycon
229
230 >>> x = np.array([[1, 3],[5, 7]])
231 >>> y = np.array([[2, 4],[6, 8]])
232 >>> z = 3
233 >>> result = vectorized_func(x, y, z)
234
235The scalar argument ``z`` is transparently replicated 4 times. The input
236arrays ``x`` and ``y`` are automatically converted into the right types (they
237are of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
238``numpy.dtype.float32``, respectively)
239
240Sometimes we might want to explicitly exclude an argument from the vectorization
241because it makes little sense to wrap it in a NumPy array. For instance,
242suppose the function signature was
243
244.. code-block:: cpp
245
246 double my_func(int x, float y, my_custom_type *z);
247
248This can be done with a stateful Lambda closure:
249
250.. code-block:: cpp
251
252 // Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
253 m.def("vectorized_func",
254 [](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) {
255 auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
256 return py::vectorize(stateful_closure)(x, y);
257 }
258 );
259
260In cases where the computation is too complicated to be reduced to
261``vectorize``, it will be necessary to create and access the buffer contents
262manually. The following snippet contains a complete example that shows how this
263works (the code is somewhat contrived, since it could have been done more
264simply using ``vectorize``).
265
266.. code-block:: cpp
267
268 #include <pybind11/pybind11.h>
269 #include <pybind11/numpy.h>
270
271 namespace py = pybind11;
272
273 py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
274 auto buf1 = input1.request(), buf2 = input2.request();
275
276 if (buf1.ndim != 1 || buf2.ndim != 1)
277 throw std::runtime_error("Number of dimensions must be one");
278
279 if (buf1.size != buf2.size)
280 throw std::runtime_error("Input shapes must match");
281
282 /* No pointer is passed, so NumPy will allocate the buffer */
283 auto result = py::array_t<double>(buf1.size);
284
285 auto buf3 = result.request();
286
287 double *ptr1 = (double *) buf1.ptr,
288 *ptr2 = (double *) buf2.ptr,
289 *ptr3 = (double *) buf3.ptr;
290
291 for (size_t idx = 0; idx < buf1.shape[0]; idx++)
292 ptr3[idx] = ptr1[idx] + ptr2[idx];
293
294 return result;
295 }
296
297 PYBIND11_PLUGIN(test) {
298 py::module m("test");
299 m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
300 return m.ptr();
301 }
302
303.. seealso::
304
305 The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
306 example that demonstrates using :func:`vectorize` in more detail.
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300307
308Direct access
309=============
310
311For performance reasons, particularly when dealing with very large arrays, it
312is often desirable to directly access array elements without internal checking
313of dimensions and bounds on every access when indices are known to be already
314valid. To avoid such checks, the ``array`` class and ``array_t<T>`` template
315class offer an unchecked proxy object that can be used for this unchecked
316access through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods,
317where ``N`` gives the required dimensionality of the array:
318
319.. code-block:: cpp
320
321 m.def("sum_3d", [](py::array_t<double> x) {
322 auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
323 double sum = 0;
324 for (size_t i = 0; i < r.shape(0); i++)
325 for (size_t j = 0; j < r.shape(1); j++)
326 for (size_t k = 0; k < r.shape(2); k++)
327 sum += r(i, j, k);
328 return sum;
329 });
330 m.def("increment_3d", [](py::array_t<double> x) {
331 auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
332 for (size_t i = 0; i < r.shape(0); i++)
333 for (size_t j = 0; j < r.shape(1); j++)
334 for (size_t k = 0; k < r.shape(2); k++)
335 r(i, j, k) += 1.0;
336 }, py::arg().noconvert());
337
338To obtain the proxy from an ``array`` object, you must specify both the data
339type and number of dimensions as template arguments, such as ``auto r =
340myarray.mutable_unchecked<float, 2>()``.
341
Jason Rhinelander773339f2017-03-20 17:48:38 -0300342If the number of dimensions is not known at compile time, you can omit the
343dimensions template parameter (i.e. calling ``arr_t.unchecked()`` or
344``arr.unchecked<T>()``. This will give you a proxy object that works in the
345same way, but results in less optimizable code and thus a small efficiency
346loss in tight loops.
347
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300348Note that the returned proxy object directly references the array's data, and
349only reads its shape, strides, and writeable flag when constructed. You must
350take care to ensure that the referenced array is not destroyed or reshaped for
351the duration of the returned object, typically by limiting the scope of the
352returned instance.
353
Jason Rhinelander773339f2017-03-20 17:48:38 -0300354The returned proxy object supports some of the same methods as ``py::array`` so
355that it can be used as a drop-in replacement for some existing, index-checked
356uses of ``py::array``:
357
358- ``r.ndim()`` returns the number of dimensions
359
360- ``r.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)``` returns a pointer to
361 the ``const T`` or ``T`` data, respectively, at the given indices. The
362 latter is only available to proxies obtained via ``a.mutable_unchecked()``.
363
364- ``itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.
365
366- ``ndim()`` returns the number of dimensions.
367
368- ``shape(n)`` returns the size of dimension ``n``
369
370- ``size()`` returns the total number of elements (i.e. the product of the shapes).
371
372- ``nbytes()`` returns the number of bytes used by the referenced elements
373 (i.e. ``itemsize()`` times ``size()``).
374
Jason Rhinelander423a49b2017-03-19 01:14:23 -0300375.. seealso::
376
377 The file :file:`tests/test_numpy_array.cpp` contains additional examples
378 demonstrating the use of this feature.