.. _numpy:

NumPy
#####

Buffer protocol
===============

Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view [#f2]_, which
provides fast direct access to the raw internal data representation. Suppose we
want to bind the following simplistic Matrix class:

.. code-block:: cpp

    class Matrix {
    public:
        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
            m_data = new float[rows*cols];
        }
        float *data() { return m_data; }
        size_t rows() const { return m_rows; }
        size_t cols() const { return m_cols; }
    private:
        size_t m_rows, m_cols;
        float *m_data;
    };

The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast Matrices into NumPy arrays. It is even possible to
completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy = False)``.

.. code-block:: cpp

    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
        .def_buffer([](Matrix &m) -> py::buffer_info {
            return py::buffer_info(
                m.data(),                               /* Pointer to buffer */
                sizeof(float),                          /* Size of one scalar */
                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
                2,                                      /* Number of dimensions */
                { m.rows(), m.cols() },                 /* Buffer dimensions */
                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index, */
                  sizeof(float) }                       /* assuming row-major storage */
            );
        });

Supporting the buffer protocol in a new type involves specifying the special
``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
``def_buffer()`` method with a lambda function that creates a
``py::buffer_info`` description record on demand describing a given matrix
instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
specification.

.. code-block:: cpp

    struct buffer_info {
        void *ptr;
        ssize_t itemsize;
        std::string format;
        ssize_t ndim;
        std::vector<ssize_t> shape;
        std::vector<ssize_t> strides;
    };

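The same fields can be inspected from the Python side through the built-in
``memoryview`` type. The following is a minimal sketch that uses only the
standard library and a hypothetical 2 x 3 matrix of C floats stored row by row;
the variable names are illustrative and not part of pybind11.

.. code-block:: python

    import struct

    rows, cols = 2, 3  # hypothetical matrix size

    # Six native C floats packed contiguously, row by row.
    raw = struct.pack("6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)

    # Reinterpret the bytes as a 2-D array of floats without copying.
    view = memoryview(raw).cast("f", shape=[rows, cols])

    itemsize = view.itemsize  # corresponds to buffer_info::itemsize
    fmt = view.format         # corresponds to buffer_info::format
    ndim = view.ndim          # corresponds to buffer_info::ndim
    shape = view.shape        # corresponds to buffer_info::shape
    strides = view.strides    # corresponds to buffer_info::strides (bytes)

Here the row stride equals ``cols * itemsize``, because stepping to the next
row skips one full row of ``cols`` elements.
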
To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body. Below, you can see a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy matrix).

.. code-block:: cpp

    /* Bind MatrixXd (or some other Eigen type) to Python */
    typedef Eigen::MatrixXd Matrix;

    typedef Matrix::Scalar Scalar;
    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;

    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
        .def("__init__", [](Matrix &m, py::buffer b) {
            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;

            /* Request a buffer descriptor from Python */
            py::buffer_info info = b.request();

            /* Some sanity checks ... */
            if (info.format != py::format_descriptor<Scalar>::format())
                throw std::runtime_error("Incompatible format: expected a double array!");

            if (info.ndim != 2)
                throw std::runtime_error("Incompatible buffer dimension!");

            auto strides = Strides(
                info.strides[rowMajor ? 0 : 1] / (py::ssize_t)sizeof(Scalar),
                info.strides[rowMajor ? 1 : 0] / (py::ssize_t)sizeof(Scalar));

            auto map = Eigen::Map<Matrix, 0, Strides>(
                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);

            new (&m) Matrix(map);
        });

For reference, the ``def_buffer()`` call for this Eigen data type should look
as follows:

.. code-block:: cpp

    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                                /* Pointer to buffer */
            sizeof(Scalar),                          /* Size of one scalar */
            py::format_descriptor<Scalar>::format(), /* Python struct-style format descriptor */
            2,                                       /* Number of dimensions */
            { m.rows(), m.cols() },                  /* Buffer dimensions */
            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
                                                     /* Strides (in bytes) for each index */
        );
    })

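The stride expressions above follow a general rule for dense arrays: the
fastest-varying dimension has a stride of one element, and each further
dimension multiplies the stride by the extent of the previous one. A small
sketch of that rule in plain Python is shown here; ``byte_strides`` is a
hypothetical helper written for this illustration and is not a pybind11 or
NumPy function.

.. code-block:: python

    def byte_strides(shape, itemsize, row_major=True):
        """Return the byte stride of each dimension of a dense array."""
        strides = [0] * len(shape)
        step = itemsize
        # The fastest-varying dimension is the last one for row-major (C)
        # order and the first one for column-major (Fortran) order.
        dims = range(len(shape))
        order = reversed(dims) if row_major else dims
        for d in order:
            strides[d] = step
            step *= shape[d]
        return tuple(strides)


    rows, cols, itemsize = 3, 4, 8  # hypothetical 3 x 4 matrix of doubles
    c_strides = byte_strides((rows, cols), itemsize, row_major=True)
    f_strides = byte_strides((rows, cols), itemsize, row_major=False)

For this 3 x 4 example the row-major result is ``(32, 8)``, that is
``(itemsize * cols, itemsize)``, and the column-major result is ``(8, 24)``,
that is ``(itemsize, itemsize * rows)``.
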
For a much easier approach of binding Eigen types (although with some
limitations), refer to the section on :doc:`/advanced/cast/eigen`.

.. seealso::

    The file :file:`tests/test_buffers.cpp` contains a complete example
    that demonstrates using the buffer protocol with pybind11 in more detail.

.. [#f2] http://docs.python.org/3/c-api/buffer.html

Arrays
======

By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
restrict the function so that it only accepts NumPy arrays (rather than any
type of Python object satisfying the buffer protocol).

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_t<T>``
template. For instance, the following function requires the argument to be a
NumPy array containing double precision values.

.. code-block:: cpp

    void f(py::array_t<double> array);

When it is invoked with a different type (e.g. an integer or a list of
integers), the binding code will attempt to cast the input into a NumPy array
of the requested type. Note that this feature requires the
:file:`pybind11/numpy.h` header to be included.

Data in NumPy arrays is not guaranteed to be packed in a dense manner;
furthermore, entries can be separated by arbitrary column and row strides.
Sometimes, it can be useful to require a function to only accept dense arrays
using either the C (row-major) or Fortran (column-major) ordering. This can be
accomplished via a second template argument with values ``py::array::c_style``
or ``py::array::f_style``.

.. code-block:: cpp

    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);

The ``py::array::forcecast`` argument is the default value of the second
template parameter, and it ensures that non-conforming arguments are converted
into an array satisfying the specified requirements instead of trying the next
function overload.

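To illustrate what "not packed in a dense manner" means, the sketch below
builds a strided view with the standard-library ``memoryview`` type instead of
NumPy. It is only an analogy: the dense copy at the end stands in for the kind
of conversion that ``forcecast`` requests, and none of the names come from
pybind11.

.. code-block:: python

    from array import array

    data = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    dense = memoryview(data)

    # Taking every second element creates a view onto the same memory whose
    # elements are two items apart, so the view is no longer densely packed.
    strided = dense[::2]

    dense_is_packed = dense.c_contiguous
    strided_is_packed = strided.c_contiguous

    # A dense copy of the strided view, comparable to a forced conversion.
    packed_copy = array("d", strided.tolist())

A function that walks raw memory with a fixed step of one item would read the
wrong elements from ``strided``, which is why requiring a dense layout (or
converting to one) can be useful.
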
Structured types
================

In order for ``py::array_t`` to work with structured (record) types, we first
need to register the memory layout of the type. This can be done via the
``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
expects the type followed by field names:

.. code-block:: cpp

    struct A {
        int x;
        double y;
    };

    struct B {
        int z;
        A a;
    };

    // ...
    PYBIND11_MODULE(test, m) {
        // ...

        PYBIND11_NUMPY_DTYPE(A, x, y);
        PYBIND11_NUMPY_DTYPE(B, z, a);
        /* now both A and B can be used as template arguments to py::array_t */
    }

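To make the notion of a "memory layout" more concrete, the following sketch
mirrors the two structs with the standard ``ctypes`` module and lists the name,
byte offset and size of every field. This is an analogy under the assumption
that ``ctypes`` follows the platform's C layout rules; it does not show how
pybind11 implements the macro.

.. code-block:: python

    import ctypes


    class A(ctypes.Structure):
        _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_double)]


    class B(ctypes.Structure):
        _fields_ = [("z", ctypes.c_int), ("a", A)]


    # Each entry is (field name, byte offset, size in bytes).
    layout_a = [(name, getattr(A, name).offset, getattr(A, name).size)
                for name, _ in A._fields_]
    layout_b = [(name, getattr(B, name).offset, getattr(B, name).size)
                for name, _ in B._fields_]

On many platforms padding is inserted after ``x`` so that ``y`` is suitably
aligned, which is why the offsets cannot simply be derived by adding up field
sizes.
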
The structure should consist of fundamental arithmetic types, ``std::complex``,
previously registered substructures, and arrays of any of the above. Both C++
arrays and ``std::array`` are supported. While there is a static assertion to
prevent many types of unsupported structures, it is still the user's
responsibility to use only "plain" structures that can be safely manipulated as
raw memory without violating invariants.

Vectorizing functions
=====================

Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:

.. code-block:: cpp

    double my_func(int x, float y, double z);

After including the ``pybind11/numpy.h`` header, this is extremely simple:

.. code-block:: cpp

    m.def("vectorized_func", py::vectorize(my_func));

Invoking the function as shown below causes 4 calls to be made to ``my_func``,
one for each of the array elements. The significant advantage of this compared
to solutions like ``numpy.vectorize()`` is that the loop over the elements runs
entirely on the C++ side and can be crunched down into a tight, optimized loop
by the compiler. The result is returned as a NumPy array of type
``numpy.dtype.float64``.

.. code-block:: pycon

    >>> x = np.array([[1, 3],[5, 7]])
    >>> y = np.array([[2, 4],[6, 8]])
    >>> z = 3
    >>> result = vectorized_func(x, y, z)

The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
are of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
``numpy.dtype.float32``, respectively).

.. note::

    Only arithmetic, complex, and POD types passed by value or by ``const &``
    reference are vectorized; all other arguments are passed through as-is.
    Functions taking rvalue reference arguments cannot be vectorized.

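As a rough mental model of the call pattern described above, the following
pure-Python sketch applies a scalar function to matching elements of two 2 x 2
nested lists while replicating the scalar ``z``. The body of ``my_func`` is
hypothetical (the text gives only its signature), and this model handles just
this one case; the real ``py::vectorize`` runs its loop in C++ and accepts
general N-D arrays.

.. code-block:: python

    calls = []


    def my_func(x, y, z):
        """Hypothetical scalar function; only its signature is given above."""
        calls.append((x, y, z))
        return x + y * z


    def vectorized_func(x, y, z):
        """Apply my_func to matching elements of two 2-D lists and a scalar."""
        return [[my_func(int(xe), float(ye), float(z))
                 for xe, ye in zip(xr, yr)]
                for xr, yr in zip(x, y)]


    result = vectorized_func([[1, 3], [5, 7]], [[2, 4], [6, 8]], 3)

After the call, ``calls`` holds four entries, one per element pair, and each of
them received the same value of ``z``.
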
In cases where the computation is too complicated to be reduced to
``vectorize``, it will be necessary to create and access the buffer contents
manually. The following snippet contains a complete example that shows how this
works (the code is somewhat contrived, since it could have been done more
simply using ``vectorize``).

.. code-block:: cpp

    #include <stdexcept>
    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>

    namespace py = pybind11;

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        auto buf1 = input1.request(), buf2 = input2.request();

        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");

        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");

        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);

        auto buf3 = result.request();

        double *ptr1 = (double *) buf1.ptr,
               *ptr2 = (double *) buf2.ptr,
               *ptr3 = (double *) buf3.ptr;

        /* Signed index to match the signed shape entries */
        for (py::ssize_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];

        return result;
    }

    PYBIND11_MODULE(test, m) {
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
    }

.. seealso::

    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
    example that demonstrates using :func:`vectorize` in more detail.

Direct access
=============

For performance reasons, particularly when dealing with very large arrays, it
is often desirable to directly access array elements without internal checking
of dimensions and bounds on every access when indices are known to be already
valid. To avoid such checks, the ``array`` class and ``array_t<T>`` template
class offer an unchecked proxy object that can be used for this unchecked
access through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods,
where ``N`` gives the required dimensionality of the array:

.. code-block:: cpp

    m.def("sum_3d", [](py::array_t<double> x) {
        auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
        double sum = 0;
        for (ssize_t i = 0; i < r.shape(0); i++)
            for (ssize_t j = 0; j < r.shape(1); j++)
                for (ssize_t k = 0; k < r.shape(2); k++)
                    sum += r(i, j, k);
        return sum;
    });
    m.def("increment_3d", [](py::array_t<double> x) {
        auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
        for (ssize_t i = 0; i < r.shape(0); i++)
            for (ssize_t j = 0; j < r.shape(1); j++)
                for (ssize_t k = 0; k < r.shape(2); k++)
                    r(i, j, k) += 1.0;
    }, py::arg().noconvert());

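Conceptually, unchecked element access of this kind reduces to an offset
computation from the array's strides, with no validation of the indices. The
sketch below models that arithmetic in plain Python for a hypothetical dense
2 x 3 x 4 array of doubles; it illustrates the indexing idea only and is not
pybind11's implementation.

.. code-block:: python

    import struct

    shape = (2, 3, 4)  # hypothetical 3-D array of doubles
    itemsize = struct.calcsize("d")
    # Dense row-major byte strides for the shape above.
    strides = (shape[1] * shape[2] * itemsize, shape[2] * itemsize, itemsize)

    values = [float(n) for n in range(shape[0] * shape[1] * shape[2])]
    raw = struct.pack(f"{len(values)}d", *values)


    def element(i, j, k):
        """Read element (i, j, k) from memory using only the strides."""
        offset = i * strides[0] + j * strides[1] + k * strides[2]
        return struct.unpack_from("d", raw, offset)[0]


    total = sum(element(i, j, k)
                for i in range(shape[0])
                for j in range(shape[1])
                for k in range(shape[2]))

Because ``element`` never compares its indices against ``shape``, keeping them
in range is entirely the caller's job, which is the same trade-off the
unchecked proxies make.
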
To obtain the proxy from an ``array`` object, you must specify both the data
type and number of dimensions as template arguments, such as ``auto r =
myarray.mutable_unchecked<float, 2>()``.

If the number of dimensions is not known at compile time, you can omit the
dimensions template parameter (i.e. calling ``arr_t.unchecked()`` or
``arr.unchecked<T>()``). This will give you a proxy object that works in the
same way, but results in less optimizable code and thus a small efficiency
loss in tight loops.

Note that the returned proxy object directly references the array's data, and
only reads its shape, strides, and writeable flag when constructed. You must
take care to ensure that the referenced array is not destroyed or reshaped
during the lifetime of the returned object, typically by limiting the scope of
the returned instance.

The returned proxy object supports some of the same methods as ``py::array`` so
that it can be used as a drop-in replacement for some existing, index-checked
uses of ``py::array``:

- ``r.ndim()`` returns the number of dimensions.

- ``r.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)`` return a pointer to
  the ``const T`` or ``T`` data, respectively, at the given indices. The
  latter is only available to proxies obtained via ``a.mutable_unchecked()``.

- ``r.itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.

- ``r.shape(n)`` returns the size of dimension ``n``.

- ``r.size()`` returns the total number of elements (i.e. the product of the
  shapes).

- ``r.nbytes()`` returns the number of bytes used by the referenced elements
  (i.e. ``itemsize()`` times ``size()``).

.. seealso::

    The file :file:`tests/test_numpy_array.cpp` contains additional examples
    demonstrating the use of this feature.