Blame - docs/advanced/pycpp/numpy.rst - platform/external/python/pybind11

blob: d06370611704c8e53271bd133ecaaad37613eab0 [file] [log] [blame]

Dean Moldovan	67b52d8	2016-10-16 19:12:43 +0200	[diff] [blame]	1	.. _numpy:
				2
				3	NumPy
				4	#####
				5
				6	Buffer protocol
				7	===============
				8
				9	Python supports an extremely general and convenient approach for exchanging
				10	data between plugin libraries. Types can expose a buffer view [#f2]_, which
				11	provides fast direct access to the raw internal data representation. Suppose we
				12	want to bind the following simplistic Matrix class:
				13
				14	.. code-block:: cpp
				15
				16	class Matrix {
				17	public:
				18	Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
				19	m_data = new float[rows*cols];
				20	}
				21	float *data() { return m_data; }
				22	size_t rows() const { return m_rows; }
				23	size_t cols() const { return m_cols; }
				24	private:
				25	size_t m_rows, m_cols;
				26	float *m_data;
				27	};
				28
				29	The following binding code exposes the ``Matrix`` contents as a buffer object,
				30	making it possible to cast Matrices into NumPy arrays. It is even possible to
				31	completely avoid copy operations with Python expressions like
				32	``np.array(matrix_instance, copy = False)``.
				33
				34	.. code-block:: cpp
				35
Wenzel Jakob	1d1f81b	2016-12-16 15:00:46 +0100	[diff] [blame]	36	py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
Dean Moldovan	67b52d8	2016-10-16 19:12:43 +0200	[diff] [blame]	37	.def_buffer([](Matrix &m) -> py::buffer_info {
				38	return py::buffer_info(
				39	m.data(), /* Pointer to buffer */
				40	sizeof(float), /* Size of one scalar */
				41	py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
				42	2, /* Number of dimensions */
				43	{ m.rows(), m.cols() }, /* Buffer dimensions */
Jason Rhinelander	b68959e	2017-04-06 18:16:35 -0400	[diff] [blame^]	44	{ sizeof(float) * m.rows(), /* Strides (in bytes) for each index */
				45	sizeof(float) }
Dean Moldovan	67b52d8	2016-10-16 19:12:43 +0200	[diff] [blame]	46	);
				47	});
				48
Wenzel Jakob	1d1f81b	2016-12-16 15:00:46 +0100	[diff] [blame]	49	Supporting the buffer protocol in a new type involves specifying the special
				50	``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
				51	``def_buffer()`` method with a lambda function that creates a
				52	``py::buffer_info`` description record on demand describing a given matrix
				53	instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
				54	specification.
Dean Moldovan	67b52d8	2016-10-16 19:12:43 +0200	[diff] [blame]	55
				56	.. code-block:: cpp
				57
				58	struct buffer_info {
				59	void *ptr;
				60	size_t itemsize;
				61	std::string format;
				62	int ndim;
				63	std::vector<size_t> shape;
Cris Luengo	d400f60	2017-04-05 16:13:04 -0600	[diff] [blame]	64	std::vector<ssize_t> strides;
Dean Moldovan	67b52d8	2016-10-16 19:12:43 +0200	[diff] [blame]	65	};
				66
				67	To create a C++ function that can take a Python buffer object as an argument,
				68	simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
				69	in a great variety of configurations, hence some safety checks are usually
				70	necessary in the function body. Below, you can see an basic example on how to
				71	define a custom constructor for the Eigen double precision matrix
				72	(``Eigen::MatrixXd``) type, which supports initialization from compatible
				73	buffer objects (e.g. a NumPy matrix).
				74
				75	.. code-block:: cpp
				76
				77	/* Bind MatrixXd (or some other Eigen type) to Python */
				78	typedef Eigen::MatrixXd Matrix;
				79
				80	typedef Matrix::Scalar Scalar;
				81	constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;
				82
Wenzel Jakob	1d1f81b	2016-12-16 15:00:46 +0100	[diff] [blame]	83	py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
Dean Moldovan	67b52d8	2016-10-16 19:12:43 +0200	[diff] [blame]	84	.def("__init__", [](Matrix &m, py::buffer b) {
				85	typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;
				86
				87	/* Request a buffer descriptor from Python */
				88	py::buffer_info info = b.request();
				89
				90	/* Some sanity checks ... */
				91	if (info.format != py::format_descriptor<Scalar>::format())
				92	throw std::runtime_error("Incompatible format: expected a double array!");
				93
				94	if (info.ndim != 2)
				95	throw std::runtime_error("Incompatible buffer dimension!");
				96
				97	auto strides = Strides(
				98	info.strides[rowMajor ? 0 : 1] / sizeof(Scalar),
				99	info.strides[rowMajor ? 1 : 0] / sizeof(Scalar));
				100
				101	auto map = Eigen::Map<Matrix, 0, Strides>(
				102	static_cat<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);
				103
				104	new (&m) Matrix(map);
				105	});
				106
				107	For reference, the ``def_buffer()`` call for this Eigen data type should look
				108	as follows:
				109
				110	.. code-block:: cpp
				111
				112	.def_buffer([](Matrix &m) -> py::buffer_info {
				113	return py::buffer_info(
				114	m.data(), /* Pointer to buffer */
				115	sizeof(Scalar), /* Size of one scalar */
				116	/* Python struct-style format descriptor */
				117	py::format_descriptor<Scalar>::format(),
				118	/* Number of dimensions */
				119	2,
				120	/* Buffer dimensions */
Jason Rhinelander	b68959e	2017-04-06 18:16:35 -0400	[diff] [blame^]	121	{ m.rows(), m.cols() },
Dean Moldovan	67b52d8	2016-10-16 19:12:43 +0200	[diff] [blame]	122	/* Strides (in bytes) for each index */
Jason Rhinelander	b68959e	2017-04-06 18:16:35 -0400	[diff] [blame^]	123	{ sizeof(Scalar) * (rowMajor ? m.cols() : 1),
				124	sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
Dean Moldovan	67b52d8	2016-10-16 19:12:43 +0200	[diff] [blame]	125	);
				126	})
				127
				128	For a much easier approach of binding Eigen types (although with some
				129	limitations), refer to the section on :doc:`/advanced/cast/eigen`.
				130
				131	.. seealso::
				132
				133	The file :file:`tests/test_buffers.cpp` contains a complete example
				134	that demonstrates using the buffer protocol with pybind11 in more detail.
				135
				136	.. [#f2] http://docs.python.org/3/c-api/buffer.html
				137
				138	Arrays
				139	======
				140
				141	By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
				142	restrict the function so that it only accepts NumPy arrays (rather than any
				143	type of Python object satisfying the buffer protocol).
				144
				145	In many situations, we want to define a function which only accepts a NumPy
				146	array of a certain data type. This is possible via the ``py::array_t<T>``
				147	template. For instance, the following function requires the argument to be a
				148	NumPy array containing double precision values.
				149
				150	.. code-block:: cpp
				151
				152	void f(py::array_t<double> array);
				153
				154	When it is invoked with a different type (e.g. an integer or a list of
				155	integers), the binding code will attempt to cast the input into a NumPy array
				156	of the requested type. Note that this feature requires the
Jason Rhinelander	1249452	2017-01-31 11:28:29 -0500	[diff] [blame]	157	:file:`pybind11/numpy.h` header to be included.
Dean Moldovan	67b52d8	2016-10-16 19:12:43 +0200	[diff] [blame]	158
				159	Data in NumPy arrays is not guaranteed to packed in a dense manner;
				160	furthermore, entries can be separated by arbitrary column and row strides.
				161	Sometimes, it can be useful to require a function to only accept dense arrays
				162	using either the C (row-major) or Fortran (column-major) ordering. This can be
				163	accomplished via a second template argument with values ``py::array::c_style``
				164	or ``py::array::f_style``.
				165
				166	.. code-block:: cpp
				167
				168	void f(py::array_t<double, py::array::c_style \| py::array::forcecast> array);
				169
				170	The ``py::array::forcecast`` argument is the default value of the second
				171	template parameter, and it ensures that non-conforming arguments are converted
				172	into an array satisfying the specified requirements instead of trying the next
				173	function overload.
				174
				175	Structured types
				176	================
				177
Jason Rhinelander	f7f5bc8	2017-01-31 11:00:15 -0500	[diff] [blame]	178	In order for ``py::array_t`` to work with structured (record) types, we first
				179	need to register the memory layout of the type. This can be done via
				180	``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
				181	expects the type followed by field names:
Dean Moldovan	67b52d8	2016-10-16 19:12:43 +0200	[diff] [blame]	182
				183	.. code-block:: cpp
				184
				185	struct A {
				186	int x;
				187	double y;
				188	};
				189
				190	struct B {
				191	int z;
				192	A a;
				193	};
				194
Jason Rhinelander	f7f5bc8	2017-01-31 11:00:15 -0500	[diff] [blame]	195	// ...
				196	PYBIND11_PLUGIN(test) {
				197	// ...
Dean Moldovan	67b52d8	2016-10-16 19:12:43 +0200	[diff] [blame]	198
Jason Rhinelander	f7f5bc8	2017-01-31 11:00:15 -0500	[diff] [blame]	199	PYBIND11_NUMPY_DTYPE(A, x, y);
				200	PYBIND11_NUMPY_DTYPE(B, z, a);
				201	/* now both A and B can be used as template arguments to py::array_t */
				202	}
Dean Moldovan	67b52d8	2016-10-16 19:12:43 +0200	[diff] [blame]	203
				204	Vectorizing functions
				205	=====================
				206
				207	Suppose we want to bind a function with the following signature to Python so
				208	that it can process arbitrary NumPy array arguments (vectors, matrices, general
				209	N-D arrays) in addition to its normal arguments:
				210
				211	.. code-block:: cpp
				212
				213	double my_func(int x, float y, double z);
				214
				215	After including the ``pybind11/numpy.h`` header, this is extremely simple:
				216
				217	.. code-block:: cpp
				218
				219	m.def("vectorized_func", py::vectorize(my_func));
				220
				221	Invoking the function like below causes 4 calls to be made to ``my_func`` with
				222	each of the array elements. The significant advantage of this compared to
				223	solutions like ``numpy.vectorize()`` is that the loop over the elements runs
				224	entirely on the C++ side and can be crunched down into a tight, optimized loop
				225	by the compiler. The result is returned as a NumPy array of type
				226	``numpy.dtype.float64``.
				227
				228	.. code-block:: pycon
				229
				230	>>> x = np.array([[1, 3],[5, 7]])
				231	>>> y = np.array([[2, 4],[6, 8]])
				232	>>> z = 3
				233	>>> result = vectorized_func(x, y, z)
				234
				235	The scalar argument ``z`` is transparently replicated 4 times. The input
				236	arrays ``x`` and ``y`` are automatically converted into the right types (they
				237	are of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
				238	``numpy.dtype.float32``, respectively)
				239
				240	Sometimes we might want to explicitly exclude an argument from the vectorization
				241	because it makes little sense to wrap it in a NumPy array. For instance,
				242	suppose the function signature was
				243
				244	.. code-block:: cpp
				245
				246	double my_func(int x, float y, my_custom_type *z);
				247
				248	This can be done with a stateful Lambda closure:
				249
				250	.. code-block:: cpp
				251
				252	// Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
				253	m.def("vectorized_func",
				254	[](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) {
				255	auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
				256	return py::vectorize(stateful_closure)(x, y);
				257	}
				258	);
				259
				260	In cases where the computation is too complicated to be reduced to
				261	``vectorize``, it will be necessary to create and access the buffer contents
				262	manually. The following snippet contains a complete example that shows how this
				263	works (the code is somewhat contrived, since it could have been done more
				264	simply using ``vectorize``).
				265
				266	.. code-block:: cpp
				267
				268	#include <pybind11/pybind11.h>
				269	#include <pybind11/numpy.h>
				270
				271	namespace py = pybind11;
				272
				273	py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
				274	auto buf1 = input1.request(), buf2 = input2.request();
				275
				276	if (buf1.ndim != 1 \|\| buf2.ndim != 1)
				277	throw std::runtime_error("Number of dimensions must be one");
				278
				279	if (buf1.size != buf2.size)
				280	throw std::runtime_error("Input shapes must match");
				281
				282	/* No pointer is passed, so NumPy will allocate the buffer */
				283	auto result = py::array_t<double>(buf1.size);
				284
				285	auto buf3 = result.request();
				286
				287	double ptr1 = (double ) buf1.ptr,
				288	ptr2 = (double ) buf2.ptr,
				289	ptr3 = (double ) buf3.ptr;
				290
				291	for (size_t idx = 0; idx < buf1.shape[0]; idx++)
				292	ptr3[idx] = ptr1[idx] + ptr2[idx];
				293
				294	return result;
				295	}
				296
				297	PYBIND11_PLUGIN(test) {
				298	py::module m("test");
				299	m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
				300	return m.ptr();
				301	}
				302
				303	.. seealso::
				304
				305	The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
				306	example that demonstrates using :func:`vectorize` in more detail.
Jason Rhinelander	423a49b	2017-03-19 01:14:23 -0300	[diff] [blame]	307
				308	Direct access
				309	=============
				310
				311	For performance reasons, particularly when dealing with very large arrays, it
				312	is often desirable to directly access array elements without internal checking
				313	of dimensions and bounds on every access when indices are known to be already
				314	valid. To avoid such checks, the ``array`` class and ``array_t<T>`` template
				315	class offer an unchecked proxy object that can be used for this unchecked
				316	access through the ``unchecked<N>`` and ``mutable_unchecked<N>`` methods,
				317	where ``N`` gives the required dimensionality of the array:
				318
				319	.. code-block:: cpp
				320
				321	m.def("sum_3d", [](py::array_t<double> x) {
				322	auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
				323	double sum = 0;
				324	for (size_t i = 0; i < r.shape(0); i++)
				325	for (size_t j = 0; j < r.shape(1); j++)
				326	for (size_t k = 0; k < r.shape(2); k++)
				327	sum += r(i, j, k);
				328	return sum;
				329	});
				330	m.def("increment_3d", [](py::array_t<double> x) {
				331	auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
				332	for (size_t i = 0; i < r.shape(0); i++)
				333	for (size_t j = 0; j < r.shape(1); j++)
				334	for (size_t k = 0; k < r.shape(2); k++)
				335	r(i, j, k) += 1.0;
				336	}, py::arg().noconvert());
				337
				338	To obtain the proxy from an ``array`` object, you must specify both the data
				339	type and number of dimensions as template arguments, such as ``auto r =
				340	myarray.mutable_unchecked<float, 2>()``.
				341
Jason Rhinelander	773339f	2017-03-20 17:48:38 -0300	[diff] [blame]	342	If the number of dimensions is not known at compile time, you can omit the
				343	dimensions template parameter (i.e. calling ``arr_t.unchecked()`` or
				344	``arr.unchecked<T>()``. This will give you a proxy object that works in the
				345	same way, but results in less optimizable code and thus a small efficiency
				346	loss in tight loops.
				347
Jason Rhinelander	423a49b	2017-03-19 01:14:23 -0300	[diff] [blame]	348	Note that the returned proxy object directly references the array's data, and
				349	only reads its shape, strides, and writeable flag when constructed. You must
				350	take care to ensure that the referenced array is not destroyed or reshaped for
				351	the duration of the returned object, typically by limiting the scope of the
				352	returned instance.
				353
Jason Rhinelander	773339f	2017-03-20 17:48:38 -0300	[diff] [blame]	354	The returned proxy object supports some of the same methods as ``py::array`` so
				355	that it can be used as a drop-in replacement for some existing, index-checked
				356	uses of ``py::array``:
				357
				358	- ``r.ndim()`` returns the number of dimensions
				359
				360	- ``r.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)``` returns a pointer to
				361	the ``const T`` or ``T`` data, respectively, at the given indices. The
				362	latter is only available to proxies obtained via ``a.mutable_unchecked()``.
				363
				364	- ``itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.
				365
				366	- ``ndim()`` returns the number of dimensions.
				367
				368	- ``shape(n)`` returns the size of dimension ``n``
				369
				370	- ``size()`` returns the total number of elements (i.e. the product of the shapes).
				371
				372	- ``nbytes()`` returns the number of bytes used by the referenced elements
				373	(i.e. ``itemsize()`` times ``size()``).
				374
Jason Rhinelander	423a49b	2017-03-19 01:14:23 -0300	[diff] [blame]	375	.. seealso::
				376
				377	The file :file:`tests/test_numpy_array.cpp` contains additional examples
				378	demonstrating the use of this feature.