.. _numpy:

NumPy
#####

Buffer protocol
===============

Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view [#f2]_, which
provides fast direct access to the raw internal data representation. Suppose we
want to bind the following simplistic Matrix class:

.. code-block:: cpp

    class Matrix {
    public:
        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
            m_data = new float[rows*cols];
        }
        float *data() { return m_data; }
        size_t rows() const { return m_rows; }
        size_t cols() const { return m_cols; }
    private:
        size_t m_rows, m_cols;
        float *m_data; /* row-major: element (i, j) is m_data[i * m_cols + j] */
    };

The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast matrices into NumPy arrays. It is even possible to
completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy=False)``.

.. code-block:: cpp

    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
        .def_buffer([](Matrix &m) -> py::buffer_info {
            return py::buffer_info(
                m.data(),                               /* Pointer to buffer */
                sizeof(float),                          /* Size of one scalar */
                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
                2,                                      /* Number of dimensions */
                { m.rows(), m.cols() },                 /* Buffer dimensions */
                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index */
                  sizeof(float) }
            );
        });

Supporting the buffer protocol in a new type involves specifying the special
``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
``def_buffer()`` method with a lambda function that creates a
``py::buffer_info`` description record on demand describing a given matrix
instance. The contents of ``py::buffer_info`` mirror the Python buffer protocol
specification.

.. code-block:: cpp

    struct buffer_info {
        void *ptr;
        size_t itemsize;
        std::string format;
        int ndim;
        std::vector<size_t> shape;
        std::vector<size_t> strides;
    };
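
To make the meaning of ``shape`` and ``strides`` concrete, here is a minimal
standalone sketch, independent of pybind11, of how a consumer could locate
element ``(i, j)`` of a two-dimensional ``float`` buffer from such a record.
The ``BufferView`` struct and the ``element_at`` helper are hypothetical names
chosen for illustration; they only mirror the fields listed above.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <cstring>
    #include <vector>

    // Hypothetical stand-in for the buffer_info fields used here.
    struct BufferView {
        const void *ptr;                  // start of the raw data
        std::size_t itemsize;             // size of one scalar in bytes
        std::vector<std::size_t> shape;   // extent of each dimension
        std::vector<std::size_t> strides; // byte step for each index
    };

    // Read element (i, j) of a 2-D float buffer by walking the byte strides.
    float element_at(const BufferView &view, std::size_t i, std::size_t j) {
        assert(view.shape.size() == 2 && view.strides.size() == 2);
        assert(i < view.shape[0] && j < view.shape[1]);
        assert(view.itemsize == sizeof(float));
        const char *base = static_cast<const char *>(view.ptr);
        std::size_t offset = i * view.strides[0] + j * view.strides[1];
        float value;
        std::memcpy(&value, base + offset, sizeof(float)); // byte-wise read, no aliasing issues
        return value;
    }

For the row-major ``Matrix`` above, the strides are
``{sizeof(float) * cols, sizeof(float)}``: moving one row skips ``cols``
floats, while moving one column skips a single float. Swapping the two strides
(and the two extents) describes the transpose of the same memory without
copying it.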

To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body. Below, you can see a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy array).

.. code-block:: cpp

    /* Bind MatrixXd (or some other Eigen type) to Python */
    typedef Eigen::MatrixXd Matrix;

    typedef Matrix::Scalar Scalar;
    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;

    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
        .def("__init__", [](Matrix &m, py::buffer b) {
            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;

            /* Request a buffer descriptor from Python */
            py::buffer_info info = b.request();

            /* Some sanity checks ... */
            if (info.format != py::format_descriptor<Scalar>::format())
                throw std::runtime_error("Incompatible format: expected a double array!");

            if (info.ndim != 2)
                throw std::runtime_error("Incompatible buffer dimension!");

            /* Convert byte strides into (outer, inner) element strides */
            auto strides = Strides(
                info.strides[rowMajor ? 0 : 1] / sizeof(Scalar),
                info.strides[rowMajor ? 1 : 0] / sizeof(Scalar));

            /* View the existing buffer through an Eigen::Map */
            auto map = Eigen::Map<Matrix, 0, Strides>(
                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);

            /* Construct the Matrix in place, copying the mapped data */
            new (&m) Matrix(map);
        });
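
The stride conversion in this constructor is worth spelling out: the buffer
reports strides in bytes, whereas ``Eigen::Stride<Outer, Inner>`` expects them
in scalars, and the outer stride belongs to the row index for row-major storage
but to the column index for column-major storage. The standalone sketch below
reproduces that bookkeeping without Eigen or pybind11. ``ElementStrides`` and
``to_eigen_strides`` are hypothetical names, and the divisibility check is an
extra safeguard that the binding above does not perform.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <stdexcept>
    #include <vector>

    // Hypothetical pair of element strides in Eigen's (outer, inner) order.
    struct ElementStrides {
        std::size_t outer;
        std::size_t inner;
    };

    // Convert 2-D byte strides into element strides for an Eigen-style map.
    ElementStrides to_eigen_strides(const std::vector<std::size_t> &byte_strides,
                                    std::size_t itemsize, bool row_major) {
        assert(byte_strides.size() == 2 && itemsize > 0);
        for (std::size_t s : byte_strides)
            if (s % itemsize != 0)
                throw std::runtime_error("Stride is not a multiple of the item size");
        // Row-major: the row index (0) is outer. Column-major: the column index (1) is outer.
        std::size_t outer = byte_strides[row_major ? 0 : 1] / itemsize;
        std::size_t inner = byte_strides[row_major ? 1 : 0] / itemsize;
        return {outer, inner};
    }

For example, a C-contiguous 3 x 4 array of doubles has byte strides
``{32, 8}``. Mapped onto a column-major ``MatrixXd``, this gives an outer
stride of 1 and an inner stride of 4, so Eigen walks down a column by jumping
4 elements at a time.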

For reference, the ``def_buffer()`` call for this Eigen data type should look
as follows:

.. code-block:: cpp

    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                /* Pointer to buffer */
            sizeof(Scalar),          /* Size of one scalar */
            /* Python struct-style format descriptor */
            py::format_descriptor<Scalar>::format(),
            /* Number of dimensions */
            2,
            /* Buffer dimensions */
            { (size_t) m.rows(),
              (size_t) m.cols() },
            /* Strides (in bytes) for each index */
            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
        );
    })

For a much easier approach to binding Eigen types (although with some
limitations), refer to the section on :doc:`/advanced/cast/eigen`.

.. seealso::

    The file :file:`tests/test_buffers.cpp` contains a complete example
    that demonstrates using the buffer protocol with pybind11 in more detail.

.. [#f2] https://docs.python.org/3/c-api/buffer.html

Arrays
======

By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
restrict the function so that it only accepts NumPy arrays (rather than any
type of Python object satisfying the buffer protocol).

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_t<T>``
template. For instance, the following function requires the argument to be a
NumPy array containing double precision values.

.. code-block:: cpp

    void f(py::array_t<double> array);

When it is invoked with a different type (e.g. an integer or a list of
integers), the binding code will attempt to cast the input into a NumPy array
of the requested type. Note that this feature requires the
:file:`pybind11/numpy.h` header to be included.

Data in NumPy arrays is not guaranteed to be packed in a dense manner;
furthermore, entries can be separated by arbitrary column and row strides.
Sometimes, it can be useful to require a function to only accept dense arrays
using either the C (row-major) or Fortran (column-major) ordering. This can be
accomplished via a second template argument with values ``py::array::c_style``
or ``py::array::f_style``.

.. code-block:: cpp

    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);

The ``py::array::forcecast`` argument is the default value of the second
template parameter, and it ensures that non-conforming arguments are converted
into an array satisfying the specified requirements instead of trying the next
function overload.
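
What "dense" means can be stated in terms of strides. An array is
C-contiguous when the last index has a byte stride equal to the item size and
each earlier stride equals the next stride times the next extent; Fortran order
applies the same rule starting from the first index. The standalone sketch below
checks both conditions. The function names are hypothetical, and the sketch
deliberately ignores special cases such as dimensions of extent 0 or 1, for
which NumPy may relax the stride requirement.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Byte strides a dense array of this shape would have in C (row-major) order.
    std::vector<std::size_t> c_strides(const std::vector<std::size_t> &shape,
                                       std::size_t itemsize) {
        std::vector<std::size_t> strides(shape.size());
        std::size_t step = itemsize;
        for (std::size_t k = shape.size(); k-- > 0;) { // last index varies fastest
            strides[k] = step;
            step *= shape[k];
        }
        return strides;
    }

    // Byte strides a dense array of this shape would have in Fortran (column-major) order.
    std::vector<std::size_t> f_strides(const std::vector<std::size_t> &shape,
                                       std::size_t itemsize) {
        std::vector<std::size_t> strides(shape.size());
        std::size_t step = itemsize;
        for (std::size_t k = 0; k < shape.size(); ++k) { // first index varies fastest
            strides[k] = step;
            step *= shape[k];
        }
        return strides;
    }

    bool is_c_contiguous(const std::vector<std::size_t> &shape,
                         const std::vector<std::size_t> &strides, std::size_t itemsize) {
        return strides == c_strides(shape, itemsize);
    }

    bool is_f_contiguous(const std::vector<std::size_t> &shape,
                         const std::vector<std::size_t> &strides, std::size_t itemsize) {
        return strides == f_strides(shape, itemsize);
    }

A 2 x 3 array of doubles is C-contiguous with strides ``{24, 8}`` and
Fortran-contiguous with strides ``{8, 16}``. A view taking every second column
of a C-ordered 2 x 6 array also has shape 2 x 3, but its strides ``{48, 16}``
satisfy neither rule; that is the kind of input that
``py::array_t<double, py::array::c_style | py::array::forcecast>`` would have
to convert into a dense copy.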

Structured types
================

In order for ``py::array_t`` to work with structured (record) types, we first need
to register the memory layout of the type. This can be done via the ``PYBIND11_NUMPY_DTYPE``
macro, which expects the type followed by field names:

.. code-block:: cpp

    struct A {
        int x;
        double y;
    };

    struct B {
        int z;
        A a;
    };

    PYBIND11_NUMPY_DTYPE(A, x, y);
    PYBIND11_NUMPY_DTYPE(B, z, a);

    /* now both A and B can be used as template arguments to py::array_t */
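
Conceptually, registering a layout means recording, for each named field, its
type and its byte offset inside the struct, including any padding the compiler
inserts; a NumPy record dtype is built from the same kind of information. The
standalone sketch below, which does not use pybind11, prints those offsets for
``A`` and ``B`` with the standard ``offsetof`` macro. The ``print_layouts``
helper is hypothetical, and the exact offsets are implementation-defined, so
only alignment-based properties are guaranteed.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <cstdio>

    struct A {
        int x;
        double y;
    };

    struct B {
        int z;
        A a;
    };

    // Print the size of each struct and the byte offset of each field.
    // Offsets depend on the compiler's padding rules: on typical 64-bit
    // platforms A::y sits at offset 8 because double needs 8-byte alignment.
    void print_layouts() {
        std::printf("A: size=%zu x@%zu y@%zu\n",
                    sizeof(A), offsetof(A, x), offsetof(A, y));
        std::printf("B: size=%zu z@%zu a@%zu\n",
                    sizeof(B), offsetof(B, z), offsetof(B, a));
    }

The padding visible here is one reason the dtype has to be registered
explicitly rather than inferred from the field types alone: the offsets are a
property of how the C++ compiler laid out the struct.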

Vectorizing functions
=====================

Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:

.. code-block:: cpp

    double my_func(int x, float y, double z);

After including the ``pybind11/numpy.h`` header, this is extremely simple:

.. code-block:: cpp

    m.def("vectorized_func", py::vectorize(my_func));

Invoking the function as shown below causes 4 calls to be made to ``my_func``,
one for each of the array elements. The significant advantage of this compared to
solutions like ``numpy.vectorize()`` is that the loop over the elements runs
entirely on the C++ side and can be crunched down into a tight, optimized loop
by the compiler. The result is returned as a NumPy array of type
``numpy.dtype.float64``.

.. code-block:: pycon

    >>> x = np.array([[1, 3], [5, 7]])
    >>> y = np.array([[2, 4], [6, 8]])
    >>> z = 3
    >>> result = vectorized_func(x, y, z)

The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
are typically of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32``
and ``numpy.dtype.float32``, respectively).
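
Conceptually, the vectorized wrapper performs one call per element, passes the
scalar argument to every call, and converts each array element to the
corresponding parameter type. The standalone sketch below imitates that
behavior for two equally shaped, flattened inputs and one scalar. It is not
pybind11's implementation, which also has to broadcast differently shaped
inputs and allocate the NumPy result array. ``apply_elementwise`` is a
hypothetical name, and ``my_func`` is given an arbitrary body purely so that
the example runs.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical body; the text above only specifies the signature.
    double my_func(int x, float y, double z) { return x + y * z; }

    // Call f once per element, converting the int64 inputs to f's parameter
    // types and passing the same scalar z to every call.
    template <typename F>
    std::vector<double> apply_elementwise(F f, const std::vector<std::int64_t> &xs,
                                          const std::vector<std::int64_t> &ys, double z) {
        assert(xs.size() == ys.size());
        std::vector<double> out(xs.size());
        for (std::size_t i = 0; i < xs.size(); ++i)
            out[i] = f(static_cast<int>(xs[i]), static_cast<float>(ys[i]), z);
        return out;
    }

Called with the flattened arrays from the session above (``{1, 3, 5, 7}`` and
``{2, 4, 6, 8}``) and ``z = 3``, this makes four calls to ``my_func`` and
returns four ``double`` values, mirroring the ``float64`` result array.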

Sometimes we might want to explicitly exclude an argument from the vectorization
because it makes little sense to wrap it in a NumPy array. For instance,
suppose the function signature was

.. code-block:: cpp

    double my_func(int x, float y, my_custom_type *z);

This can be done with a stateful lambda closure:

.. code-block:: cpp

    // Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
    m.def("vectorized_func",
        [](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) {
            auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
            return py::vectorize(stateful_closure)(x, y);
        }
    );
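
The closure works because capturing ``z`` removes it from the callable's
parameter list, so only the array-friendly arguments remain to be vectorized.
The standalone sketch below shows the same capture in plain C++, with a simple
loop standing in for ``py::vectorize``. ``my_custom_type``, its ``scale``
member, the body of ``my_func``, and the helpers ``map_pairs`` and
``vectorized_func`` here are all hypothetical stand-ins.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Hypothetical non-numeric argument that should not be vectorized.
    struct my_custom_type {
        double scale;
    };

    // Hypothetical body matching the signature from the text.
    double my_func(int x, float y, my_custom_type *z) { return (x + y) * z->scale; }

    // Apply a two-argument callable to corresponding elements of two sequences.
    template <typename F>
    std::vector<double> map_pairs(F f, const std::vector<int> &xs, const std::vector<float> &ys) {
        assert(xs.size() == ys.size());
        std::vector<double> out;
        out.reserve(xs.size());
        for (std::size_t i = 0; i < xs.size(); ++i)
            out.push_back(f(xs[i], ys[i]));
        return out;
    }

    std::vector<double> vectorized_func(const std::vector<int> &xs,
                                        const std::vector<float> &ys, my_custom_type *z) {
        // Capture the pointer z by value; only x and y remain as parameters.
        auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
        return map_pairs(stateful_closure, xs, ys);
    }

Since only the pointer is copied into the closure, every element-wise call sees
the same ``my_custom_type`` object.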

In cases where the computation is too complicated to be reduced to
``vectorize``, it will be necessary to create and access the buffer contents
manually. The following snippet contains a complete example that shows how this
works (the code is somewhat contrived, since it could have been done more
simply using ``vectorize``).

.. code-block:: cpp

    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>
    #include <stdexcept>

    namespace py = pybind11;

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        auto buf1 = input1.request(), buf2 = input2.request();

        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");

        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");

        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);

        auto buf3 = result.request();

        /* The raw indexing below assumes unit-stride (contiguous) inputs */
        double *ptr1 = static_cast<double *>(buf1.ptr),
               *ptr2 = static_cast<double *>(buf2.ptr),
               *ptr3 = static_cast<double *>(buf3.ptr);

        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];

        return result;
    }

    PYBIND11_PLUGIN(test) {
        py::module m("test");
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
        return m.ptr();
    }
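
Because the parameters above do not request ``py::array::c_style``, a strided
input (for example every second element of a larger array) is not necessarily
made dense first, and plain ``ptr[idx]`` indexing then reads the wrong
elements. Requesting ``c_style`` is one remedy; another is to honor the byte
stride reported in ``strides[0]``. The standalone sketch below shows the second
approach on raw pointers. ``load_at`` and ``add_strided`` are hypothetical
helpers, and they take signed strides because NumPy strides can be negative.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <cstring>
    #include <vector>

    // Read the double located `index` strides away from `base`.
    static double load_at(const char *base, std::ptrdiff_t stride_bytes, std::size_t index) {
        double v;
        std::memcpy(&v, base + static_cast<std::ptrdiff_t>(index) * stride_bytes, sizeof(double));
        return v;
    }

    // out[i] = a[i] + b[i] for n elements; a and b may have arbitrary
    // (possibly negative) byte strides, while out is written densely.
    void add_strided(const void *a, std::ptrdiff_t stride_a,
                     const void *b, std::ptrdiff_t stride_b,
                     double *out, std::size_t n) {
        const char *pa = static_cast<const char *>(a);
        const char *pb = static_cast<const char *>(b);
        for (std::size_t i = 0; i < n; ++i)
            out[i] = load_at(pa, stride_a, i) + load_at(pb, stride_b, i);
    }

Inside ``add_arrays``, the strides would come from ``buf1.strides[0]`` and
``buf2.strides[0]``, while the freshly allocated ``result`` is dense.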

.. seealso::

    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
    example that demonstrates using :func:`vectorize` in more detail.