.. Blame view of docs/advanced/pycpp/numpy.rst (platform/external/python/pybind11),
   blob 8b46b7c83474b1ad88dd822f58ec1bb6bb51909c.
   Blame: Dean Moldovan, 67b52d8, 2016-10-16 19:12:43 +0200.

.. _numpy:

NumPy
#####

Buffer protocol
===============

Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view [#f2]_, which
provides fast direct access to the raw internal data representation. Suppose we
want to bind the following simplistic Matrix class:

.. code-block:: cpp

    class Matrix {
    public:
        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
            m_data = new float[rows*cols];
        }
        float *data() { return m_data; }
        size_t rows() const { return m_rows; }
        size_t cols() const { return m_cols; }
    private:
        size_t m_rows, m_cols;
        float *m_data;
    };

The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast Matrices into NumPy arrays. It is even possible to
completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy = False)``.

.. code-block:: cpp

    py::class_<Matrix>(m, "Matrix")
       .def_buffer([](Matrix &m) -> py::buffer_info {
            return py::buffer_info(
                m.data(),                               /* Pointer to buffer */
                sizeof(float),                          /* Size of one scalar */
                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
                2,                                      /* Number of dimensions */
                { m.rows(), m.cols() },                 /* Buffer dimensions */
                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index */
                  sizeof(float) }
            );
        });

The snippet above binds a lambda function, which can create ``py::buffer_info``
description records on demand describing a given matrix. The contents of
``py::buffer_info`` mirror the Python buffer protocol specification.

.. code-block:: cpp

    struct buffer_info {
        void *ptr;
        size_t itemsize;
        std::string format;
        int ndim;
        std::vector<size_t> shape;
        std::vector<size_t> strides;
    };

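To make the roles of ``itemsize``, ``shape`` and ``strides`` concrete, the
following standalone sketch (plain C++, no pybind11 required) reads one element
of a 2-D ``float`` buffer by walking its byte strides. The names
``buffer_view`` and ``element_at`` are hypothetical and exist only for this
illustration; they are not part of pybind11.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the fields of py::buffer_info used below.
struct buffer_view {
    void *ptr;
    std::size_t itemsize;
    std::vector<std::size_t> shape;
    std::vector<std::size_t> strides;  // in bytes, one entry per dimension
};

// Read element (i, j) of a 2-D float buffer using its byte strides.
float element_at(const buffer_view &v, std::size_t i, std::size_t j) {
    assert(v.shape.size() == 2 && v.strides.size() == 2);
    assert(v.itemsize == sizeof(float));
    assert(i < v.shape[0] && j < v.shape[1]);
    const char *base = static_cast<const char *>(v.ptr);
    float value;
    // Byte offset = i * (row stride) + j * (column stride)
    std::memcpy(&value, base + i * v.strides[0] + j * v.strides[1], sizeof(float));
    return value;
}
```

For a densely packed row-major 2x3 array, the strides are
``{ 3 * sizeof(float), sizeof(float) }``, so element ``(1, 2)`` is the sixth
value in memory.
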
To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body. Below, you can see a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy matrix).

.. code-block:: cpp

    /* Bind MatrixXd (or some other Eigen type) to Python */
    typedef Eigen::MatrixXd Matrix;

    typedef Matrix::Scalar Scalar;
    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;

    py::class_<Matrix>(m, "Matrix")
        .def("__init__", [](Matrix &m, py::buffer b) {
            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;

            /* Request a buffer descriptor from Python */
            py::buffer_info info = b.request();

            /* Some sanity checks ... */
            if (info.format != py::format_descriptor<Scalar>::format())
                throw std::runtime_error("Incompatible format: expected a double array!");

            if (info.ndim != 2)
                throw std::runtime_error("Incompatible buffer dimension!");

            /* Convert byte strides into element strides (outer, inner) */
            auto strides = Strides(
                info.strides[rowMajor ? 0 : 1] / sizeof(Scalar),
                info.strides[rowMajor ? 1 : 0] / sizeof(Scalar));

            auto map = Eigen::Map<Matrix, 0, Strides>(
                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);

            new (&m) Matrix(map);
        });

For reference, the ``def_buffer()`` call for this Eigen data type should look
as follows:

.. code-block:: cpp

    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                /* Pointer to buffer */
            sizeof(Scalar),          /* Size of one scalar */
            /* Python struct-style format descriptor */
            py::format_descriptor<Scalar>::format(),
            /* Number of dimensions */
            2,
            /* Buffer dimensions */
            { (size_t) m.rows(),
              (size_t) m.cols() },
            /* Strides (in bytes) for each index */
            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
        );
     })

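The stride expressions in the ``def_buffer()`` call follow a single rule for
densely packed matrices: the fastest-varying index has a stride of one item,
and the other index skips a whole row or column. The standalone sketch below
(plain C++, no pybind11 or Eigen required) spells this out; ``dense_strides``
is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte strides { row index stride, column index stride } of a densely packed
// rows x cols matrix.
//   row_major == true  : C ordering, consecutive columns are adjacent
//   row_major == false : Fortran ordering, consecutive rows are adjacent
std::vector<std::size_t> dense_strides(std::size_t rows, std::size_t cols,
                                       std::size_t itemsize, bool row_major) {
    assert(itemsize > 0);
    if (row_major)
        return { itemsize * cols, itemsize };
    return { itemsize, itemsize * rows };
}
```

For example, a 3x4 matrix of 8-byte scalars has strides ``{32, 8}`` in
row-major order and ``{8, 24}`` in column-major order. ``Eigen::MatrixXd``
uses column-major storage by default, so ``rowMajor`` would be ``false`` in
that case.
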
For a much easier approach to binding Eigen types (although with some
limitations), refer to the section on :doc:`/advanced/cast/eigen`.

.. seealso::

    The file :file:`tests/test_buffers.cpp` contains a complete example
    that demonstrates using the buffer protocol with pybind11 in more detail.

.. [#f2] http://docs.python.org/3/c-api/buffer.html

Arrays
======

By exchanging ``py::buffer`` with ``py::array`` in the ``__init__`` binding
above, we can restrict the function so that it only accepts NumPy arrays
(rather than any type of Python object satisfying the buffer protocol).

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_t<T>``
template. For instance, the following function requires the argument to be a
NumPy array containing double precision values.

.. code-block:: cpp

    void f(py::array_t<double> array);

When it is invoked with a different type (e.g. an integer or a list of
integers), the binding code will attempt to cast the input into a NumPy array
of the requested type. Note that this feature requires the
:file:`pybind11/numpy.h` header to be included.

Data in NumPy arrays is not guaranteed to be packed in a dense manner;
furthermore, entries can be separated by arbitrary column and row strides.
Sometimes, it can be useful to require a function to only accept dense arrays
using either the C (row-major) or Fortran (column-major) ordering. This can be
accomplished via a second template argument with values ``py::array::c_style``
or ``py::array::f_style``.

.. code-block:: cpp

    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);

The ``py::array::forcecast`` argument is the default value of the second
template parameter, and it ensures that non-conforming arguments are converted
into an array satisfying the specified requirements instead of trying the next
function overload.

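What "dense" means for each ordering can be expressed as a check on shape and
strides. The following standalone sketch (plain C++, no pybind11 required) is
a simplified illustration: ``is_c_style`` and ``is_f_style`` are hypothetical
helpers, and corner cases such as axes of length 1 are not treated specially
here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if the byte strides describe a densely packed row-major layout:
// the last axis has a stride of one item, and each earlier axis skips
// the full extent of the axes after it.
bool is_c_style(const std::vector<std::size_t> &shape,
                const std::vector<std::size_t> &strides, std::size_t itemsize) {
    assert(shape.size() == strides.size());
    std::size_t expected = itemsize;
    for (std::size_t k = shape.size(); k-- > 0;) {
        if (strides[k] != expected) return false;
        expected *= shape[k];
    }
    return true;
}

// Same check for column-major layout: the first axis varies fastest.
bool is_f_style(const std::vector<std::size_t> &shape,
                const std::vector<std::size_t> &strides, std::size_t itemsize) {
    assert(shape.size() == strides.size());
    std::size_t expected = itemsize;
    for (std::size_t k = 0; k < shape.size(); ++k) {
        if (strides[k] != expected) return false;
        expected *= shape[k];
    }
    return true;
}
```

A 2x3 array of 8-byte values with strides ``{24, 8}`` passes the C-style check,
while a view that takes every second column of a wider array (strides
``{48, 16}``) passes neither check.
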
Structured types
================

In order for ``py::array_t`` to work with structured (record) types, we first need
to register the memory layout of the type. This can be done via the
``PYBIND11_NUMPY_DTYPE`` macro, which expects the type followed by its field names:

.. code-block:: cpp

    struct A {
        int x;
        double y;
    };

    struct B {
        int z;
        A a;
    };

    PYBIND11_NUMPY_DTYPE(A, x, y);
    PYBIND11_NUMPY_DTYPE(B, z, a);

    /* now both A and B can be used as template arguments to py::array_t */

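Registering a memory layout amounts to recording where each field lives inside
the struct. The standalone sketch below (plain C++, no pybind11 required)
collects that information with ``offsetof`` and ``sizeof``. The names
``field_info``, ``describe_A``, ``describe_B`` and ``check_layout`` are
hypothetical, and the sketch makes no claim about how the macro is implemented
internally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct A { int x; double y; };
struct B { int z; A a; };

// Hypothetical record of one field's placement inside a struct.
struct field_info {
    std::string name;
    std::size_t offset;  // bytes from the start of the struct
    std::size_t size;    // bytes occupied by the field
};

std::vector<field_info> describe_A() {
    return { {"x", offsetof(A, x), sizeof(int)},
             {"y", offsetof(A, y), sizeof(double)} };
}

std::vector<field_info> describe_B() {
    return { {"z", offsetof(B, z), sizeof(int)},
             {"a", offsetof(B, a), sizeof(A)} };
}

// Sanity check: every field must lie inside a struct of the given size.
void check_layout(const std::vector<field_info> &fields, std::size_t struct_size) {
    for (const auto &f : fields)
        assert(f.offset + f.size <= struct_size);
}
```

On common 64-bit platforms ``y`` typically sits at offset 8 rather than 4
because of alignment padding, which is one reason the layout has to be
registered rather than guessed from the field sizes.
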
Vectorizing functions
=====================

Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:

.. code-block:: cpp

    double my_func(int x, float y, double z);

After including the ``pybind11/numpy.h`` header, this is extremely simple:

.. code-block:: cpp

    m.def("vectorized_func", py::vectorize(my_func));

Invoking the function as shown below causes 4 calls to be made to ``my_func``,
one for each of the array elements. The significant advantage of this compared to
solutions like ``numpy.vectorize()`` is that the loop over the elements runs
entirely on the C++ side and can be crunched down into a tight, optimized loop
by the compiler. The result is returned as a NumPy array of type
``numpy.dtype.float64``.

.. code-block:: pycon

    >>> x = np.array([[1, 3],[5, 7]])
    >>> y = np.array([[2, 4],[6, 8]])
    >>> z = 3
    >>> result = vectorized_func(x, y, z)

The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
are of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
``numpy.dtype.float32``, respectively).

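Conceptually, the vectorized call behaves like an element-wise loop in which
the scalar argument is reused for every element. The standalone sketch below
(plain C++, no pybind11 required) mimics this for flat sequences only; it does
not reproduce the shape handling or type conversion of ``py::vectorize``. The
body of ``my_func`` and the name ``vectorized_sketch`` are assumptions made for
illustration, since the text gives only the signature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical body; only the signature comes from the text above.
double my_func(int x, float y, double z) {
    return x + y * z;
}

// Call my_func once per element pair; the scalar z is reused every time.
std::vector<double> vectorized_sketch(const std::vector<int> &x,
                                      const std::vector<float> &y, double z) {
    assert(x.size() == y.size());
    std::vector<double> result(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        result[i] = my_func(x[i], y[i], z);
    return result;
}
```

Passing the four values of ``x`` and ``y`` from the Python session above in
flattened form, together with ``z = 3``, results in four calls to ``my_func``
and a result of four doubles.
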
Sometimes we might want to explicitly exclude an argument from the vectorization
because it makes little sense to wrap it in a NumPy array. For instance,
suppose the function signature was

.. code-block:: cpp

    double my_func(int x, float y, my_custom_type *z);

This can be done with a stateful lambda closure:

.. code-block:: cpp

    // Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
    m.def("vectorized_func",
        [](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) {
            auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
            return py::vectorize(stateful_closure)(x, y);
        }
    );

In cases where the computation is too complicated to be reduced to
``vectorize``, it will be necessary to create and access the buffer contents
manually. The following snippet contains a complete example that shows how this
works (the code is somewhat contrived, since it could have been done more
simply using ``vectorize``).

.. code-block:: cpp

    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>

    namespace py = pybind11;

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        auto buf1 = input1.request(), buf2 = input2.request();

        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");

        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");

        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);

        auto buf3 = result.request();

        double *ptr1 = (double *) buf1.ptr,
               *ptr2 = (double *) buf2.ptr,
               *ptr3 = (double *) buf3.ptr;

        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];

        return result;
    }

    PYBIND11_PLUGIN(test) {
        py::module m("test");
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
        return m.ptr();
    }

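The loop in ``add_arrays`` indexes the raw pointers as if consecutive elements
were adjacent in memory. A buffer that comes from a sliced array may have a
larger stride, in which case the byte strides need to be taken into account.
The standalone sketch below (plain C++, no pybind11 required) shows one way a
1-D addition could do so; ``add_strided`` is a hypothetical helper, and the
sketch assumes positive strides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Add two 1-D sequences of n doubles whose consecutive elements are
// stride_a and stride_b bytes apart, respectively.
std::vector<double> add_strided(const void *a, std::size_t stride_a,
                                const void *b, std::size_t stride_b,
                                std::size_t n) {
    // Assumption of this sketch: elements do not overlap.
    assert(stride_a >= sizeof(double) && stride_b >= sizeof(double));
    const char *pa = static_cast<const char *>(a);
    const char *pb = static_cast<const char *>(b);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        double va, vb;
        std::memcpy(&va, pa + i * stride_a, sizeof(double));
        std::memcpy(&vb, pb + i * stride_b, sizeof(double));
        out[i] = va + vb;
    }
    return out;
}
```

With a stride of ``2 * sizeof(double)`` for the first input, only every second
stored value takes part in the sum, which matches what a view such as
``a[::2]`` describes on the Python side.
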
.. seealso::

    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
    example that demonstrates using :func:`vectorize` in more detail.