Blame - Documentation/crypto/async-tx-api.txt - kernel/msm-4.9

Asynchronous Transfers/Transforms API

1 INTRODUCTION

2 GENEALOGY

3 USAGE
3.1 General format of the API
3.2 Supported operations
3.3 Descriptor management
3.4 When does the operation execute?
3.5 When does the operation complete?
3.6 Constraints
3.7 Example

4 DMAENGINE DRIVER DEVELOPER NOTES
4.1 Conformance points
4.2 "My application needs exclusive control of hardware channels"

5 SOURCE

---

1 INTRODUCTION

The async_tx API provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies. It is implemented as a dmaengine client that smooths over
the details of different hardware offload engine implementations. Code
that is written to the API can optimize for asynchronous operation and
the API will fit the chain of operations to the available offload
resources.

2 GENEALOGY

The API was initially designed to offload the memory copy and
xor-parity-calculations of the md-raid5 driver using the offload engines
present in the Intel(R) Xscale series of I/O processors. It also built
on the 'dmaengine' layer developed for offloading memory copies in the
network stack using Intel(R) I/OAT engines. The following design
features surfaced as a result:
1/ implicit synchronous path: users of the API do not need to know if
   the platform they are running on has offload capabilities. The
   operation will be offloaded when an engine is available and carried out
   in software otherwise.
2/ cross channel dependency chains: the API allows a chain of dependent
   operations to be submitted, like xor->copy->xor in the raid5 case. The
   API automatically handles cases where the transition from one operation
   to another implies a hardware channel switch.
3/ dmaengine extensions to support multiple clients and operation types
   beyond 'memcpy'

3 USAGE

3.1 General format of the API:
struct dma_async_tx_descriptor *
async_<operation>(<op specific parameters>, struct async_submit_ctl *submit)

3.2 Supported operations:
memcpy  - memory copy between a source and a destination buffer
memset  - fill a destination buffer with a byte value
xor     - xor a series of source buffers and write the result to a
          destination buffer
xor_val - xor a series of source buffers and set a flag if the
          result is zero. The implementation attempts to prevent
          writes to memory
pq      - generate the p+q (raid6 syndrome) from a series of source buffers
pq_val  - validate that a p and/or q buffer are in sync with a given
          series of sources
datap   - (raid6_datap_recov) recover a raid6 data block and the p block
          from the given sources
2data   - (raid6_2data_recov) recover 2 raid6 data blocks from the given
          sources

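To make the semantics of xor, xor_val and pq concrete, here is a minimal
userspace sketch in plain C. It is a software model only and not the
async_tx implementation: the names model_xor, model_xor_val, model_pq and
gf_mul2 are hypothetical, and the q syndrome assumes the usual raid6 field
GF(2^8) with generator 2 and reduction polynomial 0x11d.

```c
#include <stddef.h>
#include <stdint.h>

/* xor: dest = srcs[0] ^ srcs[1] ^ ... ^ srcs[cnt - 1] */
static void model_xor(uint8_t *dest, const uint8_t *const *srcs,
		      int cnt, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		uint8_t acc = 0;

		for (int s = 0; s < cnt; s++)
			acc ^= srcs[s][i];
		dest[i] = acc;
	}
}

/* xor_val: return 1 if the xor of all sources is zero; writes nothing */
static int model_xor_val(const uint8_t *const *srcs, int cnt, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		uint8_t acc = 0;

		for (int s = 0; s < cnt; s++)
			acc ^= srcs[s][i];
		if (acc)
			return 0;
	}
	return 1;
}

/* multiply by 2 in GF(2^8), reducing with the polynomial 0x11d */
static uint8_t gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* pq: p is plain parity, q = sum over s of 2^s * srcs[s] (Horner form) */
static void model_pq(uint8_t *p, uint8_t *q, const uint8_t *const *srcs,
		     int cnt, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		uint8_t wp = 0, wq = 0;

		for (int s = cnt - 1; s >= 0; s--) {
			wp ^= srcs[s][i];
			wq = gf_mul2(wq) ^ srcs[s][i];
		}
		p[i] = wp;
		q[i] = wq;
	}
}
```

A parity check in this model is model_xor_val() over the data buffers plus
the parity buffer: it returns 1 while they agree and 0 once any byte differs.
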
3.3 Descriptor management:
The return value is non-NULL and points to a 'descriptor' when the operation
has been queued to execute asynchronously. Descriptors are recycled
resources, under control of the offload engine driver, to be reused as
operations complete. When an application needs to submit a chain of
operations it must guarantee that the descriptor is not automatically recycled
before the dependency is submitted. This requires that all descriptors be
acknowledged by the application before the offload engine driver is allowed to
recycle (or free) the descriptor. A descriptor can be acked by one of the
following methods:
1/ setting the ASYNC_TX_ACK flag if no child operations are to be submitted
2/ submitting an unacknowledged descriptor as a dependency to another
   async_tx call will implicitly set the acknowledged state.
3/ calling async_tx_ack() on the descriptor.

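The recycling rule above can be sketched as a small userspace model in C.
Everything here is an assumption made for illustration: struct model_desc,
MODEL_ACK, model_submit(), model_ack() and model_can_recycle() are
hypothetical stand-ins, not the kernel's descriptor or its flag values.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ACK 0x1	/* stands in for ASYNC_TX_ACK; value is illustrative */

struct model_desc {
	unsigned int flags;		/* MODEL_ACK once the client is done */
	bool done;			/* operation has completed */
	struct model_desc *depend;	/* parent this operation waits on */
};

/* method 3/: explicit acknowledgement, in the spirit of async_tx_ack() */
static void model_ack(struct model_desc *d)
{
	d->flags |= MODEL_ACK;
}

/*
 * Submit 'd' with an optional parent.  Method 1/ passes MODEL_ACK in
 * 'flags'; method 2/ is the implicit ack of 'parent' performed here.
 */
static void model_submit(struct model_desc *d, struct model_desc *parent,
			 unsigned int flags)
{
	d->flags = flags;
	d->done = false;
	d->depend = parent;
	if (parent)
		model_ack(parent);
}

/* the driver may reuse a descriptor only when it is complete and acked */
static bool model_can_recycle(const struct model_desc *d)
{
	return d->done && (d->flags & MODEL_ACK);
}
```

In this model a completed but unacknowledged descriptor stays pinned, which
is what lets a client still name it as the parent of a later operation.
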
3.4 When does the operation execute?
Operations do not immediately issue after return from the
async_<operation> call. Offload engine drivers batch operations to
improve performance by reducing the number of mmio cycles needed to
manage the channel. Once a driver-specific threshold is met the driver
automatically issues pending operations. An application can force this
event by calling async_tx_issue_pending_all(). This operates on all
channels since the application has no knowledge of channel to operation
mapping.

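The batching behaviour described above can be modelled in a few lines of
userspace C. This is a sketch under stated assumptions: struct model_chan,
MODEL_THRESHOLD and the model_* functions are hypothetical, and the threshold
of 4 is arbitrary since real drivers choose their own.

```c
#include <stddef.h>

#define MODEL_THRESHOLD 4	/* hypothetical driver-specific batch size */

struct model_chan {
	int pending;	/* operations queued but not yet started */
	int issued;	/* operations handed to the (imaginary) hardware */
	int kicks;	/* how many times the hardware was poked */
};

/* start everything that is queued with a single "doorbell" write */
static void model_issue_pending(struct model_chan *c)
{
	if (!c->pending)
		return;
	c->issued += c->pending;
	c->pending = 0;
	c->kicks++;
}

/* queue one operation; auto-issue once the threshold is reached */
static void model_queue(struct model_chan *c)
{
	if (++c->pending >= MODEL_THRESHOLD)
		model_issue_pending(c);
}

/* flush every channel, since the caller cannot know which one was used */
static void model_issue_pending_all(struct model_chan *chans, size_t n)
{
	for (size_t i = 0; i < n; i++)
		model_issue_pending(&chans[i]);
}
```

Queuing four operations on one channel costs a single kick in this model
rather than four, which is the saving in mmio cycles that batching is after.
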
3.5 When does the operation complete?
There are two methods for an application to learn about the completion
of an operation.
1/ Call dma_wait_for_async_tx(). This call causes the CPU to spin while
   it polls for the completion of the operation. It handles dependency
   chains and issuing pending operations.
2/ Specify a completion callback. The callback routine runs in tasklet
   context if the offload engine driver supports interrupts, or it is
   called in application context if the operation is carried out
   synchronously in software. The callback can be set in the call to
   async_<operation>, or when the application needs to submit a chain of
   unknown length it can use the async_trigger_callback() routine to set a
   completion interrupt/callback at the end of the chain.

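The two callback contexts in method 2/ can be illustrated with a userspace
model in C. It is only a sketch: model_memcpy(), model_complete() and
model_count() are hypothetical names, a NULL 'slot' stands for "no offload
engine available", and model_complete() stands in for the driver's tasklet.

```c
#include <stddef.h>
#include <string.h>

typedef void (*model_cb_t)(void *param);

struct model_desc {
	void *dest;
	const void *src;
	size_t len;
	model_cb_t cb;
	void *cb_param;
};

/*
 * Copy with an optional completion callback.  With 'slot' == NULL there
 * is no engine: the copy runs now, the callback runs in the caller's
 * context and NULL is returned.  Otherwise the work is only recorded.
 */
static struct model_desc *model_memcpy(struct model_desc *slot, void *dest,
				       const void *src, size_t len,
				       model_cb_t cb, void *cb_param)
{
	if (!slot) {
		memcpy(dest, src, len);
		if (cb)
			cb(cb_param);
		return NULL;
	}
	*slot = (struct model_desc){ dest, src, len, cb, cb_param };
	return slot;
}

/* stand-in for the driver's completion tasklet */
static void model_complete(struct model_desc *d)
{
	memcpy(d->dest, d->src, d->len);
	if (d->cb)
		d->cb(d->cb_param);
}

/* example callback: count completions through 'param' */
static void model_count(void *param)
{
	++*(int *)param;
}
```

Because the synchronous path may invoke the callback before the submitting
call returns, the callback's state has to be initialized before submission.
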
3.6 Constraints:
1/ Calls to async_<operation> are not permitted in IRQ context. Other
   contexts are permitted provided constraint #2 is not violated.
2/ Completion callback routines cannot submit new operations. This
   results in recursion in the synchronous case and spin_locks being
   acquired twice in the asynchronous case.

3.7 Example:
Perform a xor->copy->xor operation where each operation depends on the
result from the previous operation:

void callback(void *param)
{
	struct completion *cmp = param;

	complete(cmp);
}

void run_xor_copy_xor(struct page **xor_srcs,
		      int xor_src_cnt,
		      struct page *xor_dest,
		      size_t xor_len,
		      struct page *copy_src,
		      struct page *copy_dest,
		      size_t copy_len)
{
	struct dma_async_tx_descriptor *tx;
	addr_conv_t addr_conv[xor_src_cnt];
	struct async_submit_ctl submit;
	struct completion cmp;

	/* first xor: no dependency and no callback */
	init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST, NULL, NULL, NULL,
			  addr_conv);
	tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit);

	/* the copy depends on the first xor */
	submit.depend_tx = tx;
	tx = async_memcpy(copy_dest, copy_src, 0, 0, copy_len, &submit);

	/* final xor: depends on the copy, is acked, and signals 'cmp' */
	init_completion(&cmp);
	init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST | ASYNC_TX_ACK, tx,
			  callback, &cmp, addr_conv);
	tx = async_xor(xor_dest, xor_srcs, 0, xor_src_cnt, xor_len, &submit);

	async_tx_issue_pending_all();

	wait_for_completion(&cmp);
}

See include/linux/async_tx.h for more information on the flags. See the
ops_run_* and ops_complete_* routines in drivers/md/raid5.c for more
implementation examples.

4 DMAENGINE DRIVER DEVELOPER NOTES

4.1 Conformance points:
There are a few conformance points required in dmaengine drivers to
accommodate assumptions made by applications using the async_tx API:
1/ Completion callbacks are expected to happen in tasklet context
2/ dma_async_tx_descriptor fields are never manipulated in IRQ context
3/ Use async_tx_run_dependencies() in the descriptor clean up path to
   handle submission of dependent operations

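Conformance point 3/ can be pictured with a short userspace model in C. This
is a sketch of the idea only and says nothing about how
async_tx_run_dependencies() is implemented: struct model_desc and the
model_* functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_desc {
	bool submitted;			/* handed to its channel */
	bool done;			/* finished executing */
	struct model_desc *next;	/* operation waiting on this one */
};

/* record that 'child' must not start before 'parent' has completed */
static void model_depend(struct model_desc *parent, struct model_desc *child)
{
	parent->next = child;
}

/* stand-in for handing a descriptor to its channel */
static void model_submit(struct model_desc *d)
{
	d->submitted = true;
}

/*
 * Clean up path: mark 'd' complete, then release the operation that was
 * waiting on it.  This is the step conformance point 3/ asks drivers for.
 */
static void model_cleanup(struct model_desc *d)
{
	d->done = true;
	if (d->next && !d->next->submitted)
		model_submit(d->next);
}
```

If a driver's clean up path skipped this step, the dependent operation in
the model would never be submitted and the chain would stall.
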
4.2 "My application needs exclusive control of hardware channels"
Primarily this requirement arises from cases where a DMA engine driver
is being used to support device-to-memory operations. A channel that is
performing these operations cannot, for many platform specific reasons,
be shared. For these cases the dma_request_channel() interface is
provided.

The interface is:
struct dma_chan *dma_request_channel(dma_cap_mask_t mask,
				     dma_filter_fn filter_fn,
				     void *filter_param);

Where dma_filter_fn is defined as:
typedef bool (*dma_filter_fn)(struct dma_chan *chan, void *filter_param);

When the optional 'filter_fn' parameter is set to NULL
dma_request_channel simply returns the first channel that satisfies the
capability mask. Otherwise, when the mask parameter is insufficient for
specifying the necessary channel, the filter_fn routine can be used to
disposition the available channels in the system. The filter_fn routine
is called once for each free channel in the system. Upon seeing a
suitable channel filter_fn returns true, which flags that channel to
be the return value from dma_request_channel. A channel allocated via
this interface is exclusive to the caller, until dma_release_channel()
is called.

The DMA_PRIVATE capability flag is used to tag dma devices that should
not be used by the general-purpose allocator. It can be set at
initialization time if it is known that a channel will always be
private. Alternatively, it is set when dma_request_channel() finds an
unused "public" channel.

A couple of caveats to note when implementing a driver and consumer:
1/ Once a channel has been privately allocated it will no longer be
   considered by the general-purpose allocator even after a call to
   dma_release_channel().
2/ Since capabilities are specified at the device level a dma_device
   with multiple channels will either have all channels public, or all
   channels private.

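The selection and privacy rules of this section can be sketched as a
userspace model in C. It is not the dmaengine code: struct model_chan, the
CAP_* bits, model_request(), model_release() and model_match_id() are all
hypothetical, and the capability word is shared through a pointer to mimic
capabilities living at the device level.

```c
#include <stdbool.h>
#include <stddef.h>

#define CAP_MEMCPY  (1u << 0)	/* illustrative capability bits */
#define CAP_SLAVE   (1u << 1)
#define CAP_PRIVATE (1u << 2)	/* models DMA_PRIVATE */

struct model_chan {
	int id;
	unsigned int *dev_caps;	/* capabilities are shared per device */
	bool in_use;
};

typedef bool (*model_filter_fn)(struct model_chan *chan, void *param);

/* return the first free channel that satisfies 'mask' and the filter */
static struct model_chan *model_request(struct model_chan *chans, size_t n,
					unsigned int mask,
					model_filter_fn fn, void *param)
{
	for (size_t i = 0; i < n; i++) {
		struct model_chan *c = &chans[i];

		if (c->in_use || (*c->dev_caps & mask) != mask)
			continue;
		if (fn && !fn(c, param))
			continue;
		c->in_use = true;
		*c->dev_caps |= CAP_PRIVATE;	/* whole device turns private */
		return c;
	}
	return NULL;
}

/* release the channel; the device stays private (caveat 1/) */
static void model_release(struct model_chan *c)
{
	c->in_use = false;
}

/* example filter: accept only the channel whose id matches *param */
static bool model_match_id(struct model_chan *chan, void *param)
{
	return chan->id == *(int *)param;
}
```

Passing NULL as the filter in this model yields the first free channel that
matches the mask, while model_match_id() narrows the choice to one channel.
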
5 SOURCE

include/linux/dmaengine.h: core header file for DMA drivers and api users
drivers/dma/dmaengine.c: offload engine channel management routines
drivers/dma/: location for offload engine drivers
include/linux/async_tx.h: core header file for the async_tx api
crypto/async_tx/async_tx.c: async_tx interface to dmaengine and common code
crypto/async_tx/async_memcpy.c: copy offload
crypto/async_tx/async_memset.c: memory fill offload
crypto/async_tx/async_xor.c: xor and xor zero sum offload