		Dynamic DMA mapping using the generic device
		============================================

	James E.J. Bottomley <James.Bottomley@HansenPartnership.com>

This document describes the DMA API. For a more gentle introduction
phrased in terms of the pci_ equivalents (and actual examples) see
DMA-mapping.txt.

This API is split into two pieces. Part I describes the API and the
corresponding pci_ API. Part II describes the extensions to the API
for supporting non-consistent memory machines. Unless you know that
your driver absolutely has to support non-consistent platforms (this
is usually only legacy platforms) you should only use the API
described in part I.

Part I - pci_ and dma_ Equivalent API
-------------------------------------

To get the pci_ API, you must #include <linux/pci.h>
To get the dma_ API, you must #include <linux/dma-mapping.h>


Part Ia - Using large dma-coherent buffers
------------------------------------------

void *
dma_alloc_coherent(struct device *dev, size_t size,
			dma_addr_t *dma_handle, int flag)
void *
pci_alloc_consistent(struct pci_dev *dev, size_t size,
			dma_addr_t *dma_handle)

Consistent memory is memory for which a write by either the device or
the processor can immediately be read by the processor or device
without having to worry about caching effects. (You may however need
to make sure to flush the processor's write buffers before telling
devices to read that memory.)

This routine allocates a region of <size> bytes of consistent memory.
It also returns a <dma_handle> which may be cast to an unsigned
integer the same width as the bus and used as the physical address
base of the region.

Returns: a pointer to the allocated region (in the processor's virtual
address space) or NULL if the allocation failed.

Note: consistent memory can be expensive on some platforms, and the
minimum allocation length may be as big as a page, so you should
consolidate your requests for consistent memory as much as possible.
The simplest way to do that is to use the dma_pool calls (see below).

The flag parameter (dma_alloc_coherent only) allows the caller to
specify the GFP_ flags (see kmalloc) for the allocation (the
implementation may choose to ignore flags that affect the location of
the returned memory, like GFP_DMA). For pci_alloc_consistent, you
must assume GFP_ATOMIC behaviour.

void
dma_free_coherent(struct device *dev, size_t size, void *cpu_addr,
			dma_addr_t dma_handle)
void
pci_free_consistent(struct pci_dev *dev, size_t size, void *cpu_addr,
			dma_addr_t dma_handle)

Free the region of consistent memory you previously allocated. dev,
size and dma_handle must all be the same as those passed into the
consistent allocate. cpu_addr must be the virtual address returned by
the consistent allocate.

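As a sketch of typical usage (the device pointer, the 4096-byte size,
the variable names and the error handling are illustrative only, not
part of the API):

	/* Allocate a small descriptor ring that both the CPU and the
	 * device see coherently, then release it at shutdown. */
	void *ring_cpu;
	dma_addr_t ring_dma;

	ring_cpu = dma_alloc_coherent(dev, 4096, &ring_dma, GFP_KERNEL);
	if (!ring_cpu)
		return -ENOMEM;	/* allocation failed */

	/* program ring_dma into the device's ring base register here */

	dma_free_coherent(dev, 4096, ring_cpu, ring_dma);
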

Part Ib - Using small dma-coherent buffers
------------------------------------------

To get this part of the dma_ API, you must #include <linux/dmapool.h>

Many drivers need lots of small dma-coherent memory regions for DMA
descriptors or I/O buffers. Rather than allocating in units of a page
or more using dma_alloc_coherent(), you can use DMA pools. These work
much like a kmem_cache_t, except that they use the dma-coherent
allocator, not __get_free_pages(). Also, they understand common
hardware constraints for alignment, like queue heads needing to be
aligned on N byte boundaries.


struct dma_pool *
dma_pool_create(const char *name, struct device *dev,
		size_t size, size_t align, size_t alloc);

struct pci_pool *
pci_pool_create(const char *name, struct pci_dev *dev,
		size_t size, size_t align, size_t alloc);

The pool create() routines initialize a pool of dma-coherent buffers
for use with a given device. They must be called in a context which
can sleep.

The "name" is for diagnostics (like a kmem_cache_t name); dev and size
are like what you'd pass to dma_alloc_coherent(). The device's hardware
alignment requirement for this type of data is "align" (which is expressed
in bytes, and must be a power of two). If your device has no boundary
crossing restrictions, pass 0 for alloc; passing 4096 says memory allocated
from this pool must not cross 4KByte boundaries.


void *dma_pool_alloc(struct dma_pool *pool, int gfp_flags,
			dma_addr_t *dma_handle);

void *pci_pool_alloc(struct pci_pool *pool, int gfp_flags,
			dma_addr_t *dma_handle);

This allocates memory from the pool; the returned memory will meet the
size and alignment requirements specified at creation time. Pass
GFP_ATOMIC to prevent blocking, or, if it's permitted (not in_interrupt,
not holding SMP locks), pass GFP_KERNEL to allow blocking. Like
dma_alloc_coherent(), this returns two values: an address usable by the
cpu, and the dma address usable by the pool's device.


void dma_pool_free(struct dma_pool *pool, void *vaddr,
			dma_addr_t addr);

void pci_pool_free(struct pci_pool *pool, void *vaddr,
			dma_addr_t addr);

This puts memory back into the pool. The pool is what was passed to
the pool allocation routine; the cpu and dma addresses are what
were returned when that routine allocated the memory being freed.


void dma_pool_destroy(struct dma_pool *pool);

void pci_pool_destroy(struct pci_pool *pool);

The pool destroy() routines free the resources of the pool. They must be
called in a context which can sleep. Make sure you've freed all allocated
memory back to the pool before you destroy it.

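As an illustrative sketch only (the pool name, the 64-byte size and
alignment, and the surrounding driver structure are assumptions, not
requirements), a driver keeping its hardware descriptors in a pool
might do this:

	struct dma_pool *desc_pool;
	void *desc;
	dma_addr_t desc_dma;

	/* At probe time (may sleep): 64-byte descriptors, 64-byte
	 * aligned, no boundary-crossing restriction. */
	desc_pool = dma_pool_create("mydev_desc", dev, 64, 64, 0);
	if (!desc_pool)
		return -ENOMEM;

	/* In the I/O path (possibly atomic context): */
	desc = dma_pool_alloc(desc_pool, GFP_ATOMIC, &desc_dma);
	if (desc) {
		/* fill in *desc, hand desc_dma to the device, and
		 * once the hardware is finished with it: */
		dma_pool_free(desc_pool, desc, desc_dma);
	}

	/* At remove time, after all descriptors have been freed: */
	dma_pool_destroy(desc_pool);
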

Part Ic - DMA addressing limitations
------------------------------------

int
dma_supported(struct device *dev, u64 mask)
int
pci_dma_supported(struct pci_dev *hwdev, u64 mask)

Checks to see if the device can support DMA to the memory described by
mask.

Returns: 1 if it can and 0 if it can't.

Notes: This routine merely tests to see if the mask is possible. It
won't change the current mask settings. It is more intended as an
internal API for use by the platform than an external API for use by
driver writers.

int
dma_set_mask(struct device *dev, u64 mask)
int
pci_set_dma_mask(struct pci_dev *dev, u64 mask)

Checks to see if the mask is possible and updates the device
parameters if it is.

Returns: 0 if successful and a negative error if not.

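A common probe-time pattern (sketched here with raw hex masks and a
made-up device name; a real driver would use whatever mask its
hardware actually supports) is to try the widest mask first and fall
back:

	if (dma_set_mask(dev, 0xffffffffffffffffULL) == 0) {
		/* platform accepts 64-bit DMA addresses from this device */
	} else if (dma_set_mask(dev, 0xffffffffULL) == 0) {
		/* fall back to 32-bit addressing */
	} else {
		printk(KERN_WARNING "mydev: no usable DMA mask\n");
		return -EIO;
	}

The pci_set_dma_mask() form follows the same pattern, taking the
struct pci_dev instead.
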

u64
dma_get_required_mask(struct device *dev)

After setting the mask with dma_set_mask(), this API returns the
actual mask (within that already set) that the platform actually
requires to operate efficiently. Usually this means the returned mask
is the minimum required to cover all of memory. Examining the
required mask gives drivers with variable descriptor sizes the
opportunity to use smaller descriptors as necessary.

Requesting the required mask does not alter the current mask. If you
wish to take advantage of it, you should issue another dma_set_mask()
call to lower the mask again.

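For example (purely a sketch; the descriptor-format choice and the
use_small_descriptors flag are hypothetical), a driver whose hardware
has both 32-bit and 64-bit descriptor formats might pick between them
like this after a 64-bit mask has been accepted:

	u64 required = dma_get_required_mask(dev);

	if (required <= 0xffffffffULL) {
		/* everything the device will be handed fits in 32 bits,
		 * so lower the mask and use the compact descriptors */
		dma_set_mask(dev, 0xffffffffULL);
		use_small_descriptors = 1;
	}
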

Part Id - Streaming DMA mappings
--------------------------------

dma_addr_t
dma_map_single(struct device *dev, void *cpu_addr, size_t size,
		enum dma_data_direction direction)
dma_addr_t
pci_map_single(struct pci_dev *hwdev, void *cpu_addr, size_t size,
		int direction)

Maps a piece of processor virtual memory so it can be accessed by the
device and returns the physical handle of the memory.

The direction for both APIs may be converted freely by casting.
However the dma_ API uses a strongly typed enumerator for its
direction:

DMA_NONE		= PCI_DMA_NONE		no direction (used for
						debugging)
DMA_TO_DEVICE		= PCI_DMA_TODEVICE	data is going from the
						memory to the device
DMA_FROM_DEVICE		= PCI_DMA_FROMDEVICE	data is coming from
						the device to the
						memory
DMA_BIDIRECTIONAL	= PCI_DMA_BIDIRECTIONAL	direction isn't known

Notes: Not all memory regions in a machine can be mapped by this
API. Further, regions that appear to be physically contiguous in
kernel virtual space may not be contiguous as physical memory. Since
this API does not provide any scatter/gather capability, it will fail
if the user tries to map a non-physically-contiguous piece of memory.
For this reason, it is recommended that memory mapped by this API be
obtained only from sources which guarantee it to be physically
contiguous (like kmalloc).

Further, the physical address of the memory must be within the
dma_mask of the device (the dma_mask represents a bit mask of the
addressable region for the device, i.e. if the physical address of
the memory ANDed with the dma_mask is still equal to the physical
address, then the device can perform DMA to the memory). In order to
ensure that the memory allocated by kmalloc is within the dma_mask,
the driver may specify various platform-dependent flags to restrict
the physical memory range of the allocation (e.g. on x86, GFP_DMA
guarantees to be within the first 16MB of available physical memory,
as required by ISA devices).

Note also that the above constraints on physical contiguity and
dma_mask may not apply if the platform has an IOMMU (a device which
maps I/O bus addresses to physical memory addresses). However, to be
portable, device driver writers may *not* assume that such an IOMMU
exists.

Warnings: Memory coherency operates at a granularity called the cache
line width. In order for memory mapped by this API to operate
correctly, the mapped region must begin exactly on a cache line
boundary and end exactly on one (to prevent two separately mapped
regions from sharing a single cache line). Since the cache line size
may not be known at compile time, the API will not enforce this
requirement. Therefore, it is recommended that driver writers who
don't take special care to determine the cache line size at run time
only map virtual regions that begin and end on page boundaries (which
are guaranteed also to be cache line boundaries).

DMA_TO_DEVICE synchronisation must be done after the last modification
of the memory region by the software and before it is handed off to
the device. Once this primitive is used, memory covered by this
primitive should be treated as read-only by the device. If the device
may write to it at any point, it should be DMA_BIDIRECTIONAL (see
below).

DMA_FROM_DEVICE synchronisation must be done before the driver
accesses data that may be changed by the device. This memory should
be treated as read-only by the driver. If the driver needs to write
to it at any point, it should be DMA_BIDIRECTIONAL (see below).

DMA_BIDIRECTIONAL requires special handling: it means that the driver
isn't sure if the memory was modified before being handed off to the
device and also isn't sure if the device will also modify it. Thus,
you must always sync bidirectional memory twice: once before the
memory is handed off to the device (to make sure all memory changes
are flushed from the processor) and once before the data may be
accessed after being used by the device (to make sure any processor
cache lines are updated with data that the device may have changed).

void
dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
		enum dma_data_direction direction)
void
pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
		size_t size, int direction)

Unmaps the region previously mapped. All the parameters must be
identical to those passed in (and returned) by the mapping API.

dma_addr_t
dma_map_page(struct device *dev, struct page *page,
		unsigned long offset, size_t size,
		enum dma_data_direction direction)
dma_addr_t
pci_map_page(struct pci_dev *hwdev, struct page *page,
		unsigned long offset, size_t size, int direction)
void
dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
		enum dma_data_direction direction)
void
pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address,
		size_t size, int direction)

API for mapping and unmapping for pages. All the notes and warnings
for the other mapping APIs apply here. Also, although the <offset>
and <size> parameters are provided to do partial page mapping, it is
recommended that you never use these unless you really know what the
cache width is.

int
dma_mapping_error(dma_addr_t dma_addr)

int
pci_dma_mapping_error(dma_addr_t dma_addr)

In some circumstances dma_map_single and dma_map_page will fail to
create a mapping. A driver can check for these errors by testing the
returned dma address with dma_mapping_error(). A non-zero return value
means the mapping could not be created and the driver should take
appropriate action (e.g. reduce current DMA mapping usage or delay and
try again later).

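Putting these pieces together, a streaming mapping of a kmalloc'ed
buffer typically looks like the sketch below (the 512-byte length and
the error returns are illustrative, not mandated by the API):

	void *buf = kmalloc(512, GFP_KERNEL);
	dma_addr_t dma;

	if (!buf)
		return -ENOMEM;

	/* CPU has finished writing buf; hand it to the device. */
	dma = dma_map_single(dev, buf, 512, DMA_TO_DEVICE);
	if (dma_mapping_error(dma)) {
		kfree(buf);
		return -EIO;	/* or back off and retry later */
	}

	/* ... point the device at 'dma', wait for the transfer ... */

	dma_unmap_single(dev, dma, 512, DMA_TO_DEVICE);
	kfree(buf);
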
int
dma_map_sg(struct device *dev, struct scatterlist *sg,
	int nents, enum dma_data_direction direction)
int
pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
	int nents, int direction)

Maps a scatter/gather list from the block layer.

Returns: the number of physical segments mapped (this may be shorter
than <nents> passed in if the block layer determines that some
elements of the scatter/gather list are physically adjacent and thus
may be mapped with a single entry).

Please note that the sg cannot be mapped again if it has been mapped once.
The mapping process is allowed to destroy information in the sg.

As with the other mapping interfaces, dma_map_sg can fail. When it
does, 0 is returned and a driver must take appropriate action. It is
critical that the driver do something; in the case of a block driver,
aborting the request or even oopsing is better than doing nothing and
corrupting the filesystem.

With scatterlists, you use the resulting mapping like this:

	int i, count = dma_map_sg(dev, sglist, nents, direction);
	struct scatterlist *sg;

	for (i = 0, sg = sglist; i < count; i++, sg++) {
		hw_address[i] = sg_dma_address(sg);
		hw_len[i] = sg_dma_len(sg);
	}

where nents is the number of entries in the sglist.

The implementation is free to merge several consecutive sglist entries
into one (e.g. with an IOMMU, or if several pages just happen to be
physically contiguous) and returns the actual number of sg entries it
mapped them to. On failure, 0 is returned.

Then you should loop count times (note: this can be less than nents times)
and use sg_dma_address() and sg_dma_len() macros where you previously
accessed sg->address and sg->length as shown above.

void
dma_unmap_sg(struct device *dev, struct scatterlist *sg,
	int nhwentries, enum dma_data_direction direction)
void
pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
	int nents, int direction)

Unmap the previously mapped scatter/gather list. All the parameters
must be the same as those passed in to the scatter/gather mapping
API.

Note: <nents> must be the number you passed in, *not* the number of
physical entries returned.

void
dma_sync_single(struct device *dev, dma_addr_t dma_handle, size_t size,
		enum dma_data_direction direction)
void
pci_dma_sync_single(struct pci_dev *hwdev, dma_addr_t dma_handle,
		size_t size, int direction)
void
dma_sync_sg(struct device *dev, struct scatterlist *sg, int nelems,
		enum dma_data_direction direction)
void
pci_dma_sync_sg(struct pci_dev *hwdev, struct scatterlist *sg,
		int nelems, int direction)

Synchronise a single contiguous or scatter/gather mapping. All the
parameters must be the same as those passed into the corresponding
mapping API.

Notes: You must do this:

- Before reading values that have been written by DMA from the device
  (use the DMA_FROM_DEVICE direction)
- After writing values that will be written to the device using DMA
  (use the DMA_TO_DEVICE direction)
- Before *and* after handing memory to the device if the memory is
  DMA_BIDIRECTIONAL

See also dma_map_single().

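As an illustration (rx_dma, rx_buf and the 2048-byte length are
hypothetical), a driver that keeps a receive buffer mapped
DMA_FROM_DEVICE across many transfers would re-sync it each time
before the CPU reads the freshly DMA'd data:

	/* rx_dma was returned earlier by
	 * dma_map_single(dev, rx_buf, 2048, DMA_FROM_DEVICE) */
	dma_sync_single(dev, rx_dma, 2048, DMA_FROM_DEVICE);

	/* now it is safe for the CPU to look at rx_buf */
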

Part II - Advanced dma_ usage
-----------------------------

Warning: These pieces of the DMA API have no PCI equivalent. They
should also not be used in the majority of cases, since they cater for
unlikely corner cases that don't belong in usual drivers.

If you don't understand how cache line coherency works between a
processor and an I/O device, you should not be using this part of the
API at all.

void *
dma_alloc_noncoherent(struct device *dev, size_t size,
			dma_addr_t *dma_handle, int flag)

Identical to dma_alloc_coherent() except that the platform will
choose to return either consistent or non-consistent memory as it sees
fit. By using this API, you are guaranteeing to the platform that you
have all the correct and necessary sync points for this memory in the
driver should it choose to return non-consistent memory.

Note: where the platform can return consistent memory, it will
guarantee that the sync points become nops.

Warning: Handling non-consistent memory is a real pain. You should
only ever use this API if you positively know your driver will be
required to work on one of the rare (usually non-PCI) architectures
that simply cannot make consistent memory.

void
dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr,
			dma_addr_t dma_handle)

Free memory allocated by the non-consistent API. All parameters must
be identical to those passed in (and returned) by
dma_alloc_noncoherent().

int
dma_is_consistent(dma_addr_t dma_handle)

Returns true if the memory pointed to by the dma_handle is actually
consistent.

int
dma_get_cache_alignment(void)

Returns the processor cache alignment. This is the absolute minimum
alignment *and* width that you must observe when either mapping
memory or doing partial flushes.

Notes: This API may return a number *larger* than the actual cache
line, but it will guarantee that one or more cache lines fit exactly
into the width returned by this call. It will also always be a power
of two for easy alignment.

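For instance (struct mydev_status is a made-up type), a driver packing
several independently-synced objects into one buffer might round each
object's size up to a whole number of cache lines:

	int align = dma_get_cache_alignment();
	size_t slot = (sizeof(struct mydev_status) + align - 1)
			& ~((size_t)align - 1);

	/* place each status block 'slot' bytes apart so that no two
	 * of them ever share a cache line */
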
void
dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
		unsigned long offset, size_t size,
		enum dma_data_direction direction)

Does a partial sync, starting at offset and continuing for size. You
must be careful to observe the cache alignment and width when doing
anything like this. You must also be extra careful about accessing
memory you intend to sync partially.

void
dma_cache_sync(void *vaddr, size_t size,
		enum dma_data_direction direction)

Do a partial sync of memory that was allocated by
dma_alloc_noncoherent(), starting at virtual address vaddr and
continuing on for size. Again, you *must* observe the cache line
boundaries when doing this.

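A sketch of the non-consistent pattern (the 256-byte command block and
its layout are invented for the example; on platforms that actually
return consistent memory the dma_cache_sync() calls simply become
no-ops):

	void *area;
	dma_addr_t area_dma;

	area = dma_alloc_noncoherent(dev, PAGE_SIZE, &area_dma, GFP_KERNEL);
	if (!area)
		return -ENOMEM;

	/* CPU builds a command block at the start of the area ... */
	memset(area, 0, 256);
	/* ... and flushes just that part before the device reads it. */
	dma_cache_sync(area, 256, DMA_TO_DEVICE);

	/* later, before reading status the device wrote back there: */
	dma_cache_sync(area, 256, DMA_FROM_DEVICE);

	dma_free_noncoherent(dev, PAGE_SIZE, area, area_dma);
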
int
dma_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,
			dma_addr_t device_addr, size_t size, int flags)

Declare a region of memory to be handed out by dma_alloc_coherent when
it's asked for coherent memory for this device.

bus_addr is the physical address to which the memory is currently
assigned in the bus responding region (this will be used by the
platform to perform the mapping).

device_addr is the physical address the device actually needs to be
programmed with to address this memory (this will be handed out as the
dma_addr_t in dma_alloc_coherent()).

size is the size of the area (must be a multiple of PAGE_SIZE).

flags can be ORed together and are:

DMA_MEMORY_MAP - request that the memory returned from
dma_alloc_coherent() be directly writeable.

DMA_MEMORY_IO - request that the memory returned from
dma_alloc_coherent() be addressable using read/write/memcpy_toio etc.

One or both of these flags must be present.

DMA_MEMORY_INCLUDES_CHILDREN - make the declared memory be allocated by
dma_alloc_coherent of any child devices of this one (for memory residing
on a bridge).

DMA_MEMORY_EXCLUSIVE - only allocate memory from the declared regions.
Do not allow dma_alloc_coherent() to fall back to system memory when
it's out of memory in the declared region.

The return value will be either DMA_MEMORY_MAP or DMA_MEMORY_IO and
must correspond to a passed-in flag (i.e. no returning DMA_MEMORY_IO
if only DMA_MEMORY_MAP was passed in) for success, or zero for
failure.

Note, for DMA_MEMORY_IO returns, all subsequent memory returned by
dma_alloc_coherent() may no longer be accessed directly, but instead
must be accessed using the correct bus functions. If your driver
isn't prepared to handle this contingency, it should not specify
DMA_MEMORY_IO in the input flags.

As a simplification for the platforms, only *one* such region of
memory may be declared per device.

For reasons of efficiency, most platforms choose to track the declared
region only at the granularity of a page. For smaller allocations,
you should use the dma_pool() API.

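As a hedged sketch (the bus address, the device-visible address of 0
and the 64KB size are all made up for illustration), a device with a
window of on-board RAM might declare it so that dma_alloc_coherent()
for that device is satisfied from the window:

	if (dma_declare_coherent_memory(dev, 0xf8000000, 0x0, 0x10000,
					DMA_MEMORY_MAP | DMA_MEMORY_EXCLUSIVE)
	    != DMA_MEMORY_MAP)
		return -ENXIO;

	/* dma_alloc_coherent(dev, ...) now hands out pieces of that
	 * 64KB region, returning device addresses starting at 0 */
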
void
dma_release_declared_memory(struct device *dev)

Remove the memory region previously declared from the system. This
API performs *no* in-use checking for this region and will return
unconditionally having removed all the required structures. It is the
driver's job to ensure that no parts of this memory region are
currently in use.

void *
dma_mark_declared_memory_occupied(struct device *dev,
			dma_addr_t device_addr, size_t size)

This is used to occupy specific regions of the declared space
(dma_alloc_coherent() will hand out the first free region it finds).

device_addr is the *device* address of the region requested.

size is the size (and should be a page-sized multiple).

The return value will be either a pointer to the processor virtual
address of the memory, or an error (via PTR_ERR()) if any part of the
region is occupied.