Blame - Documentation/DMA-mapping.txt - kernel/msm-4.9

Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	1	Dynamic DMA mapping
				2	===================
				3
				4	David S. Miller <davem@redhat.com>
				5	Richard Henderson <rth@cygnus.com>
				6	Jakub Jelinek <jakub@redhat.com>
				7
				8	This document describes the DMA mapping system in terms of the pci_
				9	API. For a similar API that works for generic devices, see
				10	DMA-API.txt.
				11
				12	Most of the 64bit platforms have special hardware that translates bus
				13	addresses (DMA addresses) into physical addresses. This is similar to
				14	how page tables and/or a TLB translate virtual addresses to physical
				15	addresses on a CPU. This is needed so that e.g. PCI devices can
				16	access with a Single Address Cycle (32bit DMA address) any page in the
				17	64bit physical address space. Previously in Linux those 64bit
				18	platforms had to set artificial limits on the maximum RAM size in the
				19	system, so that the virt_to_bus() static scheme works (the DMA address
				20	translation tables were simply filled on bootup to map each bus
				21	address to the physical page __pa(bus_to_virt())).
				22
				23	So that Linux can use the dynamic DMA mapping, it needs some help from the
				24	drivers, namely it has to take into account that DMA addresses should be
				25	mapped only for the time they are actually used and unmapped after the DMA
				26	transfer.
				27
				28	The following API will work of course even on platforms where no such
				29	hardware exists, see e.g. include/asm-i386/pci.h for how it is implemented on
				30	top of the virt_to_bus interface.
				31
				32	First of all, you should make sure
				33
				34	#include <linux/pci.h>
				35
				36	is in your driver. This file provides the definition of the dma_addr_t
				37	type, which can hold any valid DMA address for the platform and
				38	should be used everywhere you hold a DMA (bus) address
				39	returned from the DMA mapping functions.
				40
				41	What memory is DMA'able?
				42
				43	The first piece of information you must know is what kernel memory can
				44	be used with the DMA mapping facilities. There has been an unwritten
				45	set of rules regarding this, and this text is an attempt to finally
				46	write them down.
				47
				48	If you acquired your memory via the page allocator
				49	(i.e. __get_free_page*()) or the generic memory allocators
				50	(i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from
				51	that memory using the addresses returned from those routines.
				52
				53	This means specifically that you may _not_ use the memory/addresses
				54	returned from vmalloc() for DMA. It is possible to DMA to the
				55	_underlying_ memory mapped into a vmalloc() area, but this requires
				56	walking page tables to get the physical addresses, and then
				57	translating each of those pages back to a kernel address using
				58	something like __va(). [ EDIT: Update this when we integrate
				59	Gerd Knorr's generic code which does this. ]
				60
				61	This rule also means that you may not use kernel image addresses
				62	(i.e. items in the kernel's data/text/bss segment, or your driver's)
				63	nor may you use kernel stack addresses for DMA. Both of these items
				64	might be mapped somewhere entirely different than the rest of physical
				65	memory.
				66
				67	Also, this means that you cannot take the return of a kmap()
				68	call and DMA to/from that. This is similar to vmalloc().
				69
				70	What about block I/O and networking buffers? The block I/O and
				71	networking subsystems make sure that the buffers they use are valid
				72	for you to DMA from/to.
				73
				74	DMA addressing limitations
				75
				76	Does your device have any DMA addressing limitations? For example, is
				77	your device only capable of driving the low order 24-bits of address
				78	on the PCI bus for SAC DMA transfers? If so, you need to inform the
				79	PCI layer of this fact.
				80
				81	By default, the kernel assumes that your device can address the full
				82	32-bits in a SAC cycle. For a 64-bit DAC capable device, this needs
				83	to be increased. And for a device with limitations, as discussed in
				84	the previous paragraph, it needs to be decreased.
				85
				86	pci_alloc_consistent() by default will return 32-bit DMA addresses.
				87	PCI-X specification requires PCI-X devices to support 64-bit
				88	addressing (DAC) for all transactions. And at least one platform (SGI
				89	SN2) requires 64-bit consistent allocations to operate correctly when
				90	the IO bus is in PCI-X mode. Therefore, like with pci_set_dma_mask(),
				91	it's good practice to call pci_set_consistent_dma_mask() to set the
				92	appropriate mask even if your device only supports 32-bit DMA
				93	(default) and especially if it's a PCI-X device.
				94
				95	For correct operation, you must interrogate the PCI layer in your
				96	device probe routine to see if the PCI controller on the machine can
				97	properly support the DMA addressing limitation your device has. It is
				98	good style to do this even if your device holds the default setting,
				99	because this shows that you did think about these issues wrt. your
				100	device.
				101
				102	The query is performed via a call to pci_set_dma_mask():
				103
				104	int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask);
				105
				106	The query for consistent allocations is performed via a call to
				107	pci_set_consistent_dma_mask():
				108
				109	int pci_set_consistent_dma_mask(struct pci_dev *pdev, u64 device_mask);
				110
				111	Here, pdev is a pointer to the PCI device struct of your device, and
				112	device_mask is a bit mask describing which bits of a PCI address your
				113	device supports. It returns zero if your card can perform DMA
				114	properly on the machine given the address mask you provided.
				115
				116	If it returns non-zero, your device can not perform DMA properly on
				117	this platform, and attempting to do so will result in undefined
				118	behavior. You must either use a different mask, or not use DMA.
				119
				120	This means that in the failure case, you have three options:
				121
				122	1) Use another DMA mask, if possible (see below).
				123	2) Use some non-DMA mode for data transfer, if possible.
				124	3) Ignore this device and do not initialize it.
				125
				126	It is recommended that your driver print a kernel KERN_WARNING message
				127	when you end up performing either #2 or #3. In this manner, if a user
				128	of your driver reports that performance is bad or that the device is not
				129	even detected, you can ask them for the kernel messages to find out
				130	exactly why.
				131
				132	The standard 32-bit addressing PCI device would do something like
				133	this:
				134
				135	if (pci_set_dma_mask(pdev, DMA_32BIT_MASK)) {
				136		printk(KERN_WARNING
				137		       "mydev: No suitable DMA available.\n");
				138		goto ignore_this_device;
				139	}
				140
				141	Another common scenario is a 64-bit capable device. The approach
				142	here is to try for 64-bit DAC addressing, but back down to a
				143	32-bit mask should that fail. The PCI platform code may fail the
				144	64-bit mask not because the platform is not capable of 64-bit
				145	addressing. Rather, it may fail in this case simply because
				146	32-bit SAC addressing is done more efficiently than DAC addressing.
				147	Sparc64 is one platform which behaves in this way.
				148
				149	Here is how you would handle a 64-bit capable device which can drive
				150	all 64-bits when accessing streaming DMA:
				151
				152	int using_dac;
				153
				154	if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)) {
				155		using_dac = 1;
				156	} else if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)) {
				157		using_dac = 0;
				158	} else {
				159		printk(KERN_WARNING
				160		       "mydev: No suitable DMA available.\n");
				161		goto ignore_this_device;
				162	}
				163
				164	If a card is capable of using 64-bit consistent allocations as well,
				165	the case would look like this:
				166
				167	int using_dac, consistent_using_dac;
				168
				169	if (!pci_set_dma_mask(pdev, DMA_64BIT_MASK)) {
				170		using_dac = 1;
				171		consistent_using_dac = 1;
				172		pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK);
				173	} else if (!pci_set_dma_mask(pdev, DMA_32BIT_MASK)) {
				174		using_dac = 0;
				175		consistent_using_dac = 0;
				176		pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK);
				177	} else {
				178		printk(KERN_WARNING
				179		       "mydev: No suitable DMA available.\n");
				180		goto ignore_this_device;
				181	}
				182
				183	pci_set_consistent_dma_mask() will always be able to set the same or a
				184	smaller mask as pci_set_dma_mask(). However for the rare case that a
				185	device driver only uses consistent allocations, one would have to
				186	check the return value from pci_set_consistent_dma_mask().
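The fallback order described above, including a check of the consistent mask's return value, can be sketched in user space. This is only a model under stated assumptions: struct fake_pdev, the fake_* functions and the FAKE_* constants are hypothetical stand-ins, not kernel interfaces, and the fake platform simply rejects any mask wider than its limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_DMA_64BIT_MASK UINT64_C(0xffffffffffffffff)
#define FAKE_DMA_32BIT_MASK UINT64_C(0x00000000ffffffff)

/* Hypothetical stand-in for struct pci_dev: only the fields we model. */
struct fake_pdev {
	uint64_t platform_limit;	/* widest mask the fake platform accepts */
	uint64_t dma_mask;		/* streaming mask currently in effect */
	uint64_t consistent_dma_mask;	/* consistent mask currently in effect */
};

/* Stub with the documented contract: 0 on success, non-zero on failure. */
static int fake_set_dma_mask(struct fake_pdev *pdev, uint64_t mask)
{
	if (mask > pdev->platform_limit)
		return -1;
	pdev->dma_mask = mask;
	return 0;
}

static int fake_set_consistent_dma_mask(struct fake_pdev *pdev, uint64_t mask)
{
	if (mask > pdev->platform_limit)
		return -1;
	pdev->consistent_dma_mask = mask;
	return 0;
}

/*
 * Try the 64-bit mask first, then fall back to 32-bit.  Both the
 * streaming and the consistent mask must be accepted.  Returns 0 and
 * sets *using_dac on success, or -1 if no mask is acceptable.
 */
static int fake_setup_dma(struct fake_pdev *pdev, int *using_dac)
{
	static const uint64_t masks[] = {
		FAKE_DMA_64BIT_MASK, FAKE_DMA_32BIT_MASK
	};

	assert(pdev != NULL && using_dac != NULL);
	for (size_t i = 0; i < sizeof(masks) / sizeof(masks[0]); i++) {
		if (fake_set_dma_mask(pdev, masks[i]))
			continue;
		if (fake_set_consistent_dma_mask(pdev, masks[i]))
			continue;
		*using_dac = (masks[i] == FAKE_DMA_64BIT_MASK);
		return 0;
	}
	return -1;
}
```

A real driver would print a KERN_WARNING message and skip the device when the equivalent of fake_setup_dma() fails.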
				187
				188	If your 64-bit device is going to be an enormous consumer of DMA
				189	mappings, this can be problematic since the DMA mappings are a
				190	finite resource on many platforms. Please see the "DAC Addressing
				191	for Address Space Hungry Devices" section near the end of this
				192	document for how to handle this case.
				193
				194	Finally, if your device can only drive the low 24-bits of
				195	address during PCI bus mastering you might do something like:
				196
				197	if (pci_set_dma_mask(pdev, 0x00ffffff)) {
				198		printk(KERN_WARNING
				199		       "mydev: 24-bit DMA addressing not available.\n");
				200		goto ignore_this_device;
				201	}
				202
				203	When pci_set_dma_mask() is successful, and returns zero, the PCI layer
				204	saves away this mask you have provided. The PCI layer will use this
				205	information later when you make DMA mappings.
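To make the meaning of a mask concrete, here is a small user-space sketch of the arithmetic involved. The helper names dma_mask_from_bits() and dma_range_fits_mask() are hypothetical and are not part of the PCI layer. The sketch assumes a mask made of contiguous low-order bits, as in all of the examples above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a mask with the low `bits` address bits set (1..64). */
static uint64_t dma_mask_from_bits(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	/* Shifting a 64-bit value by 64 is undefined in C, so special-case it. */
	return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/* True if every byte of [addr, addr + len) is reachable under `mask`. */
static bool dma_range_fits_mask(uint64_t addr, uint64_t len, uint64_t mask)
{
	uint64_t last;

	if (len == 0)
		return true;
	last = addr + (len - 1);
	if (last < addr)	/* the range wrapped around 2^64 */
		return false;
	return last <= mask;
}
```

With a 24-bit mask, a 4096-byte buffer at 0x00fff000 is still addressable, while one byte more would reach past the 16MB limit.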
				206
				207	There is a case which we are aware of at this time, which is worth
				208	mentioning in this documentation. If your device supports multiple
				209	functions (for example a sound card provides playback and record
				210	functions) and the various different functions have _different_
				211	DMA addressing limitations, you may wish to probe each mask and
				212	only provide the functionality which the machine can handle. It
				213	is important that the last call to pci_set_dma_mask() be for the
				214	most specific mask.
				215
				216	Here is pseudo-code showing how this might be done:
				217
				218	#define PLAYBACK_ADDRESS_BITS DMA_32BIT_MASK
				219	#define RECORD_ADDRESS_BITS 0x00ffffff
				220
				221	struct my_sound_card *card;
				222	struct pci_dev *pdev;
				223
				224	...
				225	if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) {
				226		card->playback_enabled = 1;
				227	} else {
				228		card->playback_enabled = 0;
				229		printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n",
				230		       card->name);
				231	}
				232	if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) {
				233		card->record_enabled = 1;
				234	} else {
				235		card->record_enabled = 0;
				236		printk(KERN_WARNING "%s: Record disabled due to DMA limitations.\n",
				237		       card->name);
				238	}
				239
				240	A sound card was used as an example here because this genre of PCI
				241	devices seems to be littered with ISA chips given a PCI front end,
				242	and thus retaining the 16MB DMA addressing limitations of ISA.
				243
				244	Types of DMA mappings
				245
				246	There are two types of DMA mappings:
				247
				248	- Consistent DMA mappings which are usually mapped at driver
				249	initialization, unmapped at the end and for which the hardware should
				250	guarantee that the device and the CPU can access the data
				251	in parallel and will see updates made by each other without any
				252	explicit software flushing.
				253
				254	Think of "consistent" as "synchronous" or "coherent".
				255
				256	The current default is to return consistent memory in the low 32
				257	bits of the PCI bus space. However, for future compatibility you
				258	should set the consistent mask even if this default is fine for your
				259	driver.
				260
				261	Good examples of what to use consistent mappings for are:
				262
				263	- Network card DMA ring descriptors.
				264	- SCSI adapter mailbox command data structures.
				265	- Device firmware microcode executed out of
				266	main memory.
				267
				268	The invariant these examples all require is that any CPU store
				269	to memory is immediately visible to the device, and vice
				270	versa. Consistent mappings guarantee this.
				271
				272	IMPORTANT: Consistent DMA memory does not preclude the usage of
				273	proper memory barriers. The CPU may reorder stores to
				274	consistent memory just as it may normal memory. Example:
				275	if it is important for the device to see the first word
				276	of a descriptor updated before the second, you must do
				277	something like:
				278
				279	desc->word0 = address;
				280	wmb();
				281	desc->word1 = DESC_VALID;
				282
				283	in order to get correct behavior on all platforms.
				284
				285	- Streaming DMA mappings which are usually mapped for one DMA transfer,
				286	unmapped right after it (unless you use pci_dma_sync_* below) and for which
				287	hardware can optimize for sequential accesses.
				288
				289	Think of "streaming" as "asynchronous" or "outside the coherency
				290	domain".
				291
				292	Good examples of what to use streaming mappings for are:
				293
				294	- Networking buffers transmitted/received by a device.
				295	- Filesystem buffers written/read by a SCSI device.
				296
				297	The interfaces for using this type of mapping were designed in
				298	such a way that an implementation can make whatever performance
				299	optimizations the hardware allows. To this end, when using
				300	such mappings you must be explicit about what you want to happen.
				301
				302	Neither type of DMA mapping has alignment restrictions that come
				303	from PCI, although some devices may have such restrictions.
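The descriptor-ordering rule given for consistent mappings (write word0, barrier, then write word1) has a rough user-space analogue in C11 atomics. The sketch below is only an analogy: it orders stores between two CPU threads, whereas a real driver must use wmb() so that the device sees the stores in order. struct sketch_desc and both helper functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DESC_VALID 1u

struct sketch_desc {
	uint32_t word0;			/* buffer address */
	_Atomic uint32_t word1;		/* status word, written last */
};

/* Producer: fill in the address, then mark the descriptor valid. */
static void sketch_desc_publish(struct sketch_desc *d, uint32_t address)
{
	d->word0 = address;
	/* Plays the role of wmb(): word0 is ordered before word1. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->word1, SKETCH_DESC_VALID,
			      memory_order_relaxed);
}

/* Consumer: returns 1 and stores the address if the descriptor is valid. */
static int sketch_desc_consume(struct sketch_desc *d, uint32_t *address)
{
	assert(address != NULL);
	if (atomic_load_explicit(&d->word1, memory_order_relaxed) !=
	    SKETCH_DESC_VALID)
		return 0;
	/* Pairs with the release fence in sketch_desc_publish(). */
	atomic_thread_fence(memory_order_acquire);
	*address = d->word0;
	return 1;
}
```

Without the fence, the compiler or the CPU would be free to make word1 visible before word0, which is exactly the failure the wmb() in the descriptor example prevents.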
				304
				305	Using Consistent DMA mappings.
				306
				307	To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
				308	you should do:
				309
				310	dma_addr_t dma_handle;
				311
				312	cpu_addr = pci_alloc_consistent(dev, size, &dma_handle);
				313
				314	where dev is a struct pci_dev *. You should pass NULL for PCI like buses
				315	where devices don't have struct pci_dev (like ISA, EISA). This may be
				316	called in interrupt context.
				317
				318	This argument is needed because the DMA translations may be bus
				319	specific (and are often private to the bus which the device is attached
				320	to).
				321
				322	Size is the length of the region you want to allocate, in bytes.
				323
				324	This routine will allocate RAM for that region, so it acts similarly to
				325	__get_free_pages (but takes size instead of a page order). If your
				326	driver needs regions sized smaller than a page, you may prefer using
				327	the pci_pool interface, described below.
				328
				329	The consistent DMA mapping interfaces, for non-NULL dev, will by
				330	default return a DMA address which is SAC (Single Address Cycle)
				331	addressable. Even if the device indicates (via PCI dma mask) that it
				332	may address the upper 32-bits and thus perform DAC cycles, consistent
				333	allocation will only return > 32-bit PCI addresses for DMA if the
				334	consistent dma mask has been explicitly changed via
				335	pci_set_consistent_dma_mask(). This is true of the pci_pool interface
				336	as well.
				337
				338	pci_alloc_consistent returns two values: the virtual address which you
				339	can use to access it from the CPU and dma_handle which you pass to the
				340	card.
				341
				342	The cpu return address and the DMA bus master address are both
				343	guaranteed to be aligned to the smallest PAGE_SIZE order which
				344	is greater than or equal to the requested size. This invariant
				345	exists (for example) to guarantee that if you allocate a chunk
				346	which is smaller than or equal to 64 kilobytes, the extent of the
				347	buffer you receive will not cross a 64K boundary.
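The alignment guarantee can be checked with a little arithmetic. The sketch below assumes 4096-byte pages, and the helper names are hypothetical. It computes the alignment implied by "the smallest PAGE_SIZE order which is greater than or equal to the requested size" and tests whether a buffer straddles a given boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)	/* assumption: 4 KiB pages */

/* Smallest SKETCH_PAGE_SIZE << order that is >= size. */
static uint64_t sketch_consistent_alignment(uint64_t size)
{
	uint64_t align = SKETCH_PAGE_SIZE;

	assert(size > 0 && size <= (UINT64_C(1) << 62));
	while (align < size)
		align <<= 1;
	return align;
}

/* True if [addr, addr + len) straddles a multiple of `boundary`. */
static bool sketch_crosses_boundary(uint64_t addr, uint64_t len,
				    uint64_t boundary)
{
	assert(len > 0 && boundary > 0);
	return (addr / boundary) != ((addr + len - 1) / boundary);
}
```

A 40000-byte request is aligned to 65536 bytes, so a buffer starting at 0x30000 ends inside the same 64K window. The same length at an address aligned only to a page could cross into the next window.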
				348
				349	To unmap and free such a DMA region, you call:
				350
				351	pci_free_consistent(dev, size, cpu_addr, dma_handle);
				352
				353	where dev, size are the same as in the above call and cpu_addr and
				354	dma_handle are the values pci_alloc_consistent returned to you.
				355	This function may not be called in interrupt context.
				356
				357	If your driver needs lots of smaller memory regions, you can write
				358	custom code to subdivide pages returned by pci_alloc_consistent,
				359	or you can use the pci_pool API to do that. A pci_pool is like
				360	a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages.
				361	Also, it understands common hardware constraints for alignment,
				362	like queue heads needing to be aligned on N byte boundaries.
				363
				364	Create a pci_pool like this:
				365
				366	struct pci_pool *pool;
				367
				368	pool = pci_pool_create(name, dev, size, align, alloc);
				369
				370	The "name" is for diagnostics (like a kmem_cache name); dev and size
				371	are as above. The device's hardware alignment requirement for this
				372	type of data is "align" (which is expressed in bytes, and must be a
				373	power of two). If your device has no boundary crossing restrictions,
				374	pass 0 for alloc; passing 4096 says memory allocated from this pool
				375	must not cross 4KByte boundaries (but at that time it may be better to
				376	go for pci_alloc_consistent directly instead).
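The interaction of "align" and the boundary argument can be illustrated with a placement helper. This is a sketch of the constraint only, not the pool allocator's actual implementation. sketch_pool_place() is a hypothetical name, and the sketch assumes the boundary is a power of two at least as large as both the alignment and the block size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the first offset >= `offset` at which a block of `size` bytes
 * satisfies `align` (a power of two) and does not cross a multiple of
 * `boundary`.  A boundary of 0 means "no crossing restriction".
 */
static uint64_t sketch_pool_place(uint64_t offset, uint64_t size,
				  uint64_t align, uint64_t boundary)
{
	assert(size > 0);
	assert(align != 0 && (align & (align - 1)) == 0);
	assert(boundary == 0 || (boundary >= align && boundary >= size));

	/* Round up to the required alignment. */
	offset = (offset + align - 1) & ~(align - 1);

	/* If the block would straddle a boundary, start at the next one. */
	if (boundary &&
	    (offset / boundary) != ((offset + size - 1) / boundary))
		offset = (offset / boundary + 1) * boundary;

	return offset;
}
```

For example, a 64-byte block with 32-byte alignment that would start at offset 4064 is pushed to 4096 when the boundary is 4096, but stays at 4064 when the boundary is 0.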
				377
				378	Allocate memory from a pci pool like this:
				379
				380	cpu_addr = pci_pool_alloc(pool, flags, &dma_handle);
				381
				382	flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor
				383	holding SMP locks), SLAB_ATOMIC otherwise. Like pci_alloc_consistent,
				384	this returns two values, cpu_addr and dma_handle.
				385
				386	Free memory that was allocated from a pci_pool like this:
				387
				388	pci_pool_free(pool, cpu_addr, dma_handle);
				389
				390	where pool is what you passed to pci_pool_alloc, and cpu_addr and
				391	dma_handle are the values pci_pool_alloc returned. This function
				392	may be called in interrupt context.
				393
				394	Destroy a pci_pool by calling:
				395
				396	pci_pool_destroy(pool);
				397
				398	Make sure you've called pci_pool_free for all memory allocated
				399	from a pool before you destroy the pool. This function may not
				400	be called in interrupt context.
				401
				402	DMA Direction
				403
				404	The interfaces described in subsequent portions of this document
				405	take a DMA direction argument, which is an integer and takes on
				406	one of the following values:
				407
				408	PCI_DMA_BIDIRECTIONAL
				409	PCI_DMA_TODEVICE
				410	PCI_DMA_FROMDEVICE
				411	PCI_DMA_NONE
				412
				413	One should provide the exact DMA direction if you know it.
				414
				415	PCI_DMA_TODEVICE means "from main memory to the PCI device"
				416	PCI_DMA_FROMDEVICE means "from the PCI device to main memory"
				417	It is the direction in which the data moves during the DMA
				418	transfer.
				419
				420	You are _strongly_ encouraged to specify this as precisely
				421	as you possibly can.
				422
				423	If you absolutely cannot know the direction of the DMA transfer,
				424	specify PCI_DMA_BIDIRECTIONAL. It means that the DMA can go in
				425	either direction. The platform guarantees that you may legally
				426	specify this, and that it will work, but this may be at the
				427	cost of performance for example.
				428
				429	The value PCI_DMA_NONE is to be used for debugging. One can
				430	hold this in a data structure before you come to know the
				431	precise direction, and this will help catch cases where your
				432	direction tracking logic has failed to set things up properly.
				433
				434	Another advantage of specifying this value precisely (outside of
				435	potential platform-specific optimizations of such) is for debugging.
				436	Some platforms actually have a write permission boolean which DMA
				437	mappings can be marked with, much like page protections in the user
				438	program address space. Such platforms can and do report errors in the
				439	kernel logs when the PCI controller hardware detects violation of the
				440	permission setting.
				441
				442	Only streaming mappings specify a direction; consistent mappings
				443	implicitly have a direction attribute setting of
				444	PCI_DMA_BIDIRECTIONAL.
				445
	be7db05	2005-04-17 15:26:13 -0500	[diff] [blame]	446	The SCSI subsystem tells you the direction to use in the
				447	'sc_data_direction' member of the SCSI command your driver is
				448	working on.
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	449
				450	For Networking drivers, it's a rather simple affair. For transmit
				451	packets, map/unmap them with the PCI_DMA_TODEVICE direction
				452	specifier. For receive packets, just the opposite, map/unmap them
				453	with the PCI_DMA_FROMDEVICE direction specifier.
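The use of PCI_DMA_NONE as a "not decided yet" marker can be sketched as follows. The enum and the sketch_request structure are hypothetical. The numeric values are chosen to mirror the PCI_DMA_* constants, but treat them as an assumption and use the real constants in a driver.

```c
#include <assert.h>

/* Assumed to mirror PCI_DMA_BIDIRECTIONAL .. PCI_DMA_NONE. */
enum sketch_dma_dir {
	SKETCH_DMA_BIDIRECTIONAL = 0,
	SKETCH_DMA_TODEVICE = 1,
	SKETCH_DMA_FROMDEVICE = 2,
	SKETCH_DMA_NONE = 3,
};

struct sketch_request {
	int is_transmit;		/* 1 for transmit, 0 for receive */
	enum sketch_dma_dir dir;	/* direction used for map and unmap */
};

static void sketch_request_init(struct sketch_request *req, int is_transmit)
{
	req->is_transmit = is_transmit;
	req->dir = SKETCH_DMA_NONE;	/* direction not decided yet */
}

/* Networking rule: transmit goes to the device, receive comes from it. */
static void sketch_request_set_dir(struct sketch_request *req)
{
	assert(req->dir == SKETCH_DMA_NONE);	/* set exactly once */
	req->dir = req->is_transmit ? SKETCH_DMA_TODEVICE
				    : SKETCH_DMA_FROMDEVICE;
}

/* Returns the direction to pass to map/unmap, or -1 if it was never set. */
static int sketch_request_dir(const struct sketch_request *req)
{
	if (req->dir == SKETCH_DMA_NONE)
		return -1;
	return (int)req->dir;
}
```

Storing the direction once and reusing it for both the map and the unmap call keeps the two in agreement.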
				454
				455	Using Streaming DMA mappings
				456
				457	The streaming DMA mapping routines can be called from interrupt
				458	context. There are two versions of each map/unmap, one which will
				459	map/unmap a single memory region, and one which will map/unmap a
				460	scatterlist.
				461
				462	To map a single region, you do:
				463
				464	struct pci_dev *pdev = mydev->pdev;
				465	dma_addr_t dma_handle;
				466	void *addr = buffer->ptr;
				467	size_t size = buffer->len;
				468
				469	dma_handle = pci_map_single(pdev, addr, size, direction);
				470
				471	and to unmap it:
				472
				473	pci_unmap_single(pdev, dma_handle, size, direction);
				474
				475	You should call pci_unmap_single when the DMA activity is finished, e.g.
				476	from the interrupt which told you that the DMA transfer is done.
				477
				478	Using cpu pointers like this for single mappings has a disadvantage:
				479	you cannot reference HIGHMEM memory in this way. Thus, there is a
				480	map/unmap interface pair akin to pci_{map,unmap}_single. These
				481	interfaces deal with page/offset pairs instead of cpu pointers.
				482	Specifically:
				483
				484	struct pci_dev *pdev = mydev->pdev;
				485	dma_addr_t dma_handle;
				486	struct page *page = buffer->page;
				487	unsigned long offset = buffer->offset;
				488	size_t size = buffer->len;
				489
				490	dma_handle = pci_map_page(pdev, page, offset, size, direction);
				491
				492	...
				493
				494	pci_unmap_page(pdev, dma_handle, size, direction);
				495
				496	Here, "offset" means byte offset within the given page.
				497
				498	With scatterlists, you map a region gathered from several regions by:
				499
				500	int i, count = pci_map_sg(dev, sglist, nents, direction);
				501	struct scatterlist *sg;
				502
				503	for (i = 0, sg = sglist; i < count; i++, sg++) {
				504		hw_address[i] = sg_dma_address(sg);
				505		hw_len[i] = sg_dma_len(sg);
				506	}
				507
				508	where nents is the number of entries in the sglist.
				509
				510	The implementation is free to merge several consecutive sglist entries
				511	into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any
				512	consecutive sglist entries can be merged into one provided the first one
				513	ends and the second one starts on a page boundary - in fact this is a huge
				514	advantage for cards which either cannot do scatter-gather or have very
				515	limited number of scatter-gather entries) and returns the actual number
				516	of sg entries it mapped them to. On failure 0 is returned.
				517
				518	Then you should loop count times (note: this can be less than nents times)
				519	and use sg_dma_address() and sg_dma_len() macros where you previously
				520	accessed sg->address and sg->length as shown above.
				521
				522	To unmap a scatterlist, just call:
				523
				524	pci_unmap_sg(dev, sglist, nents, direction);
				525
				526	Again, make sure DMA activity has already finished.
				527
				528	PLEASE NOTE: The 'nents' argument to the pci_unmap_sg call must be
				529	the _same_ one you passed into the pci_map_sg call,
				530	it should _NOT_ be the 'count' value _returned_ from the
				531	pci_map_sg call.
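Why 'count' can be smaller than 'nents' is easiest to see with a toy model of merging. The sketch below coalesces entries whose addresses happen to be contiguous. A real implementation merges in bus address space and may apply other rules, so this only illustrates the bookkeeping. struct sketch_sg and sketch_map_sg() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_sg {
	uint64_t addr;	/* start address of this entry */
	uint32_t len;	/* length in bytes */
};

/*
 * Copy `nents` entries from `sg` into `out`, merging each entry into
 * the previous output entry when it starts exactly where that one ends.
 * `out` must have room for `nents` entries.  Returns the number of
 * entries written, which plays the role of 'count'.
 */
static size_t sketch_map_sg(const struct sketch_sg *sg, size_t nents,
			    struct sketch_sg *out)
{
	size_t count = 0;

	assert(sg != NULL && out != NULL);
	for (size_t i = 0; i < nents; i++) {
		if (count > 0 &&
		    out[count - 1].addr + out[count - 1].len == sg[i].addr)
			out[count - 1].len += sg[i].len;	/* merge */
		else
			out[count++] = sg[i];
	}
	return count;
}
```

The device is programmed with 'count' entries, but the unmap call still takes the original 'nents', because that is how many entries the caller handed over.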
				532
				533	Every pci_map_{single,sg} call should have its pci_unmap_{single,sg}
				534	counterpart, because the bus address space is a shared resource (although
				535	in some ports the mapping is per bus, so fewer devices contend for the
				536	same bus address space) and you could render the machine unusable by eating
				537	all bus addresses.
				538
				539	If you need to use the same streaming DMA region multiple times and touch
				540	the data in between the DMA transfers, the buffer needs to be synced
				541	properly in order for the cpu and device to see the most up-to-date and
				542	correct copy of the DMA buffer.
				543
				544	So, firstly, just map it with pci_map_{single,sg}, and after each DMA
				545	transfer call either:
				546
				547	pci_dma_sync_single_for_cpu(dev, dma_handle, size, direction);
				548
				549	or:
				550
				551	pci_dma_sync_sg_for_cpu(dev, sglist, nents, direction);
				552
				553	as appropriate.
				554
				555	Then, if you wish to let the device get at the DMA area again,
				556	finish accessing the data with the cpu, and then before actually
				557	giving the buffer to the hardware call either:
				558
				559	pci_dma_sync_single_for_device(dev, dma_handle, size, direction);
				560
				561	or:
				562
				563	pci_dma_sync_sg_for_device(dev, sglist, nents, direction);
				564
				565	as appropriate.
				566
				567	After the last DMA transfer call one of the DMA unmap routines
				568	pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_*
				569	call till pci_unmap_, then you don't have to call the pci_dma_sync_
				570	routines at all.
				571
				572	Here is pseudo code which shows a situation in which you would need
				573	to use the pci_dma_sync_*() interfaces.
				574
				575	my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
				576	{
				577		dma_addr_t mapping;
				578
				579		mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE);
				580
				581		cp->rx_buf = buffer;
				582		cp->rx_len = len;
				583		cp->rx_dma = mapping;
				584
				585		give_rx_buf_to_card(cp);
				586	}
				587
				588	...
				589
				590	my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs)
				591	{
				592		struct my_card *cp = devid;
				593
				594		...
				595		if (read_card_status(cp) == RX_BUF_TRANSFERRED) {
				596			struct my_card_header *hp;
				597
				598			/* Examine the header to see if we wish
				599			 * to accept the data. But synchronize
				600			 * the DMA transfer with the CPU first
				601			 * so that we see updated contents.
				602			 */
				603			pci_dma_sync_single_for_cpu(cp->pdev, cp->rx_dma,
				604						    cp->rx_len,
				605						    PCI_DMA_FROMDEVICE);
				606
				607			/* Now it is safe to examine the buffer. */
				608			hp = (struct my_card_header *) cp->rx_buf;
				609			if (header_is_ok(hp)) {
				610				pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len,
				611						 PCI_DMA_FROMDEVICE);
				612				pass_to_upper_layers(cp->rx_buf);
				613				make_and_setup_new_rx_buf(cp);
				614			} else {
				615				/* Just sync the buffer and give it back
				616				 * to the card.
				617				 */
				618				pci_dma_sync_single_for_device(cp->pdev,
				619							       cp->rx_dma,
				620							       cp->rx_len,
				621							       PCI_DMA_FROMDEVICE);
				622				give_rx_buf_to_card(cp);
				623			}
				624		}
				625	}
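The ownership rules that the pseudo code above relies on can be written down as a small state machine, which may be handy as a debugging aid. Everything in this sketch is hypothetical: it models who may touch a streaming buffer after each call and does not perform any DMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_owner {
	SKETCH_UNMAPPED,	/* not mapped; the cpu may use it freely */
	SKETCH_DEVICE_OWNS,	/* mapped; only the device may touch it */
	SKETCH_CPU_OWNS,	/* mapped, but synced for cpu access */
};

struct sketch_buf {
	enum sketch_owner owner;
};

/* Move from `from` to `to`; refuse if the buffer is in another state. */
static bool sketch_transition(struct sketch_buf *b, enum sketch_owner from,
			      enum sketch_owner to)
{
	assert(b != NULL);
	if (b->owner != from)
		return false;
	b->owner = to;
	return true;
}

static bool sketch_map(struct sketch_buf *b)
{
	return sketch_transition(b, SKETCH_UNMAPPED, SKETCH_DEVICE_OWNS);
}

static bool sketch_sync_for_cpu(struct sketch_buf *b)
{
	return sketch_transition(b, SKETCH_DEVICE_OWNS, SKETCH_CPU_OWNS);
}

static bool sketch_sync_for_device(struct sketch_buf *b)
{
	return sketch_transition(b, SKETCH_CPU_OWNS, SKETCH_DEVICE_OWNS);
}

/* Unmapping is legal from either mapped state. */
static bool sketch_unmap(struct sketch_buf *b)
{
	assert(b != NULL);
	if (b->owner == SKETCH_UNMAPPED)
		return false;
	b->owner = SKETCH_UNMAPPED;
	return true;
}

static bool sketch_cpu_may_touch(const struct sketch_buf *b)
{
	return b->owner != SKETCH_DEVICE_OWNS;
}
```

In the interrupt handler above, the "header is bad" path corresponds to sync_for_cpu followed by sync_for_device, and the "header is ok" path to sync_for_cpu followed by unmap.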
				626
				627	Drivers converted fully to this interface should not use virt_to_bus any
				628	longer, nor should they use bus_to_virt. Some drivers have to be changed a
				629	little bit, because there is no longer an equivalent to bus_to_virt in the
				630	dynamic DMA mapping scheme - you have to always store the DMA addresses
				631	returned by the pci_alloc_consistent, pci_pool_alloc, and pci_map_single
				632	calls (pci_map_sg stores them in the scatterlist itself if the platform
				633	supports dynamic DMA mapping in hardware) in your driver structures and/or
				634	in the card registers.
				635
				636	All PCI drivers should be using these interfaces with no exceptions.
				637	It is planned to completely remove virt_to_bus() and bus_to_virt() as
				638	they are entirely deprecated. Some ports already do not provide these
				639	as it is impossible to correctly support them.
				640
				641	64-bit DMA and DAC cycle support
				642
				643	Do you understand all of the text above? Great, then you already
				644	know how to use 64-bit DMA addressing under Linux. Simply make
				645	the appropriate pci_set_dma_mask() calls based upon your card's
				646	capabilities, then use the mapping APIs above.
				647
				648	It is that simple.
				649
				650	Well, not for some odd devices. See the next section for information
				651	about that.
				652
				653	DAC Addressing for Address Space Hungry Devices
				654
				655	There exists a class of devices which do not mesh well with the PCI
				656	DMA mapping API. By definition these "mappings" are a finite
				657	resource. The number of total available mappings per bus is platform
				658	specific, but there will always be a reasonable amount.
				659
				660	What is "reasonable"? Reasonable means that networking and block I/O
				661	devices need not worry about using too many mappings.
				662
				663	As an example of a problematic device, consider compute cluster cards.
				664	They can potentially need to access gigabytes of memory at once via
				665	DMA. Dynamic mappings are unsuitable for this kind of access pattern.
				666
				667	To this end we've provided a small API by which a device driver
				668	may use DAC cycles to directly address all of physical memory.
				669	Not all platforms support this, but most do. It is easy to determine
				670	whether the platform will work properly at probe time.
				671
				672	First, understand that there may be a SEVERE performance penalty for
				673	using these interfaces on some platforms. Therefore, you MUST only
				674	use these interfaces if it is absolutely required. 99% of devices can
				675	use the normal APIs without any problems.
				676
				677	Note that for streaming type mappings you must either use these
				678	interfaces, or the dynamic mapping interfaces above. You may not mix
				679	usage of both for the same device. Such an act is illegal and is
				680	guaranteed to put a banana in your tailpipe.
				681
				682	However, consistent mappings may in fact be used in conjunction with
				683	these interfaces. Remember that, as defined, consistent mappings are
				684	always going to be SAC addressable.
				685
				686	The first thing your driver needs to do is query the PCI platform
				687	layer with your device's DAC addressing capabilities:
				688
				689	int pci_dac_set_dma_mask(struct pci_dev *pdev, u64 mask);
				690
				691	This routine behaves identically to pci_set_dma_mask. You may not
				692	use the following interfaces if this routine fails.
				693
				694	Next, DMA addresses using this API are kept track of using the
				695	dma64_addr_t type. It is guaranteed to be big enough to hold any
				696	DAC address the platform layer will give to you from the following
				697	routines. If you have consistent mappings as well, you still
				698	use plain dma_addr_t to keep track of those.
				699
				700	All mappings obtained here will be direct. The mappings are not
				701	translated, and this is the purpose of this dialect of the DMA API.
				702
				703	All routines work with page/offset pairs. This is the _ONLY_ way to
				704	portably refer to any piece of memory. If you have a cpu pointer
				705	(which may be validly DMA'd too) you may easily obtain the page
				706	and offset using something like this:
				707
				708	struct page *page = virt_to_page(ptr);
				709	unsigned long offset = offset_in_page(ptr);
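
The split into page and offset can be pictured with a small userspace
sketch. It is not kernel code: the sketch_* names are hypothetical, a
4 KiB page size is assumed (the real PAGE_SIZE is per-architecture),
and the page is represented by its base address rather than by a
struct page pointer as virt_to_page() would return.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT depends on the platform. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	((uintptr_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))

/* Byte offset of addr within its page (models offset_in_page()). */
static uintptr_t sketch_offset_in_page(uintptr_t addr)
{
	return addr & ~SKETCH_PAGE_MASK;
}

/* Address of the first byte of the page that contains addr. */
static uintptr_t sketch_page_base(uintptr_t addr)
{
	return addr & SKETCH_PAGE_MASK;
}
```

The two results always add back up to the original address, which is
why a page/offset pair loses no information compared with a pointer.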
				710
				711	Here are the interfaces:
				712
				713	dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev,
				714			struct page *page,
				715			unsigned long offset,
				716			int direction);
				717
				718	The DAC address for the tuple PAGE/OFFSET is returned. The direction
				719	argument is the same as for pci_{map,unmap}_single(). The same rules
				720	for cpu/device access apply here as for the streaming mapping
				721	interfaces. To reiterate:
				722
				723	The cpu may touch the buffer before pci_dac_page_to_dma() is called.
				724	The device may touch the buffer after pci_dac_page_to_dma() has
				725	been called, but the cpu may NOT.
				726
				727	When the DMA transfer is complete, invoke:
				728
				729	void pci_dac_dma_sync_single_for_cpu(struct pci_dev *pdev,
				730			dma64_addr_t dma_addr,
				731			size_t len, int direction);
				732
				733	This must be done before the CPU looks at the buffer again.
				734	This interface behaves identically to pci_dma_sync_{single,sg}_for_cpu().
				735
				736	And likewise, if you wish to let the device get back at the buffer after
				737	the cpu has read/written it, invoke:
				738
				739	void pci_dac_dma_sync_single_for_device(struct pci_dev *pdev,
				740			dma64_addr_t dma_addr,
				741			size_t len, int direction);
				742
				743	before letting the device access the DMA area again.
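
The access rules for these calls can be modelled as a two-state
ownership machine. The sketch below is a userspace illustration only;
the sketch_* names are hypothetical and nothing here touches real
hardware. Each function stands for the PCI call named in its comment.

```c
#include <stdbool.h>

enum sketch_owner { SKETCH_OWNER_CPU, SKETCH_OWNER_DEVICE };

struct sketch_buf {
	enum sketch_owner owner;
};

/* Models pci_dac_page_to_dma(): the buffer is handed to the device. */
static void sketch_map(struct sketch_buf *b)
{
	b->owner = SKETCH_OWNER_DEVICE;
}

/* Models pci_dac_dma_sync_single_for_cpu(): the cpu may look again. */
static void sketch_sync_for_cpu(struct sketch_buf *b)
{
	b->owner = SKETCH_OWNER_CPU;
}

/* Models pci_dac_dma_sync_single_for_device(): back to the device. */
static void sketch_sync_for_device(struct sketch_buf *b)
{
	b->owner = SKETCH_OWNER_DEVICE;
}

/* True only while the cpu is allowed to read or write the buffer. */
static bool sketch_cpu_may_touch(const struct sketch_buf *b)
{
	return b->owner == SKETCH_OWNER_CPU;
}
```

A driver that keeps such a flag in debug builds could check
sketch_cpu_may_touch() before each cpu access to catch ordering bugs.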
				744
				745	If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t
				746	the following interfaces are provided:
				747
				748	struct page *pci_dac_dma_to_page(struct pci_dev *pdev,
				749			dma64_addr_t dma_addr);
				750	unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev,
				751			dma64_addr_t dma_addr);
				752
				753	This is possible with the DAC interfaces purely because they are
				754	not translated in any way.
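
To see why an untranslated mapping can be inverted, consider this
userspace sketch. It assumes 4 KiB pages and a bus address that is
simply the physical address; a real platform may add a fixed,
platform-specific bias. The direct_* names are hypothetical, and a
page frame number stands in for the struct page pointer.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define DIRECT_PAGE_SHIFT	12
#define DIRECT_OFFSET_MASK	((UINT64_C(1) << DIRECT_PAGE_SHIFT) - 1)

typedef uint64_t direct_dma64_addr_t;

/* Forward direction: page frame number plus offset gives the address. */
static direct_dma64_addr_t direct_page_to_dma(uint64_t pfn,
					      unsigned long offset)
{
	return (pfn << DIRECT_PAGE_SHIFT) + offset;
}

/* Reverse direction: the high bits recover the page frame number. */
static uint64_t direct_dma_to_pfn(direct_dma64_addr_t dma)
{
	return dma >> DIRECT_PAGE_SHIFT;
}

/* Reverse direction: the low bits recover the offset within the page. */
static unsigned long direct_dma_to_offset(direct_dma64_addr_t dma)
{
	return (unsigned long)(dma & DIRECT_OFFSET_MASK);
}
```

With an IOMMU-translated mapping the bus address is an arbitrary
handle, so no such arithmetic inverse exists.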
				755
				756	Optimizing Unmap State Space Consumption
				757
				758	On many platforms, pci_unmap_{single,page}() is simply a nop.
				759	Therefore, keeping track of the mapping address and length is a waste
				760	of space. Instead of filling your drivers up with ifdefs and the like
				761	to "work around" this (which would defeat the whole purpose of a
				762	portable API) the following facilities are provided.
				763
				764	Actually, instead of describing the macros one by one, we'll
				765	transform some example code.
				766
				767	1) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures.
				768	Example, before:
				769
				770	struct ring_state {
				771		struct sk_buff *skb;
				772		dma_addr_t mapping;
				773		__u32 len;
				774	};
				775
				776	after:
				777
				778	struct ring_state {
				779		struct sk_buff *skb;
				780		DECLARE_PCI_UNMAP_ADDR(mapping)
				781		DECLARE_PCI_UNMAP_LEN(len)
				782	};
				783
				784	NOTE: DO NOT put a semicolon at the end of the DECLARE_*()
				785	macro.
				786
				787	2) Use pci_unmap_{addr,len}_set to set these values.
				788	Example, before:
				789
				790	ringp->mapping = FOO;
				791	ringp->len = BAR;
				792
				793	after:
				794
				795	pci_unmap_addr_set(ringp, mapping, FOO);
				796	pci_unmap_len_set(ringp, len, BAR);
				797
				798	3) Use pci_unmap_{addr,len} to access these values.
				799	Example, before:
				800
				801	pci_unmap_single(pdev, ringp->mapping, ringp->len,
				802			PCI_DMA_FROMDEVICE);
				803
				804	after:
				805
				806	pci_unmap_single(pdev,
				807			pci_unmap_addr(ringp, mapping),
				808			pci_unmap_len(ringp, len),
				809			PCI_DMA_FROMDEVICE);
				810
				811	It really should be self-explanatory. We treat the ADDR and LEN
				812	separately, because it is possible for an implementation to only
				813	need the address in order to perform the unmap operation.
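
The following userspace sketch shows one way such macros could be
defined. It is an illustration, not a copy of any architecture's
header: SKETCH_NEED_UNMAP_STATE is a hypothetical switch, dma_addr_t
is a stand-in typedef, and a void pointer replaces struct sk_buff.

```c
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* stand-in for the kernel type */

#ifndef SKETCH_NEED_UNMAP_STATE
#define SKETCH_NEED_UNMAP_STATE 1	/* hypothetical platform switch */
#endif

#if SKETCH_NEED_UNMAP_STATE
/* Unmap needs the values: the macros declare and access real fields. */
#define DECLARE_PCI_UNMAP_ADDR(name)		dma_addr_t name;
#define DECLARE_PCI_UNMAP_LEN(name)		uint32_t name;
#define pci_unmap_addr(ptr, name)		((ptr)->name)
#define pci_unmap_addr_set(ptr, name, val)	(((ptr)->name) = (val))
#define pci_unmap_len(ptr, name)		((ptr)->name)
#define pci_unmap_len_set(ptr, name, val)	(((ptr)->name) = (val))
#else
/* Unmap is a nop: the fields vanish and the accessors do nothing. */
#define DECLARE_PCI_UNMAP_ADDR(name)
#define DECLARE_PCI_UNMAP_LEN(name)
#define pci_unmap_addr(ptr, name)		(0)
#define pci_unmap_addr_set(ptr, name, val)	do { } while (0)
#define pci_unmap_len(ptr, name)		(0)
#define pci_unmap_len_set(ptr, name, val)	do { } while (0)
#endif

struct ring_state {
	void *skb;			/* stand-in for struct sk_buff * */
	DECLARE_PCI_UNMAP_ADDR(mapping)	/* no semicolon here */
	DECLARE_PCI_UNMAP_LEN(len)
};
```

In this sketch the declaring macros supply their own semicolon. If
the caller added another one, the empty variant would leave a stray
";" inside the structure, which strict compilers reject.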
				814
				815	Platform Issues
				816
				817	If you are just writing drivers for Linux and do not maintain
				818	an architecture port for the kernel, you can safely skip down
				819	to "Closing".
				820
				821	1) Struct scatterlist requirements.
				822
				823	Struct scatterlist must contain, at a minimum, the following
				824	members:
				825
				826	struct page *page;
				827	unsigned int offset;
				828	unsigned int length;
				829
				830	The base address is specified by a "page+offset" pair.
				831
				832	Previous versions of struct scatterlist contained a "void *address"
				833	field that was sometimes used instead of page+offset. As of Linux
				834	2.5, page+offset is always used, and the "address" field has been
				835	deleted.
				836
				837	2) More to come...
				838
				839	Handling Errors
				840
				841	DMA address space is limited on some architectures and an allocation
				842	failure can be determined by:
				843
				844	- checking if pci_alloc_consistent returns NULL or pci_map_sg returns 0
				845
				846	- checking the returned dma_addr_t of pci_map_single and pci_map_page
				847	by using pci_dma_mapping_error():
				848
				849	dma_addr_t dma_handle;
				850
				851	dma_handle = pci_map_single(dev, addr, size, direction);
				852	if (pci_dma_mapping_error(dma_handle)) {
				853		/*
				854		 * reduce current DMA mapping usage,
				855		 * delay and try again later or
				856		 * reset driver.
				857		 */
				858	}
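
As an illustration of "reduce current DMA mapping usage", the sketch
below maps a batch of buffers against a mock mapper and releases the
mappings already made when one fails. Everything here is hypothetical
userspace code: the mock_* names, the slot counter and the all-ones
error value are assumptions, not the PCI API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t mock_dma_addr_t;
#define MOCK_DMA_ERROR	((mock_dma_addr_t)~UINT64_C(0))

/* Mock of a limited DMA address space: a counter of free slots. */
struct mock_iommu {
	size_t free_slots;
};

static mock_dma_addr_t mock_map_single(struct mock_iommu *mmu, uintptr_t addr)
{
	if (mmu->free_slots == 0)
		return MOCK_DMA_ERROR;	/* address space exhausted */
	mmu->free_slots--;
	return (mock_dma_addr_t)addr;
}

static bool mock_dma_mapping_error(mock_dma_addr_t handle)
{
	return handle == MOCK_DMA_ERROR;
}

static void mock_unmap_single(struct mock_iommu *mmu)
{
	mmu->free_slots++;
}

/* Map count buffers; on any failure undo the earlier mappings. */
static bool mock_map_all(struct mock_iommu *mmu, const uintptr_t *addrs,
			 size_t count, mock_dma_addr_t *handles)
{
	size_t i;

	for (i = 0; i < count; i++) {
		handles[i] = mock_map_single(mmu, addrs[i]);
		if (mock_dma_mapping_error(handles[i])) {
			while (i-- > 0)
				mock_unmap_single(mmu);
			return false;
		}
	}
	return true;
}
```

The caller can then delay and retry the whole batch, or give up and
reset, without leaking mappings from the failed attempt.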
				859
				860	Closing
				861
				862	This document, and the API itself, would not be in its current
				863	form without the feedback and suggestions from numerous individuals.
				864	We would like to specifically mention, in no particular order, the
				865	following people:
				866
				867	Russell King <rmk@arm.linux.org.uk>
				868	Leo Dagum <dagum@barrel.engr.sgi.com>
				869	Ralf Baechle <ralf@oss.sgi.com>
				870	Grant Grundler <grundler@cup.hp.com>
				871	Jay Estabrook <Jay.Estabrook@compaq.com>
				872	Thomas Sailer <sailer@ife.ee.ethz.ch>
				873	Andrea Arcangeli <andrea@suse.de>
				874	Jens Axboe <axboe@suse.de>
				875	David Mosberger-Tang <davidm@hpl.hp.com>