ARCv2: Support IO Coherency and permutations involving L1 and L2 caches

For an ARCv2 CPU, the following configurations are possible, each
affecting how caches are handled for data exchanged with peripherals
via DMA:
 [1] Only L1 cache exists
 [2] Both L1 and L2 exist, but no IO coherency unit
 [3] L1, L2 caches and IO coherency unit exist

The current implementation takes care of [1] and [2]. However,
support for [2] is implemented with a run-time check for SLC
existence, which is not optimal.

This patch introduces support for [3] and reworks how the DMA ops
are used. Instead of doing a run-time check every time a particular
DMA op is executed, we now have 3 different implementations of the
DMA ops and select the appropriate one during init.
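
As a rough sketch, the init-time selection could look like this (the
helper names and the slc_exists flag below are illustrative, not the
exact identifiers used by this patch; only ioc_exists is real):

	static void dma_wback_inv_l1(phys_addr_t start, unsigned long sz);
	static void dma_wback_inv_slc(phys_addr_t start, unsigned long sz);
	static void dma_wback_inv_nop(phys_addr_t start, unsigned long sz) {}

	/* bound once at init; DMA ops then call through the pointer */
	static void (*dma_wback_inv)(phys_addr_t start, unsigned long sz);

	void __init arc_dma_init(void)
	{
		if (ioc_exists)		/* [3] IOC snoops DMA traffic */
			dma_wback_inv = dma_wback_inv_nop;
		else if (slc_exists)	/* [2] L1 + SLC, no IOC */
			dma_wback_inv = dma_wback_inv_slc;
		else			/* [1] L1 only */
			dma_wback_inv = dma_wback_inv_l1;
	}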

To support the IOC we need to:
 [a] Implement empty DMA ops, because the IOC takes care of cache
     coherency for DMAed data (a sketch follows this list)
 [b] Route dma_alloc_coherent() via dma_alloc_noncoherent().
     This is required to make the IOC work in the first place, and
     it also serves as an optimization, since LD/ST to coherent
     buffers can be serviced from the caches w/o going all the way
     to memory
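
For [a], the sketch promised above: with the IOC snooping DMA traffic,
no manual flush/invalidate is needed, so the cache maintenance hooks
simply do nothing (names illustrative):

	static void dma_cache_wback_nop(phys_addr_t start, unsigned long sz) {}
	static void dma_cache_inv_nop(phys_addr_t start, unsigned long sz) {}
	static void dma_cache_wback_inv_nop(phys_addr_t start, unsigned long sz) {}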

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta:
  -Added some comments about IOC gains
  -Marked dma ops as static,
  -Massaged changelog a bit]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 57706a9..e039fac 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -19,6 +19,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/dma-debug.h>
 #include <linux/export.h>
+#include <asm/cache.h>
 #include <asm/cacheflush.h>
 
 /*
@@ -53,6 +54,20 @@
 {
 	void *paddr, *kvaddr;
 
+	/*
+	 * IOC relies on all data (even coherent DMA data) being in cache.
+	 * Thus allocate normal cached memory.
+	 *
+	 * The gains with IOC are two-pronged:
+	 *   -For streaming data, it elides the need for cache maintenance,
+	 *    saving cycles in the flush code and bus bandwidth, as the lines
+	 *    of a buffer no longer need to be flushed out to memory
+	 *   -For coherent data, Read/Write to buffers terminate early in the
+	 *    cache (vs. always going to memory - thus are faster)
+	 */
+	if (ioc_exists)
+		return dma_alloc_noncoherent(dev, size, dma_handle, gfp);
+
 	/* This is linear addr (0x8000_0000 based) */
 	paddr = alloc_pages_exact(size, gfp);
 	if (!paddr)
@@ -85,6 +100,9 @@
 void dma_free_coherent(struct device *dev, size_t size, void *kvaddr,
 		       dma_addr_t dma_handle)
 {
+	if (ioc_exists)
+		return dma_free_noncoherent(dev, size, kvaddr, dma_handle);
+
 	iounmap((void __force __iomem *)kvaddr);
 
 	free_pages_exact((void *)dma_handle, size);
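
A quick usage sketch (hypothetical driver code, not part of this
patch): the rerouting in [b] is transparent to callers, which still
allocate coherent buffers the usual way:

	dma_addr_t handle;
	void *buf = dma_alloc_coherent(dev, PAGE_SIZE, &handle, GFP_KERNEL);

	if (!buf)
		return -ENOMEM;
	/* with IOC, CPU accesses to buf are serviced from the caches */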