ARM: 6411/1: vexpress: set RAM latencies to 1 cycle for PL310 on ct-ca9x4 tile

The PL310 on the ct-ca9x4 tile for the Versatile Express does not need
to add additional latency when accessing its cache RAMs. Unfortunately,
the boot monitor sets this up for an 8-cycle delay on reads and writes,
resulting in greatly reduced memory performance when the L2 cache is
enabled.

This patch sets the L2 RAM latencies to the correct value of 1 cycle
on the ct-ca9x4 tile before enabling the L2 cache.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
diff --git a/arch/arm/mach-vexpress/ct-ca9x4.c b/arch/arm/mach-vexpress/ct-ca9x4.c
index 1c9c13e..efb1270 100644
--- a/arch/arm/mach-vexpress/ct-ca9x4.c
+++ b/arch/arm/mach-vexpress/ct-ca9x4.c
@@ -227,7 +227,13 @@
 	int i;
 
 #ifdef CONFIG_CACHE_L2X0
-	l2x0_init(MMIO_P2V(CT_CA9X4_L2CC), 0x00400000, 0xfe0fffff);
+	void __iomem *l2x0_base = MMIO_P2V(CT_CA9X4_L2CC);
+
+	/* set RAM latencies to 1 cycle for this core tile. */
+	writel(0, l2x0_base + L2X0_TAG_LATENCY_CTRL);
+	writel(0, l2x0_base + L2X0_DATA_LATENCY_CTRL);
+
+	l2x0_init(l2x0_base, 0x00400000, 0xfe0fffff);
 #endif
 
 	clkdev_add_table(lookups, ARRAY_SIZE(lookups));