[ARM] Fix some corner cases in new mm initialisation

Document that the VMALLOC_END address must be aligned to 2MB since
it must align with a PGD boundary.

Allocate the vectors page early so that the flush_cache_all() later
will cause any dirty cache lines in the direct mapping will be safely
written back.

Move the flush_cache_all() to the second local_flush_cache_tlb() and
remove the now redundant first local_flush_cache_tlb().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt
index 4b1c93a..dc60455 100644
--- a/Documentation/arm/memory.txt
+++ b/Documentation/arm/memory.txt
@@ -1,7 +1,7 @@
 		Kernel Memory Layout on ARM Linux
 
 		Russell King <rmk@arm.linux.org.uk>
-			May 21, 2004 (2.6.6)
+		     November 17, 2005 (2.6.15)
 
 This document describes the virtual memory layout which the Linux
 kernel uses for ARM processors.  It indicates which regions are
@@ -37,6 +37,8 @@
 				mapping region.
 
 VMALLOC_END	feffffff	Free for platform use, recommended.
+				VMALLOC_END must be aligned to a 2MB
+				boundary.
 
 VMALLOC_START	VMALLOC_END-1	vmalloc() / ioremap() space.
 				Memory returned by vmalloc/ioremap will
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index c168f32..8b276ee 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -420,7 +420,8 @@
  * Set up device the mappings.  Since we clear out the page tables for all
  * mappings above VMALLOC_END, we will remove any debug device mappings.
  * This means you have to be careful how you debug this function, or any
- * called function.  (Do it by code inspection!)
+ * called function.  This means you can't use any function or debugging
+ * method which may touch any device, otherwise the kernel _will_ crash.
  */
 static void __init devicemaps_init(struct machine_desc *mdesc)
 {
@@ -428,6 +429,12 @@
 	unsigned long addr;
 	void *vectors;
 
+	/*
+	 * Allocate the vector page early.
+	 */
+	vectors = alloc_bootmem_low_pages(PAGE_SIZE);
+	BUG_ON(!vectors);
+
 	for (addr = VMALLOC_END; addr; addr += PGDIR_SIZE)
 		pmd_clear(pmd_off_k(addr));
 
@@ -461,12 +468,6 @@
 	create_mapping(&map);
 #endif
 
-	flush_cache_all();
-	local_flush_tlb_all();
-
-	vectors = alloc_bootmem_low_pages(PAGE_SIZE);
-	BUG_ON(!vectors);
-
 	/*
 	 * Create a mapping for the machine vectors at the high-vectors
 	 * location (0xffff0000).  If we aren't using high-vectors, also
@@ -491,12 +492,13 @@
 		mdesc->map_io();
 
 	/*
-	 * Finally flush the tlb again - this ensures that we're in a
-	 * consistent state wrt the writebuffer if the writebuffer needs
-	 * draining.  After this point, we can start to touch devices
-	 * again.
+	 * Finally flush the caches and tlb to ensure that we're in a
+	 * consistent state wrt the writebuffer.  This also ensures that
+	 * any write-allocated cache lines in the vector page are written
+	 * back.  After this point, we can start to touch devices again.
 	 */
 	local_flush_tlb_all();
+	flush_cache_all();
 }
 
 /*