ARC: [mm] optimise icache flush for user mappings

ARC icache doesn't snoop dcache thus executable pages need to be made
coherent before mapping into userspace in flush_icache_page().

However ARC700 CDU (hardware cache flush module) requires both vaddr
(index in cache) as well as paddr (tag match) to correctly identify a
line in the VIPT cache. A typical ARC700 SoC has aliasing icache, thus
the paddr only based flush_icache_page() API couldn't be implemented
efficiently. It had to loop thru all possible alias indexes and perform
the invalidate operation (ofcourse the cache op would only succeed at
the index(es) where tag matches - typically only 1, but the cost of
visiting all the cache-bins needs to paid nevertheless).

Turns out however that the vaddr (along with paddr) is available in
update_mmu_cache() hence better suits ARC icache flush semantics.
With both vaddr+paddr, exactly one flush operation per line is done.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c
index c03364a..086be52 100644
--- a/arch/arc/mm/tlb.c
+++ b/arch/arc/mm/tlb.c
@@ -422,12 +422,18 @@
  * when a new PTE is entered in Page Tables or an existing one
  * is modified. We aggresively pre-install a TLB entry
  */
-
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddress,
+void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
 		      pte_t *ptep)
 {
+	unsigned long vaddr = vaddr_unaligned & PAGE_MASK;
 
-	create_tlb(vma, vaddress, ptep);
+	create_tlb(vma, vaddr, ptep);
+
+	/* icache doesn't snoop dcache, thus needs to be made coherent here */
+	if (vma->vm_flags & VM_EXEC) {
+		unsigned long paddr =  pte_val(*ptep) & PAGE_MASK;
+		__inv_icache_page(paddr, vaddr);
+	}
 }
 
 /* Read the Cache Build Confuration Registers, Decode them and save into