[MIPS] Reimplement clear_page/copy_page

Fold the SB-1 specific implementation of clear_page/copy_page in the
generic version, and rewrite that one in tlbex style. The immediate
benefits:
  - It converts the compile-time workaround for SB-1 pass 1 prefetches
    to a more efficient run-time check.
  - It allows adjustment of loop unfolling, which helps to reduce the
    number of redundant cdex cache ops.
  - It fixes some esoteric cornercases (the cache line length calculations
    can go wrong, and support for 64k pages without prefetch instructions
    will overflow the addiu immediate).
  - Somewhat better guesses of "good" prefetch values.

Signed-off-by: Thiemo Seufer <ths@networkno.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
diff --git a/arch/mips/mm/uasm.h b/arch/mips/mm/uasm.h
index fe0574f..0d6a66f 100644
--- a/arch/mips/mm/uasm.h
+++ b/arch/mips/mm/uasm.h
@@ -55,6 +55,7 @@
 Ip_u1s2(_bltz);
 Ip_u1s2(_bltzl);
 Ip_u1u2s3(_bne);
+Ip_u2s3u1(_cache);
 Ip_u1u2u3(_dmfc0);
 Ip_u1u2u3(_dmtc0);
 Ip_u2u1s3(_daddiu);
@@ -77,6 +78,7 @@
 Ip_u1u2u3(_mfc0);
 Ip_u1u2u3(_mtc0);
 Ip_u2u1u3(_ori);
+Ip_u2s3u1(_pref);
 Ip_0(_rfe);
 Ip_u2s3u1(_sc);
 Ip_u2s3u1(_scd);
@@ -177,6 +179,8 @@
 void uasm_il_b(u32 **p, struct uasm_reloc **r, int lid);
 void uasm_il_beqz(u32 **p, struct uasm_reloc **r, unsigned int reg, int lid);
 void uasm_il_beqzl(u32 **p, struct uasm_reloc **r, unsigned int reg, int lid);
+void uasm_il_bne(u32 **p, struct uasm_reloc **r, unsigned int reg1,
+		 unsigned int reg2, int lid);
 void uasm_il_bnez(u32 **p, struct uasm_reloc **r, unsigned int reg, int lid);
 void uasm_il_bgezl(u32 **p, struct uasm_reloc **r, unsigned int reg, int lid);
 void uasm_il_bgez(u32 **p, struct uasm_reloc **r, unsigned int reg, int lid);