arm: Use optimized memcpy and memset from linux

Using optimized versions of memset and memcpy from linux brings a quite
noticeable speed (x2 or better) improvement for these two functions.

Here are some numbers for test done with jadecpu

                           | HEAD(1)| HEAD(1)| HEAD(2)| HEAD(2)|
                           |        | +patch |        | +patch |
---------------------------+--------+--------+--------+--------+
Reset to prompt            |  438ms |  330ms |  228ms |  120ms |
                           |        |        |        |        |
TFTP a 3MB img             | 4782ms | 3428ms | 3245ms | 2820ms |
                           |        |        |        |        |
FATLOAD USB a 3MB img*     | 8515ms | 8510ms | ------ | ------ |
                           |        |        |        |        |
BOOTM LZO img in RAM       | 3473ms | 3168ms |  592ms |  592ms |
 where CRC is              |  615ms |  615ms |   54ms |   54ms |
 uncompress                | 2460ms | 2462ms |  450ms |  451ms |
 final boot_elf            |  376ms |   68ms |   65ms |   65ms |
                           |        |        |        |        |
BOOTM LZO img in FLASH     | 3207ms | 2902ms | 1050ms | 1050ms |
 where CRC is              |  600ms |  600ms |  135ms |  135ms |
 uncompress                | 2209ms | 2211ms |  828ms |  828ms |
                           |        |        |        |        |
Copy 1.4MB from NOR to RAM |  134ms |   72ms |  120ms |   70ms |

(1) No dcache
(2) dcache enabled in board_init
*Does not work when dcache is on

Size impact:

C version:
   text    data     bss     dec     hex filename
 202862   18912  266456  488230   77326 u-boot

ASM version:
   text    data     bss     dec     hex filename
 203798   18912  266288  488998   77626 u-boot
222712  u-boot.bin

Signed-off-by: Matthias Weisser <weisserm@arcor.de>
diff --git a/README b/README
index cdbb9de..85cbb8e 100644
--- a/README
+++ b/README
@@ -2944,6 +2944,12 @@
 		that is executed before the actual U-Boot. E.g. when
 		compiling a NAND SPL.
 
+- CONFIG_USE_ARCH_MEMCPY
+  CONFIG_USE_ARCH_MEMSET
+		If these options are used a optimized version of memcpy/memset will
+		be used if available. These functions may be faster under some
+		conditions but may increase the binary size.
+
 Building the Software:
 ======================