Specializing x86 range argument copying

The ARM implementation of range argument copying was specialized in some cases.
For all other architectures, it would fall back to generating memcpy. This patch
updates the x86 implementation so it does not call memcpy and instead generates
loads and stores, favoring movement of 128-bit chunks.

Change-Id: Ic891e5609a4b0e81a47c29cc5a9b301bd10a1933
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
12 files changed