Reduce x86 sequence for GP pair to XMM

Add support for punpckldq, which interleaves the low-order
32-bit values of two XMM registers.

Use punpckldq for transfers from GP register pairs to XMM
in order to reduce path length.
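
As an illustration only, the lane-level effect can be modelled as
below. This is a sketch, not the backend code: it assumes the
transfer is done as movd + movd + punpckldq, which this message does
not spell out, and the helper names (movd, punpckldq, gp_pair_to_xmm,
low_qword) are hypothetical. An XMM register is modelled as a list of
four 32-bit lanes, lane 0 being the least significant.

```python
MASK32 = 0xFFFFFFFF


def movd(gp32):
    """Model `movd xmm, r32`: the value lands in lane 0, other lanes are zeroed."""
    return [gp32 & MASK32, 0, 0, 0]


def punpckldq(dst, src):
    """Model `punpckldq dst, src`: interleave the two low lanes of each operand."""
    return [dst[0], src[0], dst[1], src[1]]


def gp_pair_to_xmm(lo, hi):
    """Assumed sequence: place the 64-bit value hi:lo in the low half of an XMM."""
    xmm_lo = movd(lo)   # [lo, 0, 0, 0]
    xmm_hi = movd(hi)   # [hi, 0, 0, 0]
    return punpckldq(xmm_lo, xmm_hi)  # [lo, hi, 0, 0]


def low_qword(xmm):
    """Read the low 64 bits of the modelled register as one integer."""
    return xmm[0] | (xmm[1] << 32)
```

In this model, gp_pair_to_xmm(lo, hi) leaves lo in lane 0 and hi in
lane 1, so the low quadword of the result holds the full 64-bit value.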

Change-Id: I70d9b69449dfcfb9a94a628deb74a7cffe96bac7
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
4 files changed