Fill in a few more SSE/SSE2 insns, with current aim of being able to
run Qt-3.1 as built with "icc -xW" (P4 code generation).  Hopefully by
now I've worked through most SSE/SSE2 conceptual nasties, and it's
mostly a question of filling in the gaps.

I think I might have created a bug of some kind with SSE3g_RegWr.  My
current test app segfaults if I run without --optimise=no, which makes
me think I've written something erroneous in the UInstr predicates
controlling optimisation.  I don't know what though.
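
For reference, here is a minimal sketch of the encoding described in the
SSE3ag_MemRd_RegWr comment in the diff below. It is not the code in the
tree. The helper names (pack_prefix, make_modrm, emit_insn) are
hypothetical, and the byte order inside lit32 (first insn byte in bits
23:16) is an assumption, since the comment only says the three bytes
live in lit32[23:0]. It also assumes the address is emitted as a plain
register-indirect operand (mod = 00), which rules out ESP and EBP as the
address register.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the three leading insn bytes into bits [23:0] of a 32-bit
   literal.  ASSUMPTION: first byte goes in bits 23:16. */
static uint32_t pack_prefix(uint8_t b0, uint8_t b1, uint8_t b2)
{
   return ((uint32_t)b0 << 16) | ((uint32_t)b1 << 8) | (uint32_t)b2;
}

/* Build a modrm byte with mod = 00 (memory operand, no displacement),
   reg = ireg (integer register written), rm = areg (register holding
   the address).  Both arguments are x86 register numbers 0..7. */
static uint8_t make_modrm(unsigned ireg, unsigned areg)
{
   assert(ireg < 8 && areg < 8);
   return (uint8_t)((ireg << 3) | areg);
}

/* Hypothetical codegen step: write the 4-byte insn into out[].
   Returns 0 on success, -1 if the registers cannot be encoded this
   way.  With mod = 00, rm = 4 (ESP) selects a SIB byte and rm = 5
   (EBP) selects a disp32, so neither can hold the address here. */
static int emit_insn(uint32_t lit32, unsigned ireg, unsigned areg,
                     uint8_t out[4])
{
   if (ireg > 7 || areg > 7 || areg == 4 || areg == 5)
      return -1;
   out[0] = (uint8_t)((lit32 >> 16) & 0xFF);
   out[1] = (uint8_t)((lit32 >> 8) & 0xFF);
   out[2] = (uint8_t)(lit32 & 0xFF);
   out[3] = make_modrm(ireg, areg);
   return 0;
}
```

As a check, cvttss2si (%ecx),%eax is F3 0F 2C 01, so calling
emit_insn(pack_prefix(0xF3, 0x0F, 0x2C), 0, 1, out) should fill out[]
with those four bytes (eax is register 0, ecx is register 1).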


git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1676 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/include/vg_skin.h b/include/vg_skin.h
index 5ef6581..fc4c190 100644
--- a/include/vg_skin.h
+++ b/include/vg_skin.h
@@ -643,6 +643,22 @@
       */
       SSE3g_RegRd,
 
+      /* 4 bytes, reads memory, writes an integer register, but is
+         nevertheless an SSE insn.  The insn is of the form
+         bbbbbbbb:bbbbbbbb:bbbbbbbb:mod ireg rm where mod indicates
+         memory (ie is not 11b) and ireg is the int reg written.  The
+         first 3 bytes are held in lit32[23:0] since there is
+         insufficient space elsewhere.  mod and rm are to be replaced
+         at codegen time by a reference to the Temp/RealReg holding
+         the address.  Arg1 holds this Temp/RealReg.  ireg is to be
+         replaced at codegen time by a reference to the relevant
+         RealReg in which the answer is to be written.  Arg2 holds
+         this Temp/RealReg.  Transfer to the destination reg is always
+         at size 4.  However the memory read can be at sizes 4 or 8
+         and so this is what the sz field holds.
+      */
+      SSE3ag_MemRd_RegWr,
+
       /* 5 bytes, no memrefs, no iregdefs, copy exactly to the
          output.  Held in val1[15:0], val2[15:0] and val3[7:0]. */
       SSE5,