Extend the state components in VG_(m_state_static) and VG_(baseBlock)
to include the SSE/SSE2 architectural state.  Automagically detect
at startup, in vg_startup.S, whether or not this is a SSE-enabled
CPU and act accordingly.  All subsequent FPU/SSE state transfers
between the simulated and real machine are then done either with
fsave/frstor (as before) or fxsave/fxrstor (the SSE equivalents).

Fragile and fiddly; (1) the SSE state needs to be stored on a 16-byte
boundary, and (2) certain bits in the saved MXCSR reg in a state
written by fxsave need to be anded out before we can safely restore
using fxrstor.

It does appear to work.  I'd appreciate people trying it out on
various CPUs to establish whether the SSE / not-SSE check works
right, and/or anything else is broken.

Unfortunately makes some programs run significantly slower.
I don't know why.  Perhaps due to copying around more processor
state than there was before (SSE state is 512 bytes, FPU state
was only 108).  I will look into this.


git-svn-id: svn://svn.valgrind.org/valgrind/trunk@1574 a5019735-40e9-0310-863c-91ae7b9d1cf9
9 files changed