align stack properly for calling global ctors/dtors on x86[_64]

failure to do so was causing crashes on x86_64 when ctors used SSE
instructions that require 16-byte stack alignment. this was first
observed when ctors called variadic functions, because the prologue
code inserted into every variadic function saves the SSE argument
registers to the stack.
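
a minimal sketch of the kind of code that could hit this, not taken from
the original report: the names ctor_total, sum_doubles and init_total are
hypothetical, and __attribute__((constructor)) is a GCC/Clang extension
assumed here as the way to register a global ctor. whether it actually
faults depends on the compiler and on how the startup code aligns the
stack before running ctors.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical global, written by the constructor below. */
double ctor_total;

/* Variadic function: on x86_64 the compiler-generated prologue may
 * spill the SSE argument registers to the stack using aligned stores,
 * which assume a 16-byte-aligned stack at the point of the call. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);  /* doubles travel in SSE registers */
    va_end(ap);
    return total;
}

/* Global ctor: runs before main(), on whatever stack alignment the
 * startup code provides. */
__attribute__((constructor))
static void init_total(void)
{
    ctor_total = sum_doubles(3, 1.0, 2.0, 3.0);
}
```

linking this into any program with a main() runs init_total during
startup; with a misaligned stack at that point, the call into
sum_doubles is where a crash would be expected to show up.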
4 files changed