Fix x86-64 ABI conformance issue in SIMD code
(descriptions cribbed by DRC from discussion in #20)
In the x86-64 ABI, the high (unused) DWORD of a 32-bit argument's
register is undefined, so it was incorrect to use a 64-bit mov
instruction to transfer a JDIMENSION argument in the 64-bit SSE2 SIMD
functions. The code worked thus far only because the existing compiler
optimizers weren't smart enough to do anything else with the register in
question, so the upper 32 bits happened to be all zeroes-- for the past
6 years, on every x86-64 compiler previously known to mankind.
The bleeding-edge Clang/LLVM compiler has a smarter optimizer, and
under certain circumstances, it will attempt to load-combine adjacent
32-bit integers from one of the libjpeg structures into a single 64-bit
integer and pass that 64-bit integer as a 32-bit argument to one of the
SIMD functions (which is allowed by the ABI, since the upper 32 bits of
the 32-bit argument's register are undefined.) This caused the
libjpeg-turbo regression tests to crash.
Also enhance the documentation of JDIMENSION to explain that its size
is significant to the implementation of the SIMD code.
Closes #20. Refer also to http://crbug.com/532214.
diff --git a/simd/jidctflt-sse2-64.asm b/simd/jidctflt-sse2-64.asm
index 32e4ec2..95bd4dc 100644
--- a/simd/jidctflt-sse2-64.asm
+++ b/simd/jidctflt-sse2-64.asm
@@ -326,7 +326,7 @@
mov rax, [original_rbp]
lea rsi, [workspace] ; FAST_FLOAT * wsptr
mov rdi, r12 ; (JSAMPROW *)
- mov rax, r13
+ mov eax, r13d
mov rcx, DCTSIZE/4 ; ctr
.rowloop: