Fix x86-64 ABI conformance issue in SIMD code
(descriptions cribbed by DRC from discussion in #20)
In the x86-64 ABI, the high (unused) DWORD of a 32-bit argument's
register is undefined, so it was incorrect to use a 64-bit mov
instruction to transfer a JDIMENSION argument in the 64-bit SSE2 SIMD
functions.  The code worked thus far only because the existing compiler
optimizers weren't smart enough to do anything else with the register in
question, so the upper 32 bits happened to be all zeroes-- for the past
6 years, on every x86-64 compiler previously known to mankind.

The bleeding-edge Clang/LLVM compiler has a smarter optimizer, and
under certain circumstances, it will attempt to load-combine adjacent
32-bit integers from one of the libjpeg structures into a single 64-bit
integer and pass that 64-bit integer as a 32-bit argument to one of the
SIMD functions (which is allowed by the ABI, since the upper 32 bits of
the 32-bit argument's register are undefined.)  This caused the
libjpeg-turbo regression tests to crash.

Also enhance the documentation of JDIMENSION to explain that its size
is significant to the implementation of the SIMD code.

Closes #20.  Refer also to http://crbug.com/532214.
diff --git a/jmorecfg.h b/jmorecfg.h
index e86ccc6..8e3007c 100644
--- a/jmorecfg.h
+++ b/jmorecfg.h
@@ -162,7 +162,9 @@
  * images up to 64K*64K due to 16-bit fields in SOF markers.  Therefore
  * "unsigned int" is sufficient on all machines.  However, if you need to
  * handle larger images and you don't mind deviating from the spec, you
- * can change this datatype.
+ * can change this datatype.  (Note that changing this datatype will
+ * potentially require modifying the SIMD code.  The x86-64 SIMD extensions,
+ * in particular, assume a 32-bit JDIMENSION.)
  */
 
 typedef unsigned int JDIMENSION;