SSE2 SIMD implementation of Huffman encoding

Full-color compression speedups relative to libjpeg-turbo 1.4.2:

2.8 GHz Intel Xeon W3530, Linux, 64-bit:  2.2-18% (avg. 9.5%)
2.8 GHz Intel Xeon W3530, Linux, 32-bit:  10-25% (avg. 17%)

2.3 GHz AMD A10-4600M APU, Linux, 64-bit:  4.9-17% (avg. 11%)
2.3 GHz AMD A10-4600M APU, Linux, 32-bit:  8.8-19% (avg. 15%)

3.0 GHz Intel Core i7, OS X, 64-bit:  3.5-16% (avg. 10%)
3.0 GHz Intel Core i7, OS X, 32-bit:  4.8-14% (avg. 11%)

2.6 GHz AMD Athlon 64 X2 5050e:
Performance-neutral (give or take a few percent)

Full-color compression speedups relative to IPP:

2.8 GHz Intel Xeon W3530, Linux, 64-bit:  4.8-34% (avg. 19%)
2.8 GHz Intel Xeon W3530, Linux, 32-bit:  -19%-7.0% (avg. -7.0%)

Refer to #42 for discussion.  Numerous other approaches were attempted,
but this one proved to be the most performant across all platforms.

This commit also fixes #3 (works around, really-- the clang-compiled version
of jchuff.c still performs 20% worse than its GCC-compiled counterpart, but
that code is now bypassed by the new SSE2 Huffman algorithm.)

Based on:
https://github.com/mayeut/libjpeg-turbo/commit/2cb4d41330e1edc4469f6b97ba73b73abfbeb02f
https://github.com/mayeut/libjpeg-turbo/commit/36c94e050d117912adbff9fbcc6fe307df240168
diff --git a/ChangeLog.txt b/ChangeLog.txt
index 16f695b..2b0cd32 100644
--- a/ChangeLog.txt
+++ b/ChangeLog.txt
@@ -57,6 +57,16 @@
 system overhead that might be caused by lazy writes to disk and thus improves
 the consistency of the performance measurements.
 
+[12] Added SIMD acceleration for Huffman encoding on SSE2-capable x86 and
+x86-64 platforms.  This speeds up the compression of full-color JPEGs by about
+10-15% on average (relative to libjpeg-turbo 1.4.x) when using modern Intel and
+AMD CPUs.  Additionally, this works around an issue in the clang optimizer that
+prevents it (as of this writing) from achieving the same performance as GCC
+when compiling the C version of the Huffman encoder
+(https://llvm.org/bugs/show_bug.cgi?id=16035). For the purposes of benchmarking
+or regression testing, SIMD-accelerated Huffman encoding can be disabled by
+setting the JSIMD_NOHUFFENC environment variable to 1.
+
 
 1.4.2
 =====