SSE2 SIMD implementation of Huffman encoding

Full-color compression speedups relative to libjpeg-turbo 1.4.2:
2.8 GHz Intel Xeon W3530, Linux, 64-bit: 2.2-18% (avg. 9.5%)
2.8 GHz Intel Xeon W3530, Linux, 32-bit: 10-25% (avg. 17%)
2.3 GHz AMD A10-4600M APU, Linux, 64-bit: 4.9-17% (avg. 11%)
2.3 GHz AMD A10-4600M APU, Linux, 32-bit: 8.8-19% (avg. 15%)
3.0 GHz Intel Core i7, OS X, 64-bit: 3.5-16% (avg. 10%)
3.0 GHz Intel Core i7, OS X, 32-bit: 4.8-14% (avg. 11%)
2.6 GHz AMD Athlon 64 X2 5050e:
  Performance-neutral (give or take a few percent)

Full-color compression speedups relative to IPP:
2.8 GHz Intel Xeon W3530, Linux, 64-bit: 4.8-34% (avg. 19%)
2.8 GHz Intel Xeon W3530, Linux, 32-bit: -19% to 7.0% (avg. -7.0%)

Refer to #42 for discussion. Numerous other approaches were attempted,
but this one proved to be the most performant across all platforms.

This commit also fixes #3 (strictly speaking, it works around the issue:
the clang-compiled version of jchuff.c still performs 20% worse than its
GCC-compiled counterpart, but that code is now bypassed by the new SSE2
Huffman algorithm.)
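
The ChangeLog entry in this commit documents that SIMD-accelerated Huffman
encoding can be disabled by setting the JSIMD_NOHUFFENC environment variable
to 1. The following is a minimal sketch of how such a run-time switch might
be checked. The helper name huffenc_simd_enabled() is hypothetical and is
used only for illustration; it is not part of the libjpeg-turbo API, and the
sketch is not necessarily how this commit implements the check.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration, not a libjpeg-turbo
 * function): returns 1 if SIMD Huffman encoding should be used, or 0 if the
 * JSIMD_NOHUFFENC environment variable is set to exactly "1". */
static int huffenc_simd_enabled(void)
{
  const char *env = getenv("JSIMD_NOHUFFENC");

  /* Any value other than "1" (including unset) leaves SIMD enabled. */
  if (env != NULL && strcmp(env, "1") == 0)
    return 0;
  return 1;
}
```

Under this sketch, a caller would fall back to the C Huffman encoder
whenever the helper returns 0, which allows both code paths to be compared
in the same build.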
Based on:
https://github.com/mayeut/libjpeg-turbo/commit/2cb4d41330e1edc4469f6b97ba73b73abfbeb02f
https://github.com/mayeut/libjpeg-turbo/commit/36c94e050d117912adbff9fbcc6fe307df240168
diff --git a/ChangeLog.txt b/ChangeLog.txt
index 16f695b..2b0cd32 100644
--- a/ChangeLog.txt
+++ b/ChangeLog.txt
@@ -57,6 +57,16 @@
system overhead that might be caused by lazy writes to disk and thus improves
the consistency of the performance measurements.
+[12] Added SIMD acceleration for Huffman encoding on SSE2-capable x86 and
+x86-64 platforms. This speeds up the compression of full-color JPEGs by about
+10-15% on average (relative to libjpeg-turbo 1.4.x) when using modern Intel and
+AMD CPUs. Additionally, this works around an issue in the clang optimizer that
+prevents it (as of this writing) from achieving the same performance as GCC
+when compiling the C version of the Huffman encoder
+(https://llvm.org/bugs/show_bug.cgi?id=16035). For the purposes of benchmarking
+or regression testing, SIMD-accelerated Huffman encoding can be disabled by
+setting the JSIMD_NOHUFFENC environment variable to 1.
+
1.4.2
=====