zlib adler_simd.c

Add an SSSE3 implementation of the adler32 checksum, suitable
for both large workloads and the small workloads commonly seen
during PNG image decoding. Add a NEON implementation.

Speed is comparable to the serial adler32 computation for small
inputs. From about 64 bytes of input data, the SIMD code paths
become faster than the serial path: 3x faster at 256 bytes of
input data, rising to ~8x faster for 1M of input data (~4x on
ARMv8 NEON).

For the PNG 140 image corpus, PNG decoding speed is ~8% faster
on average on the desktop machines tested, and ~2% faster on an
ARMv8 Pixel C Android (N) tablet: https://crbug.com/762564#c41

Update x86.{c,h} to detect SSSE3 support at runtime and use it
to enable the adler32_simd code path. Update inflate.c to call
x86_check_features(). Update the name mangler file names.h for
the new symbols added, and add a FIXME about simd.patch.
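
As an illustration of the runtime check, a minimal sketch follows.
It assumes GCC or Clang (for <cpuid.h> and __get_cpuid); the
names cpu_has_ssse3 and use_simd_path are hypothetical and are
not the x86.c symbols touched by this change.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang only; an assumption of this sketch */
#endif

/* Hypothetical helper: returns 1 if the CPU reports SSSE3, else 0. */
static int cpu_has_ssse3(void) {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return 0;             /* CPUID leaf 1 is not available */
  return (ecx >> 9) & 1;  /* CPUID.01H:ECX bit 9 is SSSE3 */
#else
  return 0;               /* not an x86 target */
#endif
}

/* Hypothetical dispatch flag: cache the answer so CPUID runs once.
 * Not thread-safe as written; real code would guard the first call. */
static int use_simd_path(void) {
  static int cached = -1;
  if (cached < 0)
    cached = cpu_has_ssse3();
  return cached;
}
```

A caller would test use_simd_path() once per checksum call and
fall back to the serial loop when it returns 0.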

Ignore data alignment in the SSSE3 case since unaligned access
is no longer penalized on current-generation Intel CPUs. Align
the input in the NEON case, however, to avoid the extra cost of
unaligned memory access on ARMv8/v7.
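
One common way to get aligned vector loads is to checksum a short
head of the buffer serially and start the SIMD loop at the first
16-byte boundary. The sketch below only computes that head
length; bytes_to_align16 is a hypothetical name, and the 16-byte
boundary is an assumption, not a detail taken from this change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of leading bytes to process serially
 * so that (buf + result) is 16-byte aligned, capped at len. */
static size_t bytes_to_align16(const unsigned char *buf, size_t len) {
  size_t head = (size_t)((16 - ((uintptr_t)buf & 15)) & 15);
  return head < len ? head : len;
}
```

When the buffer is already aligned the result is 0, so the serial
head is skipped entirely.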

NEON credits: the v_s1/s2 vector component accumulate code was
provided by Adenilson Cavalcanti. The uint16 column vector sum
code is from libdeflate, with corrections to process NMAX input
bytes, which improves performance by 3% for large buffers.
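
For context on NMAX: in zlib it is 5552, the largest number of
bytes that can be summed before the 32-bit s2 accumulator could
overflow, so the modulo by BASE (65521) can be deferred to once
per NMAX-byte block. The serial sketch below shows that blocking
only; adler32_serial is a hypothetical name, and this is neither
zlib's adler32() nor the SIMD code added here.

```c
#include <stddef.h>
#include <stdint.h>

#define BASE 65521u /* largest prime smaller than 65536 */
#define NMAX 5552u  /* max bytes per block before s2 could overflow */

/* Hypothetical reference routine: Adler-32 with the modulo deferred
 * to the end of each block of at most NMAX bytes. */
static uint32_t adler32_serial(uint32_t adler, const unsigned char *buf,
                               size_t len) {
  uint32_t s1 = adler & 0xffff;          /* running byte sum */
  uint32_t s2 = (adler >> 16) & 0xffff;  /* running sum of s1 values */
  while (len > 0) {
    size_t n = len < NMAX ? len : NMAX;
    len -= n;
    while (n--) {
      s1 += *buf++;
      s2 += s1;
    }
    s1 %= BASE;  /* one reduction per block instead of per byte */
    s2 %= BASE;
  }
  return (s2 << 16) | s1;
}
```

Starting from the initial value 1, the 9-byte ASCII string
"Wikipedia" gives 0x11E60398. Processing full NMAX-byte blocks
keeps the number of modulo reductions as low as the accumulator
width allows.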

Update BUILD.gn to put the code in its own source set, and add
it conditionally to the zlib library build rule. On ARM, build
the SIMD with max-speed config to produce the smallest code.

No change in behavior, covered by many existing tests.

Bug: 762564
Change-Id: I14a39940ae113b5a67ba70a99c3741e289b1796b
Reviewed-on: https://chromium-review.googlesource.com/660019
Commit-Queue: Chris Blume <cblume@chromium.org>
Reviewed-by: Adenilson Cavalcanti <cavalcantii@chromium.org>
Reviewed-by: Chris Blume <cblume@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#505447}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: 09b784fd12f255a9da38107ac6e0386f4dde6d68
diff --git a/names.h b/names.h
index 3436baa..cd98ec9 100644
--- a/names.h
+++ b/names.h
@@ -162,6 +162,13 @@
 #define fill_window_sse Cr_z_fill_window_sse
 #define read_buf Cr_z_read_buf
 #define x86_check_features Cr_z_x86_check_features
+/* FIXME: x86_cpu_enable_ssse3 wasn't part of the simd.patch */
+#define x86_cpu_enable_ssse3 Cr_z_x86_cpu_enable_ssse3
 #define x86_cpu_enable_simd Cr_z_x86_cpu_enable_simd
 
+#if defined(ADLER32_SIMD_SSSE3) || defined(ADLER32_SIMD_NEON)
+/* Symbols added by adler_simd.c, see also the FIXME above */
+#define adler32_simd_ Cr_z_adler32_simd_
+#endif
+
 #endif  /* THIRD_PARTY_ZLIB_NAMES_H_ */