Improve zlib inflate speed by using SSE4.2 crc32

Using an SSE4.2-based crc32 improves the decoding rate of the PNG
140 corpus by 4% on average, giving a total 40% performance increase
when combined with the adler32 SIMD code and inflate chunk copy code.
See https://crbug.com/796178#c2 for details.

Raw crc32 speed is 5x to 25x faster than the zlib default "BYFOUR"
crc32, and gzip- and zlib-wrapped inflate performance improves by
69% and 50%, respectively, for the snappy corpus (see
https://crbug.com/796178#c3 and #4 for details).

Add a crc32 SIMD implementation and update the call-site in crc32.c
to use it, with run-time detection of the SSE4.2 and PCLMUL support
required by the crc32 SIMD code.
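
For readers unfamiliar with the run-time check, the following is a
minimal sketch of how SSE4.2 and PCLMUL support can be detected via
CPUID leaf 1 (PCLMULQDQ is ECX bit 1, SSE4.2 is ECX bit 20). It is not
the code added by this change: the helper names
ecx_has_crc32_simd_features and cpu_has_crc32_simd are hypothetical,
and the sketch assumes a GCC or Clang toolchain that provides
<cpuid.h>. On other targets it simply reports no support.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#else
#define SKETCH_HAVE_CPUID 0
#endif

/* Feature bits reported in ECX by CPUID leaf 1. */
#define CPUID_ECX_PCLMULQDQ (1u << 1)
#define CPUID_ECX_SSE42     (1u << 20)

/* Hypothetical helper: nonzero only when BOTH features are reported,
 * since the SIMD crc32 path described above needs both. */
static int ecx_has_crc32_simd_features(uint32_t ecx)
{
    const uint32_t need = CPUID_ECX_PCLMULQDQ | CPUID_ECX_SSE42;
    return (ecx & need) == need;
}

/* Hypothetical helper: query the running CPU. Returns 0 when CPUID is
 * unavailable, so callers fall back to the portable crc32 code. */
static int cpu_has_crc32_simd(void)
{
#if SKETCH_HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return ecx_has_crc32_simd_features(ecx);
#else
    return 0;
#endif
}
```

A caller would typically evaluate cpu_has_crc32_simd() once, cache the
result, and branch to either the SIMD routine or the default crc32.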

Update BUILD.gn to compile the crc32 SIMD code for Intel devices,
and update names.h with the new symbol defined by the crc32 SIMD
code path.

Bug: 796178
Change-Id: I1bb94b47c9a4934eed01ba3d4feda51d67c4bf85
Reviewed-on: https://chromium-review.googlesource.com/833820
Commit-Queue: Noel Gordon <noel@chromium.org>
Reviewed-by: Chris Blume <cblume@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#526935}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: 65e2abcb74b1c07fa14f46abaa1fb1717892eec3
diff --git a/names.h b/names.h
index c18b90f..55a8a3f 100644
--- a/names.h
+++ b/names.h
@@ -176,4 +176,9 @@
 #define inflate_fast_chunk_ Cr_z_inflate_fast_chunk_
 #endif
 
+#if defined(CRC32_SIMD_SSE42_PCLMUL)
+/* Symbols added by crc32_simd.c */
+#define crc32_sse42_simd_ Cr_z_crc32_sse42_simd_
+#endif
+
 #endif  /* THIRD_PARTY_ZLIB_NAMES_H_ */