Reland "Integrate SIMD optimisations for zlib"

This version uses a "pthread_once" implementation for Windows, built on
Windows synchronisation primitives and imported from tcmalloc.

Previous CLs:
https://codereview.chromium.org/677713002/
https://codereview.chromium.org/552123005

This version of the CL also runs fine on Windows Server 2003.

These optimisations have been published on the zlib mailing list and at
https://github.com/jtkukunas/zlib/

This change merges the following optimisation patches:
- "For x86, add CPUID check."
- "Adds SSE2 optimized hash shifting to fill_window."
- "add SSE4.2 optimized hash function"
- "add PCLMULQDQ optimized CRC folding"

The patches are by Jim Kukunas <james.t.kukunas@linux.intel.com>; this change
adapts them to the current zlib version in Chromium.

The optimisations are enabled at runtime only if all the necessary CPU
features are present. Because the compiler needs extra cflags before it will
emit these instructions, the optimised code is held in its own static library,
with a stub implementation to allow linking on other platforms.
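
The runtime gating described above can be sketched roughly as follows. This
is an illustration only, not the code in this CL: every sketch_* name is
hypothetical, the feature test assumes the GCC/Clang <cpuid.h> helper
__get_cpuid, and the "SIMD" routine is a stand-in that forwards to the
portable path, much as a stub would on a platform without the optimised
library.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#else
#define SKETCH_HAVE_CPUID 0
#endif

/* Hypothetical flag for this sketch; it stays 0 unless every feature is found. */
static int sketch_enable_simd = 0;

/* Query CPUID leaf 1 and require SSE2, SSE4.2 and PCLMULQDQ together. */
static void sketch_cpu_check_features(void)
{
#if SKETCH_HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return;
    int has_sse2   = (edx >> 26) & 1;  /* CPUID.1:EDX bit 26 */
    int has_sse42  = (ecx >> 20) & 1;  /* CPUID.1:ECX bit 20 */
    int has_pclmul = (ecx >> 1) & 1;   /* CPUID.1:ECX bit 1  */
    sketch_enable_simd = has_sse2 && has_sse42 && has_pclmul;
#endif
}

/* Portable path: bitwise CRC-32 using the reflected polynomial 0xEDB88320. */
static uint32_t sketch_crc32_generic(uint32_t crc, const unsigned char *buf,
                                     size_t len)
{
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Stand-in for the separately compiled optimised routine. In this sketch it
 * forwards to the generic path, so both branches give identical results. */
static uint32_t sketch_crc32_simd(uint32_t crc, const unsigned char *buf,
                                  size_t len)
{
    return sketch_crc32_generic(crc, buf, len);
}

/* Dispatch on the runtime flag rather than on a compile-time macro. */
uint32_t sketch_crc32(uint32_t crc, const unsigned char *buf, size_t len)
{
    if (sketch_enable_simd)
        return sketch_crc32_simd(crc, buf, len);
    return sketch_crc32_generic(crc, buf, len);
}
```

A caller would run sketch_cpu_check_features() once before the first call to
sketch_crc32(). Keeping the decision in a runtime flag means one binary can
run on CPUs with and without the required instructions.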

TEST=net_unittests(GZipUnitTest) passes, Chrome functions, and a performance
improvement is seen on the RoboHornet benchmark on Linux Desktop
BUG=401517

Review URL: https://codereview.chromium.org/678423002

Cr-Original-Commit-Position: refs/heads/master@{#302799}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: 02a95e3084f979084fa8586e1718a6e6dd4c22da
diff --git a/crc32.c b/crc32.c
index 91be372..75f2290 100644
--- a/crc32.c
+++ b/crc32.c
@@ -26,6 +26,8 @@
 #  endif /* !DYNAMIC_CRC_TABLE */
 #endif /* MAKECRCH */
 
+#include "deflate.h"
+#include "x86.h"
 #include "zutil.h"      /* for STDC and FAR definitions */
 
 #define local static
@@ -440,3 +442,28 @@
 {
     return crc32_combine_(crc1, crc2, len2);
 }
+
+ZLIB_INTERNAL void crc_reset(deflate_state *const s)
+{
+    if (x86_cpu_enable_simd) {
+        crc_fold_init(s);
+        return;
+    }
+    s->strm->adler = crc32(0L, Z_NULL, 0);
+}
+
+ZLIB_INTERNAL void crc_finalize(deflate_state *const s)
+{
+    if (x86_cpu_enable_simd)
+        s->strm->adler = crc_fold_512to32(s);
+}
+
+ZLIB_INTERNAL void copy_with_crc(z_streamp strm, Bytef *dst, long size)
+{
+    if (x86_cpu_enable_simd) {
+        crc_fold_copy(strm->state, dst, strm->next_in, size);
+        return;
+    }
+    zmemcpy(dst, strm->next_in, size);
+    strm->adler = crc32(strm->adler, dst, size);
+}