ARM optimized insert_string

Using a faster hash function yields a considerable performance boost
in compression (average 8% on A53 and 24% on A72).
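
For illustration only, a minimal sketch of the idea, assuming the faster
hash is a CRC32C step over a 4-byte read of the window. Every name below
(sketch_state, sketch_insert_string, the table sizes) is hypothetical
and is not the code in this patch. The software loop is a stand-in used
when the compiler does not advertise the ARM CRC32 instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#endif

/* Hypothetical sizes for the sketch; both must be powers of two. */
#define SKETCH_WINDOW_SIZE 64u
#define SKETCH_HASH_SIZE   256u

typedef uint16_t sketch_pos;

/* Hypothetical, heavily reduced stand-in for zlib's deflate_state. */
typedef struct {
    uint8_t    window[SKETCH_WINDOW_SIZE]; /* input bytes */
    sketch_pos head[SKETCH_HASH_SIZE];     /* newest position per hash bucket */
    sketch_pos prev[SKETCH_WINDOW_SIZE];   /* previous position in the same bucket */
} sketch_state;

/* One CRC32C step over a 32-bit word: the hardware instruction when the
 * compiler advertises it, otherwise a bitwise loop over the reflected
 * Castagnoli polynomial. */
static uint32_t sketch_crc32c_word(uint32_t crc, uint32_t data)
{
#if defined(__ARM_FEATURE_CRC32)
    return __crc32cw(crc, data);
#else
    crc ^= data;
    for (int i = 0; i < 32; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
#endif
}

/* Hash the 4 bytes at `str`, make `str` the newest entry of that bucket,
 * and return the position that was there before (0 means "empty"). */
static sketch_pos sketch_insert_string(sketch_state *s, sketch_pos str)
{
    uint32_t val;

    /* The 4-byte read must stay inside the window. */
    assert((size_t)str + sizeof(val) <= SKETCH_WINDOW_SIZE);
    memcpy(&val, &s->window[str], sizeof(val)); /* alignment-safe load */

    uint32_t h = sketch_crc32c_word(0, val) & (SKETCH_HASH_SIZE - 1u);
    sketch_pos old_head = s->head[h];

    s->head[h] = str;                                    /* new chain head */
    s->prev[str & (SKETCH_WINDOW_SIZE - 1u)] = old_head; /* link to older match */
    return old_head;
}
```

Inserting the same 4 bytes at two positions should return the first
position on the second call, which is how the match finder locates
earlier candidates.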

This change was enabled by the previous patch, which added an optimized
crc32 using the ARMv8-1 crypto extensions along with the CPU feature
detection it requires (so it won't help older ARMv7 SoCs).

Bug: 873759
Change-Id: I88ece549a63d923beef4f96a046acdf09e529784
Reviewed-on: https://chromium-review.googlesource.com/1173262
Reviewed-by: Chris Blume <cblume@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#583113}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: 1364a33fe0f2b9588a2d018f62ff4d966a525f37
diff --git a/crc32_simd.h b/crc32_simd.h
index d3d0bce..08f1756 100644
--- a/crc32_simd.h
+++ b/crc32_simd.h
@@ -9,6 +9,7 @@
 
 #include "zconf.h"
 #include "zutil.h"
+#include "deflate.h"
 
 /*
  * crc32_sse42_simd_(): compute the crc32 of the buffer, where the buffer
@@ -33,3 +34,8 @@
                                           const unsigned char* buf,
                                           z_size_t len);
 
+/*
+ * Insert hash string.
+ */
+Pos ZLIB_INTERNAL insert_string_arm(deflate_state *const s, const Pos str);
+