util: Use ZSTD for shader cache if possible

This allows ZSTD to be used instead of ZLIB for compressing the shader
cache when libzstd is available. In the measurements below it both
compresses faster (cold cache) and produces a smaller cache on disk.
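
For context, a minimal sketch of how the new HAVE_ZSTD define might
drive compile-time selection of the compressor on the write path. Only
the libzstd and zlib calls are real APIs; compress_cache_blob and the
compression levels are illustrative, not the actual cache code:

    #include <stdlib.h>

    #ifdef HAVE_ZSTD
    #include <zstd.h>
    #else
    #include <zlib.h>
    #endif

    static void *
    compress_cache_blob(const void *in, size_t in_size, size_t *out_size)
    {
    #ifdef HAVE_ZSTD
       /* Level 1 favors speed; ZSTD_compressBound() gives the
        * worst-case output size for the allocation. */
       size_t bound = ZSTD_compressBound(in_size);
       void *out = malloc(bound);
       if (!out)
          return NULL;

       size_t ret = ZSTD_compress(out, bound, in, in_size, 1);
       if (ZSTD_isError(ret)) {
          free(out);
          return NULL;
       }
       *out_size = ret;
       return out;
    #else
       /* zlib equivalent using the one-shot compress2() helper. */
       uLongf bound = compressBound(in_size);
       void *out = malloc(bound);
       if (!out)
          return NULL;

       if (compress2(out, &bound, in, in_size, Z_BEST_SPEED) != Z_OK) {
          free(out);
          return NULL;
       }
       *out_size = bound;
       return out;
    #endif
    }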

On a 72-core system emulating skl with a full shader-db run (with i965):
ZSTD:
    1915.10s user 229.27s system 5150% cpu 41.632 total (cold cache)
    225.40s user 10.87s system 3810% cpu 6.201 total (warm cache)
    154M (235M on disk)
ZLIB:
    2231.33s user 194.24s system 1899% cpu 2:07.72 total (cold cache)
    229.15s user 10.63s system 3906% cpu 6.139 total (warm cache)
    163M (244M on disk)

Tim Arceri sees similar results (8-core Ryzen, full shader-db):
ZSTD:
    2505.22 user 40.50 system 3:18.73 elapsed 1280% CPU (cold cache)
    418.71 user 14.93 system 0:46.53 elapsed 931% CPU (warm cache)
    454.3 MB (681.7 MB on disk)
ZLIB:
    3069.83 user 40.02 system 4:20.13 elapsed 1195% CPU (cold cache)
    425.50 user 15.17 system 0:46.80 elapsed 941% CPU (warm cache)
    470.3 MB (701.4 MB on disk)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
diff --git a/meson.build b/meson.build
index b9a6da0..3990770 100644
--- a/meson.build
+++ b/meson.build
@@ -1251,6 +1251,17 @@
 # TODO: some of these may be conditional
 dep_zlib = dependency('zlib', version : '>= 1.2.3', fallback : ['zlib', 'zlib_dep'])
 pre_args += '-DHAVE_ZLIB'
+
+_zstd = get_option('zstd')
+if _zstd != 'false'
+  dep_zstd = dependency('libzstd', required : _zstd == 'true')
+  if dep_zstd.found()
+    pre_args += '-DHAVE_ZSTD'
+  endif
+else
+  dep_zstd = null_dep
+endif
+
 dep_thread = dependency('threads')
 if dep_thread.found() and host_machine.system() != 'windows'
   pre_args += '-DHAVE_PTHREAD'
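
The meson hunk gives the option tri-state semantics: -Dzstd=true makes
libzstd a hard requirement, -Dzstd=false skips the dependency entirely,
and any other value (presumably an 'auto' default declared in
meson_options.txt, which is not part of this hunk) enables ZSTD only
when libzstd is found.

On the read side the dispatch mirrors the sketch above; this companion
sketch assumes the uncompressed size was recorded alongside the entry
when it was written (decompress_cache_blob is again a hypothetical
name):

    #include <stdbool.h>
    #include <stddef.h>

    #ifdef HAVE_ZSTD
    #include <zstd.h>
    #else
    #include <zlib.h>
    #endif

    static bool
    decompress_cache_blob(const void *in, size_t in_size,
                          void *out, size_t out_size)
    {
    #ifdef HAVE_ZSTD
       /* ZSTD_decompress() takes the exact destination capacity and
        * returns the number of bytes written (or an error code). */
       size_t ret = ZSTD_decompress(out, out_size, in, in_size);
       return !ZSTD_isError(ret) && ret == out_size;
    #else
       /* uncompress() updates len with the actual decompressed size. */
       uLongf len = out_size;
       return uncompress(out, &len, in, in_size) == Z_OK && len == out_size;
    #endif
    }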