[clangd] Use xxhash instead of SHA1 for background index file digests.
Summary:
Currently SHA1 is about 10% of our CPU, this patch reduces it to ~1%.
xxhash is a well-defined (stable) non-cryptographic hash optimized for
fast checksums (like crc32).
Collisions shouldn't be a problem, despite the reduced length:
- for actual file content (used to invalidate bg index shards), there
are only two versions that can collide (new shard and old shard).
- for file paths in bg index shard filenames, we would need 2^32 files
with the same filename to expect a collision. Imperfect hashing may
reduce this a bit but it's well beyond what's plausible.
This will invalidate shards on disk (as usual; I bumped the version),
but this time the filenames are changing so the old files will stick
around :-( So this is more expensive than the usual bump, but would be
good to land before the v9 branch when everyone will start using bg index.
Reviewers: kadircet
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64306
llvm-svn: 365311
diff --git a/clang-tools-extra/clangd/SourceCode.cpp b/clang-tools-extra/clangd/SourceCode.cpp
index 9fcaed4..5c715ba 100644
--- a/clang-tools-extra/clangd/SourceCode.cpp
+++ b/clang-tools-extra/clangd/SourceCode.cpp
@@ -25,6 +25,7 @@
#include "llvm/Support/Error.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/Path.h"
+#include "llvm/Support/xxhash.h"
#include <algorithm>
namespace clang {
@@ -376,7 +377,13 @@
}
FileDigest digest(llvm::StringRef Content) {
- return llvm::SHA1::hash({(const uint8_t *)Content.data(), Content.size()});
+ uint64_t Hash{llvm::xxHash64(Content)};
+ FileDigest Result;
+ for (unsigned I = 0; I < Result.size(); ++I) {
+ Result[I] = uint8_t(Hash);
+ Hash >>= 8;
+ }
+ return Result;
}
llvm::Optional<FileDigest> digestFile(const SourceManager &SM, FileID FID) {