Reapply ~"Bitcode: Collect all MDString records into a single blob"
Spiritually reapply commit r264409 (reverted in r264410), albeit with a
bit of a redesign.
Firstly, avoid splitting the big blob into multiple chunks of strings.
r264409 imposed an arbitrary limit to avoid a massive allocation on the
shared 'Record' SmallVector. The bug with that commit only reproduced
when there were more than "chunk-size" strings. A test for this would
have been useless long-term, since we're liable to adjust the chunk-size
in the future.
Thus, eliminate the motivation for chunking by storing the string sizes
in the blob itself. Here's the layout:

    vbr6: # of strings
    vbr6: offset-to-blob
    blob:
       [vbr6]: string lengths
       [char]: concatenated strings

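The layout above can be sketched in standalone C++ as follows. This is a
simplified illustration, not the code in the bitcode writer or reader:
BitWriter, BitReader, StringsRecord, encodeStrings and decodeStrings are
hypothetical names, and padding the length section to a byte boundary is an
assumption of the sketch (the real writer may align differently, so the
offsets it produces need not match).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical LSB-first bit writer; a stand-in for a real bitstream writer.
struct BitWriter {
  std::vector<uint8_t> Bytes;
  std::size_t BitPos = 0; // Number of bits written so far.

  void emit(uint32_t Val, unsigned NumBits) {
    for (unsigned I = 0; I != NumBits; ++I, ++BitPos) {
      if (BitPos % 8 == 0)
        Bytes.push_back(0);
      if ((Val >> I) & 1)
        Bytes.back() |= uint8_t(1u << (BitPos % 8));
    }
  }

  // VBR6: 5 payload bits per 6-bit chunk; bit 5 is set if more chunks follow.
  void emitVBR6(uint32_t Val) {
    while (Val >= 32) {
      emit((Val & 31) | 32, 6);
      Val >>= 5;
    }
    emit(Val, 6);
  }

  // Assumption of this sketch: pad to the next byte boundary.
  void padToByte() { BitPos = Bytes.size() * 8; }
};

// Hypothetical LSB-first bit reader matching BitWriter.
struct BitReader {
  const std::vector<uint8_t> &Bytes;
  std::size_t BitPos = 0;

  explicit BitReader(const std::vector<uint8_t> &B) : Bytes(B) {}

  uint32_t read(unsigned NumBits) {
    uint32_t Val = 0;
    for (unsigned I = 0; I != NumBits; ++I, ++BitPos)
      if ((Bytes[BitPos / 8] >> (BitPos % 8)) & 1)
        Val |= 1u << I;
    return Val;
  }

  // No validation of malformed input; this is only a sketch.
  uint32_t readVBR6() {
    uint32_t Val = 0;
    for (unsigned Shift = 0;; Shift += 5) {
      uint32_t Chunk = read(6);
      Val |= (Chunk & 31) << Shift;
      if (!(Chunk & 32))
        return Val;
    }
  }
};

struct StringsRecord {
  uint32_t NumStrings = 0;    // First operand: # of strings.
  uint32_t OffsetToChars = 0; // Second operand: where the characters start.
  std::vector<uint8_t> Blob;  // [vbr6] lengths, then [char] data.
};

StringsRecord encodeStrings(const std::vector<std::string> &Strings) {
  BitWriter W;
  for (const std::string &S : Strings)
    W.emitVBR6(static_cast<uint32_t>(S.size()));
  W.padToByte();

  StringsRecord R;
  R.NumStrings = static_cast<uint32_t>(Strings.size());
  R.OffsetToChars = static_cast<uint32_t>(W.Bytes.size());
  R.Blob = std::move(W.Bytes);
  for (const std::string &S : Strings)
    R.Blob.insert(R.Blob.end(), S.begin(), S.end());
  return R;
}

std::vector<std::string> decodeStrings(const StringsRecord &R) {
  BitReader Lengths(R.Blob);
  std::size_t CharPos = R.OffsetToChars;
  std::vector<std::string> Out;
  for (uint32_t I = 0; I != R.NumStrings; ++I) {
    uint32_t Len = Lengths.readVBR6();
    assert(CharPos + Len <= R.Blob.size() && "string runs past end of blob");
    Out.emplace_back(R.Blob.begin() + CharPos, R.Blob.begin() + CharPos + Len);
    CharPos += Len;
  }
  return Out;
}
```

Because the lengths travel inside the blob, the record needs only two
operands regardless of how many strings there are, which is what removes the
need for chunking.
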
Secondly, make the output of llvm-bcanalyzer readable.
I noticed when debugging r264409 that llvm-bcanalyzer was outputting a
massive blob all in one line. Past a small number, the strings were
impossible to split in my head, and the lines were way too long. This
version adds support in llvm-bcanalyzer for pretty-printing.

    <STRINGS abbrevid=4 op0=3 op1=9/> num-strings = 3 {
      'abc'
      'def'
      'ghi'
    }

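As a rough illustration of the one-string-per-line output, the following
standalone sketch formats already-decoded string data. formatStrings is a
hypothetical helper, not the function used in llvm-bcanalyzer, and it assumes
the lengths and the concatenated characters have already been extracted from
the blob.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: render one quoted string per line between braces.
// Assumes the lengths sum to at most Chars.size().
std::string formatStrings(const std::vector<std::size_t> &Lengths,
                          const std::string &Chars) {
  std::ostringstream OS;
  OS << "num-strings = " << Lengths.size() << " {\n";
  std::size_t Pos = 0;
  for (std::size_t Len : Lengths) {
    OS << "  '" << Chars.substr(Pos, Len) << "'\n";
    Pos += Len;
  }
  OS << "}\n";
  return OS.str();
}
```
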
From the original commit:

    Inspired by Mehdi's similar patch, http://reviews.llvm.org/D18342, this
    should (a) slightly reduce bitcode size, since there is less record
    overhead, and (b) greatly improve reading speed, since blobs are super
    cheap to deserialize.

llvm-svn: 264551
diff --git a/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp b/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp
index 08b5e45..69cafb7 100644
--- a/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp
+++ b/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp
@@ -280,8 +280,7 @@
ValueEnumerator::ValueEnumerator(const Module &M,
bool ShouldPreserveUseListOrder)
- : HasMDString(false),
- ShouldPreserveUseListOrder(ShouldPreserveUseListOrder) {
+ : ShouldPreserveUseListOrder(ShouldPreserveUseListOrder) {
if (ShouldPreserveUseListOrder)
UseListOrders = predictUseListOrder(M);
@@ -375,6 +374,9 @@
// Optimize constant ordering.
OptimizeConstants(FirstConstant, Values.size());
+
+ // Organize metadata ordering.
+ organizeMetadata();
}
unsigned ValueEnumerator::getInstructionID(const Instruction *Inst) const {
@@ -530,8 +532,8 @@
EnumerateMDNodeOperands(N);
else if (auto *C = dyn_cast<ConstantAsMetadata>(MD))
EnumerateValue(C->getValue());
-
- HasMDString |= isa<MDString>(MD);
+ else
+ ++NumMDStrings;
// Replace the dummy ID inserted above with the correct one. MetadataMap may
// have changed by inserting operands, so we need a fresh lookup here.
@@ -557,6 +559,19 @@
FunctionLocalMDs.push_back(Local);
}
+void ValueEnumerator::organizeMetadata() {
+ if (!NumMDStrings)
+ return;
+
+ // Put the strings first.
+ std::stable_partition(MDs.begin(), MDs.end(),
+ [](const Metadata *MD) { return isa<MDString>(MD); });
+
+ // Renumber.
+ for (unsigned I = 0, E = MDs.size(); I != E; ++I)
+ MetadataMap[MDs[I]] = I + 1;
+}
+
void ValueEnumerator::EnumerateValue(const Value *V) {
assert(!V->getType()->isVoidTy() && "Can't insert void values!");
assert(!isa<MetadataAsValue>(V) && "EnumerateValue doesn't handle Metadata!");