Reapply ~"Bitcode: Collect all MDString records into a single blob"

Spiritually reapply commit r264409 (reverted in r264410), albeit with a
bit of a redesign.

Firstly, avoid splitting the big blob into multiple chunks of strings.

r264409 imposed an arbitrary limit to avoid a massive allocation on the
shared 'Record' SmallVector.  The bug with that commit only reproduced
when there were more than "chunk-size" strings.  A test for this would
have been useless long-term, since we're liable to adjust the chunk-size
in the future.

Thus, eliminate the motivation for chunk-ing by storing the string sizes
in the blob.  Here's the layout:

    vbr6: # of strings
    vbr6: offset-to-blob
    blob:
       [vbr6]: string lengths
       [char]: concatenated strings

Secondly, make the output of llvm-bcanalyzer readable.

I noticed when debugging r264409 that llvm-bcanalyzer was outputting a
massive blob all in one line.  Past a small number, the strings were
impossible to split in my head, and the lines were way too long.  This
version adds support in llvm-bcanalyzer for pretty-printing.

    <STRINGS abbrevid=4 op0=3 op1=9/> num-strings = 3 {
      'abc'
      'def'
      'ghi'
    }

From the original commit:

Inspired by Mehdi's similar patch, http://reviews.llvm.org/D18342, this
should (a) slightly reduce bitcode size, since there is less record
overhead, and (b) greatly improve reading speed, since blobs are super
cheap to deserialize.

llvm-svn: 264551
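
The layout described in the commit message can be illustrated with a small
standalone sketch. This is not the code from the commit: PackedStrings,
appendVBR6, readVBR6, packStrings and unpackStrings are hypothetical names,
and as a simplification each 6-bit VBR chunk occupies a whole byte instead of
being bit-packed, so the offsets it computes will not match real bitcode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the record: two operands plus a blob.
struct PackedStrings {
  uint64_t NumStrings = 0; // op0: number of strings
  uint64_t Offset = 0;     // op1: where the characters start inside Blob
  std::string Blob;        // all lengths first, then concatenated characters
};

// Append Val as VBR6 chunks: 5 data bits per chunk, 0x20 means "more follows".
// Simplification: one chunk per byte rather than a packed bitstream.
static void appendVBR6(std::string &Out, uint64_t Val) {
  while (Val >= 32) {
    Out.push_back(static_cast<char>((Val & 31) | 32));
    Val >>= 5;
  }
  Out.push_back(static_cast<char>(Val));
}

// Read one VBR6 value starting at Pos and advance Pos past it.
static uint64_t readVBR6(const std::string &In, size_t &Pos) {
  uint64_t Val = 0;
  unsigned Shift = 0;
  while (true) {
    assert(Pos < In.size() && "ran off the end of the blob");
    unsigned char Chunk = static_cast<unsigned char>(In[Pos++]);
    Val |= static_cast<uint64_t>(Chunk & 31) << Shift;
    if (!(Chunk & 32))
      return Val;
    Shift += 5;
  }
}

PackedStrings packStrings(const std::vector<std::string> &Strings) {
  PackedStrings P;
  P.NumStrings = Strings.size();
  for (const std::string &S : Strings)
    appendVBR6(P.Blob, S.size()); // lengths section
  P.Offset = P.Blob.size();       // characters begin right after the lengths
  for (const std::string &S : Strings)
    P.Blob += S;                  // characters section
  return P;
}

std::vector<std::string> unpackStrings(const PackedStrings &P) {
  std::vector<std::string> Strings;
  size_t LengthPos = 0;      // cursor in the lengths section
  size_t CharPos = P.Offset; // cursor in the characters section
  for (uint64_t I = 0; I != P.NumStrings; ++I) {
    uint64_t Len = readVBR6(P.Blob, LengthPos);
    Strings.push_back(P.Blob.substr(CharPos, Len));
    CharPos += Len;
  }
  return Strings;
}
```

Because the lengths travel inside the blob, a reader needs only the two
operands to walk every string, which is why no chunking limit is required.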

commit: 6565a0d4b2c98722eb8fee9093cdde4f37928986
author: Duncan P. N. Exon Smith <dexonsmith@apple.com> Sun Mar 27 23:17:54 2016 +0000
committer: Duncan P. N. Exon Smith <dexonsmith@apple.com> Sun Mar 27 23:17:54 2016 +0000
tree: 9a98af7c4407ac1b6d74a183c4cf30bec6919fc4
parent: 376fa2606069cdd5840fd035312bad027d8b2428
diff --git a/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp b/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp
index 08b5e45..69cafb7 100644
--- a/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp
+++ b/llvm/lib/Bitcode/Writer/ValueEnumerator.cpp

@@ -280,8 +280,7 @@
 
 ValueEnumerator::ValueEnumerator(const Module &M,
                                  bool ShouldPreserveUseListOrder)
-    : HasMDString(false),
-      ShouldPreserveUseListOrder(ShouldPreserveUseListOrder) {
+    : ShouldPreserveUseListOrder(ShouldPreserveUseListOrder) {
   if (ShouldPreserveUseListOrder)
     UseListOrders = predictUseListOrder(M);
 
@@ -375,6 +374,9 @@
 
   // Optimize constant ordering.
   OptimizeConstants(FirstConstant, Values.size());
+
+  // Organize metadata ordering.
+  organizeMetadata();
 }
 
 unsigned ValueEnumerator::getInstructionID(const Instruction *Inst) const {
@@ -530,8 +532,8 @@
     EnumerateMDNodeOperands(N);
   else if (auto *C = dyn_cast<ConstantAsMetadata>(MD))
     EnumerateValue(C->getValue());
-
-  HasMDString |= isa<MDString>(MD);
+  else
+    ++NumMDStrings;
 
   // Replace the dummy ID inserted above with the correct one.  MetadataMap may
   // have changed by inserting operands, so we need a fresh lookup here.
@@ -557,6 +559,19 @@
   FunctionLocalMDs.push_back(Local);
 }
 
+void ValueEnumerator::organizeMetadata() {
+  if (!NumMDStrings)
+    return;
+
+  // Put the strings first.
+  std::stable_partition(MDs.begin(), MDs.end(),
+                        [](const Metadata *MD) { return isa<MDString>(MD); });
+
+  // Renumber.
+  for (unsigned I = 0, E = MDs.size(); I != E; ++I)
+    MetadataMap[MDs[I]] = I + 1;
+}
+
 void ValueEnumerator::EnumerateValue(const Value *V) {
   assert(!V->getType()->isVoidTy() && "Can't insert void values!");
   assert(!isa<MetadataAsValue>(V) && "EnumerateValue doesn't handle Metadata!");
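
The reordering done by organizeMetadata() in the hunk above can be tried in
isolation. The following is a minimal sketch under stated assumptions: Item
and organize are hypothetical stand-ins for llvm::Metadata and the
ValueEnumerator method, and a plain std::unordered_map stands in for
MetadataMap. It shows the same two steps, a stable partition that moves
strings to the front, followed by 1-based renumbering.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for llvm::Metadata; only the "is a string" bit matters.
struct Item {
  std::string Name;
  bool IsString;
};

// Move string items to the front while keeping relative order within each
// group, then reassign IDs so that ID == position + 1.
void organize(std::vector<const Item *> &Items,
              std::unordered_map<const Item *, unsigned> &IDs) {
  auto IsString = [](const Item *I) { return I->IsString; };

  // Put the strings first.
  std::stable_partition(Items.begin(), Items.end(), IsString);
  assert(std::is_partitioned(Items.begin(), Items.end(), IsString));

  // Renumber: IDs handed out before the partition are now stale.
  for (unsigned I = 0, E = Items.size(); I != E; ++I)
    IDs[Items[I]] = I + 1;
}
```

A stable partition matters here because it preserves the enumeration order
within the string group and within the non-string group; only the boundary
between the two groups moves.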