[MirNamer][Canonicalizer]: Perform instruction semantic based renaming

https://reviews.llvm.org/D70210

Previously:
- The algorithm was sensitive to gaps and extra instructions, so names in diffs were often off by a few. This made diffs unreadable even for tests with 7 and 8 instructions respectively.
- Naming could change depending on the candidates (and the order in which they were picked). A single extra instruction somewhere could cause the entire subtree to be named completely differently.
- Similar instructions occurring in different functions were not named consistently. Anything like counting the frequency distribution of differences across a suite would give poor results because of the sensitivity issues above.

Instead:
Name each instruction based on its semantics (a hash of the opcode and operands). A given instruction that occurs in any module/function is named similarly (i.e. semantically). This has some nice properties:
- You can look at many instructions and just check the hash: if they're named similarly, then it's the same instruction. This makes it very easy to spot the same instruction occurring multiple times, as well as across many functions (useful for frequency distribution).
- Naming is independent of traversal/candidates/depth of graph. No need to keep track of last index/gaps/skip count etc.
- No off-by-a-few issues with diffs. I've tried the old vs new implementation on files ranging from 30 to 700 instructions. In both cases the old algorithm's diffs are a sea of red, whereas with the semantic version the diffs line up beautifully.
- Simplified implementation of the main loop (simple iteration), with no need to keep track of what has and has not been visited. Collisions are handled just by incrementing a counter. Names are roughly bb[N]_hash_[CollisionCount].

Additionally, with the new implementation we can probably avoid hoisting instructions to various places, as they'll likely be named the same, resulting in differences based only on collision. That is, regardless of whether the instruction is hoisted or not, or close to its use or not, it gets the same hash, so uses of the instruction should be identical with the only change being the collision count, which is very easy to spot visually.
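
The naming scheme described above can be sketched in isolation. This is a minimal illustration, not the code from this patch: `ToyInst`, `hashInst`, and `nameBlock` are hypothetical names, operands are assumed to be already reduced to integers, and the hash mixing step is an arbitrary choice for the example.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction: an opcode plus operands
// already reduced to integers (e.g. an immediate, or the opcode of the
// instruction defining a virtual register operand).
struct ToyInst {
  unsigned Opcode;
  std::vector<std::uint64_t> Operands;
};

// Combine the opcode and operands into a single hash value. The mixing
// formula is an assumption of this sketch, not the one used by the patch.
std::uint64_t hashInst(const ToyInst &I) {
  std::uint64_t H = std::hash<unsigned>{}(I.Opcode);
  for (std::uint64_t Op : I.Operands) {
    std::uint64_t OpHash = std::hash<std::uint64_t>{}(Op);
    H ^= OpHash + 0x9e3779b97f4a7c15ULL + (H << 6) + (H >> 2);
  }
  return H;
}

// Name every instruction in a block as bb<BBNum>_<hash>_<collision count>.
// The name depends only on the instruction itself and on how many identical
// hashes were seen earlier in the block, not on traversal order or depth.
std::vector<std::string> nameBlock(unsigned BBNum,
                                   const std::vector<ToyInst> &Block) {
  std::map<std::uint64_t, unsigned> Seen; // hash -> occurrences so far
  std::vector<std::string> Names;
  for (const ToyInst &I : Block) {
    std::uint64_t H = hashInst(I);
    unsigned Collision = Seen[H]++; // 0 for the first occurrence
    Names.push_back("bb" + std::to_string(BBNum) + "_" + std::to_string(H) +
                    "_" + std::to_string(Collision));
  }
  return Names;
}
```

In this sketch, adding an unrelated instruction to a block leaves the names of all other instructions unchanged, and two identical instructions differ only in the trailing collision count.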

commit: 72768685567b5e2ef9820b80997c5aed615e9f57
author: Aditya Nandakumar <aditya_nandakumar@apple.com> Fri Nov 15 08:23:32 2019 -0800
committer: Aditya Nandakumar <aditya_nandakumar@apple.com> Fri Nov 15 08:38:54 2019 -0800
tree: 4c353444fe6d7bf981f3356a1ad4743a2bbf9161
parent: c9081968ead183ee1df824f7b96fcafcfcbe57cd
diff --git a/llvm/lib/CodeGen/MIRVRegNamerUtils.h b/llvm/lib/CodeGen/MIRVRegNamerUtils.h
index c5b52a9..ebe3097 100644
--- a/llvm/lib/CodeGen/MIRVRegNamerUtils.h
+++ b/llvm/lib/CodeGen/MIRVRegNamerUtils.h
@@ -25,65 +25,68 @@
 #include "llvm/CodeGen/Passes.h"
 #include "llvm/Support/raw_ostream.h"
 
-#include <queue>
 
 namespace llvm {
+/// VRegRenamer - This class is used for renaming vregs in a machine basic
+/// block according to semantics of the instruction.
+class VRegRenamer {
+  class NamedVReg {
+    Register Reg;
+    std::string Name;
 
-/// NamedVRegCursor - The cursor is an object that keeps track of what the next
-/// vreg name should be. It does book keeping to determine when to skip the
-/// index value and by how much, or if the next vreg name should be an increment
-/// from the previous.
-class NamedVRegCursor {
+  public:
+    NamedVReg(Register Reg, std::string Name = "") : Reg(Reg), Name(Name) {}
+    NamedVReg(std::string Name = "") : Reg(~0U), Name(Name) {}
+
+    const std::string &getName() const { return Name; }
+
+    Register getReg() const { return Reg; }
+  };
+
   MachineRegisterInfo &MRI;
 
-  /// virtualVRegNumber - Book keeping of the last vreg position.
-  unsigned virtualVRegNumber;
+  unsigned CurrentBBNumber = 0;
 
-  /// SkipGapSize - Used to calculate a modulo amount to skip by after every
-  /// sequence of instructions starting from a given side-effecting
-  /// MachineInstruction for a given MachineBasicBlock. The general idea is that
-  /// for a given program compiled with two different opt pipelines, there
-  /// shouldn't be greater than SkipGapSize difference in how many vregs are in
-  /// play between the two and for every def-use graph of vregs we rename we
-  /// will round up to the next SkipGapSize'th number so that we have a high
-  /// change of landing on the same name for two given matching side-effects
-  /// for the two compilation outcomes.
-  const unsigned SkipGapSize;
+  /// Given an Instruction, construct a hash of the operands
+  /// of the instructions along with the opcode.
+  /// When dealing with virtual registers, just hash the opcode of
+  /// the instruction defining that vreg.
+  /// Handle immediates, registers (physical and virtual) explicitly,
+  /// and return a common value for the other cases.
+  /// Instruction will be named in the following scheme
+  /// bb<block_no>_hash_<collision_count>.
+  std::string getInstructionOpcodeHash(MachineInstr &MI);
 
-  /// RenamedInOtherBB - VRegs that we already renamed: ie breadcrumbs.
-  std::vector<Register> RenamedInOtherBB;
+  /// For all the VRegs that are candidates for renaming,
+  /// return a mapping from old vregs to new vregs with names.
+  std::map<unsigned, unsigned>
+  getVRegRenameMap(const std::vector<NamedVReg> &VRegs);
+
+  /// Perform replacing of registers based on the <old,new> vreg map.
+  bool doVRegRenaming(const std::map<unsigned, unsigned> &VRegRenameMap);
 
 public:
-  NamedVRegCursor() = delete;
-  /// 1000 for the SkipGapSize was a good heuristic at the time of the writing
-  /// of the MIRCanonicalizerPass. Adjust as needed.
-  NamedVRegCursor(MachineRegisterInfo &MRI, unsigned SkipGapSize = 1000)
-      : MRI(MRI), virtualVRegNumber(0), SkipGapSize(SkipGapSize) {}
-
-  /// SkipGapSize - Skips modulo a gap value of indices. Indices are used to
-  /// produce the next vreg name.
-  void skipVRegs();
-
-  unsigned getVirtualVReg() const { return virtualVRegNumber; }
-
-  /// incrementVirtualVReg - This increments an index value that us used to
-  /// create a new vreg name. This is not a Register.
-  unsigned incrementVirtualVReg(unsigned incr = 1) {
-    virtualVRegNumber += incr;
-    return virtualVRegNumber;
-  }
+  VRegRenamer() = delete;
+  VRegRenamer(MachineRegisterInfo &MRI) : MRI(MRI) {}
 
   /// createVirtualRegister - Given an existing vreg, create a named vreg to
-  /// take its place.
+  /// take its place. The name is determined by calling
+  /// getInstructionOpcodeHash.
   unsigned createVirtualRegister(unsigned VReg);
 
-  /// renameVRegs - For a given MachineBasicBlock, scan for side-effecting
-  /// instructions, walk the def-use from each side-effecting root (in sorted
-  /// root order) and rename the encountered vregs in the def-use graph in a
-  /// canonical ordering. This method maintains book keeping for which vregs
-  /// were already renamed in RenamedInOtherBB.
-  // @return changed
-  bool renameVRegs(MachineBasicBlock *MBB);
+  /// Create a vreg with name and return it.
+  unsigned createVirtualRegisterWithName(unsigned VReg,
+                                         const std::string &Name);
+  /// Linearly traverse the MachineBasicBlock and rename each instruction's
+  /// vreg definition based on the semantics of the instruction.
+  /// Names are as follows bb<BBNum>_hash_[0-9]+
+  bool renameInstsInMBB(MachineBasicBlock *MBB);
+
+  /// Same as the above, but sets a BBNum depending on BB traversal that
+  /// will be used as prefix for the vreg names.
+  bool renameVRegs(MachineBasicBlock *MBB, unsigned BBNum = 0);
+
+  unsigned getCurrentBBNumber() const { return CurrentBBNumber; }
 };
 
 } // namespace llvm
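
The new header splits renaming into building an <old, new> vreg map and then applying it. A minimal sketch of that two-step shape follows, with plain integers standing in for registers. `ToyNamedVReg`, `buildRenameMap`, and `applyRenameMap` are hypothetical names, and allocating new ids from a `NextFreeReg` counter is an assumption of the sketch; it is not how MachineRegisterInfo creates virtual registers.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for VRegRenamer::NamedVReg: an old register id
// paired with the name its replacement should carry.
struct ToyNamedVReg {
  unsigned Reg;
  std::string Name;
};

// Step 1: give each candidate a fresh id and record old -> new.
// NextFreeReg and NamesOut are assumptions of this sketch; they only
// imitate "create a new register that carries this name".
std::map<unsigned, unsigned>
buildRenameMap(const std::vector<ToyNamedVReg> &VRegs, unsigned NextFreeReg,
               std::map<unsigned, std::string> &NamesOut) {
  std::map<unsigned, unsigned> RenameMap;
  for (const ToyNamedVReg &V : VRegs) {
    if (RenameMap.count(V.Reg))
      continue; // already scheduled for renaming
    RenameMap[V.Reg] = NextFreeReg;
    NamesOut[NextFreeReg] = V.Name;
    ++NextFreeReg;
  }
  return RenameMap;
}

// Step 2: rewrite every operand found in the map. Returns true if any
// operand was replaced, loosely following the bool return type that the
// header declares for doVRegRenaming.
bool applyRenameMap(std::vector<unsigned> &Operands,
                    const std::map<unsigned, unsigned> &RenameMap) {
  bool Changed = false;
  for (unsigned &Op : Operands) {
    auto It = RenameMap.find(Op);
    if (It == RenameMap.end())
      continue; // not a renaming candidate, leave as-is
    Op = It->second;
    Changed = true;
  }
  return Changed;
}
```

Keeping the map construction separate from the rewrite means every use of an old register is replaced consistently in one pass, whatever order the candidates were collected in.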