Subzero: Cleanly implement register allocation after phi lowering.

After finding a valid linearization of the phi assignments, the old approach calls a complicated target-specific method that lowers the phi assignments and performs ad-hoc register allocation on them.
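
Concretely, the target hook being removed is this pure-virtual method, which every backend had to implement (see the IceTargetLowering.h diff below):

  /// Lowers a list of "parallel" assignment instructions representing
  /// a topological sort of the Phi instructions.
  virtual void lowerPhiAssignments(CfgNode *Node,
                                   const AssignList &Assignments) = 0;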

In the new approach, we use the existing target lowering to lower the assignments into mov (or whatever is appropriate) instructions, and we enhance the register allocator to forcibly spill and reload a register when one is needed but none are available.
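
The spill/reload mechanism, condensed from the new LinearScan::addSpillFill in the diff below (Preg is the chosen physical register, SpillLoc its stack slot, and SpillPoint/FillPoint bracket Cur's live range):

  // Add "reg=FakeDef; spill=reg" right before Cur's live range begins.
  Target->lowerInst(Node, SpillPoint, InstFakeDef::create(Func, Preg));
  Target->lowerInst(Node, SpillPoint, InstAssign::create(Func, SpillLoc, Preg));
  // Add "reg=spill; FakeUse(reg)" right after Cur's live range ends.
  Target->lowerInst(Node, FillPoint, InstAssign::create(Func, Preg, SpillLoc));
  Target->lowerInst(Node, FillPoint, InstFakeUse::create(Func, Preg));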

The new approach incrementally updates liveness and live ranges for the newly added nodes and variable uses, to avoid having to expensively recompute them for the entire function.
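
In outline, the incremental update (condensed from the Cfg::advancedPhiLowering change below) renumbers and analyzes only the nodes and variables created by edge splitting:

  getLiveness()->initPhiEdgeSplits(Nodes.begin() + NumNodes,
                                   Variables.begin() + NumVars);
  for (auto I = Nodes.begin() + NumNodes, E = Nodes.end(); I != E; ++I) {
    InstNumberT FirstInstNum = getNextInstNumber();
    (*I)->renumberInstructions();
    InstNumberT LastInstNum = getNextInstNumber() - 1;
    (*I)->liveness(getLiveness());
    (*I)->livenessAddIntervals(getLiveness(), FirstInstNum, LastInstNum);
  }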

Advanced phi lowering is now enabled on ARM, and constant blinding no longer needs to be disabled during phi lowering.

Some of the metadata regarding which CfgNode a local variable belongs to needed to be made non-const, in order to allow adding spill/fill instructions to a CfgNode during register allocation.

Most of the testing came from spec2k.  There are some minor differences in the output regarding stack frame offsets, probably related to the order in which new nodes are phi-lowered.  The changes related to constant blinding were tested by running spec with "-randomize-pool-immediates=randomize -randomize-pool-threshold=8".

Unfortunately, this appears to add about 10% to the translation time for 176.gcc.  The cost is clearly visible in the -timing output, and a TODO in the code suggests the likely cause and a possible fix, for later investigation.

BUG= none
R=jvoung@chromium.org

Review URL: https://codereview.chromium.org/1253833002.
diff --git a/src/IceCfg.cpp b/src/IceCfg.cpp
index 7e91508..99ed0bc 100644
--- a/src/IceCfg.cpp
+++ b/src/IceCfg.cpp
@@ -248,11 +248,50 @@
 
 void Cfg::advancedPhiLowering() {
   TimerMarker T(TimerStack::TT_advancedPhiLowering, this);
+  // Clear all previously computed live ranges (but not live-in/live-out bit
+  // vectors or last-use markers), because the follow-on register allocation is
+  // only concerned with live ranges across the newly created blocks.
+  for (Variable *Var : Variables) {
+    Var->getLiveRange().reset();
+  }
   // This splits edges and appends new nodes to the end of the node
   // list.  This can invalidate iterators, so don't use an iterator.
   SizeT NumNodes = getNumNodes();
+  SizeT NumVars = getNumVariables();
   for (SizeT I = 0; I < NumNodes; ++I)
     Nodes[I]->advancedPhiLowering();
+
+  TimerMarker TT(TimerStack::TT_lowerPhiAssignments, this);
+  if (true) {
+    // The following code does an in-place update of liveness and live ranges as
+    // a result of adding the new phi edge split nodes.
+    getLiveness()->initPhiEdgeSplits(Nodes.begin() + NumNodes,
+                                     Variables.begin() + NumVars);
+    TimerMarker TTT(TimerStack::TT_liveness, this);
+    // Iterate over the newly added nodes to add their liveness info.
+    for (auto I = Nodes.begin() + NumNodes, E = Nodes.end(); I != E; ++I) {
+      InstNumberT FirstInstNum = getNextInstNumber();
+      (*I)->renumberInstructions();
+      InstNumberT LastInstNum = getNextInstNumber() - 1;
+      // TODO(stichnot): May be able to speed up liveness and live range
+      // calculation by having it consider only pre-colored or infinite-weight
+      // variables.  Could also simplify LinearScan::initForInfOnly() that way.
+      (*I)->liveness(getLiveness());
+      (*I)->livenessAddIntervals(getLiveness(), FirstInstNum, LastInstNum);
+    }
+  } else {
+    // The following code does a brute-force recalculation of live ranges as a
+    // result of adding the new phi edge split nodes.  The liveness calculation
+    // is particularly expensive because the new nodes are not yet in a proper
+    // topological order and so convergence is slower.
+    //
+    // This code is kept here for reference and can be temporarily enabled in
+    // case the incremental code is under suspicion.
+    renumberInstructions();
+    liveness(Liveness_Intervals);
+    getVMetadata()->init(VMK_All);
+  }
+  Target->regAlloc(RAK_Phi);
 }
 
 // Find a reasonable placement for nodes that have not yet been
diff --git a/src/IceCfgNode.cpp b/src/IceCfgNode.cpp
index a901753..78063e3 100644
--- a/src/IceCfgNode.cpp
+++ b/src/IceCfgNode.cpp
@@ -270,33 +270,44 @@
 
 } // end of anonymous namespace
 
-// This the "advanced" version of Phi lowering for a basic block, in
-// contrast to the simple version that lowers through assignments
-// involving temporaries.
+// This the "advanced" version of Phi lowering for a basic block, in contrast to
+// the simple version that lowers through assignments involving temporaries.
 //
-// All Phi instructions in a basic block are conceptually executed in
-// parallel.  However, if we lower Phis early and commit to a
-// sequential ordering, we may end up creating unnecessary
-// interferences which lead to worse register allocation.  Delaying
-// Phi scheduling until after register allocation can help unless
-// there are no free registers for shuffling registers or stack slots
-// and spilling becomes necessary.
+// All Phi instructions in a basic block are conceptually executed in parallel.
+// However, if we lower Phis early and commit to a sequential ordering, we may
+// end up creating unnecessary interferences which lead to worse register
+// allocation.  Delaying Phi scheduling until after register allocation can help
+// unless there are no free registers for shuffling registers or stack slots and
+// spilling becomes necessary.
 //
-// The advanced Phi lowering starts by finding a topological sort of
-// the Phi instructions, where "A=B" comes before "B=C" due to the
-// anti-dependence on B.  If a topological sort is not possible due to
-// a cycle, the cycle is broken by introducing a non-parallel
-// temporary.  For example, a cycle arising from a permutation like
-// "A=B;B=C;C=A" can become "T=A;A=B;B=C;C=T".  All else being equal,
-// prefer to schedule assignments with register-allocated Src operands
-// earlier, in case that register becomes free afterwards, and prefer
-// to schedule assignments with register-allocated Dest variables
-// later, to keep that register free for longer.
+// The advanced Phi lowering starts by finding a topological sort of the Phi
+// instructions, where "A=B" comes before "B=C" due to the anti-dependence on B.
+// Preexisting register assignments are considered in the topological sort.  If
+// a topological sort is not possible due to a cycle, the cycle is broken by
+// introducing a non-parallel temporary.  For example, a cycle arising from a
+// permutation like "A=B;B=C;C=A" can become "T=A;A=B;B=C;C=T".  All else being
+// equal, prefer to schedule assignments with register-allocated Src operands
+// earlier, in case that register becomes free afterwards, and prefer to
+// schedule assignments with register-allocated Dest variables later, to keep
+// that register free for longer.
 //
-// Once the ordering is determined, the Cfg edge is split and the
-// assignment list is lowered by the target lowering layer.  The
-// specific placement of the new node within the Cfg node list is
-// deferred until later, including after empty node contraction.
+// Once the ordering is determined, the Cfg edge is split and the assignment
+// list is lowered by the target lowering layer.  Since the assignment lowering
+// may create new infinite-weight temporaries, a follow-on register allocation
+// pass will be needed.  To prepare for this, liveness (including live range
+// calculation) of the split nodes needs to be calculated, and liveness of the
+// original node needs to be updated to "undo" the effects of the phi
+// assignments.
+//
+// The specific placement of the new node within the Cfg node list is deferred
+// until later, including after empty node contraction.
+//
+// After phi assignments are lowered across all blocks, another register
+// allocation pass is run, focusing only on pre-colored and infinite-weight
+// variables, similar to Om1 register allocation (except without the need to
+// specially compute these variables' live ranges, since they have already been
+// precisely calculated).  The register allocator in this mode needs the ability
+// to forcibly spill and reload registers in case none are naturally available.
 void CfgNode::advancedPhiLowering() {
   if (getPhis().empty())
     return;
@@ -314,11 +325,18 @@
 
   size_t NumPhis = 0;
   for (Inst &I : Phis) {
-    auto Inst = llvm::dyn_cast<InstPhi>(&I);
+    auto *Inst = llvm::dyn_cast<InstPhi>(&I);
     if (!Inst->isDeleted()) {
+      Variable *Dest = Inst->getDest();
       Desc[NumPhis].Phi = Inst;
-      Desc[NumPhis].Dest = Inst->getDest();
+      Desc[NumPhis].Dest = Dest;
       ++NumPhis;
+      // Undo the effect of the phi instruction on this node's live-in set by
+      // marking the phi dest variable as live on entry.
+      SizeT VarNum = Func->getLiveness()->getLiveIndex(Dest->getIndex());
+      assert(!Func->getLiveness()->getLiveIn(this)[VarNum]);
+      Func->getLiveness()->getLiveIn(this)[VarNum] = true;
+      Inst->setDeleted();
     }
   }
   if (NumPhis == 0)
@@ -327,7 +345,6 @@
   SizeT InEdgeIndex = 0;
   for (CfgNode *Pred : InEdges) {
     CfgNode *Split = splitIncomingEdge(Pred, InEdgeIndex++);
-    AssignList Assignments;
     SizeT Remaining = NumPhis;
 
     // First pass computes Src and initializes NumPred.
@@ -348,7 +365,7 @@
           // be redundant, and if Dest/Src are on the stack, the
           // target lowering may naively decide to lower it using a
           // temporary register.
-          Assignments.push_back(InstAssign::create(Func, Dest, Src));
+          Split->appendInst(InstAssign::create(Func, Dest, Src));
       }
     }
     // Second pass computes NumPred by comparing every pair of Phi
@@ -374,12 +391,12 @@
     // Another pass to compute initial Weight values.
 
     // Always pick NumPred=0 over NumPred>0.
-    const int32_t WeightNoPreds = 4;
+    constexpr int32_t WeightNoPreds = 4;
     // Prefer Src as a register because the register might free up.
-    const int32_t WeightSrcIsReg = 2;
+    constexpr int32_t WeightSrcIsReg = 2;
     // Prefer Dest not as a register because the register stays free
     // longer.
-    const int32_t WeightDestNotReg = 1;
+    constexpr int32_t WeightDestNotReg = 1;
 
     for (size_t I = 0; I < NumPhis; ++I) {
       if (Desc[I].Processed)
@@ -434,7 +451,7 @@
             Variable *Tmp = Func->makeVariable(OtherSrc->getType());
             if (BuildDefs::dump())
               Tmp->setName(Func, "__split_" + std::to_string(VarNum));
-            Assignments.push_back(InstAssign::create(Func, Tmp, OtherSrc));
+            Split->appendInst(InstAssign::create(Func, Tmp, OtherSrc));
             Desc[J].Src = Tmp;
             Found = true;
           }
@@ -443,7 +460,7 @@
       }
       // Now that a cycle (if any) has been broken, create the actual
       // assignment.
-      Assignments.push_back(InstAssign::create(Func, Dest, Src));
+      Split->appendInst(InstAssign::create(Func, Dest, Src));
       // Update NumPred for all Phi assignments using this Phi's Src
       // as their Dest variable.  Also update Weight if NumPred
       // dropped from 1 to 0.
@@ -459,18 +476,11 @@
       }
       Desc[BestIndex].Processed = true;
     }
+    Split->appendInst(InstBr::create(Func, this));
 
-    Func->getTarget()->lowerPhiAssignments(Split, Assignments);
-
-    // Renumber the instructions to be monotonically increasing so
-    // that addNode() doesn't assert when multi-definitions are added
-    // out of order.
-    Split->renumberInstructions();
+    Split->genCode();
     Func->getVMetadata()->addNode(Split);
   }
-
-  for (Inst &I : Phis)
-    I.setDeleted();
 }
 
 // Does address mode optimization.  Pass each instruction to the
@@ -896,7 +906,7 @@
   // inference allows it to be greater than 1 for short periods.
   std::vector<SizeT> LiveRegCount(Func->getTarget()->getNumRegisters());
   if (DecorateAsm) {
-    const bool IsLiveIn = true;
+    constexpr bool IsLiveIn = true;
     emitRegisterUsage(Str, Func, this, IsLiveIn, LiveRegCount);
   }
 
@@ -931,7 +941,7 @@
     updateStats(Func, &I);
   }
   if (DecorateAsm) {
-    const bool IsLiveIn = false;
+    constexpr bool IsLiveIn = false;
     emitRegisterUsage(Str, Func, this, IsLiveIn, LiveRegCount);
   }
 }
diff --git a/src/IceClFlags.cpp b/src/IceClFlags.cpp
index c6ea696..cbacd8b 100644
--- a/src/IceClFlags.cpp
+++ b/src/IceClFlags.cpp
@@ -401,15 +401,7 @@
   OutFlags.setFunctionSections(::FunctionSections);
   OutFlags.setNumTranslationThreads(::NumThreads);
   OutFlags.setOptLevel(::OLevel);
-  if (::TargetArch == Target_ARM32) {
-    // TODO(jvoung): We need lowerPhiAssignments to handle spilling
-    // more than one register, since some ARM lowerAssign sequences
-    // may require more than one register. For now, disable PhiEdgeSplit
-    // to avoid requiring lowerPhiAssignments.
-    OutFlags.setPhiEdgeSplit(false);
-  } else {
-    OutFlags.setPhiEdgeSplit(::EnablePhiEdgeSplit);
-  }
+  OutFlags.setPhiEdgeSplit(::EnablePhiEdgeSplit);
   OutFlags.setRandomSeed(::RandomSeed);
   OutFlags.setShouldDoNopInsertion(::ShouldDoNopInsertion);
   OutFlags.setShouldRandomizeRegAlloc(::RandomizeRegisterAllocation);
diff --git a/src/IceDefs.h b/src/IceDefs.h
index 17e2fb1..4e1fb3a 100644
--- a/src/IceDefs.h
+++ b/src/IceDefs.h
@@ -197,7 +197,9 @@
 };
 
 enum RegAllocKind {
+  RAK_Unknown,
   RAK_Global, /// full, global register allocation
+  RAK_Phi,    /// infinite-weight Variables with active spilling/filling
   RAK_InfOnly /// allocation only for infinite-weight Variables
 };
 
diff --git a/src/IceLiveness.cpp b/src/IceLiveness.cpp
index 551a400..da9b23d 100644
--- a/src/IceLiveness.cpp
+++ b/src/IceLiveness.cpp
@@ -30,7 +30,13 @@
 
 namespace Ice {
 
-void Liveness::init() {
+// Initializes the basic liveness-related data structures for full liveness
+// analysis (IsFullInit=true), or for incremental update after phi lowering
+// (IsFullInit=false).  In the latter case, FirstNode points to the first node
+// added since starting phi lowering, and FirstVar points to the first Variable
+// added since starting phi lowering.
+void Liveness::initInternal(NodeList::const_iterator FirstNode,
+                            VarList::const_iterator FirstVar, bool IsFullInit) {
   // Initialize most of the container sizes.
   SizeT NumVars = Func->getVariables().size();
   SizeT NumNodes = Func->getNumNodes();
@@ -38,32 +44,38 @@
   Nodes.resize(NumNodes);
   VarToLiveMap.resize(NumVars);
 
-  // Count the number of globals, and the number of locals for each
-  // block.
-  for (SizeT i = 0; i < NumVars; ++i) {
-    Variable *Var = Func->getVariables()[i];
+  // Count the number of globals, and the number of locals for each block.
+  SizeT TmpNumGlobals = 0;
+  for (auto I = FirstVar, E = Func->getVariables().end(); I != E; ++I) {
+    Variable *Var = *I;
     if (VMetadata->isMultiBlock(Var)) {
-      ++NumGlobals;
+      ++TmpNumGlobals;
     } else {
       SizeT Index = VMetadata->getLocalUseNode(Var)->getIndex();
       ++Nodes[Index].NumLocals;
     }
   }
+  if (IsFullInit)
+    NumGlobals = TmpNumGlobals;
+  else
+    assert(TmpNumGlobals == 0);
 
-  // Resize each LivenessNode::LiveToVarMap, and the global
-  // LiveToVarMap.  Reset the counts to 0.
-  for (SizeT i = 0; i < NumNodes; ++i) {
-    Nodes[i].LiveToVarMap.assign(Nodes[i].NumLocals, nullptr);
-    Nodes[i].NumLocals = 0;
-    Nodes[i].NumNonDeadPhis = 0;
+  // Resize each LivenessNode::LiveToVarMap, and the global LiveToVarMap.  Reset
+  // the counts to 0.
+  for (auto I = FirstNode, E = Func->getNodes().end(); I != E; ++I) {
+    LivenessNode &N = Nodes[(*I)->getIndex()];
+    N.LiveToVarMap.assign(N.NumLocals, nullptr);
+    N.NumLocals = 0;
+    N.NumNonDeadPhis = 0;
   }
-  LiveToVarMap.assign(NumGlobals, nullptr);
+  if (IsFullInit)
+    LiveToVarMap.assign(NumGlobals, nullptr);
 
   // Sort each variable into the appropriate LiveToVarMap.  Also set
   // VarToLiveMap.
-  SizeT TmpNumGlobals = 0;
-  for (SizeT i = 0; i < NumVars; ++i) {
-    Variable *Var = Func->getVariables()[i];
+  TmpNumGlobals = 0;
+  for (auto I = FirstVar, E = Func->getVariables().end(); I != E; ++I) {
+    Variable *Var = *I;
     SizeT VarIndex = Var->getIndex();
     SizeT LiveIndex;
     if (VMetadata->isMultiBlock(Var)) {
@@ -77,11 +89,11 @@
     }
     VarToLiveMap[VarIndex] = LiveIndex;
   }
-  assert(NumGlobals == TmpNumGlobals);
+  assert(TmpNumGlobals == (IsFullInit ? NumGlobals : 0));
 
   // Process each node.
-  for (const CfgNode *LNode : Func->getNodes()) {
-    LivenessNode &Node = Nodes[LNode->getIndex()];
+  for (auto I = FirstNode, E = Func->getNodes().end(); I != E; ++I) {
+    LivenessNode &Node = Nodes[(*I)->getIndex()];
     // NumLocals, LiveToVarMap already initialized
     Node.LiveIn.resize(NumGlobals);
     Node.LiveOut.resize(NumGlobals);
@@ -90,6 +102,19 @@
   }
 }
 
+void Liveness::init() {
+  constexpr bool IsFullInit = true;
+  NodeList::const_iterator FirstNode = Func->getNodes().begin();
+  VarList::const_iterator FirstVar = Func->getVariables().begin();
+  initInternal(FirstNode, FirstVar, IsFullInit);
+}
+
+void Liveness::initPhiEdgeSplits(NodeList::const_iterator FirstNode,
+                                 VarList::const_iterator FirstVar) {
+  constexpr bool IsFullInit = false;
+  initInternal(FirstNode, FirstVar, IsFullInit);
+}
+
 Variable *Liveness::getVariable(SizeT LiveIndex, const CfgNode *Node) const {
   if (LiveIndex < NumGlobals)
     return LiveToVarMap[LiveIndex];
diff --git a/src/IceLiveness.h b/src/IceLiveness.h
index 7fcad25..da2c52e 100644
--- a/src/IceLiveness.h
+++ b/src/IceLiveness.h
@@ -63,6 +63,8 @@
 public:
   Liveness(Cfg *Func, LivenessMode Mode) : Func(Func), Mode(Mode) {}
   void init();
+  void initPhiEdgeSplits(NodeList::const_iterator FirstNode,
+                         VarList::const_iterator FirstVar);
   Cfg *getFunc() const { return Func; }
   LivenessMode getMode() const { return Mode; }
   Variable *getVariable(SizeT LiveIndex, const CfgNode *Node) const;
@@ -96,6 +98,8 @@
   }
 
 private:
+  void initInternal(NodeList::const_iterator FirstNode,
+                    VarList::const_iterator FirstVar, bool IsFullInit);
   /// Resize Nodes so that Nodes[Index] is valid.
   void resize(SizeT Index) {
     if (Index >= Nodes.size())
diff --git a/src/IceOperand.cpp b/src/IceOperand.cpp
index 0d739ca..815efaf 100644
--- a/src/IceOperand.cpp
+++ b/src/IceOperand.cpp
@@ -146,8 +146,7 @@
 }
 
 void VariableTracking::markUse(MetadataKind TrackingKind, const Inst *Instr,
-                               const CfgNode *Node, bool IsFromDef,
-                               bool IsImplicit) {
+                               CfgNode *Node, bool IsImplicit) {
   (void)TrackingKind;
   if (MultiBlock == MBS_MultiBlock)
     return;
@@ -165,7 +164,7 @@
   // is because there can be additional control flow before branching
   // back to this node, and the variable is live throughout those
   // nodes.
-  if (!IsFromDef && Instr && llvm::isa<InstPhi>(Instr))
+  if (Instr && llvm::isa<InstPhi>(Instr))
     MakeMulti = true;
 
   if (!MakeMulti) {
@@ -190,7 +189,7 @@
 }
 
 void VariableTracking::markDef(MetadataKind TrackingKind, const Inst *Instr,
-                               const CfgNode *Node) {
+                               CfgNode *Node) {
   // TODO(stichnot): If the definition occurs in the last instruction
   // of the block, consider not marking this as a separate use.  But
   // be careful not to omit all uses of the variable if markDef() and
@@ -205,9 +204,8 @@
            Instr->getNumber() >= LastInstruction->getNumber());
   }
 #endif
-  const bool IsFromDef = true;
-  const bool IsImplicit = false;
-  markUse(TrackingKind, Instr, Node, IsFromDef, IsImplicit);
+  constexpr bool IsImplicit = false;
+  markUse(TrackingKind, Instr, Node, IsImplicit);
   if (TrackingKind == VMK_Uses)
     return;
   if (FirstOrSingleDefinition == nullptr)
@@ -276,12 +274,10 @@
 
   // Mark implicit args as being used in the entry node.
   for (Variable *Var : Func->getImplicitArgs()) {
-    const Inst *NoInst = nullptr;
-    const CfgNode *EntryNode = Func->getEntryNode();
-    const bool IsFromDef = false;
-    const bool IsImplicit = true;
-    Metadata[Var->getIndex()].markUse(Kind, NoInst, EntryNode, IsFromDef,
-                                      IsImplicit);
+    constexpr Inst *NoInst = nullptr;
+    CfgNode *EntryNode = Func->getEntryNode();
+    constexpr bool IsImplicit = true;
+    Metadata[Var->getIndex()].markUse(Kind, NoInst, EntryNode, IsImplicit);
   }
 
   for (CfgNode *Node : Func->getNodes())
@@ -304,9 +300,8 @@
       if (const Variable *Var = llvm::dyn_cast<Variable>(I.getSrc(SrcNum))) {
         SizeT VarNum = Var->getIndex();
         assert(VarNum < Metadata.size());
-        const bool IsFromDef = false;
-        const bool IsImplicit = false;
-        Metadata[VarNum].markUse(Kind, &I, Node, IsFromDef, IsImplicit);
+        constexpr bool IsImplicit = false;
+        Metadata[VarNum].markUse(Kind, &I, Node, IsImplicit);
       }
     }
   }
@@ -328,9 +323,8 @@
         const Variable *Var = Src->getVar(J);
         SizeT VarNum = Var->getIndex();
         assert(VarNum < Metadata.size());
-        const bool IsFromDef = false;
-        const bool IsImplicit = false;
-        Metadata[VarNum].markUse(Kind, &I, Node, IsFromDef, IsImplicit);
+        constexpr bool IsImplicit = false;
+        Metadata[VarNum].markUse(Kind, &I, Node, IsImplicit);
       }
     }
   }
@@ -382,7 +376,7 @@
   return Metadata[VarNum].getLatterDefinitions();
 }
 
-const CfgNode *VariablesMetadata::getLocalUseNode(const Variable *Var) const {
+CfgNode *VariablesMetadata::getLocalUseNode(const Variable *Var) const {
   if (!isTracked(Var))
     return nullptr; // conservative answer
   SizeT VarNum = Var->getIndex();
diff --git a/src/IceOperand.h b/src/IceOperand.h
index 8db3cf1..ccfefb9 100644
--- a/src/IceOperand.h
+++ b/src/IceOperand.h
@@ -377,6 +377,9 @@
   InstNumberT getStart() const {
     return Range.empty() ? -1 : Range.begin()->first;
   }
+  InstNumberT getEnd() const {
+    return Range.empty() ? -1 : Range.rbegin()->second;
+  }
 
   void untrim() { TrimmedBegin = Range.begin(); }
   void trim(InstNumberT Lower);
@@ -577,17 +580,16 @@
   const Inst *getFirstDefinition() const;
   const Inst *getSingleDefinition() const;
   const InstDefList &getLatterDefinitions() const { return Definitions; }
-  const CfgNode *getNode() const { return SingleUseNode; }
-  void markUse(MetadataKind TrackingKind, const Inst *Instr,
-               const CfgNode *Node, bool IsFromDef, bool IsImplicit);
-  void markDef(MetadataKind TrackingKind, const Inst *Instr,
-               const CfgNode *Node);
+  CfgNode *getNode() const { return SingleUseNode; }
+  void markUse(MetadataKind TrackingKind, const Inst *Instr, CfgNode *Node,
+               bool IsImplicit);
+  void markDef(MetadataKind TrackingKind, const Inst *Instr, CfgNode *Node);
 
 private:
   MultiDefState MultiDef = MDS_Unknown;
   MultiBlockState MultiBlock = MBS_Unknown;
-  const CfgNode *SingleUseNode = nullptr;
-  const CfgNode *SingleDefNode = nullptr;
+  CfgNode *SingleUseNode = nullptr;
+  CfgNode *SingleDefNode = nullptr;
   /// All definitions of the variable are collected here, in increasing
   /// order of instruction number.
   InstDefList Definitions; /// Only used if Kind==VMK_All
@@ -645,7 +647,7 @@
   bool isMultiBlock(const Variable *Var) const;
   /// Returns the node that the given Variable is used in, assuming
   /// isMultiBlock() returns false.  Otherwise, nullptr is returned.
-  const CfgNode *getLocalUseNode(const Variable *Var) const;
+  CfgNode *getLocalUseNode(const Variable *Var) const;
 
 private:
   const Cfg *Func;
diff --git a/src/IceRegAlloc.cpp b/src/IceRegAlloc.cpp
index 400cb7d..fad3d01 100644
--- a/src/IceRegAlloc.cpp
+++ b/src/IceRegAlloc.cpp
@@ -37,7 +37,7 @@
 // is with respect Cur.start.  Initial tests show no measurable
 // performance difference, so we'll keep the code simple for now.
 bool overlapsDefs(const Cfg *Func, const Variable *Item, const Variable *Var) {
-  const bool UseTrimmed = true;
+  constexpr bool UseTrimmed = true;
   VariablesMetadata *VMetadata = Func->getVMetadata();
   if (const Inst *FirstDef = VMetadata->getFirstDefinition(Var))
     if (Item->getLiveRange().overlapsInst(FirstDef->getNumber(), UseTrimmed))
@@ -73,9 +73,8 @@
   if (!BuildDefs::dump())
     return;
   Ostream &Str = Func->getContext()->getStrDump();
-  const static size_t BufLen = 30;
-  char buf[BufLen];
-  snprintf(buf, BufLen, "%2d", Var->getRegNumTmp());
+  char buf[30];
+  snprintf(buf, llvm::array_lengthof(buf), "%2d", Var->getRegNumTmp());
   Str << "R=" << buf << "  V=";
   Var->dump(Func);
   Str << "  Range=" << Var->getLiveRange();
@@ -88,7 +87,12 @@
 void LinearScan::initForGlobal() {
   TimerMarker T(TimerStack::TT_initUnhandled, Func);
   FindPreference = true;
-  FindOverlap = true;
+  // For full register allocation, normally we want to enable FindOverlap
+  // (meaning we look for opportunities for two overlapping live ranges to
+  // safely share the same register).  However, we disable it for phi-lowering
+  // register allocation, since no overlap opportunities should be available
+  // and looking for them is relatively expensive.
+  FindOverlap = (Kind != RAK_Phi);
   const VarList &Vars = Func->getVariables();
   Unhandled.reserve(Vars.size());
   UnhandledPrecolored.reserve(Vars.size());
@@ -103,6 +107,10 @@
     // it was never referenced.
     if (Var->getLiveRange().isEmpty())
       continue;
+    // Post-phi-lowering register allocation is only concerned with variables
+    // that are infinite-weight or pre-colored.
+    if (Kind == RAK_Phi && !Var->getWeight().isInf() && !Var->hasReg())
+      continue;
     Var->untrimLiveRange();
     Unhandled.push_back(Var);
     if (Var->hasReg()) {
@@ -114,6 +122,11 @@
 
   // Build the (ordered) list of FakeKill instruction numbers.
   Kills.clear();
+  // Phi lowering should not be creating new call instructions, so there should
+  // be no infinite-weight not-yet-colored live ranges that span a call
+  // instruction, hence no need to construct the Kills list.
+  if (Kind == RAK_Phi)
+    return;
   for (CfgNode *Node : Func->getNodes()) {
     for (Inst &I : Node->getInsts()) {
       if (auto Kill = llvm::dyn_cast<InstFakeKill>(&I)) {
@@ -196,7 +209,7 @@
       assert(LREnd[i] != Inst::NumberSentinel);
       Unhandled.push_back(Var);
       Var->resetLiveRange();
-      const uint32_t WeightDelta = 1;
+      constexpr uint32_t WeightDelta = 1;
       Var->addLiveRange(LRBegin[i], LREnd[i], WeightDelta);
       Var->untrimLiveRange();
       if (Var->hasReg()) {
@@ -217,6 +230,7 @@
 }
 
 void LinearScan::init(RegAllocKind Kind) {
+  this->Kind = Kind;
   Unhandled.clear();
   UnhandledPrecolored.clear();
   Handled.clear();
@@ -224,7 +238,11 @@
   Active.clear();
 
   switch (Kind) {
+  case RAK_Unknown:
+    llvm::report_fatal_error("Invalid RAK_Unknown");
+    break;
   case RAK_Global:
+  case RAK_Phi:
     initForGlobal();
     break;
   case RAK_InfOnly:
@@ -251,6 +269,76 @@
   Active.reserve(Unhandled.size());
 }
 
+// This is called when Cur must be allocated a register but no registers are
+// available across Cur's live range.  To handle this, we find a register that
+// is not explicitly used during Cur's live range, spill that register to a
+// stack location right before Cur's live range begins, and fill (reload) the
+// register from the stack location right after Cur's live range ends.
+void LinearScan::addSpillFill(Variable *Cur, llvm::SmallBitVector RegMask) {
+  // Identify the actual instructions that begin and end Cur's live range.
+  // Iterate through Cur's node's instruction list until we find the actual
+  // instructions with instruction numbers corresponding to Cur's recorded live
+  // range endpoints.  This sounds inefficient but shouldn't be a problem in
+  // practice because:
+  // (1) This function is almost never called in practice.
+  // (2) Since this register over-subscription problem happens only for
+  //     phi-lowered instructions, the number of instructions in the node is
+  //     proportional to the number of phi instructions in the original node,
+  //     which is never very large in practice.
+  // (3) We still have to iterate through all instructions of Cur's live range
+  //     to find all explicitly used registers (though the live range is usually
+  //     only 2-3 instructions), so the main cost that could be avoided would be
+//     finding the instruction that begins Cur's live range.
+  assert(!Cur->getLiveRange().isEmpty());
+  InstNumberT Start = Cur->getLiveRange().getStart();
+  InstNumberT End = Cur->getLiveRange().getEnd();
+  CfgNode *Node = Func->getVMetadata()->getLocalUseNode(Cur);
+  assert(Node);
+  InstList &Insts = Node->getInsts();
+  InstList::iterator SpillPoint = Insts.end();
+  InstList::iterator FillPoint = Insts.end();
+  // Stop searching after we have found both the SpillPoint and the FillPoint.
+  for (auto I = Insts.begin(), E = Insts.end();
+       I != E && (SpillPoint == E || FillPoint == E); ++I) {
+    if (I->getNumber() == Start)
+      SpillPoint = I;
+    if (I->getNumber() == End)
+      FillPoint = I;
+    if (SpillPoint != E) {
+      // Remove from RegMask any physical registers referenced during Cur's live
+      // range.  Start looking after SpillPoint gets set, i.e. once Cur's live
+      // range begins.
+      for (SizeT i = 0; i < I->getSrcSize(); ++i) {
+        Operand *Src = I->getSrc(i);
+        SizeT NumVars = Src->getNumVars();
+        for (SizeT j = 0; j < NumVars; ++j) {
+          const Variable *Var = Src->getVar(j);
+          if (Var->hasRegTmp())
+            RegMask[Var->getRegNumTmp()] = false;
+        }
+      }
+    }
+  }
+  assert(SpillPoint != Insts.end());
+  assert(FillPoint != Insts.end());
+  ++FillPoint;
+  // TODO(stichnot): Randomize instead of find_first().
+  int32_t RegNum = RegMask.find_first();
+  assert(RegNum != -1);
+  Cur->setRegNumTmp(RegNum);
+  TargetLowering *Target = Func->getTarget();
+  Variable *Preg = Target->getPhysicalRegister(RegNum, Cur->getType());
+  // TODO(stichnot): Add SpillLoc to VariablesMetadata tracking so that SpillLoc
+  // is correctly identified as !isMultiBlock(), reducing stack frame size.
+  Variable *SpillLoc = Func->makeVariable(Cur->getType());
+  // Add "reg=FakeDef;spill=reg" before SpillPoint.
+  Target->lowerInst(Node, SpillPoint, InstFakeDef::create(Func, Preg));
+  Target->lowerInst(Node, SpillPoint, InstAssign::create(Func, SpillLoc, Preg));
+  // Add "reg=spill;FakeUse(reg)" before FillPoint.
+  Target->lowerInst(Node, FillPoint, InstAssign::create(Func, Preg, SpillLoc));
+  Target->lowerInst(Node, FillPoint, InstFakeUse::create(Func, Preg));
+}
+
 // Implements the linear-scan algorithm.  Based on "Linear Scan
 // Register Allocation in the Context of SSA Form and Register
 // Constraints" by Hanspeter Mössenböck and Michael Pfeiffer,
@@ -312,10 +400,10 @@
         RegMaskFull & Func->getTarget()->getRegisterSetForType(Cur->getType());
     KillsRange.trim(Cur->getLiveRange().getStart());
 
-    // Check for precolored ranges.  If Cur is precolored, it
+    // Check for pre-colored ranges.  If Cur is pre-colored, it
     // definitely gets that register.  Previously processed live
     // ranges would have avoided that register due to it being
-    // precolored.  Future processed live ranges won't evict that
+    // pre-colored.  Future processed live ranges won't evict that
     // register because the live range has infinite weight.
     if (Cur->hasReg()) {
       int32_t RegNum = Cur->getRegNum();
@@ -501,7 +589,7 @@
     llvm::SmallVector<RegWeight, REGS_SIZE> Weights(RegMask.size());
 
     // Remove registers from the Free[] list where an Unhandled
-    // precolored range overlaps with the current range, and set those
+    // pre-colored range overlaps with the current range, and set those
     // registers to infinite weight so that they aren't candidates for
     // eviction.  Cur->rangeEndsBefore(Item) is an early exit check
     // that turns a guaranteed O(N^2) algorithm into expected linear
@@ -518,7 +606,7 @@
         Free[ItemReg] = false;
         PrecoloredUnhandledMask[ItemReg] = true;
         // Disable AllowOverlap if the preferred register is one of
-        // these precolored unhandled overlapping ranges.
+        // these pre-colored unhandled overlapping ranges.
         if (AllowOverlap && ItemReg == PreferReg) {
           AllowOverlap = false;
           dumpDisableOverlap(Func, Item, "PrecoloredUnhandled");
@@ -528,7 +616,7 @@
 
     // Remove scratch registers from the Free[] list, and mark their
     // Weights[] as infinite, if KillsRange overlaps Cur's live range.
-    const bool UseTrimmed = true;
+    constexpr bool UseTrimmed = true;
     if (Cur->getLiveRange().overlaps(KillsRange, UseTrimmed)) {
       Free.reset(KillsMask);
       for (int i = KillsMask.find_first(); i != -1;
@@ -615,8 +703,11 @@
         // Handled state.
         Handled.push_back(Cur);
         if (Cur->getLiveRange().getWeight().isInf()) {
-          Func->setError("Unable to find a physical register for an "
-                         "infinite-weight live range");
+          if (Kind == RAK_Phi)
+            addSpillFill(Cur, RegMask);
+          else
+            Func->setError("Unable to find a physical register for an "
+                           "infinite-weight live range");
         }
       } else {
         // Evict all live ranges in Active that register number
diff --git a/src/IceRegAlloc.h b/src/IceRegAlloc.h
index 89ae715..207b276 100644
--- a/src/IceRegAlloc.h
+++ b/src/IceRegAlloc.h
@@ -39,6 +39,9 @@
 
   void initForGlobal();
   void initForInfOnly();
+  /// Free up a register for infinite-weight Cur by spilling and reloading some
+  /// register that isn't used during Cur's live range.
+  void addSpillFill(Variable *Cur, llvm::SmallBitVector RegMask);
   /// Move an item from the From set to the To set.  From[Index] is
   /// pushed onto the end of To[], then the item is efficiently removed
   /// from From[] by effectively swapping it with the last item in
@@ -58,12 +61,9 @@
   OrderedRanges UnhandledPrecolored;
   UnorderedRanges Active, Inactive, Handled;
   std::vector<InstNumberT> Kills;
+  RegAllocKind Kind = RAK_Unknown;
   bool FindPreference = false;
   bool FindOverlap = false;
-  // TODO(stichnot): We're not really using FindOverlap yet, but we
-  // may want a flavor of register allocation where FindPreference is
-  // useful but we didn't want to initialize VMetadata with VMK_All
-  // and therefore we can't safely allow overlap.
 };
 
 } // end of namespace Ice
diff --git a/src/IceTargetLowering.cpp b/src/IceTargetLowering.cpp
index 671bab0..417ad2f 100644
--- a/src/IceTargetLowering.cpp
+++ b/src/IceTargetLowering.cpp
@@ -211,6 +211,20 @@
   Context.advanceNext();
 }
 
+void TargetLowering::lowerInst(CfgNode *Node, InstList::iterator Next,
+                               InstHighLevel *Instr) {
+  // TODO(stichnot): Consider modifying the design/implementation to avoid
+  // multiple init() calls when using lowerInst() to lower several instructions
+  // in the same node.
+  Context.init(Node);
+  Context.setNext(Next);
+  Context.insert(Instr);
+  --Next;
+  assert(&*Next == Instr);
+  Context.setCur(Next);
+  lower();
+}
+
 void TargetLowering::lowerOther(const Inst *Instr) {
   (void)Instr;
   Func->setError("Can't lower unsupported instruction type");
diff --git a/src/IceTargetLowering.h b/src/IceTargetLowering.h
index e87bf93..018a555 100644
--- a/src/IceTargetLowering.h
+++ b/src/IceTargetLowering.h
@@ -63,6 +63,8 @@
   Inst *getLastInserted() const;
   void advanceCur() { Cur = Next; }
   void advanceNext() { advanceForward(Next); }
+  void setCur(InstList::iterator C) { Cur = C; }
+  void setNext(InstList::iterator N) { Next = N; }
   void rewind();
   void setInsertPoint(const InstList::iterator &Position) { Next = Position; }
 
@@ -136,16 +138,15 @@
   void doNopInsertion();
   /// Lowers a single non-Phi instruction.
   void lower();
+  /// Inserts and lowers a single high-level instruction at a specific insertion
+  /// point.
+  void lowerInst(CfgNode *Node, InstList::iterator Next, InstHighLevel *Instr);
   /// Does preliminary lowering of the set of Phi instructions in the
   /// current node.  The main intention is to do what's needed to keep
   /// the unlowered Phi instructions consistent with the lowered
   /// non-Phi instructions, e.g. to lower 64-bit operands on a 32-bit
   /// target.
   virtual void prelowerPhis() {}
-  /// Lowers a list of "parallel" assignment instructions representing
-  /// a topological sort of the Phi instructions.
-  virtual void lowerPhiAssignments(CfgNode *Node,
-                                   const AssignList &Assignments) = 0;
   /// Tries to do branch optimization on a single instruction.  Returns
   /// true if some optimization was done.
   virtual bool doBranchOpt(Inst * /*I*/, const CfgNode * /*NextNode*/) {
diff --git a/src/IceTargetLoweringARM32.cpp b/src/IceTargetLoweringARM32.cpp
index faf1aa0..0925467 100644
--- a/src/IceTargetLoweringARM32.cpp
+++ b/src/IceTargetLoweringARM32.cpp
@@ -1470,26 +1470,22 @@
     _mov(T_Hi, Src0Hi);
     _mov(DestHi, T_Hi);
   } else {
-    Operand *SrcR;
+    Operand *NewSrc;
     if (Dest->hasReg()) {
-      // If Dest already has a physical register, then legalize the
-      // Src operand into a Variable with the same register
-      // assignment.  This is mostly a workaround for advanced phi
-      // lowering's ad-hoc register allocation which assumes no
-      // register allocation is needed when at least one of the
-      // operands is non-memory.
-      // TODO(jvoung): check this for ARM.
-      SrcR = legalize(Src0, Legal_Reg, Dest->getRegNum());
+      // If Dest already has a physical register, then legalize the Src operand
+      // into a Variable with the same register assignment.  In particular,
+      // this allows the use of Flex operands.
+      NewSrc = legalize(Src0, Legal_Reg | Legal_Flex, Dest->getRegNum());
     } else {
       // Dest could be a stack operand. Since we could potentially need
       // to do a Store (and store can only have Register operands),
       // legalize this to a register.
-      SrcR = legalize(Src0, Legal_Reg);
+      NewSrc = legalize(Src0, Legal_Reg);
     }
     if (isVectorType(Dest->getType())) {
       UnimplementedError(Func->getContext()->getFlags());
     } else {
-      _mov(Dest, SrcR);
+      _mov(Dest, NewSrc);
     }
   }
 }
@@ -2415,15 +2411,6 @@
   PhiLowering::prelowerPhis32Bit<TargetARM32>(this, Context.getNode(), Func);
 }
 
-// Lower the pre-ordered list of assignments into mov instructions.
-// Also has to do some ad-hoc register allocation as necessary.
-void TargetARM32::lowerPhiAssignments(CfgNode *Node,
-                                      const AssignList &Assignments) {
-  (void)Node;
-  (void)Assignments;
-  UnimplementedError(Func->getContext()->getFlags());
-}
-
 Variable *TargetARM32::makeVectorOfZeros(Type Ty, int32_t RegNum) {
   Variable *Reg = makeReg(Ty, RegNum);
   UnimplementedError(Func->getContext()->getFlags());
diff --git a/src/IceTargetLoweringARM32.h b/src/IceTargetLoweringARM32.h
index 198563c..9727bd9 100644
--- a/src/IceTargetLoweringARM32.h
+++ b/src/IceTargetLoweringARM32.h
@@ -132,8 +132,6 @@
   void lowerSwitch(const InstSwitch *Inst) override;
   void lowerUnreachable(const InstUnreachable *Inst) override;
   void prelowerPhis() override;
-  void lowerPhiAssignments(CfgNode *Node,
-                           const AssignList &Assignments) override;
   void doAddressOptLoad() override;
   void doAddressOptStore() override;
   void randomlyInsertNop(float Probability) override;
diff --git a/src/IceTargetLoweringMIPS32.cpp b/src/IceTargetLoweringMIPS32.cpp
index add97b3..99c9928 100644
--- a/src/IceTargetLoweringMIPS32.cpp
+++ b/src/IceTargetLoweringMIPS32.cpp
@@ -648,15 +648,6 @@
   UnimplementedError(Func->getContext()->getFlags());
 }
 
-// Lower the pre-ordered list of assignments into mov instructions.
-// Also has to do some ad-hoc register allocation as necessary.
-void TargetMIPS32::lowerPhiAssignments(CfgNode *Node,
-                                       const AssignList &Assignments) {
-  (void)Node;
-  (void)Assignments;
-  UnimplementedError(Func->getContext()->getFlags());
-}
-
 void TargetMIPS32::postLower() {
   if (Ctx->getFlags().getOptLevel() == Opt_m1)
     return;
diff --git a/src/IceTargetLoweringMIPS32.h b/src/IceTargetLoweringMIPS32.h
index e63d738..5ec9c0b 100644
--- a/src/IceTargetLoweringMIPS32.h
+++ b/src/IceTargetLoweringMIPS32.h
@@ -112,8 +112,6 @@
   void lowerSwitch(const InstSwitch *Inst) override;
   void lowerUnreachable(const InstUnreachable *Inst) override;
   void prelowerPhis() override;
-  void lowerPhiAssignments(CfgNode *Node,
-                           const AssignList &Assignments) override;
   void doAddressOptLoad() override;
   void doAddressOptStore() override;
   void randomlyInsertNop(float Probability) override;
diff --git a/src/IceTargetLoweringX86Base.h b/src/IceTargetLoweringX86Base.h
index 58de721..84a1e19 100644
--- a/src/IceTargetLoweringX86Base.h
+++ b/src/IceTargetLoweringX86Base.h
@@ -144,8 +144,6 @@
   void lowerOther(const Inst *Instr) override;
   void lowerRMW(const typename Traits::Insts::FakeRMW *RMW);
   void prelowerPhis() override;
-  void lowerPhiAssignments(CfgNode *Node,
-                           const AssignList &Assignments) override;
   void doAddressOptLoad() override;
   void doAddressOptStore() override;
   void randomlyInsertNop(float Probability) override;
diff --git a/src/IceTargetLoweringX86BaseImpl.h b/src/IceTargetLoweringX86BaseImpl.h
index d9cc5e4..046d88f 100644
--- a/src/IceTargetLoweringX86BaseImpl.h
+++ b/src/IceTargetLoweringX86BaseImpl.h
@@ -387,13 +387,7 @@
   Func->dump("After linear scan regalloc");
 
   if (Ctx->getFlags().getPhiEdgeSplit()) {
-    // We need to pause constant blinding or pooling during advanced phi
-    // lowering, unless the lowering assignment has a physical register for the
-    // dest Variable.
-    {
-      BoolFlagSaver B(RandomizationPoolingPaused, true);
-      Func->advancedPhiLowering();
-    }
+    Func->advancedPhiLowering();
     Func->dump("After advanced Phi lowering");
   }
 
@@ -2064,32 +2058,21 @@
     _mov(T_Hi, Src0Hi);
     _mov(DestHi, T_Hi);
   } else {
-    Operand *RI;
+    Operand *Src0Legal;
     if (Dest->hasReg()) {
-      // If Dest already has a physical register, then legalize the
-      // Src operand into a Variable with the same register
-      // assignment.  This is mostly a workaround for advanced phi
-      // lowering's ad-hoc register allocation which assumes no
-      // register allocation is needed when at least one of the
-      // operands is non-memory.
-
-      // If we have a physical register for the dest variable, we can
-      // enable our constant blinding or pooling again. Note this is
-      // only for advancedPhiLowering(), the flag flip should leave
-      // no other side effect.
-      {
-        BoolFlagSaver B(RandomizationPoolingPaused, false);
-        RI = legalize(Src0, Legal_Reg, Dest->getRegNum());
-      }
+      // If Dest already has a physical register, then only basic legalization
+      // is needed, as the source operand can be a register, immediate, or
+      // memory.
+      Src0Legal = legalize(Src0);
     } else {
-      // If Dest could be a stack operand, then RI must be a physical
-      // register or a scalar integer immediate.
+      // If Dest could be a stack operand, then Src0Legal must be a physical
+      // register or a scalar integer immediate.
-      RI = legalize(Src0, Legal_Reg | Legal_Imm);
+      Src0Legal = legalize(Src0, Legal_Reg | Legal_Imm);
     }
     if (isVectorType(Dest->getType()))
-      _movp(Dest, RI);
+      _movp(Dest, Src0Legal);
     else
-      _mov(Dest, RI);
+      _mov(Dest, Src0Legal);
   }
 }
 
@@ -4938,164 +4921,6 @@
       this, Context.getNode(), Func);
 }
 
-inline bool isMemoryOperand(const Operand *Opnd) {
-  if (const auto Var = llvm::dyn_cast<Variable>(Opnd))
-    return !Var->hasReg();
-  // We treat vector undef values the same as a memory operand,
-  // because they do in fact need a register to materialize the vector
-  // of zeroes into.
-  if (llvm::isa<ConstantUndef>(Opnd))
-    return isScalarFloatingType(Opnd->getType()) ||
-           isVectorType(Opnd->getType());
-  if (llvm::isa<Constant>(Opnd))
-    return isScalarFloatingType(Opnd->getType());
-  return true;
-}
-
-/// Lower the pre-ordered list of assignments into mov instructions.
-/// Also has to do some ad-hoc register allocation as necessary.
-template <class Machine>
-void TargetX86Base<Machine>::lowerPhiAssignments(
-    CfgNode *Node, const AssignList &Assignments) {
-  // Check that this is a properly initialized shell of a node.
-  assert(Node->getOutEdges().size() == 1);
-  assert(Node->getInsts().empty());
-  assert(Node->getPhis().empty());
-  CfgNode *Succ = Node->getOutEdges().front();
-  getContext().init(Node);
-  // Register set setup similar to regAlloc().
-  RegSetMask RegInclude = RegSet_All;
-  RegSetMask RegExclude = RegSet_StackPointer;
-  if (hasFramePointer())
-    RegExclude |= RegSet_FramePointer;
-  llvm::SmallBitVector Available = getRegisterSet(RegInclude, RegExclude);
-  bool NeedsRegs = false;
-  // Initialize the set of available registers to the set of what is
-  // available (not live) at the beginning of the successor block,
-  // minus all registers used as Dest operands in the Assignments.  To
-  // do this, we start off assuming all registers are available, then
-  // iterate through the Assignments and remove Dest registers.
-  // During this iteration, we also determine whether we will actually
-  // need any extra registers for memory-to-memory copies.  If so, we
-  // do the actual work of removing the live-in registers from the
-  // set.  TODO(stichnot): This work is being repeated for every split
-  // edge to the successor, so consider updating LiveIn just once
-  // after all the edges are split.
-  for (const Inst &I : Assignments) {
-    Variable *Dest = I.getDest();
-    if (Dest->hasReg()) {
-      Available[Dest->getRegNum()] = false;
-    } else if (isMemoryOperand(I.getSrc(0))) {
-      NeedsRegs = true; // Src and Dest are both in memory
-    }
-  }
-  if (NeedsRegs) {
-    LivenessBV &LiveIn = Func->getLiveness()->getLiveIn(Succ);
-    for (int i = LiveIn.find_first(); i != -1; i = LiveIn.find_next(i)) {
-      Variable *Var = Func->getLiveness()->getVariable(i, Succ);
-      if (Var->hasReg())
-        Available[Var->getRegNum()] = false;
-    }
-  }
-  // Iterate backwards through the Assignments.  After lowering each
-  // assignment, add Dest to the set of available registers, and
-  // remove Src from the set of available registers.  Iteration is
-  // done backwards to enable incremental updates of the available
-  // register set, and the lowered instruction numbers may be out of
-  // order, but that can be worked around by renumbering the block
-  // afterwards if necessary.
-  for (const Inst &I : reverse_range(Assignments)) {
-    Context.rewind();
-    auto Assign = llvm::dyn_cast<InstAssign>(&I);
-    Variable *Dest = Assign->getDest();
-
-    // If the source operand is ConstantUndef, do not legalize it.  In function
-    // test_split_undef_int_vec, the advanced phi lowering process will find an
-    // assignment of undefined vector. This vector, as the Src here, will crash
-    // if it go through legalize(). legalize() will create a new variable with
-    // makeVectorOfZeros(), but this new variable will be assigned a stack
-    // slot. This will fail with pxor(Var, Var) because it is an illegal
-    // instruction form. Note this failure is irrelevant to randomization or
-    // pooling of constants.  So, we do not call legalize() to add pool label
-    // for the src operands of phi assignment instructions.  Instead, we
-    // manually add pool label for constant float and constant double values
-    // here.  Note going through legalize() does not affect the testing results
-    // of SPEC2K and xtests.
-    Operand *Src = Assign->getSrc(0);
-    if (!llvm::isa<ConstantUndef>(Assign->getSrc(0))) {
-      Src = legalize(Src);
-    }
-
-    Variable *SrcVar = llvm::dyn_cast<Variable>(Src);
-    // Use normal assignment lowering, except lower mem=mem specially
-    // so we can register-allocate at the same time.
-    if (!isMemoryOperand(Dest) || !isMemoryOperand(Src)) {
-      lowerAssign(Assign);
-    } else {
-      assert(Dest->getType() == Src->getType());
-      const llvm::SmallBitVector &RegsForType =
-          getRegisterSetForType(Dest->getType());
-      llvm::SmallBitVector AvailRegsForType = RegsForType & Available;
-      Variable *SpillLoc = nullptr;
-      Variable *Preg = nullptr;
-      // TODO(stichnot): Opportunity for register randomization.
-      int32_t RegNum = AvailRegsForType.find_first();
-      bool IsVector = isVectorType(Dest->getType());
-      bool NeedSpill = (RegNum == -1);
-      if (NeedSpill) {
-        // Pick some register to spill and update RegNum.
-        // TODO(stichnot): Opportunity for register randomization.
-        RegNum = RegsForType.find_first();
-        Preg = getPhysicalRegister(RegNum, Dest->getType());
-        SpillLoc = Func->makeVariable(Dest->getType());
-        // Create a fake def of the physical register to avoid
-        // liveness inconsistency problems during late-stage liveness
-        // analysis (e.g. asm-verbose mode).
-        Context.insert(InstFakeDef::create(Func, Preg));
-        if (IsVector)
-          _movp(SpillLoc, Preg);
-        else
-          _mov(SpillLoc, Preg);
-      }
-      assert(RegNum >= 0);
-      if (llvm::isa<ConstantUndef>(Src))
-        // Materialize an actual constant instead of undef.  RegNum is
-        // passed in for vector types because undef vectors are
-        // lowered to vector register of zeroes.
-        Src =
-            legalize(Src, Legal_All, IsVector ? RegNum : Variable::NoRegister);
-      Variable *Tmp = makeReg(Dest->getType(), RegNum);
-      if (IsVector) {
-        _movp(Tmp, Src);
-        _movp(Dest, Tmp);
-      } else {
-        _mov(Tmp, Src);
-        _mov(Dest, Tmp);
-      }
-      if (NeedSpill) {
-        // Restore the spilled register.
-        if (IsVector)
-          _movp(Preg, SpillLoc);
-        else
-          _mov(Preg, SpillLoc);
-        // Create a fake use of the physical register to keep it live
-        // for late-stage liveness analysis (e.g. asm-verbose mode).
-        Context.insert(InstFakeUse::create(Func, Preg));
-      }
-    }
-    // Update register availability before moving to the previous
-    // instruction on the Assignments list.
-    if (Dest->hasReg())
-      Available[Dest->getRegNum()] = true;
-    if (SrcVar && SrcVar->hasReg())
-      Available[SrcVar->getRegNum()] = false;
-  }
-
-  // Add the terminator branch instruction to the end.
-  Context.setInsertPoint(Context.getEnd());
-  _br(Succ);
-}
-
 // There is no support for loading or emitting vector constants, so the
 // vector values returned from makeVectorOfZeros, makeVectorOfOnes,
 // etc. are initialized with register operations.
diff --git a/src/IceTimerTree.def b/src/IceTimerTree.def
index 8c955c7..663cc39 100644
--- a/src/IceTimerTree.def
+++ b/src/IceTimerTree.def
@@ -36,6 +36,7 @@
   X(liveness)                  \
   X(livenessLightweight)       \
   X(llvmConvert)               \
+  X(lowerPhiAssignments)       \
   X(parse)                     \
   X(parseConstants)            \
   X(parseFunctions)            \