[HotColdSplitting] Identify larger cold regions using domtree queries

The current splitting algorithm works in three stages:

  1) Identify cold blocks, then
  2) Use forward/backward propagation to mark hot blocks, then
  3) Grow a SESE region of blocks *outside* of the set of hot blocks and
  start outlining.

While testing this pass on Apple internal frameworks I noticed that some
kinds of control flow (e.g. loops) are never outlined, even though they
unconditionally lead to / follow cold blocks. I noticed two other issues
related to how cold regions are identified:

  - An inconsistency can arise in the internal state of the hotness
  propagation stage, as a block may end up in both the ColdBlocks set
  and the HotBlocks set. Further inconsistencies can arise as these sets
  do not match what's in ProfileSummaryInfo.

  - It isn't necessary to limit outlining to single-exit regions.

This patch teaches the splitting algorithm to identify maximal cold
regions and outline them. A maximal cold region is defined as the set of
blocks post-dominated by a cold sink block, or dominated by that sink
block. This approach can successfully outline loops in the cold path. As
a side benefit, it maintains less internal state than the current
approach.

Due to a limitation in CodeExtractor, blocks within the maximal cold
region which aren't dominated by a single entry point (a so-called "max
ancestor") are filtered out.

Results:
  - X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to
  47KB pre-patch, or a ~3x improvement. Did not see a performance impact
  across two runs.
  - AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB
  of TEXT were outlined. Ditto re: performance impact.
  - Outlining results improve marginally in the internal frameworks I
  tested.

Follow-ups:
  - Outline more than once per function, outline large single basic
  blocks, & try to remove unconditional branches in outlined functions.

Differential Revision: https://reviews.llvm.org/D53627

llvm-svn: 345209
diff --git a/llvm/test/Transforms/HotColdSplit/multiple-exits.ll b/llvm/test/Transforms/HotColdSplit/multiple-exits.ll
new file mode 100644
index 0000000..2e7cf84
--- /dev/null
+++ b/llvm/test/Transforms/HotColdSplit/multiple-exits.ll
@@ -0,0 +1,73 @@
+; RUN: opt -S -hotcoldsplit < %s | FileCheck %s
+
+; Source:
+;
+; extern void sideeffect(int);
+; extern void __attribute__((cold)) sink();
+; void foo(int cond) {
+;   if (cond) { //< Start outlining here.
+;     sink();
+;     if (cond > 10)
+;       goto exit1;
+;     else
+;       goto exit2;
+;   }
+; exit1:
+;   sideeffect(1);
+;   return;
+; exit2:
+;   sideeffect(2);
+;   return;
+; }
+
+target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-apple-macosx10.14.0"
+
+; CHECK-LABEL: define {{.*}}@foo(
+; CHECK: br i1 {{.*}}, label %exit1, label %codeRepl
+; CHECK-LABEL: codeRepl:
+; CHECK: [[targetBlock:%.*]] = call i1 @foo.cold.1(
+; CHECK-NEXT: br i1 [[targetBlock]], label %exit1, label %[[return:.*]]
+; CHECK-LABEL: exit1:
+; CHECK: call {{.*}}@sideeffect(i32 1)
+; CHECK: [[return]]:
+; CHECK-NEXT: ret void
+define void @foo(i32 %cond) {
+entry:
+  %tobool = icmp eq i32 %cond, 0
+  br i1 %tobool, label %exit1, label %if.then
+
+if.then:                                          ; preds = %entry
+  tail call void (...) @sink()
+  %cmp = icmp sgt i32 %cond, 10
+  br i1 %cmp, label %exit1, label %exit2
+
+exit1:                                            ; preds = %entry, %if.then
+  call void @sideeffect(i32 1)
+  br label %return
+
+exit2:                                            ; preds = %if.then
+  call void @sideeffect(i32 2)
+  br label %return
+
+return:                                           ; preds = %exit2, %exit1
+  ret void
+}
+
+; CHECK-LABEL: define {{.*}}@foo.cold.1(
+; TODO: Eliminate this unnecessary unconditional branch.
+; CHECK: br
+; CHECK: [[exit1Stub:.*]]:
+; CHECK-NEXT: ret i1 true
+; CHECK: [[returnStub:.*]]:
+; CHECK-NEXT: ret i1 false
+; CHECK: call {{.*}}@sink
+; CHECK-NEXT: [[cmp:%.*]] = icmp
+; CHECK-NEXT: br i1 [[cmp]], label %[[exit1Stub]], label %exit2
+; CHECK-LABEL: exit2:
+; CHECK-NEXT: call {{.*}}@sideeffect(i32 2)
+; CHECK-NEXT: br label %[[returnStub]]
+
+declare void @sink(...) cold
+
+declare void @sideeffect(i32)