Improve recognition of select-based period induction.

Rationale:
Similar to the previous CL, this helps to eliminate more dead induction.
Now, CaffeineLogic, when compiled with dx (rather than jack) improves
by a 1.5 speedup (9000us -> 6000us).

Note:
We need to run the simplifier before induction analysis to trigger
the select simplification first. Although a bit of a compile-time hit,
it seems a good idea to run a simplifier here again anyway.

Test: test-art-host
Change-Id: I93b91ca40a4d64385c64393028e8d213f0c904a8
diff --git a/compiler/optimizing/optimizing_compiler.cc b/compiler/optimizing/optimizing_compiler.cc
index 6ba0963..532ea43 100644
--- a/compiler/optimizing/optimizing_compiler.cc
+++ b/compiler/optimizing/optimizing_compiler.cc
@@ -743,8 +743,10 @@
   HLoopOptimization* loop = new (arena) HLoopOptimization(graph, induction);
   HSharpening* sharpening = new (arena) HSharpening(graph, codegen, dex_compilation_unit, driver);
   InstructionSimplifier* simplify2 = new (arena) InstructionSimplifier(
-      graph, stats, "instruction_simplifier$after_bce");
+      graph, stats, "instruction_simplifier$after_inlining");
   InstructionSimplifier* simplify3 = new (arena) InstructionSimplifier(
+      graph, stats, "instruction_simplifier$after_bce");
+  InstructionSimplifier* simplify4 = new (arena) InstructionSimplifier(
       graph, stats, "instruction_simplifier$before_codegen");
   IntrinsicsRecognizer* intrinsics = new (arena) IntrinsicsRecognizer(graph, stats);
 
@@ -764,6 +766,7 @@
     // redundant suspend checks to recognize empty blocks.
     select_generator,
     fold2,  // TODO: if we don't inline we can also skip fold2.
+    simplify2,
     side_effects,
     gvn,
     licm,
@@ -771,13 +774,13 @@
     bce,
     loop,
     fold3,  // evaluates code generated by dynamic bce
-    simplify2,
+    simplify3,
     lse,
     dce2,
     // The codegen has a few assumptions that only the instruction simplifier
     // can satisfy. For example, the code generator does not expect to see a
     // HTypeConversion from a type to the same type.
-    simplify3,
+    simplify4,
   };
   RunOptimizations(optimizations2, arraysize(optimizations2), pass_observer);