[ARM] Make -mcpu=generic schedule for an in-order core (Cortex-A8).

The benchmarking summarized in
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113525.html showed
this is beneficial for a wide range of cores.

As is to be expected, quite a few small adaptations are needed to the
regressions tests, as the difference in scheduling results in:
- Quite a few small instruction schedule differences.
- A few changes in register allocation decisions caused by different
 instruction schedules.
- A few changes in IfConversion decisions, due to a difference in
 instruction schedule and/or the estimated cost of a branch mispredict.

llvm-svn: 306514
diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll
index 1434f40..7007018 100644
--- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll
+++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll
@@ -45,15 +45,19 @@
 ; CHECK: @ %while.cond2
 ; CHECK: add
 ; CHECK-NEXT: cmp r{{[0-1]+}}, #1
-; Set the return value.
-; CHECK-NEXT: moveq r0,
-; CHECK-NEXT: popeq
+; Jump to the return block
+; CHECK-NEXT: beq [[RETURN_BLOCK:[.a-zA-Z0-9_]+]]
 ;
 ; Use the back edge to check we get the label of the loop right.
 ; This is to make sure we check the right loop pattern.
 ; CHECK:  @ %while.body24.land.rhs14_crit_edge
 ; CHECK: cmp r{{[0-9]+}}, #192
 ; CHECK-NEXT bhs [[LOOP_HEADER]]
+;
+; CHECK: [[RETURN_BLOCK]]:
+; Set the return value.
+; CHECK-NEXT: mov r0,
+; CHECK-NEXT: pop
 define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnone %lim) {
 entry:
   %cmp = icmp sgt i32 %off, -1