ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.

The Technical Reference Manuals for these two CPUs state that branching to an unaligned 32-bit instruction incurs an extra pipeline reload penalty. That's bad.

This also enables the optimization at -Os, since it costs on average one byte per loop in return for saving 1 cycle per iteration, which is pretty good going.

llvm-svn: 342127
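
The "one byte per loop" figure in the message can be checked with a little arithmetic. The sketch below is illustrative only and is not part of the commit. It assumes that a loop header, being a Thumb instruction, starts on a 2-byte boundary, and that the two possible offsets within a 4-byte window (0 and 2) are equally likely. The helper names `paddingFor` and `averagePadding` are hypothetical.

```cpp
#include <cstdint>

// Bytes of padding needed to round `addr` up to the next multiple of
// `align`. `align` must be a power of two.
constexpr std::uint32_t paddingFor(std::uint32_t addr, std::uint32_t align) {
  return (align - (addr & (align - 1))) & (align - 1);
}

// Average padding for 4-byte alignment over the two offsets a 2-byte-aligned
// instruction can have (0 and 2), assumed equally likely.
constexpr double averagePadding() {
  return (paddingFor(0, 4) + paddingFor(2, 4)) / 2.0;
}

static_assert(paddingFor(0, 4) == 0, "already aligned: no padding");
static_assert(paddingFor(2, 4) == 2, "halfway through the window: 2 bytes");
static_assert(averagePadding() == 1.0, "one byte per loop on average");
```

Under those assumptions half of all loops need no padding and the other half need 2 bytes, which gives the average of one byte quoted in the message.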

commit: c15d47bb013e975da582c8fd786ba8234d70d75d
author: Tim Northover <tnorthover@apple.com> Thu Sep 13 10:28:05 2018 +0000
committer: Tim Northover <tnorthover@apple.com> Thu Sep 13 10:28:05 2018 +0000
tree: e13262451793600a29c0df26342fc954e1a8a79a
parent: 95ac65bc32180744cbc67d4e82a0f6417fb92aa9
diff --git a/llvm/lib/CodeGen/MachineBlockPlacement.cpp b/llvm/lib/CodeGen/MachineBlockPlacement.cpp
index 21350df6..624d336 100644
--- a/llvm/lib/CodeGen/MachineBlockPlacement.cpp
+++ b/llvm/lib/CodeGen/MachineBlockPlacement.cpp

@@ -2497,7 +2497,8 @@
   // exclusively on the loop info here so that we can align backedges in
   // unnatural CFGs and backedges that were introduced purely because of the
   // loop rotations done during this layout pass.
-  if (F->getFunction().optForSize())
+  if (F->getFunction().optForMinSize() ||
+      (F->getFunction().optForSize() && !TLI->alignLoopsWithOptSize()))
     return;
   BlockChain &FunctionChain = *BlockToChain[&F->front()];
   if (FunctionChain.begin() == FunctionChain.end())
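
The effect of the new condition is easiest to see as a truth table. The following is a stand-alone model of the early-return test in the hunk above, not LLVM code: `skipLoopAlignment` is a hypothetical name, and its three boolean parameters stand in for the results of `optForMinSize()`, `optForSize()`, and `TLI->alignLoopsWithOptSize()`. It assumes the usual Clang mapping of -Oz to the minsize attribute and -Os to optsize.

```cpp
// Returns true when the pass should bail out before aligning loops,
// mirroring the condition introduced by this change.
constexpr bool skipLoopAlignment(bool optForMinSize, bool optForSize,
                                 bool alignLoopsWithOptSize) {
  return optForMinSize || (optForSize && !alignLoopsWithOptSize);
}

// Minimum-size functions never get loop alignment, whatever the target says.
static_assert(skipLoopAlignment(true, true, true), "minsize: skip");
// Size-optimized functions keep the old behaviour unless the target opts in.
static_assert(skipLoopAlignment(false, true, false), "optsize, no opt-in: skip");
// A target that opts in gets aligned loops even when optimizing for size.
static_assert(!skipLoopAlignment(false, true, true), "optsize, opt-in: align");
// Functions not optimized for size are unaffected by the change.
static_assert(!skipLoopAlignment(false, false, false), "no size attrs: align");
```

Before the change, the test was just `optForSize()`, so the third case also returned early. That case is the only one whose outcome differs.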