Change Thumb2 jumptable codegen to use two-level jumps:
Before:
adr r12, #LJTI3_0_0
ldr pc, [r12, +r0, lsl #2]
LJTI3_0_0:
.long LBB3_24
.long LBB3_30
.long LBB3_31
.long LBB3_32
After:
adr r12, #LJTI3_0_0
add pc, r12, +r0, lsl #2
LJTI3_0_0:
b.w LBB3_24
b.w LBB3_30
b.w LBB3_31
b.w LBB3_32
This has several advantages.
1. This will make it easier to optimize this to a TBB / TBH instruction +
(smaller) table.
2. This eliminates the need for an ugly asm printer hack to force the table
entries to be Thumb addresses (bit 0 set to one).
3. Same codegen for PIC and non-PIC.
4. This eliminates the need to align the table, so the constant pool island
pass won't have to over-estimate the size.
Based on my calculation, the latter is probably slightly faster as well, since
ldr pc with a shifted register offset is very slow. That is, it should be a win
as long as the HW implementation can do a reasonable job of predicting the
second branch.
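For illustration only, the address arithmetic behind the two schemes can be sketched as below. This is a rough model, not code from this patch: the names kEntryBytes, branchEntryAddress and thumbAddress are hypothetical and do not exist in LLVM. It assumes only that each b.w entry is a 32-bit instruction, so the lsl #2 scaling of the index matches the entry size.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only; these names are hypothetical and are not LLVM APIs.

// Each "b.w" entry in the new table is a 32-bit Thumb2 instruction.
constexpr std::uint32_t kEntryBytes = 4;

// New scheme: "add pc, r12, r0, lsl #2" lands on the index-th b.w entry,
// which then branches to the real destination block.
constexpr std::uint32_t branchEntryAddress(std::uint32_t tableBase,
                                           std::uint32_t index) {
  return tableBase + (index << 2);
}

// Old scheme: each ".long" entry was loaded straight into pc, so it had to
// hold a Thumb address, i.e. the block address with bit 0 set.
constexpr std::uint32_t thumbAddress(std::uint32_t blockAddress) {
  return blockAddress | 1u;
}

static_assert(kEntryBytes == (1u << 2),
              "lsl #2 scales the index by the entry size");
static_assert(branchEntryAddress(0x1000, 3) == 0x100C,
              "the fourth entry starts 12 bytes into the table");
static_assert(thumbAddress(0x2000) == 0x2001,
              "bit 0 marks a Thumb target");
```

In this model the new table needs no per-entry address fixup: the entries are ordinary branch instructions, which is why the bit 0 hack and the PIC / non-PIC distinction go away.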
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77024 91177308-0d34-0410-b5e6-96231b3b80d8
diff --git a/lib/Target/ARM/Thumb2InstrInfo.cpp b/lib/Target/ARM/Thumb2InstrInfo.cpp
index f1ac221..4d442c0 100644
--- a/lib/Target/ARM/Thumb2InstrInfo.cpp
+++ b/lib/Target/ARM/Thumb2InstrInfo.cpp
@@ -38,9 +38,6 @@
case ARMII::ADDrr: return ARM::t2ADDrr;
case ARMII::B: return ARM::t2B;
case ARMII::Bcc: return ARM::t2Bcc;
- case ARMII::BR_JTr: return ARM::t2BR_JTr;
- case ARMII::BR_JTm: return ARM::t2BR_JTm;
- case ARMII::BR_JTadd: return ARM::t2BR_JTadd;
case ARMII::BX_RET: return ARM::tBX_RET;
case ARMII::LDRrr: return ARM::t2LDRs;
case ARMII::LDRri: return ARM::t2LDRi12;
@@ -64,9 +61,7 @@
switch (MBB.back().getOpcode()) {
case ARM::t2LDM_RET:
case ARM::t2B: // Uncond branch.
- case ARM::t2BR_JTr: // Jumptable branch.
- case ARM::t2BR_JTm: // Jumptable branch through mem.
- case ARM::t2BR_JTadd: // Jumptable branch add to pc.
+ case ARM::t2BR_JT: // Jumptable branch.
case ARM::tBR_JTr: // Jumptable branch (16-bit version).
case ARM::tBX_RET:
case ARM::tBX_RET_vararg: