[XRay] Align entry and return sleds to 2 byte boundaries

This should ensure that we can atomically write two bytes (on top of the
retq and the one past it) and have those two bytes not straddle cache
lines.

We also move the label past the alignment instruction so that we can refer
to the actual first instruction, as opposed to potential padding before the
aligned instruction.

Update the tests to allow us to reflect the new order of assembly.

Reviewers: rSerge, echristo, majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23101

llvm-svn: 277701
diff --git a/llvm/lib/Target/X86/X86MCInstLower.cpp b/llvm/lib/Target/X86/X86MCInstLower.cpp
index 18f9b3e..58033ee 100644
--- a/llvm/lib/Target/X86/X86MCInstLower.cpp
+++ b/llvm/lib/Target/X86/X86MCInstLower.cpp
@@ -1038,8 +1038,8 @@
                                                   X86MCInstLower &MCIL) {
   // We want to emit the following pattern:
   //
+  //   .p2align 1, ...
   // .Lxray_sled_N:
-  //   .palign 2, ...
   //   jmp .tmpN
   //   # 9 bytes worth of noops
   // .tmpN
@@ -1051,8 +1051,8 @@
   //   call <relative offset, 32-bits>   // 5 bytes
   //
   auto CurSled = OutContext.createTempSymbol("xray_sled_", true);
+  OutStreamer->EmitCodeAlignment(2);
   OutStreamer->EmitLabel(CurSled);
-  OutStreamer->EmitCodeAlignment(4);
   auto Target = OutContext.createTempSymbol();
 
   // Use a two-byte `jmp`. This version of JMP takes an 8-bit relative offset as
@@ -1074,12 +1074,14 @@
   //
   // We should emit the RET followed by sleds.
   //
+  //   .p2align 1, ...
   // .Lxray_sled_N:
   //   ret  # or equivalent instruction
   //   # 10 bytes worth of noops
   //
   // This just makes sure that the alignment for the next instruction is 2.
   auto CurSled = OutContext.createTempSymbol("xray_sled_", true);
+  OutStreamer->EmitCodeAlignment(2);
   OutStreamer->EmitLabel(CurSled);
   unsigned OpCode = MI.getOperand(0).getImm();
   MCInst Ret;