[NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass

Summary:
The offset of a frame index is calculated by NVPTXPrologEpilogPass. Before
that pass runs, the correct offset of stack objects cannot be obtained,
which leads to wrong offsets if there are more than 2 frame objects. This
patch moves NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame
index has already been replaced by %VRFrame in NVPTXPrologEpilogPass, we
check for the VRFrame register instead, and try to remove VRFrame if it
has no remaining uses after the NVPTXPeephole pass.
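
The ordering problem can be illustrated with a toy model. This is only a
sketch and not the code in this patch: FrameObject and layoutFrame are
hypothetical names, not LLVM types or APIs, and it assumes two 4-byte,
4-byte-aligned stack objects. Until a layout step of this kind has run,
every object still reports offset 0, so a pass that reads offsets earlier
cannot tell the objects apart.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a stack object; not an LLVM type.
struct FrameObject {
  uint64_t Size;   // Size in bytes.
  uint64_t Align;  // Required alignment in bytes, must be nonzero.
  int64_t Offset;  // Stays at its placeholder value (0) until layout runs.
};

// Assigns increasing, aligned offsets to each object, roughly what a
// prolog/epilog style layout step does. Returns the total frame size.
uint64_t layoutFrame(std::vector<FrameObject> &Objects) {
  uint64_t Offset = 0;
  for (FrameObject &Obj : Objects) {
    assert(Obj.Align != 0 && "alignment must be nonzero");
    // Round the running offset up to the object's alignment.
    Offset = (Offset + Obj.Align - 1) / Obj.Align * Obj.Align;
    Obj.Offset = static_cast<int64_t>(Offset);
    Offset += Obj.Size;
  }
  return Offset;
}
```

With the two objects assumed above, both offsets read as 0 before
layoutFrame is called; afterwards they are 0 and 4, which matches the
offsets the strengthened test checks against SP and SPL.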

Patch by Xuetian Weng.

Test Plan:
Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the
offset calculation based on SP and SPL.

Reviewers: jholewinski, jingyue

Reviewed By: jingyue

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10853

llvm-svn: 241185
diff --git a/llvm/test/CodeGen/NVPTX/local-stack-frame.ll b/llvm/test/CodeGen/NVPTX/local-stack-frame.ll
index fba5dd8..ef1b7da 100644
--- a/llvm/test/CodeGen/NVPTX/local-stack-frame.ll
+++ b/llvm/test/CodeGen/NVPTX/local-stack-frame.ll
@@ -59,10 +59,16 @@
 
 ; PTX32:        cvta.local.u32   %SP, %SPL;
 ; PTX32:        add.u32          {{%r[0-9]+}}, %SP, 0;
+; PTX32:        add.u32          {{%r[0-9]+}}, %SPL, 0;
+; PTX32:        add.u32          {{%r[0-9]+}}, %SP, 4;
+; PTX32:        add.u32          {{%r[0-9]+}}, %SPL, 4;
 ; PTX32:        st.local.u32     [{{%r[0-9]+}}], {{%r[0-9]+}}
 ; PTX32:        st.local.u32     [{{%r[0-9]+}}], {{%r[0-9]+}}
 ; PTX64:        cvta.local.u64   %SP, %SPL;
 ; PTX64:        add.u64          {{%rd[0-9]+}}, %SP, 0;
+; PTX64:        add.u64          {{%rd[0-9]+}}, %SPL, 0;
+; PTX64:        add.u64          {{%rd[0-9]+}}, %SP, 4;
+; PTX64:        add.u64          {{%rd[0-9]+}}, %SPL, 4;
 ; PTX64:        st.local.u32     [{{%rd[0-9]+}}], {{%r[0-9]+}}
 ; PTX64:        st.local.u32     [{{%rd[0-9]+}}], {{%r[0-9]+}}
 define void @foo4() {