[AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case of a single basic block loop.

mergeInputScoreBrackets() performs this propagation; update it so that it also processes the single bb's own score bracket when processing that bb's preds. The block is, after all, a pred of itself, so its score bracket is needed.
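
The idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in the waitcnt pass: ToyBracket, ToyCFG, mergeToyBrackets and the IncludeSelf switch are all hypothetical names, and a single integer stands in for the real score bracket state. It shows why a self-edge has to contribute to the merged entry state of a single basic block loop.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for a block's score bracket: the number of memory
// operations still outstanding when control leaves the block.
struct ToyBracket {
  int Pending;
};

// Hypothetical CFG representation: block id -> ids of its predecessors.
using ToyCFG = std::map<int, std::vector<int>>;

// Merge the exit brackets of Block's predecessors into one entry state,
// taking the conservative maximum. With IncludeSelf == false a self-edge is
// ignored; this flag exists only to contrast the two behaviours in the toy.
int mergeToyBrackets(const ToyCFG &Preds,
                     const std::map<int, ToyBracket> &ExitState, int Block,
                     bool IncludeSelf) {
  auto It = Preds.find(Block);
  assert(It != Preds.end() && "block must be present in the CFG");

  int Merged = 0;
  for (int Pred : It->second) {
    if (Pred == Block && !IncludeSelf)
      continue; // the back edge of a single-block loop is dropped
    auto State = ExitState.find(Pred);
    if (State != ExitState.end())
      Merged = std::max(Merged, State->second.Pending);
  }
  return Merged;
}
```

With a CFG shaped like the test below (bb.0 -> bb.1, bb.1 -> bb.1), skipping the self-edge makes bb.1 appear to start with nothing outstanding, while including it carries the operations still pending at the end of the previous iteration into the next one.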

Differential Revision: https://reviews.llvm.org/D44434

llvm-svn: 327583
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-loop-single-basic-block.mir b/llvm/test/CodeGen/AMDGPU/waitcnt-loop-single-basic-block.mir
new file mode 100644
index 0000000..1067d47
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-loop-single-basic-block.mir
@@ -0,0 +1,26 @@
+# RUN: llc -o - %s -march=amdgcn -run-pass=si-insert-waitcnts -verify-machineinstrs | FileCheck -check-prefix=GCN %s
+
+# Check that the waitcnt pass propagates info in the case of a single basic block loop
+
+# GCN-LABEL: waitcnt-loop-single-basic-block
+# GCN: bb.0
+# GCN: S_WAITCNT 3952
+# GCN-NEXT: GLOBAL_STORE_DWORD
+# GCN: S_WAITCNT 3953
+# GCN-NEXT: GLOBAL_STORE_DWORD
+
+...
+name: waitcnt-loop-single-basic-block
+body: |
+  bb.0:
+    S_BRANCH %bb.1
+  bb.1:
+    GLOBAL_STORE_DWORD $vgpr7_vgpr8, $vgpr11, 0, 0, 0, implicit $exec
+    $vgpr21 = GLOBAL_LOAD_DWORD $vgpr4_vgpr5, 0, 0, 0, implicit $exec
+    $vgpr10 = GLOBAL_LOAD_DWORD $vgpr10_vgpr11, 0, 0, 0, implicit $exec
+    GLOBAL_STORE_DWORD $vgpr14_vgpr15, $vgpr21, 0, 0, 0, implicit $exec
+    $vgpr11 = GLOBAL_LOAD_DWORD $vgpr11_vgpr12, 0, 0, 0, implicit $exec
+    S_CBRANCH_SCC1 %bb.1, implicit $scc
+  bb.2:
+    S_ENDPGM
+...