AMDGPU: Match load d16 hi instructions

Also starts selecting global loads for constant addresses
in some cases. Some still end up selecting to mubuf, which
requires investigation.

We still get sub-optimal regalloc and extra waitcnts inserted
because the liveness of the separate 16-bit register halves is
not really tracked.

llvm-svn: 313716
diff --git a/llvm/test/CodeGen/AMDGPU/fabs.f16.ll b/llvm/test/CodeGen/AMDGPU/fabs.f16.ll
index 9da2479..4429cfa 100644
--- a/llvm/test/CodeGen/AMDGPU/fabs.f16.ll
+++ b/llvm/test/CodeGen/AMDGPU/fabs.f16.ll
@@ -7,7 +7,7 @@
 ; unless isFabsFree returns true
 
 ; GCN-LABEL: {{^}}s_fabs_free_f16:
-; GCN: flat_load_ushort [[VAL:v[0-9]+]],
+; GCN: {{flat|global}}_load_ushort [[VAL:v[0-9]+]],
 ; GCN: v_and_b32_e32 [[RESULT:v[0-9]+]], 0x7fff, [[VAL]]
 ; GCN: {{flat|global}}_store_short v{{\[[0-9]+:[0-9]+\]}}, [[RESULT]]
 
@@ -75,8 +75,8 @@
 }
 
 ; GCN-LABEL: {{^}}fabs_fold_f16:
-; GCN: flat_load_ushort [[IN0:v[0-9]+]]
-; GCN: flat_load_ushort [[IN1:v[0-9]+]]
+; GCN: {{flat|global}}_load_ushort [[IN0:v[0-9]+]]
+; GCN: {{flat|global}}_load_ushort [[IN1:v[0-9]+]]
 
 ; CI-DAG: v_cvt_f32_f16_e32 [[CVT0:v[0-9]+]], [[IN0]]
 ; CI-DAG: v_cvt_f32_f16_e64 [[ABS_CVT1:v[0-9]+]], |[[IN1]]|