ARM: Improve codegen for vget_low_* and vget_high_ intrinsics.
These intrinsics use the __builtin_shuffle() function to extract the
low and high half, respectively, of a 128-bit NEON vector. Currently,
they're defined to use bitcasts to simplify the emitter, so we get code
like:
uint16x4_t vget_low_u32(uint16x8_t __a) {
return (uint32x2_t) __builtin_shufflevector((int64x2_t) __a,
(int64x2_t) __a,
0);
}
While this works, it results in those bitcasts going all the way through
to the IR, resulting in code like:
%1 = bitcast <8 x i16> %in to <2 x i64>
%2 = shufflevector <2 x i64> %1, <2 x i64> undef, <1 x i32>
%zeroinitializer
%3 = bitcast <1 x i64> %2 to <4 x i16>
We can instead easily perform the operation directly on the input vector
like:
uint16x4_t vget_low_u16(uint16x8_t __a) {
return __builtin_shufflevector(__a, __a, 0, 1, 2, 3);
}
Not only is that much easier to read on its own, it also results in
cleaner IR like:
%1 = shufflevector <8 x i16> %in, <8 x i16> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
This is both easier to read and easier for the back end to reason
about effectively since the operation is obfuscating the source with
bitcasts.
rdar://13894163
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@181865 91177308-0d34-0410-b5e6-96231b3b80d8
diff --git a/utils/TableGen/NeonEmitter.cpp b/utils/TableGen/NeonEmitter.cpp
index 34b955e..05505c9 100644
--- a/utils/TableGen/NeonEmitter.cpp
+++ b/utils/TableGen/NeonEmitter.cpp
@@ -1410,12 +1410,17 @@
s += ", (int64x1_t)__b, 0, 1);";
break;
case OpHi:
- s += "(" + ts +
- ")__builtin_shufflevector((int64x2_t)__a, (int64x2_t)__a, 1);";
+ // nElts is for the result vector, so the source is twice that number.
+ s += "__builtin_shufflevector(__a, __a";
+ for (unsigned i = nElts; i < nElts * 2; ++i)
+ s += ", " + utostr(i);
+ s+= ");";
break;
case OpLo:
- s += "(" + ts +
- ")__builtin_shufflevector((int64x2_t)__a, (int64x2_t)__a, 0);";
+ s += "__builtin_shufflevector(__a, __a";
+ for (unsigned i = 0; i < nElts; ++i)
+ s += ", " + utostr(i);
+ s+= ");";
break;
case OpDup:
s += Duplicate(nElts, typestr, "__a") + ";";