[Clang] New loop pragma vectorize_predicate
This adds a new vectorize predication loop hint:
#pragma clang loop vectorize_predicate(enable)
that can be used to indicate to the vectoriser that all (load/store)
instructions should be predicated (masked). This allows, for example, folding
of the remainder loop into the main loop.
This patch will be followed up with D64916 and D65197. The former is a
refactoring in the loopvectorizer and the groundwork to make tail loop folding
a more general concept, and in the latter the actual tail loop folding
transformation will be implemented.
Differential Revision: https://reviews.llvm.org/D64744
llvm-svn: 366989
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index cb72c45..5bd234f2 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -2946,12 +2946,12 @@
The ``#pragma clang loop`` directive is used to specify hints for optimizing the
subsequent for, while, do-while, or c++11 range-based for loop. The directive
-provides options for vectorization, interleaving, unrolling and
+provides options for vectorization, interleaving, predication, unrolling and
distribution. Loop hints can be specified before any loop and will be ignored if
the optimization is not safe to apply.
-Vectorization and Interleaving
-------------------------------
+Vectorization, Interleaving, and Predication
+--------------------------------------------
A vectorized loop performs multiple iterations of the original loop
in parallel using vector instructions. The instruction set of the target
@@ -2994,6 +2994,21 @@
Specifying a width/count of 1 disables the optimization, and is equivalent to
``vectorize(disable)`` or ``interleave(disable)``.
+Vector predication is enabled by ``vectorize_predicate(enable)``, for example:
+
+.. code-block:: c++
+
+ #pragma clang loop vectorize(enable)
+ #pragma clang loop vectorize_predicate(enable)
+ for(...) {
+ ...
+ }
+
+This predicates (masks) all instructions in the loop, which allows the scalar
+remainder loop (the tail) to be folded into the main vectorized loop. This
+might be more efficient when vector predication is efficiently supported by the
+target platform.
+
Loop Unrolling
--------------