.. raw:: html

  <style type="text/css">
    .none { background-color: #FFCCCC }
    .partial { background-color: #FFFF99 }
    .good { background-color: #CCFF99 }
  </style>

.. role:: none
.. role:: partial
.. role:: good

.. contents::
   :local:

==================
OpenMP Support
==================

Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
PPC64[LE] and has `basic support for Cuda devices`_.

Standalone directives
=====================

* #pragma omp [for] simd: :good:`Complete`.

* #pragma omp declare simd: :partial:`Partial`. We support parsing/semantic
  analysis + generation of special attributes for X86 target, but still
  missing the LLVM pass for vectorization.

* #pragma omp taskloop [simd]: :good:`Complete`.

* #pragma omp target [enter|exit] data: :good:`Complete`.

* #pragma omp target update: :good:`Complete`.

* #pragma omp target: :good:`Complete`.

* #pragma omp declare target: :good:`Complete`.

* #pragma omp teams: :good:`Complete`.

* #pragma omp distribute [simd]: :good:`Complete`.

* #pragma omp distribute parallel for [simd]: :good:`Complete`.
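
As a rough illustration of two of the directives listed above, the sketch below
combines `declare simd` with a `simd` loop. The function names `scale_add` and
`saxpy_simd` are hypothetical and chosen only for this example. When the code is
built without OpenMP enabled, the pragmas are ignored and the loop runs as plain
sequential C.

```c
#include <stddef.h>

/* Hypothetical helper: request that vector variants of the function be
   generated in addition to the scalar one. */
#pragma omp declare simd
static float scale_add(float a, float x, float y) {
  return a * x + y;
}

/* Computes y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy_simd(size_t n, float a, const float *x, float *y) {
  /* Tell the compiler that the iterations may be executed with SIMD
     instructions. */
  #pragma omp simd
  for (size_t i = 0; i < n; ++i)
    y[i] = scale_add(a, x[i], y[i]);
}
```

Because `declare simd` is only partially supported, the call inside the loop may
still be executed through the scalar version of `scale_add`; the result is the
same either way.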

Combined directives
===================

* #pragma omp parallel for simd: :good:`Complete`.

* #pragma omp target parallel: :good:`Complete`.

* #pragma omp target parallel for [simd]: :good:`Complete`.

* #pragma omp target simd: :good:`Complete`.

* #pragma omp target teams: :good:`Complete`.

* #pragma omp teams distribute [simd]: :good:`Complete`.

* #pragma omp target teams distribute [simd]: :good:`Complete`.

* #pragma omp teams distribute parallel for [simd]: :good:`Complete`.

* #pragma omp target teams distribute parallel for [simd]: :good:`Complete`.
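
A minimal sketch of the last combined directive in the list, assuming a simple
element-wise vector addition. The function name `vec_add_offload` is
hypothetical. If no offloading device is configured, the region is expected to
fall back to the host and produce the same result.

```c
/* Computes c[i] = a[i] + b[i] for i in [0, n). */
void vec_add_offload(int n, const double *a, const double *b, double *c) {
  /* Copy the inputs to the device, distribute the iterations across teams
     and threads, and copy the result back to the host. */
  #pragma omp target teams distribute parallel for \
      map(to: a[0:n], b[0:n]) map(from: c[0:n])
  for (int i = 0; i < n; ++i)
    c[i] = a[i] + b[i];
}
```

The `map` clauses are written out explicitly because `a`, `b`, and `c` are
pointers, so the compiler cannot deduce how many elements to transfer.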
Alexey Bataev8ede8192018-01-08 19:02:51 +000068
Alexey Bataev3bdd6002018-07-26 17:53:45 +000069Clang does not support any constructs/updates from upcoming OpenMP 5.0 except
70for `reduction`-based clauses in the `task` and `target`-based directives.
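
The sketch below shows what a task-based reduction can look like, assuming the
`reduction`-based clauses mentioned above include `task_reduction` on
`taskgroup` and `in_reduction` on `task`. The function name `sum_with_tasks` is
hypothetical.

```c
/* Sums the n elements of v, creating one task per element. */
int sum_with_tasks(const int *v, int n) {
  int sum = 0;
  #pragma omp parallel
  #pragma omp single
  #pragma omp taskgroup task_reduction(+: sum)
  {
    for (int i = 0; i < n; ++i) {
      /* Each task contributes to the reduction declared on the taskgroup. */
      #pragma omp task in_reduction(+: sum) firstprivate(i)
      sum += v[i];
    }
  }
  /* The partial results are combined when the taskgroup region ends. */
  return sum;
}
```

One task per element is far too fine-grained for real workloads; it is used here
only to keep the example short.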

In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and
macOS.

.. _basic support for Cuda devices:

Cuda devices support
====================

Directives execution modes
--------------------------

Clang code generation for target regions supports two modes: the SPMD and
non-SPMD modes. Clang chooses one of these two modes automatically based on the
way directives and clauses on those directives are used. The SPMD mode uses a
simplified set of runtime functions, thus increasing performance at the cost of
not supporting some OpenMP features. The non-SPMD mode is the most generic mode
and supports all currently available OpenMP features. The compiler will always
attempt to use the SPMD mode wherever possible. SPMD mode will not be used if:

 - The target region contains an `if()` clause that refers to a `parallel`
   directive.

 - The target region contains a `parallel` directive with a `num_threads()`
   clause.

 - The target region contains user code (other than OpenMP-specific
   directives) in between the `target` and the `parallel` directives.
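
To make the conditions above more concrete, the sketch below contrasts a region
that looks eligible for SPMD mode with one that matches the third condition. The
function names `fill_spmd` and `fill_generic` are hypothetical, and the mode the
compiler actually selects is not visible in the source; the comments only
restate the rules listed above.

```c
/* `target` and `parallel` are combined, with no `if()` or `num_threads()`
   clause and no user code between them, so none of the listed conditions
   applies. */
void fill_spmd(int n, int *out) {
  #pragma omp target parallel for map(from: out[0:n])
  for (int i = 0; i < n; ++i)
    out[i] = 2 * i;
}

/* User code (the declaration of `base`) sits between `target` and
   `parallel`, which matches the third condition for not using SPMD mode. */
void fill_generic(int n, int *out) {
  #pragma omp target map(from: out[0:n])
  {
    int base = 100;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
      out[i] = base + i;
  }
}
```

Both functions produce correct results in either mode; the difference is
expected to show up only in the runtime support that the generated code needs.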

Data-sharing modes
------------------

Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
modes. The default mode is `Generic`. `Cuda` mode can give additional
performance and can be activated using the `-fopenmp-cuda-mode` flag. In
`Generic` mode all local variables that can be shared in the parallel regions
are stored in the global memory. In `Cuda` mode local variables are not shared
between the threads, and it is the user's responsibility to share the required
data between the threads in the parallel regions.
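
The sketch below shows the kind of local variable this distinction is about. The
function name `sum_shared_local` is hypothetical. The variable `scale` is local
to the target region and is read by every thread of the nested parallel region,
so it is a candidate for the global-memory storage described for `Generic` mode.
Code written for `Cuda` mode should not rely on this kind of implicit sharing.

```c
/* Returns scale * (0 + 1 + ... + (n - 1)) with scale fixed at 3. */
int sum_shared_local(int n) {
  int total = 0;
  #pragma omp target map(tofrom: total)
  {
    /* Local to the target region, shared by the parallel region below. */
    int scale = 3;
    #pragma omp parallel for reduction(+: total)
    for (int i = 0; i < n; ++i)
      total += scale * i;
  }
  return total;
}
```

The reduction here is performed inside a single parallel region rather than
across teams.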

Features not supported or with limited support for Cuda devices
---------------------------------------------------------------

- Reductions across the teams are not supported yet.

- Cancellation constructs are not supported.

- Doacross loop nest is not supported.

- User-defined reductions are supported only for trivial types.

- Nested parallelism: inner parallel regions are executed sequentially.

- Static linking of libraries containing device code is not supported yet.

- Automatic translation of math functions in target regions to device-specific
  math functions is not implemented yet.

- Debug information for OpenMP target regions is not supported yet.
131