.. raw:: html

   <style type="text/css">
     .none { background-color: #FFCCCC }
     .partial { background-color: #FFFF99 }
     .good { background-color: #CCFF99 }
   </style>

.. role:: none
.. role:: partial
.. role:: good

.. contents::
   :local:

==================
OpenMP Support
==================

Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64
and PPC64[LE] devices, and has `basic support for Cuda devices`_.

Standalone directives
=====================

* #pragma omp [for] simd: :good:`Complete`.

* #pragma omp declare simd: :partial:`Partial`. We support parsing/semantic
  analysis and generation of special attributes for the X86 target, but the
  LLVM pass for vectorization is still missing.

* #pragma omp taskloop [simd]: :good:`Complete`.

* #pragma omp target [enter|exit] data: :good:`Complete`.

* #pragma omp target update: :good:`Complete`.

* #pragma omp target: :good:`Complete`.

* #pragma omp declare target: :good:`Complete`.

* #pragma omp teams: :good:`Complete`.

* #pragma omp distribute [simd]: :good:`Complete`.

* #pragma omp distribute parallel for [simd]: :good:`Complete`.
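
The `declare simd` and `[for] simd` entries above can be illustrated with a
small sketch (the function names here are illustrative, not taken from the
Clang sources):

.. code-block:: c

   #include <stdio.h>

   /* Ask the compiler to emit SIMD variants of this function so that
      vectorized loops can call it lane-wise. */
   #pragma omp declare simd
   static float scale(float x) { return 2.0f * x; }

   int main(void) {
       float a[8];
       float sum = 0.0f;

       /* Request vectorization of this loop; each iteration may invoke
          a SIMD variant of scale(). */
       #pragma omp simd
       for (int i = 0; i < 8; ++i)
           a[i] = scale((float)i);

       for (int i = 0; i < 8; ++i)
           sum += a[i];
       printf("%.1f\n", sum); /* 2 * (0 + 1 + ... + 7) = 56.0 */
       return 0;
   }

Note that without `-fopenmp` the pragmas are ignored and the program behaves
as plain C, which is why the missing vectorization pass affects performance
but not correctness.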

Combined directives
===================

* #pragma omp parallel for simd: :good:`Complete`.

* #pragma omp target parallel: :good:`Complete`.

* #pragma omp target parallel for [simd]: :good:`Complete`.

* #pragma omp target simd: :good:`Complete`.

* #pragma omp target teams: :good:`Complete`.

* #pragma omp teams distribute [simd]: :good:`Complete`.

* #pragma omp target teams distribute [simd]: :good:`Complete`.

* #pragma omp teams distribute parallel for [simd]: :good:`Complete`.

* #pragma omp target teams distribute parallel for [simd]: :good:`Complete`.
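
As a sketch of one of the combined directives above,
`target teams distribute parallel for` with a `reduction` clause offloads a
loop in a single construct (this example falls back to host execution when no
device is available):

.. code-block:: c

   #include <stdio.h>

   int main(void) {
       int n = 1000, sum = 0;

       /* Combined construct: offload the region, create a league of teams,
          distribute loop chunks across teams, and run each chunk in
          parallel within its team, combining partial sums at the end. */
       #pragma omp target teams distribute parallel for \
               map(tofrom: sum) reduction(+: sum)
       for (int i = 0; i < n; ++i)
           sum += i;

       printf("%d\n", sum); /* 0 + 1 + ... + 999 = 499500 */
       return 0;
   }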

Clang does not support any constructs/updates from the upcoming OpenMP 5.0
except for `reduction`-based clauses in the `task` and `target`-based
directives.

In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and
macOS.

.. _basic support for Cuda devices:

Cuda devices support
====================

Directives execution modes
--------------------------

Clang code generation for target regions supports two modes: the SPMD and
non-SPMD modes. Clang chooses one of these two modes automatically based on
the way directives and clauses on those directives are used. The SPMD mode
uses a simplified set of runtime functions, thus increasing performance at the
cost of supporting fewer OpenMP features. The non-SPMD mode is the most
generic mode and supports all currently available OpenMP features. The
compiler will always attempt to use the SPMD mode wherever possible. SPMD mode
will not be used if:

- The target region contains an `if()` clause that refers to a `parallel`
  directive.

- The target region contains a `parallel` directive with a `num_threads()`
  clause.

- The target region contains user code (other than OpenMP-specific
  directives) in between the `target` and the `parallel` directives.
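
The rules above can be sketched as follows (function names are illustrative;
whether SPMD mode is actually chosen is a compiler decision, not visible in
the program's output):

.. code-block:: c

   #include <stdio.h>

   /* SPMD-friendly: the parallel construct is part of the combined
      directive, with no if(), num_threads(), or intervening user code. */
   void spmd_friendly(int *a, int n) {
       #pragma omp target teams distribute parallel for map(tofrom: a[0:n])
       for (int i = 0; i < n; ++i)
           a[i] *= 2;
   }

   /* Forces non-SPMD mode: user code runs between `target` and
      `parallel`, so the generic code path must be used. */
   void non_spmd(int *a, int n) {
       #pragma omp target map(tofrom: a[0:n])
       {
           a[0] = -1;                    /* sequential code inside target */
           #pragma omp parallel for
           for (int i = 1; i < n; ++i)
               a[i] *= 2;
       }
   }

   int main(void) {
       int a[4] = {1, 2, 3, 4};
       int b[4] = {1, 2, 3, 4};
       spmd_friendly(a, 4);
       non_spmd(b, 4);
       printf("%d %d %d %d\n", a[0], a[3], b[0], b[3]); /* 2 8 -1 8 */
       return 0;
   }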

Data-sharing modes
------------------

Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
modes. The default mode is `Generic`. `Cuda` mode can yield additional
performance and can be activated using the `-fopenmp-cuda-mode` flag. In
`Generic` mode, all local variables that can be shared in the parallel regions
are stored in global memory. In `Cuda` mode, local variables are not shared
between the threads, and it is the user's responsibility to share the required
data between the threads in the parallel regions.
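
The kind of local variable affected by the two modes can be sketched as
follows (on the host, or without `-fopenmp`, the sharing always works; the
distinction only matters when offloading to a Cuda device):

.. code-block:: c

   #include <stdio.h>

   int main(void) {
       int result = 0;
       #pragma omp target map(tofrom: result)
       {
           int local = 42;          /* set by the initial device thread */
           #pragma omp parallel
           {
               /* In Generic mode, `local` is placed in global memory so
                  all threads of the parallel region can see it. In Cuda
                  mode it stays thread-private, and sharing it would be
                  the user's responsibility. */
               #pragma omp single
               result = local;
           }
       }
       printf("%d\n", result); /* 42 */
       return 0;
   }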

Features not supported or with limited support for Cuda devices
---------------------------------------------------------------

- Reductions across the teams are not supported yet.

- Cancellation constructs are not supported.

- Doacross loop nests are not supported.

- User-defined reductions are supported only for trivial types.

- Nested parallelism: inner parallel regions are executed sequentially.

- Static linking of libraries containing device code is not supported yet.

- Automatic translation of math functions in target regions to
  device-specific math functions is not implemented yet.

- Debug information for OpenMP target regions is not supported yet.