.. raw:: html

  <style type="text/css">
    .none { background-color: #FFCCCC }
    .partial { background-color: #FFFF99 }
    .good { background-color: #CCFF99 }
  </style>

.. role:: none
.. role:: partial
.. role:: good

.. contents::
   :local:

==================
OpenMP Support
==================

Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
PPC64[LE] and has `basic support for Cuda devices`_.

Standalone directives
=====================

* #pragma omp [for] simd: :good:`Complete`.

* #pragma omp declare simd: :partial:`Partial`. Parsing, semantic analysis,
  and generation of special attributes for the X86 target are supported, but
  the LLVM pass for vectorization is still missing.

* #pragma omp taskloop [simd]: :good:`Complete`.

* #pragma omp target [enter|exit] data: :good:`Complete`.

* #pragma omp target update: :good:`Complete`.

* #pragma omp target: :good:`Complete`.

* #pragma omp declare target: :good:`Complete`.

* #pragma omp teams: :good:`Complete`.

* #pragma omp distribute [simd]: :good:`Complete`.

* #pragma omp distribute parallel for [simd]: :good:`Complete`.

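As an illustration of how some of the standalone directives listed above fit
together, the following minimal sketch keeps an array mapped across two
``target`` regions with ``target data`` and refreshes the host copy in between
with ``target update``. The function name ``scale_then_shift`` is hypothetical
and chosen only for this example; the code computes the same result when it is
compiled without OpenMP, because the pragmas are then ignored.

.. code-block:: c

  /* Hypothetical helper: doubles every element of a[0..n), then adds one.
     The array stays mapped on the device for the whole `target data` region. */
  void scale_then_shift(double *a, int n)
  {
    #pragma omp target data map(tofrom: a[0:n])
    {
      /* First offloaded region: a[i] = 2 * a[i]. */
      #pragma omp target
      for (int i = 0; i < n; ++i)
        a[i] *= 2.0;

      /* Copy the intermediate values back to the host without unmapping. */
      #pragma omp target update from(a[0:n])

      /* Second offloaded region reuses the data already on the device. */
      #pragma omp target
      for (int i = 0; i < n; ++i)
        a[i] += 1.0;
    }
  }

Calling ``scale_then_shift`` on an array holding ``1.0, 2.0, 3.0, 4.0`` leaves
it holding ``3.0, 5.0, 7.0, 9.0``.
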
Combined directives
===================

* #pragma omp parallel for simd: :good:`Complete`.

* #pragma omp target parallel: :good:`Complete`.

* #pragma omp target parallel for [simd]: :good:`Complete`.

* #pragma omp target simd: :good:`Complete`.

* #pragma omp target teams: :good:`Complete`.

* #pragma omp teams distribute [simd]: :good:`Complete`.

* #pragma omp target teams distribute [simd]: :good:`Complete`.

* #pragma omp teams distribute parallel for [simd]: :good:`Complete`.

* #pragma omp target teams distribute parallel for [simd]: :good:`Complete`.

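A combined directive from the list above can be used as in the following
minimal sketch, which offloads a SAXPY-style loop with a single
``target teams distribute parallel for``. The function name ``saxpy`` and its
parameters are illustrative assumptions, not part of Clang or of the OpenMP
runtime.

.. code-block:: c

  /* Illustrative kernel: y[i] = a * x[i] + y[i] for i in [0, n). */
  void saxpy(int n, float a, const float *x, float *y)
  {
    /* x is only read on the device; y is read and written back. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
      y[i] = a * x[i] + y[i];
  }

The loop carries no reduction, so it does not depend on any cross-team
reduction support on the device.
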
Clang does not support any constructs/updates from the upcoming OpenMP 5.0
except for `reduction`-based clauses in the `task` and `target`-based
directives.

In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and
macOS.

.. _basic support for Cuda devices:

Cuda devices support
====================

Directives execution modes
--------------------------

Clang code generation for target regions supports two modes: the SPMD and
non-SPMD modes. Clang chooses one of these two modes automatically based on the
way directives and clauses on those directives are used. The SPMD mode uses a
simplified set of runtime functions, thus increasing performance at the cost of
not supporting some OpenMP features. The non-SPMD mode is the most generic mode
and supports all currently available OpenMP features. The compiler will always
attempt to use the SPMD mode wherever possible. SPMD mode will not be used if:

- The target region contains an `if()` clause that refers to a `parallel`
  directive.

- The target region contains a `parallel` directive with a `num_threads()`
  clause.

- The target region contains user code (other than OpenMP-specific
  directives) in between the `target` and the `parallel` directives.

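The conditions above can be illustrated with two hypothetical functions. This
is only a sketch of the source patterns involved: the names ``fill_combined``
and ``fill_split`` are made up for this example, and the mode Clang actually
selects is decided by the compiler, not by anything written in the code.

.. code-block:: c

  /* No user code between `target` and `parallel`, no num_threads() and no
     if() clause: none of the listed conditions applies to this region. */
  void fill_combined(int *out, int n)
  {
    #pragma omp target parallel for map(from: out[0:n])
    for (int i = 0; i < n; ++i)
      out[i] = i;
  }

  /* User code sits between `target` and `parallel`, and the `parallel`
     directive has a num_threads() clause: two of the listed conditions
     apply to this region. */
  void fill_split(int *out, int n)
  {
    #pragma omp target map(from: out[0:n])
    {
      int offset = 100; /* user code ahead of the parallel region */

      #pragma omp parallel for num_threads(32)
      for (int i = 0; i < n; ++i)
        out[i] = i + offset;
    }
  }

Both functions produce well-defined results in either mode; they differ only
in which code generation path the compiler may use.
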
Data-sharing modes
------------------

Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
modes. The default mode is `Generic`. `Cuda` mode can give additional
performance and can be activated using the `-fopenmp-cuda-mode` flag. In
`Generic` mode all local variables that can be shared in the parallel regions
are stored in the global memory. In `Cuda` mode local variables are not shared
between the threads, and it is the user's responsibility to share the required
data between the threads in the parallel regions.

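The kind of variable this distinction concerns can be sketched as follows. In
``add_bias_shared`` the local variable ``bias`` is declared in the target
region and then read by every thread of the inner parallel region, so it is a
local that can be shared. ``add_bias_private`` is one possible variant that
does not depend on sharing, because ``firstprivate`` gives each thread its own
copy. Both function names are hypothetical, and the sketch makes no claim about
the code Clang generates for either form.

.. code-block:: c

  /* `bias` is local to the target region and shared by the threads of the
     parallel region below. */
  void add_bias_shared(int *v, int n)
  {
    #pragma omp target map(tofrom: v[0:n])
    {
      int bias = 7;

      #pragma omp parallel for
      for (int i = 0; i < n; ++i)
        v[i] += bias;
    }
  }

  /* Same computation, but every thread receives a private, initialized copy
     of `bias`, so nothing has to be shared between threads. */
  void add_bias_private(int *v, int n)
  {
    #pragma omp target map(tofrom: v[0:n])
    {
      int bias = 7;

      #pragma omp parallel for firstprivate(bias)
      for (int i = 0; i < n; ++i)
        v[i] += bias;
    }
  }

Assuming a toolchain built with NVPTX offloading support, `Cuda` mode would be
requested with a command line along the lines of
``clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-cuda-mode file.c``.
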
Features not supported or with limited support for Cuda devices
---------------------------------------------------------------

- Reductions across the teams are not supported yet.

- Cancellation constructs are not supported.

- Doacross loop nest is not supported.

- User-defined reductions are supported only for trivial types.

- Nested parallelism: inner parallel regions are executed sequentially.

- Static linking of libraries containing device code is not supported yet.

- Automatic translation of math functions in target regions to device-specific
  math functions is not implemented yet.

- Debug information for OpenMP target regions is not supported yet.
