// This file does not contain any code; it just contains additional text and formatting
// for doxygen.


//===----------------------------------------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.txt for details.
//
//===----------------------------------------------------------------------===//

/*! @mainpage Intel® OpenMP* Runtime Library Interface
@section sec_intro Introduction

This document describes the interface provided by the
Intel® OpenMP\other runtime library to the compiler.
Routines that are directly called as simple functions by user code are
not currently described here, since their definition is in the OpenMP
specification available from http://openmp.org.

The aim here is to explain the interface from the compiler to the runtime.

The overall design is described, and each function in the interface
has its own description. (At least, that is the ambition; we may not be there yet.)

@section sec_building Building the Runtime
For the impatient, we cover building the runtime as the first topic here.

A top-level Makefile is provided that attempts to derive a suitable
configuration for the most commonly used environments. To see the
default settings, type:
@code
% make info
@endcode

You can change the Makefile's behavior with the following options:

- <b>omp_root</b>: The path to the top-level directory containing the top-level
  Makefile. By default, this takes the value of the current working directory.

- <b>omp_os</b>: Operating system. By default, the build will attempt to
  detect this. Currently supports "linux", "macos", and "windows".

- <b>arch</b>: Architecture. By default, the build will attempt to detect
  this if not specified by the user. Currently supported values are
  - "32" for IA-32 architecture
  - "32e" for Intel® 64 architecture
  - "mic" for Intel® Many Integrated Core Architecture
    (if "mic" is specified then "icc" will be used as the compiler, and the
    appropriate k1om binutils will be used; the necessary packages must be
    installed on the build machine for this to be possible, but an
    Intel® Xeon Phi™ coprocessor is not required to build the library).

- <b>compiler</b>: Which compiler to use for the build. Defaults to "icc"
  or "icl" depending on the value of omp_os. Also supports "gcc" when
  omp_os is "linux" for gcc\other versions 4.6.2 and higher. For icc on
  OS X\other, OS X\other versions greater than 10.6 are not currently
  supported, and icc version 13.0 is not supported. The selected compiler
  should be installed and in the user's path. The corresponding Fortran
  compiler should also be in the path.

- <b>mode</b>: Library mode: the default is "release". Also supports "debug".

To use any of the options above, simply add <option_name>=<value>. For
example, if you want to build with gcc instead of icc, type:
@code
% make compiler=gcc
@endcode

Under the hood of the top-level Makefile, the runtime is built by
a Perl script that in turn drives a detailed runtime system make. The
script can be found at <tt>tools/build.pl</tt>, and will print
information about all its flags and controls if invoked as
@code
% tools/build.pl --help
@endcode

If invoked with no arguments, it will try to build a set of libraries
that are appropriate for the machine on which the build is happening.
There are also many options for building out of tree and for configuring
library features; consult the <tt>--help</tt> output for details.

@section sec_supported Supported RTL Build Configurations

The architectures supported are IA-32 architecture, Intel® 64, and
Intel® Many Integrated Core Architecture. The build configurations
supported are shown in the table below.

<table border=1>
<tr><th> <th>icc/icl<th>gcc
<tr><td>Linux\other OS<td>Yes(1,5)<td>Yes(2,4)
<tr><td>OS X\other<td>Yes(1,3,4)<td>No
<tr><td>Windows\other OS<td>Yes(1,4)<td>No
</table>
(1) On IA-32 architecture and Intel® 64, icc/icl versions 12.x
are supported (12.1 is recommended).<br>
(2) gcc version 4.6.2 is supported.<br>
(3) For icc on OS X\other, OS X\other version 10.5.8 is supported.<br>
(4) Intel® Many Integrated Core Architecture not supported.<br>
(5) On Intel® Many Integrated Core Architecture, icc/icl versions 13.0 or later are required.

@section sec_frontend Front-end Compilers that work with this RTL

The following compilers are known to do compatible code generation for
this RTL: icc/icl, gcc. Code generation is discussed in more detail
later in this document.

@section sec_outlining Outlining

The runtime interface is based on the idea that the compiler
"outlines" sections of code that are to run in parallel into separate
functions that can then be invoked in multiple threads. For instance,
simple code like this

@code
void foo()
{
    #pragma omp parallel
    {
        ... do something ...
    }
}
@endcode
is converted into something that looks conceptually like this (where
the names used are merely illustrative; the real library function
names will be used later, after we've discussed some more issues...)

@code
static void outlinedFooBody()
{
    ... do something ...
}

void foo()
{
    __OMP_runtime_fork(outlinedFooBody, (void*)0); // Not the real function name!
}
@endcode

@subsection SEC_SHAREDVARS Addressing shared variables

In real uses of the OpenMP\other API there are normally references
from the outlined code to shared variables that are in scope in the
containing function. Therefore the outlined function must be able to
address these variables. The runtime supports two alternative ways of
doing this.

@subsubsection SEC_SEC_OT Current Technique
The technique currently supported by the runtime library is to receive
a separate pointer to each shared variable that can be accessed from
the outlined function. This is what is shown in the example below.

We hope soon to provide an alternative interface to support the
alternative implementation described in the next section, which has
performance advantages for small parallel regions that have many
shared variables.

@subsubsection SEC_SEC_PT Future Technique
The idea is to treat the outlined function as though it
were a lexically nested function, and pass it a single argument which
is the pointer to the parent's stack frame. Provided that the compiler
knows the layout of the parent frame when it is generating the outlined
function, it can then access the up-level variables at appropriate
offsets from the parent frame. This is a classical compiler technique
from the 1960s to support languages like Algol (and its descendants)
that allow lexically nested functions.

The main benefit of this technique is that there is no code required
at the fork point to marshal the arguments to the outlined function.
Since the runtime knows statically how many arguments must be passed to the
outlined function, it can easily copy them to the thread's stack
frame. Therefore the performance of the fork code is independent of
the number of shared variables that are accessed by the outlined
function.

If it is hard to determine the stack layout of the parent while generating the
outlined code, it is still possible to use this approach by collecting all of
the variables in the parent that are accessed from outlined functions into
a single `struct` which is placed on the stack, and whose address is passed
to the outlined functions. In this way the offsets of the shared variables
are known (since they are inside the struct) without needing to know
the complete layout of the parent stack-frame. From the point of view
of the runtime the two techniques are equivalent, since in either
case it only has to pass a single argument to the outlined function to allow
it to access shared variables.

A scheme like this is how gcc\other generates outlined functions.

@section SEC_INTERFACES Library Interfaces
The library functions used for specific parts of the OpenMP\other language implementation
are documented in different modules.

- @ref BASIC_TYPES fundamental types used by the runtime in many places
- @ref DEPRECATED functions that are in the library but are no longer required
- @ref STARTUP_SHUTDOWN functions for initializing and finalizing the runtime
- @ref PARALLEL functions for implementing `omp parallel`
- @ref THREAD_STATES functions for supporting thread state inquiries
- @ref WORK_SHARING functions for work sharing constructs such as `omp for`, `omp sections`
- @ref THREADPRIVATE functions to support thread private data, copyin, etc.
- @ref SYNCHRONIZATION functions to support `omp critical`, `omp barrier`, `omp master`, reductions, etc.
- @ref ATOMIC_OPS functions to support atomic operations
- Documentation on tasking has still to be written...

@section SEC_EXAMPLES Examples
@subsection SEC_WORKSHARING_EXAMPLE Work Sharing Example
This example shows the code generated for a parallel for with reduction and dynamic scheduling.

@code
extern float foo( void );

int main () {
    int i;
    float r = 0.0;
    #pragma omp parallel for schedule(dynamic) reduction(+:r)
    for ( i = 0; i < 10; i ++ ) {
        r += foo();
    }
}
@endcode

The transformed code looks like this.
@code
extern float foo( void );

int main () {
    static int zero = 0;
    auto int gtid;
    auto float r = 0.0;
    __kmpc_begin( & loc3, 0 );
    // The gtid is not actually required in this example so could be omitted;
    // we show its initialization here because it is often required for calls into
    // the runtime and should be locally cached like this.
    gtid = __kmpc_global_thread_num( & loc3 );
    __kmpc_fork_call( & loc7, 1, main_7_parallel_3, & r );
    __kmpc_end( & loc0 );
    return 0;
}

struct main_10_reduction_t_5 { float r_10_rpr; };

static kmp_critical_name lck = { 0 };
static ident_t loc10; // loc10.flags should contain KMP_IDENT_ATOMIC_REDUCE bit set
                      // if the compiler has generated an atomic reduction.

void main_7_parallel_3( int *gtid, int *btid, float *r_7_shp ) {
    auto int i_7_pr;
    auto int lower, upper, liter, incr;
    auto struct main_10_reduction_t_5 reduce;
    reduce.r_10_rpr = 0.F;
    liter = 0;
    __kmpc_dispatch_init_4( & loc7, *gtid, 35, 0, 9, 1, 1 );
    while ( __kmpc_dispatch_next_4( & loc7, *gtid, & liter, & lower, & upper, & incr ) ) {
        for ( i_7_pr = lower; upper >= i_7_pr; i_7_pr ++ )
            reduce.r_10_rpr += foo();
    }
    switch( __kmpc_reduce_nowait( & loc10, *gtid, 1, 4, & reduce, main_10_reduce_5, & lck ) ) {
    case 1:
        *r_7_shp += reduce.r_10_rpr;
        __kmpc_end_reduce_nowait( & loc10, *gtid, & lck );
        break;
    case 2:
        __kmpc_atomic_float4_add( & loc10, *gtid, r_7_shp, reduce.r_10_rpr );
        break;
    default:;
    }
}

void main_10_reduce_5( struct main_10_reduction_t_5 *reduce_lhs,
                       struct main_10_reduction_t_5 *reduce_rhs )
{
    reduce_lhs->r_10_rpr += reduce_rhs->r_10_rpr;
}
@endcode

@defgroup BASIC_TYPES Basic Types
Types that are used throughout the runtime.

@defgroup DEPRECATED Deprecated Functions
Functions in this group are for backwards compatibility only, and
should not be used in new code.

@defgroup STARTUP_SHUTDOWN Startup and Shutdown
These functions are for library initialization and shutdown.

@defgroup PARALLEL Parallel (fork/join)
These functions are used for implementing <tt>\#pragma omp parallel</tt>.

@defgroup THREAD_STATES Thread Information
These functions return information about the currently executing thread.

@defgroup WORK_SHARING Work Sharing
These functions are used for implementing
<tt>\#pragma omp for</tt>, <tt>\#pragma omp sections</tt>, <tt>\#pragma omp single</tt> and
<tt>\#pragma omp master</tt> constructs.

When handling loops, there are different functions for each of the signed and unsigned 32 and 64 bit integer types,
which have the name suffixes `_4`, `_4u`, `_8` and `_8u`. The semantics of each of the functions is the same,
so they are only described once.

Static loop scheduling is handled by @ref __kmpc_for_static_init_4 and friends. Only a single call is needed,
since the iterations to be executed by any given thread can be determined as soon as the loop parameters are known.

Dynamic scheduling is handled by the @ref __kmpc_dispatch_init_4 and @ref __kmpc_dispatch_next_4 functions.
The init function is called once in each thread outside the loop, while the next function is called each
time that the previous chunk of work has been exhausted.

@defgroup SYNCHRONIZATION Synchronization
These functions are used for implementing barriers and other synchronization
constructs such as <tt>\#pragma omp critical</tt> and reductions.

@defgroup THREADPRIVATE Thread private data support
These functions support copyin/out and thread private data.

@defgroup TASKING Tasking support
These functions are used to implement tasking constructs.

*/