// This file does not contain any code; it just contains additional text and formatting
// for doxygen.


//===----------------------------------------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.txt for details.
//
//===----------------------------------------------------------------------===//


/*! @mainpage LLVM OpenMP* Runtime Library Interface
@section sec_intro Introduction

This document describes the interface provided by the
LLVM OpenMP\other runtime library to the compiler.
Routines that are directly called as simple functions by user code are
not currently described here, since their definition is in the OpenMP
specification available from http://openmp.org

The aim here is to explain the interface from the compiler to the runtime.

The overall design is described, and each function in the interface
has its own description. (At least, that is the ambition; we may not be there yet.)
@section sec_building Quickly Building the Runtime
For the impatient, we cover building the runtime as the first topic here.

CMake is used to build the OpenMP runtime. For details and a full list of options for the CMake build system,
see <tt>Build_With_CMake.txt</tt> inside the <tt>runtime/</tt> subdirectory. These
instructions cover the most typical build.

In-LLVM-tree build:
@code
$ cd where-you-want-to-live
Check out openmp into llvm/projects
$ cd where-you-want-to-build
$ mkdir build && cd build
$ cmake path/to/llvm -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
$ make omp
@endcode
Out-of-LLVM-tree build:
@code
$ cd where-you-want-to-live
Check out openmp
$ cd where-you-want-to-live/openmp/runtime
$ mkdir build && cd build
$ cmake path/to/openmp -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
$ make
@endcode

@section sec_supported Supported RTL Build Configurations

The architectures supported are IA-32 architecture, Intel&reg;&nbsp; 64, and
Intel&reg;&nbsp; Many Integrated Core Architecture. The build configurations
supported are shown in the table below.

<table border=1>
<tr><th> <th>icc/icl<th>gcc<th>clang
<tr><td>Linux\other OS<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7)
<tr><td>FreeBSD\other<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7,8)
<tr><td>OS X\other<td>Yes(1,3,4)<td>No<td>Yes(4,6,7)
<tr><td>Windows\other OS<td>Yes(1,4)<td>No<td>No
</table>
(1) On IA-32 architecture and Intel&reg;&nbsp; 64, icc/icl versions 12.x
    are supported (12.1 is recommended).<br>
(2) gcc version 4.7 is supported.<br>
(3) For icc on OS X\other, OS X\other version 10.5.8 is supported.<br>
(4) Intel&reg;&nbsp; Many Integrated Core Architecture not supported.<br>
(5) On Intel&reg;&nbsp; Many Integrated Core Architecture, icc/icl versions 13.0 or later are required.<br>
(6) Clang\other version 3.3 is supported.<br>
(7) Clang\other currently does not offer a software-implemented 128-bit extended
    precision type. Thus, all entry points reliant on this type are removed
    from the library and cannot be called in the user program. The following
    functions are not available:
@code
    __kmpc_atomic_cmplx16_*
    __kmpc_atomic_float16_*
    __kmpc_atomic_*_fp
@endcode
(8) Community contribution provided AS IS, not tested by Intel.

Supported Architectures: IBM(R) Power 7 and Power 8
<table border=1>
<tr><th> <th>gcc<th>clang
<tr><td>Linux\other OS<td>Yes(1,2)<td>Yes(3,4)
</table>
(1) On Power 7, gcc version 4.8.2 is supported.<br>
(2) On Power 8, gcc version 4.8.2 is supported.<br>
(3) On Power 7, clang version 3.7 is supported.<br>
(4) On Power 8, clang version 3.7 is supported.<br>

@section sec_frontend Front-end Compilers that work with this RTL

The following compilers are known to do compatible code generation for
this RTL: icc/icl, gcc. Code generation is discussed in more detail
later in this document.

@section sec_outlining Outlining

The runtime interface is based on the idea that the compiler
"outlines" sections of code that are to run in parallel into separate
functions that can then be invoked in multiple threads. For instance,
simple code like this

@code
void foo()
{
#pragma omp parallel
    {
        ... do something ...
    }
}
@endcode
is converted into something that looks conceptually like this (where
the names used are merely illustrative; the real library function
names will be used later after we've discussed some more issues...)

@code
static void outlinedFooBody()
{
    ... do something ...
}

void foo()
{
    __OMP_runtime_fork(outlinedFooBody, (void*)0); // Not the real function name!
}
@endcode

@subsection SEC_SHAREDVARS Addressing shared variables

In real uses of the OpenMP\other API there are normally references
from the outlined code to shared variables that are in scope in the containing function.
Therefore the containing function must be able to address
these variables. The runtime supports two alternate ways of doing
this.

@subsubsection SEC_SEC_OT Current Technique
The technique currently supported by the runtime library is to receive
a separate pointer to each shared variable that can be accessed from
the outlined function. This is what is shown in the example below.

We hope soon to provide an alternative interface to support the
alternate implementation described in the next section. The
alternative implementation has performance advantages for small
parallel regions that have many shared variables.

@subsubsection SEC_SEC_PT Future Technique
The idea is to treat the outlined function as though it
were a lexically nested function, and pass it a single argument which
is the pointer to the parent's stack frame. Provided that the compiler
knows the layout of the parent frame when it is generating the outlined
function, it can then access the up-level variables at appropriate
offsets from the parent frame. This is a classical compiler technique
from the 1960s to support languages like Algol (and its descendants)
that support lexically nested functions.

The main benefit of this technique is that no code is required
at the fork point to marshal the arguments to the outlined function.
Since the runtime knows statically how many arguments must be passed to the
outlined function, it can easily copy them to the thread's stack
frame. Therefore the performance of the fork code is independent of
the number of shared variables that are accessed by the outlined
function.

If it is hard to determine the stack layout of the parent while generating the
outlined code, it is still possible to use this approach by collecting all of
the variables in the parent that are accessed from outlined functions into
a single `struct` which is placed on the stack, and whose address is passed
to the outlined functions. In this way the offsets of the shared variables
are known (since they are inside the struct) without needing to know
the complete layout of the parent stack-frame. From the point of view
of the runtime either of these techniques is equivalent, since in either
case it only has to pass a single argument to the outlined function to allow
it to access shared variables.

A scheme like this is how gcc\other generates outlined functions.

@section SEC_INTERFACES Library Interfaces
The library functions used for specific parts of the OpenMP\other language implementation
are documented in different modules.

 - @ref BASIC_TYPES fundamental types used by the runtime in many places
 - @ref DEPRECATED functions that are in the library but are no longer required
 - @ref STARTUP_SHUTDOWN functions for initializing and finalizing the runtime
 - @ref PARALLEL functions for implementing `omp parallel`
 - @ref THREAD_STATES functions for supporting thread state inquiries
 - @ref WORK_SHARING functions for work sharing constructs such as `omp for`, `omp sections`
 - @ref THREADPRIVATE functions to support thread private data, copyin etc.
 - @ref SYNCHRONIZATION functions to support `omp critical`, `omp barrier`, `omp master`, reductions etc.
 - @ref ATOMIC_OPS functions to support atomic operations
 - @ref STATS_GATHERING macros to support developer profiling of libomp
 - Documentation on tasking has still to be written...

@section SEC_EXAMPLES Examples
@subsection SEC_WORKSHARING_EXAMPLE Work Sharing Example
This example shows the code generated for a parallel for with reduction and dynamic scheduling.

@code
extern float foo( void );

int main () {
    int i;
    float r = 0.0;
    #pragma omp parallel for schedule(dynamic) reduction(+:r)
    for ( i = 0; i < 10; i ++ ) {
        r += foo();
    }
}
@endcode

The transformed code looks like this.
@code
extern float foo( void );

int main () {
    static int zero = 0;
    auto int gtid;
    auto float r = 0.0;
    __kmpc_begin( & loc3, 0 );
    // The gtid is not actually required in this example so could be omitted;
    // we show its initialization here because it is often required for calls into
    // the runtime and should be locally cached like this.
    gtid = __kmpc_global_thread_num( & loc3 );
    __kmpc_fork_call( & loc7, 1, main_7_parallel_3, & r );
    __kmpc_end( & loc0 );
    return 0;
}

struct main_10_reduction_t_5 { float r_10_rpr; };

static kmp_critical_name lck = { 0 };
static ident_t loc10; // loc10.flags should contain KMP_IDENT_ATOMIC_REDUCE bit set
                      // if the compiler has generated an atomic reduction.

void main_7_parallel_3( int *gtid, int *btid, float *r_7_shp ) {
    auto int i_7_pr;
    auto int lower, upper, liter, incr;
    auto struct main_10_reduction_t_5 reduce;
    reduce.r_10_rpr = 0.F;
    liter = 0;
    __kmpc_dispatch_init_4( & loc7, *gtid, 35, 0, 9, 1, 1 );
    while ( __kmpc_dispatch_next_4( & loc7, *gtid, & liter, & lower, & upper, & incr ) ) {
        for( i_7_pr = lower; upper >= i_7_pr; i_7_pr ++ )
            reduce.r_10_rpr += foo();
    }
    switch( __kmpc_reduce_nowait( & loc10, *gtid, 1, 4, & reduce, main_10_reduce_5, & lck ) ) {
    case 1:
        *r_7_shp += reduce.r_10_rpr;
        __kmpc_end_reduce_nowait( & loc10, *gtid, & lck );
        break;
    case 2:
        __kmpc_atomic_float4_add( & loc10, *gtid, r_7_shp, reduce.r_10_rpr );
        break;
    default:;
    }
}

void main_10_reduce_5( struct main_10_reduction_t_5 *reduce_lhs,
                       struct main_10_reduction_t_5 *reduce_rhs )
{
    reduce_lhs->r_10_rpr += reduce_rhs->r_10_rpr;
}
@endcode

@defgroup BASIC_TYPES Basic Types
Types that are used throughout the runtime.

@defgroup DEPRECATED Deprecated Functions
Functions in this group are for backwards compatibility only, and
should not be used in new code.

@defgroup STARTUP_SHUTDOWN Startup and Shutdown
These functions are for library initialization and shutdown.

@defgroup PARALLEL Parallel (fork/join)
These functions are used for implementing <tt>\#pragma omp parallel</tt>.

@defgroup THREAD_STATES Thread Information
These functions return information about the currently executing thread.

@defgroup WORK_SHARING Work Sharing
These functions are used for implementing
<tt>\#pragma omp for</tt>, <tt>\#pragma omp sections</tt>, <tt>\#pragma omp single</tt> and
<tt>\#pragma omp master</tt> constructs.

When handling loops, there are different functions for each of the signed and unsigned 32 and 64 bit integer types
which have the name suffixes `_4`, `_4u`, `_8` and `_8u`. The semantics of each of the functions is the same,
so they are only described once.

Static loop scheduling is handled by @ref __kmpc_for_static_init_4 and friends. Only a single call is needed,
since the iterations to be executed by any given thread can be determined as soon as the loop parameters are known.

Dynamic scheduling is handled by the @ref __kmpc_dispatch_init_4 and @ref __kmpc_dispatch_next_4 functions.
The init function is called once in each thread outside the loop, while the next function is called each
time that the previous chunk of work has been exhausted.

@defgroup SYNCHRONIZATION Synchronization
These functions are used for implementing barriers.

@defgroup THREADPRIVATE Thread private data support
These functions support copyin/out and thread private data.

@defgroup STATS_GATHERING Statistics Gathering from OMPTB
These macros support profiling the libomp library. Use --stats=on when building with build.pl to enable
them, and then use the KMP_* macros to profile (through counts or clock ticks) libomp during execution of an OpenMP program.

@section sec_stats_env_vars Environment Variables

This section describes the environment variables relevant to stats-gathering in libomp.

@code
KMP_STATS_FILE
@endcode
This environment variable names an output file; if the file already exists it is appended to, *NOT OVERWRITTEN*. If this environment variable is undefined, the statistics are output to stderr.

@code
KMP_STATS_THREADS
@endcode
This environment variable indicates whether to print thread-specific statistics as well as aggregate statistics. Each thread's statistics are shown as well as the collective sum over all threads. The values "true", "on", "1", and "yes" all enable per-thread statistics.
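For example, a run collecting per-thread statistics into a file might look like this (<tt>./omp_app</tt> is a placeholder for any program linked against a stats-enabled libomp):

```shell
# Append statistics for this run to stats.txt (created if absent),
# including per-thread breakdowns as well as the aggregate totals.
$ KMP_STATS_FILE=stats.txt KMP_STATS_THREADS=true ./omp_app
```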

@defgroup TASKING Tasking support
These functions support tasking constructs.

@defgroup USER User visible functions
These functions can be called directly by the user, but are runtime library specific, rather than being OpenMP interfaces.

*/