// This file does not contain any code; it just contains additional text and formatting
// for doxygen.


//===----------------------------------------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.txt for details.
//
//===----------------------------------------------------------------------===//


/*! @mainpage Intel&reg;&nbsp;OpenMP* Runtime Library Interface
@section sec_intro Introduction

This document describes the interface provided by the
Intel&reg;&nbsp;OpenMP\other runtime library to the compiler.
Routines that are directly called as simple functions by user code are
not currently described here, since their definition is in the OpenMP
specification available from http://openmp.org

The aim here is to explain the interface from the compiler to the runtime.

The overall design is described, and each function in the interface
has its own description. (At least, that's the ambition; we may not be there yet.)

@section sec_building Building the Runtime
For the impatient, we cover building the runtime as the first topic here.

A top-level Makefile is provided that attempts to derive a suitable
configuration for the most commonly used environments. To see the
default settings, type:
@code
% make info
@endcode

You can change the Makefile's behavior with the following options:

 - <b>omp_root</b>: The path to the top-level directory containing the top-level
   Makefile. By default, this will take on the value of the
   current working directory.

 - <b>omp_os</b>: Operating system. By default, the build will attempt to
   detect this. Currently supports "linux", "macos", and
   "windows".

 - <b>arch</b>: Architecture. By default, the build will attempt to
   detect this if not specified by the user. Currently
   supported values are
     - "32" for IA-32 architecture
     - "32e" for Intel&reg;&nbsp;64 architecture
     - "mic" for Intel&reg;&nbsp;Many Integrated Core Architecture (if
       "mic" is specified then "icc" will be used as the
       compiler, and appropriate k1om binutils will be used. The
       necessary packages must be installed on the build machine
       for this to be possible, but an
       Intel&reg;&nbsp;Xeon Phi&trade;&nbsp;coprocessor is not
       required to build the library).

 - <b>compiler</b>: Which compiler to use for the build. Defaults to "icc"
   or "icl" depending on the value of omp_os. Also supports
   "gcc" when omp_os is "linux" for gcc\other versions
   4.6.2 and higher. For icc on OS X\other, OS X\other versions
   greater than 10.6 are not supported currently. Also, icc
   version 13.0 is not supported. The selected compiler should be
   installed and in the user's path. The corresponding
   Fortran compiler should also be in the path.

 - <b>mode</b>: Library mode: default is "release". Also supports "debug".

To use any of the options above, simply add &lt;option_name&gt;=&lt;value&gt;. For
example, if you want to build with gcc instead of icc, type:
@code
% make compiler=gcc
@endcode

Under the hood of the top-level Makefile, the runtime is built by
a Perl script that in turn drives a detailed runtime system make. The
script can be found at <tt>tools/build.pl</tt>, and will print
information about all its flags and controls if invoked as
@code
% tools/build.pl --help
@endcode

If invoked with no arguments, it will try to build a set of libraries
that are appropriate for the machine on which the build is happening.
There are also many options for building out of tree and for configuring
library features. Consult the <tt>--help</tt> output for details.

@section sec_supported Supported RTL Build Configurations

The architectures supported are IA-32 architecture, Intel&reg;&nbsp;64, and
Intel&reg;&nbsp;Many Integrated Core Architecture. The build configurations
supported are shown in the table below.

<table border=1>
<tr><th> <th>icc/icl<th>gcc
<tr><td>Linux\other OS<td>Yes(1,5)<td>Yes(2,4)
<tr><td>OS X\other<td>Yes(1,3,4)<td>No
<tr><td>Windows\other OS<td>Yes(1,4)<td>No
</table>
(1) On IA-32 architecture and Intel&reg;&nbsp;64, icc/icl versions 12.x
    are supported (12.1 is recommended).<br>
(2) gcc version 4.6.2 is supported.<br>
(3) For icc on OS X\other, OS X\other version 10.5.8 is supported.<br>
(4) Intel&reg;&nbsp;Many Integrated Core Architecture not supported.<br>
(5) On Intel&reg;&nbsp;Many Integrated Core Architecture, icc/icl versions 13.0 or later are required.

@section sec_frontend Front-end Compilers that work with this RTL

The following compilers are known to do compatible code generation for
this RTL: icc/icl, gcc. Code generation is discussed in more detail
later in this document.

@section sec_outlining Outlining

The runtime interface is based on the idea that the compiler
"outlines" sections of code that are to run in parallel into separate
functions that can then be invoked in multiple threads. For instance,
simple code like this

@code
void foo()
{
#pragma omp parallel
    {
        ... do something ...
    }
}
@endcode
is converted into something that looks conceptually like this (where
the names used are merely illustrative; the real library function
names will be used later, after we've discussed some more issues).

@code
static void outlinedFooBody()
{
    ... do something ...
}

void foo()
{
    __OMP_runtime_fork(outlinedFooBody, (void*)0); // Not the real function name!
}
@endcode

@subsection SEC_SHAREDVARS Addressing shared variables

In real uses of the OpenMP\other API there are normally references
from the outlined code to shared variables that are in scope in the containing function.
Therefore the containing function must be able to address
these variables. The runtime supports two alternate ways of doing
this.

@subsubsection SEC_SEC_OT Current Technique
The technique currently supported by the runtime library is to receive
a separate pointer to each shared variable that can be accessed from
the outlined function. This is what is shown in the example below.

We hope soon to provide an alternative interface to support the
alternate implementation described in the next section. The
alternative implementation has performance advantages for small
parallel regions that have many shared variables.

@subsubsection SEC_SEC_PT Future Technique
The idea is to treat the outlined function as though it
were a lexically nested function, and pass it a single argument which
is the pointer to the parent's stack frame. Provided that the compiler
knows the layout of the parent frame when it is generating the outlined
function it can then access the up-level variables at appropriate
offsets from the parent frame. This is a classical compiler technique
from the 1960s to support languages like Algol (and its descendants)
that support lexically nested functions.

The main benefit of this technique is that there is no code required
at the fork point to marshal the arguments to the outlined function.
Since the runtime knows statically how many arguments must be passed to the
outlined function, it can easily copy them to the thread's stack
frame. Therefore the performance of the fork code is independent of
the number of shared variables that are accessed by the outlined
function.

If it is hard to determine the stack layout of the parent while generating the
outlined code, it is still possible to use this approach by collecting all of
the variables in the parent that are accessed from outlined functions into
a single `struct` which is placed on the stack, and whose address is passed
to the outlined functions. In this way the offsets of the shared variables
are known (since they are inside the struct) without needing to know
the complete layout of the parent stack-frame. From the point of view
of the runtime either of these techniques is equivalent, since in either
case it only has to pass a single argument to the outlined function to allow
it to access shared variables.

A scheme like this is how gcc\other generates outlined functions.

@section SEC_INTERFACES Library Interfaces
The library functions used for specific parts of the OpenMP\other language implementation
are documented in different modules.

 - @ref BASIC_TYPES fundamental types used by the runtime in many places
 - @ref DEPRECATED functions that are in the library but are no longer required
 - @ref STARTUP_SHUTDOWN functions for initializing and finalizing the runtime
 - @ref PARALLEL functions for implementing `omp parallel`
 - @ref THREAD_STATES functions for supporting thread state inquiries
 - @ref WORK_SHARING functions for work sharing constructs such as `omp for`, `omp sections`
 - @ref THREADPRIVATE functions to support thread private data, copyin etc.
 - @ref SYNCHRONIZATION functions to support `omp critical`, `omp barrier`, `omp master`, reductions etc.
 - @ref ATOMIC_OPS functions to support atomic operations
 - Documentation on tasking has still to be written...

@section SEC_EXAMPLES Examples
@subsection SEC_WORKSHARING_EXAMPLE Work Sharing Example
This example shows the code generated for a parallel for with reduction and dynamic scheduling.

@code
extern float foo( void );

int main () {
    int i;
    float r = 0.0;
    #pragma omp parallel for schedule(dynamic) reduction(+:r)
    for ( i = 0; i < 10; i ++ ) {
        r += foo();
    }
}
@endcode

The transformed code looks like this.
@code
extern float foo( void );

int main () {
    static int zero = 0;
    auto int gtid;
    auto float r = 0.0;
    __kmpc_begin( & loc3, 0 );
    // The gtid is not actually required in this example so could be omitted;
    // we show its initialization here because it is often required for calls into
    // the runtime and should be locally cached like this.
    gtid = __kmpc_global_thread_num( & loc3 );
    __kmpc_fork_call( & loc7, 1, main_7_parallel_3, & r );
    __kmpc_end( & loc0 );
    return 0;
}

struct main_10_reduction_t_5 { float r_10_rpr; };

static kmp_critical_name lck = { 0 };
static ident_t loc10; // loc10.flags should contain KMP_IDENT_ATOMIC_REDUCE bit set
                      // if the compiler has generated an atomic reduction.

void main_7_parallel_3( int *gtid, int *btid, float *r_7_shp ) {
    auto int i_7_pr;
    auto int lower, upper, liter, incr;
    auto struct main_10_reduction_t_5 reduce;
    reduce.r_10_rpr = 0.F;
    liter = 0;
    __kmpc_dispatch_init_4( & loc7, *gtid, 35, 0, 9, 1, 1 );
    while ( __kmpc_dispatch_next_4( & loc7, *gtid, & liter, & lower, & upper, & incr ) ) {
        for( i_7_pr = lower; upper >= i_7_pr; i_7_pr ++ )
            reduce.r_10_rpr += foo();
    }
    switch( __kmpc_reduce_nowait( & loc10, *gtid, 1, 4, & reduce, main_10_reduce_5, & lck ) ) {
    case 1:
        *r_7_shp += reduce.r_10_rpr;
        __kmpc_end_reduce_nowait( & loc10, *gtid, & lck );
        break;
    case 2:
        __kmpc_atomic_float4_add( & loc10, *gtid, r_7_shp, reduce.r_10_rpr );
        break;
    default:;
    }
}

void main_10_reduce_5( struct main_10_reduction_t_5 *reduce_lhs,
                       struct main_10_reduction_t_5 *reduce_rhs )
{
    reduce_lhs->r_10_rpr += reduce_rhs->r_10_rpr;
}
@endcode


@defgroup BASIC_TYPES Basic Types
Types that are used throughout the runtime.

@defgroup DEPRECATED Deprecated Functions
Functions in this group are for backwards compatibility only, and
should not be used in new code.

@defgroup STARTUP_SHUTDOWN Startup and Shutdown
These functions are for library initialization and shutdown.

@defgroup PARALLEL Parallel (fork/join)
These functions are used for implementing <tt>\#pragma omp parallel</tt>.

@defgroup THREAD_STATES Thread Information
These functions return information about the currently executing thread.

@defgroup WORK_SHARING Work Sharing
These functions are used for implementing
<tt>\#pragma omp for</tt>, <tt>\#pragma omp sections</tt>, <tt>\#pragma omp single</tt> and
<tt>\#pragma omp master</tt> constructs.

When handling loops, there are different functions for each of the signed and unsigned 32 and 64 bit integer types,
which have the name suffixes `_4`, `_4u`, `_8` and `_8u`. The semantics of each of the functions is the same,
so they are only described once.

Static loop scheduling is handled by @ref __kmpc_for_static_init_4 and friends. Only a single call is needed,
since the iterations to be executed by any given thread can be determined as soon as the loop parameters are known.

Dynamic scheduling is handled by the @ref __kmpc_dispatch_init_4 and @ref __kmpc_dispatch_next_4 functions.
The init function is called once in each thread outside the loop, while the next function is called each
time that the previous chunk of work has been exhausted.

@defgroup SYNCHRONIZATION Synchronization
These functions are used for implementing barriers.

@defgroup THREADPRIVATE Thread private data support
These functions support copyin/out and thread private data.

@defgroup TASKING Tasking support
These functions are used to implement tasking constructs.

*/