===================================
Customizing LLVMC: Reference Manual
===================================

LLVMC is a generic compiler driver, designed to be customizable and
extensible. It plays the same role for LLVM as the ``gcc`` program
does for GCC - LLVMC's job is essentially to transform a set of input
files into a set of targets depending on configuration rules and user
options. What makes LLVMC different is that these transformation rules
are completely customizable - in fact, LLVMC knows nothing about the
specifics of transformation (even the command-line options are mostly
not hard-coded) and regards the transformation structure as an
abstract graph. This makes it possible to adapt LLVMC for other
purposes - for example, as a build tool for game resources.

Because LLVMC employs TableGen [1]_ as its configuration language, you
need to be familiar with it to customize LLVMC.


.. contents::


Compiling with LLVMC
====================

LLVMC tries hard to be as compatible with ``gcc`` as possible,
although there are some small differences. Most of the time, however,
you shouldn't be able to notice them::

    $ # This works as expected:
    $ llvmc2 -O3 -Wall hello.cpp
    $ ./a.out
    hello

One nice feature of LLVMC is that one doesn't have to distinguish
between different compilers for different languages (think ``g++`` and
``gcc``) - the right toolchain is chosen automatically based on input
language names (which are, in turn, determined from file
extensions). If you want to override the language inferred from the
extension (for example, to compile a file ending with ".cpp" as C),
use the ``-x`` option, just like you would do it with ``gcc``::

    $ # hello.cpp is really a C file
    $ llvmc2 -x c hello.cpp
    $ ./a.out
    hello

On the other hand, when using LLVMC as a linker to combine several C++
object files you should provide the ``--linker`` option since it's
impossible for LLVMC to choose the right linker in that case::

    $ llvmc2 -c hello.cpp
    $ llvmc2 hello.o
    [A lot of link-time errors skipped]
    $ llvmc2 --linker=c++ hello.o
    $ ./a.out
    hello

Predefined options
==================

LLVMC has some built-in options that can't be overridden in the
configuration files:

* ``-o FILE`` - Output file name.

* ``-x LANGUAGE`` - Specify the language of the following input files
  until the next ``-x`` option.

* ``-v`` - Enable verbose mode, i.e. print out all executed commands.

* ``--view-graph`` - Show a graphical representation of the compilation
  graph. Requires that you have ``dot`` and ``gv`` commands
  installed. Hidden option, useful for debugging.

* ``--write-graph`` - Write a ``compilation-graph.dot`` file in the
  current directory with the compilation graph description in the
  Graphviz format. Hidden option, useful for debugging.


Customizing LLVMC: the compilation graph
========================================

At the time of writing, LLVMC does not support on-the-fly reloading of
configuration, so to customize LLVMC you'll have to recompile the
source code (which lives under ``$LLVM_DIR/tools/llvmc2``). The
default configuration files are ``Common.td`` (contains common
definitions, don't forget to ``include`` it in your configuration
files), ``Tools.td`` (tool descriptions) and ``Graph.td`` (compilation
graph definition).

To compile LLVMC with your own configuration file (say, ``MyGraph.td``),
run ``make`` like this::

    $ cd $LLVM_DIR/tools/llvmc2
    $ make GRAPH=MyGraph.td TOOLNAME=my_llvmc

This will build an executable named ``my_llvmc``. There are also
several sample configuration files in the ``llvmc2/examples``
subdirectory that should help to get you started.

Internally, LLVMC stores information about possible source
transformations in the form of a graph. Nodes in this graph represent
tools, and edges between two nodes represent a transformation path. A
special "root" node is used to mark entry points for the
transformations. LLVMC also assigns a weight to each edge (more on
this later) to choose between several alternative edges.

The definition of the compilation graph (see file ``Graph.td``) is
just a list of edges::

    def CompilationGraph : CompilationGraph<[
        Edge<root, llvm_gcc_c>,
        Edge<root, llvm_gcc_assembler>,
        ...

        Edge<llvm_gcc_c, llc>,
        Edge<llvm_gcc_cpp, llc>,
        ...

        OptionalEdge<llvm_gcc_c, opt, [(switch_on "opt")]>,
        OptionalEdge<llvm_gcc_cpp, opt, [(switch_on "opt")]>,
        ...

        OptionalEdge<llvm_gcc_assembler, llvm_gcc_cpp_linker,
            (case (input_languages_contain "c++"), (inc_weight),
                  (or (parameter_equals "linker", "g++"),
                      (parameter_equals "linker", "c++")), (inc_weight))>,
        ...

        ]>;

As you can see, the edges can be either default or optional, where
optional edges are differentiated by sporting a ``case`` expression
used to calculate the edge's weight.

The default edges are assigned a weight of 1, and optional edges get a
weight of 0 + 2*N where N is the number of tests that evaluated to
true in the ``case`` expression. It is also possible to provide an
integer parameter to ``inc_weight`` and ``dec_weight`` - in this case,
the weight is increased (or decreased) by the provided value instead
of the default 2.

When passing an input file through the graph, LLVMC picks the edge
with the maximum weight. To avoid ambiguity, there should be only one
default edge between two nodes (with the exception of the root node,
which gets a special treatment - there you are allowed to specify one
default edge *per language*).

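The weighting rule above can be illustrated with a small C++ sketch. It is not taken from the LLVMC sources: the ``Edge`` struct, ``edgeWeight`` and ``pickEdge`` are hypothetical names, only the default increment of 2 is modelled, and breaking ties in favor of the earlier edge is an assumption of the sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical model of an outgoing edge; not LLVMC's actual data structure.
struct Edge {
    std::string target;  // name of the tool this edge leads to
    bool optional;       // false for a default edge
    int testsPassed;     // number of `case` tests that evaluated to true
};

// Default edges weigh 1; optional edges weigh 0 + 2*N.
int edgeWeight(const Edge& e) {
    return e.optional ? 2 * e.testsPassed : 1;
}

// Returns the edge with the maximum weight, or nullptr if there are none.
// Ties go to the earlier edge (an assumption of this sketch).
const Edge* pickEdge(const std::vector<Edge>& edges) {
    const Edge* best = nullptr;
    for (const Edge& e : edges) {
        if (best == nullptr || edgeWeight(e) > edgeWeight(*best)) {
            best = &e;
        }
    }
    return best;
}
```

With one default edge to ``llc`` and one optional edge to ``opt``, the optional edge wins as soon as a single test holds (weight 2 against 1), and loses when none does (weight 0 against 1).
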
To get a visual representation of the compilation graph (useful for
debugging), run ``llvmc2 --view-graph``. You will need ``dot`` and
``gsview`` installed for this to work properly.


Writing a tool description
==========================

As was said earlier, nodes in the compilation graph represent tools,
which are described separately. A tool definition looks like this
(taken from the ``Tools.td`` file)::

    def llvm_gcc_cpp : Tool<[
        (in_language "c++"),
        (out_language "llvm-assembler"),
        (output_suffix "bc"),
        (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),
        (sink)
        ]>;

This defines a new tool called ``llvm_gcc_cpp``, which is an alias for
``llvm-g++``. As you can see, a tool definition is just a list of
properties; most of them should be self-explanatory. The ``sink``
property means that this tool should be passed all command-line
options that lack explicit descriptions.

The complete list of the currently implemented tool properties follows:

* Possible tool properties:

  - ``in_language`` - input language name.

  - ``out_language`` - output language name.

  - ``output_suffix`` - output file suffix.

  - ``cmd_line`` - the actual command used to run the tool. You can
    use ``$INFILE`` and ``$OUTFILE`` variables, output redirection
    with ``>``, hook invocations (``$CALL``), environment variables
    (via ``$ENV``) and the ``case`` construct (more on this below).

  - ``join`` - this tool is a "join node" in the graph, i.e. it gets a
    list of input files and joins them together. Used for linkers.

  - ``sink`` - all command-line options that are not handled by other
    tools are passed to this tool.

The next tool definition is slightly more complex::

    def llvm_gcc_linker : Tool<[
        (in_language "object-code"),
        (out_language "executable"),
        (output_suffix "out"),
        (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
        (join),
        (prefix_list_option "L", (forward),
                            (help "add a directory to link path")),
        (prefix_list_option "l", (forward),
                            (help "search a library when linking")),
        (prefix_list_option "Wl", (unpack_values),
                            (help "pass options to linker"))
        ]>;

This tool has a "join" property, which means that it behaves like a
linker. This tool also defines several command-line options: ``-l``,
``-L`` and ``-Wl`` which have their usual meaning. An option has two
attributes: a name and a (possibly empty) list of properties. All
currently implemented option types and properties are described below:

* Possible option types:

  - ``switch_option`` - a simple boolean switch, for example ``-time``.

  - ``parameter_option`` - option that takes an argument, for example
    ``-std=c99``.

  - ``parameter_list_option`` - same as the above, but more than one
    occurrence of the option is allowed.

  - ``prefix_option`` - same as the ``parameter_option``, but the option
    name and parameter value are not separated.

  - ``prefix_list_option`` - same as the above, but more than one
    occurrence of the option is allowed; example: ``-lm -lpthread``.


* Possible option properties:

  - ``append_cmd`` - append a string to the tool invocation command.

  - ``forward`` - forward this option unchanged.

  - ``output_suffix`` - modify the output suffix of this
    tool. Example: ``(switch_option "E", (output_suffix "i"))``.

  - ``stop_compilation`` - stop compilation after this phase.

  - ``unpack_values`` - used for splitting and forwarding
    comma-separated lists of options, e.g. ``-Wa,-foo=bar,-baz`` is
    converted to ``-foo=bar -baz`` and appended to the tool invocation
    command.

  - ``help`` - help string associated with this option. Used for
    ``--help`` output.

  - ``required`` - this option is obligatory.


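As an illustration of what ``unpack_values`` is described as doing, here is a minimal C++ sketch. The function name ``unpackValues`` is hypothetical and this is not LLVMC's implementation; skipping empty elements is an assumption of the sketch.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits an option such as "-Wa,-foo=bar,-baz" on commas and drops the
// first element (the option name itself), leaving the values that would
// be appended to the tool invocation command. Empty elements are skipped
// (an assumption of this sketch).
std::vector<std::string> unpackValues(const std::string& option) {
    std::vector<std::string> values;
    std::istringstream stream(option);
    std::string item;
    bool first = true;
    while (std::getline(stream, item, ',')) {
        if (first) {  // skip the option name, e.g. "-Wa"
            first = false;
            continue;
        }
        if (!item.empty()) {
            values.push_back(item);
        }
    }
    return values;
}
```

For the input ``-Wa,-foo=bar,-baz`` this yields the two values ``-foo=bar`` and ``-baz``, matching the example in the property list.
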
Using hooks and environment variables in the ``cmd_line`` property
==================================================================

Normally, LLVMC executes programs from the system ``PATH``. Sometimes,
this is not sufficient: for example, we may want to specify tool names
in the configuration file. This can be achieved via the mechanism of
hooks - to compile LLVMC with your hooks, just drop a .cpp file into
``tools/llvmc2`` directory. Hooks should live in the ``hooks``
namespace and have the signature
``std::string hooks::MyHookName (void)``. They can be used from the
``cmd_line`` tool property::

    (cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)")

It is also possible to use environment variables in the same manner::

    (cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)")

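A hook file might look like the following sketch. Only the ``hooks`` namespace and the ``std::string (void)`` signature come from the text above; the body, the ``MY_TOOLS_DIR`` variable and the fallback path are made up for illustration.

```cpp
#include <cstdlib>
#include <string>

namespace hooks {

// Hypothetical hook: returns the directory that holds the tool binaries.
// Reads the (made-up) MY_TOOLS_DIR environment variable and falls back
// to /usr/local/bin when it is unset or empty.
std::string MyHook() {
    const char* dir = std::getenv("MY_TOOLS_DIR");
    if (dir != nullptr && *dir != '\0') {
        return std::string(dir);
    }
    return std::string("/usr/local/bin");
}

}  // namespace hooks
```

A ``cmd_line`` string such as ``"$CALL(MyHook)/path/to/file"`` would then start with whatever directory the hook returns.
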
To change the command line string based on user-provided options use
the ``case`` expression (documented below)::

    (cmd_line
      (case
        (switch_on "E"),
           "llvm-g++ -E -x c $INFILE -o $OUTFILE",
        (default),
           "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm"))

Conditional evaluation: the ``case`` expression
===============================================

The ``case`` construct can be used to calculate weights of the optional
edges and to choose between several alternative command line strings
in the ``cmd_line`` tool property. It is designed after the
similarly-named construct in functional languages and takes the form
``(case (test_1), statement_1, (test_2), statement_2, ... (test_N),
statement_N)``. The statements are evaluated only if the corresponding
tests evaluate to true.

Examples::

    // Increases edge weight by 5 if "-A" is provided on the
    // command-line, and by 5 more if "-B" is also provided.
    (case
        (switch_on "A"), (inc_weight 5),
        (switch_on "B"), (inc_weight 5))

    // Evaluates to "cmdline1" if option "-A" is provided on the
    // command line, to "cmdline2" if only "-B" is provided, and to
    // "cmdline3" otherwise.
    (case
        (switch_on "A"), "cmdline1",
        (switch_on "B"), "cmdline2",
        (default), "cmdline3")

Note the slight difference in ``case`` expression handling in contexts
of edge weights and command line specification - in the second example
the value of the ``"B"`` switch is never checked when switch ``"A"`` is
enabled, and the whole expression always evaluates to ``"cmdline1"`` in
that case.

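The two evaluation modes can be contrasted in a short C++ sketch. This is an illustrative model only, not LLVMC code: ``Test``, ``evalWeight`` and ``evalCmdLine`` are hypothetical names, and tests are modelled as plain callables.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A test is modelled as a callable returning bool (hypothetical type).
using Test = std::function<bool()>;

// Edge-weight context: every clause whose test holds contributes.
int evalWeight(const std::vector<std::pair<Test, int>>& clauses) {
    int weight = 0;
    for (const auto& clause : clauses) {
        if (clause.first()) {
            weight += clause.second;
        }
    }
    return weight;
}

// cmd_line context: the first clause whose test holds wins, and the
// remaining tests are never evaluated.
std::string evalCmdLine(
    const std::vector<std::pair<Test, std::string>>& clauses) {
    for (const auto& clause : clauses) {
        if (clause.first()) {
            return clause.second;
        }
    }
    return "";  // no test matched; a trailing (default) clause avoids this
}
```

With both ``-A`` and ``-B`` given, the weight form adds up to 10, while the command-line form stops at the first match and yields ``cmdline1``.
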
Case expressions can also be nested, i.e. the following is legal::

    (case (switch_on "E"), (case (switch_on "o"), ..., (default), ...),
          (default), ...)

You should, however, try to avoid doing that because it hurts
readability. It is usually better to split tool descriptions and/or
use TableGen inheritance instead.

* Possible tests are:

  - ``switch_on`` - Returns true if a given command-line option is
    provided by the user. Example: ``(switch_on "opt")``. Note that
    you have to define all possible command-line options separately in
    the tool descriptions. See the section "Writing a tool description"
    for the discussion of different kinds of command-line options.

  - ``parameter_equals`` - Returns true if a command-line parameter equals
    a given value. Example: ``(parameter_equals "W", "all")``.

  - ``element_in_list`` - Returns true if a command-line parameter list
    includes a given value. Example: ``(element_in_list "l", "pthread")``.

  - ``input_languages_contain`` - Returns true if a given language
    belongs to the current input language set. Example:
    ``(input_languages_contain "c++")``.

  - ``in_language`` - Evaluates to true if the language of the input
    file equals the argument. Valid only when using a ``case``
    expression in a ``cmd_line`` tool property. Example:
    ``(in_language "c++")``.

  - ``not_empty`` - Returns true if a given option (which should be
    either a parameter or a parameter list) is set by the
    user. Example: ``(not_empty "o")``.

  - ``default`` - Always evaluates to true. Should always be the last
    test in the ``case`` expression.

  - ``and`` - A standard logical combinator that returns true iff all
    of its arguments return true. Used like this: ``(and (test1),
    (test2), ... (testN))``. Nesting of ``and`` and ``or`` is allowed,
    but not encouraged.

  - ``or`` - Another logical combinator that returns true iff at least
    one of its arguments returns true. Example: ``(or (test1),
    (test2), ... (testN))``.


Language map
============

One last thing that you will need to modify when adding support for a
new language to LLVMC is the language map, which defines mappings from
file extensions to language names. It is used to choose the proper
toolchain(s) for a given input file set. Language map definition is
located in the file ``Tools.td`` and looks like this::

    def LanguageMap : LanguageMap<
        [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
         LangToSuffixes<"c", ["c"]>,
         ...
        ]>;


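The lookup that the language map enables can be sketched in C++ as follows. This is only a model of the idea, not LLVMC's implementation: ``suffixToLanguage`` and ``languageOf`` are hypothetical names, and the table contains just the entries shown in the excerpt above.

```cpp
#include <map>
#include <string>

// Suffix-to-language table mirroring the LanguageMap excerpt
// (only the entries shown in the text).
const std::map<std::string, std::string>& suffixToLanguage() {
    static const std::map<std::string, std::string> table = {
        {"cc", "c++"},  {"cp", "c++"},  {"cxx", "c++"}, {"cpp", "c++"},
        {"CPP", "c++"}, {"c++", "c++"}, {"C", "c++"},   {"c", "c"},
    };
    return table;
}

// Returns the language name for a file name, or an empty string when
// the file has no suffix or the suffix is not in the table.
std::string languageOf(const std::string& fileName) {
    const std::string::size_type dot = fileName.rfind('.');
    if (dot == std::string::npos) {
        return "";
    }
    const auto& table = suffixToLanguage();
    const auto it = table.find(fileName.substr(dot + 1));
    return it == table.end() ? "" : it->second;
}
```

Note that the lookup is case-sensitive, which is what lets ``hello.C`` map to C++ while ``hello.c`` maps to C.
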
References
==========

.. [1] TableGen Fundamentals
       http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html