===================================
Customizing LLVMC: Reference Manual
===================================

LLVMC is a generic compiler driver, designed to be customizable and
extensible. It plays the same role for LLVM as the ``gcc`` program
does for GCC - LLVMC's job is essentially to transform a set of input
files into a set of targets depending on configuration rules and user
options. What makes LLVMC different is that these transformation rules
are completely customizable - in fact, LLVMC knows nothing about the
specifics of transformation (even the command-line options are mostly
not hard-coded) and regards the transformation structure as an
abstract graph. This makes it possible to adapt LLVMC for other
purposes - for example, as a build tool for game resources.

Because LLVMC employs TableGen [1]_ as its configuration language, you
need to be familiar with it to customize LLVMC.


.. contents::


Compiling with LLVMC
====================

LLVMC tries hard to be as compatible with ``gcc`` as possible,
although there are some small differences. Most of the time, however,
you shouldn't be able to notice them::

    $ # This works as expected:
    $ llvmc2 -O3 -Wall hello.cpp
    $ ./a.out
    hello

One nice feature of LLVMC is that one doesn't have to distinguish
between different compilers for different languages (think ``g++`` and
``gcc``) - the right toolchain is chosen automatically based on input
language names (which are, in turn, determined from file
extensions). If you want to override the language deduced from the
extension (for example, to compile a file ending with ".cpp" as C),
use the ``-x`` option, just as you would with ``gcc``::

    $ llvmc2 -x c hello.cpp
    $ # hello.cpp is really a C file
    $ ./a.out
    hello

On the other hand, when using LLVMC as a linker to combine several C++
object files you should provide the ``--linker`` option since it's
impossible for LLVMC to choose the right linker in that case::

    $ llvmc2 -c hello.cpp
    $ llvmc2 hello.o
    [A lot of link-time errors skipped]
    $ llvmc2 --linker=c++ hello.o
    $ ./a.out
    hello

Predefined options
==================

LLVMC has some built-in options that can't be overridden in the
configuration files:

* ``-o FILE`` - Output file name.

* ``-x LANGUAGE`` - Specify the language of the following input files
  until the next ``-x`` option.

* ``-v`` - Enable verbose mode, i.e. print out all executed commands.

* ``--view-graph`` - Show a graphical representation of the compilation
  graph. Requires that you have ``dot`` and ``gv`` commands
  installed. Hidden option, useful for debugging.

* ``--write-graph`` - Write a ``compilation-graph.dot`` file in the
  current directory with the compilation graph description in the
  Graphviz format. Hidden option, useful for debugging.
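The scoping rule of ``-x`` (it applies to the following input files until
the next ``-x``) can be illustrated with a small C++ sketch. This is not
LLVMC's code: ``assignLanguagesSketch`` is a hypothetical helper, and it
assumes that every argument other than ``-x`` and its value is an input
file. An empty language means that the language is still deduced from the
file extension.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of LLVMC: pairs every input file with the
// language forced by the most recent "-x LANGUAGE" seen before it.
// Assumption: all arguments other than "-x" and its value are input files.
std::vector<std::pair<std::string, std::string>>
assignLanguagesSketch(const std::vector<std::string> &args) {
    std::vector<std::pair<std::string, std::string>> result;
    std::string forced; // empty: deduce the language from the extension
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-x" && i + 1 < args.size()) {
            forced = args[++i]; // stays in effect until the next -x
            continue;
        }
        result.emplace_back(args[i], forced);
    }
    return result;
}
```

For the arguments ``a.c -x c++ b.c -x c c.cpp`` the sketch leaves ``a.c``
unforced, marks ``b.c`` as ``c++`` and marks ``c.cpp`` as ``c``.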


Customizing LLVMC: the compilation graph
========================================

At the time of writing LLVMC does not support on-the-fly reloading of
configuration, so to customize LLVMC you'll have to recompile the
source code (which lives under ``$LLVM_DIR/tools/llvmc2``). The
default configuration files are ``Common.td`` (contains common
definitions, don't forget to ``include`` it in your configuration
files), ``Tools.td`` (tool descriptions) and ``Graph.td`` (compilation
graph definition).

To compile LLVMC with your own configuration file (say, ``MyGraph.td``),
run ``make`` like this::

    $ cd $LLVM_DIR/tools/llvmc2
    $ make GRAPH=MyGraph.td TOOLNAME=my_llvmc

This will build an executable named ``my_llvmc``. There are also
several sample configuration files in the ``llvmc2/examples``
subdirectory that should help to get you started.

Internally, LLVMC stores information about possible source
transformations in the form of a graph. Nodes in this graph represent
tools, and edges between two nodes represent a transformation path. A
special "root" node is used to mark entry points for the
transformations. LLVMC also assigns a weight to each edge (more on
this later) to choose between several alternative edges.

The definition of the compilation graph (see file ``Graph.td``) is
just a list of edges::

    def CompilationGraph : CompilationGraph<[
        Edge<root, llvm_gcc_c>,
        Edge<root, llvm_gcc_assembler>,
        ...

        Edge<llvm_gcc_c, llc>,
        Edge<llvm_gcc_cpp, llc>,
        ...

        OptionalEdge<llvm_gcc_c, opt, [(switch_on "opt")]>,
        OptionalEdge<llvm_gcc_cpp, opt, [(switch_on "opt")]>,
        ...

        OptionalEdge<llvm_gcc_assembler, llvm_gcc_cpp_linker,
            (case (input_languages_contain "c++"), (inc_weight),
                  (or (parameter_equals "linker", "g++"),
                      (parameter_equals "linker", "c++")), (inc_weight))>,
        ...

        ]>;

As you can see, the edges can be either default or optional, where
optional edges are differentiated by sporting a ``case`` expression
used to calculate the edge's weight.

The default edges are assigned a weight of 1, and optional edges get a
weight of 0 + 2*N where N is the number of tests that evaluated to
true in the ``case`` expression. It is also possible to provide an
integer parameter to ``inc_weight`` and ``dec_weight`` - in this case,
the weight is increased (or decreased) by the provided value instead
of the default 2.

When passing an input file through the graph, LLVMC picks the edge
with the maximum weight. To avoid ambiguity, there should be only one
default edge between two nodes (with the exception of the root node,
which gets a special treatment - there you are allowed to specify one
default edge *per language*).
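The weight rules described above can be modelled in a few lines of C++.
This is only a sketch of the arithmetic, not LLVMC's implementation:
``EdgeSketch``, ``edgeWeight`` and ``pickEdge`` are hypothetical names, and
keeping the first edge on a tie is an assumption, since the text does not
say how ties are resolved.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one outgoing edge; not LLVMC's real data structure.
struct EdgeSketch {
    std::string target;          // tool the edge points to
    bool optional;               // false for a default edge
    std::vector<int> increments; // one entry per test that evaluated to true
                                 // (2 unless inc_weight got an explicit value)
};

// Default edges weigh 1; optional edges start at 0 and add the increment
// of every test that evaluated to true.
int edgeWeight(const EdgeSketch &edge) {
    if (!edge.optional)
        return 1;
    int weight = 0;
    for (int inc : edge.increments)
        weight += inc;
    return weight;
}

// Returns the index of the edge with the maximum weight.
// Assumptions: `edges` is non-empty, and the first edge wins a tie.
std::size_t pickEdge(const std::vector<EdgeSketch> &edges) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < edges.size(); ++i)
        if (edgeWeight(edges[i]) > edgeWeight(edges[best]))
            best = i;
    return best;
}
```

With one test evaluating to true, an optional edge weighs 2 and beats a
default edge of weight 1. With no test evaluating to true it weighs 0 and
the default edge is taken.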

To get a visual representation of the compilation graph (useful for
debugging), run ``llvmc2 --view-graph``. You will need ``dot`` and
``gv`` installed for this to work properly.


Writing a tool description
==========================

As was said earlier, nodes in the compilation graph represent tools,
which are described separately. A tool definition looks like this
(taken from the ``Tools.td`` file)::

    def llvm_gcc_cpp : Tool<[
        (in_language "c++"),
        (out_language "llvm-assembler"),
        (output_suffix "bc"),
        (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),
        (sink)
        ]>;

This defines a new tool called ``llvm_gcc_cpp``, which is an alias for
``llvm-g++``. As you can see, a tool definition is just a list of
properties; most of them should be self-explanatory. The ``sink``
property means that this tool should be passed all command-line
options that lack explicit descriptions.

The complete list of the currently implemented tool properties follows:

* Possible tool properties:

  - ``in_language`` - input language name.

  - ``out_language`` - output language name.

  - ``output_suffix`` - output file suffix.

  - ``cmd_line`` - the actual command used to run the tool. You can
    use ``$INFILE`` and ``$OUTFILE`` variables, output redirection
    with ``>``, hook invocations (``$CALL``), environment variables
    (via ``$ENV``) and the ``case`` construct (more on this below).

  - ``join`` - this tool is a "join node" in the graph, i.e. it gets a
    list of input files and joins them together. Used for linkers.

  - ``sink`` - all command-line options that are not handled by other
    tools are passed to this tool.

The next tool definition is slightly more complex::

    def llvm_gcc_linker : Tool<[
        (in_language "object-code"),
        (out_language "executable"),
        (output_suffix "out"),
        (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
        (join),
        (prefix_list_option "L", (forward),
                            (help "add a directory to link path")),
        (prefix_list_option "l", (forward),
                            (help "search a library when linking")),
        (prefix_list_option "Wl", (unpack_values),
                            (help "pass options to linker"))
        ]>;

This tool has a "join" property, which means that it behaves like a
linker. This tool also defines several command-line options: ``-l``,
``-L`` and ``-Wl`` which have their usual meaning. An option has two
attributes: a name and a (possibly empty) list of properties. All
currently implemented option types and properties are described below:

* Possible option types:

  - ``switch_option`` - a simple boolean switch, for example ``-time``.

  - ``parameter_option`` - option that takes an argument, for example
    ``-std=c99``.

  - ``parameter_list_option`` - same as the above, but more than one
    occurrence of the option is allowed.

  - ``prefix_option`` - same as the ``parameter_option``, but the option
    name and parameter value are not separated.

  - ``prefix_list_option`` - same as the above, but more than one
    occurrence of the option is allowed; example: ``-lm -lpthread``.

  - ``alias_option`` - a special option type for creating
    aliases. Unlike other option types, aliases are not allowed to
    have any properties besides the aliased option name. Usage
    example: ``(alias_option "preprocess", "E")``


* Possible option properties:

  - ``append_cmd`` - append a string to the tool invocation command.

  - ``forward`` - forward this option unchanged.

  - ``output_suffix`` - modify the output suffix of this
    tool. Example: ``(switch_option "E", (output_suffix "i"))``.

  - ``stop_compilation`` - stop compilation after this phase.

  - ``unpack_values`` - used for splitting and forwarding
    comma-separated lists of options, e.g. ``-Wa,-foo=bar,-baz`` is
    converted to ``-foo=bar -baz`` and appended to the tool invocation
    command.

  - ``help`` - help string associated with this option. Used for
    ``--help`` output.

  - ``required`` - this option is obligatory.
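The effect of ``unpack_values`` on an argument such as
``-Wa,-foo=bar,-baz`` can be sketched in C++ as follows. The function
``unpackValuesSketch`` is a hypothetical illustration of the splitting step
only and does not claim to match LLVMC's actual code. Skipping empty pieces
is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration, not LLVMC's code: drops the option name
// (everything before the first comma) and returns the remaining
// comma-separated pieces in order. Empty pieces are skipped (assumption).
std::vector<std::string> unpackValuesSketch(const std::string &arg) {
    std::vector<std::string> pieces;
    std::size_t pos = arg.find(',');
    while (pos != std::string::npos) {
        std::size_t next = arg.find(',', pos + 1);
        std::string piece =
            (next == std::string::npos) ? arg.substr(pos + 1)
                                        : arg.substr(pos + 1, next - pos - 1);
        if (!piece.empty())
            pieces.push_back(piece);
        pos = next;
    }
    return pieces;
}
```

Applied to ``-Wa,-foo=bar,-baz`` the sketch yields the two pieces
``-foo=bar`` and ``-baz``, which would then be appended to the tool
invocation command.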


Option list - specifying all options in a single place
======================================================

It can be handy to have all information about options gathered in a
single place to provide an overview. This can be achieved by using a
so-called ``OptionList``::

    def Options : OptionList<[
        (switch_option "E", (help "Help string")),
        (alias_option "quiet", "q")
        ...
        ]>;

``OptionList`` is also a good place to specify option aliases.

Tool-specific option properties like ``append_cmd`` have (obviously)
no meaning in the context of ``OptionList``, so the only properties
allowed there are ``help`` and ``required``.

Option lists are used at the file scope. See file
``examples/Clang.td`` for an example of ``OptionList`` usage.

Using hooks and environment variables in the ``cmd_line`` property
==================================================================

Normally, LLVMC executes programs from the system ``PATH``. Sometimes,
this is not sufficient: for example, we may want to specify tool names
in the configuration file. This can be achieved via the mechanism of
hooks - to compile LLVMC with your hooks, just drop a .cpp file into
the ``tools/llvmc2`` directory. Hooks should live in the ``hooks``
namespace and have the signature ``std::string hooks::MyHookName
(void)``. They can be used from the ``cmd_line`` tool property::

    (cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)")
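A hook itself might look like the sketch below. Only the ``hooks``
namespace and the signature are taken from the description above. The
environment variable ``MY_TOOLS_DIR`` and the fallback path are made-up
placeholders chosen for illustration.

```cpp
#include <cstdlib>
#include <string>

namespace hooks {

// Hypothetical hook: returns the directory prefix to use for a tool.
// MY_TOOLS_DIR and "/usr/local/bin" are illustrative placeholders and
// are not defined anywhere in LLVMC.
std::string MyHook(void) {
    const char *dir = std::getenv("MY_TOOLS_DIR");
    if (dir && *dir)
        return std::string(dir);
    return std::string("/usr/local/bin");
}

} // namespace hooks
```

With this hook in place, ``$CALL(MyHook)/path/to/file`` would presumably
expand to the returned directory followed by ``/path/to/file``.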

It is also possible to use environment variables in the same manner::

    (cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)")
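How such a command line could be expanded is sketched below. This is an
illustration, not LLVMC's implementation: ``expandEnvSketch`` is a
hypothetical function, it takes the variables from a map instead of the
process environment so that the example stays self-contained, and expanding
an unknown variable to the empty string is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical illustration, not LLVMC's code: replaces every $ENV(NAME)
// in `cmd` with the value stored for NAME in `env`.
// Assumption: unknown names expand to the empty string.
std::string expandEnvSketch(const std::string &cmd,
                            const std::map<std::string, std::string> &env) {
    const std::string marker = "$ENV(";
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t start = cmd.find(marker, pos);
        std::size_t close = (start == std::string::npos)
                                ? std::string::npos
                                : cmd.find(')', start + marker.size());
        if (close == std::string::npos) {
            out += cmd.substr(pos); // no further complete $ENV(...) reference
            break;
        }
        out += cmd.substr(pos, start - pos);
        std::string name = cmd.substr(start + marker.size(),
                                      close - start - marker.size());
        std::map<std::string, std::string>::const_iterator it = env.find(name);
        if (it != env.end())
            out += it->second;
        pos = close + 1;
    }
    return out;
}
```

With ``VAR1`` set to ``/opt/tools`` and ``VAR2`` set to ``out.bin``, the
command line from the example above becomes
``/opt/tools/path/to/file -o out.bin``.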

To change the command line string based on user-provided options use
the ``case`` expression (documented below)::

    (cmd_line
      (case
        (switch_on "E"),
           "llvm-g++ -E -x c $INFILE -o $OUTFILE",
        (default),
           "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm"))

Conditional evaluation: the ``case`` expression
===============================================

The ``case`` construct can be used to calculate weights of the optional
edges and to choose between several alternative command line strings
in the ``cmd_line`` tool property. It is designed after the
similarly-named construct in functional languages and takes the form
``(case (test_1), statement_1, (test_2), statement_2, ... (test_N),
statement_N)``. The statements are evaluated only if the corresponding
tests evaluate to true.

Examples::

    // Increases edge weight by 5 if "-A" is provided on the
    // command-line, and by 5 more if "-B" is also provided.
    (case
        (switch_on "A"), (inc_weight 5),
        (switch_on "B"), (inc_weight 5))

    // Evaluates to "cmdline1" if option "-A" is provided on the
    // command line; otherwise to "cmdline2" if "-B" is provided,
    // and to "cmdline3" if neither is.
    (case
        (switch_on "A"), "cmdline1",
        (switch_on "B"), "cmdline2",
        (default), "cmdline3")

Note the slight difference in ``case`` expression handling in contexts
of edge weights and command line specification - in the second example
the value of the ``"B"`` switch is never checked when switch ``"A"`` is
enabled, and the whole expression always evaluates to ``"cmdline1"`` in
that case.
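The difference between the two contexts can be made concrete with a small
C++ model. It is a sketch of the semantics described above and not code
from LLVMC: ``caseWeightSketch`` and ``caseCmdSketch`` are hypothetical
names, and only ``switch_on`` tests are modelled.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Names of the switches that the user provided on the command line.
typedef std::set<std::string> SwitchSet;

// Edge-weight context: every clause whose switch is on contributes its
// increment, so several clauses can apply at once.
int caseWeightSketch(
    const std::vector<std::pair<std::string, int> > &clauses,
    const SwitchSet &on) {
    int weight = 0;
    for (std::size_t i = 0; i < clauses.size(); ++i)
        if (on.count(clauses[i].first))
            weight += clauses[i].second;
    return weight;
}

// Command-line context: the first clause whose switch is on wins and later
// clauses are never checked; `fallback` plays the role of (default).
std::string caseCmdSketch(
    const std::vector<std::pair<std::string, std::string> > &clauses,
    const SwitchSet &on, const std::string &fallback) {
    for (std::size_t i = 0; i < clauses.size(); ++i)
        if (on.count(clauses[i].first))
            return clauses[i].second;
    return fallback;
}
```

With both ``-A`` and ``-B`` given, the first function adds both increments
(5 + 5 = 10 for the first example), while the second one stops at the
clause for ``"A"`` and returns ``"cmdline1"``.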

Case expressions can also be nested, i.e. the following is legal::

    (case (switch_on "E"), (case (switch_on "o"), ..., (default), ...),
          (default), ...)

You should, however, try to avoid doing that because it hurts
readability. It is usually better to split tool descriptions and/or
use TableGen inheritance instead.

* Possible tests are:

  - ``switch_on`` - Returns true if a given command-line option is
    provided by the user. Example: ``(switch_on "opt")``. Note that
    you have to define all possible command-line options separately in
    the tool descriptions. See the section on writing a tool
    description for the discussion of different kinds of command-line
    options.

  - ``parameter_equals`` - Returns true if a command-line parameter equals
    a given value. Example: ``(parameter_equals "W", "all")``.

  - ``element_in_list`` - Returns true if a command-line parameter list
    includes a given value. Example: ``(element_in_list "l", "pthread")``.

  - ``input_languages_contain`` - Returns true if a given language
    belongs to the current input language set. Example:
    ``(input_languages_contain "c++")``.

  - ``in_language`` - Evaluates to true if the language of the input
    file equals the argument. Valid only when using the ``case``
    expression in a ``cmd_line`` tool property. Example:
    ``(in_language "c++")``.

  - ``not_empty`` - Returns true if a given option (which should be
    either a parameter or a parameter list) is set by the
    user. Example: ``(not_empty "o")``.

  - ``default`` - Always evaluates to true. Should always be the last
    test in the ``case`` expression.

  - ``and`` - A standard logical combinator that returns true iff all
    of its arguments return true. Used like this: ``(and (test1),
    (test2), ... (testN))``. Nesting of ``and`` and ``or`` is allowed,
    but not encouraged.

  - ``or`` - Another logical combinator that returns true iff at least
    one of its arguments returns true. Example: ``(or (test1),
    (test2), ... (testN))``.


Language map
============

One last thing that you will need to modify when adding support for a
new language to LLVMC is the language map, which defines mappings from
file extensions to language names. It is used to choose the proper
toolchain(s) for a given input file set. The language map definition is
located in the file ``Tools.td`` and looks like this::

    def LanguageMap : LanguageMap<
        [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
         LangToSuffixes<"c", ["c"]>,
         ...
        ]>;
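Conceptually, the language map is a lookup from file suffix to language
name. The C++ sketch below illustrates that idea under stated assumptions:
``languageForSketch`` is a hypothetical helper, the suffix is taken to be
everything after the last dot, and an unknown suffix yields an empty
string. LLVMC's real lookup may differ.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper, not LLVMC's code: returns the language registered
// for the suffix of `file`, or an empty string if there is none.
// Assumption: the suffix is the text after the last '.' and the match is
// case-sensitive (so "c" and "C" can map to different languages).
std::string languageForSketch(
    const std::string &file,
    const std::map<std::string, std::string> &suffixToLanguage) {
    std::size_t dot = file.rfind('.');
    if (dot == std::string::npos)
        return std::string();
    std::map<std::string, std::string>::const_iterator it =
        suffixToLanguage.find(file.substr(dot + 1));
    return it == suffixToLanguage.end() ? std::string() : it->second;
}
```

With the entries from the definition above, ``hello.cpp`` and ``hello.C``
both resolve to ``c++``, while ``hello.c`` resolves to ``c``.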


References
==========

.. [1] TableGen Fundamentals
       http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html