Tutorial - Writing LLVMC Configuration files
=============================================

LLVMC is a generic compiler driver, designed to be customizable and
extensible. It plays the same role for LLVM as the ``gcc`` program
does for GCC - LLVMC's job is essentially to transform a set of input
files into a set of targets depending on configuration rules and user
options. What makes LLVMC different is that these transformation rules
are completely customizable - in fact, LLVMC knows nothing about the
specifics of transformation (even the command-line options are mostly
not hard-coded) and regards the transformation structure as an
abstract graph. This makes it possible to adapt LLVMC for other
purposes - for example, as a build tool for game resources. This
tutorial describes the basic usage and configuration of LLVMC.

Because LLVMC employs TableGen [1]_ as its configuration language, you
need to be familiar with it to customize LLVMC.

Compiling with LLVMC
--------------------

In general, LLVMC tries to be command-line compatible with ``gcc`` as
much as possible, so most of the familiar options work::

    $ llvmc2 -O3 -Wall hello.cpp
    $ ./a.out
    hello

One nice feature of LLVMC is that you don't have to distinguish
between different compilers for different languages (think ``g++`` and
``gcc``) - the right toolchain is chosen automatically based on input
language names (which are, in turn, determined from file extensions). If
you want to override that choice, for example to compile a file ending
in ".cpp" as C, use the ``-x`` option, just as you would with ``gcc``::

    $ # hello.cpp is really a C file
    $ llvmc2 -x c hello.cpp
    $ ./a.out
    hello

On the other hand, when using LLVMC as a linker to combine several C++
object files, you should provide the ``--linker`` option, since LLVMC
cannot choose the right linker in that case::

    $ llvmc2 -c hello.cpp
    $ llvmc2 hello.o
    [A lot of link-time errors skipped]
    $ llvmc2 --linker=c++ hello.o
    $ ./a.out
    hello

For further help on command-line LLVMC usage, refer to the
``llvmc2 --help`` output.

Customizing LLVMC: the compilation graph
----------------------------------------

At the time of writing, LLVMC does not support on-the-fly reloading of
configuration, so to customize LLVMC you'll have to edit and recompile
the source code (which lives under ``$LLVM_DIR/tools/llvmc2``). The
relevant files are ``Common.td``, ``Tools.td`` and ``Example.td``.

Internally, LLVMC stores information about possible transformations in
the form of a graph. Nodes in this graph represent tools, and edges
between two nodes represent a transformation path. A special "root"
node represents entry points for the transformations. LLVMC also
assigns a weight to each edge (more on that below) to choose between
several alternative edges.

The definition of the compilation graph (see file ``Example.td``) is
just a list of edges::

    def CompilationGraph : CompilationGraph<[
        Edge<root, llvm_gcc_c>,
        Edge<root, llvm_gcc_assembler>,
        ...

        Edge<llvm_gcc_c, llc>,
        Edge<llvm_gcc_cpp, llc>,
        ...

        OptionalEdge<llvm_gcc_c, opt, [(switch_on "opt")]>,
        OptionalEdge<llvm_gcc_cpp, opt, [(switch_on "opt")]>,
        ...

        OptionalEdge<llvm_gcc_assembler, llvm_gcc_cpp_linker,
            [(if_input_languages_contain "c++"),
             (or (parameter_equals "linker", "g++"),
                 (parameter_equals "linker", "c++"))]>,
        ...

        ]>;

As you can see, edges can be either default or optional. Optional
edges carry a list of patterns (or edge properties) that are used to
calculate the edge's weight. Default edges are assigned a weight of 1,
and optional edges get a weight of 0 + 2*N, where N is the number of
successful edge property matches. When passing an input file through
the graph, LLVMC picks the edge with the maximum weight. To avoid
ambiguity, there should be only one default edge between two nodes
(with the exception of the root node, which gets special treatment -
there you are allowed to specify one default edge *per language*).

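As a worked example of the weight rule, consider the two edges that
leave ``llvm_gcc_c`` in the graph shown earlier. The comments in this
sketch are annotations added for illustration; they are not part of
``Example.td``::

    // Default edge: the weight is always 1.
    Edge<llvm_gcc_c, llc>,

    // Optional edge with a single property, so N is either 0 or 1:
    //   without -opt: weight = 0 + 2*0 = 0, so the edge to llc wins (1 > 0)
    //   with -opt:    weight = 0 + 2*1 = 2, so the edge to opt wins (2 > 1)
    OptionalEdge<llvm_gcc_c, opt, [(switch_on "opt")]>,

Because an optional edge with at least one matching property scores 2
or more, it takes precedence over the default edge from the same node.
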
* Possible edge properties are:

  - ``switch_on`` - Returns true if a given command-line option is
    provided by the user. Example: ``(switch_on "opt")``. Note that
    you have to define all possible command-line options separately in
    the tool descriptions. See the next section for a discussion of
    the different kinds of command-line options.

  - ``parameter_equals`` - Returns true if a command-line parameter equals
    a given value. Example: ``(parameter_equals "W", "all")``.

  - ``element_in_list`` - Returns true if a command-line parameter list
    includes a given value. Example: ``(element_in_list "l", "pthread")``.

  - ``if_input_languages_contain`` - Returns true if a given input
    language belongs to the current input language set.

  - ``and`` - Edge property combinator. Returns true if all of its
    arguments return true. Used like this:
    ``(and (prop1), (prop2), ... (propN))``. Nesting is allowed, but
    not encouraged.

  - ``or`` - Edge property combinator that returns true if any one of its
    arguments returns true. Example: ``(or (prop1), (prop2), ... (propN))``.

  - ``weight`` - Makes it possible to explicitly specify the quantity
    added to the edge weight if this edge property matches. Used like
    this: ``(weight N, (prop))``. The inner property can include
    ``and`` and ``or`` combinators. When N is equal to 2, this is
    equivalent to ``(prop)``.

    Example: ``(weight 8, (and (switch_on "a"), (switch_on "b")))``.

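Several edge properties can appear on a single edge. The following is
a hypothetical sketch rather than an excerpt from ``Example.td``: the
options ``a`` and ``b`` are placeholders borrowed from the ``weight``
example, and the totals in the comments assume that the contributions
of all matching properties are added together, as the description of
``weight`` suggests::

    OptionalEdge<llvm_gcc_c, opt,
        [// Plain property: adds 2 when -opt is given.
         (switch_on "opt"),
         // Explicit weight: adds 8 when both -a and -b are given.
         (weight 8, (and (switch_on "a"), (switch_on "b")))
        ]>,

    // Expected weight of this edge under the assumption above:
    //   no options given:  0
    //   -opt:              2
    //   -a -b:             8
    //   -opt -a -b:        10
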
To get a visual representation of the compilation graph (useful for
debugging), run ``llvmc2 --view-graph``. You will need ``dot`` and
``gsview`` installed for this to work properly.


Writing a tool description
--------------------------

As noted earlier, nodes in the compilation graph represent tools. A
tool definition looks like this (taken from the ``Tools.td`` file)::

    def llvm_gcc_cpp : Tool<[
        (in_language "c++"),
        (out_language "llvm-assembler"),
        (output_suffix "bc"),
        (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"),
        (sink)
        ]>;

This defines a new tool called ``llvm_gcc_cpp``, which is an alias for
``llvm-g++``. As you can see, a tool definition is just a list of
properties; most of them should be self-evident. The ``sink`` property
means that this tool should be passed all command-line options that
aren't handled by the other tools.

The complete list of the currently implemented tool properties follows:

* Possible tool properties:

  - ``in_language`` - input language name.

  - ``out_language`` - output language name.

  - ``output_suffix`` - output file suffix.

  - ``cmd_line`` - the actual command used to run the tool. You can use
    ``$INFILE`` and ``$OUTFILE`` variables, as well as output
    redirection with ``>``.

  - ``join`` - this tool is a "join node" in the graph, i.e. it gets a
    list of input files and joins them together. Used for linkers.

  - ``sink`` - all command-line options that are not handled by other
    tools are passed to this tool.

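To illustrate how these properties fit together outside a compiler
toolchain, here is a minimal sketch of a tool for packing game
resources. Everything specific in it is hypothetical: ``texpack`` is an
invented program, and the language names ``texture`` and
``packed-texture`` and the ``pak`` suffix are made up for the example::

    // Hypothetical tool; "texpack" is not part of LLVM.
    def texture_packer : Tool<[
        (in_language "texture"),
        (out_language "packed-texture"),
        (output_suffix "pak"),
        (cmd_line "texpack $INFILE -o $OUTFILE")
        ]>;

To be reachable, such a tool would presumably also need an edge in the
compilation graph, for example ``Edge<root, texture_packer>``, and a
matching entry in the language map.
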
The next tool definition is slightly more complex::

    def llvm_gcc_linker : Tool<[
        (in_language "object-code"),
        (out_language "executable"),
        (output_suffix "out"),
        (cmd_line "llvm-gcc $INFILE -o $OUTFILE"),
        (join),
        (prefix_list_option "L", (forward), (help "add a directory to link path")),
        (prefix_list_option "l", (forward), (help "search a library when linking")),
        (prefix_list_option "Wl", (unpack_values), (help "pass options to linker"))
        ]>;

This tool has a "join" property, which means that it behaves like a
linker (because of that, this tool should be the last in the
toolchain). This tool also defines several command-line options: ``-l``,
``-L`` and ``-Wl``, which have their usual meaning. An option has two
attributes: a name and a (possibly empty) list of properties. All
currently implemented option types and properties are described below:

* Possible option types:

  - ``switch_option`` - a simple boolean switch, for example ``-time``.

  - ``parameter_option`` - an option that takes an argument, for example
    ``-std=c99``.

  - ``parameter_list_option`` - same as the above, but more than one
    occurrence of the option is allowed.

  - ``prefix_option`` - same as ``parameter_option``, but the option name
    and parameter value are not separated.

  - ``prefix_list_option`` - same as the above, but more than one
    occurrence of the option is allowed; example: ``-lm -lpthread``.


* Possible option properties:

  - ``append_cmd`` - append a string to the tool invocation command.

  - ``forward`` - forward this option unchanged.

  - ``stop_compilation`` - stop compilation after this phase.

  - ``unpack_values`` - used for splitting and forwarding
    comma-separated lists of options, e.g. ``-Wa,-foo=bar,-baz`` is
    converted to ``-foo=bar -baz`` and appended to the tool invocation
    command.

  - ``help`` - help string associated with this option.

  - ``required`` - this option is obligatory.

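The option types and properties listed above can be sketched together
in one tool definition. This is a hypothetical example assembled from
the constructs described in this section, not a copy of ``Tools.td``:
the tool name ``my_assembler``, the ``assembler`` language name, the
command line, and the help strings are all assumptions made for
illustration::

    // Hypothetical tool definition, for illustration only.
    def my_assembler : Tool<[
        (in_language "assembler"),
        (out_language "object-code"),
        (output_suffix "o"),
        (cmd_line "llvm-gcc -c -x assembler $INFILE -o $OUTFILE"),
        // -c is a boolean switch; compilation stops after this tool.
        (switch_option "c", (stop_compilation),
            (help "compile and assemble, but do not link")),
        // -Wa,-foo=bar,-baz is split and appended as: -foo=bar -baz
        (prefix_list_option "Wa", (unpack_values),
            (help "pass options to assembler"))
        ]>;
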
Language map
------------

One last thing that you need to modify when adding support for a new
language to LLVMC is the language map, which defines mappings from
file extensions to language names. It is used to choose the proper
toolchain based on the input. The language map definition is located
in the file ``Tools.td`` and looks like this::

    def LanguageMap : LanguageMap<
        [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
         LangToSuffixes<"c", ["c"]>,
         ...
        ]>;

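Adding a language then amounts to adding one more ``LangToSuffixes``
entry. In the sketch below, the ``texture`` language name and its
suffixes are hypothetical and chosen only to show the shape of such an
entry; the name would have to match the ``in_language`` of some tool
for the mapping to be useful::

    def LanguageMap : LanguageMap<
        [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>,
         LangToSuffixes<"c", ["c"]>,
         // Hypothetical entry: treat .png and .tga files as "texture" input.
         LangToSuffixes<"texture", ["png", "tga"]>,
         ...
        ]>;
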
References
==========

.. [1] TableGen Fundamentals
       http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html