| =================================== |
| Customizing LLVMC: Reference Manual |
| =================================== |
| |
| LLVMC is a generic compiler driver, designed to be customizable and |
| extensible. It plays the same role for LLVM as the ``gcc`` program |
| does for GCC - LLVMC's job is essentially to transform a set of input |
| files into a set of targets depending on configuration rules and user |
| options. What makes LLVMC different is that these transformation rules |
| are completely customizable - in fact, LLVMC knows nothing about the |
| specifics of transformation (even the command-line options are mostly |
| not hard-coded) and regards the transformation structure as an |
| abstract graph. The structure of this graph is completely determined |
| by plugins, which can be either statically or dynamically linked. This |
| makes it possible to easily adapt LLVMC for other purposes - for |
| example, as a build tool for game resources. |
| |
| Because LLVMC employs TableGen [1]_ as its configuration language, you |
| need to be familiar with it to customize LLVMC. |
| |
| |
| .. contents:: |
| |
| |
| Compiling with LLVMC |
| ==================== |
| |
| LLVMC tries hard to be as compatible with ``gcc`` as possible, |
| although there are some small differences. Most of the time, however, |
| you shouldn't be able to notice them:: |
| |
| $ # This works as expected: |
| $ llvmc2 -O3 -Wall hello.cpp |
| $ ./a.out |
| hello |
| |
| One nice feature of LLVMC is that one doesn't have to distinguish |
| between different compilers for different languages (think ``g++`` and |
| ``gcc``) - the right toolchain is chosen automatically based on input |
| language names (which are, in turn, determined from file |
| extensions). If you want to override this choice, for example to |
| compile a file ending with ".cpp" as C, use the ``-x`` option, just as |
| you would with ``gcc``:: |
| |
| $ llvmc2 -x c hello.cpp |
| $ # hello.cpp is really a C file |
| $ ./a.out |
| hello |
| |
| On the other hand, when using LLVMC as a linker to combine several C++ |
| object files you should provide the ``--linker`` option since it's |
| impossible for LLVMC to choose the right linker in that case:: |
| |
| $ llvmc2 -c hello.cpp |
| $ llvmc2 hello.o |
| [A lot of link-time errors skipped] |
| $ llvmc2 --linker=c++ hello.o |
| $ ./a.out |
| hello |
| |
| |
| Predefined options |
| ================== |
| |
| LLVMC has some built-in options that can't be overridden in the |
| configuration files: |
| |
| * ``-o FILE`` - Output file name. |
| |
| * ``-x LANGUAGE`` - Specify the language of the following input files |
| until the next ``-x`` option. |
| |
| * ``-load PLUGIN_NAME`` - Load the specified plugin DLL. Example: |
| ``-load $LLVM_DIR/Release/lib/LLVMCSimple.so``. |
| |
| * ``-v`` - Enable verbose mode, i.e. print out all executed commands. |
| |
| * ``--view-graph`` - Show a graphical representation of the compilation |
| graph. Requires that you have ``dot`` and ``gv`` programs |
| installed. Hidden option, useful for debugging. |
| |
| * ``--write-graph`` - Write a ``compilation-graph.dot`` file in the |
| current directory with the compilation graph description in the |
| Graphviz format. Hidden option, useful for debugging. |
| |
| * ``--save-temps`` - Write temporary files to the current directory |
| and do not delete them on exit. Hidden option, useful for debugging. |
| |
| * ``--help``, ``--help-hidden``, ``--version`` - These options have |
| their standard meaning. |
| |
| |
| Compiling LLVMC plugins |
| ======================= |
| |
| It's easiest to start working on your own LLVMC plugin by copying the |
| skeleton project which lives under ``$LLVMC_DIR/plugins/Simple``:: |
| |
| $ cd $LLVMC_DIR/plugins |
| $ cp -r Simple MyPlugin |
| $ cd MyPlugin |
| $ ls |
| Makefile PluginMain.cpp Simple.td |
| |
| As you can see, our basic plugin consists of only two files (not |
| counting the build script). ``Simple.td`` contains the TableGen |
| description of the compilation graph; its format is documented in the |
| following sections. ``PluginMain.cpp`` is just a helper file used to |
| compile the auto-generated C++ code produced from TableGen source. It |
| can also contain hook definitions (see `below`__). |
| |
| __ hooks_ |
| |
| The first thing that you should do is to change the ``LLVMC_PLUGIN`` |
| variable in the ``Makefile`` to avoid conflicts (since this variable |
| is used to name the resulting library):: |
| |
| LLVMC_PLUGIN=MyPlugin |
| |
| It is also a good idea to rename ``Simple.td`` to something less |
| generic:: |
| |
| $ mv Simple.td MyPlugin.td |
| |
| Note that the plugin source directory should be placed under |
| ``$LLVMC_DIR/plugins`` to make use of the existing build |
| infrastructure. To build a version of the LLVMC executable called |
| ``mydriver`` with your plugin compiled in, use the following command:: |
| |
| $ cd $LLVMC_DIR |
| $ make BUILTIN_PLUGINS=MyPlugin DRIVER_NAME=mydriver |
| |
| When linking plugins dynamically, you'll usually want a 'bare-bones' |
| version of LLVMC that has no built-in plugins. It can be compiled with |
| the following command:: |
| |
| $ cd $LLVMC_DIR |
| $ make BUILTIN_PLUGINS="" |
| |
| To build your plugin as a dynamic library, just ``cd`` to its source |
| directory and run ``make``. The resulting file will be called |
| ``LLVMC$(LLVMC_PLUGIN).$(DLL_EXTENSION)`` (in our case, |
| ``LLVMCMyPlugin.so``). This library can then be loaded with the |
| ``-load`` option. Example:: |
| |
| $ cd $LLVMC_DIR/plugins/Simple |
| $ make |
| $ llvmc2 -load $LLVM_DIR/Release/lib/LLVMCSimple.so |
| |
| In the future LLVMC will be able to load TableGen files directly. |
| |
| |
| Customizing LLVMC: the compilation graph |
| ======================================== |
| |
| Each TableGen configuration file should include the common |
| definitions:: |
| |
| include "llvm/CompilerDriver/Common.td" |
| // And optionally: |
| // include "llvm/CompilerDriver/Tools.td" |
| // which contains tool definitions. |
| |
| Internally, LLVMC stores information about possible source |
| transformations in the form of a graph. Nodes in this graph represent |
| tools, and edges between two nodes represent a transformation path. A |
| special "root" node is used to mark entry points for the |
| transformations. LLVMC also assigns a weight to each edge (more on |
| this later) to choose between several alternative edges. |
| |
| The definition of the compilation graph (see file |
| ``plugins/Base/Base.td`` for an example) is just a list of edges:: |
| |
| def CompilationGraph : CompilationGraph<[ |
| Edge<root, llvm_gcc_c>, |
| Edge<root, llvm_gcc_assembler>, |
| ... |
| |
| Edge<llvm_gcc_c, llc>, |
| Edge<llvm_gcc_cpp, llc>, |
| ... |
| |
| OptionalEdge<llvm_gcc_c, opt, (case (switch_on "opt"), (inc_weight))>, |
| OptionalEdge<llvm_gcc_cpp, opt, (case (switch_on "opt"), (inc_weight))>, |
| ... |
| |
| OptionalEdge<llvm_gcc_assembler, llvm_gcc_cpp_linker, |
| (case (input_languages_contain "c++"), (inc_weight), |
| (or (parameter_equals "linker", "g++"), |
| (parameter_equals "linker", "c++")), (inc_weight))>, |
| ... |
| |
| ]>; |
| |
| As you can see, the edges can be either default or optional, where |
| optional edges are differentiated by an additional ``case`` expression |
| used to calculate the weight of this edge. |
| |
| The default edges are assigned a weight of 1, and optional edges get a |
| weight of 0 + 2*N where N is the number of tests that evaluated to |
| true in the ``case`` expression. It is also possible to provide an |
| integer parameter to ``inc_weight`` and ``dec_weight`` - in this case, |
| the weight is increased (or decreased) by the provided value instead |
| of the default 2. |
| |
| When passing an input file through the graph, LLVMC picks the edge |
| with the maximum weight. To avoid ambiguity, there should be only one |
| default edge between two nodes (with the exception of the root node, |
| which gets a special treatment - there you are allowed to specify one |
| default edge *per language*). |
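| |
| The weight rules above can be modeled in a few lines of C++. This is |
| only an illustrative sketch, not LLVMC's actual implementation: the |
| ``Edge`` struct and the ``weight`` and ``pickEdge`` functions are |
| hypothetical names, and breaking ties in favor of the earliest edge |
| is an assumption made for the example. |
| |
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an outgoing edge; not LLVMC's real data structure.
struct Edge {
  std::string target;      // name of the tool this edge leads to
  bool optional;           // false for Edge, true for OptionalEdge
  std::vector<int> fired;  // increments of the `case` tests that were true
};

// Default edges weigh 1. Optional edges start at 0 and add one increment per
// true test (2 unless inc_weight/dec_weight was given another value).
int weight(const Edge &e) {
  if (!e.optional) return 1;
  int w = 0;
  for (int inc : e.fired) w += inc;
  return w;
}

// Returns the edge with the maximum weight. Ties keep the earliest edge,
// which is an arbitrary choice made for this sketch.
const Edge &pickEdge(const std::vector<Edge> &edges) {
  assert(!edges.empty() && "a node needs at least one outgoing edge");
  const Edge *best = &edges.front();
  for (const Edge &e : edges)
    if (weight(e) > weight(*best)) best = &e;
  return *best;
}
```
| |
| With a default edge to ``llc`` and an optional edge to ``opt`` whose |
| single test fired, ``pickEdge`` selects ``opt`` (weight 2 against 1). |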
| |
| To get a visual representation of the compilation graph (useful for |
| debugging), run ``llvmc2 --view-graph``. You will need ``dot`` and |
| ``gsview`` installed for this to work properly. |
| |
| |
| Writing a tool description |
| ========================== |
| |
| As was said earlier, nodes in the compilation graph represent tools, |
| which are described separately. A tool definition looks like this |
| (taken from the ``include/llvm/CompilerDriver/Tools.td`` file):: |
| |
| def llvm_gcc_cpp : Tool<[ |
| (in_language "c++"), |
| (out_language "llvm-assembler"), |
| (output_suffix "bc"), |
| (cmd_line "llvm-g++ -c $INFILE -o $OUTFILE -emit-llvm"), |
| (sink) |
| ]>; |
| |
| This defines a new tool called ``llvm_gcc_cpp``, which is an alias for |
| ``llvm-g++``. As you can see, a tool definition is just a list of |
| properties; most of them should be self-explanatory. The ``sink`` |
| property means that this tool should be passed all command-line |
| options that lack explicit descriptions. |
| |
| The complete list of the currently implemented tool properties follows: |
| |
| * Possible tool properties: |
| |
| - ``in_language`` - input language name. Can be either a string or a |
| list, in case the tool supports multiple input languages. |
| |
| - ``out_language`` - output language name. |
| |
| - ``output_suffix`` - output file suffix. |
| |
| - ``cmd_line`` - the actual command used to run the tool. You can |
| use ``$INFILE`` and ``$OUTFILE`` variables, output redirection |
| with ``>``, hook invocations (``$CALL``), environment variables |
| (via ``$ENV``) and the ``case`` construct (more on this below). |
| |
| - ``join`` - this tool is a "join node" in the graph, i.e. it gets a |
| list of input files and joins them together. Used for linkers. |
| |
| - ``sink`` - all command-line options that are not handled by other |
| tools are passed to this tool. |
| |
| The next tool definition is slightly more complex:: |
| |
| def llvm_gcc_linker : Tool<[ |
| (in_language "object-code"), |
| (out_language "executable"), |
| (output_suffix "out"), |
| (cmd_line "llvm-gcc $INFILE -o $OUTFILE"), |
| (join), |
| (prefix_list_option "L", (forward), |
| (help "add a directory to link path")), |
| (prefix_list_option "l", (forward), |
| (help "search a library when linking")), |
| (prefix_list_option "Wl", (unpack_values), |
| (help "pass options to linker")) |
| ]>; |
| |
| This tool has a "join" property, which means that it behaves like a |
| linker. This tool also defines several command-line options: ``-l``, |
| ``-L`` and ``-Wl`` which have their usual meaning. An option has two |
| attributes: a name and a (possibly empty) list of properties. All |
| currently implemented option types and properties are described below: |
| |
| * Possible option types: |
| |
| - ``switch_option`` - a simple boolean switch, for example ``-time``. |
| |
| - ``parameter_option`` - option that takes an argument, for example |
| ``-std=c99``. |
| |
| - ``parameter_list_option`` - same as the above, but more than one |
| occurrence of the option is allowed. |
| |
| - ``prefix_option`` - same as the ``parameter_option``, but the option name |
| and parameter value are not separated. |
| |
| - ``prefix_list_option`` - same as the above, but more than one |
| occurrence of the option is allowed; example: ``-lm -lpthread``. |
| |
| - ``alias_option`` - a special option type for creating |
| aliases. Unlike other option types, aliases are not allowed to |
| have any properties besides the aliased option name. Usage |
| example: ``(alias_option "preprocess", "E")`` |
| |
| |
| * Possible option properties: |
| |
| - ``append_cmd`` - append a string to the tool invocation command. |
| |
| - ``forward`` - forward this option unchanged. |
| |
| - ``forward_as`` - Change the name of this option, but forward the |
| argument unchanged. Example: ``(forward_as "--disable-optimize")``. |
| |
| - ``output_suffix`` - modify the output suffix of this |
| tool. Example: ``(switch_option "E", (output_suffix "i"))``. |
| |
| - ``stop_compilation`` - stop compilation after this phase. |
| |
| - ``unpack_values`` - used for splitting and forwarding |
| comma-separated lists of options, e.g. ``-Wa,-foo=bar,-baz`` is |
| converted to ``-foo=bar -baz`` and appended to the tool invocation |
| command. |
| |
| - ``help`` - help string associated with this option. Used for |
| ``--help`` output. |
| |
| - ``required`` - this option is obligatory. |
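| |
| To make the ``unpack_values`` transformation described above more |
| concrete, here is a small C++ sketch of the splitting step. The |
| function name ``unpackValues`` is hypothetical, and the code generated |
| by LLVMC may well do this differently; skipping empty pieces is an |
| assumption of the sketch. |
| |
```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits an option such as "-Wa,-foo=bar,-baz" on commas and drops the first
// piece (the option name itself), leaving the values to append to the command.
std::vector<std::string> unpackValues(const std::string &arg) {
  assert(!arg.empty() && arg[0] == '-' && "expected an option such as -Wa,...");
  std::vector<std::string> values;
  std::string::size_type comma = arg.find(',');
  while (comma != std::string::npos) {
    std::string::size_type next = arg.find(',', comma + 1);
    // substr clamps the length, so npos means "up to the end of the string".
    std::string::size_type len =
        next == std::string::npos ? std::string::npos : next - comma - 1;
    std::string piece = arg.substr(comma + 1, len);
    if (!piece.empty()) values.push_back(piece);
    comma = next;
  }
  return values;
}
```
| |
| Calling ``unpackValues("-Wa,-foo=bar,-baz")`` yields the two strings |
| ``-foo=bar`` and ``-baz``. |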
| |
| |
| Option list - specifying all options in a single place |
| ====================================================== |
| |
| It can be handy to have all information about options gathered in a |
| single place to provide an overview. This can be achieved by using a |
| so-called ``OptionList``:: |
| |
| def Options : OptionList<[ |
| (switch_option "E", (help "Help string")), |
| (alias_option "quiet", "q"), |
| ... |
| ]>; |
| |
| ``OptionList`` is also a good place to specify option aliases. |
| |
| Tool-specific option properties like ``append_cmd`` have (obviously) |
| no meaning in the context of ``OptionList``, so the only properties |
| allowed there are ``help`` and ``required``. |
| |
| Option lists are used at the file scope. See file |
| ``plugins/Clang/Clang.td`` for an example of ``OptionList`` usage. |
| |
| .. _hooks: |
| |
| Using hooks and environment variables in the ``cmd_line`` property |
| ================================================================== |
| |
| Normally, LLVMC executes programs from the system ``PATH``. Sometimes, |
| this is not sufficient: for example, we may want to specify tool names |
| in the configuration file. This can be achieved via the mechanism of |
| hooks - to write your own hooks, just add their definitions to |
| ``PluginMain.cpp`` or drop a ``.cpp`` file into the |
| ``$LLVMC_DIR/driver`` directory. Hooks should live in the ``hooks`` |
| namespace and have the signature |
| ``std::string hooks::MyHookName(void)``. They can be used from the |
| ``cmd_line`` tool property:: |
| |
| (cmd_line "$CALL(MyHook)/path/to/file -o $CALL(AnotherHook)") |
| |
| It is also possible to use environment variables in the same manner:: |
| |
| (cmd_line "$ENV(VAR1)/path/to/file -o $ENV(VAR2)") |
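| |
| A hook definition with the signature described above might look like |
| the following sketch. Only the namespace and the signature come from |
| this manual; the hook body, the ``MYTOOLS_DIR`` environment variable |
| and the fallback path are made up for illustration. |
| |
```cpp
#include <cstdlib>
#include <string>

namespace hooks {

// Hypothetical hook: returns the directory that holds the tool binaries.
// MYTOOLS_DIR and the fallback path are assumptions made for this example.
std::string MyHook() {
  const char *dir = std::getenv("MYTOOLS_DIR");
  // Fall back to a fixed location when the variable is unset or empty.
  return (dir && *dir) ? std::string(dir) : std::string("/usr/local/mytools");
}

} // namespace hooks
```
| |
| Such a hook would then be referenced as ``$CALL(MyHook)`` inside a |
| ``cmd_line`` string. |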
| |
| To change the command line string based on user-provided options use |
| the ``case`` expression (documented below):: |
| |
| (cmd_line |
| (case |
| (switch_on "E"), |
| "llvm-g++ -E -x c $INFILE -o $OUTFILE", |
| (default), |
| "llvm-g++ -c -x c $INFILE -o $OUTFILE -emit-llvm")) |
| |
| Conditional evaluation: the ``case`` expression |
| =============================================== |
| |
| The ``case`` construct can be used to calculate weights of the optional |
| edges and to choose between several alternative command line strings |
| in the ``cmd_line`` tool property. It is modeled after the |
| similarly-named construct in functional languages and takes the form |
| ``(case (test_1), statement_1, (test_2), statement_2, ... (test_N), |
| statement_N)``. The statements are evaluated only if the corresponding |
| tests evaluate to true. |
| |
| Examples:: |
| |
| // Increases edge weight by 5 if "-A" is provided on the |
| // command-line, and by 5 more if "-B" is also provided. |
| (case |
| (switch_on "A"), (inc_weight 5), |
| (switch_on "B"), (inc_weight 5)) |
| |
| // Evaluates to "cmdline1" if option "-A" is provided on the |
| // command line; otherwise to "cmdline2" if "-B" is provided; |
| // otherwise to "cmdline3". |
| (case |
| (switch_on "A"), "cmdline1", |
| (switch_on "B"), "cmdline2", |
| (default), "cmdline3") |
| |
| Note the slight difference in ``case`` expression handling in the |
| contexts of edge weights and command line specification: for edge |
| weights every test is checked, while for command lines the first |
| matching test wins. In the second example the value of the ``"B"`` |
| switch is never checked when switch ``"A"`` is enabled, and the whole |
| expression always evaluates to ``"cmdline1"`` in that case. |
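| |
| The two evaluation modes can be contrasted with a short C++ model. It |
| is a sketch under simplifying assumptions, not LLVMC code: each test |
| is reduced to a ``switch_on`` check, and the names ``Options``, |
| ``evalWeight`` and ``evalCmdLine`` are hypothetical. |
| |
```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Options given on the command line, stored without the leading dash.
using Options = std::set<std::string>;

// Edge-weight context: every test is checked and all matching increments
// are added together.
int evalWeight(const std::vector<std::pair<std::string, int>> &clauses,
               const Options &given) {
  int weight = 0;
  for (const auto &c : clauses)
    if (given.count(c.first)) weight += c.second;
  return weight;
}

// cmd_line context: the first test that holds wins; later tests are skipped.
std::string evalCmdLine(
    const std::vector<std::pair<std::string, std::string>> &clauses,
    const std::string &fallback, const Options &given) {
  for (const auto &c : clauses)
    if (given.count(c.first)) return c.second;
  return fallback;  // plays the role of (default)
}
```
| |
| With both ``-A`` and ``-B`` given, the first example's clauses add up |
| to a weight of 10, while the second example's clauses select |
| ``"cmdline1"`` without ever looking at ``"B"``. |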
| |
| Case expressions can also be nested, i.e. the following is legal:: |
| |
| (case (switch_on "E"), (case (switch_on "o"), ..., (default), ...), |
| (default), ...) |
| |
| You should, however, try to avoid doing that because it hurts |
| readability. It is usually better to split tool descriptions and/or |
| use TableGen inheritance instead. |
| |
| * Possible tests are: |
| |
| - ``switch_on`` - Returns true if a given command-line option is |
| provided by the user. Example: ``(switch_on "opt")``. Note that |
| you have to define all possible command-line options separately in |
| the tool descriptions. See the section on writing a tool description |
| above for the discussion of different kinds of command-line options. |
| |
| - ``parameter_equals`` - Returns true if a command-line parameter equals |
| a given value. Example: ``(parameter_equals "W", "all")``. |
| |
| - ``element_in_list`` - Returns true if a command-line parameter list |
| includes a given value. Example: ``(element_in_list "l", "pthread")``. |
| |
| - ``input_languages_contain`` - Returns true if a given language |
| belongs to the current input language set. Example: |
| ``(input_languages_contain "c++")``. |
| |
| - ``in_language`` - Evaluates to true if the language of the input |
| file equals the argument. At the moment works only with the |
| ``cmd_line`` property on non-join nodes. Example: ``(in_language |
| "c++")``. |
| |
| - ``not_empty`` - Returns true if a given option (which should be |
| either a parameter or a parameter list) is set by the |
| user. Example: ``(not_empty "o")``. |
| |
| - ``default`` - Always evaluates to true. Should always be the last |
| test in the ``case`` expression. |
| |
| - ``and`` - A standard logical combinator that returns true iff all |
| of its arguments return true. Used like this: ``(and (test1), |
| (test2), ... (testN))``. Nesting of ``and`` and ``or`` is allowed, |
| but not encouraged. |
| |
| - ``or`` - Another logical combinator that returns true iff at least |
| one of its arguments returns true. Example: ``(or (test1), |
| (test2), ... (testN))``. |
| |
| |
| Language map |
| ============ |
| |
| One last thing that you will need to modify when adding support for a |
| new language to LLVMC is the language map, which defines mappings from |
| file extensions to language names. It is used to choose the proper |
| toolchain(s) for a given input file set. Language map definition is |
| located in the file ``Tools.td`` and looks like this:: |
| |
| def LanguageMap : LanguageMap< |
| [LangToSuffixes<"c++", ["cc", "cp", "cxx", "cpp", "CPP", "c++", "C"]>, |
| LangToSuffixes<"c", ["c"]>, |
| ... |
| ]>; |
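| |
| How such a map is consulted can be sketched in C++ as follows. This is |
| an illustration only: the function name ``languageFor`` is |
| hypothetical, the table holds just the fragment of the map shown |
| above, and returning an empty string for an unknown suffix is an |
| assumption of the sketch. |
| |
```cpp
#include <map>
#include <string>

// Returns the language for a file name: the -x override if one was given,
// otherwise a lookup of the file's suffix. Unknown suffixes yield "".
std::string languageFor(const std::string &file,
                        const std::string &forced = "") {
  // A fragment of the mapping shown above, keyed by suffix.
  static const std::map<std::string, std::string> suffixToLang = {
      {"cc", "c++"},  {"cp", "c++"},  {"cxx", "c++"}, {"cpp", "c++"},
      {"CPP", "c++"}, {"c++", "c++"}, {"C", "c++"},   {"c", "c"}};
  if (!forced.empty()) return forced;
  std::string::size_type dot = file.rfind('.');
  if (dot == std::string::npos) return std::string();
  auto it = suffixToLang.find(file.substr(dot + 1));
  return it == suffixToLang.end() ? std::string() : it->second;
}
```
| |
| Here ``languageFor("hello.cpp")`` gives ``c++``, while |
| ``languageFor("hello.cpp", "c")`` mirrors ``-x c`` and gives ``c``. |
| |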
| |
| Debugging |
| ========= |
| |
| When writing LLVMC plugins, it can be useful to get a visual view of |
| the resulting compilation graph. This can be achieved via the command |
| line option ``--view-graph``. This command assumes that Graphviz [2]_ and |
| Ghostview [3]_ are installed. There is also a ``--write-graph`` option that |
| creates a Graphviz source file (``compilation-graph.dot``) in the |
| current directory. |
| |
| |
| References |
| ========== |
| |
| .. [1] TableGen Fundamentals |
| http://llvm.cs.uiuc.edu/docs/TableGenFundamentals.html |
| |
| .. [2] Graphviz |
| http://www.graphviz.org/ |
| |
| .. [3] Ghostview |
| http://pages.cs.wisc.edu/~ghost/ |