| <html> |
| <head> |
| <title>Clang Driver Manual</title> |
| <link type="text/css" rel="stylesheet" href="../menu.css" /> |
| <link type="text/css" rel="stylesheet" href="../content.css" /> |
| <style type="text/css"> |
| td { |
| vertical-align: top; |
| } |
| </style> |
| </head> |
| <body> |
| |
| <!--#include virtual="../menu.html.incl"--> |
| |
| <div id="content"> |
| |
| <h1>Driver Design &amp; Internals</h1> |
| |
| <ul> |
| <li><a href="#intro">Introduction</a></li> |
| <li><a href="#features">Features and Goals</a></li> |
| <ul> |
| <li><a href="#gcccompat">GCC Compatibility</a></li> |
| <li><a href="#components">Flexible</a></li> |
| <li><a href="#performance">Low Overhead</a></li> |
| <li><a href="#simple">Simple</a></li> |
| </ul> |
| <li><a href="#design">Design</a></li> |
| <ul> |
| <li><a href="#int_intro">Internals Introduction</a></li> |
| <li><a href="#int_overview">Design Overview</a></li> |
| <li><a href="#int_notes">Additional Notes</a></li> |
| <ul> |
| <li><a href="#int_compilation">The Compilation Object</a></li> |
| <li><a href="#int_unified_parsing">Unified Parsing &amp; Pipelining</a></li> |
| <li><a href="#int_toolchain_translation">ToolChain Argument Translation</a></li> |
| <li><a href="#int_unused_warnings">Unused Argument Warnings</a></li> |
| </ul> |
| <li><a href="#int_gcc_concepts">Relation to GCC Driver Concepts</a></li> |
| </ul> |
| </ul> |
| |
| |
| <!-- ======================================================================= --> |
| <h2 id="intro">Introduction</h2> |
| <!-- ======================================================================= --> |
| |
| <p>This document describes the Clang driver. The purpose of this |
| document is to describe both the motivation and design goals |
| for the driver, as well as details of the internal |
| implementation.</p> |
| |
| <!-- ======================================================================= --> |
| <h2 id="features">Features and Goals</h2> |
| <!-- ======================================================================= --> |
| |
| <p>The Clang driver is intended to be a production-quality |
| compiler driver providing access to the Clang compiler and |
| tools, with a command line interface which is compatible with |
| the gcc driver.</p> |
| |
| <p>Although the driver is part of and driven by the Clang |
| project, it is logically a separate tool which shares many of |
| the same goals as Clang:</p> |
| |
| <p><b>Features</b>:</p> |
| <ul> |
| <li><a href="#gcccompat">GCC Compatibility</a></li> |
| <li><a href="#components">Flexible</a></li> |
| <li><a href="#performance">Low Overhead</a></li> |
| <li><a href="#simple">Simple</a></li> |
| </ul> |
| |
| <!--=======================================================================--> |
| <h3 id="gcccompat">GCC Compatibility</h3> |
| <!--=======================================================================--> |
| |
| <p>The number one goal of the driver is to ease the adoption of |
| Clang by allowing users to drop Clang into a build system |
| which was designed to call GCC. Although this makes the driver |
| much more complicated than might otherwise be necessary, we |
| decided that being very compatible with the gcc command line |
| interface was worth it in order to allow users to quickly test |
| clang on their projects.</p> |
| |
| <!--=======================================================================--> |
| <h3 id="components">Flexible</h3> |
| <!--=======================================================================--> |
| |
| <p>The driver was designed to be flexible and to easily accommodate |
| new uses as we grow the clang and LLVM infrastructure. As one |
| example, the driver can easily support the introduction of |
| tools which have an integrated assembler, something we hope to |
| add to LLVM in the future.</p> |
| |
| <p>Similarly, most of the driver functionality is kept in a |
| library which can be used to build other tools which want to |
| implement or accept a gcc-like interface.</p> |
| |
| <!--=======================================================================--> |
| <h3 id="performance">Low Overhead</h3> |
| <!--=======================================================================--> |
| |
| <p>The driver should have as little overhead as possible. In |
| practice, we found that the gcc driver by itself incurred a |
| small but meaningful overhead when compiling many small |
| files. The driver doesn't do much work compared to a |
| compilation, but we have tried to keep it as efficient as |
| possible by following a few simple principles:</p> |
| <ul> |
| <li>Avoid memory allocation and string copying when |
| possible.</li> |
| |
| <li>Don't parse arguments more than once.</li> |
| |
| <li>Provide a few simple interfaces for efficiently searching |
| arguments.</li> |
| </ul> |
| |
| <!--=======================================================================--> |
| <h3 id="simple">Simple</h3> |
| <!--=======================================================================--> |
| |
| <p>Finally, the driver was designed to be "as simple as |
| possible", given the other goals. Notably, trying to be |
| completely compatible with the gcc driver adds a significant |
| amount of complexity. However, the design of the driver |
| attempts to mitigate this complexity by dividing the process |
| into a number of independent stages instead of a single |
| monolithic task.</p> |
| |
| <!-- ======================================================================= --> |
| <h2 id="design">Internal Design and Implementation</h2> |
| <!-- ======================================================================= --> |
| |
| <ul> |
| <li><a href="#int_intro">Internals Introduction</a></li> |
| <li><a href="#int_overview">Design Overview</a></li> |
| <li><a href="#int_notes">Additional Notes</a></li> |
| <li><a href="#int_gcc_concepts">Relation to GCC Driver Concepts</a></li> |
| </ul> |
| |
| <!--=======================================================================--> |
| <h3><a name="int_intro">Internals Introduction</a></h3> |
| <!--=======================================================================--> |
| |
| <p>In order to satisfy the stated goals, the driver was designed |
| to completely subsume the functionality of the gcc executable; |
| that is, the driver should not need to delegate to gcc to |
| perform subtasks. On Darwin, this implies that the Clang |
| driver also subsumes the gcc driver-driver, which is used to |
| implement support for building universal images (binaries and |
| object files). This also implies that the driver should be |
| able to call the language specific compilers (e.g. cc1) |
| directly, which means that it must have enough information to |
| forward command line arguments to child processes |
| correctly.</p> |
| |
| <!--=======================================================================--> |
| <h3><a name="int_overview">Design Overview</a></h3> |
| <!--=======================================================================--> |
| |
| <p>The diagram below shows the significant components of the |
| driver architecture and how they relate to one another. The |
| orange components represent concrete data structures built by |
| the driver, the green components indicate conceptually |
| distinct stages which manipulate these data structures, and |
| the blue components are important helper classes. </p> |
| |
| <center> |
| <a href="DriverArchitecture.png"> |
| <img width=400 src="DriverArchitecture.png" alt="Driver Architecture Diagram"> |
| </a> |
| </center> |
| |
| <!--=======================================================================--> |
| <h3><a name="int_stages">Driver Stages</a></h3> |
| <!--=======================================================================--> |
| |
| <p>The driver functionality is conceptually divided into five stages:</p> |
| |
| <ol> |
| <li> |
| <b>Parse: Option Parsing</b> |
| |
| <p>The command line argument strings are decomposed into |
| arguments (<tt>Arg</tt> instances). The driver expects to |
| understand all available options, although there is some |
| facility for just passing certain classes of options |
| through (like <tt>-Wl,</tt>).</p> |
| |
| <p>Each argument corresponds to exactly one |
| abstract <tt>Option</tt> definition, which describes how |
| the option is parsed along with some additional |
| metadata. The Arg instances themselves are lightweight and |
| merely contain enough information for clients to determine |
| which option they correspond to and their values (if they |
| have additional parameters).</p> |
| |
| <p>For example, a command line like "-Ifoo -I foo" would |
| parse to two Arg instances (a JoinedArg and a SeparateArg |
| instance), but each would refer to the same Option.</p> |
| |
| <p>Options are lazily created in order to avoid populating |
| all Option classes when the driver is loaded. Most of the |
| driver code only needs to deal with options by their |
| unique ID (e.g., <tt>options::OPT_I</tt>).</p> |
| |
| <p>Arg instances themselves do not generally store the |
| values of parameters. In many cases, this would |
| simply result in creating unnecessary string |
| copies. Instead, Arg instances are always embedded inside |
| an ArgList structure, which contains the original vector |
| of argument strings. Each Arg itself only needs to contain |
| an index into this vector instead of storing its values |
| directly.</p> |
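<p>As a rough illustration of the index-based design described above,
the following Python sketch keeps the original argument strings in one
place and lets each argument record only where its value lives. It is
not the driver's C++ code: the names <tt>value_index</tt>,
<tt>value_offset</tt> and <tt>parse_include_args</tt> are hypothetical,
and only <tt>-I</tt> is handled.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Arg:
    option_id: str      # hypothetical stand-in for an ID such as options::OPT_I
    value_index: int    # which original string holds the value
    value_offset: int   # where the value starts inside that string


class ArgList:
    """Owns the original strings; Args only point into them."""

    def __init__(self, strings: List[str]) -> None:
        self.strings = list(strings)
        self.args: List[Arg] = []

    def value(self, arg: Arg) -> str:
        # Values are looked up on demand instead of being stored per Arg.
        return self.strings[arg.value_index][arg.value_offset:]


def parse_include_args(strings: List[str]) -> ArgList:
    """Toy parser: understands joined and separate -I, treats the rest as inputs."""
    arg_list = ArgList(strings)
    i = 0
    while i < len(strings):
        s = strings[i]
        if s == "-I" and i + 1 < len(strings):
            # Separate form: "-I foo" (value is the next string).
            arg_list.args.append(Arg("OPT_I", i + 1, 0))
            i += 2
        elif s.startswith("-I") and len(s) > 2:
            # Joined form: "-Ifoo" (value follows the two-character prefix).
            arg_list.args.append(Arg("OPT_I", i, 2))
            i += 1
        else:
            # Simplification: everything else is treated as an input.
            arg_list.args.append(Arg("INPUT", i, 0))
            i += 1
    return arg_list
```

<p>In this sketch both spellings of <tt>-I</tt> yield arguments with the
same option ID, mirroring the "-Ifoo -I foo" example.</p>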
| |
| <p>The clang driver can dump the results of this |
| stage using the <tt>-ccc-print-options</tt> flag (which |
| must precede any actual command line arguments). For |
| example:</p> |
| <pre> |
| $ <b>clang -ccc-print-options -Xarch_i386 -fomit-frame-pointer -Wa,-fast -Ifoo -I foo t.c</b> |
| Option 0 - Name: "-Xarch_", Values: {"i386", "-fomit-frame-pointer"} |
| Option 1 - Name: "-Wa,", Values: {"-fast"} |
| Option 2 - Name: "-I", Values: {"foo"} |
| Option 3 - Name: "-I", Values: {"foo"} |
| Option 4 - Name: "&lt;input&gt;", Values: {"t.c"} |
| </pre> |
| |
| <p>After this stage is complete the command line should be |
| broken down into well defined option objects with their |
| appropriate parameters. Subsequent stages should rarely, |
| if ever, need to do any string processing.</p> |
| </li> |
| |
| <li> |
| <b>Pipeline: Compilation Job Construction</b> |
| |
| <p>Once the arguments are parsed, the tree of subprocess |
| jobs needed for the desired compilation sequence is |
| constructed. This involves determining the input files and |
| their types, what work is to be done on them (preprocess, |
| compile, assemble, link, etc.), and constructing a list of |
| Action instances for each task. The result is a list of |
| one or more top-level actions, each of which generally |
| corresponds to a single output (for example, an object or |
| linked executable).</p> |
| |
| <p>The majority of Actions correspond to actual tasks; |
| however, there are two special Actions. The first is |
| InputAction, which simply serves to adapt an input |
| argument for use as an input to other Actions. The second |
| is BindArchAction, which conceptually alters the |
| architecture to be used for all of its input Actions.</p> |
| |
| <p>The clang driver can dump the results of this |
| stage using the <tt>-ccc-print-phases</tt> flag. For |
| example:</p> |
| <pre> |
| $ <b>clang -ccc-print-phases -x c t.c -x assembler t.s</b> |
| 0: input, "t.c", c |
| 1: preprocessor, {0}, cpp-output |
| 2: compiler, {1}, assembler |
| 3: assembler, {2}, object |
| 4: input, "t.s", assembler |
| 5: assembler, {4}, object |
| 6: linker, {3, 5}, image |
| </pre> |
| <p>Here the driver is constructing seven distinct actions, |
| four to compile the "t.c" input into an object file, two to |
| assemble the "t.s" input, and one to link them together.</p> |
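<p>The way such an action list might be assembled can be sketched in
Python as follows. This is an illustrative model only, not the driver's
implementation: the <tt>PHASES</tt> table, the <tt>Action</tt> fields and
<tt>build_actions</tt> are hypothetical names, and only the two input
types from the example above are covered.</p>

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Action:
    kind: str
    inputs: Tuple[int, ...]   # indices of the actions this one consumes
    output_type: str
    source: str = ""          # only set for input actions


# Hypothetical table: the steps each input type goes through before linking,
# as (action kind, output type) pairs.
PHASES = {
    "c": [("preprocessor", "cpp-output"),
          ("compiler", "assembler"),
          ("assembler", "object")],
    "assembler": [("assembler", "object")],
}


def build_actions(inputs: List[Tuple[str, str]]) -> List[Action]:
    """Build a flat, numbered action list ending in a single link action."""
    actions: List[Action] = []
    link_inputs: List[int] = []
    for path, input_type in inputs:
        actions.append(Action("input", (), input_type, path))
        for kind, output_type in PHASES[input_type]:
            # Each step consumes the action appended just before it.
            actions.append(Action(kind, (len(actions) - 1,), output_type))
        link_inputs.append(len(actions) - 1)
    actions.append(Action("linker", tuple(link_inputs), "image"))
    return actions
```

<p>Calling <tt>build_actions([("t.c", "c"), ("t.s", "assembler")])</tt>
yields seven actions whose final link step consumes actions 3 and 5, the
same shape as the listing above.</p>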
| |
| <p>A rather different compilation pipeline is shown here; in |
| this example there are two top level actions to compile |
| the input files into two separate object files, where each |
| object file is built using <tt>lipo</tt> to merge results |
| built for two separate architectures.</p> |
| <pre> |
| $ <b>clang -ccc-print-phases -c -arch i386 -arch x86_64 t0.c t1.c</b> |
| 0: input, "t0.c", c |
| 1: preprocessor, {0}, cpp-output |
| 2: compiler, {1}, assembler |
| 3: assembler, {2}, object |
| 4: bind-arch, "i386", {3}, object |
| 5: bind-arch, "x86_64", {3}, object |
| 6: lipo, {4, 5}, object |
| 7: input, "t1.c", c |
| 8: preprocessor, {7}, cpp-output |
| 9: compiler, {8}, assembler |
| 10: assembler, {9}, object |
| 11: bind-arch, "i386", {10}, object |
| 12: bind-arch, "x86_64", {10}, object |
| 13: lipo, {11, 12}, object |
| </pre> |
| |
| <p>After this stage is complete the compilation process is |
| divided into a simple set of actions which need to be |
| performed to produce intermediate or final outputs (in |
| some cases, like <tt>-fsyntax-only</tt>, there is no |
| "real" final output). Phases are well known compilation |
| steps, such as "preprocess", "compile", "assemble", |
| "link", etc.</p> |
| </li> |
| |
| <li> |
| <b>Bind: Tool &amp; Filename Selection</b> |
| |
| <p>This stage (in conjunction with the Translate stage) |
| turns the tree of Actions into a list of actual subprocesses |
| to run. Conceptually, the driver performs a top down |
| matching to assign Action(s) to Tools. The ToolChain is |
| responsible for selecting the tool to perform a particular |
| action; once selected the driver interacts with the tool |
| to see if it can match additional actions (for example, by |
| having an integrated preprocessor).</p> |
| |
| <p>Once Tools have been selected for all actions, the driver |
| determines how the tools should be connected (for example, |
| using an in-process module, pipes, temporary files, or user |
| provided filenames). If an output file is required, the |
| driver also computes the appropriate file name (the suffix |
| and file location depend on the input types and options |
| such as <tt>-save-temps</tt>).</p> |
| |
| <p>The driver interacts with a ToolChain to perform the Tool |
| bindings. Each ToolChain contains information about all |
| the tools needed for compilation for a particular |
| architecture, platform, and operating system. A single |
| driver invocation may query multiple ToolChains during one |
| compilation in order to interact with tools for separate |
| architectures.</p> |
| |
| <p>The results of this stage are not computed directly, but |
| the driver can print the results via |
| the <tt>-ccc-print-bindings</tt> option. For example:</p> |
| <pre> |
| $ <b>clang -ccc-print-bindings -arch i386 -arch ppc t0.c</b> |
| # "i386-apple-darwin9" - "clang", inputs: ["t0.c"], output: "/tmp/cc-Sn4RKF.s" |
| # "i386-apple-darwin9" - "darwin::Assemble", inputs: ["/tmp/cc-Sn4RKF.s"], output: "/tmp/cc-gvSnbS.o" |
| # "i386-apple-darwin9" - "darwin::Link", inputs: ["/tmp/cc-gvSnbS.o"], output: "/tmp/cc-jgHQxi.out" |
| # "ppc-apple-darwin9" - "gcc::Compile", inputs: ["t0.c"], output: "/tmp/cc-Q0bTox.s" |
| # "ppc-apple-darwin9" - "gcc::Assemble", inputs: ["/tmp/cc-Q0bTox.s"], output: "/tmp/cc-WCdicw.o" |
| # "ppc-apple-darwin9" - "gcc::Link", inputs: ["/tmp/cc-WCdicw.o"], output: "/tmp/cc-HHBEBh.out" |
| # "i386-apple-darwin9" - "darwin::Lipo", inputs: ["/tmp/cc-jgHQxi.out", "/tmp/cc-HHBEBh.out"], output: "a.out" |
| </pre> |
| |
| <p>This shows the tool chain, tool, inputs and outputs which |
| have been bound for this compilation sequence. Here clang |
| is being used to compile t0.c on the i386 architecture and |
| darwin specific versions of the tools are being used to |
| assemble and link the result, but generic gcc versions of |
| the tools are being used on PowerPC.</p> |
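<p>The top down matching of actions to tools can be approximated with
the Python sketch below. It is a simplified model under stated
assumptions: the action chain is linear, the <tt>DARWIN_TOOLS</tt> table
and <tt>bind_tools</tt> function are hypothetical, and the only merging
rule modeled is a compiler with an integrated preprocessor.</p>

```python
from typing import Dict, List, Tuple

# Hypothetical tool table for one tool chain:
# action kind -> (tool name, whether the tool has an integrated preprocessor)
DARWIN_TOOLS: Dict[str, Tuple[str, bool]] = {
    "preprocessor": ("clang", False),
    "compiler": ("clang", True),
    "assembler": ("darwin::Assemble", False),
    "linker": ("darwin::Link", False),
}


def bind_tools(chain: List[str],
               tools: Dict[str, Tuple[str, bool]]) -> List[Tuple[str, List[str]]]:
    """Assign tools to a linear chain of actions, ordered first step to last.

    Matching starts at the final action and walks towards the inputs, so a
    selected tool gets the chance to absorb the action feeding it.
    """
    bindings: List[Tuple[str, List[str]]] = []
    i = len(chain) - 1
    while i >= 0:
        kind = chain[i]
        name, integrated_cpp = tools[kind]
        covered = [kind]
        if integrated_cpp and i > 0 and chain[i - 1] == "preprocessor":
            # The tool also handles preprocessing, so no separate job is needed.
            covered.insert(0, "preprocessor")
            i -= 1
        bindings.append((name, covered))
        i -= 1
    bindings.reverse()  # report in execution order
    return bindings
```

<p>With this table the preprocess and compile actions collapse into a
single "clang" job, which is consistent with the i386 bindings shown
above, where "clang" consumes <tt>t0.c</tt> directly.</p>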
| </li> |
| |
| <li> |
| <b>Translate: Tool Specific Argument Translation</b> |
| |
| <p>Once a Tool has been selected to perform a particular |
| Action, the Tool must construct concrete Jobs which will be |
| executed during compilation. The main work is in translating |
| from the gcc style command line options to whatever options |
| the subprocess expects.</p> |
| |
| <p>Some tools, such as the assembler, only interact with a |
| handful of arguments and just determine the path of the |
| executable to call and pass on their input and output |
| arguments. Others, like the compiler or the linker, may |
| translate a large number of arguments in addition.</p> |
| |
| <p>The ArgList class provides a number of simple helper |
| methods to assist with translating arguments; for example, |
| to pass on only the last of the arguments corresponding to some |
| option, or all arguments for an option.</p> |
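<p>A minimal Python sketch of these two helper styles follows. The
function names <tt>last_arg</tt>, <tt>all_args</tt> and
<tt>translate_for_compiler</tt>, and the option IDs <tt>OPT_I</tt> and
<tt>OPT_O</tt>, are hypothetical stand-ins and not the actual ArgList
interface.</p>

```python
from typing import List, Optional, Tuple

ParsedArg = Tuple[str, str]  # (option id, value)


def last_arg(args: List[ParsedArg], option_id: str) -> Optional[ParsedArg]:
    """Return the last argument for an option, or None if it never appears."""
    for arg in reversed(args):
        if arg[0] == option_id:
            return arg
    return None


def all_args(args: List[ParsedArg], option_id: str) -> List[ParsedArg]:
    """Return every argument for an option, in command line order."""
    return [arg for arg in args if arg[0] == option_id]


def translate_for_compiler(args: List[ParsedArg]) -> List[str]:
    """Toy translation: forward all include paths, but only the last -O level."""
    command: List[str] = []
    for _, value in all_args(args, "OPT_I"):
        command += ["-I", value]
    opt_level = last_arg(args, "OPT_O")
    if opt_level is not None:
        command.append("-O" + opt_level[1])
    return command
```

<p>Include paths accumulate, so all of them are forwarded, whereas an
optimization level is a "last one wins" setting, so only the final
occurrence is passed on.</p>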
| |
| <p>The result of this stage is a list of Jobs (executable |
| paths and argument strings) to execute.</p> |
| </li> |
| |
| <li> |
| <b>Execute</b> |
| <p>Finally, the compilation pipeline is executed. This is |
| mostly straightforward, although there is some interaction |
| with options |
| like <tt>-pipe</tt>, <tt>-pass-exit-codes</tt> |
| and <tt>-time</tt>.</p> |
| </li> |
| |
| </ol> |
| |
| <!--=======================================================================--> |
| <h3><a name="int_notes">Additional Notes</a></h3> |
| <!--=======================================================================--> |
| |
| <h4 id="int_compilation">The Compilation Object</h4> |
| |
| <p>The driver constructs a Compilation object for each set of |
| command line arguments. The Driver itself is intended to be |
| invariant during construction of a Compilation; an IDE should be |
| able to construct a single long lived driver instance to use |
| for an entire build, for example.</p> |
| |
| <p>The Compilation object holds information that is particular |
| to each compilation sequence, for example the list of used |
| temporary files (which must be removed once compilation is |
| finished) and result files (which should be removed if |
| compilation fails).</p> |
| |
| <h4 id="int_unified_parsing">Unified Parsing &amp; Pipelining</h4> |
| |
| <p>Parsing and pipelining both occur without reference to a |
| Compilation instance. This is by design; the driver expects that |
| both of these phases are platform neutral, with a few very well |
| defined exceptions such as whether the platform uses a driver |
| driver.</p> |
| |
| <h4 id="int_toolchain_translation">ToolChain Argument Translation</h4> |
| |
| <p>In order to match gcc very closely, the clang driver |
| currently allows tool chains to perform their own translation of |
| the argument list (into a new ArgList data structure). Although |
| this allows the clang driver to match gcc easily, it also makes |
| the driver operation much harder to understand (since the Tools |
| stop seeing some arguments the user provided, and see new ones |
| instead).</p> |
| |
| <p>For example, on Darwin <tt>-gfull</tt> gets translated into two |
| separate arguments, <tt>-g</tt> |
| and <tt>-fno-eliminate-unused-debug-symbols</tt>. Trying to write Tool |
| logic to do something with <tt>-gfull</tt> will not work, because Tool |
| argument translation is done after the tool chain has already |
| translated the arguments.</p> |
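<p>The pitfall can be shown with a small Python sketch. The function
names <tt>darwin_translate_args</tt> and <tt>tool_sees_gfull</tt> are
hypothetical; the sketch models only the <tt>-gfull</tt> rewrite
described above and works on plain strings instead of a real
ArgList.</p>

```python
from typing import List


def darwin_translate_args(args: List[str]) -> List[str]:
    """Tool chain step: rewrite -gfull into its two replacement arguments."""
    translated: List[str] = []
    for arg in args:
        if arg == "-gfull":
            translated += ["-g", "-fno-eliminate-unused-debug-symbols"]
        else:
            translated.append(arg)
    return translated


def tool_sees_gfull(tool_args: List[str]) -> bool:
    """Tool step: a check written against the user's original spelling."""
    return "-gfull" in tool_args
```

<p>Because the tool only receives the output of
<tt>darwin_translate_args</tt>, a check like <tt>tool_sees_gfull</tt>
never fires even when the user passed <tt>-gfull</tt>.</p>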
| |
| <p>A long term goal is to remove this tool chain specific |
| translation, and instead force each tool to change its own logic |
| to do the right thing on the untranslated original arguments.</p> |
| |
| <h4 id="int_unused_warnings">Unused Argument Warnings</h4> |
| <p>The driver operates by parsing all arguments but giving Tools |
| the opportunity to choose which arguments to pass on. One |
| downside of this infrastructure is that if the user misspells |
| some option, or is confused about which options to use, some |
| command line arguments the user really cared about may go |
| unused. This problem is particularly important when using |
| clang as a compiler, since the clang compiler does not support |
| anywhere near all the options that gcc does, and we want to make |
| sure users know which ones are being used.</p> |
| |
| <p>To support this, the driver maintains a bit associated with |
| each argument that records whether it has been used (at all) during the |
| compilation. This bit usually doesn't need to be set by hand, |
| as the key ArgList accessors will set it automatically.</p> |
| |
| <p>When a compilation is successful (there are no errors), the |
| driver checks the bit and emits an "unused argument" warning for |
| any arguments which were never accessed. This is conservative |
| (the argument may not have been used to do what the user wanted) |
| but still catches the most obvious cases.</p> |
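<p>The bookkeeping can be sketched in a few lines of Python. This is an
illustrative model and not the driver's C++ code: the <tt>claimed</tt>
field, <tt>has_arg</tt> and <tt>unused_warnings</tt> are hypothetical
names, and the warning text is a placeholder.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Arg:
    spelling: str
    claimed: bool = False   # the per-argument "used" bit


class ArgList:
    def __init__(self, spellings: List[str]) -> None:
        self.args = [Arg(s) for s in spellings]

    def has_arg(self, spelling: str) -> bool:
        """Accessor that marks every matching argument as used."""
        found = False
        for arg in self.args:
            if arg.spelling == spelling:
                arg.claimed = True
                found = True
        return found

    def unused_warnings(self) -> List[str]:
        """Report arguments that no accessor ever touched."""
        return [f"unused argument: {arg.spelling}"
                for arg in self.args if not arg.claimed]
```

<p>A tool that queries <tt>-O2</tt> through <tt>has_arg</tt> marks it as
used as a side effect, so after a successful run only untouched
arguments such as a misspelled flag are reported.</p>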
| |
| <!--=======================================================================--> |
| <h3><a name="int_gcc_concepts">Relation to GCC Driver Concepts</a></h3> |
| <!--=======================================================================--> |
| |
| <p>For those familiar with the gcc driver, this section provides |
| a brief overview of how things from the gcc driver map to the |
| clang driver.</p> |
| |
| <ul> |
| <li> |
| <b>Driver Driver</b> |
| <p>The driver driver is fully integrated into the clang |
| driver. The driver simply constructs additional Actions to |
| bind the architecture during the <i>Pipeline</i> |
| phase. The tool chain specific argument translation is |
| responsible for handling <tt>-Xarch_</tt>.</p> |
| |
| <p>The one caveat is that this approach |
| requires <tt>-Xarch_</tt> not be used to alter the |
| compilation itself (for example, one cannot |
| provide <tt>-S</tt> as an <tt>-Xarch_</tt> argument). The |
| driver attempts to reject such invocations, and overall |
| there isn't a good reason to abuse <tt>-Xarch_</tt> to |
| that end in practice.</p> |
| |
| <p>The upside is that the clang driver is more efficient and |
| does little extra work to support universal builds. It also |
| provides better error reporting and UI consistency.</p> |
| </li> |
| |
| <li> |
| <b>Specs</b> |
| <p>The clang driver has no direct correspondent for |
| "specs". The majority of the functionality that is |
| embedded in specs is in the Tool specific argument |
| translation routines. The parts of specs which control the |
| compilation pipeline are generally part of |
| the <i>Pipeline</i> stage.</p> |
| </li> |
| |
| <li> |
| <b>Toolchains</b> |
| <p>The gcc driver has no direct understanding of tool |
| chains. Each gcc binary roughly corresponds to the |
| information which is embedded inside a single |
| ToolChain.</p> |
| |
| <p>The clang driver is intended to be portable and support |
| complex compilation environments. All platform and tool |
| chain specific code should be protected behind either |
| abstract or well defined interfaces (such as whether the |
| platform supports use as a driver driver).</p> |
| </li> |
| </ul> |
| </div> |
| </body> |
| </html> |