Some very rough Driver documentation.


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@68030 91177308-0d34-0410-b5e6-96231b3b80d8
diff --git a/docs/DriverInternals.html b/docs/DriverInternals.html
new file mode 100644
index 0000000..da3808b
--- /dev/null
+++ b/docs/DriverInternals.html
@@ -0,0 +1,510 @@
+<html>
+  <head>
+    <title>Clang Driver Manual</title>
+    <link type="text/css" rel="stylesheet" href="../menu.css" />
+    <link type="text/css" rel="stylesheet" href="../content.css" />
+    <style type="text/css">
+      td {
+      vertical-align: top;
+      }
+    </style>
+  </head>
+  <body>
+
+    <!--#include virtual="../menu.html.incl"-->
+
+    <div id="content">
+
+      <h1>Driver Design & Internals</h1>
+
+      <ul>
+        <li><a href="#intro">Introduction</a></li>
+        <li><a href="#features">Features and Goals</a></li>
+        <ul>
+          <li><a href="#gcccompat">GCC Compatibility</a></li>
+          <li><a href="#components">Flexible</a></li>
+          <li><a href="#performance">Low Overhead</a></li>
+          <li><a href="#simple">Simple</a></li>
+        </ul>
+        <li><a href="#design">Design</a></li>
+        <ul>
+          <li><a href="#int_intro">Internals Introduction</a></li>
+          <li><a href="#int_overview">Design Overview</a></li>
+          <li><a href="#int_notes">Additional Notes</a></li>
+          <ul>
+            <li><a href="#int_compilation">The Compilation Object</a></li>
+            <li><a href="#int_unified_parsing">Unified Parsing &amp; Pipelining</a></li>
+            <li><a href="#int_toolchain_translation">ToolChain Argument Translation</a></li>
+            <li><a href="#int_unused_warnings">Unused Argument Warnings</a></li>
+          </ul>
+          <li><a href="#int_gcc_concepts">Relation to GCC Driver Concepts</a></li>
+        </ul>
+      </ul>
+
+
+      <!-- ======================================================================= -->
+      <h2 id="intro">Introduction</h2>
+      <!-- ======================================================================= -->
+
+      <p>This document describes the Clang driver. The purpose of this
+        document is to describe both the motivation and design goals
+        for the driver, as well as details of the internal
+        implementation.</p>
+
+      <!-- ======================================================================= -->
+      <h2 id="features">Features and Goals</h2>
+      <!-- ======================================================================= -->
+
+      <p>The Clang driver is intended to be a production quality
+        compiler driver providing access to the Clang compiler and
+        tools, with a command line interface which is compatible with
+        the gcc driver.</p>
+
+      <p>Although the driver is part of and driven by the Clang
+        project, it is logically a separate tool which shares many of
+        the same goals as Clang:</p>
+
+      <p><b>Features</b>:</p>
+      <ul>
+        <li><a href="#gcccompat">GCC Compatibility</a></li>
+        <li><a href="#components">Flexible</a></li>
+        <li><a href="#performance">Low Overhead</a></li>
+        <li><a href="#simple">Simple</a></li>
+      </ul>
+
+      <!--=======================================================================-->
+      <h3 id="gcccompat">GCC Compatibility</h3>
+      <!--=======================================================================-->
+
+      <p>The number one goal of the driver is to ease the adoption of
+        Clang by allowing users to drop Clang into a build system
+        which was designed to call GCC. Although this makes the driver
+        much more complicated than might otherwise be necessary, we
+        decided that being very compatible with the gcc command line
+        interface was worth it in order to allow users to quickly test
+        clang on their projects.</p>
+
+      <!--=======================================================================-->
+      <h3 id="components">Flexible</h3>
+      <!--=======================================================================-->
+
+      <p>The driver was designed to be flexible and easily accomodate
+        new uses as we grow the clang and LLVM infrastructure. As one
+        example, the driver can easily support the introduction of
+        tools which have an integrated assembler; something we hope to
+        add to LLVM in the future.</p>
+
+      <p>Similarly, most of the driver functionality is kept in a
+        library which can be used to build other tools which want to
+        implement or accept a gcc like interface. </p>
+
+      <!--=======================================================================-->
+      <h3 id="performance">Low Overhead</h3>
+      <!--=======================================================================-->
+
+      <p>The driver should have as little overhead as possible. In
+        practice, we found that the gcc driver by itself incurred a
+        small but meaningful overhead when compiling many small
+        files. The driver doesn't do much work compared to a
+        compilation, but we have tried to keep it as efficient as
+        possible by following a few simple principles:</p>
+      <ul>
+        <li>Avoid memory allocation and string copying when
+          possible.</li>
+
+        <li>Don't parse arguments more than once.</li>
+
+        <li>Provide a few simple interfaces for effienctly searching
+          arguments.</li>
+      </ul>
+
+      <!--=======================================================================-->
+      <h3 id="simple">Simple</h3>
+      <!--=======================================================================-->
+
+      <p>Finally, the driver was designed to be "as simple as
+        possible", given the other goals. Notably, trying to be
+        completely compatible with the gcc driver adds a significant
+        amount of complexity. However, the design of the driver
+        attempts to mitigate this complexity by dividing the process
+        into a number of independent stages instead of a single
+        monolithic task.</p>
+
+      <!-- ======================================================================= -->
+      <h2 id="design">Internal Design and Implementation</h2>
+      <!-- ======================================================================= -->
+
+      <ul>
+        <li><a href="#int_intro">Internals Introduction</a></li>
+        <li><a href="#int_overview">Design Overview</a></li>
+        <li><a href="#int_notes">Additional Notes</a></li>
+        <li><a href="#int_gcc_concepts">Relation to GCC Driver Concepts</a></li>
+      </ul>
+
+      <!--=======================================================================-->
+      <h3><a name="int_intro">Internals Introduction</a></h3>
+      <!--=======================================================================-->
+
+      <p>In order to satisfy the stated goals, the driver was designed
+        to completely subsume the functionality of the gcc executable;
+        that is, the driver should not need to delegate to gcc to
+        perform subtasks. On Darwin, this implies that the Clang
+        driver also subsumes the gcc driver-driver, which is used to
+        implement support for building universal images (binaries and
+        object files). This also implies that the driver should be
+        able to call the language specific compilers (e.g. cc1)
+        directly, which means that it must have enough information to
+        forward command line arguments to child processes
+        correctly.</p>
+
+      <!--=======================================================================-->
+      <h3><a name="int_overview">Design Overview</a></h3>
+      <!--=======================================================================-->
+
+      <p>The diagram below shows the significant components of the
+        driver architecture and how they relate to one another. The
+        orange components represent concrete data structures built by
+        the driver, the green components indicate conceptually
+        distinct stages which manipulate these data structures, and
+        the blue components are important helper classes. </p>
+
+      <center>
+        <a href="DriverArchitecture.png" alt="Driver Architecture Diagram">
+          <img width=400 src="DriverArchitecture.png">
+        </a>
+      </center>
+
+      <!--=======================================================================-->
+      <h3><a name="int_stages">Driver Stages</a></h3>
+      <!--=======================================================================-->
+
+      <p>The driver functionality is conceptually divided into five stages:</p>
+
+      <ol>
+        <li>
+          <b>Parse: Option Parsing</b>
+
+          <p>The command line argument strings are decomposed into
+            arguments (<tt>Arg</tt> instances). The driver expects to
+            understand all available options, although there is some
+            facility for just passing certain classes of options
+            through (like <tt>-Wl,</tt>).</p>
+
+          <p>Each argument corresponds to exactly one
+            abstract <tt>Option</tt> definition, which describes how
+            the option is parsed along with some additional
+            metadata. The Arg instances themselves are lightweight and
+            merely contain enough information for clients to determine
+            which option they correspond to and their values (if they
+            have additional parameters).</p>
+
+          <p>For example, a command line like "-Ifoo -I foo" would
+            parse to two Arg instances (a JoinedArg and a SeparateArg
+            instance), but each would refer to the same Option.</p>
+
+          <p>Options are lazily created in order to avoid populating
+            all Option classes when the driver is loaded. Most of the
+            driver code only needs to deal with options by their
+            unique ID (e.g., <tt>options::OPT_I</tt>),</p>
+
+          <p>Arg instances themselves do not generally store the
+            values of parameters. In almost all cases, this would
+            simply result in creating unnecessary string
+            copies. Instead, Arg instances are always embedded inside
+            an ArgList structure, which contains the original vector
+            of argument strings. Each Arg itself can then only contain
+            an index into this vector instead of storing its values
+            directly.</p>
+
+          <p>The current clang driver can dump the results of this
+            stage using the <tt>-ccc-print-options</tt> flag (which
+            must preceed any actual command line arguments). For
+            example:</p>
+          <pre>
+            $ <b>clang -ccc-print-options -Xarch_i386 -fomit-frame-pointer -Wa,-fast -Ifoo -I foo t.c</b>
+            Option 0 - Name: "-Xarch_", Values: {"i386", "-fomit-frame-pointer"}
+            Option 1 - Name: "-Wa,", Values: {"-fast"}
+            Option 2 - Name: "-I", Values: {"foo"}
+            Option 3 - Name: "-I", Values: {"foo"}
+            Option 4 - Name: "&lt;input&gt;", Values: {"t.c"}
+          </pre>
+
+          <p>After this stage is complete the command line should be
+            broken down into well defined option objects with their
+            appropriate parameters.  Subsequent stages should rarely,
+            if ever, need to do any string processing.</p>
+        </li>
+
+        <li>
+          <b>Pipeline: Compilation Job Construction</b>
+
+          <p>Once the arguments are parsed, the tree of subprocess
+            jobs needed for the desired compilation sequence are
+            constructed. This involves determing the input files and
+            their types, what work is to be done on them (preprocess,
+            compile, assemble, link, etc.), and constructing a list of
+            Action instances for each task. The result is a list of
+            one or more top-level actions, each of which generally
+            corresponds to a single output (for example, an object or
+            linked executable).</p>
+
+          <p>The majority of Actions correspond to actual tasks,
+            however there are two special Actions. The first is
+            InputAction, which simply serves to adapt an input
+            argument for use as an input to other Actions. The second
+            is BindArchAction, which conceptually alters the
+            architecture to be used for all of its input Actions.</p>
+
+          <p>The current clang driver can dump the results of this
+            stage using the <tt>-ccc-print-phases</tt> flag. For
+            example:</p>
+          <pre>
+            $ <b>clang -ccc-print-phases -x c t.c -x assembler t.s</b>
+            0: input, "t.c", c
+            1: preprocessor, {0}, cpp-output
+            2: compiler, {1}, assembler
+            3: assembler, {2}, object
+            4: input, "t.s", assembler
+            5: assembler, {4}, object
+            6: linker, {3, 5}, image
+          </pre>
+          <p>Here the driver is constructing sevent distinct actions,
+            four to compile the "t.c" input into an object file, two to
+            assemble the "t.s" input, and one to link them together.</p>
+
+          <p>A rather different compilation pipeline is shown here; in
+            this example there are two top level actions to compile
+            the input files into two separate object files, where each
+            object file is built using <tt>lipo</tt> to merge results
+            built for two separate architectures.</p>
+          <pre>
+            $ <b>clang -ccc-print-phases -c -arch i386 -arch x86_64 t0.c t1.c</b>
+            0: input, "t0.c", c
+            1: preprocessor, {0}, cpp-output
+            2: compiler, {1}, assembler
+            3: assembler, {2}, object
+            4: bind-arch, "i386", {3}, object
+            5: bind-arch, "x86_64", {3}, object
+            6: lipo, {4, 5}, object
+            7: input, "t1.c", c
+            8: preprocessor, {7}, cpp-output
+            9: compiler, {8}, assembler
+            10: assembler, {9}, object
+            11: bind-arch, "i386", {10}, object
+            12: bind-arch, "x86_64", {10}, object
+            13: lipo, {11, 12}, object
+          </pre>
+
+          <p>After this stage is complete the compilation process is
+            divided into a simple set of actions which need to be
+            performed to produce intermediate or final outputs (in
+            some cases, like <tt>-fsyntax-only</tt>, there is no
+            "real" final output). Phases are well known compilation
+            steps, such as "preprocess", "compile", "assemble",
+            "link", etc.</p>
+        </li>
+
+        <li>
+          <b>Bind: Tool &amp; Filename Selection</b>
+
+          <p>The stage (in conjunction with the Translate stage) turns
+            the tree of Actions into a list of actual subprocess to
+            run. Conceptually, the driver performs a simple tree match
+            to assign Action(s) to Tools. Once an Action has been
+            bound to a Tool, the driver interacts with the tool to
+            determine how the Tools should be connected (via pipes,
+            temporary files, or user provided filenames) and whether
+            the tool supports things like an integrated
+            preprocessor.</p>
+
+          <p>The driver interacts with a ToolChain to perform the Tool
+            bindings. Each ToolChain contains information about all
+            the tools needed for compilation for a particular
+            architecture, platform, and operating system. A single
+            driver may query multiple ToolChains during a single
+            compilation in order to interact with tools for separate
+            architectures.</p>
+
+          <p>The results of this stage are not computed directly, but
+            the driver can print the results via
+            the <tt>-ccc-print-bindings</tt> option. For example:</p>
+          <pre>
+            $ <b>clang -ccc-print-bindings -arch i386 -arch ppc t0.c</b>
+            # "i386-apple-darwin10.0.0d6" - "clang", inputs: ["t0.c"], output: "/tmp/cc-Sn4RKF.s"
+            # "i386-apple-darwin10.0.0d6" - "darwin::Assemble", inputs: ["/tmp/cc-Sn4RKF.s"], output: "/tmp/cc-gvSnbS.o"
+            # "i386-apple-darwin10.0.0d6" - "darwin::Link", inputs: ["/tmp/cc-gvSnbS.o"], output: "/tmp/cc-jgHQxi.out"
+            # "ppc-apple-darwin10.0.0d6" - "gcc::Compile", inputs: ["t0.c"], output: "/tmp/cc-Q0bTox.s"
+            # "ppc-apple-darwin10.0.0d6" - "gcc::Assemble", inputs: ["/tmp/cc-Q0bTox.s"], output: "/tmp/cc-WCdicw.o"
+            # "ppc-apple-darwin10.0.0d6" - "gcc::Link", inputs: ["/tmp/cc-WCdicw.o"], output: "/tmp/cc-HHBEBh.out"
+            # "i386-apple-darwin10.0.0d6" - "darwin::Lipo", inputs: ["/tmp/cc-jgHQxi.out", "/tmp/cc-HHBEBh.out"], output: "a.out"
+          </pre>
+
+          <p>This shows the tool chain, tool, inputs and outputs which
+            have been bound for this compilation sequence. Here clang
+            is being used to compile t0.c on the i386 architecture and
+            darwin specific versions of the tools are being used to
+            assemble and link the result, but generic gcc versions of
+            the tools are being used on PowerPC.</p>
+        </li>
+
+        <li>
+          <b>Translate: Tool Specific Argument Translation</b>
+
+          <p>Once a Tool has been selected to perform a particular
+            Action, the Tool must construct concrete Jobs which will be
+            executed during compilation. The main work is in translating
+            from the gcc style command line options to whatever options
+            the subprocess expects.</p>
+
+          <p>Some tools, such as the assembler, only interact with a
+            handful of arguments and just determine the path of the
+            executable to call and pass on their input and output
+            arguments. Others, like the compiler or the linker, may
+            translate a large number of arguments in addition.</p>
+
+          <p>The ArgList class provides a number of simple helper
+            methods to assist with translating arguments; for example,
+            to pass on only the last of arguments corresponding to some
+            option, or all arguments for an option.</p>
+
+          <p>The result of this stage is a list of Jobs (executable
+            paths and argument strings) to execute.</p>
+        </li>
+
+        <li>
+          <b>Execute</b>
+          <p>Finally, the compilation pipeline is executed. This is
+            mostly straightforward, although there is some interaction
+            with options
+            like <tt>-pipe</tt>, <tt>-pass-exit-codes</tt>
+            and <tt>-time</tt>.</p>
+        </li>
+
+      </ol>
+
+      <!--=======================================================================-->
+      <h3><a name="int_notes">Additional Notes</a></h3>
+      <!--=======================================================================-->
+
+      <h4 id="int_compilation">The Compilation Object</h4>
+
+      <p>The driver constructs a Compilation object for each set of
+        command line arguments. The Driver itself is intended to be
+        invariant during construct of a Compilation; an IDE should be
+        able to construct a single long lived driver instance to use
+        for an entire build, for example.</p>
+
+      <p>The Compilation object holds information that is particular
+        to each compilation sequence. For example, the list of used
+        temporary files (which must be removed once compilation is
+        finished) and result files (which should be removed if
+        compilation files).</p>
+
+      <h4 id="int_unified_parsing">Unified Parsing &amp; Pipelining</h4>
+
+      <p>Parsing and pipeling both occur without reference to a
+        Compilation instance. This is by design; the driver expects that
+        both of these phases are platform neutral, with a few very well
+        defined exceptions such as whether the platform uses a driver
+        driver.</p>
+
+      <h4 id="int_toolchain_translation">ToolChain Argument Translation</h4>
+
+      <p>In order to match gcc very closely, the clang driver
+        currently allows tool chains to perform their own translation of
+        the argument list (into a new ArgList data structure). Although
+        this allows the clang driver to match gcc easily, it also makes
+        the driver operation much harder to understand (since the Tools
+        stop seeing some arguments the user provided, and see new ones
+        instead).</p>
+
+      <p>For example, on Darwin <tt>-gfull</tt> gets translated into
+        two separate arguments, <tt>-g</tt>
+        and <tt>-fno-eliminate-unused-debug-symbols</tt>. Trying to
+        write Tool logic to do something with <tt>-gfull</tt> will not
+        work, because at Tools run after the arguments have been
+        translated.</p>
+
+      <p>A long term goal is to remove this tool chain specific
+        translation, and instead force each tool to change its own logic
+        to do the right thing on the untranslated original arguments.</p>
+
+      <h4 id="int_unused_warnings">Unused Argument Warnings</h4>
+      <p>The driver operates by parsing all arguments but giving Tools
+        the opportunity to choose which arguments to pass on. One
+        downside of this infrastructure is that if the user misspells
+        some option, or is confused about which options to use, some
+        command line arguments the user really cared about may go
+        unused. This problem is particularly important when using
+        clang as a compiler, since the clang compiler does not support
+        anywhere all the options that gcc does, and we want to make
+        sure users know which ones are being used.</p>
+
+      <p>To support this, the driver maintains a bit associated with
+        each argument of whether it has been used (at all) during the
+        compilation. This bit usually doesn't need to be set by hand,
+        as the key ArgList accessors will set it automatically.</p>
+
+      <p>When a compilation is successfull (there are no errors), the
+        driver checks the bit and emits an "unused argument" warning for
+        any arguments which were never accessed. This is conservative
+        (the argument may not have been used to do what the user wanted)
+        but still catches the most obvious cases.</p>
+
+      <!--=======================================================================-->
+      <h3><a name="int_gcc_concepts">Relation to GCC Driver Concepts</a></h3>
+      <!--=======================================================================-->
+
+      <p>For those familiar with the gcc driver, this section provides
+        a brief overview of how things from the gcc driver map to the
+        clang driver.</p>
+
+      <ul>
+        <li>
+          <b>Driver Driver</b>
+          <p>The driver driver is fully integrated into the clang
+            driver. The driver simply constructs additional Actions to
+            bind the architecture during the <i>Pipeline</i>
+            phase. The tool chain specific argument translation is
+            responsible for handling <tt>-Xarch_</tt>.</p>
+
+          <p>The one caveat is that this approach
+            requires <tt>-Xarch_</tt> not be used to alter the
+            compilation itself (for example, one cannot
+            provide <tt>-S</tt> as an <tt>-Xarch_</tt> argument). The
+            driver attempts to reject such invocations, and overall
+            there isn't a good reason to abuse <tt>-Xarch_</tt> to
+            that end in practice.</p>
+
+          <p>The upside is that the clang driver is more efficient and
+            does little extra work to support universal builds. It also
+            provides better error reporting and UI consistency.</p>
+        </li>
+
+        <li>
+          <b>Specs</b>
+          <p>The clang driver has no direct correspondant for
+            "specs". The majority of the functionality that is
+            embedded in specs is in the Tool specific argument
+            translation routines. The parts of specs which control the
+            compilation pipeline are generally part of
+            the <ii>Pipeline</ii> stage.</p>
+        </li>
+
+        <li>
+          <b>Toolchains</b>
+          <p>The gcc driver has no direct understanding of tool
+            chains. Each gcc binary roughly corresponds to the
+            information which is embedded inside a single
+            ToolChain.</p>
+
+          <p>The clang driver is intended to be portable and support
+            complex compilation environments. All platform and tool
+            chain specific code should be protected behind either
+            abstract or well defined interfaces (such as whether the
+            platform supports use as a driver driver).</p>
+        </li>
+      </ul>
+    </div>
+  </body>
+</html>