<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Precompiled Header and Modules Internals</title>
  <link type="text/css" rel="stylesheet" href="../menu.css">
  <link type="text/css" rel="stylesheet" href="../content.css">
  <style type="text/css">
    td {
    vertical-align: top;
    }
  </style>
</head>

<body>

<!--#include virtual="../menu.html.incl"-->

<div id="content">

<h1>Precompiled Header and Modules Internals</h1>

<p>This document describes the design and implementation of Clang's
  precompiled headers (PCH) and modules. If you are interested in the end-user
  view, please see the <a
   href="UsersManual.html#precompiledheaders">User's Manual</a>.</p>

<p><b>Table of Contents</b></p>
<ul>
  <li><a href="#usage">Using Precompiled Headers with
    <tt>clang</tt></a></li>
  <li><a href="#philosophy">Design Philosophy</a></li>
  <li><a href="#contents">Serialized AST File Contents</a>
    <ul>
      <li><a href="#metadata">Metadata Block</a></li>
      <li><a href="#sourcemgr">Source Manager Block</a></li>
      <li><a href="#preprocessor">Preprocessor Block</a></li>
      <li><a href="#types">Types Block</a></li>
      <li><a href="#decls">Declarations Block</a></li>
      <li><a href="#stmt">Statements and Expressions</a></li>
      <li><a href="#idtable">Identifier Table Block</a></li>
      <li><a href="#method-pool">Method Pool Block</a></li>
    </ul>
  </li>
  <li><a href="#tendrils">AST Reader Integration Points</a></li>
  <li><a href="#chained">Chained precompiled headers</a></li>
  <li><a href="#modules">Modules</a></li>
</ul>

<h2 id="usage">Using Precompiled Headers with <tt>clang</tt></h2>

<p>The Clang compiler frontend, <tt>clang -cc1</tt>, supports two command-line
options for generating and using PCH files.</p>

<p>To generate PCH files using <tt>clang -cc1</tt>, use the option
<b><tt>-emit-pch</tt></b>:</p>

<pre>
  $ clang -cc1 test.h -emit-pch -o test.h.pch
</pre>

<p>The <tt>clang</tt> driver uses this option transparently when generating
PCH files. The resulting PCH file contains the serialized form of the
compiler's internal representation after it has completed parsing and
semantic analysis. The PCH file can then be used as a prefix header
with the <b><tt>-include-pch</tt></b> option:</p>

<pre>
  $ clang -cc1 -include-pch test.h.pch test.c -o test.s
</pre>

<h2 id="philosophy">Design Philosophy</h2>

<p>Precompiled headers are meant to improve overall compile times for
  projects, so the design of precompiled headers is entirely driven by
  performance concerns. The use case for precompiled headers is
  relatively simple: when there is a common set of headers that is
  included in nearly every source file in the project, we
  <i>precompile</i> that bundle of headers into a single precompiled
  header (PCH file). Then, when compiling the source files in the
  project, we load the PCH file first (as a prefix header), which acts
  as a stand-in for that bundle of headers.</p>

<p>A precompiled header implementation improves performance when:</p>
<ul>
  <li>Loading the PCH file is significantly faster than re-parsing the
  bundle of headers stored within the PCH file. Thus, a precompiled
  header design attempts to minimize the cost of reading the PCH
  file. Ideally, this cost should not vary with the size of the
  precompiled header file.</li>

  <li>The cost of generating the PCH file initially is not so large
  that it counters the per-source-file performance improvement due to
  eliminating the need to parse the bundled headers in the first
  place. This is particularly important on multi-core systems, because
  PCH file generation serializes the build when all compilations
  require the PCH file to be up-to-date.</li>
</ul>

<p>Modules, as implemented in Clang, use the same mechanisms as
precompiled headers to save a serialized AST file (one per module) and
to load those AST files later. From an implementation standpoint, modules are
a generalization of precompiled headers that lifts a number of
restrictions placed on precompiled headers. In particular, a translation
unit can use only one precompiled header, and it must be included at the
beginning of the translation unit, whereas a translation unit can import
any number of modules. The extensions to the AST file
format required for modules are discussed in the section on <a href="#modules">modules</a>.</p>

<p>Clang's AST files are designed with a compact on-disk
representation, which minimizes both creation time and the time
required to initially load the AST file. The AST file itself contains
a serialized representation of Clang's abstract syntax trees and
supporting data structures, stored using the same compressed bitstream
as <a href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitcode
file format</a>.</p>

<p>Clang's AST files are loaded "lazily" from disk. When an
AST file is initially loaded, Clang reads only a small amount of data
from the AST file to establish where certain important data structures
are stored. The amount of data read in this initial load is
independent of the size of the AST file, so a larger AST file
does not lead to longer initial load times. The actual header data in the
AST file (macros, functions, variables, types, etc.) is loaded only
when it is referenced from the user's code, at which point only that
entity (and the entities it depends on) is deserialized from the
AST file. With this approach, the cost of using an AST file
for a translation unit is proportional to the amount of code actually
used from the AST file, rather than being proportional to the size of
the AST file itself.</p>
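<p>The following C++ sketch illustrates the general shape of this lazy
scheme under simplifying assumptions: the "AST file" is an in-memory byte
string, each entity is a length-prefixed string, and the initial load keeps
only a table of offsets. The names <code>serializeEntities</code> and
<code>LazyEntityTable</code> are hypothetical and are not part of Clang's
<code>ASTReader</code> API.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical writer: appends each entity as a one-byte length followed by
// its bytes, and records where each entity starts (its offset).
std::string serializeEntities(const std::vector<std::string> &Entities,
                              std::vector<uint32_t> &Offsets) {
  std::string Blob;
  for (const std::string &E : Entities) {
    assert(E.size() < 256 && "toy format uses a one-byte length prefix");
    Offsets.push_back(static_cast<uint32_t>(Blob.size()));
    Blob.push_back(static_cast<char>(E.size()));
    Blob += E;
  }
  return Blob;
}

// Hypothetical reader: the initial "load" only keeps the ID -> offset index.
// An entity is decoded the first time it is requested, then cached.
class LazyEntityTable {
public:
  LazyEntityTable(std::string FileBytes, std::vector<uint32_t> Index)
      : Blob(std::move(FileBytes)), Offsets(std::move(Index)),
        Cache(Offsets.size()) {}

  const std::string &get(uint32_t ID) {
    std::optional<std::string> &Slot = Cache.at(ID);
    if (!Slot) {
      Slot = decodeAt(Offsets.at(ID));  // deserialize on first reference
      ++NumDeserialized;
    }
    return *Slot;
  }

  std::size_t numDeserialized() const { return NumDeserialized; }
  std::size_t numEntities() const { return Offsets.size(); }

private:
  std::string decodeAt(uint32_t Offset) const {
    auto Len = static_cast<unsigned char>(Blob.at(Offset));
    return Blob.substr(Offset + 1, Len);
  }

  std::string Blob;                               // the whole "AST file"
  std::vector<uint32_t> Offsets;                  // ID -> offset index
  std::vector<std::optional<std::string>> Cache;  // entities loaded so far
  std::size_t NumDeserialized = 0;
};
```

<p>Constructing the table deserializes nothing; each <code>get()</code> call
decodes at most one entity and caches it, so the work done grows with the
number of distinct entities actually used rather than with the size of the
file.</p>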

<p>When given the <code>-print-stats</code> option, Clang produces
statistics describing how much of the AST file was actually
loaded from disk. For a simple "Hello, World!" program that includes
the Apple <code>Cocoa.h</code> header (which is built as a precompiled
header), this option illustrates how little of the actual precompiled
header is required:</p>

<pre>
*** PCH Statistics:
933 stat cache hits
4 stat cache misses
895/39981 source location entries read (2.238563%)
19/15315 types read (0.124061%)
20/82685 declarations read (0.024188%)
154/58070 identifiers read (0.265197%)
0/7260 selectors read (0.000000%)
0/30842 statements read (0.000000%)
4/8400 macros read (0.047619%)
1/4995 lexical declcontexts read (0.020020%)
0/4413 visible declcontexts read (0.000000%)
0/7230 method pool entries read (0.000000%)
0 method pool misses
</pre>

<p>For this small program, only a tiny fraction of the source
locations, types, declarations, identifiers, and macros were actually
deserialized from the precompiled header. These statistics can be
useful to determine whether the AST file implementation can
be improved by making more of the implementation lazy.</p>

<p>Precompiled headers can be chained. When you create a PCH while
including an existing PCH, Clang can create the new PCH by referencing
the original file and writing only the new data to the new file. For
example, you could create one PCH from the headers that are
commonly used throughout your project, and then, for each source file,
a chained PCH containing the code specific to that file. Recompiling
that file is then very fast, and the data from the common headers is
not duplicated for every file. The mechanisms behind chained precompiled
headers are discussed in a <a href="#chained">later section</a>.</p>

<h2 id="contents">AST File Contents</h2>

<img src="PCHLayout.png" style="float:right" alt="Precompiled header layout">

<p>Clang's AST files are organized into several different
blocks, each of which contains the serialized representation of a part
of Clang's internal representation. Each of the blocks corresponds to
either a block or a record within <a
 href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitstream
format</a>. The contents of each of these logical blocks are described
below.</p>

<p>For a given AST file, the <a
href="http://llvm.org/cmds/llvm-bcanalyzer.html"><code>llvm-bcanalyzer</code></a>
utility can be used to examine the actual structure of the bitstream
for the AST file. This information can be used both to help
understand the structure of the AST file and to isolate
areas where AST files can still be optimized, e.g., through
the introduction of abbreviations.</p>

<h3 id="metadata">Metadata Block</h3>

<p>The metadata block contains several records that provide
information about how the AST file was built. This metadata
is primarily used to validate the use of an AST file. For
example, a precompiled header built for a 32-bit x86 target cannot be used
when compiling for a 64-bit x86 target. The metadata block contains
information about:</p>

<dl>
  <dt>Language options</dt>
  <dd>Describes the particular language dialect used to compile the
AST file, including major options (e.g., Objective-C support) and more
minor options (e.g., support for "//" comments). The contents of this
record correspond to the <code>LangOptions</code> class.</dd>

  <dt>Target architecture</dt>
  <dd>The target triple that describes the architecture, platform, and
ABI for which the AST file was generated, e.g.,
<code>i386-apple-darwin9</code>.</dd>

  <dt>AST version</dt>
  <dd>The major and minor version numbers of the AST file
format. Changes in the minor version number should not affect backward
compatibility, while changes in the major version number imply that a
newer compiler cannot read an older precompiled header (and
vice versa).</dd>

  <dt>Original file name</dt>
  <dd>The full path of the header that was used to generate the
AST file.</dd>

  <dt>Predefines buffer</dt>
  <dd>Although not explicitly stored as part of the metadata, the
predefines buffer is used in the validation of the AST file.
The predefines buffer itself contains code generated by the compiler
to initialize the preprocessor state according to the current target,
platform, and command-line options. For example, the predefines buffer
will contain "<code>#define __STDC__ 1</code>" when we are compiling C
without Microsoft extensions. The predefines buffer itself is stored
within the <a href="#sourcemgr">source manager block</a>, but its
contents are verified along with the rest of the metadata.</dd>

</dl>

<p>A chained PCH file (that is, one that references another PCH) and a
module (which may import other modules) have additional metadata
containing the list of all AST files that this AST file depends
on. Each of those files will be loaded along with this AST file.</p>

<p>For chained precompiled headers, the language options, target
architecture, and predefines buffer are taken from the end of the
chain, since they must match across the whole chain anyway.</p>
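<p>As a rough illustration of how this metadata might be checked, the C++
sketch below compares a hypothetical <code>ASTMetadata</code> record read from
an AST file against the current compilation. It rejects a major-version
mismatch but tolerates minor-version differences, and a single Objective-C
flag stands in for the full language options. All names are assumptions
rather than Clang's actual data structures, and Clang's real
predefines-buffer check may be more lenient than the exact comparison shown
here.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified view of the metadata an AST file records.
struct ASTMetadata {
  unsigned MajorVersion;
  unsigned MinorVersion;
  std::string TargetTriple;  // e.g. "i386-apple-darwin9"
  bool ObjectiveC;           // stands in for the whole LangOptions record
  std::string Predefines;    // predefines buffer (kept in the source manager block)
};

enum class Mismatch { None, Version, Target, LanguageOptions, Predefines };

// Compares the metadata stored in an AST file with the current compilation.
Mismatch validateASTFile(const ASTMetadata &File, const ASTMetadata &Current) {
  // Only the major version matters; minor version changes stay compatible.
  if (File.MajorVersion != Current.MajorVersion)
    return Mismatch::Version;
  if (File.TargetTriple != Current.TargetTriple)
    return Mismatch::Target;
  if (File.ObjectiveC != Current.ObjectiveC)
    return Mismatch::LanguageOptions;
  // Exact comparison is a simplification made for this sketch.
  if (File.Predefines != Current.Predefines)
    return Mismatch::Predefines;
  return Mismatch::None;
}
```

<p>Checking the small, decisive fields first lets a reader reject an
incompatible file before comparing anything larger, such as the predefines
buffer.</p>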

<h3 id="sourcemgr">Source Manager Block</h3>

<p>The source manager block contains the serialized representation of
Clang's <a
 href="InternalsManual.html#SourceLocation">SourceManager</a> class,
which handles the mapping from source locations (as represented in
Clang's abstract syntax tree) to actual line/column positions within
a source file or macro instantiation. The AST file's
representation of the source manager also includes information about
all of the headers that were (transitively) included when building the
AST file.</p>

<p>The bulk of the source manager block is dedicated to information
about the various files, buffers, and macro instantiations to which
a source location can refer. Each of these is referenced by a numeric
"file ID", which is a unique number (allocated starting at 1) stored
in the source location. Clang serializes the information for each kind
of file ID, along with an index that maps file IDs to the position
within the AST file where the information about that file ID is
stored. The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a
macro instantiation history inside the header itself.</p>

<p>The source manager block also contains information about all of the
headers that were included when building the AST file. This
includes information about the controlling macro for the header (e.g.,
when the preprocessor determined that the entire contents of the header
depend on a macro like <code>LLVM_CLANG_SOURCEMANAGER_H</code>)
along with a cached version of the results of the <code>stat()</code>
system calls performed when building the AST file. The
latter is particularly useful in reducing system time when searching
for include files.</p>
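<p>The effect of the cached <code>stat()</code> results can be sketched as a
lookup table consulted before the file system. In the C++ sketch below, the
<code>StatCache</code> class, the two-field <code>FileStatus</code> struct,
and the injected fallback function are all hypothetical simplifications; the
hit and miss counters correspond to the "stat cache hits" and "stat cache
misses" lines in the <code>-print-stats</code> output shown earlier.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical subset of the information returned by stat().
struct FileStatus {
  uint64_t Size;
  int64_t ModTime;
};

// Answers stat() queries from results recorded when the AST file was built,
// falling back to the real file system only when a path is not in the table.
class StatCache {
public:
  using StatFn = std::function<std::optional<FileStatus>(const std::string &)>;

  StatCache(std::unordered_map<std::string, FileStatus> Recorded, StatFn RealStat)
      : Entries(std::move(Recorded)), Fallback(std::move(RealStat)) {}

  std::optional<FileStatus> stat(const std::string &Path) {
    auto It = Entries.find(Path);
    if (It != Entries.end()) {
      ++Hits;  // answered without a system call
      return It->second;
    }
    ++Misses;
    return Fallback(Path);  // e.g. a wrapper around the real stat()
  }

  unsigned hits() const { return Hits; }
  unsigned misses() const { return Misses; }

private:
  std::unordered_map<std::string, FileStatus> Entries;
  StatFn Fallback;
  unsigned Hits = 0;
  unsigned Misses = 0;
};
```

<p>Injecting the fallback as a function keeps the sketch testable without
touching the disk; a real implementation would call the operating system
directly.</p>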

<h3 id="preprocessor">Preprocessor Block</h3>

<p>The preprocessor block contains the serialized representation of
the preprocessor. Specifically, it contains all of the macros that
have been defined by the end of the header used to build the
AST file, along with the token sequences that comprise each
macro. The macro definitions are only read from the AST file when the
name of the macro first occurs in the program. This lazy loading of
macro definitions is triggered by lookups into the <a
 href="#idtable">identifier table</a>.</p>
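<p>The following C++ sketch shows one way an identifier-table lookup can
trigger macro loading, assuming a toy representation: macro bodies are plain
strings rather than token sequences, and each identifier-table entry records
only whether the AST file defines a macro with that name and where its
definition lives. <code>LazyMacroTable</code> and
<code>IdentifierEntry</code> are hypothetical names, not Clang classes.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical identifier-table entry: does the AST file define a macro with
// this name, and if so, where is its definition in the preprocessor block?
struct IdentifierEntry {
  bool HasMacro = false;
  uint32_t MacroIndex = 0;
};

class LazyMacroTable {
public:
  LazyMacroTable(std::vector<std::string> Bodies,
                 std::unordered_map<std::string, IdentifierEntry> Ids)
      : SerializedBodies(std::move(Bodies)), IdTable(std::move(Ids)) {}

  // Called whenever the lexer sees an identifier. Returns the macro body,
  // or nullptr if the identifier does not name a macro.
  const std::string *lookup(const std::string &Name) {
    auto Done = Loaded.find(Name);
    if (Done != Loaded.end())
      return &Done->second;        // already deserialized
    auto It = IdTable.find(Name);  // the identifier table lookup...
    if (It == IdTable.end() || !It->second.HasMacro)
      return nullptr;
    // ...triggers reading this one macro definition.
    auto Inserted =
        Loaded.emplace(Name, SerializedBodies.at(It->second.MacroIndex)).first;
    ++NumMacrosRead;
    return &Inserted->second;
  }

  unsigned numMacrosRead() const { return NumMacrosRead; }

private:
  std::vector<std::string> SerializedBodies;  // stands in for the preprocessor block
  std::unordered_map<std::string, IdentifierEntry> IdTable;
  std::unordered_map<std::string, std::string> Loaded;  // macros read so far
  unsigned NumMacrosRead = 0;
};
```

<p>Identifiers that are not macros, or that the AST file does not know about,
cost one hash lookup and no deserialization.</p>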

<h3 id="types">Types Block</h3>

<p>The types block contains the serialized representation of all of
the types referenced in the translation unit. Each Clang type node
(<code>PointerType</code>, <code>FunctionProtoType</code>, etc.) has a
corresponding record type in the AST file. When types are deserialized
from the AST file, the data within the record is used to
reconstruct the appropriate type node using the AST context.</p>

<p>Each type has a type ID, an integer that uniquely identifies
that type. Type ID 0 represents the NULL type, type IDs
less than <code>NUM_PREDEF_TYPE_IDS</code> represent predefined types
(<code>void</code>, <code>float</code>, etc.), while other
"user-defined" type IDs are assigned consecutively from
<code>NUM_PREDEF_TYPE_IDS</code> upward as the types are encountered.
The AST file has an associated mapping from each user-defined type ID
to the location within the types block where the serialized
representation of that type resides, enabling lazy deserialization of
types. When a type is referenced from within the AST file, that
reference is encoded using the type ID shifted left by 3 bits. The
lower three bits are used to represent the <code>const</code>,
<code>volatile</code>, and <code>restrict</code> qualifiers, as in
Clang's <a
href="http://clang.llvm.org/docs/InternalsManual.html#Type">QualType</a>
class.</p>
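<p>The reference encoding can be illustrated with a few bit operations. The
C++ sketch below assumes, for illustration only, that <code>const</code>,
<code>restrict</code>, and <code>volatile</code> occupy bits 0, 1, and 2, and
it uses a made-up value in place of <code>NUM_PREDEF_TYPE_IDS</code>; the
function names are hypothetical rather than Clang's.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical qualifier bits for the low three bits of a type reference.
// The exact assignment is an assumption made for this sketch.
enum : uint32_t {
  QualConst = 0x1,
  QualRestrict = 0x2,
  QualVolatile = 0x4,
  QualMask = 0x7
};

// Stand-in for NUM_PREDEF_TYPE_IDS; the value here is made up.
constexpr uint32_t NumPredefTypeIDs = 100;

// The type ID goes in the upper bits, the qualifiers in the lower three bits.
constexpr uint32_t encodeTypeRef(uint32_t TypeID, uint32_t Quals) {
  return (TypeID << 3) | (Quals & QualMask);
}
constexpr uint32_t typeIDOf(uint32_t Ref) { return Ref >> 3; }
constexpr uint32_t qualsOf(uint32_t Ref) { return Ref & QualMask; }

// Predefined types have no entry in the ID-to-offset mapping.
constexpr bool isPredefined(uint32_t TypeID) {
  return TypeID < NumPredefTypeIDs;
}

// For a user-defined type, its position in the ID-to-offset mapping.
constexpr uint32_t userTypeIndex(uint32_t TypeID) {
  return TypeID - NumPredefTypeIDs;
}
```

<p>One consequence of this encoding is that a qualified type such as
<code>const volatile int</code> needs no record of its own: it shares the
type ID of the unqualified type and differs only in the low bits of the
reference. With 32-bit references, as in this sketch, 29 bits remain for the
type ID.</p>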

<h3 id="decls">Declarations Block</h3>

<p>The declarations block contains the serialized representation of
all of the declarations referenced in the translation unit. Each Clang
declaration node (<code>VarDecl</code>, <code>FunctionDecl</code>,
etc.) has a corresponding record type in the AST file. When
declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding <code>Decl</code> node. As with types, each declaration
node has a numeric ID that is used to refer to that declaration within
the AST file. In addition, a lookup table provides a mapping from that
numeric ID to the offset within the AST file where that
declaration is described.</p>

<p>Declarations in Clang's abstract syntax trees are stored
hierarchically. At the top of the hierarchy is the translation unit
(<code>TranslationUnitDecl</code>), which contains all of the
declarations in the translation unit but is not actually written as a
specific declaration node. Its child declarations (such as
functions or struct types) may also contain other declarations inside
them, and so on. Within Clang, each declaration is stored within a <a
href="http://clang.llvm.org/docs/InternalsManual.html#DeclContext">declaration
context</a>, as represented by the <code>DeclContext</code> class.
Declaration contexts provide the mechanism to perform name lookup
within a given declaration (e.g., find the member named <code>x</code>
in a structure) and iterate over the declarations stored within a
context (e.g., iterate over all of the fields of a structure for
structure layout).</p>

<p>In Clang's AST file format, deserializing a declaration
that is a <code>DeclContext</code> is a separate operation from
deserializing all of the declarations stored within that declaration
context. Therefore, Clang will deserialize the translation unit
declaration without deserializing the declarations within that
translation unit. When required, the declarations stored within a
declaration context will be deserialized. There are two representations
of the declarations within a declaration context, which correspond to
the name-lookup and iteration behavior described above:</p>

<ul>
  <li>When the front end performs name lookup to find a name
  <code>x</code> within a given declaration context (for example,
  during semantic analysis of the expression <code>p-&gt;x</code>,
  where <code>p</code>'s type is defined in the precompiled header),
  Clang refers to an on-disk hash table that maps from the names
  within that declaration context to the declaration IDs that
  represent each visible declaration with that name. The actual
  declarations will then be deserialized to provide the results of
  name lookup.</li>

  <li>When the front end performs iteration over all of the
  declarations within a declaration context, all of those declarations
  are immediately deserialized. For large declaration contexts (e.g.,
  the translation unit), this operation is expensive; however, large
  declaration contexts are not traversed during normal compilation, since
  such a traversal is unnecessary. It is common, though, for the code
  generator and semantic analysis to traverse the declaration contexts of
  structs, classes, unions, and enumerations, which typically contain
  relatively few declarations.</li>
</ul>
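<p>The difference in cost between the two representations can be made
concrete with a small C++ sketch. Here a hypothetical
<code>SerializedDeclContext</code> holds both a name-to-IDs table (for
lookup) and a flat list of IDs (for iteration), and a hypothetical
<code>DeclStore</code> counts how many declarations are deserialized. Real
on-disk hash tables and declaration records are far more involved, and this
sketch does not cache declarations it has already deserialized.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using DeclID = uint32_t;

// Hypothetical stand-in for the declarations block: "deserializing" a
// declaration copies its text out of this table and bumps a counter.
struct DeclStore {
  std::vector<std::string> Records;  // indexed by DeclID
  unsigned NumDeserialized = 0;

  std::string deserialize(DeclID ID) {
    std::string Decl = Records.at(ID);
    ++NumDeserialized;
    return Decl;
  }
};

// The two serialized views of a declaration context's contents.
struct SerializedDeclContext {
  std::unordered_map<std::string, std::vector<DeclID>> Visible;  // name -> IDs
  std::vector<DeclID> Lexical;                                   // all IDs, in order
};

// Name lookup: only the declarations with the requested name are read.
std::vector<std::string> lookupName(DeclStore &Store,
                                    const SerializedDeclContext &DC,
                                    const std::string &Name) {
  std::vector<std::string> Result;
  auto It = DC.Visible.find(Name);
  if (It != DC.Visible.end())
    for (DeclID ID : It->second)
      Result.push_back(Store.deserialize(ID));
  return Result;
}

// Iteration: every declaration in the context is read.
std::vector<std::string> allDecls(DeclStore &Store,
                                  const SerializedDeclContext &DC) {
  std::vector<std::string> Result;
  for (DeclID ID : DC.Lexical)
    Result.push_back(Store.deserialize(ID));
  return Result;
}
```

<p>Looking up one name reads only the matching declarations, while iterating
reads every declaration in the context; for a context as large as a
translation unit, that gap is why lookup, not iteration, is the common path.</p>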
| 374 | |
Douglas Gregor | 923cb23 | 2009-06-03 18:35:59 +0000 | [diff] [blame] | 375 | <h3 id="stmt">Statements and Expressions</h3> |
Douglas Gregor | 5accbb9 | 2009-06-03 16:06:22 +0000 | [diff] [blame] | 376 | |
Douglas Gregor | 72e4f25 | 2012-09-16 01:44:02 +0000 | [diff] [blame] | 377 | <p>Statements and expressions are stored in the AST file in |
Douglas Gregor | 5accbb9 | 2009-06-03 16:06:22 +0000 | [diff] [blame] | 378 | both the <a href="#types">types</a> and the <a |
| 379 | href="#decls">declarations</a> blocks, because every statement or |
| 380 | expression will be associated with either a type or declaration. The |
| 381 | actual statement and expression records are stored immediately |
| 382 | following the declaration or type that owns the statement or |
| 383 | expression. For example, the statement representing the body of a |
| 384 | function will be stored directly following the declaration of the |
| 385 | function.</p> |
| 386 | |
| 387 | <p>As with types and declarations, each statement and expression kind |
| 388 | in Clang's abstract syntax tree (<code>ForStmt</code>, |
| 389 | <code>CallExpr</code>, etc.) has a corresponding record type in the |
Douglas Gregor | 72e4f25 | 2012-09-16 01:44:02 +0000 | [diff] [blame] | 390 | AST file, which contains the serialized representation of |
Douglas Gregor | fe3f223 | 2009-06-03 18:26:16 +0000 | [diff] [blame] | 391 | that statement or expression. Each substatement or subexpression |
| 392 | within an expression is stored as a separate record (which keeps most |
Douglas Gregor | 72e4f25 | 2012-09-16 01:44:02 +0000 | [diff] [blame] | 393 | records to a fixed size). Within the AST file, the |
Argyrios Kyrtzidis | 86d3ca5 | 2010-09-13 17:48:02 +0000 | [diff] [blame] | 394 | subexpressions of an expression are stored, in reverse order, prior to the expression |
Douglas Gregor | fe3f223 | 2009-06-03 18:26:16 +0000 | [diff] [blame] | 395 | that owns those expression, using a form of <a |
| 396 | href="http://en.wikipedia.org/wiki/Reverse_Polish_notation">Reverse |
| 397 | Polish Notation</a>. For example, an expression <code>3 - 4 + 5</code> |
| 398 | would be represented as follows:</p> |

<table border="1">
<tr><td><code>IntegerLiteral(5)</code></td></tr>
<tr><td><code>IntegerLiteral(4)</code></td></tr>
<tr><td><code>IntegerLiteral(3)</code></td></tr>
<tr><td><code>BinaryOperator(-)</code></td></tr>
<tr><td><code>BinaryOperator(+)</code></td></tr>
<tr><td>STOP</td></tr>
</table>

<p>When reading this representation, Clang reads each expression
record in turn, builds the appropriate abstract syntax tree node,
and then pushes that expression onto a stack. When a record contains <i>N</i>
subexpressions (<code>BinaryOperator</code> has two of them), those
expressions are popped from the top of the stack. Because the
subexpressions were written in reverse order, the first subexpression
(e.g., the left-hand side of a <code>BinaryOperator</code>) is the
first one popped. The special STOP
code indicates that we have reached the end of a serialized expression
or statement; other expression or statement records may follow, but
they are part of a different expression.</p>
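<p>The write and read sides of this scheme can be sketched in C++. This is a minimal illustration, not Clang's <code>ASTWriter</code> or <code>ASTReader</code> code: the <code>Node</code> and <code>Record</code> types and the <code>writeExpr</code>, <code>readExpr</code>, <code>lit</code>, <code>bin</code>, and <code>print</code> functions are hypothetical, and records are kept in an in-memory vector rather than a bitstream. The writer emits an expression's operands in reverse order before the expression itself; the reader pushes each node it builds onto a stack and pops <i>N</i> operands for a record with <i>N</i> subexpressions.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified AST node (not Clang's Expr hierarchy).
struct Node {
  std::string kind;   // e.g. "IntegerLiteral" or "BinaryOperator"
  std::string value;  // literal text or operator spelling
  std::vector<std::unique_ptr<Node>> children;  // operands in source order
};

// Hypothetical flat record: one per node; kind "STOP" ends an expression.
struct Record {
  std::string kind;
  std::string value;
  std::size_t numChildren;
};

std::unique_ptr<Node> lit(const std::string &v) {
  auto n = std::make_unique<Node>();
  n->kind = "IntegerLiteral";
  n->value = v;
  return n;
}

std::unique_ptr<Node> bin(const std::string &op, std::unique_ptr<Node> lhs,
                          std::unique_ptr<Node> rhs) {
  auto n = std::make_unique<Node>();
  n->kind = "BinaryOperator";
  n->value = op;
  n->children.push_back(std::move(lhs));
  n->children.push_back(std::move(rhs));
  return n;
}

// Post-order walk that visits operands in reverse, so each operand's records
// precede its parent and the first operand ends up closest to the parent.
void writeExpr(const Node &n, std::vector<Record> &out) {
  for (auto it = n.children.rbegin(); it != n.children.rend(); ++it)
    writeExpr(**it, out);
  out.push_back({n.kind, n.value, n.children.size()});
}

// Rebuilds one expression starting at `pos`; stops at (and consumes) STOP.
std::unique_ptr<Node> readExpr(const std::vector<Record> &in, std::size_t &pos) {
  std::vector<std::unique_ptr<Node>> stack;
  while (pos < in.size() && in[pos].kind != "STOP") {
    const Record &r = in[pos++];
    auto n = std::make_unique<Node>();
    n->kind = r.kind;
    n->value = r.value;
    assert(stack.size() >= r.numChildren && "malformed record stream");
    // The top of the stack holds the first operand (e.g. the LHS).
    for (std::size_t i = 0; i < r.numChildren; ++i) {
      n->children.push_back(std::move(stack.back()));
      stack.pop_back();
    }
    stack.push_back(std::move(n));
  }
  assert(pos < in.size() && "missing STOP record");
  ++pos;  // consume STOP
  assert(stack.size() == 1 && "expected exactly one root expression");
  return std::move(stack.back());
}

// Prints binary expressions fully parenthesized, e.g. "((3 - 4) + 5)".
std::string print(const Node &n) {
  if (n.children.empty()) return n.value;
  return "(" + print(*n.children[0]) + " " + n.value + " " +
         print(*n.children[1]) + ")";
}
```

<p>Serializing <code>(3 - 4) + 5</code> with <code>writeExpr</code> and then appending a STOP record produces the same record order as the table above, and <code>readExpr</code> rebuilds the original tree from it.</p>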

<h3 id="idtable">Identifier Table Block</h3>

<p>The identifier table block contains an on-disk hash table that maps
each identifier mentioned within the AST file to the
serialized representation of the identifier's information (i.e., the
contents of its <code>IdentifierInfo</code> structure). The serialized
representation contains:</p>

<ul>
  <li>The actual identifier string.</li>
  <li>Flags that describe whether this identifier is the name of a
  built-in, a poisoned identifier, an extension token, or a
  macro.</li>
  <li>If the identifier names a macro, the offset of the macro
  definition within the <a href="#preprocessor">preprocessor
  block</a>.</li>
  <li>If the identifier names one or more declarations visible from
  translation unit scope, the <a href="#decls">declaration IDs</a> of these
  declarations.</li>
</ul>

<p>When an AST file is loaded, the AST file reader
mechanism registers itself with the identifier table as an external
lookup source. Thus, when the user program refers to an identifier
that has not yet been seen, Clang will look that identifier up in the
AST file's identifier table. If the identifier is found, its contents (macro
definitions, flags, top-level declarations, etc.) will be
deserialized, at which point the corresponding
<code>IdentifierInfo</code> structure will have the same contents it
would have after parsing the headers in the AST file.</p>
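<p>The external lookup source arrangement can be sketched as an in-memory identifier table that consults an external source only when it misses. This is a hypothetical illustration: <code>IdentInfo</code>, <code>ExternalIdentifierSource</code>, <code>IdentifierTableSketch</code>, and <code>MapSource</code> are invented names, and the real <code>IdentifierInfoLookup</code> interface and on-disk hash table differ in detail.</p>

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for IdentifierInfo.
struct IdentInfo {
  std::string name;
  bool isMacro = false;
  bool fromASTFile = false;
};

// Hypothetical external lookup interface; not IdentifierInfoLookup's real API.
struct ExternalIdentifierSource {
  virtual ~ExternalIdentifierSource() = default;
  // Returns true and fills `out` if `name` is stored in the AST file.
  virtual bool find(const std::string &name, IdentInfo &out) = 0;
};

class IdentifierTableSketch {
public:
  explicit IdentifierTableSketch(ExternalIdentifierSource *ext) : ext_(ext) {}

  // The first mention of a name consults the external source; later
  // mentions are answered from the in-memory table.
  IdentInfo &get(const std::string &name) {
    auto it = table_.find(name);
    if (it != table_.end()) return it->second;
    IdentInfo info;
    info.name = name;
    if (ext_ && ext_->find(name, info)) info.fromASTFile = true;
    return table_.emplace(name, info).first->second;
  }

private:
  ExternalIdentifierSource *ext_;
  std::unordered_map<std::string, IdentInfo> table_;
};

// Hypothetical in-memory stand-in for the AST file's identifier table that
// counts how often it is queried.
struct MapSource : ExternalIdentifierSource {
  std::unordered_map<std::string, IdentInfo> stored;
  int queries = 0;
  bool find(const std::string &name, IdentInfo &out) override {
    ++queries;
    auto it = stored.find(name);
    if (it == stored.end()) return false;
    out = it->second;
    return true;
  }
};
```

<p>In this sketch each identifier is looked up in the external source at most once; afterwards the in-memory table answers directly, which is the property that keeps deserialization lazy.</p>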

<p>Within the AST file, identifiers are represented by an integral
value, the identifier ID. A separate table provides a mapping from
this integral value to the location within the on-disk hash table
where that identifier is stored. This mapping is used when
deserializing the name of a declaration, the identifier of a token, or
any other construct in the AST file that refers to a name.</p>
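<p>A minimal sketch of the ID-to-location table, assuming (for illustration only) that IDs start at 1 with 0 meaning "no identifier", and that each identifier is stored as a NUL-terminated string at its offset within a byte buffer standing in for the on-disk hash table. <code>IdentifierIDMap</code> and <code>IdentID</code> are hypothetical names. Each name is decoded on first use and then cached.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using IdentID = std::uint32_t;  // hypothetical: 0 means "no identifier"

class IdentifierIDMap {
public:
  // `blob` stands in for the on-disk hash table; offsets[i] is where the
  // identifier with ID i+1 starts (stored NUL-terminated in this sketch).
  IdentifierIDMap(std::string blob, std::vector<std::uint32_t> offsets)
      : blob_(std::move(blob)), offsets_(std::move(offsets)),
        cache_(offsets_.size()), loaded_(offsets_.size(), false) {}

  // Decodes the identifier on first use and caches it afterwards.
  const std::string *get(IdentID id) {
    if (id == 0 || id > offsets_.size()) return nullptr;
    std::size_t idx = id - 1;
    if (!loaded_[idx]) {
      cache_[idx] = std::string(blob_.c_str() + offsets_[idx]);
      loaded_[idx] = true;
    }
    return &cache_[idx];
  }

private:
  std::string blob_;
  std::vector<std::uint32_t> offsets_;
  std::vector<std::string> cache_;
  std::vector<bool> loaded_;
};
```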

<h3 id="method-pool">Method Pool Block</h3>

<p>The method pool block is represented as an on-disk hash table that
serves two purposes: it provides a mapping from the names of
Objective-C selectors to the set of Objective-C instance and class
methods that have that particular selector (which is required for
semantic analysis in Objective-C) and also stores all of the selectors
used by entities within the AST file. The design of the
method pool is similar to that of the <a href="#idtable">identifier
table</a>: the first time a particular selector is formed during the
compilation of the program, Clang will search the on-disk hash
table of selectors; if the selector is found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data
structure (<code>Sema::InstanceMethodPool</code> and
<code>Sema::FactoryMethodPool</code> for instance and class methods,
respectively).</p>

<p>As with identifiers, selectors are represented by numeric values
within the AST file. A separate index maps these numeric selector
values to the offset of the selector within the on-disk hash table;
this index is used when deserializing an Objective-C method declaration
(or other Objective-C construct) that refers to the selector.</p>
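<p>The two roles of the method pool block can be sketched as follows. The sketch is hypothetical and heavily simplified: <code>MethodLists</code> and <code>MethodPoolSketch</code> are invented names, methods are represented only by strings, and ordinary hash maps stand in both for the on-disk hash table and for the front-end pools (<code>Sema::InstanceMethodPool</code> and <code>Sema::FactoryMethodPool</code>).</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical: the methods sharing one selector, split by kind.
struct MethodLists {
  std::vector<std::string> instanceMethods;  // e.g. "-[NSArray count]"
  std::vector<std::string> classMethods;     // e.g. "+[NSArray array]"
};

class MethodPoolSketch {
public:
  // Stand-in for the on-disk hash table, keyed by selector name.
  std::unordered_map<std::string, MethodLists> onDisk;
  // Stand-in for the selector ID index: ID i+1 names selectorsByID[i].
  std::vector<std::string> selectorsByID;
  // Stand-in for the front-end pools, filled lazily.
  std::unordered_map<std::string, MethodLists> loaded;

  // Returns the methods for `sel`, copying them from the on-disk stand-in
  // into the front-end pool the first time the selector is seen.
  const MethodLists *lookupSelector(const std::string &sel) {
    auto it = loaded.find(sel);
    if (it != loaded.end()) return &it->second;
    auto disk = onDisk.find(sel);
    if (disk == onDisk.end()) return nullptr;
    return &loaded.emplace(sel, disk->second).first->second;
  }

  // Resolves a selector ID read from a serialized method declaration.
  const std::string *selectorForID(std::uint32_t id) const {
    if (id == 0 || id > selectorsByID.size()) return nullptr;
    return &selectorsByID[id - 1];
  }
};
```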

<h2 id="tendrils">AST Reader Integration Points</h2>

<p>The "lazy" deserialization behavior of AST files requires
their integration into several completely different submodules of
Clang. For example, lazily deserializing the declarations during name
lookup requires that the name-lookup routines be able to query the
AST file to find entities stored there.</p>

<p>For each Clang data structure that requires direct interaction with
the AST reader logic, there is an abstract class that provides
the interface between the two modules. The <code>ASTReader</code>
class, which handles the loading of an AST file, inherits
from all of these abstract classes to provide lazy deserialization of
Clang's data structures. <code>ASTReader</code> implements the
following abstract classes:</p>

<dl>
  <dt><code>StatSysCallCache</code></dt>
  <dd>This abstract interface is associated with the
    <code>FileManager</code> class, and is used whenever the file
    manager is going to perform a <code>stat()</code> system call.</dd>

  <dt><code>ExternalSLocEntrySource</code></dt>
  <dd>This abstract interface is associated with the
    <code>SourceManager</code> class, and is used whenever the
    <a href="#sourcemgr">source manager</a> needs to load the details
    of a file, buffer, or macro instantiation.</dd>

  <dt><code>IdentifierInfoLookup</code></dt>
  <dd>This abstract interface is associated with the
    <code>IdentifierTable</code> class, and is used whenever the
    program source refers to an identifier that has not yet been seen.
    In this case, the AST reader searches for
    this identifier within its <a href="#idtable">identifier table</a>
    to load any top-level declarations or macros associated with that
    identifier.</dd>

  <dt><code>ExternalASTSource</code></dt>
  <dd>This abstract interface is associated with the
    <code>ASTContext</code> class, and is used whenever the abstract
    syntax tree nodes need to be loaded from the AST file. It
    provides the ability to deserialize declarations and types
    identified by their numeric values, read the bodies of functions
    when required, and read the declarations stored within a
    declaration context (either for iteration or for name lookup).</dd>

  <dt><code>ExternalSemaSource</code></dt>
  <dd>This abstract interface is associated with the <code>Sema</code>
    class, and is used whenever semantic analysis needs to read
    information from the <a href="#method-pool">global method
    pool</a>.</dd>
</dl>
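<p>The design pattern described above, a single reader object implementing several narrow abstract interfaces, each handed to a different client data structure, can be sketched as follows. All interface and class names here are hypothetical; the real <code>IdentifierInfoLookup</code>, <code>ExternalASTSource</code>, and <code>ExternalSemaSource</code> classes have different and much richer signatures, and the hard-coded answers stand in for lookups into an AST file.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified interfaces, one per client data structure.
struct IdentifierLookupIface {
  virtual ~IdentifierLookupIface() = default;
  virtual bool hasIdentifier(const std::string &name) = 0;
};

struct ASTSourceIface {
  virtual ~ASTSourceIface() = default;
  virtual std::string getDeclName(unsigned declID) = 0;
};

struct SemaSourceIface {
  virtual ~SemaSourceIface() = default;
  virtual std::vector<std::string> methodsFor(const std::string &selector) = 0;
};

// A single reader implements all of the interfaces. The hard-coded answers
// stand in for lookups into the AST file.
class ReaderSketch : public IdentifierLookupIface,
                     public ASTSourceIface,
                     public SemaSourceIface {
public:
  bool hasIdentifier(const std::string &name) override {
    return name == "printf";
  }
  std::string getDeclName(unsigned declID) override {
    return declID == 1 ? "printf" : "";
  }
  std::vector<std::string> methodsFor(const std::string &selector) override {
    if (selector == "count") return {"-[NSArray count]"};
    return {};
  }
};

// Hypothetical clients: each sees only the interface it needs.
struct IdentifierTableClient { IdentifierLookupIface *external = nullptr; };
struct ASTContextClient { ASTSourceIface *external = nullptr; };
struct SemaClient { SemaSourceIface *external = nullptr; };
```

<p>Each client depends only on its own interface, so, for example, the identifier table never needs to know that the object behind its pointer can also answer method pool queries.</p>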

<h2 id="chained">Chained precompiled headers</h2>

<p>Chained precompiled headers were initially intended to improve the
performance of IDE-centric operations such as syntax highlighting and
code completion while a particular source file is being edited by the
user. To minimize the amount of reparsing required after a change to
the file, a form of precompiled header, called a precompiled
<i>preamble</i>, is automatically generated by parsing the beginning of
the source file up to and including its last <code>#include</code>
(i.e., all of the headers it includes). When only the source file
changes (and none of the headers it depends on), reparsing of that
source file can use the precompiled preamble and start parsing after
the <code>#include</code>s, so parsing time is
proportional to the size of the source file (rather than all of its
includes). However, the compilation of that translation unit
may already use a precompiled header: in this case, Clang will create
the precompiled preamble as a chained precompiled header that refers
to the original precompiled header. This drastically reduces the time
needed to serialize the precompiled preamble for use in reparsing,
because information already present in the original precompiled
header does not have to be written again.</p>

<p>Chained precompiled headers get their name because each precompiled header
can depend on one other precompiled header, forming a chain of
dependencies. A translation unit will then include the most recent
precompiled header in the chain (i.e., the one on which no other
precompiled header depends). This
linearity of dependencies is important for the semantic model of
chained precompiled headers, because the most-recent precompiled
header can provide information that overrides the information provided
by the precompiled headers it depends on, just like a header file
<code>B.h</code> that includes another header <code>A.h</code> can
modify the state produced by parsing <code>A.h</code>, e.g., by
<code>#undef</code>'ing a macro defined in <code>A.h</code>.</p>

<p>There are several ways in which chained precompiled headers
generalize the AST file model:</p>

<dl>
  <dt>Numbering of IDs</dt>
  <dd>Many different kinds of entities (identifiers, declarations,
    types, etc.) have ID numbers that start at 1 or some other
    predefined constant and grow upward. Each precompiled header records
    the maximum ID number it has assigned in each category. Then, when a
    new precompiled header is generated that depends on (chains to)
    another precompiled header, it will start counting at the next
    available ID number. This way, one can determine, given an ID
    number, which AST file actually contains the entity.</dd>

  <dt>Name lookup</dt>
  <dd>When writing a chained precompiled header, Clang attempts to
    write only information that has changed from the precompiled header
    on which it is based. This changes the lookup algorithm for the
    various tables, such as the <a href="#idtable">identifier table</a>:
    the search starts at the most-recent precompiled header. If no entry
    is found, lookup then proceeds to the identifier table in the
    precompiled header it depends on, and so on. Once a lookup
    succeeds, that result is considered definitive, overriding any
    results from earlier precompiled headers.</dd>

  <dt>Update records</dt>
  <dd>There are various ways in which a later precompiled header can
    modify the entities described in an earlier precompiled header. For
    example, later precompiled headers can add entries into the various
    name-lookup tables for the translation unit or namespaces, or add
    new categories to an Objective-C class. Each of these updates is
    captured in an "update record" that is stored in the chained
    precompiled header file and will be loaded along with the original
    entity.</dd>
</dl>
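<p>The ID numbering and name lookup rules for a chain can be sketched together. This minimal C++17 model rests on simplifying assumptions: <code>PCHSketch</code> and <code>ChainSketch</code> are hypothetical names, each file tracks a single ID category, and an in-memory map stands in for each file's identifier table. Each new file starts numbering where its predecessor stopped, so the owner of an ID can be found from the ranges, and lookup walks from the most recent file backwards, stopping at the first hit.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of one precompiled header in a chain (one ID category).
struct PCHSketch {
  std::string fileName;
  std::uint32_t firstID = 1;  // first ID this file assigned
  std::uint32_t numIDs = 0;   // how many IDs this file assigned
  std::unordered_map<std::string, std::string> identTable;  // name -> info
};

class ChainSketch {
public:
  // Adds a PCH that chains to the current most-recent one. Its IDs start
  // right after the largest ID assigned so far. The returned reference is
  // only valid until the next call to addPCH.
  PCHSketch &addPCH(const std::string &name, std::uint32_t numIDs) {
    std::uint32_t first =
        chain_.empty() ? 1 : chain_.back().firstID + chain_.back().numIDs;
    chain_.push_back(PCHSketch{name, first, numIDs, {}});
    return chain_.back();
  }

  // Determines which file owns the entity with the given ID.
  const PCHSketch *ownerOf(std::uint32_t id) const {
    for (const auto &p : chain_)
      if (id >= p.firstID && id < p.firstID + p.numIDs) return &p;
    return nullptr;
  }

  // Searches from the most recent PCH backwards; the first hit is definitive.
  std::optional<std::string> lookup(const std::string &name) const {
    for (auto it = chain_.rbegin(); it != chain_.rend(); ++it) {
      auto found = it->identTable.find(name);
      if (found != it->identTable.end()) return found->second;
    }
    return std::nullopt;
  }

private:
  std::vector<PCHSketch> chain_;
};
```

<p>For example, if a preamble chained onto a base precompiled header records a new entry for <code>FOO</code> (say, because it <code>#undef</code>s the macro), <code>lookup("FOO")</code> returns the preamble's entry and never consults the base file.</p>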

<h2 id="modules">Modules</h2>

<p>Modules generalize the chained precompiled header model yet
further, from a linear chain of precompiled headers to an arbitrary
directed acyclic graph (DAG) of AST files. All of the same techniques
used to make chained precompiled headers work (ID numbering, name
lookup, update records) are shared with modules. However, the DAG
nature of modules introduces a number of additional complications to
the model:</p>

<dl>
  <dt>Numbering of IDs</dt>
  <dd>The simple, linear numbering scheme used in chained precompiled
    headers falls apart with the module DAG, because different modules
    may end up with different numbering schemes for entities they
    imported from common shared modules. To account for this, each
    module file provides information about which modules it depends on
    and which ID numbers it assigned to the entities in those modules,
    as well as which ID numbers it took for its own new entities. The
    AST reader then maps these "local" ID numbers into a "global" ID
    number space for the current translation unit, providing a 1-1
    mapping between entities (in whatever AST file they inhabit) and
    global ID numbers. If that translation unit is then serialized into
    an AST file, this mapping will be stored for use when the AST file
    is imported.</dd>

  <dt>Declaration merging</dt>
  <dd>It is possible for a given entity (from the language's
    perspective) to be declared multiple times in different places. For
    example, two different headers can both declare
    <tt>printf</tt> or forward-declare <tt>struct stat</tt>. If
    each of those headers is included in a module, and some third party
    imports both of those modules, there is a potentially serious
    problem: name lookup for <tt>printf</tt> or <tt>struct stat</tt> will
    find both declarations, but the AST nodes are unrelated. This would
    result in a compilation error, due to an ambiguity in name
    lookup. Therefore, the AST reader performs declaration merging
    according to the appropriate language semantics, ensuring that the
    two disjoint declarations are merged into a single redeclaration
    chain (with a common canonical declaration), so that it is as if one
    of the headers had been included before the other.</dd>

  <dt>Name visibility</dt>
  <dd>Modules allow certain names that occur during module creation to
    be "hidden", so that they are not part of the public interface of
    the module and are not visible to its clients. The AST reader
    maintains a "visible" bit on various AST nodes (declarations, macros,
    etc.) to indicate whether that particular AST node is currently
    visible; the various name lookup mechanisms in Clang inspect the
    visible bit to determine whether that entity, which is still in the
    AST (because other, visible AST nodes may depend on it), can
    actually be found by name lookup. When a new (sub)module is
    imported, it may make existing, non-visible, already-deserialized
    AST nodes visible; it is the responsibility of the AST reader to
    find and update these AST nodes when it is notified of the import.</dd>
</dl>
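<p>The local-to-global ID mapping for modules can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from Clang: every module numbers its own entities starting at 1, each module file describes its local numbering as a list of contiguous ranges (<code>LocalRange</code>, a hypothetical type), and the reader hands out blocks of global IDs in load order. <code>GlobalIDMapSketch</code> is likewise a hypothetical name.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical: a contiguous block of local IDs that one module file used
// for entities owned by some module (possibly the file's own module).
struct LocalRange {
  std::string ownerModule;   // module that actually defines these entities
  std::uint32_t localStart;  // first ID of the block in this file's numbering
  std::uint32_t ownerStart;  // same entity's ID in the owner's own numbering
  std::uint32_t count;       // number of IDs in the block
};

class GlobalIDMapSketch {
public:
  // Reserves a block of global IDs for a module's own entities, in load order.
  void loadModule(const std::string &name, std::uint32_t numOwnEntities) {
    globalBase_[name] = nextGlobal_;
    nextGlobal_ += numOwnEntities;
  }

  // Maps a local ID from one module file (described by `ranges`) into the
  // translation unit's global ID space; returns 0 if it cannot be mapped.
  std::uint32_t toGlobal(const std::vector<LocalRange> &ranges,
                         std::uint32_t localID) const {
    for (const auto &r : ranges) {
      if (localID < r.localStart || localID >= r.localStart + r.count)
        continue;
      auto base = globalBase_.find(r.ownerModule);
      if (base == globalBase_.end()) return 0;  // owner not loaded
      // Owner-relative ID, assuming each owner numbers its entities from 1.
      std::uint32_t ownerID = r.ownerStart + (localID - r.localStart);
      return base->second + ownerID - 1;
    }
    return 0;
  }

private:
  std::map<std::string, std::uint32_t> globalBase_;
  std::uint32_t nextGlobal_ = 1;  // 0 is reserved for "no entity"
};
```

<p>For example, if modules B and C both import A but B gives A's second entity local ID 2 while C gives it local ID 4, <code>toGlobal</code> maps both to the same global ID, which is what lets the reader treat them as a single entity.</p>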

</div>

</body>
</html>