========================================
Precompiled Header and Modules Internals
========================================

.. contents::
   :local:

This document describes the design and implementation of Clang's precompiled
headers (PCH) and modules. If you are interested in the end-user view, please
see the :ref:`User's Manual <usersmanual-precompiled-headers>`.

Using Precompiled Headers with ``clang``
----------------------------------------

The Clang compiler frontend, ``clang -cc1``, supports two command line options
for generating and using PCH files.

To generate PCH files using ``clang -cc1``, use the option :option:`-emit-pch`:

.. code-block:: bash

  $ clang -cc1 test.h -emit-pch -o test.h.pch

This option is transparently used by ``clang`` when generating PCH files. The
resulting PCH file contains the serialized form of the compiler's internal
representation after it has completed parsing and semantic analysis. The PCH
file can then be used as a prefix header with the :option:`-include-pch`
option:

.. code-block:: bash

  $ clang -cc1 -include-pch test.h.pch test.c -o test.s

Design Philosophy
-----------------

Precompiled headers are meant to improve overall compile times for projects, so
the design of precompiled headers is entirely driven by performance concerns.
The use case for precompiled headers is relatively simple: when there is a
common set of headers that is included in nearly every source file in the
project, we *precompile* that bundle of headers into a single precompiled
header (PCH file). Then, when compiling the source files in the project, we
load the PCH file first (as a prefix header), which acts as a stand-in for that
bundle of headers.

A precompiled header implementation improves performance when:

* Loading the PCH file is significantly faster than re-parsing the bundle of
  headers stored within the PCH file. Thus, a precompiled header design
  attempts to minimize the cost of reading the PCH file. Ideally, this cost
  should not vary with the size of the precompiled header file.

* The cost of generating the PCH file initially is not so large that it
  counters the per-source-file performance improvement due to eliminating the
  need to parse the bundled headers in the first place. This is particularly
  important on multi-core systems, because PCH file generation serializes the
  build when all compilations require the PCH file to be up-to-date.

Modules, as implemented in Clang, use the same mechanisms as precompiled
headers to save a serialized AST file (one per module) and use those AST
modules. From an implementation standpoint, modules are a generalization of
precompiled headers, lifting a number of restrictions placed on precompiled
headers. In particular, there can only be one precompiled header and it must
be included at the beginning of the translation unit. The extensions to the
AST file format required for modules are discussed in the section on
:ref:`modules <pchinternals-modules>`.

Clang's AST files are designed with a compact on-disk representation, which
minimizes both creation time and the time required to initially load the AST
file. The AST file itself contains a serialized representation of Clang's
abstract syntax trees and supporting data structures, stored using the same
compressed bitstream as `LLVM's bitcode file format
<http://llvm.org/docs/BitCodeFormat.html>`_.

Clang's AST files are loaded "lazily" from disk. When an AST file is initially
loaded, Clang reads only a small amount of data from the AST file to establish
where certain important data structures are stored. The amount of data read in
this initial load is independent of the size of the AST file, such that a
larger AST file does not lead to longer AST load times. The actual header data
in the AST file --- macros, functions, variables, types, etc. --- is loaded
only when it is referenced from the user's code, at which point only that
entity (and those entities it depends on) is deserialized from the AST file.
With this approach, the cost of using an AST file for a translation unit is
proportional to the amount of code actually used from the AST file, rather than
being proportional to the size of the AST file itself.
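
The on-demand behavior described above can be illustrated with a minimal C++
sketch. It is not Clang's implementation: ``Record``, ``LazyFile``, and
``loadedCount`` are hypothetical names, and the serialized payload is reduced
to a string. The sketch only shows that the work performed is driven by the
entities actually requested (plus their dependencies), not by the number of
records in the file.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <string>
  #include <unordered_map>
  #include <utility>
  #include <vector>

  // Hypothetical stand-in for one serialized entity in an AST file.
  struct Record {
    std::string payload;              // serialized form, simplified to a string
    std::vector<std::uint32_t> deps;  // indices of records this one depends on
  };

  class LazyFile {
    std::vector<Record> Records;                           // the "on-disk" data
    std::unordered_map<std::uint32_t, std::string> Loaded; // deserialized so far

  public:
    explicit LazyFile(std::vector<Record> records)
        : Records(std::move(records)) {}

    // Deserialize record `id` and, transitively, everything it depends on.
    const std::string &get(std::uint32_t id) {
      auto it = Loaded.find(id);
      if (it != Loaded.end())
        return it->second;  // already deserialized; no further work
      const Record &r = Records.at(id);
      // Insert before recursing so that cyclic dependencies terminate.
      const std::string &slot = Loaded.emplace(id, r.payload).first->second;
      for (std::uint32_t dep : r.deps)
        get(dep);
      return slot;
    }

    std::size_t loadedCount() const { return Loaded.size(); }
  };

With three records where record 1 depends on record 0, calling ``get(1)``
deserializes two records and leaves the third untouched.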

When given the :option:`-print-stats` option, Clang produces statistics
describing how much of the AST file was actually loaded from disk. For a
simple "Hello, World!" program that includes the Apple ``Cocoa.h`` header
(which is built as a precompiled header), this option illustrates how little of
the actual precompiled header is required:

.. code-block:: none

  *** AST File Statistics:
    895/39981 source location entries read (2.238563%)
    19/15315 types read (0.124061%)
    20/82685 declarations read (0.024188%)
    154/58070 identifiers read (0.265197%)
    0/7260 selectors read (0.000000%)
    0/30842 statements read (0.000000%)
    4/8400 macros read (0.047619%)
    1/4995 lexical declcontexts read (0.020020%)
    0/4413 visible declcontexts read (0.000000%)
    0/7230 method pool entries read (0.000000%)
    0 method pool misses

For this small program, only a tiny fraction of the source locations, types,
declarations, identifiers, and macros were actually deserialized from the
precompiled header. These statistics can be useful to determine whether the
AST file implementation can be improved by making more of the implementation
lazy.

Precompiled headers can be chained. When you create a PCH while including an
existing PCH, Clang can create the new PCH by referencing the original file and
only writing the new data to the new file. For example, you could create a PCH
out of all the headers that are very commonly used throughout your project, and
then create a PCH for every single source file in the project that includes the
code that is specific to that file, so that recompiling the file itself is very
fast, without duplicating the data from the common headers for every file. The
mechanisms behind chained precompiled headers are discussed in a :ref:`later
section <pchinternals-chained>`.

AST File Contents
-----------------

Clang's AST files are organized into several different blocks, each of which
contains the serialized representation of a part of Clang's internal
representation. Each of the blocks corresponds to either a block or a record
within `LLVM's bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_.
The contents of each of these logical blocks are described below.

.. image:: PCHLayout.png

For a given AST file, the `llvm-bcanalyzer
<http://llvm.org/docs/CommandGuide/llvm-bcanalyzer.html>`_ utility can be used
to examine the actual structure of the bitstream for the AST file. This
information can be used both to help understand the structure of the AST file
and to isolate areas where AST files can still be optimized, e.g., through the
introduction of abbreviations.

Metadata Block
^^^^^^^^^^^^^^

The metadata block contains several records that provide information about how
the AST file was built. This metadata is primarily used to validate the use of
an AST file. For example, a precompiled header built for a 32-bit x86 target
cannot be used when compiling for a 64-bit x86 target. The metadata block
contains information about:

Language options
  Describes the particular language dialect used to compile the AST file,
  including major options (e.g., Objective-C support) and more minor options
  (e.g., support for "``//``" comments). The contents of this record correspond to
  the ``LangOptions`` class.

Target architecture
  The target triple that describes the architecture, platform, and ABI for
  which the AST file was generated, e.g., ``i386-apple-darwin9``.

AST version
  The major and minor version numbers of the AST file format. Changes in the
  minor version number should not affect backward compatibility, while changes
  in the major version number imply that a newer compiler cannot read an older
  precompiled header (and vice-versa).

Original file name
  The full path of the header that was used to generate the AST file.

Predefines buffer
  Although not explicitly stored as part of the metadata, the predefines buffer
  is used in the validation of the AST file. The predefines buffer itself
  contains code generated by the compiler to initialize the preprocessor state
  according to the current target, platform, and command-line options. For
  example, the predefines buffer will contain "``#define __STDC__ 1``" when we
  are compiling C without Microsoft extensions. The predefines buffer itself
  is stored within the :ref:`pchinternals-sourcemgr`, but its contents are
  verified along with the rest of the metadata.

A chained PCH file (that is, one that references another PCH) and a module
(which may import other modules) have additional metadata containing the list
of all AST files that this AST file depends on. Each of those files will be
loaded along with this AST file.

For chained precompiled headers, the language options, target architecture, and
predefines buffer data are taken from the end of the chain, since they have to
match anyway.
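
As a rough illustration of how such metadata could gate the use of an AST file,
the following C++ sketch compares the metadata recorded in a file with the
settings of the current compilation. The ``Metadata`` structure, the
``Validation`` enumeration, and ``validate`` are hypothetical; Clang's actual
checks cover many more options and are more nuanced than plain equality. The
sketch follows the rule stated above that only a major version change makes a
file unreadable.

.. code-block:: c++

  #include <string>

  // Hypothetical, heavily reduced view of the metadata block.
  struct Metadata {
    unsigned versionMajor;
    unsigned versionMinor;
    std::string targetTriple;  // e.g. "i386-apple-darwin9"
    bool objC;                 // stand-in for a "major" language option
    bool lineComments;         // stand-in for a "minor" language option
    std::string predefines;    // contents of the predefines buffer
  };

  enum class Validation {
    OK,
    VersionMismatch,
    TargetMismatch,
    LangOptionsMismatch,
    PredefinesMismatch
  };

  // Decide whether an AST file built with `file` settings may be used by a
  // compilation running with `current` settings.
  Validation validate(const Metadata &file, const Metadata &current) {
    // A minor version difference is tolerated; a major one is not.
    if (file.versionMajor != current.versionMajor)
      return Validation::VersionMismatch;
    if (file.targetTriple != current.targetTriple)
      return Validation::TargetMismatch;
    if (file.objC != current.objC || file.lineComments != current.lineComments)
      return Validation::LangOptionsMismatch;
    if (file.predefines != current.predefines)
      return Validation::PredefinesMismatch;
    return Validation::OK;
  }

Under these assumptions, a file built for ``i386-apple-darwin9`` is rejected
with ``TargetMismatch`` when the current compilation targets a 64-bit triple.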

.. _pchinternals-sourcemgr:

Source Manager Block
^^^^^^^^^^^^^^^^^^^^

The source manager block contains the serialized representation of Clang's
:ref:`SourceManager <SourceManager>` class, which handles the mapping from
source locations (as represented in Clang's abstract syntax tree) into actual
column/line positions within a source file or macro instantiation. The AST
file's representation of the source manager also includes information about all
of the headers that were (transitively) included when building the AST file.

The bulk of the source manager block is dedicated to information about the
various files, buffers, and macro instantiations to which a source location
can refer. Each of these is referenced by a numeric "file ID", which is a
unique number (allocated starting at 1) stored in the source location. Clang
serializes the information for each kind of file ID, along with an index that
maps file IDs to the position within the AST file where the information about
that file ID is stored. The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a macro
instantiation history inside the header itself.

The source manager block also contains information about all of the headers
that were included when building the AST file. This includes information about
the controlling macro for the header (e.g., when the preprocessor identified
that the contents of the header depend on a macro like
``LLVM_CLANG_SOURCEMANAGER_H``).

.. _pchinternals-preprocessor:

Preprocessor Block
^^^^^^^^^^^^^^^^^^

The preprocessor block contains the serialized representation of the
preprocessor. Specifically, it contains all of the macros that have been
defined by the end of the header used to build the AST file, along with the
token sequences that comprise each macro. The macro definitions are only read
from the AST file when the name of the macro first occurs in the program. This
lazy loading of macro definitions is triggered by lookups into the
:ref:`identifier table <pchinternals-ident-table>`.

.. _pchinternals-types:

Types Block
^^^^^^^^^^^

The types block contains the serialized representation of all of the types
referenced in the translation unit. Each Clang type node (``PointerType``,
``FunctionProtoType``, etc.) has a corresponding record type in the AST file.
When types are deserialized from the AST file, the data within the record is
used to reconstruct the appropriate type node using the AST context.

Each type has a unique type ID, which is an integer that uniquely identifies
that type. Type ID 0 represents the NULL type, type IDs less than
``NUM_PREDEF_TYPE_IDS`` represent predefined types (``void``, ``float``, etc.),
while other "user-defined" type IDs are assigned consecutively from
``NUM_PREDEF_TYPE_IDS`` upward as the types are encountered. The AST file has
an associated mapping from each user-defined type ID to the location within
the types block where the serialized representation of that type resides,
enabling lazy deserialization of types. When a type is referenced from within
the AST file, that reference is encoded using the type ID shifted left by 3
bits. The lower three bits are used to represent the ``const``, ``volatile``,
and ``restrict`` qualifiers, as in Clang's :ref:`QualType <QualType>` class.
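
The encoding of a type reference can be sketched in a few lines of C++. The
helper names ``encodeTypeRef`` and ``decodeTypeRef`` and the ``TypeRef``
structure are hypothetical, and the assignment of individual qualifiers to
specific bits is an assumption made for illustration; only the layout (type ID
in the upper bits, three qualifier bits at the bottom) comes from the
description above.

.. code-block:: c++

  #include <cstdint>

  // Assumed bit assignment for the three qualifiers (illustrative only).
  enum Qualifier : std::uint32_t {
    Const = 0x1,
    Restrict = 0x2,
    Volatile = 0x4
  };

  constexpr unsigned QualifierBits = 3;
  constexpr std::uint32_t QualifierMask = (1u << QualifierBits) - 1;

  struct TypeRef {
    std::uint32_t typeID;  // index of the unqualified type
    std::uint32_t quals;   // bitwise OR of Qualifier values
  };

  // Pack a type ID and its qualifiers into one integer.
  constexpr std::uint32_t encodeTypeRef(std::uint32_t typeID,
                                        std::uint32_t quals) {
    return (typeID << QualifierBits) | (quals & QualifierMask);
  }

  // Split an encoded reference back into type ID and qualifiers.
  constexpr TypeRef decodeTypeRef(std::uint32_t encoded) {
    return TypeRef{encoded >> QualifierBits, encoded & QualifierMask};
  }

One consequence of this layout is that differently qualified references to the
same type share a single serialized type record, since the qualifiers travel
with the reference rather than with the type.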

.. _pchinternals-decls:

Declarations Block
^^^^^^^^^^^^^^^^^^

The declarations block contains the serialized representation of all of the
declarations referenced in the translation unit. Each Clang declaration node
(``VarDecl``, ``FunctionDecl``, etc.) has a corresponding record type in the
AST file. When declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding ``Decl`` node. As with types, each declaration node has a
numeric ID that is used to refer to that declaration within the AST file. In
addition, a lookup table provides a mapping from that numeric ID to the offset
within the precompiled header where that declaration is described.

Declarations in Clang's abstract syntax trees are stored hierarchically. At
the top of the hierarchy is the translation unit (``TranslationUnitDecl``),
which contains all of the declarations in the translation unit but is not
actually written as a specific declaration node. Its child declarations (such
as functions or struct types) may also contain other declarations inside them,
and so on. Within Clang, each declaration is stored within a :ref:`declaration
context <DeclContext>`, as represented by the ``DeclContext`` class.
Declaration contexts provide the mechanism to perform name lookup within a
given declaration (e.g., find the member named ``x`` in a structure) and
iterate over the declarations stored within a context (e.g., iterate over all
of the fields of a structure for structure layout).

In Clang's AST file format, deserializing a declaration that is a
``DeclContext`` is a separate operation from deserializing all of the
declarations stored within that declaration context. Therefore, Clang will
deserialize the translation unit declaration without deserializing the
declarations within that translation unit. When required, the declarations
stored within a declaration context will be deserialized. There are two
representations of the declarations within a declaration context, which
correspond to the name-lookup and iteration behavior described above:

* When the front end performs name lookup to find a name ``x`` within a given
  declaration context (for example, during semantic analysis of the expression
  ``p->x``, where ``p``'s type is defined in the precompiled header), Clang
  refers to an on-disk hash table that maps from the names within that
  declaration context to the declaration IDs that represent each visible
  declaration with that name. The actual declarations will then be
  deserialized to provide the results of name lookup.
* When the front end performs iteration over all of the declarations within a
  declaration context, all of those declarations are immediately
  deserialized. For large declaration contexts (e.g., the translation unit),
  this operation is expensive; however, large declaration contexts are not
  traversed in normal compilation, since such a traversal is unnecessary.
  The code generator and semantic analysis do commonly traverse the
  declaration contexts for structs, classes, unions, and enumerations, but
  those contexts contain relatively few declarations in the common case.

Statements and Expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^

Statements and expressions are stored in the AST file in both the :ref:`types
<pchinternals-types>` and the :ref:`declarations <pchinternals-decls>` blocks,
because every statement or expression will be associated with either a type or
declaration. The actual statement and expression records are stored
immediately following the declaration or type that owns the statement or
expression. For example, the statement representing the body of a function
will be stored directly following the declaration of the function.

As with types and declarations, each statement and expression kind in Clang's
abstract syntax tree (``ForStmt``, ``CallExpr``, etc.) has a corresponding
record type in the AST file, which contains the serialized representation of
that statement or expression. Each substatement or subexpression within an
expression is stored as a separate record (which keeps most records to a fixed
size). Within the AST file, the subexpressions of an expression are stored, in
reverse order, prior to the expression that owns those subexpressions, using a
form of `Reverse Polish Notation
<http://en.wikipedia.org/wiki/Reverse_Polish_notation>`_. For example, an
expression ``3 - 4 + 5`` would be represented as follows:

+-----------------------+
| ``IntegerLiteral(5)`` |
+-----------------------+
| ``IntegerLiteral(4)`` |
+-----------------------+
| ``IntegerLiteral(3)`` |
+-----------------------+
| ``BinaryOperator(-)`` |
+-----------------------+
| ``BinaryOperator(+)`` |
+-----------------------+
| ``STOP``              |
+-----------------------+

When reading this representation, Clang evaluates each expression record it
encounters, builds the appropriate abstract syntax tree node, and then pushes
that expression onto a stack. When a record contains *N* subexpressions ---
``BinaryOperator`` has two of them --- those expressions are popped from the
top of the stack. The special STOP code indicates that we have reached the end
of a serialized expression or statement; other expression or statement records
may follow, but they are part of a different expression.
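
The stack-based reading scheme can be sketched as follows. This is a
simplified model, not Clang's reader: ``Record``, ``Expr``, ``readExpr``, and
``evaluate`` are hypothetical, only integer literals and binary ``+``/``-``
are modeled, and the sketch assumes that the first operand popped is the
left-hand side, which is what the reverse storage order in the table above
implies.

.. code-block:: c++

  #include <memory>
  #include <stdexcept>
  #include <utility>
  #include <vector>

  // Hypothetical serialized record: a literal, a binary operator, or STOP.
  struct Record {
    enum Kind { IntegerLiteral, BinaryOperator, Stop };
    Kind kind;
    int value;  // used by IntegerLiteral
    char op;    // used by BinaryOperator: '+' or '-'
  };

  // Hypothetical AST node rebuilt from the records.
  struct Expr {
    bool isLiteral = false;
    int value = 0;
    char op = 0;
    std::unique_ptr<Expr> lhs, rhs;
  };

  // Rebuild one expression tree from records, stopping at the STOP record.
  std::unique_ptr<Expr> readExpr(const std::vector<Record> &records) {
    std::vector<std::unique_ptr<Expr>> stack;
    for (const Record &r : records) {
      switch (r.kind) {
      case Record::IntegerLiteral: {
        auto e = std::make_unique<Expr>();
        e->isLiteral = true;
        e->value = r.value;
        stack.push_back(std::move(e));
        break;
      }
      case Record::BinaryOperator: {
        if (stack.size() < 2)
          throw std::runtime_error("operator record without two operands");
        auto e = std::make_unique<Expr>();
        e->op = r.op;
        // Subexpressions were written in reverse, so the LHS is on top.
        e->lhs = std::move(stack.back());
        stack.pop_back();
        e->rhs = std::move(stack.back());
        stack.pop_back();
        stack.push_back(std::move(e));
        break;
      }
      case Record::Stop:
        if (stack.size() != 1)
          throw std::runtime_error("STOP with an incomplete expression");
        return std::move(stack.back());
      }
    }
    throw std::runtime_error("missing STOP record");
  }

  // Evaluate the rebuilt tree, to check that its shape is the expected one.
  int evaluate(const Expr &e) {
    if (e.isLiteral)
      return e.value;
    int l = evaluate(*e.lhs);
    int r = evaluate(*e.rhs);
    return e.op == '+' ? l + r : l - r;
  }

Feeding the six records from the table into ``readExpr`` yields a tree shaped
like ``(3 - 4) + 5``, which ``evaluate`` reduces to 4.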

.. _pchinternals-ident-table:

Identifier Table Block
^^^^^^^^^^^^^^^^^^^^^^

The identifier table block contains an on-disk hash table that maps each
identifier mentioned within the AST file to the serialized representation of
the identifier's information (e.g., the ``IdentifierInfo`` structure). The
serialized representation contains:

* The actual identifier string.
* Flags that describe whether this identifier is the name of a built-in, a
  poisoned identifier, an extension token, or a macro.
* If the identifier names a macro, the offset of the macro definition within
  the :ref:`pchinternals-preprocessor`.
* If the identifier names one or more declarations visible from translation
  unit scope, the :ref:`declaration IDs <pchinternals-decls>` of these
  declarations.

When an AST file is loaded, the AST file reader mechanism introduces itself
into the identifier table as an external lookup source. Thus, when the user
program refers to an identifier that has not yet been seen, Clang will perform
a lookup into the identifier table. If an identifier is found, its contents
(macro definitions, flags, top-level declarations, etc.) will be deserialized,
at which point the corresponding ``IdentifierInfo`` structure will have the
same contents it would have after parsing the headers in the AST file.
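
The external-lookup arrangement can be sketched as below. All names
(``ToyIdentifierInfo``, ``ToyExternalLookup``, ``ToyIdentifierTable``,
``FakeASTFile``) are hypothetical stand-ins rather than Clang's classes, and
an in-memory map plays the role of the on-disk hash table. The sketch shows
the one property that matters here: the external source is consulted only the
first time an identifier is seen.

.. code-block:: c++

  #include <optional>
  #include <string>
  #include <unordered_map>
  #include <utility>

  // Hypothetical, reduced identifier information.
  struct ToyIdentifierInfo {
    std::string name;
    bool isMacro = false;
  };

  // Interface through which the table asks an AST file about unseen names.
  class ToyExternalLookup {
  public:
    virtual ~ToyExternalLookup() = default;
    virtual std::optional<ToyIdentifierInfo>
    lookup(const std::string &name) = 0;
  };

  class ToyIdentifierTable {
    std::unordered_map<std::string, ToyIdentifierInfo> Table;
    ToyExternalLookup *External = nullptr;

  public:
    void setExternalLookup(ToyExternalLookup *source) { External = source; }

    // Return the entry for `name`, consulting the external source on a miss.
    ToyIdentifierInfo &get(const std::string &name) {
      auto it = Table.find(name);
      if (it != Table.end())
        return it->second;  // seen before: no external lookup
      ToyIdentifierInfo info;
      info.name = name;
      if (External)
        if (auto found = External->lookup(name))
          info = *found;  // adopt the deserialized contents
      return Table.emplace(name, std::move(info)).first->second;
    }
  };

  // Stand-in for the AST reader; counts how often it is consulted.
  class FakeASTFile : public ToyExternalLookup {
    std::unordered_map<std::string, ToyIdentifierInfo> OnDisk;

  public:
    unsigned lookups = 0;

    void add(ToyIdentifierInfo info) {
      std::string key = info.name;
      OnDisk.emplace(std::move(key), std::move(info));
    }

    std::optional<ToyIdentifierInfo> lookup(const std::string &name) override {
      ++lookups;
      auto it = OnDisk.find(name);
      if (it == OnDisk.end())
        return std::nullopt;
      return it->second;
    }
  };

Looking up the same identifier twice through the table triggers a single
external lookup; identifiers absent from the file simply get a fresh, empty
entry.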

Within the AST file, the identifiers used to name declarations are represented
with an integral value. A separate table provides a mapping from this integral
value (the identifier ID) to the location within the on-disk hash table where
that identifier is stored. This mapping is used when deserializing the name of
a declaration, the identifier of a token, or any other construct in the AST
file that refers to a name.

.. _pchinternals-method-pool:

Method Pool Block
^^^^^^^^^^^^^^^^^

The method pool block is represented as an on-disk hash table that serves two
purposes: it provides a mapping from the names of Objective-C selectors to the
set of Objective-C instance and class methods that have that particular
selector (which is required for semantic analysis in Objective-C) and also
stores all of the selectors used by entities within the AST file. The design
of the method pool is similar to that of the :ref:`identifier table
<pchinternals-ident-table>`: the first time a particular selector is formed
during the compilation of the program, Clang will search in the on-disk hash
table of selectors; if found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data structure
(``Sema::InstanceMethodPool`` and ``Sema::FactoryMethodPool`` for instance and
class methods, respectively).

As with identifiers, selectors are represented by numeric values within the AST
file. A separate index maps these numeric selector values to the offset of the
selector within the on-disk hash table, and will be used when deserializing an
Objective-C method declaration (or other Objective-C construct) that refers to
the selector.

AST Reader Integration Points
-----------------------------

The "lazy" deserialization behavior of AST files requires their integration
into several completely different submodules of Clang. For example, lazily
deserializing the declarations during name lookup requires that the name-lookup
routines be able to query the AST file to find entities stored there.

For each Clang data structure that requires direct interaction with the AST
reader logic, there is an abstract class that provides the interface between
the two modules. The ``ASTReader`` class, which handles the loading of an AST
file, inherits from all of these abstract classes to provide lazy
deserialization of Clang's data structures. ``ASTReader`` implements the
following abstract classes:

``ExternalSLocEntrySource``
  This abstract interface is associated with the ``SourceManager`` class, and
  is used whenever the :ref:`source manager <pchinternals-sourcemgr>` needs to
  load the details of a file, buffer, or macro instantiation.

``IdentifierInfoLookup``
  This abstract interface is associated with the ``IdentifierTable`` class, and
  is used whenever the program source refers to an identifier that has not yet
  been seen. In this case, the AST reader searches for this identifier within
  its :ref:`identifier table <pchinternals-ident-table>` to load any top-level
  declarations or macros associated with that identifier.

``ExternalASTSource``
  This abstract interface is associated with the ``ASTContext`` class, and is
  used whenever the abstract syntax tree nodes need to be loaded from the AST
  file. It provides the ability to deserialize declarations and types
  identified by their numeric values, read the bodies of functions when
  required, and read the declarations stored within a declaration context
  (either for iteration or for name lookup).

``ExternalSemaSource``
  This abstract interface is associated with the ``Sema`` class, and is used
  whenever semantic analysis needs to read information from the :ref:`global
  method pool <pchinternals-method-pool>`.

.. _pchinternals-chained:

Chained precompiled headers
---------------------------

Chained precompiled headers were initially intended to improve the performance
of IDE-centric operations such as syntax highlighting and code completion while
a particular source file is being edited by the user. To minimize the amount
of reparsing required after a change to the file, a form of precompiled header
--- called a precompiled *preamble* --- is automatically generated by parsing
all of the headers in the source file, up to and including the last
``#include``. When only the source file changes (and none of the headers it
depends on), reparsing of that source file can use the precompiled preamble and
start parsing after the ``#include``\ s, so parsing time is proportional to the
size of the source file (rather than all of its includes). However, the
compilation of that translation unit may already use a precompiled header: in
this case, Clang will create the precompiled preamble as a chained precompiled
header that refers to the original precompiled header. This drastically
reduces the time needed to serialize the precompiled preamble for use in
reparsing.

Chained precompiled headers get their name because each precompiled header can
depend on one other precompiled header, forming a chain of dependencies. A
translation unit will then include the precompiled header that starts the chain
(i.e., nothing depends on it). This linearity of dependencies is important for
the semantic model of chained precompiled headers, because the most-recent
precompiled header can provide information that overrides the information
provided by the precompiled headers it depends on, just like a header file
``B.h`` that includes another header ``A.h`` can modify the state produced by
parsing ``A.h``, e.g., by ``#undef``'ing a macro defined in ``A.h``.

There are several ways in which chained precompiled headers generalize the AST
file model:

Numbering of IDs
  Many different kinds of entities --- identifiers, declarations, types, etc.
  --- have ID numbers that start at 1 or some other predefined constant and
  grow upward. Each precompiled header records the maximum ID number it has
  assigned in each category. Then, when a new precompiled header is generated
  that depends on (chains to) another precompiled header, it will start
  counting at the next available ID number. This way, one can determine, given
  an ID number, which AST file actually contains the entity.

Name lookup
  When writing a chained precompiled header, Clang attempts to write only
  information that has changed from the precompiled header on which it is
  based. This changes the lookup algorithm for the various tables, such as the
  :ref:`identifier table <pchinternals-ident-table>`: the search starts at the
  most-recent precompiled header. If no entry is found, lookup then proceeds
  to the identifier table in the precompiled header it depends on, and so on.
  Once a lookup succeeds, that result is considered definitive, overriding any
  results from earlier precompiled headers.

Update records
  There are various ways in which a later precompiled header can modify the
  entities described in an earlier precompiled header. For example, later
  precompiled headers can add entries into the various name-lookup tables for
  the translation unit or namespaces, or add new categories to an Objective-C
  class. Each of these updates is captured in an "update record" that is
  stored in the chained precompiled header file and will be loaded along with
  the original entity.
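
The ID numbering and name lookup rules for a chain can be sketched together in
C++. The ``ChainedFile`` structure and the functions ``fileForID`` and
``lookupName`` are hypothetical, the chain is modeled as a vector ordered from
the oldest file to the most recent one, and a plain map stands in for each
file's identifier table.

.. code-block:: c++

  #include <cstdint>
  #include <optional>
  #include <string>
  #include <unordered_map>
  #include <vector>

  // Hypothetical model of one precompiled header in a chain.
  struct ChainedFile {
    std::string name;
    std::uint32_t firstID;  // first ID assigned by this file
    std::uint32_t maxID;    // maximum ID assigned by this file
    // Only identifiers that are new or changed relative to earlier files.
    std::unordered_map<std::string, std::uint32_t> identifiers;
  };

  // Given an ID, find the file that contains the entity. Because each file
  // starts counting after its predecessor's maximum, the ranges are disjoint.
  const ChainedFile *fileForID(const std::vector<ChainedFile> &chain,
                               std::uint32_t id) {
    for (const ChainedFile &file : chain)
      if (id >= file.firstID && id <= file.maxID)
        return &file;
    return nullptr;
  }

  // Look a name up starting at the most recent file; the first hit wins and
  // overrides anything recorded in earlier files.
  std::optional<std::uint32_t>
  lookupName(const std::vector<ChainedFile> &chain, const std::string &name) {
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
      auto found = it->identifiers.find(name);
      if (found != it->identifiers.end())
        return found->second;
    }
    return std::nullopt;
  }

In a two-file chain where the later file re-records an identifier that the
earlier file also contains, ``lookupName`` returns the later file's entry,
while an identifier recorded only in the earlier file is still found by
falling through to it.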

.. _pchinternals-modules:

Modules
-------

Modules generalize the chained precompiled header model yet further, from a
linear chain of precompiled headers to an arbitrary directed acyclic graph
(DAG) of AST files. All of the same techniques used to make chained
precompiled headers work --- ID numbering, name lookup, update records --- are
shared with modules. However, the DAG nature of modules introduces a number of
additional complications to the model:

Numbering of IDs
  The simple, linear numbering scheme used in chained precompiled headers falls
  apart with the module DAG, because different modules may end up with
  different numbering schemes for entities they imported from common shared
  modules. To account for this, each module file provides information about
  which modules it depends on and which ID numbers it assigned to the entities
  in those modules, as well as which ID numbers it took for its own new
  entities. The AST reader then maps these "local" ID numbers into a "global"
  ID number space for the current translation unit, providing a 1-1 mapping
  between entities (in whatever AST file they inhabit) and global ID numbers.
  If that translation unit is then serialized into an AST file, this mapping
  will be stored for use when the AST file is imported.

Declaration merging
  It is possible for a given entity (from the language's perspective) to be
  declared multiple times in different places. For example, two different
  headers can have the declaration of ``printf`` or could forward-declare
  ``struct stat``. If each of those headers is included in a module, and some
  third party imports both of those modules, there is a potentially serious
  problem: name lookup for ``printf`` or ``struct stat`` will find both
  declarations, but the AST nodes are unrelated. This would result in a
  compilation error, due to an ambiguity in name lookup. Therefore, the AST
  reader performs declaration merging according to the appropriate language
  semantics, ensuring that the two disjoint declarations are merged into a
  single redeclaration chain (with a common canonical declaration), so that it
  is as if one of the headers had been included before the other.

Name Visibility
  Modules allow certain names that occur during module creation to be "hidden",
  so that they are not part of the public interface of the module and are not
  visible to its clients. The AST reader maintains a "visible" bit on various
  AST nodes (declarations, macros, etc.) to indicate whether that particular
  AST node is currently visible; the various name lookup mechanisms in Clang
  inspect the visible bit to determine whether that entity, which is still in
  the AST (because other, visible AST nodes may depend on it), can actually be
  found by name lookup. When a new (sub)module is imported, it may make
  existing, non-visible, already-deserialized AST nodes visible; it is the
  responsibility of the AST reader to find and update these AST nodes when it
  is notified of the import.
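
The local-to-global ID mapping can be sketched as follows. The names
``LocalRange``, ``ModuleFile``, and ``ToyReader`` are hypothetical, and the
sketch assumes that every module records, for itself and for each module it
depends on, one contiguous range of local IDs; the real data structures are
more involved. What the sketch demonstrates is that two modules which number
a shared dependency differently still resolve to the same global ID.

.. code-block:: c++

  #include <cstdint>
  #include <map>
  #include <stdexcept>
  #include <string>
  #include <vector>

  // A contiguous run of local IDs that a module assigned to the entities of
  // `owner` (either a dependency or the module itself).
  struct LocalRange {
    std::uint32_t localBase;
    std::uint32_t count;
    std::string owner;
  };

  struct ModuleFile {
    std::string name;
    std::vector<LocalRange> ranges;  // includes one range owned by `name`
  };

  class ToyReader {
    std::map<std::string, ModuleFile> Files;
    std::map<std::string, std::uint32_t> GlobalBase;  // module -> first global ID
    std::uint32_t NextGlobalID = 1;

  public:
    // Load a module once and reserve global IDs for its own entities.
    void load(const ModuleFile &file) {
      if (Files.count(file.name))
        return;  // already loaded through another import path
      for (const LocalRange &r : file.ranges) {
        if (r.owner == file.name) {
          GlobalBase[file.name] = NextGlobalID;
          NextGlobalID += r.count;
        }
      }
      Files.emplace(file.name, file);
    }

    // Translate an ID as written inside `module` into the global ID space.
    // Throws std::out_of_range if the module or the owning dependency has
    // not been loaded, or if the local ID lies in no recorded range.
    std::uint32_t toGlobal(const std::string &module,
                           std::uint32_t localID) const {
      const ModuleFile &file = Files.at(module);
      for (const LocalRange &r : file.ranges)
        if (localID >= r.localBase && localID - r.localBase < r.count)
          return GlobalBase.at(r.owner) + (localID - r.localBase);
      throw std::out_of_range("local ID not in any recorded range");
    }
  };

For example, if module ``A`` numbers the entities of a shared module
``Common`` as local IDs 1 to 10 while module ``B`` numbers them 21 to 30,
then ``A``'s local ID 3 and ``B``'s local ID 23 both map to the same global
ID under this scheme.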