<html>
<head>
<title>Precompiled Headers (PCH)</title>
<link type="text/css" rel="stylesheet" href="../menu.css" />
<link type="text/css" rel="stylesheet" href="../content.css" />
<style type="text/css">
td {
vertical-align: top;
}
</style>
</head>

<body>

<!--#include virtual="../menu.html.incl"-->

<div id="content">

<h1>Precompiled Headers</h1>

<p>This document describes the design and implementation of Clang's
precompiled headers (PCH). If you are interested in the end-user
view, please see the <a
href="UsersManual.html#precompiledheaders">User's Manual</a>.</p>

<p><b>Table of Contents</b></p>
<ul>
<li><a href="#usage">Using Precompiled Headers with
<tt>clang</tt></a></li>
<li><a href="#philosophy">Design Philosophy</a></li>
<li><a href="#contents">Precompiled Header Contents</a>
<ul>
<li><a href="#metadata">Metadata Block</a></li>
<li><a href="#sourcemgr">Source Manager Block</a></li>
<li><a href="#preprocessor">Preprocessor Block</a></li>
<li><a href="#types">Types Block</a></li>
<li><a href="#decls">Declarations Block</a></li>
<li><a href="#stmt">Statements and Expressions</a></li>
<li><a href="#idtable">Identifier Table Block</a></li>
<li><a href="#method-pool">Method Pool Block</a></li>
</ul>
</li>
<li><a href="#tendrils">Precompiled Header Integration
Points</a></li>
</ul>

<h2 id="usage">Using Precompiled Headers with <tt>clang</tt></h2>

<p>The Clang compiler frontend, <tt>clang -cc1</tt>, supports two command-line
options for generating and using PCH files.</p>

<p>To generate PCH files using <tt>clang -cc1</tt>, use the option
<b><tt>-emit-pch</tt></b>:</p>

<pre>
$ clang -cc1 test.h -emit-pch -o test.h.pch
</pre>

<p>This option is transparently used by <tt>clang</tt> when generating
PCH files. The resulting PCH file contains the serialized form of the
compiler's internal representation after it has completed parsing and
semantic analysis. The PCH file can then be used as a prefix header
with the <b><tt>-include-pch</tt></b> option:</p>

<pre>
$ clang -cc1 -include-pch test.h.pch test.c -o test.s
</pre>

<h2 id="philosophy">Design Philosophy</h2>

<p>Precompiled headers are meant to improve overall compile times for
projects, so the design of precompiled headers is entirely driven by
performance concerns. The use case for precompiled headers is
relatively simple: when there is a common set of headers that is
included in nearly every source file in the project, we
<i>precompile</i> that bundle of headers into a single precompiled
header (PCH file). Then, when compiling the source files in the
project, we load the PCH file first (as a prefix header), which acts
as a stand-in for that bundle of headers.</p>

<p>A precompiled header implementation improves performance when:</p>
<ul>
<li>Loading the PCH file is significantly faster than re-parsing the
bundle of headers stored within the PCH file. Thus, a precompiled
header design attempts to minimize the cost of reading the PCH
file. Ideally, this cost should not vary with the size of the
precompiled header file.</li>

<li>The cost of generating the PCH file initially is not so large
that it counters the per-source-file performance improvement due to
eliminating the need to parse the bundled headers in the first
place. This is particularly important on multi-core systems, because
PCH file generation serializes the build when all compilations
require the PCH file to be up-to-date.</li>
</ul>

<p>Clang's precompiled headers are designed with a compact on-disk
representation, which minimizes both PCH creation time and the time
required to initially load the PCH file. The PCH file itself contains
a serialized representation of Clang's abstract syntax trees and
supporting data structures, stored using the same compressed bitstream
as <a href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitcode
file format</a>.</p>

<p>Clang's precompiled headers are loaded "lazily" from disk. When a
PCH file is initially loaded, Clang reads only a small amount of data
from the PCH file to establish where certain important data structures
are stored. The amount of data read in this initial load is
independent of the size of the PCH file, such that a larger PCH file
does not lead to longer PCH load times. The actual header data in the
PCH file (macros, functions, variables, types, etc.) is loaded only
when it is referenced from the user's code, at which point only that
entity (and the entities it depends on) is deserialized from the
PCH file. With this approach, the cost of using a precompiled header
for a translation unit is proportional to the amount of code actually
used from the header, rather than being proportional to the size of
the header itself.</p>
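<p>The lazy-loading scheme above can be illustrated with a small C++
sketch. It is not Clang's implementation: the "file" is simulated as
an in-memory string of NUL-terminated records, and the names
<code>Entity</code> and <code>LazyEntityTable</code> are hypothetical.
Only the ID-to-offset index is available up front; an entity is
decoded the first time it is requested and cached afterward, so the
work done grows with the number of entities actually used rather than
with the size of the file.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a deserialized entity (a type, declaration, ...).
struct Entity {
  std::string Name;
};

// Sketch of lazy deserialization: only the ID -> offset index is consulted
// up front; each entity is decoded the first time it is requested, then cached.
class LazyEntityTable {
public:
  LazyEntityTable(std::string FileData, std::vector<std::size_t> Offsets)
      : FileData(std::move(FileData)), Offsets(std::move(Offsets)),
        Cache(this->Offsets.size()) {}

  const Entity &get(std::size_t ID) {
    assert(ID < Cache.size() && "entity ID out of range");
    std::unique_ptr<Entity> &Slot = Cache[ID];
    if (!Slot) {
      // "Deserialize" the NUL-terminated record stored at this entity's offset.
      std::size_t Begin = Offsets[ID];
      std::size_t End = FileData.find('\0', Begin);
      Slot = std::make_unique<Entity>(Entity{FileData.substr(Begin, End - Begin)});
      ++NumLoaded;
    }
    return *Slot;
  }

  // Analogous to the "N/M ... read" counters printed by -print-stats.
  std::size_t numLoaded() const { return NumLoaded; }
  std::size_t size() const { return Offsets.size(); }

private:
  std::string FileData;                        // the whole (simulated) file
  std::vector<std::size_t> Offsets;            // ID -> offset of its record
  std::vector<std::unique_ptr<Entity>> Cache;  // null until first use
  std::size_t NumLoaded = 0;
};
```

<p>Built over a three-record file, such a table reports
<code>numLoaded() == 0</code> until <code>get()</code> is called, and
requesting the same ID twice decodes the record only once.</p>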

<p>When given the <code>-print-stats</code> option, Clang produces
statistics describing how much of the precompiled header was actually
loaded from disk. For a simple "Hello, World!" program that includes
the Apple <code>Cocoa.h</code> header (which is built as a precompiled
header), this option illustrates how little of the actual precompiled
header is required:</p>

<pre>
*** PCH Statistics:
933 stat cache hits
4 stat cache misses
895/39981 source location entries read (2.238563%)
19/15315 types read (0.124061%)
20/82685 declarations read (0.024188%)
154/58070 identifiers read (0.265197%)
0/7260 selectors read (0.000000%)
0/30842 statements read (0.000000%)
4/8400 macros read (0.047619%)
1/4995 lexical declcontexts read (0.020020%)
0/4413 visible declcontexts read (0.000000%)
0/7230 method pool entries read (0.000000%)
0 method pool misses
</pre>

<p>For this small program, only a tiny fraction of the source
locations, types, declarations, identifiers, and macros were actually
deserialized from the precompiled header. These statistics can be
useful to determine whether the precompiled header implementation can
be improved by making more of the implementation lazy.</p>
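<p>Each percentage in these statistics is simply the number of entries
read divided by the total number of entries, times 100. The
hypothetical helper <code>formatStat</code> below is a sketch, not the
code Clang uses to print its statistics; it reproduces the format of
one line using <code>std::snprintf</code> and its <code>%f</code>
conversion (six decimal places).</p>

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats one line of a -print-stats style report, e.g.
// "19/15315 types read (0.124061%)". The helper name is hypothetical.
std::string formatStat(unsigned Read, unsigned Total, const char *What) {
  assert(Total != 0 && "a table with no entries has no meaningful ratio");
  double Percent = 100.0 * Read / Total;  // fraction of entries deserialized
  char Buf[128];
  std::snprintf(Buf, sizeof(Buf), "%u/%u %s read (%f%%)", Read, Total, What,
                Percent);
  return std::string(Buf);
}
```

<p>For example, <code>formatStat(19, 15315, "types")</code> yields
<code>19/15315 types read (0.124061%)</code>, matching the output
above.</p>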

<h2 id="contents">Precompiled Header Contents</h2>

<img src="PCHLayout.png" align="right" alt="Precompiled header layout">

<p>Clang's precompiled headers are organized into several different
blocks, each of which contains the serialized representation of a part
of Clang's internal representation. Each of the blocks corresponds to
either a block or a record within <a
href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitstream
format</a>. The contents of each of these logical blocks are described
below.</p>

<p>For a given precompiled header, the <a
href="http://llvm.org/cmds/llvm-bcanalyzer.html"><code>llvm-bcanalyzer</code></a>
utility can be used to examine the actual structure of the bitstream
for the precompiled header. This information can be used both to help
understand the structure of the precompiled header and to isolate
areas where precompiled headers can still be optimized, e.g., through
the introduction of abbreviations.</p>

<h3 id="metadata">Metadata Block</h3>

<p>The metadata block contains several records that provide
information about how the precompiled header was built. This metadata
is primarily used to validate the use of a precompiled header. For
example, a precompiled header built for a 32-bit x86 target cannot be used
when compiling for a 64-bit x86 target. The metadata block contains
information about:</p>

<dl>
<dt>Language options</dt>
<dd>Describes the particular language dialect used to compile the
PCH file, including major options (e.g., Objective-C support) and more
minor options (e.g., support for "//" comments). The contents of this
record correspond to the <code>LangOptions</code> class.</dd>

<dt>Target architecture</dt>
<dd>The target triple that describes the architecture, platform, and
ABI for which the PCH file was generated, e.g.,
<code>i386-apple-darwin9</code>.</dd>

<dt>PCH version</dt>
<dd>The major and minor version numbers of the precompiled header
format. Changes in the minor version number should not affect backward
compatibility, while changes in the major version number imply that a
newer compiler cannot read an older precompiled header (and
vice versa).</dd>

<dt>Original file name</dt>
<dd>The full path of the header that was used to generate the
precompiled header.</dd>

<dt>Predefines buffer</dt>
<dd>Although not explicitly stored as part of the metadata, the
predefines buffer is used in the validation of the precompiled header.
The predefines buffer itself contains code generated by the compiler
to initialize the preprocessor state according to the current target,
platform, and command-line options. For example, the predefines buffer
will contain "<code>#define __STDC__ 1</code>" when we are compiling C
without Microsoft extensions. The predefines buffer itself is stored
within the <a href="#sourcemgr">source manager block</a>, but its
contents are verified along with the rest of the metadata.</dd>

</dl>
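<p>The validation step can be sketched as a comparison between the
metadata recorded in the PCH file and the settings of the current
compilation. The sketch below is a simplification under stated
assumptions: <code>PCHMetadata</code>, <code>PCHCheck</code>, and
<code>validatePCH</code> are hypothetical names, two booleans stand in
for the full set of language options, and the predefines buffers are
simply compared for equality.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of the metadata block's contents.
struct PCHMetadata {
  unsigned VersionMajor;
  unsigned VersionMinor;
  std::string TargetTriple;  // e.g. "i386-apple-darwin9"
  bool ObjC;                 // stand-ins for a few LangOptions fields
  bool BCPLComments;         // support for "//" comments
  std::string Predefines;    // contents of the predefines buffer
};

enum class PCHCheck {
  OK,
  VersionMismatch,
  TargetMismatch,
  LangOptsMismatch,
  PredefinesMismatch
};

// Only a major-version change makes the file unreadable; the minor version is
// ignored. The target, language options, and predefines must all match.
PCHCheck validatePCH(const PCHMetadata &File, const PCHMetadata &Current) {
  if (File.VersionMajor != Current.VersionMajor)
    return PCHCheck::VersionMismatch;
  if (File.TargetTriple != Current.TargetTriple)
    return PCHCheck::TargetMismatch;
  if (File.ObjC != Current.ObjC || File.BCPLComments != Current.BCPLComments)
    return PCHCheck::LangOptsMismatch;
  if (File.Predefines != Current.Predefines)  // simplifying assumption
    return PCHCheck::PredefinesMismatch;
  return PCHCheck::OK;
}
```

<p>With this sketch, a PCH file built for <code>i386-apple-darwin9</code>
is rejected with <code>TargetMismatch</code> when the current target is
<code>x86_64-apple-darwin9</code>, while a change in the minor version
alone is accepted.</p>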

<h3 id="sourcemgr">Source Manager Block</h3>

<p>The source manager block contains the serialized representation of
Clang's <a
href="InternalsManual.html#SourceLocation">SourceManager</a> class,
which handles the mapping from source locations (as represented in
Clang's abstract syntax tree) into actual column/line positions within
a source file or macro instantiation. The precompiled header's
representation of the source manager also includes information about
all of the headers that were (transitively) included when building the
precompiled header.</p>

<p>The bulk of the source manager block is dedicated to information
about the various files, buffers, and macro instantiations into which
a source location can refer. Each of these is referenced by a numeric
"file ID", which is a unique number (allocated starting at 1) stored
in the source location. Clang serializes the information for each kind
of file ID, along with an index that maps file IDs to the position
within the PCH file where the information about that file ID is
stored. The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a
macro instantiation history inside the header itself.</p>

<p>The source manager block also contains information about all of the
headers that were included when building the precompiled header. This
includes information about the controlling macro for the header (e.g.,
when the preprocessor identified that the contents of the header
depend on a macro like <code>LLVM_CLANG_SOURCEMANAGER_H</code>, i.e.,
its include guard) along with a cached version of the results of the
<code>stat()</code> system calls performed when building the
precompiled header. The latter is particularly useful in reducing
system time when searching for include files.</p>
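<p>A <code>stat()</code> cache of this kind can be sketched as a map
from paths to recorded results, with hit and miss counters like the
"stat cache hits/misses" lines in the statistics shown earlier. The
<code>FileStat</code> and <code>StatCache</code> types are hypothetical
and record only a size and a modification time; on a miss, a real
implementation would fall back to an actual system call.</p>

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical subset of what stat() reports for a file.
struct FileStat {
  long long Size;
  long long ModTime;
};

// Results are recorded while building the PCH file and replayed when it is
// used, so that lookups which hit the cache avoid a system call.
class StatCache {
public:
  void record(const std::string &Path, FileStat S) { Entries[Path] = S; }

  // Returns the cached result, or std::nullopt on a miss (where the caller
  // would perform a real stat() call).
  std::optional<FileStat> lookup(const std::string &Path) {
    auto It = Entries.find(Path);
    if (It == Entries.end()) {
      ++Misses;
      return std::nullopt;
    }
    ++Hits;
    return It->second;
  }

  unsigned hits() const { return Hits; }
  unsigned misses() const { return Misses; }

private:
  std::map<std::string, FileStat> Entries;
  unsigned Hits = 0;
  unsigned Misses = 0;
};
```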

<h3 id="preprocessor">Preprocessor Block</h3>

<p>The preprocessor block contains the serialized representation of
the preprocessor. Specifically, it contains all of the macros that
have been defined by the end of the header used to build the
precompiled header, along with the token sequences that comprise each
macro. The macro definitions are only read from the PCH file when the
name of the macro first occurs in the program. This lazy loading of
macro definitions is triggered by lookups into the <a
href="#idtable">identifier table</a>.</p>

<h3 id="types">Types Block</h3>

<p>The types block contains the serialized representation of all of
the types referenced in the translation unit. Each Clang type node
(<code>PointerType</code>, <code>FunctionProtoType</code>, etc.) has a
corresponding record type in the PCH file. When types are deserialized
from the precompiled header, the data within the record is used to
reconstruct the appropriate type node using the AST context.</p>

<p>Each type has a unique type ID, which is an integer that uniquely
identifies that type. Type ID 0 represents the NULL type, type IDs
less than <code>NUM_PREDEF_TYPE_IDS</code> represent predefined types
(<code>void</code>, <code>float</code>, etc.), while other
"user-defined" type IDs are assigned consecutively from
<code>NUM_PREDEF_TYPE_IDS</code> upward as the types are encountered.
The PCH file has an associated mapping from the user-defined type IDs
to the location within the types block where the serialized
representation of each type resides, enabling lazy deserialization of
types. When a type is referenced from within the PCH file, that
reference is encoded using the type ID shifted left by 3 bits. The
lower three bits are used to represent the <code>const</code>,
<code>volatile</code>, and <code>restrict</code> qualifiers, as in
Clang's <a
href="http://clang.llvm.org/docs/InternalsManual.html#Type">QualType</a>
class.</p>
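<p>The encoding of type references can be written out directly. In
this sketch the helper names are hypothetical, and the particular
assignment of qualifiers to the three low bits (const = 1, restrict =
2, volatile = 4) is an assumption made for illustration; only the
"shift left by 3, qualifiers in the low bits" layout comes from the
description above.</p>

```cpp
#include <cassert>
#include <cstdint>

// Qualifier bits stored in the low three bits of an encoded type reference.
// The specific bit assignment is an assumption for illustration.
enum QualBits : std::uint32_t {
  QualConst = 0x1,
  QualRestrict = 0x2,
  QualVolatile = 0x4,
  QualMask = 0x7
};

// Encodes a (type ID, qualifiers) pair: the type ID shifted left by 3 bits,
// with the qualifiers occupying the freed-up low bits.
std::uint32_t encodeTypeRef(std::uint32_t TypeID, std::uint32_t Quals) {
  assert((Quals & ~QualMask) == 0 && "only three qualifier bits are available");
  assert(TypeID < (1u << 29) && "type ID would overflow after shifting");
  return (TypeID << 3) | Quals;
}

// Recovers the two halves of an encoded reference.
std::uint32_t typeIDOf(std::uint32_t Ref) { return Ref >> 3; }
std::uint32_t qualsOf(std::uint32_t Ref) { return Ref & QualMask; }
```

<p>For example, a <code>const volatile</code> reference to type ID 17
encodes as (17 &lt;&lt; 3) | 5 = 141, and decoding 141 yields type ID 17
and qualifier bits 5 again.</p>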

<h3 id="decls">Declarations Block</h3>

<p>The declarations block contains the serialized representation of
all of the declarations referenced in the translation unit. Each Clang
declaration node (<code>VarDecl</code>, <code>FunctionDecl</code>,
etc.) has a corresponding record type in the PCH file. When
declarations are deserialized from the precompiled header, the data
within the record is used to build and populate a new instance of the
corresponding <code>Decl</code> node. As with types, each declaration
node has a numeric ID that is used to refer to that declaration within
the PCH file. In addition, a lookup table provides a mapping from that
numeric ID to the offset within the precompiled header where that
declaration is described.</p>

<p>Declarations in Clang's abstract syntax trees are stored
hierarchically. At the top of the hierarchy is the translation unit
(<code>TranslationUnitDecl</code>), which contains all of the
declarations in the translation unit. These declarations (such as
functions or struct types) may also contain other declarations inside
them, and so on. Within Clang, each declaration is stored within a <a
href="http://clang.llvm.org/docs/InternalsManual.html#DeclContext">declaration
context</a>, as represented by the <code>DeclContext</code> class.
Declaration contexts provide the mechanism to perform name lookup
within a given declaration (e.g., find the member named <code>x</code>
in a structure) and iterate over the declarations stored within a
context (e.g., iterate over all of the fields of a structure for
structure layout).</p>

<p>In Clang's precompiled header format, deserializing a declaration
that is a <code>DeclContext</code> is a separate operation from
deserializing all of the declarations stored within that declaration
context. Therefore, Clang will deserialize the translation unit
declaration without deserializing the declarations within that
translation unit. When required, the declarations stored within a
declaration context will be deserialized. There are two representations
of the declarations within a declaration context, which correspond to
the name-lookup and iteration behavior described above:</p>

<ul>
<li>When the front end performs name lookup to find a name
<code>x</code> within a given declaration context (for example,
during semantic analysis of the expression <code>p->x</code>,
where <code>p</code>'s type is defined in the precompiled header),
Clang deserializes a hash table mapping from the names within that
declaration context to the declaration IDs that represent each
visible declaration with that name. The entire hash table is
deserialized at this point (into the <code>llvm::DenseMap</code>
stored within each <code>DeclContext</code> object), but the actual
declarations are not yet deserialized. In a second step, those
declarations with the name <code>x</code> will be deserialized and
will be used as the result of name lookup.</li>

<li>When the front end performs iteration over all of the
declarations within a declaration context, all of those declarations
are immediately deserialized. For large declaration contexts (e.g.,
the translation unit), this operation is expensive; however, large
declaration contexts are not traversed in normal compilation, since
such a traversal is unnecessary. The code generator and semantic
analysis do commonly traverse the declaration contexts of structs,
classes, unions, and enumerations, but those contexts contain
relatively few declarations in the common case.</li>
</ul>
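<p>The two access paths can be contrasted in a short sketch. Nothing
here is a real Clang type: <code>SerializedContext</code> stands in for
the on-disk data of one declaration context, and
<code>deserialize()</code> merely records which declaration IDs would
have been materialized. Name lookup reads the whole name-to-ID table
once but deserializes only the matching declarations, whereas
iteration deserializes every declaration in the context.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using DeclID = unsigned;

// Hypothetical serialized data for one declaration context: a name -> decl-ID
// table (for lookup) and the full list of decl IDs (for iteration).
struct SerializedContext {
  std::map<std::string, std::vector<DeclID>> VisibleDecls;
  std::vector<DeclID> LexicalDecls;
};

class LazyDeclContext {
public:
  explicit LazyDeclContext(const SerializedContext &S) : Source(S) {}

  // Name lookup: load the entire table once, then deserialize only the
  // declarations that carry the requested name.
  std::vector<DeclID> lookup(const std::string &Name) {
    if (!LookupTableLoaded) {
      LookupTable = Source.VisibleDecls;  // step 1: the whole table
      LookupTableLoaded = true;
    }
    auto It = LookupTable.find(Name);
    if (It == LookupTable.end())
      return {};
    for (DeclID ID : It->second)
      deserialize(ID);  // step 2: only the matching declarations
    return It->second;
  }

  // Iteration: every declaration in the context is deserialized.
  std::vector<DeclID> decls() {
    for (DeclID ID : Source.LexicalDecls)
      deserialize(ID);
    return Source.LexicalDecls;
  }

  std::size_t numDeserialized() const { return Loaded.size(); }

private:
  // Placeholder for building a real declaration node from its record.
  void deserialize(DeclID ID) { Loaded.insert(ID); }

  const SerializedContext &Source;
  bool LookupTableLoaded = false;
  std::map<std::string, std::vector<DeclID>> LookupTable;
  std::set<DeclID> Loaded;
};
```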

<h3 id="stmt">Statements and Expressions</h3>

<p>Statements and expressions are stored in the precompiled header in
both the <a href="#types">types</a> and the <a
href="#decls">declarations</a> blocks, because every statement or
expression will be associated with either a type or declaration. The
actual statement and expression records are stored immediately
following the declaration or type that owns the statement or
expression. For example, the statement representing the body of a
function will be stored directly following the declaration of the
function.</p>

<p>As with types and declarations, each statement and expression kind
in Clang's abstract syntax tree (<code>ForStmt</code>,
<code>CallExpr</code>, etc.) has a corresponding record type in the
precompiled header, which contains the serialized representation of
that statement or expression. Each substatement or subexpression
within an expression is stored as a separate record (which keeps most
records to a fixed size). Within the precompiled header, the
subexpressions of an expression are stored prior to the expression
that owns those expressions, using a form of <a
href="http://en.wikipedia.org/wiki/Reverse_Polish_notation">Reverse
Polish Notation</a>. For example, an expression <code>3 - 4 + 5</code>
would be represented as follows:</p>

<table border="1">
<tr><td><code>IntegerLiteral(3)</code></td></tr>
<tr><td><code>IntegerLiteral(4)</code></td></tr>
<tr><td><code>BinaryOperator(-)</code></td></tr>
<tr><td><code>IntegerLiteral(5)</code></td></tr>
<tr><td><code>BinaryOperator(+)</code></td></tr>
<tr><td>STOP</td></tr>
</table>

<p>When reading this representation, Clang evaluates each expression
record it encounters, builds the appropriate abstract syntax tree node,
and then pushes that expression onto a stack. When a record contains <i>N</i>
subexpressions (<code>BinaryOperator</code> has two of them), those
expressions are popped from the top of the stack. The special STOP
code indicates that we have reached the end of a serialized expression
or statement; other expression or statement records may follow, but
they are part of a different expression.</p>
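<p>The stack-based reading procedure can be sketched for the two record
kinds in the table above. The <code>Expr</code> and <code>Record</code>
types are simplified stand-ins, not Clang's AST classes or its
serialized record layout: literals are pushed, a binary operator pops
its right and then its left operand and pushes the combined node, and
STOP ends the current expression.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified expression node (not Clang's Expr hierarchy).
struct Expr {
  std::string Op;  // "+", "-", or empty for an integer literal
  int Value = 0;   // used only by literals
  std::unique_ptr<Expr> LHS, RHS;
};

// One serialized record, mirroring the rows of the table above.
struct Record {
  enum Kind { IntegerLiteral, BinaryOperator, Stop } K;
  int Value;       // literal value
  std::string Op;  // operator spelling
};

// Reads records starting at Pos until STOP and returns the expression tree.
// Pos is left just past the STOP record.
std::unique_ptr<Expr> readExpr(const std::vector<Record> &Records, std::size_t &Pos) {
  std::vector<std::unique_ptr<Expr>> Stack;
  while (true) {
    assert(Pos < Records.size() && "missing STOP record");
    const Record &R = Records[Pos++];
    if (R.K == Record::Stop)
      break;  // later records belong to a different expression
    auto E = std::make_unique<Expr>();
    if (R.K == Record::IntegerLiteral) {
      E->Value = R.Value;
    } else {
      // BinaryOperator: pop its two subexpressions (the RHS is on top).
      assert(Stack.size() >= 2 && "BinaryOperator needs two operands");
      E->Op = R.Op;
      E->RHS = std::move(Stack.back());
      Stack.pop_back();
      E->LHS = std::move(Stack.back());
      Stack.pop_back();
    }
    Stack.push_back(std::move(E));
  }
  assert(Stack.size() == 1 && "a complete expression leaves exactly one node");
  return std::move(Stack.back());
}

// Fully parenthesized rendering, used to check the tree's shape.
std::string print(const Expr &E) {
  if (E.Op.empty())
    return std::to_string(E.Value);
  return "(" + print(*E.LHS) + " " + E.Op + " " + print(*E.RHS) + ")";
}
```

<p>Reading the six records from the table produces a tree that prints
as <code>((3 - 4) + 5)</code>, confirming that the left-associative
grouping survives the round trip, and leaves <code>Pos</code> pointing
at whatever record follows STOP.</p>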

<h3 id="idtable">Identifier Table Block</h3>

<p>The identifier table block contains an on-disk hash table that maps
each identifier mentioned within the precompiled header to the
serialized representation of the identifier's information (i.e., the
<code>IdentifierInfo</code> structure). The serialized representation
contains:</p>

<ul>
<li>The actual identifier string.</li>
<li>Flags that describe whether this identifier is the name of a
built-in, a poisoned identifier, an extension token, or a
macro.</li>
<li>If the identifier names a macro, the offset of the macro
definition within the <a href="#preprocessor">preprocessor
block</a>.</li>
<li>If the identifier names one or more declarations visible from
translation unit scope, the <a href="#decls">declaration IDs</a> of these
declarations.</li>
</ul>

<p>When a precompiled header is loaded, the precompiled header
mechanism introduces itself into the identifier table as an external
lookup source. Thus, when the user program refers to an identifier
that has not yet been seen, Clang will perform a lookup into the
identifier table. If an identifier is found, its contents (macro
definitions, flags, top-level declarations, etc.) will be
deserialized, at which point the corresponding
<code>IdentifierInfo</code> structure will have the same contents it
would have after parsing the headers in the precompiled header.</p>

<p>Within the PCH file, the identifiers used to name declarations are
represented with an integral value. A separate table provides a
mapping from this integral value (the identifier ID) to the location
within the on-disk hash table where that identifier is stored. This
mapping is used when deserializing the name of a declaration, the
identifier of a token, or any other construct in the PCH file that
refers to a name.</p>
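<p>The external-lookup-source arrangement can be sketched as an
in-memory identifier table that consults a callback the first time it
sees a name. <code>IdentInfo</code> and <code>IdentTable</code> are
hypothetical simplifications of <code>IdentifierInfo</code> and
<code>IdentifierTable</code>, and the callback stands in for a probe of
the on-disk hash table. Each name is probed at most once; afterwards
the in-memory entry answers directly.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified per-identifier data, loosely modeled on the list above.
struct IdentInfo {
  bool IsMacro = false;
  std::vector<unsigned> TopLevelDeclIDs;
};

class IdentTable {
public:
  // Stand-in for the PCH file's on-disk hash table lookup.
  using ExternalLookup =
      std::function<std::optional<IdentInfo>(const std::string &)>;

  explicit IdentTable(ExternalLookup L) : External(std::move(L)) {}

  IdentInfo &get(const std::string &Name) {
    auto It = Table.find(Name);
    if (It != Table.end())
      return It->second;  // already seen: no external lookup
    IdentInfo Info;
    if (External) {
      if (std::optional<IdentInfo> FromPCH = External(Name))
        Info = std::move(*FromPCH);  // "deserialized" from the PCH file
    }
    return Table.emplace(Name, std::move(Info)).first->second;
  }

  std::size_t size() const { return Table.size(); }

private:
  ExternalLookup External;
  std::map<std::string, IdentInfo> Table;
};
```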

<h3 id="method-pool">Method Pool Block</h3>

<p>The method pool block is represented as an on-disk hash table that
serves two purposes: it provides a mapping from the names of
Objective-C selectors to the set of Objective-C instance and class
methods that have that particular selector (which is required for
semantic analysis in Objective-C) and also stores all of the selectors
used by entities within the precompiled header. The design of the
method pool is similar to that of the <a href="#idtable">identifier
table</a>: the first time a particular selector is formed during the
compilation of the program, Clang will search in the on-disk hash
table of selectors; if found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data
structure (<code>Sema::InstanceMethodPool</code> and
<code>Sema::FactoryMethodPool</code> for instance and class methods,
respectively).</p>

<p>As with identifiers, selectors are represented by numeric values
within the PCH file. A separate index maps these numeric selector
values to the offset of the selector within the on-disk hash table,
and will be used when deserializing an Objective-C method declaration
(or other Objective-C construct) that refers to the selector.</p>
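<p>The "first time a selector is formed" behavior can be sketched in
the same style as the identifier table. All names here are
hypothetical: <code>SelectorMethods</code> stands in for one on-disk
entry, and the two public maps loosely play the roles of
<code>Sema::InstanceMethodPool</code> and
<code>Sema::FactoryMethodPool</code>. Each selector consults the
on-disk table at most once.</p>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical serialized form of one method-pool entry.
struct SelectorMethods {
  std::vector<std::string> InstanceMethods;
  std::vector<std::string> FactoryMethods;  // i.e., class methods
};

class MethodPool {
public:
  explicit MethodPool(std::map<std::string, SelectorMethods> OnDisk)
      : OnDisk(std::move(OnDisk)) {}

  // Called whenever a selector is formed; only the first call for a given
  // selector consults the (simulated) on-disk table.
  void noteSelector(const std::string &Sel) {
    if (!Seen.insert(Sel).second)
      return;  // this selector was already looked up
    auto It = OnDisk.find(Sel);
    if (It == OnDisk.end())
      return;  // not mentioned in the PCH file
    append(InstancePool[Sel], It->second.InstanceMethods);
    append(FactoryPool[Sel], It->second.FactoryMethods);
  }

  std::map<std::string, std::vector<std::string>> InstancePool;
  std::map<std::string, std::vector<std::string>> FactoryPool;

private:
  static void append(std::vector<std::string> &To,
                     const std::vector<std::string> &From) {
    To.insert(To.end(), From.begin(), From.end());
  }

  std::map<std::string, SelectorMethods> OnDisk;
  std::set<std::string> Seen;
};
```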

<h2 id="tendrils">Precompiled Header Integration Points</h2>

<p>The "lazy" deserialization behavior of precompiled headers requires
their integration into several completely different submodules of
Clang. For example, lazily deserializing the declarations during name
lookup requires that the name-lookup routines be able to query the
precompiled header to find entities within the PCH file.</p>

<p>For each Clang data structure that requires direct interaction with
the precompiled header logic, there is an abstract class that provides
the interface between the two modules. The <code>PCHReader</code>
class, which handles the loading of a precompiled header, inherits
from all of these abstract classes to provide lazy deserialization of
Clang's data structures. <code>PCHReader</code> implements the
following abstract classes:</p>

<dl>
<dt><code>StatSysCallCache</code></dt>
<dd>This abstract interface is associated with the
<code>FileManager</code> class, and is used whenever the file
manager is going to perform a <code>stat()</code> system call.</dd>

<dt><code>ExternalSLocEntrySource</code></dt>
<dd>This abstract interface is associated with the
<code>SourceManager</code> class, and is used whenever the
<a href="#sourcemgr">source manager</a> needs to load the details
of a file, buffer, or macro instantiation.</dd>

<dt><code>IdentifierInfoLookup</code></dt>
<dd>This abstract interface is associated with the
<code>IdentifierTable</code> class, and is used whenever the
program source refers to an identifier that has not yet been seen.
In this case, the precompiled header implementation searches for
this identifier within its <a href="#idtable">identifier table</a>
to load any top-level declarations or macros associated with that
identifier.</dd>

<dt><code>ExternalASTSource</code></dt>
<dd>This abstract interface is associated with the
<code>ASTContext</code> class, and is used whenever the abstract
syntax tree nodes need to be loaded from the precompiled header. It
provides the ability to deserialize declarations and types
identified by their numeric values, read the bodies of functions
when required, and read the declarations stored within a
declaration context (either for iteration or for name lookup).</dd>

<dt><code>ExternalSemaSource</code></dt>
<dd>This abstract interface is associated with the <code>Sema</code>
class, and is used whenever semantic analysis needs to read
information from the <a href="#method-pool">global method
pool</a>.</dd>
</dl>
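<p>The "one reader, many interfaces" structure can be sketched with
ordinary C++ multiple inheritance. The two interfaces below only echo
<code>IdentifierInfoLookup</code> and <code>ExternalSemaSource</code>;
their names carry a <code>Sketch</code> suffix and their signatures are
invented for illustration, so they do not match Clang's real classes.
Each client function depends only on the interface it needs, yet both
are served by the same reader object.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified interfaces (not Clang's real signatures).
class IdentifierInfoLookupSketch {
public:
  virtual ~IdentifierInfoLookupSketch() = default;
  virtual bool lookupIdentifier(const std::string &Name) = 0;
};

class ExternalSemaSourceSketch {
public:
  virtual ~ExternalSemaSourceSketch() = default;
  virtual std::vector<std::string> readMethodPool(const std::string &Sel) = 0;
};

// A toy reader implementing both interfaces, in the way the text describes
// PCHReader inheriting from every interface it serves.
class ToyPCHReader : public IdentifierInfoLookupSketch,
                     public ExternalSemaSourceSketch {
public:
  bool lookupIdentifier(const std::string &Name) override {
    ++Requests;
    return Name == "NSObject";
  }
  std::vector<std::string> readMethodPool(const std::string &Sel) override {
    ++Requests;
    if (Sel == "alloc")
      return {"+[NSObject alloc]"};
    return {};
  }
  unsigned Requests = 0;  // total queries served, across both interfaces
};

// Client code in each module sees only its own abstract interface.
bool identifierKnown(IdentifierInfoLookupSketch &Source, const std::string &Name) {
  return Source.lookupIdentifier(Name);
}
std::size_t countMethods(ExternalSemaSourceSketch &Source, const std::string &Sel) {
  return Source.readMethodPool(Sel).size();
}
```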

</div>

</body>
</html>