<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Pretokenized Headers (PTH)</title>
  <link type="text/css" rel="stylesheet" href="../menu.css">
  <link type="text/css" rel="stylesheet" href="../content.css">
  <style type="text/css">
    td {
      vertical-align: top;
    }
  </style>
</head>
<body>

<!--#include virtual="../menu.html.incl"-->

<div id="content">

<h1>Pretokenized Headers (PTH)</h1>

<p>This document first describes the low-level
interface for using PTH and then briefly elaborates on its design and
implementation. If you are interested in the end-user view, please see the
<a href="UsersManual.html#precompiledheaders">User's Manual</a>.</p>


<h2>Using Pretokenized Headers with <tt>clang</tt> (Low-level Interface)</h2>

<p>The Clang compiler frontend, <tt>clang -cc1</tt>, supports three command-line
options for generating and using PTH files.</p>

<p>To generate PTH files using <tt>clang -cc1</tt>, use the option
<b><tt>-emit-pth</tt></b>:</p>

<pre> $ clang -cc1 test.h -emit-pth -o test.h.pth </pre>

<p>This option is transparently used by <tt>clang</tt> when generating PTH
files. Similarly, PTH files can be used as prefix headers using the
<b><tt>-include-pth</tt></b> option:</p>

<pre>
  $ clang -cc1 -include-pth test.h.pth test.c -o test.s
</pre>

<p>Alternatively, Clang's PTH files can be used as a raw &quot;token-cache&quot;
(or &quot;content&quot; cache) of the source included by the original header
file. This means that the contents of the PTH file are searched as substitutes
for <em>any</em> source files that are used by <tt>clang -cc1</tt> to process a
source file. This is done by specifying the <b><tt>-token-cache</tt></b>
option:</p>

<pre>
  $ cat test.h
  #include &lt;stdio.h&gt;
  $ clang -cc1 -emit-pth test.h -o test.h.pth
  $ cat test.c
  #include "test.h"
  $ clang -cc1 test.c -o test -token-cache test.h.pth
</pre>

<p>In this example the contents of <tt>stdio.h</tt> (and the files it includes)
will be retrieved from <tt>test.h.pth</tt>, as the PTH file is being used in
this case as a raw cache of the contents of <tt>test.h</tt>. This low-level
interface is used both to implement the high-level PTH interface and to
provide alternative means to use PTH-style caching.</p>
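<p>The lookup behaviour described above can be pictured with a small Python
model. This is only a sketch under stated assumptions, not Clang's
implementation: the <tt>TokenCache</tt> class, the <tt>lex</tt> helper, and the
toy token pattern are all hypothetical. It shows a cache that is consulted for
<em>any</em> file the compiler opens, with a fall back to lexing the source
text on a miss.</p>

```python
import re

# Hypothetical token pattern: identifiers, integers, or any other single
# non-space character. Clang's real lexer is far more involved.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def lex(text):
    """Split source text into a flat list of token spellings."""
    return TOKEN_RE.findall(text)


class TokenCache:
    """Maps a file name to the tokens that were lexed from it earlier."""

    def __init__(self):
        self._tokens = {}
        self.hits = 0
        self.misses = 0

    def add(self, name, text):
        """Lex a file once and remember the result (models cache generation)."""
        self._tokens[name] = lex(text)

    def tokens_for(self, name, read_source):
        """Return cached tokens for `name`, lexing from source on a miss."""
        if name in self._tokens:
            self.hits += 1
            return self._tokens[name]
        self.misses += 1
        return lex(read_source(name))


# Made-up file contents standing in for real headers and sources.
sources = {
    "stdio.h": "int puts ( const char * s ) ;",
    "test.c": "int main ( ) { return 0 ; }",
}

cache = TokenCache()
cache.add("stdio.h", sources["stdio.h"])

header_tokens = cache.tokens_for("stdio.h", sources.__getitem__)  # cache hit
main_tokens = cache.tokens_for("test.c", sources.__getitem__)     # lexed anew
```

<p>In this model <tt>stdio.h</tt> is served from the cache while
<tt>test.c</tt>, which was never cached, is lexed from its source text.</p>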

<h2>PTH Design and Implementation</h2>

<p>Unlike GCC's precompiled headers, which cache the full ASTs and preprocessor
state of a header file, Clang's pretokenized header files mainly cache the raw
lexer <em>tokens</em> that are needed to segment the stream of characters in a
source file into keywords, identifiers, and operators. Consequently, PTH mainly
speeds up the lexing and preprocessing of a source file, while
parsing and type-checking must be completely redone every time a PTH file is
used.</p>
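<p>To make the idea of caching lexer tokens concrete, the following Python
sketch packs a token stream into a byte string and reads it back. The layout (a
table of unique spellings followed by one fixed-size record per token) is an
assumption made for illustration and is not the actual PTH file format. The
names <tt>pack_tokens</tt> and <tt>unpack_tokens</tt> are hypothetical.</p>

```python
import re
import struct

# Hypothetical token pattern; it stands in for a real lexer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def pack_tokens(text):
    """Lex `text` and pack it as a spelling table plus 4-byte token records.

    Assumes no token contains a NUL character, since NUL separates the
    entries of the spelling table.
    """
    spellings = []   # unique token spellings, in first-seen order
    index = {}       # spelling -> position in `spellings`
    records = bytearray()
    for tok in TOKEN_RE.findall(text):
        if tok not in index:
            index[tok] = len(spellings)
            spellings.append(tok)
        records += struct.pack("<I", index[tok])
    table = "\0".join(spellings).encode("utf-8")
    header = struct.pack("<II", len(table), len(records) // 4)
    return header + table + bytes(records)


def unpack_tokens(blob):
    """Recover the token spellings without re-lexing the source text."""
    table_len, count = struct.unpack_from("<II", blob, 0)
    table = blob[8:8 + table_len].decode("utf-8").split("\0")
    ids = struct.unpack_from("<%dI" % count, blob, 8 + table_len)
    return [table[i] for i in ids]


source = "int add(int a, int b) { return a + b; }"
blob = pack_tokens(source)
tokens = unpack_tokens(blob)
```

<p>The consumer gets back only a flat list of tokens. Everything above the
lexer, such as parsing and type checking, still has to run on that list each
time it is loaded.</p>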

<h3>Basic Design Tradeoffs</h3>

<p>In the long term there are plans to provide an alternate PCH implementation
for Clang that also caches the work for parsing and type checking the contents
of header files. The current implementation of PCH in Clang as pretokenized
header files was motivated by the following factors:</p>

<ul>

<li><p><b>Language independence</b>: PTH files work with any language that
Clang's lexer can handle, including C, Objective-C, and (in the early stages)
C++. This means development on language features at the parsing level or above
(which covers almost all of the interesting pieces) does not require PTH to be
modified.</p></li>

<li><p><b>Simple design</b>: Relatively speaking, PTH has a simple design and
implementation, making it easy to test. Further, because the machinery for PTH
resides at the lower levels of the Clang library stack, it is fairly
straightforward to profile and optimize.</p></li>
</ul>

<p>Further, compared to GCC's PCH implementation (which is the dominant
precompiled header file implementation that Clang can be directly compared
against), the PTH design in Clang yields several attractive features:</p>

<ul>

<li><p><b>Architecture independence</b>: In contrast to GCC's PCH files (and
those of several other compilers), Clang's PTH files are architecture
independent, requiring only a single PTH file when building a program for
multiple architectures.</p>

<p>For example, on Mac OS X one may wish to
compile a &quot;universal binary&quot; that runs on PowerPC, 32-bit Intel
(i386), and 64-bit Intel architectures. GCC requires a separate PCH file for
each architecture, as the definitions of types in the AST are
architecture-specific. Since a Clang PTH file essentially represents a lexical
cache of header files, a single PTH file can be safely used when compiling for
multiple architectures. This can also reduce compile times because only a single
PTH file needs to be generated during a build instead of several.</p></li>

<li><p><b>Reduced memory pressure</b>: Similar to GCC,
Clang reads PTH files via the use of memory mapping (i.e., <tt>mmap</tt>).
Clang, however, memory maps PTH files as read-only, meaning that multiple
invocations of <tt>clang -cc1</tt> can share the same pages in memory from a
memory-mapped PTH file. In comparison, GCC memory maps its PCH files but
then modifies those pages in memory, incurring copy-on-write costs. The
read-only nature of PTH can greatly reduce memory pressure for builds involving
multiple cores, thus improving overall scalability.</p></li>

<li><p><b>Fast generation</b>: PTH files can be generated in a small fraction
of the time needed to generate GCC's PCH files. Since PTH/PCH generation is a
serial operation that typically blocks progress during a build, faster
generation time leads to improved processor utilization with parallel builds on
multicore machines.</p></li>

</ul>
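<p>The read-only mapping mentioned under reduced memory pressure can be
demonstrated with Python's standard <tt>mmap</tt> module. This is a sketch of
the general technique, not of Clang's code. The temporary file and its contents
are made up, and the two mappings merely stand in for two compiler invocations.
Whether the operating system actually backs both mappings with the same
physical pages is up to the kernel and is not checked here.</p>

```python
import mmap
import os
import tempfile


def map_read_only(path):
    """Map a whole file read-only; writes through the mapping are rejected."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


# Stand-in for a PTH file; the contents are invented for this sketch.
fd, path = tempfile.mkstemp(suffix=".pth")
os.write(fd, b"cached tokens")
os.close(fd)

first = map_read_only(path)
second = map_read_only(path)   # models a second compiler invocation
same_bytes = first[:] == second[:]

try:
    first[0:1] = b"X"          # a read-only mapping cannot be modified
    write_rejected = False
except TypeError:
    write_rejected = True

first.close()
second.close()
os.remove(path)
```

<p>Because neither mapping can dirty a page, no private copy of the file's
pages is ever needed, which is the property the copy-on-write comparison above
relies on.</p>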

<p>Despite these strengths, PTH's simple design suffers some algorithmic
handicaps compared to other PCH strategies such as those used by GCC. While PTH
can greatly speed up the processing time of a header file, the amount of work
required to process a header file is still roughly linear in the size of the
header file. In contrast, the amount of work done by GCC to process a
precompiled header is (theoretically) constant (the ASTs for the header are
literally memory mapped into the compiler). This means that the compiler only
needs to process the pieces of the header file that are referenced by the
source file including the header. While
GCC's particular implementation of PCH offsets some of these algorithmic
strengths via the use of copy-on-write pages, the approach itself can
fundamentally dominate at an algorithmic level, especially when one considers
header files of arbitrary size.</p>

<p>There are plans to potentially implement a complementary PCH implementation
for Clang based on the lazy deserialization of ASTs. This approach would
theoretically have the same constant-time algorithmic advantages just mentioned
but would also retain some of the strengths of PTH such as reduced memory
pressure (ideal for multi-core builds).</p>
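<p>Lazy deserialization can be illustrated with a short Python sketch. It is
purely illustrative and says nothing about how Clang would implement it: the
JSON encoding, the <tt>build_image</tt> function, and the <tt>LazyHeader</tt>
class are all assumptions. Each declaration is stored separately with its
offset, and a declaration is decoded only when a source file first refers to
it.</p>

```python
import json


def build_image(decls):
    """Serialize each declaration separately and record where it starts."""
    index, body = {}, bytearray()
    for name, decl in decls.items():
        encoded = json.dumps(decl).encode("utf-8")
        index[name] = (len(body), len(encoded))
        body += encoded
    return index, bytes(body)


class LazyHeader:
    """Decodes a declaration only the first time it is looked up."""

    def __init__(self, index, body):
        self._index, self._body = index, body
        self._loaded = {}

    def lookup(self, name):
        if name not in self._loaded:
            start, size = self._index[name]
            self._loaded[name] = json.loads(self._body[start:start + size])
        return self._loaded[name]

    @property
    def decoded_count(self):
        return len(self._loaded)


# Made-up declarations standing in for the ASTs of a header.
decls = {
    "puts": {"returns": "int", "params": ["const char *"]},
    "fopen": {"returns": "FILE *", "params": ["const char *", "const char *"]},
    "fclose": {"returns": "int", "params": ["FILE *"]},
}

index, body = build_image(decls)
header = LazyHeader(index, body)
puts = header.lookup("puts")   # only this one declaration is decoded
```

<p>Here the work done depends on the number of declarations referenced, not on
the size of the header. The index in this sketch is still built eagerly, so a
real design would also need an index that can be consulted without being read
in full.</p>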

<h3>Internal PTH Optimizations</h3>

<p>While the main optimization employed by PTH is to reduce lexing time of
header files by caching pre-lexed tokens, PTH also employs several other
optimizations to speed up the processing of header files:</p>

<ul>

<li><p><em><tt>stat</tt> caching</em>: PTH files cache information obtained via
calls to <tt>stat</tt> that <tt>clang -cc1</tt> uses to resolve which files are
included by <tt>#include</tt> directives. This greatly reduces the overhead
involved in context-switching to the kernel to resolve included files.</p></li>

<li><p><em>Fast skipping of <tt>#ifdef</tt>...<tt>#endif</tt> chains</em>:
PTH files record the basic structure of nested preprocessor blocks. When the
condition of the preprocessor block is false, all of its tokens are immediately
skipped instead of requiring them to be handled by Clang's
preprocessor.</p></li>

</ul>
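<p>Both optimizations above can be modelled in a few lines of Python. This is
a rough sketch under simplifying assumptions, not the data structures PTH
actually stores: <tt>StatCache</tt>, <tt>build_skip_table</tt>, and
<tt>visible_lines</tt> are hypothetical names, the input is handled line by
line rather than token by token, only <tt>#ifdef</tt> conditions are evaluated,
and <tt>#else</tt> is not handled.</p>

```python
import os


class StatCache:
    """Remembers os.stat results so each path reaches the kernel at most once."""

    def __init__(self):
        self._results = {}
        self.syscalls = 0

    def exists(self, path):
        if path not in self._results:
            self.syscalls += 1
            try:
                self._results[path] = os.stat(path)
            except OSError:
                self._results[path] = None   # failed lookups are cached too
        return self._results[path] is not None


def build_skip_table(lines):
    """Map the index of each #if/#ifdef/#ifndef line to its matching #endif."""
    table, stack = {}, []
    for i, line in enumerate(lines):
        word = line.strip()
        if word.startswith(("#ifdef", "#ifndef", "#if ")):
            stack.append(i)
        elif word.startswith("#endif"):
            table[stack.pop()] = i
    return table


def visible_lines(lines, defined):
    """Return non-directive lines, jumping over blocks whose #ifdef is false."""
    table = build_skip_table(lines)
    kept, i = [], 0
    while i < len(lines):
        word = lines[i].strip()
        if word.startswith("#ifdef") and word.split()[1] not in defined:
            i = table[i] + 1   # skip the whole block, nested blocks included
            continue
        if not word.startswith("#"):
            kept.append(lines[i])
        i += 1
    return kept


header = [
    "#ifdef A", "int a;", "#ifdef B", "int b;", "#endif", "#endif",
    "int c;",
    "#ifdef C", "int d;", "#endif",
]
only_c = visible_lines(header, {"C"})
only_a = visible_lines(header, {"A"})

stats = StatCache()
here = os.getcwd()
found_twice = stats.exists(here) and stats.exists(here)
```

<p>With only <tt>C</tt> defined, the outer <tt>A</tt> block and the
<tt>B</tt> block nested inside it are skipped in a single jump, and the second
<tt>exists</tt> call is answered from the cache without another system
call.</p>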

</div>
</body>
</html>