<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
  <title>Clang - Features and Goals</title>
  <link type="text/css" rel="stylesheet" href="menu.css" />
  <link type="text/css" rel="stylesheet" href="content.css" />
  <style type="text/css">
</style>
</head>
<body>

<!--#include virtual="menu.html.incl"-->

<div id="content">

<h1>Clang - Features and Goals</h1>
<p>
This page describes the <a href="index.html#goals">features and goals</a> of
Clang in more detail and gives a broader explanation of what we mean.
These features are:
</p>

<p>End-User Features:</p>

<ul>
<li><a href="#performance">High Performance and Low Memory Use</a></li>
<li><a href="#expressivediags">Expressive Diagnostics</a></li>
</ul>

<p>Driving Goals and Internal Design:</p>
<ul>
<li><a href="#real">A real-world, production quality compiler</a></li>
<li><a href="#unifiedparser">A single unified parser for C, Objective C, C++,
    and Objective C++</a></li>
<li><a href="#conformance">Conformance with C/C++/ObjC and their
    variants</a></li>
<li><a href="#gcccompat">GCC compatibility</a></li>
</ul>

<!--=======================================================================-->
<h1>End-User Features</h1>
<!--=======================================================================-->


<!--=======================================================================-->
<h2><a name="performance">High Performance and Low Memory Use</a></h2>
<!--=======================================================================-->

<p>A major focus of our work on clang is to make it fast, light and scalable.
The library-based architecture of clang makes it straightforward to time and
profile the cost of each layer of the stack, and the driver has a number of
options for performance analysis.</p>

<p>While there is still much that can be done, we find that the clang front-end
is significantly quicker than GCC and uses less memory. For example, when
compiling "Carbon.h" on Mac OS X, we see that clang is 2.5x faster than GCC:</p>

<img class="img_slide" src="feature-compile1.png" width="400" height="300" />

<p>Carbon.h is a monster: it transitively includes 558 files, 12.3M of code,
declares 10000 functions, has 2000 struct definitions, 8000 fields, 20000 enum
constants, etc. (see slide 25+ of the <a href="clang_video-07-25-2007.html">clang
talk</a> for more information). It is also #include'd into almost every C file
in a GUI app on the Mac, so its compile time is very important.</p>

<p>From the slide above, you can see that we can measure the time to preprocess
the file independently from the time to parse it, and independently from the
time to build the ASTs for the code. GCC doesn't provide a way to measure the
parser without AST building (it only provides -fsyntax-only). In our
measurements, we find that clang's preprocessor is consistently 40% faster than
GCC's, and the parser + AST builder is ~4x faster than GCC's. If you have
sources that do not depend as heavily on the preprocessor (or if you
use Precompiled Headers) you may see a much bigger speedup from clang.
</p>

<p>Compile time performance is important, but when using clang as an API, memory
use is often even more so: the less memory the code takes, the more code you can
fit into memory at a time (useful for whole program analysis tools, for
example).</p>

<img class="img_slide" src="feature-memory1.png" width="400" height="300" />

<p>Here we see a huge advantage of clang: its ASTs take <b>5x less memory</b>
than GCC's syntax trees, despite the fact that clang's ASTs capture far more
source-level information than GCC's trees do. This feat is accomplished through
the use of carefully designed APIs and efficient representations.</p>

<p>In addition to being efficient when pitted head-to-head against GCC in batch
mode, clang is built with a <a href="#libraryarch">library based
architecture</a> that makes it relatively easy to adapt it and build new tools
with it. This means that it is often possible to apply out-of-the-box thinking
and novel techniques to improve compilation in various ways.</p>

<img class="img_slide" src="feature-compile2.png" width="400" height="300" />

<p>This slide shows how the clang preprocessor can be used to make "distcc"
parallelization <b>3x</b> more scalable than when using the GCC preprocessor.
"distcc" quickly bottlenecks on the preprocessor running on the central driver
machine, so a fast preprocessor is very useful. Comparing the first two bars
of each group shows how a ~40% faster preprocessor can reduce preprocessing time
of these large C++ apps by about 40% (shocking!).</p>

<p>The third bar on the slide is the interesting part: it shows how trivial
caching of file system accesses across invocations of the preprocessor allows
clang to reduce time spent in the kernel by 10x, making distcc over 3x more
scalable. This is obviously just one simple hack; doing more interesting things
(like caching tokens across preprocessed files) would yield another substantial
speedup.</p>

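<p>The kind of file system caching described above can be sketched as a small
memoizing wrapper. This is only an illustration, not clang's actual code: the
names <tt>FileInfo</tt> and <tt>StatCache</tt> and the injected backend function
are assumptions made for the example. The point it shows is that repeated
queries for the same path, including paths that do not exist, reach the
expensive lookup only once.</p>

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical record of what a stat-like query returns.
struct FileInfo {
  bool exists;
  std::size_t size;
};

// Memoizes lookups so each path reaches the (expensive) backend at most once.
class StatCache {
public:
  using Backend = std::function<FileInfo(const std::string &)>;

  explicit StatCache(Backend backend) : backend_(std::move(backend)) {}

  FileInfo lookup(const std::string &path) {
    auto it = cache_.find(path);
    if (it != cache_.end())
      return it->second; // hit: no backend call
    ++misses_;
    FileInfo info = backend_(path);
    cache_.emplace(path, info); // negative results are cached as well
    return info;
  }

  // Number of lookups that had to go to the backend.
  std::size_t misses() const { return misses_; }

private:
  Backend backend_;
  std::unordered_map<std::string, FileInfo> cache_;
  std::size_t misses_ = 0;
};
```

<p>Caching negative results matters for a preprocessor, because most probes
along an include search path are for files that are not there.</p>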
<p>The clean framework-based design of clang means that many things are possible
that would be very difficult in other systems, for example incremental
compilation, multithreading, intelligent caching, etc. We are only starting
to tap the full potential of the clang design.</p>


<!--=======================================================================-->
<h2><a name="expressivediags">Expressive Diagnostics</a></h2>
<!--=======================================================================-->

<p>Clang is designed to efficiently capture range information for expressions
and statements, which allows it to emit very useful and detailed diagnostic
information (e.g. warnings and errors) when a problem is detected.</p>

<p>For example, this slide compares the diagnostics emitted by clang (top) to
the diagnostics emitted by GCC (middle) for a simple example:</p>

<img class="img_slide" src="feature-diagnostics1.png" width="400" height="300"/>

<p>As you can see, clang goes beyond tracking just column number information: it
is able to highlight the subexpressions involved in a problem, making it much
easier to understand the source of the problem in many cases. For example, in
the first problem, it tells you <em>why</em> the operand is invalid (it
requires a pointer) and what type it really is.</p>

<p>In the second error, you can see how clang uses column number information to
identify exactly which "+" out of the four on that line is causing the problem.
Further, it highlights the subexpressions involved, which can be very useful
when a complex subexpression relies on tricky precedence rules.</p>

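<p>To make the idea of column and range information concrete, here is a minimal
sketch of how a marker line could be built once the column of the operator and
the column ranges of its operands are known. The <tt>Range</tt> type and the
<tt>markerLine</tt> function are hypothetical names for this example and are
not part of clang.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Half-open, 0-based column range [begin, end) within one source line.
struct Range {
  std::size_t begin;
  std::size_t end;
};

// Builds the line printed under the source text: '~' under every highlighted
// range and '^' at the caret column.
std::string markerLine(std::size_t caret, const std::vector<Range> &ranges) {
  std::size_t width = caret + 1;
  for (const Range &r : ranges)
    width = std::max(width, r.end);

  std::string marks(width, ' ');
  for (const Range &r : ranges)
    for (std::size_t i = r.begin; i < r.end; ++i)
      marks[i] = '~';
  marks[caret] = '^'; // the caret wins if it overlaps a range
  return marks;
}
```

<p>For the source line <tt>x = p + q;</tt>, a caret at column 6 with ranges
covering <tt>p</tt> and <tt>q</tt> yields a marker line that points at the
"+" and underlines both operands.</p>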
<p>The example doesn't show it, but clang works very hard to retain typedef
information, ensuring that diagnostics print the user types, not the fully
expanded (and often huge) types. This is clearly important for C++ code (tell
me about "<tt>std::string</tt>", not about "<tt>std::basic_string&lt;char,
std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;</tt>"!), but it is
also very useful in C code in some cases (e.g. "<tt>__m128</tt>" vs
"<tt>float __attribute__((__vector_size__(16)))</tt>").</p>


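<p>One way to picture "retaining typedef information" is a type node that keeps
a link to the type it names instead of being replaced by it. The sketch below
is an assumption made for illustration, not clang's real type representation;
<tt>Type</tt>, <tt>diagnosticName</tt> and <tt>canonicalName</tt> are invented
names.</p>

```cpp
#include <string>

// Hypothetical type node: a typedef keeps a pointer to the type it names
// rather than being expanded away.
struct Type {
  std::string spelling; // the name as the user wrote it
  const Type *aliased;  // non-null only for typedefs
};

// Name to show in a diagnostic: the spelling the user wrote.
std::string diagnosticName(const Type &t) { return t.spelling; }

// Fully expanded name, found by walking through every typedef layer.
std::string canonicalName(const Type &t) {
  const Type *cur = &t;
  while (cur->aliased)
    cur = cur->aliased;
  return cur->spelling;
}
```

<p>Type checking can compare canonical types while diagnostics print the user's
spelling, so both needs are met from the same representation.</p>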
<!--=======================================================================-->
<h1>Driving Goals and Internal Design</h1>
<!--=======================================================================-->

<!--=======================================================================-->
<h2><a name="real">A real-world, production quality compiler</a></h2>
<!--=======================================================================-->

<p>
Clang is designed and built by experienced commercial compiler developers who
are increasingly frustrated with the problems that <a
href="comparison.html">existing open source compilers</a> have. Clang is
carefully and thoughtfully designed and built to provide the foundation of a
whole new generation of C/C++/Objective C development tools, and we intend for
it to be commercial quality.</p>

<p>Being a production quality compiler means many things: it means being high
performance, being solid and (relatively) bug free, and it means eventually
being used and depended on by a broad range of people. While we are still in
the early development stages, we strongly believe that this will become a
reality.</p>

<!--=======================================================================-->
<h2><a name="unifiedparser">A single unified parser for C, Objective C, C++,
and Objective C++</a></h2>
<!--=======================================================================-->

<p>Clang is the "C Language Family Front-end", which means we intend to support
the most popular members of the C family. We are convinced that the right
parsing technology for this class of languages is a hand-built recursive-descent
parser. Because it is plain C++ code, recursive descent makes it very easy for
new developers to understand the code, easily supports the ad-hoc rules and other
strange hacks required by C/C++, and makes it straightforward to implement
excellent diagnostics and error recovery.</p>

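<p>For readers who have not seen the technique, the following is a minimal
sketch of a hand-built recursive-descent parser for a toy arithmetic grammar.
It is not taken from clang; <tt>ExprParser</tt> and its members are hypothetical
names. Each grammar rule is one ordinary function, and a failure records the
column where parsing stopped, which is what makes precise diagnostics easy to
produce.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

class ExprParser {
public:
  explicit ExprParser(std::string text) : text_(std::move(text)) {}

  // Parses the whole input into `value`. On failure returns false and
  // records a message plus the 0-based column where parsing stopped.
  bool parse(long &value) {
    if (!parseExpr(value))
      return false;
    skipSpaces();
    if (pos_ != text_.size())
      return fail("unexpected character");
    return true;
  }

  const std::string &error() const { return error_; }
  std::size_t errorColumn() const { return errorColumn_; }

private:
  // expr := term (('+' | '-') term)*
  bool parseExpr(long &value) {
    if (!parseTerm(value))
      return false;
    while (peek() == '+' || peek() == '-') {
      char op = text_[pos_++];
      long rhs = 0;
      if (!parseTerm(rhs))
        return false;
      value = (op == '+') ? value + rhs : value - rhs;
    }
    return true;
  }

  // term := primary ('*' primary)*
  bool parseTerm(long &value) {
    if (!parsePrimary(value))
      return false;
    while (peek() == '*') {
      ++pos_;
      long rhs = 0;
      if (!parsePrimary(rhs))
        return false;
      value *= rhs;
    }
    return true;
  }

  // primary := number | '(' expr ')'
  bool parsePrimary(long &value) {
    char c = peek();
    if (std::isdigit(static_cast<unsigned char>(c))) {
      value = 0;
      while (pos_ < text_.size() &&
             std::isdigit(static_cast<unsigned char>(text_[pos_])))
        value = value * 10 + (text_[pos_++] - '0');
      return true;
    }
    if (c == '(') {
      ++pos_;
      if (!parseExpr(value))
        return false;
      if (peek() != ')')
        return fail("expected ')'");
      ++pos_;
      return true;
    }
    return fail("expected a number or '('");
  }

  // Skips blanks and returns the next character, or '\0' at end of input.
  char peek() {
    skipSpaces();
    return pos_ < text_.size() ? text_[pos_] : '\0';
  }

  void skipSpaces() {
    while (pos_ < text_.size() && text_[pos_] == ' ')
      ++pos_;
  }

  bool fail(const char *message) {
    error_ = message;
    errorColumn_ = pos_;
    return false;
  }

  std::string text_;
  std::size_t pos_ = 0;
  std::string error_;
  std::size_t errorColumn_ = 0;
};
```

<p>Precedence falls out of the call structure (<tt>parseExpr</tt> calls
<tt>parseTerm</tt>, which calls <tt>parsePrimary</tt>), and an ad-hoc rule is
just an extra <tt>if</tt> in the relevant function.</p>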
<p>We believe that implementing C/C++/ObjC in a single unified parser makes the
end result easier to maintain and evolve than keeping separate C and C++
parsers, which must be bugfixed and maintained independently of each other.</p>

<!--=======================================================================-->
<h2><a name="conformance">Conformance with C/C++/ObjC and their
 variants</a></h2>
<!--=======================================================================-->

<p>When you start work on implementing a language, you find out that there is a
huge gap between how the language works and how most people understand it to
work. This gap is the difference between a normal programmer and a (scary?
supernatural?) "language lawyer", who knows the ins and outs of the language
and can grok standardese with ease.</p>

<p>In practice, being conformant with the languages means that we aim to support
the full language, including the dark and dusty corners (like trigraphs,
preprocessor arcana, C99 VLAs, etc). Where we support extensions above and
beyond what the standard officially allows, we make an effort to explicitly call
this out in the code and emit diagnostics about it (which are ignored by default,
but can optionally be mapped to either warnings or errors), allowing you to use
clang in "strict" mode if you desire.</p>

<p>We also intend to support "dialects" of these languages, such as C89, K&amp;R
C, C++'03, Objective-C 2, etc.</p>

<!--=======================================================================-->
<h2><a name="gcccompat">GCC Compatibility</a></h2>
<!--=======================================================================-->

<p>GCC is currently the de facto standard open source compiler, and it
routinely compiles a huge volume of code. GCC supports a huge number of
extensions and features (many of which are undocumented) and a lot of
code and header files depend on these features in order to build.</p>

<p>While it would be nice to be able to ignore these extensions and focus on
implementing the language standards to the letter, pragmatics force us to
support the GCC extensions that see the most use. As mentioned above, all
extensions are explicitly recognized as such and marked with extension
diagnostics, which can be mapped to warnings, errors, or just ignored.
</p>


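<p>As a concrete illustration of what such an extension looks like, the snippet
below uses a case range, a GNU extension that GCC and clang accept in C and C++
but that is in neither standard. The function and enum names are made up for
this example; the page itself does not say which extensions are supported
first.</p>

```cpp
// Result categories for the example below (hypothetical names).
enum CharClass { Digit, Letter, Other };

// Classifies a character using case ranges ("low ... high"), which are a GNU
// extension rather than standard C or C++.
CharClass classify(char c) {
  switch (c) {
  case '0' ... '9':
    return Digit;
  case 'a' ... 'z':
  case 'A' ... 'Z':
    return Letter;
  default:
    return Other;
  }
}
```

<p>A compiler that tracks extensions explicitly can accept this code silently
in its default mode and still report it when asked to be strict.</p>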
<!--=======================================================================-->
<h2><a name="libraryarch">Library based architecture</a></h2>
<!--=======================================================================-->

<p>A major design concept for the LLVM front-end involves using a library based
architecture. In this library based architecture, various parts of the front-end
can be cleanly divided into separate libraries which can then be mixed and
matched for different needs and uses. In addition, the library based approach
makes it much easier for new developers to get involved and extend LLVM to do
new and unique things. In the words of Chris,</p>

<blockquote>
"The world needs better compiler tools, tools which are built as libraries.
This design point allows reuse of the tools in new and novel ways. However,
building the tools as libraries isn't enough: they must have clean APIs, be as
decoupled from each other as possible, and be easy to modify/extend. This
requires clean layering, decent design, and keeping the libraries independent of
any specific client."</blockquote>

<p>Currently, the LLVM front-end is divided into the following libraries:</p>
<ul>
<li>libsupport - Basic support library, reused from LLVM.</li>
<li>libsystem - System abstraction library, reused from LLVM.</li>
<li>libbasic - Diagnostics, SourceLocations, SourceBuffer abstraction, file system caching for input source files. <span class="weak_txt">(depends on above libraries)</span></li>
<li>libast - Provides classes to represent the C AST, the C type system, builtin functions, and various helpers for analyzing and manipulating the AST (visitors, pretty printers, etc). <span class="weak_txt">(depends on above libraries)</span></li>
<li>liblex - C/C++/ObjC lexing and preprocessing, identifier hash table, pragma handling, tokens, and macros. <span class="weak_txt">(depends on above libraries)</span></li>
<li>libparse - Parsing and local semantic analysis. This library invokes coarse-grained 'Actions' provided by the client to do stuff (e.g. libsema builds ASTs). <span class="weak_txt">(depends on above libraries)</span></li>
<li>libsema - Provides a set of parser actions to build a standardized AST for programs. ASTs are 'streamed' out a top-level declaration at a time, allowing clients to use decl-at-a-time processing, build up entire translation units, or even build 'whole program' ASTs depending on how they use the APIs. <span class="weak_txt">(depends on libast and libparse)</span></li>
<li>libcodegen - Lower the AST to LLVM IR for optimization &amp; codegen. <span class="weak_txt">(depends on libast)</span></li>
<li>librewrite - Editing of text buffers. <span class="weak_txt">(depends on libast)</span></li>
<li>libanalysis - Static analysis support. <span class="weak_txt">(depends on libast)</span></li>
<li><b>clang</b> - An example driver, client of the libraries at various levels. <span class="weak_txt">(depends on above libraries, and LLVM VMCore)</span></li>
</ul>
<p>As an example of the power of this library based design: if you want to build
a preprocessor, you take the Basic and Lexer libraries. If you want an indexer,
you take the previous two and add the Parser library and some actions for
indexing. If you want a refactoring, static analysis, or source-to-source
compiler tool, you then add the AST building and semantic analyzer libraries.
In the end, LLVM's library based design will provide developers with many more
possibilities.</p>

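<p>The split between a parser and client-provided actions can be sketched as
follows. This is a toy model under stated assumptions, not the real libparse or
libsema interface: the <tt>Actions</tt> class, its callback names, and the
"func NAME" / "var NAME" input format are all invented for the example. It
shows one parser driving two different clients without knowing what either of
them builds.</p>

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical callback interface: the parser reports what it recognized and
// the client decides what, if anything, to build from it.
class Actions {
public:
  virtual ~Actions() {}
  virtual void onFunctionDecl(const std::string &) {}
  virtual void onVariableDecl(const std::string &) {}
};

// Toy "parser" for whitespace-separated pairs "func NAME" or "var NAME".
// It knows only the syntax; all other work is delegated to the Actions object,
// one top-level declaration at a time.
void parseDecls(const std::string &source, Actions &actions) {
  std::istringstream in(source);
  std::string kind, name;
  while (in >> kind >> name) {
    if (kind == "func")
      actions.onFunctionDecl(name);
    else if (kind == "var")
      actions.onVariableDecl(name);
  }
}

// Client 1: an indexer that records function names only.
class FunctionIndexer : public Actions {
public:
  void onFunctionDecl(const std::string &name) override {
    names.push_back(name);
  }
  std::vector<std::string> names;
};

// Client 2: counts declarations and builds nothing at all.
class DeclCounter : public Actions {
public:
  void onFunctionDecl(const std::string &) override { ++count; }
  void onVariableDecl(const std::string &) override { ++count; }
  std::size_t count = 0;
};
```

<p>An AST-building client would be a third implementation of the same
interface, which is why the indexer above never pays for ASTs it does not
need.</p>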
<h2>Better Integration with IDEs</h2>
<p>Another design goal of Clang is to integrate extremely well with IDEs. IDEs
often have very different requirements from code generation, often requiring
information that a codegen-only front-end can throw away. Clang is specifically
designed and built to capture this information.</p>
</div>
</body>
</html>