<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
 <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
 <title>Clang - Features and Goals</title>
 <link type="text/css" rel="stylesheet" href="menu.css" />
 <link type="text/css" rel="stylesheet" href="content.css" />
 <style type="text/css">
</style>
</head>
<body>

<!--#include virtual="menu.html.incl"-->

<div id="content">

<h1>Clang - Features and Goals</h1>
<p>
This page describes the <a href="index.html#goals">features and goals</a> of
Clang in more detail and explains what we mean by each of them.
These features are:
</p>

<p>End-User Features:</p>

<ul>
<li><a href="#performance">High Performance and Low Memory Use</a></li>
<li><a href="#expressivediags">Expressive Diagnostics</a></li>
<li><a href="#gcccompat">GCC compatibility</a></li>
</ul>

<p>Driving Goals and Internal Design:</p>
<ul>
<li><a href="#real">A real-world, production quality compiler</a></li>
<li><a href="#simplecode">A simple and hackable code base</a></li>
<li><a href="#unifiedparser">A single unified parser for C, Objective C, C++,
 and Objective C++</a></li>
<li><a href="#conformance">Conformance with C/C++/ObjC and their
 variants</a></li>
</ul>

<!--=======================================================================-->
<h1>End-User Features</h1>
<!--=======================================================================-->


<!--=======================================================================-->
<h2><a name="performance">High Performance and Low Memory Use</a></h2>
<!--=======================================================================-->

<p>A major focus of our work on clang is to make it fast, light and scalable.
The library-based architecture of clang makes it straightforward to time and
profile the cost of each layer of the stack, and the driver has a number of
options for performance analysis.</p>

<p>While there is still much that can be done, we find that the clang front-end
is significantly quicker than GCC and uses less memory. For example, when
compiling "Carbon.h" on Mac OS X, we see that clang is 2.5x faster than GCC:</p>

<img class="img_slide" src="feature-compile1.png" width="400" height="300" />

<p>Carbon.h is a monster: it transitively includes 558 files and 12.3M of code,
and declares 10000 functions, 2000 struct definitions, 8000 fields, 20000 enum
constants, etc. (see slide 25+ of the <a href="clang_video-07-25-2007.html">clang
talk</a> for more information). It is also #include'd into almost every C file
in a GUI app on the Mac, so its compile time is very important.</p>

<p>From the slide above, you can see that we can measure the time to preprocess
the file independently from the time to parse it, and independently from the
time to build the ASTs for the code. GCC doesn't provide a way to measure the
parser without AST building (it only provides -fsyntax-only). In our
measurements, we find that clang's preprocessor is consistently 40% faster than
GCC's, and the parser + AST builder is ~4x faster than GCC's. If you have
sources that do not depend as heavily on the preprocessor (or if you
use Precompiled Headers) you may see a much bigger speedup from clang.
</p>

<p>Compile time performance is important, but when using clang as an API,
memory use is often even more so: the less memory the code takes, the more code
you can fit into memory at a time (useful for whole program analysis tools, for
example).</p>

<img class="img_slide" src="feature-memory1.png" width="400" height="300" />

<p>Here we see a huge advantage of clang: its ASTs take <b>5x less memory</b>
than GCC's syntax trees, despite the fact that clang's ASTs capture far more
source-level information than GCC's trees do. This feat is accomplished through
the use of carefully designed APIs and efficient representations.</p>

<p>In addition to being efficient when pitted head-to-head against GCC in batch
mode, clang is built with a <a href="#libraryarch">library based
architecture</a> that makes it relatively easy to adapt it and build new tools
with it. This means that it is often possible to apply out-of-the-box thinking
and novel techniques to improve compilation in various ways.</p>

<img class="img_slide" src="feature-compile2.png" width="400" height="300" />

<p>This slide shows how the clang preprocessor can be used to make "distcc"
parallelization <b>3x</b> more scalable than when using the GCC preprocessor.
"distcc" quickly bottlenecks on the preprocessor running on the central driver
machine, so a fast preprocessor is very useful. Comparing the first two bars
of each group shows how a ~40% faster preprocessor can reduce preprocessing time
of these large C++ apps by about 40% (shocking!).</p>

<p>The third bar on the slide is the interesting part: it shows how trivial
caching of file system accesses across invocations of the preprocessor allows
clang to reduce time spent in the kernel by 10x, making distcc over 3x more
scalable. This is obviously just one simple hack; doing more interesting things
(like caching tokens across preprocessed files) would yield another substantial
speedup.</p>
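
<p>As a rough illustration of the kind of file system caching described above,
the C sketch below remembers the result of each <tt>stat()</tt> call by path, so
that a repeated lookup of the same file costs no further system call. The names
<tt>StatCache</tt> and <tt>cached_stat</tt>, the fixed table size, and the
overwrite-on-collision policy are all assumptions made for this example; this
is not clang's actual implementation.</p>

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical fixed-size cache; names and sizes are illustrative only. */
#define CACHE_SLOTS 64
#define MAX_PATH_LEN 256

typedef struct {
  char path[MAX_PATH_LEN];
  struct stat info;
  int result; /* value returned by stat(): 0 on success, -1 on failure */
  int used;
} StatEntry;

typedef struct {
  StatEntry slots[CACHE_SLOTS];
  unsigned syscalls; /* number of real stat() calls performed */
} StatCache;

static unsigned hash_path(const char *path) {
  unsigned h = 5381;
  while (*path)
    h = h * 33u + (unsigned char)*path++;
  return h % CACHE_SLOTS;
}

/* Look up 'path', calling stat() only on a cache miss. Failed lookups are
   cached too, since header searches probe many paths that do not exist.
   The cache must be zero-initialized before first use. */
int cached_stat(StatCache *cache, const char *path, struct stat *out) {
  assert(cache && path && out);
  if (strlen(path) >= MAX_PATH_LEN) { /* too long to cache */
    cache->syscalls++;
    return stat(path, out);
  }
  StatEntry *e = &cache->slots[hash_path(path)];
  if (!e->used || strcmp(e->path, path) != 0) {
    /* Miss: replace whatever occupied this slot. */
    strcpy(e->path, path);
    memset(&e->info, 0, sizeof e->info);
    e->result = stat(path, &e->info);
    e->used = 1;
    cache->syscalls++;
  }
  *out = e->info;
  return e->result;
}
```

<p>A client would keep one zero-initialized <tt>StatCache</tt> alive across
many preprocessor runs and compare its <tt>syscalls</tt> counter with the
number of lookups to see how many kernel calls were avoided.</p>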

<p>The clean framework-based design of clang means that many things are possible
that would be very difficult in other systems, for example incremental
compilation, multithreading, intelligent caching, etc. We are only starting
to tap the full potential of the clang design.</p>


<!--=======================================================================-->
<h2><a name="expressivediags">Expressive Diagnostics</a></h2>
<!--=======================================================================-->

<p>Clang is designed to efficiently capture range information for expressions
and statements, which allows it to emit very useful and detailed diagnostic
information (e.g. warnings and errors) when a problem is detected.</p>

<p>For example, this slide compares the diagnostics emitted by clang (top) to
the diagnostics emitted by GCC (middle) for a simple example:</p>

<img class="img_slide" src="feature-diagnostics1.png" width="400" height="300"/>

<p>As you can see, clang goes beyond tracking just column number information: it
is able to highlight the subexpressions involved in a problem, making it much
easier to understand the source of the problem in many cases. For example, in
the first problem, it tells you <em>why</em> the operand is invalid (it
requires a pointer) and what type it really is.</p>

<p>In the second error, you can see how clang uses column number information to
identify exactly which "+" out of the four on that line is causing the problem.
Further, it highlights the subexpressions involved, which can be very useful
when a complex subexpression relies on tricky precedence rules.</p>
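
<p>To make the idea of range information concrete, here is a minimal C sketch
of how a marker line could be produced from a caret column and a set of
highlighted column ranges. The <tt>ColumnRange</tt> type, the
<tt>render_marker_line</tt> function, and the choice of "<tt>^</tt>" and
"<tt>~</tt>" as marker characters are assumptions for illustration, not a
description of clang's diagnostic printer.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical half-open column range [begin, end) within one source line. */
typedef struct {
  size_t begin;
  size_t end;
} ColumnRange;

/* Fill 'out' with a marker line for a source line of 'line_len' columns:
   '~' under each highlighted range, '^' at the caret column, and spaces
   elsewhere. Trailing spaces are trimmed. 'out' must have room for at
   least line_len + 1 bytes. */
void render_marker_line(size_t line_len, size_t caret,
                        const ColumnRange *ranges, size_t num_ranges,
                        char *out) {
  assert(out && caret < line_len);
  memset(out, ' ', line_len);
  for (size_t i = 0; i < num_ranges; ++i)
    for (size_t c = ranges[i].begin; c < ranges[i].end && c < line_len; ++c)
      out[c] = '~';
  out[caret] = '^'; /* the caret wins if it overlaps a range */
  size_t end = line_len;
  while (end > 0 && out[end - 1] == ' ')
    --end;
  out[end] = '\0';
}
```

<p>Printing the source line followed by the marker line places the caret under
the one operator at fault and underlines its two operands, which is the effect
described above.</p>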

<p>The example doesn't show it, but clang works very hard to retain typedef
information, ensuring that diagnostics print the user types, not the fully
expanded (and often huge) types. This is clearly important for C++ code (tell
me about "<tt>std::string</tt>", not about "<tt>std::basic_string&lt;char,
std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;</tt>"!), but it is
also very useful in C code in some cases (e.g. "<tt>__m128</tt>" vs
"<tt>float __attribute__((__vector_size__(16)))</tt>").</p>

<!--=======================================================================-->
<h2><a name="gcccompat">GCC Compatibility</a></h2>
<!--=======================================================================-->

<p>GCC is currently the de facto standard open source compiler, and it
routinely compiles a huge volume of code. GCC supports a huge number of
extensions and features (many of which are undocumented) and a lot of
code and header files depend on these features in order to build.</p>

<p>While it would be nice to be able to ignore these extensions and focus on
implementing the language standards to the letter, pragmatics force us to
support the GCC extensions that see the most use. Many users just want their
code to compile; they don't care to argue about whether it is pedantically C99
or not.</p>

<p>As described under <a href="#conformance">conformance</a>, all
extensions are explicitly recognized as such and marked with extension
diagnostics, which can be mapped to warnings, errors, or just ignored.
</p>
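
<p>As one example of the kind of extension that real code depends on, the
sketch below uses GCC statement expressions and <tt>__typeof__</tt>, neither of
which is part of ISO C99, to write a "max" macro that evaluates each argument
exactly once. The macro name <tt>MAX_OF</tt> and the helper
<tt>max_in_array</tt> are made up for this example.</p>

```c
#include <assert.h>

/* GCC extensions: a statement expression ({ ... }) whose value is its last
   expression, plus __typeof__ to declare temporaries of the argument types.
   Each argument is evaluated exactly once. */
#define MAX_OF(a, b)          \
  ({                          \
    __typeof__(a) a_ = (a);   \
    __typeof__(b) b_ = (b);   \
    a_ > b_ ? a_ : b_;        \
  })

/* Hypothetical helper: largest element of a non-empty array. */
int max_in_array(const int *values, int count) {
  assert(values && count > 0);
  int best = values[0];
  for (int i = 1; i < count; ++i)
    best = MAX_OF(best, values[i]);
  return best;
}
```

<p>A strictly conforming compiler could reject this code outright, which is
why recognizing such constructs explicitly as extensions matters for code that
needs to build unchanged.</p>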


<!--=======================================================================-->
<h1>Driving Goals and Internal Design</h1>
<!--=======================================================================-->

<!--=======================================================================-->
<h2><a name="real">A real-world, production quality compiler</a></h2>
<!--=======================================================================-->

<p>
Clang is designed and built by experienced commercial compiler developers who
are increasingly frustrated with the problems that <a
href="comparison.html">existing open source compilers</a> have. Clang is
carefully and thoughtfully designed and built to provide the foundation of a
whole new generation of C/C++/Objective C development tools, and we intend for
it to be commercial quality.</p>

<p>Being a production quality compiler means many things: it means being high
performance, being solid and (relatively) bug free, and it means eventually
being used and depended on by a broad range of people. While we are still in
the early development stages, we strongly believe that this will become a
reality.</p>

<!--=======================================================================-->
<h2><a name="simplecode">A simple and hackable code base</a></h2>
<!--=======================================================================-->

<p>Our goal is to make it possible for anyone with a basic understanding
of compilers and working knowledge of the C/C++/ObjC languages to understand and
extend the clang source base. A large part of this falls out of our decision to
make the AST mirror the languages as closely as possible: you have your friendly
if statement, for statement, parenthesis expression, structs, unions, etc., all
represented in a simple and explicit way.</p>

<p>In addition to a simple design, we work to make the source base approachable
by commenting it well, including citations of the language standards where
appropriate, and designing the code for simplicity. Beyond that, clang offers
a set of AST dumpers, printers, and visualizers that make it easy to put code in
and see how it is represented.</p>
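
<p>The idea of an AST that mirrors the language, together with a dumper for
it, can be sketched in a few lines of C. The node kinds, the <tt>Node</tt>
layout, and the output format below are hypothetical and far simpler than
clang's real AST classes; they only show the shape of the approach.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node kinds; a real AST has many more. */
typedef enum { INT_LITERAL, PAREN_EXPR, BINARY_OP, IF_STMT } NodeKind;

typedef struct Node {
  NodeKind kind;
  int value;                  /* INT_LITERAL: the literal's value */
  char op;                    /* BINARY_OP: the operator character */
  const struct Node *kids[3]; /* operands, or cond/then/else for IF_STMT */
} Node;

/* Append an indented dump of 'n', one node per line, to the NUL-terminated
   string in 'out', which has a total capacity of 'cap' bytes. */
void dump_node(const Node *n, int depth, char *out, size_t cap) {
  assert(n && out && strlen(out) < cap);
  size_t used = strlen(out);
  char *p = out + used;
  size_t room = cap - used;
  int indent = depth * 2;
  switch (n->kind) {
  case INT_LITERAL:
    snprintf(p, room, "%*sIntegerLiteral %d\n", indent, "", n->value);
    break;
  case BINARY_OP:
    snprintf(p, room, "%*sBinaryOperator '%c'\n", indent, "", n->op);
    break;
  case PAREN_EXPR:
    snprintf(p, room, "%*sParenExpr\n", indent, "");
    break;
  case IF_STMT:
    snprintf(p, room, "%*sIfStmt\n", indent, "");
    break;
  }
  for (int i = 0; i < 3; ++i)
    if (n->kids[i])
      dump_node(n->kids[i], depth + 1, out, cap);
}
```

<p>Note that the parentheses get their own node: keeping them in the tree is
what lets a tool reproduce the source as the user wrote it.</p>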

<!--=======================================================================-->
<h2><a name="unifiedparser">A single unified parser for C, Objective C, C++,
and Objective C++</a></h2>
<!--=======================================================================-->

<p>Clang is the "C Language Family Front-end", which means we intend to support
the most popular members of the C family. We are convinced that the right
parsing technology for this class of languages is a hand-built recursive-descent
parser. Because it is plain C++ code, a recursive-descent parser is very easy
for new developers to understand, it easily supports the ad-hoc rules and other
strange hacks required by C/C++, and it makes it straightforward to implement
excellent diagnostics and error recovery.</p>
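
<p>For readers unfamiliar with the technique, the following C sketch shows
recursive descent on a toy grammar of integers, "<tt>+</tt>", "<tt>*</tt>" and
parentheses: each grammar rule becomes one ordinary function, and error
handling is just code at the point where the problem is noticed. The grammar,
the <tt>Parser</tt> struct, and the function names are invented for this
example and have nothing to do with clang's own parser.</p>

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical parser state: a cursor into the input plus an error flag. */
typedef struct {
  const char *cur;
  int error;
} Parser;

static long parse_expr(Parser *p);

static void skip_spaces(Parser *p) {
  while (isspace((unsigned char)*p->cur))
    ++p->cur;
}

/* primary := NUMBER | '(' expr ')' */
static long parse_primary(Parser *p) {
  skip_spaces(p);
  if (*p->cur == '(') {
    ++p->cur;
    long v = parse_expr(p);
    skip_spaces(p);
    if (*p->cur == ')')
      ++p->cur;
    else
      p->error = 1; /* missing ')': record the error and carry on */
    return v;
  }
  if (!isdigit((unsigned char)*p->cur)) {
    p->error = 1; /* expected a number or '(' */
    return 0;
  }
  long v = 0;
  while (isdigit((unsigned char)*p->cur))
    v = v * 10 + (*p->cur++ - '0');
  return v;
}

/* term := primary ('*' primary)* */
static long parse_term(Parser *p) {
  long v = parse_primary(p);
  for (skip_spaces(p); *p->cur == '*'; skip_spaces(p)) {
    ++p->cur;
    v *= parse_primary(p);
  }
  return v;
}

/* expr := term ('+' term)* */
static long parse_expr(Parser *p) {
  long v = parse_term(p);
  for (skip_spaces(p); *p->cur == '+'; skip_spaces(p)) {
    ++p->cur;
    v += parse_term(p);
  }
  return v;
}

/* Parse and evaluate a whole string. Returns 1 on success and stores the
   value in *out; returns 0 if the input is malformed. */
int evaluate(const char *text, long *out) {
  assert(text && out);
  Parser p = { text, 0 };
  *out = parse_expr(&p);
  skip_spaces(&p);
  if (*p.cur != '\0')
    p.error = 1; /* trailing junk after the expression */
  return !p.error;
}
```

<p>Operator precedence falls out of which function calls which, and a special
case for one construct is simply an extra <tt>if</tt> in the relevant
function.</p>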

<p>We believe that implementing C/C++/ObjC in a single unified parser makes the
end result easier to maintain and evolve than separate C and C++ parsers, which
must be fixed and maintained independently of each other.</p>

<!--=======================================================================-->
<h2><a name="conformance">Conformance with C/C++/ObjC and their
 variants</a></h2>
<!--=======================================================================-->

<p>When you start work on implementing a language, you find out that there is a
huge gap between how the language works and how most people understand it to
work. This gap is the difference between a normal programmer and a (scary?
super-natural?) "language lawyer", who knows the ins and outs of the language
and can grok standardese with ease.</p>

<p>In practice, being conformant with the languages means that we aim to support
the full language, including the dark and dusty corners (like trigraphs,
preprocessor arcana, C99 VLAs, etc.). Where we support extensions above and
beyond what the standard officially allows, we make an effort to explicitly call
this out in the code and emit extension diagnostics about it (which are disabled
by default, but can optionally be mapped to either warnings or errors), allowing
you to use clang in "strict" mode if you desire.</p>

<p>We also intend to support "dialects" of these languages, such as C89, K&amp;R
C, C++'03, Objective-C 2, etc.</p>


<!--=======================================================================-->
<h2><a name="libraryarch">Library based architecture</a></h2>
<!--=======================================================================-->

<p>A major design concept for the LLVM front-end is its library based
architecture. In this architecture, the parts of the front-end are cleanly
divided into separate libraries that can be mixed and matched for different
needs and uses. The library based approach also makes it much easier for new
developers to get involved and extend LLVM to do new and unique things. In the
words of Chris:</p>

<blockquote>
"The world needs better compiler tools, tools which are built as libraries.
This design point allows reuse of the tools in new and novel ways. However,
building the tools as libraries isn't enough: they must have clean APIs, be as
decoupled from each other as possible, and be easy to modify/extend. This
requires clean layering, decent design, and keeping the libraries independent of
any specific client."</blockquote>

<p>Currently, the LLVM front-end is divided into the following libraries:</p>
<ul>
<li>libsupport - Basic support library, reused from LLVM.</li>
<li>libsystem - System abstraction library, reused from LLVM.</li>
<li>libbasic - Diagnostics, SourceLocations, SourceBuffer abstraction, file system caching for input source files. <span class="weak_txt">(depends on above libraries)</span></li>
<li>libast - Provides classes to represent the C AST, the C type system, builtin functions, and various helpers for analyzing and manipulating the AST (visitors, pretty printers, etc). <span class="weak_txt">(depends on above libraries)</span></li>
<li>liblex - C/C++/ObjC lexing and preprocessing, identifier hash table, pragma handling, tokens, and macros. <span class="weak_txt">(depends on above libraries)</span></li>
<li>libparse - Parsing and local semantic analysis. This library invokes coarse-grained 'Actions' provided by the client to do the real work (e.g. libsema builds ASTs). <span class="weak_txt">(depends on above libraries)</span></li>
<li>libsema - Provides a set of parser actions to build a standardized AST for programs. ASTs are 'streamed' out a top-level declaration at a time, allowing clients to use decl-at-a-time processing, build up entire translation units, or even build 'whole program' ASTs depending on how they use the APIs. <span class="weak_txt">(depends on libast and libparse)</span></li>
<li>libcodegen - Lower the AST to LLVM IR for optimization &amp; codegen. <span class="weak_txt">(depends on libast)</span></li>
<li>librewrite - Editing of text buffers. <span class="weak_txt">(depends on libast)</span></li>
<li>libanalysis - Static analysis support. <span class="weak_txt">(depends on libast)</span></li>
<li><b>clang</b> - An example driver, client of the libraries at various levels. <span class="weak_txt">(depends on above libraries, and LLVM VMCore)</span></li>
</ul>
<p>As an example of the power of this library based design: if you want to build a preprocessor, you take the Basic and Lexer libraries. If you want an indexer, you take those two and add the Parser library and some actions for indexing. If you want a refactoring, static analysis, or source-to-source compiler tool, you then add the AST building and semantic analyzer libraries.
In the end, LLVM's library based design will provide developers with many more possibilities.</p>
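
<p>The split between a parser and client-provided actions can be sketched in C
with a table of callbacks. In the example below, a toy "parser" that only
understands declarations of the form <tt>int NAME;</tt> reports each one to
whatever client was plugged in, and an "indexer" client collects the names
without building any tree. The <tt>Actions</tt> struct, its single callback,
and the other names are assumptions for illustration; the real interface
between libparse and its clients is much richer.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical callback table that a client hands to the parser. */
typedef struct {
  void *state;
  void (*on_declaration)(void *state, const char *name, size_t len);
} Actions;

/* Toy parser: accepts a sequence of "int NAME;" declarations and invokes
   the client's action for each. Returns the number of declarations seen,
   or -1 on a syntax error. It knows nothing about what the client does. */
int parse_declarations(const char *text, const Actions *actions) {
  assert(text && actions && actions->on_declaration);
  int count = 0;
  const char *p = text;
  for (;;) {
    while (isspace((unsigned char)*p))
      ++p;
    if (*p == '\0')
      return count;
    if (strncmp(p, "int", 3) != 0 || !isspace((unsigned char)p[3]))
      return -1;
    p += 3;
    while (isspace((unsigned char)*p))
      ++p;
    const char *name = p;
    while (isalnum((unsigned char)*p) || *p == '_')
      ++p;
    if (p == name)
      return -1; /* missing identifier */
    size_t len = (size_t)(p - name);
    while (isspace((unsigned char)*p))
      ++p;
    if (*p != ';')
      return -1;
    ++p;
    actions->on_declaration(actions->state, name, len);
    ++count;
  }
}

/* Example client: an "indexer" that records names as a comma-separated list. */
typedef struct {
  char names[128];
} Index;

static void index_declaration(void *state, const char *name, size_t len) {
  Index *idx = state;
  size_t used = strlen(idx->names);
  if (used + len + 2 > sizeof idx->names)
    return; /* no room left: drop the name */
  if (used)
    idx->names[used++] = ',';
  memcpy(idx->names + used, name, len);
  idx->names[used + len] = '\0';
}
```

<p>Swapping in a different <tt>Actions</tt> table, for instance one that builds
AST nodes, changes what the tool does without touching the parser.</p>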

<h2>Better Integration with IDEs</h2>
<p>Another design goal of Clang is to integrate extremely well with IDEs. IDEs have very different requirements from code generation, and often need information that a codegen-only front-end can throw away. Clang is specifically designed and built to capture this information.</p>
</div>
</body>
</html>