<title>Clang - Features and Goals</title>
<h1>Clang - Features and Goals</h1>
<p>This page describes the <a href="index.html#goals">features and goals</a> of
Clang in more detail and gives a broader explanation of what we mean by each.
These features are:</p>
<p>End-User Features:</p>
<ul>
<li><a href="#performance">High Performance and Low Memory Use</a></li>
<li><a href="#expressivediags">Expressive Diagnostics</a></li>
</ul>
<p>Driving Goals and Internal Design:</p>
<ul>
<li><a href="#real">A real-world, production quality compiler</a></li>
<li><a href="#unifiedparser">A single unified parser for C, Objective C, C++,
and Objective C++</a></li>
<li><a href="#conformance">Conformance with C/C++/ObjC and their
variants</a></li>
<li><a href="#gcccompat">GCC compatibility</a></li>
</ul>
<h1>End-User Features</h1>
<h2><a name="performance">High Performance and Low Memory Use</a></h2>
<p>A major focus of our work on clang is to make it fast, light and scalable.
The library-based architecture of clang makes it straightforward to time and
profile the cost of each layer of the stack, and the driver has a number of
options for performance analysis.</p>
<p>While there is still much that can be done, we find that the clang front-end
is significantly quicker than gcc and uses less memory. For example, when
compiling "Carbon.h" on Mac OS X, we see that clang is 2.5x faster than GCC:</p>
<img class="img_slide" src="feature-compile1.png" width="400" height="300" />
<p>Carbon.h is a monster: it transitively includes 558 files, 12.3M of code,
declares 10000 functions, has 2000 struct definitions, 8000 fields, 20000 enum
constants, etc. (see slide 25+ of the <a href="clang_video-07-25-2007.html">clang
talk</a> for more information). It is also #include'd into almost every C file
in a GUI app on the Mac, so its compile time is very important.</p>
<p>From the slide above, you can see that we can measure the time to preprocess
the file independently from the time to parse it, and independently from the
time to build the ASTs for the code. GCC doesn't provide a way to measure the
parser without AST building (it only provides -fsyntax-only). In our
measurements, we find that clang's preprocessor is consistently 40% faster than
GCC's, and the parser + AST builder is ~4x faster than GCC's. If you have
sources that do not depend as heavily on the preprocessor (or if you
use Precompiled Headers) you may see a much bigger speedup from clang.</p>
<p>Compile time performance is important, but when using clang as an API, memory
use is often even more so: the less memory the code takes, the more code you can
fit into memory at a time (useful for whole program analysis tools, for
example).</p>
<img class="img_slide" src="feature-memory1.png" width="400" height="300" />
<p>Here we see a huge advantage of clang: its ASTs take <b>5x less memory</b>
than GCC's syntax trees, despite the fact that clang's ASTs capture far more
source-level information than GCC's trees do. This feat is accomplished through
the use of carefully designed APIs and efficient representations.</p>
<p>In addition to being efficient when pitted head-to-head against GCC in batch
mode, clang is built with a <a href="#libraryarch">library based
architecture</a> that makes it relatively easy to adapt it and build new tools
with it. This means that it is often possible to apply out-of-the-box thinking
and novel techniques to improve compilation in various ways.</p>
<img class="img_slide" src="feature-compile2.png" width="400" height="300" />
<p>This slide shows how the clang preprocessor can be used to make "distcc"
parallelization <b>3x</b> more scalable than when using the GCC preprocessor.
"distcc" quickly bottlenecks on the preprocessor running on the central driver
machine, so a fast preprocessor is very useful. Comparing the first two bars
of each group shows how a ~40% faster preprocessor can reduce preprocessing time
of these large C++ apps by about 40% (shocking!).</p>
<p>The third bar on the slide is the interesting part: it shows how trivial
caching of file system accesses across invocations of the preprocessor allows
clang to reduce time spent in the kernel by 10x, making distcc over 3x more
scalable. This is obviously just one simple hack; doing more interesting things
(like caching tokens across preprocessed files) would yield another substantial
speedup.</p>
<p>The clean framework-based design of clang means that many things are possible
that would be very difficult in other systems, for example incremental
compilation, multithreading, intelligent caching, etc. We are only starting
to tap the full potential of the clang design.</p>
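<p>To make the file system caching idea above concrete, here is a minimal C++
sketch. It is not clang's implementation: <tt>FileInfo</tt> and
<tt>FileStatCache</tt> are hypothetical names, and the real file system query
is passed in as a function so that the sketch stays independent of any platform
API. Each distinct path is looked up once; failed lookups are remembered too,
since probing include directories for a header that is not there is a common
case.</p>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical result of a file system query (roughly what stat() reports).
struct FileInfo {
  bool Exists;
  long Size;
};

// Memoizes file system lookups so that repeated queries for the same path
// perform the expensive lookup only once. Negative results are cached as well.
class FileStatCache {
public:
  using Lookup = std::function<FileInfo(const std::string &)>;

  explicit FileStatCache(Lookup Fn) : RealLookup(std::move(Fn)) {
    assert(RealLookup && "a lookup function is required");
  }

  const FileInfo &get(const std::string &Path) {
    auto It = Cache.find(Path);
    if (It == Cache.end()) {
      ++Misses; // only a miss reaches the real file system
      It = Cache.emplace(Path, RealLookup(Path)).first;
    }
    return It->second;
  }

  // Number of lookups that were not answered from the cache.
  unsigned misses() const { return Misses; }

private:
  Lookup RealLookup;
  std::unordered_map<std::string, FileInfo> Cache;
  unsigned Misses = 0;
};
```

<p>A client would create one <tt>FileStatCache</tt>, keep it alive across
preprocessor invocations, and route every header probe through
<tt>get()</tt>; comparing <tt>misses()</tt> with the total number of queries
shows how many lookups were avoided.</p>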
<h2><a name="expressivediags">Expressive Diagnostics</a></h2>
<p>Clang is designed to efficiently capture range information for expressions
and statements, which allows it to emit very useful and detailed diagnostic
information (e.g. warnings and errors) when a problem is detected.</p>
<p>For example, this slide compares the diagnostics emitted by clang (top) to
the diagnostics emitted by GCC (middle) for a simple example:</p>
<img class="img_slide" src="feature-diagnostics1.png" width="400" height="300"/>
<p>As you can see, clang goes beyond tracking just column number information: it
is able to highlight the subexpressions involved in a problem, making it much
easier to understand the source of the problem in many cases. For example, in
the first problem, it tells you <em>why</em> the operand is invalid (it
requires a pointer) and what type it really is.</p>
<p>In the second error, you can see how clang uses column number information to
identify exactly which "+" out of the four on that line is causing the problem.
Further, it highlights the subexpressions involved, which can be very useful
when a complex subexpression relies on tricky precedence rules.</p>
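<p>As a rough illustration of how a column number plus subexpression ranges can
be turned into the marker line printed under a source line, consider the
following C++ sketch. <tt>ColumnRange</tt> and <tt>renderMarkerLine</tt> are
hypothetical names chosen for this example and are not part of clang; the sketch
assumes 0-based columns and ranges that lie within a single line.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical half-open column range [Begin, End) within one source line.
struct ColumnRange {
  std::size_t Begin;
  std::size_t End;
};

// Builds the line printed under the offending source line: '~' under every
// highlighted subexpression and '^' at the exact column of the problem.
std::string renderMarkerLine(const std::string &SourceLine,
                             std::size_t CaretColumn,
                             const std::vector<ColumnRange> &Ranges) {
  assert(CaretColumn < SourceLine.size() && "caret must point into the line");
  std::string Markers(SourceLine.size(), ' ');
  for (const ColumnRange &R : Ranges) {
    assert(R.Begin <= R.End && R.End <= SourceLine.size());
    for (std::size_t I = R.Begin; I < R.End; ++I)
      Markers[I] = '~';
  }
  Markers[CaretColumn] = '^'; // the caret wins if it overlaps a range
  // Drop trailing spaces so the output carries no invisible padding.
  Markers.erase(Markers.find_last_not_of(' ') + 1);
  return Markers;
}
```

<p>Printing the source line followed by the returned string gives the familiar
caret-and-underline layout. Because the caret column is stored separately from
the ranges, the same operator can be pinpointed even when it appears several
times on the line.</p>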
<p>The example doesn't show it, but clang works very hard to retain typedef
information, ensuring that diagnostics print the user types, not the fully
expanded (and often huge) types. This is clearly important for C++ code (tell
me about "<tt>std::string</tt>", not about "<tt>std::basic_string&lt;char,
std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;</tt>"!), but it is
also very useful in C code in some cases (e.g. "<tt>__m128</tt>" vs
"<tt>float __attribute__((__vector_size__(16)))</tt>").</p>
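<p>One way to picture typedef retention is a type node that remembers the name
the user wrote and only points at the underlying type instead of being replaced
by it. The C++ sketch below is a deliberately tiny model under that assumption;
<tt>Type</tt>, <tt>diagnosticName</tt> and <tt>canonicalName</tt> are
hypothetical names, and clang's real type representation is far richer.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, heavily simplified type node. A typedef is kept as its own
// node that refers to the type it names.
struct Type {
  std::string Spelling;                   // the name as the user wrote it
  std::shared_ptr<const Type> Underlying; // non-null only for typedefs

  bool isTypedef() const { return Underlying != nullptr; }
};

// Name to print in diagnostics: whatever the user wrote.
std::string diagnosticName(const Type &T) {
  assert(!T.Spelling.empty() && "every type node needs a spelling");
  return T.Spelling;
}

// Fully expanded name: follow the typedef chain to its end.
std::string canonicalName(const Type &T) {
  const Type *Cur = &T;
  while (Cur->isTypedef())
    Cur = Cur->Underlying.get();
  return Cur->Spelling;
}
```

<p>With this layout, a diagnostic can print the short user-facing name while
type checking still has access to the expanded type through
<tt>canonicalName</tt>.</p>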
<h1>Driving Goals and Internal Design</h1>
<h2><a name="real">A real-world, production quality compiler</a></h2>
<p>Clang is designed and built by experienced commercial compiler developers who
are increasingly frustrated with the problems that <a
href="comparison.html">existing open source compilers</a> have. Clang is
carefully and thoughtfully designed and built to provide the foundation of a
whole new generation of C/C++/Objective C development tools, and we intend for
it to be commercial quality.</p>
<p>Being a production quality compiler means many things: it means being high
performance, being solid and (relatively) bug free, and it means eventually
being used and depended on by a broad range of people. While we are still in
the early development stages, we strongly believe that this will become a
reality.</p>
<h2><a name="unifiedparser">A single unified parser for C, Objective C, C++,
and Objective C++</a></h2>
<p>Clang is the "C Language Family Front-end", which means we intend to support
the most popular members of the C family. We are convinced that the right
parsing technology for this class of languages is a hand-built recursive-descent
parser. Because it is plain C++ code, a recursive-descent parser is easy for
new developers to understand, easily supports the ad-hoc rules and other
strange hacks required by C/C++, and makes it straightforward to implement
excellent diagnostics and error recovery.</p>
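<p>For readers unfamiliar with the technique, here is a small self-contained
recursive-descent parser in C++. It parses a toy arithmetic grammar, not C, and
<tt>ExprParser</tt> is a hypothetical class written for this page rather than
clang code. It shows the properties claimed above: one plain function per
grammar rule, and diagnostics with recovery written inline where the problem is
detected.</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy grammar, one member function per rule:
//   expr   := term (('+' | '-') term)*
//   term   := factor ('*' factor)*
//   factor := NUMBER | '(' expr ')'
class ExprParser {
public:
  explicit ExprParser(std::string Input) : Text(std::move(Input)) {}

  // Parses the whole input and returns its value. Problems are recorded
  // instead of aborting, so a single run can report several of them.
  long parse() {
    assert(Pos == 0 && "parse() is meant to be called once");
    long Value = parseExpr();
    skipSpaces();
    if (Pos != Text.size())
      error("unexpected trailing input");
    return Value;
  }

  // Each entry has the form "column N: message" with a 1-based column.
  const std::vector<std::string> &errors() const { return Errors; }

private:
  long parseExpr() {
    long Value = parseTerm();
    for (;;) {
      if (consume('+'))
        Value += parseTerm();
      else if (consume('-'))
        Value -= parseTerm();
      else
        return Value;
    }
  }

  long parseTerm() {
    long Value = parseFactor();
    while (consume('*'))
      Value *= parseFactor();
    return Value;
  }

  long parseFactor() {
    skipSpaces();
    if (consume('(')) {
      long Value = parseExpr();
      if (!consume(')'))
        error("expected ')'"); // recover by pretending it was there
      return Value;
    }
    if (atDigit()) {
      long Value = 0;
      while (atDigit())
        Value = Value * 10 + (Text[Pos++] - '0');
      return Value;
    }
    error("expected a number or '('");
    return 0; // recover with a placeholder value
  }

  bool atDigit() const {
    return Pos < Text.size() &&
           std::isdigit(static_cast<unsigned char>(Text[Pos]));
  }

  bool consume(char C) {
    skipSpaces();
    if (Pos < Text.size() && Text[Pos] == C) {
      ++Pos;
      return true;
    }
    return false;
  }

  void skipSpaces() {
    while (Pos < Text.size() &&
           std::isspace(static_cast<unsigned char>(Text[Pos])))
      ++Pos;
  }

  void error(const std::string &Message) {
    Errors.push_back("column " + std::to_string(Pos + 1) + ": " + Message);
  }

  std::string Text;
  std::size_t Pos = 0;
  std::vector<std::string> Errors;
};
```

<p>Note how the missing-parenthesis case is handled in the same function that
parses parentheses: the parser knows exactly what it expected and where, reports
it, and carries on.</p>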
<p>We believe that implementing C/C++/ObjC in a single unified parser makes the
end result easier to maintain and evolve than separate C and C++ parsers, which
would have to be bugfixed and maintained independently of each other.</p>
<h2><a name="conformance">Conformance with C/C++/ObjC and their
variants</a></h2>
<p>When you start work on implementing a language, you find out that there is a
huge gap between how the language works and how most people understand it to
work. This gap is the difference between a normal programmer and a (scary?
supernatural?) "language lawyer", who knows the ins and outs of the language
and can grok standardese with ease.</p>
<p>In practice, being conformant with the languages means that we aim to support
the full language, including the dark and dusty corners (like trigraphs,
preprocessor arcana, C99 VLAs, etc.). Where we support extensions above and
beyond what the standard officially allows, we make an effort to explicitly call
this out in the code and emit diagnostics about it (which are ignored by default,
but can optionally be mapped to either warnings or errors), allowing you to use
clang in "strict" mode if you desire.</p>
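<p>The mapping of extension diagnostics described above can be pictured as a
small lookup table with a default. The following C++ sketch is only a model of
that idea: <tt>ExtensionPolicy</tt> and the diagnostic names used with it are
made up for illustration and do not correspond to clang's actual classes or
diagnostic identifiers.</p>

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Severity { Ignored, Warning, Error };

// Hypothetical policy object deciding how the use of an extension is reported.
class ExtensionPolicy {
public:
  // By default extensions are accepted silently; a "strict" mode would pass
  // Severity::Error here instead.
  explicit ExtensionPolicy(Severity D = Severity::Ignored) : Default(D) {}

  // Overrides the severity of one named extension diagnostic.
  void map(const std::string &DiagName, Severity S) {
    assert(!DiagName.empty() && "diagnostic name must not be empty");
    Overrides[DiagName] = S;
  }

  Severity severityFor(const std::string &DiagName) const {
    auto It = Overrides.find(DiagName);
    return It == Overrides.end() ? Default : It->second;
  }

  // Whether compilation may continue after this extension is used.
  bool accepts(const std::string &DiagName) const {
    return severityFor(DiagName) != Severity::Error;
  }

private:
  Severity Default;
  std::map<std::string, Severity> Overrides;
};
```

<p>Keeping the default separate from the per-diagnostic overrides lets a strict
configuration reject every extension while still allowing a few specific ones
that a code base depends on.</p>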
<p>We also intend to support "dialects" of these languages, such as C89, K&amp;R
C, C++'03, Objective-C 2, etc.</p>
<h2><a name="gcccompat">GCC Compatibility</a></h2>
<p>GCC is currently the de facto standard open source compiler, and it
routinely compiles a huge volume of code. GCC supports a huge number of
extensions and features (many of which are undocumented) and a lot of
code and header files depend on these features in order to build.</p>
<p>While it would be nice to be able to ignore these extensions and focus on
implementing the language standards to the letter, pragmatics force us to
support the GCC extensions that see the most use. As mentioned above, all
extensions are explicitly recognized as such and marked with extension
diagnostics, which can be mapped to warnings, errors, or just ignored.</p>
<h2><a name="libraryarch">Library based architecture</a></h2>
<p>A major design concept for the LLVM front-end is its library based
architecture. The various parts of the front-end are cleanly divided into
separate libraries, which can then be mixed and matched for different needs and
uses. In addition, the library based approach makes it much easier for new
developers to get involved and extend LLVM to do new and unique things. In the
words of Chris:</p>
<blockquote>
"The world needs better compiler tools, tools which are built as libraries.
This design point allows reuse of the tools in new and novel ways. However,
building the tools as libraries isn't enough: they must have clean APIs, be as
decoupled from each other as possible, and be easy to modify/extend. This
requires clean layering, decent design, and keeping the libraries independent of
any specific client."</blockquote>
<p>Currently, the LLVM front-end is divided into the following libraries:</p>
<ul>
<li>libsupport - Basic support library, reused from LLVM.</li>
<li>libsystem - System abstraction library, reused from LLVM.</li>
<li>libbasic - Diagnostics, SourceLocations, SourceBuffer abstraction, file system caching for input source files. <span class="weak_txt">(depends on above libraries)</span></li>
<li>libast - Provides classes to represent the C AST, the C type system, builtin functions, and various helpers for analyzing and manipulating the AST (visitors, pretty printers, etc). <span class="weak_txt">(depends on above libraries)</span></li>
<li>liblex - C/C++/ObjC lexing and preprocessing, identifier hash table, pragma handling, tokens, and macros. <span class="weak_txt">(depends on above libraries)</span></li>
<li>libparse - Parsing and local semantic analysis. This library invokes coarse-grained 'Actions' provided by the client to do stuff (e.g. libsema builds ASTs). <span class="weak_txt">(depends on above libraries)</span></li>
<li>libsema - Provides a set of parser actions to build a standardized AST for programs. ASTs are 'streamed' out a top-level declaration at a time, allowing clients to use decl-at-a-time processing, build up entire translation units, or even build 'whole program' ASTs depending on how they use the APIs. <span class="weak_txt">(depends on libast and libparse)</span></li>
<li>libcodegen - Lower the AST to LLVM IR for optimization &amp; codegen. <span class="weak_txt">(depends on libast)</span></li>
<li>librewrite - Editing of text buffers, depends on libast.</li>
<li>libanalysis - Static analysis support, depends on libast.</li>
<li><b>clang</b> - An example driver, client of the libraries at various levels. <span class="weak_txt">(depends on above libraries, and LLVM VMCore)</span></li>
</ul>
<p>As an example of the power of this library based design: if you wanted to
build a preprocessor, you would take the Basic and Lexer libraries. If you
wanted an indexer, you would take those two and add the Parser library and some
actions for indexing. If you wanted a refactoring, static analysis, or
source-to-source compiler tool, you would then add the AST building and semantic
analyzer libraries.</p>
<p>In the end, LLVM's library based design gives developers many more
possibilities.</p>
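<p>The split between a parser and client-provided actions can be sketched in a
few lines of C++. Everything below is hypothetical and greatly simplified: the
<tt>Actions</tt> interface, the toy input format ("func name" lines) and both
clients are inventions for this example and do not match the real libparse or
libsema interfaces. The point is only that the same parsing code can serve an
indexer and a tree builder without knowing about either.</p>

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical callback interface implemented by clients of the parser.
class Actions {
public:
  virtual ~Actions() = default;
  // Called once for each top-level declaration the parser recognizes.
  virtual void onFunctionDecl(const std::string &Name, unsigned Line) = 0;
};

// Stand-in for the parser library: it recognizes lines of the form
// "func <name>" and reports them, without knowing what the client does.
void parseToyInput(const std::string &Source, Actions &Client) {
  std::istringstream In(Source);
  std::string Line;
  unsigned LineNo = 0;
  while (std::getline(In, Line)) {
    ++LineNo;
    std::istringstream Words(Line);
    std::string Keyword, Name;
    if (Words >> Keyword >> Name && Keyword == "func")
      Client.onFunctionDecl(Name, LineNo);
  }
}

// Client 1: an indexer that only records where each name is declared.
class IndexingActions : public Actions {
public:
  std::map<std::string, unsigned> DeclLine;

  void onFunctionDecl(const std::string &Name, unsigned Line) override {
    assert(Line > 0 && "lines are numbered from 1");
    DeclLine[Name] = Line;
  }
};

// Client 2: a tree builder that keeps one node per declaration, in order,
// as each declaration is streamed out of the parser.
struct FunctionNode {
  std::string Name;
  unsigned Line;
};

class TreeBuildingActions : public Actions {
public:
  std::vector<FunctionNode> TopLevelDecls;

  void onFunctionDecl(const std::string &Name, unsigned Line) override {
    TopLevelDecls.push_back(FunctionNode{Name, Line});
  }
};
```

<p>A tool picks the client it needs and passes it to the parser; adding a new
kind of tool means writing a new <tt>Actions</tt> subclass rather than touching
the parser.</p>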
<h2>Better Integration with IDEs</h2>
<p>Another design goal of Clang is to integrate extremely well with IDEs. An IDE's requirements often differ greatly from those of code generation: it frequently needs information that a codegen-only front-end can throw away. Clang is specifically designed and built to capture this information.</p>