<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
  <title>Clang - Features and Goals</title>
  <link type="text/css" rel="stylesheet" href="menu.css" />
  <link type="text/css" rel="stylesheet" href="content.css" />
  <style type="text/css">
</style>
</head>
<body>

<!--#include virtual="menu.html.incl"-->

<div id="content">

<h1>Clang - Features and Goals</h1>
<p>
This page describes the <a href="index.html#goals">features and goals</a> of
Clang in more detail and explains what we mean by each of them.
These features are:
</p>

<p>End-User Features:</p>

<ul>
<li><a href="#performance">High Performance and Low Memory Use</a></li>
<li><a href="#expressivediags">Expressive Diagnostics</a></li>
</ul>

<p>Driving Goals and Internal Design:</p>
<ul>
<li><a href="#real">A real-world, production quality compiler</a></li>
<li><a href="#unifiedparser">A single unified parser for C, Objective C, C++,
    and Objective C++</a></li>
<li><a href="#conformance">Conformance with C/C++/ObjC and their
    variants</a></li>
<li><a href="#gcccompat">GCC compatibility</a></li>
</ul>

<!--=======================================================================-->
<h1>End-User Features</h1>
<!--=======================================================================-->


<!--=======================================================================-->
<h2><a name="performance">High Performance and Low Memory Use</a></h2>
<!--=======================================================================-->

<p>A major focus of our work on clang is to make it fast, light and scalable.
The library-based architecture of clang makes it straightforward to time and
profile the cost of each layer of the stack, and the driver has a number of
options for performance analysis.</p>

<p>While there is still much that can be done, we find that the clang front-end
is significantly quicker than GCC and uses less memory. For example, when
compiling "Carbon.h" on Mac OS X, we see that clang is 2.5x faster than GCC:</p>

<img class="img_slide" src="feature-compile1.png" width="400" height="300" />

<p>Carbon.h is a monster: it transitively includes 558 files and 12.3M of code,
and declares 10000 functions, 2000 struct definitions, 8000 fields, 20000 enum
constants, etc. (see slide 25+ of the <a href="clang_video-07-25-2007.html">clang
talk</a> for more information). It is also #include'd into almost every C file
in a GUI app on the Mac, so its compile time is very important.</p>

<p>From the slide above, you can see that we can measure the time to preprocess
the file independently from the time to parse it, and independently from the
time to build the ASTs for the code. GCC doesn't provide a way to measure the
parser without AST building (it only provides -fsyntax-only). In our
measurements, we find that clang's preprocessor is consistently 40% faster than
GCC's, and the parser + AST builder is ~4x faster than GCC's. If you have
sources that do not depend as heavily on the preprocessor (or if you
use Precompiled Headers) you may see a much bigger speedup from clang.
</p>

<p>Compile time performance is important, but when using clang as an API,
memory use is often even more so: the less memory the code takes, the more code
you can fit into memory at a time (useful for whole program analysis tools, for
example).</p>

<img class="img_slide" src="feature-memory1.png" width="400" height="300" />

<p>Here we see a huge advantage of clang: its ASTs take <b>5x less memory</b>
than GCC's syntax trees, despite the fact that clang's ASTs capture far more
source-level information than GCC's trees do. This feat is accomplished through
the use of carefully designed APIs and efficient representations.</p>

<p>In addition to being efficient when pitted head-to-head against GCC in batch
mode, clang is built with a <a href="#libraryarch">library based
architecture</a> that makes it relatively easy to adapt it and build new tools
with it. This means that it is often possible to apply out-of-the-box thinking
and novel techniques to improve compilation in various ways.</p>

<img class="img_slide" src="feature-compile2.png" width="400" height="300" />

<p>This slide shows how the clang preprocessor can be used to make "distcc"
parallelization <b>3x</b> more scalable than when using the GCC preprocessor.
"distcc" quickly bottlenecks on the preprocessor running on the central driver
machine, so a fast preprocessor is very useful. Comparing the first two bars
of each group shows how a ~40% faster preprocessor can reduce preprocessing time
of these large C++ apps by about 40% (shocking!).</p>

<p>The third bar on the slide is the interesting part: it shows how trivial
caching of file system accesses across invocations of the preprocessor allows
clang to reduce time spent in the kernel by 10x, making distcc over 3x more
scalable. This is obviously just one simple hack; doing more interesting things
(like caching tokens across preprocessed files) would yield another substantial
speedup.</p>
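
<p>To make the caching idea concrete, here is a minimal C++ sketch of
remembering file system lookups across preprocessor invocations. It is an
illustration only and not clang's actual code: the class name
<tt>FileStatCache</tt> and the injected probe callback are hypothetical, and a
real implementation would cache full stat results instead of a single
"exists" bit.</p>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache of "does this path exist?" answers. The probe callback
// stands in for the expensive system call (for example stat or open).
class FileStatCache {
public:
  using Probe = std::function<bool(const std::string &)>;

  explicit FileStatCache(Probe probe) : probe_(std::move(probe)) {}

  // Returns the cached answer when there is one; otherwise asks the probe
  // once and remembers the result. Negative answers are cached as well,
  // which matters because include search paths probe many missing files.
  bool exists(const std::string &path) {
    auto it = cache_.find(path);
    if (it != cache_.end()) {
      ++hits_;
      return it->second;
    }
    ++misses_;
    bool found = probe_(path);
    cache_.emplace(path, found);
    return found;
  }

  unsigned hits() const { return hits_; }
  unsigned misses() const { return misses_; }

private:
  Probe probe_;
  std::unordered_map<std::string, bool> cache_;
  unsigned hits_ = 0;
  unsigned misses_ = 0;
};
```

<p>A single cache object that outlives many preprocessor runs means each
distinct path costs at most one trip into the kernel, which is the effect the
third bar on the slide measures.</p>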

<p>The clean framework-based design of clang means that many things are possible
that would be very difficult in other systems, for example incremental
compilation, multithreading, intelligent caching, etc. We are only starting
to tap the full potential of the clang design.</p>


<!--=======================================================================-->
<h2><a name="expressivediags">Expressive Diagnostics</a></h2>
<!--=======================================================================-->

<p>Clang is designed to efficiently capture range information for expressions
and statements, which allows it to emit very useful and detailed diagnostic
information (e.g. warnings and errors) when a problem is detected.</p>

<p>For example, this slide compares the diagnostics emitted by clang (top) to
the diagnostics emitted by GCC (middle) for a simple example:</p>

<img class="img_slide" src="feature-diagnostics1.png" width="400" height="300"/>

<p>As you can see, clang goes beyond tracking just column number information: it
is able to highlight the subexpressions involved in a problem, making it much
easier to understand the source of the problem in many cases. For example, in
the first problem, it tells you <em>why</em> the operand is invalid (it
requires a pointer) and what type it really is.</p>

<p>In the second error, you can see how clang uses column number information to
identify exactly which "+" out of the four on that line is causing the problem.
Further, it highlights the subexpressions involved, which can be very useful
when a complex subexpression relies on tricky precedence rules.</p>

<p>The example doesn't show it, but clang works very hard to retain typedef
information, ensuring that diagnostics print the user types, not the fully
expanded (and often huge) types. This is clearly important for C++ code (tell
me about "<tt>std::string</tt>", not about "<tt>std::basic_string&lt;char,
std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;</tt>"!), but it is
also very useful in C code in some cases as well (e.g. "<tt>__m128</tt>" vs
"<tt>float __attribute__((__vector_size__(16)))</tt>").</p>
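
<p>The point about user-visible type names can be checked directly: the short
and the expanded spellings really are the same type, so which one a diagnostic
prints is purely a presentation choice. The following is a small sketch, not
taken from clang itself; the names <tt>kStringIsSugar</tt>, <tt>Vec4</tt> and
<tt>sumLanes</tt> are made up for illustration, and the vector typedef assumes
a compiler that accepts the GCC <tt>__vector_size__</tt> attribute.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// std::string is only a typedef; the expanded spelling names the same type.
constexpr bool kStringIsSugar = std::is_same<
    std::string,
    std::basic_string<char, std::char_traits<char>,
                      std::allocator<char>>>::value;

// Hypothetical stand-in for a vector typedef such as __m128: four floats
// packed into 16 bytes using the GCC vector extension.
typedef float Vec4 __attribute__((__vector_size__(16)));

// Adds up the four lanes of the vector.
inline float sumLanes(Vec4 v) { return v[0] + v[1] + v[2] + v[3]; }
```

<p>A diagnostic about a <tt>Vec4</tt> argument is much easier to read when it
says <tt>Vec4</tt> than when it spells out the attribute, even though the
compiler treats both identically.</p>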


<!--=======================================================================-->
<h1>Driving Goals and Internal Design</h1>
<!--=======================================================================-->

<!--=======================================================================-->
<h2><a name="real">A real-world, production quality compiler</a></h2>
<!--=======================================================================-->

<p>
Clang is designed and built by experienced commercial compiler developers who
are increasingly frustrated with the problems that <a
href="comparison.html">existing open source compilers</a> have. Clang is
carefully and thoughtfully designed and built to provide the foundation of a
whole new generation of C/C++/Objective C development tools, and we intend for
it to be commercial quality.</p>

<p>Being a production quality compiler means many things: it means being high
performance, being solid and (relatively) bug free, and it means eventually
being used and depended on by a broad range of people. While we are still in
the early development stages, we strongly believe that this will become a
reality.</p>

<!--=======================================================================-->
<h2><a name="unifiedparser">A single unified parser for C, Objective C, C++,
and Objective C++</a></h2>
<!--=======================================================================-->

<p>Clang is the "C Language Family Front-end", which means we intend to support
the most popular members of the C family. We are convinced that the right
parsing technology for this class of languages is a hand-built recursive-descent
parser. Because it is plain C++ code, recursive descent makes it very easy for
new developers to understand the code, easily supports ad-hoc rules and other
strange hacks required by C/C++, and makes it straightforward to implement
excellent diagnostics and error recovery.</p>
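
<p>For readers who have not seen one, here is a toy hand-written
recursive-descent parser in C++. It handles only integer arithmetic and is in
no way clang's parser; <tt>ExprParser</tt> and <tt>ParseResult</tt> are
hypothetical names. It shows the shape of the technique: one ordinary function
per grammar rule, and an error position that falls out of the current input
offset.</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

struct ParseResult {
  bool ok;
  long value;
  std::size_t errorColumn;  // 1-based column of the problem, 0 when ok
};

class ExprParser {
public:
  explicit ExprParser(std::string text) : text_(std::move(text)) {}

  ParseResult parse() {
    long value = 0;
    if (parseSum(value) && peek() == '\0' && pos_ == text_.size())
      return {true, value, 0};
    return {false, 0, pos_ + 1};
  }

private:
  // sum := product (('+' | '-') product)*
  bool parseSum(long &out) {
    if (!parseProduct(out)) return false;
    while (peek() == '+' || peek() == '-') {
      char op = text_[pos_++];
      long rhs = 0;
      if (!parseProduct(rhs)) return false;
      out = (op == '+') ? out + rhs : out - rhs;
    }
    return true;
  }

  // product := primary ('*' primary)*
  bool parseProduct(long &out) {
    if (!parsePrimary(out)) return false;
    while (peek() == '*') {
      ++pos_;
      long rhs = 0;
      if (!parsePrimary(rhs)) return false;
      out *= rhs;
    }
    return true;
  }

  // primary := number | '(' sum ')'
  bool parsePrimary(long &out) {
    if (peek() == '(') {
      ++pos_;
      if (!parseSum(out) || peek() != ')') return false;
      ++pos_;
      return true;
    }
    if (!std::isdigit(static_cast<unsigned char>(peek()))) return false;
    out = 0;
    while (pos_ < text_.size() &&
           std::isdigit(static_cast<unsigned char>(text_[pos_])))
      out = out * 10 + (text_[pos_++] - '0');
    return true;
  }

  // Skips blanks and returns the next character, or '\0' at end of input.
  char peek() {
    while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_;
    return pos_ < text_.size() ? text_[pos_] : '\0';
  }

  std::string text_;
  std::size_t pos_ = 0;
};
```

<p>Because every rule is a plain function, adding an ad-hoc special case or a
more specific error message is a local change to the function for that
rule.</p>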

<p>We believe that implementing C/C++/ObjC in a single unified parser makes the
end result easier to maintain and evolve than keeping separate C and C++
parsers, each of which must be bugfixed and maintained independently.</p>

<!--=======================================================================-->
<h2><a name="conformance">Conformance with C/C++/ObjC and their
variants</a></h2>
<!--=======================================================================-->

<p>When you start work on implementing a language, you find out that there is a
huge gap between how the language works and how most people understand it to
work. This gap is the difference between a normal programmer and a (scary?
super-natural?) "language lawyer", who knows the ins and outs of the language
and can grok standardese with ease.</p>

<p>In practice, being conformant with the languages means that we aim to support
the full language, including the dark and dusty corners (like trigraphs,
preprocessor arcana, C99 VLAs, etc). Where we support extensions above and
beyond what the standard officially allows, we make an effort to explicitly call
this out in the code and emit warnings about it (which are disabled by default,
but can optionally be mapped to either warnings or errors), allowing you to use
clang in "strict" mode if you desire.</p>

<p>We also intend to support "dialects" of these languages, such as C89, K&amp;R
C, C++03, Objective-C 2, etc.</p>

<!--=======================================================================-->
<h2><a name="gcccompat">GCC Compatibility</a></h2>
<!--=======================================================================-->

<p>GCC is currently the de facto standard open source compiler, and it
routinely compiles a huge volume of code. GCC supports a huge number of
extensions and features (many of which are undocumented) and a lot of
code and header files depend on these features in order to build.</p>

<p>While it would be nice to be able to ignore these extensions and focus on
implementing the language standards to the letter, practical considerations
force us to support the GCC extensions that see the most use. As mentioned
above, all extensions are explicitly recognized as such and marked with
extension diagnostics, which can be mapped to warnings, errors, or just ignored.
</p>
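
<p>As an illustration of the kind of code this is about, the sketch below uses
two widely used GNU extensions, statement expressions and
<tt>__typeof__</tt>, neither of which is part of the language standards. The
macro and function names are invented for this example, and it assumes a
GCC-compatible compiler.</p>

```cpp
#include <cassert>

// GNU extensions, not ISO C or C++: "({ ... })" is a statement expression
// whose value is its last expression, and __typeof__ names an operand's type.
// Each macro argument is evaluated exactly once.
#define MAX_OF(a, b)           \
  ({                           \
    __typeof__(a) lhs_ = (a);  \
    __typeof__(b) rhs_ = (b);  \
    lhs_ > rhs_ ? lhs_ : rhs_; \
  })

// Hypothetical helper that depends on the extension in order to build.
inline int largest(int x, int y) { return MAX_OF(x, y); }
```

<p>With GCC-compatible drivers, <tt>-pedantic</tt> is the usual way to ask for
warnings about constructs like this one, and <tt>-pedantic-errors</tt> to turn
them into errors; without either flag the code is accepted silently.</p>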


<!--=======================================================================-->
<h2><a name="libraryarch">Library based architecture</a></h2>
<!--=======================================================================-->

<p>A major design concept for the LLVM front-end is its library based
architecture. In this architecture, the various parts of the front-end are
cleanly divided into separate libraries, which can then be mixed and matched
for different needs and uses. In addition, the library based approach makes it
much easier for new developers to get involved and extend LLVM to do new and
unique things. In the words of Chris:</p>

<blockquote>
"The world needs better compiler tools, tools which are built as libraries.
This design point allows reuse of the tools in new and novel ways. However,
building the tools as libraries isn't enough: they must have clean APIs, be as
decoupled from each other as possible, and be easy to modify/extend. This
requires clean layering, decent design, and keeping the libraries independent of
any specific client."</blockquote>

<p>Currently, the LLVM front-end is divided into the following libraries:</p>
<ul>
<li>libsupport - Basic support library, reused from LLVM.</li>
<li>libsystem - System abstraction library, reused from LLVM.</li>
<li>libbasic - Diagnostics, SourceLocations, SourceBuffer abstraction, file
    system caching for input source files.
    <span class="weak_txt">(depends on above libraries)</span></li>
<li>libast - Provides classes to represent the C AST, the C type system,
    builtin functions, and various helpers for analyzing and manipulating the
    AST (visitors, pretty printers, etc).
    <span class="weak_txt">(depends on above libraries)</span></li>
<li>liblex - C/C++/ObjC lexing and preprocessing, identifier hash table,
    pragma handling, tokens, and macros.
    <span class="weak_txt">(depends on above libraries)</span></li>
<li>libparse - Parsing and local semantic analysis. This library invokes
    coarse-grained 'Actions' provided by the client to do the actual work
    (e.g. libsema builds ASTs).
    <span class="weak_txt">(depends on above libraries)</span></li>
<li>libsema - Provides a set of parser actions to build a standardized AST for
    programs. ASTs are 'streamed' out a top-level declaration at a time,
    allowing clients to use decl-at-a-time processing, build up entire
    translation units, or even build 'whole program' ASTs depending on how they
    use the APIs.
    <span class="weak_txt">(depends on libast and libparse)</span></li>
<li>libcodegen - Lower the AST to LLVM IR for optimization &amp; codegen.
    <span class="weak_txt">(depends on libast)</span></li>
<li>librewrite - Editing of text buffers.
    <span class="weak_txt">(depends on libast)</span></li>
<li>libanalysis - Static analysis support.
    <span class="weak_txt">(depends on libast)</span></li>
<li><b>clang</b> - An example driver, client of the libraries at various levels.
    <span class="weak_txt">(depends on above libraries, and LLVM VMCore)</span></li>
</ul>
<p>As an example of the power of this library based design: if you want to
build a preprocessor, you take the Basic and Lexer libraries. If you want an
indexer, you take those two and add the Parser library and some actions for
indexing. If you want a refactoring, static analysis, or source-to-source
compiler tool, you then add the AST building and semantic analyzer libraries.
In the end, LLVM's library based design will provide developers with many more
possibilities.</p>
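
<p>The split between a parser and client-provided actions can be sketched in a
few lines of C++. Everything here is hypothetical and heavily simplified: the
interface <tt>DeclActions</tt>, the function <tt>parseTranslationUnit</tt> and
the two clients are invented for illustration and are not the real libparse or
libsema APIs. The toy "parser" simply treats each semicolon-terminated chunk as
one top-level declaration and hands it to whatever client it was given.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical callback interface supplied by the client of the parser.
class DeclActions {
public:
  virtual ~DeclActions() = default;
  virtual void onTopLevelDecl(const std::string &declText) = 0;
};

// Toy parser: streams out one top-level declaration at a time and knows
// nothing about what the client does with it.
inline void parseTranslationUnit(const std::string &source,
                                 DeclActions &actions) {
  std::istringstream in(source);
  std::string chunk;
  while (std::getline(in, chunk, ';')) {
    std::size_t first = chunk.find_first_not_of(" \t\n");
    if (first == std::string::npos) continue;  // only whitespace, skip it
    std::size_t last = chunk.find_last_not_of(" \t\n");
    actions.onTopLevelDecl(chunk.substr(first, last - first + 1));
  }
}

// Client 1: keeps every declaration, like a tool that builds a whole unit.
class DeclCollector : public DeclActions {
public:
  void onTopLevelDecl(const std::string &declText) override {
    decls.push_back(declText);
  }
  std::vector<std::string> decls;
};

// Client 2: keeps only a count, like a cheap decl-at-a-time indexer.
class DeclCounter : public DeclActions {
public:
  void onTopLevelDecl(const std::string &) override { ++count; }
  unsigned count = 0;
};
```

<p>The same parsing function serves both clients unchanged, which is the
property that lets one set of libraries back a preprocessor, an indexer or a
full compiler.</p>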

<h2>Better Integration with IDEs</h2>
<p>Another design goal of Clang is to integrate extremely well with IDEs. IDEs
often have very different requirements than code generation, often requiring
information that a codegen-only frontend can throw away. Clang is specifically
designed and built to capture this information.</p>
</div>
</body>
</html>