| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" |
| "http://www.w3.org/TR/html4/strict.dtd"> |
| <html> |
| <head> |
| <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> |
| <title>Clang - Performance</title> |
| <link type="text/css" rel="stylesheet" href="menu.css"> |
| <link type="text/css" rel="stylesheet" href="content.css"> |
| <style type="text/css"> |
| </style> |
| </head> |
| <body> |
| |
| <!--#include virtual="menu.html.incl"--> |
| |
| <div id="content"> |
| |
| <!--*************************************************************************--> |
| <h1>Clang - Performance</h1> |
| <!--*************************************************************************--> |
| |
| <p>This page tracks the compile time performance of Clang on two |
| interesting benchmarks:</p> |
| <ul> |
| <li><i>Sketch</i>: The Objective-C example application shipped on |
| Mac OS X as part of Xcode. <i>Sketch</i> is indicative of a |
| "typical" Objective-C app. The source itself has a relatively |
| small amount of code (~7,500 lines of source code), but it relies |
| on the extensive Cocoa APIs to build its functionality. Like many |
| Objective-C applications, it includes |
| <tt>Cocoa/Cocoa.h</tt> in all of its source files, which represents a |
| significant stress test of the front-end's performance on lexing, |
| preprocessing, parsing, and syntax analysis.</li> |
| <li><i>176.gcc</i>: This is the gcc-2.7.2.2 code base as present in |
| SPECINT 2000. In contrast to Sketch, <i>176.gcc</i> consists of a |
| large amount of C source code (~220,000 lines) with few system |
| dependencies. This stresses the back-end's performance on generating |
| assembly code and debug information.</li> |
| </ul> |
| |
| <!--*************************************************************************--> |
| <h2><a name="enduser">Experiments</a></h2> |
| <!--*************************************************************************--> |
| |
| <p>Measurements are done by serially processing each file in the |
| respective benchmark, using Clang, gcc, and llvm-gcc as compilers. In |
| order to track the performance of various subsystems the timings have |
| been broken down into separate stages where possible:</p> |
| |
| <ul> |
| <li><tt>-Eonly</tt>: This option runs the preprocessor but does not |
| perform any output. For gcc and llvm-gcc, the -MM option is used |
| as a rough equivalent to this step.</li> |
| <li><tt>-parse-noop</tt>: This option runs the parser on the input, |
| but without semantic analysis or any output. gcc and llvm-gcc have |
| no equivalent for this option.</li> |
| <li><tt>-fsyntax-only</tt>: This option runs the parser with semantic |
| analysis.</li> |
| <li><tt>-emit-llvm -O0</tt>: For Clang and llvm-gcc, this option |
| converts to the LLVM intermediate representation but doesn't |
| generate native code.</li> |
| <li><tt>-S -O0</tt>: Perform actual code generation to produce a |
| native assembler file.</li> |
| <li><tt>-S -O0 -g</tt>: This adds emission of debug information to |
| the assembly output.</li> |
| </ul> |
| |
| <p>This set of stages is chosen to be approximately additive, that is |
| each subsequent stage simply adds some additional processing. The |
| timings measure the delta of the given stage from the previous |
| one. For example, the timings for <tt>-fsyntax-only</tt> below show |
| the difference of running with <tt>-fsyntax-only</tt> versus running |
| with <tt>-parse-noop</tt> (for clang) or <tt>-MM</tt> with gcc and |
| llvm-gcc. This amounts to a fairly accurate measure of only the time |
| to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).</p> |
| |
| <p>These timings are chosen to break down the compilation process for |
| clang as much as possible. The graphs below show these numbers |
| combined so that it is easy to see how the time for a particular task |
| is divided among various components. For example, <tt>-S -O0</tt> |
| includes the time of <tt>-fsyntax-only</tt> and <tt>-emit-llvm -O0</tt>.</p> |
| |
| <p>Note that we already know that the LLVM optimizers are substantially (30-40%) |
| faster than the GCC optimizers at a given -O level, so we only focus on -O0 |
| compile time here.</p> |
| |
| <!--*************************************************************************--> |
| <h2><a name="enduser">Timing Results</a></h2> |
| <!--*************************************************************************--> |
| |
| <!--=======================================================================--> |
| <h3><a name="2008-10-31">2008-10-31</a></h3> |
| <!--=======================================================================--> |
| |
| <h4 style="text-align:center">Sketch</h4> |
| <img class="img_slide" |
| src="timing-data/2008-10-31/sketch.png" alt="Sketch Timings"> |
| |
| <p>This shows Clang's substantial performance improvements in |
| preprocessing and semantic analysis; over 90% faster on |
| -fsyntax-only. As expected, time spent in code generation for this |
| benchmark is relatively small. One caveat, Clang's debug information |
| generation for Objective-C is very incomplete; this means the <tt>-S |
| -O0 -g</tt> numbers are unfair since Clang is generating substantially |
| less output.</p> |
| |
| <p>This chart also shows the effect of using precompiled headers (PCH) |
| on compiler time. gcc and llvm-gcc see a large performance improvement |
| with PCH; about 4x in wall time. Unfortunately, Clang does not yet |
| have an implementation of PCH-style optimizations, but we are actively |
| working to address this.</p> |
| |
| <h4 style="text-align:center">176.gcc</h4> |
| <img class="img_slide" |
| src="timing-data/2008-10-31/176.gcc.png" alt="176.gcc Timings"> |
| |
| <p>Unlike the <i>Sketch</i> timings, compilation of <i>176.gcc</i> |
| involves a large amount of code generation. The time spent in Clang's |
| LLVM IR generation and code generation is on par with gcc's code |
| generation time but the improved parsing & semantic analysis |
| performance means Clang still comes in at ~29% faster versus gcc |
| on <tt>-S -O0 -g</tt> and ~20% faster versus llvm-gcc.</p> |
| |
| <p>These numbers indicate that Clang still has room for improvement in |
| several areas, notably our LLVM IR generation is significantly slower |
| than that of llvm-gcc, and both Clang and llvm-gcc incur a |
| significantly higher cost for adding debugging information compared to |
| gcc.</p> |
| |
| </div> |
| </body> |
| </html> |