<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<title>Clang - Performance</title>
<link type="text/css" rel="stylesheet" href="menu.css" />
<link type="text/css" rel="stylesheet" href="content.css" />
<style type="text/css">
</style>
</head>
<body>

<!--#include virtual="menu.html.incl"-->

<div id="content">

<!--*************************************************************************-->
<h1>Clang - Performance</h1>
<!--*************************************************************************-->

<p>This page tracks the compile-time performance of Clang on two
interesting benchmarks:
<ul>
<li><i>Sketch</i>: The Objective-C example application shipped on
Mac OS X as part of Xcode. <i>Sketch</i> is indicative of a
"typical" Objective-C app. The source itself is relatively
small (~7,500 lines of code), but it relies
on the extensive Cocoa APIs to build its functionality. Like many
Objective-C applications, it includes
<tt>Cocoa/Cocoa.h</tt> in all of its source files, which makes it a
significant stress test of the front-end's performance on lexing,
preprocessing, parsing, and syntax analysis.</li>
<li><i>176.gcc</i>: This is the gcc-2.7.2.2 code base as present in
SPECINT 2000. In contrast to <i>Sketch</i>, <i>176.gcc</i> consists of a
large amount of C source code (~220,000 lines) with few system
dependencies. This stresses the back-end's performance on generating
assembly code and debug information.</li>
</ul>
</p>

<!--*************************************************************************-->
<h2><a name="enduser">Experiments</a></h2>
<!--*************************************************************************-->

<p>Measurements are done by serially processing each file in the
respective benchmark, using Clang, gcc, and llvm-gcc as compilers. To
track the performance of various subsystems, the timings have
been broken down into separate stages where possible:

<ul>
<li><tt>-Eonly</tt>: This option runs the preprocessor but does not
produce any output. For gcc and llvm-gcc, the <tt>-MM</tt> option is used
as a rough equivalent to this step.</li>
<li><tt>-parse-noop</tt>: This option runs the parser on the input,
but without semantic analysis or any output. gcc and llvm-gcc have
no equivalent for this option.</li>
<li><tt>-fsyntax-only</tt>: This option runs the parser with semantic
analysis.</li>
<li><tt>-emit-llvm -O0</tt>: For Clang and llvm-gcc, this option
converts to the LLVM intermediate representation but doesn't
generate native code.</li>
<li><tt>-S -O0</tt>: This performs actual code generation to produce a
native assembler file.</li>
<li><tt>-S -O0 -g</tt>: This adds emission of debug information to
the assembly output.</li>
</ul>
</p>
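
<p>The measurement procedure described above can be sketched roughly as
follows. This is only an illustration, not the script used to produce the
results on this page: the function names <tt>time_stage</tt> and
<tt>time_all_stages</tt> are hypothetical, the stage flags are copied from the
list above, and how those flags are passed to a particular compiler binary
(as well as where its output goes) is an assumption left to the caller.</p>

```python
import subprocess
import time

# Stage flags copied from the list above. The compiler command itself is
# supplied by the caller, since how it is invoked is an assumption here.
STAGES = [
    ["-Eonly"],
    ["-parse-noop"],
    ["-fsyntax-only"],
    ["-emit-llvm", "-O0"],
    ["-S", "-O0"],
    ["-S", "-O0", "-g"],
]


def time_stage(compiler, flags, files):
    """Return total wall time in seconds for one stage.

    Each file is processed serially with `compiler + flags + [file]`.
    """
    total = 0.0
    for path in files:
        start = time.perf_counter()
        subprocess.run(
            compiler + flags + [path],
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        total += time.perf_counter() - start
    return total


def time_all_stages(compiler, files, stages=STAGES):
    """Map each stage (flags joined by spaces) to its total wall time."""
    return {" ".join(flags): time_stage(compiler, flags, files) for flags in stages}
```

<p>A call might look like <tt>time_all_stages(["clang"], source_files)</tt>,
where both the command name and <tt>source_files</tt> are placeholders. For gcc
and llvm-gcc the stage list would need to substitute <tt>-MM</tt> for
<tt>-Eonly</tt> and omit <tt>-parse-noop</tt>, as noted above.</p>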

<p>This set of stages is chosen to be approximately additive; that is,
each subsequent stage simply adds some additional processing. The
timings measure the delta of the given stage from the previous
one. For example, the timings for <tt>-fsyntax-only</tt> below show
the difference between running with <tt>-fsyntax-only</tt> and running
with <tt>-parse-noop</tt> (for Clang) or <tt>-MM</tt> (for gcc and
llvm-gcc). This amounts to a fairly accurate measure of only the time
to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).</p>
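
<p>As a small illustration of the delta computation just described, the sketch
below subtracts each stage's total time from that of the previous stage that
was actually run. The function name <tt>stage_deltas</tt> is hypothetical and
the timings are invented for the example; they are not measurements from this
page.</p>

```python
def stage_deltas(cumulative, order):
    """Convert per-stage total times into the incremental cost of each stage.

    Stages missing from `cumulative` are skipped, so the next stage is
    measured against the last stage that was actually run.
    """
    deltas = {}
    previous = 0.0
    for stage in order:
        if stage not in cumulative:
            continue
        deltas[stage] = cumulative[stage] - previous
        previous = cumulative[stage]
    return deltas


# Invented totals in seconds, for illustration only.
clang_totals = {"-Eonly": 1.0, "-parse-noop": 1.5, "-fsyntax-only": 2.5}
gcc_totals = {"-MM": 2.0, "-fsyntax-only": 5.0}

clang_deltas = stage_deltas(clang_totals, ["-Eonly", "-parse-noop", "-fsyntax-only"])
gcc_deltas = stage_deltas(gcc_totals, ["-MM", "-parse-noop", "-fsyntax-only"])
```

<p>Because gcc has no <tt>-parse-noop</tt> stage, its <tt>-fsyntax-only</tt>
delta in this sketch is taken against <tt>-MM</tt> and therefore covers both
parsing and semantic analysis.</p>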

<p>These timings are chosen to break down the compilation process for
Clang as much as possible. The graphs below show these numbers
combined so that it is easy to see how the time for a particular task
is divided among various components. For example, <tt>-S -O0</tt>
includes the time of <tt>-fsyntax-only</tt> and <tt>-emit-llvm -O0</tt>.</p>
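
<p>The way the per-stage numbers combine into the time for a whole task can be
sketched as a running sum. The helper name <tt>stacked_totals</tt> and the
numbers are made up for illustration, and for brevity the first entry stands
in for all front-end work up to and including <tt>-fsyntax-only</tt>.</p>

```python
from itertools import accumulate


def stacked_totals(deltas):
    """Return the running total of per-stage deltas, in insertion order."""
    return dict(zip(deltas, accumulate(deltas.values())))


# Invented per-stage deltas in seconds, for illustration only.
deltas = {"-fsyntax-only": 2.0, "-emit-llvm -O0": 0.5, "-S -O0": 1.0}
totals = stacked_totals(deltas)
```

<p>Here the total for <tt>-S -O0</tt> is the sum of all three entries, which
mirrors how the stacked bars in the graphs are meant to be read.</p>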

<p>Note that we already know that the LLVM optimizers are substantially (30-40%)
faster than the GCC optimizers at a given -O level, so we only focus on -O0
compile time here.</p>

<!--*************************************************************************-->
<h2><a name="enduser">Timing Results</a></h2>
<!--*************************************************************************-->

<!--=======================================================================-->
<h3><a name="2008-10-31">2008-10-31</a></h3>
<!--=======================================================================-->

<center><h4>Sketch</h4></center>
<img class="img_slide"
src="timing-data/2008-10-31/sketch.png" alt="Sketch Timings"/>

<p>This shows Clang's substantial performance improvements in
preprocessing and semantic analysis: it is over 90% faster on
<tt>-fsyntax-only</tt>. As expected, time spent in code generation for this
benchmark is relatively small. One caveat: Clang's debug information
generation for Objective-C is very incomplete, so the <tt>-S
-O0 -g</tt> numbers are not a fair comparison, since Clang is generating
substantially less output.</p>

<p>This chart also shows the effect of using precompiled headers (PCH)
on compile time. gcc and llvm-gcc see a large performance improvement
with PCH: about 4x in wall time. Unfortunately, Clang does not yet
have an implementation of PCH-style optimizations, but we are actively
working to address this.</p>

<center><h4>176.gcc</h4></center>
<img class="img_slide"
src="timing-data/2008-10-31/176.gcc.png" alt="176.gcc Timings"/>

<p>Unlike the <i>Sketch</i> timings, compilation of <i>176.gcc</i>
involves a large amount of code generation. The time spent in Clang's
LLVM IR generation and code generation is on par with gcc's code
generation time, but the improved parsing &amp; semantic analysis
performance means Clang still comes in at ~29% faster than gcc
on <tt>-S -O0 -g</tt> and ~20% faster than llvm-gcc.</p>

<p>These numbers indicate that Clang still has room for improvement in
several areas. Notably, our LLVM IR generation is significantly slower
than that of llvm-gcc, and both Clang and llvm-gcc incur a
significantly higher cost for adding debugging information compared to
gcc.</p>

</div>
</body>
</html>