<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
 <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
 <title>Clang - Performance</title>
 <link type="text/css" rel="stylesheet" href="menu.css">
 <link type="text/css" rel="stylesheet" href="content.css">
 <style type="text/css">
</style>
</head>
<body>

<!--#include virtual="menu.html.incl"-->

<div id="content">

<!--*************************************************************************-->
<h1>Clang - Performance</h1>
<!--*************************************************************************-->

<p>This page tracks the compile-time performance of Clang on two
interesting benchmarks:</p>
<ul>
 <li><i>Sketch</i>: The Objective-C example application shipped on
 Mac OS X as part of Xcode. <i>Sketch</i> is indicative of a
 "typical" Objective-C app. The source itself is relatively
 small (~7,500 lines of source code), but it relies
 on the extensive Cocoa APIs to build its functionality. Like many
 Objective-C applications, it includes
 <tt>Cocoa/Cocoa.h</tt> in all of its source files, which represents a
 significant stress test of the front-end's performance on lexing,
 preprocessing, parsing, and syntax analysis.</li>
 <li><i>176.gcc</i>: This is the gcc-2.7.2.2 code base as present in
 SPECINT 2000. In contrast to <i>Sketch</i>, <i>176.gcc</i> consists of a
 large amount of C source code (~220,000 lines) with few system
 dependencies. This stresses the back-end's performance on generating
 assembly code and debug information.</li>
</ul>

<!--*************************************************************************-->
<h2><a name="enduser">Experiments</a></h2>
<!--*************************************************************************-->

<p>Measurements are done by serially processing each file in the
respective benchmark, using Clang, gcc, and llvm-gcc as compilers. In
order to track the performance of various subsystems, the timings have
been broken down into separate stages where possible:</p>

<ul>
 <li><tt>-Eonly</tt>: This option runs the preprocessor but does not
 produce any output. For gcc and llvm-gcc, the <tt>-MM</tt> option is used
 as a rough equivalent to this step.</li>
 <li><tt>-parse-noop</tt>: This option runs the parser on the input,
 but without semantic analysis or any output. gcc and llvm-gcc have
 no equivalent for this option.</li>
 <li><tt>-fsyntax-only</tt>: This option runs the parser with semantic
 analysis.</li>
 <li><tt>-emit-llvm -O0</tt>: For Clang and llvm-gcc, this option
 converts to the LLVM intermediate representation but doesn't
 generate native code.</li>
 <li><tt>-S -O0</tt>: Perform actual code generation to produce a
 native assembler file.</li>
 <li><tt>-S -O0 -g</tt>: This adds emission of debug information to
 the assembly output.</li>
</ul>
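<p>As a rough illustration of the measurement procedure described above, the
following Python sketch times one stage by running a compiler serially over
every file of a benchmark. It is not the harness used to produce the results
on this page. The helper names (<tt>build_command</tt>, <tt>time_commands</tt>,
<tt>time_stage</tt>) are hypothetical, and the flags are copied from the list
above on the assumption that the compiler under test accepts them.</p>

```python
import subprocess
import sys
import time

# Stage flags copied from the list above; that the compiler under test
# accepts them is an assumption of this sketch.
STAGES = [
    ["-Eonly"],
    ["-parse-noop"],
    ["-fsyntax-only"],
    ["-emit-llvm", "-O0"],
    ["-S", "-O0"],
    ["-S", "-O0", "-g"],
]


def build_command(compiler, stage_flags, source_file):
    """Return the argv list for one compiler invocation on one file."""
    return [compiler, *stage_flags, source_file]


def time_commands(commands):
    """Run each command serially and return the total wall time in seconds."""
    start = time.perf_counter()
    for argv in commands:
        subprocess.run(argv, check=True)
    return time.perf_counter() - start


def time_stage(compiler, stage_flags, source_files):
    """Total wall time for one stage over all files of a benchmark."""
    commands = [build_command(compiler, stage_flags, f) for f in source_files]
    return time_commands(commands)


if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs without a compiler installed.
    elapsed = time_commands([[sys.executable, "-c", "pass"]])
    print(f"{elapsed:.3f} s")
```

<p>For gcc and llvm-gcc, the <tt>-Eonly</tt> entry would be replaced by
<tt>-MM</tt> and the <tt>-parse-noop</tt> entry dropped, as noted in the
list.</p>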

<p>This set of stages is chosen to be approximately additive; that is,
each subsequent stage simply adds some additional processing. The
timings measure the delta of the given stage from the previous
one. For example, the timings for <tt>-fsyntax-only</tt> below show
the difference between running with <tt>-fsyntax-only</tt> and running
with <tt>-parse-noop</tt> (for Clang) or <tt>-MM</tt> (for gcc and
llvm-gcc). This amounts to a fairly accurate measure of only the time
to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).</p>
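<p>The delta computation can be sketched as follows. The function name
<tt>stage_deltas</tt> and the numbers in <tt>example</tt> are hypothetical and
chosen only for illustration; they are not measurements from this page. The
sketch assumes the stages are listed in the order given above, so that each
stage's total time includes all the work of the stage before it.</p>

```python
def stage_deltas(cumulative):
    """Convert cumulative per-stage times into per-stage deltas.

    `cumulative` is a list of (stage_name, seconds) pairs ordered so that
    each stage does everything the previous one does plus more work.
    """
    deltas = []
    previous = 0.0
    for name, seconds in cumulative:
        deltas.append((name, seconds - previous))
        previous = seconds
    return deltas


# Hypothetical numbers for illustration only; not measurements from this page.
example = [
    ("-Eonly", 1.0),
    ("-parse-noop", 1.5),
    ("-fsyntax-only", 2.5),
    ("-emit-llvm -O0", 3.0),
    ("-S -O0", 3.25),
    ("-S -O0 -g", 3.5),
]
for name, delta in stage_deltas(example):
    print(name, delta)
```

<p>For gcc and llvm-gcc, which have no <tt>-parse-noop</tt> stage, that entry
would simply be left out of the list, so the <tt>-fsyntax-only</tt> delta is
taken against the preprocessing stage and covers both parsing and semantic
analysis.</p>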

<p>These timings are chosen to break down the compilation process for
Clang as much as possible. The graphs below show these numbers
combined so that it is easy to see how the time for a particular task
is divided among various components. For example, <tt>-S -O0</tt>
includes the time of <tt>-fsyntax-only</tt> and <tt>-emit-llvm -O0</tt>.</p>

<p>Note that we already know that the LLVM optimizers are substantially (30-40%)
faster than the GCC optimizers at a given <tt>-O</tt> level, so we focus only on
<tt>-O0</tt> compile time here.</p>

<!--*************************************************************************-->
<h2><a name="enduser">Timing Results</a></h2>
<!--*************************************************************************-->

<!--=======================================================================-->
<h3><a name="2008-10-31">2008-10-31</a></h3>
<!--=======================================================================-->

<h4 style="text-align:center">Sketch</h4>
<img class="img_slide"
 src="timing-data/2008-10-31/sketch.png" alt="Sketch Timings">

<p>This shows Clang's substantial performance improvements in
preprocessing and semantic analysis: it is over 90% faster on
<tt>-fsyntax-only</tt>. As expected, time spent in code generation for this
benchmark is relatively small. One caveat: Clang's debug information
generation for Objective-C is very incomplete; this means the <tt>-S
-O0 -g</tt> numbers are unfair, since Clang is generating substantially
less output.</p>

<p>This chart also shows the effect of using precompiled headers (PCH)
on compile time. gcc and llvm-gcc see a large performance improvement
with PCH: about 4x in wall time. Unfortunately, Clang does not yet
have an implementation of PCH-style optimizations, but we are actively
working to address this.</p>

<h4 style="text-align:center">176.gcc</h4>
<img class="img_slide"
 src="timing-data/2008-10-31/176.gcc.png" alt="176.gcc Timings">

<p>Unlike the <i>Sketch</i> timings, compilation of <i>176.gcc</i>
involves a large amount of code generation. The time spent in Clang's
LLVM IR generation and code generation is on par with gcc's code
generation time, but the improved parsing &amp; semantic analysis
performance means Clang still comes in at ~29% faster than gcc
on <tt>-S -O0 -g</tt> and ~20% faster than llvm-gcc.</p>
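<p>This page does not state how its "X% faster" figures are computed. As a
sketch only, the two common conventions are shown below; the function names
and the sample timings are hypothetical and are not taken from the charts.</p>

```python
def time_reduction_percent(baseline_seconds, new_seconds):
    """Percent of the baseline wall time that was saved."""
    return 100.0 * (baseline_seconds - new_seconds) / baseline_seconds


def speedup_percent(baseline_seconds, new_seconds):
    """Percent increase in speed (work per unit time) over the baseline."""
    return 100.0 * (baseline_seconds / new_seconds - 1.0)


# Hypothetical timings: a baseline compiler taking 10 s and another taking 8 s.
print(time_reduction_percent(10.0, 8.0))
print(speedup_percent(10.0, 8.0))
```

<p>The same pair of timings gives different percentages under the two
conventions, so the convention matters when comparing these figures against
other measurements.</p>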

<p>These numbers indicate that Clang still has room for improvement in
several areas. Notably, our LLVM IR generation is significantly slower
than that of llvm-gcc, and both Clang and llvm-gcc incur a
significantly higher cost for adding debugging information compared to
gcc.</p>

</div>
</body>
</html>