Blame - www/performance-2008-10-31.html - fp2-dev/platform/external/clang

blob: 5246ac30b8576cf4d57f8d7b6a80212d774ebc57

Daniel Dunbar	3d1c946	2009-03-04 00:04:28 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3	<html>
				4	<head>
				5	<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
				6	<title>Clang - Performance</title>
				7	<link type="text/css" rel="stylesheet" href="menu.css" />
				8	<link type="text/css" rel="stylesheet" href="content.css" />
				9	<style type="text/css">
				10	</style>
				11	</head>
				12	<body>
				13
				14	<!--#include virtual="menu.html.incl"-->
				15
				16	<div id="content">
				17
				18	<!--*************************************************************************-->
				19	<h1>Clang - Performance</h1>
				20	<!--*************************************************************************-->
				21
				22	<p>This page tracks the compile-time performance of Clang on two
				23	interesting benchmarks:
				24	<ul>
				25	<li><i>Sketch</i>: The Objective-C example application shipped on
				26	Mac OS X as part of Xcode. <i>Sketch</i> is indicative of a
				27	"typical" Objective-C app. The source itself has a relatively
				28	small amount of code (~7,500 lines of source code), but it relies
				29	on the extensive Cocoa APIs to build its functionality. Like many
				30	Objective-C applications, it includes
				31	<tt>Cocoa/Cocoa.h</tt> in all of its source files, which represents a
				32	significant stress test of the front-end's performance on lexing,
				33	preprocessing, parsing, and semantic analysis.</li>
				34	<li><i>176.gcc</i>: This is the gcc-2.7.2.2 code base as present in
				35	SPECINT 2000. In contrast to Sketch, <i>176.gcc</i> consists of a
				36	large amount of C source code (~220,000 lines) with few system
				37	dependencies. This stresses the back-end's performance on generating
				38	assembly code and debug information.</li>
				39	</ul>
				40	</p>
				41
				42	<!--*************************************************************************-->
				43	<h2><a name="enduser">Experiments</a></h2>
				44	<!--*************************************************************************-->
				45
				46	<p>Measurements are done by serially processing each file in the
				47	respective benchmark, using Clang, gcc, and llvm-gcc as compilers. In
				48	order to track the performance of various subsystems, the timings have
				49	been broken down into separate stages where possible:
				50
				51	<ul>
				52	<li><tt>-Eonly</tt>: This option runs the preprocessor but does not
				53	produce any output. For gcc and llvm-gcc, the <tt>-MM</tt> option is used
				54	as a rough equivalent to this step.</li>
				55	<li><tt>-parse-noop</tt>: This option runs the parser on the input,
				56	but without semantic analysis or any output. gcc and llvm-gcc have
				57	no equivalent for this option.</li>
				58	<li><tt>-fsyntax-only</tt>: This option runs the parser with semantic
				59	analysis.</li>
				60	<li><tt>-emit-llvm -O0</tt>: For Clang and llvm-gcc, this option
				61	converts to the LLVM intermediate representation but doesn't
				62	generate native code.</li>
				63	<li><tt>-S -O0</tt>: This option performs actual code generation to produce a
				64	native assembly file.</li>
				65	<li><tt>-S -O0 -g</tt>: This adds emission of debug information to
				66	the assembly output.</li>
				67	</ul>
				68	</p>
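<p>As a rough illustration of the procedure above, the following Python
sketch builds one command line per stage and runs the commands one after
another, recording wall-clock time for each. The flag spellings come from the
list above; the helper names, the per-compiler stage lists, and the choice to
discard compiler output are assumptions made for this example, not the harness
used to produce the numbers on this page.</p>

```python
import subprocess
import time

# Stage options in pipeline order, as listed above.
CLANG_STAGES = ["-Eonly", "-parse-noop", "-fsyntax-only",
                "-emit-llvm -O0", "-S -O0", "-S -O0 -g"]
# llvm-gcc: -MM stands in for -Eonly, and -parse-noop has no equivalent.
LLVM_GCC_STAGES = ["-MM", "-fsyntax-only", "-emit-llvm -O0",
                   "-S -O0", "-S -O0 -g"]
# gcc: same as llvm-gcc, minus the LLVM IR stage.
GCC_STAGES = ["-MM", "-fsyntax-only", "-S -O0", "-S -O0 -g"]


def stage_commands(compiler, stages, source):
    """Return one argv list per stage for a single source file."""
    return [[compiler] + flags.split() + [source] for flags in stages]


def time_serially(commands):
    """Run the commands one at a time; return wall-clock seconds for each.

    Each timing covers a complete run with that stage's option, so it is
    a cumulative figure, not a per-stage delta.
    """
    timings = []
    for argv in commands:
        start = time.perf_counter()
        # Compiler output is discarded here (an assumption of this sketch).
        subprocess.run(argv, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings
```

<p>Here <tt>stage_commands("clang", CLANG_STAGES, "foo.m")</tt> yields six
command lines for a hypothetical source file <tt>foo.m</tt>, which
<tt>time_serially</tt> would then run in order.</p>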
				69
				70	<p>This set of stages is chosen to be approximately additive; that is,
				71	each subsequent stage simply adds some additional processing. The
				72	timings measure the delta of the given stage from the previous
				73	one. For example, the timings for <tt>-fsyntax-only</tt> below show
				74	the difference of running with <tt>-fsyntax-only</tt> versus running
				75	with <tt>-parse-noop</tt> (for Clang) or <tt>-MM</tt> (for gcc and
				76	llvm-gcc). This amounts to a fairly accurate measure of only the time
				77	to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).</p>
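<p>The subtraction described above can be sketched in Python as follows.
The function name and the sample numbers are placeholders invented for
illustration; they are not measurements from this page.</p>

```python
def stage_deltas(cumulative):
    """Turn full-run timings into per-stage deltas.

    cumulative: list of (stage, seconds) pairs in pipeline order, where
    seconds is the time of a complete run with that stage's option.
    """
    deltas = []
    previous = 0.0  # the first stage has nothing before it
    for stage, seconds in cumulative:
        deltas.append((stage, seconds - previous))
        previous = seconds
    return deltas


# Placeholder timings in seconds (made up, not real data).
sample = [("-Eonly", 1.0), ("-parse-noop", 1.5), ("-fsyntax-only", 2.5)]
deltas = stage_deltas(sample)
```

<p>With these placeholder values, the <tt>-fsyntax-only</tt> entry in
<tt>deltas</tt> is the 2.5 second full run minus the 1.5 second
<tt>-parse-noop</tt> run. Summing the deltas up to a given stage recovers that
stage's full-run time.</p>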
				78
				79	<p>These timings are chosen to break down the compilation process for
				80	Clang as much as possible. The graphs below show these numbers
				81	combined so that it is easy to see how the time for a particular task
				82	is divided among various components. For example, <tt>-S -O0</tt>
				83	includes the time of <tt>-fsyntax-only</tt> and <tt>-emit-llvm -O0</tt>.</p>
				84
				85	<p>Note that we already know that the LLVM optimizers are substantially (30-40%)
				86	faster than the GCC optimizers at a given -O level, so we only focus on -O0
				87	compile time here.</p>
				88
				89	<!--*************************************************************************-->
				90	<h2><a name="enduser">Timing Results</a></h2>
				91	<!--*************************************************************************-->
				92
				93	<!--=======================================================================-->
				94	<h3><a name="2008-10-31">2008-10-31</a></h3>
				95	<!--=======================================================================-->
				96
				97	<center><h4>Sketch</h4></center>
				98	<img class="img_slide"
				99	src="timing-data/2008-10-31/sketch.png" alt="Sketch Timings"/>
				100
				101	<p>This shows Clang's substantial performance improvements in
				102	preprocessing and semantic analysis: it is over 90% faster on
				103	<tt>-fsyntax-only</tt>. As expected, time spent in code generation for this
				104	benchmark is relatively small. One caveat: Clang's debug information
				105	generation for Objective-C is very incomplete, which makes the <tt>-S
				106	-O0 -g</tt> numbers unfair because Clang is generating substantially
				107	less output.</p>
				108
				109	<p>This chart also shows the effect of using precompiled headers (PCH)
				110	on compile time. gcc and llvm-gcc see a large performance improvement
				111	with PCH; about 4x in wall time. Unfortunately, Clang does not yet
				112	have an implementation of PCH-style optimizations, but we are actively
				113	working to address this.</p>
				114
				115	<center><h4>176.gcc</h4></center>
				116	<img class="img_slide"
				117	src="timing-data/2008-10-31/176.gcc.png" alt="176.gcc Timings"/>
				118
				119	<p>Unlike the <i>Sketch</i> timings, compilation of <i>176.gcc</i>
				120	involves a large amount of code generation. The time spent in Clang's
				121	LLVM IR generation and code generation is on par with gcc's code
				122	generation time, but the improved parsing and semantic analysis
				123	performance means Clang still comes in at ~29% faster than gcc
				124	on <tt>-S -O0 -g</tt> and ~20% faster than llvm-gcc.</p>
				125
				126	<p>These numbers indicate that Clang still has room for improvement in
				127	several areas. Notably, our LLVM IR generation is significantly slower
				128	than that of llvm-gcc, and both Clang and llvm-gcc incur a
				129	significantly higher cost for adding debugging information than
				130	gcc does.</p>
				131
				132	</div>
				133	</body>
				134	</html>