Blame - docs/tutorial/LangImpl4.rst - fp2-dev/platform/external/llvm

blob: 44e0cc150954cc6d452c6a4adfdc4570b4a89e28 [file] [log] [blame]

Sean Silva	ee47edf	2012-12-05 00:26:32 +0000	[diff] [blame]	1	==============================================
				2	Kaleidoscope: Adding JIT and Optimizer Support
				3	==============================================
				4
				5	.. contents::
				6	:local:
				7
Sean Silva	ee47edf	2012-12-05 00:26:32 +0000	[diff] [blame]	8	Chapter 4 Introduction
				9	======================
				10
				11	Welcome to Chapter 4 of the "`Implementing a language with
				12	LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
				13	of a simple language and added support for generating LLVM IR. This
				14	chapter describes two new techniques: adding optimizer support to your
				15	language, and adding JIT compiler support. These additions will
				16	demonstrate how to get nice, efficient code for the Kaleidoscope
				17	language.
				18
				19	Trivial Constant Folding
				20	========================
				21
				22	Our demonstration for Chapter 3 is elegant and easy to extend.
				23	Unfortunately, it does not produce wonderful code. The IRBuilder,
				24	however, does give us obvious optimizations when compiling simple code:
				25
				26	::
				27
				28	ready> def test(x) 1+2+x;
				29	Read function definition:
				30	define double @test(double %x) {
				31	entry:
				32	%addtmp = fadd double 3.000000e+00, %x
				33	ret double %addtmp
				34	}
				35
				36	This code is not a literal transcription of the AST built by parsing the
				37	input. That would be:
				38
				39	::
				40
				41	ready> def test(x) 1+2+x;
				42	Read function definition:
				43	define double @test(double %x) {
				44	entry:
				45	%addtmp = fadd double 2.000000e+00, 1.000000e+00
				46	%addtmp1 = fadd double %addtmp, %x
				47	ret double %addtmp1
				48	}
				49
				50	Constant folding, as seen above, in particular, is a very common and
				51	very important optimization: so much so that many language implementors
				52	implement constant folding support in their AST representation.
				53
				54	With LLVM, you don't need this support in the AST. Since all calls to
				55	build LLVM IR go through the LLVM IR builder, the builder itself checked
				56	to see if there was a constant folding opportunity when you call it. If
				57	so, it just does the constant fold and return the constant instead of
				58	creating an instruction.
				59
				60	Well, that was easy :). In practice, we recommend always using
				61	``IRBuilder`` when generating code like this. It has no "syntactic
				62	overhead" for its use (you don't have to uglify your compiler with
				63	constant checks everywhere) and it can dramatically reduce the amount of
				64	LLVM IR that is generated in some cases (particular for languages with a
				65	macro preprocessor or that use a lot of constants).
				66
				67	On the other hand, the ``IRBuilder`` is limited by the fact that it does
				68	all of its analysis inline with the code as it is built. If you take a
				69	slightly more complex example:
				70
				71	::
				72
				73	ready> def test(x) (1+2+x)*(x+(1+2));
				74	ready> Read function definition:
				75	define double @test(double %x) {
				76	entry:
				77	%addtmp = fadd double 3.000000e+00, %x
				78	%addtmp1 = fadd double %x, 3.000000e+00
				79	%multmp = fmul double %addtmp, %addtmp1
				80	ret double %multmp
				81	}
				82
				83	In this case, the LHS and RHS of the multiplication are the same value.
				84	We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
				85	instead of computing "``x+3``" twice.
				86
				87	Unfortunately, no amount of local analysis will be able to detect and
				88	correct this. This requires two transformations: reassociation of
				89	expressions (to make the add's lexically identical) and Common
				90	Subexpression Elimination (CSE) to delete the redundant add instruction.
				91	Fortunately, LLVM provides a broad range of optimizations that you can
				92	use, in the form of "passes".
				93
				94	LLVM Optimization Passes
				95	========================
				96
				97	LLVM provides many optimization passes, which do many different sorts of
				98	things and have different tradeoffs. Unlike other systems, LLVM doesn't
				99	hold to the mistaken notion that one set of optimizations is right for
				100	all languages and for all situations. LLVM allows a compiler implementor
				101	to make complete decisions about what optimizations to use, in which
				102	order, and in what situation.
				103
				104	As a concrete example, LLVM supports both "whole module" passes, which
				105	look across as large of body of code as they can (often a whole file,
				106	but if run at link time, this can be a substantial portion of the whole
				107	program). It also supports and includes "per-function" passes which just
				108	operate on a single function at a time, without looking at other
				109	functions. For more information on passes and how they are run, see the
				110	`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the
				111	`List of LLVM Passes <../Passes.html>`_.
				112
				113	For Kaleidoscope, we are currently generating functions on the fly, one
				114	at a time, as the user types them in. We aren't shooting for the
				115	ultimate optimization experience in this setting, but we also want to
				116	catch the easy and quick stuff where possible. As such, we will choose
				117	to run a few per-function optimizations as the user types the function
				118	in. If we wanted to make a "static Kaleidoscope compiler", we would use
				119	exactly the code we have now, except that we would defer running the
				120	optimizer until the entire file has been parsed.
				121
				122	In order to get per-function optimizations going, we need to set up a
				123	`FunctionPassManager <../WritingAnLLVMPass.html#passmanager>`_ to hold
				124	and organize the LLVM optimizations that we want to run. Once we have
				125	that, we can add a set of optimizations to run. The code looks like
				126	this:
				127
				128	.. code-block:: c++
				129
				130	FunctionPassManager OurFPM(TheModule);
				131
				132	// Set up the optimizer pipeline. Start with registering info about how the
				133	// target lays out data structures.
				134	OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
				135	// Provide basic AliasAnalysis support for GVN.
				136	OurFPM.add(createBasicAliasAnalysisPass());
				137	// Do simple "peephole" optimizations and bit-twiddling optzns.
				138	OurFPM.add(createInstructionCombiningPass());
				139	// Reassociate expressions.
				140	OurFPM.add(createReassociatePass());
				141	// Eliminate Common SubExpressions.
				142	OurFPM.add(createGVNPass());
				143	// Simplify the control flow graph (deleting unreachable blocks, etc).
				144	OurFPM.add(createCFGSimplificationPass());
				145
				146	OurFPM.doInitialization();
				147
				148	// Set the global so the code gen can use this.
				149	TheFPM = &OurFPM;
				150
				151	// Run the main "interpreter loop" now.
				152	MainLoop();
				153
				154	This code defines a ``FunctionPassManager``, "``OurFPM``". It requires a
				155	pointer to the ``Module`` to construct itself. Once it is set up, we use
				156	a series of "add" calls to add a bunch of LLVM passes. The first pass is
				157	basically boilerplate, it adds a pass so that later optimizations know
				158	how the data structures in the program are laid out. The
				159	"``TheExecutionEngine``" variable is related to the JIT, which we will
				160	get to in the next section.
				161
				162	In this case, we choose to add 4 optimization passes. The passes we
				163	chose here are a pretty standard set of "cleanup" optimizations that are
				164	useful for a wide variety of code. I won't delve into what they do but,
				165	believe me, they are a good starting place :).
				166
				167	Once the PassManager is set up, we need to make use of it. We do this by
				168	running it after our newly created function is constructed (in
				169	``FunctionAST::Codegen``), but before it is returned to the client:
				170
				171	.. code-block:: c++
				172
				173	if (Value *RetVal = Body->Codegen()) {
				174	// Finish off the function.
				175	Builder.CreateRet(RetVal);
				176
				177	// Validate the generated code, checking for consistency.
				178	verifyFunction(*TheFunction);
				179
				180	// Optimize the function.
				181	TheFPM->run(*TheFunction);
				182
				183	return TheFunction;
				184	}
				185
				186	As you can see, this is pretty straightforward. The
				187	``FunctionPassManager`` optimizes and updates the LLVM Function\* in
				188	place, improving (hopefully) its body. With this in place, we can try
				189	our test above again:
				190
				191	::
				192
				193	ready> def test(x) (1+2+x)*(x+(1+2));
				194	ready> Read function definition:
				195	define double @test(double %x) {
				196	entry:
				197	%addtmp = fadd double %x, 3.000000e+00
				198	%multmp = fmul double %addtmp, %addtmp
				199	ret double %multmp
				200	}
				201
				202	As expected, we now get our nicely optimized code, saving a floating
				203	point add instruction from every execution of this function.
				204
				205	LLVM provides a wide variety of optimizations that can be used in
				206	certain circumstances. Some `documentation about the various
				207	passes <../Passes.html>`_ is available, but it isn't very complete.
				208	Another good source of ideas can come from looking at the passes that
				209	``Clang`` runs to get started. The "``opt``" tool allows you to
				210	experiment with passes from the command line, so you can see if they do
				211	anything.
				212
				213	Now that we have reasonable code coming out of our front-end, lets talk
				214	about executing it!
				215
				216	Adding a JIT Compiler
				217	=====================
				218
				219	Code that is available in LLVM IR can have a wide variety of tools
				220	applied to it. For example, you can run optimizations on it (as we did
				221	above), you can dump it out in textual or binary forms, you can compile
				222	the code to an assembly file (.s) for some target, or you can JIT
				223	compile it. The nice thing about the LLVM IR representation is that it
				224	is the "common currency" between many different parts of the compiler.
				225
				226	In this section, we'll add JIT compiler support to our interpreter. The
				227	basic idea that we want for Kaleidoscope is to have the user enter
				228	function bodies as they do now, but immediately evaluate the top-level
				229	expressions they type in. For example, if they type in "1 + 2;", we
				230	should evaluate and print out 3. If they define a function, they should
				231	be able to call it from the command line.
				232
				233	In order to do this, we first declare and initialize the JIT. This is
				234	done by adding a global variable and a call in ``main``:
				235
				236	.. code-block:: c++
				237
				238	static ExecutionEngine *TheExecutionEngine;
				239	...
				240	int main() {
				241	..
				242	// Create the JIT. This takes ownership of the module.
				243	TheExecutionEngine = EngineBuilder(TheModule).create();
				244	..
				245	}
				246
				247	This creates an abstract "Execution Engine" which can be either a JIT
				248	compiler or the LLVM interpreter. LLVM will automatically pick a JIT
				249	compiler for you if one is available for your platform, otherwise it
				250	will fall back to the interpreter.
				251
				252	Once the ``ExecutionEngine`` is created, the JIT is ready to be used.
				253	There are a variety of APIs that are useful, but the simplest one is the
				254	"``getPointerToFunction(F)``" method. This method JIT compiles the
				255	specified LLVM Function and returns a function pointer to the generated
				256	machine code. In our case, this means that we can change the code that
				257	parses a top-level expression to look like this:
				258
				259	.. code-block:: c++
				260
				261	static void HandleTopLevelExpression() {
				262	// Evaluate a top-level expression into an anonymous function.
				263	if (FunctionAST *F = ParseTopLevelExpr()) {
				264	if (Function *LF = F->Codegen()) {
				265	LF->dump(); // Dump the function for exposition purposes.
				266
				267	// JIT the function, returning a function pointer.
				268	void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
				269
				270	// Cast it to the right type (takes no arguments, returns a double) so we
				271	// can call it as a native function.
				272	double (FP)() = (double ()())(intptr_t)FPtr;
				273	fprintf(stderr, "Evaluated to %f\n", FP());
				274	}
				275
				276	Recall that we compile top-level expressions into a self-contained LLVM
				277	function that takes no arguments and returns the computed double.
				278	Because the LLVM JIT compiler matches the native platform ABI, this
				279	means that you can just cast the result pointer to a function pointer of
				280	that type and call it directly. This means, there is no difference
				281	between JIT compiled code and native machine code that is statically
				282	linked into your application.
				283
				284	With just these two changes, lets see how Kaleidoscope works now!
				285
				286	::
				287
				288	ready> 4+5;
				289	Read top-level expression:
				290	define double @0() {
				291	entry:
				292	ret double 9.000000e+00
				293	}
				294
				295	Evaluated to 9.000000
				296
				297	Well this looks like it is basically working. The dump of the function
				298	shows the "no argument function that always returns double" that we
				299	synthesize for each top-level expression that is typed in. This
				300	demonstrates very basic functionality, but can we do more?
				301
				302	::
				303
				304	ready> def testfunc(x y) x + y*2;
				305	Read function definition:
				306	define double @testfunc(double %x, double %y) {
				307	entry:
				308	%multmp = fmul double %y, 2.000000e+00
				309	%addtmp = fadd double %multmp, %x
				310	ret double %addtmp
				311	}
				312
				313	ready> testfunc(4, 10);
				314	Read top-level expression:
				315	define double @1() {
				316	entry:
				317	%calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
				318	ret double %calltmp
				319	}
				320
				321	Evaluated to 24.000000
				322
				323	This illustrates that we can now call user code, but there is something
				324	a bit subtle going on here. Note that we only invoke the JIT on the
				325	anonymous functions that call testfunc, but we never invoked it on
				326	testfunc itself. What actually happened here is that the JIT scanned
				327	for all non-JIT'd functions transitively called from the anonymous
				328	function and compiled all of them before returning from
				329	``getPointerToFunction()``.
				330
				331	The JIT provides a number of other more advanced interfaces for things
				332	like freeing allocated machine code, rejit'ing functions to update them,
				333	etc. However, even with this simple code, we get some surprisingly
				334	powerful capabilities - check this out (I removed the dump of the
				335	anonymous functions, you should get the idea by now :) :
				336
				337	::
				338
				339	ready> extern sin(x);
				340	Read extern:
				341	declare double @sin(double)
				342
				343	ready> extern cos(x);
				344	Read extern:
				345	declare double @cos(double)
				346
				347	ready> sin(1.0);
				348	Read top-level expression:
				349	define double @2() {
				350	entry:
				351	ret double 0x3FEAED548F090CEE
				352	}
				353
				354	Evaluated to 0.841471
				355
				356	ready> def foo(x) sin(x)sin(x) + cos(x)cos(x);
				357	Read function definition:
				358	define double @foo(double %x) {
				359	entry:
				360	%calltmp = call double @sin(double %x)
				361	%multmp = fmul double %calltmp, %calltmp
				362	%calltmp2 = call double @cos(double %x)
				363	%multmp4 = fmul double %calltmp2, %calltmp2
				364	%addtmp = fadd double %multmp, %multmp4
				365	ret double %addtmp
				366	}
				367
				368	ready> foo(4.0);
				369	Read top-level expression:
				370	define double @3() {
				371	entry:
				372	%calltmp = call double @foo(double 4.000000e+00)
				373	ret double %calltmp
				374	}
				375
				376	Evaluated to 1.000000
				377
				378	Whoa, how does the JIT know about sin and cos? The answer is
				379	surprisingly simple: in this example, the JIT started execution of a
				380	function and got to a function call. It realized that the function was
				381	not yet JIT compiled and invoked the standard set of routines to resolve
				382	the function. In this case, there is no body defined for the function,
				383	so the JIT ended up calling "``dlsym("sin")``" on the Kaleidoscope
				384	process itself. Since "``sin``" is defined within the JIT's address
				385	space, it simply patches up calls in the module to call the libm version
				386	of ``sin`` directly.
				387
				388	The LLVM JIT provides a number of interfaces (look in the
				389	``ExecutionEngine.h`` file) for controlling how unknown functions get
				390	resolved. It allows you to establish explicit mappings between IR
				391	objects and addresses (useful for LLVM global variables that you want to
				392	map to static tables, for example), allows you to dynamically decide on
				393	the fly based on the function name, and even allows you to have the JIT
				394	compile functions lazily the first time they're called.
				395
				396	One interesting application of this is that we can now extend the
				397	language by writing arbitrary C++ code to implement operations. For
				398	example, if we add:
				399
				400	.. code-block:: c++
				401
				402	/// putchard - putchar that takes a double and returns 0.
				403	extern "C"
				404	double putchard(double X) {
				405	putchar((char)X);
				406	return 0;
				407	}
				408
				409	Now we can produce simple output to the console by using things like:
				410	"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
				411	on the console (120 is the ASCII code for 'x'). Similar code could be
				412	used to implement file I/O, console input, and many other capabilities
				413	in Kaleidoscope.
				414
				415	This completes the JIT and optimizer chapter of the Kaleidoscope
				416	tutorial. At this point, we can compile a non-Turing-complete
				417	programming language, optimize and JIT compile it in a user-driven way.
				418	Next up we'll look into `extending the language with control flow
				419	constructs <LangImpl5.html>`_, tackling some interesting LLVM IR issues
				420	along the way.
				421
				422	Full Code Listing
				423	=================
				424
				425	Here is the complete code listing for our running example, enhanced with
				426	the LLVM JIT and optimizer. To build this example, use:
				427
				428	.. code-block:: bash
				429
				430	# Compile
				431	clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
				432	# Run
				433	./toy
				434
				435	If you are compiling this on Linux, make sure to add the "-rdynamic"
				436	option as well. This makes sure that the external functions are resolved
				437	properly at runtime.
				438
				439	Here is the code:
				440
Logan Chien	2dce4f2	2013-06-08 09:03:03 +0000	[diff] [blame^]	441	.. literalinclude:: ../../examples/Kaleidoscope/Chapter4/toy.cpp
				442	:language: c++
Sean Silva	ee47edf	2012-12-05 00:26:32 +0000	[diff] [blame]	443
				444	`Next: Extending the language: control flow <LangImpl5.html>`_
				445