================================
Frequently Asked Questions (FAQ)
================================

.. contents::
   :local:


License
=======

Does the University of Illinois Open Source License really qualify as an "open source" license?
-----------------------------------------------------------------------------------------------
Yes, the license is `certified
<http://www.opensource.org/licenses/UoI-NCSA.php>`_ by the Open Source
Initiative (OSI).


Can I modify LLVM source code and redistribute the modified source?
-------------------------------------------------------------------
Yes. The modified source distribution must retain the copyright notice and
follow the three bulleted conditions listed in the `LLVM license
<http://llvm.org/svn/llvm-project/llvm/trunk/LICENSE.TXT>`_.


Can I modify the LLVM source code and redistribute binaries or other tools based on it, without redistributing the source?
--------------------------------------------------------------------------------------------------------------------------
Yes. This is why we distribute LLVM under a less restrictive license than the
GPL, as explained in the first question above.


Source Code
===========

In what language is LLVM written?
---------------------------------
All of the LLVM tools and libraries are written in C++ with extensive use of
the STL.


How portable is the LLVM source code?
-------------------------------------
The LLVM source code should be portable to most modern Unix-like operating
systems. Most of the code is written in standard C++ with operating system
services abstracted to a support library. The tools required to build and
test LLVM have been ported to a plethora of platforms.

What API do I use to store a value to one of the virtual registers in LLVM IR's SSA representation?
---------------------------------------------------------------------------------------------------

In short: you can't. It's actually kind of a silly question once you grok
what's going on. Basically, in code like:

.. code-block:: llvm

   %result = add i32 %foo, %bar

, ``%result`` is just a name given to the ``Value`` of the ``add``
instruction. In other words, ``%result`` is the add instruction. The
"assignment" doesn't explicitly "store" anything to any "virtual register";
the "``=``" is more like the mathematical sense of equality.

Longer explanation: In order to generate a textual representation of the
IR, some kind of name has to be given to each instruction so that other
instructions can textually reference it. However, the isomorphic in-memory
representation that you manipulate from C++ has no such restriction since
instructions can simply keep pointers to any other ``Value``\s that they
reference. In fact, the names of dummy numbered temporaries like ``%1`` are
not explicitly represented in the in-memory representation at all (see
``Value::getName()``).


Source Languages
================

What source languages are supported?
------------------------------------

LLVM currently has full support for C and C++ source languages through
`Clang <http://clang.llvm.org/>`_. Many other language frontends have
been written using LLVM, and an incomplete list is available at
`projects with LLVM <http://llvm.org/ProjectsWithLLVM/>`_.


I'd like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators?
----------------------------------------------------------------------------------------------------------------------------------------
Your compiler front-end will communicate with LLVM by creating a module in the
LLVM intermediate representation (IR) format. Assuming you want to write your
language's compiler in the language itself (rather than C++), there are three
major ways to tackle generating LLVM IR from a front-end:

1. **Call into the LLVM libraries using your language's FFI (foreign
   function interface).**

   * for: best tracks changes to the LLVM IR, .ll syntax, and .bc format

   * for: enables running LLVM optimization passes without an emit/parse
     overhead

   * for: adapts well to a JIT context

   * against: lots of ugly glue code to write

2. **Emit LLVM assembly from your compiler's native language.**

   * for: very straightforward to get started

   * against: the .ll parser is slower than the bitcode reader when
     interfacing to the middle end

   * against: it may be harder to track changes to the IR

3. **Emit LLVM bitcode from your compiler's native language.**

   * for: can use the more-efficient bitcode reader when interfacing to the
     middle end

   * against: you'll have to re-engineer the LLVM IR object model and bitcode
     writer in your language

   * against: it may be harder to track changes to the IR

If you go with the first option, the C bindings in ``include/llvm-c`` should
help a lot, since most languages have strong support for interfacing with C.
The most common hurdle with calling C from managed code is interfacing with
the garbage collector. The C interface was designed to require very little
memory management, and so is straightforward in this regard.
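
As a rough sketch of the first option (written in C for brevity; a real
front-end would reach the same entry points through its own FFI), the
hypothetical function below builds a module containing a single ``add``
function and returns its textual IR. The function name ``build_add_module``
and the module name are made up for illustration. The calls come from
``llvm-c/Core.h``, but the C API changes between releases, so check the header
of the LLVM version you build against.

.. code-block:: c

   #include <llvm-c/Core.h>

   /* Build "i32 add(i32, i32)" in a fresh module and return the module's
      textual IR. The caller frees the string with LLVMDisposeMessage(). */
   char *build_add_module(void) {
     LLVMModuleRef mod = LLVMModuleCreateWithName("sketch");

     LLVMTypeRef i32 = LLVMInt32Type();
     LLVMTypeRef params[] = { i32, i32 };
     LLVMTypeRef fn_type = LLVMFunctionType(i32, params, 2, /*IsVarArg=*/0);
     LLVMValueRef fn = LLVMAddFunction(mod, "add", fn_type);

     LLVMBasicBlockRef entry = LLVMAppendBasicBlock(fn, "entry");
     LLVMBuilderRef builder = LLVMCreateBuilder();
     LLVMPositionBuilderAtEnd(builder, entry);

     /* The returned LLVMValueRef is the add instruction itself; "result" is
        only the name used when the IR is printed. */
     LLVMValueRef sum = LLVMBuildAdd(builder, LLVMGetParam(fn, 0),
                                     LLVMGetParam(fn, 1), "result");
     LLVMBuildRet(builder, sum);

     char *ir = LLVMPrintModuleToString(mod);
     LLVMDisposeBuilder(builder);
     LLVMDisposeModule(mod);
     return ir;
   }

The only manual cleanup here is disposing of the builder, the module, and the
returned string, which is the kind of limited memory management the paragraph
above refers to.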

What support is there for higher-level source language constructs for building a compiler?
--------------------------------------------------------------------------------------------
Currently, there isn't much. LLVM supports an intermediate representation
which is useful for code representation but will not support the high-level
(abstract syntax tree) representation needed by most compilers. There are no
facilities for lexical or semantic analysis.


I don't understand the ``GetElementPtr`` instruction. Help!
-----------------------------------------------------------
See `The Often Misunderstood GEP Instruction <GetElementPtr.html>`_.


Using the C and C++ Front Ends
==============================

Can I compile C or C++ code to platform-independent LLVM bitcode?
-----------------------------------------------------------------
No. C and C++ are inherently platform-dependent languages. The most obvious
example of this is the preprocessor. A very common way that C code is made
portable is by using the preprocessor to include platform-specific code. In
practice, information about other platforms is lost after preprocessing, so
the result is inherently dependent on the platform that the preprocessing was
targeting.

Another example is ``sizeof``. It's common for ``sizeof(long)`` to vary
between platforms. In most C front-ends, ``sizeof`` is expanded to a
constant immediately, thus hard-wiring a platform-specific detail.

Also, since many platforms define their ABIs in terms of C, and since LLVM is
lower-level than C, front-ends currently must emit platform-specific IR in
order to have the result conform to the platform ABI.
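
To make the first two points concrete, here is a small sketch (the helper
names are hypothetical, not taken from any real code base). Both functions are
resolved for a single target before any LLVM IR is emitted: only one
preprocessor branch survives, and ``sizeof(long)`` becomes a plain constant.
``_WIN32`` is the macro that compilers targeting Windows predefine.

.. code-block:: c

   #include <stddef.h>

   /* Only one branch survives preprocessing, so the IR never sees the other. */
   const char *path_separator(void) {
   #ifdef _WIN32
     return "\\";
   #else
     return "/";
   #endif
   }

   /* The front-end folds this to a constant: typically 8 on LP64 Unix-like
      targets and 4 on 64-bit Windows (LLP64) or 32-bit targets. */
   size_t long_size(void) {
     return sizeof(long);
   }

Bitcode produced from this file for one target therefore already contains that
target's separator string and that target's size of ``long``.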


Questions about code generated by the demo page
===============================================

What is this ``llvm.global_ctors`` and ``_GLOBAL__I_a...`` stuff that happens when I ``#include <iostream>``?
-------------------------------------------------------------------------------------------------------------
If you ``#include`` the ``<iostream>`` header into a C++ translation unit,
the file will probably use the ``std::cin``/``std::cout``/... global objects.
However, C++ does not guarantee an order of initialization between static
objects in different translation units, so if a static ctor/dtor in your .cpp
file used ``std::cout``, for example, the object would not necessarily be
automatically initialized before your use.

To make ``std::cout`` and friends work correctly in these scenarios, the STL
that we use declares a static object that gets created in every translation
unit that includes ``<iostream>``. This object has a static constructor
and destructor that initialize and destroy the global iostream objects
before they could possibly be used in the file. The code that you see in the
``.ll`` file corresponds to the constructor and destructor registration code.

If you would like to make it easier to understand the LLVM code generated
by the compiler in the demo page, consider using ``printf()`` instead of
``iostream``\s to print values.
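
The constructor registration described above is not specific to
``<iostream>``. As a rough C analogue (an illustration only; it relies on the
GNU ``constructor`` attribute accepted by GCC and Clang, which is an extension
rather than standard C), the hypothetical function below runs before
``main``. When compiled with Clang, a function marked this way is likewise
listed in ``llvm.global_ctors``.

.. code-block:: c

   /* Hypothetical stand-in for a global stream object. */
   static int stream_ready = 0;

   /* GNU extension (GCC and Clang): run this function before main(). */
   __attribute__((constructor))
   static void init_stream(void) {
     stream_ready = 1;
   }

   /* Returns 1 once the constructor has run. */
   int stream_is_ready(void) {
     return stream_ready;
   }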


Where did all of my code go??
-----------------------------
If you are using the LLVM demo page, you may often wonder what happened to
all of the code that you typed in. Remember that the demo script is running
the code through the LLVM optimizers, so if your code doesn't actually do
anything useful, it might all be deleted.

To prevent this, make sure that the code is actually needed. For example, if
you are computing some expression, return the value from the function instead
of leaving it in a local variable. If you really want to constrain the
optimizer, you can read from and assign to ``volatile`` global variables.
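
A minimal sketch of the difference (all names are hypothetical): the first
function computes a value nobody can observe, so the optimizer is free to
delete the arithmetic, while the other two keep it alive by returning it or by
storing it to a ``volatile`` global.

.. code-block:: c

   /* Volatile global: the optimizer must keep every read and write. */
   volatile int sink;

   /* The result is discarded, so the optimizer may delete the arithmetic. */
   void wasted(int a, int b) {
     int product = a * b;
     (void)product;
   }

   /* Returning the value makes the computation observable to callers. */
   int returned(int a, int b) {
     return a * b;
   }

   /* Storing to a volatile global also keeps the computation alive. */
   void stored(int a, int b) {
     sink = a * b;
   }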


What is this "``undef``" thing that shows up in my code?
--------------------------------------------------------
``undef`` is the LLVM way of representing a value that is not defined. You
can get these if you do not initialize a variable before you use it. For
example, the C function:

.. code-block:: c

   int X() { int i; return i; }

is compiled to "``ret i32 undef``" because "``i``" never has a value specified
for it.


Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it?
----------------------------------------------------------------------------------------------------------------------------------------------------------
This is a common problem run into by authors of front-ends that are using
custom calling conventions: you need to make sure to set the right calling
convention on both the function and on each call to the function. For
example, this code:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define void @bar() {
     call void @foo()
     ret void
   }

is optimized to:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define void @bar() {
     unreachable
   }

... with "``opt -instcombine -simplifycfg``". This often bites people because
"all their code disappears". Setting the calling convention on the caller and
callee is required for indirect calls to work, so people often ask why the
verifier doesn't reject this sort of thing.

The answer is that this code has undefined behavior, but it is not illegal.
If we made it illegal, then every transformation that could potentially create
this would have to ensure that it doesn't, and there is valid code that can
create this sort of construct (in dead code). The sorts of things that can
cause this to happen are fairly contrived, but we still need to accept them.
Here's an example:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define internal void @bar(void()* %FP, i1 %cond) {
     br i1 %cond, label %T, label %F
   T:
     call void %FP()
     ret void
   F:
     call fastcc void %FP()
     ret void
   }
   define void @test() {
     %X = or i1 false, false
     call void @bar(void()* @foo, i1 %X)
     ret void
   }

In this example, "test" always passes ``@foo``/``false`` into ``bar``, which
ensures that it is dynamically called with the right calling convention (thus,
the code is perfectly well defined). If you run this through the inliner, you
get this (the explicit "or" is there so that the inliner doesn't dead code
eliminate a bunch of stuff):

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define void @test() {
     %X = or i1 false, false
     br i1 %X, label %T.i, label %F.i
   T.i:
     call void @foo()
     br label %bar.exit
   F.i:
     call fastcc void @foo()
     br label %bar.exit
   bar.exit:
     ret void
   }

Here you can see that the inlining pass made an undefined call to ``@foo``
with the wrong calling convention. We really don't want to make the inliner
have to know about this sort of thing, so it needs to be valid code. In this
case, dead code elimination can trivially remove the undefined code. However,
if ``%X`` were an input argument to ``@test``, the inliner would produce this:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }

   define void @test(i1 %X) {
     br i1 %X, label %T.i, label %F.i
   T.i:
     call void @foo()
     br label %bar.exit
   F.i:
     call fastcc void @foo()
     br label %bar.exit
   bar.exit:
     ret void
   }

The interesting thing about this is that ``%X`` must be false for the
code to be well-defined, but no amount of dead code elimination will be able
to delete the broken call as unreachable. However, since
``instcombine``/``simplifycfg`` turns the undefined call into unreachable, we
end up with a branch on a condition that goes to unreachable: a branch to
unreachable can never happen, so "``-inline -instcombine -simplifycfg``" is
able to produce:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define void @test(i1 %X) {
   F.i:
     call fastcc void @foo()
     ret void
   }