================================
Frequently Asked Questions (FAQ)
================================

.. contents::
   :local:


License
=======

Does the University of Illinois Open Source License really qualify as an "open source" license?
-----------------------------------------------------------------------------------------------
Yes, the license is `certified
<http://www.opensource.org/licenses/UoI-NCSA.php>`_ by the Open Source
Initiative (OSI).


Can I modify LLVM source code and redistribute the modified source?
-------------------------------------------------------------------
Yes. The modified source distribution must retain the copyright notice and
follow the three bulleted conditions listed in the `LLVM license
<http://llvm.org/svn/llvm-project/llvm/trunk/LICENSE.TXT>`_.


Can I modify the LLVM source code and redistribute binaries or other tools based on it, without redistributing the source?
--------------------------------------------------------------------------------------------------------------------------
Yes. This is why we distribute LLVM under a less restrictive license than the
GPL.


Source Code
===========

In what language is LLVM written?
---------------------------------
All of the LLVM tools and libraries are written in C++ with extensive use of
the STL.


How portable is the LLVM source code?
-------------------------------------
The LLVM source code should be portable to most modern Unix-like operating
systems. Most of the code is written in standard C++ with operating system
services abstracted to a support library. The tools required to build and
test LLVM have been ported to a plethora of platforms.

What API do I use to store a value to one of the virtual registers in LLVM IR's SSA representation?
---------------------------------------------------------------------------------------------------

In short: you can't. It's actually kind of a silly question once you grok
what's going on. Basically, consider code like:

.. code-block:: llvm

   %result = add i32 %foo, %bar

Here, ``%result`` is just a name given to the ``Value`` of the ``add``
instruction. In other words, ``%result`` *is* the add instruction. The
"assignment" doesn't explicitly "store" anything to any "virtual register";
the "``=``" is more like the mathematical sense of equality.

Longer explanation: In order to generate a textual representation of the
IR, some kind of name has to be given to each instruction so that other
instructions can textually reference it. However, the isomorphic in-memory
representation that you manipulate from C++ has no such restriction since
instructions can simply keep pointers to any other ``Value``\s that they
reference. In fact, the names of dummy numbered temporaries like ``%1`` are
not explicitly represented in the in-memory representation at all (see
``Value::getName()``).
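
The pointer-based representation described above can be illustrated with a
toy model. This is only a sketch: ``toy_value``, ``toy_kind`` and ``toy_eval``
are hypothetical names invented for this example and are not LLVM's classes
or API. It shows that an "add" node can refer to its operands directly by
pointer, so a textual name is optional.

.. code-block:: c

   #include <stddef.h>

   /* Toy model only; these types are hypothetical, not LLVM's Value class. */
   enum toy_kind { TOY_CONST, TOY_ADD };

   struct toy_value {
       enum toy_kind kind;
       int constant;                 /* used when kind == TOY_CONST */
       const struct toy_value *lhs;  /* operands, used when kind == TOY_ADD */
       const struct toy_value *rhs;
       const char *name;             /* optional label; NULL when unnamed */
   };

   /* Evaluates a node by following operand pointers; names are never read. */
   int toy_eval(const struct toy_value *v) {
       if (v->kind == TOY_CONST)
           return v->constant;
       return toy_eval(v->lhs) + toy_eval(v->rhs);
   }

In this model an add node is simply "the value of adding its two operands";
nothing is ever stored into it, and leaving ``name`` as ``NULL`` changes
nothing about how other nodes refer to it.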


Source Languages
================

What source languages are supported?
------------------------------------

LLVM currently has full support for C and C++ source languages through
`Clang <http://clang.llvm.org/>`_. Many other language frontends have
been written using LLVM, and an incomplete list is available at
`projects with LLVM <http://llvm.org/ProjectsWithLLVM/>`_.


I'd like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators?
----------------------------------------------------------------------------------------------------------------------------------------
Your compiler front-end will communicate with LLVM by creating a module in the
LLVM intermediate representation (IR) format. Assuming you want to write your
language's compiler in the language itself (rather than C++), there are 3
major ways to tackle generating LLVM IR from a front-end:

1. **Call into the LLVM libraries using your language's FFI (foreign
   function interface).**

   * *for:* best tracks changes to the LLVM IR, .ll syntax, and .bc format

   * *for:* enables running LLVM optimization passes without an emit/parse
     overhead

   * *for:* adapts well to a JIT context

   * *against:* lots of ugly glue code to write

2. **Emit LLVM assembly from your compiler's native language.**

   * *for:* very straightforward to get started

   * *against:* the .ll parser is slower than the bitcode reader when
     interfacing to the middle end

   * *against:* it may be harder to track changes to the IR

3. **Emit LLVM bitcode from your compiler's native language.**

   * *for:* can use the more-efficient bitcode reader when interfacing to the
     middle end

   * *against:* you'll have to re-engineer the LLVM IR object model and bitcode
     writer in your language

   * *against:* it may be harder to track changes to the IR

If you go with the first option, the C bindings in ``include/llvm-c`` should
help a lot, since most languages have strong support for interfacing with C.
The most common hurdle with calling C from managed code is interfacing with the
garbage collector. The C interface was designed to require very little memory
management, and so is straightforward in this regard.
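
As a rough illustration of the second option (emitting LLVM assembly as
text), the sketch below formats a tiny function into a buffer using only the
C standard library. The helper name ``emit_add_function`` is hypothetical and
exists only for this example; a real front-end would generate such text from
its own AST rather than from a fixed template.

.. code-block:: c

   #include <stdio.h>

   /* Hypothetical helper: writes the textual IR for a function that adds two
      i32 arguments into buf. Returns snprintf's result, i.e. the number of
      characters the full text requires (excluding the terminating NUL). */
   int emit_add_function(char *buf, size_t size, const char *name) {
       return snprintf(buf, size,
                       "define i32 @%s(i32 %%a, i32 %%b) {\n"
                       "entry:\n"
                       "  %%sum = add i32 %%a, %%b\n"
                       "  ret i32 %%sum\n"
                       "}\n",
                       name);
   }

The resulting text can be written to a ``.ll`` file and handed to the LLVM
tools. This is easy to start with, but as noted in the list above, it pays
the cost of the ``.ll`` parser and has to be kept in step with the IR syntax
by hand.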

What support is there for higher-level source language constructs for building a compiler?
--------------------------------------------------------------------------------------------
Currently, there isn't much. LLVM supports an intermediate representation
which is useful for code representation but will not support the high-level
(abstract syntax tree) representation needed by most compilers. There are no
facilities for lexical or semantic analysis.


I don't understand the ``GetElementPtr`` instruction. Help!
-----------------------------------------------------------
See `The Often Misunderstood GEP Instruction <GetElementPtr.html>`_.


Using the C and C++ Front Ends
==============================

Can I compile C or C++ code to platform-independent LLVM bitcode?
-----------------------------------------------------------------
No. C and C++ are inherently platform-dependent languages. The most obvious
example of this is the preprocessor. A very common way that C code is made
portable is by using the preprocessor to include platform-specific code. In
practice, information about other platforms is lost after preprocessing, so
the result is inherently dependent on the platform that the preprocessing was
targeting.

Another example is ``sizeof``. It's common for ``sizeof(long)`` to vary
between platforms. In most C front-ends, ``sizeof`` is expanded to a
constant immediately, thus hard-wiring a platform-specific detail.

Also, since many platforms define their ABIs in terms of C, and since LLVM is
lower-level than C, front-ends currently must emit platform-specific IR in
order to have the result conform to the platform ABI.
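
The first two points can be seen in a small sketch. The function names
``platform_name`` and ``long_size`` are hypothetical, and the macros tested
are the commonly predefined ones for each platform; the exact set depends on
the compiler.

.. code-block:: c

   #include <stddef.h>

   /* Only one branch survives preprocessing; the others are discarded before
      the front-end generates any IR. */
   const char *platform_name(void) {
   #if defined(_WIN32)
       return "windows";
   #elif defined(__APPLE__)
       return "apple";
   #elif defined(__linux__)
       return "linux";
   #else
       return "other";
   #endif
   }

   /* sizeof(long) is folded to a constant at compile time; the value depends
      on the target (commonly 4 or 8). */
   size_t long_size(void) {
       return sizeof(long);
   }

Whichever target this is compiled for, the resulting bitcode contains just
one string constant and one integer constant, so it cannot be retargeted to
a platform that would have chosen differently.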


Questions about code generated by the demo page
===============================================

What is this ``llvm.global_ctors`` and ``_GLOBAL__I_a...`` stuff that happens when I ``#include <iostream>``?
-------------------------------------------------------------------------------------------------------------
If you ``#include`` the ``<iostream>`` header into a C++ translation unit,
the file will probably use the ``std::cin``/``std::cout``/... global objects.
However, C++ does not guarantee an order of initialization between static
objects in different translation units, so if a static ctor/dtor in your .cpp
file used ``std::cout``, for example, the object would not necessarily be
automatically initialized before your use.

To make ``std::cout`` and friends work correctly in these scenarios, the C++
standard library that we use declares a static object that gets created in
every translation unit that includes ``<iostream>``. This object has a static
constructor and destructor that initialize and destroy the global iostream
objects before they could possibly be used in the file. The code that you see
in the ``.ll`` file corresponds to the constructor and destructor registration
code.

If you would like to make it easier to *understand* the LLVM code generated
by the compiler in the demo page, consider using ``printf()`` instead of
``iostream``\s to print values.


Where did all of my code go??
-----------------------------
If you are using the LLVM demo page, you may often wonder what happened to
all of the code that you typed in. Remember that the demo script is running
the code through the LLVM optimizers, so if your code doesn't actually do
anything useful, it might all be deleted.

To prevent this, make sure that the code is actually needed. For example, if
you are computing some expression, return the value from the function instead
of leaving it in a local variable. If you really want to constrain the
optimizer, you can read from and assign to ``volatile`` global variables.
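
A minimal sketch of these three situations follows. All of the function and
variable names are hypothetical and chosen only for this example.

.. code-block:: c

   /* The sum is never observed, so an optimizer is free to delete it. */
   void dead_sum(int a, int b) {
       int sum = a + b;
       (void)sum;
   }

   /* The sum is returned, so the computation has to be kept. */
   int live_sum(int a, int b) {
       return a + b;
   }

   /* Reads and writes of volatile objects must actually be performed, so
      the load, the add and the store all survive optimization. */
   volatile int g_input = 20;
   volatile int g_output;

   void volatile_sum(void) {
       g_output = g_input + 22;
   }

After optimization, ``dead_sum`` would typically shrink to a bare return,
while the other two functions keep their arithmetic.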


What is this "``undef``" thing that shows up in my code?
--------------------------------------------------------
``undef`` is the LLVM way of representing a value that is not defined. You
can get these if you do not initialize a variable before you use it. For
example, the C function:

.. code-block:: c

   int X() { int i; return i; }

is compiled to "``ret i32 undef``" because "``i``" never has a value specified
for it.
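
For contrast, here is a sketch of the same function with the variable
initialized. The name ``X_initialized`` is hypothetical. Because ``i`` now has
a specified value at the point of use, the front-end has no reason to
produce ``undef`` for the return value.

.. code-block:: c

   /* i is given a value before it is read, so the result is well defined. */
   int X_initialized(void) {
       int i = 0;
       return i;
   }

If ``undef`` shows up unexpectedly in generated code, an uninitialized
variable like the one in the original ``X()`` is a good first thing to look
for.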
212
213
214Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it?
215----------------------------------------------------------------------------------------------------------------------------------------------------------
This is a common problem run into by authors of front-ends that are using
custom calling conventions: you need to make sure to set the right calling
convention on both the function and on each call to the function. For
example, this code:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define void @bar() {
     call void @foo()
     ret void
   }

is optimized to:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define void @bar() {
     unreachable
   }

... with "``opt -instcombine -simplifycfg``". This often bites people because
"all their code disappears". Setting the calling convention on the caller and
callee is required for indirect calls to work, so people often ask why not
make the verifier reject this sort of thing.

The answer is that this code has undefined behavior, but it is not illegal.
If we made it illegal, then every transformation that could potentially create
this would have to ensure that it doesn't, and there is valid code that can
create this sort of construct (in dead code). The sorts of things that can
cause this to happen are fairly contrived, but we still need to accept them.
Here's an example:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define internal void @bar(void()* %FP, i1 %cond) {
     br i1 %cond, label %T, label %F
   T:
     call void %FP()
     ret void
   F:
     call fastcc void %FP()
     ret void
   }
   define void @test() {
     %X = or i1 false, false
     call void @bar(void()* @foo, i1 %X)
     ret void
   }

In this example, "test" always passes ``@foo``/``false`` into ``bar``, which
ensures that it is dynamically called with the right calling convention (thus, the
code is perfectly well defined). If you run this through the inliner, you
get this (the explicit "or" is there so that the inliner doesn't dead code
eliminate a bunch of stuff):

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define void @test() {
     %X = or i1 false, false
     br i1 %X, label %T.i, label %F.i
   T.i:
     call void @foo()
     br label %bar.exit
   F.i:
     call fastcc void @foo()
     br label %bar.exit
   bar.exit:
     ret void
   }

Here you can see that the inlining pass made an undefined call to ``@foo``
with the wrong calling convention. We really don't want to make the inliner
have to know about this sort of thing, so it needs to be valid code. In this
case, dead code elimination can trivially remove the undefined code. However,
if ``%X`` were an input argument to ``@test``, the inliner would produce this:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }

   define void @test(i1 %X) {
     br i1 %X, label %T.i, label %F.i
   T.i:
     call void @foo()
     br label %bar.exit
   F.i:
     call fastcc void @foo()
     br label %bar.exit
   bar.exit:
     ret void
   }

The interesting thing about this is that ``%X`` *must* be false for the
code to be well-defined, but no amount of dead code elimination will be able
to delete the broken call as unreachable. However, since
``instcombine``/``simplifycfg`` turns the undefined call into unreachable, we
end up with a branch on a condition that goes to unreachable: a branch to
unreachable can never happen, so "``-inline -instcombine -simplifycfg``" is
able to produce:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define void @test(i1 %X) {
   F.i:
     call fastcc void @foo()
     ret void
   }