.. _faq:

================================
Frequently Asked Questions (FAQ)
================================

.. contents::
   :local:


License
=======

Does the University of Illinois Open Source License really qualify as an "open source" license?
-----------------------------------------------------------------------------------------------
Yes, the license is `certified
<http://www.opensource.org/licenses/UoI-NCSA.php>`_ by the Open Source
Initiative (OSI).


Can I modify LLVM source code and redistribute the modified source?
-------------------------------------------------------------------
Yes. The modified source distribution must retain the copyright notice and
follow the three bulleted conditions listed in the `LLVM license
<http://llvm.org/svn/llvm-project/llvm/trunk/LICENSE.TXT>`_.


Can I modify the LLVM source code and redistribute binaries or other tools based on it, without redistributing the source?
--------------------------------------------------------------------------------------------------------------------------
Yes. This is why we distribute LLVM under a less restrictive license than GPL,
as explained in the first question above.


Source Code
===========

In what language is LLVM written?
---------------------------------
All of the LLVM tools and libraries are written in C++ with extensive use of
the STL.


How portable is the LLVM source code?
-------------------------------------
The LLVM source code should be portable to most modern Unix-like operating
systems. Most of the code is written in standard C++ with operating system
services abstracted to a support library. The tools required to build and
test LLVM have been ported to a plethora of platforms.

Some porting problems may exist in the following areas:

* The autoconf/makefile build system relies heavily on UNIX shell tools,
  like the Bourne Shell and sed. Porting to systems without these tools
  (MacOS 9, Plan 9) will require more effort.


Build Problems
==============

When I run configure, it finds the wrong C compiler.
----------------------------------------------------
The ``configure`` script attempts to locate first ``gcc`` and then ``cc``,
unless it finds compiler paths set in ``CC`` and ``CXX`` for the C and C++
compiler, respectively.

If ``configure`` finds the wrong compiler, either adjust your ``PATH``
environment variable or set ``CC`` and ``CXX`` explicitly.


The ``configure`` script finds the right C compiler, but it uses the LLVM tools from a previous build. What do I do?
---------------------------------------------------------------------------------------------------------------------
The ``configure`` script uses the ``PATH`` to find executables, so if it's
grabbing the wrong linker/assembler/etc, there are two ways to fix it:

#. Adjust your ``PATH`` environment variable so that the correct program
   appears first in the ``PATH``. This may work, but may not be convenient
   if you need the other programs to come *first* in your path for other
   work.

#. Run ``configure`` with an alternative ``PATH`` that is correct. In a
   Bourne compatible shell, the syntax would be:

   .. code-block:: bash

      % PATH=[the path without the bad program] ./configure ...

   This is still somewhat inconvenient, but it allows ``configure`` to do its
   work without having to adjust your ``PATH`` permanently.


When creating a dynamic library, I get a strange GLIBC error.
-------------------------------------------------------------
Under some operating systems (e.g. Linux), libtool does not work correctly if
GCC was compiled with the ``--disable-shared`` option. To work around this,
install your own version of GCC that has shared libraries enabled by default.


I've updated my source tree from Subversion, and now my build is trying to use a file/directory that doesn't exist.
-------------------------------------------------------------------------------------------------------------------
You need to re-run configure in your object directory. When new Makefiles
are added to the source tree, they have to be copied over to the object tree
in order to be used by the build.


I've modified a Makefile in my source tree, but my build tree keeps using the old version. What do I do?
---------------------------------------------------------------------------------------------------------
If the Makefile already exists in your object tree, you can just run the
following command in the top level directory of your object tree:

.. code-block:: bash

   % ./config.status <relative path to Makefile>

If the Makefile is new, you will have to modify the configure script to copy
it over.


I've upgraded to a new version of LLVM, and I get strange build errors.
-----------------------------------------------------------------------
Sometimes, changes to the LLVM source code alter how the build system works.
Changes in ``libtool``, ``autoconf``, or header file dependencies are
especially prone to this sort of problem.

The best thing to try is to remove the old files and re-build. In most cases,
this takes care of the problem. To do this, just type ``make clean`` and then
``make`` in the directory that fails to build.


I've built LLVM and am testing it, but the tests freeze.
--------------------------------------------------------
This is most likely occurring because you built a profile or release
(optimized) build of LLVM and have not specified the same information on the
``gmake`` command line.

For example, if you built LLVM with the command:

.. code-block:: bash

   % gmake ENABLE_PROFILING=1

...then you must run the tests with the following commands:

.. code-block:: bash

   % cd llvm/test
   % gmake ENABLE_PROFILING=1

Why do test results differ when I perform different types of builds?
--------------------------------------------------------------------
The LLVM test suite is dependent upon several features of the LLVM tools and
libraries.

First, the debugging assertions in code are not enabled in optimized or
profiling builds. Hence, tests that used to fail may pass.

Second, some tests may rely upon debugging options or behavior that is only
available in the debug build. These tests will fail in an optimized or
profile build.


Compiling LLVM with GCC 3.3.2 fails, what should I do?
------------------------------------------------------
This is `a bug in GCC <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13392>`_,
and affects projects other than LLVM. Try upgrading or downgrading your GCC.


Compiling LLVM with GCC succeeds, but the resulting tools do not work, what can be wrong?
-----------------------------------------------------------------------------------------
Several versions of GCC are known to miscompile the LLVM codebase. Please
check your compiler version (``gcc --version``) to find out whether it is
`broken <GettingStarted.html#brokengcc>`_. If so, your only option is to
upgrade GCC to a known good version.


After Subversion update, rebuilding gives the error "No rule to make target".
-----------------------------------------------------------------------------
The error has the form:

.. code-block:: bash

   gmake[2]: *** No rule to make target `/path/to/somefile',
                 needed by `/path/to/another/file.d'.
   Stop.

This may occur any time files are moved within the Subversion repository or
removed entirely. In this case, the best solution is to erase all ``.d``
files, which list dependencies for source files, and rebuild:

.. code-block:: bash

   % cd $LLVM_OBJ_DIR
   % rm -f `find . -name \*\.d`
   % gmake

In other cases, it may be necessary to run ``make clean`` before rebuilding.


Source Languages
================

What source languages are supported?
------------------------------------
LLVM currently has full support for C and C++ source languages. These are
available through both `Clang <http://clang.llvm.org/>`_ and `DragonEgg
<http://dragonegg.llvm.org/>`_.

The PyPy developers are working on integrating LLVM into the PyPy backend so
that the PyPy language can be translated to LLVM.


I'd like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators?
-----------------------------------------------------------------------------------------------------------------------------------------
Your compiler front-end will communicate with LLVM by creating a module in the
LLVM intermediate representation (IR) format. Assuming you want to write your
language's compiler in the language itself (rather than C++), there are 3
major ways to tackle generating LLVM IR from a front-end:

1. **Call into the LLVM library code using your language's FFI (foreign
   function interface).**

   * *for:* best tracks changes to the LLVM IR, .ll syntax, and .bc format

   * *for:* enables running LLVM optimization passes without an emit/parse
     overhead

   * *for:* adapts well to a JIT context

   * *against:* lots of ugly glue code to write

2. **Emit LLVM assembly from your compiler's native language.**

   * *for:* very straightforward to get started

   * *against:* the .ll parser is slower than the bitcode reader when
     interfacing to the middle end

   * *against:* it may be harder to track changes to the IR

3. **Emit LLVM bitcode from your compiler's native language.**

   * *for:* can use the more-efficient bitcode reader when interfacing to the
     middle end

   * *against:* you'll have to re-engineer the LLVM IR object model and bitcode
     writer in your language

   * *against:* it may be harder to track changes to the IR

If you go with the first option, the C bindings in include/llvm-c should help
a lot, since most languages have strong support for interfacing with C. The
most common hurdle with calling C from managed code is interfacing with the
garbage collector. The C interface was designed to require very little memory
management, and so is straightforward in this regard.
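
As a rough sketch of the second option, a front-end can produce textual LLVM
IR with nothing more than string concatenation. The example below is written
in C++ only to stay self-contained; the helper name ``emitAddFunction`` is
hypothetical and not part of LLVM. It prints a function that adds two 32-bit
integers.

.. code-block:: c++

   #include <cstdio>
   #include <string>

   // Hypothetical helper: returns textual LLVM IR for a function that adds
   // two 32-bit integers. No LLVM libraries are needed to produce the text.
   std::string emitAddFunction(const std::string &name) {
     std::string ir;
     ir += "define i32 @" + name + "(i32 %a, i32 %b) {\n";
     ir += "entry:\n";
     ir += "  %sum = add i32 %a, %b\n";
     ir += "  ret i32 %sum\n";
     ir += "}\n";
     return ir;
   }

   int main() {
     // Write the module text to stdout; redirect it to a .ll file as needed.
     std::fputs(emitAddFunction("add").c_str(), stdout);
     return 0;
   }

Assuming the output is saved as ``add.ll``, it can be assembled with
``llvm-as`` or passed to ``opt`` like any other ``.ll`` file.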

What support is there for higher-level source language constructs for building a compiler?
--------------------------------------------------------------------------------------------
Currently, there isn't much. LLVM supports an intermediate representation
which is useful for code representation but does not support the high-level
(abstract syntax tree) representation needed by most compilers. There are no
facilities for lexical or semantic analysis.


I don't understand the ``GetElementPtr`` instruction. Help!
------------------------------------------------------------
See `The Often Misunderstood GEP Instruction <GetElementPtr.html>`_.


Using the C and C++ Front Ends
==============================

Can I compile C or C++ code to platform-independent LLVM bitcode?
-----------------------------------------------------------------
No. C and C++ are inherently platform-dependent languages. The most obvious
example of this is the preprocessor. A very common way that C code is made
portable is by using the preprocessor to include platform-specific code. In
practice, information about other platforms is lost after preprocessing, so
the result is inherently dependent on the platform that the preprocessing was
targeting.

Another example is ``sizeof``. It's common for ``sizeof(long)`` to vary
between platforms. In most C front-ends, ``sizeof`` is expanded to a
constant immediately, thus hard-wiring a platform-specific detail.

Also, since many platforms define their ABIs in terms of C, and since LLVM is
lower-level than C, front-ends currently must emit platform-specific IR in
order to have the result conform to the platform ABI.
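
Both effects can be seen in a small program. This is a minimal sketch, and
the names ``platformName`` and ``longSize`` are made up for illustration.
After preprocessing, only one of the two string constants exists, and
``sizeof(long)`` is already a fixed number for the chosen target.

.. code-block:: c++

   #include <cstdio>

   // The preprocessor keeps exactly one branch; the other never reaches the
   // front-end, so it cannot appear in the emitted bitcode.
   #if defined(_WIN32)
   static const char *const kPlatform = "windows";
   #else
   static const char *const kPlatform = "non-windows";
   #endif

   // sizeof(long) is a compile-time constant for the target: commonly 4 on
   // 32-bit targets and 64-bit Windows, and 8 on 64-bit Linux or macOS.
   static const unsigned long kLongSize = sizeof(long);

   const char *platformName() { return kPlatform; }
   unsigned long longSize() { return kLongSize; }

   int main() {
     std::printf("platform: %s, sizeof(long): %lu\n", platformName(),
                 longSize());
     return 0;
   }

Bitcode produced from this file for one target would carry that target's
string and size, which is why it cannot simply be reused on another platform.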


Questions about code generated by the demo page
===============================================

What is this ``llvm.global_ctors`` and ``_GLOBAL__I_a...`` stuff that happens when I ``#include <iostream>``?
-------------------------------------------------------------------------------------------------------------
If you ``#include`` the ``<iostream>`` header into a C++ translation unit,
the file will probably use the ``std::cin``/``std::cout``/... global objects.
However, C++ does not guarantee an order of initialization between static
objects in different translation units, so if a static ctor/dtor in your .cpp
file used ``std::cout``, for example, the object would not necessarily be
automatically initialized before your use.

To make ``std::cout`` and friends work correctly in these scenarios, the STL
that we use declares a static object that gets created in every translation
unit that includes ``<iostream>``. This object has a static constructor
and destructor that initialize and destroy the global iostream objects
before they could possibly be used in the file. The code that you see in the
``.ll`` file corresponds to the constructor and destructor registration code.

If you would like to make it easier to *understand* the LLVM code generated
by the compiler in the demo page, consider using ``printf()`` instead of
``iostream``\s to print values.
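
The registration code is not specific to ``<iostream>``. As a minimal sketch,
any global object with a non-trivial constructor needs a static initializer,
which Clang lists in ``llvm.global_ctors``. The names ``Tracker`` and
``initCount`` below are hypothetical. The example prints with ``printf()``,
which needs no static objects of its own.

.. code-block:: c++

   #include <cstdio>

   // Counts how many times the constructor below has run.
   static int g_initCount = 0;

   // A type with a non-trivial constructor. Defining a global of this type
   // makes the compiler emit a static initializer for this translation unit.
   struct Tracker {
     Tracker() { ++g_initCount; }
   };

   static Tracker g_tracker;  // constructed before main() starts

   int initCount() { return g_initCount; }

   int main() {
     // printf() needs no static objects, so the IR for this call stays small.
     std::printf("constructors run before main: %d\n", initCount());
     return 0;
   }

Removing ``g_tracker`` from the file should also remove the static
initializer from the generated IR.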


Where did all of my code go??
-----------------------------
If you are using the LLVM demo page, you may often wonder what happened to
all of the code that you typed in. Remember that the demo script is running
the code through the LLVM optimizers, so if your code doesn't actually do
anything useful, it might all be deleted.

To prevent this, make sure that the code is actually needed. For example, if
you are computing some expression, return the value from the function instead
of leaving it in a local variable. If you really want to constrain the
optimizer, you can read from and assign to ``volatile`` global variables.
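
The three cases can be put side by side. This is an illustrative sketch with
made-up function names (``discarded``, ``returned``, ``stored``) and a
made-up global ``g_sink``; how much of ``discarded`` survives depends on the
optimization level.

.. code-block:: c++

   #include <cstdio>

   // The result is never used, so an optimizer is free to delete the
   // arithmetic.
   void discarded(int x) {
     int y = x * x + 1;
     (void)y;
   }

   // Returning the value makes the computation observable to callers.
   int returned(int x) { return x * x + 1; }

   // Accesses to a volatile object must be performed as written.
   volatile int g_sink = 0;

   void stored(int x) { g_sink = x * x + 1; }

   int main() {
     discarded(3);
     stored(3);
     std::printf("%d %d\n", returned(3), g_sink);
     return 0;
   }

Of the three functions, only ``discarded`` would be expected to reduce to an
empty body after optimization.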


What is this "``undef``" thing that shows up in my code?
--------------------------------------------------------
``undef`` is the LLVM way of representing a value that is not defined. You
can get these if you do not initialize a variable before you use it. For
example, the C function:

.. code-block:: c

   int X() { int i; return i; }

Is compiled to "``ret i32 undef``" because "``i``" never has a value specified
for it.


Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it?
-----------------------------------------------------------------------------------------------------------------------------------------------------------
This is a common problem run into by authors of front-ends that are using
custom calling conventions: you need to make sure to set the right calling
convention on both the function and on each call to the function. For
example, this code:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define void @bar() {
     call void @foo()
     ret void
   }

Is optimized to:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define void @bar() {
     unreachable
   }

... with "``opt -instcombine -simplifycfg``". This often bites people because
"all their code disappears". Setting the calling convention on the caller and
callee is required for indirect calls to work, so people often ask why not
make the verifier reject this sort of thing.

The answer is that this code has undefined behavior, but it is not illegal.
If we made it illegal, then every transformation that could potentially create
this would have to ensure that it doesn't, and there is valid code that can
create this sort of construct (in dead code). The sorts of things that can
cause this to happen are fairly contrived, but we still need to accept them.
Here's an example:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define internal void @bar(void()* %FP, i1 %cond) {
     br i1 %cond, label %T, label %F
   T:
     call void %FP()
     ret void
   F:
     call fastcc void %FP()
     ret void
   }
   define void @test() {
     %X = or i1 false, false
     call void @bar(void()* @foo, i1 %X)
     ret void
   }

In this example, "test" always passes ``@foo``/``false`` into ``bar``, which
ensures that it is dynamically called with the right calling convention (thus,
the code is perfectly well defined). If you run this through the inliner, you
get this (the explicit "or" is there so that the inliner doesn't dead code
eliminate a bunch of stuff):

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define void @test() {
     %X = or i1 false, false
     br i1 %X, label %T.i, label %F.i
   T.i:
     call void @foo()
     br label %bar.exit
   F.i:
     call fastcc void @foo()
     br label %bar.exit
   bar.exit:
     ret void
   }

Here you can see that the inlining pass made an undefined call to ``@foo``
with the wrong calling convention. We really don't want to make the inliner
have to know about this sort of thing, so it needs to be valid code. In this
case, dead code elimination can trivially remove the undefined code. However,
if ``%X`` were an input argument to ``@test``, the inliner would produce this:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }

   define void @test(i1 %X) {
     br i1 %X, label %T.i, label %F.i
   T.i:
     call void @foo()
     br label %bar.exit
   F.i:
     call fastcc void @foo()
     br label %bar.exit
   bar.exit:
     ret void
   }

The interesting thing about this is that ``%X`` *must* be false for the
code to be well-defined, but no amount of dead code elimination will be able
to delete the broken call as unreachable. However, since
``instcombine``/``simplifycfg`` turns the undefined call into unreachable, we
end up with a branch on a condition that goes to unreachable: a branch to
unreachable can never happen, so "``-inline -instcombine -simplifycfg``" is
able to produce:

.. code-block:: llvm

   define fastcc void @foo() {
     ret void
   }
   define void @test(i1 %X) {
   F.i:
     call fastcc void @foo()
     ret void
   }