blob: 5ad5113bb4b117e5156c0787d1fe7b7f91985c83 [file] [log] [blame]
Sean Silvabeb15ca2012-12-04 03:20:08 +00001========================
2LLVM Programmer's Manual
3========================
4
5.. contents::
6 :local:
7
8.. warning::
Chris Lattner045a73e2013-01-10 21:24:04 +00009 This is always a work in progress.
Sean Silvabeb15ca2012-12-04 03:20:08 +000010
11.. _introduction:
12
13Introduction
14============
15
16This document is meant to highlight some of the important classes and interfaces
17available in the LLVM source-base. This manual is not intended to explain what
18LLVM is, how it works, and what LLVM code looks like. It assumes that you know
19the basics of LLVM and are interested in writing transformations or otherwise
20analyzing or manipulating the code.
21
22This document should get you oriented so that you can find your way in the
23continuously growing source code that makes up the LLVM infrastructure. Note
24that this manual is not intended to serve as a replacement for reading the
25source code, so if you think there should be a method in one of these classes to
26do something, but it's not listed, check the source. Links to the `doxygen
27<http://llvm.org/doxygen/>`__ sources are provided to make this as easy as
28possible.
29
30The first section of this document describes general information that is useful
31to know when working in the LLVM infrastructure, and the second describes the
32Core LLVM classes. In the future this manual will be extended with information
33describing how to use extension libraries, such as dominator information, CFG
34traversal routines, and useful utilities like the ``InstVisitor`` (`doxygen
35<http://llvm.org/doxygen/InstVisitor_8h-source.html>`__) template.
36
37.. _general:
38
39General Information
40===================
41
42This section contains general information that is useful if you are working in
43the LLVM source-base, but that isn't specific to any particular API.
44
45.. _stl:
46
47The C++ Standard Template Library
48---------------------------------
49
50LLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much
51more than you are used to, or have seen before. Because of this, you might want
52to do a little background reading in the techniques used and capabilities of the
53library. There are many good pages that discuss the STL, and several books on
54the subject that you can get, so it will not be discussed in this document.
55
56Here are some useful links:
57
Sean Silva4b587852012-12-04 03:30:36 +000058#. `cppreference.com
59 <http://en.cppreference.com/w/>`_ - an excellent
Sean Silvabeb15ca2012-12-04 03:20:08 +000060 reference for the STL and other parts of the standard C++ library.
61
62#. `C++ In a Nutshell <http://www.tempest-sw.com/cpp/>`_ - This is an O'Reilly
63 book in the making. It has a decent Standard Library Reference that rivals
64 Dinkumware's, and is unfortunately no longer free since the book has been
65 published.
66
67#. `C++ Frequently Asked Questions <http://www.parashift.com/c++-faq-lite/>`_.
68
69#. `SGI's STL Programmer's Guide <http://www.sgi.com/tech/stl/>`_ - Contains a
70 useful `Introduction to the STL
71 <http://www.sgi.com/tech/stl/stl_introduction.html>`_.
72
73#. `Bjarne Stroustrup's C++ Page
74 <http://www.research.att.com/%7Ebs/C++.html>`_.
75
76#. `Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0
Sean Silvac454f07e32012-12-04 03:45:27 +000077 (even better, get the book)
78 <http://www.mindview.net/Books/TICPP/ThinkingInCPP2e.html>`_.
Sean Silvabeb15ca2012-12-04 03:20:08 +000079
Sean Silva92a44892013-01-11 02:28:08 +000080You are also encouraged to take a look at the :doc:`LLVM Coding Standards
81<CodingStandards>` guide which focuses on how to write maintainable code more
Sean Silvabeb15ca2012-12-04 03:20:08 +000082than where to put your curly braces.
83
84.. _resources:
85
86Other useful references
87-----------------------
88
89#. `Using static and shared libraries across platforms
90 <http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html>`_
91
92.. _apis:
93
94Important and useful LLVM APIs
95==============================
96
97Here we highlight some LLVM APIs that are generally useful and good to know
98about when writing transformations.
99
100.. _isa:
101
102The ``isa<>``, ``cast<>`` and ``dyn_cast<>`` templates
103------------------------------------------------------
104
105The LLVM source-base makes extensive use of a custom form of RTTI. These
106templates have many similarities to the C++ ``dynamic_cast<>`` operator, but
107they don't have some drawbacks (primarily stemming from the fact that
108``dynamic_cast<>`` only works on classes that have a v-table). Because they are
109used so often, you must know what they do and how they work. All of these
110templates are defined in the ``llvm/Support/Casting.h`` (`doxygen
111<http://llvm.org/doxygen/Casting_8h-source.html>`__) file (note that you very
112rarely have to include this file directly).
113
114``isa<>``:
115 The ``isa<>`` operator works exactly like the Java "``instanceof``" operator.
116 It returns true or false depending on whether a reference or pointer points to
117 an instance of the specified class. This can be very useful for constraint
118 checking of various sorts (example below).
119
120``cast<>``:
121 The ``cast<>`` operator is a "checked cast" operation. It converts a pointer
122 or reference from a base class to a derived class, causing an assertion
123 failure if it is not really an instance of the right type. This should be
124 used in cases where you have some information that makes you believe that
125 something is of the right type. An example of the ``isa<>`` and ``cast<>``
126 template is:
127
128 .. code-block:: c++
129
130 static bool isLoopInvariant(const Value *V, const Loop *L) {
131 if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
132 return true;
133
134 // Otherwise, it must be an instruction...
135 return !L->contains(cast<Instruction>(V)->getParent());
136 }
137
138 Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``,
139 for that use the ``dyn_cast<>`` operator.
140
141``dyn_cast<>``:
142 The ``dyn_cast<>`` operator is a "checking cast" operation. It checks to see
143 if the operand is of the specified type, and if so, returns a pointer to it
144 (this operator does not work with references). If the operand is not of the
145 correct type, a null pointer is returned. Thus, this works very much like
146 the ``dynamic_cast<>`` operator in C++, and should be used in the same
147 circumstances. Typically, the ``dyn_cast<>`` operator is used in an ``if``
148 statement or some other flow control statement like this:
149
150 .. code-block:: c++
151
152 if (AllocationInst *AI = dyn_cast<AllocationInst>(Val)) {
153 // ...
154 }
155
156 This form of the ``if`` statement effectively combines together a call to
157 ``isa<>`` and a call to ``cast<>`` into one statement, which is very
158 convenient.
159
160 Note that the ``dyn_cast<>`` operator, like C++'s ``dynamic_cast<>`` or Java's
161 ``instanceof`` operator, can be abused. In particular, you should not use big
162 chained ``if/then/else`` blocks to check for lots of different variants of
163 classes. If you find yourself wanting to do this, it is much cleaner and more
164 efficient to use the ``InstVisitor`` class to dispatch over the instruction
165 type directly.
166
167``cast_or_null<>``:
168 The ``cast_or_null<>`` operator works just like the ``cast<>`` operator,
169 except that it allows for a null pointer as an argument (which it then
170 propagates). This can sometimes be useful, allowing you to combine several
171 null checks into one.
172
173``dyn_cast_or_null<>``:
174 The ``dyn_cast_or_null<>`` operator works just like the ``dyn_cast<>``
175 operator, except that it allows for a null pointer as an argument (which it
176 then propagates). This can sometimes be useful, allowing you to combine
177 several null checks into one.
178
179These five templates can be used with any classes, whether they have a v-table
180or not. If you want to add support for these templates, see the document
Sean Silva92a44892013-01-11 02:28:08 +0000181:doc:`How to set up LLVM-style RTTI for your class hierarchy
182<HowToSetUpLLVMStyleRTTI>`
Sean Silvabeb15ca2012-12-04 03:20:08 +0000183
184.. _string_apis:
185
186Passing strings (the ``StringRef`` and ``Twine`` classes)
187---------------------------------------------------------
188
189Although LLVM generally does not do much string manipulation, we do have several
190important APIs which take strings. Two important examples are the Value class
191-- which has names for instructions, functions, etc. -- and the ``StringMap``
192class which is used extensively in LLVM and Clang.
193
194These are generic classes, and they need to be able to accept strings which may
195have embedded null characters. Therefore, they cannot simply take a ``const
196char *``, and taking a ``const std::string&`` requires clients to perform a heap
197allocation which is usually unnecessary. Instead, many LLVM APIs use a
198``StringRef`` or a ``const Twine&`` for passing strings efficiently.
199
200.. _StringRef:
201
202The ``StringRef`` class
203^^^^^^^^^^^^^^^^^^^^^^^^^^^^
204
205The ``StringRef`` data type represents a reference to a constant string (a
206character array and a length) and supports the common operations available on
207``std::string``, but does not require heap allocation.
208
209It can be implicitly constructed using a C style null-terminated string, an
210``std::string``, or explicitly with a character pointer and length. For
211example, the ``StringRef`` find function is declared as:
212
213.. code-block:: c++
214
215 iterator find(StringRef Key);
216
217and clients can call it using any one of:
218
219.. code-block:: c++
220
221 Map.find("foo"); // Lookup "foo"
222 Map.find(std::string("bar")); // Lookup "bar"
223 Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
224
225Similarly, APIs which need to return a string may return a ``StringRef``
226instance, which can be used directly or converted to an ``std::string`` using
227the ``str`` member function. See ``llvm/ADT/StringRef.h`` (`doxygen
228<http://llvm.org/doxygen/classllvm_1_1StringRef_8h-source.html>`__) for more
229information.
230
231You should rarely use the ``StringRef`` class directly, because it contains
232pointers to external memory it is not generally safe to store an instance of the
233class (unless you know that the external storage will not be freed).
234``StringRef`` is small and pervasive enough in LLVM that it should always be
235passed by value.
236
237The ``Twine`` class
238^^^^^^^^^^^^^^^^^^^
239
240The ``Twine`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Twine.html>`__)
241class is an efficient way for APIs to accept concatenated strings. For example,
242a common LLVM paradigm is to name one instruction based on the name of another
243instruction with a suffix, for example:
244
245.. code-block:: c++
246
247 New = CmpInst::Create(..., SO->getName() + ".cmp");
248
249The ``Twine`` class is effectively a lightweight `rope
250<http://en.wikipedia.org/wiki/Rope_(computer_science)>`_ which points to
251temporary (stack allocated) objects. Twines can be implicitly constructed as
252the result of the plus operator applied to strings (i.e., a C strings, an
253``std::string``, or a ``StringRef``). The twine delays the actual concatenation
254of strings until it is actually required, at which point it can be efficiently
255rendered directly into a character array. This avoids unnecessary heap
256allocation involved in constructing the temporary results of string
257concatenation. See ``llvm/ADT/Twine.h`` (`doxygen
258<http://llvm.org/doxygen/Twine_8h_source.html>`__) and :ref:`here <dss_twine>`
259for more information.
260
261As with a ``StringRef``, ``Twine`` objects point to external memory and should
262almost never be stored or mentioned directly. They are intended solely for use
263when defining a function which should be able to efficiently accept concatenated
264strings.
265
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000266.. _error_apis:
267
268Error handling
269--------------
270
271Proper error handling helps us identify bugs in our code, and helps end-users
272understand errors in their tool usage. Errors fall into two broad categories:
273*programmatic* and *recoverable*, with different strategies for handling and
274reporting.
275
276Programmatic Errors
277^^^^^^^^^^^^^^^^^^^
278
279Programmatic errors are violations of program invariants or API contracts, and
280represent bugs within the program itself. Our aim is to document invariants, and
281to abort quickly at the point of failure (providing some basic diagnostic) when
282invariants are broken at runtime.
283
284The fundamental tools for handling programmatic errors are assertions and the
285llvm_unreachable function. Assertions are used to express invariant conditions,
286and should include a message describing the invariant:
287
288.. code-block:: c++
289
290 assert(isPhysReg(R) && "All virt regs should have been allocated already.");
291
292The llvm_unreachable function can be used to document areas of control flow
293that should never be entered if the program invariants hold:
294
295.. code-block:: c++
296
297 enum { Foo, Bar, Baz } X = foo();
298
299 switch (X) {
300 case Foo: /* Handle Foo */; break;
301 case Bar: /* Handle Bar */; break;
302 default:
303 llvm_unreachable("X should be Foo or Bar here");
304 }
305
306Recoverable Errors
307^^^^^^^^^^^^^^^^^^
308
309Recoverable errors represent an error in the program's environment, for example
310a resource failure (a missing file, a dropped network connection, etc.), or
311malformed input. These errors should be detected and communicated to a level of
312the program where they can be handled appropriately. Handling the error may be
313as simple as reporting the issue to the user, or it may involve attempts at
314recovery.
315
316Recoverable errors are modeled using LLVM's ``Error`` scheme. This scheme
317represents errors using function return values, similar to classic C integer
318error codes, or C++'s ``std::error_code``. However, the ``Error`` class is
319actually a lightweight wrapper for user-defined error types, allowing arbitrary
320information to be attached to describe the error. This is similar to the way C++
321exceptions allow throwing of user-defined types.
322
323Success values are created by calling ``Error::success()``:
324
325.. code-block:: c++
326
327 Error foo() {
328 // Do something.
329 // Return success.
330 return Error::success();
331 }
332
333Success values are very cheap to construct and return - they have minimal
334impact on program performance.
335
336Failure values are constructed using ``make_error<T>``, where ``T`` is any class
337that inherits from the ErrorInfo utility:
338
339.. code-block:: c++
340
341 class MyError : public ErrorInfo<MyError> {
342 public:
343 MyError(std::string Msg) : Msg(Msg) {}
344 void log(OStream &OS) const override { OS << "MyError - " << Msg; }
345 private:
346 std::string Msg;
347 };
348
349 Error bar() {
350 if (checkErrorCondition)
351 return make_error<MyError>("Error condition detected");
352
353 // No error - proceed with bar.
354
355 // Return success value.
356 return Error::success();
357 }
358
Lang Hamesa0f517f2016-03-23 03:18:16 +0000359Error values can be implicitly converted to bool: true for error, false for
360success, enabling the following idiom:
361
362.. code-block::
363
364 if (auto Err = mayFail())
365 return Err;
366
367 // Success! We can proceed.
368
369
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000370For functions that can fail but need to return a value the ``Expected<T>``
371utility can be used. Values of this type can be constructed with either a
Lang Hamesa0f517f2016-03-23 03:18:16 +0000372``T``, or a ``Error``. Expected<T> values are also implicitly convertible to
373boolean, but with the opposite convention to Error: true for success, false for
374error. If success, the ``T`` value can be accessed via the dereference operator.
375If failure, the ``Error`` value can be extracted using the ``takeError()``
376method. Idiomatic usage looks like:
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000377
378.. code-block:: c++
379
380 Expected<float> parseAndSquareRoot(IStream &IS) {
381 float f;
382 OS >> f;
383 if (f < 0)
384 return make_error<FloatingPointError>(...);
385 return sqrt(f);
386 }
387
388 Error foo(IStream &IS) {
389 if (auto SqrtOrErr = parseAndSquartRoot(IS)) {
390 float Sqrt = *SqrtOrErr;
391 // ...
392 } else
393 return SqrtOrErr.takeError();
394 }
395
396All Error instances, whether success or failure, must be either checked or
397moved from (via std::move or a return) before they are destructed. Accidentally
398discarding an unchecked error will cause a program abort at the point where the
399unchecked value's destructor is run, making it easy to identify and fix
400violations of this rule.
401
402Success values are considered checked once they have been tested (by invoking
403the boolean conversion operator):
404
405.. code-block:: c++
406
407 if (auto Err = canFail(...))
408 return Err; // Failure value - move error to caller.
409
410 // Safe to continue: Err was checked.
411
412In contrast, the following code will always cause an abort, regardless of the
413return value of ``foo``:
414
415.. code-block:: c++
416
417 canFail();
418 // Program will always abort here, even if canFail() returns Success, since
419 // the value is not checked.
420
421Failure values are considered checked once a handler for the error type has
422been activated:
423
424.. code-block:: c++
425
426 auto Err = canFail(...);
427 if (auto Err2 =
428 handleErrors(std::move(Err),
429 [](std::unique_ptr<MyError> M) {
430 // Try to handle 'M'. If successful, return a success value from
431 // the handler.
432 if (tryToHandle(M))
433 return Error::success();
434
435 // We failed to handle 'M' - return it from the handler.
436 // This value will be passed back from catchErrors and
437 // wind up in Err2, where it will be returned from this function.
438 return Error(std::move(M));
439 })))
440 return Err2;
441
442
Richard Smithddb2fde2014-05-06 07:45:39 +0000443.. _function_apis:
444
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000445More information on Error and its related utilities can be found in the
446Error.h header file.
447
Richard Smithddb2fde2014-05-06 07:45:39 +0000448Passing functions and other callable objects
449--------------------------------------------
450
451Sometimes you may want a function to be passed a callback object. In order to
452support lambda expressions and other function objects, you should not use the
453traditional C approach of taking a function pointer and an opaque cookie:
454
455.. code-block:: c++
456
457 void takeCallback(bool (*Callback)(Function *, void *), void *Cookie);
458
459Instead, use one of the following approaches:
460
461Function template
462^^^^^^^^^^^^^^^^^
463
464If you don't mind putting the definition of your function into a header file,
465make it a function template that is templated on the callable type.
466
467.. code-block:: c++
468
469 template<typename Callable>
470 void takeCallback(Callable Callback) {
471 Callback(1, 2, 3);
472 }
473
474The ``function_ref`` class template
475^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
476
477The ``function_ref``
478(`doxygen <http://llvm.org/doxygen/classllvm_1_1function_ref.html>`__) class
479template represents a reference to a callable object, templated over the type
480of the callable. This is a good choice for passing a callback to a function,
Reid Kleckner5c2245b2014-07-17 22:43:00 +0000481if you don't need to hold onto the callback after the function returns. In this
482way, ``function_ref`` is to ``std::function`` as ``StringRef`` is to
483``std::string``.
Richard Smithddb2fde2014-05-06 07:45:39 +0000484
485``function_ref<Ret(Param1, Param2, ...)>`` can be implicitly constructed from
486any callable object that can be called with arguments of type ``Param1``,
487``Param2``, ..., and returns a value that can be converted to type ``Ret``.
488For example:
489
490.. code-block:: c++
491
492 void visitBasicBlocks(Function *F, function_ref<bool (BasicBlock*)> Callback) {
493 for (BasicBlock &BB : *F)
494 if (Callback(&BB))
495 return;
496 }
497
498can be called using:
499
500.. code-block:: c++
501
502 visitBasicBlocks(F, [&](BasicBlock *BB) {
503 if (process(BB))
504 return isEmpty(BB);
505 return false;
506 });
507
Reid Kleckner5c2245b2014-07-17 22:43:00 +0000508Note that a ``function_ref`` object contains pointers to external memory, so it
509is not generally safe to store an instance of the class (unless you know that
510the external storage will not be freed). If you need this ability, consider
511using ``std::function``. ``function_ref`` is small enough that it should always
512be passed by value.
Richard Smithddb2fde2014-05-06 07:45:39 +0000513
Sean Silvabeb15ca2012-12-04 03:20:08 +0000514.. _DEBUG:
515
516The ``DEBUG()`` macro and ``-debug`` option
517-------------------------------------------
518
519Often when working on your pass you will put a bunch of debugging printouts and
520other code into your pass. After you get it working, you want to remove it, but
521you may need it again in the future (to work out new bugs that you run across).
522
523Naturally, because of this, you don't want to delete the debug printouts, but
524you don't want them to always be noisy. A standard compromise is to comment
525them out, allowing you to enable them if you need them in the future.
526
527The ``llvm/Support/Debug.h`` (`doxygen
528<http://llvm.org/doxygen/Debug_8h-source.html>`__) file provides a macro named
529``DEBUG()`` that is a much nicer solution to this problem. Basically, you can
530put arbitrary code into the argument of the ``DEBUG`` macro, and it is only
531executed if '``opt``' (or any other tool) is run with the '``-debug``' command
532line argument:
533
534.. code-block:: c++
535
536 DEBUG(errs() << "I am here!\n");
537
538Then you can run your pass like this:
539
540.. code-block:: none
541
542 $ opt < a.bc > /dev/null -mypass
543 <no output>
544 $ opt < a.bc > /dev/null -mypass -debug
545 I am here!
546
547Using the ``DEBUG()`` macro instead of a home-brewed solution allows you to not
548have to create "yet another" command line option for the debug output for your
Justin Bognerc2e54af2015-10-15 18:17:44 +0000549pass. Note that ``DEBUG()`` macros are disabled for non-asserts builds, so they
Sean Silvabeb15ca2012-12-04 03:20:08 +0000550do not cause a performance impact at all (for the same reason, they should also
551not contain side-effects!).
552
553One additional nice thing about the ``DEBUG()`` macro is that you can enable or
554disable it directly in gdb. Just use "``set DebugFlag=0``" or "``set
555DebugFlag=1``" from the gdb if the program is running. If the program hasn't
556been started yet, you can always just run it with ``-debug``.
557
558.. _DEBUG_TYPE:
559
560Fine grained debug info with ``DEBUG_TYPE`` and the ``-debug-only`` option
561^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
562
563Sometimes you may find yourself in a situation where enabling ``-debug`` just
564turns on **too much** information (such as when working on the code generator).
565If you want to enable debug information with more fine-grained control, you
Justin Bognerc2e54af2015-10-15 18:17:44 +0000566should define the ``DEBUG_TYPE`` macro and use the ``-debug-only`` option as
Alexey Samsonov6c0ddfe2014-06-05 23:12:43 +0000567follows:
Sean Silvabeb15ca2012-12-04 03:20:08 +0000568
569.. code-block:: c++
570
Sean Silvabeb15ca2012-12-04 03:20:08 +0000571 #define DEBUG_TYPE "foo"
572 DEBUG(errs() << "'foo' debug type\n");
573 #undef DEBUG_TYPE
574 #define DEBUG_TYPE "bar"
575 DEBUG(errs() << "'bar' debug type\n"));
576 #undef DEBUG_TYPE
Sean Silvabeb15ca2012-12-04 03:20:08 +0000577
578Then you can run your pass like this:
579
580.. code-block:: none
581
582 $ opt < a.bc > /dev/null -mypass
583 <no output>
584 $ opt < a.bc > /dev/null -mypass -debug
Sean Silvabeb15ca2012-12-04 03:20:08 +0000585 'foo' debug type
586 'bar' debug type
Sean Silvabeb15ca2012-12-04 03:20:08 +0000587 $ opt < a.bc > /dev/null -mypass -debug-only=foo
588 'foo' debug type
589 $ opt < a.bc > /dev/null -mypass -debug-only=bar
590 'bar' debug type
Christof Doumaf617e672016-01-12 10:23:13 +0000591 $ opt < a.bc > /dev/null -mypass -debug-only=foo,bar
592 'foo' debug type
593 'bar' debug type
Sean Silvabeb15ca2012-12-04 03:20:08 +0000594
595Of course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file,
Justin Bognerc2e54af2015-10-15 18:17:44 +0000596to specify the debug type for the entire module. Be careful that you only do
597this after including Debug.h and not around any #include of headers. Also, you
598should use names more meaningful than "foo" and "bar", because there is no
599system in place to ensure that names do not conflict. If two different modules
600use the same string, they will all be turned on when the name is specified.
601This allows, for example, all debug information for instruction scheduling to be
602enabled with ``-debug-only=InstrSched``, even if the source lives in multiple
Sylvestre Ledru84666a12016-02-14 20:16:22 +0000603files. The name must not include a comma (,) as that is used to separate the
Christof Doumaf617e672016-01-12 10:23:13 +0000604arguments of the ``-debug-only`` option.
Sean Silvabeb15ca2012-12-04 03:20:08 +0000605
Sylvestre Ledru1623b462014-09-25 10:58:16 +0000606For performance reasons, -debug-only is not available in optimized build
607(``--enable-optimized``) of LLVM.
Sylvestre Ledrub5984fa2014-09-25 10:57:00 +0000608
Sean Silvabeb15ca2012-12-04 03:20:08 +0000609The ``DEBUG_WITH_TYPE`` macro is also available for situations where you would
610like to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement. It
611takes an additional first parameter, which is the type to use. For example, the
612preceding example could be written as:
613
614.. code-block:: c++
615
Sean Silvabeb15ca2012-12-04 03:20:08 +0000616 DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n");
617 DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n"));
Sean Silvabeb15ca2012-12-04 03:20:08 +0000618
619.. _Statistic:
620
621The ``Statistic`` class & ``-stats`` option
622-------------------------------------------
623
624The ``llvm/ADT/Statistic.h`` (`doxygen
625<http://llvm.org/doxygen/Statistic_8h-source.html>`__) file provides a class
626named ``Statistic`` that is used as a unified way to keep track of what the LLVM
627compiler is doing and how effective various optimizations are. It is useful to
628see what optimizations are contributing to making a particular program run
629faster.
630
631Often you may run your pass on some big program, and you're interested to see
632how many times it makes a certain transformation. Although you can do this with
633hand inspection, or some ad-hoc method, this is a real pain and not very useful
634for big programs. Using the ``Statistic`` class makes it very easy to keep
635track of this information, and the calculated information is presented in a
636uniform manner with the rest of the passes being executed.
637
638There are many examples of ``Statistic`` uses, but the basics of using it are as
639follows:
640
641#. Define your statistic like this:
642
643 .. code-block:: c++
644
645 #define DEBUG_TYPE "mypassname" // This goes before any #includes.
646 STATISTIC(NumXForms, "The # of times I did stuff");
647
648 The ``STATISTIC`` macro defines a static variable, whose name is specified by
649 the first argument. The pass name is taken from the ``DEBUG_TYPE`` macro, and
650 the description is taken from the second argument. The variable defined
651 ("NumXForms" in this case) acts like an unsigned integer.
652
653#. Whenever you make a transformation, bump the counter:
654
655 .. code-block:: c++
656
657 ++NumXForms; // I did stuff!
658
659That's all you have to do. To get '``opt``' to print out the statistics
660gathered, use the '``-stats``' option:
661
662.. code-block:: none
663
664 $ opt -stats -mypassname < program.bc > /dev/null
665 ... statistics output ...
666
Justin Bogner08f36fd2015-02-21 20:53:36 +0000667Note that in order to use the '``-stats``' option, LLVM must be
668compiled with assertions enabled.
669
Sean Silvabeb15ca2012-12-04 03:20:08 +0000670When running ``opt`` on a C file from the SPEC benchmark suite, it gives a
671report that looks like this:
672
673.. code-block:: none
674
675 7646 bitcodewriter - Number of normal instructions
676 725 bitcodewriter - Number of oversized instructions
677 129996 bitcodewriter - Number of bitcode bytes written
678 2817 raise - Number of insts DCEd or constprop'd
679 3213 raise - Number of cast-of-self removed
680 5046 raise - Number of expression trees converted
681 75 raise - Number of other getelementptr's formed
682 138 raise - Number of load/store peepholes
683 42 deadtypeelim - Number of unused typenames removed from symtab
684 392 funcresolve - Number of varargs functions resolved
685 27 globaldce - Number of global variables removed
686 2 adce - Number of basic blocks removed
687 134 cee - Number of branches revectored
688 49 cee - Number of setcc instruction eliminated
689 532 gcse - Number of loads removed
690 2919 gcse - Number of instructions removed
691 86 indvars - Number of canonical indvars added
692 87 indvars - Number of aux indvars removed
693 25 instcombine - Number of dead inst eliminate
694 434 instcombine - Number of insts combined
695 248 licm - Number of load insts hoisted
696 1298 licm - Number of insts hoisted to a loop pre-header
697 3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header)
698 75 mem2reg - Number of alloca's promoted
699 1444 cfgsimplify - Number of blocks simplified
700
701Obviously, with so many optimizations, having a unified framework for this stuff
702is very nice. Making your pass fit well into the framework makes it more
703maintainable and useful.
704
705.. _ViewGraph:
706
707Viewing graphs while debugging code
708-----------------------------------
709
710Several of the important data structures in LLVM are graphs: for example CFGs
711made out of LLVM :ref:`BasicBlocks <BasicBlock>`, CFGs made out of LLVM
712:ref:`MachineBasicBlocks <MachineBasicBlock>`, and :ref:`Instruction Selection
713DAGs <SelectionDAG>`. In many cases, while debugging various parts of the
714compiler, it is nice to instantly visualize these graphs.
715
716LLVM provides several callbacks that are available in a debug build to do
717exactly that. If you call the ``Function::viewCFG()`` method, for example, the
718current LLVM tool will pop up a window containing the CFG for the function where
719each basic block is a node in the graph, and each node contains the instructions
720in the block. Similarly, there also exists ``Function::viewCFGOnly()`` (does
721not include the instructions), the ``MachineFunction::viewCFG()`` and
722``MachineFunction::viewCFGOnly()``, and the ``SelectionDAG::viewGraph()``
723methods. Within GDB, for example, you can usually use something like ``call
724DAG.viewGraph()`` to pop up a window. Alternatively, you can sprinkle calls to
725these functions in your code in places you want to debug.
726
Alp Toker125be842014-06-02 01:40:04 +0000727Getting this to work requires a small amount of setup. On Unix systems
Sean Silvabeb15ca2012-12-04 03:20:08 +0000728with X11, install the `graphviz <http://www.graphviz.org>`_ toolkit, and make
Nico Weberad156922014-03-07 18:08:54 +0000729sure 'dot' and 'gv' are in your path. If you are running on Mac OS X, download
730and install the Mac OS X `Graphviz program
Sean Silvabeb15ca2012-12-04 03:20:08 +0000731<http://www.pixelglow.com/graphviz/>`_ and add
732``/Applications/Graphviz.app/Contents/MacOS/`` (or wherever you install it) to
Alp Toker125be842014-06-02 01:40:04 +0000733your path. The programs need not be present when configuring, building or
734running LLVM and can simply be installed when needed during an active debug
735session.
Sean Silvabeb15ca2012-12-04 03:20:08 +0000736
737``SelectionDAG`` has been extended to make it easier to locate *interesting*
738nodes in large complex graphs. From gdb, if you ``call DAG.setGraphColor(node,
739"color")``, then the next ``call DAG.viewGraph()`` would highlight the node in
740the specified color (choices of colors can be found at `colors
741<http://www.graphviz.org/doc/info/colors.html>`_.) More complex node attributes
742can be provided with ``call DAG.setGraphAttrs(node, "attributes")`` (choices can
743be found at `Graph attributes <http://www.graphviz.org/doc/info/attrs.html>`_.)
744If you want to restart and clear all the current graph attributes, then you can
745``call DAG.clearGraphAttrs()``.
746
747Note that graph visualization features are compiled out of Release builds to
748reduce file size. This means that you need a Debug+Asserts or Release+Asserts
749build to use these features.
750
751.. _datastructure:
752
753Picking the Right Data Structure for a Task
754===========================================
755
756LLVM has a plethora of data structures in the ``llvm/ADT/`` directory, and we
757commonly use STL data structures. This section describes the trade-offs you
758should consider when you pick one.
759
760The first step is a choose your own adventure: do you want a sequential
761container, a set-like container, or a map-like container? The most important
762thing when choosing a container is the algorithmic properties of how you plan to
763access the container. Based on that, you should use:
764
765
766* a :ref:`map-like <ds_map>` container if you need efficient look-up of a
767 value based on another value. Map-like containers also support efficient
768 queries for containment (whether a key is in the map). Map-like containers
769 generally do not support efficient reverse mapping (values to keys). If you
770 need that, use two maps. Some map-like containers also support efficient
771 iteration through the keys in sorted order. Map-like containers are the most
772 expensive sort, only use them if you need one of these capabilities.
773
774* a :ref:`set-like <ds_set>` container if you need to put a bunch of stuff into
775 a container that automatically eliminates duplicates. Some set-like
776 containers support efficient iteration through the elements in sorted order.
777 Set-like containers are more expensive than sequential containers.
778
779* a :ref:`sequential <ds_sequential>` container provides the most efficient way
780 to add elements and keeps track of the order they are added to the collection.
781 They permit duplicates and support efficient iteration, but do not support
782 efficient look-up based on a key.
783
784* a :ref:`string <ds_string>` container is a specialized sequential container or
785 reference structure that is used for character or byte arrays.
786
787* a :ref:`bit <ds_bit>` container provides an efficient way to store and
788 perform set operations on sets of numeric id's, while automatically
789 eliminating duplicates. Bit containers require a maximum of 1 bit for each
790 identifier you want to store.
791
792Once the proper category of container is determined, you can fine tune the
793memory use, constant factors, and cache behaviors of access by intelligently
794picking a member of the category. Note that constant factors and cache behavior
795can be a big deal. If you have a vector that usually only contains a few
796elements (but could contain many), for example, it's much better to use
797:ref:`SmallVector <dss_smallvector>` than :ref:`vector <dss_vector>`. Doing so
798avoids (relatively) expensive malloc/free calls, which dwarf the cost of adding
799the elements to the container.
800
801.. _ds_sequential:
802
803Sequential Containers (std::vector, std::list, etc)
804---------------------------------------------------
805
806There are a variety of sequential containers available for you, based on your
807needs. Pick the first in this section that will do what you want.
808
809.. _dss_arrayref:
810
811llvm/ADT/ArrayRef.h
812^^^^^^^^^^^^^^^^^^^
813
814The ``llvm::ArrayRef`` class is the preferred class to use in an interface that
815accepts a sequential list of elements in memory and just reads from them. By
816taking an ``ArrayRef``, the API can be passed a fixed size array, an
817``std::vector``, an ``llvm::SmallVector`` and anything else that is contiguous
818in memory.
819
820.. _dss_fixedarrays:
821
822Fixed Size Arrays
823^^^^^^^^^^^^^^^^^
824
825Fixed size arrays are very simple and very fast. They are good if you know
826exactly how many elements you have, or you have a (low) upper bound on how many
827you have.
828
829.. _dss_heaparrays:
830
831Heap Allocated Arrays
832^^^^^^^^^^^^^^^^^^^^^
833
834Heap allocated arrays (``new[]`` + ``delete[]``) are also simple. They are good
835if the number of elements is variable, if you know how many elements you will
836need before the array is allocated, and if the array is usually large (if not,
837consider a :ref:`SmallVector <dss_smallvector>`). The cost of a heap allocated
838array is the cost of the new/delete (aka malloc/free). Also note that if you
839are allocating an array of a type with a constructor, the constructor and
840destructors will be run for every element in the array (re-sizable vectors only
841construct those elements actually used).
842
843.. _dss_tinyptrvector:
844
845llvm/ADT/TinyPtrVector.h
846^^^^^^^^^^^^^^^^^^^^^^^^
847
848``TinyPtrVector<Type>`` is a highly specialized collection class that is
849optimized to avoid allocation in the case when a vector has zero or one
850elements. It has two major restrictions: 1) it can only hold values of pointer
851type, and 2) it cannot hold a null pointer.
852
853Since this container is highly specialized, it is rarely used.
854
855.. _dss_smallvector:
856
857llvm/ADT/SmallVector.h
858^^^^^^^^^^^^^^^^^^^^^^
859
860``SmallVector<Type, N>`` is a simple class that looks and smells just like
861``vector<Type>``: it supports efficient iteration, lays out elements in memory
862order (so you can do pointer arithmetic between elements), supports efficient
863push_back/pop_back operations, supports efficient random access to its elements,
864etc.
865
866The advantage of SmallVector is that it allocates space for some number of
867elements (N) **in the object itself**. Because of this, if the SmallVector is
868dynamically smaller than N, no malloc is performed. This can be a big win in
869cases where the malloc/free call is far more expensive than the code that
870fiddles around with the elements.
871
872This is good for vectors that are "usually small" (e.g. the number of
873predecessors/successors of a block is usually less than 8). On the other hand,
874this makes the size of the SmallVector itself large, so you don't want to
875allocate lots of them (doing so will waste a lot of space). As such,
876SmallVectors are most useful when on the stack.
877
878SmallVector also provides a nice portable and efficient replacement for
879``alloca``.
880
Sean Silva4ee92f92013-03-22 23:41:29 +0000881.. note::
882
Sean Silva43590682013-03-22 23:52:38 +0000883 Prefer to use ``SmallVectorImpl<T>`` as a parameter type.
Sean Silva4ee92f92013-03-22 23:41:29 +0000884
885 In APIs that don't care about the "small size" (most?), prefer to use
886 the ``SmallVectorImpl<T>`` class, which is basically just the "vector
887 header" (and methods) without the elements allocated after it. Note that
888 ``SmallVector<T, N>`` inherits from ``SmallVectorImpl<T>`` so the
889 conversion is implicit and costs nothing. E.g.
890
891 .. code-block:: c++
892
893 // BAD: Clients cannot pass e.g. SmallVector<Foo, 4>.
894 hardcodedSmallSize(SmallVector<Foo, 2> &Out);
895 // GOOD: Clients can pass any SmallVector<Foo, N>.
896 allowsAnySmallSize(SmallVectorImpl<Foo> &Out);
897
898 void someFunc() {
899 SmallVector<Foo, 8> Vec;
900 hardcodedSmallSize(Vec); // Error.
901 allowsAnySmallSize(Vec); // Works.
902 }
903
904 Even though it has "``Impl``" in the name, this is so widely used that
905 it really isn't "private to the implementation" anymore. A name like
906 ``SmallVectorHeader`` would be more appropriate.
907
Sean Silvabeb15ca2012-12-04 03:20:08 +0000908.. _dss_vector:
909
910<vector>
911^^^^^^^^
912
913``std::vector`` is well loved and respected. It is useful when SmallVector
914isn't: when the size of the vector is often large (thus the small optimization
915will rarely be a benefit) or if you will be allocating many instances of the
916vector itself (which would waste space for elements that aren't in the
917container). vector is also useful when interfacing with code that expects
918vectors :).
919
920One worthwhile note about std::vector: avoid code like this:
921
922.. code-block:: c++
923
924 for ( ... ) {
925 std::vector<foo> V;
926 // make use of V.
927 }
928
929Instead, write this as:
930
931.. code-block:: c++
932
933 std::vector<foo> V;
934 for ( ... ) {
935 // make use of V.
936 V.clear();
937 }
938
939Doing so will save (at least) one heap allocation and free per iteration of the
940loop.
941
942.. _dss_deque:
943
944<deque>
945^^^^^^^
946
947``std::deque`` is, in some senses, a generalized version of ``std::vector``.
948Like ``std::vector``, it provides constant time random access and other similar
949properties, but it also provides efficient access to the front of the list. It
950does not guarantee continuity of elements within memory.
951
952In exchange for this extra flexibility, ``std::deque`` has significantly higher
953constant factor costs than ``std::vector``. If possible, use ``std::vector`` or
954something cheaper.
955
956.. _dss_list:
957
958<list>
959^^^^^^
960
961``std::list`` is an extremely inefficient class that is rarely useful. It
962performs a heap allocation for every element inserted into it, thus having an
963extremely high constant factor, particularly for small data types.
964``std::list`` also only supports bidirectional iteration, not random access
965iteration.
966
967In exchange for this high cost, std::list supports efficient access to both ends
968of the list (like ``std::deque``, but unlike ``std::vector`` or
969``SmallVector``). In addition, the iterator invalidation characteristics of
970std::list are stronger than that of a vector class: inserting or removing an
971element into the list does not invalidate iterator or pointers to other elements
972in the list.
973
974.. _dss_ilist:
975
976llvm/ADT/ilist.h
977^^^^^^^^^^^^^^^^
978
979``ilist<T>`` implements an 'intrusive' doubly-linked list. It is intrusive,
980because it requires the element to store and provide access to the prev/next
981pointers for the list.
982
983``ilist`` has the same drawbacks as ``std::list``, and additionally requires an
984``ilist_traits`` implementation for the element type, but it provides some novel
985characteristics. In particular, it can efficiently store polymorphic objects,
986the traits class is informed when an element is inserted or removed from the
987list, and ``ilist``\ s are guaranteed to support a constant-time splice
988operation.
989
990These properties are exactly what we want for things like ``Instruction``\ s and
991basic blocks, which is why these are implemented with ``ilist``\ s.
992
993Related classes of interest are explained in the following subsections:
994
995* :ref:`ilist_traits <dss_ilist_traits>`
996
997* :ref:`iplist <dss_iplist>`
998
999* :ref:`llvm/ADT/ilist_node.h <dss_ilist_node>`
1000
1001* :ref:`Sentinels <dss_ilist_sentinel>`
1002
1003.. _dss_packedvector:
1004
1005llvm/ADT/PackedVector.h
1006^^^^^^^^^^^^^^^^^^^^^^^
1007
1008Useful for storing a vector of values using only a few number of bits for each
1009value. Apart from the standard operations of a vector-like container, it can
1010also perform an 'or' set operation.
1011
1012For example:
1013
1014.. code-block:: c++
1015
1016 enum State {
1017 None = 0x0,
1018 FirstCondition = 0x1,
1019 SecondCondition = 0x2,
1020 Both = 0x3
1021 };
1022
1023 State get() {
1024 PackedVector<State, 2> Vec1;
1025 Vec1.push_back(FirstCondition);
1026
1027 PackedVector<State, 2> Vec2;
1028 Vec2.push_back(SecondCondition);
1029
1030 Vec1 |= Vec2;
1031 return Vec1[0]; // returns 'Both'.
1032 }
1033
1034.. _dss_ilist_traits:
1035
1036ilist_traits
1037^^^^^^^^^^^^
1038
1039``ilist_traits<T>`` is ``ilist<T>``'s customization mechanism. ``iplist<T>``
1040(and consequently ``ilist<T>``) publicly derive from this traits class.
1041
1042.. _dss_iplist:
1043
1044iplist
1045^^^^^^
1046
1047``iplist<T>`` is ``ilist<T>``'s base and as such supports a slightly narrower
1048interface. Notably, inserters from ``T&`` are absent.
1049
1050``ilist_traits<T>`` is a public base of this class and can be used for a wide
1051variety of customizations.
1052
1053.. _dss_ilist_node:
1054
1055llvm/ADT/ilist_node.h
1056^^^^^^^^^^^^^^^^^^^^^
1057
Robin Morisset039781e2014-08-29 21:53:01 +00001058``ilist_node<T>`` implements the forward and backward links that are expected
Sean Silvabeb15ca2012-12-04 03:20:08 +00001059by the ``ilist<T>`` (and analogous containers) in the default manner.
1060
1061``ilist_node<T>``\ s are meant to be embedded in the node type ``T``, usually
1062``T`` publicly derives from ``ilist_node<T>``.
1063
1064.. _dss_ilist_sentinel:
1065
1066Sentinels
1067^^^^^^^^^
1068
1069``ilist``\ s have another specialty that must be considered. To be a good
1070citizen in the C++ ecosystem, it needs to support the standard container
1071operations, such as ``begin`` and ``end`` iterators, etc. Also, the
1072``operator--`` must work correctly on the ``end`` iterator in the case of
1073non-empty ``ilist``\ s.
1074
1075The only sensible solution to this problem is to allocate a so-called *sentinel*
1076along with the intrusive list, which serves as the ``end`` iterator, providing
1077the back-link to the last element. However conforming to the C++ convention it
1078is illegal to ``operator++`` beyond the sentinel and it also must not be
1079dereferenced.
1080
1081These constraints allow for some implementation freedom to the ``ilist`` how to
1082allocate and store the sentinel. The corresponding policy is dictated by
1083``ilist_traits<T>``. By default a ``T`` gets heap-allocated whenever the need
1084for a sentinel arises.
1085
1086While the default policy is sufficient in most cases, it may break down when
1087``T`` does not provide a default constructor. Also, in the case of many
1088instances of ``ilist``\ s, the memory overhead of the associated sentinels is
1089wasted. To alleviate the situation with numerous and voluminous
1090``T``-sentinels, sometimes a trick is employed, leading to *ghostly sentinels*.
1091
1092Ghostly sentinels are obtained by specially-crafted ``ilist_traits<T>`` which
1093superpose the sentinel with the ``ilist`` instance in memory. Pointer
1094arithmetic is used to obtain the sentinel, which is relative to the ``ilist``'s
1095``this`` pointer. The ``ilist`` is augmented by an extra pointer, which serves
1096as the back-link of the sentinel. This is the only field in the ghostly
1097sentinel which can be legally accessed.
1098
1099.. _dss_other:
1100
1101Other Sequential Container options
1102^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1103
1104Other STL containers are available, such as ``std::string``.
1105
1106There are also various STL adapter classes such as ``std::queue``,
1107``std::priority_queue``, ``std::stack``, etc. These provide simplified access
1108to an underlying container but don't affect the cost of the container itself.
1109
1110.. _ds_string:
1111
1112String-like containers
1113----------------------
1114
1115There are a variety of ways to pass around and use strings in C and C++, and
1116LLVM adds a few new options to choose from. Pick the first option on this list
1117that will do what you need, they are ordered according to their relative cost.
1118
Ed Maste8ed40ce2015-04-14 20:52:58 +00001119Note that it is generally preferred to *not* pass strings around as ``const
Sean Silvabeb15ca2012-12-04 03:20:08 +00001120char*``'s. These have a number of problems, including the fact that they
1121cannot represent embedded nul ("\0") characters, and do not have a length
1122available efficiently. The general replacement for '``const char*``' is
1123StringRef.
1124
1125For more information on choosing string containers for APIs, please see
1126:ref:`Passing Strings <string_apis>`.
1127
1128.. _dss_stringref:
1129
1130llvm/ADT/StringRef.h
1131^^^^^^^^^^^^^^^^^^^^
1132
1133The StringRef class is a simple value class that contains a pointer to a
1134character and a length, and is quite related to the :ref:`ArrayRef
1135<dss_arrayref>` class (but specialized for arrays of characters). Because
1136StringRef carries a length with it, it safely handles strings with embedded nul
1137characters in it, getting the length does not require a strlen call, and it even
1138has very convenient APIs for slicing and dicing the character range that it
1139represents.
1140
1141StringRef is ideal for passing simple strings around that are known to be live,
1142either because they are C string literals, std::string, a C array, or a
1143SmallVector. Each of these cases has an efficient implicit conversion to
1144StringRef, which doesn't result in a dynamic strlen being executed.
1145
1146StringRef has a few major limitations which make more powerful string containers
1147useful:
1148
1149#. You cannot directly convert a StringRef to a 'const char*' because there is
1150 no way to add a trailing nul (unlike the .c_str() method on various stronger
1151 classes).
1152
1153#. StringRef doesn't own or keep alive the underlying string bytes.
1154 As such it can easily lead to dangling pointers, and is not suitable for
1155 embedding in datastructures in most cases (instead, use an std::string or
1156 something like that).
1157
1158#. For the same reason, StringRef cannot be used as the return value of a
1159 method if the method "computes" the result string. Instead, use std::string.
1160
1161#. StringRef's do not allow you to mutate the pointed-to string bytes and it
1162 doesn't allow you to insert or remove bytes from the range. For editing
1163 operations like this, it interoperates with the :ref:`Twine <dss_twine>`
1164 class.
1165
1166Because of its strengths and limitations, it is very common for a function to
1167take a StringRef and for a method on an object to return a StringRef that points
1168into some string that it owns.
1169
1170.. _dss_twine:
1171
1172llvm/ADT/Twine.h
1173^^^^^^^^^^^^^^^^
1174
1175The Twine class is used as an intermediary datatype for APIs that want to take a
1176string that can be constructed inline with a series of concatenations. Twine
1177works by forming recursive instances of the Twine datatype (a simple value
1178object) on the stack as temporary objects, linking them together into a tree
1179which is then linearized when the Twine is consumed. Twine is only safe to use
1180as the argument to a function, and should always be a const reference, e.g.:
1181
1182.. code-block:: c++
1183
1184 void foo(const Twine &T);
1185 ...
1186 StringRef X = ...
1187 unsigned i = ...
1188 foo(X + "." + Twine(i));
1189
1190This example forms a string like "blarg.42" by concatenating the values
1191together, and does not form intermediate strings containing "blarg" or "blarg.".
1192
1193Because Twine is constructed with temporary objects on the stack, and because
1194these instances are destroyed at the end of the current statement, it is an
1195inherently dangerous API. For example, this simple variant contains undefined
1196behavior and will probably crash:
1197
1198.. code-block:: c++
1199
1200 void foo(const Twine &T);
1201 ...
1202 StringRef X = ...
1203 unsigned i = ...
1204 const Twine &Tmp = X + "." + Twine(i);
1205 foo(Tmp);
1206
1207... because the temporaries are destroyed before the call. That said, Twine's
1208are much more efficient than intermediate std::string temporaries, and they work
1209really well with StringRef. Just be aware of their limitations.
1210
1211.. _dss_smallstring:
1212
1213llvm/ADT/SmallString.h
1214^^^^^^^^^^^^^^^^^^^^^^
1215
1216SmallString is a subclass of :ref:`SmallVector <dss_smallvector>` that adds some
1217convenience APIs like += that takes StringRef's. SmallString avoids allocating
1218memory in the case when the preallocated space is enough to hold its data, and
1219it calls back to general heap allocation when required. Since it owns its data,
1220it is very safe to use and supports full mutation of the string.
1221
1222Like SmallVector's, the big downside to SmallString is their sizeof. While they
1223are optimized for small strings, they themselves are not particularly small.
1224This means that they work great for temporary scratch buffers on the stack, but
1225should not generally be put into the heap: it is very rare to see a SmallString
1226as the member of a frequently-allocated heap data structure or returned
1227by-value.
1228
1229.. _dss_stdstring:
1230
1231std::string
1232^^^^^^^^^^^
1233
1234The standard C++ std::string class is a very general class that (like
1235SmallString) owns its underlying data. sizeof(std::string) is very reasonable
1236so it can be embedded into heap data structures and returned by-value. On the
1237other hand, std::string is highly inefficient for inline editing (e.g.
1238concatenating a bunch of stuff together) and because it is provided by the
1239standard library, its performance characteristics depend a lot of the host
1240standard library (e.g. libc++ and MSVC provide a highly optimized string class,
1241GCC contains a really slow implementation).
1242
1243The major disadvantage of std::string is that almost every operation that makes
1244them larger can allocate memory, which is slow. As such, it is better to use
1245SmallVector or Twine as a scratch buffer, but then use std::string to persist
1246the result.
1247
1248.. _ds_set:
1249
1250Set-Like Containers (std::set, SmallSet, SetVector, etc)
1251--------------------------------------------------------
1252
1253Set-like containers are useful when you need to canonicalize multiple values
1254into a single representation. There are several different choices for how to do
1255this, providing various trade-offs.
1256
1257.. _dss_sortedvectorset:
1258
1259A sorted 'vector'
1260^^^^^^^^^^^^^^^^^
1261
1262If you intend to insert a lot of elements, then do a lot of queries, a great
1263approach is to use a vector (or other sequential container) with
1264std::sort+std::unique to remove duplicates. This approach works really well if
1265your usage pattern has these two distinct phases (insert then query), and can be
1266coupled with a good choice of :ref:`sequential container <ds_sequential>`.
1267
1268This combination provides the several nice properties: the result data is
1269contiguous in memory (good for cache locality), has few allocations, is easy to
1270address (iterators in the final vector are just indices or pointers), and can be
Sean Silvac9fbd232013-03-29 21:57:47 +00001271efficiently queried with a standard binary search (e.g.
1272``std::lower_bound``; if you want the whole range of elements comparing
1273equal, use ``std::equal_range``).
Sean Silvabeb15ca2012-12-04 03:20:08 +00001274
1275.. _dss_smallset:
1276
1277llvm/ADT/SmallSet.h
1278^^^^^^^^^^^^^^^^^^^
1279
1280If you have a set-like data structure that is usually small and whose elements
1281are reasonably small, a ``SmallSet<Type, N>`` is a good choice. This set has
1282space for N elements in place (thus, if the set is dynamically smaller than N,
1283no malloc traffic is required) and accesses them with a simple linear search.
Artyom Skrobov62641152015-05-19 10:21:12 +00001284When the set grows beyond N elements, it allocates a more expensive
Sean Silvabeb15ca2012-12-04 03:20:08 +00001285representation that guarantees efficient access (for most types, it falls back
Artyom Skrobov62641152015-05-19 10:21:12 +00001286to :ref:`std::set <dss_set>`, but for pointers it uses something far better,
1287:ref:`SmallPtrSet <dss_smallptrset>`.
Sean Silvabeb15ca2012-12-04 03:20:08 +00001288
1289The magic of this class is that it handles small sets extremely efficiently, but
1290gracefully handles extremely large sets without loss of efficiency. The
1291drawback is that the interface is quite small: it supports insertion, queries
1292and erasing, but does not support iteration.
1293
1294.. _dss_smallptrset:
1295
1296llvm/ADT/SmallPtrSet.h
1297^^^^^^^^^^^^^^^^^^^^^^
1298
Artyom Skrobov62641152015-05-19 10:21:12 +00001299``SmallPtrSet`` has all the advantages of ``SmallSet`` (and a ``SmallSet`` of
Sean Silvabeb15ca2012-12-04 03:20:08 +00001300pointers is transparently implemented with a ``SmallPtrSet``), but also supports
Artyom Skrobov62641152015-05-19 10:21:12 +00001301iterators. If more than N insertions are performed, a single quadratically
Sean Silvabeb15ca2012-12-04 03:20:08 +00001302probed hash table is allocated and grows as needed, providing extremely
1303efficient access (constant time insertion/deleting/queries with low constant
1304factors) and is very stingy with malloc traffic.
1305
Artyom Skrobov62641152015-05-19 10:21:12 +00001306Note that, unlike :ref:`std::set <dss_set>`, the iterators of ``SmallPtrSet``
1307are invalidated whenever an insertion occurs. Also, the values visited by the
1308iterators are not visited in sorted order.
1309
1310.. _dss_stringset:
1311
1312llvm/ADT/StringSet.h
1313^^^^^^^^^^^^^^^^^^^^
1314
1315``StringSet`` is a thin wrapper around :ref:`StringMap\<char\> <dss_stringmap>`,
1316and it allows efficient storage and retrieval of unique strings.
1317
Sylvestre Ledru84666a12016-02-14 20:16:22 +00001318Functionally analogous to ``SmallSet<StringRef>``, ``StringSet`` also supports
Artyom Skrobov62641152015-05-19 10:21:12 +00001319iteration. (The iterator dereferences to a ``StringMapEntry<char>``, so you
1320need to call ``i->getKey()`` to access the item of the StringSet.) On the
1321other hand, ``StringSet`` doesn't support range-insertion and
1322copy-construction, which :ref:`SmallSet <dss_smallset>` and :ref:`SmallPtrSet
1323<dss_smallptrset>` do support.
Sean Silvabeb15ca2012-12-04 03:20:08 +00001324
1325.. _dss_denseset:
1326
1327llvm/ADT/DenseSet.h
1328^^^^^^^^^^^^^^^^^^^
1329
1330DenseSet is a simple quadratically probed hash table. It excels at supporting
1331small values: it uses a single allocation to hold all of the pairs that are
1332currently inserted in the set. DenseSet is a great way to unique small values
1333that are not simple pointers (use :ref:`SmallPtrSet <dss_smallptrset>` for
1334pointers). Note that DenseSet has the same requirements for the value type that
1335:ref:`DenseMap <dss_densemap>` has.
1336
1337.. _dss_sparseset:
1338
1339llvm/ADT/SparseSet.h
1340^^^^^^^^^^^^^^^^^^^^
1341
1342SparseSet holds a small number of objects identified by unsigned keys of
1343moderate size. It uses a lot of memory, but provides operations that are almost
1344as fast as a vector. Typical keys are physical registers, virtual registers, or
1345numbered basic blocks.
1346
1347SparseSet is useful for algorithms that need very fast clear/find/insert/erase
1348and fast iteration over small sets. It is not intended for building composite
1349data structures.
1350
Michael Ilseman830875b2013-01-21 21:46:32 +00001351.. _dss_sparsemultiset:
1352
1353llvm/ADT/SparseMultiSet.h
1354^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1355
1356SparseMultiSet adds multiset behavior to SparseSet, while retaining SparseSet's
1357desirable attributes. Like SparseSet, it typically uses a lot of memory, but
1358provides operations that are almost as fast as a vector. Typical keys are
1359physical registers, virtual registers, or numbered basic blocks.
1360
1361SparseMultiSet is useful for algorithms that need very fast
1362clear/find/insert/erase of the entire collection, and iteration over sets of
1363elements sharing a key. It is often a more efficient choice than using composite
1364data structures (e.g. vector-of-vectors, map-of-vectors). It is not intended for
1365building composite data structures.
1366
Sean Silvabeb15ca2012-12-04 03:20:08 +00001367.. _dss_FoldingSet:
1368
1369llvm/ADT/FoldingSet.h
1370^^^^^^^^^^^^^^^^^^^^^
1371
1372FoldingSet is an aggregate class that is really good at uniquing
1373expensive-to-create or polymorphic objects. It is a combination of a chained
1374hash table with intrusive links (uniqued objects are required to inherit from
1375FoldingSetNode) that uses :ref:`SmallVector <dss_smallvector>` as part of its ID
1376process.
1377
1378Consider a case where you want to implement a "getOrCreateFoo" method for a
1379complex object (for example, a node in the code generator). The client has a
1380description of **what** it wants to generate (it knows the opcode and all the
1381operands), but we don't want to 'new' a node, then try inserting it into a set
1382only to find out it already exists, at which point we would have to delete it
1383and return the node that already exists.
1384
1385To support this style of client, FoldingSet perform a query with a
1386FoldingSetNodeID (which wraps SmallVector) that can be used to describe the
1387element that we want to query for. The query either returns the element
1388matching the ID or it returns an opaque ID that indicates where insertion should
1389take place. Construction of the ID usually does not require heap traffic.
1390
1391Because FoldingSet uses intrusive links, it can support polymorphic objects in
1392the set (for example, you can have SDNode instances mixed with LoadSDNodes).
1393Because the elements are individually allocated, pointers to the elements are
1394stable: inserting or removing elements does not invalidate any pointers to other
1395elements.
1396
1397.. _dss_set:
1398
1399<set>
1400^^^^^
1401
1402``std::set`` is a reasonable all-around set class, which is decent at many
1403things but great at nothing. std::set allocates memory for each element
1404inserted (thus it is very malloc intensive) and typically stores three pointers
1405per element in the set (thus adding a large amount of per-element space
1406overhead). It offers guaranteed log(n) performance, which is not particularly
1407fast from a complexity standpoint (particularly if the elements of the set are
1408expensive to compare, like strings), and has extremely high constant factors for
1409lookup, insertion and removal.
1410
1411The advantages of std::set are that its iterators are stable (deleting or
1412inserting an element from the set does not affect iterators or pointers to other
1413elements) and that iteration over the set is guaranteed to be in sorted order.
1414If the elements in the set are large, then the relative overhead of the pointers
1415and malloc traffic is not a big deal, but if the elements of the set are small,
1416std::set is almost never a good choice.
1417
1418.. _dss_setvector:
1419
1420llvm/ADT/SetVector.h
1421^^^^^^^^^^^^^^^^^^^^
1422
1423LLVM's ``SetVector<Type>`` is an adapter class that combines your choice of a
1424set-like container along with a :ref:`Sequential Container <ds_sequential>` The
1425important property that this provides is efficient insertion with uniquing
1426(duplicate elements are ignored) with iteration support. It implements this by
1427inserting elements into both a set-like container and the sequential container,
1428using the set-like container for uniquing and the sequential container for
1429iteration.
1430
1431The difference between SetVector and other sets is that the order of iteration
1432is guaranteed to match the order of insertion into the SetVector. This property
1433is really important for things like sets of pointers. Because pointer values
1434are non-deterministic (e.g. vary across runs of the program on different
1435machines), iterating over the pointers in the set will not be in a well-defined
1436order.
1437
1438The drawback of SetVector is that it requires twice as much space as a normal
1439set and has the sum of constant factors from the set-like container and the
1440sequential container that it uses. Use it **only** if you need to iterate over
1441the elements in a deterministic order. SetVector is also expensive to delete
Paul Robinson687915f2013-11-14 18:47:23 +00001442elements out of (linear time), unless you use its "pop_back" method, which is
Sean Silvabeb15ca2012-12-04 03:20:08 +00001443faster.
1444
1445``SetVector`` is an adapter class that defaults to using ``std::vector`` and a
1446size 16 ``SmallSet`` for the underlying containers, so it is quite expensive.
1447However, ``"llvm/ADT/SetVector.h"`` also provides a ``SmallSetVector`` class,
1448which defaults to using a ``SmallVector`` and ``SmallSet`` of a specified size.
1449If you use this, and if your sets are dynamically smaller than ``N``, you will
1450save a lot of heap traffic.
1451
1452.. _dss_uniquevector:
1453
1454llvm/ADT/UniqueVector.h
1455^^^^^^^^^^^^^^^^^^^^^^^
1456
1457UniqueVector is similar to :ref:`SetVector <dss_setvector>` but it retains a
1458unique ID for each element inserted into the set. It internally contains a map
1459and a vector, and it assigns a unique ID for each value inserted into the set.
1460
1461UniqueVector is very expensive: its cost is the sum of the cost of maintaining
1462both the map and vector, it has high complexity, high constant factors, and
1463produces a lot of malloc traffic. It should be avoided.
1464
1465.. _dss_immutableset:
1466
1467llvm/ADT/ImmutableSet.h
1468^^^^^^^^^^^^^^^^^^^^^^^
1469
1470ImmutableSet is an immutable (functional) set implementation based on an AVL
1471tree. Adding or removing elements is done through a Factory object and results
1472in the creation of a new ImmutableSet object. If an ImmutableSet already exists
1473with the given contents, then the existing one is returned; equality is compared
1474with a FoldingSetNodeID. The time and space complexity of add or remove
1475operations is logarithmic in the size of the original set.
1476
1477There is no method for returning an element of the set, you can only check for
1478membership.
1479
1480.. _dss_otherset:
1481
1482Other Set-Like Container Options
1483^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1484
1485The STL provides several other options, such as std::multiset and the various
1486"hash_set" like containers (whether from C++ TR1 or from the SGI library). We
1487never use hash_set and unordered_set because they are generally very expensive
1488(each insertion requires a malloc) and very non-portable.
1489
1490std::multiset is useful if you're not interested in elimination of duplicates,
Artyom Skrobov62641152015-05-19 10:21:12 +00001491but has all the drawbacks of :ref:`std::set <dss_set>`. A sorted vector
1492(where you don't delete duplicate entries) or some other approach is almost
Aaron Ballman9f154f62015-07-29 15:57:49 +00001493always better.
Sean Silvabeb15ca2012-12-04 03:20:08 +00001494
1495.. _ds_map:
1496
1497Map-Like Containers (std::map, DenseMap, etc)
1498---------------------------------------------
1499
1500Map-like containers are useful when you want to associate data to a key. As
1501usual, there are a lot of different ways to do this. :)
1502
1503.. _dss_sortedvectormap:
1504
1505A sorted 'vector'
1506^^^^^^^^^^^^^^^^^
1507
1508If your usage pattern follows a strict insert-then-query approach, you can
1509trivially use the same approach as :ref:`sorted vectors for set-like containers
1510<dss_sortedvectorset>`. The only difference is that your query function (which
1511uses std::lower_bound to get efficient log(n) lookup) should only compare the
1512key, not both the key and value. This yields the same advantages as sorted
1513vectors for sets.
1514
1515.. _dss_stringmap:
1516
1517llvm/ADT/StringMap.h
1518^^^^^^^^^^^^^^^^^^^^
1519
1520Strings are commonly used as keys in maps, and they are difficult to support
1521efficiently: they are variable length, inefficient to hash and compare when
1522long, expensive to copy, etc. StringMap is a specialized container designed to
1523cope with these issues. It supports mapping an arbitrary range of bytes to an
1524arbitrary other object.
1525
1526The StringMap implementation uses a quadratically-probed hash table, where the
1527buckets store a pointer to the heap allocated entries (and some other stuff).
1528The entries in the map must be heap allocated because the strings are variable
1529length. The string data (key) and the element object (value) are stored in the
1530same allocation with the string data immediately after the element object.
1531This container guarantees the "``(char*)(&Value+1)``" points to the key string
1532for a value.
1533
1534The StringMap is very fast for several reasons: quadratic probing is very cache
1535efficient for lookups, the hash value of strings in buckets is not recomputed
1536when looking up an element, StringMap rarely has to touch the memory for
1537unrelated objects when looking up a value (even when hash collisions happen),
1538hash table growth does not recompute the hash values for strings already in the
1539table, and each pair in the map is store in a single allocation (the string data
1540is stored in the same allocation as the Value of a pair).
1541
1542StringMap also provides query methods that take byte ranges, so it only ever
1543copies a string if a value is inserted into the table.
1544
1545StringMap iteratation order, however, is not guaranteed to be deterministic, so
1546any uses which require that should instead use a std::map.
1547
1548.. _dss_indexmap:
1549
1550llvm/ADT/IndexedMap.h
1551^^^^^^^^^^^^^^^^^^^^^
1552
1553IndexedMap is a specialized container for mapping small dense integers (or
1554values that can be mapped to small dense integers) to some other type. It is
1555internally implemented as a vector with a mapping function that maps the keys
1556to the dense integer range.
1557
1558This is useful for cases like virtual registers in the LLVM code generator: they
1559have a dense mapping that is offset by a compile-time constant (the first
1560virtual register ID).
1561
1562.. _dss_densemap:
1563
1564llvm/ADT/DenseMap.h
1565^^^^^^^^^^^^^^^^^^^
1566
1567DenseMap is a simple quadratically probed hash table. It excels at supporting
1568small keys and values: it uses a single allocation to hold all of the pairs
1569that are currently inserted in the map. DenseMap is a great way to map
1570pointers to pointers, or map other small types to each other.
1571
1572There are several aspects of DenseMap that you should be aware of, however.
1573The iterators in a DenseMap are invalidated whenever an insertion occurs,
1574unlike map. Also, because DenseMap allocates space for a large number of
1575key/value pairs (it starts with 64 by default), it will waste a lot of space if
1576your keys or values are large. Finally, you must implement a partial
1577specialization of DenseMapInfo for the key that you want, if it isn't already
1578supported. This is required to tell DenseMap about two special marker values
1579(which can never be inserted into the map) that it needs internally.
1580
1581DenseMap's find_as() method supports lookup operations using an alternate key
1582type. This is useful in cases where the normal key type is expensive to
1583construct, but cheap to compare against. The DenseMapInfo is responsible for
1584defining the appropriate comparison and hashing methods for each alternate key
1585type used.
1586
1587.. _dss_valuemap:
1588
Chandler Carrutha4ea2692014-03-04 11:26:31 +00001589llvm/IR/ValueMap.h
Sean Silvabeb15ca2012-12-04 03:20:08 +00001590^^^^^^^^^^^^^^^^^^^
1591
1592ValueMap is a wrapper around a :ref:`DenseMap <dss_densemap>` mapping
1593``Value*``\ s (or subclasses) to another type. When a Value is deleted or
1594RAUW'ed, ValueMap will update itself so the new version of the key is mapped to
1595the same value, just as if the key were a WeakVH. You can configure exactly how
1596this happens, and what else happens on these two events, by passing a ``Config``
1597parameter to the ValueMap template.
1598
1599.. _dss_intervalmap:
1600
1601llvm/ADT/IntervalMap.h
1602^^^^^^^^^^^^^^^^^^^^^^
1603
1604IntervalMap is a compact map for small keys and values. It maps key intervals
1605instead of single keys, and it will automatically coalesce adjacent intervals.
Hans Wennborg8888d5b2015-01-17 03:19:21 +00001606When the map only contains a few intervals, they are stored in the map object
Sean Silvabeb15ca2012-12-04 03:20:08 +00001607itself to avoid allocations.
1608
1609The IntervalMap iterators are quite big, so they should not be passed around as
1610STL iterators. The heavyweight iterators allow a smaller data structure.
1611
1612.. _dss_map:
1613
1614<map>
1615^^^^^
1616
1617std::map has similar characteristics to :ref:`std::set <dss_set>`: it uses a
1618single allocation per pair inserted into the map, it offers log(n) lookup with
1619an extremely large constant factor, imposes a space penalty of 3 pointers per
1620pair in the map, etc.
1621
1622std::map is most useful when your keys or values are very large, if you need to
1623iterate over the collection in sorted order, or if you need stable iterators
1624into the map (i.e. they don't get invalidated if an insertion or deletion of
1625another element takes place).
1626
1627.. _dss_mapvector:
1628
1629llvm/ADT/MapVector.h
1630^^^^^^^^^^^^^^^^^^^^
1631
1632``MapVector<KeyT,ValueT>`` provides a subset of the DenseMap interface. The
1633main difference is that the iteration order is guaranteed to be the insertion
1634order, making it an easy (but somewhat expensive) solution for non-deterministic
1635iteration over maps of pointers.
1636
1637It is implemented by mapping from key to an index in a vector of key,value
Duncan P. N. Exon Smithf51601c2014-07-15 20:24:56 +00001638pairs. This provides fast lookup and iteration, but has two main drawbacks:
1639the key is stored twice and removing elements takes linear time. If it is
1640necessary to remove elements, it's best to remove them in bulk using
1641``remove_if()``.
Sean Silvabeb15ca2012-12-04 03:20:08 +00001642
1643.. _dss_inteqclasses:
1644
1645llvm/ADT/IntEqClasses.h
1646^^^^^^^^^^^^^^^^^^^^^^^
1647
1648IntEqClasses provides a compact representation of equivalence classes of small
1649integers. Initially, each integer in the range 0..n-1 has its own equivalence
1650class. Classes can be joined by passing two class representatives to the
1651join(a, b) method. Two integers are in the same class when findLeader() returns
1652the same representative.
1653
1654Once all equivalence classes are formed, the map can be compressed so each
1655integer 0..n-1 maps to an equivalence class number in the range 0..m-1, where m
1656is the total number of equivalence classes. The map must be uncompressed before
1657it can be edited again.
1658
1659.. _dss_immutablemap:
1660
1661llvm/ADT/ImmutableMap.h
1662^^^^^^^^^^^^^^^^^^^^^^^
1663
1664ImmutableMap is an immutable (functional) map implementation based on an AVL
1665tree. Adding or removing elements is done through a Factory object and results
1666in the creation of a new ImmutableMap object. If an ImmutableMap already exists
1667with the given key set, then the existing one is returned; equality is compared
1668with a FoldingSetNodeID. The time and space complexity of add or remove
1669operations is logarithmic in the size of the original map.
1670
1671.. _dss_othermap:
1672
1673Other Map-Like Container Options
1674^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1675
1676The STL provides several other options, such as std::multimap and the various
1677"hash_map" like containers (whether from C++ TR1 or from the SGI library). We
1678never use hash_set and unordered_set because they are generally very expensive
1679(each insertion requires a malloc) and very non-portable.
1680
1681std::multimap is useful if you want to map a key to multiple values, but has all
1682the drawbacks of std::map. A sorted vector or some other approach is almost
1683always better.
1684
1685.. _ds_bit:
1686
1687Bit storage containers (BitVector, SparseBitVector)
1688---------------------------------------------------
1689
1690Unlike the other containers, there are only two bit storage containers, and
1691choosing when to use each is relatively straightforward.
1692
1693One additional option is ``std::vector<bool>``: we discourage its use for two
1694reasons 1) the implementation in many common compilers (e.g. commonly
1695available versions of GCC) is extremely inefficient and 2) the C++ standards
1696committee is likely to deprecate this container and/or change it significantly
1697somehow. In any case, please don't use it.
1698
1699.. _dss_bitvector:
1700
1701BitVector
1702^^^^^^^^^
1703
1704The BitVector container provides a dynamic size set of bits for manipulation.
1705It supports individual bit setting/testing, as well as set operations. The set
1706operations take time O(size of bitvector), but operations are performed one word
1707at a time, instead of one bit at a time. This makes the BitVector very fast for
1708set operations compared to other containers. Use the BitVector when you expect
1709the number of set bits to be high (i.e. a dense set).
1710
1711.. _dss_smallbitvector:
1712
1713SmallBitVector
1714^^^^^^^^^^^^^^
1715
1716The SmallBitVector container provides the same interface as BitVector, but it is
1717optimized for the case where only a small number of bits, less than 25 or so,
1718are needed. It also transparently supports larger bit counts, but slightly less
1719efficiently than a plain BitVector, so SmallBitVector should only be used when
1720larger counts are rare.
1721
1722At this time, SmallBitVector does not support set operations (and, or, xor), and
1723its operator[] does not provide an assignable lvalue.
1724
1725.. _dss_sparsebitvector:
1726
1727SparseBitVector
1728^^^^^^^^^^^^^^^
1729
1730The SparseBitVector container is much like BitVector, with one major difference:
1731Only the bits that are set, are stored. This makes the SparseBitVector much
1732more space efficient than BitVector when the set is sparse, as well as making
1733set operations O(number of set bits) instead of O(size of universe). The
1734downside to the SparseBitVector is that setting and testing of random bits is
1735O(N), and on large SparseBitVectors, this can be slower than BitVector. In our
1736implementation, setting or testing bits in sorted order (either forwards or
1737reverse) is O(1) worst case. Testing and setting bits within 128 bits (depends
1738on size) of the current bit is also O(1). As a general statement,
1739testing/setting bits in a SparseBitVector is O(distance away from last set bit).
1740
1741.. _common:
1742
1743Helpful Hints for Common Operations
1744===================================
1745
1746This section describes how to perform some very simple transformations of LLVM
1747code. This is meant to give examples of common idioms used, showing the
1748practical side of LLVM transformations.
1749
1750Because this is a "how-to" section, you should also read about the main classes
1751that you will be working with. The :ref:`Core LLVM Class Hierarchy Reference
1752<coreclasses>` contains details and descriptions of the main classes that you
1753should know about.
1754
1755.. _inspection:
1756
1757Basic Inspection and Traversal Routines
1758---------------------------------------
1759
1760The LLVM compiler infrastructure have many different data structures that may be
1761traversed. Following the example of the C++ standard template library, the
1762techniques used to traverse these various data structures are all basically the
1763same. For a enumerable sequence of values, the ``XXXbegin()`` function (or
1764method) returns an iterator to the start of the sequence, the ``XXXend()``
1765function returns an iterator pointing to one past the last valid element of the
1766sequence, and there is some ``XXXiterator`` data type that is common between the
1767two operations.
1768
1769Because the pattern for iteration is common across many different aspects of the
1770program representation, the standard template library algorithms may be used on
1771them, and it is easier to remember how to iterate. First we show a few common
1772examples of the data structures that need to be traversed. Other data
1773structures are traversed in very similar ways.
1774
1775.. _iterate_function:
1776
1777Iterating over the ``BasicBlock`` in a ``Function``
1778^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1779
1780It's quite common to have a ``Function`` instance that you'd like to transform
1781in some way; in particular, you'd like to manipulate its ``BasicBlock``\ s. To
1782facilitate this, you'll need to iterate over all of the ``BasicBlock``\ s that
1783constitute the ``Function``. The following is an example that prints the name
1784of a ``BasicBlock`` and the number of ``Instruction``\ s it contains:
1785
1786.. code-block:: c++
1787
1788 // func is a pointer to a Function instance
1789 for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i)
1790 // Print out the name of the basic block if it has one, and then the
1791 // number of instructions that it contains
1792 errs() << "Basic block (name=" << i->getName() << ") has "
1793 << i->size() << " instructions.\n";
1794
1795Note that i can be used as if it were a pointer for the purposes of invoking
1796member functions of the ``Instruction`` class. This is because the indirection
1797operator is overloaded for the iterator classes. In the above code, the
1798expression ``i->size()`` is exactly equivalent to ``(*i).size()`` just like
1799you'd expect.
1800
1801.. _iterate_basicblock:
1802
1803Iterating over the ``Instruction`` in a ``BasicBlock``
1804^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1805
1806Just like when dealing with ``BasicBlock``\ s in ``Function``\ s, it's easy to
1807iterate over the individual instructions that make up ``BasicBlock``\ s. Here's
1808a code snippet that prints out each instruction in a ``BasicBlock``:
1809
1810.. code-block:: c++
1811
1812 // blk is a pointer to a BasicBlock instance
1813 for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i)
1814 // The next statement works since operator<<(ostream&,...)
1815 // is overloaded for Instruction&
1816 errs() << *i << "\n";
1817
1818
1819However, this isn't really the best way to print out the contents of a
1820``BasicBlock``! Since the ostream operators are overloaded for virtually
1821anything you'll care about, you could have just invoked the print routine on the
1822basic block itself: ``errs() << *blk << "\n";``.
1823
1824.. _iterate_insiter:
1825
1826Iterating over the ``Instruction`` in a ``Function``
1827^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1828
1829If you're finding that you commonly iterate over a ``Function``'s
1830``BasicBlock``\ s and then that ``BasicBlock``'s ``Instruction``\ s,
1831``InstIterator`` should be used instead. You'll need to include
Yaron Kerend9c0bed2014-05-03 11:30:49 +00001832``llvm/IR/InstIterator.h`` (`doxygen
Yaron Keren81bb4152014-05-03 12:06:13 +00001833<http://llvm.org/doxygen/InstIterator_8h.html>`__) and then instantiate
Sean Silvabeb15ca2012-12-04 03:20:08 +00001834``InstIterator``\ s explicitly in your code. Here's a small example that shows
1835how to dump all instructions in a function to the standard error stream:
1836
1837.. code-block:: c++
1838
Yaron Kerend9c0bed2014-05-03 11:30:49 +00001839 #include "llvm/IR/InstIterator.h"
Sean Silvabeb15ca2012-12-04 03:20:08 +00001840
1841 // F is a pointer to a Function instance
1842 for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
1843 errs() << *I << "\n";
1844
1845Easy, isn't it? You can also use ``InstIterator``\ s to fill a work list with
1846its initial contents. For example, if you wanted to initialize a work list to
1847contain all instructions in a ``Function`` F, all you would need to do is
1848something like:
1849
1850.. code-block:: c++
1851
1852 std::set<Instruction*> worklist;
1853 // or better yet, SmallPtrSet<Instruction*, 64> worklist;
1854
1855 for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
1856 worklist.insert(&*I);
1857
1858The STL set ``worklist`` would now contain all instructions in the ``Function``
1859pointed to by F.
1860
1861.. _iterate_convert:
1862
1863Turning an iterator into a class pointer (and vice-versa)
1864^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1865
1866Sometimes, it'll be useful to grab a reference (or pointer) to a class instance
1867when all you've got at hand is an iterator. Well, extracting a reference or a
1868pointer from an iterator is very straight-forward. Assuming that ``i`` is a
1869``BasicBlock::iterator`` and ``j`` is a ``BasicBlock::const_iterator``:
1870
1871.. code-block:: c++
1872
1873 Instruction& inst = *i; // Grab reference to instruction reference
1874 Instruction* pinst = &*i; // Grab pointer to instruction reference
1875 const Instruction& inst = *j;
1876
1877However, the iterators you'll be working with in the LLVM framework are special:
1878they will automatically convert to a ptr-to-instance type whenever they need to.
1879Instead of derferencing the iterator and then taking the address of the result,
1880you can simply assign the iterator to the proper pointer type and you get the
1881dereference and address-of operation as a result of the assignment (behind the
Charlie Turner2ac115e2015-04-16 17:01:23 +00001882scenes, this is a result of overloading casting mechanisms). Thus the second
1883line of the last example,
Sean Silvabeb15ca2012-12-04 03:20:08 +00001884
1885.. code-block:: c++
1886
1887 Instruction *pinst = &*i;
1888
1889is semantically equivalent to
1890
1891.. code-block:: c++
1892
1893 Instruction *pinst = i;
1894
1895It's also possible to turn a class pointer into the corresponding iterator, and
1896this is a constant time operation (very efficient). The following code snippet
1897illustrates use of the conversion constructors provided by LLVM iterators. By
1898using these, you can explicitly grab the iterator of something without actually
1899obtaining it via iteration over some structure:
1900
1901.. code-block:: c++
1902
1903 void printNextInstruction(Instruction* inst) {
1904 BasicBlock::iterator it(inst);
1905 ++it; // After this line, it refers to the instruction after *inst
1906 if (it != inst->getParent()->end()) errs() << *it << "\n";
1907 }
1908
1909Unfortunately, these implicit conversions come at a cost; they prevent these
1910iterators from conforming to standard iterator conventions, and thus from being
1911usable with standard algorithms and containers. For example, they prevent the
1912following code, where ``B`` is a ``BasicBlock``, from compiling:
1913
1914.. code-block:: c++
1915
1916 llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end());
1917
1918Because of this, these implicit conversions may be removed some day, and
1919``operator*`` changed to return a pointer instead of a reference.
1920
1921.. _iterate_complex:
1922
1923Finding call sites: a slightly more complex example
1924^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1925
1926Say that you're writing a FunctionPass and would like to count all the locations
1927in the entire module (that is, across every ``Function``) where a certain
1928function (i.e., some ``Function *``) is already in scope. As you'll learn
1929later, you may want to use an ``InstVisitor`` to accomplish this in a much more
1930straight-forward manner, but this example will allow us to explore how you'd do
1931it if you didn't have ``InstVisitor`` around. In pseudo-code, this is what we
1932want to do:
1933
1934.. code-block:: none
1935
1936 initialize callCounter to zero
1937 for each Function f in the Module
1938 for each BasicBlock b in f
1939 for each Instruction i in b
1940 if (i is a CallInst and calls the given function)
1941 increment callCounter
1942
1943And the actual code is (remember, because we're writing a ``FunctionPass``, our
1944``FunctionPass``-derived class simply has to override the ``runOnFunction``
1945method):
1946
1947.. code-block:: c++
1948
1949 Function* targetFunc = ...;
1950
1951 class OurFunctionPass : public FunctionPass {
1952 public:
1953 OurFunctionPass(): callCounter(0) { }
1954
1955 virtual runOnFunction(Function& F) {
1956 for (Function::iterator b = F.begin(), be = F.end(); b != be; ++b) {
1957 for (BasicBlock::iterator i = b->begin(), ie = b->end(); i != ie; ++i) {
1958 if (CallInst* callInst = dyn_cast<CallInst>(&*i)) {
1959 // We know we've encountered a call instruction, so we
1960 // need to determine if it's a call to the
1961 // function pointed to by m_func or not.
1962 if (callInst->getCalledFunction() == targetFunc)
1963 ++callCounter;
1964 }
1965 }
1966 }
1967 }
1968
1969 private:
1970 unsigned callCounter;
1971 };
1972
1973.. _calls_and_invokes:
1974
1975Treating calls and invokes the same way
1976^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1977
1978You may have noticed that the previous example was a bit oversimplified in that
1979it did not deal with call sites generated by 'invoke' instructions. In this,
1980and in other situations, you may find that you want to treat ``CallInst``\ s and
1981``InvokeInst``\ s the same way, even though their most-specific common base
1982class is ``Instruction``, which includes lots of less closely-related things.
1983For these cases, LLVM provides a handy wrapper class called ``CallSite``
1984(`doxygen <http://llvm.org/doxygen/classllvm_1_1CallSite.html>`__) It is
1985essentially a wrapper around an ``Instruction`` pointer, with some methods that
1986provide functionality common to ``CallInst``\ s and ``InvokeInst``\ s.
1987
1988This class has "value semantics": it should be passed by value, not by reference
1989and it should not be dynamically allocated or deallocated using ``operator new``
1990or ``operator delete``. It is efficiently copyable, assignable and
1991constructable, with costs equivalents to that of a bare pointer. If you look at
1992its definition, it has only a single pointer member.
1993
1994.. _iterate_chains:
1995
1996Iterating over def-use & use-def chains
1997^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1998
1999Frequently, we might have an instance of the ``Value`` class (`doxygen
2000<http://llvm.org/doxygen/classllvm_1_1Value.html>`__) and we want to determine
2001which ``User`` s use the ``Value``. The list of all ``User``\ s of a particular
2002``Value`` is called a *def-use* chain. For example, let's say we have a
2003``Function*`` named ``F`` to a particular function ``foo``. Finding all of the
2004instructions that *use* ``foo`` is as simple as iterating over the *def-use*
2005chain of ``F``:
2006
2007.. code-block:: c++
2008
2009 Function *F = ...;
2010
Adam Nemet3aecd182015-03-17 17:51:58 +00002011 for (User *U : F->users()) {
Yaron Kerenadcf88e2014-05-01 12:33:26 +00002012 if (Instruction *Inst = dyn_cast<Instruction>(U)) {
Sean Silvabeb15ca2012-12-04 03:20:08 +00002013 errs() << "F is used in instruction:\n";
2014 errs() << *Inst << "\n";
2015 }
2016
Sean Silvabeb15ca2012-12-04 03:20:08 +00002017Alternatively, it's common to have an instance of the ``User`` Class (`doxygen
2018<http://llvm.org/doxygen/classllvm_1_1User.html>`__) and need to know what
2019``Value``\ s are used by it. The list of all ``Value``\ s used by a ``User`` is
2020known as a *use-def* chain. Instances of class ``Instruction`` are common
2021``User`` s, so we might want to iterate over all of the values that a particular
2022instruction uses (that is, the operands of the particular ``Instruction``):
2023
2024.. code-block:: c++
2025
2026 Instruction *pi = ...;
2027
Yaron Keren7229bbf2014-05-02 08:26:30 +00002028 for (Use &U : pi->operands()) {
Yaron Kerenadcf88e2014-05-01 12:33:26 +00002029 Value *v = U.get();
Sean Silvabeb15ca2012-12-04 03:20:08 +00002030 // ...
2031 }
2032
2033Declaring objects as ``const`` is an important tool of enforcing mutation free
2034algorithms (such as analyses, etc.). For this purpose above iterators come in
2035constant flavors as ``Value::const_use_iterator`` and
2036``Value::const_op_iterator``. They automatically arise when calling
2037``use/op_begin()`` on ``const Value*``\ s or ``const User*``\ s respectively.
2038Upon dereferencing, they return ``const Use*``\ s. Otherwise the above patterns
2039remain unchanged.
2040
2041.. _iterate_preds:
2042
2043Iterating over predecessors & successors of blocks
2044^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2045
2046Iterating over the predecessors and successors of a block is quite easy with the
Yaron Keren28e28e82015-07-12 20:40:41 +00002047routines defined in ``"llvm/IR/CFG.h"``. Just use code like this to
Sean Silvabeb15ca2012-12-04 03:20:08 +00002048iterate over all predecessors of BB:
2049
2050.. code-block:: c++
2051
2052 #include "llvm/Support/CFG.h"
2053 BasicBlock *BB = ...;
2054
Duncan P. N. Exon Smith6c990152014-07-21 17:06:51 +00002055 for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
2056 BasicBlock *Pred = *PI;
Sean Silvabeb15ca2012-12-04 03:20:08 +00002057 // ...
2058 }
2059
Duncan P. N. Exon Smith6c990152014-07-21 17:06:51 +00002060Similarly, to iterate over successors use ``succ_iterator/succ_begin/succ_end``.
Sean Silvabeb15ca2012-12-04 03:20:08 +00002061
2062.. _simplechanges:
2063
2064Making simple changes
2065---------------------
2066
2067There are some primitive transformation operations present in the LLVM
2068infrastructure that are worth knowing about. When performing transformations,
2069it's fairly common to manipulate the contents of basic blocks. This section
2070describes some of the common methods for doing so and gives example code.
2071
2072.. _schanges_creating:
2073
2074Creating and inserting new ``Instruction``\ s
2075^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2076
2077*Instantiating Instructions*
2078
2079Creation of ``Instruction``\ s is straight-forward: simply call the constructor
2080for the kind of instruction to instantiate and provide the necessary parameters.
2081For example, an ``AllocaInst`` only *requires* a (const-ptr-to) ``Type``. Thus:
2082
2083.. code-block:: c++
2084
2085 AllocaInst* ai = new AllocaInst(Type::Int32Ty);
2086
2087will create an ``AllocaInst`` instance that represents the allocation of one
2088integer in the current stack frame, at run time. Each ``Instruction`` subclass
2089is likely to have varying default parameters which change the semantics of the
2090instruction, so refer to the `doxygen documentation for the subclass of
2091Instruction <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ that
2092you're interested in instantiating.
2093
2094*Naming values*
2095
2096It is very useful to name the values of instructions when you're able to, as
2097this facilitates the debugging of your transformations. If you end up looking
2098at generated LLVM machine code, you definitely want to have logical names
2099associated with the results of instructions! By supplying a value for the
2100``Name`` (default) parameter of the ``Instruction`` constructor, you associate a
2101logical name with the result of the instruction's execution at run time. For
2102example, say that I'm writing a transformation that dynamically allocates space
2103for an integer on the stack, and that integer is going to be used as some kind
2104of index by some other code. To accomplish this, I place an ``AllocaInst`` at
2105the first point in the first ``BasicBlock`` of some ``Function``, and I'm
2106intending to use it within the same ``Function``. I might do:
2107
2108.. code-block:: c++
2109
2110 AllocaInst* pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc");
2111
2112where ``indexLoc`` is now the logical name of the instruction's execution value,
2113which is a pointer to an integer on the run time stack.
2114
2115*Inserting instructions*
2116
Dan Liewc6ab58f2014-06-06 17:25:47 +00002117There are essentially three ways to insert an ``Instruction`` into an existing
Sean Silvabeb15ca2012-12-04 03:20:08 +00002118sequence of instructions that form a ``BasicBlock``:
2119
2120* Insertion into an explicit instruction list
2121
2122 Given a ``BasicBlock* pb``, an ``Instruction* pi`` within that ``BasicBlock``,
2123 and a newly-created instruction we wish to insert before ``*pi``, we do the
2124 following:
2125
2126 .. code-block:: c++
2127
2128 BasicBlock *pb = ...;
2129 Instruction *pi = ...;
2130 Instruction *newInst = new Instruction(...);
2131
2132 pb->getInstList().insert(pi, newInst); // Inserts newInst before pi in pb
2133
2134 Appending to the end of a ``BasicBlock`` is so common that the ``Instruction``
2135 class and ``Instruction``-derived classes provide constructors which take a
2136 pointer to a ``BasicBlock`` to be appended to. For example code that looked
2137 like:
2138
2139 .. code-block:: c++
2140
2141 BasicBlock *pb = ...;
2142 Instruction *newInst = new Instruction(...);
2143
2144 pb->getInstList().push_back(newInst); // Appends newInst to pb
2145
2146 becomes:
2147
2148 .. code-block:: c++
2149
2150 BasicBlock *pb = ...;
2151 Instruction *newInst = new Instruction(..., pb);
2152
2153 which is much cleaner, especially if you are creating long instruction
2154 streams.
2155
2156* Insertion into an implicit instruction list
2157
2158 ``Instruction`` instances that are already in ``BasicBlock``\ s are implicitly
2159 associated with an existing instruction list: the instruction list of the
2160 enclosing basic block. Thus, we could have accomplished the same thing as the
2161 above code without being given a ``BasicBlock`` by doing:
2162
2163 .. code-block:: c++
2164
2165 Instruction *pi = ...;
2166 Instruction *newInst = new Instruction(...);
2167
2168 pi->getParent()->getInstList().insert(pi, newInst);
2169
2170 In fact, this sequence of steps occurs so frequently that the ``Instruction``
2171 class and ``Instruction``-derived classes provide constructors which take (as
2172 a default parameter) a pointer to an ``Instruction`` which the newly-created
2173 ``Instruction`` should precede. That is, ``Instruction`` constructors are
2174 capable of inserting the newly-created instance into the ``BasicBlock`` of a
2175 provided instruction, immediately before that instruction. Using an
2176 ``Instruction`` constructor with a ``insertBefore`` (default) parameter, the
2177 above code becomes:
2178
2179 .. code-block:: c++
2180
2181 Instruction* pi = ...;
2182 Instruction* newInst = new Instruction(..., pi);
2183
2184 which is much cleaner, especially if you're creating a lot of instructions and
2185 adding them to ``BasicBlock``\ s.
2186
Dan Liewc6ab58f2014-06-06 17:25:47 +00002187* Insertion using an instance of ``IRBuilder``
2188
Dan Liew599cec62014-06-06 18:44:21 +00002189 Inserting several ``Instruction``\ s can be quite laborious using the previous
Dan Liewc6ab58f2014-06-06 17:25:47 +00002190 methods. The ``IRBuilder`` is a convenience class that can be used to add
2191 several instructions to the end of a ``BasicBlock`` or before a particular
2192 ``Instruction``. It also supports constant folding and renaming named
2193 registers (see ``IRBuilder``'s template arguments).
2194
2195 The example below demonstrates a very simple use of the ``IRBuilder`` where
2196 three instructions are inserted before the instruction ``pi``. The first two
2197 instructions are Call instructions and third instruction multiplies the return
2198 value of the two calls.
2199
2200 .. code-block:: c++
2201
2202 Instruction *pi = ...;
2203 IRBuilder<> Builder(pi);
2204 CallInst* callOne = Builder.CreateCall(...);
2205 CallInst* callTwo = Builder.CreateCall(...);
2206 Value* result = Builder.CreateMul(callOne, callTwo);
2207
2208 The example below is similar to the above example except that the created
2209 ``IRBuilder`` inserts instructions at the end of the ``BasicBlock`` ``pb``.
2210
2211 .. code-block:: c++
2212
2213 BasicBlock *pb = ...;
2214 IRBuilder<> Builder(pb);
2215 CallInst* callOne = Builder.CreateCall(...);
2216 CallInst* callTwo = Builder.CreateCall(...);
2217 Value* result = Builder.CreateMul(callOne, callTwo);
2218
2219 See :doc:`tutorial/LangImpl3` for a practical use of the ``IRBuilder``.
2220
2221
Sean Silvabeb15ca2012-12-04 03:20:08 +00002222.. _schanges_deleting:
2223
2224Deleting Instructions
2225^^^^^^^^^^^^^^^^^^^^^
2226
2227Deleting an instruction from an existing sequence of instructions that form a
2228BasicBlock_ is very straight-forward: just call the instruction's
2229``eraseFromParent()`` method. For example:
2230
2231.. code-block:: c++
2232
2233 Instruction *I = .. ;
2234 I->eraseFromParent();
2235
2236This unlinks the instruction from its containing basic block and deletes it. If
2237you'd just like to unlink the instruction from its containing basic block but
2238not delete it, you can use the ``removeFromParent()`` method.
2239
2240.. _schanges_replacing:
2241
2242Replacing an Instruction with another Value
2243^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2244
2245Replacing individual instructions
2246"""""""""""""""""""""""""""""""""
2247
2248Including "`llvm/Transforms/Utils/BasicBlockUtils.h
2249<http://llvm.org/doxygen/BasicBlockUtils_8h-source.html>`_" permits use of two
2250very useful replace functions: ``ReplaceInstWithValue`` and
2251``ReplaceInstWithInst``.
2252
2253.. _schanges_deleting_sub:
2254
2255Deleting Instructions
2256"""""""""""""""""""""
2257
2258* ``ReplaceInstWithValue``
2259
2260 This function replaces all uses of a given instruction with a value, and then
2261 removes the original instruction. The following example illustrates the
2262 replacement of the result of a particular ``AllocaInst`` that allocates memory
2263 for a single integer with a null pointer to an integer.
2264
2265 .. code-block:: c++
2266
2267 AllocaInst* instToReplace = ...;
2268 BasicBlock::iterator ii(instToReplace);
2269
2270 ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii,
2271 Constant::getNullValue(PointerType::getUnqual(Type::Int32Ty)));
2272
2273* ``ReplaceInstWithInst``
2274
2275 This function replaces a particular instruction with another instruction,
2276 inserting the new instruction into the basic block at the location where the
2277 old instruction was, and replacing any uses of the old instruction with the
2278 new instruction. The following example illustrates the replacement of one
2279 ``AllocaInst`` with another.
2280
2281 .. code-block:: c++
2282
2283 AllocaInst* instToReplace = ...;
2284 BasicBlock::iterator ii(instToReplace);
2285
2286 ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii,
2287 new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt"));
2288
2289
2290Replacing multiple uses of Users and Values
2291"""""""""""""""""""""""""""""""""""""""""""
2292
2293You can use ``Value::replaceAllUsesWith`` and ``User::replaceUsesOfWith`` to
2294change more than one use at a time. See the doxygen documentation for the
2295`Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_ and `User Class
2296<http://llvm.org/doxygen/classllvm_1_1User.html>`_, respectively, for more
2297information.
2298
2299.. _schanges_deletingGV:
2300
2301Deleting GlobalVariables
2302^^^^^^^^^^^^^^^^^^^^^^^^
2303
2304Deleting a global variable from a module is just as easy as deleting an
2305Instruction. First, you must have a pointer to the global variable that you
2306wish to delete. You use this pointer to erase it from its parent, the module.
2307For example:
2308
2309.. code-block:: c++
2310
2311 GlobalVariable *GV = .. ;
2312
2313 GV->eraseFromParent();
2314
2315
2316.. _create_types:
2317
2318How to Create Types
2319-------------------
2320
2321In generating IR, you may need some complex types. If you know these types
2322statically, you can use ``TypeBuilder<...>::get()``, defined in
2323``llvm/Support/TypeBuilder.h``, to retrieve them. ``TypeBuilder`` has two forms
2324depending on whether you're building types for cross-compilation or native
2325library use. ``TypeBuilder<T, true>`` requires that ``T`` be independent of the
2326host environment, meaning that it's built out of types from the ``llvm::types``
2327(`doxygen <http://llvm.org/doxygen/namespacellvm_1_1types.html>`__) namespace
2328and pointers, functions, arrays, etc. built of those. ``TypeBuilder<T, false>``
2329additionally allows native C types whose size may depend on the host compiler.
2330For example,
2331
2332.. code-block:: c++
2333
2334 FunctionType *ft = TypeBuilder<types::i<8>(types::i<32>*), true>::get();
2335
2336is easier to read and write than the equivalent
2337
2338.. code-block:: c++
2339
2340 std::vector<const Type*> params;
2341 params.push_back(PointerType::getUnqual(Type::Int32Ty));
2342 FunctionType *ft = FunctionType::get(Type::Int8Ty, params, false);
2343
2344See the `class comment
2345<http://llvm.org/doxygen/TypeBuilder_8h-source.html#l00001>`_ for more details.
2346
2347.. _threading:
2348
2349Threads and LLVM
2350================
2351
2352This section describes the interaction of the LLVM APIs with multithreading,
2353both on the part of client applications, and in the JIT, in the hosted
2354application.
2355
2356Note that LLVM's support for multithreading is still relatively young. Up
2357through version 2.5, the execution of threaded hosted applications was
2358supported, but not threaded client access to the APIs. While this use case is
2359now supported, clients *must* adhere to the guidelines specified below to ensure
2360proper operation in multithreaded mode.
2361
2362Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic
2363intrinsics in order to support threaded operation. If you need a
2364multhreading-capable LLVM on a platform without a suitably modern system
2365compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and
2366using the resultant compiler to build a copy of LLVM with multithreading
2367support.
2368
Sean Silvabeb15ca2012-12-04 03:20:08 +00002369.. _shutdown:
2370
2371Ending Execution with ``llvm_shutdown()``
2372-----------------------------------------
2373
2374When you are done using the LLVM APIs, you should call ``llvm_shutdown()`` to
Chandler Carruth39cd2162014-06-27 15:13:01 +00002375deallocate memory used for internal structures.
Zachary Turnerccbf3d02014-06-16 22:49:41 +00002376
Sean Silvabeb15ca2012-12-04 03:20:08 +00002377.. _managedstatic:
2378
2379Lazy Initialization with ``ManagedStatic``
2380------------------------------------------
2381
2382``ManagedStatic`` is a utility class in LLVM used to implement static
Chandler Carruth39cd2162014-06-27 15:13:01 +00002383initialization of static resources, such as the global type tables. In a
2384single-threaded environment, it implements a simple lazy initialization scheme.
2385When LLVM is compiled with support for multi-threading, however, it uses
Sean Silvabeb15ca2012-12-04 03:20:08 +00002386double-checked locking to implement thread-safe lazy initialization.
2387
Sean Silvabeb15ca2012-12-04 03:20:08 +00002388.. _llvmcontext:
2389
2390Achieving Isolation with ``LLVMContext``
2391----------------------------------------
2392
2393``LLVMContext`` is an opaque class in the LLVM API which clients can use to
2394operate multiple, isolated instances of LLVM concurrently within the same
2395address space. For instance, in a hypothetical compile-server, the compilation
2396of an individual translation unit is conceptually independent from all the
2397others, and it would be desirable to be able to compile incoming translation
2398units concurrently on independent server threads. Fortunately, ``LLVMContext``
2399exists to enable just this kind of scenario!
2400
2401Conceptually, ``LLVMContext`` provides isolation. Every LLVM entity
2402(``Module``\ s, ``Value``\ s, ``Type``\ s, ``Constant``\ s, etc.) in LLVM's
2403in-memory IR belongs to an ``LLVMContext``. Entities in different contexts
2404*cannot* interact with each other: ``Module``\ s in different contexts cannot be
2405linked together, ``Function``\ s cannot be added to ``Module``\ s in different
2406contexts, etc. What this means is that is is safe to compile on multiple
2407threads simultaneously, as long as no two threads operate on entities within the
2408same context.
2409
2410In practice, very few places in the API require the explicit specification of a
2411``LLVMContext``, other than the ``Type`` creation/lookup APIs. Because every
2412``Type`` carries a reference to its owning context, most other entities can
2413determine what context they belong to by looking at their own ``Type``. If you
2414are adding new entities to LLVM IR, please try to maintain this interface
2415design.
2416
2417For clients that do *not* require the benefits of isolation, LLVM provides a
2418convenience API ``getGlobalContext()``. This returns a global, lazily
2419initialized ``LLVMContext`` that may be used in situations where isolation is
2420not a concern.
2421
2422.. _jitthreading:
2423
2424Threads and the JIT
2425-------------------
2426
2427LLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple
2428threads can call ``ExecutionEngine::getPointerToFunction()`` or
2429``ExecutionEngine::runFunction()`` concurrently, and multiple threads can run
2430code output by the JIT concurrently. The user must still ensure that only one
2431thread accesses IR in a given ``LLVMContext`` while another thread might be
2432modifying it. One way to do that is to always hold the JIT lock while accessing
2433IR outside the JIT (the JIT *modifies* the IR by adding ``CallbackVH``\ s).
2434Another way is to only call ``getPointerToFunction()`` from the
2435``LLVMContext``'s thread.
2436
2437When the JIT is configured to compile lazily (using
2438``ExecutionEngine::DisableLazyCompilation(false)``), there is currently a `race
2439condition <http://llvm.org/bugs/show_bug.cgi?id=5184>`_ in updating call sites
2440after a function is lazily-jitted. It's still possible to use the lazy JIT in a
2441threaded program if you ensure that only one thread at a time can call any
2442particular lazy stub and that the JIT lock guards any IR access, but we suggest
2443using only the eager JIT in threaded programs.
2444
2445.. _advanced:
2446
2447Advanced Topics
2448===============
2449
2450This section describes some of the advanced or obscure API's that most clients
2451do not need to be aware of. These API's tend manage the inner workings of the
2452LLVM system, and only need to be accessed in unusual circumstances.
2453
2454.. _SymbolTable:
2455
2456The ``ValueSymbolTable`` class
2457------------------------------
2458
2459The ``ValueSymbolTable`` (`doxygen
2460<http://llvm.org/doxygen/classllvm_1_1ValueSymbolTable.html>`__) class provides
2461a symbol table that the :ref:`Function <c_Function>` and Module_ classes use for
2462naming value definitions. The symbol table can provide a name for any Value_.
2463
2464Note that the ``SymbolTable`` class should not be directly accessed by most
2465clients. It should only be used when iteration over the symbol table names
2466themselves are required, which is very special purpose. Note that not all LLVM
2467Value_\ s have names, and those without names (i.e. they have an empty name) do
2468not exist in the symbol table.
2469
2470Symbol tables support iteration over the values in the symbol table with
2471``begin/end/iterator`` and supports querying to see if a specific name is in the
2472symbol table (with ``lookup``). The ``ValueSymbolTable`` class exposes no
2473public mutator methods, instead, simply call ``setName`` on a value, which will
2474autoinsert it into the appropriate symbol table.
2475
2476.. _UserLayout:
2477
2478The ``User`` and owned ``Use`` classes' memory layout
2479-----------------------------------------------------
2480
2481The ``User`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1User.html>`__)
2482class provides a basis for expressing the ownership of ``User`` towards other
2483`Value instance <http://llvm.org/doxygen/classllvm_1_1Value.html>`_\ s. The
2484``Use`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Use.html>`__) helper
2485class is employed to do the bookkeeping and to facilitate *O(1)* addition and
2486removal.
2487
2488.. _Use2User:
2489
2490Interaction and relationship between ``User`` and ``Use`` objects
2491^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2492
2493A subclass of ``User`` can choose between incorporating its ``Use`` objects or
2494refer to them out-of-line by means of a pointer. A mixed variant (some ``Use``
2495s inline others hung off) is impractical and breaks the invariant that the
2496``Use`` objects belonging to the same ``User`` form a contiguous array.
2497
2498We have 2 different layouts in the ``User`` (sub)classes:
2499
2500* Layout a)
2501
2502 The ``Use`` object(s) are inside (resp. at fixed offset) of the ``User``
2503 object and there are a fixed number of them.
2504
2505* Layout b)
2506
2507 The ``Use`` object(s) are referenced by a pointer to an array from the
2508 ``User`` object and there may be a variable number of them.
2509
2510As of v2.4 each layout still possesses a direct pointer to the start of the
2511array of ``Use``\ s. Though not mandatory for layout a), we stick to this
2512redundancy for the sake of simplicity. The ``User`` object also stores the
2513number of ``Use`` objects it has. (Theoretically this information can also be
2514calculated given the scheme presented below.)
2515
2516Special forms of allocation operators (``operator new``) enforce the following
2517memory layouts:
2518
2519* Layout a) is modelled by prepending the ``User`` object by the ``Use[]``
2520 array.
2521
2522 .. code-block:: none
2523
2524 ...---.---.---.---.-------...
2525 | P | P | P | P | User
2526 '''---'---'---'---'-------'''
2527
2528* Layout b) is modelled by pointing at the ``Use[]`` array.
2529
2530 .. code-block:: none
2531
2532 .-------...
2533 | User
2534 '-------'''
2535 |
2536 v
2537 .---.---.---.---...
2538 | P | P | P | P |
2539 '---'---'---'---'''
2540
2541*(In the above figures* '``P``' *stands for the* ``Use**`` *that is stored in
2542each* ``Use`` *object in the member* ``Use::Prev`` *)*
2543
2544.. _Waymarking:
2545
2546The waymarking algorithm
2547^^^^^^^^^^^^^^^^^^^^^^^^
2548
2549Since the ``Use`` objects are deprived of the direct (back)pointer to their
2550``User`` objects, there must be a fast and exact method to recover it. This is
2551accomplished by the following scheme:
2552
2553A bit-encoding in the 2 LSBits (least significant bits) of the ``Use::Prev``
2554allows to find the start of the ``User`` object:
2555
Dmitri Gribenkoe8131122013-01-19 20:34:20 +00002556* ``00`` --- binary digit 0
Sean Silvabeb15ca2012-12-04 03:20:08 +00002557
Dmitri Gribenkoe8131122013-01-19 20:34:20 +00002558* ``01`` --- binary digit 1
Sean Silvabeb15ca2012-12-04 03:20:08 +00002559
Dmitri Gribenkoe8131122013-01-19 20:34:20 +00002560* ``10`` --- stop and calculate (``s``)
Sean Silvabeb15ca2012-12-04 03:20:08 +00002561
Dmitri Gribenkoe8131122013-01-19 20:34:20 +00002562* ``11`` --- full stop (``S``)
Sean Silvabeb15ca2012-12-04 03:20:08 +00002563
2564Given a ``Use*``, all we have to do is to walk till we get a stop and we either
2565have a ``User`` immediately behind or we have to walk to the next stop picking
2566up digits and calculating the offset:
2567
2568.. code-block:: none
2569
2570 .---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.----------------
2571 | 1 | s | 1 | 0 | 1 | 0 | s | 1 | 1 | 0 | s | 1 | 1 | s | 1 | S | User (or User*)
2572 '---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'----------------
2573 |+15 |+10 |+6 |+3 |+1
2574 | | | | | __>
2575 | | | | __________>
2576 | | | ______________________>
2577 | | ______________________________________>
2578 | __________________________________________________________>
2579
2580Only the significant number of bits need to be stored between the stops, so that
2581the *worst case is 20 memory accesses* when there are 1000 ``Use`` objects
2582associated with a ``User``.
2583
2584.. _ReferenceImpl:
2585
2586Reference implementation
2587^^^^^^^^^^^^^^^^^^^^^^^^
2588
2589The following literate Haskell fragment demonstrates the concept:
2590
2591.. code-block:: haskell
2592
2593 > import Test.QuickCheck
2594 >
2595 > digits :: Int -> [Char] -> [Char]
2596 > digits 0 acc = '0' : acc
2597 > digits 1 acc = '1' : acc
2598 > digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc
2599 >
2600 > dist :: Int -> [Char] -> [Char]
2601 > dist 0 [] = ['S']
2602 > dist 0 acc = acc
2603 > dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r
2604 > dist n acc = dist (n - 1) $ dist 1 acc
2605 >
2606 > takeLast n ss = reverse $ take n $ reverse ss
2607 >
2608 > test = takeLast 40 $ dist 20 []
2609 >
2610
2611Printing <test> gives: ``"1s100000s11010s10100s1111s1010s110s11s1S"``
2612
2613The reverse algorithm computes the length of the string just by examining a
2614certain prefix:
2615
2616.. code-block:: haskell
2617
2618 > pref :: [Char] -> Int
2619 > pref "S" = 1
2620 > pref ('s':'1':rest) = decode 2 1 rest
2621 > pref (_:rest) = 1 + pref rest
2622 >
2623 > decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest
2624 > decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest
2625 > decode walk acc _ = walk + acc
2626 >
2627
2628Now, as expected, printing <pref test> gives ``40``.
2629
2630We can *quickCheck* this with following property:
2631
2632.. code-block:: haskell
2633
2634 > testcase = dist 2000 []
2635 > testcaseLength = length testcase
2636 >
2637 > identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr
2638 > where arr = takeLast n testcase
2639 >
2640
2641As expected <quickCheck identityProp> gives:
2642
2643::
2644
2645 *Main> quickCheck identityProp
2646 OK, passed 100 tests.
2647
2648Let's be a bit more exhaustive:
2649
2650.. code-block:: haskell
2651
2652 >
2653 > deepCheck p = check (defaultConfig { configMaxTest = 500 }) p
2654 >
2655
2656And here is the result of <deepCheck identityProp>:
2657
2658::
2659
2660 *Main> deepCheck identityProp
2661 OK, passed 500 tests.
2662
2663.. _Tagging:
2664
2665Tagging considerations
2666^^^^^^^^^^^^^^^^^^^^^^
2667
2668To maintain the invariant that the 2 LSBits of each ``Use**`` in ``Use`` never
2669change after being set up, setters of ``Use::Prev`` must re-tag the new
2670``Use**`` on every modification. Accordingly getters must strip the tag bits.
2671
2672For layout b) instead of the ``User`` we find a pointer (``User*`` with LSBit
2673set). Following this pointer brings us to the ``User``. A portable trick
2674ensures that the first bytes of ``User`` (if interpreted as a pointer) never has
2675the LSBit set. (Portability is relying on the fact that all known compilers
2676place the ``vptr`` in the first word of the instances.)
2677
Chandler Carruth064dc332015-01-28 03:04:54 +00002678.. _polymorphism:
2679
2680Designing Type Hiercharies and Polymorphic Interfaces
2681-----------------------------------------------------
2682
2683There are two different design patterns that tend to result in the use of
2684virtual dispatch for methods in a type hierarchy in C++ programs. The first is
2685a genuine type hierarchy where different types in the hierarchy model
2686a specific subset of the functionality and semantics, and these types nest
2687strictly within each other. Good examples of this can be seen in the ``Value``
2688or ``Type`` type hierarchies.
2689
2690A second is the desire to dispatch dynamically across a collection of
2691polymorphic interface implementations. This latter use case can be modeled with
2692virtual dispatch and inheritance by defining an abstract interface base class
2693which all implementations derive from and override. However, this
2694implementation strategy forces an **"is-a"** relationship to exist that is not
2695actually meaningful. There is often not some nested hierarchy of useful
2696generalizations which code might interact with and move up and down. Instead,
2697there is a singular interface which is dispatched across a range of
2698implementations.
2699
2700The preferred implementation strategy for the second use case is that of
2701generic programming (sometimes called "compile-time duck typing" or "static
2702polymorphism"). For example, a template over some type parameter ``T`` can be
2703instantiated across any particular implementation that conforms to the
2704interface or *concept*. A good example here is the highly generic properties of
2705any type which models a node in a directed graph. LLVM models these primarily
2706through templates and generic programming. Such templates include the
2707``LoopInfoBase`` and ``DominatorTreeBase``. When this type of polymorphism
2708truly needs **dynamic** dispatch you can generalize it using a technique
2709called *concept-based polymorphism*. This pattern emulates the interfaces and
2710behaviors of templates using a very limited form of virtual dispatch for type
2711erasure inside its implementation. You can find examples of this technique in
2712the ``PassManager.h`` system, and there is a more detailed introduction to it
2713by Sean Parent in several of his talks and papers:
2714
2715#. `Inheritance Is The Base Class of Evil
2716 <http://channel9.msdn.com/Events/GoingNative/2013/Inheritance-Is-The-Base-Class-of-Evil>`_
2717 - The GoingNative 2013 talk describing this technique, and probably the best
2718 place to start.
2719#. `Value Semantics and Concepts-based Polymorphism
2720 <http://www.youtube.com/watch?v=_BpMYeUFXv8>`_ - The C++Now! 2012 talk
2721 describing this technique in more detail.
2722#. `Sean Parent's Papers and Presentations
2723 <http://github.com/sean-parent/sean-parent.github.com/wiki/Papers-and-Presentations>`_
2724 - A Github project full of links to slides, video, and sometimes code.
2725
2726When deciding between creating a type hierarchy (with either tagged or virtual
2727dispatch) and using templates or concepts-based polymorphism, consider whether
2728there is some refinement of an abstract base class which is a semantically
2729meaningful type on an interface boundary. If anything more refined than the
2730root abstract interface is meaningless to talk about as a partial extension of
2731the semantic model, then your use case likely fits better with polymorphism and
2732you should avoid using virtual dispatch. However, there may be some exigent
2733circumstances that require one technique or the other to be used.
2734
2735If you do need to introduce a type hierarchy, we prefer to use explicitly
2736closed type hierarchies with manual tagged dispatch and/or RTTI rather than the
2737open inheritance model and virtual dispatch that is more common in C++ code.
2738This is because LLVM rarely encourages library consumers to extend its core
2739types, and leverages the closed and tag-dispatched nature of its hierarchies to
2740generate significantly more efficient code. We have also found that a large
2741amount of our usage of type hierarchies fits better with tag-based pattern
2742matching rather than dynamic dispatch across a common interface. Within LLVM we
2743have built custom helpers to facilitate this design. See this document's
Sean Silva52c7dcd2015-01-28 10:36:41 +00002744section on :ref:`isa and dyn_cast <isa>` and our :doc:`detailed document
2745<HowToSetUpLLVMStyleRTTI>` which describes how you can implement this
2746pattern for use with the LLVM helpers.
Chandler Carruth064dc332015-01-28 03:04:54 +00002747
Sanjoy Das8ce64992015-03-26 19:25:01 +00002748.. _abi_breaking_checks:
2749
2750ABI Breaking Checks
2751-------------------
2752
2753Checks and asserts that alter the LLVM C++ ABI are predicated on the
2754preprocessor symbol `LLVM_ENABLE_ABI_BREAKING_CHECKS` -- LLVM
2755libraries built with `LLVM_ENABLE_ABI_BREAKING_CHECKS` are not ABI
2756compatible LLVM libraries built without it defined. By default,
2757turning on assertions also turns on `LLVM_ENABLE_ABI_BREAKING_CHECKS`
2758so a default +Asserts build is not ABI compatible with a
2759default -Asserts build. Clients that want ABI compatibility
2760between +Asserts and -Asserts builds should use the CMake or autoconf
2761build systems to set `LLVM_ENABLE_ABI_BREAKING_CHECKS` independently
2762of `LLVM_ENABLE_ASSERTIONS`.
2763
Sean Silvabeb15ca2012-12-04 03:20:08 +00002764.. _coreclasses:
2765
2766The Core LLVM Class Hierarchy Reference
2767=======================================
2768
Benjamin Kramer9f566a52013-07-08 19:59:35 +00002769``#include "llvm/IR/Type.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00002770
2771header source: `Type.h <http://llvm.org/doxygen/Type_8h-source.html>`_
2772
2773doxygen info: `Type Clases <http://llvm.org/doxygen/classllvm_1_1Type.html>`_
2774
2775The Core LLVM classes are the primary means of representing the program being
2776inspected or transformed. The core LLVM classes are defined in header files in
Charlie Turner2ac115e2015-04-16 17:01:23 +00002777the ``include/llvm/IR`` directory, and implemented in the ``lib/IR``
2778directory. It's worth noting that, for historical reasons, this library is
2779called ``libLLVMCore.so``, not ``libLLVMIR.so`` as you might expect.
Sean Silvabeb15ca2012-12-04 03:20:08 +00002780
2781.. _Type:
2782
2783The Type class and Derived Types
2784--------------------------------
2785
2786``Type`` is a superclass of all type classes. Every ``Value`` has a ``Type``.
2787``Type`` cannot be instantiated directly but only through its subclasses.
2788Certain primitive types (``VoidType``, ``LabelType``, ``FloatType`` and
2789``DoubleType``) have hidden subclasses. They are hidden because they offer no
2790useful functionality beyond what the ``Type`` class offers except to distinguish
2791themselves from other subclasses of ``Type``.
2792
2793All other types are subclasses of ``DerivedType``. Types can be named, but this
2794is not a requirement. There exists exactly one instance of a given shape at any
2795one time. This allows type equality to be performed with address equality of
2796the Type Instance. That is, given two ``Type*`` values, the types are identical
2797if the pointers are identical.
2798
2799.. _m_Type:
2800
2801Important Public Methods
2802^^^^^^^^^^^^^^^^^^^^^^^^
2803
2804* ``bool isIntegerTy() const``: Returns true for any integer type.
2805
2806* ``bool isFloatingPointTy()``: Return true if this is one of the five
2807 floating point types.
2808
2809* ``bool isSized()``: Return true if the type has known size. Things
2810 that don't have a size are abstract types, labels and void.
2811
2812.. _derivedtypes:
2813
2814Important Derived Types
2815^^^^^^^^^^^^^^^^^^^^^^^
2816
2817``IntegerType``
2818 Subclass of DerivedType that represents integer types of any bit width. Any
2819 bit width between ``IntegerType::MIN_INT_BITS`` (1) and
2820 ``IntegerType::MAX_INT_BITS`` (~8 million) can be represented.
2821
2822 * ``static const IntegerType* get(unsigned NumBits)``: get an integer
2823 type of a specific bit width.
2824
2825 * ``unsigned getBitWidth() const``: Get the bit width of an integer type.
2826
2827``SequentialType``
2828 This is subclassed by ArrayType, PointerType and VectorType.
2829
2830 * ``const Type * getElementType() const``: Returns the type of each
2831 of the elements in the sequential type.
2832
2833``ArrayType``
2834 This is a subclass of SequentialType and defines the interface for array
2835 types.
2836
2837 * ``unsigned getNumElements() const``: Returns the number of elements
2838 in the array.
2839
2840``PointerType``
2841 Subclass of SequentialType for pointer types.
2842
2843``VectorType``
2844 Subclass of SequentialType for vector types. A vector type is similar to an
2845 ArrayType but is distinguished because it is a first class type whereas
2846 ArrayType is not. Vector types are used for vector operations and are usually
Ed Maste8ed40ce2015-04-14 20:52:58 +00002847 small vectors of an integer or floating point type.
Sean Silvabeb15ca2012-12-04 03:20:08 +00002848
2849``StructType``
2850 Subclass of DerivedTypes for struct types.
2851
2852.. _FunctionType:
2853
2854``FunctionType``
2855 Subclass of DerivedTypes for function types.
2856
2857 * ``bool isVarArg() const``: Returns true if it's a vararg function.
2858
2859 * ``const Type * getReturnType() const``: Returns the return type of the
2860 function.
2861
2862 * ``const Type * getParamType (unsigned i)``: Returns the type of the ith
2863 parameter.
2864
2865 * ``const unsigned getNumParams() const``: Returns the number of formal
2866 parameters.
2867
2868.. _Module:
2869
2870The ``Module`` class
2871--------------------
2872
Benjamin Kramer9f566a52013-07-08 19:59:35 +00002873``#include "llvm/IR/Module.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00002874
2875header source: `Module.h <http://llvm.org/doxygen/Module_8h-source.html>`_
2876
2877doxygen info: `Module Class <http://llvm.org/doxygen/classllvm_1_1Module.html>`_
2878
2879The ``Module`` class represents the top level structure present in LLVM
2880programs. An LLVM module is effectively either a translation unit of the
2881original program or a combination of several translation units merged by the
2882linker. The ``Module`` class keeps track of a list of :ref:`Function
2883<c_Function>`\ s, a list of GlobalVariable_\ s, and a SymbolTable_.
2884Additionally, it contains a few helpful member functions that try to make common
2885operations easy.
2886
2887.. _m_Module:
2888
2889Important Public Members of the ``Module`` class
2890^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2891
2892* ``Module::Module(std::string name = "")``
2893
2894 Constructing a Module_ is easy. You can optionally provide a name for it
2895 (probably based on the name of the translation unit).
2896
2897* | ``Module::iterator`` - Typedef for function list iterator
2898 | ``Module::const_iterator`` - Typedef for const_iterator.
2899 | ``begin()``, ``end()``, ``size()``, ``empty()``
2900
2901 These are forwarding methods that make it easy to access the contents of a
2902 ``Module`` object's :ref:`Function <c_Function>` list.
2903
2904* ``Module::FunctionListType &getFunctionList()``
2905
2906 Returns the list of :ref:`Function <c_Function>`\ s. This is necessary to use
2907 when you need to update the list or perform a complex action that doesn't have
2908 a forwarding method.
2909
2910----------------
2911
2912* | ``Module::global_iterator`` - Typedef for global variable list iterator
2913 | ``Module::const_global_iterator`` - Typedef for const_iterator.
2914 | ``global_begin()``, ``global_end()``, ``global_size()``, ``global_empty()``
2915
2916 These are forwarding methods that make it easy to access the contents of a
2917 ``Module`` object's GlobalVariable_ list.
2918
2919* ``Module::GlobalListType &getGlobalList()``
2920
2921 Returns the list of GlobalVariable_\ s. This is necessary to use when you
2922 need to update the list or perform a complex action that doesn't have a
2923 forwarding method.
2924
2925----------------
2926
2927* ``SymbolTable *getSymbolTable()``
2928
2929 Return a reference to the SymbolTable_ for this ``Module``.
2930
2931----------------
2932
2933* ``Function *getFunction(StringRef Name) const``
2934
2935 Look up the specified function in the ``Module`` SymbolTable_. If it does not
2936 exist, return ``null``.
2937
2938* ``Function *getOrInsertFunction(const std::string &Name, const FunctionType
2939 *T)``
2940
2941 Look up the specified function in the ``Module`` SymbolTable_. If it does not
2942 exist, add an external declaration for the function and return it.
2943
2944* ``std::string getTypeName(const Type *Ty)``
2945
2946 If there is at least one entry in the SymbolTable_ for the specified Type_,
2947 return it. Otherwise return the empty string.
2948
2949* ``bool addTypeName(const std::string &Name, const Type *Ty)``
2950
2951 Insert an entry in the SymbolTable_ mapping ``Name`` to ``Ty``. If there is
2952 already an entry for this name, true is returned and the SymbolTable_ is not
2953 modified.
2954
2955.. _Value:
2956
2957The ``Value`` class
2958-------------------
2959
Benjamin Kramer9f566a52013-07-08 19:59:35 +00002960``#include "llvm/IR/Value.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00002961
2962header source: `Value.h <http://llvm.org/doxygen/Value_8h-source.html>`_
2963
2964doxygen info: `Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_
2965
2966The ``Value`` class is the most important class in the LLVM Source base. It
2967represents a typed value that may be used (among other things) as an operand to
2968an instruction. There are many different types of ``Value``\ s, such as
2969Constant_\ s, Argument_\ s. Even Instruction_\ s and :ref:`Function
2970<c_Function>`\ s are ``Value``\ s.
2971
2972A particular ``Value`` may be used many times in the LLVM representation for a
2973program. For example, an incoming argument to a function (represented with an
2974instance of the Argument_ class) is "used" by every instruction in the function
2975that references the argument. To keep track of this relationship, the ``Value``
2976class keeps a list of all of the ``User``\ s that is using it (the User_ class
2977is a base class for all nodes in the LLVM graph that can refer to ``Value``\ s).
2978This use list is how LLVM represents def-use information in the program, and is
2979accessible through the ``use_*`` methods, shown below.
2980
2981Because LLVM is a typed representation, every LLVM ``Value`` is typed, and this
2982Type_ is available through the ``getType()`` method. In addition, all LLVM
2983values can be named. The "name" of the ``Value`` is a symbolic string printed
2984in the LLVM code:
2985
2986.. code-block:: llvm
2987
2988 %foo = add i32 1, 2
2989
2990.. _nameWarning:
2991
2992The name of this instruction is "foo". **NOTE** that the name of any value may
2993be missing (an empty string), so names should **ONLY** be used for debugging
2994(making the source code easier to read, debugging printouts), they should not be
2995used to keep track of values or map between them. For this purpose, use a
2996``std::map`` of pointers to the ``Value`` itself instead.
2997
2998One important aspect of LLVM is that there is no distinction between an SSA
2999variable and the operation that produces it. Because of this, any reference to
3000the value produced by an instruction (or the value available as an incoming
3001argument, for example) is represented as a direct pointer to the instance of the
3002class that represents this value. Although this may take some getting used to,
3003it simplifies the representation and makes it easier to manipulate.
3004
3005.. _m_Value:
3006
3007Important Public Members of the ``Value`` class
3008^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3009
3010* | ``Value::use_iterator`` - Typedef for iterator over the use-list
3011 | ``Value::const_use_iterator`` - Typedef for const_iterator over the
3012 use-list
3013 | ``unsigned use_size()`` - Returns the number of users of the value.
3014 | ``bool use_empty()`` - Returns true if there are no users.
3015 | ``use_iterator use_begin()`` - Get an iterator to the start of the
3016 use-list.
3017 | ``use_iterator use_end()`` - Get an iterator to the end of the use-list.
3018 | ``User *use_back()`` - Returns the last element in the list.
3019
3020 These methods are the interface to access the def-use information in LLVM.
3021 As with all other iterators in LLVM, the naming conventions follow the
3022 conventions defined by the STL_.
3023
3024* ``Type *getType() const``
3025 This method returns the Type of the Value.
3026
3027* | ``bool hasName() const``
3028 | ``std::string getName() const``
3029 | ``void setName(const std::string &Name)``
3030
3031 This family of methods is used to access and assign a name to a ``Value``, be
3032 aware of the :ref:`precaution above <nameWarning>`.
3033
3034* ``void replaceAllUsesWith(Value *V)``
3035
3036 This method traverses the use list of a ``Value`` changing all User_\ s of the
3037 current value to refer to "``V``" instead. For example, if you detect that an
3038 instruction always produces a constant value (for example through constant
3039 folding), you can replace all uses of the instruction with the constant like
3040 this:
3041
3042 .. code-block:: c++
3043
3044 Inst->replaceAllUsesWith(ConstVal);
3045
3046.. _User:
3047
3048The ``User`` class
3049------------------
3050
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003051``#include "llvm/IR/User.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003052
3053header source: `User.h <http://llvm.org/doxygen/User_8h-source.html>`_
3054
3055doxygen info: `User Class <http://llvm.org/doxygen/classllvm_1_1User.html>`_
3056
3057Superclass: Value_
3058
3059The ``User`` class is the common base class of all LLVM nodes that may refer to
3060``Value``\ s. It exposes a list of "Operands" that are all of the ``Value``\ s
3061that the User is referring to. The ``User`` class itself is a subclass of
3062``Value``.
3063
3064The operands of a ``User`` point directly to the LLVM ``Value`` that it refers
3065to. Because LLVM uses Static Single Assignment (SSA) form, there can only be
3066one definition referred to, allowing this direct connection. This connection
3067provides the use-def information in LLVM.
3068
3069.. _m_User:
3070
3071Important Public Members of the ``User`` class
3072^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3073
3074The ``User`` class exposes the operand list in two ways: through an index access
3075interface and through an iterator based interface.
3076
3077* | ``Value *getOperand(unsigned i)``
3078 | ``unsigned getNumOperands()``
3079
3080 These two methods expose the operands of the ``User`` in a convenient form for
3081 direct access.
3082
3083* | ``User::op_iterator`` - Typedef for iterator over the operand list
3084 | ``op_iterator op_begin()`` - Get an iterator to the start of the operand
3085 list.
3086 | ``op_iterator op_end()`` - Get an iterator to the end of the operand list.
3087
3088 Together, these methods make up the iterator based interface to the operands
3089 of a ``User``.
3090
3091
3092.. _Instruction:
3093
3094The ``Instruction`` class
3095-------------------------
3096
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003097``#include "llvm/IR/Instruction.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003098
3099header source: `Instruction.h
3100<http://llvm.org/doxygen/Instruction_8h-source.html>`_
3101
3102doxygen info: `Instruction Class
3103<http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_
3104
3105Superclasses: User_, Value_
3106
3107The ``Instruction`` class is the common base class for all LLVM instructions.
3108It provides only a few methods, but is a very commonly used class. The primary
3109data tracked by the ``Instruction`` class itself is the opcode (instruction
3110type) and the parent BasicBlock_ the ``Instruction`` is embedded into. To
3111represent a specific type of instruction, one of many subclasses of
3112``Instruction`` are used.
3113
3114Because the ``Instruction`` class subclasses the User_ class, its operands can
3115be accessed in the same way as for other ``User``\ s (with the
3116``getOperand()``/``getNumOperands()`` and ``op_begin()``/``op_end()`` methods).
3117An important file for the ``Instruction`` class is the ``llvm/Instruction.def``
3118file. This file contains some meta-data about the various different types of
3119instructions in LLVM. It describes the enum values that are used as opcodes
3120(for example ``Instruction::Add`` and ``Instruction::ICmp``), as well as the
3121concrete sub-classes of ``Instruction`` that implement the instruction (for
3122example BinaryOperator_ and CmpInst_). Unfortunately, the use of macros in this
3123file confuses doxygen, so these enum values don't show up correctly in the
3124`doxygen output <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_.
3125
3126.. _s_Instruction:
3127
3128Important Subclasses of the ``Instruction`` class
3129^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3130
3131.. _BinaryOperator:
3132
3133* ``BinaryOperator``
3134
3135 This subclasses represents all two operand instructions whose operands must be
3136 the same type, except for the comparison instructions.
3137
3138.. _CastInst:
3139
3140* ``CastInst``
3141 This subclass is the parent of the 12 casting instructions. It provides
3142 common operations on cast instructions.
3143
3144.. _CmpInst:
3145
3146* ``CmpInst``
3147
3148 This subclass respresents the two comparison instructions,
3149 `ICmpInst <LangRef.html#i_icmp>`_ (integer opreands), and
3150 `FCmpInst <LangRef.html#i_fcmp>`_ (floating point operands).
3151
3152.. _TerminatorInst:
3153
3154* ``TerminatorInst``
3155
3156 This subclass is the parent of all terminator instructions (those which can
3157 terminate a block).
3158
3159.. _m_Instruction:
3160
3161Important Public Members of the ``Instruction`` class
3162^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3163
3164* ``BasicBlock *getParent()``
3165
3166 Returns the BasicBlock_ that this
3167 ``Instruction`` is embedded into.
3168
3169* ``bool mayWriteToMemory()``
3170
3171 Returns true if the instruction writes to memory, i.e. it is a ``call``,
3172 ``free``, ``invoke``, or ``store``.
3173
3174* ``unsigned getOpcode()``
3175
3176 Returns the opcode for the ``Instruction``.
3177
3178* ``Instruction *clone() const``
3179
3180 Returns another instance of the specified instruction, identical in all ways
3181 to the original except that the instruction has no parent (i.e. it's not
3182 embedded into a BasicBlock_), and it has no name.
3183
3184.. _Constant:
3185
3186The ``Constant`` class and subclasses
3187-------------------------------------
3188
3189Constant represents a base class for different types of constants. It is
3190subclassed by ConstantInt, ConstantArray, etc. for representing the various
3191types of Constants. GlobalValue_ is also a subclass, which represents the
3192address of a global variable or function.
3193
3194.. _s_Constant:
3195
3196Important Subclasses of Constant
3197^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3198
3199* ConstantInt : This subclass of Constant represents an integer constant of
3200 any width.
3201
3202 * ``const APInt& getValue() const``: Returns the underlying
3203 value of this constant, an APInt value.
3204
3205 * ``int64_t getSExtValue() const``: Converts the underlying APInt value to an
3206 int64_t via sign extension. If the value (not the bit width) of the APInt
3207 is too large to fit in an int64_t, an assertion will result. For this
3208 reason, use of this method is discouraged.
3209
3210 * ``uint64_t getZExtValue() const``: Converts the underlying APInt value
3211 to a uint64_t via zero extension. IF the value (not the bit width) of the
3212 APInt is too large to fit in a uint64_t, an assertion will result. For this
3213 reason, use of this method is discouraged.
3214
3215 * ``static ConstantInt* get(const APInt& Val)``: Returns the ConstantInt
3216 object that represents the value provided by ``Val``. The type is implied
3217 as the IntegerType that corresponds to the bit width of ``Val``.
3218
3219 * ``static ConstantInt* get(const Type *Ty, uint64_t Val)``: Returns the
3220 ConstantInt object that represents the value provided by ``Val`` for integer
3221 type ``Ty``.
3222
3223* ConstantFP : This class represents a floating point constant.
3224
3225 * ``double getValue() const``: Returns the underlying value of this constant.
3226
3227* ConstantArray : This represents a constant array.
3228
3229 * ``const std::vector<Use> &getValues() const``: Returns a vector of
3230 component constants that makeup this array.
3231
3232* ConstantStruct : This represents a constant struct.
3233
3234 * ``const std::vector<Use> &getValues() const``: Returns a vector of
3235 component constants that makeup this array.
3236
3237* GlobalValue : This represents either a global variable or a function. In
3238 either case, the value is a constant fixed address (after linking).
3239
3240.. _GlobalValue:
3241
3242The ``GlobalValue`` class
3243-------------------------
3244
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003245``#include "llvm/IR/GlobalValue.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003246
3247header source: `GlobalValue.h
3248<http://llvm.org/doxygen/GlobalValue_8h-source.html>`_
3249
3250doxygen info: `GlobalValue Class
3251<http://llvm.org/doxygen/classllvm_1_1GlobalValue.html>`_
3252
3253Superclasses: Constant_, User_, Value_
3254
3255Global values ( GlobalVariable_\ s or :ref:`Function <c_Function>`\ s) are the
3256only LLVM values that are visible in the bodies of all :ref:`Function
3257<c_Function>`\ s. Because they are visible at global scope, they are also
3258subject to linking with other globals defined in different translation units.
3259To control the linking process, ``GlobalValue``\ s know their linkage rules.
3260Specifically, ``GlobalValue``\ s know whether they have internal or external
3261linkage, as defined by the ``LinkageTypes`` enumeration.
3262
3263If a ``GlobalValue`` has internal linkage (equivalent to being ``static`` in C),
3264it is not visible to code outside the current translation unit, and does not
3265participate in linking. If it has external linkage, it is visible to external
3266code, and does participate in linking. In addition to linkage information,
3267``GlobalValue``\ s keep track of which Module_ they are currently part of.
3268
3269Because ``GlobalValue``\ s are memory objects, they are always referred to by
3270their **address**. As such, the Type_ of a global is always a pointer to its
3271contents. It is important to remember this when using the ``GetElementPtrInst``
3272instruction because this pointer must be dereferenced first. For example, if
3273you have a ``GlobalVariable`` (a subclass of ``GlobalValue)`` that is an array
3274of 24 ints, type ``[24 x i32]``, then the ``GlobalVariable`` is a pointer to
3275that array. Although the address of the first element of this array and the
3276value of the ``GlobalVariable`` are the same, they have different types. The
3277``GlobalVariable``'s type is ``[24 x i32]``. The first element's type is
3278``i32.`` Because of this, accessing a global value requires you to dereference
3279the pointer with ``GetElementPtrInst`` first, then its elements can be accessed.
3280This is explained in the `LLVM Language Reference Manual
3281<LangRef.html#globalvars>`_.
3282
3283.. _m_GlobalValue:
3284
3285Important Public Members of the ``GlobalValue`` class
3286^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3287
3288* | ``bool hasInternalLinkage() const``
3289 | ``bool hasExternalLinkage() const``
3290 | ``void setInternalLinkage(bool HasInternalLinkage)``
3291
3292 These methods manipulate the linkage characteristics of the ``GlobalValue``.
3293
3294* ``Module *getParent()``
3295
3296 This returns the Module_ that the
3297 GlobalValue is currently embedded into.
3298
3299.. _c_Function:
3300
3301The ``Function`` class
3302----------------------
3303
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003304``#include "llvm/IR/Function.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003305
3306header source: `Function.h <http://llvm.org/doxygen/Function_8h-source.html>`_
3307
3308doxygen info: `Function Class
3309<http://llvm.org/doxygen/classllvm_1_1Function.html>`_
3310
3311Superclasses: GlobalValue_, Constant_, User_, Value_
3312
3313The ``Function`` class represents a single procedure in LLVM. It is actually
3314one of the more complex classes in the LLVM hierarchy because it must keep track
3315of a large amount of data. The ``Function`` class keeps track of a list of
3316BasicBlock_\ s, a list of formal Argument_\ s, and a SymbolTable_.
3317
3318The list of BasicBlock_\ s is the most commonly used part of ``Function``
3319objects. The list imposes an implicit ordering of the blocks in the function,
3320which indicate how the code will be laid out by the backend. Additionally, the
3321first BasicBlock_ is the implicit entry node for the ``Function``. It is not
3322legal in LLVM to explicitly branch to this initial block. There are no implicit
3323exit nodes, and in fact there may be multiple exit nodes from a single
3324``Function``. If the BasicBlock_ list is empty, this indicates that the
3325``Function`` is actually a function declaration: the actual body of the function
3326hasn't been linked in yet.
3327
3328In addition to a list of BasicBlock_\ s, the ``Function`` class also keeps track
3329of the list of formal Argument_\ s that the function receives. This container
3330manages the lifetime of the Argument_ nodes, just like the BasicBlock_ list does
3331for the BasicBlock_\ s.
3332
3333The SymbolTable_ is a very rarely used LLVM feature that is only used when you
3334have to look up a value by name. Aside from that, the SymbolTable_ is used
3335internally to make sure that there are not conflicts between the names of
3336Instruction_\ s, BasicBlock_\ s, or Argument_\ s in the function body.
3337
3338Note that ``Function`` is a GlobalValue_ and therefore also a Constant_. The
3339value of the function is its address (after linking) which is guaranteed to be
3340constant.
3341
3342.. _m_Function:
3343
3344Important Public Members of the ``Function``
3345^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3346
3347* ``Function(const FunctionType *Ty, LinkageTypes Linkage,
3348 const std::string &N = "", Module* Parent = 0)``
3349
3350 Constructor used when you need to create new ``Function``\ s to add the
3351 program. The constructor must specify the type of the function to create and
3352 what type of linkage the function should have. The FunctionType_ argument
3353 specifies the formal arguments and return value for the function. The same
3354 FunctionType_ value can be used to create multiple functions. The ``Parent``
3355 argument specifies the Module in which the function is defined. If this
3356 argument is provided, the function will automatically be inserted into that
3357 module's list of functions.
3358
3359* ``bool isDeclaration()``
3360
3361 Return whether or not the ``Function`` has a body defined. If the function is
3362 "external", it does not have a body, and thus must be resolved by linking with
3363 a function defined in a different translation unit.
3364
3365* | ``Function::iterator`` - Typedef for basic block list iterator
3366 | ``Function::const_iterator`` - Typedef for const_iterator.
3367 | ``begin()``, ``end()``, ``size()``, ``empty()``
3368
3369 These are forwarding methods that make it easy to access the contents of a
3370 ``Function`` object's BasicBlock_ list.
3371
3372* ``Function::BasicBlockListType &getBasicBlockList()``
3373
3374 Returns the list of BasicBlock_\ s. This is necessary to use when you need to
3375 update the list or perform a complex action that doesn't have a forwarding
3376 method.
3377
3378* | ``Function::arg_iterator`` - Typedef for the argument list iterator
3379 | ``Function::const_arg_iterator`` - Typedef for const_iterator.
3380 | ``arg_begin()``, ``arg_end()``, ``arg_size()``, ``arg_empty()``
3381
3382 These are forwarding methods that make it easy to access the contents of a
3383 ``Function`` object's Argument_ list.
3384
3385* ``Function::ArgumentListType &getArgumentList()``
3386
3387 Returns the list of Argument_. This is necessary to use when you need to
3388 update the list or perform a complex action that doesn't have a forwarding
3389 method.
3390
3391* ``BasicBlock &getEntryBlock()``
3392
3393 Returns the entry ``BasicBlock`` for the function. Because the entry block
3394 for the function is always the first block, this returns the first block of
3395 the ``Function``.
3396
3397* | ``Type *getReturnType()``
3398 | ``FunctionType *getFunctionType()``
3399
3400 This traverses the Type_ of the ``Function`` and returns the return type of
3401 the function, or the FunctionType_ of the actual function.
3402
3403* ``SymbolTable *getSymbolTable()``
3404
3405 Return a pointer to the SymbolTable_ for this ``Function``.
3406
3407.. _GlobalVariable:
3408
3409The ``GlobalVariable`` class
3410----------------------------
3411
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003412``#include "llvm/IR/GlobalVariable.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003413
3414header source: `GlobalVariable.h
3415<http://llvm.org/doxygen/GlobalVariable_8h-source.html>`_
3416
3417doxygen info: `GlobalVariable Class
3418<http://llvm.org/doxygen/classllvm_1_1GlobalVariable.html>`_
3419
3420Superclasses: GlobalValue_, Constant_, User_, Value_
3421
3422Global variables are represented with the (surprise surprise) ``GlobalVariable``
3423class. Like functions, ``GlobalVariable``\ s are also subclasses of
3424GlobalValue_, and as such are always referenced by their address (global values
3425must live in memory, so their "name" refers to their constant address). See
3426GlobalValue_ for more on this. Global variables may have an initial value
3427(which must be a Constant_), and if they have an initializer, they may be marked
3428as "constant" themselves (indicating that their contents never change at
3429runtime).
3430
3431.. _m_GlobalVariable:
3432
3433Important Public Members of the ``GlobalVariable`` class
3434^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3435
3436* ``GlobalVariable(const Type *Ty, bool isConstant, LinkageTypes &Linkage,
3437 Constant *Initializer = 0, const std::string &Name = "", Module* Parent = 0)``
3438
3439 Create a new global variable of the specified type. If ``isConstant`` is true
3440 then the global variable will be marked as unchanging for the program. The
3441 Linkage parameter specifies the type of linkage (internal, external, weak,
3442 linkonce, appending) for the variable. If the linkage is InternalLinkage,
3443 WeakAnyLinkage, WeakODRLinkage, LinkOnceAnyLinkage or LinkOnceODRLinkage, then
3444 the resultant global variable will have internal linkage. AppendingLinkage
3445 concatenates together all instances (in different translation units) of the
3446 variable into a single variable but is only applicable to arrays. See the
3447 `LLVM Language Reference <LangRef.html#modulestructure>`_ for further details
3448 on linkage types. Optionally an initializer, a name, and the module to put
3449 the variable into may be specified for the global variable as well.
3450
3451* ``bool isConstant() const``
3452
3453 Returns true if this is a global variable that is known not to be modified at
3454 runtime.
3455
3456* ``bool hasInitializer()``
3457
3458 Returns true if this ``GlobalVariable`` has an intializer.
3459
3460* ``Constant *getInitializer()``
3461
3462 Returns the initial value for a ``GlobalVariable``. It is not legal to call
3463 this method if there is no initializer.
3464
3465.. _BasicBlock:
3466
3467The ``BasicBlock`` class
3468------------------------
3469
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003470``#include "llvm/IR/BasicBlock.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003471
3472header source: `BasicBlock.h
3473<http://llvm.org/doxygen/BasicBlock_8h-source.html>`_
3474
3475doxygen info: `BasicBlock Class
3476<http://llvm.org/doxygen/classllvm_1_1BasicBlock.html>`_
3477
3478Superclass: Value_
3479
3480This class represents a single entry single exit section of the code, commonly
3481known as a basic block by the compiler community. The ``BasicBlock`` class
3482maintains a list of Instruction_\ s, which form the body of the block. Matching
3483the language definition, the last element of this list of instructions is always
3484a terminator instruction (a subclass of the TerminatorInst_ class).
3485
3486In addition to tracking the list of instructions that make up the block, the
3487``BasicBlock`` class also keeps track of the :ref:`Function <c_Function>` that
3488it is embedded into.
3489
3490Note that ``BasicBlock``\ s themselves are Value_\ s, because they are
3491referenced by instructions like branches and can go in the switch tables.
3492``BasicBlock``\ s have type ``label``.
3493
3494.. _m_BasicBlock:
3495
3496Important Public Members of the ``BasicBlock`` class
3497^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3498
3499* ``BasicBlock(const std::string &Name = "", Function *Parent = 0)``
3500
3501 The ``BasicBlock`` constructor is used to create new basic blocks for
3502 insertion into a function. The constructor optionally takes a name for the
3503 new block, and a :ref:`Function <c_Function>` to insert it into. If the
3504 ``Parent`` parameter is specified, the new ``BasicBlock`` is automatically
3505 inserted at the end of the specified :ref:`Function <c_Function>`, if not
3506 specified, the BasicBlock must be manually inserted into the :ref:`Function
3507 <c_Function>`.
3508
3509* | ``BasicBlock::iterator`` - Typedef for instruction list iterator
3510 | ``BasicBlock::const_iterator`` - Typedef for const_iterator.
3511 | ``begin()``, ``end()``, ``front()``, ``back()``,
3512 ``size()``, ``empty()``
3513 STL-style functions for accessing the instruction list.
3514
3515 These methods and typedefs are forwarding functions that have the same
3516 semantics as the standard library methods of the same names. These methods
3517 expose the underlying instruction list of a basic block in a way that is easy
3518 to manipulate. To get the full complement of container operations (including
3519 operations to update the list), you must use the ``getInstList()`` method.
3520
3521* ``BasicBlock::InstListType &getInstList()``
3522
3523 This method is used to get access to the underlying container that actually
3524 holds the Instructions. This method must be used when there isn't a
3525 forwarding function in the ``BasicBlock`` class for the operation that you
3526 would like to perform. Because there are no forwarding functions for
3527 "updating" operations, you need to use this if you want to update the contents
3528 of a ``BasicBlock``.
3529
3530* ``Function *getParent()``
3531
3532 Returns a pointer to :ref:`Function <c_Function>` the block is embedded into,
3533 or a null pointer if it is homeless.
3534
3535* ``TerminatorInst *getTerminator()``
3536
3537 Returns a pointer to the terminator instruction that appears at the end of the
3538 ``BasicBlock``. If there is no terminator instruction, or if the last
3539 instruction in the block is not a terminator, then a null pointer is returned.
3540
3541.. _Argument:
3542
3543The ``Argument`` class
3544----------------------
3545
3546This subclass of Value defines the interface for incoming formal arguments to a
3547function. A Function maintains a list of its formal arguments. An argument has
3548a pointer to the parent Function.
3549
3550