blob: ffc022eef168d8f96e0006125b140cda93f29958 [file] [log] [blame]
Sean Silvabeb15ca2012-12-04 03:20:08 +00001========================
2LLVM Programmer's Manual
3========================
4
5.. contents::
6 :local:
7
8.. warning::
Chris Lattner045a73e2013-01-10 21:24:04 +00009 This is always a work in progress.
Sean Silvabeb15ca2012-12-04 03:20:08 +000010
11.. _introduction:
12
13Introduction
14============
15
16This document is meant to highlight some of the important classes and interfaces
17available in the LLVM source-base. This manual is not intended to explain what
18LLVM is, how it works, and what LLVM code looks like. It assumes that you know
19the basics of LLVM and are interested in writing transformations or otherwise
20analyzing or manipulating the code.
21
22This document should get you oriented so that you can find your way in the
23continuously growing source code that makes up the LLVM infrastructure. Note
24that this manual is not intended to serve as a replacement for reading the
25source code, so if you think there should be a method in one of these classes to
26do something, but it's not listed, check the source. Links to the `doxygen
27<http://llvm.org/doxygen/>`__ sources are provided to make this as easy as
28possible.
29
30The first section of this document describes general information that is useful
31to know when working in the LLVM infrastructure, and the second describes the
32Core LLVM classes. In the future this manual will be extended with information
33describing how to use extension libraries, such as dominator information, CFG
34traversal routines, and useful utilities like the ``InstVisitor`` (`doxygen
35<http://llvm.org/doxygen/InstVisitor_8h-source.html>`__) template.
36
37.. _general:
38
39General Information
40===================
41
42This section contains general information that is useful if you are working in
43the LLVM source-base, but that isn't specific to any particular API.
44
45.. _stl:
46
47The C++ Standard Template Library
48---------------------------------
49
50LLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much
51more than you are used to, or have seen before. Because of this, you might want
52to do a little background reading in the techniques used and capabilities of the
53library. There are many good pages that discuss the STL, and several books on
54the subject that you can get, so it will not be discussed in this document.
55
56Here are some useful links:
57
Sean Silva4b587852012-12-04 03:30:36 +000058#. `cppreference.com
59 <http://en.cppreference.com/w/>`_ - an excellent
Sean Silvabeb15ca2012-12-04 03:20:08 +000060 reference for the STL and other parts of the standard C++ library.
61
62#. `C++ In a Nutshell <http://www.tempest-sw.com/cpp/>`_ - This is an O'Reilly
63 book in the making. It has a decent Standard Library Reference that rivals
64 Dinkumware's, and is unfortunately no longer free since the book has been
65 published.
66
67#. `C++ Frequently Asked Questions <http://www.parashift.com/c++-faq-lite/>`_.
68
69#. `SGI's STL Programmer's Guide <http://www.sgi.com/tech/stl/>`_ - Contains a
70 useful `Introduction to the STL
71 <http://www.sgi.com/tech/stl/stl_introduction.html>`_.
72
73#. `Bjarne Stroustrup's C++ Page
74 <http://www.research.att.com/%7Ebs/C++.html>`_.
75
76#. `Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0
Sean Silvac454f07e32012-12-04 03:45:27 +000077 (even better, get the book)
78 <http://www.mindview.net/Books/TICPP/ThinkingInCPP2e.html>`_.
Sean Silvabeb15ca2012-12-04 03:20:08 +000079
Sean Silva92a44892013-01-11 02:28:08 +000080You are also encouraged to take a look at the :doc:`LLVM Coding Standards
81<CodingStandards>` guide which focuses on how to write maintainable code more
Sean Silvabeb15ca2012-12-04 03:20:08 +000082than where to put your curly braces.
83
84.. _resources:
85
86Other useful references
87-----------------------
88
89#. `Using static and shared libraries across platforms
90 <http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html>`_
91
92.. _apis:
93
94Important and useful LLVM APIs
95==============================
96
97Here we highlight some LLVM APIs that are generally useful and good to know
98about when writing transformations.
99
100.. _isa:
101
102The ``isa<>``, ``cast<>`` and ``dyn_cast<>`` templates
103------------------------------------------------------
104
105The LLVM source-base makes extensive use of a custom form of RTTI. These
106templates have many similarities to the C++ ``dynamic_cast<>`` operator, but
107they don't have some drawbacks (primarily stemming from the fact that
108``dynamic_cast<>`` only works on classes that have a v-table). Because they are
109used so often, you must know what they do and how they work. All of these
110templates are defined in the ``llvm/Support/Casting.h`` (`doxygen
111<http://llvm.org/doxygen/Casting_8h-source.html>`__) file (note that you very
112rarely have to include this file directly).
113
114``isa<>``:
115 The ``isa<>`` operator works exactly like the Java "``instanceof``" operator.
116 It returns true or false depending on whether a reference or pointer points to
117 an instance of the specified class. This can be very useful for constraint
118 checking of various sorts (example below).
119
120``cast<>``:
121 The ``cast<>`` operator is a "checked cast" operation. It converts a pointer
122 or reference from a base class to a derived class, causing an assertion
123 failure if it is not really an instance of the right type. This should be
124 used in cases where you have some information that makes you believe that
125 something is of the right type. An example of the ``isa<>`` and ``cast<>``
126 template is:
127
128 .. code-block:: c++
129
130 static bool isLoopInvariant(const Value *V, const Loop *L) {
131 if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
132 return true;
133
134 // Otherwise, it must be an instruction...
135 return !L->contains(cast<Instruction>(V)->getParent());
136 }
137
138 Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``,
139 for that use the ``dyn_cast<>`` operator.
140
141``dyn_cast<>``:
142 The ``dyn_cast<>`` operator is a "checking cast" operation. It checks to see
143 if the operand is of the specified type, and if so, returns a pointer to it
144 (this operator does not work with references). If the operand is not of the
145 correct type, a null pointer is returned. Thus, this works very much like
146 the ``dynamic_cast<>`` operator in C++, and should be used in the same
147 circumstances. Typically, the ``dyn_cast<>`` operator is used in an ``if``
148 statement or some other flow control statement like this:
149
150 .. code-block:: c++
151
Piotr Padlewskidb8d7c82016-11-11 22:12:15 +0000152 if (auto *AI = dyn_cast<AllocationInst>(Val)) {
Sean Silvabeb15ca2012-12-04 03:20:08 +0000153 // ...
154 }
155
156 This form of the ``if`` statement effectively combines together a call to
157 ``isa<>`` and a call to ``cast<>`` into one statement, which is very
158 convenient.
159
160 Note that the ``dyn_cast<>`` operator, like C++'s ``dynamic_cast<>`` or Java's
161 ``instanceof`` operator, can be abused. In particular, you should not use big
162 chained ``if/then/else`` blocks to check for lots of different variants of
163 classes. If you find yourself wanting to do this, it is much cleaner and more
164 efficient to use the ``InstVisitor`` class to dispatch over the instruction
165 type directly.
166
167``cast_or_null<>``:
168 The ``cast_or_null<>`` operator works just like the ``cast<>`` operator,
169 except that it allows for a null pointer as an argument (which it then
170 propagates). This can sometimes be useful, allowing you to combine several
171 null checks into one.
172
173``dyn_cast_or_null<>``:
174 The ``dyn_cast_or_null<>`` operator works just like the ``dyn_cast<>``
175 operator, except that it allows for a null pointer as an argument (which it
176 then propagates). This can sometimes be useful, allowing you to combine
177 several null checks into one.
178
179These five templates can be used with any classes, whether they have a v-table
180or not. If you want to add support for these templates, see the document
Sean Silva92a44892013-01-11 02:28:08 +0000181:doc:`How to set up LLVM-style RTTI for your class hierarchy
182<HowToSetUpLLVMStyleRTTI>`
Sean Silvabeb15ca2012-12-04 03:20:08 +0000183
184.. _string_apis:
185
186Passing strings (the ``StringRef`` and ``Twine`` classes)
187---------------------------------------------------------
188
189Although LLVM generally does not do much string manipulation, we do have several
190important APIs which take strings. Two important examples are the Value class
191-- which has names for instructions, functions, etc. -- and the ``StringMap``
192class which is used extensively in LLVM and Clang.
193
194These are generic classes, and they need to be able to accept strings which may
195have embedded null characters. Therefore, they cannot simply take a ``const
196char *``, and taking a ``const std::string&`` requires clients to perform a heap
197allocation which is usually unnecessary. Instead, many LLVM APIs use a
198``StringRef`` or a ``const Twine&`` for passing strings efficiently.
199
200.. _StringRef:
201
202The ``StringRef`` class
203^^^^^^^^^^^^^^^^^^^^^^^^^^^^
204
205The ``StringRef`` data type represents a reference to a constant string (a
206character array and a length) and supports the common operations available on
207``std::string``, but does not require heap allocation.
208
209It can be implicitly constructed using a C style null-terminated string, an
210``std::string``, or explicitly with a character pointer and length. For
211example, the ``StringRef`` find function is declared as:
212
213.. code-block:: c++
214
215 iterator find(StringRef Key);
216
217and clients can call it using any one of:
218
219.. code-block:: c++
220
221 Map.find("foo"); // Lookup "foo"
222 Map.find(std::string("bar")); // Lookup "bar"
223 Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
224
225Similarly, APIs which need to return a string may return a ``StringRef``
226instance, which can be used directly or converted to an ``std::string`` using
227the ``str`` member function. See ``llvm/ADT/StringRef.h`` (`doxygen
228<http://llvm.org/doxygen/classllvm_1_1StringRef_8h-source.html>`__) for more
229information.
230
231You should rarely use the ``StringRef`` class directly, because it contains
232pointers to external memory it is not generally safe to store an instance of the
233class (unless you know that the external storage will not be freed).
234``StringRef`` is small and pervasive enough in LLVM that it should always be
235passed by value.
236
237The ``Twine`` class
238^^^^^^^^^^^^^^^^^^^
239
240The ``Twine`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Twine.html>`__)
241class is an efficient way for APIs to accept concatenated strings. For example,
242a common LLVM paradigm is to name one instruction based on the name of another
243instruction with a suffix, for example:
244
245.. code-block:: c++
246
247 New = CmpInst::Create(..., SO->getName() + ".cmp");
248
249The ``Twine`` class is effectively a lightweight `rope
250<http://en.wikipedia.org/wiki/Rope_(computer_science)>`_ which points to
251temporary (stack allocated) objects. Twines can be implicitly constructed as
252the result of the plus operator applied to strings (i.e., a C strings, an
253``std::string``, or a ``StringRef``). The twine delays the actual concatenation
254of strings until it is actually required, at which point it can be efficiently
255rendered directly into a character array. This avoids unnecessary heap
256allocation involved in constructing the temporary results of string
257concatenation. See ``llvm/ADT/Twine.h`` (`doxygen
258<http://llvm.org/doxygen/Twine_8h_source.html>`__) and :ref:`here <dss_twine>`
259for more information.
260
261As with a ``StringRef``, ``Twine`` objects point to external memory and should
262almost never be stored or mentioned directly. They are intended solely for use
263when defining a function which should be able to efficiently accept concatenated
264strings.
265
Zachary Turner11db2642016-11-11 23:57:40 +0000266.. _formatting_strings:
267
268Formatting strings (the ``formatv`` function)
269---------------------------------------------
270While LLVM doesn't necessarily do a lot of string manipulation and parsing, it
271does do a lot of string formatting. From diagnostic messages, to llvm tool
272outputs such as ``llvm-readobj`` to printing verbose disassembly listings and
273LLDB runtime logging, the need for string formatting is pervasive.
274
275The ``formatv`` is similar in spirit to ``printf``, but uses a different syntax
276which borrows heavily from Python and C#. Unlike ``printf`` it deduces the type
277to be formatted at compile time, so it does not need a format specifier such as
278``%d``. This reduces the mental overhead of trying to construct portable format
279strings, especially for platform-specific types like ``size_t`` or pointer types.
280Unlike both ``printf`` and Python, it additionally fails to compile if LLVM does
281not know how to format the type. These two properties ensure that the function
282is both safer and simpler to use than traditional formatting methods such as
283the ``printf`` family of functions.
284
285Simple formatting
286^^^^^^^^^^^^^^^^^
287
288A call to ``formatv`` involves a single **format string** consisting of 0 or more
289**replacement sequences**, followed by a variable length list of **replacement values**.
290A replacement sequence is a string of the form ``{N[[,align]:style]}``.
291
292``N`` refers to the 0-based index of the argument from the list of replacement
293values. Note that this means it is possible to reference the same parameter
294multiple times, possibly with different style and/or alignment options, in any order.
295
296``align`` is an optional string specifying the width of the field to format
297the value into, and the alignment of the value within the field. It is specified as
298an optional **alignment style** followed by a positive integral **field width**. The
299alignment style can be one of the characters ``-`` (left align), ``=`` (center align),
300or ``+`` (right align). The default is right aligned.
301
302``style`` is an optional string consisting of a type specific that controls the
303formatting of the value. For example, to format a floating point value as a percentage,
304you can use the style option ``P``.
305
306Custom formatting
307^^^^^^^^^^^^^^^^^
308
309There are two ways to customize the formatting behavior for a type.
310
3111. Provide a template specialization of ``llvm::format_provider<T>`` for your
312 type ``T`` with the appropriate static format method.
313
314 .. code-block:: c++
315
316 namespace llvm {
317 template<>
318 struct format_provider<MyFooBar> {
319 static void format(const MyFooBar &V, raw_ostream &Stream, StringRef Style) {
320 // Do whatever is necessary to format `V` into `Stream`
321 }
322 };
323 void foo() {
324 MyFooBar X;
325 std::string S = formatv("{0}", X);
326 }
327 }
328
329 This is a useful extensibility mechanism for adding support for formatting your own
330 custom types with your own custom Style options. But it does not help when you want
331 to extend the mechanism for formatting a type that the library already knows how to
332 format. For that, we need something else.
333
Pavel Labath08c2e862016-12-15 09:40:27 +00003342. Provide a **format adapter** inheriting from ``llvm::FormatAdapter<T>``.
Zachary Turner11db2642016-11-11 23:57:40 +0000335
336 .. code-block:: c++
337
338 namespace anything {
Pavel Labath08c2e862016-12-15 09:40:27 +0000339 struct format_int_custom : public llvm::FormatAdapter<int> {
340 explicit format_int_custom(int N) : llvm::FormatAdapter<int>(N) {}
341 void format(llvm::raw_ostream &Stream, StringRef Style) override {
342 // Do whatever is necessary to format ``this->Item`` into ``Stream``
Zachary Turner11db2642016-11-11 23:57:40 +0000343 }
344 };
345 }
346 namespace llvm {
347 void foo() {
348 std::string S = formatv("{0}", anything::format_int_custom(42));
349 }
350 }
351
Pavel Labath08c2e862016-12-15 09:40:27 +0000352 If the type is detected to be derived from ``FormatAdapter<T>``, ``formatv``
353 will call the
Zachary Turner11db2642016-11-11 23:57:40 +0000354 ``format`` method on the argument passing in the specified style. This allows
355 one to provide custom formatting of any type, including one which already has
356 a builtin format provider.
357
358``formatv`` Examples
359^^^^^^^^^^^^^^^^^^^^
360Below is intended to provide an incomplete set of examples demonstrating
361the usage of ``formatv``. More information can be found by reading the
362doxygen documentation or by looking at the unit test suite.
363
364
365.. code-block:: c++
366
367 std::string S;
368 // Simple formatting of basic types and implicit string conversion.
369 S = formatv("{0} ({1:P})", 7, 0.35); // S == "7 (35.00%)"
370
371 // Out-of-order referencing and multi-referencing
372 outs() << formatv("{0} {2} {1} {0}", 1, "test", 3); // prints "1 3 test 1"
373
374 // Left, right, and center alignment
375 S = formatv("{0,7}", 'a'); // S == " a";
376 S = formatv("{0,-7}", 'a'); // S == "a ";
377 S = formatv("{0,=7}", 'a'); // S == " a ";
378 S = formatv("{0,+7}", 'a'); // S == " a";
379
380 // Custom styles
381 S = formatv("{0:N} - {0:x} - {1:E}", 12345, 123908342); // S == "12,345 - 0x3039 - 1.24E8"
382
383 // Adapters
384 S = formatv("{0}", fmt_align(42, AlignStyle::Center, 7)); // S == " 42 "
385 S = formatv("{0}", fmt_repeat("hi", 3)); // S == "hihihi"
386 S = formatv("{0}", fmt_pad("hi", 2, 6)); // S == " hi "
387
388 // Ranges
389 std::vector<int> V = {8, 9, 10};
390 S = formatv("{0}", make_range(V.begin(), V.end())); // S == "8, 9, 10"
391 S = formatv("{0:$[+]}", make_range(V.begin(), V.end())); // S == "8+9+10"
392 S = formatv("{0:$[ + ]@[x]}", make_range(V.begin(), V.end())); // S == "0x8 + 0x9 + 0xA"
393
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000394.. _error_apis:
395
396Error handling
397--------------
398
399Proper error handling helps us identify bugs in our code, and helps end-users
400understand errors in their tool usage. Errors fall into two broad categories:
401*programmatic* and *recoverable*, with different strategies for handling and
402reporting.
403
404Programmatic Errors
405^^^^^^^^^^^^^^^^^^^
406
407Programmatic errors are violations of program invariants or API contracts, and
408represent bugs within the program itself. Our aim is to document invariants, and
409to abort quickly at the point of failure (providing some basic diagnostic) when
410invariants are broken at runtime.
411
412The fundamental tools for handling programmatic errors are assertions and the
413llvm_unreachable function. Assertions are used to express invariant conditions,
414and should include a message describing the invariant:
415
416.. code-block:: c++
417
418 assert(isPhysReg(R) && "All virt regs should have been allocated already.");
419
420The llvm_unreachable function can be used to document areas of control flow
421that should never be entered if the program invariants hold:
422
423.. code-block:: c++
424
425 enum { Foo, Bar, Baz } X = foo();
426
427 switch (X) {
428 case Foo: /* Handle Foo */; break;
429 case Bar: /* Handle Bar */; break;
430 default:
431 llvm_unreachable("X should be Foo or Bar here");
432 }
433
434Recoverable Errors
435^^^^^^^^^^^^^^^^^^
436
437Recoverable errors represent an error in the program's environment, for example
438a resource failure (a missing file, a dropped network connection, etc.), or
439malformed input. These errors should be detected and communicated to a level of
440the program where they can be handled appropriately. Handling the error may be
441as simple as reporting the issue to the user, or it may involve attempts at
442recovery.
443
444Recoverable errors are modeled using LLVM's ``Error`` scheme. This scheme
445represents errors using function return values, similar to classic C integer
446error codes, or C++'s ``std::error_code``. However, the ``Error`` class is
447actually a lightweight wrapper for user-defined error types, allowing arbitrary
448information to be attached to describe the error. This is similar to the way C++
449exceptions allow throwing of user-defined types.
450
Lang Hames42f5dd82016-09-02 03:46:08 +0000451Success values are created by calling ``Error::success()``, E.g.:
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000452
453.. code-block:: c++
454
455 Error foo() {
456 // Do something.
457 // Return success.
458 return Error::success();
459 }
460
461Success values are very cheap to construct and return - they have minimal
462impact on program performance.
463
464Failure values are constructed using ``make_error<T>``, where ``T`` is any class
Lang Hames42f5dd82016-09-02 03:46:08 +0000465that inherits from the ErrorInfo utility, E.g.:
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000466
467.. code-block:: c++
Kostya Serebryanyaf67fd12016-10-27 20:14:03 +0000468
Lang Hames03a88cc2016-10-25 21:19:30 +0000469 class BadFileFormat : public ErrorInfo<BadFileFormat> {
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000470 public:
Reid Klecknera15b76b2016-03-24 23:49:34 +0000471 static char ID;
Lang Hames03a88cc2016-10-25 21:19:30 +0000472 std::string Path;
473
474 BadFileFormat(StringRef Path) : Path(Path.str()) {}
475
476 void log(raw_ostream &OS) const override {
477 OS << Path << " is malformed";
478 }
479
480 std::error_code convertToErrorCode() const override {
481 return make_error_code(object_error::parse_failed);
482 }
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000483 };
484
Lang Hames03a88cc2016-10-25 21:19:30 +0000485 char FileExists::ID; // This should be declared in the C++ file.
Reid Klecknera15b76b2016-03-24 23:49:34 +0000486
Lang Hames03a88cc2016-10-25 21:19:30 +0000487 Error printFormattedFile(StringRef Path) {
488 if (<check for valid format>)
489 return make_error<InvalidObjectFile>(Path);
490 // print file contents.
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000491 return Error::success();
492 }
493
Lang Hamesa0f517f2016-03-23 03:18:16 +0000494Error values can be implicitly converted to bool: true for error, false for
495success, enabling the following idiom:
496
Justin Bogner91269bf2016-03-23 22:54:19 +0000497.. code-block:: c++
Lang Hamesa0f517f2016-03-23 03:18:16 +0000498
Lang Hames1684d7c2016-03-24 18:05:21 +0000499 Error mayFail();
Lang Hamesa0f517f2016-03-23 03:18:16 +0000500
Lang Hames1684d7c2016-03-24 18:05:21 +0000501 Error foo() {
502 if (auto Err = mayFail())
503 return Err;
504 // Success! We can proceed.
505 ...
Lang Hamesa0f517f2016-03-23 03:18:16 +0000506
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000507For functions that can fail but need to return a value the ``Expected<T>``
508utility can be used. Values of this type can be constructed with either a
Lang Hames42f5dd82016-09-02 03:46:08 +0000509``T``, or an ``Error``. Expected<T> values are also implicitly convertible to
Lang Hames03a88cc2016-10-25 21:19:30 +0000510boolean, but with the opposite convention to ``Error``: true for success, false
511for error. If success, the ``T`` value can be accessed via the dereference
512operator. If failure, the ``Error`` value can be extracted using the
513``takeError()`` method. Idiomatic usage looks like:
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000514
515.. code-block:: c++
516
Lang Hames03a88cc2016-10-25 21:19:30 +0000517 Expected<FormattedFile> openFormattedFile(StringRef Path) {
518 // If badly formatted, return an error.
519 if (auto Err = checkFormat(Path))
520 return std::move(Err);
521 // Otherwise return a FormattedFile instance.
522 return FormattedFile(Path);
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000523 }
524
Lang Hames03a88cc2016-10-25 21:19:30 +0000525 Error processFormattedFile(StringRef Path) {
526 // Try to open a formatted file
527 if (auto FileOrErr = openFormattedFile(Path)) {
528 // On success, grab a reference to the file and continue.
529 auto &File = *FileOrErr;
Lang Hames497fd942016-10-25 22:41:54 +0000530 ...
Lang Hamesca20d9e2016-10-25 22:38:50 +0000531 } else
532 // On error, extract the Error value and return it.
Lang Hames03a88cc2016-10-25 21:19:30 +0000533 return FileOrErr.takeError();
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000534 }
535
Lang Hames03a88cc2016-10-25 21:19:30 +0000536If an ``Expected<T>`` value is in success mode then the ``takeError()`` method
537will return a success value. Using this fact, the above function can be
538rewritten as:
539
540.. code-block:: c++
541
542 Error processFormattedFile(StringRef Path) {
543 // Try to open a formatted file
544 auto FileOrErr = openFormattedFile(Path);
545 if (auto Err = FileOrErr.takeError())
546 // On error, extract the Error value and return it.
547 return Err;
548 // On success, grab a reference to the file and continue.
549 auto &File = *FileOrErr;
Lang Hames497fd942016-10-25 22:41:54 +0000550 ...
Lang Hames03a88cc2016-10-25 21:19:30 +0000551 }
552
553This second form is often more readable for functions that involve multiple
554``Expected<T>`` values as it limits the indentation required.
555
556All ``Error`` instances, whether success or failure, must be either checked or
557moved from (via ``std::move`` or a return) before they are destructed.
558Accidentally discarding an unchecked error will cause a program abort at the
559point where the unchecked value's destructor is run, making it easy to identify
560and fix violations of this rule.
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000561
562Success values are considered checked once they have been tested (by invoking
563the boolean conversion operator):
564
565.. code-block:: c++
566
567 if (auto Err = canFail(...))
568 return Err; // Failure value - move error to caller.
569
570 // Safe to continue: Err was checked.
571
Lang Hamesc5d41d42016-09-02 03:50:50 +0000572In contrast, the following code will always cause an abort, even if ``canFail``
573returns a success value:
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000574
575.. code-block:: c++
576
577 canFail();
578 // Program will always abort here, even if canFail() returns Success, since
579 // the value is not checked.
580
581Failure values are considered checked once a handler for the error type has
582been activated:
583
584.. code-block:: c++
585
Lang Hames03a88cc2016-10-25 21:19:30 +0000586 handleErrors(
Kostya Serebryanya1f87e52016-10-31 21:10:26 +0000587 processFormattedFile(...),
Lang Hames03a88cc2016-10-25 21:19:30 +0000588 [](const BadFileFormat &BFF) {
Kostya Serebryanya1f87e52016-10-31 21:10:26 +0000589 report("Unable to process " + BFF.Path + ": bad format");
Lang Hames03a88cc2016-10-25 21:19:30 +0000590 },
591 [](const FileNotFound &FNF) {
592 report("File not found " + FNF.Path);
593 });
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000594
Lang Hames03a88cc2016-10-25 21:19:30 +0000595The ``handleErrors`` function takes an error as its first argument, followed by
596a variadic list of "handlers", each of which must be a callable type (a
597function, lambda, or class with a call operator) with one argument. The
598``handleErrors`` function will visit each handler in the sequence and check its
599argument type against the dynamic type of the error, running the first handler
Lang Hames19a23082016-11-07 22:33:13 +0000600that matches. This is the same decision process that is used decide which catch
601clause to run for a C++ exception.
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000602
Lang Hames03a88cc2016-10-25 21:19:30 +0000603Since the list of handlers passed to ``handleErrors`` may not cover every error
604type that can occur, the ``handleErrors`` function also returns an Error value
605that must be checked or propagated. If the error value that is passed to
606``handleErrors`` does not match any of the handlers it will be returned from
607handleErrors. Idiomatic use of ``handleErrors`` thus looks like:
608
609.. code-block:: c++
610
611 if (auto Err =
612 handleErrors(
613 processFormattedFile(...),
614 [](const BadFileFormat &BFF) {
Kostya Serebryanyb848eaf2016-11-01 05:51:12 +0000615 report("Unable to process " + BFF.Path + ": bad format");
Lang Hames03a88cc2016-10-25 21:19:30 +0000616 },
617 [](const FileNotFound &FNF) {
618 report("File not found " + FNF.Path);
619 }))
620 return Err;
621
622In cases where you truly know that the handler list is exhaustive the
623``handleAllErrors`` function can be used instead. This is identical to
624``handleErrors`` except that it will terminate the program if an unhandled
625error is passed in, and can therefore return void. The ``handleAllErrors``
626function should generally be avoided: the introduction of a new error type
627elsewhere in the program can easily turn a formerly exhaustive list of errors
628into a non-exhaustive list, risking unexpected program termination. Where
629possible, use handleErrors and propagate unknown errors up the stack instead.
630
Lang Hames19a23082016-11-07 22:33:13 +0000631For tool code, where errors can be handled by printing an error message then
632exiting with an error code, the :ref:`ExitOnError <err_exitonerr>` utility
633may be a better choice than handleErrors, as it simplifies control flow when
634calling fallible functions.
635
Lang Hames03a88cc2016-10-25 21:19:30 +0000636StringError
637"""""""""""
638
639Many kinds of errors have no recovery strategy, the only action that can be
640taken is to report them to the user so that the user can attempt to fix the
641environment. In this case representing the error as a string makes perfect
Lang Hames6b19ce62016-10-25 22:22:48 +0000642sense. LLVM provides the ``StringError`` class for this purpose. It takes two
Lang Hames03a88cc2016-10-25 21:19:30 +0000643arguments: A string error message, and an equivalent ``std::error_code`` for
644interoperability:
645
646.. code-block:: c++
647
648 make_error<StringError>("Bad executable",
649 make_error_code(errc::executable_format_error"));
650
651If you're certain that the error you're building will never need to be converted
652to a ``std::error_code`` you can use the ``inconvertibleErrorCode()`` function:
653
654.. code-block:: c++
655
656 make_error<StringError>("Bad executable", inconvertibleErrorCode());
657
658This should be done only after careful consideration. If any attempt is made to
659convert this error to a ``std::error_code`` it will trigger immediate program
660termination. Unless you are certain that your errors will not need
661interoperability you should look for an existing ``std::error_code`` that you
662can convert to, and even (as painful as it is) consider introducing a new one as
663a stopgap measure.
664
665Interoperability with std::error_code and ErrorOr
666"""""""""""""""""""""""""""""""""""""""""""""""""
667
668Many existing LLVM APIs use ``std::error_code`` and its partner ``ErrorOr<T>``
669(which plays the same role as ``Expected<T>``, but wraps a ``std::error_code``
670rather than an ``Error``). The infectious nature of error types means that an
671attempt to change one of these functions to return ``Error`` or ``Expected<T>``
672instead often results in an avalanche of changes to callers, callers of callers,
673and so on. (The first such attempt, returning an ``Error`` from
Kostya Serebryanyb848eaf2016-11-01 05:51:12 +0000674MachOObjectFile's constructor, was abandoned after the diff reached 3000 lines,
Lang Hames03a88cc2016-10-25 21:19:30 +0000675impacted half a dozen libraries, and was still growing).
676
677To solve this problem, the ``Error``/``std::error_code`` interoperability requirement was
678introduced. Two pairs of functions allow any ``Error`` value to be converted to a
679``std::error_code``, any ``Expected<T>`` to be converted to an ``ErrorOr<T>``, and vice
680versa:
681
682.. code-block:: c++
683
684 std::error_code errorToErrorCode(Error Err);
685 Error errorCodeToError(std::error_code EC);
686
687 template <typename T> ErrorOr<T> expectedToErrorOr(Expected<T> TOrErr);
688 template <typename T> Expected<T> errorOrToExpected(ErrorOr<T> TOrEC);
689
690
691Using these APIs it is easy to make surgical patches that update individual
692functions from ``std::error_code`` to ``Error``, and from ``ErrorOr<T>`` to
693``Expected<T>``.
694
695Returning Errors from error handlers
696""""""""""""""""""""""""""""""""""""
697
698Error recovery attempts may themselves fail. For that reason, ``handleErrors``
699actually recognises three different forms of handler signature:
700
701.. code-block:: c++
702
703 // Error must be handled, no new errors produced:
704 void(UserDefinedError &E);
705
706 // Error must be handled, new errors can be produced:
707 Error(UserDefinedError &E);
708
709 // Original error can be inspected, then re-wrapped and returned (or a new
710 // error can be produced):
711 Error(std::unique_ptr<UserDefinedError> E);
712
713Any error returned from a handler will be returned from the ``handleErrors``
714function so that it can be handled itself, or propagated up the stack.
715
Lang Hames19a23082016-11-07 22:33:13 +0000716.. _err_exitonerr:
717
Lang Hames03a88cc2016-10-25 21:19:30 +0000718Using ExitOnError to simplify tool code
719"""""""""""""""""""""""""""""""""""""""
720
721Library code should never call ``exit`` for a recoverable error, however in tool
Lang Hames6b19ce62016-10-25 22:22:48 +0000722code (especially command line tools) this can be a reasonable approach. Calling
Lang Hames03a88cc2016-10-25 21:19:30 +0000723``exit`` upon encountering an error dramatically simplifies control flow as the
724error no longer needs to be propagated up the stack. This allows code to be
725written in straight-line style, as long as each fallible call is wrapped in a
Lang Hames4f8a9602016-10-25 22:35:55 +0000726check and call to exit. The ``ExitOnError`` class supports this pattern by
Lang Hames03a88cc2016-10-25 21:19:30 +0000727providing call operators that inspect ``Error`` values, stripping the error away
728in the success case and logging to ``stderr`` then exiting in the failure case.
729
730To use this class, declare a global ``ExitOnError`` variable in your program:
731
732.. code-block:: c++
733
734 ExitOnError ExitOnErr;
735
736Calls to fallible functions can then be wrapped with a call to ``ExitOnErr``,
737turning them into non-failing calls:
738
739.. code-block:: c++
740
741 Error mayFail();
742 Expected<int> mayFail2();
743
744 void foo() {
745 ExitOnErr(mayFail());
746 int X = ExitOnErr(mayFail2());
747 }
748
Kostya Serebryanyb848eaf2016-11-01 05:51:12 +0000749On failure, the error's log message will be written to ``stderr``, optionally
750preceded by a string "banner" that can be set by calling the setBanner method. A
Lang Hames03a88cc2016-10-25 21:19:30 +0000751mapping can also be supplied from ``Error`` values to exit codes using the
752``setExitCodeMapper`` method:
753
Lang Hames7a9ca33372016-10-25 22:25:07 +0000754.. code-block:: c++
755
756 int main(int argc, char *argv[]) {
Kostya Serebryanyb848eaf2016-11-01 05:51:12 +0000757 ExitOnErr.setBanner(std::string(argv[0]) + " error:");
Lang Hames7a9ca33372016-10-25 22:25:07 +0000758 ExitOnErr.setExitCodeMapper(
759 [](const Error &Err) {
760 if (Err.isA<BadFileFormat>())
761 return 2;
762 return 1;
763 });
Lang Hames03a88cc2016-10-25 21:19:30 +0000764
765Use ``ExitOnError`` in your tool code where possible as it can greatly improve
766readability.
767
768Fallible constructors
769"""""""""""""""""""""
770
771Some classes require resource acquisition or other complex initialization that
Kostya Serebryanyb848eaf2016-11-01 05:51:12 +0000772can fail during construction. Unfortunately constructors can't return errors,
773and having clients test objects after they're constructed to ensure that they're
774valid is error prone as it's all too easy to forget the test. To work around
Lang Hames03a88cc2016-10-25 21:19:30 +0000775this, use the named constructor idiom and return an ``Expected<T>``:
776
777.. code-block:: c++
778
779 class Foo {
780 public:
781
Lang Hames4f8a9602016-10-25 22:35:55 +0000782 static Expected<Foo> Create(Resource R1, Resource R2) {
Lang Hames03a88cc2016-10-25 21:19:30 +0000783 Error Err;
784 Foo F(R1, R2, Err);
785 if (Err)
786 return std::move(Err);
787 return std::move(F);
788 }
789
790 private:
791
792 Foo(Resource R1, Resource R2, Error &Err) {
793 ErrorAsOutParameter EAO(&Err);
794 if (auto Err2 = R1.acquire()) {
795 Err = std::move(Err2);
796 return;
797 }
798 Err = R2.acquire();
799 }
800 };
801
802
803Here, the named constructor passes an ``Error`` by reference into the actual
804constructor, which the constructor can then use to return errors. The
805``ErrorAsOutParameter`` utility sets the ``Error`` value's checked flag on entry
806to the constructor so that the error can be assigned to, then resets it on exit
807to force the client (the named constructor) to check the error.
808
809By using this idiom, clients attempting to construct a Foo receive either a
810well-formed Foo or an Error, never an object in an invalid state.
811
812Propagating and consuming errors based on types
813"""""""""""""""""""""""""""""""""""""""""""""""
814
815In some contexts, certain types of error are known to be benign. For example,
816when walking an archive, some clients may be happy to skip over badly formatted
817object files rather than terminating the walk immediately. Skipping badly
Lang Hames4f8a9602016-10-25 22:35:55 +0000818formatted objects could be achieved using an elaborate handler method, but the
Lang Hames03a88cc2016-10-25 21:19:30 +0000819Error.h header provides two utilities that make this idiom much cleaner: the
820type inspection method, ``isA``, and the ``consumeError`` function:
821
822.. code-block:: c++
823
824 Error walkArchive(Archive A) {
825 for (unsigned I = 0; I != A.numMembers(); ++I) {
826 auto ChildOrErr = A.getMember(I);
Lang Hames4f8a9602016-10-25 22:35:55 +0000827 if (auto Err = ChildOrErr.takeError()) {
Lang Hames03a88cc2016-10-25 21:19:30 +0000828 if (Err.isA<BadFileFormat>())
829 consumeError(std::move(Err))
830 else
831 return Err;
Lang Hames4f8a9602016-10-25 22:35:55 +0000832 }
833 auto &Child = *ChildOrErr;
Lang Hames497fd942016-10-25 22:41:54 +0000834 // Use Child
835 ...
Lang Hames03a88cc2016-10-25 21:19:30 +0000836 }
837 return Error::success();
838 }
839
840Concatenating Errors with joinErrors
841""""""""""""""""""""""""""""""""""""
842
843In the archive walking example above ``BadFileFormat`` errors are simply
844consumed and ignored. If the client had wanted report these errors after
845completing the walk over the archive they could use the ``joinErrors`` utility:
846
847.. code-block:: c++
848
849 Error walkArchive(Archive A) {
850 Error DeferredErrs = Error::success();
851 for (unsigned I = 0; I != A.numMembers(); ++I) {
852 auto ChildOrErr = A.getMember(I);
853 if (auto Err = ChildOrErr.takeError())
854 if (Err.isA<BadFileFormat>())
855 DeferredErrs = joinErrors(std::move(DeferredErrs), std::move(Err));
856 else
857 return Err;
858 auto &Child = *ChildOrErr;
Lang Hames497fd942016-10-25 22:41:54 +0000859 // Use Child
860 ...
Lang Hames03a88cc2016-10-25 21:19:30 +0000861 }
862 return DeferredErrs;
863 }
864
865The ``joinErrors`` routine builds a special error type called ``ErrorList``,
866which holds a list of user defined errors. The ``handleErrors`` routine
867recognizes this type and will attempt to handle each of the contained erorrs in
868order. If all contained errors can be handled, ``handleErrors`` will return
869``Error::success()``, otherwise ``handleErrors`` will concatenate the remaining
870errors and return the resulting ``ErrorList``.
871
872Building fallible iterators and iterator ranges
873"""""""""""""""""""""""""""""""""""""""""""""""
874
875The archive walking examples above retrieve archive members by index, however
876this requires considerable boiler-plate for iteration and error checking. We can
Lang Hames8009f612016-10-25 23:08:32 +0000877clean this up by using ``Error`` with the "fallible iterator" pattern. The usual
878C++ iterator patterns do not allow for failure on increment, but we can
879incorporate support for it by having iterators hold an Error reference through
880which they can report failure. In this pattern, if an increment operation fails
881the failure is recorded via the Error reference and the iterator value is set to
882the end of the range in order to terminate the loop. This ensures that the
883dereference operation is safe anywhere that an ordinary iterator dereference
884would be safe (i.e. when the iterator is not equal to end). Where this pattern
885is followed (as in the ``llvm::object::Archive`` class) the result is much
886cleaner iteration idiom:
Lang Hames03a88cc2016-10-25 21:19:30 +0000887
888.. code-block:: c++
889
890 Error Err;
891 for (auto &Child : Ar->children(Err)) {
Kostya Serebryanyb848eaf2016-11-01 05:51:12 +0000892 // Use Child - we only enter the loop when it's valid
Lang Hames497fd942016-10-25 22:41:54 +0000893 ...
Lang Hames03a88cc2016-10-25 21:19:30 +0000894 }
Kostya Serebryanyb848eaf2016-11-01 05:51:12 +0000895 // Check Err after the loop to ensure it didn't break due to an error.
Lang Hames03a88cc2016-10-25 21:19:30 +0000896 if (Err)
897 return Err;
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000898
Richard Smithddb2fde2014-05-06 07:45:39 +0000899.. _function_apis:
900
Lang Hamesf7f6d3e2016-03-16 01:02:46 +0000901More information on Error and its related utilities can be found in the
902Error.h header file.
903
Richard Smithddb2fde2014-05-06 07:45:39 +0000904Passing functions and other callable objects
905--------------------------------------------
906
907Sometimes you may want a function to be passed a callback object. In order to
908support lambda expressions and other function objects, you should not use the
909traditional C approach of taking a function pointer and an opaque cookie:
910
911.. code-block:: c++
912
913 void takeCallback(bool (*Callback)(Function *, void *), void *Cookie);
914
915Instead, use one of the following approaches:
916
917Function template
918^^^^^^^^^^^^^^^^^
919
920If you don't mind putting the definition of your function into a header file,
921make it a function template that is templated on the callable type.
922
923.. code-block:: c++
924
925 template<typename Callable>
926 void takeCallback(Callable Callback) {
927 Callback(1, 2, 3);
928 }
929
930The ``function_ref`` class template
931^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
932
933The ``function_ref``
Sean Silva96faef22016-07-09 23:08:14 +0000934(`doxygen <http://llvm.org/docs/doxygen/html/classllvm_1_1function__ref_3_01Ret_07Params_8_8_8_08_4.html>`__) class
Richard Smithddb2fde2014-05-06 07:45:39 +0000935template represents a reference to a callable object, templated over the type
936of the callable. This is a good choice for passing a callback to a function,
Reid Kleckner5c2245b2014-07-17 22:43:00 +0000937if you don't need to hold onto the callback after the function returns. In this
938way, ``function_ref`` is to ``std::function`` as ``StringRef`` is to
939``std::string``.
Richard Smithddb2fde2014-05-06 07:45:39 +0000940
941``function_ref<Ret(Param1, Param2, ...)>`` can be implicitly constructed from
942any callable object that can be called with arguments of type ``Param1``,
943``Param2``, ..., and returns a value that can be converted to type ``Ret``.
944For example:
945
946.. code-block:: c++
947
948 void visitBasicBlocks(Function *F, function_ref<bool (BasicBlock*)> Callback) {
949 for (BasicBlock &BB : *F)
950 if (Callback(&BB))
951 return;
952 }
953
954can be called using:
955
956.. code-block:: c++
957
958 visitBasicBlocks(F, [&](BasicBlock *BB) {
959 if (process(BB))
960 return isEmpty(BB);
961 return false;
962 });
963
Reid Kleckner5c2245b2014-07-17 22:43:00 +0000964Note that a ``function_ref`` object contains pointers to external memory, so it
965is not generally safe to store an instance of the class (unless you know that
966the external storage will not be freed). If you need this ability, consider
967using ``std::function``. ``function_ref`` is small enough that it should always
968be passed by value.
Richard Smithddb2fde2014-05-06 07:45:39 +0000969
Sean Silvabeb15ca2012-12-04 03:20:08 +0000970.. _DEBUG:
971
972The ``DEBUG()`` macro and ``-debug`` option
973-------------------------------------------
974
975Often when working on your pass you will put a bunch of debugging printouts and
976other code into your pass. After you get it working, you want to remove it, but
977you may need it again in the future (to work out new bugs that you run across).
978
979Naturally, because of this, you don't want to delete the debug printouts, but
980you don't want them to always be noisy. A standard compromise is to comment
981them out, allowing you to enable them if you need them in the future.
982
983The ``llvm/Support/Debug.h`` (`doxygen
984<http://llvm.org/doxygen/Debug_8h-source.html>`__) file provides a macro named
985``DEBUG()`` that is a much nicer solution to this problem. Basically, you can
986put arbitrary code into the argument of the ``DEBUG`` macro, and it is only
987executed if '``opt``' (or any other tool) is run with the '``-debug``' command
988line argument:
989
990.. code-block:: c++
991
992 DEBUG(errs() << "I am here!\n");
993
994Then you can run your pass like this:
995
996.. code-block:: none
997
998 $ opt < a.bc > /dev/null -mypass
999 <no output>
1000 $ opt < a.bc > /dev/null -mypass -debug
1001 I am here!
1002
1003Using the ``DEBUG()`` macro instead of a home-brewed solution allows you to not
1004have to create "yet another" command line option for the debug output for your
Justin Bognerc2e54af2015-10-15 18:17:44 +00001005pass. Note that ``DEBUG()`` macros are disabled for non-asserts builds, so they
Sean Silvabeb15ca2012-12-04 03:20:08 +00001006do not cause a performance impact at all (for the same reason, they should also
1007not contain side-effects!).
1008
1009One additional nice thing about the ``DEBUG()`` macro is that you can enable or
1010disable it directly in gdb. Just use "``set DebugFlag=0``" or "``set
1011DebugFlag=1``" from the gdb if the program is running. If the program hasn't
1012been started yet, you can always just run it with ``-debug``.
1013
1014.. _DEBUG_TYPE:
1015
1016Fine grained debug info with ``DEBUG_TYPE`` and the ``-debug-only`` option
1017^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1018
1019Sometimes you may find yourself in a situation where enabling ``-debug`` just
1020turns on **too much** information (such as when working on the code generator).
1021If you want to enable debug information with more fine-grained control, you
Justin Bognerc2e54af2015-10-15 18:17:44 +00001022should define the ``DEBUG_TYPE`` macro and use the ``-debug-only`` option as
Alexey Samsonov6c0ddfe2014-06-05 23:12:43 +00001023follows:
Sean Silvabeb15ca2012-12-04 03:20:08 +00001024
1025.. code-block:: c++
1026
Sean Silvabeb15ca2012-12-04 03:20:08 +00001027 #define DEBUG_TYPE "foo"
1028 DEBUG(errs() << "'foo' debug type\n");
1029 #undef DEBUG_TYPE
1030 #define DEBUG_TYPE "bar"
1031 DEBUG(errs() << "'bar' debug type\n"));
1032 #undef DEBUG_TYPE
Sean Silvabeb15ca2012-12-04 03:20:08 +00001033
1034Then you can run your pass like this:
1035
1036.. code-block:: none
1037
1038 $ opt < a.bc > /dev/null -mypass
1039 <no output>
1040 $ opt < a.bc > /dev/null -mypass -debug
Sean Silvabeb15ca2012-12-04 03:20:08 +00001041 'foo' debug type
1042 'bar' debug type
Sean Silvabeb15ca2012-12-04 03:20:08 +00001043 $ opt < a.bc > /dev/null -mypass -debug-only=foo
1044 'foo' debug type
1045 $ opt < a.bc > /dev/null -mypass -debug-only=bar
1046 'bar' debug type
Christof Doumaf617e672016-01-12 10:23:13 +00001047 $ opt < a.bc > /dev/null -mypass -debug-only=foo,bar
1048 'foo' debug type
1049 'bar' debug type
Sean Silvabeb15ca2012-12-04 03:20:08 +00001050
1051Of course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file,
Justin Bognerc2e54af2015-10-15 18:17:44 +00001052to specify the debug type for the entire module. Be careful that you only do
1053this after including Debug.h and not around any #include of headers. Also, you
1054should use names more meaningful than "foo" and "bar", because there is no
1055system in place to ensure that names do not conflict. If two different modules
1056use the same string, they will all be turned on when the name is specified.
1057This allows, for example, all debug information for instruction scheduling to be
1058enabled with ``-debug-only=InstrSched``, even if the source lives in multiple
Sylvestre Ledru84666a12016-02-14 20:16:22 +00001059files. The name must not include a comma (,) as that is used to separate the
Christof Doumaf617e672016-01-12 10:23:13 +00001060arguments of the ``-debug-only`` option.
Sean Silvabeb15ca2012-12-04 03:20:08 +00001061
Sylvestre Ledru1623b462014-09-25 10:58:16 +00001062For performance reasons, -debug-only is not available in optimized build
1063(``--enable-optimized``) of LLVM.
Sylvestre Ledrub5984fa2014-09-25 10:57:00 +00001064
Sean Silvabeb15ca2012-12-04 03:20:08 +00001065The ``DEBUG_WITH_TYPE`` macro is also available for situations where you would
1066like to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement. It
1067takes an additional first parameter, which is the type to use. For example, the
1068preceding example could be written as:
1069
1070.. code-block:: c++
1071
Sean Silvabeb15ca2012-12-04 03:20:08 +00001072 DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n");
1073 DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n"));
Sean Silvabeb15ca2012-12-04 03:20:08 +00001074
1075.. _Statistic:
1076
1077The ``Statistic`` class & ``-stats`` option
1078-------------------------------------------
1079
1080The ``llvm/ADT/Statistic.h`` (`doxygen
1081<http://llvm.org/doxygen/Statistic_8h-source.html>`__) file provides a class
1082named ``Statistic`` that is used as a unified way to keep track of what the LLVM
1083compiler is doing and how effective various optimizations are. It is useful to
1084see what optimizations are contributing to making a particular program run
1085faster.
1086
1087Often you may run your pass on some big program, and you're interested to see
1088how many times it makes a certain transformation. Although you can do this with
1089hand inspection, or some ad-hoc method, this is a real pain and not very useful
1090for big programs. Using the ``Statistic`` class makes it very easy to keep
1091track of this information, and the calculated information is presented in a
1092uniform manner with the rest of the passes being executed.
1093
1094There are many examples of ``Statistic`` uses, but the basics of using it are as
1095follows:
1096
1097#. Define your statistic like this:
1098
1099 .. code-block:: c++
1100
1101 #define DEBUG_TYPE "mypassname" // This goes before any #includes.
1102 STATISTIC(NumXForms, "The # of times I did stuff");
1103
1104 The ``STATISTIC`` macro defines a static variable, whose name is specified by
1105 the first argument. The pass name is taken from the ``DEBUG_TYPE`` macro, and
1106 the description is taken from the second argument. The variable defined
1107 ("NumXForms" in this case) acts like an unsigned integer.
1108
1109#. Whenever you make a transformation, bump the counter:
1110
1111 .. code-block:: c++
1112
1113 ++NumXForms; // I did stuff!
1114
1115That's all you have to do. To get '``opt``' to print out the statistics
1116gathered, use the '``-stats``' option:
1117
1118.. code-block:: none
1119
1120 $ opt -stats -mypassname < program.bc > /dev/null
1121 ... statistics output ...
1122
Justin Bogner08f36fd2015-02-21 20:53:36 +00001123Note that in order to use the '``-stats``' option, LLVM must be
1124compiled with assertions enabled.
1125
Sean Silvabeb15ca2012-12-04 03:20:08 +00001126When running ``opt`` on a C file from the SPEC benchmark suite, it gives a
1127report that looks like this:
1128
1129.. code-block:: none
1130
1131 7646 bitcodewriter - Number of normal instructions
1132 725 bitcodewriter - Number of oversized instructions
1133 129996 bitcodewriter - Number of bitcode bytes written
1134 2817 raise - Number of insts DCEd or constprop'd
1135 3213 raise - Number of cast-of-self removed
1136 5046 raise - Number of expression trees converted
1137 75 raise - Number of other getelementptr's formed
1138 138 raise - Number of load/store peepholes
1139 42 deadtypeelim - Number of unused typenames removed from symtab
1140 392 funcresolve - Number of varargs functions resolved
1141 27 globaldce - Number of global variables removed
1142 2 adce - Number of basic blocks removed
1143 134 cee - Number of branches revectored
1144 49 cee - Number of setcc instruction eliminated
1145 532 gcse - Number of loads removed
1146 2919 gcse - Number of instructions removed
1147 86 indvars - Number of canonical indvars added
1148 87 indvars - Number of aux indvars removed
1149 25 instcombine - Number of dead inst eliminate
1150 434 instcombine - Number of insts combined
1151 248 licm - Number of load insts hoisted
1152 1298 licm - Number of insts hoisted to a loop pre-header
1153 3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header)
1154 75 mem2reg - Number of alloca's promoted
1155 1444 cfgsimplify - Number of blocks simplified
1156
1157Obviously, with so many optimizations, having a unified framework for this stuff
1158is very nice. Making your pass fit well into the framework makes it more
1159maintainable and useful.
1160
1161.. _ViewGraph:
1162
1163Viewing graphs while debugging code
1164-----------------------------------
1165
1166Several of the important data structures in LLVM are graphs: for example CFGs
1167made out of LLVM :ref:`BasicBlocks <BasicBlock>`, CFGs made out of LLVM
1168:ref:`MachineBasicBlocks <MachineBasicBlock>`, and :ref:`Instruction Selection
1169DAGs <SelectionDAG>`. In many cases, while debugging various parts of the
1170compiler, it is nice to instantly visualize these graphs.
1171
1172LLVM provides several callbacks that are available in a debug build to do
1173exactly that. If you call the ``Function::viewCFG()`` method, for example, the
1174current LLVM tool will pop up a window containing the CFG for the function where
1175each basic block is a node in the graph, and each node contains the instructions
1176in the block. Similarly, there also exists ``Function::viewCFGOnly()`` (does
1177not include the instructions), the ``MachineFunction::viewCFG()`` and
1178``MachineFunction::viewCFGOnly()``, and the ``SelectionDAG::viewGraph()``
1179methods. Within GDB, for example, you can usually use something like ``call
1180DAG.viewGraph()`` to pop up a window. Alternatively, you can sprinkle calls to
1181these functions in your code in places you want to debug.
1182
Alp Toker125be842014-06-02 01:40:04 +00001183Getting this to work requires a small amount of setup. On Unix systems
Sean Silvabeb15ca2012-12-04 03:20:08 +00001184with X11, install the `graphviz <http://www.graphviz.org>`_ toolkit, and make
Nico Weberad156922014-03-07 18:08:54 +00001185sure 'dot' and 'gv' are in your path. If you are running on Mac OS X, download
1186and install the Mac OS X `Graphviz program
Sean Silvabeb15ca2012-12-04 03:20:08 +00001187<http://www.pixelglow.com/graphviz/>`_ and add
1188``/Applications/Graphviz.app/Contents/MacOS/`` (or wherever you install it) to
Alp Toker125be842014-06-02 01:40:04 +00001189your path. The programs need not be present when configuring, building or
1190running LLVM and can simply be installed when needed during an active debug
1191session.
Sean Silvabeb15ca2012-12-04 03:20:08 +00001192
1193``SelectionDAG`` has been extended to make it easier to locate *interesting*
1194nodes in large complex graphs. From gdb, if you ``call DAG.setGraphColor(node,
1195"color")``, then the next ``call DAG.viewGraph()`` would highlight the node in
1196the specified color (choices of colors can be found at `colors
1197<http://www.graphviz.org/doc/info/colors.html>`_.) More complex node attributes
1198can be provided with ``call DAG.setGraphAttrs(node, "attributes")`` (choices can
1199be found at `Graph attributes <http://www.graphviz.org/doc/info/attrs.html>`_.)
1200If you want to restart and clear all the current graph attributes, then you can
1201``call DAG.clearGraphAttrs()``.
1202
1203Note that graph visualization features are compiled out of Release builds to
1204reduce file size. This means that you need a Debug+Asserts or Release+Asserts
1205build to use these features.
1206
1207.. _datastructure:
1208
1209Picking the Right Data Structure for a Task
1210===========================================
1211
1212LLVM has a plethora of data structures in the ``llvm/ADT/`` directory, and we
1213commonly use STL data structures. This section describes the trade-offs you
1214should consider when you pick one.
1215
1216The first step is a choose your own adventure: do you want a sequential
1217container, a set-like container, or a map-like container? The most important
1218thing when choosing a container is the algorithmic properties of how you plan to
1219access the container. Based on that, you should use:
1220
1221
1222* a :ref:`map-like <ds_map>` container if you need efficient look-up of a
1223 value based on another value. Map-like containers also support efficient
1224 queries for containment (whether a key is in the map). Map-like containers
1225 generally do not support efficient reverse mapping (values to keys). If you
1226 need that, use two maps. Some map-like containers also support efficient
1227 iteration through the keys in sorted order. Map-like containers are the most
1228 expensive sort, only use them if you need one of these capabilities.
1229
1230* a :ref:`set-like <ds_set>` container if you need to put a bunch of stuff into
1231 a container that automatically eliminates duplicates. Some set-like
1232 containers support efficient iteration through the elements in sorted order.
1233 Set-like containers are more expensive than sequential containers.
1234
1235* a :ref:`sequential <ds_sequential>` container provides the most efficient way
1236 to add elements and keeps track of the order they are added to the collection.
1237 They permit duplicates and support efficient iteration, but do not support
1238 efficient look-up based on a key.
1239
1240* a :ref:`string <ds_string>` container is a specialized sequential container or
1241 reference structure that is used for character or byte arrays.
1242
1243* a :ref:`bit <ds_bit>` container provides an efficient way to store and
1244 perform set operations on sets of numeric id's, while automatically
1245 eliminating duplicates. Bit containers require a maximum of 1 bit for each
1246 identifier you want to store.
1247
1248Once the proper category of container is determined, you can fine tune the
1249memory use, constant factors, and cache behaviors of access by intelligently
1250picking a member of the category. Note that constant factors and cache behavior
1251can be a big deal. If you have a vector that usually only contains a few
1252elements (but could contain many), for example, it's much better to use
1253:ref:`SmallVector <dss_smallvector>` than :ref:`vector <dss_vector>`. Doing so
1254avoids (relatively) expensive malloc/free calls, which dwarf the cost of adding
1255the elements to the container.
1256
1257.. _ds_sequential:
1258
1259Sequential Containers (std::vector, std::list, etc)
1260---------------------------------------------------
1261
1262There are a variety of sequential containers available for you, based on your
1263needs. Pick the first in this section that will do what you want.
1264
1265.. _dss_arrayref:
1266
1267llvm/ADT/ArrayRef.h
1268^^^^^^^^^^^^^^^^^^^
1269
1270The ``llvm::ArrayRef`` class is the preferred class to use in an interface that
1271accepts a sequential list of elements in memory and just reads from them. By
1272taking an ``ArrayRef``, the API can be passed a fixed size array, an
1273``std::vector``, an ``llvm::SmallVector`` and anything else that is contiguous
1274in memory.
1275
1276.. _dss_fixedarrays:
1277
1278Fixed Size Arrays
1279^^^^^^^^^^^^^^^^^
1280
1281Fixed size arrays are very simple and very fast. They are good if you know
1282exactly how many elements you have, or you have a (low) upper bound on how many
1283you have.
1284
1285.. _dss_heaparrays:
1286
1287Heap Allocated Arrays
1288^^^^^^^^^^^^^^^^^^^^^
1289
1290Heap allocated arrays (``new[]`` + ``delete[]``) are also simple. They are good
1291if the number of elements is variable, if you know how many elements you will
1292need before the array is allocated, and if the array is usually large (if not,
1293consider a :ref:`SmallVector <dss_smallvector>`). The cost of a heap allocated
1294array is the cost of the new/delete (aka malloc/free). Also note that if you
1295are allocating an array of a type with a constructor, the constructor and
1296destructors will be run for every element in the array (re-sizable vectors only
1297construct those elements actually used).
1298
1299.. _dss_tinyptrvector:
1300
1301llvm/ADT/TinyPtrVector.h
1302^^^^^^^^^^^^^^^^^^^^^^^^
1303
1304``TinyPtrVector<Type>`` is a highly specialized collection class that is
1305optimized to avoid allocation in the case when a vector has zero or one
1306elements. It has two major restrictions: 1) it can only hold values of pointer
1307type, and 2) it cannot hold a null pointer.
1308
1309Since this container is highly specialized, it is rarely used.
1310
1311.. _dss_smallvector:
1312
1313llvm/ADT/SmallVector.h
1314^^^^^^^^^^^^^^^^^^^^^^
1315
1316``SmallVector<Type, N>`` is a simple class that looks and smells just like
1317``vector<Type>``: it supports efficient iteration, lays out elements in memory
1318order (so you can do pointer arithmetic between elements), supports efficient
1319push_back/pop_back operations, supports efficient random access to its elements,
1320etc.
1321
1322The advantage of SmallVector is that it allocates space for some number of
1323elements (N) **in the object itself**. Because of this, if the SmallVector is
1324dynamically smaller than N, no malloc is performed. This can be a big win in
1325cases where the malloc/free call is far more expensive than the code that
1326fiddles around with the elements.
1327
1328This is good for vectors that are "usually small" (e.g. the number of
1329predecessors/successors of a block is usually less than 8). On the other hand,
1330this makes the size of the SmallVector itself large, so you don't want to
1331allocate lots of them (doing so will waste a lot of space). As such,
1332SmallVectors are most useful when on the stack.
1333
1334SmallVector also provides a nice portable and efficient replacement for
1335``alloca``.
1336
Sean Silva4ee92f92013-03-22 23:41:29 +00001337.. note::
1338
Sean Silva43590682013-03-22 23:52:38 +00001339 Prefer to use ``SmallVectorImpl<T>`` as a parameter type.
Sean Silva4ee92f92013-03-22 23:41:29 +00001340
1341 In APIs that don't care about the "small size" (most?), prefer to use
1342 the ``SmallVectorImpl<T>`` class, which is basically just the "vector
1343 header" (and methods) without the elements allocated after it. Note that
1344 ``SmallVector<T, N>`` inherits from ``SmallVectorImpl<T>`` so the
1345 conversion is implicit and costs nothing. E.g.
1346
1347 .. code-block:: c++
1348
1349 // BAD: Clients cannot pass e.g. SmallVector<Foo, 4>.
1350 hardcodedSmallSize(SmallVector<Foo, 2> &Out);
1351 // GOOD: Clients can pass any SmallVector<Foo, N>.
1352 allowsAnySmallSize(SmallVectorImpl<Foo> &Out);
1353
1354 void someFunc() {
1355 SmallVector<Foo, 8> Vec;
1356 hardcodedSmallSize(Vec); // Error.
1357 allowsAnySmallSize(Vec); // Works.
1358 }
1359
1360 Even though it has "``Impl``" in the name, this is so widely used that
1361 it really isn't "private to the implementation" anymore. A name like
1362 ``SmallVectorHeader`` would be more appropriate.
1363
Sean Silvabeb15ca2012-12-04 03:20:08 +00001364.. _dss_vector:
1365
1366<vector>
1367^^^^^^^^
1368
1369``std::vector`` is well loved and respected. It is useful when SmallVector
1370isn't: when the size of the vector is often large (thus the small optimization
1371will rarely be a benefit) or if you will be allocating many instances of the
1372vector itself (which would waste space for elements that aren't in the
1373container). vector is also useful when interfacing with code that expects
1374vectors :).
1375
1376One worthwhile note about std::vector: avoid code like this:
1377
1378.. code-block:: c++
1379
1380 for ( ... ) {
1381 std::vector<foo> V;
1382 // make use of V.
1383 }
1384
1385Instead, write this as:
1386
1387.. code-block:: c++
1388
1389 std::vector<foo> V;
1390 for ( ... ) {
1391 // make use of V.
1392 V.clear();
1393 }
1394
1395Doing so will save (at least) one heap allocation and free per iteration of the
1396loop.
1397
1398.. _dss_deque:
1399
1400<deque>
1401^^^^^^^
1402
1403``std::deque`` is, in some senses, a generalized version of ``std::vector``.
1404Like ``std::vector``, it provides constant time random access and other similar
1405properties, but it also provides efficient access to the front of the list. It
1406does not guarantee continuity of elements within memory.
1407
1408In exchange for this extra flexibility, ``std::deque`` has significantly higher
1409constant factor costs than ``std::vector``. If possible, use ``std::vector`` or
1410something cheaper.
1411
1412.. _dss_list:
1413
1414<list>
1415^^^^^^
1416
1417``std::list`` is an extremely inefficient class that is rarely useful. It
1418performs a heap allocation for every element inserted into it, thus having an
1419extremely high constant factor, particularly for small data types.
1420``std::list`` also only supports bidirectional iteration, not random access
1421iteration.
1422
1423In exchange for this high cost, std::list supports efficient access to both ends
1424of the list (like ``std::deque``, but unlike ``std::vector`` or
1425``SmallVector``). In addition, the iterator invalidation characteristics of
1426std::list are stronger than that of a vector class: inserting or removing an
1427element into the list does not invalidate iterator or pointers to other elements
1428in the list.
1429
1430.. _dss_ilist:
1431
1432llvm/ADT/ilist.h
1433^^^^^^^^^^^^^^^^
1434
1435``ilist<T>`` implements an 'intrusive' doubly-linked list. It is intrusive,
1436because it requires the element to store and provide access to the prev/next
1437pointers for the list.
1438
1439``ilist`` has the same drawbacks as ``std::list``, and additionally requires an
1440``ilist_traits`` implementation for the element type, but it provides some novel
1441characteristics. In particular, it can efficiently store polymorphic objects,
1442the traits class is informed when an element is inserted or removed from the
1443list, and ``ilist``\ s are guaranteed to support a constant-time splice
1444operation.
1445
1446These properties are exactly what we want for things like ``Instruction``\ s and
1447basic blocks, which is why these are implemented with ``ilist``\ s.
1448
1449Related classes of interest are explained in the following subsections:
1450
1451* :ref:`ilist_traits <dss_ilist_traits>`
1452
1453* :ref:`iplist <dss_iplist>`
1454
1455* :ref:`llvm/ADT/ilist_node.h <dss_ilist_node>`
1456
1457* :ref:`Sentinels <dss_ilist_sentinel>`
1458
1459.. _dss_packedvector:
1460
1461llvm/ADT/PackedVector.h
1462^^^^^^^^^^^^^^^^^^^^^^^
1463
1464Useful for storing a vector of values using only a few number of bits for each
1465value. Apart from the standard operations of a vector-like container, it can
1466also perform an 'or' set operation.
1467
1468For example:
1469
1470.. code-block:: c++
1471
1472 enum State {
1473 None = 0x0,
1474 FirstCondition = 0x1,
1475 SecondCondition = 0x2,
1476 Both = 0x3
1477 };
1478
1479 State get() {
1480 PackedVector<State, 2> Vec1;
1481 Vec1.push_back(FirstCondition);
1482
1483 PackedVector<State, 2> Vec2;
1484 Vec2.push_back(SecondCondition);
1485
1486 Vec1 |= Vec2;
1487 return Vec1[0]; // returns 'Both'.
1488 }
1489
1490.. _dss_ilist_traits:
1491
1492ilist_traits
1493^^^^^^^^^^^^
1494
1495``ilist_traits<T>`` is ``ilist<T>``'s customization mechanism. ``iplist<T>``
1496(and consequently ``ilist<T>``) publicly derive from this traits class.
1497
1498.. _dss_iplist:
1499
1500iplist
1501^^^^^^
1502
1503``iplist<T>`` is ``ilist<T>``'s base and as such supports a slightly narrower
1504interface. Notably, inserters from ``T&`` are absent.
1505
1506``ilist_traits<T>`` is a public base of this class and can be used for a wide
1507variety of customizations.
1508
1509.. _dss_ilist_node:
1510
1511llvm/ADT/ilist_node.h
1512^^^^^^^^^^^^^^^^^^^^^
1513
Robin Morisset039781e2014-08-29 21:53:01 +00001514``ilist_node<T>`` implements the forward and backward links that are expected
Sean Silvabeb15ca2012-12-04 03:20:08 +00001515by the ``ilist<T>`` (and analogous containers) in the default manner.
1516
1517``ilist_node<T>``\ s are meant to be embedded in the node type ``T``, usually
1518``T`` publicly derives from ``ilist_node<T>``.
1519
1520.. _dss_ilist_sentinel:
1521
1522Sentinels
1523^^^^^^^^^
1524
1525``ilist``\ s have another specialty that must be considered. To be a good
1526citizen in the C++ ecosystem, it needs to support the standard container
1527operations, such as ``begin`` and ``end`` iterators, etc. Also, the
1528``operator--`` must work correctly on the ``end`` iterator in the case of
1529non-empty ``ilist``\ s.
1530
1531The only sensible solution to this problem is to allocate a so-called *sentinel*
1532along with the intrusive list, which serves as the ``end`` iterator, providing
1533the back-link to the last element. However conforming to the C++ convention it
1534is illegal to ``operator++`` beyond the sentinel and it also must not be
1535dereferenced.
1536
1537These constraints allow for some implementation freedom to the ``ilist`` how to
1538allocate and store the sentinel. The corresponding policy is dictated by
1539``ilist_traits<T>``. By default a ``T`` gets heap-allocated whenever the need
1540for a sentinel arises.
1541
1542While the default policy is sufficient in most cases, it may break down when
1543``T`` does not provide a default constructor. Also, in the case of many
1544instances of ``ilist``\ s, the memory overhead of the associated sentinels is
1545wasted. To alleviate the situation with numerous and voluminous
1546``T``-sentinels, sometimes a trick is employed, leading to *ghostly sentinels*.
1547
1548Ghostly sentinels are obtained by specially-crafted ``ilist_traits<T>`` which
1549superpose the sentinel with the ``ilist`` instance in memory. Pointer
1550arithmetic is used to obtain the sentinel, which is relative to the ``ilist``'s
1551``this`` pointer. The ``ilist`` is augmented by an extra pointer, which serves
1552as the back-link of the sentinel. This is the only field in the ghostly
1553sentinel which can be legally accessed.
1554
1555.. _dss_other:
1556
1557Other Sequential Container options
1558^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1559
1560Other STL containers are available, such as ``std::string``.
1561
1562There are also various STL adapter classes such as ``std::queue``,
1563``std::priority_queue``, ``std::stack``, etc. These provide simplified access
1564to an underlying container but don't affect the cost of the container itself.
1565
1566.. _ds_string:
1567
1568String-like containers
1569----------------------
1570
1571There are a variety of ways to pass around and use strings in C and C++, and
1572LLVM adds a few new options to choose from. Pick the first option on this list
1573that will do what you need, they are ordered according to their relative cost.
1574
Ed Maste8ed40ce2015-04-14 20:52:58 +00001575Note that it is generally preferred to *not* pass strings around as ``const
Sean Silvabeb15ca2012-12-04 03:20:08 +00001576char*``'s. These have a number of problems, including the fact that they
1577cannot represent embedded nul ("\0") characters, and do not have a length
1578available efficiently. The general replacement for '``const char*``' is
1579StringRef.
1580
1581For more information on choosing string containers for APIs, please see
1582:ref:`Passing Strings <string_apis>`.
1583
1584.. _dss_stringref:
1585
1586llvm/ADT/StringRef.h
1587^^^^^^^^^^^^^^^^^^^^
1588
1589The StringRef class is a simple value class that contains a pointer to a
1590character and a length, and is quite related to the :ref:`ArrayRef
1591<dss_arrayref>` class (but specialized for arrays of characters). Because
1592StringRef carries a length with it, it safely handles strings with embedded nul
1593characters in it, getting the length does not require a strlen call, and it even
1594has very convenient APIs for slicing and dicing the character range that it
1595represents.
1596
1597StringRef is ideal for passing simple strings around that are known to be live,
1598either because they are C string literals, std::string, a C array, or a
1599SmallVector. Each of these cases has an efficient implicit conversion to
1600StringRef, which doesn't result in a dynamic strlen being executed.
1601
1602StringRef has a few major limitations which make more powerful string containers
1603useful:
1604
1605#. You cannot directly convert a StringRef to a 'const char*' because there is
1606 no way to add a trailing nul (unlike the .c_str() method on various stronger
1607 classes).
1608
1609#. StringRef doesn't own or keep alive the underlying string bytes.
1610 As such it can easily lead to dangling pointers, and is not suitable for
1611 embedding in datastructures in most cases (instead, use an std::string or
1612 something like that).
1613
1614#. For the same reason, StringRef cannot be used as the return value of a
1615 method if the method "computes" the result string. Instead, use std::string.
1616
1617#. StringRef's do not allow you to mutate the pointed-to string bytes and it
1618 doesn't allow you to insert or remove bytes from the range. For editing
1619 operations like this, it interoperates with the :ref:`Twine <dss_twine>`
1620 class.
1621
1622Because of its strengths and limitations, it is very common for a function to
1623take a StringRef and for a method on an object to return a StringRef that points
1624into some string that it owns.
1625
1626.. _dss_twine:
1627
1628llvm/ADT/Twine.h
1629^^^^^^^^^^^^^^^^
1630
1631The Twine class is used as an intermediary datatype for APIs that want to take a
1632string that can be constructed inline with a series of concatenations. Twine
1633works by forming recursive instances of the Twine datatype (a simple value
1634object) on the stack as temporary objects, linking them together into a tree
1635which is then linearized when the Twine is consumed. Twine is only safe to use
1636as the argument to a function, and should always be a const reference, e.g.:
1637
1638.. code-block:: c++
1639
1640 void foo(const Twine &T);
1641 ...
1642 StringRef X = ...
1643 unsigned i = ...
1644 foo(X + "." + Twine(i));
1645
1646This example forms a string like "blarg.42" by concatenating the values
1647together, and does not form intermediate strings containing "blarg" or "blarg.".
1648
1649Because Twine is constructed with temporary objects on the stack, and because
1650these instances are destroyed at the end of the current statement, it is an
1651inherently dangerous API. For example, this simple variant contains undefined
1652behavior and will probably crash:
1653
1654.. code-block:: c++
1655
1656 void foo(const Twine &T);
1657 ...
1658 StringRef X = ...
1659 unsigned i = ...
1660 const Twine &Tmp = X + "." + Twine(i);
1661 foo(Tmp);
1662
1663... because the temporaries are destroyed before the call. That said, Twine's
1664are much more efficient than intermediate std::string temporaries, and they work
1665really well with StringRef. Just be aware of their limitations.
1666
1667.. _dss_smallstring:
1668
1669llvm/ADT/SmallString.h
1670^^^^^^^^^^^^^^^^^^^^^^
1671
1672SmallString is a subclass of :ref:`SmallVector <dss_smallvector>` that adds some
1673convenience APIs like += that takes StringRef's. SmallString avoids allocating
1674memory in the case when the preallocated space is enough to hold its data, and
1675it calls back to general heap allocation when required. Since it owns its data,
1676it is very safe to use and supports full mutation of the string.
1677
1678Like SmallVector's, the big downside to SmallString is their sizeof. While they
1679are optimized for small strings, they themselves are not particularly small.
1680This means that they work great for temporary scratch buffers on the stack, but
1681should not generally be put into the heap: it is very rare to see a SmallString
1682as the member of a frequently-allocated heap data structure or returned
1683by-value.
1684
1685.. _dss_stdstring:
1686
1687std::string
1688^^^^^^^^^^^
1689
1690The standard C++ std::string class is a very general class that (like
1691SmallString) owns its underlying data. sizeof(std::string) is very reasonable
1692so it can be embedded into heap data structures and returned by-value. On the
1693other hand, std::string is highly inefficient for inline editing (e.g.
1694concatenating a bunch of stuff together) and because it is provided by the
1695standard library, its performance characteristics depend a lot of the host
1696standard library (e.g. libc++ and MSVC provide a highly optimized string class,
1697GCC contains a really slow implementation).
1698
1699The major disadvantage of std::string is that almost every operation that makes
1700them larger can allocate memory, which is slow. As such, it is better to use
1701SmallVector or Twine as a scratch buffer, but then use std::string to persist
1702the result.
1703
1704.. _ds_set:
1705
1706Set-Like Containers (std::set, SmallSet, SetVector, etc)
1707--------------------------------------------------------
1708
1709Set-like containers are useful when you need to canonicalize multiple values
1710into a single representation. There are several different choices for how to do
1711this, providing various trade-offs.
1712
1713.. _dss_sortedvectorset:
1714
1715A sorted 'vector'
1716^^^^^^^^^^^^^^^^^
1717
1718If you intend to insert a lot of elements, then do a lot of queries, a great
1719approach is to use a vector (or other sequential container) with
1720std::sort+std::unique to remove duplicates. This approach works really well if
1721your usage pattern has these two distinct phases (insert then query), and can be
1722coupled with a good choice of :ref:`sequential container <ds_sequential>`.
1723
1724This combination provides the several nice properties: the result data is
1725contiguous in memory (good for cache locality), has few allocations, is easy to
1726address (iterators in the final vector are just indices or pointers), and can be
Sean Silvac9fbd232013-03-29 21:57:47 +00001727efficiently queried with a standard binary search (e.g.
1728``std::lower_bound``; if you want the whole range of elements comparing
1729equal, use ``std::equal_range``).
Sean Silvabeb15ca2012-12-04 03:20:08 +00001730
1731.. _dss_smallset:
1732
1733llvm/ADT/SmallSet.h
1734^^^^^^^^^^^^^^^^^^^
1735
1736If you have a set-like data structure that is usually small and whose elements
1737are reasonably small, a ``SmallSet<Type, N>`` is a good choice. This set has
1738space for N elements in place (thus, if the set is dynamically smaller than N,
1739no malloc traffic is required) and accesses them with a simple linear search.
Artyom Skrobov62641152015-05-19 10:21:12 +00001740When the set grows beyond N elements, it allocates a more expensive
Sean Silvabeb15ca2012-12-04 03:20:08 +00001741representation that guarantees efficient access (for most types, it falls back
Artyom Skrobov62641152015-05-19 10:21:12 +00001742to :ref:`std::set <dss_set>`, but for pointers it uses something far better,
1743:ref:`SmallPtrSet <dss_smallptrset>`.
Sean Silvabeb15ca2012-12-04 03:20:08 +00001744
1745The magic of this class is that it handles small sets extremely efficiently, but
1746gracefully handles extremely large sets without loss of efficiency. The
1747drawback is that the interface is quite small: it supports insertion, queries
1748and erasing, but does not support iteration.
1749
1750.. _dss_smallptrset:
1751
1752llvm/ADT/SmallPtrSet.h
1753^^^^^^^^^^^^^^^^^^^^^^
1754
Artyom Skrobov62641152015-05-19 10:21:12 +00001755``SmallPtrSet`` has all the advantages of ``SmallSet`` (and a ``SmallSet`` of
Sean Silvabeb15ca2012-12-04 03:20:08 +00001756pointers is transparently implemented with a ``SmallPtrSet``), but also supports
Artyom Skrobov62641152015-05-19 10:21:12 +00001757iterators. If more than N insertions are performed, a single quadratically
Sean Silvabeb15ca2012-12-04 03:20:08 +00001758probed hash table is allocated and grows as needed, providing extremely
1759efficient access (constant time insertion/deleting/queries with low constant
1760factors) and is very stingy with malloc traffic.
1761
Artyom Skrobov62641152015-05-19 10:21:12 +00001762Note that, unlike :ref:`std::set <dss_set>`, the iterators of ``SmallPtrSet``
1763are invalidated whenever an insertion occurs. Also, the values visited by the
1764iterators are not visited in sorted order.
1765
1766.. _dss_stringset:
1767
1768llvm/ADT/StringSet.h
1769^^^^^^^^^^^^^^^^^^^^
1770
1771``StringSet`` is a thin wrapper around :ref:`StringMap\<char\> <dss_stringmap>`,
1772and it allows efficient storage and retrieval of unique strings.
1773
Sylvestre Ledru84666a12016-02-14 20:16:22 +00001774Functionally analogous to ``SmallSet<StringRef>``, ``StringSet`` also supports
Artyom Skrobov62641152015-05-19 10:21:12 +00001775iteration. (The iterator dereferences to a ``StringMapEntry<char>``, so you
1776need to call ``i->getKey()`` to access the item of the StringSet.) On the
1777other hand, ``StringSet`` doesn't support range-insertion and
1778copy-construction, which :ref:`SmallSet <dss_smallset>` and :ref:`SmallPtrSet
1779<dss_smallptrset>` do support.
Sean Silvabeb15ca2012-12-04 03:20:08 +00001780
1781.. _dss_denseset:
1782
1783llvm/ADT/DenseSet.h
1784^^^^^^^^^^^^^^^^^^^
1785
1786DenseSet is a simple quadratically probed hash table. It excels at supporting
1787small values: it uses a single allocation to hold all of the pairs that are
1788currently inserted in the set. DenseSet is a great way to unique small values
1789that are not simple pointers (use :ref:`SmallPtrSet <dss_smallptrset>` for
1790pointers). Note that DenseSet has the same requirements for the value type that
1791:ref:`DenseMap <dss_densemap>` has.
1792
1793.. _dss_sparseset:
1794
1795llvm/ADT/SparseSet.h
1796^^^^^^^^^^^^^^^^^^^^
1797
1798SparseSet holds a small number of objects identified by unsigned keys of
1799moderate size. It uses a lot of memory, but provides operations that are almost
1800as fast as a vector. Typical keys are physical registers, virtual registers, or
1801numbered basic blocks.
1802
1803SparseSet is useful for algorithms that need very fast clear/find/insert/erase
1804and fast iteration over small sets. It is not intended for building composite
1805data structures.
1806
Michael Ilseman830875b2013-01-21 21:46:32 +00001807.. _dss_sparsemultiset:
1808
1809llvm/ADT/SparseMultiSet.h
1810^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1811
1812SparseMultiSet adds multiset behavior to SparseSet, while retaining SparseSet's
1813desirable attributes. Like SparseSet, it typically uses a lot of memory, but
1814provides operations that are almost as fast as a vector. Typical keys are
1815physical registers, virtual registers, or numbered basic blocks.
1816
1817SparseMultiSet is useful for algorithms that need very fast
1818clear/find/insert/erase of the entire collection, and iteration over sets of
1819elements sharing a key. It is often a more efficient choice than using composite
1820data structures (e.g. vector-of-vectors, map-of-vectors). It is not intended for
1821building composite data structures.
1822
Sean Silvabeb15ca2012-12-04 03:20:08 +00001823.. _dss_FoldingSet:
1824
1825llvm/ADT/FoldingSet.h
1826^^^^^^^^^^^^^^^^^^^^^
1827
1828FoldingSet is an aggregate class that is really good at uniquing
1829expensive-to-create or polymorphic objects. It is a combination of a chained
1830hash table with intrusive links (uniqued objects are required to inherit from
1831FoldingSetNode) that uses :ref:`SmallVector <dss_smallvector>` as part of its ID
1832process.
1833
1834Consider a case where you want to implement a "getOrCreateFoo" method for a
1835complex object (for example, a node in the code generator). The client has a
1836description of **what** it wants to generate (it knows the opcode and all the
1837operands), but we don't want to 'new' a node, then try inserting it into a set
1838only to find out it already exists, at which point we would have to delete it
1839and return the node that already exists.
1840
1841To support this style of client, FoldingSet perform a query with a
1842FoldingSetNodeID (which wraps SmallVector) that can be used to describe the
1843element that we want to query for. The query either returns the element
1844matching the ID or it returns an opaque ID that indicates where insertion should
1845take place. Construction of the ID usually does not require heap traffic.
1846
1847Because FoldingSet uses intrusive links, it can support polymorphic objects in
1848the set (for example, you can have SDNode instances mixed with LoadSDNodes).
1849Because the elements are individually allocated, pointers to the elements are
1850stable: inserting or removing elements does not invalidate any pointers to other
1851elements.
1852
1853.. _dss_set:
1854
1855<set>
1856^^^^^
1857
1858``std::set`` is a reasonable all-around set class, which is decent at many
1859things but great at nothing. std::set allocates memory for each element
1860inserted (thus it is very malloc intensive) and typically stores three pointers
1861per element in the set (thus adding a large amount of per-element space
1862overhead). It offers guaranteed log(n) performance, which is not particularly
1863fast from a complexity standpoint (particularly if the elements of the set are
1864expensive to compare, like strings), and has extremely high constant factors for
1865lookup, insertion and removal.
1866
1867The advantages of std::set are that its iterators are stable (deleting or
1868inserting an element from the set does not affect iterators or pointers to other
1869elements) and that iteration over the set is guaranteed to be in sorted order.
1870If the elements in the set are large, then the relative overhead of the pointers
1871and malloc traffic is not a big deal, but if the elements of the set are small,
1872std::set is almost never a good choice.
1873
1874.. _dss_setvector:
1875
1876llvm/ADT/SetVector.h
1877^^^^^^^^^^^^^^^^^^^^
1878
1879LLVM's ``SetVector<Type>`` is an adapter class that combines your choice of a
1880set-like container along with a :ref:`Sequential Container <ds_sequential>` The
1881important property that this provides is efficient insertion with uniquing
1882(duplicate elements are ignored) with iteration support. It implements this by
1883inserting elements into both a set-like container and the sequential container,
1884using the set-like container for uniquing and the sequential container for
1885iteration.
1886
1887The difference between SetVector and other sets is that the order of iteration
1888is guaranteed to match the order of insertion into the SetVector. This property
1889is really important for things like sets of pointers. Because pointer values
1890are non-deterministic (e.g. vary across runs of the program on different
1891machines), iterating over the pointers in the set will not be in a well-defined
1892order.
1893
1894The drawback of SetVector is that it requires twice as much space as a normal
1895set and has the sum of constant factors from the set-like container and the
1896sequential container that it uses. Use it **only** if you need to iterate over
1897the elements in a deterministic order. SetVector is also expensive to delete
Paul Robinson687915f2013-11-14 18:47:23 +00001898elements out of (linear time), unless you use its "pop_back" method, which is
Sean Silvabeb15ca2012-12-04 03:20:08 +00001899faster.
1900
1901``SetVector`` is an adapter class that defaults to using ``std::vector`` and a
1902size 16 ``SmallSet`` for the underlying containers, so it is quite expensive.
1903However, ``"llvm/ADT/SetVector.h"`` also provides a ``SmallSetVector`` class,
1904which defaults to using a ``SmallVector`` and ``SmallSet`` of a specified size.
1905If you use this, and if your sets are dynamically smaller than ``N``, you will
1906save a lot of heap traffic.
1907
1908.. _dss_uniquevector:
1909
1910llvm/ADT/UniqueVector.h
1911^^^^^^^^^^^^^^^^^^^^^^^
1912
1913UniqueVector is similar to :ref:`SetVector <dss_setvector>` but it retains a
1914unique ID for each element inserted into the set. It internally contains a map
1915and a vector, and it assigns a unique ID for each value inserted into the set.
1916
1917UniqueVector is very expensive: its cost is the sum of the cost of maintaining
1918both the map and vector, it has high complexity, high constant factors, and
1919produces a lot of malloc traffic. It should be avoided.
1920
1921.. _dss_immutableset:
1922
1923llvm/ADT/ImmutableSet.h
1924^^^^^^^^^^^^^^^^^^^^^^^
1925
1926ImmutableSet is an immutable (functional) set implementation based on an AVL
1927tree. Adding or removing elements is done through a Factory object and results
1928in the creation of a new ImmutableSet object. If an ImmutableSet already exists
1929with the given contents, then the existing one is returned; equality is compared
1930with a FoldingSetNodeID. The time and space complexity of add or remove
1931operations is logarithmic in the size of the original set.
1932
1933There is no method for returning an element of the set, you can only check for
1934membership.
1935
1936.. _dss_otherset:
1937
1938Other Set-Like Container Options
1939^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1940
1941The STL provides several other options, such as std::multiset and the various
1942"hash_set" like containers (whether from C++ TR1 or from the SGI library). We
1943never use hash_set and unordered_set because they are generally very expensive
1944(each insertion requires a malloc) and very non-portable.
1945
1946std::multiset is useful if you're not interested in elimination of duplicates,
Artyom Skrobov62641152015-05-19 10:21:12 +00001947but has all the drawbacks of :ref:`std::set <dss_set>`. A sorted vector
1948(where you don't delete duplicate entries) or some other approach is almost
Aaron Ballman9f154f62015-07-29 15:57:49 +00001949always better.
Sean Silvabeb15ca2012-12-04 03:20:08 +00001950
1951.. _ds_map:
1952
1953Map-Like Containers (std::map, DenseMap, etc)
1954---------------------------------------------
1955
1956Map-like containers are useful when you want to associate data to a key. As
1957usual, there are a lot of different ways to do this. :)
1958
1959.. _dss_sortedvectormap:
1960
1961A sorted 'vector'
1962^^^^^^^^^^^^^^^^^
1963
1964If your usage pattern follows a strict insert-then-query approach, you can
1965trivially use the same approach as :ref:`sorted vectors for set-like containers
1966<dss_sortedvectorset>`. The only difference is that your query function (which
1967uses std::lower_bound to get efficient log(n) lookup) should only compare the
1968key, not both the key and value. This yields the same advantages as sorted
1969vectors for sets.
1970
1971.. _dss_stringmap:
1972
1973llvm/ADT/StringMap.h
1974^^^^^^^^^^^^^^^^^^^^
1975
1976Strings are commonly used as keys in maps, and they are difficult to support
1977efficiently: they are variable length, inefficient to hash and compare when
1978long, expensive to copy, etc. StringMap is a specialized container designed to
1979cope with these issues. It supports mapping an arbitrary range of bytes to an
1980arbitrary other object.
1981
1982The StringMap implementation uses a quadratically-probed hash table, where the
1983buckets store a pointer to the heap allocated entries (and some other stuff).
1984The entries in the map must be heap allocated because the strings are variable
1985length. The string data (key) and the element object (value) are stored in the
1986same allocation with the string data immediately after the element object.
1987This container guarantees the "``(char*)(&Value+1)``" points to the key string
1988for a value.
1989
1990The StringMap is very fast for several reasons: quadratic probing is very cache
1991efficient for lookups, the hash value of strings in buckets is not recomputed
1992when looking up an element, StringMap rarely has to touch the memory for
1993unrelated objects when looking up a value (even when hash collisions happen),
1994hash table growth does not recompute the hash values for strings already in the
1995table, and each pair in the map is store in a single allocation (the string data
1996is stored in the same allocation as the Value of a pair).
1997
1998StringMap also provides query methods that take byte ranges, so it only ever
1999copies a string if a value is inserted into the table.
2000
2001StringMap iteratation order, however, is not guaranteed to be deterministic, so
2002any uses which require that should instead use a std::map.
2003
2004.. _dss_indexmap:
2005
2006llvm/ADT/IndexedMap.h
2007^^^^^^^^^^^^^^^^^^^^^
2008
2009IndexedMap is a specialized container for mapping small dense integers (or
2010values that can be mapped to small dense integers) to some other type. It is
2011internally implemented as a vector with a mapping function that maps the keys
2012to the dense integer range.
2013
2014This is useful for cases like virtual registers in the LLVM code generator: they
2015have a dense mapping that is offset by a compile-time constant (the first
2016virtual register ID).
2017
2018.. _dss_densemap:
2019
2020llvm/ADT/DenseMap.h
2021^^^^^^^^^^^^^^^^^^^
2022
2023DenseMap is a simple quadratically probed hash table. It excels at supporting
2024small keys and values: it uses a single allocation to hold all of the pairs
2025that are currently inserted in the map. DenseMap is a great way to map
2026pointers to pointers, or map other small types to each other.
2027
2028There are several aspects of DenseMap that you should be aware of, however.
2029The iterators in a DenseMap are invalidated whenever an insertion occurs,
2030unlike map. Also, because DenseMap allocates space for a large number of
2031key/value pairs (it starts with 64 by default), it will waste a lot of space if
2032your keys or values are large. Finally, you must implement a partial
2033specialization of DenseMapInfo for the key that you want, if it isn't already
2034supported. This is required to tell DenseMap about two special marker values
2035(which can never be inserted into the map) that it needs internally.
2036
2037DenseMap's find_as() method supports lookup operations using an alternate key
2038type. This is useful in cases where the normal key type is expensive to
2039construct, but cheap to compare against. The DenseMapInfo is responsible for
2040defining the appropriate comparison and hashing methods for each alternate key
2041type used.
2042
2043.. _dss_valuemap:
2044
Chandler Carrutha4ea2692014-03-04 11:26:31 +00002045llvm/IR/ValueMap.h
Sean Silvabeb15ca2012-12-04 03:20:08 +00002046^^^^^^^^^^^^^^^^^^^
2047
2048ValueMap is a wrapper around a :ref:`DenseMap <dss_densemap>` mapping
2049``Value*``\ s (or subclasses) to another type. When a Value is deleted or
2050RAUW'ed, ValueMap will update itself so the new version of the key is mapped to
2051the same value, just as if the key were a WeakVH. You can configure exactly how
2052this happens, and what else happens on these two events, by passing a ``Config``
2053parameter to the ValueMap template.
2054
2055.. _dss_intervalmap:
2056
2057llvm/ADT/IntervalMap.h
2058^^^^^^^^^^^^^^^^^^^^^^
2059
2060IntervalMap is a compact map for small keys and values. It maps key intervals
2061instead of single keys, and it will automatically coalesce adjacent intervals.
Hans Wennborg8888d5b2015-01-17 03:19:21 +00002062When the map only contains a few intervals, they are stored in the map object
Sean Silvabeb15ca2012-12-04 03:20:08 +00002063itself to avoid allocations.
2064
2065The IntervalMap iterators are quite big, so they should not be passed around as
2066STL iterators. The heavyweight iterators allow a smaller data structure.
2067
2068.. _dss_map:
2069
2070<map>
2071^^^^^
2072
2073std::map has similar characteristics to :ref:`std::set <dss_set>`: it uses a
2074single allocation per pair inserted into the map, it offers log(n) lookup with
2075an extremely large constant factor, imposes a space penalty of 3 pointers per
2076pair in the map, etc.
2077
2078std::map is most useful when your keys or values are very large, if you need to
2079iterate over the collection in sorted order, or if you need stable iterators
2080into the map (i.e. they don't get invalidated if an insertion or deletion of
2081another element takes place).
2082
2083.. _dss_mapvector:
2084
2085llvm/ADT/MapVector.h
2086^^^^^^^^^^^^^^^^^^^^
2087
2088``MapVector<KeyT,ValueT>`` provides a subset of the DenseMap interface. The
2089main difference is that the iteration order is guaranteed to be the insertion
2090order, making it an easy (but somewhat expensive) solution for non-deterministic
2091iteration over maps of pointers.
2092
2093It is implemented by mapping from key to an index in a vector of key,value
Duncan P. N. Exon Smithf51601c2014-07-15 20:24:56 +00002094pairs. This provides fast lookup and iteration, but has two main drawbacks:
2095the key is stored twice and removing elements takes linear time. If it is
2096necessary to remove elements, it's best to remove them in bulk using
2097``remove_if()``.
Sean Silvabeb15ca2012-12-04 03:20:08 +00002098
2099.. _dss_inteqclasses:
2100
2101llvm/ADT/IntEqClasses.h
2102^^^^^^^^^^^^^^^^^^^^^^^
2103
2104IntEqClasses provides a compact representation of equivalence classes of small
2105integers. Initially, each integer in the range 0..n-1 has its own equivalence
2106class. Classes can be joined by passing two class representatives to the
2107join(a, b) method. Two integers are in the same class when findLeader() returns
2108the same representative.
2109
2110Once all equivalence classes are formed, the map can be compressed so each
2111integer 0..n-1 maps to an equivalence class number in the range 0..m-1, where m
2112is the total number of equivalence classes. The map must be uncompressed before
2113it can be edited again.
2114
2115.. _dss_immutablemap:
2116
2117llvm/ADT/ImmutableMap.h
2118^^^^^^^^^^^^^^^^^^^^^^^
2119
2120ImmutableMap is an immutable (functional) map implementation based on an AVL
2121tree. Adding or removing elements is done through a Factory object and results
2122in the creation of a new ImmutableMap object. If an ImmutableMap already exists
2123with the given key set, then the existing one is returned; equality is compared
2124with a FoldingSetNodeID. The time and space complexity of add or remove
2125operations is logarithmic in the size of the original map.
2126
2127.. _dss_othermap:
2128
2129Other Map-Like Container Options
2130^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2131
2132The STL provides several other options, such as std::multimap and the various
2133"hash_map" like containers (whether from C++ TR1 or from the SGI library). We
2134never use hash_set and unordered_set because they are generally very expensive
2135(each insertion requires a malloc) and very non-portable.
2136
2137std::multimap is useful if you want to map a key to multiple values, but has all
2138the drawbacks of std::map. A sorted vector or some other approach is almost
2139always better.
2140
2141.. _ds_bit:
2142
2143Bit storage containers (BitVector, SparseBitVector)
2144---------------------------------------------------
2145
2146Unlike the other containers, there are only two bit storage containers, and
2147choosing when to use each is relatively straightforward.
2148
2149One additional option is ``std::vector<bool>``: we discourage its use for two
2150reasons 1) the implementation in many common compilers (e.g. commonly
2151available versions of GCC) is extremely inefficient and 2) the C++ standards
2152committee is likely to deprecate this container and/or change it significantly
2153somehow. In any case, please don't use it.
2154
2155.. _dss_bitvector:
2156
2157BitVector
2158^^^^^^^^^
2159
2160The BitVector container provides a dynamic size set of bits for manipulation.
2161It supports individual bit setting/testing, as well as set operations. The set
2162operations take time O(size of bitvector), but operations are performed one word
2163at a time, instead of one bit at a time. This makes the BitVector very fast for
2164set operations compared to other containers. Use the BitVector when you expect
2165the number of set bits to be high (i.e. a dense set).
2166
2167.. _dss_smallbitvector:
2168
2169SmallBitVector
2170^^^^^^^^^^^^^^
2171
2172The SmallBitVector container provides the same interface as BitVector, but it is
2173optimized for the case where only a small number of bits, less than 25 or so,
2174are needed. It also transparently supports larger bit counts, but slightly less
2175efficiently than a plain BitVector, so SmallBitVector should only be used when
2176larger counts are rare.
2177
2178At this time, SmallBitVector does not support set operations (and, or, xor), and
2179its operator[] does not provide an assignable lvalue.
2180
2181.. _dss_sparsebitvector:
2182
2183SparseBitVector
2184^^^^^^^^^^^^^^^
2185
2186The SparseBitVector container is much like BitVector, with one major difference:
2187Only the bits that are set, are stored. This makes the SparseBitVector much
2188more space efficient than BitVector when the set is sparse, as well as making
2189set operations O(number of set bits) instead of O(size of universe). The
2190downside to the SparseBitVector is that setting and testing of random bits is
2191O(N), and on large SparseBitVectors, this can be slower than BitVector. In our
2192implementation, setting or testing bits in sorted order (either forwards or
2193reverse) is O(1) worst case. Testing and setting bits within 128 bits (depends
2194on size) of the current bit is also O(1). As a general statement,
2195testing/setting bits in a SparseBitVector is O(distance away from last set bit).
2196
David Blaikie063b2722016-12-20 17:33:58 +00002197.. _debugging:
2198
2199Debugging
2200=========
2201
2202A handful of `GDB pretty printers
2203<https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html>`__ are
2204provided for some of the core LLVM libraries. To use them, execute the
2205following (or add it to your ``~/.gdbinit``)::
2206
2207 source /path/to/llvm/src/utils/gdb-scripts/prettyprinters.py
2208
2209It also might be handy to enable the `print pretty
David Blaikied21e08e2016-12-20 17:43:48 +00002210<http://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_57.html>`__ option to
David Blaikie063b2722016-12-20 17:33:58 +00002211avoid data structures being printed as a big block of text.
2212
Sean Silvabeb15ca2012-12-04 03:20:08 +00002213.. _common:
2214
2215Helpful Hints for Common Operations
2216===================================
2217
2218This section describes how to perform some very simple transformations of LLVM
2219code. This is meant to give examples of common idioms used, showing the
2220practical side of LLVM transformations.
2221
2222Because this is a "how-to" section, you should also read about the main classes
2223that you will be working with. The :ref:`Core LLVM Class Hierarchy Reference
2224<coreclasses>` contains details and descriptions of the main classes that you
2225should know about.
2226
2227.. _inspection:
2228
2229Basic Inspection and Traversal Routines
2230---------------------------------------
2231
2232The LLVM compiler infrastructure have many different data structures that may be
2233traversed. Following the example of the C++ standard template library, the
2234techniques used to traverse these various data structures are all basically the
2235same. For a enumerable sequence of values, the ``XXXbegin()`` function (or
2236method) returns an iterator to the start of the sequence, the ``XXXend()``
2237function returns an iterator pointing to one past the last valid element of the
2238sequence, and there is some ``XXXiterator`` data type that is common between the
2239two operations.
2240
2241Because the pattern for iteration is common across many different aspects of the
2242program representation, the standard template library algorithms may be used on
2243them, and it is easier to remember how to iterate. First we show a few common
2244examples of the data structures that need to be traversed. Other data
2245structures are traversed in very similar ways.
2246
2247.. _iterate_function:
2248
2249Iterating over the ``BasicBlock`` in a ``Function``
2250^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2251
2252It's quite common to have a ``Function`` instance that you'd like to transform
2253in some way; in particular, you'd like to manipulate its ``BasicBlock``\ s. To
2254facilitate this, you'll need to iterate over all of the ``BasicBlock``\ s that
2255constitute the ``Function``. The following is an example that prints the name
2256of a ``BasicBlock`` and the number of ``Instruction``\ s it contains:
2257
2258.. code-block:: c++
2259
2260 // func is a pointer to a Function instance
2261 for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i)
2262 // Print out the name of the basic block if it has one, and then the
2263 // number of instructions that it contains
2264 errs() << "Basic block (name=" << i->getName() << ") has "
2265 << i->size() << " instructions.\n";
2266
2267Note that i can be used as if it were a pointer for the purposes of invoking
2268member functions of the ``Instruction`` class. This is because the indirection
2269operator is overloaded for the iterator classes. In the above code, the
2270expression ``i->size()`` is exactly equivalent to ``(*i).size()`` just like
2271you'd expect.
2272
2273.. _iterate_basicblock:
2274
2275Iterating over the ``Instruction`` in a ``BasicBlock``
2276^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2277
2278Just like when dealing with ``BasicBlock``\ s in ``Function``\ s, it's easy to
2279iterate over the individual instructions that make up ``BasicBlock``\ s. Here's
2280a code snippet that prints out each instruction in a ``BasicBlock``:
2281
2282.. code-block:: c++
2283
2284 // blk is a pointer to a BasicBlock instance
2285 for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i)
2286 // The next statement works since operator<<(ostream&,...)
2287 // is overloaded for Instruction&
2288 errs() << *i << "\n";
2289
2290
2291However, this isn't really the best way to print out the contents of a
2292``BasicBlock``! Since the ostream operators are overloaded for virtually
2293anything you'll care about, you could have just invoked the print routine on the
2294basic block itself: ``errs() << *blk << "\n";``.
2295
2296.. _iterate_insiter:
2297
2298Iterating over the ``Instruction`` in a ``Function``
2299^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2300
2301If you're finding that you commonly iterate over a ``Function``'s
2302``BasicBlock``\ s and then that ``BasicBlock``'s ``Instruction``\ s,
2303``InstIterator`` should be used instead. You'll need to include
Yaron Kerend9c0bed2014-05-03 11:30:49 +00002304``llvm/IR/InstIterator.h`` (`doxygen
Yaron Keren81bb4152014-05-03 12:06:13 +00002305<http://llvm.org/doxygen/InstIterator_8h.html>`__) and then instantiate
Sean Silvabeb15ca2012-12-04 03:20:08 +00002306``InstIterator``\ s explicitly in your code. Here's a small example that shows
2307how to dump all instructions in a function to the standard error stream:
2308
2309.. code-block:: c++
2310
Yaron Kerend9c0bed2014-05-03 11:30:49 +00002311 #include "llvm/IR/InstIterator.h"
Sean Silvabeb15ca2012-12-04 03:20:08 +00002312
2313 // F is a pointer to a Function instance
2314 for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
2315 errs() << *I << "\n";
2316
2317Easy, isn't it? You can also use ``InstIterator``\ s to fill a work list with
2318its initial contents. For example, if you wanted to initialize a work list to
2319contain all instructions in a ``Function`` F, all you would need to do is
2320something like:
2321
2322.. code-block:: c++
2323
2324 std::set<Instruction*> worklist;
2325 // or better yet, SmallPtrSet<Instruction*, 64> worklist;
2326
2327 for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
2328 worklist.insert(&*I);
2329
2330The STL set ``worklist`` would now contain all instructions in the ``Function``
2331pointed to by F.
2332
2333.. _iterate_convert:
2334
2335Turning an iterator into a class pointer (and vice-versa)
2336^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2337
2338Sometimes, it'll be useful to grab a reference (or pointer) to a class instance
2339when all you've got at hand is an iterator. Well, extracting a reference or a
2340pointer from an iterator is very straight-forward. Assuming that ``i`` is a
2341``BasicBlock::iterator`` and ``j`` is a ``BasicBlock::const_iterator``:
2342
2343.. code-block:: c++
2344
2345 Instruction& inst = *i; // Grab reference to instruction reference
2346 Instruction* pinst = &*i; // Grab pointer to instruction reference
2347 const Instruction& inst = *j;
2348
2349However, the iterators you'll be working with in the LLVM framework are special:
2350they will automatically convert to a ptr-to-instance type whenever they need to.
Vedant Kumara34bdfa2016-03-23 05:18:50 +00002351Instead of dereferencing the iterator and then taking the address of the result,
Sean Silvabeb15ca2012-12-04 03:20:08 +00002352you can simply assign the iterator to the proper pointer type and you get the
2353dereference and address-of operation as a result of the assignment (behind the
Charlie Turner2ac115e2015-04-16 17:01:23 +00002354scenes, this is a result of overloading casting mechanisms). Thus the second
2355line of the last example,
Sean Silvabeb15ca2012-12-04 03:20:08 +00002356
2357.. code-block:: c++
2358
2359 Instruction *pinst = &*i;
2360
2361is semantically equivalent to
2362
2363.. code-block:: c++
2364
2365 Instruction *pinst = i;
2366
2367It's also possible to turn a class pointer into the corresponding iterator, and
2368this is a constant time operation (very efficient). The following code snippet
2369illustrates use of the conversion constructors provided by LLVM iterators. By
2370using these, you can explicitly grab the iterator of something without actually
2371obtaining it via iteration over some structure:
2372
2373.. code-block:: c++
2374
2375 void printNextInstruction(Instruction* inst) {
2376 BasicBlock::iterator it(inst);
2377 ++it; // After this line, it refers to the instruction after *inst
2378 if (it != inst->getParent()->end()) errs() << *it << "\n";
2379 }
2380
2381Unfortunately, these implicit conversions come at a cost; they prevent these
2382iterators from conforming to standard iterator conventions, and thus from being
2383usable with standard algorithms and containers. For example, they prevent the
2384following code, where ``B`` is a ``BasicBlock``, from compiling:
2385
2386.. code-block:: c++
2387
2388 llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end());
2389
2390Because of this, these implicit conversions may be removed some day, and
2391``operator*`` changed to return a pointer instead of a reference.
2392
2393.. _iterate_complex:
2394
2395Finding call sites: a slightly more complex example
2396^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2397
2398Say that you're writing a FunctionPass and would like to count all the locations
2399in the entire module (that is, across every ``Function``) where a certain
2400function (i.e., some ``Function *``) is already in scope. As you'll learn
2401later, you may want to use an ``InstVisitor`` to accomplish this in a much more
2402straight-forward manner, but this example will allow us to explore how you'd do
2403it if you didn't have ``InstVisitor`` around. In pseudo-code, this is what we
2404want to do:
2405
2406.. code-block:: none
2407
2408 initialize callCounter to zero
2409 for each Function f in the Module
2410 for each BasicBlock b in f
2411 for each Instruction i in b
2412 if (i is a CallInst and calls the given function)
2413 increment callCounter
2414
2415And the actual code is (remember, because we're writing a ``FunctionPass``, our
2416``FunctionPass``-derived class simply has to override the ``runOnFunction``
2417method):
2418
2419.. code-block:: c++
2420
2421 Function* targetFunc = ...;
2422
2423 class OurFunctionPass : public FunctionPass {
2424 public:
2425 OurFunctionPass(): callCounter(0) { }
2426
2427 virtual runOnFunction(Function& F) {
2428 for (Function::iterator b = F.begin(), be = F.end(); b != be; ++b) {
2429 for (BasicBlock::iterator i = b->begin(), ie = b->end(); i != ie; ++i) {
2430 if (CallInst* callInst = dyn_cast<CallInst>(&*i)) {
2431 // We know we've encountered a call instruction, so we
2432 // need to determine if it's a call to the
2433 // function pointed to by m_func or not.
2434 if (callInst->getCalledFunction() == targetFunc)
2435 ++callCounter;
2436 }
2437 }
2438 }
2439 }
2440
2441 private:
2442 unsigned callCounter;
2443 };
2444
2445.. _calls_and_invokes:
2446
2447Treating calls and invokes the same way
2448^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2449
2450You may have noticed that the previous example was a bit oversimplified in that
2451it did not deal with call sites generated by 'invoke' instructions. In this,
2452and in other situations, you may find that you want to treat ``CallInst``\ s and
2453``InvokeInst``\ s the same way, even though their most-specific common base
2454class is ``Instruction``, which includes lots of less closely-related things.
2455For these cases, LLVM provides a handy wrapper class called ``CallSite``
2456(`doxygen <http://llvm.org/doxygen/classllvm_1_1CallSite.html>`__) It is
2457essentially a wrapper around an ``Instruction`` pointer, with some methods that
2458provide functionality common to ``CallInst``\ s and ``InvokeInst``\ s.
2459
2460This class has "value semantics": it should be passed by value, not by reference
2461and it should not be dynamically allocated or deallocated using ``operator new``
2462or ``operator delete``. It is efficiently copyable, assignable and
2463constructable, with costs equivalents to that of a bare pointer. If you look at
2464its definition, it has only a single pointer member.
2465
2466.. _iterate_chains:
2467
2468Iterating over def-use & use-def chains
2469^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2470
2471Frequently, we might have an instance of the ``Value`` class (`doxygen
2472<http://llvm.org/doxygen/classllvm_1_1Value.html>`__) and we want to determine
2473which ``User`` s use the ``Value``. The list of all ``User``\ s of a particular
2474``Value`` is called a *def-use* chain. For example, let's say we have a
2475``Function*`` named ``F`` to a particular function ``foo``. Finding all of the
2476instructions that *use* ``foo`` is as simple as iterating over the *def-use*
2477chain of ``F``:
2478
2479.. code-block:: c++
2480
2481 Function *F = ...;
2482
Adam Nemet3aecd182015-03-17 17:51:58 +00002483 for (User *U : F->users()) {
Yaron Kerenadcf88e2014-05-01 12:33:26 +00002484 if (Instruction *Inst = dyn_cast<Instruction>(U)) {
Sean Silvabeb15ca2012-12-04 03:20:08 +00002485 errs() << "F is used in instruction:\n";
2486 errs() << *Inst << "\n";
2487 }
2488
Sean Silvabeb15ca2012-12-04 03:20:08 +00002489Alternatively, it's common to have an instance of the ``User`` Class (`doxygen
2490<http://llvm.org/doxygen/classllvm_1_1User.html>`__) and need to know what
2491``Value``\ s are used by it. The list of all ``Value``\ s used by a ``User`` is
2492known as a *use-def* chain. Instances of class ``Instruction`` are common
2493``User`` s, so we might want to iterate over all of the values that a particular
2494instruction uses (that is, the operands of the particular ``Instruction``):
2495
2496.. code-block:: c++
2497
2498 Instruction *pi = ...;
2499
Yaron Keren7229bbf2014-05-02 08:26:30 +00002500 for (Use &U : pi->operands()) {
Yaron Kerenadcf88e2014-05-01 12:33:26 +00002501 Value *v = U.get();
Sean Silvabeb15ca2012-12-04 03:20:08 +00002502 // ...
2503 }
2504
2505Declaring objects as ``const`` is an important tool of enforcing mutation free
2506algorithms (such as analyses, etc.). For this purpose above iterators come in
2507constant flavors as ``Value::const_use_iterator`` and
2508``Value::const_op_iterator``. They automatically arise when calling
2509``use/op_begin()`` on ``const Value*``\ s or ``const User*``\ s respectively.
2510Upon dereferencing, they return ``const Use*``\ s. Otherwise the above patterns
2511remain unchanged.
2512
2513.. _iterate_preds:
2514
2515Iterating over predecessors & successors of blocks
2516^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2517
2518Iterating over the predecessors and successors of a block is quite easy with the
Yaron Keren28e28e82015-07-12 20:40:41 +00002519routines defined in ``"llvm/IR/CFG.h"``. Just use code like this to
Sean Silvabeb15ca2012-12-04 03:20:08 +00002520iterate over all predecessors of BB:
2521
2522.. code-block:: c++
2523
Andrey Bokhanko74541452016-09-02 11:13:35 +00002524 #include "llvm/IR/CFG.h"
Sean Silvabeb15ca2012-12-04 03:20:08 +00002525 BasicBlock *BB = ...;
2526
Duncan P. N. Exon Smith6c990152014-07-21 17:06:51 +00002527 for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
2528 BasicBlock *Pred = *PI;
Sean Silvabeb15ca2012-12-04 03:20:08 +00002529 // ...
2530 }
2531
Duncan P. N. Exon Smith6c990152014-07-21 17:06:51 +00002532Similarly, to iterate over successors use ``succ_iterator/succ_begin/succ_end``.
Sean Silvabeb15ca2012-12-04 03:20:08 +00002533
2534.. _simplechanges:
2535
2536Making simple changes
2537---------------------
2538
2539There are some primitive transformation operations present in the LLVM
2540infrastructure that are worth knowing about. When performing transformations,
2541it's fairly common to manipulate the contents of basic blocks. This section
2542describes some of the common methods for doing so and gives example code.
2543
2544.. _schanges_creating:
2545
2546Creating and inserting new ``Instruction``\ s
2547^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2548
2549*Instantiating Instructions*
2550
2551Creation of ``Instruction``\ s is straight-forward: simply call the constructor
2552for the kind of instruction to instantiate and provide the necessary parameters.
2553For example, an ``AllocaInst`` only *requires* a (const-ptr-to) ``Type``. Thus:
2554
2555.. code-block:: c++
2556
2557 AllocaInst* ai = new AllocaInst(Type::Int32Ty);
2558
2559will create an ``AllocaInst`` instance that represents the allocation of one
2560integer in the current stack frame, at run time. Each ``Instruction`` subclass
2561is likely to have varying default parameters which change the semantics of the
2562instruction, so refer to the `doxygen documentation for the subclass of
2563Instruction <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ that
2564you're interested in instantiating.
2565
2566*Naming values*
2567
2568It is very useful to name the values of instructions when you're able to, as
2569this facilitates the debugging of your transformations. If you end up looking
2570at generated LLVM machine code, you definitely want to have logical names
2571associated with the results of instructions! By supplying a value for the
2572``Name`` (default) parameter of the ``Instruction`` constructor, you associate a
2573logical name with the result of the instruction's execution at run time. For
2574example, say that I'm writing a transformation that dynamically allocates space
2575for an integer on the stack, and that integer is going to be used as some kind
2576of index by some other code. To accomplish this, I place an ``AllocaInst`` at
2577the first point in the first ``BasicBlock`` of some ``Function``, and I'm
2578intending to use it within the same ``Function``. I might do:
2579
2580.. code-block:: c++
2581
2582 AllocaInst* pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc");
2583
2584where ``indexLoc`` is now the logical name of the instruction's execution value,
2585which is a pointer to an integer on the run time stack.
2586
2587*Inserting instructions*
2588
Dan Liewc6ab58f2014-06-06 17:25:47 +00002589There are essentially three ways to insert an ``Instruction`` into an existing
Sean Silvabeb15ca2012-12-04 03:20:08 +00002590sequence of instructions that form a ``BasicBlock``:
2591
2592* Insertion into an explicit instruction list
2593
2594 Given a ``BasicBlock* pb``, an ``Instruction* pi`` within that ``BasicBlock``,
2595 and a newly-created instruction we wish to insert before ``*pi``, we do the
2596 following:
2597
2598 .. code-block:: c++
2599
2600 BasicBlock *pb = ...;
2601 Instruction *pi = ...;
2602 Instruction *newInst = new Instruction(...);
2603
2604 pb->getInstList().insert(pi, newInst); // Inserts newInst before pi in pb
2605
2606 Appending to the end of a ``BasicBlock`` is so common that the ``Instruction``
2607 class and ``Instruction``-derived classes provide constructors which take a
2608 pointer to a ``BasicBlock`` to be appended to. For example code that looked
2609 like:
2610
2611 .. code-block:: c++
2612
2613 BasicBlock *pb = ...;
2614 Instruction *newInst = new Instruction(...);
2615
2616 pb->getInstList().push_back(newInst); // Appends newInst to pb
2617
2618 becomes:
2619
2620 .. code-block:: c++
2621
2622 BasicBlock *pb = ...;
2623 Instruction *newInst = new Instruction(..., pb);
2624
2625 which is much cleaner, especially if you are creating long instruction
2626 streams.
2627
2628* Insertion into an implicit instruction list
2629
2630 ``Instruction`` instances that are already in ``BasicBlock``\ s are implicitly
2631 associated with an existing instruction list: the instruction list of the
2632 enclosing basic block. Thus, we could have accomplished the same thing as the
2633 above code without being given a ``BasicBlock`` by doing:
2634
2635 .. code-block:: c++
2636
2637 Instruction *pi = ...;
2638 Instruction *newInst = new Instruction(...);
2639
2640 pi->getParent()->getInstList().insert(pi, newInst);
2641
2642 In fact, this sequence of steps occurs so frequently that the ``Instruction``
2643 class and ``Instruction``-derived classes provide constructors which take (as
2644 a default parameter) a pointer to an ``Instruction`` which the newly-created
2645 ``Instruction`` should precede. That is, ``Instruction`` constructors are
2646 capable of inserting the newly-created instance into the ``BasicBlock`` of a
2647 provided instruction, immediately before that instruction. Using an
2648 ``Instruction`` constructor with a ``insertBefore`` (default) parameter, the
2649 above code becomes:
2650
2651 .. code-block:: c++
2652
2653 Instruction* pi = ...;
2654 Instruction* newInst = new Instruction(..., pi);
2655
2656 which is much cleaner, especially if you're creating a lot of instructions and
2657 adding them to ``BasicBlock``\ s.
2658
Dan Liewc6ab58f2014-06-06 17:25:47 +00002659* Insertion using an instance of ``IRBuilder``
2660
Dan Liew599cec62014-06-06 18:44:21 +00002661 Inserting several ``Instruction``\ s can be quite laborious using the previous
Dan Liewc6ab58f2014-06-06 17:25:47 +00002662 methods. The ``IRBuilder`` is a convenience class that can be used to add
2663 several instructions to the end of a ``BasicBlock`` or before a particular
2664 ``Instruction``. It also supports constant folding and renaming named
2665 registers (see ``IRBuilder``'s template arguments).
2666
2667 The example below demonstrates a very simple use of the ``IRBuilder`` where
2668 three instructions are inserted before the instruction ``pi``. The first two
2669 instructions are Call instructions and third instruction multiplies the return
2670 value of the two calls.
2671
2672 .. code-block:: c++
2673
2674 Instruction *pi = ...;
2675 IRBuilder<> Builder(pi);
2676 CallInst* callOne = Builder.CreateCall(...);
2677 CallInst* callTwo = Builder.CreateCall(...);
2678 Value* result = Builder.CreateMul(callOne, callTwo);
2679
2680 The example below is similar to the above example except that the created
2681 ``IRBuilder`` inserts instructions at the end of the ``BasicBlock`` ``pb``.
2682
2683 .. code-block:: c++
2684
2685 BasicBlock *pb = ...;
2686 IRBuilder<> Builder(pb);
2687 CallInst* callOne = Builder.CreateCall(...);
2688 CallInst* callTwo = Builder.CreateCall(...);
2689 Value* result = Builder.CreateMul(callOne, callTwo);
2690
Etienne Bergerond8b97352016-07-13 06:10:37 +00002691 See :doc:`tutorial/LangImpl03` for a practical use of the ``IRBuilder``.
Dan Liewc6ab58f2014-06-06 17:25:47 +00002692
2693
Sean Silvabeb15ca2012-12-04 03:20:08 +00002694.. _schanges_deleting:
2695
2696Deleting Instructions
2697^^^^^^^^^^^^^^^^^^^^^
2698
2699Deleting an instruction from an existing sequence of instructions that form a
2700BasicBlock_ is very straight-forward: just call the instruction's
2701``eraseFromParent()`` method. For example:
2702
2703.. code-block:: c++
2704
2705 Instruction *I = .. ;
2706 I->eraseFromParent();
2707
2708This unlinks the instruction from its containing basic block and deletes it. If
2709you'd just like to unlink the instruction from its containing basic block but
2710not delete it, you can use the ``removeFromParent()`` method.
2711
2712.. _schanges_replacing:
2713
2714Replacing an Instruction with another Value
2715^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2716
2717Replacing individual instructions
2718"""""""""""""""""""""""""""""""""
2719
2720Including "`llvm/Transforms/Utils/BasicBlockUtils.h
2721<http://llvm.org/doxygen/BasicBlockUtils_8h-source.html>`_" permits use of two
2722very useful replace functions: ``ReplaceInstWithValue`` and
2723``ReplaceInstWithInst``.
2724
2725.. _schanges_deleting_sub:
2726
2727Deleting Instructions
2728"""""""""""""""""""""
2729
2730* ``ReplaceInstWithValue``
2731
2732 This function replaces all uses of a given instruction with a value, and then
2733 removes the original instruction. The following example illustrates the
2734 replacement of the result of a particular ``AllocaInst`` that allocates memory
2735 for a single integer with a null pointer to an integer.
2736
2737 .. code-block:: c++
2738
2739 AllocaInst* instToReplace = ...;
2740 BasicBlock::iterator ii(instToReplace);
2741
2742 ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii,
2743 Constant::getNullValue(PointerType::getUnqual(Type::Int32Ty)));
2744
2745* ``ReplaceInstWithInst``
2746
2747 This function replaces a particular instruction with another instruction,
2748 inserting the new instruction into the basic block at the location where the
2749 old instruction was, and replacing any uses of the old instruction with the
2750 new instruction. The following example illustrates the replacement of one
2751 ``AllocaInst`` with another.
2752
2753 .. code-block:: c++
2754
2755 AllocaInst* instToReplace = ...;
2756 BasicBlock::iterator ii(instToReplace);
2757
2758 ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii,
2759 new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt"));
2760
2761
2762Replacing multiple uses of Users and Values
2763"""""""""""""""""""""""""""""""""""""""""""
2764
2765You can use ``Value::replaceAllUsesWith`` and ``User::replaceUsesOfWith`` to
2766change more than one use at a time. See the doxygen documentation for the
2767`Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_ and `User Class
2768<http://llvm.org/doxygen/classllvm_1_1User.html>`_, respectively, for more
2769information.
2770
2771.. _schanges_deletingGV:
2772
2773Deleting GlobalVariables
2774^^^^^^^^^^^^^^^^^^^^^^^^
2775
2776Deleting a global variable from a module is just as easy as deleting an
2777Instruction. First, you must have a pointer to the global variable that you
2778wish to delete. You use this pointer to erase it from its parent, the module.
2779For example:
2780
2781.. code-block:: c++
2782
2783 GlobalVariable *GV = .. ;
2784
2785 GV->eraseFromParent();
2786
2787
2788.. _create_types:
2789
2790How to Create Types
2791-------------------
2792
2793In generating IR, you may need some complex types. If you know these types
2794statically, you can use ``TypeBuilder<...>::get()``, defined in
2795``llvm/Support/TypeBuilder.h``, to retrieve them. ``TypeBuilder`` has two forms
2796depending on whether you're building types for cross-compilation or native
2797library use. ``TypeBuilder<T, true>`` requires that ``T`` be independent of the
2798host environment, meaning that it's built out of types from the ``llvm::types``
2799(`doxygen <http://llvm.org/doxygen/namespacellvm_1_1types.html>`__) namespace
2800and pointers, functions, arrays, etc. built of those. ``TypeBuilder<T, false>``
2801additionally allows native C types whose size may depend on the host compiler.
2802For example,
2803
2804.. code-block:: c++
2805
2806 FunctionType *ft = TypeBuilder<types::i<8>(types::i<32>*), true>::get();
2807
2808is easier to read and write than the equivalent
2809
2810.. code-block:: c++
2811
2812 std::vector<const Type*> params;
2813 params.push_back(PointerType::getUnqual(Type::Int32Ty));
2814 FunctionType *ft = FunctionType::get(Type::Int8Ty, params, false);
2815
2816See the `class comment
2817<http://llvm.org/doxygen/TypeBuilder_8h-source.html#l00001>`_ for more details.
2818
2819.. _threading:
2820
2821Threads and LLVM
2822================
2823
2824This section describes the interaction of the LLVM APIs with multithreading,
2825both on the part of client applications, and in the JIT, in the hosted
2826application.
2827
2828Note that LLVM's support for multithreading is still relatively young. Up
2829through version 2.5, the execution of threaded hosted applications was
2830supported, but not threaded client access to the APIs. While this use case is
2831now supported, clients *must* adhere to the guidelines specified below to ensure
2832proper operation in multithreaded mode.
2833
2834Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic
2835intrinsics in order to support threaded operation. If you need a
2836multhreading-capable LLVM on a platform without a suitably modern system
2837compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and
2838using the resultant compiler to build a copy of LLVM with multithreading
2839support.
2840
Sean Silvabeb15ca2012-12-04 03:20:08 +00002841.. _shutdown:
2842
2843Ending Execution with ``llvm_shutdown()``
2844-----------------------------------------
2845
2846When you are done using the LLVM APIs, you should call ``llvm_shutdown()`` to
Chandler Carruth39cd2162014-06-27 15:13:01 +00002847deallocate memory used for internal structures.
Zachary Turnerccbf3d02014-06-16 22:49:41 +00002848
Sean Silvabeb15ca2012-12-04 03:20:08 +00002849.. _managedstatic:
2850
2851Lazy Initialization with ``ManagedStatic``
2852------------------------------------------
2853
2854``ManagedStatic`` is a utility class in LLVM used to implement static
Chandler Carruth39cd2162014-06-27 15:13:01 +00002855initialization of static resources, such as the global type tables. In a
2856single-threaded environment, it implements a simple lazy initialization scheme.
2857When LLVM is compiled with support for multi-threading, however, it uses
Sean Silvabeb15ca2012-12-04 03:20:08 +00002858double-checked locking to implement thread-safe lazy initialization.
2859
Sean Silvabeb15ca2012-12-04 03:20:08 +00002860.. _llvmcontext:
2861
2862Achieving Isolation with ``LLVMContext``
2863----------------------------------------
2864
2865``LLVMContext`` is an opaque class in the LLVM API which clients can use to
2866operate multiple, isolated instances of LLVM concurrently within the same
2867address space. For instance, in a hypothetical compile-server, the compilation
2868of an individual translation unit is conceptually independent from all the
2869others, and it would be desirable to be able to compile incoming translation
2870units concurrently on independent server threads. Fortunately, ``LLVMContext``
2871exists to enable just this kind of scenario!
2872
2873Conceptually, ``LLVMContext`` provides isolation. Every LLVM entity
2874(``Module``\ s, ``Value``\ s, ``Type``\ s, ``Constant``\ s, etc.) in LLVM's
2875in-memory IR belongs to an ``LLVMContext``. Entities in different contexts
2876*cannot* interact with each other: ``Module``\ s in different contexts cannot be
2877linked together, ``Function``\ s cannot be added to ``Module``\ s in different
2878contexts, etc. What this means is that is is safe to compile on multiple
2879threads simultaneously, as long as no two threads operate on entities within the
2880same context.
2881
2882In practice, very few places in the API require the explicit specification of a
2883``LLVMContext``, other than the ``Type`` creation/lookup APIs. Because every
2884``Type`` carries a reference to its owning context, most other entities can
2885determine what context they belong to by looking at their own ``Type``. If you
2886are adding new entities to LLVM IR, please try to maintain this interface
2887design.
2888
Sean Silvabeb15ca2012-12-04 03:20:08 +00002889.. _jitthreading:
2890
2891Threads and the JIT
2892-------------------
2893
2894LLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple
2895threads can call ``ExecutionEngine::getPointerToFunction()`` or
2896``ExecutionEngine::runFunction()`` concurrently, and multiple threads can run
2897code output by the JIT concurrently. The user must still ensure that only one
2898thread accesses IR in a given ``LLVMContext`` while another thread might be
2899modifying it. One way to do that is to always hold the JIT lock while accessing
2900IR outside the JIT (the JIT *modifies* the IR by adding ``CallbackVH``\ s).
2901Another way is to only call ``getPointerToFunction()`` from the
2902``LLVMContext``'s thread.
2903
2904When the JIT is configured to compile lazily (using
2905``ExecutionEngine::DisableLazyCompilation(false)``), there is currently a `race
2906condition <http://llvm.org/bugs/show_bug.cgi?id=5184>`_ in updating call sites
2907after a function is lazily-jitted. It's still possible to use the lazy JIT in a
2908threaded program if you ensure that only one thread at a time can call any
2909particular lazy stub and that the JIT lock guards any IR access, but we suggest
2910using only the eager JIT in threaded programs.
2911
2912.. _advanced:
2913
2914Advanced Topics
2915===============
2916
2917This section describes some of the advanced or obscure API's that most clients
2918do not need to be aware of. These API's tend manage the inner workings of the
2919LLVM system, and only need to be accessed in unusual circumstances.
2920
2921.. _SymbolTable:
2922
2923The ``ValueSymbolTable`` class
2924------------------------------
2925
2926The ``ValueSymbolTable`` (`doxygen
2927<http://llvm.org/doxygen/classllvm_1_1ValueSymbolTable.html>`__) class provides
2928a symbol table that the :ref:`Function <c_Function>` and Module_ classes use for
2929naming value definitions. The symbol table can provide a name for any Value_.
2930
2931Note that the ``SymbolTable`` class should not be directly accessed by most
2932clients. It should only be used when iteration over the symbol table names
2933themselves are required, which is very special purpose. Note that not all LLVM
2934Value_\ s have names, and those without names (i.e. they have an empty name) do
2935not exist in the symbol table.
2936
2937Symbol tables support iteration over the values in the symbol table with
2938``begin/end/iterator`` and supports querying to see if a specific name is in the
2939symbol table (with ``lookup``). The ``ValueSymbolTable`` class exposes no
2940public mutator methods, instead, simply call ``setName`` on a value, which will
2941autoinsert it into the appropriate symbol table.
2942
2943.. _UserLayout:
2944
2945The ``User`` and owned ``Use`` classes' memory layout
2946-----------------------------------------------------
2947
2948The ``User`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1User.html>`__)
2949class provides a basis for expressing the ownership of ``User`` towards other
2950`Value instance <http://llvm.org/doxygen/classllvm_1_1Value.html>`_\ s. The
2951``Use`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Use.html>`__) helper
2952class is employed to do the bookkeeping and to facilitate *O(1)* addition and
2953removal.
2954
2955.. _Use2User:
2956
2957Interaction and relationship between ``User`` and ``Use`` objects
2958^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2959
2960A subclass of ``User`` can choose between incorporating its ``Use`` objects or
2961refer to them out-of-line by means of a pointer. A mixed variant (some ``Use``
2962s inline others hung off) is impractical and breaks the invariant that the
2963``Use`` objects belonging to the same ``User`` form a contiguous array.
2964
2965We have 2 different layouts in the ``User`` (sub)classes:
2966
2967* Layout a)
2968
2969 The ``Use`` object(s) are inside (resp. at fixed offset) of the ``User``
2970 object and there are a fixed number of them.
2971
2972* Layout b)
2973
2974 The ``Use`` object(s) are referenced by a pointer to an array from the
2975 ``User`` object and there may be a variable number of them.
2976
2977As of v2.4 each layout still possesses a direct pointer to the start of the
2978array of ``Use``\ s. Though not mandatory for layout a), we stick to this
2979redundancy for the sake of simplicity. The ``User`` object also stores the
2980number of ``Use`` objects it has. (Theoretically this information can also be
2981calculated given the scheme presented below.)
2982
2983Special forms of allocation operators (``operator new``) enforce the following
2984memory layouts:
2985
2986* Layout a) is modelled by prepending the ``User`` object by the ``Use[]``
2987 array.
2988
2989 .. code-block:: none
2990
2991 ...---.---.---.---.-------...
2992 | P | P | P | P | User
2993 '''---'---'---'---'-------'''
2994
2995* Layout b) is modelled by pointing at the ``Use[]`` array.
2996
2997 .. code-block:: none
2998
2999 .-------...
3000 | User
3001 '-------'''
3002 |
3003 v
3004 .---.---.---.---...
3005 | P | P | P | P |
3006 '---'---'---'---'''
3007
3008*(In the above figures* '``P``' *stands for the* ``Use**`` *that is stored in
3009each* ``Use`` *object in the member* ``Use::Prev`` *)*
3010
3011.. _Waymarking:
3012
3013The waymarking algorithm
3014^^^^^^^^^^^^^^^^^^^^^^^^
3015
3016Since the ``Use`` objects are deprived of the direct (back)pointer to their
3017``User`` objects, there must be a fast and exact method to recover it. This is
3018accomplished by the following scheme:
3019
3020A bit-encoding in the 2 LSBits (least significant bits) of the ``Use::Prev``
3021allows to find the start of the ``User`` object:
3022
Dmitri Gribenkoe8131122013-01-19 20:34:20 +00003023* ``00`` --- binary digit 0
Sean Silvabeb15ca2012-12-04 03:20:08 +00003024
Dmitri Gribenkoe8131122013-01-19 20:34:20 +00003025* ``01`` --- binary digit 1
Sean Silvabeb15ca2012-12-04 03:20:08 +00003026
Dmitri Gribenkoe8131122013-01-19 20:34:20 +00003027* ``10`` --- stop and calculate (``s``)
Sean Silvabeb15ca2012-12-04 03:20:08 +00003028
Dmitri Gribenkoe8131122013-01-19 20:34:20 +00003029* ``11`` --- full stop (``S``)
Sean Silvabeb15ca2012-12-04 03:20:08 +00003030
3031Given a ``Use*``, all we have to do is to walk till we get a stop and we either
3032have a ``User`` immediately behind or we have to walk to the next stop picking
3033up digits and calculating the offset:
3034
3035.. code-block:: none
3036
3037 .---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.----------------
3038 | 1 | s | 1 | 0 | 1 | 0 | s | 1 | 1 | 0 | s | 1 | 1 | s | 1 | S | User (or User*)
3039 '---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'----------------
3040 |+15 |+10 |+6 |+3 |+1
3041 | | | | | __>
3042 | | | | __________>
3043 | | | ______________________>
3044 | | ______________________________________>
3045 | __________________________________________________________>
3046
3047Only the significant number of bits need to be stored between the stops, so that
3048the *worst case is 20 memory accesses* when there are 1000 ``Use`` objects
3049associated with a ``User``.
3050
3051.. _ReferenceImpl:
3052
3053Reference implementation
3054^^^^^^^^^^^^^^^^^^^^^^^^
3055
3056The following literate Haskell fragment demonstrates the concept:
3057
3058.. code-block:: haskell
3059
3060 > import Test.QuickCheck
3061 >
3062 > digits :: Int -> [Char] -> [Char]
3063 > digits 0 acc = '0' : acc
3064 > digits 1 acc = '1' : acc
3065 > digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc
3066 >
3067 > dist :: Int -> [Char] -> [Char]
3068 > dist 0 [] = ['S']
3069 > dist 0 acc = acc
3070 > dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r
3071 > dist n acc = dist (n - 1) $ dist 1 acc
3072 >
3073 > takeLast n ss = reverse $ take n $ reverse ss
3074 >
3075 > test = takeLast 40 $ dist 20 []
3076 >
3077
3078Printing <test> gives: ``"1s100000s11010s10100s1111s1010s110s11s1S"``
3079
3080The reverse algorithm computes the length of the string just by examining a
3081certain prefix:
3082
3083.. code-block:: haskell
3084
3085 > pref :: [Char] -> Int
3086 > pref "S" = 1
3087 > pref ('s':'1':rest) = decode 2 1 rest
3088 > pref (_:rest) = 1 + pref rest
3089 >
3090 > decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest
3091 > decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest
3092 > decode walk acc _ = walk + acc
3093 >
3094
3095Now, as expected, printing <pref test> gives ``40``.
3096
3097We can *quickCheck* this with following property:
3098
3099.. code-block:: haskell
3100
3101 > testcase = dist 2000 []
3102 > testcaseLength = length testcase
3103 >
3104 > identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr
3105 > where arr = takeLast n testcase
3106 >
3107
3108As expected <quickCheck identityProp> gives:
3109
3110::
3111
3112 *Main> quickCheck identityProp
3113 OK, passed 100 tests.
3114
3115Let's be a bit more exhaustive:
3116
3117.. code-block:: haskell
3118
3119 >
3120 > deepCheck p = check (defaultConfig { configMaxTest = 500 }) p
3121 >
3122
3123And here is the result of <deepCheck identityProp>:
3124
3125::
3126
3127 *Main> deepCheck identityProp
3128 OK, passed 500 tests.
3129
3130.. _Tagging:
3131
3132Tagging considerations
3133^^^^^^^^^^^^^^^^^^^^^^
3134
3135To maintain the invariant that the 2 LSBits of each ``Use**`` in ``Use`` never
3136change after being set up, setters of ``Use::Prev`` must re-tag the new
3137``Use**`` on every modification. Accordingly getters must strip the tag bits.
3138
3139For layout b) instead of the ``User`` we find a pointer (``User*`` with LSBit
3140set). Following this pointer brings us to the ``User``. A portable trick
3141ensures that the first bytes of ``User`` (if interpreted as a pointer) never has
3142the LSBit set. (Portability is relying on the fact that all known compilers
3143place the ``vptr`` in the first word of the instances.)
3144
Chandler Carruth064dc332015-01-28 03:04:54 +00003145.. _polymorphism:
3146
3147Designing Type Hiercharies and Polymorphic Interfaces
3148-----------------------------------------------------
3149
3150There are two different design patterns that tend to result in the use of
3151virtual dispatch for methods in a type hierarchy in C++ programs. The first is
3152a genuine type hierarchy where different types in the hierarchy model
3153a specific subset of the functionality and semantics, and these types nest
3154strictly within each other. Good examples of this can be seen in the ``Value``
3155or ``Type`` type hierarchies.
3156
3157A second is the desire to dispatch dynamically across a collection of
3158polymorphic interface implementations. This latter use case can be modeled with
3159virtual dispatch and inheritance by defining an abstract interface base class
3160which all implementations derive from and override. However, this
3161implementation strategy forces an **"is-a"** relationship to exist that is not
3162actually meaningful. There is often not some nested hierarchy of useful
3163generalizations which code might interact with and move up and down. Instead,
3164there is a singular interface which is dispatched across a range of
3165implementations.
3166
3167The preferred implementation strategy for the second use case is that of
3168generic programming (sometimes called "compile-time duck typing" or "static
3169polymorphism"). For example, a template over some type parameter ``T`` can be
3170instantiated across any particular implementation that conforms to the
3171interface or *concept*. A good example here is the highly generic properties of
3172any type which models a node in a directed graph. LLVM models these primarily
3173through templates and generic programming. Such templates include the
3174``LoopInfoBase`` and ``DominatorTreeBase``. When this type of polymorphism
3175truly needs **dynamic** dispatch you can generalize it using a technique
3176called *concept-based polymorphism*. This pattern emulates the interfaces and
3177behaviors of templates using a very limited form of virtual dispatch for type
3178erasure inside its implementation. You can find examples of this technique in
3179the ``PassManager.h`` system, and there is a more detailed introduction to it
3180by Sean Parent in several of his talks and papers:
3181
3182#. `Inheritance Is The Base Class of Evil
3183 <http://channel9.msdn.com/Events/GoingNative/2013/Inheritance-Is-The-Base-Class-of-Evil>`_
3184 - The GoingNative 2013 talk describing this technique, and probably the best
3185 place to start.
3186#. `Value Semantics and Concepts-based Polymorphism
3187 <http://www.youtube.com/watch?v=_BpMYeUFXv8>`_ - The C++Now! 2012 talk
3188 describing this technique in more detail.
3189#. `Sean Parent's Papers and Presentations
3190 <http://github.com/sean-parent/sean-parent.github.com/wiki/Papers-and-Presentations>`_
3191 - A Github project full of links to slides, video, and sometimes code.
3192
3193When deciding between creating a type hierarchy (with either tagged or virtual
3194dispatch) and using templates or concepts-based polymorphism, consider whether
3195there is some refinement of an abstract base class which is a semantically
3196meaningful type on an interface boundary. If anything more refined than the
3197root abstract interface is meaningless to talk about as a partial extension of
3198the semantic model, then your use case likely fits better with polymorphism and
3199you should avoid using virtual dispatch. However, there may be some exigent
3200circumstances that require one technique or the other to be used.
3201
3202If you do need to introduce a type hierarchy, we prefer to use explicitly
3203closed type hierarchies with manual tagged dispatch and/or RTTI rather than the
3204open inheritance model and virtual dispatch that is more common in C++ code.
3205This is because LLVM rarely encourages library consumers to extend its core
3206types, and leverages the closed and tag-dispatched nature of its hierarchies to
3207generate significantly more efficient code. We have also found that a large
3208amount of our usage of type hierarchies fits better with tag-based pattern
3209matching rather than dynamic dispatch across a common interface. Within LLVM we
3210have built custom helpers to facilitate this design. See this document's
Sean Silva52c7dcd2015-01-28 10:36:41 +00003211section on :ref:`isa and dyn_cast <isa>` and our :doc:`detailed document
3212<HowToSetUpLLVMStyleRTTI>` which describes how you can implement this
3213pattern for use with the LLVM helpers.
Chandler Carruth064dc332015-01-28 03:04:54 +00003214
Sanjoy Das8ce64992015-03-26 19:25:01 +00003215.. _abi_breaking_checks:
3216
3217ABI Breaking Checks
3218-------------------
3219
3220Checks and asserts that alter the LLVM C++ ABI are predicated on the
3221preprocessor symbol `LLVM_ENABLE_ABI_BREAKING_CHECKS` -- LLVM
3222libraries built with `LLVM_ENABLE_ABI_BREAKING_CHECKS` are not ABI
3223compatible LLVM libraries built without it defined. By default,
3224turning on assertions also turns on `LLVM_ENABLE_ABI_BREAKING_CHECKS`
3225so a default +Asserts build is not ABI compatible with a
3226default -Asserts build. Clients that want ABI compatibility
3227between +Asserts and -Asserts builds should use the CMake or autoconf
3228build systems to set `LLVM_ENABLE_ABI_BREAKING_CHECKS` independently
3229of `LLVM_ENABLE_ASSERTIONS`.
3230
Sean Silvabeb15ca2012-12-04 03:20:08 +00003231.. _coreclasses:
3232
3233The Core LLVM Class Hierarchy Reference
3234=======================================
3235
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003236``#include "llvm/IR/Type.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003237
3238header source: `Type.h <http://llvm.org/doxygen/Type_8h-source.html>`_
3239
3240doxygen info: `Type Clases <http://llvm.org/doxygen/classllvm_1_1Type.html>`_
3241
3242The Core LLVM classes are the primary means of representing the program being
3243inspected or transformed. The core LLVM classes are defined in header files in
Charlie Turner2ac115e2015-04-16 17:01:23 +00003244the ``include/llvm/IR`` directory, and implemented in the ``lib/IR``
3245directory. It's worth noting that, for historical reasons, this library is
3246called ``libLLVMCore.so``, not ``libLLVMIR.so`` as you might expect.
Sean Silvabeb15ca2012-12-04 03:20:08 +00003247
3248.. _Type:
3249
3250The Type class and Derived Types
3251--------------------------------
3252
3253``Type`` is a superclass of all type classes. Every ``Value`` has a ``Type``.
3254``Type`` cannot be instantiated directly but only through its subclasses.
3255Certain primitive types (``VoidType``, ``LabelType``, ``FloatType`` and
3256``DoubleType``) have hidden subclasses. They are hidden because they offer no
3257useful functionality beyond what the ``Type`` class offers except to distinguish
3258themselves from other subclasses of ``Type``.
3259
3260All other types are subclasses of ``DerivedType``. Types can be named, but this
3261is not a requirement. There exists exactly one instance of a given shape at any
3262one time. This allows type equality to be performed with address equality of
3263the Type Instance. That is, given two ``Type*`` values, the types are identical
3264if the pointers are identical.
3265
3266.. _m_Type:
3267
3268Important Public Methods
3269^^^^^^^^^^^^^^^^^^^^^^^^
3270
3271* ``bool isIntegerTy() const``: Returns true for any integer type.
3272
3273* ``bool isFloatingPointTy()``: Return true if this is one of the five
3274 floating point types.
3275
3276* ``bool isSized()``: Return true if the type has known size. Things
3277 that don't have a size are abstract types, labels and void.
3278
3279.. _derivedtypes:
3280
3281Important Derived Types
3282^^^^^^^^^^^^^^^^^^^^^^^
3283
3284``IntegerType``
3285 Subclass of DerivedType that represents integer types of any bit width. Any
3286 bit width between ``IntegerType::MIN_INT_BITS`` (1) and
3287 ``IntegerType::MAX_INT_BITS`` (~8 million) can be represented.
3288
3289 * ``static const IntegerType* get(unsigned NumBits)``: get an integer
3290 type of a specific bit width.
3291
3292 * ``unsigned getBitWidth() const``: Get the bit width of an integer type.
3293
3294``SequentialType``
Peter Collingbourne45681582016-12-02 03:05:41 +00003295 This is subclassed by ArrayType and VectorType.
Sean Silvabeb15ca2012-12-04 03:20:08 +00003296
3297 * ``const Type * getElementType() const``: Returns the type of each
3298 of the elements in the sequential type.
3299
Peter Collingbournebc070522016-12-02 03:20:58 +00003300 * ``uint64_t getNumElements() const``: Returns the number of elements
3301 in the sequential type.
3302
Sean Silvabeb15ca2012-12-04 03:20:08 +00003303``ArrayType``
3304 This is a subclass of SequentialType and defines the interface for array
3305 types.
3306
Sean Silvabeb15ca2012-12-04 03:20:08 +00003307``PointerType``
Peter Collingbourne45681582016-12-02 03:05:41 +00003308 Subclass of Type for pointer types.
Sean Silvabeb15ca2012-12-04 03:20:08 +00003309
3310``VectorType``
3311 Subclass of SequentialType for vector types. A vector type is similar to an
3312 ArrayType but is distinguished because it is a first class type whereas
3313 ArrayType is not. Vector types are used for vector operations and are usually
Ed Maste8ed40ce2015-04-14 20:52:58 +00003314 small vectors of an integer or floating point type.
Sean Silvabeb15ca2012-12-04 03:20:08 +00003315
3316``StructType``
3317 Subclass of DerivedTypes for struct types.
3318
3319.. _FunctionType:
3320
3321``FunctionType``
3322 Subclass of DerivedTypes for function types.
3323
3324 * ``bool isVarArg() const``: Returns true if it's a vararg function.
3325
3326 * ``const Type * getReturnType() const``: Returns the return type of the
3327 function.
3328
3329 * ``const Type * getParamType (unsigned i)``: Returns the type of the ith
3330 parameter.
3331
3332 * ``const unsigned getNumParams() const``: Returns the number of formal
3333 parameters.
3334
3335.. _Module:
3336
3337The ``Module`` class
3338--------------------
3339
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003340``#include "llvm/IR/Module.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003341
3342header source: `Module.h <http://llvm.org/doxygen/Module_8h-source.html>`_
3343
3344doxygen info: `Module Class <http://llvm.org/doxygen/classllvm_1_1Module.html>`_
3345
3346The ``Module`` class represents the top level structure present in LLVM
3347programs. An LLVM module is effectively either a translation unit of the
3348original program or a combination of several translation units merged by the
3349linker. The ``Module`` class keeps track of a list of :ref:`Function
3350<c_Function>`\ s, a list of GlobalVariable_\ s, and a SymbolTable_.
3351Additionally, it contains a few helpful member functions that try to make common
3352operations easy.
3353
3354.. _m_Module:
3355
3356Important Public Members of the ``Module`` class
3357^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3358
3359* ``Module::Module(std::string name = "")``
3360
3361 Constructing a Module_ is easy. You can optionally provide a name for it
3362 (probably based on the name of the translation unit).
3363
3364* | ``Module::iterator`` - Typedef for function list iterator
3365 | ``Module::const_iterator`` - Typedef for const_iterator.
3366 | ``begin()``, ``end()``, ``size()``, ``empty()``
3367
3368 These are forwarding methods that make it easy to access the contents of a
3369 ``Module`` object's :ref:`Function <c_Function>` list.
3370
3371* ``Module::FunctionListType &getFunctionList()``
3372
3373 Returns the list of :ref:`Function <c_Function>`\ s. This is necessary to use
3374 when you need to update the list or perform a complex action that doesn't have
3375 a forwarding method.
3376
3377----------------
3378
3379* | ``Module::global_iterator`` - Typedef for global variable list iterator
3380 | ``Module::const_global_iterator`` - Typedef for const_iterator.
3381 | ``global_begin()``, ``global_end()``, ``global_size()``, ``global_empty()``
3382
3383 These are forwarding methods that make it easy to access the contents of a
3384 ``Module`` object's GlobalVariable_ list.
3385
3386* ``Module::GlobalListType &getGlobalList()``
3387
3388 Returns the list of GlobalVariable_\ s. This is necessary to use when you
3389 need to update the list or perform a complex action that doesn't have a
3390 forwarding method.
3391
3392----------------
3393
3394* ``SymbolTable *getSymbolTable()``
3395
3396 Return a reference to the SymbolTable_ for this ``Module``.
3397
3398----------------
3399
3400* ``Function *getFunction(StringRef Name) const``
3401
3402 Look up the specified function in the ``Module`` SymbolTable_. If it does not
3403 exist, return ``null``.
3404
3405* ``Function *getOrInsertFunction(const std::string &Name, const FunctionType
3406 *T)``
3407
3408 Look up the specified function in the ``Module`` SymbolTable_. If it does not
3409 exist, add an external declaration for the function and return it.
3410
3411* ``std::string getTypeName(const Type *Ty)``
3412
3413 If there is at least one entry in the SymbolTable_ for the specified Type_,
3414 return it. Otherwise return the empty string.
3415
3416* ``bool addTypeName(const std::string &Name, const Type *Ty)``
3417
3418 Insert an entry in the SymbolTable_ mapping ``Name`` to ``Ty``. If there is
3419 already an entry for this name, true is returned and the SymbolTable_ is not
3420 modified.
3421
3422.. _Value:
3423
3424The ``Value`` class
3425-------------------
3426
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003427``#include "llvm/IR/Value.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003428
3429header source: `Value.h <http://llvm.org/doxygen/Value_8h-source.html>`_
3430
3431doxygen info: `Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_
3432
3433The ``Value`` class is the most important class in the LLVM Source base. It
3434represents a typed value that may be used (among other things) as an operand to
3435an instruction. There are many different types of ``Value``\ s, such as
3436Constant_\ s, Argument_\ s. Even Instruction_\ s and :ref:`Function
3437<c_Function>`\ s are ``Value``\ s.
3438
3439A particular ``Value`` may be used many times in the LLVM representation for a
3440program. For example, an incoming argument to a function (represented with an
3441instance of the Argument_ class) is "used" by every instruction in the function
3442that references the argument. To keep track of this relationship, the ``Value``
3443class keeps a list of all of the ``User``\ s that is using it (the User_ class
3444is a base class for all nodes in the LLVM graph that can refer to ``Value``\ s).
3445This use list is how LLVM represents def-use information in the program, and is
3446accessible through the ``use_*`` methods, shown below.
3447
3448Because LLVM is a typed representation, every LLVM ``Value`` is typed, and this
3449Type_ is available through the ``getType()`` method. In addition, all LLVM
3450values can be named. The "name" of the ``Value`` is a symbolic string printed
3451in the LLVM code:
3452
3453.. code-block:: llvm
3454
3455 %foo = add i32 1, 2
3456
3457.. _nameWarning:
3458
3459The name of this instruction is "foo". **NOTE** that the name of any value may
3460be missing (an empty string), so names should **ONLY** be used for debugging
3461(making the source code easier to read, debugging printouts), they should not be
3462used to keep track of values or map between them. For this purpose, use a
3463``std::map`` of pointers to the ``Value`` itself instead.
3464
3465One important aspect of LLVM is that there is no distinction between an SSA
3466variable and the operation that produces it. Because of this, any reference to
3467the value produced by an instruction (or the value available as an incoming
3468argument, for example) is represented as a direct pointer to the instance of the
3469class that represents this value. Although this may take some getting used to,
3470it simplifies the representation and makes it easier to manipulate.
3471
3472.. _m_Value:
3473
3474Important Public Members of the ``Value`` class
3475^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3476
3477* | ``Value::use_iterator`` - Typedef for iterator over the use-list
3478 | ``Value::const_use_iterator`` - Typedef for const_iterator over the
3479 use-list
3480 | ``unsigned use_size()`` - Returns the number of users of the value.
3481 | ``bool use_empty()`` - Returns true if there are no users.
3482 | ``use_iterator use_begin()`` - Get an iterator to the start of the
3483 use-list.
3484 | ``use_iterator use_end()`` - Get an iterator to the end of the use-list.
3485 | ``User *use_back()`` - Returns the last element in the list.
3486
3487 These methods are the interface to access the def-use information in LLVM.
3488 As with all other iterators in LLVM, the naming conventions follow the
3489 conventions defined by the STL_.
3490
3491* ``Type *getType() const``
3492 This method returns the Type of the Value.
3493
3494* | ``bool hasName() const``
3495 | ``std::string getName() const``
3496 | ``void setName(const std::string &Name)``
3497
3498 This family of methods is used to access and assign a name to a ``Value``, be
3499 aware of the :ref:`precaution above <nameWarning>`.
3500
3501* ``void replaceAllUsesWith(Value *V)``
3502
3503 This method traverses the use list of a ``Value`` changing all User_\ s of the
3504 current value to refer to "``V``" instead. For example, if you detect that an
3505 instruction always produces a constant value (for example through constant
3506 folding), you can replace all uses of the instruction with the constant like
3507 this:
3508
3509 .. code-block:: c++
3510
3511 Inst->replaceAllUsesWith(ConstVal);
3512
3513.. _User:
3514
3515The ``User`` class
3516------------------
3517
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003518``#include "llvm/IR/User.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003519
3520header source: `User.h <http://llvm.org/doxygen/User_8h-source.html>`_
3521
3522doxygen info: `User Class <http://llvm.org/doxygen/classllvm_1_1User.html>`_
3523
3524Superclass: Value_
3525
3526The ``User`` class is the common base class of all LLVM nodes that may refer to
3527``Value``\ s. It exposes a list of "Operands" that are all of the ``Value``\ s
3528that the User is referring to. The ``User`` class itself is a subclass of
3529``Value``.
3530
3531The operands of a ``User`` point directly to the LLVM ``Value`` that it refers
3532to. Because LLVM uses Static Single Assignment (SSA) form, there can only be
3533one definition referred to, allowing this direct connection. This connection
3534provides the use-def information in LLVM.
3535
3536.. _m_User:
3537
3538Important Public Members of the ``User`` class
3539^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3540
3541The ``User`` class exposes the operand list in two ways: through an index access
3542interface and through an iterator based interface.
3543
3544* | ``Value *getOperand(unsigned i)``
3545 | ``unsigned getNumOperands()``
3546
3547 These two methods expose the operands of the ``User`` in a convenient form for
3548 direct access.
3549
3550* | ``User::op_iterator`` - Typedef for iterator over the operand list
3551 | ``op_iterator op_begin()`` - Get an iterator to the start of the operand
3552 list.
3553 | ``op_iterator op_end()`` - Get an iterator to the end of the operand list.
3554
3555 Together, these methods make up the iterator based interface to the operands
3556 of a ``User``.
3557
3558
3559.. _Instruction:
3560
3561The ``Instruction`` class
3562-------------------------
3563
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003564``#include "llvm/IR/Instruction.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003565
3566header source: `Instruction.h
3567<http://llvm.org/doxygen/Instruction_8h-source.html>`_
3568
3569doxygen info: `Instruction Class
3570<http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_
3571
3572Superclasses: User_, Value_
3573
3574The ``Instruction`` class is the common base class for all LLVM instructions.
3575It provides only a few methods, but is a very commonly used class. The primary
3576data tracked by the ``Instruction`` class itself is the opcode (instruction
3577type) and the parent BasicBlock_ the ``Instruction`` is embedded into. To
3578represent a specific type of instruction, one of many subclasses of
3579``Instruction`` are used.
3580
3581Because the ``Instruction`` class subclasses the User_ class, its operands can
3582be accessed in the same way as for other ``User``\ s (with the
3583``getOperand()``/``getNumOperands()`` and ``op_begin()``/``op_end()`` methods).
3584An important file for the ``Instruction`` class is the ``llvm/Instruction.def``
3585file. This file contains some meta-data about the various different types of
3586instructions in LLVM. It describes the enum values that are used as opcodes
3587(for example ``Instruction::Add`` and ``Instruction::ICmp``), as well as the
3588concrete sub-classes of ``Instruction`` that implement the instruction (for
3589example BinaryOperator_ and CmpInst_). Unfortunately, the use of macros in this
3590file confuses doxygen, so these enum values don't show up correctly in the
3591`doxygen output <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_.
3592
3593.. _s_Instruction:
3594
3595Important Subclasses of the ``Instruction`` class
3596^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3597
3598.. _BinaryOperator:
3599
3600* ``BinaryOperator``
3601
3602 This subclasses represents all two operand instructions whose operands must be
3603 the same type, except for the comparison instructions.
3604
3605.. _CastInst:
3606
3607* ``CastInst``
3608 This subclass is the parent of the 12 casting instructions. It provides
3609 common operations on cast instructions.
3610
3611.. _CmpInst:
3612
3613* ``CmpInst``
3614
3615 This subclass respresents the two comparison instructions,
3616 `ICmpInst <LangRef.html#i_icmp>`_ (integer opreands), and
3617 `FCmpInst <LangRef.html#i_fcmp>`_ (floating point operands).
3618
3619.. _TerminatorInst:
3620
3621* ``TerminatorInst``
3622
3623 This subclass is the parent of all terminator instructions (those which can
3624 terminate a block).
3625
3626.. _m_Instruction:
3627
3628Important Public Members of the ``Instruction`` class
3629^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3630
3631* ``BasicBlock *getParent()``
3632
3633 Returns the BasicBlock_ that this
3634 ``Instruction`` is embedded into.
3635
3636* ``bool mayWriteToMemory()``
3637
3638 Returns true if the instruction writes to memory, i.e. it is a ``call``,
3639 ``free``, ``invoke``, or ``store``.
3640
3641* ``unsigned getOpcode()``
3642
3643 Returns the opcode for the ``Instruction``.
3644
3645* ``Instruction *clone() const``
3646
3647 Returns another instance of the specified instruction, identical in all ways
3648 to the original except that the instruction has no parent (i.e. it's not
3649 embedded into a BasicBlock_), and it has no name.
3650
3651.. _Constant:
3652
3653The ``Constant`` class and subclasses
3654-------------------------------------
3655
3656Constant represents a base class for different types of constants. It is
3657subclassed by ConstantInt, ConstantArray, etc. for representing the various
3658types of Constants. GlobalValue_ is also a subclass, which represents the
3659address of a global variable or function.
3660
3661.. _s_Constant:
3662
3663Important Subclasses of Constant
3664^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3665
3666* ConstantInt : This subclass of Constant represents an integer constant of
3667 any width.
3668
3669 * ``const APInt& getValue() const``: Returns the underlying
3670 value of this constant, an APInt value.
3671
3672 * ``int64_t getSExtValue() const``: Converts the underlying APInt value to an
3673 int64_t via sign extension. If the value (not the bit width) of the APInt
3674 is too large to fit in an int64_t, an assertion will result. For this
3675 reason, use of this method is discouraged.
3676
3677 * ``uint64_t getZExtValue() const``: Converts the underlying APInt value
3678 to a uint64_t via zero extension. IF the value (not the bit width) of the
3679 APInt is too large to fit in a uint64_t, an assertion will result. For this
3680 reason, use of this method is discouraged.
3681
3682 * ``static ConstantInt* get(const APInt& Val)``: Returns the ConstantInt
3683 object that represents the value provided by ``Val``. The type is implied
3684 as the IntegerType that corresponds to the bit width of ``Val``.
3685
3686 * ``static ConstantInt* get(const Type *Ty, uint64_t Val)``: Returns the
3687 ConstantInt object that represents the value provided by ``Val`` for integer
3688 type ``Ty``.
3689
3690* ConstantFP : This class represents a floating point constant.
3691
3692 * ``double getValue() const``: Returns the underlying value of this constant.
3693
3694* ConstantArray : This represents a constant array.
3695
3696 * ``const std::vector<Use> &getValues() const``: Returns a vector of
3697 component constants that makeup this array.
3698
3699* ConstantStruct : This represents a constant struct.
3700
3701 * ``const std::vector<Use> &getValues() const``: Returns a vector of
3702 component constants that makeup this array.
3703
3704* GlobalValue : This represents either a global variable or a function. In
3705 either case, the value is a constant fixed address (after linking).
3706
3707.. _GlobalValue:
3708
3709The ``GlobalValue`` class
3710-------------------------
3711
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003712``#include "llvm/IR/GlobalValue.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003713
3714header source: `GlobalValue.h
3715<http://llvm.org/doxygen/GlobalValue_8h-source.html>`_
3716
3717doxygen info: `GlobalValue Class
3718<http://llvm.org/doxygen/classllvm_1_1GlobalValue.html>`_
3719
3720Superclasses: Constant_, User_, Value_
3721
3722Global values ( GlobalVariable_\ s or :ref:`Function <c_Function>`\ s) are the
3723only LLVM values that are visible in the bodies of all :ref:`Function
3724<c_Function>`\ s. Because they are visible at global scope, they are also
3725subject to linking with other globals defined in different translation units.
3726To control the linking process, ``GlobalValue``\ s know their linkage rules.
3727Specifically, ``GlobalValue``\ s know whether they have internal or external
3728linkage, as defined by the ``LinkageTypes`` enumeration.
3729
3730If a ``GlobalValue`` has internal linkage (equivalent to being ``static`` in C),
3731it is not visible to code outside the current translation unit, and does not
3732participate in linking. If it has external linkage, it is visible to external
3733code, and does participate in linking. In addition to linkage information,
3734``GlobalValue``\ s keep track of which Module_ they are currently part of.
3735
3736Because ``GlobalValue``\ s are memory objects, they are always referred to by
3737their **address**. As such, the Type_ of a global is always a pointer to its
3738contents. It is important to remember this when using the ``GetElementPtrInst``
3739instruction because this pointer must be dereferenced first. For example, if
3740you have a ``GlobalVariable`` (a subclass of ``GlobalValue)`` that is an array
3741of 24 ints, type ``[24 x i32]``, then the ``GlobalVariable`` is a pointer to
3742that array. Although the address of the first element of this array and the
3743value of the ``GlobalVariable`` are the same, they have different types. The
3744``GlobalVariable``'s type is ``[24 x i32]``. The first element's type is
3745``i32.`` Because of this, accessing a global value requires you to dereference
3746the pointer with ``GetElementPtrInst`` first, then its elements can be accessed.
3747This is explained in the `LLVM Language Reference Manual
3748<LangRef.html#globalvars>`_.
3749
3750.. _m_GlobalValue:
3751
3752Important Public Members of the ``GlobalValue`` class
3753^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3754
3755* | ``bool hasInternalLinkage() const``
3756 | ``bool hasExternalLinkage() const``
3757 | ``void setInternalLinkage(bool HasInternalLinkage)``
3758
3759 These methods manipulate the linkage characteristics of the ``GlobalValue``.
3760
3761* ``Module *getParent()``
3762
3763 This returns the Module_ that the
3764 GlobalValue is currently embedded into.
3765
3766.. _c_Function:
3767
3768The ``Function`` class
3769----------------------
3770
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003771``#include "llvm/IR/Function.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003772
3773header source: `Function.h <http://llvm.org/doxygen/Function_8h-source.html>`_
3774
3775doxygen info: `Function Class
3776<http://llvm.org/doxygen/classllvm_1_1Function.html>`_
3777
3778Superclasses: GlobalValue_, Constant_, User_, Value_
3779
3780The ``Function`` class represents a single procedure in LLVM. It is actually
3781one of the more complex classes in the LLVM hierarchy because it must keep track
3782of a large amount of data. The ``Function`` class keeps track of a list of
3783BasicBlock_\ s, a list of formal Argument_\ s, and a SymbolTable_.
3784
3785The list of BasicBlock_\ s is the most commonly used part of ``Function``
3786objects. The list imposes an implicit ordering of the blocks in the function,
3787which indicate how the code will be laid out by the backend. Additionally, the
3788first BasicBlock_ is the implicit entry node for the ``Function``. It is not
3789legal in LLVM to explicitly branch to this initial block. There are no implicit
3790exit nodes, and in fact there may be multiple exit nodes from a single
3791``Function``. If the BasicBlock_ list is empty, this indicates that the
3792``Function`` is actually a function declaration: the actual body of the function
3793hasn't been linked in yet.
3794
3795In addition to a list of BasicBlock_\ s, the ``Function`` class also keeps track
3796of the list of formal Argument_\ s that the function receives. This container
3797manages the lifetime of the Argument_ nodes, just like the BasicBlock_ list does
3798for the BasicBlock_\ s.
3799
3800The SymbolTable_ is a very rarely used LLVM feature that is only used when you
3801have to look up a value by name. Aside from that, the SymbolTable_ is used
3802internally to make sure that there are not conflicts between the names of
3803Instruction_\ s, BasicBlock_\ s, or Argument_\ s in the function body.
3804
3805Note that ``Function`` is a GlobalValue_ and therefore also a Constant_. The
3806value of the function is its address (after linking) which is guaranteed to be
3807constant.
3808
3809.. _m_Function:
3810
3811Important Public Members of the ``Function``
3812^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3813
3814* ``Function(const FunctionType *Ty, LinkageTypes Linkage,
3815 const std::string &N = "", Module* Parent = 0)``
3816
3817 Constructor used when you need to create new ``Function``\ s to add the
3818 program. The constructor must specify the type of the function to create and
3819 what type of linkage the function should have. The FunctionType_ argument
3820 specifies the formal arguments and return value for the function. The same
3821 FunctionType_ value can be used to create multiple functions. The ``Parent``
3822 argument specifies the Module in which the function is defined. If this
3823 argument is provided, the function will automatically be inserted into that
3824 module's list of functions.
3825
3826* ``bool isDeclaration()``
3827
3828 Return whether or not the ``Function`` has a body defined. If the function is
3829 "external", it does not have a body, and thus must be resolved by linking with
3830 a function defined in a different translation unit.
3831
3832* | ``Function::iterator`` - Typedef for basic block list iterator
3833 | ``Function::const_iterator`` - Typedef for const_iterator.
3834 | ``begin()``, ``end()``, ``size()``, ``empty()``
3835
3836 These are forwarding methods that make it easy to access the contents of a
3837 ``Function`` object's BasicBlock_ list.
3838
3839* ``Function::BasicBlockListType &getBasicBlockList()``
3840
3841 Returns the list of BasicBlock_\ s. This is necessary to use when you need to
3842 update the list or perform a complex action that doesn't have a forwarding
3843 method.
3844
3845* | ``Function::arg_iterator`` - Typedef for the argument list iterator
3846 | ``Function::const_arg_iterator`` - Typedef for const_iterator.
3847 | ``arg_begin()``, ``arg_end()``, ``arg_size()``, ``arg_empty()``
3848
3849 These are forwarding methods that make it easy to access the contents of a
3850 ``Function`` object's Argument_ list.
3851
3852* ``Function::ArgumentListType &getArgumentList()``
3853
3854 Returns the list of Argument_. This is necessary to use when you need to
3855 update the list or perform a complex action that doesn't have a forwarding
3856 method.
3857
3858* ``BasicBlock &getEntryBlock()``
3859
3860 Returns the entry ``BasicBlock`` for the function. Because the entry block
3861 for the function is always the first block, this returns the first block of
3862 the ``Function``.
3863
3864* | ``Type *getReturnType()``
3865 | ``FunctionType *getFunctionType()``
3866
3867 This traverses the Type_ of the ``Function`` and returns the return type of
3868 the function, or the FunctionType_ of the actual function.
3869
3870* ``SymbolTable *getSymbolTable()``
3871
3872 Return a pointer to the SymbolTable_ for this ``Function``.
3873
3874.. _GlobalVariable:
3875
3876The ``GlobalVariable`` class
3877----------------------------
3878
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003879``#include "llvm/IR/GlobalVariable.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003880
3881header source: `GlobalVariable.h
3882<http://llvm.org/doxygen/GlobalVariable_8h-source.html>`_
3883
3884doxygen info: `GlobalVariable Class
3885<http://llvm.org/doxygen/classllvm_1_1GlobalVariable.html>`_
3886
3887Superclasses: GlobalValue_, Constant_, User_, Value_
3888
3889Global variables are represented with the (surprise surprise) ``GlobalVariable``
3890class. Like functions, ``GlobalVariable``\ s are also subclasses of
3891GlobalValue_, and as such are always referenced by their address (global values
3892must live in memory, so their "name" refers to their constant address). See
3893GlobalValue_ for more on this. Global variables may have an initial value
3894(which must be a Constant_), and if they have an initializer, they may be marked
3895as "constant" themselves (indicating that their contents never change at
3896runtime).
3897
3898.. _m_GlobalVariable:
3899
3900Important Public Members of the ``GlobalVariable`` class
3901^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3902
3903* ``GlobalVariable(const Type *Ty, bool isConstant, LinkageTypes &Linkage,
3904 Constant *Initializer = 0, const std::string &Name = "", Module* Parent = 0)``
3905
3906 Create a new global variable of the specified type. If ``isConstant`` is true
3907 then the global variable will be marked as unchanging for the program. The
3908 Linkage parameter specifies the type of linkage (internal, external, weak,
3909 linkonce, appending) for the variable. If the linkage is InternalLinkage,
3910 WeakAnyLinkage, WeakODRLinkage, LinkOnceAnyLinkage or LinkOnceODRLinkage, then
3911 the resultant global variable will have internal linkage. AppendingLinkage
3912 concatenates together all instances (in different translation units) of the
3913 variable into a single variable but is only applicable to arrays. See the
3914 `LLVM Language Reference <LangRef.html#modulestructure>`_ for further details
3915 on linkage types. Optionally an initializer, a name, and the module to put
3916 the variable into may be specified for the global variable as well.
3917
3918* ``bool isConstant() const``
3919
3920 Returns true if this is a global variable that is known not to be modified at
3921 runtime.
3922
3923* ``bool hasInitializer()``
3924
3925 Returns true if this ``GlobalVariable`` has an intializer.
3926
3927* ``Constant *getInitializer()``
3928
3929 Returns the initial value for a ``GlobalVariable``. It is not legal to call
3930 this method if there is no initializer.
3931
3932.. _BasicBlock:
3933
3934The ``BasicBlock`` class
3935------------------------
3936
Benjamin Kramer9f566a52013-07-08 19:59:35 +00003937``#include "llvm/IR/BasicBlock.h"``
Sean Silvabeb15ca2012-12-04 03:20:08 +00003938
3939header source: `BasicBlock.h
3940<http://llvm.org/doxygen/BasicBlock_8h-source.html>`_
3941
3942doxygen info: `BasicBlock Class
3943<http://llvm.org/doxygen/classllvm_1_1BasicBlock.html>`_
3944
3945Superclass: Value_
3946
3947This class represents a single entry single exit section of the code, commonly
3948known as a basic block by the compiler community. The ``BasicBlock`` class
3949maintains a list of Instruction_\ s, which form the body of the block. Matching
3950the language definition, the last element of this list of instructions is always
3951a terminator instruction (a subclass of the TerminatorInst_ class).
3952
3953In addition to tracking the list of instructions that make up the block, the
3954``BasicBlock`` class also keeps track of the :ref:`Function <c_Function>` that
3955it is embedded into.
3956
3957Note that ``BasicBlock``\ s themselves are Value_\ s, because they are
3958referenced by instructions like branches and can go in the switch tables.
3959``BasicBlock``\ s have type ``label``.
3960
3961.. _m_BasicBlock:
3962
3963Important Public Members of the ``BasicBlock`` class
3964^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3965
3966* ``BasicBlock(const std::string &Name = "", Function *Parent = 0)``
3967
3968 The ``BasicBlock`` constructor is used to create new basic blocks for
3969 insertion into a function. The constructor optionally takes a name for the
3970 new block, and a :ref:`Function <c_Function>` to insert it into. If the
3971 ``Parent`` parameter is specified, the new ``BasicBlock`` is automatically
3972 inserted at the end of the specified :ref:`Function <c_Function>`, if not
3973 specified, the BasicBlock must be manually inserted into the :ref:`Function
3974 <c_Function>`.
3975
3976* | ``BasicBlock::iterator`` - Typedef for instruction list iterator
3977 | ``BasicBlock::const_iterator`` - Typedef for const_iterator.
3978 | ``begin()``, ``end()``, ``front()``, ``back()``,
3979 ``size()``, ``empty()``
3980 STL-style functions for accessing the instruction list.
3981
3982 These methods and typedefs are forwarding functions that have the same
3983 semantics as the standard library methods of the same names. These methods
3984 expose the underlying instruction list of a basic block in a way that is easy
3985 to manipulate. To get the full complement of container operations (including
3986 operations to update the list), you must use the ``getInstList()`` method.
3987
3988* ``BasicBlock::InstListType &getInstList()``
3989
3990 This method is used to get access to the underlying container that actually
3991 holds the Instructions. This method must be used when there isn't a
3992 forwarding function in the ``BasicBlock`` class for the operation that you
3993 would like to perform. Because there are no forwarding functions for
3994 "updating" operations, you need to use this if you want to update the contents
3995 of a ``BasicBlock``.
3996
3997* ``Function *getParent()``
3998
3999 Returns a pointer to :ref:`Function <c_Function>` the block is embedded into,
4000 or a null pointer if it is homeless.
4001
4002* ``TerminatorInst *getTerminator()``
4003
4004 Returns a pointer to the terminator instruction that appears at the end of the
4005 ``BasicBlock``. If there is no terminator instruction, or if the last
4006 instruction in the block is not a terminator, then a null pointer is returned.
4007
4008.. _Argument:
4009
4010The ``Argument`` class
4011----------------------
4012
4013This subclass of Value defines the interface for incoming formal arguments to a
4014function. A Function maintains a list of its formal arguments. An argument has
4015a pointer to the parent Function.
4016
4017