=============================
Introduction to the Clang AST
=============================

This document gives a gentle introduction to the mysteries of the Clang
AST. It is targeted at developers who either want to contribute to
Clang, or use tools that work based on Clang's AST, like the AST
matchers.

.. raw:: html

  <center><iframe width="560" height="315" src="http://www.youtube.com/embed/VqCkCDFLSsc?vq=hd720" frameborder="0" allowfullscreen></iframe></center>

`Slides <http://llvm.org/devmtg/2013-04/klimek-slides.pdf>`_

Introduction
============

Clang's AST is different from ASTs produced by some other compilers in
that it closely resembles both the written C++ code and the C++
standard. For example, parenthesized expressions and compile-time
constants are available in an unreduced form in the AST. This makes
Clang's AST a good fit for refactoring tools.
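
To make "unreduced form" concrete, here is a minimal, self-contained
sketch of a toy expression tree. It is not Clang's API: the names
``ToyExpr``, ``ToyParen``, ``ToyDivide``, ``ToyIntLiteral`` and
``buildExample`` are hypothetical and exist only for this illustration.
The tree keeps the parentheses and both literals as written, and computes
the value only on request.

.. code-block:: c++

    #include <iostream>
    #include <memory>
    #include <string>
    #include <utility>

    // Toy expression nodes (hypothetical names, not Clang's classes).
    struct ToyExpr {
      virtual ~ToyExpr() = default;
      virtual std::string spell() const = 0; // text as written in the source
      virtual int evaluate() const = 0;      // value, computed on demand
    };

    struct ToyIntLiteral : ToyExpr {
      int value;
      explicit ToyIntLiteral(int v) : value(v) {}
      std::string spell() const override { return std::to_string(value); }
      int evaluate() const override { return value; }
    };

    // The parentheses get their own node instead of being dropped.
    struct ToyParen : ToyExpr {
      std::unique_ptr<ToyExpr> inner;
      explicit ToyParen(std::unique_ptr<ToyExpr> e) : inner(std::move(e)) {}
      std::string spell() const override { return "(" + inner->spell() + ")"; }
      int evaluate() const override { return inner->evaluate(); }
    };

    // The division is stored as an operator node, not folded to a constant.
    struct ToyDivide : ToyExpr {
      std::unique_ptr<ToyExpr> lhs, rhs;
      ToyDivide(std::unique_ptr<ToyExpr> l, std::unique_ptr<ToyExpr> r)
          : lhs(std::move(l)), rhs(std::move(r)) {}
      std::string spell() const override {
        return lhs->spell() + " / " + rhs->spell();
      }
      int evaluate() const override {
        return lhs->evaluate() / rhs->evaluate();
      }
    };

    // Models the source text "(84 / 2)".
    std::unique_ptr<ToyExpr> buildExample() {
      return std::make_unique<ToyParen>(std::make_unique<ToyDivide>(
          std::make_unique<ToyIntLiteral>(84),
          std::make_unique<ToyIntLiteral>(2)));
    }

    int main() {
      std::unique_ptr<ToyExpr> e = buildExample();
      // Prints the expression as written, followed by its value.
      std::cout << e->spell() << " = " << e->evaluate() << "\n";
      return 0;
    }

Because the original spelling can be recovered from the tree, a tool can
rewrite one part of an expression and leave the rest of the text intact,
which is the property that matters for refactoring.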

Documentation for all Clang AST nodes is available via the generated
`Doxygen <http://clang.llvm.org/doxygen>`_. The online Doxygen
documentation is also indexed by search engines, so searching for
"clang" plus the AST node's class name usually turns up the Doxygen
page of the class you're looking for (for example, search for:
clang ParenExpr).

Examining the AST
=================

A good way to familiarize yourself with the Clang AST is to actually look
at it on some simple example code. Clang has a built-in AST-dump mode,
which can be enabled with the flag ``-ast-dump``.

Let's look at a simple example AST:

::

    $ cat test.cc
    int f(int x) {
      int result = (x / 42);
      return result;
    }

    # Clang by default is a frontend for many tools; -Xclang is used to pass
    # options directly to the C++ frontend.
    $ clang -Xclang -ast-dump -fsyntax-only test.cc
    TranslationUnitDecl 0x5aea0d0 <<invalid sloc>>
    ... cutting out internal declarations of clang ...
    `-FunctionDecl 0x5aeab50 <test.cc:1:1, line:4:1> f 'int (int)'
      |-ParmVarDecl 0x5aeaa90 <line:1:7, col:11> x 'int'
      `-CompoundStmt 0x5aead88 <col:14, line:4:1>
        |-DeclStmt 0x5aead10 <line:2:3, col:24>
        | `-VarDecl 0x5aeac10 <col:3, col:23> result 'int'
        |   `-ParenExpr 0x5aeacf0 <col:16, col:23> 'int'
        |     `-BinaryOperator 0x5aeacc8 <col:17, col:21> 'int' '/'
        |       |-ImplicitCastExpr 0x5aeacb0 <col:17> 'int' <LValueToRValue>
        |       | `-DeclRefExpr 0x5aeac68 <col:17> 'int' lvalue ParmVar 0x5aeaa90 'x' 'int'
        |       `-IntegerLiteral 0x5aeac90 <col:21> 'int' 42
        `-ReturnStmt 0x5aead68 <line:3:3, col:10>
          `-ImplicitCastExpr 0x5aead50 <col:10> 'int' <LValueToRValue>
            `-DeclRefExpr 0x5aead28 <col:10> 'int' lvalue Var 0x5aeac10 'result' 'int'

The top-level declaration in a translation unit is always the `translation unit
declaration <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_.
In this example, our first user-written declaration is the `function
declaration <http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html>`_
of "``f``". The body of "``f``" is a `compound
statement <http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html>`_,
whose child nodes are a `declaration
statement <http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html>`_
that declares our result variable, and the `return
statement <http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html>`_.
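
The nesting in the dump is carried entirely by the line prefixes: a
bar-dash prefix marks a child that has later siblings, a backquote-dash
prefix marks the last child, and a vertical bar in the indentation
continues a parent that still has children to come. The following sketch
reproduces that layout for a simplified version of the tree above. It is
a toy model rather than Clang's dumper; ``ToyNode``, ``dump``,
``dumpChildren`` and ``buildExample`` are hypothetical names, and
addresses, source ranges and types are left out.

.. code-block:: c++

    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <vector>

    // Toy node: a label plus child nodes (hypothetical, not a Clang class).
    struct ToyNode {
      std::string label;
      std::vector<ToyNode> children;
    };

    // Append one line per descendant, using the dump's tree prefixes.
    void dumpChildren(const ToyNode &node, const std::string &indent,
                      std::string &out) {
      for (std::size_t i = 0; i < node.children.size(); ++i) {
        const bool last = (i + 1 == node.children.size());
        out += indent + (last ? "`-" : "|-") + node.children[i].label + "\n";
        // Below a non-last child the parent's bar continues; below the
        // last child the column is left blank.
        dumpChildren(node.children[i], indent + (last ? "  " : "| "), out);
      }
    }

    std::string dump(const ToyNode &root) {
      std::string out = root.label + "\n";
      dumpChildren(root, "", out);
      return out;
    }

    // A simplified version of the tree for f in test.cc.
    ToyNode buildExample() {
      return ToyNode{
          "FunctionDecl f",
          {ToyNode{"ParmVarDecl x", {}},
           ToyNode{"CompoundStmt",
                   {ToyNode{"DeclStmt", {ToyNode{"VarDecl result", {}}}},
                    ToyNode{"ReturnStmt", {}}}}}};
    }

    int main() {
      std::cout << dump(buildExample());
      return 0;
    }

Reading the real dump the same way, ``DeclStmt`` and ``ReturnStmt`` are
siblings under ``CompoundStmt``, and the ``VarDecl`` for ``result`` is
the only child of the ``DeclStmt``.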

AST Context
===========

All information about the AST for a translation unit is bundled up in
the class
`ASTContext <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html>`_.
It allows traversing the whole translation unit starting from
`getTranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64>`_,
and gives access to Clang's `table of
identifiers <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4>`_
for the parsed translation unit.

AST Nodes
=========

Clang's AST nodes are modeled on a class hierarchy that does not have a
common ancestor. Instead, there are multiple larger hierarchies for
basic node types like
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ and
`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_. Many
important AST nodes derive from
`Type <http://clang.llvm.org/doxygen/classclang_1_1Type.html>`_,
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_,
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
or `Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_, with
some classes deriving from both Decl and DeclContext.

There is also a multitude of nodes in the AST that are not part of a
larger hierarchy, and are only reachable from specific other nodes, like
`CXXBaseSpecifier <http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html>`_.

Thus, to traverse the full AST, one starts from the
`TranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_
and then recursively traverses everything that can be reached from that
node; this information has to be encoded for each specific node type.
This algorithm is encoded in the
`RecursiveASTVisitor <http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html>`_.
See the `RecursiveASTVisitor
tutorial <http://clang.llvm.org/docs/RAVFrontendAction.html>`_.

The two most basic nodes in the Clang AST are statements
(`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_) and
declarations
(`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_). Note
that expressions
(`Expr <http://clang.llvm.org/doxygen/classclang_1_1Expr.html>`_) are
also statements in Clang's AST.
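
The consequences of this design can be sketched with a small toy model.
The code below does not use Clang's classes: ``ToyStmt``, ``ToyExpr``,
``ToyDecl``, ``ToyFunctionDecl``, ``countStmts`` and ``buildExample`` are
hypothetical names chosen for illustration. It shows two hierarchies
without a shared base class, an expression type that is also a statement,
and a traversal that has to know, per node type, where the children are
stored.

.. code-block:: c++

    #include <iostream>
    #include <memory>
    #include <type_traits>
    #include <utility>
    #include <vector>

    // Statement hierarchy (toy model, not Clang's classes).
    struct ToyStmt {
      virtual ~ToyStmt() = default;
      std::vector<std::unique_ptr<ToyStmt>> children;
    };
    struct ToyExpr : ToyStmt {};         // an expression is also a statement
    struct ToyCompoundStmt : ToyStmt {};
    struct ToyReturnStmt : ToyStmt {};

    // Declaration hierarchy, unrelated to ToyStmt.
    struct ToyDecl {
      virtual ~ToyDecl() = default;
    };
    struct ToyFunctionDecl : ToyDecl {
      std::unique_ptr<ToyStmt> body;     // reachable only through this member
    };

    static_assert(std::is_base_of<ToyStmt, ToyExpr>::value,
                  "expressions are statements in this model");
    static_assert(!std::is_base_of<ToyDecl, ToyStmt>::value,
                  "the two hierarchies share no common ancestor");

    // With no shared base class, the traversal needs one overload per
    // hierarchy, and each overload encodes how to reach the children.
    int countStmts(const ToyStmt &s) {
      int n = 1;
      for (const auto &child : s.children)
        n += countStmts(*child);
      return n;
    }

    int countStmts(const ToyFunctionDecl &f) {
      return f.body ? countStmts(*f.body) : 0;
    }

    // Function whose body is: compound statement -> return -> expression.
    ToyFunctionDecl buildExample() {
      auto ret = std::make_unique<ToyReturnStmt>();
      ret->children.push_back(std::make_unique<ToyExpr>());
      auto body = std::make_unique<ToyCompoundStmt>();
      body->children.push_back(std::move(ret));
      ToyFunctionDecl f;
      f.body = std::move(body);
      return f;
    }

    int main() {
      std::cout << countStmts(buildExample()) << "\n";
      return 0;
    }

In this sketch the expression is counted by the statement overload
without any special handling, since ``ToyExpr`` derives from ``ToyStmt``.
Adding a new kind of node that stores children elsewhere would require
extending the traversal by hand, which is the bookkeeping that the
``RecursiveASTVisitor`` takes care of for the real AST.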