=============================
Introduction to the Clang AST
=============================

This document gives a gentle introduction to the mysteries of the Clang
AST. It is targeted at developers who either want to contribute to
Clang or use tools that are based on Clang's AST, like the AST
matchers.

.. raw:: html

  <center><iframe width="560" height="315" src="http://www.youtube.com/embed/VqCkCDFLSsc?vq=hd720" frameborder="0" allowfullscreen></iframe></center>

`Slides <http://llvm.org/devmtg/2013-04/klimek-slides.pdf>`_

Introduction
============

Clang's AST differs from ASTs produced by some other compilers in
that it closely resembles both the written C++ code and the C++
standard. For example, parenthesized expressions and compile-time
constants are available in an unreduced form in the AST. This makes
Clang's AST a good fit for refactoring tools.

Documentation for all Clang AST nodes is available via the generated
`Doxygen <http://clang.llvm.org/doxygen>`_. The Doxygen online
documentation is also indexed by search engines, so searching for
"clang" together with an AST node's class name usually turns up the
Doxygen page of the class you're looking for (for example, search for:
clang ParenExpr).

Examining the AST
=================

A good way to familiarize yourself with the Clang AST is to actually look
at it for some simple example code. Clang has built-in AST-dump modes,
which can be enabled with the flags ``-ast-dump`` and ``-ast-dump-xml``. Note
that ``-ast-dump-xml`` currently only works with debug builds of clang.

Let's look at a simple example AST:

::

    $ cat test.cc
    int f(int x) {
      int result = (x / 42);
      return result;
    }

    # By default, clang is a driver that invokes the other tools of the
    # toolchain; -cc1 runs the compiler frontend directly. -undef leaves out
    # some internal declarations.
    $ clang -cc1 -undef -ast-dump-xml test.cc
    ... cutting out internal declarations of clang ...
    <TranslationUnit ptr="0x4871160">
     <Function ptr="0x48a5800" name="f" prototype="true">
      <FunctionProtoType ptr="0x4871de0" canonical="0x4871de0">
       <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
       <parameters>
        <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
       </parameters>
      </FunctionProtoType>
      <ParmVar ptr="0x4871d80" name="x" initstyle="c">
       <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
      </ParmVar>
      <Stmt>
    (CompoundStmt 0x48a5a38 <test.cc:1:14, line:4:1>
      (DeclStmt 0x48a59c0 <line:2:3, col:24>
        0x48a58c0 "int result =
          (ParenExpr 0x48a59a0 <col:16, col:23> 'int'
            (BinaryOperator 0x48a5978 <col:17, col:21> 'int' '/'
              (ImplicitCastExpr 0x48a5960 <col:17> 'int' <LValueToRValue>
                (DeclRefExpr 0x48a5918 <col:17> 'int' lvalue ParmVar 0x4871d80 'x' 'int'))
              (IntegerLiteral 0x48a5940 <col:21> 'int' 42)))")
      (ReturnStmt 0x48a5a18 <line:3:3, col:10>
        (ImplicitCastExpr 0x48a5a00 <col:10> 'int' <LValueToRValue>
          (DeclRefExpr 0x48a59d8 <col:10> 'int' lvalue Var 0x48a58c0 'result' 'int'))))

      </Stmt>
     </Function>
    </TranslationUnit>

In general, ``-ast-dump-xml`` dumps declarations in an XML-style format and
statements in an S-expression-style format. The top-level declaration in
a translation unit is always the `translation unit
declaration <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_.
In this example, our first user-written declaration is the `function
declaration <http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html>`_
of "``f``". The body of "``f``" is a `compound
statement <http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html>`_,
whose child nodes are a `declaration
statement <http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html>`_
that declares our ``result`` variable, and the `return
statement <http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html>`_.
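
To make the S-expression layout concrete, here is a minimal C++ sketch that builds the initializer ``(x / 42)`` from ``test.cc`` as a tree of toy nodes and prints it in the same nested ``(Kind detail children...)`` shape. ``ToyNode``, ``makeNode``, ``addChild``, ``dump`` and ``buildInitializer`` are hypothetical names used only for illustration. They are not Clang's classes, and the printed form leaves out the addresses, source ranges, and types that the real dump includes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node (NOT Clang's representation): a kind name, an
// optional detail such as an operator or literal value, and owned children.
struct ToyNode {
  std::string kind;
  std::string detail;
  std::vector<std::unique_ptr<ToyNode>> children;
};

std::unique_ptr<ToyNode> makeNode(std::string kind, std::string detail = "") {
  auto n = std::make_unique<ToyNode>();
  n->kind = std::move(kind);
  n->detail = std::move(detail);
  return n;
}

// Appends a child and returns a stable raw pointer to it (the node lives on
// the heap, so the pointer survives vector reallocation).
ToyNode *addChild(ToyNode &parent, std::unique_ptr<ToyNode> child) {
  assert(child != nullptr && "child must be non-null");
  parent.children.push_back(std::move(child));
  return parent.children.back().get();
}

// Prints "(Kind detail child child ...)" on one line, recursing into children.
std::string dump(const ToyNode &n) {
  std::string out = "(" + n.kind;
  if (!n.detail.empty()) out += " " + n.detail;
  for (const auto &c : n.children) out += " " + dump(*c);
  return out + ")";
}

// Builds the initializer `(x / 42)` from test.cc. The parentheses are kept
// as their own ParenExpr node instead of being dropped.
std::unique_ptr<ToyNode> buildInitializer() {
  auto paren = makeNode("ParenExpr");
  ToyNode *div = addChild(*paren, makeNode("BinaryOperator", "'/'"));
  ToyNode *cast = addChild(*div, makeNode("ImplicitCastExpr", "<LValueToRValue>"));
  addChild(*cast, makeNode("DeclRefExpr", "'x'"));
  addChild(*div, makeNode("IntegerLiteral", "42"));
  return paren;
}
```

Calling ``dump(*buildInitializer())`` yields ``(ParenExpr (BinaryOperator '/' (ImplicitCastExpr <LValueToRValue> (DeclRefExpr 'x')) (IntegerLiteral 42)))``. The ``ParenExpr`` survives as its own node, which is the unreduced form described in the introduction.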

AST Context
===========

All information about the AST for a translation unit is bundled up in
the class
`ASTContext <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html>`_.
It allows traversal of the whole translation unit, starting from
`getTranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64>`_,
and access to Clang's `table of
identifiers <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4>`_
for the parsed translation unit.
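
The following sketch illustrates that role under a simplifying assumption: one context object owns every declaration of a translation unit, exposes the root, and keeps an identifier table so that each spelling is stored once. ``ToyASTContext`` and its members are hypothetical. Only the name ``getTranslationUnitDecl`` deliberately echoes the real ``ASTContext`` method, and its signature here differs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical toy declaration: a name interned in the context and
// non-owning links to child declarations.
struct ToyDecl {
  const std::string *name = nullptr;  // points into the identifier table
  std::vector<ToyDecl *> children;    // non-owning; the context owns all decls
};

class ToyASTContext {
 public:
  ToyASTContext() { tu_ = createDecl("<translation-unit>"); }

  // Root of the translation unit; a full traversal starts here.
  ToyDecl *getTranslationUnitDecl() const { return tu_; }

  // Returns the single stored copy of a spelling. Pointers to elements of an
  // unordered_set remain valid when the set rehashes.
  const std::string *getIdentifier(const std::string &spelling) {
    return &*identifiers_.insert(spelling).first;
  }

  // Allocates a declaration owned by the context and links it to a parent.
  ToyDecl *createDecl(const std::string &spelling, ToyDecl *parent = nullptr) {
    assert(!spelling.empty() && "declarations need a spelling");
    decls_.push_back(std::make_unique<ToyDecl>());
    ToyDecl *d = decls_.back().get();
    d->name = getIdentifier(spelling);
    if (parent != nullptr) parent->children.push_back(d);
    return d;
  }

  std::size_t identifierCount() const { return identifiers_.size(); }

 private:
  // Declared before tu_ so both are constructed before the constructor body runs.
  std::vector<std::unique_ptr<ToyDecl>> decls_;  // owns every node
  std::unordered_set<std::string> identifiers_;
  ToyDecl *tu_ = nullptr;
};

// Counts declarations reachable from `d`, including `d` itself.
std::size_t countDecls(const ToyDecl &d) {
  std::size_t n = 1;
  for (const ToyDecl *c : d.children) n += countDecls(*c);
  return n;
}
```

Destroying the context frees every node at once, so individual nodes never need to be deleted separately.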

AST Nodes
=========

Clang's AST nodes are modeled on class hierarchies that do not share a
common ancestor. Instead, there are multiple larger hierarchies for
basic node types like
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ and
`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_. Many
important AST nodes derive from
`Type <http://clang.llvm.org/doxygen/classclang_1_1Type.html>`_,
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_,
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
or `Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_, with
some classes deriving from both Decl and DeclContext.
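
The following sketch models these separate roots with plain C++ inheritance. All class names are hypothetical toy stand-ins. There is no shared base class, an expression is a statement, and a function is both a declaration and a declaration context. Clang itself does not use ``dynamic_cast`` for this (it relies on LLVM-style ``isa<>``/``dyn_cast<>``); standard RTTI is used here only to keep the sketch self-contained.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Three independent toy roots; none derives from the others.
struct ToyType { virtual ~ToyType() = default; };
struct ToyDecl { virtual ~ToyDecl() = default; };
struct ToyStmt { virtual ~ToyStmt() = default; };

// Toy analogue of a declaration context: something that contains decls.
struct ToyDeclContext {
  virtual ~ToyDeclContext() = default;
  std::vector<ToyDecl *> decls;  // non-owning list of contained declarations
};

struct ToyBuiltinType : ToyType {};
struct ToyVarDecl : ToyDecl {};
// A function is a declaration that also contains declarations.
struct ToyFunctionDecl : ToyDecl, ToyDeclContext {};
struct ToyExpr : ToyStmt {};

static_assert(std::is_base_of<ToyStmt, ToyExpr>::value,
              "an expression is a statement");
static_assert(!std::is_base_of<ToyDecl, ToyStmt>::value &&
                  !std::is_base_of<ToyStmt, ToyDecl>::value,
              "Decl and Stmt are separate roots");
static_assert(!std::is_base_of<ToyDecl, ToyType>::value,
              "Type is a separate root");

// Cross-casts from the Decl view of an object to its DeclContext view;
// returns nullptr for declarations that are not contexts.
ToyDeclContext *asDeclContext(ToyDecl *d) {
  assert(d != nullptr);
  return dynamic_cast<ToyDeclContext *>(d);
}
```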

There is also a multitude of nodes in the AST that are not part of a
larger hierarchy and are only reachable from specific other nodes, like
`CXXBaseSpecifier <http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html>`_.

Thus, to traverse the full AST, one starts from the
`TranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_
and then recursively traverses everything that can be reached from that
node. Which children are reachable differs for each node type, so this
knowledge has to be encoded per node type. The traversal algorithm is
implemented in the
`RecursiveASTVisitor <http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html>`_.
See the `RecursiveASTVisitor
tutorial <http://clang.llvm.org/docs/RAVFrontendAction.html>`_.
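
The shape of such a traversal can be sketched with the curiously recurring template pattern. A base class knows how to reach the children of each node kind, including nodes outside the ``Decl`` hierarchy such as base specifiers, and a derived class overrides only the ``Visit*`` hooks it needs. This is a simplified toy that only loosely mirrors ``RecursiveASTVisitor``. Every name below is hypothetical, and the real class has a much larger interface.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Not a declaration: reachable only through the record that owns it.
struct ToyBaseSpecifier {
  std::string baseName;
};

struct ToyDecl {
  enum class Kind { TranslationUnit, Record, Function };
  Kind kind = Kind::TranslationUnit;
  std::string name;
  std::vector<ToyBaseSpecifier> bases;          // only meaningful for records
  std::vector<std::unique_ptr<ToyDecl>> decls;  // nested declarations
};

std::unique_ptr<ToyDecl> makeDecl(ToyDecl::Kind kind, std::string name) {
  assert(!name.empty() && "declarations need a name");
  auto d = std::make_unique<ToyDecl>();
  d->kind = kind;
  d->name = std::move(name);
  return d;
}

// Base traversal: encodes which children each node kind has. Returning
// false from a hook aborts the whole traversal.
template <typename Derived>
class ToyRecursiveVisitor {
 public:
  bool TraverseDecl(const ToyDecl &d) {
    if (!self().VisitDecl(d)) return false;
    if (d.kind == ToyDecl::Kind::Record) {
      for (const ToyBaseSpecifier &b : d.bases)
        if (!self().VisitBaseSpecifier(b)) return false;
    }
    for (const auto &child : d.decls)
      if (!TraverseDecl(*child)) return false;
    return true;
  }

  // Default hooks do nothing and continue the traversal.
  bool VisitDecl(const ToyDecl &) { return true; }
  bool VisitBaseSpecifier(const ToyBaseSpecifier &) { return true; }

 private:
  Derived &self() { return static_cast<Derived &>(*this); }
};

// Example client: records declaration names and base-class names in order.
class NameCollector : public ToyRecursiveVisitor<NameCollector> {
 public:
  std::vector<std::string> names;

  bool VisitDecl(const ToyDecl &d) {
    names.push_back(d.name);
    return true;
  }
  bool VisitBaseSpecifier(const ToyBaseSpecifier &b) {
    names.push_back("base:" + b.baseName);
    return true;
  }
};
```

The CRTP design resolves hook calls at compile time, so a client pays nothing for the hooks it does not override. A hook that returns ``false`` stops the traversal early, which is also how ``RecursiveASTVisitor``'s ``Visit*`` methods signal an abort.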

The two most basic node types in the Clang AST are statements
(`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_) and
declarations
(`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_). Note
that expressions
(`Expr <http://clang.llvm.org/doxygen/classclang_1_1Expr.html>`_) are
also statements in Clang's AST.
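
Because expressions are statements, a container of statements can hold an expression directly. The toy classes below are hypothetical and not Clang's own. They build the block ``{ 42; return 0; }`` and count how many of its direct children are expressions.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy classes: expressions derive from statements.
struct ToyStmt {
  virtual ~ToyStmt() = default;
};
struct ToyExpr : ToyStmt {};
struct ToyIntegerLiteral : ToyExpr {
  explicit ToyIntegerLiteral(int v) : value(v) {}
  int value;
};
struct ToyReturnStmt : ToyStmt {
  std::unique_ptr<ToyExpr> value;  // the returned expression, if any
};
struct ToyCompoundStmt : ToyStmt {
  std::vector<std::unique_ptr<ToyStmt>> body;
};

// Counts the direct children of a block that are expressions.
int countExpressionStatements(const ToyCompoundStmt &block) {
  int n = 0;
  for (const auto &s : block.body)
    if (dynamic_cast<const ToyExpr *>(s.get()) != nullptr) ++n;
  return n;
}

// Builds `{ 42; return 0; }`. In this toy, the statement `42;` is stored as
// the literal expression itself, with no wrapper statement around it.
ToyCompoundStmt buildBlock() {
  ToyCompoundStmt block;
  block.body.push_back(std::make_unique<ToyIntegerLiteral>(42));
  auto ret = std::make_unique<ToyReturnStmt>();
  ret->value = std::make_unique<ToyIntegerLiteral>(0);
  block.body.push_back(std::move(ret));
  return block;
}
```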