<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<!-- Source revision: a705f1f, Manuel Klimek, 2012-07-25 07:28:11 +0000 -->
<html>
<head>
<title>Introduction to the Clang AST</title>
<link type="text/css" rel="stylesheet" href="../menu.css" />
<link type="text/css" rel="stylesheet" href="../content.css" />
</head>
<body>

<!--#include virtual="../menu.html.incl"-->

<div id="content">

<h1>Introduction to the Clang AST</h1>
<p>This document gives a gentle introduction to the mysteries of the Clang AST.
It is targeted at developers who either want to contribute to Clang, or use
tools that work based on Clang's AST, like the AST matchers.</p>
<!-- FIXME: Add link once we have an AST matcher document -->

<!-- ======================================================================= -->
<h2 id="intro">Introduction</h2>
<!-- ======================================================================= -->

<p>Clang's AST is different from ASTs produced by some other compilers in that it closely
resembles both the written C++ code and the C++ standard. For example,
parenthesis expressions and compile-time constants are available in an unreduced
form in the AST. This makes Clang's AST a good fit for refactoring tools.</p>

<p>Documentation for all Clang AST nodes is available via the generated
<a href="http://clang.llvm.org/doxygen">Doxygen</a>. The online Doxygen
documentation is also indexed by search engines, so a search for clang plus
the AST node's class name usually turns up the Doxygen page of the class
you're looking for (for example, search for: clang ParenExpr).</p>

<!-- ======================================================================= -->
<h2 id="examine">Examining the AST</h2>
<!-- ======================================================================= -->

<p>A good way to familiarize yourself with the Clang AST is to actually look
at it on some simple example code. Clang has built-in AST dump modes, which
can be enabled with the flags -ast-dump and -ast-dump-xml. Note that -ast-dump-xml
currently only works with debug builds of clang.</p>

<p>Let's look at a simple example AST:</p>
<pre>
$ cat test.cc
int f(int x) {
  int result = (x / 42);
  return result;
}

# Clang by default is a frontend for many tools; -cc1 tells it to directly
# use the C++ compiler mode. -undef leaves out some internal declarations.
$ clang -cc1 -undef -ast-dump-xml test.cc
... cutting out internal declarations of clang ...
&lt;TranslationUnit ptr="0x4871160"&gt;
 &lt;Function ptr="0x48a5800" name="f" prototype="true"&gt;
  &lt;FunctionProtoType ptr="0x4871de0" canonical="0x4871de0"&gt;
   &lt;BuiltinType ptr="0x4871250" canonical="0x4871250"/&gt;
   &lt;parameters&gt;
    &lt;BuiltinType ptr="0x4871250" canonical="0x4871250"/&gt;
   &lt;/parameters&gt;
  &lt;/FunctionProtoType&gt;
  &lt;ParmVar ptr="0x4871d80" name="x" initstyle="c"&gt;
   &lt;BuiltinType ptr="0x4871250" canonical="0x4871250"/&gt;
  &lt;/ParmVar&gt;
  &lt;Stmt&gt;
(CompoundStmt 0x48a5a38 &lt;t2.cc:1:14, line:4:1&gt;
  (DeclStmt 0x48a59c0 &lt;line:2:3, col:24&gt;
    0x48a58c0 "int result =
      (ParenExpr 0x48a59a0 &lt;col:16, col:23&gt; 'int'
        (BinaryOperator 0x48a5978 &lt;col:17, col:21&gt; 'int' '/'
          (ImplicitCastExpr 0x48a5960 &lt;col:17&gt; 'int' &lt;LValueToRValue&gt;
            (DeclRefExpr 0x48a5918 &lt;col:17&gt; 'int' lvalue ParmVar 0x4871d80 'x' 'int'))
          (IntegerLiteral 0x48a5940 &lt;col:21&gt; 'int' 42)))")
  (ReturnStmt 0x48a5a18 &lt;line:3:3, col:10&gt;
    (ImplicitCastExpr 0x48a5a00 &lt;col:10&gt; 'int' &lt;LValueToRValue&gt;
      (DeclRefExpr 0x48a59d8 &lt;col:10&gt; 'int' lvalue Var 0x48a58c0 'result' 'int'))))

  &lt;/Stmt&gt;
 &lt;/Function&gt;
&lt;/TranslationUnit&gt;
</pre>
<p>In general, -ast-dump-xml dumps declarations in an XML-style format and
statements in an S-expression-style format.
The top-level declaration in a translation unit is always the
<a href="http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html">translation unit declaration</a>.
In this example, our first user-written declaration is the
<a href="http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html">function declaration</a>
of 'f'. The body of 'f' is a <a href="http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html">compound statement</a>,
whose child nodes are a <a href="http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html">declaration statement</a>
that declares our result variable, and the
<a href="http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html">return statement</a>.</p>
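<p>To make the S-expression shape of the statement dump more concrete, here is
a minimal, self-contained C++ sketch that prints a nested tree in a similar
style. It is not Clang code: DumpNode, dumpNode and buildBodyOfF are
hypothetical names made up for this illustration, and the tree is a simplified
version of the body of 'f' (pointer values, source ranges and types are left
out, and the variable declaration inside the DeclStmt is reduced to its
initializer).</p>

```cpp
#include <string>
#include <vector>

// Toy tree node for illustration only; this is not a Clang class.
struct DumpNode {
  std::string Kind;               // node kind, e.g. "CompoundStmt"
  std::string Detail;             // extra text after the kind; may be empty
  std::vector<DumpNode> Children; // child nodes in source order
};

// Renders a node as "(Kind Detail", then each child on its own line,
// indented two more spaces, and finally the closing ")".
std::string dumpNode(const DumpNode &N, unsigned Indent = 0) {
  std::string Out(Indent, ' ');
  Out += "(" + N.Kind;
  if (!N.Detail.empty())
    Out += " " + N.Detail;
  for (const DumpNode &Child : N.Children)
    Out += "\n" + dumpNode(Child, Indent + 2);
  Out += ")";
  return Out;
}

// Builds a simplified tree for the body of f in test.cc.
DumpNode buildBodyOfF() {
  DumpNode X{"DeclRefExpr", "'x'", {}};
  DumpNode Cast{"ImplicitCastExpr", "<LValueToRValue>", {X}};
  DumpNode Lit{"IntegerLiteral", "42", {}};
  DumpNode Div{"BinaryOperator", "'/'", {Cast, Lit}};
  DumpNode Paren{"ParenExpr", "", {Div}};
  DumpNode Decl{"DeclStmt", "", {Paren}};
  DumpNode Result{"DeclRefExpr", "'result'", {}};
  DumpNode RetCast{"ImplicitCastExpr", "<LValueToRValue>", {Result}};
  DumpNode Ret{"ReturnStmt", "", {RetCast}};
  return DumpNode{"CompoundStmt", "", {Decl, Ret}};
}
```

<p>Calling dumpNode(buildBodyOfF()) returns a string whose nesting follows the
statement part of the dump above, with the CompoundStmt at the root and the
DeclStmt and ReturnStmt as its two children.</p>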

<!-- ======================================================================= -->
<h2 id="context">AST Context</h2>
<!-- ======================================================================= -->

<p>All information about the AST for a translation unit is bundled up in the class
<a href="http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html">ASTContext</a>.
It lets you traverse the whole translation unit starting from
<a href="http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64">getTranslationUnitDecl</a>,
or access Clang's <a href="http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4">table of identifiers</a>
for the parsed translation unit.</p>

<!-- ======================================================================= -->
<h2 id="nodes">AST Nodes</h2>
<!-- ======================================================================= -->

<p>Clang's AST nodes are modeled on a class hierarchy that does not have a common
ancestor. Instead, there are multiple larger hierarchies for basic node types like
<a href="http://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a> and
<a href="http://clang.llvm.org/doxygen/classclang_1_1Stmt.html">Stmt</a>. Many
important AST nodes derive from <a href="http://clang.llvm.org/doxygen/classclang_1_1Type.html">Type</a>,
<a href="http://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>,
<a href="http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html">DeclContext</a> or
<a href="http://clang.llvm.org/doxygen/classclang_1_1Stmt.html">Stmt</a>,
with some classes deriving from both Decl and DeclContext.</p>
<p>There are also many nodes in the AST that are not part of a
larger hierarchy, and are only reachable from specific other nodes,
like <a href="http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html">CXXBaseSpecifier</a>.
</p>
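<p>The inheritance shape described above can be sketched with a few toy classes.
This is only a model under stated assumptions: every Toy* name below is
hypothetical and none of these types are Clang's real classes. The sketch also
uses dynamic_cast for brevity, whereas Clang relies on its own cast helpers.</p>

```cpp
#include <string>
#include <vector>

// Separate root classes with no common ancestor (toy stand-ins only).
struct ToyType { virtual ~ToyType() = default; };
struct ToyStmt { virtual ~ToyStmt() = default; };
struct ToyDecl {
  std::string Name;
  virtual ~ToyDecl() = default;
};

// A context is not itself a declaration; it only holds declarations.
struct ToyDeclContext {
  std::vector<ToyDecl *> Decls; // non-owning pointers to contained decls
  virtual ~ToyDeclContext() = default;
};

// A node outside any larger hierarchy, reachable only from its record.
struct ToyBaseSpecifier {
  std::string BaseName;
  bool IsVirtual;
};

// Some declarations are both a declaration and a context.
struct ToyRecordDecl : ToyDecl, ToyDeclContext {
  std::vector<ToyBaseSpecifier> Bases;
};

// Others are declarations only.
struct ToyFieldDecl : ToyDecl {};

// Cross-casts from the declaration side to the context side. Returns
// nullptr when the declaration is not also a context.
ToyDeclContext *asDeclContext(ToyDecl *D) {
  return dynamic_cast<ToyDeclContext *>(D);
}
```

<p>With these definitions, asDeclContext yields a usable context for a
ToyRecordDecl and a null pointer for a ToyFieldDecl, and a ToyBaseSpecifier can
only be found by going through the Bases of the record that owns it.</p>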

<p>Thus, to traverse the full AST, one starts from the <a href="http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html">TranslationUnitDecl</a>
and then recursively traverses everything that can be reached from that node.
Which children can be reached has to be encoded for each specific node type. This algorithm
is implemented in the <a href="http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html">RecursiveASTVisitor</a>.
See the <a href="http://clang.llvm.org/docs/RAVFrontendAction.html">RecursiveASTVisitor tutorial</a>.</p>
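<p>The recursive traversal described above can be sketched on a toy tree. This
is a simplified model, not the RecursiveASTVisitor itself: MiniDecl, MiniStmt,
MiniExpr, MiniRecursiveVisitor and NameCollector are hypothetical names, and
the callbacks return void here, whereas Clang's visitor functions return bool
so that a traversal can be stopped early.</p>

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy statement node; not a Clang class.
struct MiniStmt {
  std::string Kind;
  std::vector<std::unique_ptr<MiniStmt>> Children;
  explicit MiniStmt(std::string K) : Kind(std::move(K)) {}
  virtual ~MiniStmt() = default;
};

// Expressions are statements too, so they reuse the statement traversal.
struct MiniExpr : MiniStmt {
  using MiniStmt::MiniStmt;
};

// Toy declaration node with nested declarations and an optional body.
struct MiniDecl {
  std::string Name;
  std::vector<std::unique_ptr<MiniDecl>> Members;
  std::unique_ptr<MiniStmt> Body; // null when there is no body
};

// CRTP base: each Traverse function encodes which children its node type
// can reach; Derived shadows the Visit callbacks it cares about.
template <typename Derived> class MiniRecursiveVisitor {
public:
  void TraverseDecl(MiniDecl &D) {
    self().VisitDecl(D);
    for (auto &Member : D.Members)
      TraverseDecl(*Member);
    if (D.Body)
      TraverseStmt(*D.Body);
  }
  void TraverseStmt(MiniStmt &S) {
    self().VisitStmt(S);
    for (auto &Child : S.Children)
      TraverseStmt(*Child);
  }
  // Default callbacks do nothing.
  void VisitDecl(MiniDecl &) {}
  void VisitStmt(MiniStmt &) {}

private:
  Derived &self() { return static_cast<Derived &>(*this); }
};

// Example visitor that records nodes in the order they are visited.
struct NameCollector : MiniRecursiveVisitor<NameCollector> {
  std::vector<std::string> Seen;
  void VisitDecl(MiniDecl &D) { Seen.push_back(D.Name); }
  void VisitStmt(MiniStmt &S) { Seen.push_back(S.Kind); }
};
```

<p>Starting NameCollector at the root declaration visits each node before its
children, so a translation unit containing a function 'f' is recorded first,
then 'f', then the statements of its body.</p>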

<p>The two most basic nodes in the Clang AST are statements (<a href="http://clang.llvm.org/doxygen/classclang_1_1Stmt.html">Stmt</a>)
and declarations (<a href="http://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>).
Note that expressions (<a href="http://clang.llvm.org/doxygen/classclang_1_1Expr.html">Expr</a>)
are also statements in Clang's AST.</p>

</div>
</body>
</html>