=============================
Introduction to the Clang AST
=============================

This document gives a gentle introduction to the mysteries of the Clang
AST. It is targeted at developers who either want to contribute to
Clang, or use tools that work based on Clang's AST, like the AST
matchers.

.. raw:: html

  <center><iframe width="560" height="315" src="http://www.youtube.com/embed/VqCkCDFLSsc?vq=hd720" frameborder="0" allowfullscreen></iframe></center>

Introduction
============

Clang's AST is different from ASTs produced by some other compilers in
that it closely resembles both the written C++ code and the C++
standard. For example, parenthesized expressions and compile-time
constants are available in an unreduced form in the AST. This makes
Clang's AST a good fit for refactoring tools.
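
To make "unreduced form" concrete, the following minimal C++ sketch models an
expression such as ``(x / 42)`` with a toy node hierarchy. The class names echo
Clang's, but the classes are simplified stand-ins written for this
illustration; they are an assumption, not Clang's implementation. Because the
parentheses and the literal are kept as nodes of their own, the tree can be
printed back as it was written, which is the property a refactoring tool
relies on:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression nodes. The names echo Clang's, but these are NOT Clang's
// classes; they only illustrate keeping the source structure unreduced.
struct Expr {
  virtual ~Expr() = default;
  virtual std::string print() const = 0;
};

struct DeclRefExpr : Expr {
  std::string Name;
  explicit DeclRefExpr(std::string N) : Name(std::move(N)) {}
  std::string print() const override { return Name; }
};

// The literal is stored as written instead of being folded away.
struct IntegerLiteral : Expr {
  int Value;
  explicit IntegerLiteral(int V) : Value(V) {}
  std::string print() const override { return std::to_string(Value); }
};

struct BinaryOperator : Expr {
  char Op;
  std::unique_ptr<Expr> LHS, RHS;
  BinaryOperator(char O, std::unique_ptr<Expr> L, std::unique_ptr<Expr> R)
      : Op(O), LHS(std::move(L)), RHS(std::move(R)) {}
  std::string print() const override {
    return LHS->print() + " " + Op + " " + RHS->print();
  }
};

// The parentheses get a node of their own, so they survive in the tree.
struct ParenExpr : Expr {
  std::unique_ptr<Expr> Sub;
  explicit ParenExpr(std::unique_ptr<Expr> S) : Sub(std::move(S)) {}
  std::string print() const override { return "(" + Sub->print() + ")"; }
};

// Hypothetical helper: builds the tree for the expression (x / 42).
std::unique_ptr<Expr> buildInitializer() {
  return std::make_unique<ParenExpr>(std::make_unique<BinaryOperator>(
      '/', std::make_unique<DeclRefExpr>("x"),
      std::make_unique<IntegerLiteral>(42)));
}
```

Calling ``buildInitializer()->print()`` reproduces the written text,
parentheses included. A tree that dropped the ``ParenExpr`` node could not tell
``(x / 42)`` apart from ``x / 42``.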

Documentation for all Clang AST nodes is available via the generated
`Doxygen <http://clang.llvm.org/doxygen>`_. The online Doxygen
documentation is also indexed by search engines, so searching for
"clang" plus the AST node's class name (for example, ``clang ParenExpr``)
usually turns up the Doxygen page of the class you are looking for.

Examining the AST
=================

A good way to familiarize yourself with the Clang AST is to actually look
at it on some simple example code. Clang has built-in AST-dump modes,
which can be enabled with the flags ``-ast-dump`` and ``-ast-dump-xml``. Note
that ``-ast-dump-xml`` currently only works with debug builds of Clang.

Let's look at a simple example AST:

::

    $ cat test.cc
    int f(int x) {
      int result = (x / 42);
      return result;
    }

    # Clang by default is a frontend for many tools; -cc1 tells it to directly
    # use the C++ compiler mode. -undef leaves out some internal declarations.
    $ clang -cc1 -undef -ast-dump-xml test.cc
    ... cutting out internal declarations of clang ...
    <TranslationUnit ptr="0x4871160">
     <Function ptr="0x48a5800" name="f" prototype="true">
      <FunctionProtoType ptr="0x4871de0" canonical="0x4871de0">
       <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
       <parameters>
        <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
       </parameters>
      </FunctionProtoType>
      <ParmVar ptr="0x4871d80" name="x" initstyle="c">
       <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
      </ParmVar>
      <Stmt>
    (CompoundStmt 0x48a5a38 <t2.cc:1:14, line:4:1>
      (DeclStmt 0x48a59c0 <line:2:3, col:24>
        0x48a58c0 "int result =
          (ParenExpr 0x48a59a0 <col:16, col:23> 'int'
            (BinaryOperator 0x48a5978 <col:17, col:21> 'int' '/'
              (ImplicitCastExpr 0x48a5960 <col:17> 'int' <LValueToRValue>
                (DeclRefExpr 0x48a5918 <col:17> 'int' lvalue ParmVar 0x4871d80 'x' 'int'))
              (IntegerLiteral 0x48a5940 <col:21> 'int' 42)))")
      (ReturnStmt 0x48a5a18 <line:3:3, col:10>
        (ImplicitCastExpr 0x48a5a00 <col:10> 'int' <LValueToRValue>
          (DeclRefExpr 0x48a59d8 <col:10> 'int' lvalue Var 0x48a58c0 'result' 'int'))))

      </Stmt>
     </Function>
    </TranslationUnit>

In general, ``-ast-dump-xml`` dumps declarations in an XML-style format and
statements in an S-expression-style format. The top-level declaration in
a translation unit is always the `translation unit
declaration <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_.
In this example, our first user-written declaration is the `function
declaration <http://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html>`_
of "``f``". The body of "``f``" is a `compound
statement <http://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html>`_,
whose child nodes are a `declaration
statement <http://clang.llvm.org/doxygen/classclang_1_1DeclStmt.html>`_
that declares our ``result`` variable, and the `return
statement <http://clang.llvm.org/doxygen/classclang_1_1ReturnStmt.html>`_.

AST Context
===========

All information about the AST for a translation unit is bundled up in
the class
`ASTContext <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html>`_.
It allows traversing the whole translation unit starting from
`getTranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#abd909fb01ef10cfd0244832a67b1dd64>`_,
and gives access to Clang's `table of
identifiers <http://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a4f95adb9958e22fbe55212ae6482feb4>`_
for the parsed translation unit.

AST Nodes
=========

Clang's AST nodes are modeled on a class hierarchy that does not have a
common ancestor. Instead, there are multiple larger hierarchies for
basic node types like
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_ and
`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_. Many
important AST nodes derive from
`Type <http://clang.llvm.org/doxygen/classclang_1_1Type.html>`_,
`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_,
`DeclContext <http://clang.llvm.org/doxygen/classclang_1_1DeclContext.html>`_
or `Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_, with
some classes deriving from both ``Decl`` and ``DeclContext``.

There is also a multitude of nodes in the AST that are not part of a
larger hierarchy, and are only reachable from specific other nodes, like
`CXXBaseSpecifier <http://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html>`_.

Thus, to traverse the full AST, one starts from the
`TranslationUnitDecl <http://clang.llvm.org/doxygen/classclang_1_1TranslationUnitDecl.html>`_
and then recursively traverses everything that can be reached from that
node. Because there is no common base class, what is reachable from a
node has to be encoded for each specific node type.
This algorithm is implemented in the
`RecursiveASTVisitor <http://clang.llvm.org/doxygen/classclang_1_1RecursiveASTVisitor.html>`_.
See the `RecursiveASTVisitor
tutorial <http://clang.llvm.org/docs/RAVFrontendAction.html>`_.
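
The traversal idea can be sketched without Clang itself. The following is a
toy model, assuming two deliberately simplified node types named ``Decl`` and
``Stmt`` that share no base class; they are not Clang's classes, and
``buildExample`` and ``collectKinds`` are hypothetical helpers. Each
``traverse`` overload encodes what is reachable from its kind of node, which is
the per-node-type knowledge that ``RecursiveASTVisitor`` provides for the real
AST:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: two unrelated node types with no common ancestor.
struct Stmt {
  std::string Kind;
  std::vector<std::unique_ptr<Stmt>> Children;
};

struct Decl {
  std::string Kind;
  std::vector<std::unique_ptr<Decl>> Members; // nested declarations
  std::unique_ptr<Stmt> Body;                 // e.g. a function body, may be null
};

// Records node kinds in pre-order. Statements only reach other statements here.
void traverse(const Stmt &S, std::vector<std::string> &Out) {
  Out.push_back(S.Kind);
  for (const auto &Child : S.Children)
    traverse(*Child, Out);
}

// Declarations reach nested declarations and, through Body, statements.
void traverse(const Decl &D, std::vector<std::string> &Out) {
  Out.push_back(D.Kind);
  for (const auto &Member : D.Members)
    traverse(*Member, Out);
  if (D.Body)
    traverse(*D.Body, Out); // crosses from the Decl side to the Stmt side
}

// Hypothetical helper: a tree shaped like a translation unit holding one
// function whose body contains a declaration statement and a return statement.
std::unique_ptr<Decl> buildExample() {
  auto MakeStmt = [](std::string Kind) {
    auto S = std::make_unique<Stmt>();
    S->Kind = std::move(Kind);
    return S;
  };
  auto Body = MakeStmt("CompoundStmt");
  Body->Children.push_back(MakeStmt("DeclStmt"));
  Body->Children.push_back(MakeStmt("ReturnStmt"));

  auto Fn = std::make_unique<Decl>();
  Fn->Kind = "FunctionDecl";
  Fn->Body = std::move(Body);

  auto TU = std::make_unique<Decl>();
  TU->Kind = "TranslationUnitDecl";
  TU->Members.push_back(std::move(Fn));
  return TU;
}

// Starts at the translation unit and visits everything reachable from it.
std::vector<std::string> collectKinds() {
  std::vector<std::string> Out;
  traverse(*buildExample(), Out);
  return Out;
}
```

``collectKinds()`` starts at the translation unit and records every node it
reaches. In real tools this bookkeeping should be left to
``RecursiveASTVisitor`` rather than written by hand.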

The two most basic nodes in the Clang AST are statements
(`Stmt <http://clang.llvm.org/doxygen/classclang_1_1Stmt.html>`_) and
declarations
(`Decl <http://clang.llvm.org/doxygen/classclang_1_1Decl.html>`_). Note
that expressions
(`Expr <http://clang.llvm.org/doxygen/classclang_1_1Expr.html>`_) are
also statements in Clang's AST.
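
As a small illustration of that last point, the sketch below uses toy
stand-ins (again, not Clang's classes) in which ``Expr`` derives from ``Stmt``
while ``Decl`` stays separate. The helper ``isExpression`` is hypothetical and
uses ``dynamic_cast`` for brevity; Clang itself is built around LLVM-style
casts such as ``llvm::isa`` and ``llvm::dyn_cast`` instead:

```cpp
#include <cassert>
#include <type_traits>

// Toy stand-ins: Expr is a kind of Stmt, Decl is an unrelated hierarchy.
struct Stmt {
  virtual ~Stmt() = default;
};
struct Expr : Stmt {};
struct Decl {
  virtual ~Decl() = default;
};

static_assert(std::is_base_of<Stmt, Expr>::value, "every Expr is a Stmt");
static_assert(!std::is_base_of<Decl, Expr>::value, "an Expr is not a Decl");

// Code written against Stmt also accepts expressions, and can ask whether a
// given statement happens to be one.
bool isExpression(const Stmt &S) {
  return dynamic_cast<const Expr *>(&S) != nullptr;
}
```

Passing an ``Expr`` object to ``isExpression`` compiles because of the
inheritance relationship and yields ``true``; a plain ``Stmt`` yields ``false``.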