blob: 381c5587056f541f1df899219fdf9f10e1710af5 [file] [log] [blame]
Sean Silvabf9b4cd2012-12-13 01:10:46 +00001===============================================================
2Tutorial for building tools using LibTooling and LibASTMatchers
3===============================================================
4
5This document is intended to show how to build a useful source-to-source
6translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
7explicitly aimed at people who are new to Clang, so all you should need
8is a working knowledge of C++ and the command line.
9
10In order to work on the compiler, you need some basic knowledge of the
11abstract syntax tree (AST). To this end, the reader is incouraged to
12skim the :doc:`Introduction to the Clang
13AST <IntroductionToTheClangAST>`
14
15Step 0: Obtaining Clang
16=======================
17
18As Clang is part of the LLVM project, you'll need to download LLVM's
19source code first. Both Clang and LLVM are maintained as Subversion
20repositories, but we'll be accessing them through the git mirror. For
21further information, see the `getting started
22guide <http://llvm.org/docs/GettingStarted.html>`_.
23
24::
25
26 mkdir ~/clang-llvm && cd ~/clang-llvm
27 git clone http://llvm.org/git/llvm.git
28 cd llvm/tools
29 git clone http://llvm.org/git/clang.git
30
31Next you need to obtain the CMake build system and Ninja build tool. You
32may already have CMake installed, but current binary versions of CMake
33aren't built with Ninja support.
34
35::
36
37 cd ~/clang-llvm
38 git clone https://github.com/martine/ninja.git
39 cd ninja
40 git checkout release
41 ./bootstrap.py
42 sudo cp ninja /usr/bin/
43
44 cd ~/clang-llvm
45 git clone git://cmake.org/stage/cmake.git
46 cd cmake
47 git checkout next
48 ./bootstrap
49 make
50 sudo make install
51
52Okay. Now we'll build Clang!
53
54::
55
56 cd ~/clang-llvm
57 mkdir build && cd build
58 cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON # Enable tests; default is off.
59 ninja
60 ninja check # Test LLVM only.
61 ninja clang-test # Test Clang only.
62 ninja install
63
64And we're live.
65
66All of the tests should pass, though there is a (very) small chance that
67you can catch LLVM and Clang out of sync. Running ``'git svn rebase'``
68in both the llvm and clang directories should fix any problems.
69
70Finally, we want to set Clang as its own compiler.
71
72::
73
74 cd ~/clang-llvm/build
75 ccmake ../llvm
76
77The second command will bring up a GUI for configuring Clang. You need
78to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
79advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
80``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
81configure, then ``'g'`` to generate CMake's files.
82
83Finally, run ninja one last time, and you're done.
84
85Step 1: Create a ClangTool
86==========================
87
88Now that we have enough background knowledge, it's time to create the
89simplest productive ClangTool in existence: a syntax checker. While this
90already exists as ``clang-check``, it's important to understand what's
91going on.
92
93First, we'll need to create a new directory for our tool and tell CMake
94that it exists. As this is not going to be a core clang tool, it will
95live in the ``tools/extra`` repository.
96
97::
98
99 cd ~/clang-llvm/llvm/tools/clang
100 mkdir tools/extra/loop-convert
101 echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt
102 vim tools/extra/loop-convert/CMakeLists.txt
103
104CMakeLists.txt should have the following contents:
105
106::
107
108 set(LLVM_LINK_COMPONENTS support)
109 set(LLVM_USED_LIBS clangTooling clangBasic clangAST)
110
111 add_clang_executable(loop-convert
112 LoopConvert.cpp
113 )
114 target_link_libraries(loop-convert
115 clangTooling
116 clangBasic
117 clangASTMatchers
118 )
119
120With that done, Ninja will be able to compile our tool. Let's give it
121something to compile! Put the following into
122``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
123why the different parts are needed can be found in the `LibTooling
124documentation <LibTooling.html>`_.
125
126::
127
128 // Declares clang::SyntaxOnlyAction.
129 #include "clang/Frontend/FrontendActions.h"
130 #include "clang/Tooling/CommonOptionsParser.h"
131 #include "clang/Tooling/Tooling.h"
132 // Declares llvm::cl::extrahelp.
133 #include "llvm/Support/CommandLine.h"
134
135 using namespace clang::tooling;
136 using namespace llvm;
137
138 // CommonOptionsParser declares HelpMessage with a description of the common
139 // command-line options related to the compilation database and input files.
140 // It's nice to have this help message in all tools.
141 static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
142
143 // A help message for this specific tool can be added afterwards.
144 static cl::extrahelp MoreHelp("\nMore help text...");
145
146 int main(int argc, const char **argv) {
147 CommonOptionsParser OptionsParser(argc, argv);
Edwin Vane524741f2012-12-14 18:58:25 +0000148 ClangTool Tool(OptionsParser.getCompilations(),
149 OptionsParser.getSourcePathList());
Sean Silvabf9b4cd2012-12-13 01:10:46 +0000150 return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
151 }
152
153And that's it! You can compile our new tool by running ninja from the
154``build`` directory.
155
156::
157
158 cd ~/clang-llvm/build
159 ninja
160
161You should now be able to run the syntax checker, which is located in
162``~/clang-llvm/build/bin``, on any source file. Try it!
163
164::
165
166 cat "void main() {}" > test.cpp
167 bin/loop-convert test.cpp --
168
169Note the two dashes after we specify the source file. The additional
170options for the compiler are passed after the dashes rather than loading
171them from a compilation database - there just aren't any options needed
172right now.
173
174Intermezzo: Learn AST matcher basics
175====================================
176
177Clang recently introduced the :doc:`ASTMatcher
178library <LibASTMatchers>` to provide a simple, powerful, and
179concise way to describe specific patterns in the AST. Implemented as a
180DSL powered by macros and templates (see
181`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
182curious), matchers offer the feel of algebraic data types common to
183functional programming languages.
184
185For example, suppose you wanted to examine only binary operators. There
186is a matcher to do exactly that, conveniently named ``binaryOperator``.
187I'll give you one guess what this matcher does:
188
189::
190
191 binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))
192
193Shockingly, it will match against addition expressions whose left hand
194side is exactly the literal 0. It will not match against other forms of
1950, such as ``'\0'`` or ``NULL``, but it will match against macros that
196expand to 0. The matcher will also not match against calls to the
197overloaded operator ``'+'``, as there is a separate ``operatorCallExpr``
198matcher to handle overloaded operators.
199
200There are AST matchers to match all the different nodes of the AST,
201narrowing matchers to only match AST nodes fulfilling specific criteria,
202and traversal matchers to get from one kind of AST node to another. For
203a complete list of AST matchers, take a look at the `AST Matcher
204References <LibASTMatchersReference.html>`_
205
206All matcher that are nouns describe entities in the AST and can be
207bound, so that they can be referred to whenever a match is found. To do
208so, simply call the method ``bind`` on these matchers, e.g.:
209
210::
211
212 variable(hasType(isInteger())).bind("intvar")
213
214Step 2: Using AST matchers
215==========================
216
217Okay, on to using matchers for real. Let's start by defining a matcher
218which will capture all ``for`` statements that define a new variable
219initialized to zero. Let's start with matching all ``for`` loops:
220
221::
222
223 forStmt()
224
225Next, we want to specify that a single variable is declared in the first
226portion of the loop, so we can extend the matcher to
227
228::
229
230 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))
231
232Finally, we can add the condition that the variable is initialized to
233zero.
234
235::
236
237 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
238 hasInitializer(integerLiteral(equals(0))))))))
239
240It is fairly easy to read and understand the matcher definition ("match
241loops whose init portion declares a single variable which is initialized
242to the integer literal 0"), but deciding that every piece is necessary
243is more difficult. Note that this matcher will not match loops whose
244variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
245zero besides the integer 0.
246
247The last step is giving the matcher a name and binding the ``ForStmt``
248as we will want to do something with it:
249
250::
251
252 StatementMatcher LoopMatcher =
253 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
254 hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
255
256Once you have defined your matchers, you will need to add a little more
257scaffolding in order to run them. Matchers are paired with a
258``MatchCallback`` and registered with a ``MatchFinder`` object, then run
259from a ``ClangTool``. More code!
260
261Add the following to ``LoopConvert.cpp``:
262
263::
Manuel Klimek3de6d3a2013-03-04 11:31:46 +0000264 #include "clang/ASTMatchers/ASTMatchers.h"
265 #include "clang/ASTMatchers/ASTMatchFinder.h"
266
267 using namespace clang;
268 using namespace clang::ast_matchers;
Sean Silvabf9b4cd2012-12-13 01:10:46 +0000269
270 StatementMatcher LoopMatcher =
271 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
272 hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
273
274 class LoopPrinter : public MatchFinder::MatchCallback {
275 public :
276 virtual void run(const MatchFinder::MatchResult &Result) {
277 if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
278 FS->dump();
279 };
280
281And change ``main()`` to:
282
283::
284
285 int main(int argc, const char **argv) {
286 CommonOptionsParser OptionsParser(argc, argv);
Edwin Vane524741f2012-12-14 18:58:25 +0000287 ClangTool Tool(OptionsParser.getCompilations(),
288 OptionsParser.getSourcePathList());
Sean Silvabf9b4cd2012-12-13 01:10:46 +0000289
290 LoopPrinter Printer;
291 MatchFinder Finder;
292 Finder.addMatcher(LoopMatcher, &Printer);
293
294 return Tool.run(newFrontendActionFactory(&Finder));
295 }
296
297Now, you should be able to recompile and run the code to discover for
298loops. Create a new file with a few examples, and test out our new
299handiwork:
300
301::
302
303 cd ~/clang-llvm/llvm/llvm_build/
304 ninja loop-convert
305 vim ~/test-files/simple-loops.cc
306 bin/loop-convert ~/test-files/simple-loops.cc
307
308Step 3.5: More Complicated Matchers
309===================================
310
311Our simple matcher is capable of discovering for loops, but we would
312still need to filter out many more ourselves. We can do a good portion
313of the remaining work with some cleverly chosen matchers, but first we
314need to decide exactly which properties we want to allow.
315
316How can we characterize for loops over arrays which would be eligible
317for translation to range-based syntax? Range based loops over arrays of
318size ``N`` that:
319
320- start at index ``0``
321- iterate consecutively
322- end at index ``N-1``
323
324We already check for (1), so all we need to add is a check to the loop's
325condition to ensure that the loop's index variable is compared against
326``N`` and another check to ensure that the increment step just
327increments this same variable. The matcher for (2) is straightforward:
328require a pre- or post-increment of the same variable declared in the
329init portion.
330
331Unfortunately, such a matcher is impossible to write. Matchers contain
332no logic for comparing two arbitrary AST nodes and determining whether
333or not they are equal, so the best we can do is matching more than we
334would like to allow, and punting extra comparisons to the callback.
335
336In any case, we can start building this sub-matcher. We can require that
337the increment step be a unary increment like this:
338
339::
340
341 hasIncrement(unaryOperator(hasOperatorName("++")))
342
343Specifying what is incremented introduces another quirk of Clang's AST:
344Usages of variables are represented as ``DeclRefExpr``'s ("declaration
345reference expressions") because they are expressions which refer to
346variable declarations. To find a ``unaryOperator`` that refers to a
347specific declaration, we can simply add a second condition to it:
348
349::
350
351 hasIncrement(unaryOperator(
352 hasOperatorName("++"),
353 hasUnaryOperand(declRefExpr())))
354
355Furthermore, we can restrict our matcher to only match if the
356incremented variable is an integer:
357
358::
359
360 hasIncrement(unaryOperator(
361 hasOperatorName("++"),
362 hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))
363
364And the last step will be to attach an identifier to this variable, so
365that we can retrieve it in the callback:
366
367::
368
369 hasIncrement(unaryOperator(
370 hasOperatorName("++"),
371 hasUnaryOperand(declRefExpr(to(
372 varDecl(hasType(isInteger())).bind("incrementVariable"))))))
373
374We can add this code to the definition of ``LoopMatcher`` and make sure
375that our program, outfitted with the new matcher, only prints out loops
376that declare a single variable initialized to zero and have an increment
377step consisting of a unary increment of some variable.
378
379Now, we just need to add a matcher to check if the condition part of the
380``for`` loop compares a variable against the size of the array. There is
381only one problem - we don't know which array we're iterating over
382without looking at the body of the loop! We are again restricted to
383approximating the result we want with matchers, filling in the details
384in the callback. So we start with:
385
386::
387
388 hasCondition(binaryOperator(hasOperatorName("<"))
389
390It makes sense to ensure that the left-hand side is a reference to a
391variable, and that the right-hand side has integer type.
392
393::
394
395 hasCondition(binaryOperator(
396 hasOperatorName("<"),
397 hasRHS(expr(hasType(isInteger()))),
398 hasLHS(declRefExpr(to(varDecl(hasType(isInteger())))))))
399
400Why? Because it doesn't work. Of the three loops provided in
401``test-files/simple.cpp``, zero of them have a matching condition. A
402quick look at the AST dump of the first for loop, produced by the
403previous iteration of loop-convert, shows us the answer:
404
405::
406
407 (ForStmt 0x173b240
408 (DeclStmt 0x173afc8
409 0x173af50 "int i =
410 (IntegerLiteral 0x173afa8 'int' 0)")
411 <<>>
412 (BinaryOperator 0x173b060 '_Bool' '<'
413 (ImplicitCastExpr 0x173b030 'int'
414 (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
415 (ImplicitCastExpr 0x173b048 'int'
416 (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
417 (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
418 (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
419 (CompoundStatement …
420
421We already know that the declaration and increments both match, or this
422loop wouldn't have been dumped. The culprit lies in the implicit cast
423applied to the first operand (i.e. the LHS) of the less-than operator,
424an L-value to R-value conversion applied to the expression referencing
425``i``. Thankfully, the matcher library offers a solution to this problem
426in the form of ``ignoringParenImpCasts``, which instructs the matcher to
427ignore implicit casts and parentheses before continuing to match.
428Adjusting the condition operator will restore the desired match.
429
430::
431
432 hasCondition(binaryOperator(
433 hasOperatorName("<"),
434 hasLHS(expr(hasType(isInteger()))),
435 hasRHS(ignoringParenImpCasts(declRefExpr(
436 to(varDecl(hasType(isInteger()))))))))
437
438After adding binds to the expressions we wished to capture and
439extracting the identifier strings into variables, we have array-step-2
440completed.
441
442Step 4: Retrieving Matched Nodes
443================================
444
445So far, the matcher callback isn't very interesting: it just dumps the
446loop's AST. At some point, we will need to make changes to the input
447source code. Next, we'll work on using the nodes we bound in the
448previous step.
449
450The ``MatchFinder::run()`` callback takes a
451``MatchFinder::MatchResult&`` as its parameter. We're most interested in
452its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
453class to represent contextual information about the AST, as the name
454implies, though the most functionally important detail is that several
455operations require an ``ASTContext*`` parameter. More immediately useful
456is the set of matched nodes, and how we retrieve them.
457
458Since we bind three variables (identified by ConditionVarName,
459InitVarName, and IncrementVarName), we can obtain the matched nodes by
460using the ``getNodeAs()`` member function.
461
462In ``LoopActions.cpp``:
463
464::
465
466 #include "clang/AST/ASTContext.h"
467
468 void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
469 ASTContext *Context = Result.Context;
470 const ForStmt *FS = Result.Nodes.getStmtAs<ForStmt>(LoopName);
471 // We do not want to convert header files!
472 if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc()))
473 return;
474 const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>(IncrementVarName);
475 const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>(ConditionVarName);
476 const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>(InitVarName);
477
478Now that we have the three variables, represented by their respective
479declarations, let's make sure that they're all the same, using a helper
480function I call ``areSameVariable()``.
481
482::
483
484 if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
485 return;
486 llvm::outs() << "Potential array-based loop discovered.\n";
487 }
488
489If execution reaches the end of ``LoopPrinter::run()``, we know that the
490loop shell that looks like
491
492::
493
494 for (int i= 0; i < expr(); ++i) { ... }
495
496For now, we will just print a message explaining that we found a loop.
497The next section will deal with recursively traversing the AST to
498discover all changes needed.
499
500As a side note, here is the implementation of ``areSameVariable``. Clang
501associates a ``VarDecl`` with each variable to represent the variable's
502declaration. Since the "canonical" form of each declaration is unique by
503address, all we need to do is make sure neither ``ValueDecl`` (base
504class of ``VarDecl``) is ``NULL`` and compare the canonical Decls.
505
506::
507
508 static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
509 return First && Second &&
510 First->getCanonicalDecl() == Second->getCanonicalDecl();
511 }
512
513It's not as trivial to test if two expressions are the same, though
514Clang has already done the hard work for us by providing a way to
515canonicalize expressions:
516
517::
518
519 static bool areSameExpr(ASTContext* Context, const Expr *First,
520 const Expr *Second) {
521 if (!First || !Second)
522 return false;
523 llvm::FoldingSetNodeID FirstID, SecondID;
524 First->Profile(FirstID, *Context, true);
525 Second->Profile(SecondID, *Context, true);
526 return FirstID == SecondID;
527 }
528
529This code relies on the comparison between two
530``llvm::FoldingSetNodeIDs``. As the documentation for
531``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
532a description of a node in the AST, based on its properties, along with
533those of its children. ``FoldingSetNodeID`` then serves as a hash we can
534use to compare expressions. We will need ``areSameExpr`` later. Before
535you run the new code on the additional loops added to
536test-files/simple.cpp, try to figure out which ones will be considered
537potentially convertible.