blob: 3d0af67cb5fd6240496f81376ef14d897658a3e9 [file] [log] [blame]
Sean Silva93ca0212012-12-13 01:10:46 +00001===============================================================
2Tutorial for building tools using LibTooling and LibASTMatchers
3===============================================================
4
5This document is intended to show how to build a useful source-to-source
6translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
7explicitly aimed at people who are new to Clang, so all you should need
8is a working knowledge of C++ and the command line.
9
10In order to work on the compiler, you need some basic knowledge of the
11abstract syntax tree (AST). To this end, the reader is incouraged to
12skim the :doc:`Introduction to the Clang
13AST <IntroductionToTheClangAST>`
14
15Step 0: Obtaining Clang
16=======================
17
18As Clang is part of the LLVM project, you'll need to download LLVM's
19source code first. Both Clang and LLVM are maintained as Subversion
20repositories, but we'll be accessing them through the git mirror. For
21further information, see the `getting started
22guide <http://llvm.org/docs/GettingStarted.html>`_.
23
24::
25
26 mkdir ~/clang-llvm && cd ~/clang-llvm
27 git clone http://llvm.org/git/llvm.git
28 cd llvm/tools
29 git clone http://llvm.org/git/clang.git
30
31Next you need to obtain the CMake build system and Ninja build tool. You
32may already have CMake installed, but current binary versions of CMake
33aren't built with Ninja support.
34
35::
36
37 cd ~/clang-llvm
38 git clone https://github.com/martine/ninja.git
39 cd ninja
40 git checkout release
41 ./bootstrap.py
42 sudo cp ninja /usr/bin/
43
44 cd ~/clang-llvm
45 git clone git://cmake.org/stage/cmake.git
46 cd cmake
47 git checkout next
48 ./bootstrap
49 make
50 sudo make install
51
52Okay. Now we'll build Clang!
53
54::
55
56 cd ~/clang-llvm
57 mkdir build && cd build
58 cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON # Enable tests; default is off.
59 ninja
60 ninja check # Test LLVM only.
61 ninja clang-test # Test Clang only.
62 ninja install
63
64And we're live.
65
66All of the tests should pass, though there is a (very) small chance that
67you can catch LLVM and Clang out of sync. Running ``'git svn rebase'``
68in both the llvm and clang directories should fix any problems.
69
70Finally, we want to set Clang as its own compiler.
71
72::
73
74 cd ~/clang-llvm/build
75 ccmake ../llvm
76
77The second command will bring up a GUI for configuring Clang. You need
78to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
79advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
80``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
81configure, then ``'g'`` to generate CMake's files.
82
83Finally, run ninja one last time, and you're done.
84
85Step 1: Create a ClangTool
86==========================
87
88Now that we have enough background knowledge, it's time to create the
89simplest productive ClangTool in existence: a syntax checker. While this
90already exists as ``clang-check``, it's important to understand what's
91going on.
92
93First, we'll need to create a new directory for our tool and tell CMake
94that it exists. As this is not going to be a core clang tool, it will
95live in the ``tools/extra`` repository.
96
97::
98
99 cd ~/clang-llvm/llvm/tools/clang
100 mkdir tools/extra/loop-convert
101 echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt
102 vim tools/extra/loop-convert/CMakeLists.txt
103
104CMakeLists.txt should have the following contents:
105
106::
107
108 set(LLVM_LINK_COMPONENTS support)
109 set(LLVM_USED_LIBS clangTooling clangBasic clangAST)
110
111 add_clang_executable(loop-convert
112 LoopConvert.cpp
113 )
114 target_link_libraries(loop-convert
115 clangTooling
116 clangBasic
117 clangASTMatchers
118 )
119
120With that done, Ninja will be able to compile our tool. Let's give it
121something to compile! Put the following into
122``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
123why the different parts are needed can be found in the `LibTooling
124documentation <LibTooling.html>`_.
125
126::
127
128 // Declares clang::SyntaxOnlyAction.
129 #include "clang/Frontend/FrontendActions.h"
130 #include "clang/Tooling/CommonOptionsParser.h"
131 #include "clang/Tooling/Tooling.h"
132 // Declares llvm::cl::extrahelp.
133 #include "llvm/Support/CommandLine.h"
134
135 using namespace clang::tooling;
136 using namespace llvm;
137
138 // CommonOptionsParser declares HelpMessage with a description of the common
139 // command-line options related to the compilation database and input files.
140 // It's nice to have this help message in all tools.
141 static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
142
143 // A help message for this specific tool can be added afterwards.
144 static cl::extrahelp MoreHelp("\nMore help text...");
145
146 int main(int argc, const char **argv) {
147 CommonOptionsParser OptionsParser(argc, argv);
Edwin Vaneb1f67db2012-12-14 18:58:25 +0000148 ClangTool Tool(OptionsParser.getCompilations(),
149 OptionsParser.getSourcePathList());
Sean Silva93ca0212012-12-13 01:10:46 +0000150 return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
151 }
152
153And that's it! You can compile our new tool by running ninja from the
154``build`` directory.
155
156::
157
158 cd ~/clang-llvm/build
159 ninja
160
161You should now be able to run the syntax checker, which is located in
162``~/clang-llvm/build/bin``, on any source file. Try it!
163
164::
165
166 cat "void main() {}" > test.cpp
167 bin/loop-convert test.cpp --
168
169Note the two dashes after we specify the source file. The additional
170options for the compiler are passed after the dashes rather than loading
171them from a compilation database - there just aren't any options needed
172right now.
173
174Intermezzo: Learn AST matcher basics
175====================================
176
177Clang recently introduced the :doc:`ASTMatcher
178library <LibASTMatchers>` to provide a simple, powerful, and
179concise way to describe specific patterns in the AST. Implemented as a
180DSL powered by macros and templates (see
181`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
182curious), matchers offer the feel of algebraic data types common to
183functional programming languages.
184
185For example, suppose you wanted to examine only binary operators. There
186is a matcher to do exactly that, conveniently named ``binaryOperator``.
187I'll give you one guess what this matcher does:
188
189::
190
191 binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))
192
193Shockingly, it will match against addition expressions whose left hand
194side is exactly the literal 0. It will not match against other forms of
1950, such as ``'\0'`` or ``NULL``, but it will match against macros that
196expand to 0. The matcher will also not match against calls to the
197overloaded operator ``'+'``, as there is a separate ``operatorCallExpr``
198matcher to handle overloaded operators.
199
200There are AST matchers to match all the different nodes of the AST,
201narrowing matchers to only match AST nodes fulfilling specific criteria,
202and traversal matchers to get from one kind of AST node to another. For
203a complete list of AST matchers, take a look at the `AST Matcher
204References <LibASTMatchersReference.html>`_
205
206All matcher that are nouns describe entities in the AST and can be
207bound, so that they can be referred to whenever a match is found. To do
208so, simply call the method ``bind`` on these matchers, e.g.:
209
210::
211
212 variable(hasType(isInteger())).bind("intvar")
213
214Step 2: Using AST matchers
215==========================
216
217Okay, on to using matchers for real. Let's start by defining a matcher
218which will capture all ``for`` statements that define a new variable
219initialized to zero. Let's start with matching all ``for`` loops:
220
221::
222
223 forStmt()
224
225Next, we want to specify that a single variable is declared in the first
226portion of the loop, so we can extend the matcher to
227
228::
229
230 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))
231
232Finally, we can add the condition that the variable is initialized to
233zero.
234
235::
236
237 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
238 hasInitializer(integerLiteral(equals(0))))))))
239
240It is fairly easy to read and understand the matcher definition ("match
241loops whose init portion declares a single variable which is initialized
242to the integer literal 0"), but deciding that every piece is necessary
243is more difficult. Note that this matcher will not match loops whose
244variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
245zero besides the integer 0.
246
247The last step is giving the matcher a name and binding the ``ForStmt``
248as we will want to do something with it:
249
250::
251
252 StatementMatcher LoopMatcher =
253 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
254 hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
255
256Once you have defined your matchers, you will need to add a little more
257scaffolding in order to run them. Matchers are paired with a
258``MatchCallback`` and registered with a ``MatchFinder`` object, then run
259from a ``ClangTool``. More code!
260
261Add the following to ``LoopConvert.cpp``:
262
263::
264
265 StatementMatcher LoopMatcher =
266 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
267 hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
268
269 class LoopPrinter : public MatchFinder::MatchCallback {
270 public :
271 virtual void run(const MatchFinder::MatchResult &Result) {
272 if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
273 FS->dump();
274 };
275
276And change ``main()`` to:
277
278::
279
280 int main(int argc, const char **argv) {
281 CommonOptionsParser OptionsParser(argc, argv);
Edwin Vaneb1f67db2012-12-14 18:58:25 +0000282 ClangTool Tool(OptionsParser.getCompilations(),
283 OptionsParser.getSourcePathList());
Sean Silva93ca0212012-12-13 01:10:46 +0000284
285 LoopPrinter Printer;
286 MatchFinder Finder;
287 Finder.addMatcher(LoopMatcher, &Printer);
288
289 return Tool.run(newFrontendActionFactory(&Finder));
290 }
291
292Now, you should be able to recompile and run the code to discover for
293loops. Create a new file with a few examples, and test out our new
294handiwork:
295
296::
297
298 cd ~/clang-llvm/llvm/llvm_build/
299 ninja loop-convert
300 vim ~/test-files/simple-loops.cc
301 bin/loop-convert ~/test-files/simple-loops.cc
302
303Step 3.5: More Complicated Matchers
304===================================
305
306Our simple matcher is capable of discovering for loops, but we would
307still need to filter out many more ourselves. We can do a good portion
308of the remaining work with some cleverly chosen matchers, but first we
309need to decide exactly which properties we want to allow.
310
311How can we characterize for loops over arrays which would be eligible
312for translation to range-based syntax? Range based loops over arrays of
313size ``N`` that:
314
315- start at index ``0``
316- iterate consecutively
317- end at index ``N-1``
318
319We already check for (1), so all we need to add is a check to the loop's
320condition to ensure that the loop's index variable is compared against
321``N`` and another check to ensure that the increment step just
322increments this same variable. The matcher for (2) is straightforward:
323require a pre- or post-increment of the same variable declared in the
324init portion.
325
326Unfortunately, such a matcher is impossible to write. Matchers contain
327no logic for comparing two arbitrary AST nodes and determining whether
328or not they are equal, so the best we can do is matching more than we
329would like to allow, and punting extra comparisons to the callback.
330
331In any case, we can start building this sub-matcher. We can require that
332the increment step be a unary increment like this:
333
334::
335
336 hasIncrement(unaryOperator(hasOperatorName("++")))
337
338Specifying what is incremented introduces another quirk of Clang's AST:
339Usages of variables are represented as ``DeclRefExpr``'s ("declaration
340reference expressions") because they are expressions which refer to
341variable declarations. To find a ``unaryOperator`` that refers to a
342specific declaration, we can simply add a second condition to it:
343
344::
345
346 hasIncrement(unaryOperator(
347 hasOperatorName("++"),
348 hasUnaryOperand(declRefExpr())))
349
350Furthermore, we can restrict our matcher to only match if the
351incremented variable is an integer:
352
353::
354
355 hasIncrement(unaryOperator(
356 hasOperatorName("++"),
357 hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))
358
359And the last step will be to attach an identifier to this variable, so
360that we can retrieve it in the callback:
361
362::
363
364 hasIncrement(unaryOperator(
365 hasOperatorName("++"),
366 hasUnaryOperand(declRefExpr(to(
367 varDecl(hasType(isInteger())).bind("incrementVariable"))))))
368
369We can add this code to the definition of ``LoopMatcher`` and make sure
370that our program, outfitted with the new matcher, only prints out loops
371that declare a single variable initialized to zero and have an increment
372step consisting of a unary increment of some variable.
373
374Now, we just need to add a matcher to check if the condition part of the
375``for`` loop compares a variable against the size of the array. There is
376only one problem - we don't know which array we're iterating over
377without looking at the body of the loop! We are again restricted to
378approximating the result we want with matchers, filling in the details
379in the callback. So we start with:
380
381::
382
383 hasCondition(binaryOperator(hasOperatorName("<"))
384
385It makes sense to ensure that the left-hand side is a reference to a
386variable, and that the right-hand side has integer type.
387
388::
389
390 hasCondition(binaryOperator(
391 hasOperatorName("<"),
392 hasRHS(expr(hasType(isInteger()))),
393 hasLHS(declRefExpr(to(varDecl(hasType(isInteger())))))))
394
395Why? Because it doesn't work. Of the three loops provided in
396``test-files/simple.cpp``, zero of them have a matching condition. A
397quick look at the AST dump of the first for loop, produced by the
398previous iteration of loop-convert, shows us the answer:
399
400::
401
402 (ForStmt 0x173b240
403 (DeclStmt 0x173afc8
404 0x173af50 "int i =
405 (IntegerLiteral 0x173afa8 'int' 0)")
406 <<>>
407 (BinaryOperator 0x173b060 '_Bool' '<'
408 (ImplicitCastExpr 0x173b030 'int'
409 (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
410 (ImplicitCastExpr 0x173b048 'int'
411 (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
412 (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
413 (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
414 (CompoundStatement …
415
416We already know that the declaration and increments both match, or this
417loop wouldn't have been dumped. The culprit lies in the implicit cast
418applied to the first operand (i.e. the LHS) of the less-than operator,
419an L-value to R-value conversion applied to the expression referencing
420``i``. Thankfully, the matcher library offers a solution to this problem
421in the form of ``ignoringParenImpCasts``, which instructs the matcher to
422ignore implicit casts and parentheses before continuing to match.
423Adjusting the condition operator will restore the desired match.
424
425::
426
427 hasCondition(binaryOperator(
428 hasOperatorName("<"),
429 hasLHS(expr(hasType(isInteger()))),
430 hasRHS(ignoringParenImpCasts(declRefExpr(
431 to(varDecl(hasType(isInteger()))))))))
432
433After adding binds to the expressions we wished to capture and
434extracting the identifier strings into variables, we have array-step-2
435completed.
436
437Step 4: Retrieving Matched Nodes
438================================
439
440So far, the matcher callback isn't very interesting: it just dumps the
441loop's AST. At some point, we will need to make changes to the input
442source code. Next, we'll work on using the nodes we bound in the
443previous step.
444
445The ``MatchFinder::run()`` callback takes a
446``MatchFinder::MatchResult&`` as its parameter. We're most interested in
447its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
448class to represent contextual information about the AST, as the name
449implies, though the most functionally important detail is that several
450operations require an ``ASTContext*`` parameter. More immediately useful
451is the set of matched nodes, and how we retrieve them.
452
453Since we bind three variables (identified by ConditionVarName,
454InitVarName, and IncrementVarName), we can obtain the matched nodes by
455using the ``getNodeAs()`` member function.
456
457In ``LoopActions.cpp``:
458
459::
460
461 #include "clang/AST/ASTContext.h"
462
463 void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
464 ASTContext *Context = Result.Context;
465 const ForStmt *FS = Result.Nodes.getStmtAs<ForStmt>(LoopName);
466 // We do not want to convert header files!
467 if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc()))
468 return;
469 const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>(IncrementVarName);
470 const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>(ConditionVarName);
471 const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>(InitVarName);
472
473Now that we have the three variables, represented by their respective
474declarations, let's make sure that they're all the same, using a helper
475function I call ``areSameVariable()``.
476
477::
478
479 if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
480 return;
481 llvm::outs() << "Potential array-based loop discovered.\n";
482 }
483
484If execution reaches the end of ``LoopPrinter::run()``, we know that the
485loop shell that looks like
486
487::
488
489 for (int i= 0; i < expr(); ++i) { ... }
490
491For now, we will just print a message explaining that we found a loop.
492The next section will deal with recursively traversing the AST to
493discover all changes needed.
494
495As a side note, here is the implementation of ``areSameVariable``. Clang
496associates a ``VarDecl`` with each variable to represent the variable's
497declaration. Since the "canonical" form of each declaration is unique by
498address, all we need to do is make sure neither ``ValueDecl`` (base
499class of ``VarDecl``) is ``NULL`` and compare the canonical Decls.
500
501::
502
503 static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
504 return First && Second &&
505 First->getCanonicalDecl() == Second->getCanonicalDecl();
506 }
507
508It's not as trivial to test if two expressions are the same, though
509Clang has already done the hard work for us by providing a way to
510canonicalize expressions:
511
512::
513
514 static bool areSameExpr(ASTContext* Context, const Expr *First,
515 const Expr *Second) {
516 if (!First || !Second)
517 return false;
518 llvm::FoldingSetNodeID FirstID, SecondID;
519 First->Profile(FirstID, *Context, true);
520 Second->Profile(SecondID, *Context, true);
521 return FirstID == SecondID;
522 }
523
524This code relies on the comparison between two
525``llvm::FoldingSetNodeIDs``. As the documentation for
526``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
527a description of a node in the AST, based on its properties, along with
528those of its children. ``FoldingSetNodeID`` then serves as a hash we can
529use to compare expressions. We will need ``areSameExpr`` later. Before
530you run the new code on the additional loops added to
531test-files/simple.cpp, try to figure out which ones will be considered
532potentially convertible.