blob: ba568e3594eabb2b1bede8f40e03f1f925055acc [file] [log] [blame]
Sean Silva93ca0212012-12-13 01:10:46 +00001===============================================================
2Tutorial for building tools using LibTooling and LibASTMatchers
3===============================================================
4
5This document is intended to show how to build a useful source-to-source
6translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
7explicitly aimed at people who are new to Clang, so all you should need
8is a working knowledge of C++ and the command line.
9
10In order to work on the compiler, you need some basic knowledge of the
11abstract syntax tree (AST). To this end, the reader is incouraged to
12skim the :doc:`Introduction to the Clang
13AST <IntroductionToTheClangAST>`
14
15Step 0: Obtaining Clang
16=======================
17
18As Clang is part of the LLVM project, you'll need to download LLVM's
19source code first. Both Clang and LLVM are maintained as Subversion
20repositories, but we'll be accessing them through the git mirror. For
21further information, see the `getting started
22guide <http://llvm.org/docs/GettingStarted.html>`_.
23
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +000024.. code-block:: console
Sean Silva93ca0212012-12-13 01:10:46 +000025
26 mkdir ~/clang-llvm && cd ~/clang-llvm
27 git clone http://llvm.org/git/llvm.git
28 cd llvm/tools
29 git clone http://llvm.org/git/clang.git
30
31Next you need to obtain the CMake build system and Ninja build tool. You
32may already have CMake installed, but current binary versions of CMake
33aren't built with Ninja support.
34
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +000035.. code-block:: console
Sean Silva93ca0212012-12-13 01:10:46 +000036
37 cd ~/clang-llvm
38 git clone https://github.com/martine/ninja.git
39 cd ninja
40 git checkout release
41 ./bootstrap.py
42 sudo cp ninja /usr/bin/
43
44 cd ~/clang-llvm
45 git clone git://cmake.org/stage/cmake.git
46 cd cmake
47 git checkout next
48 ./bootstrap
49 make
50 sudo make install
51
52Okay. Now we'll build Clang!
53
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +000054.. code-block:: console
Sean Silva93ca0212012-12-13 01:10:46 +000055
56 cd ~/clang-llvm
57 mkdir build && cd build
58 cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON # Enable tests; default is off.
59 ninja
60 ninja check # Test LLVM only.
61 ninja clang-test # Test Clang only.
62 ninja install
63
64And we're live.
65
66All of the tests should pass, though there is a (very) small chance that
67you can catch LLVM and Clang out of sync. Running ``'git svn rebase'``
68in both the llvm and clang directories should fix any problems.
69
70Finally, we want to set Clang as its own compiler.
71
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +000072.. code-block:: console
Sean Silva93ca0212012-12-13 01:10:46 +000073
74 cd ~/clang-llvm/build
75 ccmake ../llvm
76
77The second command will bring up a GUI for configuring Clang. You need
78to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
79advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
80``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
81configure, then ``'g'`` to generate CMake's files.
82
83Finally, run ninja one last time, and you're done.
84
85Step 1: Create a ClangTool
86==========================
87
88Now that we have enough background knowledge, it's time to create the
89simplest productive ClangTool in existence: a syntax checker. While this
90already exists as ``clang-check``, it's important to understand what's
91going on.
92
93First, we'll need to create a new directory for our tool and tell CMake
94that it exists. As this is not going to be a core clang tool, it will
95live in the ``tools/extra`` repository.
96
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +000097.. code-block:: console
Sean Silva93ca0212012-12-13 01:10:46 +000098
99 cd ~/clang-llvm/llvm/tools/clang
100 mkdir tools/extra/loop-convert
101 echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt
102 vim tools/extra/loop-convert/CMakeLists.txt
103
104CMakeLists.txt should have the following contents:
105
106::
107
108 set(LLVM_LINK_COMPONENTS support)
109 set(LLVM_USED_LIBS clangTooling clangBasic clangAST)
110
111 add_clang_executable(loop-convert
112 LoopConvert.cpp
113 )
114 target_link_libraries(loop-convert
115 clangTooling
116 clangBasic
117 clangASTMatchers
118 )
119
120With that done, Ninja will be able to compile our tool. Let's give it
121something to compile! Put the following into
122``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
123why the different parts are needed can be found in the `LibTooling
124documentation <LibTooling.html>`_.
125
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000126.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000127
128 // Declares clang::SyntaxOnlyAction.
129 #include "clang/Frontend/FrontendActions.h"
130 #include "clang/Tooling/CommonOptionsParser.h"
131 #include "clang/Tooling/Tooling.h"
132 // Declares llvm::cl::extrahelp.
133 #include "llvm/Support/CommandLine.h"
134
135 using namespace clang::tooling;
136 using namespace llvm;
137
138 // CommonOptionsParser declares HelpMessage with a description of the common
139 // command-line options related to the compilation database and input files.
140 // It's nice to have this help message in all tools.
141 static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
142
143 // A help message for this specific tool can be added afterwards.
144 static cl::extrahelp MoreHelp("\nMore help text...");
145
146 int main(int argc, const char **argv) {
147 CommonOptionsParser OptionsParser(argc, argv);
Edwin Vaneb1f67db2012-12-14 18:58:25 +0000148 ClangTool Tool(OptionsParser.getCompilations(),
149 OptionsParser.getSourcePathList());
Sean Silva93ca0212012-12-13 01:10:46 +0000150 return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
151 }
152
153And that's it! You can compile our new tool by running ninja from the
154``build`` directory.
155
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000156.. code-block:: console
Sean Silva93ca0212012-12-13 01:10:46 +0000157
158 cd ~/clang-llvm/build
159 ninja
160
161You should now be able to run the syntax checker, which is located in
162``~/clang-llvm/build/bin``, on any source file. Try it!
163
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000164.. code-block:: console
Sean Silva93ca0212012-12-13 01:10:46 +0000165
166 cat "void main() {}" > test.cpp
167 bin/loop-convert test.cpp --
168
169Note the two dashes after we specify the source file. The additional
170options for the compiler are passed after the dashes rather than loading
171them from a compilation database - there just aren't any options needed
172right now.
173
174Intermezzo: Learn AST matcher basics
175====================================
176
177Clang recently introduced the :doc:`ASTMatcher
178library <LibASTMatchers>` to provide a simple, powerful, and
179concise way to describe specific patterns in the AST. Implemented as a
180DSL powered by macros and templates (see
181`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
182curious), matchers offer the feel of algebraic data types common to
183functional programming languages.
184
185For example, suppose you wanted to examine only binary operators. There
186is a matcher to do exactly that, conveniently named ``binaryOperator``.
187I'll give you one guess what this matcher does:
188
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000189.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000190
191 binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))
192
193Shockingly, it will match against addition expressions whose left hand
194side is exactly the literal 0. It will not match against other forms of
1950, such as ``'\0'`` or ``NULL``, but it will match against macros that
196expand to 0. The matcher will also not match against calls to the
197overloaded operator ``'+'``, as there is a separate ``operatorCallExpr``
198matcher to handle overloaded operators.
199
200There are AST matchers to match all the different nodes of the AST,
201narrowing matchers to only match AST nodes fulfilling specific criteria,
202and traversal matchers to get from one kind of AST node to another. For
203a complete list of AST matchers, take a look at the `AST Matcher
204References <LibASTMatchersReference.html>`_
205
206All matcher that are nouns describe entities in the AST and can be
207bound, so that they can be referred to whenever a match is found. To do
208so, simply call the method ``bind`` on these matchers, e.g.:
209
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000210.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000211
212 variable(hasType(isInteger())).bind("intvar")
213
214Step 2: Using AST matchers
215==========================
216
217Okay, on to using matchers for real. Let's start by defining a matcher
218which will capture all ``for`` statements that define a new variable
219initialized to zero. Let's start with matching all ``for`` loops:
220
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000221.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000222
223 forStmt()
224
225Next, we want to specify that a single variable is declared in the first
226portion of the loop, so we can extend the matcher to
227
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000228.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000229
230 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))
231
232Finally, we can add the condition that the variable is initialized to
233zero.
234
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000235.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000236
237 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
238 hasInitializer(integerLiteral(equals(0))))))))
239
240It is fairly easy to read and understand the matcher definition ("match
241loops whose init portion declares a single variable which is initialized
242to the integer literal 0"), but deciding that every piece is necessary
243is more difficult. Note that this matcher will not match loops whose
244variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
245zero besides the integer 0.
246
247The last step is giving the matcher a name and binding the ``ForStmt``
248as we will want to do something with it:
249
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000250.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000251
252 StatementMatcher LoopMatcher =
253 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
254 hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
255
256Once you have defined your matchers, you will need to add a little more
257scaffolding in order to run them. Matchers are paired with a
258``MatchCallback`` and registered with a ``MatchFinder`` object, then run
259from a ``ClangTool``. More code!
260
261Add the following to ``LoopConvert.cpp``:
262
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000263.. code-block:: c++
264
Manuel Klimeke3a2b812013-03-04 11:31:46 +0000265 #include "clang/ASTMatchers/ASTMatchers.h"
266 #include "clang/ASTMatchers/ASTMatchFinder.h"
267
268 using namespace clang;
269 using namespace clang::ast_matchers;
Sean Silva93ca0212012-12-13 01:10:46 +0000270
271 StatementMatcher LoopMatcher =
272 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
273 hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
274
275 class LoopPrinter : public MatchFinder::MatchCallback {
276 public :
277 virtual void run(const MatchFinder::MatchResult &Result) {
278 if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
279 FS->dump();
280 };
281
282And change ``main()`` to:
283
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000284.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000285
286 int main(int argc, const char **argv) {
287 CommonOptionsParser OptionsParser(argc, argv);
Edwin Vaneb1f67db2012-12-14 18:58:25 +0000288 ClangTool Tool(OptionsParser.getCompilations(),
289 OptionsParser.getSourcePathList());
Sean Silva93ca0212012-12-13 01:10:46 +0000290
291 LoopPrinter Printer;
292 MatchFinder Finder;
293 Finder.addMatcher(LoopMatcher, &Printer);
294
295 return Tool.run(newFrontendActionFactory(&Finder));
296 }
297
298Now, you should be able to recompile and run the code to discover for
299loops. Create a new file with a few examples, and test out our new
300handiwork:
301
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000302.. code-block:: console
Sean Silva93ca0212012-12-13 01:10:46 +0000303
304 cd ~/clang-llvm/llvm/llvm_build/
305 ninja loop-convert
306 vim ~/test-files/simple-loops.cc
307 bin/loop-convert ~/test-files/simple-loops.cc
308
309Step 3.5: More Complicated Matchers
310===================================
311
312Our simple matcher is capable of discovering for loops, but we would
313still need to filter out many more ourselves. We can do a good portion
314of the remaining work with some cleverly chosen matchers, but first we
315need to decide exactly which properties we want to allow.
316
317How can we characterize for loops over arrays which would be eligible
318for translation to range-based syntax? Range based loops over arrays of
319size ``N`` that:
320
321- start at index ``0``
322- iterate consecutively
323- end at index ``N-1``
324
325We already check for (1), so all we need to add is a check to the loop's
326condition to ensure that the loop's index variable is compared against
327``N`` and another check to ensure that the increment step just
328increments this same variable. The matcher for (2) is straightforward:
329require a pre- or post-increment of the same variable declared in the
330init portion.
331
332Unfortunately, such a matcher is impossible to write. Matchers contain
333no logic for comparing two arbitrary AST nodes and determining whether
334or not they are equal, so the best we can do is matching more than we
335would like to allow, and punting extra comparisons to the callback.
336
337In any case, we can start building this sub-matcher. We can require that
338the increment step be a unary increment like this:
339
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000340.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000341
342 hasIncrement(unaryOperator(hasOperatorName("++")))
343
344Specifying what is incremented introduces another quirk of Clang's AST:
345Usages of variables are represented as ``DeclRefExpr``'s ("declaration
346reference expressions") because they are expressions which refer to
347variable declarations. To find a ``unaryOperator`` that refers to a
348specific declaration, we can simply add a second condition to it:
349
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000350.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000351
352 hasIncrement(unaryOperator(
353 hasOperatorName("++"),
354 hasUnaryOperand(declRefExpr())))
355
356Furthermore, we can restrict our matcher to only match if the
357incremented variable is an integer:
358
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000359.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000360
361 hasIncrement(unaryOperator(
362 hasOperatorName("++"),
363 hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))
364
365And the last step will be to attach an identifier to this variable, so
366that we can retrieve it in the callback:
367
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000368.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000369
370 hasIncrement(unaryOperator(
371 hasOperatorName("++"),
372 hasUnaryOperand(declRefExpr(to(
373 varDecl(hasType(isInteger())).bind("incrementVariable"))))))
374
375We can add this code to the definition of ``LoopMatcher`` and make sure
376that our program, outfitted with the new matcher, only prints out loops
377that declare a single variable initialized to zero and have an increment
378step consisting of a unary increment of some variable.
379
380Now, we just need to add a matcher to check if the condition part of the
381``for`` loop compares a variable against the size of the array. There is
382only one problem - we don't know which array we're iterating over
383without looking at the body of the loop! We are again restricted to
384approximating the result we want with matchers, filling in the details
385in the callback. So we start with:
386
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000387.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000388
389 hasCondition(binaryOperator(hasOperatorName("<"))
390
391It makes sense to ensure that the left-hand side is a reference to a
392variable, and that the right-hand side has integer type.
393
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000394.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000395
396 hasCondition(binaryOperator(
397 hasOperatorName("<"),
Edwin Vane7e6f23a2013-03-05 18:04:37 +0000398 hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
399 hasRHS(expr(hasType(isInteger())))))
Sean Silva93ca0212012-12-13 01:10:46 +0000400
401Why? Because it doesn't work. Of the three loops provided in
402``test-files/simple.cpp``, zero of them have a matching condition. A
403quick look at the AST dump of the first for loop, produced by the
404previous iteration of loop-convert, shows us the answer:
405
406::
407
408 (ForStmt 0x173b240
409 (DeclStmt 0x173afc8
410 0x173af50 "int i =
411 (IntegerLiteral 0x173afa8 'int' 0)")
412 <<>>
413 (BinaryOperator 0x173b060 '_Bool' '<'
414 (ImplicitCastExpr 0x173b030 'int'
415 (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
416 (ImplicitCastExpr 0x173b048 'int'
417 (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
418 (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
419 (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
420 (CompoundStatement …
421
422We already know that the declaration and increments both match, or this
423loop wouldn't have been dumped. The culprit lies in the implicit cast
424applied to the first operand (i.e. the LHS) of the less-than operator,
425an L-value to R-value conversion applied to the expression referencing
426``i``. Thankfully, the matcher library offers a solution to this problem
427in the form of ``ignoringParenImpCasts``, which instructs the matcher to
428ignore implicit casts and parentheses before continuing to match.
429Adjusting the condition operator will restore the desired match.
430
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000431.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000432
433 hasCondition(binaryOperator(
434 hasOperatorName("<"),
Edwin Vane7e6f23a2013-03-05 18:04:37 +0000435 hasLHS(ignoringParenImpCasts(declRefExpr(
436 to(varDecl(hasType(isInteger())))))),
437 hasRHS(expr(hasType(isInteger())))))
Sean Silva93ca0212012-12-13 01:10:46 +0000438
439After adding binds to the expressions we wished to capture and
440extracting the identifier strings into variables, we have array-step-2
441completed.
442
443Step 4: Retrieving Matched Nodes
444================================
445
446So far, the matcher callback isn't very interesting: it just dumps the
447loop's AST. At some point, we will need to make changes to the input
448source code. Next, we'll work on using the nodes we bound in the
449previous step.
450
451The ``MatchFinder::run()`` callback takes a
452``MatchFinder::MatchResult&`` as its parameter. We're most interested in
453its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
454class to represent contextual information about the AST, as the name
455implies, though the most functionally important detail is that several
456operations require an ``ASTContext*`` parameter. More immediately useful
457is the set of matched nodes, and how we retrieve them.
458
459Since we bind three variables (identified by ConditionVarName,
460InitVarName, and IncrementVarName), we can obtain the matched nodes by
461using the ``getNodeAs()`` member function.
462
463In ``LoopActions.cpp``:
464
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000465.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000466
467 #include "clang/AST/ASTContext.h"
468
469 void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
470 ASTContext *Context = Result.Context;
471 const ForStmt *FS = Result.Nodes.getStmtAs<ForStmt>(LoopName);
472 // We do not want to convert header files!
473 if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc()))
474 return;
475 const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>(IncrementVarName);
476 const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>(ConditionVarName);
477 const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>(InitVarName);
478
479Now that we have the three variables, represented by their respective
480declarations, let's make sure that they're all the same, using a helper
481function I call ``areSameVariable()``.
482
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000483.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000484
485 if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
486 return;
487 llvm::outs() << "Potential array-based loop discovered.\n";
488 }
489
490If execution reaches the end of ``LoopPrinter::run()``, we know that the
491loop shell that looks like
492
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000493.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000494
495 for (int i= 0; i < expr(); ++i) { ... }
496
497For now, we will just print a message explaining that we found a loop.
498The next section will deal with recursively traversing the AST to
499discover all changes needed.
500
501As a side note, here is the implementation of ``areSameVariable``. Clang
502associates a ``VarDecl`` with each variable to represent the variable's
503declaration. Since the "canonical" form of each declaration is unique by
504address, all we need to do is make sure neither ``ValueDecl`` (base
505class of ``VarDecl``) is ``NULL`` and compare the canonical Decls.
506
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000507.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000508
509 static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
510 return First && Second &&
511 First->getCanonicalDecl() == Second->getCanonicalDecl();
512 }
513
514It's not as trivial to test if two expressions are the same, though
515Clang has already done the hard work for us by providing a way to
516canonicalize expressions:
517
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000518.. code-block:: c++
Sean Silva93ca0212012-12-13 01:10:46 +0000519
Dmitri Gribenkoba6efd32013-03-05 13:05:56 +0000520 static bool areSameExpr(ASTContext *Context, const Expr *First,
Sean Silva93ca0212012-12-13 01:10:46 +0000521 const Expr *Second) {
522 if (!First || !Second)
523 return false;
524 llvm::FoldingSetNodeID FirstID, SecondID;
525 First->Profile(FirstID, *Context, true);
526 Second->Profile(SecondID, *Context, true);
527 return FirstID == SecondID;
528 }
529
530This code relies on the comparison between two
531``llvm::FoldingSetNodeIDs``. As the documentation for
532``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
533a description of a node in the AST, based on its properties, along with
534those of its children. ``FoldingSetNodeID`` then serves as a hash we can
535use to compare expressions. We will need ``areSameExpr`` later. Before
536you run the new code on the additional loops added to
537test-files/simple.cpp, try to figure out which ones will be considered
538potentially convertible.