Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 1 | =============================================================== |
| 2 | Tutorial for building tools using LibTooling and LibASTMatchers |
| 3 | =============================================================== |
| 4 | |
| 5 | This document is intended to show how to build a useful source-to-source |
| 6 | translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is |
| 7 | explicitly aimed at people who are new to Clang, so all you should need |
| 8 | is a working knowledge of C++ and the command line. |
| 9 | |
| 10 | In order to work on the compiler, you need some basic knowledge of the |
| 11 | abstract syntax tree (AST). To this end, the reader is incouraged to |
| 12 | skim the :doc:`Introduction to the Clang |
| 13 | AST <IntroductionToTheClangAST>` |
| 14 | |
| 15 | Step 0: Obtaining Clang |
| 16 | ======================= |
| 17 | |
| 18 | As Clang is part of the LLVM project, you'll need to download LLVM's |
| 19 | source code first. Both Clang and LLVM are maintained as Subversion |
| 20 | repositories, but we'll be accessing them through the git mirror. For |
| 21 | further information, see the `getting started |
| 22 | guide <http://llvm.org/docs/GettingStarted.html>`_. |
| 23 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 24 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 25 | |
| 26 | mkdir ~/clang-llvm && cd ~/clang-llvm |
| 27 | git clone http://llvm.org/git/llvm.git |
| 28 | cd llvm/tools |
| 29 | git clone http://llvm.org/git/clang.git |
Manuel Klimek | 141f1d2 | 2013-04-17 15:02:12 +0000 | [diff] [blame] | 30 | cd clang/tools |
| 31 | git clone http://llvm.org/git/clang-tools-extra.git extra |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 32 | |
| 33 | Next you need to obtain the CMake build system and Ninja build tool. You |
| 34 | may already have CMake installed, but current binary versions of CMake |
| 35 | aren't built with Ninja support. |
| 36 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 37 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 38 | |
| 39 | cd ~/clang-llvm |
| 40 | git clone https://github.com/martine/ninja.git |
| 41 | cd ninja |
| 42 | git checkout release |
| 43 | ./bootstrap.py |
| 44 | sudo cp ninja /usr/bin/ |
| 45 | |
| 46 | cd ~/clang-llvm |
| 47 | git clone git://cmake.org/stage/cmake.git |
| 48 | cd cmake |
| 49 | git checkout next |
| 50 | ./bootstrap |
| 51 | make |
| 52 | sudo make install |
| 53 | |
| 54 | Okay. Now we'll build Clang! |
| 55 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 56 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 57 | |
| 58 | cd ~/clang-llvm |
| 59 | mkdir build && cd build |
| 60 | cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON # Enable tests; default is off. |
| 61 | ninja |
| 62 | ninja check # Test LLVM only. |
| 63 | ninja clang-test # Test Clang only. |
| 64 | ninja install |
| 65 | |
| 66 | And we're live. |
| 67 | |
| 68 | All of the tests should pass, though there is a (very) small chance that |
| 69 | you can catch LLVM and Clang out of sync. Running ``'git svn rebase'`` |
| 70 | in both the llvm and clang directories should fix any problems. |
| 71 | |
| 72 | Finally, we want to set Clang as its own compiler. |
| 73 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 74 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 75 | |
| 76 | cd ~/clang-llvm/build |
| 77 | ccmake ../llvm |
| 78 | |
| 79 | The second command will bring up a GUI for configuring Clang. You need |
| 80 | to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on |
| 81 | advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to |
| 82 | ``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to |
| 83 | configure, then ``'g'`` to generate CMake's files. |
| 84 | |
| 85 | Finally, run ninja one last time, and you're done. |
| 86 | |
| 87 | Step 1: Create a ClangTool |
| 88 | ========================== |
| 89 | |
| 90 | Now that we have enough background knowledge, it's time to create the |
| 91 | simplest productive ClangTool in existence: a syntax checker. While this |
| 92 | already exists as ``clang-check``, it's important to understand what's |
| 93 | going on. |
| 94 | |
| 95 | First, we'll need to create a new directory for our tool and tell CMake |
| 96 | that it exists. As this is not going to be a core clang tool, it will |
| 97 | live in the ``tools/extra`` repository. |
| 98 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 99 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 100 | |
| 101 | cd ~/clang-llvm/llvm/tools/clang |
| 102 | mkdir tools/extra/loop-convert |
| 103 | echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt |
| 104 | vim tools/extra/loop-convert/CMakeLists.txt |
| 105 | |
| 106 | CMakeLists.txt should have the following contents: |
| 107 | |
| 108 | :: |
| 109 | |
| 110 | set(LLVM_LINK_COMPONENTS support) |
| 111 | set(LLVM_USED_LIBS clangTooling clangBasic clangAST) |
| 112 | |
| 113 | add_clang_executable(loop-convert |
| 114 | LoopConvert.cpp |
| 115 | ) |
| 116 | target_link_libraries(loop-convert |
| 117 | clangTooling |
| 118 | clangBasic |
| 119 | clangASTMatchers |
| 120 | ) |
| 121 | |
| 122 | With that done, Ninja will be able to compile our tool. Let's give it |
| 123 | something to compile! Put the following into |
| 124 | ``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of |
| 125 | why the different parts are needed can be found in the `LibTooling |
| 126 | documentation <LibTooling.html>`_. |
| 127 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 128 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 129 | |
| 130 | // Declares clang::SyntaxOnlyAction. |
| 131 | #include "clang/Frontend/FrontendActions.h" |
| 132 | #include "clang/Tooling/CommonOptionsParser.h" |
| 133 | #include "clang/Tooling/Tooling.h" |
| 134 | // Declares llvm::cl::extrahelp. |
| 135 | #include "llvm/Support/CommandLine.h" |
| 136 | |
| 137 | using namespace clang::tooling; |
| 138 | using namespace llvm; |
| 139 | |
| 140 | // CommonOptionsParser declares HelpMessage with a description of the common |
| 141 | // command-line options related to the compilation database and input files. |
| 142 | // It's nice to have this help message in all tools. |
| 143 | static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage); |
| 144 | |
| 145 | // A help message for this specific tool can be added afterwards. |
| 146 | static cl::extrahelp MoreHelp("\nMore help text..."); |
| 147 | |
| 148 | int main(int argc, const char **argv) { |
| 149 | CommonOptionsParser OptionsParser(argc, argv); |
Edwin Vane | b1f67db | 2012-12-14 18:58:25 +0000 | [diff] [blame] | 150 | ClangTool Tool(OptionsParser.getCompilations(), |
| 151 | OptionsParser.getSourcePathList()); |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 152 | return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>()); |
| 153 | } |
| 154 | |
| 155 | And that's it! You can compile our new tool by running ninja from the |
| 156 | ``build`` directory. |
| 157 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 158 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 159 | |
| 160 | cd ~/clang-llvm/build |
| 161 | ninja |
| 162 | |
| 163 | You should now be able to run the syntax checker, which is located in |
| 164 | ``~/clang-llvm/build/bin``, on any source file. Try it! |
| 165 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 166 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 167 | |
Manuel Klimek | 141f1d2 | 2013-04-17 15:02:12 +0000 | [diff] [blame] | 168 | cat "int main() { return 0; }" > test.cpp |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 169 | bin/loop-convert test.cpp -- |
| 170 | |
| 171 | Note the two dashes after we specify the source file. The additional |
| 172 | options for the compiler are passed after the dashes rather than loading |
| 173 | them from a compilation database - there just aren't any options needed |
| 174 | right now. |
| 175 | |
| 176 | Intermezzo: Learn AST matcher basics |
| 177 | ==================================== |
| 178 | |
| 179 | Clang recently introduced the :doc:`ASTMatcher |
| 180 | library <LibASTMatchers>` to provide a simple, powerful, and |
| 181 | concise way to describe specific patterns in the AST. Implemented as a |
| 182 | DSL powered by macros and templates (see |
| 183 | `ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're |
| 184 | curious), matchers offer the feel of algebraic data types common to |
| 185 | functional programming languages. |
| 186 | |
| 187 | For example, suppose you wanted to examine only binary operators. There |
| 188 | is a matcher to do exactly that, conveniently named ``binaryOperator``. |
| 189 | I'll give you one guess what this matcher does: |
| 190 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 191 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 192 | |
| 193 | binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0)))) |
| 194 | |
| 195 | Shockingly, it will match against addition expressions whose left hand |
| 196 | side is exactly the literal 0. It will not match against other forms of |
| 197 | 0, such as ``'\0'`` or ``NULL``, but it will match against macros that |
| 198 | expand to 0. The matcher will also not match against calls to the |
| 199 | overloaded operator ``'+'``, as there is a separate ``operatorCallExpr`` |
| 200 | matcher to handle overloaded operators. |
| 201 | |
| 202 | There are AST matchers to match all the different nodes of the AST, |
| 203 | narrowing matchers to only match AST nodes fulfilling specific criteria, |
| 204 | and traversal matchers to get from one kind of AST node to another. For |
| 205 | a complete list of AST matchers, take a look at the `AST Matcher |
| 206 | References <LibASTMatchersReference.html>`_ |
| 207 | |
| 208 | All matcher that are nouns describe entities in the AST and can be |
| 209 | bound, so that they can be referred to whenever a match is found. To do |
| 210 | so, simply call the method ``bind`` on these matchers, e.g.: |
| 211 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 212 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 213 | |
| 214 | variable(hasType(isInteger())).bind("intvar") |
| 215 | |
| 216 | Step 2: Using AST matchers |
| 217 | ========================== |
| 218 | |
| 219 | Okay, on to using matchers for real. Let's start by defining a matcher |
| 220 | which will capture all ``for`` statements that define a new variable |
| 221 | initialized to zero. Let's start with matching all ``for`` loops: |
| 222 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 223 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 224 | |
| 225 | forStmt() |
| 226 | |
| 227 | Next, we want to specify that a single variable is declared in the first |
| 228 | portion of the loop, so we can extend the matcher to |
| 229 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 230 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 231 | |
| 232 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl())))) |
| 233 | |
| 234 | Finally, we can add the condition that the variable is initialized to |
| 235 | zero. |
| 236 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 237 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 238 | |
| 239 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( |
| 240 | hasInitializer(integerLiteral(equals(0)))))))) |
| 241 | |
| 242 | It is fairly easy to read and understand the matcher definition ("match |
| 243 | loops whose init portion declares a single variable which is initialized |
| 244 | to the integer literal 0"), but deciding that every piece is necessary |
| 245 | is more difficult. Note that this matcher will not match loops whose |
| 246 | variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of |
| 247 | zero besides the integer 0. |
| 248 | |
| 249 | The last step is giving the matcher a name and binding the ``ForStmt`` |
| 250 | as we will want to do something with it: |
| 251 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 252 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 253 | |
| 254 | StatementMatcher LoopMatcher = |
| 255 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( |
| 256 | hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); |
| 257 | |
| 258 | Once you have defined your matchers, you will need to add a little more |
| 259 | scaffolding in order to run them. Matchers are paired with a |
| 260 | ``MatchCallback`` and registered with a ``MatchFinder`` object, then run |
| 261 | from a ``ClangTool``. More code! |
| 262 | |
| 263 | Add the following to ``LoopConvert.cpp``: |
| 264 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 265 | .. code-block:: c++ |
| 266 | |
Manuel Klimek | e3a2b81 | 2013-03-04 11:31:46 +0000 | [diff] [blame] | 267 | #include "clang/ASTMatchers/ASTMatchers.h" |
| 268 | #include "clang/ASTMatchers/ASTMatchFinder.h" |
| 269 | |
| 270 | using namespace clang; |
| 271 | using namespace clang::ast_matchers; |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 272 | |
| 273 | StatementMatcher LoopMatcher = |
| 274 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( |
| 275 | hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); |
| 276 | |
| 277 | class LoopPrinter : public MatchFinder::MatchCallback { |
| 278 | public : |
| 279 | virtual void run(const MatchFinder::MatchResult &Result) { |
Manuel Klimek | 141f1d2 | 2013-04-17 15:02:12 +0000 | [diff] [blame] | 280 | if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop")) |
| 281 | FS->dump(); |
| 282 | } |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 283 | }; |
| 284 | |
| 285 | And change ``main()`` to: |
| 286 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 287 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 288 | |
| 289 | int main(int argc, const char **argv) { |
| 290 | CommonOptionsParser OptionsParser(argc, argv); |
Edwin Vane | b1f67db | 2012-12-14 18:58:25 +0000 | [diff] [blame] | 291 | ClangTool Tool(OptionsParser.getCompilations(), |
| 292 | OptionsParser.getSourcePathList()); |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 293 | |
| 294 | LoopPrinter Printer; |
| 295 | MatchFinder Finder; |
| 296 | Finder.addMatcher(LoopMatcher, &Printer); |
| 297 | |
| 298 | return Tool.run(newFrontendActionFactory(&Finder)); |
| 299 | } |
| 300 | |
| 301 | Now, you should be able to recompile and run the code to discover for |
| 302 | loops. Create a new file with a few examples, and test out our new |
| 303 | handiwork: |
| 304 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 305 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 306 | |
| 307 | cd ~/clang-llvm/llvm/llvm_build/ |
| 308 | ninja loop-convert |
| 309 | vim ~/test-files/simple-loops.cc |
| 310 | bin/loop-convert ~/test-files/simple-loops.cc |
| 311 | |
| 312 | Step 3.5: More Complicated Matchers |
| 313 | =================================== |
| 314 | |
| 315 | Our simple matcher is capable of discovering for loops, but we would |
| 316 | still need to filter out many more ourselves. We can do a good portion |
| 317 | of the remaining work with some cleverly chosen matchers, but first we |
| 318 | need to decide exactly which properties we want to allow. |
| 319 | |
| 320 | How can we characterize for loops over arrays which would be eligible |
| 321 | for translation to range-based syntax? Range based loops over arrays of |
| 322 | size ``N`` that: |
| 323 | |
| 324 | - start at index ``0`` |
| 325 | - iterate consecutively |
| 326 | - end at index ``N-1`` |
| 327 | |
| 328 | We already check for (1), so all we need to add is a check to the loop's |
| 329 | condition to ensure that the loop's index variable is compared against |
| 330 | ``N`` and another check to ensure that the increment step just |
| 331 | increments this same variable. The matcher for (2) is straightforward: |
| 332 | require a pre- or post-increment of the same variable declared in the |
| 333 | init portion. |
| 334 | |
| 335 | Unfortunately, such a matcher is impossible to write. Matchers contain |
| 336 | no logic for comparing two arbitrary AST nodes and determining whether |
| 337 | or not they are equal, so the best we can do is matching more than we |
| 338 | would like to allow, and punting extra comparisons to the callback. |
| 339 | |
| 340 | In any case, we can start building this sub-matcher. We can require that |
| 341 | the increment step be a unary increment like this: |
| 342 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 343 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 344 | |
| 345 | hasIncrement(unaryOperator(hasOperatorName("++"))) |
| 346 | |
| 347 | Specifying what is incremented introduces another quirk of Clang's AST: |
| 348 | Usages of variables are represented as ``DeclRefExpr``'s ("declaration |
| 349 | reference expressions") because they are expressions which refer to |
| 350 | variable declarations. To find a ``unaryOperator`` that refers to a |
| 351 | specific declaration, we can simply add a second condition to it: |
| 352 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 353 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 354 | |
| 355 | hasIncrement(unaryOperator( |
| 356 | hasOperatorName("++"), |
| 357 | hasUnaryOperand(declRefExpr()))) |
| 358 | |
| 359 | Furthermore, we can restrict our matcher to only match if the |
| 360 | incremented variable is an integer: |
| 361 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 362 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 363 | |
| 364 | hasIncrement(unaryOperator( |
| 365 | hasOperatorName("++"), |
| 366 | hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger()))))))) |
| 367 | |
| 368 | And the last step will be to attach an identifier to this variable, so |
| 369 | that we can retrieve it in the callback: |
| 370 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 371 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 372 | |
| 373 | hasIncrement(unaryOperator( |
| 374 | hasOperatorName("++"), |
| 375 | hasUnaryOperand(declRefExpr(to( |
| 376 | varDecl(hasType(isInteger())).bind("incrementVariable")))))) |
| 377 | |
| 378 | We can add this code to the definition of ``LoopMatcher`` and make sure |
| 379 | that our program, outfitted with the new matcher, only prints out loops |
| 380 | that declare a single variable initialized to zero and have an increment |
| 381 | step consisting of a unary increment of some variable. |
| 382 | |
| 383 | Now, we just need to add a matcher to check if the condition part of the |
| 384 | ``for`` loop compares a variable against the size of the array. There is |
| 385 | only one problem - we don't know which array we're iterating over |
| 386 | without looking at the body of the loop! We are again restricted to |
| 387 | approximating the result we want with matchers, filling in the details |
| 388 | in the callback. So we start with: |
| 389 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 390 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 391 | |
| 392 | hasCondition(binaryOperator(hasOperatorName("<")) |
| 393 | |
| 394 | It makes sense to ensure that the left-hand side is a reference to a |
| 395 | variable, and that the right-hand side has integer type. |
| 396 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 397 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 398 | |
| 399 | hasCondition(binaryOperator( |
| 400 | hasOperatorName("<"), |
Edwin Vane | 7e6f23a | 2013-03-05 18:04:37 +0000 | [diff] [blame] | 401 | hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))), |
| 402 | hasRHS(expr(hasType(isInteger()))))) |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 403 | |
| 404 | Why? Because it doesn't work. Of the three loops provided in |
| 405 | ``test-files/simple.cpp``, zero of them have a matching condition. A |
| 406 | quick look at the AST dump of the first for loop, produced by the |
| 407 | previous iteration of loop-convert, shows us the answer: |
| 408 | |
| 409 | :: |
| 410 | |
| 411 | (ForStmt 0x173b240 |
| 412 | (DeclStmt 0x173afc8 |
| 413 | 0x173af50 "int i = |
| 414 | (IntegerLiteral 0x173afa8 'int' 0)") |
| 415 | <<>> |
| 416 | (BinaryOperator 0x173b060 '_Bool' '<' |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 417 | (ImplicitCastExpr 0x173b030 'int' |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 418 | (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int')) |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 419 | (ImplicitCastExpr 0x173b048 'int' |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 420 | (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int'))) |
| 421 | (UnaryOperator 0x173b0b0 'int' lvalue prefix '++' |
| 422 | (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int')) |
Dmitri Gribenko | ebf8fe2 | 2013-04-27 16:26:29 +0000 | [diff] [blame] | 423 | (CompoundStatement ... |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 424 | |
| 425 | We already know that the declaration and increments both match, or this |
| 426 | loop wouldn't have been dumped. The culprit lies in the implicit cast |
| 427 | applied to the first operand (i.e. the LHS) of the less-than operator, |
| 428 | an L-value to R-value conversion applied to the expression referencing |
| 429 | ``i``. Thankfully, the matcher library offers a solution to this problem |
| 430 | in the form of ``ignoringParenImpCasts``, which instructs the matcher to |
| 431 | ignore implicit casts and parentheses before continuing to match. |
| 432 | Adjusting the condition operator will restore the desired match. |
| 433 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 434 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 435 | |
| 436 | hasCondition(binaryOperator( |
| 437 | hasOperatorName("<"), |
Edwin Vane | 7e6f23a | 2013-03-05 18:04:37 +0000 | [diff] [blame] | 438 | hasLHS(ignoringParenImpCasts(declRefExpr( |
| 439 | to(varDecl(hasType(isInteger())))))), |
| 440 | hasRHS(expr(hasType(isInteger()))))) |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 441 | |
| 442 | After adding binds to the expressions we wished to capture and |
| 443 | extracting the identifier strings into variables, we have array-step-2 |
| 444 | completed. |
| 445 | |
| 446 | Step 4: Retrieving Matched Nodes |
| 447 | ================================ |
| 448 | |
| 449 | So far, the matcher callback isn't very interesting: it just dumps the |
| 450 | loop's AST. At some point, we will need to make changes to the input |
| 451 | source code. Next, we'll work on using the nodes we bound in the |
| 452 | previous step. |
| 453 | |
| 454 | The ``MatchFinder::run()`` callback takes a |
| 455 | ``MatchFinder::MatchResult&`` as its parameter. We're most interested in |
| 456 | its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext`` |
| 457 | class to represent contextual information about the AST, as the name |
| 458 | implies, though the most functionally important detail is that several |
| 459 | operations require an ``ASTContext*`` parameter. More immediately useful |
| 460 | is the set of matched nodes, and how we retrieve them. |
| 461 | |
| 462 | Since we bind three variables (identified by ConditionVarName, |
| 463 | InitVarName, and IncrementVarName), we can obtain the matched nodes by |
| 464 | using the ``getNodeAs()`` member function. |
| 465 | |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 466 | In ``LoopConvert.cpp`` add |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 467 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 468 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 469 | |
| 470 | #include "clang/AST/ASTContext.h" |
| 471 | |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 472 | Change ``LoopMatcher`` to |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 473 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 474 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 475 | |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 476 | StatementMatcher LoopMatcher = |
| 477 | forStmt(hasLoopInit(declStmt( |
| 478 | hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0)))) |
| 479 | .bind("initVarName")))), |
| 480 | hasIncrement(unaryOperator( |
| 481 | hasOperatorName("++"), |
| 482 | hasUnaryOperand(declRefExpr( |
| 483 | to(varDecl(hasType(isInteger())).bind("incVarName")))))), |
| 484 | hasCondition(binaryOperator( |
| 485 | hasOperatorName("<"), |
| 486 | hasLHS(ignoringParenImpCasts(declRefExpr( |
| 487 | to(varDecl(hasType(isInteger())).bind("condVarName"))))), |
| 488 | hasRHS(expr(hasType(isInteger())))))).bind("forLoop"); |
| 489 | |
| 490 | And change ``LoopPrinter::run`` to |
| 491 | |
| 492 | .. code-block:: c++ |
| 493 | |
| 494 | void LoopPrinter::run(const MatchFinder::MatchResult &Result) { |
| 495 | ASTContext *Context = Result.Context; |
| 496 | const ForStmt *FS = Result.Nodes.getStmtAs<ForStmt>("forLoop"); |
| 497 | // We do not want to convert header files! |
| 498 | if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc())) |
| 499 | return; |
| 500 | const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName"); |
| 501 | const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName"); |
| 502 | const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName"); |
| 503 | |
| 504 | if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar)) |
| 505 | return; |
| 506 | llvm::outs() << "Potential array-based loop discovered.\n"; |
| 507 | } |
| 508 | |
| 509 | Clang associates a ``VarDecl`` with each variable to represent the variable's |
| 510 | declaration. Since the "canonical" form of each declaration is unique by |
| 511 | address, all we need to do is make sure neither ``ValueDecl`` (base class of |
| 512 | ``VarDecl``) is ``NULL`` and compare the canonical Decls. |
| 513 | |
| 514 | .. code-block:: c++ |
| 515 | |
| 516 | static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) { |
| 517 | return First && Second && |
| 518 | First->getCanonicalDecl() == Second->getCanonicalDecl(); |
| 519 | } |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 520 | |
| 521 | If execution reaches the end of ``LoopPrinter::run()``, we know that the |
| 522 | loop shell that looks like |
| 523 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 524 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 525 | |
| 526 | for (int i= 0; i < expr(); ++i) { ... } |
| 527 | |
| 528 | For now, we will just print a message explaining that we found a loop. |
| 529 | The next section will deal with recursively traversing the AST to |
| 530 | discover all changes needed. |
| 531 | |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 532 | As a side note, it's not as trivial to test if two expressions are the same, |
| 533 | though Clang has already done the hard work for us by providing a way to |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 534 | canonicalize expressions: |
| 535 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 536 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 537 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 538 | static bool areSameExpr(ASTContext *Context, const Expr *First, |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 539 | const Expr *Second) { |
| 540 | if (!First || !Second) |
| 541 | return false; |
| 542 | llvm::FoldingSetNodeID FirstID, SecondID; |
| 543 | First->Profile(FirstID, *Context, true); |
| 544 | Second->Profile(SecondID, *Context, true); |
| 545 | return FirstID == SecondID; |
| 546 | } |
| 547 | |
| 548 | This code relies on the comparison between two |
| 549 | ``llvm::FoldingSetNodeIDs``. As the documentation for |
| 550 | ``Stmt::Profile()`` indicates, the ``Profile()`` member function builds |
| 551 | a description of a node in the AST, based on its properties, along with |
| 552 | those of its children. ``FoldingSetNodeID`` then serves as a hash we can |
| 553 | use to compare expressions. We will need ``areSameExpr`` later. Before |
| 554 | you run the new code on the additional loops added to |
| 555 | test-files/simple.cpp, try to figure out which ones will be considered |
| 556 | potentially convertible. |