Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 1 | =============================================================== |
| 2 | Tutorial for building tools using LibTooling and LibASTMatchers |
| 3 | =============================================================== |
| 4 | |
| 5 | This document is intended to show how to build a useful source-to-source |
| 6 | translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is |
| 7 | explicitly aimed at people who are new to Clang, so all you should need |
| 8 | is a working knowledge of C++ and the command line. |
| 9 | |
| 10 | In order to work on the compiler, you need some basic knowledge of the |
| 11 | abstract syntax tree (AST). To this end, the reader is incouraged to |
| 12 | skim the :doc:`Introduction to the Clang |
| 13 | AST <IntroductionToTheClangAST>` |
| 14 | |
| 15 | Step 0: Obtaining Clang |
| 16 | ======================= |
| 17 | |
| 18 | As Clang is part of the LLVM project, you'll need to download LLVM's |
| 19 | source code first. Both Clang and LLVM are maintained as Subversion |
| 20 | repositories, but we'll be accessing them through the git mirror. For |
| 21 | further information, see the `getting started |
| 22 | guide <http://llvm.org/docs/GettingStarted.html>`_. |
| 23 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 24 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 25 | |
| 26 | mkdir ~/clang-llvm && cd ~/clang-llvm |
| 27 | git clone http://llvm.org/git/llvm.git |
| 28 | cd llvm/tools |
| 29 | git clone http://llvm.org/git/clang.git |
Manuel Klimek | 141f1d2 | 2013-04-17 15:02:12 +0000 | [diff] [blame] | 30 | cd clang/tools |
| 31 | git clone http://llvm.org/git/clang-tools-extra.git extra |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 32 | |
| 33 | Next you need to obtain the CMake build system and Ninja build tool. You |
| 34 | may already have CMake installed, but current binary versions of CMake |
| 35 | aren't built with Ninja support. |
| 36 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 37 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 38 | |
| 39 | cd ~/clang-llvm |
| 40 | git clone https://github.com/martine/ninja.git |
| 41 | cd ninja |
| 42 | git checkout release |
| 43 | ./bootstrap.py |
| 44 | sudo cp ninja /usr/bin/ |
| 45 | |
| 46 | cd ~/clang-llvm |
| 47 | git clone git://cmake.org/stage/cmake.git |
| 48 | cd cmake |
| 49 | git checkout next |
| 50 | ./bootstrap |
| 51 | make |
| 52 | sudo make install |
| 53 | |
| 54 | Okay. Now we'll build Clang! |
| 55 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 56 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 57 | |
| 58 | cd ~/clang-llvm |
| 59 | mkdir build && cd build |
| 60 | cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON # Enable tests; default is off. |
| 61 | ninja |
| 62 | ninja check # Test LLVM only. |
| 63 | ninja clang-test # Test Clang only. |
| 64 | ninja install |
| 65 | |
| 66 | And we're live. |
| 67 | |
| 68 | All of the tests should pass, though there is a (very) small chance that |
| 69 | you can catch LLVM and Clang out of sync. Running ``'git svn rebase'`` |
| 70 | in both the llvm and clang directories should fix any problems. |
| 71 | |
| 72 | Finally, we want to set Clang as its own compiler. |
| 73 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 74 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 75 | |
| 76 | cd ~/clang-llvm/build |
| 77 | ccmake ../llvm |
| 78 | |
| 79 | The second command will bring up a GUI for configuring Clang. You need |
| 80 | to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on |
| 81 | advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to |
| 82 | ``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to |
| 83 | configure, then ``'g'`` to generate CMake's files. |
| 84 | |
| 85 | Finally, run ninja one last time, and you're done. |
| 86 | |
| 87 | Step 1: Create a ClangTool |
| 88 | ========================== |
| 89 | |
| 90 | Now that we have enough background knowledge, it's time to create the |
| 91 | simplest productive ClangTool in existence: a syntax checker. While this |
| 92 | already exists as ``clang-check``, it's important to understand what's |
| 93 | going on. |
| 94 | |
| 95 | First, we'll need to create a new directory for our tool and tell CMake |
| 96 | that it exists. As this is not going to be a core clang tool, it will |
| 97 | live in the ``tools/extra`` repository. |
| 98 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 99 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 100 | |
| 101 | cd ~/clang-llvm/llvm/tools/clang |
| 102 | mkdir tools/extra/loop-convert |
| 103 | echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt |
| 104 | vim tools/extra/loop-convert/CMakeLists.txt |
| 105 | |
| 106 | CMakeLists.txt should have the following contents: |
| 107 | |
| 108 | :: |
| 109 | |
| 110 | set(LLVM_LINK_COMPONENTS support) |
| 111 | set(LLVM_USED_LIBS clangTooling clangBasic clangAST) |
| 112 | |
| 113 | add_clang_executable(loop-convert |
| 114 | LoopConvert.cpp |
| 115 | ) |
| 116 | target_link_libraries(loop-convert |
| 117 | clangTooling |
| 118 | clangBasic |
| 119 | clangASTMatchers |
| 120 | ) |
| 121 | |
| 122 | With that done, Ninja will be able to compile our tool. Let's give it |
| 123 | something to compile! Put the following into |
| 124 | ``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of |
| 125 | why the different parts are needed can be found in the `LibTooling |
| 126 | documentation <LibTooling.html>`_. |
| 127 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 128 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 129 | |
| 130 | // Declares clang::SyntaxOnlyAction. |
| 131 | #include "clang/Frontend/FrontendActions.h" |
| 132 | #include "clang/Tooling/CommonOptionsParser.h" |
| 133 | #include "clang/Tooling/Tooling.h" |
| 134 | // Declares llvm::cl::extrahelp. |
| 135 | #include "llvm/Support/CommandLine.h" |
| 136 | |
| 137 | using namespace clang::tooling; |
| 138 | using namespace llvm; |
| 139 | |
Stephen Hines | 651f13c | 2014-04-23 16:59:28 -0700 | [diff] [blame] | 140 | // Apply a custom category to all command-line options so that they are the |
| 141 | // only ones displayed. |
| 142 | static llvm::cl::OptionCategory MyToolCategory("my-tool options"); |
| 143 | |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 144 | // CommonOptionsParser declares HelpMessage with a description of the common |
| 145 | // command-line options related to the compilation database and input files. |
| 146 | // It's nice to have this help message in all tools. |
| 147 | static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage); |
| 148 | |
| 149 | // A help message for this specific tool can be added afterwards. |
| 150 | static cl::extrahelp MoreHelp("\nMore help text..."); |
| 151 | |
| 152 | int main(int argc, const char **argv) { |
Stephen Hines | 651f13c | 2014-04-23 16:59:28 -0700 | [diff] [blame] | 153 | CommonOptionsParser OptionsParser(argc, argv, MyToolCategory); |
Edwin Vane | b1f67db | 2012-12-14 18:58:25 +0000 | [diff] [blame] | 154 | ClangTool Tool(OptionsParser.getCompilations(), |
| 155 | OptionsParser.getSourcePathList()); |
Stephen Hines | 6bcf27b | 2014-05-29 04:14:42 -0700 | [diff] [blame] | 156 | return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get()); |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 157 | } |
| 158 | |
| 159 | And that's it! You can compile our new tool by running ninja from the |
| 160 | ``build`` directory. |
| 161 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 162 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 163 | |
| 164 | cd ~/clang-llvm/build |
| 165 | ninja |
| 166 | |
| 167 | You should now be able to run the syntax checker, which is located in |
| 168 | ``~/clang-llvm/build/bin``, on any source file. Try it! |
| 169 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 170 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 171 | |
Manuel Klimek | 141f1d2 | 2013-04-17 15:02:12 +0000 | [diff] [blame] | 172 | cat "int main() { return 0; }" > test.cpp |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 173 | bin/loop-convert test.cpp -- |
| 174 | |
| 175 | Note the two dashes after we specify the source file. The additional |
| 176 | options for the compiler are passed after the dashes rather than loading |
| 177 | them from a compilation database - there just aren't any options needed |
| 178 | right now. |
| 179 | |
| 180 | Intermezzo: Learn AST matcher basics |
| 181 | ==================================== |
| 182 | |
| 183 | Clang recently introduced the :doc:`ASTMatcher |
| 184 | library <LibASTMatchers>` to provide a simple, powerful, and |
| 185 | concise way to describe specific patterns in the AST. Implemented as a |
| 186 | DSL powered by macros and templates (see |
| 187 | `ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're |
| 188 | curious), matchers offer the feel of algebraic data types common to |
| 189 | functional programming languages. |
| 190 | |
| 191 | For example, suppose you wanted to examine only binary operators. There |
| 192 | is a matcher to do exactly that, conveniently named ``binaryOperator``. |
| 193 | I'll give you one guess what this matcher does: |
| 194 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 195 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 196 | |
| 197 | binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0)))) |
| 198 | |
| 199 | Shockingly, it will match against addition expressions whose left hand |
| 200 | side is exactly the literal 0. It will not match against other forms of |
| 201 | 0, such as ``'\0'`` or ``NULL``, but it will match against macros that |
| 202 | expand to 0. The matcher will also not match against calls to the |
| 203 | overloaded operator ``'+'``, as there is a separate ``operatorCallExpr`` |
| 204 | matcher to handle overloaded operators. |
| 205 | |
| 206 | There are AST matchers to match all the different nodes of the AST, |
| 207 | narrowing matchers to only match AST nodes fulfilling specific criteria, |
| 208 | and traversal matchers to get from one kind of AST node to another. For |
| 209 | a complete list of AST matchers, take a look at the `AST Matcher |
| 210 | References <LibASTMatchersReference.html>`_ |
| 211 | |
| 212 | All matcher that are nouns describe entities in the AST and can be |
| 213 | bound, so that they can be referred to whenever a match is found. To do |
| 214 | so, simply call the method ``bind`` on these matchers, e.g.: |
| 215 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 216 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 217 | |
| 218 | variable(hasType(isInteger())).bind("intvar") |
| 219 | |
| 220 | Step 2: Using AST matchers |
| 221 | ========================== |
| 222 | |
| 223 | Okay, on to using matchers for real. Let's start by defining a matcher |
| 224 | which will capture all ``for`` statements that define a new variable |
| 225 | initialized to zero. Let's start with matching all ``for`` loops: |
| 226 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 227 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 228 | |
| 229 | forStmt() |
| 230 | |
| 231 | Next, we want to specify that a single variable is declared in the first |
| 232 | portion of the loop, so we can extend the matcher to |
| 233 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 234 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 235 | |
| 236 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl())))) |
| 237 | |
| 238 | Finally, we can add the condition that the variable is initialized to |
| 239 | zero. |
| 240 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 241 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 242 | |
| 243 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( |
| 244 | hasInitializer(integerLiteral(equals(0)))))))) |
| 245 | |
| 246 | It is fairly easy to read and understand the matcher definition ("match |
| 247 | loops whose init portion declares a single variable which is initialized |
| 248 | to the integer literal 0"), but deciding that every piece is necessary |
| 249 | is more difficult. Note that this matcher will not match loops whose |
| 250 | variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of |
| 251 | zero besides the integer 0. |
| 252 | |
| 253 | The last step is giving the matcher a name and binding the ``ForStmt`` |
| 254 | as we will want to do something with it: |
| 255 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 256 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 257 | |
| 258 | StatementMatcher LoopMatcher = |
| 259 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( |
| 260 | hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); |
| 261 | |
| 262 | Once you have defined your matchers, you will need to add a little more |
| 263 | scaffolding in order to run them. Matchers are paired with a |
| 264 | ``MatchCallback`` and registered with a ``MatchFinder`` object, then run |
| 265 | from a ``ClangTool``. More code! |
| 266 | |
| 267 | Add the following to ``LoopConvert.cpp``: |
| 268 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 269 | .. code-block:: c++ |
| 270 | |
Manuel Klimek | e3a2b81 | 2013-03-04 11:31:46 +0000 | [diff] [blame] | 271 | #include "clang/ASTMatchers/ASTMatchers.h" |
| 272 | #include "clang/ASTMatchers/ASTMatchFinder.h" |
| 273 | |
| 274 | using namespace clang; |
| 275 | using namespace clang::ast_matchers; |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 276 | |
| 277 | StatementMatcher LoopMatcher = |
| 278 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( |
| 279 | hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); |
| 280 | |
| 281 | class LoopPrinter : public MatchFinder::MatchCallback { |
| 282 | public : |
| 283 | virtual void run(const MatchFinder::MatchResult &Result) { |
Manuel Klimek | 141f1d2 | 2013-04-17 15:02:12 +0000 | [diff] [blame] | 284 | if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop")) |
| 285 | FS->dump(); |
| 286 | } |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 287 | }; |
| 288 | |
| 289 | And change ``main()`` to: |
| 290 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 291 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 292 | |
| 293 | int main(int argc, const char **argv) { |
Stephen Hines | 651f13c | 2014-04-23 16:59:28 -0700 | [diff] [blame] | 294 | CommonOptionsParser OptionsParser(argc, argv, MyToolCategory); |
Edwin Vane | b1f67db | 2012-12-14 18:58:25 +0000 | [diff] [blame] | 295 | ClangTool Tool(OptionsParser.getCompilations(), |
| 296 | OptionsParser.getSourcePathList()); |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 297 | |
| 298 | LoopPrinter Printer; |
| 299 | MatchFinder Finder; |
| 300 | Finder.addMatcher(LoopMatcher, &Printer); |
| 301 | |
Stephen Hines | 6bcf27b | 2014-05-29 04:14:42 -0700 | [diff] [blame] | 302 | return Tool.run(newFrontendActionFactory(&Finder).get()); |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 303 | } |
| 304 | |
| 305 | Now, you should be able to recompile and run the code to discover for |
| 306 | loops. Create a new file with a few examples, and test out our new |
| 307 | handiwork: |
| 308 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 309 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 310 | |
| 311 | cd ~/clang-llvm/llvm/llvm_build/ |
| 312 | ninja loop-convert |
| 313 | vim ~/test-files/simple-loops.cc |
| 314 | bin/loop-convert ~/test-files/simple-loops.cc |
| 315 | |
| 316 | Step 3.5: More Complicated Matchers |
| 317 | =================================== |
| 318 | |
| 319 | Our simple matcher is capable of discovering for loops, but we would |
| 320 | still need to filter out many more ourselves. We can do a good portion |
| 321 | of the remaining work with some cleverly chosen matchers, but first we |
| 322 | need to decide exactly which properties we want to allow. |
| 323 | |
| 324 | How can we characterize for loops over arrays which would be eligible |
| 325 | for translation to range-based syntax? Range based loops over arrays of |
| 326 | size ``N`` that: |
| 327 | |
| 328 | - start at index ``0`` |
| 329 | - iterate consecutively |
| 330 | - end at index ``N-1`` |
| 331 | |
| 332 | We already check for (1), so all we need to add is a check to the loop's |
| 333 | condition to ensure that the loop's index variable is compared against |
| 334 | ``N`` and another check to ensure that the increment step just |
| 335 | increments this same variable. The matcher for (2) is straightforward: |
| 336 | require a pre- or post-increment of the same variable declared in the |
| 337 | init portion. |
| 338 | |
| 339 | Unfortunately, such a matcher is impossible to write. Matchers contain |
| 340 | no logic for comparing two arbitrary AST nodes and determining whether |
| 341 | or not they are equal, so the best we can do is matching more than we |
| 342 | would like to allow, and punting extra comparisons to the callback. |
| 343 | |
| 344 | In any case, we can start building this sub-matcher. We can require that |
| 345 | the increment step be a unary increment like this: |
| 346 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 347 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 348 | |
| 349 | hasIncrement(unaryOperator(hasOperatorName("++"))) |
| 350 | |
| 351 | Specifying what is incremented introduces another quirk of Clang's AST: |
| 352 | Usages of variables are represented as ``DeclRefExpr``'s ("declaration |
| 353 | reference expressions") because they are expressions which refer to |
| 354 | variable declarations. To find a ``unaryOperator`` that refers to a |
| 355 | specific declaration, we can simply add a second condition to it: |
| 356 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 357 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 358 | |
| 359 | hasIncrement(unaryOperator( |
| 360 | hasOperatorName("++"), |
| 361 | hasUnaryOperand(declRefExpr()))) |
| 362 | |
| 363 | Furthermore, we can restrict our matcher to only match if the |
| 364 | incremented variable is an integer: |
| 365 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 366 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 367 | |
| 368 | hasIncrement(unaryOperator( |
| 369 | hasOperatorName("++"), |
| 370 | hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger()))))))) |
| 371 | |
| 372 | And the last step will be to attach an identifier to this variable, so |
| 373 | that we can retrieve it in the callback: |
| 374 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 375 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 376 | |
| 377 | hasIncrement(unaryOperator( |
| 378 | hasOperatorName("++"), |
| 379 | hasUnaryOperand(declRefExpr(to( |
| 380 | varDecl(hasType(isInteger())).bind("incrementVariable")))))) |
| 381 | |
| 382 | We can add this code to the definition of ``LoopMatcher`` and make sure |
| 383 | that our program, outfitted with the new matcher, only prints out loops |
| 384 | that declare a single variable initialized to zero and have an increment |
| 385 | step consisting of a unary increment of some variable. |
| 386 | |
| 387 | Now, we just need to add a matcher to check if the condition part of the |
| 388 | ``for`` loop compares a variable against the size of the array. There is |
| 389 | only one problem - we don't know which array we're iterating over |
| 390 | without looking at the body of the loop! We are again restricted to |
| 391 | approximating the result we want with matchers, filling in the details |
| 392 | in the callback. So we start with: |
| 393 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 394 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 395 | |
| 396 | hasCondition(binaryOperator(hasOperatorName("<")) |
| 397 | |
| 398 | It makes sense to ensure that the left-hand side is a reference to a |
| 399 | variable, and that the right-hand side has integer type. |
| 400 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 401 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 402 | |
| 403 | hasCondition(binaryOperator( |
| 404 | hasOperatorName("<"), |
Edwin Vane | 7e6f23a | 2013-03-05 18:04:37 +0000 | [diff] [blame] | 405 | hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))), |
| 406 | hasRHS(expr(hasType(isInteger()))))) |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 407 | |
| 408 | Why? Because it doesn't work. Of the three loops provided in |
| 409 | ``test-files/simple.cpp``, zero of them have a matching condition. A |
| 410 | quick look at the AST dump of the first for loop, produced by the |
| 411 | previous iteration of loop-convert, shows us the answer: |
| 412 | |
| 413 | :: |
| 414 | |
| 415 | (ForStmt 0x173b240 |
| 416 | (DeclStmt 0x173afc8 |
| 417 | 0x173af50 "int i = |
| 418 | (IntegerLiteral 0x173afa8 'int' 0)") |
| 419 | <<>> |
| 420 | (BinaryOperator 0x173b060 '_Bool' '<' |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 421 | (ImplicitCastExpr 0x173b030 'int' |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 422 | (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int')) |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 423 | (ImplicitCastExpr 0x173b048 'int' |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 424 | (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int'))) |
| 425 | (UnaryOperator 0x173b0b0 'int' lvalue prefix '++' |
| 426 | (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int')) |
Dmitri Gribenko | ebf8fe2 | 2013-04-27 16:26:29 +0000 | [diff] [blame] | 427 | (CompoundStatement ... |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 428 | |
| 429 | We already know that the declaration and increments both match, or this |
| 430 | loop wouldn't have been dumped. The culprit lies in the implicit cast |
| 431 | applied to the first operand (i.e. the LHS) of the less-than operator, |
| 432 | an L-value to R-value conversion applied to the expression referencing |
| 433 | ``i``. Thankfully, the matcher library offers a solution to this problem |
| 434 | in the form of ``ignoringParenImpCasts``, which instructs the matcher to |
| 435 | ignore implicit casts and parentheses before continuing to match. |
| 436 | Adjusting the condition operator will restore the desired match. |
| 437 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 438 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 439 | |
| 440 | hasCondition(binaryOperator( |
| 441 | hasOperatorName("<"), |
Edwin Vane | 7e6f23a | 2013-03-05 18:04:37 +0000 | [diff] [blame] | 442 | hasLHS(ignoringParenImpCasts(declRefExpr( |
| 443 | to(varDecl(hasType(isInteger())))))), |
| 444 | hasRHS(expr(hasType(isInteger()))))) |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 445 | |
| 446 | After adding binds to the expressions we wished to capture and |
| 447 | extracting the identifier strings into variables, we have array-step-2 |
| 448 | completed. |
| 449 | |
| 450 | Step 4: Retrieving Matched Nodes |
| 451 | ================================ |
| 452 | |
| 453 | So far, the matcher callback isn't very interesting: it just dumps the |
| 454 | loop's AST. At some point, we will need to make changes to the input |
| 455 | source code. Next, we'll work on using the nodes we bound in the |
| 456 | previous step. |
| 457 | |
| 458 | The ``MatchFinder::run()`` callback takes a |
| 459 | ``MatchFinder::MatchResult&`` as its parameter. We're most interested in |
| 460 | its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext`` |
| 461 | class to represent contextual information about the AST, as the name |
| 462 | implies, though the most functionally important detail is that several |
| 463 | operations require an ``ASTContext*`` parameter. More immediately useful |
| 464 | is the set of matched nodes, and how we retrieve them. |
| 465 | |
| 466 | Since we bind three variables (identified by ConditionVarName, |
| 467 | InitVarName, and IncrementVarName), we can obtain the matched nodes by |
| 468 | using the ``getNodeAs()`` member function. |
| 469 | |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 470 | In ``LoopConvert.cpp`` add |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 471 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 472 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 473 | |
| 474 | #include "clang/AST/ASTContext.h" |
| 475 | |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 476 | Change ``LoopMatcher`` to |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 477 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 478 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 479 | |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 480 | StatementMatcher LoopMatcher = |
| 481 | forStmt(hasLoopInit(declStmt( |
| 482 | hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0)))) |
| 483 | .bind("initVarName")))), |
| 484 | hasIncrement(unaryOperator( |
| 485 | hasOperatorName("++"), |
| 486 | hasUnaryOperand(declRefExpr( |
| 487 | to(varDecl(hasType(isInteger())).bind("incVarName")))))), |
| 488 | hasCondition(binaryOperator( |
| 489 | hasOperatorName("<"), |
| 490 | hasLHS(ignoringParenImpCasts(declRefExpr( |
| 491 | to(varDecl(hasType(isInteger())).bind("condVarName"))))), |
| 492 | hasRHS(expr(hasType(isInteger())))))).bind("forLoop"); |
| 493 | |
| 494 | And change ``LoopPrinter::run`` to |
| 495 | |
| 496 | .. code-block:: c++ |
| 497 | |
| 498 | void LoopPrinter::run(const MatchFinder::MatchResult &Result) { |
| 499 | ASTContext *Context = Result.Context; |
| 500 | const ForStmt *FS = Result.Nodes.getStmtAs<ForStmt>("forLoop"); |
| 501 | // We do not want to convert header files! |
| 502 | if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc())) |
| 503 | return; |
| 504 | const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName"); |
| 505 | const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName"); |
| 506 | const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName"); |
| 507 | |
| 508 | if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar)) |
| 509 | return; |
| 510 | llvm::outs() << "Potential array-based loop discovered.\n"; |
| 511 | } |
| 512 | |
| 513 | Clang associates a ``VarDecl`` with each variable to represent the variable's |
| 514 | declaration. Since the "canonical" form of each declaration is unique by |
| 515 | address, all we need to do is make sure neither ``ValueDecl`` (base class of |
| 516 | ``VarDecl``) is ``NULL`` and compare the canonical Decls. |
| 517 | |
| 518 | .. code-block:: c++ |
| 519 | |
| 520 | static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) { |
| 521 | return First && Second && |
| 522 | First->getCanonicalDecl() == Second->getCanonicalDecl(); |
| 523 | } |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 524 | |
| 525 | If execution reaches the end of ``LoopPrinter::run()``, we know that the |
| 526 | loop shell that looks like |
| 527 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 528 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 529 | |
| 530 | for (int i= 0; i < expr(); ++i) { ... } |
| 531 | |
| 532 | For now, we will just print a message explaining that we found a loop. |
| 533 | The next section will deal with recursively traversing the AST to |
| 534 | discover all changes needed. |
| 535 | |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 536 | As a side note, it's not as trivial to test if two expressions are the same, |
| 537 | though Clang has already done the hard work for us by providing a way to |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 538 | canonicalize expressions: |
| 539 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 540 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 541 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 542 | static bool areSameExpr(ASTContext *Context, const Expr *First, |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 543 | const Expr *Second) { |
| 544 | if (!First || !Second) |
| 545 | return false; |
| 546 | llvm::FoldingSetNodeID FirstID, SecondID; |
| 547 | First->Profile(FirstID, *Context, true); |
| 548 | Second->Profile(SecondID, *Context, true); |
| 549 | return FirstID == SecondID; |
| 550 | } |
| 551 | |
| 552 | This code relies on the comparison between two |
| 553 | ``llvm::FoldingSetNodeIDs``. As the documentation for |
| 554 | ``Stmt::Profile()`` indicates, the ``Profile()`` member function builds |
| 555 | a description of a node in the AST, based on its properties, along with |
| 556 | those of its children. ``FoldingSetNodeID`` then serves as a hash we can |
| 557 | use to compare expressions. We will need ``areSameExpr`` later. Before |
| 558 | you run the new code on the additional loops added to |
| 559 | test-files/simple.cpp, try to figure out which ones will be considered |
| 560 | potentially convertible. |