Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 1 | =============================================================== |
| 2 | Tutorial for building tools using LibTooling and LibASTMatchers |
| 3 | =============================================================== |
| 4 | |
| 5 | This document is intended to show how to build a useful source-to-source |
| 6 | translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is |
| 7 | explicitly aimed at people who are new to Clang, so all you should need |
| 8 | is a working knowledge of C++ and the command line. |
| 9 | |
| 10 | In order to work on the compiler, you need some basic knowledge of the |
| 11 | abstract syntax tree (AST). To this end, the reader is incouraged to |
| 12 | skim the :doc:`Introduction to the Clang |
| 13 | AST <IntroductionToTheClangAST>` |
| 14 | |
| 15 | Step 0: Obtaining Clang |
| 16 | ======================= |
| 17 | |
| 18 | As Clang is part of the LLVM project, you'll need to download LLVM's |
| 19 | source code first. Both Clang and LLVM are maintained as Subversion |
| 20 | repositories, but we'll be accessing them through the git mirror. For |
| 21 | further information, see the `getting started |
| 22 | guide <http://llvm.org/docs/GettingStarted.html>`_. |
| 23 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 24 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 25 | |
| 26 | mkdir ~/clang-llvm && cd ~/clang-llvm |
| 27 | git clone http://llvm.org/git/llvm.git |
| 28 | cd llvm/tools |
| 29 | git clone http://llvm.org/git/clang.git |
Manuel Klimek | 141f1d2 | 2013-04-17 15:02:12 +0000 | [diff] [blame] | 30 | cd clang/tools |
| 31 | git clone http://llvm.org/git/clang-tools-extra.git extra |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 32 | |
| 33 | Next you need to obtain the CMake build system and Ninja build tool. You |
| 34 | may already have CMake installed, but current binary versions of CMake |
| 35 | aren't built with Ninja support. |
| 36 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 37 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 38 | |
| 39 | cd ~/clang-llvm |
| 40 | git clone https://github.com/martine/ninja.git |
| 41 | cd ninja |
| 42 | git checkout release |
| 43 | ./bootstrap.py |
| 44 | sudo cp ninja /usr/bin/ |
| 45 | |
| 46 | cd ~/clang-llvm |
| 47 | git clone git://cmake.org/stage/cmake.git |
| 48 | cd cmake |
| 49 | git checkout next |
| 50 | ./bootstrap |
| 51 | make |
| 52 | sudo make install |
| 53 | |
| 54 | Okay. Now we'll build Clang! |
| 55 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 56 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 57 | |
| 58 | cd ~/clang-llvm |
| 59 | mkdir build && cd build |
| 60 | cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON # Enable tests; default is off. |
| 61 | ninja |
| 62 | ninja check # Test LLVM only. |
| 63 | ninja clang-test # Test Clang only. |
| 64 | ninja install |
| 65 | |
| 66 | And we're live. |
| 67 | |
| 68 | All of the tests should pass, though there is a (very) small chance that |
| 69 | you can catch LLVM and Clang out of sync. Running ``'git svn rebase'`` |
| 70 | in both the llvm and clang directories should fix any problems. |
| 71 | |
| 72 | Finally, we want to set Clang as its own compiler. |
| 73 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 74 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 75 | |
| 76 | cd ~/clang-llvm/build |
| 77 | ccmake ../llvm |
| 78 | |
| 79 | The second command will bring up a GUI for configuring Clang. You need |
| 80 | to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on |
| 81 | advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to |
| 82 | ``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to |
| 83 | configure, then ``'g'`` to generate CMake's files. |
| 84 | |
| 85 | Finally, run ninja one last time, and you're done. |
| 86 | |
| 87 | Step 1: Create a ClangTool |
| 88 | ========================== |
| 89 | |
| 90 | Now that we have enough background knowledge, it's time to create the |
| 91 | simplest productive ClangTool in existence: a syntax checker. While this |
| 92 | already exists as ``clang-check``, it's important to understand what's |
| 93 | going on. |
| 94 | |
| 95 | First, we'll need to create a new directory for our tool and tell CMake |
| 96 | that it exists. As this is not going to be a core clang tool, it will |
| 97 | live in the ``tools/extra`` repository. |
| 98 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 99 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 100 | |
| 101 | cd ~/clang-llvm/llvm/tools/clang |
| 102 | mkdir tools/extra/loop-convert |
| 103 | echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt |
| 104 | vim tools/extra/loop-convert/CMakeLists.txt |
| 105 | |
| 106 | CMakeLists.txt should have the following contents: |
| 107 | |
| 108 | :: |
| 109 | |
| 110 | set(LLVM_LINK_COMPONENTS support) |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 111 | |
| 112 | add_clang_executable(loop-convert |
| 113 | LoopConvert.cpp |
| 114 | ) |
| 115 | target_link_libraries(loop-convert |
| 116 | clangTooling |
| 117 | clangBasic |
| 118 | clangASTMatchers |
| 119 | ) |
| 120 | |
| 121 | With that done, Ninja will be able to compile our tool. Let's give it |
| 122 | something to compile! Put the following into |
| 123 | ``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of |
| 124 | why the different parts are needed can be found in the `LibTooling |
| 125 | documentation <LibTooling.html>`_. |
| 126 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 127 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 128 | |
| 129 | // Declares clang::SyntaxOnlyAction. |
| 130 | #include "clang/Frontend/FrontendActions.h" |
| 131 | #include "clang/Tooling/CommonOptionsParser.h" |
| 132 | #include "clang/Tooling/Tooling.h" |
| 133 | // Declares llvm::cl::extrahelp. |
| 134 | #include "llvm/Support/CommandLine.h" |
| 135 | |
| 136 | using namespace clang::tooling; |
| 137 | using namespace llvm; |
| 138 | |
Stephen Hines | 651f13c | 2014-04-23 16:59:28 -0700 | [diff] [blame] | 139 | // Apply a custom category to all command-line options so that they are the |
| 140 | // only ones displayed. |
| 141 | static llvm::cl::OptionCategory MyToolCategory("my-tool options"); |
| 142 | |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 143 | // CommonOptionsParser declares HelpMessage with a description of the common |
| 144 | // command-line options related to the compilation database and input files. |
| 145 | // It's nice to have this help message in all tools. |
| 146 | static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage); |
| 147 | |
| 148 | // A help message for this specific tool can be added afterwards. |
| 149 | static cl::extrahelp MoreHelp("\nMore help text..."); |
| 150 | |
| 151 | int main(int argc, const char **argv) { |
Stephen Hines | 651f13c | 2014-04-23 16:59:28 -0700 | [diff] [blame] | 152 | CommonOptionsParser OptionsParser(argc, argv, MyToolCategory); |
Edwin Vane | b1f67db | 2012-12-14 18:58:25 +0000 | [diff] [blame] | 153 | ClangTool Tool(OptionsParser.getCompilations(), |
| 154 | OptionsParser.getSourcePathList()); |
Stephen Hines | 6bcf27b | 2014-05-29 04:14:42 -0700 | [diff] [blame] | 155 | return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get()); |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 156 | } |
| 157 | |
| 158 | And that's it! You can compile our new tool by running ninja from the |
| 159 | ``build`` directory. |
| 160 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 161 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 162 | |
| 163 | cd ~/clang-llvm/build |
| 164 | ninja |
| 165 | |
| 166 | You should now be able to run the syntax checker, which is located in |
| 167 | ``~/clang-llvm/build/bin``, on any source file. Try it! |
| 168 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 169 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 170 | |
Pirama Arumuga Nainar | b6d6993 | 2015-07-01 12:25:36 -0700 | [diff] [blame] | 171 | echo "int main() { return 0; }" > test.cpp |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 172 | bin/loop-convert test.cpp -- |
| 173 | |
| 174 | Note the two dashes after we specify the source file. The additional |
| 175 | options for the compiler are passed after the dashes rather than loading |
| 176 | them from a compilation database - there just aren't any options needed |
| 177 | right now. |
| 178 | |
| 179 | Intermezzo: Learn AST matcher basics |
| 180 | ==================================== |
| 181 | |
| 182 | Clang recently introduced the :doc:`ASTMatcher |
| 183 | library <LibASTMatchers>` to provide a simple, powerful, and |
| 184 | concise way to describe specific patterns in the AST. Implemented as a |
| 185 | DSL powered by macros and templates (see |
| 186 | `ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're |
| 187 | curious), matchers offer the feel of algebraic data types common to |
| 188 | functional programming languages. |
| 189 | |
| 190 | For example, suppose you wanted to examine only binary operators. There |
| 191 | is a matcher to do exactly that, conveniently named ``binaryOperator``. |
| 192 | I'll give you one guess what this matcher does: |
| 193 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 194 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 195 | |
| 196 | binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0)))) |
| 197 | |
| 198 | Shockingly, it will match against addition expressions whose left hand |
| 199 | side is exactly the literal 0. It will not match against other forms of |
| 200 | 0, such as ``'\0'`` or ``NULL``, but it will match against macros that |
| 201 | expand to 0. The matcher will also not match against calls to the |
| 202 | overloaded operator ``'+'``, as there is a separate ``operatorCallExpr`` |
| 203 | matcher to handle overloaded operators. |
| 204 | |
| 205 | There are AST matchers to match all the different nodes of the AST, |
| 206 | narrowing matchers to only match AST nodes fulfilling specific criteria, |
| 207 | and traversal matchers to get from one kind of AST node to another. For |
| 208 | a complete list of AST matchers, take a look at the `AST Matcher |
| 209 | References <LibASTMatchersReference.html>`_ |
| 210 | |
| 211 | All matcher that are nouns describe entities in the AST and can be |
| 212 | bound, so that they can be referred to whenever a match is found. To do |
| 213 | so, simply call the method ``bind`` on these matchers, e.g.: |
| 214 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 215 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 216 | |
| 217 | variable(hasType(isInteger())).bind("intvar") |
| 218 | |
| 219 | Step 2: Using AST matchers |
| 220 | ========================== |
| 221 | |
| 222 | Okay, on to using matchers for real. Let's start by defining a matcher |
| 223 | which will capture all ``for`` statements that define a new variable |
| 224 | initialized to zero. Let's start with matching all ``for`` loops: |
| 225 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 226 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 227 | |
| 228 | forStmt() |
| 229 | |
| 230 | Next, we want to specify that a single variable is declared in the first |
| 231 | portion of the loop, so we can extend the matcher to |
| 232 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 233 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 234 | |
| 235 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl())))) |
| 236 | |
| 237 | Finally, we can add the condition that the variable is initialized to |
| 238 | zero. |
| 239 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 240 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 241 | |
| 242 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( |
| 243 | hasInitializer(integerLiteral(equals(0)))))))) |
| 244 | |
| 245 | It is fairly easy to read and understand the matcher definition ("match |
| 246 | loops whose init portion declares a single variable which is initialized |
| 247 | to the integer literal 0"), but deciding that every piece is necessary |
| 248 | is more difficult. Note that this matcher will not match loops whose |
| 249 | variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of |
| 250 | zero besides the integer 0. |
| 251 | |
| 252 | The last step is giving the matcher a name and binding the ``ForStmt`` |
| 253 | as we will want to do something with it: |
| 254 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 255 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 256 | |
| 257 | StatementMatcher LoopMatcher = |
| 258 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( |
| 259 | hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); |
| 260 | |
| 261 | Once you have defined your matchers, you will need to add a little more |
| 262 | scaffolding in order to run them. Matchers are paired with a |
| 263 | ``MatchCallback`` and registered with a ``MatchFinder`` object, then run |
| 264 | from a ``ClangTool``. More code! |
| 265 | |
| 266 | Add the following to ``LoopConvert.cpp``: |
| 267 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 268 | .. code-block:: c++ |
| 269 | |
Manuel Klimek | e3a2b81 | 2013-03-04 11:31:46 +0000 | [diff] [blame] | 270 | #include "clang/ASTMatchers/ASTMatchers.h" |
| 271 | #include "clang/ASTMatchers/ASTMatchFinder.h" |
| 272 | |
| 273 | using namespace clang; |
| 274 | using namespace clang::ast_matchers; |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 275 | |
| 276 | StatementMatcher LoopMatcher = |
| 277 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( |
| 278 | hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); |
| 279 | |
| 280 | class LoopPrinter : public MatchFinder::MatchCallback { |
| 281 | public : |
| 282 | virtual void run(const MatchFinder::MatchResult &Result) { |
Manuel Klimek | 141f1d2 | 2013-04-17 15:02:12 +0000 | [diff] [blame] | 283 | if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop")) |
| 284 | FS->dump(); |
| 285 | } |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 286 | }; |
| 287 | |
| 288 | And change ``main()`` to: |
| 289 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 290 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 291 | |
| 292 | int main(int argc, const char **argv) { |
Stephen Hines | 651f13c | 2014-04-23 16:59:28 -0700 | [diff] [blame] | 293 | CommonOptionsParser OptionsParser(argc, argv, MyToolCategory); |
Edwin Vane | b1f67db | 2012-12-14 18:58:25 +0000 | [diff] [blame] | 294 | ClangTool Tool(OptionsParser.getCompilations(), |
| 295 | OptionsParser.getSourcePathList()); |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 296 | |
| 297 | LoopPrinter Printer; |
| 298 | MatchFinder Finder; |
| 299 | Finder.addMatcher(LoopMatcher, &Printer); |
| 300 | |
Stephen Hines | 6bcf27b | 2014-05-29 04:14:42 -0700 | [diff] [blame] | 301 | return Tool.run(newFrontendActionFactory(&Finder).get()); |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 302 | } |
| 303 | |
| 304 | Now, you should be able to recompile and run the code to discover for |
| 305 | loops. Create a new file with a few examples, and test out our new |
| 306 | handiwork: |
| 307 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 308 | .. code-block:: console |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 309 | |
| 310 | cd ~/clang-llvm/llvm/llvm_build/ |
| 311 | ninja loop-convert |
| 312 | vim ~/test-files/simple-loops.cc |
| 313 | bin/loop-convert ~/test-files/simple-loops.cc |
| 314 | |
| 315 | Step 3.5: More Complicated Matchers |
| 316 | =================================== |
| 317 | |
| 318 | Our simple matcher is capable of discovering for loops, but we would |
| 319 | still need to filter out many more ourselves. We can do a good portion |
| 320 | of the remaining work with some cleverly chosen matchers, but first we |
| 321 | need to decide exactly which properties we want to allow. |
| 322 | |
| 323 | How can we characterize for loops over arrays which would be eligible |
| 324 | for translation to range-based syntax? Range based loops over arrays of |
| 325 | size ``N`` that: |
| 326 | |
| 327 | - start at index ``0`` |
| 328 | - iterate consecutively |
| 329 | - end at index ``N-1`` |
| 330 | |
| 331 | We already check for (1), so all we need to add is a check to the loop's |
| 332 | condition to ensure that the loop's index variable is compared against |
| 333 | ``N`` and another check to ensure that the increment step just |
| 334 | increments this same variable. The matcher for (2) is straightforward: |
| 335 | require a pre- or post-increment of the same variable declared in the |
| 336 | init portion. |
| 337 | |
| 338 | Unfortunately, such a matcher is impossible to write. Matchers contain |
| 339 | no logic for comparing two arbitrary AST nodes and determining whether |
| 340 | or not they are equal, so the best we can do is matching more than we |
| 341 | would like to allow, and punting extra comparisons to the callback. |
| 342 | |
| 343 | In any case, we can start building this sub-matcher. We can require that |
| 344 | the increment step be a unary increment like this: |
| 345 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 346 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 347 | |
| 348 | hasIncrement(unaryOperator(hasOperatorName("++"))) |
| 349 | |
| 350 | Specifying what is incremented introduces another quirk of Clang's AST: |
| 351 | Usages of variables are represented as ``DeclRefExpr``'s ("declaration |
| 352 | reference expressions") because they are expressions which refer to |
| 353 | variable declarations. To find a ``unaryOperator`` that refers to a |
| 354 | specific declaration, we can simply add a second condition to it: |
| 355 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 356 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 357 | |
| 358 | hasIncrement(unaryOperator( |
| 359 | hasOperatorName("++"), |
| 360 | hasUnaryOperand(declRefExpr()))) |
| 361 | |
| 362 | Furthermore, we can restrict our matcher to only match if the |
| 363 | incremented variable is an integer: |
| 364 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 365 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 366 | |
| 367 | hasIncrement(unaryOperator( |
| 368 | hasOperatorName("++"), |
| 369 | hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger()))))))) |
| 370 | |
| 371 | And the last step will be to attach an identifier to this variable, so |
| 372 | that we can retrieve it in the callback: |
| 373 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 374 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 375 | |
| 376 | hasIncrement(unaryOperator( |
| 377 | hasOperatorName("++"), |
| 378 | hasUnaryOperand(declRefExpr(to( |
| 379 | varDecl(hasType(isInteger())).bind("incrementVariable")))))) |
| 380 | |
| 381 | We can add this code to the definition of ``LoopMatcher`` and make sure |
| 382 | that our program, outfitted with the new matcher, only prints out loops |
| 383 | that declare a single variable initialized to zero and have an increment |
| 384 | step consisting of a unary increment of some variable. |
| 385 | |
| 386 | Now, we just need to add a matcher to check if the condition part of the |
| 387 | ``for`` loop compares a variable against the size of the array. There is |
| 388 | only one problem - we don't know which array we're iterating over |
| 389 | without looking at the body of the loop! We are again restricted to |
| 390 | approximating the result we want with matchers, filling in the details |
| 391 | in the callback. So we start with: |
| 392 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 393 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 394 | |
| 395 | hasCondition(binaryOperator(hasOperatorName("<")) |
| 396 | |
| 397 | It makes sense to ensure that the left-hand side is a reference to a |
| 398 | variable, and that the right-hand side has integer type. |
| 399 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 400 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 401 | |
| 402 | hasCondition(binaryOperator( |
| 403 | hasOperatorName("<"), |
Edwin Vane | 7e6f23a | 2013-03-05 18:04:37 +0000 | [diff] [blame] | 404 | hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))), |
| 405 | hasRHS(expr(hasType(isInteger()))))) |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 406 | |
| 407 | Why? Because it doesn't work. Of the three loops provided in |
| 408 | ``test-files/simple.cpp``, zero of them have a matching condition. A |
| 409 | quick look at the AST dump of the first for loop, produced by the |
| 410 | previous iteration of loop-convert, shows us the answer: |
| 411 | |
| 412 | :: |
| 413 | |
| 414 | (ForStmt 0x173b240 |
| 415 | (DeclStmt 0x173afc8 |
| 416 | 0x173af50 "int i = |
| 417 | (IntegerLiteral 0x173afa8 'int' 0)") |
| 418 | <<>> |
| 419 | (BinaryOperator 0x173b060 '_Bool' '<' |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 420 | (ImplicitCastExpr 0x173b030 'int' |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 421 | (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int')) |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 422 | (ImplicitCastExpr 0x173b048 'int' |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 423 | (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int'))) |
| 424 | (UnaryOperator 0x173b0b0 'int' lvalue prefix '++' |
| 425 | (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int')) |
Dmitri Gribenko | ebf8fe2 | 2013-04-27 16:26:29 +0000 | [diff] [blame] | 426 | (CompoundStatement ... |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 427 | |
| 428 | We already know that the declaration and increments both match, or this |
| 429 | loop wouldn't have been dumped. The culprit lies in the implicit cast |
| 430 | applied to the first operand (i.e. the LHS) of the less-than operator, |
| 431 | an L-value to R-value conversion applied to the expression referencing |
| 432 | ``i``. Thankfully, the matcher library offers a solution to this problem |
| 433 | in the form of ``ignoringParenImpCasts``, which instructs the matcher to |
| 434 | ignore implicit casts and parentheses before continuing to match. |
| 435 | Adjusting the condition operator will restore the desired match. |
| 436 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 437 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 438 | |
| 439 | hasCondition(binaryOperator( |
| 440 | hasOperatorName("<"), |
Edwin Vane | 7e6f23a | 2013-03-05 18:04:37 +0000 | [diff] [blame] | 441 | hasLHS(ignoringParenImpCasts(declRefExpr( |
| 442 | to(varDecl(hasType(isInteger())))))), |
| 443 | hasRHS(expr(hasType(isInteger()))))) |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 444 | |
| 445 | After adding binds to the expressions we wished to capture and |
| 446 | extracting the identifier strings into variables, we have array-step-2 |
| 447 | completed. |
| 448 | |
| 449 | Step 4: Retrieving Matched Nodes |
| 450 | ================================ |
| 451 | |
| 452 | So far, the matcher callback isn't very interesting: it just dumps the |
| 453 | loop's AST. At some point, we will need to make changes to the input |
| 454 | source code. Next, we'll work on using the nodes we bound in the |
| 455 | previous step. |
| 456 | |
| 457 | The ``MatchFinder::run()`` callback takes a |
| 458 | ``MatchFinder::MatchResult&`` as its parameter. We're most interested in |
| 459 | its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext`` |
| 460 | class to represent contextual information about the AST, as the name |
| 461 | implies, though the most functionally important detail is that several |
| 462 | operations require an ``ASTContext*`` parameter. More immediately useful |
| 463 | is the set of matched nodes, and how we retrieve them. |
| 464 | |
| 465 | Since we bind three variables (identified by ConditionVarName, |
| 466 | InitVarName, and IncrementVarName), we can obtain the matched nodes by |
| 467 | using the ``getNodeAs()`` member function. |
| 468 | |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 469 | In ``LoopConvert.cpp`` add |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 470 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 471 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 472 | |
| 473 | #include "clang/AST/ASTContext.h" |
| 474 | |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 475 | Change ``LoopMatcher`` to |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 476 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 477 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 478 | |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 479 | StatementMatcher LoopMatcher = |
| 480 | forStmt(hasLoopInit(declStmt( |
| 481 | hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0)))) |
| 482 | .bind("initVarName")))), |
| 483 | hasIncrement(unaryOperator( |
| 484 | hasOperatorName("++"), |
| 485 | hasUnaryOperand(declRefExpr( |
| 486 | to(varDecl(hasType(isInteger())).bind("incVarName")))))), |
| 487 | hasCondition(binaryOperator( |
| 488 | hasOperatorName("<"), |
| 489 | hasLHS(ignoringParenImpCasts(declRefExpr( |
| 490 | to(varDecl(hasType(isInteger())).bind("condVarName"))))), |
| 491 | hasRHS(expr(hasType(isInteger())))))).bind("forLoop"); |
| 492 | |
| 493 | And change ``LoopPrinter::run`` to |
| 494 | |
| 495 | .. code-block:: c++ |
| 496 | |
| 497 | void LoopPrinter::run(const MatchFinder::MatchResult &Result) { |
| 498 | ASTContext *Context = Result.Context; |
| 499 | const ForStmt *FS = Result.Nodes.getStmtAs<ForStmt>("forLoop"); |
| 500 | // We do not want to convert header files! |
| 501 | if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc())) |
| 502 | return; |
| 503 | const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName"); |
| 504 | const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName"); |
| 505 | const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName"); |
| 506 | |
| 507 | if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar)) |
| 508 | return; |
| 509 | llvm::outs() << "Potential array-based loop discovered.\n"; |
| 510 | } |
| 511 | |
| 512 | Clang associates a ``VarDecl`` with each variable to represent the variable's |
| 513 | declaration. Since the "canonical" form of each declaration is unique by |
| 514 | address, all we need to do is make sure neither ``ValueDecl`` (base class of |
| 515 | ``VarDecl``) is ``NULL`` and compare the canonical Decls. |
| 516 | |
| 517 | .. code-block:: c++ |
| 518 | |
| 519 | static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) { |
| 520 | return First && Second && |
| 521 | First->getCanonicalDecl() == Second->getCanonicalDecl(); |
| 522 | } |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 523 | |
| 524 | If execution reaches the end of ``LoopPrinter::run()``, we know that the |
| 525 | loop shell that looks like |
| 526 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 527 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 528 | |
| 529 | for (int i= 0; i < expr(); ++i) { ... } |
| 530 | |
| 531 | For now, we will just print a message explaining that we found a loop. |
| 532 | The next section will deal with recursively traversing the AST to |
| 533 | discover all changes needed. |
| 534 | |
Manuel Klimek | 74df03a | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 535 | As a side note, it's not as trivial to test if two expressions are the same, |
| 536 | though Clang has already done the hard work for us by providing a way to |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 537 | canonicalize expressions: |
| 538 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 539 | .. code-block:: c++ |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 540 | |
Dmitri Gribenko | ba6efd3 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 541 | static bool areSameExpr(ASTContext *Context, const Expr *First, |
Sean Silva | 93ca021 | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 542 | const Expr *Second) { |
| 543 | if (!First || !Second) |
| 544 | return false; |
| 545 | llvm::FoldingSetNodeID FirstID, SecondID; |
| 546 | First->Profile(FirstID, *Context, true); |
| 547 | Second->Profile(SecondID, *Context, true); |
| 548 | return FirstID == SecondID; |
| 549 | } |
| 550 | |
| 551 | This code relies on the comparison between two |
| 552 | ``llvm::FoldingSetNodeIDs``. As the documentation for |
| 553 | ``Stmt::Profile()`` indicates, the ``Profile()`` member function builds |
| 554 | a description of a node in the AST, based on its properties, along with |
| 555 | those of its children. ``FoldingSetNodeID`` then serves as a hash we can |
| 556 | use to compare expressions. We will need ``areSameExpr`` later. Before |
| 557 | you run the new code on the additional loops added to |
| 558 | test-files/simple.cpp, try to figure out which ones will be considered |
| 559 | potentially convertible. |