Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 1 | =============================================================== |
| 2 | Tutorial for building tools using LibTooling and LibASTMatchers |
| 3 | =============================================================== |
| 4 | |
| 5 | This document is intended to show how to build a useful source-to-source |
| 6 | translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is |
| 7 | explicitly aimed at people who are new to Clang, so all you should need |
| 8 | is a working knowledge of C++ and the command line. |
| 9 | |
| 10 | In order to work on the compiler, you need some basic knowledge of the |
| 11 | abstract syntax tree (AST). To this end, the reader is incouraged to |
| 12 | skim the :doc:`Introduction to the Clang |
| 13 | AST <IntroductionToTheClangAST>` |
| 14 | |
| 15 | Step 0: Obtaining Clang |
| 16 | ======================= |
| 17 | |
| 18 | As Clang is part of the LLVM project, you'll need to download LLVM's |
James Y Knight | 5d71fc5 | 2019-01-29 16:37:27 +0000 | [diff] [blame] | 19 | source code first. Both Clang and LLVM are in the same git repository, |
| 20 | under different directories. For further information, see the `getting |
| 21 | started guide <https://llvm.org/docs/GettingStarted.html>`_. |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 22 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 23 | .. code-block:: console |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 24 | |
James Y Knight | 5d71fc5 | 2019-01-29 16:37:27 +0000 | [diff] [blame] | 25 | cd ~/clang-llvm |
| 26 | git clone https://github.com/llvm/llvm-project.git |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 27 | |
James Y Knight | 5d71fc5 | 2019-01-29 16:37:27 +0000 | [diff] [blame] | 28 | Next you need to obtain the CMake build system and Ninja build tool. |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 29 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 30 | .. code-block:: console |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 31 | |
| 32 | cd ~/clang-llvm |
| 33 | git clone https://github.com/martine/ninja.git |
| 34 | cd ninja |
| 35 | git checkout release |
| 36 | ./bootstrap.py |
| 37 | sudo cp ninja /usr/bin/ |
| 38 | |
| 39 | cd ~/clang-llvm |
| 40 | git clone git://cmake.org/stage/cmake.git |
| 41 | cd cmake |
| 42 | git checkout next |
| 43 | ./bootstrap |
| 44 | make |
| 45 | sudo make install |
| 46 | |
| 47 | Okay. Now we'll build Clang! |
| 48 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 49 | .. code-block:: console |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 50 | |
| 51 | cd ~/clang-llvm |
| 52 | mkdir build && cd build |
Ilya Biryukov | 37e1b41 | 2019-06-07 16:31:06 +0000 | [diff] [blame] | 53 | cmake -G Ninja ../llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DLLVM_BUILD_TESTS=ON # Enable tests; default is off. |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 54 | ninja |
| 55 | ninja check # Test LLVM only. |
| 56 | ninja clang-test # Test Clang only. |
| 57 | ninja install |
| 58 | |
| 59 | And we're live. |
| 60 | |
James Y Knight | 5d71fc5 | 2019-01-29 16:37:27 +0000 | [diff] [blame] | 61 | All of the tests should pass. |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 62 | |
| 63 | Finally, we want to set Clang as its own compiler. |
| 64 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 65 | .. code-block:: console |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 66 | |
| 67 | cd ~/clang-llvm/build |
| 68 | ccmake ../llvm |
| 69 | |
| 70 | The second command will bring up a GUI for configuring Clang. You need |
| 71 | to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on |
| 72 | advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to |
| 73 | ``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to |
| 74 | configure, then ``'g'`` to generate CMake's files. |
| 75 | |
| 76 | Finally, run ninja one last time, and you're done. |
| 77 | |
| 78 | Step 1: Create a ClangTool |
| 79 | ========================== |
| 80 | |
| 81 | Now that we have enough background knowledge, it's time to create the |
| 82 | simplest productive ClangTool in existence: a syntax checker. While this |
| 83 | already exists as ``clang-check``, it's important to understand what's |
| 84 | going on. |
| 85 | |
| 86 | First, we'll need to create a new directory for our tool and tell CMake |
| 87 | that it exists. As this is not going to be a core clang tool, it will |
Ilya Biryukov | 0d02dc6 | 2019-06-06 08:06:25 +0000 | [diff] [blame] | 88 | live in the ``clang-tools-extra`` repository. |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 89 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 90 | .. code-block:: console |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 91 | |
Ilya Biryukov | 0d02dc6 | 2019-06-06 08:06:25 +0000 | [diff] [blame] | 92 | cd ~/clang-llvm |
| 93 | mkdir clang-tools-extra/loop-convert |
| 94 | echo 'add_subdirectory(loop-convert)' >> clang-tools-extra/CMakeLists.txt |
| 95 | vim clang-tools-extra/loop-convert/CMakeLists.txt |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 96 | |
| 97 | CMakeLists.txt should have the following contents: |
| 98 | |
| 99 | :: |
| 100 | |
| 101 | set(LLVM_LINK_COMPONENTS support) |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 102 | |
| 103 | add_clang_executable(loop-convert |
| 104 | LoopConvert.cpp |
| 105 | ) |
| 106 | target_link_libraries(loop-convert |
Stephen Kelly | 97e0f52 | 2018-10-01 20:28:07 +0000 | [diff] [blame] | 107 | PRIVATE |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 108 | clangTooling |
| 109 | clangBasic |
| 110 | clangASTMatchers |
| 111 | ) |
| 112 | |
| 113 | With that done, Ninja will be able to compile our tool. Let's give it |
| 114 | something to compile! Put the following into |
Ilya Biryukov | 0d02dc6 | 2019-06-06 08:06:25 +0000 | [diff] [blame] | 115 | ``clang-tools-extra/loop-convert/LoopConvert.cpp``. A detailed explanation of |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 116 | why the different parts are needed can be found in the `LibTooling |
| 117 | documentation <LibTooling.html>`_. |
| 118 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 119 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 120 | |
| 121 | // Declares clang::SyntaxOnlyAction. |
| 122 | #include "clang/Frontend/FrontendActions.h" |
| 123 | #include "clang/Tooling/CommonOptionsParser.h" |
| 124 | #include "clang/Tooling/Tooling.h" |
| 125 | // Declares llvm::cl::extrahelp. |
| 126 | #include "llvm/Support/CommandLine.h" |
| 127 | |
| 128 | using namespace clang::tooling; |
| 129 | using namespace llvm; |
| 130 | |
Alexander Kornienko | aa7bca1 | 2014-02-19 16:11:38 +0000 | [diff] [blame] | 131 | // Apply a custom category to all command-line options so that they are the |
| 132 | // only ones displayed. |
| 133 | static llvm::cl::OptionCategory MyToolCategory("my-tool options"); |
| 134 | |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 135 | // CommonOptionsParser declares HelpMessage with a description of the common |
| 136 | // command-line options related to the compilation database and input files. |
| 137 | // It's nice to have this help message in all tools. |
| 138 | static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage); |
| 139 | |
| 140 | // A help message for this specific tool can be added afterwards. |
Stephane Sezer | 9601bf5 | 2018-05-08 19:46:29 +0000 | [diff] [blame] | 141 | static cl::extrahelp MoreHelp("\nMore help text...\n"); |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 142 | |
| 143 | int main(int argc, const char **argv) { |
Alexander Kornienko | aa7bca1 | 2014-02-19 16:11:38 +0000 | [diff] [blame] | 144 | CommonOptionsParser OptionsParser(argc, argv, MyToolCategory); |
Edwin Vane | 524741f | 2012-12-14 18:58:25 +0000 | [diff] [blame] | 145 | ClangTool Tool(OptionsParser.getCompilations(), |
| 146 | OptionsParser.getSourcePathList()); |
Richard Smith | 06900a0 | 2014-05-01 20:04:39 +0000 | [diff] [blame] | 147 | return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get()); |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 148 | } |
| 149 | |
| 150 | And that's it! You can compile our new tool by running ninja from the |
| 151 | ``build`` directory. |
| 152 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 153 | .. code-block:: console |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 154 | |
| 155 | cd ~/clang-llvm/build |
| 156 | ninja |
| 157 | |
| 158 | You should now be able to run the syntax checker, which is located in |
| 159 | ``~/clang-llvm/build/bin``, on any source file. Try it! |
| 160 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 161 | .. code-block:: console |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 162 | |
Douglas Katzman | 677324e | 2015-06-02 22:40:27 +0000 | [diff] [blame] | 163 | echo "int main() { return 0; }" > test.cpp |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 164 | bin/loop-convert test.cpp -- |
| 165 | |
| 166 | Note the two dashes after we specify the source file. The additional |
| 167 | options for the compiler are passed after the dashes rather than loading |
| 168 | them from a compilation database - there just aren't any options needed |
| 169 | right now. |
| 170 | |
| 171 | Intermezzo: Learn AST matcher basics |
| 172 | ==================================== |
| 173 | |
| 174 | Clang recently introduced the :doc:`ASTMatcher |
| 175 | library <LibASTMatchers>` to provide a simple, powerful, and |
| 176 | concise way to describe specific patterns in the AST. Implemented as a |
| 177 | DSL powered by macros and templates (see |
| 178 | `ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're |
| 179 | curious), matchers offer the feel of algebraic data types common to |
| 180 | functional programming languages. |
| 181 | |
| 182 | For example, suppose you wanted to examine only binary operators. There |
| 183 | is a matcher to do exactly that, conveniently named ``binaryOperator``. |
| 184 | I'll give you one guess what this matcher does: |
| 185 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 186 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 187 | |
| 188 | binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0)))) |
| 189 | |
| 190 | Shockingly, it will match against addition expressions whose left hand |
| 191 | side is exactly the literal 0. It will not match against other forms of |
| 192 | 0, such as ``'\0'`` or ``NULL``, but it will match against macros that |
| 193 | expand to 0. The matcher will also not match against calls to the |
| 194 | overloaded operator ``'+'``, as there is a separate ``operatorCallExpr`` |
| 195 | matcher to handle overloaded operators. |
| 196 | |
| 197 | There are AST matchers to match all the different nodes of the AST, |
| 198 | narrowing matchers to only match AST nodes fulfilling specific criteria, |
| 199 | and traversal matchers to get from one kind of AST node to another. For |
| 200 | a complete list of AST matchers, take a look at the `AST Matcher |
| 201 | References <LibASTMatchersReference.html>`_ |
| 202 | |
| 203 | All matcher that are nouns describe entities in the AST and can be |
| 204 | bound, so that they can be referred to whenever a match is found. To do |
| 205 | so, simply call the method ``bind`` on these matchers, e.g.: |
| 206 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 207 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 208 | |
| 209 | variable(hasType(isInteger())).bind("intvar") |
| 210 | |
| 211 | Step 2: Using AST matchers |
| 212 | ========================== |
| 213 | |
| 214 | Okay, on to using matchers for real. Let's start by defining a matcher |
| 215 | which will capture all ``for`` statements that define a new variable |
| 216 | initialized to zero. Let's start with matching all ``for`` loops: |
| 217 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 218 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 219 | |
| 220 | forStmt() |
| 221 | |
| 222 | Next, we want to specify that a single variable is declared in the first |
| 223 | portion of the loop, so we can extend the matcher to |
| 224 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 225 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 226 | |
| 227 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl())))) |
| 228 | |
| 229 | Finally, we can add the condition that the variable is initialized to |
| 230 | zero. |
| 231 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 232 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 233 | |
| 234 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( |
| 235 | hasInitializer(integerLiteral(equals(0)))))))) |
| 236 | |
| 237 | It is fairly easy to read and understand the matcher definition ("match |
| 238 | loops whose init portion declares a single variable which is initialized |
| 239 | to the integer literal 0"), but deciding that every piece is necessary |
| 240 | is more difficult. Note that this matcher will not match loops whose |
| 241 | variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of |
| 242 | zero besides the integer 0. |
| 243 | |
| 244 | The last step is giving the matcher a name and binding the ``ForStmt`` |
| 245 | as we will want to do something with it: |
| 246 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 247 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 248 | |
| 249 | StatementMatcher LoopMatcher = |
| 250 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( |
| 251 | hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); |
| 252 | |
| 253 | Once you have defined your matchers, you will need to add a little more |
| 254 | scaffolding in order to run them. Matchers are paired with a |
| 255 | ``MatchCallback`` and registered with a ``MatchFinder`` object, then run |
| 256 | from a ``ClangTool``. More code! |
| 257 | |
| 258 | Add the following to ``LoopConvert.cpp``: |
| 259 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 260 | .. code-block:: c++ |
| 261 | |
Manuel Klimek | 3de6d3a | 2013-03-04 11:31:46 +0000 | [diff] [blame] | 262 | #include "clang/ASTMatchers/ASTMatchers.h" |
| 263 | #include "clang/ASTMatchers/ASTMatchFinder.h" |
| 264 | |
| 265 | using namespace clang; |
| 266 | using namespace clang::ast_matchers; |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 267 | |
| 268 | StatementMatcher LoopMatcher = |
| 269 | forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( |
| 270 | hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); |
| 271 | |
| 272 | class LoopPrinter : public MatchFinder::MatchCallback { |
| 273 | public : |
| 274 | virtual void run(const MatchFinder::MatchResult &Result) { |
Manuel Klimek | 21ae251 | 2013-04-17 15:02:12 +0000 | [diff] [blame] | 275 | if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop")) |
| 276 | FS->dump(); |
| 277 | } |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 278 | }; |
| 279 | |
| 280 | And change ``main()`` to: |
| 281 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 282 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 283 | |
| 284 | int main(int argc, const char **argv) { |
Alexander Kornienko | aa7bca1 | 2014-02-19 16:11:38 +0000 | [diff] [blame] | 285 | CommonOptionsParser OptionsParser(argc, argv, MyToolCategory); |
Edwin Vane | 524741f | 2012-12-14 18:58:25 +0000 | [diff] [blame] | 286 | ClangTool Tool(OptionsParser.getCompilations(), |
| 287 | OptionsParser.getSourcePathList()); |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 288 | |
| 289 | LoopPrinter Printer; |
| 290 | MatchFinder Finder; |
| 291 | Finder.addMatcher(LoopMatcher, &Printer); |
| 292 | |
Richard Smith | 06900a0 | 2014-05-01 20:04:39 +0000 | [diff] [blame] | 293 | return Tool.run(newFrontendActionFactory(&Finder).get()); |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 294 | } |
| 295 | |
| 296 | Now, you should be able to recompile and run the code to discover for |
| 297 | loops. Create a new file with a few examples, and test out our new |
| 298 | handiwork: |
| 299 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 300 | .. code-block:: console |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 301 | |
| 302 | cd ~/clang-llvm/llvm/llvm_build/ |
| 303 | ninja loop-convert |
| 304 | vim ~/test-files/simple-loops.cc |
| 305 | bin/loop-convert ~/test-files/simple-loops.cc |
| 306 | |
| 307 | Step 3.5: More Complicated Matchers |
| 308 | =================================== |
| 309 | |
| 310 | Our simple matcher is capable of discovering for loops, but we would |
| 311 | still need to filter out many more ourselves. We can do a good portion |
| 312 | of the remaining work with some cleverly chosen matchers, but first we |
| 313 | need to decide exactly which properties we want to allow. |
| 314 | |
| 315 | How can we characterize for loops over arrays which would be eligible |
| 316 | for translation to range-based syntax? Range based loops over arrays of |
| 317 | size ``N`` that: |
| 318 | |
| 319 | - start at index ``0`` |
| 320 | - iterate consecutively |
| 321 | - end at index ``N-1`` |
| 322 | |
| 323 | We already check for (1), so all we need to add is a check to the loop's |
| 324 | condition to ensure that the loop's index variable is compared against |
| 325 | ``N`` and another check to ensure that the increment step just |
| 326 | increments this same variable. The matcher for (2) is straightforward: |
| 327 | require a pre- or post-increment of the same variable declared in the |
| 328 | init portion. |
| 329 | |
| 330 | Unfortunately, such a matcher is impossible to write. Matchers contain |
| 331 | no logic for comparing two arbitrary AST nodes and determining whether |
| 332 | or not they are equal, so the best we can do is matching more than we |
| 333 | would like to allow, and punting extra comparisons to the callback. |
| 334 | |
| 335 | In any case, we can start building this sub-matcher. We can require that |
| 336 | the increment step be a unary increment like this: |
| 337 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 338 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 339 | |
| 340 | hasIncrement(unaryOperator(hasOperatorName("++"))) |
| 341 | |
| 342 | Specifying what is incremented introduces another quirk of Clang's AST: |
| 343 | Usages of variables are represented as ``DeclRefExpr``'s ("declaration |
| 344 | reference expressions") because they are expressions which refer to |
| 345 | variable declarations. To find a ``unaryOperator`` that refers to a |
| 346 | specific declaration, we can simply add a second condition to it: |
| 347 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 348 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 349 | |
| 350 | hasIncrement(unaryOperator( |
| 351 | hasOperatorName("++"), |
| 352 | hasUnaryOperand(declRefExpr()))) |
| 353 | |
| 354 | Furthermore, we can restrict our matcher to only match if the |
| 355 | incremented variable is an integer: |
| 356 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 357 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 358 | |
| 359 | hasIncrement(unaryOperator( |
| 360 | hasOperatorName("++"), |
| 361 | hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger()))))))) |
| 362 | |
| 363 | And the last step will be to attach an identifier to this variable, so |
| 364 | that we can retrieve it in the callback: |
| 365 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 366 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 367 | |
| 368 | hasIncrement(unaryOperator( |
| 369 | hasOperatorName("++"), |
| 370 | hasUnaryOperand(declRefExpr(to( |
| 371 | varDecl(hasType(isInteger())).bind("incrementVariable")))))) |
| 372 | |
| 373 | We can add this code to the definition of ``LoopMatcher`` and make sure |
| 374 | that our program, outfitted with the new matcher, only prints out loops |
| 375 | that declare a single variable initialized to zero and have an increment |
| 376 | step consisting of a unary increment of some variable. |
| 377 | |
| 378 | Now, we just need to add a matcher to check if the condition part of the |
| 379 | ``for`` loop compares a variable against the size of the array. There is |
| 380 | only one problem - we don't know which array we're iterating over |
| 381 | without looking at the body of the loop! We are again restricted to |
| 382 | approximating the result we want with matchers, filling in the details |
| 383 | in the callback. So we start with: |
| 384 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 385 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 386 | |
| 387 | hasCondition(binaryOperator(hasOperatorName("<")) |
| 388 | |
| 389 | It makes sense to ensure that the left-hand side is a reference to a |
| 390 | variable, and that the right-hand side has integer type. |
| 391 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 392 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 393 | |
| 394 | hasCondition(binaryOperator( |
| 395 | hasOperatorName("<"), |
Edwin Vane | eec9a93 | 2013-03-05 18:04:37 +0000 | [diff] [blame] | 396 | hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))), |
| 397 | hasRHS(expr(hasType(isInteger()))))) |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 398 | |
| 399 | Why? Because it doesn't work. Of the three loops provided in |
| 400 | ``test-files/simple.cpp``, zero of them have a matching condition. A |
| 401 | quick look at the AST dump of the first for loop, produced by the |
| 402 | previous iteration of loop-convert, shows us the answer: |
| 403 | |
| 404 | :: |
| 405 | |
| 406 | (ForStmt 0x173b240 |
| 407 | (DeclStmt 0x173afc8 |
| 408 | 0x173af50 "int i = |
| 409 | (IntegerLiteral 0x173afa8 'int' 0)") |
| 410 | <<>> |
| 411 | (BinaryOperator 0x173b060 '_Bool' '<' |
Manuel Klimek | f5a0ae8 | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 412 | (ImplicitCastExpr 0x173b030 'int' |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 413 | (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int')) |
Manuel Klimek | f5a0ae8 | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 414 | (ImplicitCastExpr 0x173b048 'int' |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 415 | (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int'))) |
| 416 | (UnaryOperator 0x173b0b0 'int' lvalue prefix '++' |
| 417 | (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int')) |
Dmitri Gribenko | b8f9f3e | 2013-04-27 16:26:29 +0000 | [diff] [blame] | 418 | (CompoundStatement ... |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 419 | |
| 420 | We already know that the declaration and increments both match, or this |
| 421 | loop wouldn't have been dumped. The culprit lies in the implicit cast |
| 422 | applied to the first operand (i.e. the LHS) of the less-than operator, |
| 423 | an L-value to R-value conversion applied to the expression referencing |
| 424 | ``i``. Thankfully, the matcher library offers a solution to this problem |
| 425 | in the form of ``ignoringParenImpCasts``, which instructs the matcher to |
| 426 | ignore implicit casts and parentheses before continuing to match. |
| 427 | Adjusting the condition operator will restore the desired match. |
| 428 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 429 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 430 | |
| 431 | hasCondition(binaryOperator( |
| 432 | hasOperatorName("<"), |
Edwin Vane | eec9a93 | 2013-03-05 18:04:37 +0000 | [diff] [blame] | 433 | hasLHS(ignoringParenImpCasts(declRefExpr( |
| 434 | to(varDecl(hasType(isInteger())))))), |
| 435 | hasRHS(expr(hasType(isInteger()))))) |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 436 | |
| 437 | After adding binds to the expressions we wished to capture and |
| 438 | extracting the identifier strings into variables, we have array-step-2 |
| 439 | completed. |
| 440 | |
| 441 | Step 4: Retrieving Matched Nodes |
| 442 | ================================ |
| 443 | |
| 444 | So far, the matcher callback isn't very interesting: it just dumps the |
| 445 | loop's AST. At some point, we will need to make changes to the input |
| 446 | source code. Next, we'll work on using the nodes we bound in the |
| 447 | previous step. |
| 448 | |
| 449 | The ``MatchFinder::run()`` callback takes a |
| 450 | ``MatchFinder::MatchResult&`` as its parameter. We're most interested in |
| 451 | its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext`` |
| 452 | class to represent contextual information about the AST, as the name |
| 453 | implies, though the most functionally important detail is that several |
| 454 | operations require an ``ASTContext*`` parameter. More immediately useful |
| 455 | is the set of matched nodes, and how we retrieve them. |
| 456 | |
| 457 | Since we bind three variables (identified by ConditionVarName, |
| 458 | InitVarName, and IncrementVarName), we can obtain the matched nodes by |
| 459 | using the ``getNodeAs()`` member function. |
| 460 | |
Manuel Klimek | f5a0ae8 | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 461 | In ``LoopConvert.cpp`` add |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 462 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 463 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 464 | |
| 465 | #include "clang/AST/ASTContext.h" |
| 466 | |
Manuel Klimek | f5a0ae8 | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 467 | Change ``LoopMatcher`` to |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 468 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 469 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 470 | |
Manuel Klimek | f5a0ae8 | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 471 | StatementMatcher LoopMatcher = |
| 472 | forStmt(hasLoopInit(declStmt( |
| 473 | hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0)))) |
| 474 | .bind("initVarName")))), |
| 475 | hasIncrement(unaryOperator( |
| 476 | hasOperatorName("++"), |
| 477 | hasUnaryOperand(declRefExpr( |
| 478 | to(varDecl(hasType(isInteger())).bind("incVarName")))))), |
| 479 | hasCondition(binaryOperator( |
| 480 | hasOperatorName("<"), |
| 481 | hasLHS(ignoringParenImpCasts(declRefExpr( |
| 482 | to(varDecl(hasType(isInteger())).bind("condVarName"))))), |
| 483 | hasRHS(expr(hasType(isInteger())))))).bind("forLoop"); |
| 484 | |
| 485 | And change ``LoopPrinter::run`` to |
| 486 | |
| 487 | .. code-block:: c++ |
| 488 | |
| 489 | void LoopPrinter::run(const MatchFinder::MatchResult &Result) { |
| 490 | ASTContext *Context = Result.Context; |
Mads Ravn | 8ab17a3 | 2016-12-30 20:49:44 +0000 | [diff] [blame] | 491 | const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop"); |
Manuel Klimek | f5a0ae8 | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 492 | // We do not want to convert header files! |
Mads Ravn | 8ab17a3 | 2016-12-30 20:49:44 +0000 | [diff] [blame] | 493 | if (!FS || !Context->getSourceManager().isWrittenInMainFile(FS->getForLoc())) |
Manuel Klimek | f5a0ae8 | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 494 | return; |
| 495 | const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName"); |
| 496 | const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName"); |
| 497 | const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName"); |
| 498 | |
| 499 | if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar)) |
| 500 | return; |
| 501 | llvm::outs() << "Potential array-based loop discovered.\n"; |
| 502 | } |
| 503 | |
| 504 | Clang associates a ``VarDecl`` with each variable to represent the variable's |
| 505 | declaration. Since the "canonical" form of each declaration is unique by |
| 506 | address, all we need to do is make sure neither ``ValueDecl`` (base class of |
| 507 | ``VarDecl``) is ``NULL`` and compare the canonical Decls. |
| 508 | |
| 509 | .. code-block:: c++ |
| 510 | |
| 511 | static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) { |
| 512 | return First && Second && |
| 513 | First->getCanonicalDecl() == Second->getCanonicalDecl(); |
| 514 | } |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 515 | |
| 516 | If execution reaches the end of ``LoopPrinter::run()``, we know that the |
| 517 | loop shell that looks like |
| 518 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 519 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 520 | |
| 521 | for (int i= 0; i < expr(); ++i) { ... } |
| 522 | |
| 523 | For now, we will just print a message explaining that we found a loop. |
| 524 | The next section will deal with recursively traversing the AST to |
| 525 | discover all changes needed. |
| 526 | |
Manuel Klimek | f5a0ae8 | 2013-04-18 14:30:45 +0000 | [diff] [blame] | 527 | As a side note, it's not as trivial to test if two expressions are the same, |
| 528 | though Clang has already done the hard work for us by providing a way to |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 529 | canonicalize expressions: |
| 530 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 531 | .. code-block:: c++ |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 532 | |
Dmitri Gribenko | 68a0668 | 2013-03-05 13:05:56 +0000 | [diff] [blame] | 533 | static bool areSameExpr(ASTContext *Context, const Expr *First, |
Sean Silva | bf9b4cd | 2012-12-13 01:10:46 +0000 | [diff] [blame] | 534 | const Expr *Second) { |
| 535 | if (!First || !Second) |
| 536 | return false; |
| 537 | llvm::FoldingSetNodeID FirstID, SecondID; |
| 538 | First->Profile(FirstID, *Context, true); |
| 539 | Second->Profile(SecondID, *Context, true); |
| 540 | return FirstID == SecondID; |
| 541 | } |
| 542 | |
| 543 | This code relies on the comparison between two |
| 544 | ``llvm::FoldingSetNodeIDs``. As the documentation for |
| 545 | ``Stmt::Profile()`` indicates, the ``Profile()`` member function builds |
| 546 | a description of a node in the AST, based on its properties, along with |
| 547 | those of its children. ``FoldingSetNodeID`` then serves as a hash we can |
| 548 | use to compare expressions. We will need ``areSameExpr`` later. Before |
| 549 | you run the new code on the additional loops added to |
| 550 | test-files/simple.cpp, try to figure out which ones will be considered |
| 551 | potentially convertible. |