<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Checker Developer Manual</title>
  <link type="text/css" rel="stylesheet" href="menu.css" />
  <link type="text/css" rel="stylesheet" href="content.css" />
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->

<div id="content">

<h1><font color=red>This Page Is Under Construction</font></h1>

<h1>Checker Developer Manual</h1>

<p>The static analyzer engine performs symbolic execution of the program and
relies on a set of checkers to implement the logic for detecting bugs and
constructing bug reports. This page provides hints and guidelines for anyone
who is interested in implementing their own checker. The static analyzer is
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for general developer guidelines and information. </p>

  <ul>
    <li><a href="#start">Getting Started</a></li>
    <li><a href="#analyzer">Analyzer Overview</a></li>
    <li><a href="#idea">Idea for a Checker</a></li>
    <li><a href="#registration">Checker Registration</a></li>
    <li><a href="#skeleton">Checker Skeleton</a></li>
    <li><a href="#node">Exploded Node</a></li>
    <li><a href="#bugs">Bug Reports</a></li>
    <li><a href="#ast">AST Visitors</a></li>
    <li><a href="#testing">Testing</a></li>
    <li><a href="#commands">Useful Commands</a></li>
  </ul>

<h2 id=start>Getting Started</h2>
  <ul>
    <li>To check out the source code and build the project, follow steps 1-4 of
    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
    page.</li>

    <li>The analyzer source code is located under the Clang source tree:
    <br><tt>
    $ <b>cd llvm/tools/clang</b>
    </tt>
    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
    <tt>test/Analysis</tt>.</li>

    <li>The analyzer regression tests can be executed from Clang's build
    directory:
    <br><tt>
    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
    </tt></li>

    <li>Analyze a file with the specified checker:
    <br><tt>
    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
    </tt></li>

    <li>List the available checkers:
    <br><tt>
    $ <b>clang -cc1 -analyzer-checker-help</b>
    </tt></li>

    <li>See the analyzer help for different output formats, fine-tuning, and
    debug options:
    <br><tt>
    $ <b>clang -cc1 -help | grep "analyzer"</b>
    </tt></li>

  </ul>

<h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the
  input values are represented with symbolic values; further, the engine deduces
  the values of all the expressions in the program based on the input symbols
  and the path. The execution is path-sensitive and every possible path through
  the program is explored. The explored execution traces are represented with an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
  represents the corresponding location in the program (or the CFG).
  <tt>ProgramPoint</tt> is also used to record additional information on
  when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
  kind means that the state is the result of purging dead symbols - the
  analyzer's equivalent of garbage collection.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
  represents the abstract state of the program. It consists of:
  <ul>
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
    values</li>
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values</li>
    <li><tt>GenericDataMap</tt> - constraints on symbolic values</li>
  </ul>
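<p>To make the relationship between these objects concrete, here is a minimal
standalone sketch. It does not use the real Clang classes: every <tt>Toy*</tt>
type, the string-based maps, and <tt>buildToyPath</tt> are assumptions made up
for illustration. It only shows that each graph node pairs a program location
with a state, and that a new state is derived from a copy of the previous one
rather than by modifying it in place.</p>

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins for the analyzer's data structures; NOT the real Clang classes.
struct ToyProgramPoint {
  int Block;         // index of a basic block in the CFG
  int StmtIndex;     // index of a statement inside that block
  std::string Kind;  // how the node was produced, e.g. "PostStmt"
};

struct ToyProgramState {
  std::map<std::string, std::string> Environment;     // expression -> symbolic value
  std::map<std::string, std::string> Store;           // memory location -> symbolic value
  std::map<std::string, std::string> GenericDataMap;  // constraints and checker data
};

struct ToyExplodedNode {
  ToyProgramPoint Point;
  std::shared_ptr<const ToyProgramState> State;  // never modified once attached
  std::vector<std::size_t> Preds;                // indices of predecessor nodes
};

// Builds a two-node path for "int y = x;", where x is an input of the function.
std::vector<ToyExplodedNode> buildToyPath() {
  auto Entry = std::make_shared<ToyProgramState>();
  Entry->Store["x"] = "$x";  // the input is represented by the symbol $x

  // Derive the next state from a copy; the entry state stays untouched.
  auto AfterDecl = std::make_shared<ToyProgramState>(*Entry);
  AfterDecl->Environment["x"] = "$x";
  AfterDecl->Store["y"] = "$x";

  std::vector<ToyExplodedNode> Graph;
  Graph.push_back(ToyExplodedNode{ToyProgramPoint{0, 0, "BlockEntrance"},
                                  Entry, std::vector<std::size_t>{}});
  Graph.push_back(ToyExplodedNode{ToyProgramPoint{1, 0, "PostStmt"},
                                  AfterDecl, std::vector<std::size_t>{0}});
  return Graph;
}
```

<p>Because the second node holds its own state, the first node still describes
the program exactly as it was before the declaration was evaluated.</p>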

  <p>
  Checkers are not merely passive receivers of the analyzer core changes - they
  actively participate in the <tt>ProgramState</tt> construction through the
  <tt>GenericDataMap</tt>, which can be used to store the checker-defined part
  of the state. Each time the analyzer engine explores a new statement, it
  notifies each checker registered to listen for that statement, giving it an
  opportunity to either report a bug or modify the state. (As a rule of thumb,
  the checker itself should be stateless.) The checkers are called one after another
  in a predefined order; thus, calling all the checkers adds a chain of nodes to the
  <tt>ExplodedGraph</tt>.
  <!--
  TODO: Add a picture.
  <br>
  Symbols<br>
  FunctionalObjects are used throughout.
  -->
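<p>The "stateless checker" rule can be sketched with a toy model. Nothing below
is Clang API: <tt>ToyState</tt>, <tt>ToyEvent</tt>,
<tt>ToyDoubleCloseChecker</tt>, and <tt>runToyPath</tt> are hypothetical names,
and a plain <tt>std::set</tt> stands in for the checker-defined part of the
state. The point is only that the checker keeps no member data and returns
everything it learns as part of the next state, so facts from one path cannot
leak into another.</p>

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model only; the real analyzer stores checker data inside ProgramState.
struct ToyState {
  std::set<std::string> ClosedHandles;  // checker-defined part of the state
};

struct ToyEvent {
  std::string Callee;  // name of the called function
  std::string Arg;     // name of the handle passed to it
};

// A stateless checker: it has no member variables. Whatever it learns is
// returned as part of the next state.
struct ToyDoubleCloseChecker {
  ToyState checkCall(const ToyEvent &E, const ToyState &In,
                     std::vector<std::string> &Reports) const {
    if (E.Callee != "close")
      return In;  // not interesting to this checker; state is unchanged
    if (In.ClosedHandles.count(E.Arg)) {
      Reports.push_back("double close of " + E.Arg);
      return In;
    }
    ToyState Out = In;  // copy, then extend; the input state is not modified
    Out.ClosedHandles.insert(E.Arg);
    return Out;
  }
};

// Walks a single path, threading the state through the checker.
std::vector<std::string> runToyPath(const std::vector<ToyEvent> &Path) {
  ToyDoubleCloseChecker Checker;
  ToyState State;
  std::vector<std::string> Reports;
  for (const ToyEvent &E : Path)
    State = Checker.checkCall(E, State, Reports);
  return Reports;
}
```

<p>Running two different paths through the same checker object gives
independent results, because each path carries its own state.</p>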
<h2 id=idea>Idea for a Checker</h2>
  Here are several questions which you should consider when evaluating your
  checker idea:
  <ul>
    <li>Can the check be effectively implemented without path-sensitive
    analysis? See <a href="#ast">AST Visitors</a>.</li>

    <li>How high is the false positive rate going to be? Looking at occurrences
    of the issue you want to write a checker for in existing code bases might
    give you some ideas. </li>

    <li>How will the current limitations of the analysis affect the false alarm
    rate? Currently, the analyzer only reasons about one procedure at a time (no
    inter-procedural analysis). Also, it uses a simple range-tracking-based
    solver to model symbolic execution.</li>

    <li>Consult the <a href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a>
    to get some ideas for new checkers, and consider starting by improving or fixing
    bugs in the existing checkers.</li>
  </ul>

<h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in the <tt>clang/lib/StaticAnalyzer/Checkers</tt>
  folder. Follow the steps below to register a new checker with the analyzer.
<ol>
  <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt>
<pre class="code_example">
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
// Also include the header that declares the generated register*Checker
// functions; its name depends on the Clang version.

using namespace clang;
using namespace ento;

namespace {
// Receives a callback before every call expression is evaluated.
class NewChecker : public Checker&lt; check::PreStmt&lt;CallExpr&gt; &gt; {
public:
  void checkPreStmt(const CallExpr *CE, CheckerContext &amp;Ctx) const {}
};
} // end anonymous namespace

void ento::registerNewChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;NewChecker&gt;();
}
</pre></li>

  <li>Pick the package name for your checker and add the registration code to
<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note that all checkers should
first be developed as experimental. Suppose our new checker performs
security-related checks; then we should add the following lines under the
<tt>SecurityExperimental</tt> package:
<pre class="code_example">
let ParentPackage = SecurityExperimental in {
...
def NewChecker : Checker&lt;"NewChecker"&gt;,
  HelpText&lt;"This text should give a short description of the checks performed."&gt;,
  DescFile&lt;"NewChecker.cpp"&gt;;
...
} // end "security.experimental"
</pre></li>

  <li>Make the source code file visible to CMake by adding it to
  <tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.</li>

  <li>Compile and see your checker in the list of available checkers by running:<br>
  <tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt></li>
</ol>


<h2 id=skeleton>Checker Skeleton</h2>
  There are two main decisions you need to make:
  <ul>
    <li> Which events the checker should be tracking.</li>
    <li> What data you want to store as part of the checker-specific program
    state. Try to minimize the checker state as much as possible. </li>
  </ul>

<h2 id=bugs>Bug Reports</h2>

<h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective. A simple AST walk
  might be sufficient. If that is the case, consider implementing a Clang
  compiler warning. On the other hand, a check might not be acceptable as a compiler
  warning; for example, because of a relatively high false positive rate. In this
  situation, the AST callbacks <tt><b>checkASTDecl</b></tt> and
  <tt><b>checkASTCodeBody</b></tt> are your best friends.
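<p>As a rough illustration of what "a simple AST walk" means, the sketch below
searches a toy syntax tree for calls to a given function without tracking any
paths or values. It is not written against Clang's AST: <tt>ToyNode</tt>,
<tt>findCallsTo</tt>, and <tt>countCallsTo</tt> are hypothetical names chosen
for this example. A real checker would perform a comparable traversal over the
declaration or body handed to its AST callback.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy AST node; NOT one of Clang's AST classes.
// (A std::vector of the enclosing type relies on C++17 library support.)
struct ToyNode {
  std::string Kind;               // e.g. "CompoundStmt", "IfStmt", "CallExpr"
  std::string Name;               // callee name for CallExpr nodes, else empty
  std::vector<ToyNode> Children;  // sub-statements and sub-expressions
};

// Recursively collects every call to the function named Banned. The walk is
// purely syntactic: it does not reason about which paths are feasible.
void findCallsTo(const ToyNode &N, const std::string &Banned,
                 std::vector<const ToyNode *> &Found) {
  if (N.Kind == "CallExpr" && N.Name == Banned)
    Found.push_back(&N);
  for (const ToyNode &Child : N.Children)
    findCallsTo(Child, Banned, Found);
}

// Convenience wrapper returning only the number of matching calls.
std::size_t countCallsTo(const ToyNode &Body, const std::string &Banned) {
  std::vector<const ToyNode *> Found;
  findCallsTo(Body, Banned, Found);
  return Found.size();
}
```

<p>A check of this shape flags every syntactic occurrence, including ones on
infeasible paths, which is one reason such checks can have a higher false
positive rate than a compiler warning would tolerate.</p>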

<h2 id=testing>Testing</h2>
  Every patch should be well tested with Clang regression tests. The checker tests
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
  execute the following from the <tt>clang</tt> build directory:
<pre class="code">
$ <b>TESTDIRS=Analysis make test</b>
</pre>
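<p>A regression test is an ordinary source file with a <tt>RUN</tt> line and
expected-diagnostic comments. The sketch below shows the general shape for the
<tt>core.DivideZero</tt> checker mentioned earlier. The function
<tt>scale</tt> is a made-up example, and the exact <tt>RUN</tt> line is an
assumption modeled on the commands on this page; copy the form used by the
existing files in <tt>clang/test/Analysis</tt> for your Clang version.</p>

```cpp
// RUN: %clang_cc1 -analyze -analyzer-checker=core.DivideZero -verify %s

// Hypothetical function under test; name and body are invented for this sketch.
int scale(int divisor) {
  if (divisor == 0)
    return 100 / divisor; // expected-warning{{Division by zero}}
  return 100 / divisor;   // no warning expected: divisor is nonzero on this path
}
```

<p>With <tt>-verify</tt>, the test fails if an expected warning is missing or
if an unexpected diagnostic appears, so both branches above are being
checked.</p>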

<h2 id=commands>Useful Commands/Debugging Hints</h2>
<ul>
<li>
While investigating a checker-related issue, instruct the analyzer to only
execute a single checker:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
</tt>
</li>
<li>
To dump the AST:
<br><tt>
$ <b>clang -cc1 -ast-dump test.c</b>
</tt>
</li>
<li>
To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
</tt>
</li>
<li>
To see all available debug checkers:
<br><tt>
$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
</tt>
</li>
<li>
To see which function is failing while processing a large file, use the
<tt>-analyzer-display-progress</tt> option.
</li>
<li>
While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt>
instead of <tt>clang --analyze</tt>, as the latter would call the compiler
in a separate process.
</li>
<li>
To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while
debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and
execute:
<br><tt>
(gdb) <b>p ViewGraph(0)</b>
</tt>
</li>
<li>
To print a <tt>clang::Expr</tt> while debugging, use the following command. If you
pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the
source code.
<br><tt>
(gdb) <b>p E->dump()</b>
</tt>
</li>
<li>
To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs to:
<br><tt>
(gdb) <b>p ENode->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
</tt>
</li>
</ul>

</div>
</div>
</body>
</html>