<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Checker Developer Manual</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->

<div id="content">

<h3 style="color:red">This Page Is Under Construction</h3>

<h1>Checker Developer Manual</h1>

<p>The static analyzer engine performs path-sensitive exploration of the program and
relies on a set of checkers to implement the logic for detecting and
constructing specific bug reports. Anyone interested in implementing their own
checker should check out the Building a Checker in 24 Hours talk
(<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
<a href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>)
and refer to this page for additional information on writing a checker. The static analyzer is
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for developer guidelines and send your questions and proposals to the
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">cfe-dev mailing list</a>.
</p>

<ul>
  <li><a href="#start">Getting Started</a></li>
  <li><a href="#analyzer">Static Analyzer Overview</a>
  <ul>
    <li><a href="#interaction">Interaction with Checkers</a></li>
    <li><a href="#values">Representing Values</a></li>
  </ul></li>
  <li><a href="#idea">Idea for a Checker</a></li>
  <li><a href="#registration">Checker Registration</a></li>
  <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li>
  <li><a href="#extendingstates">Custom Program States</a></li>
  <li><a href="#bugs">Bug Reports</a></li>
  <li><a href="#ast">AST Visitors</a></li>
  <li><a href="#testing">Testing</a></li>
  <li><a href="#commands">Useful Commands/Debugging Hints</a></li>
  <li><a href="#additioninformation">Additional Sources of Information</a></li>
  <li><a href="#links">Useful Links</a></li>
</ul>

<h2 id=start>Getting Started</h2>
<ul>
  <li>To check out the source code and build the project, follow steps 1-4 of
  the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
  page.</li>

  <li>The analyzer source code is located under the Clang source tree:
  <br><tt>
  $ <b>cd llvm/tools/clang</b>
  </tt>
  <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
  <tt>test/Analysis</tt>.</li>

  <li>The analyzer regression tests can be executed from Clang's build
  directory:
  <br><tt>
  $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
  </tt></li>

  <li>Analyze a file with the specified checker:
  <br><tt>
  $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
  </tt></li>

  <li>List the available checkers:
  <br><tt>
  $ <b>clang -cc1 -analyzer-checker-help</b>
  </tt></li>

  <li>See the analyzer help for different output formats, fine tuning, and
  debug options:
  <br><tt>
  $ <b>clang -cc1 -help | grep "analyzer"</b>
  </tt></li>

</ul>
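
<p>As a small illustration of the <tt>core.DivideZero</tt> command above, here
is a sketch of an input file. The function name <tt>scale</tt> and its body are
made up for this example. The code is valid as both C and C++, so it can be
saved as the <tt>test.c</tt> named in the command. The analyzer is expected to
flag the division on the path where <tt>parts</tt> is zero; the exact wording
of the diagnostic depends on the Clang version.</p>

```cpp
/* Hypothetical input for:
   clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c */
int scale(int total, int parts) {
  int divisor = parts;
  if (parts == 0) {
    /* On this path the analyzer knows that parts, and therefore divisor,
       is zero. Nothing in this branch repairs the divisor. */
    total = 0;
  }
  /* Fine when parts != 0; a division by zero on the path taken above. */
  return total / divisor;
}
```

<p>The bug is path-dependent: the same division is harmless on the path where
the <tt>if</tt> condition is false, which is why a path-sensitive engine is
needed to report it precisely.</p>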

<h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the
  input values are represented with symbolic values; further, the engine deduces
  the values of all the expressions in the program based on the input symbols
  and the path. The execution is path sensitive, and the engine aims to explore
  every possible path through the program. The explored execution traces are
  represented with an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
  represents the corresponding location in the program (or the CFG).
  <tt>ProgramPoint</tt> is also used to record additional information on
  when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
  kind means that the state is the result of purging dead symbols, which is the
  analyzer's equivalent of garbage collection.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
  represents the abstract state of the program. It consists of:
  <ul>
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
    values
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    <li><tt>GenericDataMap</tt> - constraints on symbolic values, plus any
    checker-defined part of the state
  </ul>

  <h3 id=interaction>Interaction with Checkers</h3>
  Checkers are not merely passive receivers of the analyzer core's changes; they
  actively participate in the <tt>ProgramState</tt> construction through the
  <tt>GenericDataMap</tt>, which can be used to store the checker-defined part
  of the state. Each time the analyzer engine explores a new statement, it
  notifies each checker registered to listen for that statement, giving it an
  opportunity to either report a bug or modify the state. (As a rule of thumb,
  the checker itself should be stateless.) The checkers are called one after another
  in a predefined order; thus, calling all the checkers adds a chain to the
  <tt>ExplodedGraph</tt>.

  <h3 id=values>Representing Values</h3>
  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
  objects are used to represent the semantic evaluation of expressions.
  They can represent things like concrete
  integers, symbolic values, or memory locations (which are memory regions).
  They are a discriminated union of "values", symbolic and otherwise.
  If a value isn't symbolic, usually that means there is no symbolic
  information to track. For example, if the value was an integer, such as
  <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>,
  and the checker doesn't usually need to track any state with the concrete
  number. In some cases, an <tt>SVal</tt> is not a symbol even though it really
  should be a symbolic value. This happens when the analyzer cannot reason about
  something (yet). An example is floating point numbers. In such cases, the
  <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
  This represents a case that is outside the realm of the analyzer's reasoning
  capabilities. <tt>SVals</tt> are value objects and their values can be viewed
  using the <tt>.dump()</tt> method. Often they wrap persistent objects such as
  symbols or regions.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
  is meant to represent an abstract, but named, symbolic value. A symbol represents
  an actual (immutable) value. We might not know what its specific value is, but
  we can associate constraints with that value as we analyze a path. For
  example, we might record that the value of a symbol is greater than
  <tt>0</tt>.

  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
  It is used to provide a lexicon of how to describe abstract memory. Regions can
  layer on top of other regions, providing a layered approach to representing memory.
  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
  but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
  be used to represent the memory associated with a specific field of that object.
  So how do we represent symbolic memory regions? That's what
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
  symbol is unique and has a unique name, that symbol names the region.

  <P>
  Let's see how the analyzer processes the expressions in the following example:
  <p>
  <pre class="code_example">
  int foo(int x) {
     int y = x * 2;
     int z = x;
     ...
  }
  </pre>
  <p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
which references the value <b>currently bound</b> to <tt>x</tt>. That value is
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
and create a new <tt>SVal</tt> that represents their multiplication (which in
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
to the <tt>MemRegion</tt> in the symbolic store.
<br>
The second line is similar. When we evaluate <tt>x</tt> again, we do the same
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note
that two <tt>SVals</tt> might reference the same underlying value.

<p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are
unique names for abstract symbolic values. Some MemRegions represent abstract
symbolic chunks of memory, and thus are also based on symbols. SVals are just
references to values, and can reference either MemRegions, Symbols, or concrete
values (e.g., the number 1).

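<p>
The walk-through above can be replayed with a toy model. This is only a sketch
under stated assumptions: <tt>ToyState</tt>, <tt>Symbol</tt>, <tt>Region</tt>,
and <tt>evaluateFoo</tt> are names invented here, they are not Clang's classes,
and the model hands out a fresh symbol for <tt>x*2</tt> instead of building a
symbolic expression. It requires C++17 for <tt>std::variant</tt>.

```cpp
#include <map>
#include <string>
#include <variant>

// Toy stand-ins for the analyzer's value kinds; these are NOT Clang's classes.
struct ConcreteInt { int Value; };         // a known integer such as 2
struct Symbol      { int Id; };            // an unknown but named value: $0, $1, ...
struct Region      { std::string Name; };  // a block of memory, e.g. the variable x

// An SVal is a discriminated union over the kinds above.
using SVal = std::variant<ConcreteInt, Symbol, Region>;

struct ToyState {
  std::map<std::string, SVal> Store;  // region name -> value currently bound
  int NextSymbol = 0;

  Symbol freshSymbol() { return Symbol{NextSymbol++}; }

  // lvalue-to-rvalue conversion: load the value bound to a region.
  SVal load(const Region &R) const { return Store.at(R.Name); }

  // Assignment: bind a value to a region in the store.
  void bind(const Region &R, const SVal &V) { Store[R.Name] = V; }

  // Two constants are folded; anything else yields a new symbolic value.
  SVal multiply(const SVal &L, const SVal &R) {
    const ConcreteInt *A = std::get_if<ConcreteInt>(&L);
    const ConcreteInt *B = std::get_if<ConcreteInt>(&R);
    if (A && B)
      return ConcreteInt{A->Value * B->Value};
    return freshSymbol();
  }
};

// Replays the body of foo(int x): y = x * 2; z = x;
ToyState evaluateFoo() {
  ToyState S;
  Region X{"x"}, Y{"y"}, Z{"z"};
  S.bind(X, S.freshSymbol());                        // x holds $0 on entry
  S.bind(Y, S.multiply(S.load(X), ConcreteInt{2}));  // y holds $1
  S.bind(Z, S.load(X));                              // z holds $0 again
  return S;
}
```

<p>
After <tt>evaluateFoo()</tt>, the values bound to <tt>x</tt> and <tt>z</tt> are
distinct <tt>SVal</tt> objects that name the same symbol, which mirrors the note
that two <tt>SVals</tt> might reference the same underlying value.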
<!--
TODO: Add a picture.
<br>
Symbols<br>
FunctionalObjects are used throughout.
-->

<h2 id=idea>Idea for a Checker</h2>
  Here are several questions which you should consider when evaluating your
  checker idea:
  <ul>
    <li>Can the check be effectively implemented without path-sensitive
    analysis? See <a href="#ast">AST Visitors</a>.</li>

    <li>How high is the false positive rate going to be? Looking at the occurrences
    of the issue you want to write a checker for in the existing code bases might
    give you some ideas. </li>

    <li>How will the current limitations of the analysis affect the false alarm
    rate? Currently, the analyzer only reasons about one procedure at a time (no
    inter-procedural analysis). Also, it uses a simple solver based on range
    tracking to model symbolic execution.</li>

    <li>Consult the <a
    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a>
    to get some ideas for new checkers and consider starting with improving/fixing
    bugs in the existing checkers.</li>
  </ul>

<p>Once an idea for a checker has been chosen, there are two key decisions that
need to be made:
  <ul>
    <li> Which events the checker should be tracking. This is discussed in more
    detail in the section <a href="#events_callbacks">Events, Callbacks, and
    Checker Class Structure</a>.
    <li> What checker-specific data needs to be stored as part of the program
    state (if any). This should be minimized as much as possible. More detail about
    implementing custom program state is given in section <a
    href="#extendingstates">Custom Program States</a>.
  </ul>


<h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in the
  <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe
  how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of
  stream APIs, was registered with the analyzer.
  Similar steps should be followed for a new checker.
<ol>
  <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was
  created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>.
  <li>The following registration code was added to the implementation file:
<pre class="code_example">
void ento::registerSimpleStreamChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;SimpleStreamChecker&gt;();
}
</pre>
<li>A package was selected for the checker and the checker was defined in the
table of checkers at <tt>lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Since all
checkers should first be developed as "alpha", and the SimpleStreamChecker
performs UNIX API checks, the correct package is "alpha.unix", and the following
was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>:
<pre class="code_example">
let ParentPackage = UnixAlpha in {
...
def SimpleStreamChecker : Checker&lt;"SimpleStream"&gt;,
  HelpText&lt;"Check for misuses of stream APIs"&gt;,
  DescFile&lt;"SimpleStreamChecker.cpp"&gt;;
...
} // end "alpha.unix"
</pre>

<li>The source code file was made visible to CMake by adding it to
<tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.

</ol>

After adding a new checker to the analyzer, one can verify that the new checker
was successfully added by seeing if it appears in the list of available checkers:
<br> <tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>

<h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2>

<p> All checkers inherit from the <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html">
Checker</a></tt> template class; the template parameter(s) describe the type of
events that the checker is interested in processing. The various types of events
that are available are described in the file <a
href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a>.

<p> For each event type requested, a corresponding callback function must be
defined in the checker class (<a
href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a> shows the
correct function name and signature for each event type).

<p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to
take action at the following times:

<ul>
<li>Before making a call to a function, check if the function is <tt>fclose</tt>.
If so, check the parameter being passed.
<li>After making a function call, check if the function is <tt>fopen</tt>. If
so, process the return value.
<li>When values go out of scope, check whether they are still-open file
descriptors, and report a bug if so. In addition, remove any information about
them from the program state in order to keep the state as small as possible.
<li>When file pointers "escape" (are used in a way that the analyzer can no longer
track them), mark them as such. This prevents false positives in the cases where
the analyzer cannot be sure whether the file was closed or not.
</ul>

<p>The events that will be used for these actions are, respectively, <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>,
<a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>,
<a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>,
and <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>.
The high-level structure of the checker's class is thus:

<pre class="code_example">
class SimpleStreamChecker : public Checker&lt;check::PreCall,
                                           check::PostCall,
                                           check::DeadSymbols,
                                           check::PointerEscape&gt; {
public:

  void checkPreCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  void checkPostCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  void checkDeadSymbols(SymbolReaper &amp;SR, CheckerContext &amp;C) const;

  ProgramStateRef checkPointerEscape(ProgramStateRef State,
                                     const InvalidatedSymbols &amp;Escaped,
                                     const CallEvent *Call,
                                     PointerEscapeKind Kind) const;
};
</pre>

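<p>To make the four events above concrete, here is a sketch of an input file
that a stream checker of this kind would be expected to react to. All function
names and the global <tt>shared_stream</tt> are invented for this example, and
the comments describe the intended behavior rather than verified analyzer
output. The code is valid as both C and C++.

```cpp
#include <stdio.h>

/* Hypothetical global used only to let a stream escape. */
FILE *shared_stream;

/* Correct use: the stream is closed exactly once on every path. */
int write_once(const char *path) {
  FILE *f = fopen(path, "w");  /* PostCall on fopen: start tracking f */
  if (!f)
    return -1;
  fputs("data\n", f);
  return fclose(f);            /* PreCall on fclose: f is open, no bug */
}

/* Double close: the second fclose receives an already closed stream. */
void close_twice(const char *path) {
  FILE *f = fopen(path, "w");
  if (!f)
    return;
  fclose(f);
  fclose(f);                   /* PreCall on fclose: expected bug report */
}

/* Leak: on the early return, f goes out of scope while still open. */
int leak_on_early_return(const char *path, int fail) {
  FILE *f = fopen(path, "w");
  if (!f)
    return -1;
  if (fail)
    return -2;                 /* DeadSymbols: expected leak report */
  return fclose(f);
}

/* Escape: once f is stored in a global, other code may close it. */
void hand_off(const char *path) {
  FILE *f = fopen(path, "w");
  if (!f)
    return;
  shared_stream = f;           /* PointerEscape: stop tracking f, no leak report */
}
```

<p>Only <tt>write_once</tt> is safe to execute; the other functions exist to be
analyzed, not run.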
<h2 id=extendingstates>Custom Program States</h2>

<p> Checkers often need to keep track of information specific to the checks they
perform. However, since checkers have no guarantee about the order in which the
program will be explored, or even that all possible paths will be explored, this
state information cannot be kept within individual checkers. Therefore, if
checkers need to store custom information, they need to add new categories of
data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of
several macros designed for this purpose. They are:

<ul>
<li><a
href="http://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>:
Used when the state information is a single value. The methods available for
state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and
<tt>remove</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>:
Used when the state information is a list of values. The methods available for
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
<tt>remove</tt>, and <tt>contains</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>:
Used when the state information is a set of values. The methods available for
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
<tt>remove</tt>, and <tt>contains</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>:
Used when the state information is a map from a key to a value. The methods
available for state types declared with this macro are <tt>add</tt>,
<tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>.
</ul>

<p>All of these macros take as parameters the name to be used for the custom
category of state information and the data type(s) to be used for storage. The
data type(s) specified will become the parameter type and/or return type of the
methods that manipulate the new category of state information. Each of these
methods is templated with the name of the custom data type.

<p>For example, a common case is the need to track data associated with a
symbolic expression; a map type is the most logical way to implement this. The
key for this map will be a pointer to a symbolic expression
(<tt>SymbolRef</tt>). If the data type to be associated with the symbolic
expression is an integer, then the custom category of state information would be
declared as

<pre class="code_example">
REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
</pre>

The data would be accessed with the function

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
...
// For a map, get returns a pointer to the stored value (null if Sym has no entry).
const int *currentValue = state-&gt;get&lt;ExampleDataType&gt;(Sym);
</pre>

and set with the function

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
int newValue;
...
ProgramStateRef newState = state-&gt;set&lt;ExampleDataType&gt;(Sym, newValue);
</pre>

<p>In addition, the macros define a data type used for storing the data of the
new data category; the name of this type is the name of the data category with
"Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply
be the passed data type; for the other three macros, this will be a specialized
version of the <a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>,
<a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>,
or <a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a>
templated class. For the <tt>ExampleDataType</tt> example above, the type
created would be equivalent to writing the declaration:

<pre class="code_example">
typedef llvm::ImmutableMap&lt;SymbolRef, int&gt; ExampleDataTypeTy;
</pre>

<p>These macros will cover a majority of use cases; however, they still have a
few limitations. They cannot be used inside namespaces (since they expand to
contain top-level namespace references), and the data types that they define
cannot be referenced from more than one file.

<p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing
one, functions that modify the state will return a copy of the previous state
with the change applied. This updated state must then be provided to the
analyzer core by calling the <tt>CheckerContext::addTransition</tt> function.
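
<p>The immutability rule can be illustrated with a toy model. This is a sketch
only: <tt>ToyState</tt>, <tt>ToyStateRef</tt>, and <tt>ToyContext</tt> are names
invented here and are not Clang's classes, and the map is copied wholesale for
simplicity instead of sharing structure between versions.

```cpp
#include <map>
#include <memory>
#include <utility>

// Toy model of an immutable program state; NOT Clang's ProgramState.
class ToyState;
using ToyStateRef = std::shared_ptr<const ToyState>;

class ToyState {
  std::map<int, int> Data;  // symbol id -> checker-specific integer

public:
  // Returns a pointer to the stored value, or nullptr if the key is absent.
  const int *get(int Sym) const {
    auto It = Data.find(Sym);
    return It == Data.end() ? nullptr : &It->second;
  }

  // Returns a NEW state with the binding applied; *this is left unchanged.
  ToyStateRef set(int Sym, int Value) const {
    auto Copy = std::make_shared<ToyState>(*this);
    Copy->Data[Sym] = Value;
    return Copy;
  }
};

// Stand-in for the analyzer core, which only sees states handed to it.
struct ToyContext {
  ToyStateRef Current = std::make_shared<ToyState>();

  void addTransition(ToyStateRef Next) { Current = std::move(Next); }
};
```

<p>In this model, calling <tt>set</tt> and discarding the result has no effect
on <tt>ToyContext::Current</tt>. The same mistake in a real checker, computing a
new state but never passing it to <tt>addTransition</tt>, silently loses the
update.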
<h2 id=bugs>Bug Reports</h2>


<p> When a checker detects a mistake in the analyzed code, it needs a way to
report it to the analyzer core so that it can be displayed. The two classes used
to construct this report are <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt>
and <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html">
BugReport</a></tt>.

<p>
<tt>BugType</tt>, as the name would suggest, represents a type of bug. The
constructor for <tt>BugType</tt> takes two parameters: the name of the bug
type, and the name of the category of the bug. These are used, for example, in
the summary page generated by the scan-build tool.

<P>
  The <tt>BugReport</tt> class represents a specific occurrence of a bug. In
  the most common case, three parameters are used to form a <tt>BugReport</tt>:
<ol>
<li>The type of bug, specified as an instance of the <tt>BugType</tt> class.
<li>A short descriptive string. This is placed at the location of the bug in
the detailed line-by-line output generated by scan-build.
<li>The context in which the bug occurred. This includes both the location of
the bug in the program and the program's state when the location is reached. These are
both encapsulated in an <tt>ExplodedNode</tt>.
</ol>

| 471 | <p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made |
| 472 | as to whether or not analysis can continue along the current path. This decision |
| 473 | is based on whether the detected bug is one that would prevent the program under |
| 474 | analysis from continuing. For example, leaking of a resource should not stop |
| 475 | analysis, as the program can continue to run after the leak. Dereferencing a |
| 476 | null pointer, on the other hand, should stop analysis, as there is no way for |
| 477 | the program to meaningfully continue after such an error. |
| 478 | |
| 479 | <p>If analysis can continue, then the most recent <tt>ExplodedNode</tt> |
| 480 | generated by the checker can be passed to the <tt>BugReport</tt> constructor |
| 481 | without additional modification. This <tt>ExplodedNode</tt> will be the one |
| 482 | returned by the most recent call to <a |
| 483 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>. |
| 484 | If no transition has been performed during the current callback, the checker should call <a |
| 485 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition()</a> |
| 486 | and use the returned node for bug reporting. |
| 487 | |
| 488 | <p>If analysis cannot continue, then the current state should be transitioned |
| 489 | into a so-called <i>sink node</i>, a node from which no further analysis will be |
| 490 | performed. This is done by calling the <a |
| 491 | href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#adeea33a5a2bed190210c4a2bb807a6f0"> |
| 492 | CheckerContext::generateSink</a> function; this function is the same as the |
| 493 | <tt>addTransition</tt> function, but marks the state as a sink node. Like |
| 494 | <tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated |
| 495 | state, which can then be passed to the <tt>BugReport</tt> constructor. |
| 496 | |
| 497 | <p> |
| 498 | After a <tt>BugReport</tt> is created, it should be passed to the analyzer core |
| 499 | by calling <a href = "http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#ae7738af2cbfd1d713edec33d3203dff5">CheckerContext::emitReport</a>. |
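<p>The choice between continuing and sinking can be illustrated with a small
standalone model. This is only a sketch: <tt>Node</tt>, <tt>Report</tt>,
<tt>ToyContext</tt>, <tt>reportLeak</tt> and <tt>reportNullDeref</tt> are
hypothetical stand-ins written for this example and are not Clang classes or
functions. Only the method names mirror the <tt>CheckerContext</tt> calls
described above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ExplodedNode: a position in the toy graph.
struct Node {
  int id;
  bool isSink; // true if exploration of this path stops here
};

// Hypothetical stand-in for BugReport: bug type, short message, context node.
struct Report {
  std::string bugType;
  std::string message;
  Node where;
};

// Hypothetical stand-in for CheckerContext.
class ToyContext {
  int nextId = 0;
  bool stopped = false;

public:
  std::vector<Report> reports;

  // Models addTransition: creates a node and keeps the path alive.
  Node addTransition() {
    assert(!stopped && "no transitions after a sink");
    return Node{nextId++, false};
  }

  // Models generateSink: creates a node and ends the path.
  Node generateSink() {
    stopped = true;
    return Node{nextId++, true};
  }

  // Models emitReport: hands the report to the (toy) analyzer core.
  void emitReport(Report r) { reports.push_back(std::move(r)); }

  bool pathIsLive() const { return !stopped; }
};

// A leak does not prevent the program from running, so analysis continues.
void reportLeak(ToyContext &C) {
  Node n = C.addTransition();
  C.emitReport({"Resource leak", "Opened file is never closed", n});
}

// A null dereference is fatal, so the path is ended with a sink node.
void reportNullDeref(ToyContext &C) {
  Node n = C.generateSink();
  C.emitReport({"Null dereference", "Dereference of null pointer", n});
}
```

<p>In this model, calling <tt>reportLeak</tt> leaves <tt>pathIsLive()</tt> true,
while calling <tt>reportNullDeref</tt> makes it false. In both cases the node
returned by the transition is the one attached to the report.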
| 500 | |
Anna Zaks | 75a3f48 | 2011-11-02 17:49:20 +0000 | [diff] [blame] | 501 | <h2 id=ast>AST Visitors</h2> |
| 502 | Some checks might not require path-sensitivity to be effective. A simple AST walk |
Anna Zaks | 5259086 | 2011-11-07 05:36:29 +0000 | [diff] [blame] | 503 | might be sufficient. If that is the case, consider implementing a Clang |
| 504 | compiler warning. On the other hand, a check might not be acceptable as a compiler |
Anna Zaks | 75a3f48 | 2011-11-02 17:49:20 +0000 | [diff] [blame] | 505 | warning; for example, because of a relatively high false positive rate. In this |
| 506 | situation, AST callbacks <tt><b>checkASTDecl</b></tt> and |
| 507 | <tt><b>checkASTCodeBody</b></tt> are your best friends. |
| 508 | |
| 509 | <h2 id=testing>Testing</h2> |
| 510 | Every patch should be well tested with Clang regression tests. The checker tests |
| 511 | live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests, |
| 512 | execute the following from the <tt>clang</tt> build directory: |
| 513 | <pre class="code"> |
| 514 | $ <b>TESTDIRS=Analysis make test</b> |
| 515 | </pre> |
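<p>As a rough illustration, a regression test is an ordinary source file whose
comments tell the test runner how to invoke the analyzer and which diagnostics
to expect. The sketch below assumes a hypothetical file such as
<tt>clang/test/Analysis/null-deref-example.cpp</tt> and the <tt>core</tt>
checkers. The function name is made up for this example, and the exact
<tt>RUN</tt> line may differ between Clang versions, so compare it with
existing tests in the folder before reusing it.

```cpp
// RUN: %clang_cc1 -analyze -analyzer-checker=core -verify %s

// The analyzer follows both branches of the 'if'. On the path where 'p'
// is known to be null, the second dereference should be reported.
int deref(int *p) {
  if (p)
    return *p; // no-warning
  return *p; // expected-warning{{Dereference of null pointer}}
}
```

<p>With <tt>-verify</tt>, the test fails if an expected diagnostic is missing or
an unexpected one is produced. The text inside the double braces only needs to
match part of the emitted message.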
| 516 | |
| 517 | <h2 id=commands>Useful Commands/Debugging Hints</h2> |
| 518 | <ul> |
| 519 | <li> |
Anna Zaks | 5259086 | 2011-11-07 05:36:29 +0000 | [diff] [blame] | 520 | While investigating a checker-related issue, instruct the analyzer to only |
| 521 | execute a single checker: |
Anna Zaks | 75a3f48 | 2011-11-02 17:49:20 +0000 | [diff] [blame] | 522 | <br><tt> |
| 523 | $ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b> |
| 524 | </tt> |
| 525 | </li> |
| 526 | <li> |
| 527 | To dump the AST: |
| 528 | <br><tt> |
| 529 | $ <b>clang -cc1 -ast-dump test.c</b> |
| 530 | </tt> |
| 531 | </li> |
| 532 | <li> |
| 533 | To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers: |
| 534 | <br><tt> |
| 535 | $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b> |
| 536 | </tt> |
| 537 | </li> |
| 538 | <li> |
| 539 | To see all available debug checkers: |
| 540 | <br><tt> |
| 541 | $ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b> |
| 542 | </tt> |
| 543 | </li> |
| 544 | <li> |
Anna Zaks | 5259086 | 2011-11-07 05:36:29 +0000 | [diff] [blame] | 545 | To see which function is failing while processing a large file, use the |
| 546 | <tt>-analyzer-display-progress</tt> option. |
Anna Zaks | 75a3f48 | 2011-11-02 17:49:20 +0000 | [diff] [blame] | 547 | </li> |
| 548 | <li> |
Anna Zaks | 5259086 | 2011-11-07 05:36:29 +0000 | [diff] [blame] | 549 | While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt> |
| 550 | instead of <tt>clang --analyze</tt>, as the latter would call the compiler |
| 551 | in a separate process. |
Anna Zaks | 75a3f48 | 2011-11-02 17:49:20 +0000 | [diff] [blame] | 552 | </li> |
| 553 | <li> |
Anna Zaks | 5259086 | 2011-11-07 05:36:29 +0000 | [diff] [blame] | 554 | To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while |
| 555 | debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and |
| 556 | execute: |
Anna Zaks | 75a3f48 | 2011-11-02 17:49:20 +0000 | [diff] [blame] | 557 | <br><tt> |
| 558 | (gdb) <b>p ViewGraph(0)</b> |
| 559 | </tt> |
| 560 | </li> |
| 561 | <li> |
Anna Zaks | e87ad46 | 2011-12-07 19:04:27 +0000 | [diff] [blame] | 562 | To see the <tt>ProgramState</tt> while debugging, use the following command: |
| 563 | <br><tt> |
| 564 | (gdb) <b>p State->dump()</b> |
| 565 | </tt> |
| 566 | </li> |
| 567 | <li> |
Anna Zaks | 5259086 | 2011-11-07 05:36:29 +0000 | [diff] [blame] | 568 | To see a <tt>clang::Expr</tt> while debugging, use the following command. If you |
| 569 | pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the |
| 570 | source code. |
Anna Zaks | 75a3f48 | 2011-11-02 17:49:20 +0000 | [diff] [blame] | 571 | <br><tt> |
| 572 | (gdb) <b>p E->dump()</b> |
| 573 | </tt> |
| 574 | </li> |
| 575 | <li> |
| 576 | To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs to: |
| 577 | <br><tt> |
Anna Zaks | 03e0651 | 2012-01-20 00:11:04 +0000 | [diff] [blame] | 578 | (gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b><br> |
| 579 | (gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b> |
Anna Zaks | 75a3f48 | 2011-11-02 17:49:20 +0000 | [diff] [blame] | 580 | </tt> |
| 581 | </li> |
| 582 | </ul> |
| 583 | |
Anna Zaks | 8cfbaa6 | 2013-05-18 22:51:28 +0000 | [diff] [blame] | 584 | <h2 id=additioninformation>Additional Sources of Information</h2> |
| 585 | |
| 586 | Here are some additional resources that are useful when working on the Clang |
| 587 | Static Analyzer: |
| 588 | |
| 589 | <ul> |
| 590 | <li> <a href="http://clang.llvm.org/doxygen">Clang doxygen</a>. Contains |
| 591 | up-to-date documentation about the APIs available in Clang. Relevant entries |
| 592 | have been linked throughout this page. Also of use is the |
| 593 | <a href="http://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes |
| 594 | from LLVM. |
| 595 | <li> The <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev"> |
| 596 | cfe-dev mailing list</a>. This is the primary mailing list used for |
| 597 | discussion of Clang development (including static code analysis). The |
| 598 | <a href="http://lists.cs.uiuc.edu/pipermail/cfe-dev">archive</a> also contains |
| 599 | a lot of information. |
| 600 | <li> The "Building a Checker in 24 Hours" presentation given at the <a |
| 601 | href="http://llvm.org/devmtg/2012-11">November 2012 LLVM Developers' |
| 602 | Meeting</a>. Describes the construction of SimpleStreamChecker. <a |
| 603 | href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a> |
| 604 | and <a |
| 605 | href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a> |
| 606 | are available. |
| 607 | </ul> |
| 608 | |
Anton Yartsev | 45056dc | 2014-05-19 15:04:55 +0000 | [diff] [blame] | 609 | <h2 id=links>Useful Links</h2> |
| 610 | <ul> |
| 611 | <li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li> |
| 612 | </ul> |
| 613 | |
Anna Zaks | 75a3f48 | 2011-11-02 17:49:20 +0000 | [diff] [blame] | 614 | </div> |
| 615 | </div> |
| 616 | </body> |
| 617 | </html> |