<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Checker Developer Manual</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->

<div id="content">

<h1 style="color:red">This Page Is Under Construction</h1>

<h1>Checker Developer Manual</h1>

<p>The static analyzer engine performs symbolic execution of the program and
relies on a set of checkers to implement the logic for detecting bugs and
constructing bug reports. This page provides hints and guidelines for anyone
who is interested in implementing their own checker. The static analyzer is
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for general developer guidelines and information.</p>

  <ul>
    <li><a href="#start">Getting Started</a></li>
    <li><a href="#analyzer">Static Analyzer Overview</a></li>
    <li><a href="#idea">Idea for a Checker</a></li>
    <li><a href="#registration">Checker Registration</a></li>
    <li><a href="#skeleton">Checker Skeleton</a></li>
    <li><a href="#node">Exploded Node</a></li>
    <li><a href="#bugs">Bug Reports</a></li>
    <li><a href="#ast">AST Visitors</a></li>
    <li><a href="#testing">Testing</a></li>
    <li><a href="#commands">Useful Commands/Debugging Hints</a></li>
  </ul>

<h2 id=start>Getting Started</h2>
  <ul>
    <li>To check out the source code and build the project, follow steps 1-4 of
    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
    page.</li>

    <li>The analyzer source code is located under the Clang source tree:
    <br><tt>
    $ <b>cd llvm/tools/clang</b>
    </tt>
    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
    <tt>test/Analysis</tt>.</li>

    <li>The analyzer regression tests can be executed from Clang's build
    directory:
    <br><tt>
    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
    </tt></li>

    <li>Analyze a file with the specified checker:
    <br><tt>
    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
    </tt></li>

    <li>List the available checkers:
    <br><tt>
    $ <b>clang -cc1 -analyzer-checker-help</b>
    </tt></li>

    <li>See the analyzer help for different output formats, fine tuning, and
    debug options:
    <br><tt>
    $ <b>clang -cc1 -help | grep "analyzer"</b>
    </tt></li>

  </ul>

<h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the
  input values are represented with symbolic values; further, the engine deduces
  the values of all the expressions in the program based on the input symbols
  and the path. The execution is path-sensitive and every possible path through
  the program is explored. The explored execution traces are represented with an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
  represents the corresponding location in the program (or the CFG).
  <tt>ProgramPoint</tt> is also used to record additional information on
  when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
  kind means that the state is the result of purging dead symbols - the
  analyzer's equivalent of garbage collection.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
  represents the abstract state of the program. It consists of:
  <ul>
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
    values
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    <li><tt>GenericDataMap</tt> - constraints on symbolic values
  </ul>

  <h3>Interaction with Checkers</h3>
  Checkers are not merely passive receivers of the changes made by the analyzer
  core - they actively participate in the <tt>ProgramState</tt> construction
  through the <tt>GenericDataMap</tt>, which can be used to store the
  checker-defined part of the state. Each time the analyzer engine explores a
  new statement, it notifies each checker registered to listen for that
  statement, giving it an opportunity to either report a bug or modify the
  state. (As a rule of thumb, the checker itself should be stateless.) The
  checkers are called one after another in a predefined order; thus, calling
  all the checkers adds a chain of nodes to the <tt>ExplodedGraph</tt>.
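  <p>The chaining described above can be sketched with a small standalone
  model. This is only an illustration under simplifying assumptions, not the
  analyzer's implementation: <tt>ToyState</tt>, <tt>ToyNode</tt>,
  <tt>ToyChecker</tt>, <tt>withValue</tt>, <tt>runCheckers</tt> and
  <tt>chainLength</tt> are hypothetical names that do not exist in Clang. Each
  toy checker keeps no state of its own; it receives the current state and
  returns the state to continue with, and every call appends one node to the
  chain.</p>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for ProgramState: an immutable map of checker-defined data.
// (Hypothetical type, not Clang's ProgramState.)
struct ToyState {
  std::map<std::string, int> GenericData;
};
using StateRef = std::shared_ptr<const ToyState>;

// Toy stand-in for ExplodedNode: a state plus a link to its predecessor.
struct ToyNode {
  StateRef State;
  std::shared_ptr<const ToyNode> Pred;
};
using NodeRef = std::shared_ptr<const ToyNode>;

// A toy checker callback: takes the current state and returns the state to
// continue with. The callback itself holds no mutable data.
using ToyChecker = std::function<StateRef(const StateRef &)>;

// Returns a copy of State with Key set to Value; the original is untouched.
StateRef withValue(const StateRef &State, const std::string &Key, int Value) {
  auto Next = std::make_shared<ToyState>(*State);
  Next->GenericData[Key] = Value;
  return Next;
}

// Calls every checker in order; each call appends one node to the chain.
NodeRef runCheckers(NodeRef Pred, const std::vector<ToyChecker> &Checkers) {
  for (const ToyChecker &Check : Checkers)
    Pred = std::make_shared<const ToyNode>(ToyNode{Check(Pred->State), Pred});
  return Pred;
}

// Counts the nodes from N back to the root of the chain.
int chainLength(NodeRef N) {
  int Len = 0;
  for (; N; N = N->Pred)
    ++Len;
  return Len;
}
```

  <p>Running two toy checkers from a root node yields a chain of three nodes.
  The root's state is left unchanged, which is the property that lets
  different paths share earlier states.</p>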

  <h3>Representing Values</h3>
  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
  objects are used to represent the semantic evaluation of expressions.
  They can represent things like concrete
  integers, symbolic values, or memory locations (which are memory regions).
  They are a discriminated union of "values", symbolic and otherwise.
  If a value isn't symbolic, usually that means there is no symbolic
  information to track. For example, if the value was an integer, such as
  <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>,
  and the checker doesn't usually need to track any state with the concrete
  number. In some cases, an <tt>SVal</tt> is not a symbol even though it really
  should be a symbolic value. This happens when the analyzer cannot reason about
  something (yet). An example is floating point numbers. In such cases, the
  <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
  This represents a case that is outside the realm of the analyzer's reasoning
  capabilities. <tt>SVals</tt> are value objects and their values can be viewed
  using the <tt>.dump()</tt> method. Often they wrap persistent objects such as
  symbols or regions.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
  is meant to represent an abstract, but named, symbolic value. Symbols represent
  an actual (immutable) value. We might not know what its specific value is, but
  we can associate constraints with that value as we analyze a path. For
  example, we might record that the value of a symbol is greater than
  <tt>0</tt>, etc.

  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
  It is used to provide a lexicon of how to describe abstract memory. Regions can
  layer on top of other regions, providing a layered approach to representing memory.
  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
  but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
  be used to represent the memory associated with a specific field of that object.
  So how do we represent symbolic memory regions? That's what
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
  symbol is unique and has a unique name, that symbol names the region.

156 <P>
Anna Zaksdb0e1732011-12-07 19:04:24 +0000157 Let's see how the analyzer processes the expressions in the following example:
158 <p>
159 <pre class="code_example">
160 int foo(int x) {
161 int y = x * 2;
162 int z = x;
163 ...
164 }
165 </pre>
  <p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
which references the value <b>currently bound</b> to <tt>x</tt>. That value is
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
and create a new <tt>SVal</tt> that represents their multiplication (which in
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
to the <tt>MemRegion</tt> in the symbolic store.
<br>
The second line, <tt>int z = x;</tt>, is similar. When we evaluate <tt>x</tt>
again, we do the same dance, and create an <tt>SVal</tt> that references the
symbol <tt>$0</tt>. Note that two <tt>SVals</tt> might reference the same
underlying value.

<p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are
unique names for abstract symbolic values. Some MemRegions represent abstract
symbolic chunks of memory, and thus are also based on symbols. SVals are just
references to values, and can reference either MemRegions, Symbols, or concrete
values (e.g., the number 1).
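<p>The walk-through of <tt>foo</tt> can also be replayed with a small
standalone model. This is a rough sketch, not Clang's data structures:
<tt>ToySVal</tt>, <tt>ToyStore</tt>, <tt>load</tt>, <tt>multiply</tt> and
<tt>evaluateFoo</tt> are hypothetical names, regions are identified by a plain
string, and the product of two non-constant values is modelled as a fresh
symbol rather than as a symbolic expression tree. It assumes a C++17
compiler for <tt>std::variant</tt>.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Toy counterparts of the concepts above; these are not Clang's classes.
struct Unknown {};                    // something the model cannot reason about
struct ConcreteInt { int Value; };    // a known number, e.g. 2
struct Symbol { std::string Name; };  // an unknown but fixed value, e.g. $0
struct Region { std::string Name; };  // a named block of memory, e.g. x

// A toy SVal: a discriminated union over the kinds of values.
using ToySVal = std::variant<Unknown, ConcreteInt, Symbol, Region>;

// The toy store maps region names to the values currently bound to them.
using ToyStore = std::map<std::string, ToySVal>;

// Lvalue-to-rvalue conversion: look up the value bound to a region.
ToySVal load(const ToyStore &Store, const Region &R) {
  auto It = Store.find(R.Name);
  return It == Store.end() ? ToySVal{Unknown{}} : It->second;
}

// Multiplies two values: folds two constants, otherwise returns a new symbol.
ToySVal multiply(const ToySVal &L, const ToySVal &R, int &NextSymbol) {
  if (std::holds_alternative<Unknown>(L) || std::holds_alternative<Unknown>(R))
    return Unknown{};
  const auto *A = std::get_if<ConcreteInt>(&L);
  const auto *B = std::get_if<ConcreteInt>(&R);
  if (A && B)
    return ConcreteInt{A->Value * B->Value};
  return Symbol{"$" + std::to_string(NextSymbol++)};
}

// Evaluates the body of foo() and returns the resulting store.
ToyStore evaluateFoo() {
  int NextSymbol = 0;
  ToyStore Store;
  // On entry, x is bound to a fresh symbol, $0.
  Store["x"] = Symbol{"$" + std::to_string(NextSymbol++)};
  // int y = x * 2;  binds a new symbol, $1, to y.
  Store["y"] = multiply(load(Store, Region{"x"}), ConcreteInt{2}, NextSymbol);
  // int z = x;  binds the same symbol that x holds, $0, to z.
  Store["z"] = load(Store, Region{"x"});
  return Store;
}
```

<p>In the store returned by <tt>evaluateFoo()</tt>, <tt>x</tt> and <tt>z</tt>
are distinct regions bound to the same symbol, while <tt>y</tt> is bound to a
different one.</p>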

  <!--
  TODO: Add a picture.
  <br>
  Symbols<br>
  FunctionalObjects are used throughout.
  -->
<h2 id=idea>Idea for a Checker</h2>
  Here are several questions which you should consider when evaluating your
  checker idea:
  <ul>
    <li>Can the check be effectively implemented without path-sensitive
    analysis? See <a href="#ast">AST Visitors</a>.</li>

    <li>How high is the false positive rate going to be? Looking at the
    occurrences of the issue you want to write a checker for in existing code
    bases might give you some ideas.</li>

    <li>How will the current limitations of the analysis affect the false alarm
    rate? Currently, the analyzer only reasons about one procedure at a time (no
    inter-procedural analysis). Also, it uses a simple range-tracking-based
    solver to model symbolic execution.</li>

    <li>Consult the <a
    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&amp;bug_status=NEW&amp;bug_status=REOPENED&amp;version=trunk&amp;component=Static%20Analyzer&amp;product=clang">Bugzilla database</a>
    to get some ideas for new checkers, and consider starting by improving or
    fixing bugs in the existing checkers.</li>
  </ul>
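  <p>To get a feel for what a range-tracking solver can and cannot express,
  here is a small standalone sketch. It is an illustration only and does not
  reflect how the analyzer's solver is implemented: <tt>Range</tt>,
  <tt>Constraints</tt>, <tt>assumeInRange</tt> and <tt>mayBeZero</tt> are
  hypothetical names, and each symbol is assumed to carry a single closed
  integer interval. It assumes a C++17 compiler for
  <tt>std::optional</tt>.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <map>
#include <optional>
#include <string>

// Toy range constraint: the symbol's value lies in [Lo, Hi].
struct Range {
  int Lo = INT_MIN;
  int Hi = INT_MAX;
};

// Toy constraint map from symbol names to ranges.
// Symbols that are not listed are unconstrained.
using Constraints = std::map<std::string, Range>;

// Assumes Lo <= Sym <= Hi on the current path. Returns the narrowed
// constraints, or std::nullopt if the assumption makes the path infeasible.
// The input is taken by value, so the caller's constraints are not modified.
std::optional<Constraints> assumeInRange(Constraints C, const std::string &Sym,
                                         int Lo, int Hi) {
  Range &R = C[Sym];
  R.Lo = std::max(R.Lo, Lo);
  R.Hi = std::min(R.Hi, Hi);
  if (R.Lo > R.Hi)
    return std::nullopt;
  return C;
}

// Reports whether Sym may still be zero under the given constraints.
bool mayBeZero(const Constraints &C, const std::string &Sym) {
  auto It = C.find(Sym);
  return It == C.end() || (It->second.Lo <= 0 && 0 <= It->second.Hi);
}
```

  <p>In this model, a condition such as <tt>x &gt; 0</tt> narrows the range of
  one symbol, so a division by <tt>x</tt> on that path can be shown to be safe.
  A relation between two symbols, such as <tt>x &lt; y</tt>, has no place to be
  recorded, and a check that depends on such relations is likely to produce
  false alarms with this kind of solver.</p>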

<h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in the
  <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. Follow the steps below to
  register a new checker with the analyzer.
<ol>
  <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt>
<pre class="code_example">
#include "ClangSACheckers.h" // declares ento::registerNewChecker()
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

using namespace clang;
using namespace ento;

namespace {
// Notified before each call expression is evaluated.
class NewChecker : public Checker&lt; check::PreStmt&lt;CallExpr&gt; &gt; {
public:
  void checkPreStmt(const CallExpr *CE, CheckerContext &amp;Ctx) const {}
};
} // end anonymous namespace

void ento::registerNewChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;NewChecker&gt;();
}
</pre>

<li>Pick the package name for your checker and add the registration code to
<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note that all checkers
should first be developed as experimental. For example, if the new checker
performs security-related checks, add the following lines under the
<tt>SecurityExperimental</tt> package:
<pre class="code_example">
let ParentPackage = SecurityExperimental in {
...
def NewChecker : Checker&lt;"NewChecker"&gt;,
  HelpText&lt;"This text should give a short description of the checks performed."&gt;,
  DescFile&lt;"NewChecker.cpp"&gt;;
...
} // end "security.experimental"
</pre>

<li>Make the source code file visible to CMake by adding it to
<tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.

<li>Compile and see your checker in the list of available checkers by running:<br>
<tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
</ol>


<h2 id=skeleton>Checker Skeleton</h2>
  There are two main decisions you need to make:
  <ul>
    <li> Which events the checker should be tracking.
    See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a>
    for the list of available checker callbacks.</li>
    <li> What data you want to store as part of the checker-specific program
    state. Try to minimize the checker state as much as possible. </li>
  </ul>

<h2 id=bugs>Bug Reports</h2>

<h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective. A simple AST
  walk might be sufficient. If that is the case, consider implementing a Clang
  compiler warning. On the other hand, a check might not be acceptable as a
  compiler warning, for example because of a relatively high false positive
  rate. In this situation, the AST callbacks <tt><b>checkASTDecl</b></tt> and
  <tt><b>checkASTCodeBody</b></tt> are your best friends.

<h2 id=testing>Testing</h2>
  Every patch should be well tested with Clang regression tests. The checker tests
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
  execute the following from the <tt>clang</tt> build directory:
  <pre class="code">
  $ <b>TESTDIRS=Analysis make test</b>
  </pre>

<h2 id=commands>Useful Commands/Debugging Hints</h2>
<ul>
<li>
While investigating a checker-related issue, instruct the analyzer to only
execute a single checker:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
</tt>
</li>
<li>
To dump the AST:
<br><tt>
$ <b>clang -cc1 -ast-dump test.c</b>
</tt>
</li>
<li>
To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
</tt>
</li>
<li>
To see all available debug checkers:
<br><tt>
$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
</tt>
</li>
<li>
To see which function is failing while processing a large file, use the
<tt>-analyzer-display-progress</tt> option.
</li>
<li>
While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt>
instead of <tt>clang --analyze</tt>, as the latter would call the compiler
in a separate process.
</li>
<li>
To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer)
while debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt>
object and execute:
<br><tt>
(gdb) <b>p ViewGraph(0)</b>
</tt>
</li>
<li>
To see the <tt>ProgramState</tt> while debugging, use the following command:
<br><tt>
(gdb) <b>p State->dump()</b>
</tt>
</li>
<li>
To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
pass in a <tt>SourceManager</tt> object, it will also dump the corresponding
line in the source code.
<br><tt>
(gdb) <b>p E->dump()</b>
</tt>
</li>
<li>
To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs to:
<br><tt>
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
<br>
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
</tt>
</li>
</ul>

</div>
</div>
</body>
</html>