<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Checker Developer Manual</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->

<div id="content">

<h1 style="color:red">This Page Is Under Construction</h1>

<h1>Checker Developer Manual</h1>

<p>The static analyzer engine performs symbolic execution of the program and
relies on a set of checkers to implement the logic for detecting and
constructing bug reports. This page provides hints and guidelines for anyone
who is interested in implementing their own checker. The static analyzer is a
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
and <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for general developer guidelines and information. </p>

    <ul>
      <li><a href="#start">Getting Started</a></li>
      <li><a href="#analyzer">Analyzer Overview</a></li>
      <li><a href="#idea">Idea for a Checker</a></li>
      <li><a href="#registration">Checker Registration</a></li>
      <li><a href="#skeleton">Checker Skeleton</a></li>
      <li><a href="#node">Exploded Node</a></li>
      <li><a href="#bugs">Bug Reports</a></li>
      <li><a href="#ast">AST Visitors</a></li>
      <li><a href="#testing">Testing</a></li>
      <li><a href="#commands">Useful Commands</a></li>
    </ul>

<h2 id=start>Getting Started</h2>
  <ul>
    <li>To check out the source code and build the project, follow steps 1-4 of
    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
    page.</li>

    <li>The analyzer source code is located under the Clang source tree:
    <br><tt>
    $ <b>cd llvm/tools/clang</b>
    </tt>
    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
     <tt>test/Analysis</tt>.</li>

    <li>The analyzer regression tests can be executed from Clang's build
    directory:
    <br><tt>
    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
    </tt></li>

    <li>Analyze a file with the specified checker:
    <br><tt>
    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
    </tt></li>

    <li>List the available checkers:
    <br><tt>
    $ <b>clang -cc1 -analyzer-checker-help</b>
    </tt></li>

    <li>See the analyzer help for different output formats, fine tuning, and
    debug options:
    <br><tt>
    $ <b>clang -cc1 -help | grep "analyzer"</b>
    </tt></li>

  </ul>

<h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the
  input values are represented with symbolic values; further, the engine deduces
  the values of all the expressions in the program based on the input symbols
  and the path. The execution is path sensitive and every possible path through
  the program is explored. The explored execution traces are represented by an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
  represents the corresponding location in the program (or the CFG).
  <tt>ProgramPoint</tt> is also used to record additional information on
  when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
  kind means that the state is the result of purging dead symbols - the
  analyzer's equivalent of garbage collection.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
  represents the abstract state of the program. It consists of:
  <ul>
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
    values
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    <li><tt>GenericDataMap</tt> - constraints on symbolic values
  </ul>
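  <p>
  The relationship between these pieces can be pictured with a small toy model.
  The sketch below is only an illustration: the types reuse the analyzer's names
  but are simplified stand-ins written for this page (plain strings instead of
  real expressions and symbols), and <tt>addNode</tt> and
  <tt>buildBranchExample</tt> are hypothetical helpers, not Clang API. It shows
  how a branch on the value of <tt>x</tt> could yield two successor nodes that
  share a store but differ in their constraints.
  </p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the Clang classes.
struct ProgramPoint {
  int cfgBlock;      // hypothetical CFG block id
  int stmtIndex;     // position inside that block
  std::string kind;  // e.g. "BlockEntrance", "PostPurgeDeadSymbols"
};

struct ProgramState {
  std::map<std::string, std::string> environment;     // expression -> symbolic value
  std::map<std::string, std::string> store;           // memory location -> symbolic value
  std::map<std::string, std::string> genericDataMap;  // constraints on symbolic values
};

struct ExplodedNode {
  ProgramPoint point;
  ProgramState state;
  std::vector<int> successors;  // indices into ExplodedGraph::nodes
};

struct ExplodedGraph {
  std::vector<ExplodedNode> nodes;

  // Adds a node and links it from `pred` (pass -1 for the root node).
  int addNode(ProgramPoint p, ProgramState s, int pred) {
    assert(pred < static_cast<int>(nodes.size()));
    nodes.push_back({std::move(p), std::move(s), {}});
    int id = static_cast<int>(nodes.size()) - 1;
    if (pred >= 0)
      nodes[pred].successors.push_back(id);
    return id;
  }
};

// Builds the two paths that a branch on `x == 0` would produce.
ExplodedGraph buildBranchExample() {
  ExplodedGraph g;

  ProgramState entry;
  entry.store["x"] = "$0";  // x holds an unknown input symbol
  int root = g.addNode({0, 0, "BlockEntrance"}, entry, -1);

  ProgramState zero = entry;  // true branch: $0 is constrained to zero
  zero.genericDataMap["$0"] = "== 0";
  g.addNode({1, 0, "BlockEntrance"}, zero, root);

  ProgramState nonZero = entry;  // false branch: $0 is constrained to non-zero
  nonZero.genericDataMap["$0"] = "!= 0";
  g.addNode({2, 0, "BlockEntrance"}, nonZero, root);

  return g;
}
```

  <p>
  In this model, every path through the program is a walk from the root node
  along <tt>successors</tt>, and each node carries the full state that holds at
  its program point.
  </p>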

  <h3>Interaction with Checkers</h3>
  Checkers are not merely passive receivers of the analyzer core changes - they
  actively participate in the <tt>ProgramState</tt> construction through the
  <tt>GenericDataMap</tt>, which can be used to store the checker-defined part
  of the state. Each time the analyzer engine explores a new statement, it
  notifies each checker registered to listen for that statement, giving it an
  opportunity to either report a bug or modify the state. (As a rule of thumb,
  the checker itself should be stateless.) The checkers are called one after another
  in a predefined order; thus, calling all the checkers adds a chain to the
  <tt>ExplodedGraph</tt>.
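  <p>
  One way to picture this is to treat each checker callback as a pure function
  from an immutable state to a new state. The following is a minimal sketch
  under that assumption; <tt>State</tt>, <tt>withValue</tt>,
  <tt>visitStatement</tt>, and the two example checkers are hypothetical names
  invented for this illustration and do not correspond to the analyzer's real
  interfaces.
  </p>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy immutable program state; only the checker-defined part is modelled.
struct State {
  std::map<std::string, int> checkerData;
};
using StateRef = std::shared_ptr<const State>;

// A checker callback is stateless: everything it knows lives in the state.
using Checker = std::function<StateRef(const std::string &stmt, StateRef)>;

// Returns a copy of `s` with one entry changed; `s` itself is never modified.
StateRef withValue(const StateRef &s, const std::string &key, int value) {
  auto next = std::make_shared<State>(*s);
  next->checkerData[key] = value;
  return next;
}

// Calls the checkers in order. Each result becomes the input of the next
// checker, so one statement contributes a chain of states to the graph.
std::vector<StateRef> visitStatement(const std::string &stmt, StateRef in,
                                     const std::vector<Checker> &checkers) {
  assert(in);
  std::vector<StateRef> chain{in};
  for (const Checker &check : checkers)
    chain.push_back(check(stmt, chain.back()));
  return chain;
}

std::vector<StateRef> runExample() {
  // Counts every call statement seen on the current path.
  Checker countCalls = [](const std::string &stmt, StateRef s) -> StateRef {
    if (stmt.rfind("call ", 0) != 0)
      return s;  // not a call: state unchanged
    auto it = s->checkerData.find("calls");
    int seen = (it == s->checkerData.end()) ? 0 : it->second;
    return withValue(s, "calls", seen + 1);
  };
  // Remembers that a (hypothetical) resource has been opened.
  Checker trackOpen = [](const std::string &stmt, StateRef s) -> StateRef {
    return stmt == "call open" ? withValue(s, "open", 1) : s;
  };
  return visitStatement("call open", std::make_shared<const State>(),
                        {countCalls, trackOpen});
}
```

  <p>
  Because every checker returns a new state instead of mutating shared data,
  states that belong to different paths never interfere with each other, which
  is the main reason for keeping the checker object itself stateless.
  </p>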

  <h3>Representing Values</h3>
  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
  objects are used to represent the semantic evaluation of expressions. They can
  represent things like concrete integers, symbolic values, or memory locations
  (which are memory regions). They are a discriminated union of "values",
  symbolic and otherwise.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
  is meant to represent an abstract, but named, symbolic value.
  Symbolic values can have constraints associated with them. A symbol represents
  an actual (immutable) value. We might not know what its specific value is, but
  we can associate constraints with that value as we analyze a path.
  <p>

  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
  It is used to provide a lexicon of how to describe abstract memory. Regions can
  layer on top of other regions, providing a layered approach to representing memory.
  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
  but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
  be used to represent the memory associated with a specific field of that object.
  So how do we represent symbolic memory regions? That's what <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
  symbol is unique and has a unique name, that symbol names the region.
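  <p>
  The layering of regions can be sketched with a small tree in which every
  subregion points at the region that contains it. This is a simplified model
  written for this page, not the real <tt>MemRegion</tt> hierarchy:
  <tt>Region</tt>, <tt>makeVar</tt>, <tt>makeSymbolic</tt>, <tt>makeField</tt>,
  <tt>baseOf</tt>, and <tt>describe</tt> are hypothetical names, and the printed
  format is chosen only for readability.
  </p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy region: a kind, a name, and an optional enclosing (super) region.
struct Region {
  enum Kind { Var, Field, Symbolic } kind;
  std::string name;                     // variable, field, or symbol name
  std::shared_ptr<const Region> super;  // null for base regions
};
using RegionRef = std::shared_ptr<const Region>;

// A named variable, e.g. a struct object on the stack.
RegionRef makeVar(std::string name) {
  return std::make_shared<const Region>(
      Region{Region::Var, std::move(name), nullptr});
}

// A chunk of memory we know nothing about except the symbol that names it.
RegionRef makeSymbolic(std::string symbol) {
  return std::make_shared<const Region>(
      Region{Region::Symbolic, std::move(symbol), nullptr});
}

// A field layered on top of another region.
RegionRef makeField(RegionRef base, std::string field) {
  assert(base);
  return std::make_shared<const Region>(
      Region{Region::Field, std::move(field), std::move(base)});
}

// Walks up the super-region chain to the outermost region.
RegionRef baseOf(RegionRef r) {
  assert(r);
  while (r->super)
    r = r->super;
  return r;
}

// Produces a readable name by walking from the subregion to its base.
std::string describe(const Region &r) {
  switch (r.kind) {
  case Region::Var:
    return r.name;
  case Region::Symbolic:
    return "SymRegion{" + r.name + "}";
  case Region::Field:
    return describe(*r.super) + "." + r.name;
  }
  return "";
}
```

  <p>
  With these helpers, <tt>describe(*makeField(makeVar("s"), "f"))</tt> yields
  <tt>s.f</tt>, while a field of memory reached through an unknown pointer is
  described relative to a symbolic base, as in <tt>SymRegion{$0}.f</tt>.
  </p>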
  <p>
  Let's see how the analyzer processes the expressions in the following example:
  <p>
  <pre class="code_example">
  int foo(int x) {
     int y = x * 2;
     int z = x;
     ...
  }
  </pre>
  <p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
which references the value <b>currently bound</b> to <tt>x</tt>. That value is
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
and create a new <tt>SVal</tt> that represents their multiplication (which in
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
to the <tt>MemRegion</tt> in the symbolic store.
<br>
The second line is similar. When we evaluate <tt>x</tt> again, we do the same
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt>
might reference the same underlying value.

<p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are
unique names for abstract symbolic values. Some MemRegions represent abstract
symbolic chunks of memory, and thus are also based on symbols. SVals are just
references to values, and can reference either MemRegions, Symbols, or concrete
values (e.g., the number 1).
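
<p>
The walkthrough above can be replayed with a toy evaluator. The sketch below is
an illustration only: it models an <tt>SVal</tt> as a <tt>std::variant</tt> over
three hypothetical alternatives (<tt>ConcreteInt</tt>, <tt>Symbol</tt>,
<tt>Loc</tt>), names regions by plain strings, and gives every non-constant
product a fresh symbol. The <tt>Engine</tt> type and its methods are invented
for this example and are much simpler than what the analyzer actually does.
</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Toy value kinds; an SVal is a discriminated union of them.
struct ConcreteInt { long value; };
struct Symbol { std::string name; };  // e.g. "$0"
struct Loc { std::string region; };   // memory location, named by its region
using SVal = std::variant<ConcreteInt, Symbol, Loc>;

struct Engine {
  std::map<std::string, SVal> store;  // region -> value currently bound to it
  int nextSymbol = 0;

  Symbol freshSymbol() { return Symbol{"$" + std::to_string(nextSymbol++)}; }

  // Evaluating a variable name first yields its lvalue: a location.
  SVal lvalue(const std::string &var) { return Loc{var}; }

  // Lvalue-to-rvalue conversion: look up the value bound to the region.
  // An unbound region (such as a parameter on entry) gets a fresh symbol.
  SVal load(const SVal &loc) {
    assert(std::holds_alternative<Loc>(loc));
    const std::string &region = std::get<Loc>(loc).region;
    auto it = store.find(region);
    if (it == store.end())
      it = store.emplace(region, freshSymbol()).first;
    return it->second;
  }

  // Two constants fold; anything else becomes a new symbolic value.
  SVal multiply(const SVal &lhs, const SVal &rhs) {
    const ConcreteInt *a = std::get_if<ConcreteInt>(&lhs);
    const ConcreteInt *b = std::get_if<ConcreteInt>(&rhs);
    if (a && b)
      return ConcreteInt{a->value * b->value};
    return freshSymbol();
  }

  // Assignment: bind the value to the region in the store.
  void bind(const SVal &loc, const SVal &value) {
    assert(std::holds_alternative<Loc>(loc));
    store[std::get<Loc>(loc).region] = value;
  }
};

// Replays `int y = x * 2; int z = x;` from the example function.
Engine analyzeFoo() {
  Engine e;
  SVal x = e.load(e.lvalue("x"));                // x is bound to $0
  SVal product = e.multiply(x, ConcreteInt{2});  // new symbolic value $1
  e.bind(e.lvalue("y"), product);                // y -> $1
  e.bind(e.lvalue("z"), e.load(e.lvalue("x")));  // z -> $0, same as x
  return e;
}
```

<p>
After <tt>analyzeFoo()</tt> runs, the toy store binds both <tt>x</tt> and
<tt>z</tt> to <tt>$0</tt> and <tt>y</tt> to <tt>$1</tt>, mirroring the point
that two values may reference the same underlying symbol.
</p>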

  <!--
  TODO: Add a picture.
  <br>
  Symbols<br>
  FunctionalObjects are used throughout.
  -->
<h2 id=idea>Idea for a Checker</h2>
  Here are several questions which you should consider when evaluating your
  checker idea:
  <ul>
    <li>Can the check be effectively implemented without path-sensitive
    analysis? See <a href="#ast">AST Visitors</a>.</li>

    <li>How high is the false positive rate going to be? Looking at the occurrences
    of the issue you want to write a checker for in the existing code bases might
    give you some ideas. </li>

    <li>How will the current limitations of the analysis affect the false alarm
    rate? Currently, the analyzer only reasons about one procedure at a time (no
    inter-procedural analysis). Also, it uses a simple range-tracking-based
    solver to model symbolic execution.</li>

    <li>Consult the <a
    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&amp;bug_status=NEW&amp;bug_status=REOPENED&amp;version=trunk&amp;component=Static%20Analyzer&amp;product=clang">Bugzilla database</a>
    to get some ideas for new checkers and consider starting with improving/fixing
    bugs in the existing checkers.</li>
  </ul>
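
  <p>
  To get a feel for what a range-tracking solver can and cannot express, consider
  the following toy version. It is only a sketch under the assumption that each
  symbol is associated with a single closed interval; <tt>Range</tt>,
  <tt>RangeConstraints</tt>, and the method names are hypothetical and are not
  the analyzer's constraint manager.
  </p>

```cpp
#include <cassert>
#include <climits>
#include <map>
#include <string>

// Closed interval [lo, hi]; an unconstrained symbol spans the whole range.
struct Range {
  long lo = LONG_MIN;
  long hi = LONG_MAX;
  bool empty() const { return lo > hi; }
};

struct RangeConstraints {
  std::map<std::string, Range> ranges;  // symbol name -> known range

  // Records the assumption `sym >= bound`.
  // Returns false if the assumption makes the current path infeasible.
  bool assumeAtLeast(const std::string &sym, long bound) {
    Range &r = ranges[sym];
    if (bound > r.lo)
      r.lo = bound;
    return !r.empty();
  }

  // Records the assumption `sym <= bound`.
  // Returns false if the assumption makes the current path infeasible.
  bool assumeAtMost(const std::string &sym, long bound) {
    Range &r = ranges[sym];
    if (bound < r.hi)
      r.hi = bound;
    return !r.empty();
  }

  // A divide-by-zero style check only needs to know whether 0 is possible.
  bool canBeZero(const std::string &sym) const {
    auto it = ranges.find(sym);
    if (it == ranges.end())
      return true;  // nothing known: zero cannot be ruled out
    return it->second.lo <= 0 && 0 <= it->second.hi;
  }
};
```

  <p>
  Such a model handles comparisons against constants well (after assuming
  <tt>$0 &gt;= 1</tt>, zero is ruled out), but it has no way to record a
  relation between two symbols, which is one source of false alarms to keep in
  mind when evaluating a checker idea.
  </p>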

<h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in the <tt>clang/lib/StaticAnalyzer/Checkers</tt>
  folder. Follow the steps below to register a new checker with the analyzer.
<ol>
  <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt>
<pre class="code_example">
using namespace clang;
using namespace ento;

namespace {
// Called before every call expression is evaluated.
class NewChecker: public Checker&lt; check::PreStmt&lt;CallExpr&gt; &gt; {
public:
  void checkPreStmt(const CallExpr *CE, CheckerContext &amp;Ctx) const {}
};
}
void ento::registerNewChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;NewChecker&gt;();
}
</pre>

<li>Pick the package name for your checker and add the registration code to
<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note that all checkers should
first be developed as experimental. For example, if our new checker performs
security-related checks, we should add the following lines under the
<tt>SecurityExperimental</tt> package:
<pre class="code_example">
let ParentPackage = SecurityExperimental in {
...
def NewChecker : Checker&lt;"NewChecker"&gt;,
  HelpText&lt;"This text should give a short description of the checks performed."&gt;,
  DescFile&lt;"NewChecker.cpp"&gt;;
...
} // end "security.experimental"
</pre>

<li>Make the source code file visible to CMake by adding it to
<tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.

<li>Compile and see your checker in the list of available checkers by running:<br>
<tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
</ol>


<h2 id=skeleton>Checker Skeleton</h2>
  There are two main decisions you need to make:
  <ul>
    <li> Which events the checker should be tracking.
    See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a>
    for the list of available checker callbacks.</li>
    <li> What data you want to store as part of the checker-specific program
    state. Try to minimize the checker state as much as possible. </li>
  </ul>

<h2 id=bugs>Bug Reports</h2>

<h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective. A simple AST walk
  might be sufficient. If that is the case, consider implementing a Clang
  compiler warning. On the other hand, a check might not be acceptable as a compiler
  warning; for example, because of a relatively high false positive rate. In this
  situation, AST callbacks <tt><b>checkASTDecl</b></tt> and
  <tt><b>checkASTCodeBody</b></tt> are your best friends.
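
  <p>
  To illustrate what "a simple AST walk" means, here is a toy recursive walk that
  flags calls to a given function regardless of the path that reaches them. It is
  a sketch only: <tt>AstNode</tt> is a hypothetical, string-based stand-in for
  Clang's AST classes, and <tt>call</tt>, <tt>countCallsTo</tt>, and
  <tt>exampleFunction</tt> are names invented for this example.
  </p>

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy syntax tree node; real Clang AST nodes are far richer.
struct AstNode {
  std::string kind;  // e.g. "FunctionDecl", "IfStmt", "CallExpr"
  std::string name;  // declared or called name, empty when not applicable
  std::vector<AstNode> children;
};

// Convenience constructor for a call to `callee`.
AstNode call(std::string callee) {
  return AstNode{"CallExpr", std::move(callee), {}};
}

// Visits every node once and counts calls to `callee`. No program state is
// tracked, so the result does not depend on which paths are feasible.
int countCallsTo(const AstNode &node, const std::string &callee) {
  assert(!callee.empty());
  int found = (node.kind == "CallExpr" && node.name == callee) ? 1 : 0;
  for (const AstNode &child : node.children)
    found += countCallsTo(child, callee);
  return found;
}

// Models: void f() { if (...) gets(...); puts(...); gets(...); }
AstNode exampleFunction() {
  AstNode branch{"IfStmt", "", {}};
  branch.children.push_back(call("gets"));

  AstNode fn{"FunctionDecl", "f", {}};
  fn.children.push_back(std::move(branch));
  fn.children.push_back(call("puts"));
  fn.children.push_back(call("gets"));
  return fn;
}
```

  <p>
  A check of this shape reports both calls to <tt>gets</tt> in the example, even
  the one guarded by a condition, which is exactly the kind of check that does
  not need symbolic execution.
  </p>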

<h2 id=testing>Testing</h2>
  Every patch should be well tested with Clang regression tests. The checker tests
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
  execute the following from the <tt>clang</tt> build directory:
    <pre class="code">
    $ <b>TESTDIRS=Analysis make test</b>
    </pre>

<h2 id=commands>Useful Commands/Debugging Hints</h2>
<ul>
<li>
While investigating a checker-related issue, instruct the analyzer to only
execute a single checker:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
</tt>
</li>
<li>
To dump the AST:
<br><tt>
$ <b>clang -cc1 -ast-dump test.c</b>
</tt>
</li>
<li>
To view/dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
</tt>
</li>
<li>
To see all available debug checkers:
<br><tt>
$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
</tt>
</li>
<li>
To see which function is failing while processing a large file, use the
<tt>-analyzer-display-progress</tt> option.
</li>
<li>
While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt>
instead of <tt>clang --analyze</tt>, as the latter would call the compiler
in a separate process.
</li>
<li>
To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while
debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and
execute:
<br><tt>
(gdb) <b>p ViewGraph(0)</b>
</tt>
</li>
<li>
To see the <tt>ProgramState</tt> while debugging, use the following command:
<br><tt>
(gdb) <b>p State->dump()</b>
</tt>
</li>
<li>
To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
pass in a SourceManager object, it will also dump the corresponding line in the
source code.
<br><tt>
(gdb) <b>p E->dump()</b>
</tt>
</li>
<li>
To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs to:
<br><tt>
(gdb) <b>p ENode->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
</tt>
</li>
</ul>

</div>
</div>
</body>
</html>