<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Checker Developer Manual</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->

<div id="content">

<h3 style="color:red">This Page Is Under Construction</h3>

<h1>Checker Developer Manual</h1>

<p>The static analyzer engine performs path-sensitive exploration of the program and
relies on a set of checkers to implement the logic for detecting and
constructing specific bug reports. Anyone who is interested in implementing their own
checker should check out the "Building a Checker in 24 Hours" talk
(<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
 <a href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>)
and refer to this page for additional information on writing a checker. The static analyzer is
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for developer guidelines, and send your questions and proposals to the
<a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev mailing list</a>.
</p>

 <ul>
  <li><a href="#start">Getting Started</a></li>
  <li><a href="#analyzer">Static Analyzer Overview</a>
   <ul>
    <li><a href="#interaction">Interaction with Checkers</a></li>
    <li><a href="#values">Representing Values</a></li>
   </ul></li>
  <li><a href="#idea">Idea for a Checker</a></li>
  <li><a href="#registration">Checker Registration</a></li>
  <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li>
  <li><a href="#extendingstates">Custom Program States</a></li>
  <li><a href="#bugs">Bug Reports</a></li>
  <li><a href="#ast">AST Visitors</a></li>
  <li><a href="#testing">Testing</a></li>
  <li><a href="#commands">Useful Commands/Debugging Hints</a>
   <ul>
    <li><a href="#attaching">Attaching the Debugger</a></li>
    <li><a href="#narrowing">Narrowing Down the Problem</a></li>
    <li><a href="#visualizing">Visualizing the Analysis</a></li>
    <li><a href="#debugprints">Debug Prints and Tricks</a></li>
   </ul></li>
  <li><a href="#additioninformation">Additional Sources of Information</a></li>
  <li><a href="#links">Useful Links</a></li>
 </ul>

<h2 id=start>Getting Started</h2>
  <ul>
    <li>To check out the source code and build the project, follow steps 1-4 of
    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
    page.</li>

    <li>The analyzer source code is located under the Clang source tree:
    <br><tt>
    $ <b>cd llvm/tools/clang</b>
    </tt>
    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
     <tt>test/Analysis</tt>.</li>

    <li>The analyzer regression tests can be executed from Clang's build
    directory:
    <br><tt>
    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
    </tt></li>

    <li>Analyze a file with the specified checker:
    <br><tt>
    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
    </tt></li>

    <li>List the available checkers:
    <br><tt>
    $ <b>clang -cc1 -analyzer-checker-help</b>
    </tt></li>

    <li>See the analyzer help for different output formats, fine tuning, and
    debug options:
    <br><tt>
    $ <b>clang -cc1 -help | grep "analyzer"</b>
    </tt></li>

  </ul>

<h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the
  input values are represented with symbolic values; further, the engine deduces
  the values of all the expressions in the program based on the input symbols
  and the path. The execution is path sensitive and every possible path through
  the program is explored. The explored execution traces are represented with an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
  represents the corresponding location in the program (or the CFG).
  <tt>ProgramPoint</tt> is also used to record additional information on
  when/how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
  kind means that the state is the result of purging dead symbols - the
  analyzer's equivalent of garbage collection.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
  represents the abstract state of the program. It consists of:
  <ul>
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
    values
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    <li><tt>GenericDataMap</tt> - constraints on symbolic values
  </ul>
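  <p>
  To make the node structure and path sensitivity more concrete, here is a
  self-contained C++ sketch of a tiny exploded graph. It is only an analogy under
  simplifying assumptions: every name in it (<tt>ToyState</tt>, <tt>ToyNode</tt>,
  <tt>ToyGraph</tt>, <tt>assumeGreaterThanZero</tt>, <tt>exploreBranch</tt>) is
  hypothetical, a program point is just a string label, and the only state is a
  range constraint on the input symbol <tt>$0</tt>. The analyzer's real types are
  the <tt>ExplodedGraph</tt>, <tt>ExplodedNode</tt>, <tt>ProgramPoint</tt>, and
  <tt>ProgramState</tt> classes linked above.
  </p>
<pre class="code_example">
#include &lt;algorithm&gt;
#include &lt;cassert&gt;
#include &lt;climits&gt;
#include &lt;cstddef&gt;
#include &lt;optional&gt;
#include &lt;string&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// Hypothetical toy model, not the analyzer's API. A "state" here is only
// the range constraint Lo &lt;= $0 &lt;= Hi on the input symbol $0 (x on entry).
struct ToyState {
  long long Lo = LLONG_MIN;
  long long Hi = LLONG_MAX;
};

// A node pairs a program point (a plain label here) with a state.
struct ToyNode {
  std::string Point;
  ToyState State;
  std::vector&lt;std::size_t&gt; Succs;  // indices of successor nodes
};

struct ToyGraph {
  std::vector&lt;ToyNode&gt; Nodes;

  // Append a node and, if it has a predecessor, add the connecting edge.
  std::size_t addNode(std::string Point, ToyState S,
                      std::optional&lt;std::size_t&gt; Pred) {
    Nodes.push_back(ToyNode{std::move(Point), S, {}});
    std::size_t Id = Nodes.size() - 1;
    if (Pred)
      Nodes[*Pred].Succs.push_back(Id);
    return Id;
  }
};

// Add the assumption "$0 &gt; 0" (or its negation) to a state. Returns no
// state when the assumption contradicts the existing constraint.
std::optional&lt;ToyState&gt; assumeGreaterThanZero(ToyState S, bool Holds) {
  if (Holds)
    S.Lo = std::max(S.Lo, 1LL);
  else
    S.Hi = std::min(S.Hi, 0LL);
  if (S.Lo &gt; S.Hi)
    return std::nullopt;  // infeasible path: no node will be created
  return S;
}

// Explore "if (x &gt; 0) { ... } else { ... }": each feasible branch gets its
// own successor node carrying its own constraints.
ToyGraph exploreBranch(ToyState Entry) {
  ToyGraph G;
  std::size_t Root = G.addNode("entry", Entry, std::nullopt);
  if (auto Then = assumeGreaterThanZero(Entry, true))
    G.addNode("then", *Then, Root);
  if (auto Else = assumeGreaterThanZero(Entry, false))
    G.addNode("else", *Else, Root);
  return G;
}
</pre>
  <p>
  Starting from an unconstrained <tt>$0</tt>, both branches of
  <tt>if (x &gt; 0)</tt> are feasible, so the entry node gets two successors with
  different constraints; starting from <tt>5 &lt;= $0 &lt;= 10</tt>, the
  <tt>else</tt> branch is infeasible and only the <tt>then</tt> node is created.
  </p>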

  <h3 id=interaction>Interaction with Checkers</h3>

  <p>
  Checkers are not merely passive receivers of changes made by the analyzer
  core - they actively participate in the <tt>ProgramState</tt> construction
  through the <tt>GenericDataMap</tt>, which can be used to store the
  checker-defined part of the state. Each time the analyzer engine explores a
  new statement, it notifies each checker registered to listen for that
  statement, giving it an opportunity to either report a bug or modify the
  state. (As a rule of thumb, the checker itself should be stateless.) The
  checkers are called one after another in a predefined order; thus, calling
  all the checkers adds a chain of nodes to the <tt>ExplodedGraph</tt>.
  </p>
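
  <p>
  The notification order can be pictured with a small, analyzer-independent C++
  sketch. It assumes, purely for illustration, that the checker-defined part of
  a state is a list of strings and that each checker callback is a function from
  the current state to a new state; <tt>ToyState</tt>, <tt>ToyCallback</tt>, and
  <tt>runCheckers</tt> are hypothetical names. Because the callbacks return new
  states instead of updating member variables, the checkers themselves stay
  stateless, and running them in order yields a chain of states.
  </p>
<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;functional&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical toy model, not the analyzer's API: the checker-defined part
// of a state is a list of facts, and a checker callback maps the current
// state to a (possibly) updated one without storing anything itself.
using ToyState = std::vector&lt;std::string&gt;;
using ToyCallback = std::function&lt;ToyState(const ToyState &amp;)&gt;;

// Notify every checker registered for a statement, in a fixed order. Each
// callback produces a new state, so one statement yields a chain of states,
// much like the chain of nodes the engine adds to the ExplodedGraph.
std::vector&lt;ToyState&gt; runCheckers(const ToyState &amp;Start,
                                    const std::vector&lt;ToyCallback&gt; &amp;Checkers) {
  std::vector&lt;ToyState&gt; Chain{Start};
  for (const ToyCallback &amp;Check : Checkers)
    Chain.push_back(Check(Chain.back()));
  return Chain;
}
</pre>
  <p>
  For example, two callbacks that append <tt>"A"</tt> and <tt>"B"</tt> turn an
  empty starting state into the chain <tt>{}</tt>, <tt>{"A"}</tt>,
  <tt>{"A", "B"}</tt>.
  </p>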

  <h3 id=values>Representing Values</h3>

  <p>
  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
  objects are used to represent the semantic evaluation of expressions.
  They can represent things like concrete
  integers, symbolic values, or memory locations (which are memory regions).
  They are a discriminated union of "values", symbolic and otherwise.
  If a value isn't symbolic, usually that means there is no symbolic
  information to track. For example, if the value were an integer, such as
  <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>,
  and the checker doesn't usually need to track any state with the concrete
  number. In some cases, <tt>SVal</tt> is not a symbol, but it really should be
  a symbolic value. This happens when the analyzer cannot reason about something
  (yet). Floating-point values are an example. In such cases, the
  <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
  This represents a case that is outside the realm of the analyzer's reasoning
  capabilities. <tt>SVals</tt> are value objects and their values can be viewed
  using the <tt>.dump()</tt> method. Often they wrap persistent objects such as
  symbols or regions.
  </p>

  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
  is meant to represent an abstract, but named, symbolic value. Each symbol represents
  an actual (immutable) value. We might not know what that specific value is, but
  we can associate constraints with it as we analyze a path. For
  example, we might record that the value of a symbol is greater than
  <tt>0</tt>, etc.
  </p>

  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
  It is used to provide a lexicon of how to describe abstract memory. Regions can
  layer on top of other regions, providing a layered approach to representing memory.
  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
  while a <tt>FieldRegion</tt>, which is a subregion of the <tt>VarRegion</tt>, could
  be used to represent the memory associated with a specific field of that object.
  So how do we represent symbolic memory regions? That's what
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
  symbol is unique and has a unique name, that symbol names the region.
  </p>

  <p>
  Let's see how the analyzer processes the expressions in the following example:
  </p>

  <pre class="code_example">
  int foo(int x) {
    int y = x * 2;
    int z = x;
    ...
  }
  </pre>

  <p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
which references the value <b>currently bound</b> to <tt>x</tt>. That value is
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
and create a new <tt>SVal</tt> that represents their multiplication (which in
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
to the <tt>MemRegion</tt> in the symbolic store.
<br>
The second line is similar. When we evaluate <tt>x</tt> again, we do the same
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt>
might reference the same underlying values.
  </p>

<p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are
unique names for abstract symbolic values. Some MemRegions represent abstract
symbolic chunks of memory, and thus are also based on symbols. SVals are just
references to values, and can reference either MemRegions, Symbols, or concrete
values (e.g., the number 1).
</p>
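
<p>
The walkthrough above can be mimicked with a small, self-contained C++ sketch
that uses <tt>std::variant</tt> as the discriminated union of values. Everything
in it is a simplifying assumption: <tt>ToySVal</tt>, <tt>ToyStore</tt>,
<tt>load</tt>, <tt>multiply</tt>, and <tt>analyzeFoo</tt> are hypothetical names,
regions are identified by variable name, and a symbolic expression is just a
string (so the value the text calls <tt>$1</tt> appears as <tt>($0 * 2)</tt>).
The analyzer's real <tt>SVal</tt>, <tt>SymExpr</tt>, and <tt>MemRegion</tt>
hierarchies are far richer.
</p>
<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;map&gt;
#include &lt;string&gt;
#include &lt;variant&gt;

// Hypothetical stand-ins for the concepts above, not the analyzer's classes.
struct ConcreteInt { long long Value; };  // a known number, e.g. 2
struct SymbolVal { std::string Expr; };   // a symbolic value, e.g. "$0"
struct RegionVal { std::string Name; };   // an lvalue naming a memory region
struct UnknownVal {};                     // outside our reasoning capabilities

// A discriminated union of "values", symbolic and otherwise.
using ToySVal = std::variant&lt;ConcreteInt, SymbolVal, RegionVal, UnknownVal&gt;;

// The symbolic store binds memory regions (identified by name) to values.
using ToyStore = std::map&lt;std::string, ToySVal&gt;;

// Render a value as text.
std::string str(const ToySVal &amp;V) {
  if (const auto *C = std::get_if&lt;ConcreteInt&gt;(&amp;V))
    return std::to_string(C-&gt;Value);
  if (const auto *S = std::get_if&lt;SymbolVal&gt;(&amp;V))
    return S-&gt;Expr;
  if (const auto *R = std::get_if&lt;RegionVal&gt;(&amp;V))
    return "&amp;" + R-&gt;Name;
  return "unknown";
}

// Lvalue-to-rvalue conversion: read the value currently bound to a region.
ToySVal load(const ToyStore &amp;Store, const RegionVal &amp;R) {
  auto It = Store.find(R.Name);
  return It == Store.end() ? ToySVal{UnknownVal{}} : It-&gt;second;
}

// Multiplication folds two concrete integers, builds a new symbolic
// expression when a symbol is involved, and gives up on unknown operands.
ToySVal multiply(const ToySVal &amp;L, const ToySVal &amp;R) {
  if (std::holds_alternative&lt;UnknownVal&gt;(L) ||
      std::holds_alternative&lt;UnknownVal&gt;(R))
    return UnknownVal{};
  const auto *LC = std::get_if&lt;ConcreteInt&gt;(&amp;L);
  const auto *RC = std::get_if&lt;ConcreteInt&gt;(&amp;R);
  if (LC &amp;&amp; RC)
    return ConcreteInt{LC-&gt;Value * RC-&gt;Value};
  return SymbolVal{"(" + str(L) + " * " + str(R) + ")"};
}

// Models "int y = x * 2; int z = x;" from the example above.
ToyStore analyzeFoo() {
  ToyStore Store;
  Store["x"] = SymbolVal{"$0"};                 // x holds its initial value $0
  ToySVal XVal = load(Store, RegionVal{"x"});   // lvalue of x, then value $0
  Store["y"] = multiply(XVal, ConcreteInt{2});  // y is bound to ($0 * 2)
  Store["z"] = load(Store, RegionVal{"x"});     // z references $0 again
  return Store;
}
</pre>
<p>
After <tt>analyzeFoo()</tt>, the store maps <tt>y</tt> to <tt>($0 * 2)</tt> and
<tt>z</tt> to <tt>$0</tt>: two different bindings referencing the same
underlying symbol, as described above.
</p>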

  <!--
  TODO: Add a picture.
  <br>
  Symbols<br>
  FunctionalObjects are used throughout.
  -->

<h2 id=idea>Idea for a Checker</h2>
  Here are several questions which you should consider when evaluating your
  checker idea:
  <ul>
    <li>Can the check be effectively implemented without path-sensitive
    analysis? See <a href="#ast">AST Visitors</a>.</li>

    <li>How high is the false positive rate going to be? Looking at the occurrences
    of the issue you want to write a checker for in existing code bases might
    give you some ideas.</li>

    <li>How will the current limitations of the analysis affect the false alarm
    rate? Currently, the analyzer reasons about one translation unit at a time:
    calls to functions defined in that translation unit may be inlined, while
    calls to functions whose bodies are not available are modeled conservatively.
    Also, it uses a simple range-tracking constraint solver to model symbolic
    execution.</li>

    <li>Consult the <a
    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&amp;bug_status=NEW&amp;bug_status=REOPENED&amp;version=trunk&amp;component=Static%20Analyzer&amp;product=clang">Bugzilla database</a>
    to get some ideas for new checkers, and consider starting with improving/fixing
    bugs in the existing checkers.</li>
  </ul>

<p>Once an idea for a checker has been chosen, there are two key decisions that
need to be made:
  <ul>
    <li> Which events the checker should be tracking. This is discussed in more
    detail in the section <a href="#events_callbacks">Events, Callbacks, and
    Checker Class Structure</a>.
    <li> What checker-specific data needs to be stored as part of the program
    state (if any). This should be minimized as much as possible. More detail about
    implementing custom program state is given in section <a
    href="#extendingstates">Custom Program States</a>.
  </ul>


<h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in the
  <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe
  how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of
  stream APIs, was registered with the analyzer.
  Similar steps should be followed for a new checker.
<ol>
  <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was
  created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>.
  <li>The following registration code was added to the implementation file:
<pre class="code_example">
void ento::registerSimpleStreamChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;SimpleStreamChecker&gt;();
}
</pre>
<li>A package was selected for the checker and the checker was defined in the
table of checkers at <tt>include/clang/StaticAnalyzer/Checkers/Checkers.td</tt>.
Since all checkers should first be developed as "alpha", and the SimpleStreamChecker
performs UNIX API checks, the correct package is "alpha.unix", and the following
was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>:
<pre class="code_example">
let ParentPackage = UnixAlpha in {
...
def SimpleStreamChecker : Checker&lt;"SimpleStream"&gt;,
  HelpText&lt;"Check for misuses of stream APIs"&gt;,
  DescFile&lt;"SimpleStreamChecker.cpp"&gt;;
...
} // end "alpha.unix"
</pre>

<li>The source code file was made visible to CMake by adding it to
<tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.

</ol>

After adding a new checker to the analyzer, one can verify that the new checker
was successfully added by seeing if it appears in the list of available checkers:
<br><tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>

<h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2>

<p> All checkers inherit from the <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html">
Checker</a></tt> template class; the template parameter(s) describe the type of
events that the checker is interested in processing. The various types of events
that are available are described in the file <a
href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a>.

<p> For each event type requested, a corresponding callback function must be
defined in the checker class (<a
href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a> shows the
correct function name and signature for each event type).

<p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to
take action at the following times:

<ul>
<li>Before making a call to a function, check if the function is <tt>fclose</tt>.
If so, check the parameter being passed.
<li>After making a function call, check if the function is <tt>fopen</tt>. If
so, process the return value.
<li>When values go out of scope, check whether they are still-open file
streams, and report a bug if so. In addition, remove any information about
them from the program state in order to keep the state as small as possible.
<li>When file pointers "escape" (are used in a way that the analyzer can no longer
track them), mark them as such. This prevents false positives in the cases where
the analyzer cannot be sure whether the file was closed or not.
</ul>

<p>The events that will be used for each of these actions are, respectively, <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>,
<a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>,
<a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>,
and <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>.
The high-level structure of the checker's class is thus:

<pre class="code_example">
class SimpleStreamChecker : public Checker&lt;check::PreCall,
                                           check::PostCall,
                                           check::DeadSymbols,
                                           check::PointerEscape&gt; {
public:

  void checkPreCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  void checkPostCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  void checkDeadSymbols(SymbolReaper &amp;SR, CheckerContext &amp;C) const;

  ProgramStateRef checkPointerEscape(ProgramStateRef State,
                                     const InvalidatedSymbols &amp;Escaped,
                                     const CallEvent *Call,
                                     PointerEscapeKind Kind) const;
};
</pre>
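
<p>The bookkeeping performed by these four callbacks can be sketched without the
analyzer at all. The C++ below is a hypothetical, analyzer-independent model,
not the real <tt>SimpleStreamChecker</tt>: symbols are plain strings, the
per-path state is a <tt>std::map</tt> passed by value (standing in for a custom
program state), and "bug reports" are strings appended to a vector. The names
<tt>ToyStreamKind</tt>, <tt>onFopen</tt>, <tt>onFclose</tt>,
<tt>onDeadSymbols</tt>, and <tt>onPointerEscape</tt> are invented for the
illustration.

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;map&gt;
#include &lt;set&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical model of the SimpleStreamChecker logic; not analyzer code.
enum class ToyStreamKind { Opened, Closed };
using ToyStreamMap = std::map&lt;std::string, ToyStreamKind&gt;;  // per-path state
using ToyReports = std::vector&lt;std::string&gt;;  // stand-in for emitted reports

// PostCall on fopen: the returned stream symbol is now open.
ToyStreamMap onFopen(ToyStreamMap State, const std::string &amp;Sym) {
  State[Sym] = ToyStreamKind::Opened;
  return State;
}

// PreCall on fclose: closing an already-closed stream is a bug.
ToyStreamMap onFclose(ToyStreamMap State, const std::string &amp;Sym,
                      ToyReports &amp;Out) {
  auto It = State.find(Sym);
  if (It != State.end() &amp;&amp; It-&gt;second == ToyStreamKind::Closed)
    Out.push_back("double close of " + Sym);
  State[Sym] = ToyStreamKind::Closed;
  return State;
}

// DeadSymbols: report streams that die while still open, and drop all dead
// symbols from the state to keep it as small as possible.
ToyStreamMap onDeadSymbols(ToyStreamMap State,
                           const std::set&lt;std::string&gt; &amp;Dead,
                           ToyReports &amp;Out) {
  for (const std::string &amp;Sym : Dead) {
    auto It = State.find(Sym);
    if (It == State.end())
      continue;
    if (It-&gt;second == ToyStreamKind::Opened)
      Out.push_back("resource leak: " + Sym);
    State.erase(It);
  }
  return State;
}

// PointerEscape: stop tracking escaped streams; we can no longer tell
// whether they get closed, so reporting a leak would risk a false positive.
ToyStreamMap onPointerEscape(ToyStreamMap State,
                             const std::set&lt;std::string&gt; &amp;Escaped) {
  for (const std::string &amp;Sym : Escaped)
    State.erase(Sym);
  return State;
}
</pre>

<p>For instance, calling <tt>onFclose</tt> twice on a stream opened by
<tt>onFopen</tt> records one double-close report, while a stream passed to
<tt>onPointerEscape</tt> is no longer reported as leaked when it dies.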

<h2 id=extendingstates>Custom Program States</h2>

<p> Checkers often need to keep track of information specific to the checks they
perform. However, since checkers have no guarantee about the order in which the
program will be explored, or even that all possible paths will be explored, this
state information cannot be kept within individual checkers. Therefore, if
checkers need to store custom information, they need to add new categories of
data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of
several macros designed for this purpose. They are:

<ul>
<li><a
href="http://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>:
Used when the state information is a single value. The methods available for
state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and
<tt>remove</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>:
Used when the state information is a list of values. The methods available for
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
<tt>remove</tt>, and <tt>contains</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>:
Used when the state information is a set of values. The methods available for
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
<tt>remove</tt>, and <tt>contains</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>:
Used when the state information is a map from a key to a value. The methods
available for state types declared with this macro are <tt>add</tt>,
<tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>.
</ul>

<p>All of these macros take as parameters the name to be used for the custom
category of state information and the data type(s) to be used for storage. The
data type(s) specified will become the parameter type and/or return type of the
methods that manipulate the new category of state information. Each of these
methods is templated with the name of the custom data type.

<p>For example, a common case is the need to track data associated with a
symbolic expression; a map type is the most logical way to implement this. The
key for this map will be a pointer to a symbolic expression
(<tt>SymbolRef</tt>). If the data type to be associated with the symbolic
expression is an integer, then the custom category of state information would be
declared as

<pre class="code_example">
REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
</pre>

The data would be read with the <tt>get</tt> function

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
...
// For a map, get() returns a pointer to the stored value, or null if Sym has no entry.
const int *currentValue = state-&gt;get&lt;ExampleDataType&gt;(Sym);
</pre>

and set with the <tt>set</tt> function

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
int newValue;
...
ProgramStateRef newState = state-&gt;set&lt;ExampleDataType&gt;(Sym, newValue);
</pre>

<p>In addition, the macros define a data type used for storing the data of the
new data category; the name of this type is the name of the data category with
"Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply
be the passed data type; for the other three macros, this will be a specialized
version of the <a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>,
<a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>,
or <a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a>
templated class. For the <tt>ExampleDataType</tt> example above, the type
created would be equivalent to writing the declaration:

<pre class="code_example">
typedef llvm::ImmutableMap&lt;SymbolRef, int&gt; ExampleDataTypeTy;
</pre>

<p>These macros will cover the majority of use cases; however, they still have a
few limitations. They cannot be used inside namespaces (since they expand to
contain top-level namespace references), and the data types that they define
cannot be referenced from more than one file.

<p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing
one, functions that modify the state will return a copy of the previous state
with the change applied. This updated state must then be provided to the
analyzer core by calling the <tt>CheckerContext::addTransition</tt> function.
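
<p>The immutability contract can be illustrated with a short, self-contained C++
sketch. It is an assumption-laden analogy rather than the real API: the
hypothetical <tt>ToyState</tt> class offers <tt>get</tt>, <tt>set</tt>, and
<tt>remove</tt> methods shaped like those available for a map category such as
<tt>ExampleDataType</tt>, but it copies a <tt>std::map</tt> on every change,
whereas <tt>llvm::ImmutableMap</tt> shares structure between versions. In a real
checker, the returned state would then be handed to
<tt>CheckerContext::addTransition</tt>.

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;map&gt;
#include &lt;memory&gt;
#include &lt;string&gt;
#include &lt;utility&gt;

// Hypothetical toy, not the analyzer's API: every "modification" returns a
// new state and leaves the original one untouched.
class ToyState {
  using SymbolRef = std::string;  // stand-in for the analyzer's SymbolRef
  using MapTy = std::map&lt;SymbolRef, int&gt;;
  std::shared_ptr&lt;const MapTy&gt; Data;

  explicit ToyState(std::shared_ptr&lt;const MapTy&gt; D) : Data(std::move(D)) {}

public:
  ToyState() : Data(std::make_shared&lt;MapTy&gt;()) {}

  // Like get on a map category: returns nullptr when Sym has no entry.
  const int *get(const SymbolRef &amp;Sym) const {
    auto It = Data-&gt;find(Sym);
    return It == Data-&gt;end() ? nullptr : &amp;It-&gt;second;
  }

  // Like set: returns a copy of this state with Sym bound to Value.
  ToyState set(const SymbolRef &amp;Sym, int Value) const {
    auto Copy = std::make_shared&lt;MapTy&gt;(*Data);
    (*Copy)[Sym] = Value;
    return ToyState(std::move(Copy));
  }

  // Like remove: returns a copy of this state without an entry for Sym.
  ToyState remove(const SymbolRef &amp;Sym) const {
    auto Copy = std::make_shared&lt;MapTy&gt;(*Data);
    Copy-&gt;erase(Sym);
    return ToyState(std::move(Copy));
  }
};
</pre>

<p>Here <tt>S0</tt>, <tt>S1 = S0.set("sym", 1)</tt>, and
<tt>S2 = S1.set("sym", 2)</tt> can all coexist: <tt>S0</tt> still has no entry
and <tt>S1</tt> still maps <tt>"sym"</tt> to 1, which is the property that lets
different exploded nodes hold different versions of the state.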
<h2 id=bugs>Bug Reports</h2>


<p> When a checker detects a mistake in the analyzed code, it needs a way to
report it to the analyzer core so that it can be displayed. The two classes used
to construct this report are <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt>
and <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html">
BugReport</a></tt>.

<p>
<tt>BugType</tt>, as the name would suggest, represents a type of bug. The
constructor for <tt>BugType</tt> takes two parameters: the name of the bug
type, and the name of the category of the bug. These are used (e.g.) in the
summary page generated by the scan-build tool.

<p>
  The <tt>BugReport</tt> class represents a specific occurrence of a bug. In
  the most common case, three parameters are used to form a <tt>BugReport</tt>:
<ol>
<li>The type of bug, specified as an instance of the <tt>BugType</tt> class.
<li>A short descriptive string. This is placed at the location of the bug in
the detailed line-by-line output generated by scan-build.
<li>The context in which the bug occurred. This includes both the location of
the bug in the program and the program's state when the location is reached. These are
both encapsulated in an <tt>ExplodedNode</tt>.
</ol>

<p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made
as to whether or not analysis can continue along the current path. This decision
is based on whether the detected bug is one that would prevent the program under
analysis from continuing. For example, leaking a resource should not stop
analysis, as the program can continue to run after the leak. Dereferencing a
null pointer, on the other hand, should stop analysis, as there is no way for
the program to meaningfully continue after such an error.

<p>If analysis can continue, then the most recent <tt>ExplodedNode</tt>
generated by the checker can be passed to the <tt>BugReport</tt> constructor
without additional modification. This <tt>ExplodedNode</tt> will be the one
returned by the most recent call to <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>.
If no transition has been performed during the current callback, the checker should call <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition()</a>
and use the returned node for bug reporting.

<p>If analysis cannot continue, then the current state should be transitioned
into a so-called <i>sink node</i>, a node from which no further analysis will be
performed. This is done by calling the <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#adeea33a5a2bed190210c4a2bb807a6f0">
CheckerContext::generateSink</a> function; this function is the same as the
<tt>addTransition</tt> function, but marks the state as a sink node. Like
<tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated
state, which can then be passed to the <tt>BugReport</tt> constructor.

<p>
After a <tt>BugReport</tt> is created, it should be passed to the analyzer core
by calling <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#ae7738af2cbfd1d713edec33d3203dff5">CheckerContext::emitReport</a>.
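
<p>The decision between a regular node and a sink can be summarized with a
hypothetical C++ sketch. The class <tt>ToyCheckerContext</tt> and the helper
<tt>reportBug</tt> are invented for illustration; their method names echo
<tt>CheckerContext::addTransition</tt>, <tt>generateSink</tt>, and
<tt>emitReport</tt>, but the types and signatures do not match the real API,
and any report strings passed to them are made up.

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;deque&gt;
#include &lt;string&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// Hypothetical toy, not the analyzer's API.
struct ToyNode {
  std::string State;
  bool IsSink;  // exploration never continues from a sink
};

struct ToyReport {
  std::string BugType;      // e.g. "Resource leak"
  std::string Description;  // short message shown at the bug location
  const ToyNode *Node;      // location and program state of the bug
};

class ToyCheckerContext {
  std::deque&lt;ToyNode&gt; Nodes;  // deque keeps node addresses stable

public:
  std::vector&lt;ToyReport&gt; Reports;

  const ToyNode *addTransition(std::string State) {
    Nodes.push_back(ToyNode{std::move(State), false});
    return &amp;Nodes.back();
  }
  const ToyNode *generateSink(std::string State) {
    Nodes.push_back(ToyNode{std::move(State), true});
    return &amp;Nodes.back();
  }
  void emitReport(ToyReport R) { Reports.push_back(std::move(R)); }
};

// A bug the program can survive (such as a leak) is reported from a regular
// node so the path continues; a fatal one (such as a null dereference) is
// reported from a sink node, which ends the path.
void reportBug(ToyCheckerContext &amp;C, const std::string &amp;State, bool Fatal,
               const std::string &amp;Type, const std::string &amp;Message) {
  const ToyNode *N = Fatal ? C.generateSink(State) : C.addTransition(State);
  C.emitReport(ToyReport{Type, Message, N});
}
</pre>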

<h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective. A simple AST walk
  might be sufficient. If that is the case, consider implementing a Clang
  compiler warning. On the other hand, a check might not be acceptable as a compiler
  warning; for example, because of a relatively high false positive rate. In this
  situation, AST callbacks <tt><b>checkASTDecl</b></tt> and
  <tt><b>checkASTCodeBody</b></tt> are your best friends.

<h2 id=testing>Testing</h2>
  Every patch should be well tested with Clang regression tests. The checker tests
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
  execute the following from the top-level build directory (the one containing
  <tt>bin/llvm-lit</tt>):
  <pre class="code">
    $ <b>bin/llvm-lit -sv ../llvm/tools/clang/test/Analysis</b>
  </pre>

<h2 id=commands>Useful Commands/Debugging Hints</h2>

<h3 id=attaching>Attaching the Debugger</h3>

<p>When your command contains the <tt><b>-cc1</b></tt> flag, you can run it
under the debugger directly:</p>

<pre class="code">
  $ <b>gdb --args clang -cc1 -analyze -analyzer-checker=core test.c</b>
  $ <b>lldb -- clang -cc1 -analyze -analyzer-checker=core test.c</b>
</pre>

<p>
Otherwise, if your command line contains <tt><b>--analyze</b></tt>,
the actual clang instance is run in a separate process. In
order to debug it, use the <tt><b>-###</b></tt> flag to obtain
the command line of the child process:
</p>

<pre class="code">
  $ <b>clang --analyze test.c -\#\#\#</b>
</pre>

<p>
Below we describe a few useful command line arguments, all of which assume that
you are running <tt><b>clang -cc1</b></tt>.
</p>

565<h3 id=narrowing>Narrowing Down the Problem</h3>
566
567<p>While investigating a checker-related issue, instruct the analyzer to only
Anna Zaks52590862011-11-07 05:36:29 +0000568execute a single checker:
Artem Dergachevd73c57c2016-07-28 20:13:14 +0000569</p>
570<pre class="code">
571 $ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
572</pre>
573
574<p>If you are experiencing a crash, to see which function is failing while
575processing a large file use the <tt><b>-analyzer-display-progress</b></tt>
576option.</p>
577
George Karpenkov40eb5132017-09-30 00:07:22 +0000578<p>To selectively analyze only the given function, use the
579<tt><b>-analyze-function</b></tt> option:</p>
Artem Dergachevd73c57c2016-07-28 20:13:14 +0000580<pre class="code">
581 $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress</b>
582 ANALYZE (Syntax): test.c foo
583 ANALYZE (Syntax): test.c bar
584 ANALYZE (Path, Inline_Regular): test.c bar
585 ANALYZE (Path, Inline_Regular): test.c foo
586 $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress -analyze-function=foo</b>
587 ANALYZE (Syntax): test.c foo
588 ANALYZE (Path, Inline_Regular): test.c foo
589</pre>
590
George Karpenkov40eb5132017-09-30 00:07:22 +0000591<b>Note: </b> a fully qualified function name has to be used when selecting
592C++ functions and methods, Objective-C methods and blocks, e.g.:
593
594<pre class="code">
595 $ <b>clang -cc1 -analyze -analyzer-checker=core test.cc -analyze-function=foo(int)</b>
596</pre>
597
598The fully qualified name can be found from the
599<tt><b>-analyzer-display-progress</b></tt> output.
600
Artem Dergachevd73c57c2016-07-28 20:13:14 +0000601<p>The bug reporter mechanism removes path diagnostics inside intermediate
602function calls that have returned by the time the bug was found and contain
603no interesting pieces. Usually it is up to the checkers to produce more
604interesting pieces by adding custom <tt>BugReporterVisitor</tt> objects.
605However, you can disable path pruning while debugging with the
606<tt><b>-analyzer-config prune-paths=false</b></tt> option.
607
608<h3 id=visualizing>Visualizing the Analysis</h3>
609
610<p>To dump the AST, which often helps understanding how the program should
611behave:</p>
612<pre class="code">
613 $ <b>clang -cc1 -ast-dump test.c</b>
614</pre>
615
616<p>To view/dump CFG use <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt>
617checkers:</p>
618<pre class="code">
619 $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
620</pre>
621
622<p><tt>ExplodedGraph</tt> (the state graph explored by the analyzer) can be
623visualized with another debug checker:</p>
624<pre class="code">
625 $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewExplodedGraph test.c</b>
626</pre>
627<p>Or, equivalently, with <tt><b>-analyzer-viz-egraph-graphviz</b></tt>
628option, which does the same thing - dumps the exploded graph in graphviz
629<tt><b>.dot</b></tt> format.</p>
630
631<p>You can convert <tt><b>.dot</b></tt> files into other formats - in
632particular, converting to <tt><b>.svg</b></tt> and viewing in your web
633browser might be more comfortable than using a <tt><b>.dot</b></tt> viewer:</p>
634<pre class="code">
635 $ <b>dot -Tsvg ExprEngine-501e2e.dot -o ExprEngine-501e2e.svg</b>
636</pre>
637
638<p>The <tt><b>-trim-egraph</b></tt> option removes all paths except those
639leading to bug reports from the exploded graph dump. This is useful
640because exploded graphs are often huge and hard to navigate.</p>
641
642<p>Viewing <tt>ExplodedGraph</tt> is your most powerful tool for understanding
643the analyzer's false positives, because it gives comprehensive information
644on every decision made by the analyzer across all analysis paths.</p>
645
646<p>There are more debug checkers available. To see all available debug checkers:
647</p>
648<pre class="code">
649 $ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
650</pre>
651
652<h3 id=debugprints>Debug Prints and Tricks</h3>
653
654<p>To view "half-baked" <tt>ExplodedGraph</tt> while debugging, jump to a frame
655that has <tt>clang::ento::ExprEngine</tt> object and execute:</p>
656<pre class="code">
657 (gdb) <b>p ViewGraph(0)</b>
658</pre>
659
660<p>To see the <tt>ProgramState</tt> while debugging use the following command.
661<pre class="code">
662 (gdb) <b>p State->dump()</b>
663</pre>
664
665<p>To see <tt>clang::Expr</tt> while debugging use the following command. If you
666pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the
667source code.</p>
668<pre class="code">
669 (gdb) <b>p E->dump()</b>
670</pre>
671
672<p>To dump AST of a method that the current <tt>ExplodedNode</tt> belongs
673to:</p>
674<pre class="code">
675 (gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
676</pre>
Anna Zaks75a3f482011-11-02 17:49:20 +0000677
Anna Zaks8cfbaa62013-05-18 22:51:28 +0000678<h2 id=additioninformation>Additional Sources of Information</h2>
679
680Here are some additional resources that are useful when working on the Clang
681Static Analyzer:
682
683<ul>
684<li> <a href="http://clang.llvm.org/doxygen">Clang doxygen</a>. Contains
685up-to-date documentation about the APIs available in Clang. Relevant entries
686have been linked throughout this page. Also of use is the
687<a href="http://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes
688from LLVM.
Tanya Lattner4a08e932015-08-05 03:55:23 +0000689<li> The <a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">
Anna Zaks8cfbaa62013-05-18 22:51:28 +0000690cfe-dev mailing list</a>. This is the primary mailing list used for
691discussion of Clang development (including static code analysis). The
Tanya Lattner4a08e932015-08-05 03:55:23 +0000692<a href="http://lists.llvm.org/pipermail/cfe-dev">archive</a> also contains
Anna Zaks8cfbaa62013-05-18 22:51:28 +0000693a lot of information.
694<li> The "Building a Checker in 24 hours" presentation given at the <a
695href="http://llvm.org/devmtg/2012-11">November 2012 LLVM Developer's
696meeting</a>. Describes the construction of SimpleStreamChecker. <a
697href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a>
698and <a
699href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>
700are available.
701</ul>
702
Anton Yartsev45056dc2014-05-19 15:04:55 +0000703<h2 id=links>Useful Links</h2>
704<ul>
705<li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li>
706</ul>
707
Anna Zaks75a3f482011-11-02 17:49:20 +0000708</div>
709</div>
710</body>
711</html>