| //===----------------------------------------------------------------------===// |
| // Clang Static Analyzer |
| //===----------------------------------------------------------------------===// |
| |
| = Library Structure = |
| |
| The analyzer library has two layers: a (low-level) static analysis |
| engine (GRExprEngine.cpp and friends), and some static checkers |
| (*Checker.cpp). The latter are built on top of the former via the |
| Checker and CheckerVisitor interfaces (Checker.h and |
| CheckerVisitor.h). The Checker interface is designed to be minimal |
| and simple for checker writers, and attempts to isolate them from much |
| of the gore of the internal analysis engine. |
| |
| = How It Works = |
| |
| The analyzer is inspired by several foundational research papers ([1], |
| [2]). (FIXME: kremenek to add more links) |
| |
| In a nutshell, the analyzer is basically a source code simulator that |
| traces out possible paths of execution. The state of the program |
| (values of variables and expressions) is encapsulated by the state |
| (ProgramState). A location in the program is called a program point |
| (ProgramPoint), and the combination of state and program point is a |
| node in an exploded graph (ExplodedGraph). The term "exploded" comes |
| from exploding the control-flow edges in the control-flow graph (CFG). |
| |
| Conceptually the analyzer does a reachability analysis through the |
| ExplodedGraph. We start at a root node, which has the entry program |
| point and initial state, and then simulate transitions by analyzing |
| individual expressions. The analysis of an expression can cause the |
| state to change, resulting in a new node in the ExplodedGraph with an |
| updated program point and an updated state. A bug is found by hitting |
| a node that satisfies some "bug condition" (basically a violation of a |
| checking invariant). |
| |
| The analyzer traces out multiple paths by reasoning about branches and |
| then bifurcating the state: on the true branch the conditions of the |
| branch are assumed to be true and on the false branch the conditions |
| of the branch are assumed to be false. Such "assumptions" create |
| constraints on the values of the program, and those constraints are |
| recorded in the ProgramState object (and are manipulated by the |
| ConstraintManager). If assuming the conditions of a branch would |
| cause the constraints to be unsatisfiable, the branch is considered |
| infeasible and that path is not taken. This is how we get |
| path-sensitivity. We reduce exponential blow-up by caching nodes. If |
| a new node with the same state and program point as an existing node |
| would get generated, the path "caches out" and we simply reuse the |
| existing node. Thus the ExplodedGraph is not a DAG; it can contain |
| cycles as paths loop back onto each other and cache out. |
| |
| ProgramState and ExplodedNodes are basically immutable once created. Once |
| one creates a ProgramState, you need to create a new one to get a new |
| ProgramState. This immutability is key since the ExplodedGraph represents |
| the behavior of the analyzed program from the entry point. To |
| represent these efficiently, we use functional data structures (e.g., |
| ImmutableMaps) which share data between instances. |
| |
| Finally, individual Checkers work by also manipulating the analysis |
| state. The analyzer engine talks to them via a visitor interface. |
| For example, the PreVisitCallExpr() method is called by GRExprEngine |
| to tell the Checker that we are about to analyze a CallExpr, and the |
| checker is asked to check for any preconditions that might not be |
| satisfied. The checker can do nothing, or it can generate a new |
| ProgramState and ExplodedNode which contains updated checker state. If it |
| finds a bug, it can tell the BugReporter object about the bug, |
| providing it an ExplodedNode which is the last node in the path that |
| triggered the problem. |
| |
| = Notes about C++ = |
| |
| Since now constructors are seen before the variable that is constructed |
| in the CFG, we create a temporary object as the destination region that |
| is constructed into. See ExprEngine::VisitCXXConstructExpr(). |
| |
| In ExprEngine::processCallExit(), we always bind the object region to the |
| evaluated CXXConstructExpr. Then in VisitDeclStmt(), we compute the |
| corresponding lazy compound value if the variable is not a reference, and |
| bind the variable region to the lazy compound value. If the variable |
| is a reference, just use the object region as the initilizer value. |
| |
| Before entering a C++ method (or ctor/dtor), the 'this' region is bound |
| to the object region. In ctors, we synthesize 'this' region with |
| CXXRecordDecl*, which means we do not use type qualifiers. In methods, we |
| synthesize 'this' region with CXXMethodDecl*, which has getThisType() |
| taking type qualifiers into account. It does not matter we use qualified |
| 'this' region in one method and unqualified 'this' region in another |
| method, because we only need to ensure the 'this' region is consistent |
| when we synthesize it and create it directly from CXXThisExpr in a single |
| method call. |
| |
| = Working on the Analyzer = |
| |
| If you are interested in bringing up support for C++ expressions, the |
| best place to look is the visitation logic in GRExprEngine, which |
| handles the simulation of individual expressions. There are plenty of |
| examples there of how other expressions are handled. |
| |
| If you are interested in writing checkers, look at the Checker and |
| CheckerVisitor interfaces (Checker.h and CheckerVisitor.h). Also look |
| at the files named *Checker.cpp for examples on how you can implement |
| these interfaces. |
| |
| = Debugging the Analyzer = |
| |
| There are some useful command-line options for debugging. For example: |
| |
| $ clang -cc1 -help | grep analyze |
| -analyze-function <value> |
| -analyzer-display-progress |
| -analyzer-viz-egraph-graphviz |
| ... |
| |
| The first allows you to specify only analyzing a specific function. |
| The second prints to the console what function is being analyzed. The |
| third generates a graphviz dot file of the ExplodedGraph. This is |
| extremely useful when debugging the analyzer and viewing the |
| simulation results. |
| |
| Of course, viewing the CFG (Control-Flow Graph) is also useful: |
| |
| $ clang -cc1 -help | grep cfg |
| -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses |
| -cfg-add-initializers Add C++ initializers to CFGs for all analyses |
| -cfg-dump Display Control-Flow Graphs |
| -cfg-view View Control-Flow Graphs using GraphViz |
| -unoptimized-cfg Generate unoptimized CFGs for all analyses |
| |
| -cfg-dump dumps a textual representation of the CFG to the console, |
| and -cfg-view creates a GraphViz representation. |
| |
| = References = |
| |
| [1] Precise interprocedural dataflow analysis via graph reachability, |
| T Reps, S Horwitz, and M Sagiv, POPL '95, |
| http://portal.acm.org/citation.cfm?id=199462 |
| |
| [2] A memory model for static analysis of C programs, Z Xu, T |
| Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf |