Ted Kremenek | 9d9963e | 2009-03-26 16:19:54 +0000 | [diff] [blame] | 1 | Static Analyzer: 'Regions' |
| 2 | -------------------------- |
Zhongxing Xu | 7fddc33 | 2009-03-26 08:23:58 +0000 | [diff] [blame] | 3 | |
Ted Kremenek | 9d9963e | 2009-03-26 16:19:54 +0000 | [diff] [blame] | 4 | INTRODUCTION |
Zhongxing Xu | 7fddc33 | 2009-03-26 08:23:58 +0000 | [diff] [blame] | 5 | |
Ted Kremenek | 9d9963e | 2009-03-26 16:19:54 +0000 | [diff] [blame] | 6 | The path-sensitive analysis engine in libAnalysis employs an extensible API |
| 7 | for abstractly modeling the memory of an analyzed program. This API employs |
| 8 | the concept of "memory regions" to abstractly model chunks of program memory |
| 9 | such as program variables and dynamically allocated memory such as those |
| 10 | returned from 'malloc' and 'alloca'. Regions are hierarchical, with subregions |
| 11 | modeling subtyping relationships, field and array offsets into larger chunks |
| 12 | of memory, and so on. |
Zhongxing Xu | 7fddc33 | 2009-03-26 08:23:58 +0000 | [diff] [blame] | 13 | |
Ted Kremenek | 9d9963e | 2009-03-26 16:19:54 +0000 | [diff] [blame] | 14 | The region API consists of two components. The first is the taxonomy and |
| 15 | representation of regions themselves within the analyzer engine. The primary |
| 16 | definitions and interfaces are described in |
| 17 | 'include/clang/Analysis/PathSensitive/MemRegion.h'. At the root of the region |
| 18 | hierarchy is the class 'MemRegion' with specific subclasses refining the |
| 19 | region concept for variables, heap allocated memory, and so forth. |
Zhongxing Xu | 7fddc33 | 2009-03-26 08:23:58 +0000 | [diff] [blame] | 20 | |
Ted Kremenek | 9d9963e | 2009-03-26 16:19:54 +0000 | [diff] [blame] | 21 | The second component in the region API is the modeling of the binding of |
| 22 | values to regions. For example, modeling the value stored to a local variable |
| 23 | 'x' consists of recording the binding between the region for 'x' (which |
| 24 | represents the raw memory associated with 'x') and the value stored to 'x'. |
| 25 | This binding relationship is captured with the notion of "symbolic stores." |
| 26 | |
| 27 | Symbolic stores, which can be thought of as representing the relation 'regions |
| 28 | -> values', are implemented by subclasses of the StoreManager class (Store.h). |
| 29 | A particular StoreManager implementation has complete flexibility concerning |
| 30 | (a) *how* to model the binding between regions and values and (b) *what* |
| 31 | bindings are recorded. Together, both points allow different StoreManagers to |
| 32 | tradeoff between different levels of analysis precision and scalability |
| 33 | concerning the reasoning of program memory. Meanwhile, the core path-sensitive |
| 34 | engine makes no assumptions about (a) or (b), and queries a StoreManager about |
| 35 | the bindings to a memory region through a generic interface that all |
| 36 | StoreManagers share. If a particular StoreManager cannot reason about the |
| 37 | potential bindings of a given memory region (e.g., 'BasicStoreManager' does |
| 38 | not reason about fields of structures) then the StoreManager can simply return |
| 39 | 'unknown' (represented by 'UnknownVal') for a particular region-binding. This |
| 40 | separation of concerns not only isolates the core analysis engine from the |
| 41 | details of reasoning about program memory but also facilities the option of a |
| 42 | client of the path-sensitive engine to easily swap in different StoreManager |
| 43 | implementations that internally reason about program memory in very different |
| 44 | ways. |
| 45 | |
| 46 | The rest of this document is divided into two parts. We first discuss region |
| 47 | taxonomy and the semantics of regions. We then discuss the StoreManager |
| 48 | interface, and details of how the currently available StoreManager classes |
| 49 | implement region bindings. |
| 50 | |
| 51 | MEMORY REGIONS and REGION TAXONOMY |
| 52 | |
| 53 | SYMBOLIC REGIONS |
| 54 | |
| 55 | A symbolic region is a map of the concept of symbolic values into the domain |
| 56 | of regions. It is the way that we represent symbolic pointers. Whenever a |
| 57 | symbolic pointer value is needed, a symbolic region is created to represent |
| 58 | it. |
| 59 | |
| 60 | A symbolic region has no type. It wraps a SymbolData. But sometimes we have |
| 61 | type information associated with a symbolic region. For this case, a |
| 62 | TypedViewRegion is created to layer the type information on top of the |
| 63 | symbolic region. The reason we do not carry type information with the symbolic |
| 64 | region is that the symbolic regions can have no type. To be consistent, we |
| 65 | don't let them to carry type information. |
| 66 | |
| 67 | Like a symbolic pointer, a symbolic region may be NULL, has unknown extent, |
| 68 | and represents a generic chunk of memory. |
| 69 | |
| 70 | NOTE: We plan not to use loc::SymbolVal in RegionStore and remove it |
| 71 | gradually. |
Zhongxing Xu | 7fddc33 | 2009-03-26 08:23:58 +0000 | [diff] [blame] | 72 | |
Zhongxing Xu | 113cc14 | 2009-04-01 03:23:38 +0000 | [diff] [blame] | 73 | Symbolic regions get their rvalue types through the following ways: |
| 74 | * through the parameter or global variable that points to it, e.g.: |
| 75 | |
| 76 | void f(struct s* p) { |
| 77 | ... |
| 78 | } |
| 79 | |
| 80 | The symbolic region pointed to by 'p' has type 'struct s'. |
| 81 | |
| 82 | * through explicit or implicit casts, e.g.: |
| 83 | void f(void* p) { |
| 84 | struct s* q = (struct s*) p; |
| 85 | ... |
| 86 | } |
| 87 | |
| 88 | We attach the type information to the symbolic region lazily. For the first |
| 89 | case above, we create the TypedViewRegion only when the pointer is actually |
| 90 | used to access the pointee memory object, that is when the element or field |
| 91 | region is created. For the cast case, the TypedViewRegion is created when |
| 92 | visiting the CastExpr. |
| 93 | |
| 94 | The reason for doing lazy typing is that symbolic regions are sometimes only |
| 95 | used to do location comparison. |
| 96 | |
Zhongxing Xu | 7fddc33 | 2009-03-26 08:23:58 +0000 | [diff] [blame] | 97 | Pointer Casts |
| 98 | |
| 99 | Pointer casts allow people to impose different 'views' onto a chunk of memory. |
| 100 | |
| 101 | Usually we have two kinds of casts. One kind of casts cast down with in the type |
| 102 | hierarchy. It imposes more specific views onto more generic memory regions. The |
| 103 | other kind of casts cast up with in the type hierarchy. It strips away more |
| 104 | specific views on top of the more generic memory regions. |
| 105 | |
| 106 | We simulate the down casts by layering another TypedViewRegion on top of the |
| 107 | original region. We simulate the up casts by striping away the top |
| 108 | TypedViewRegion. Down casts is usually simple. For up casts, if the there is no |
| 109 | TypedViewRegion to be stripped, we return the original region. If the underlying |
| 110 | region is of the different type than the cast-to type, we flag an error state. |
| 111 | |
| 112 | For toll-free bridging casts, we return the original region. |
| 113 | |
| 114 | Region Bindings |
| 115 | |
| 116 | The following region kinds are boundable: VarRegion, CompoundLiteralRegion, |
| 117 | StringRegion, ElementRegion, FieldRegion, and ObjCIvarRegion. |
| 118 | |
| 119 | When binding regions, we perform canonicalization on element regions and field |
| 120 | regions. This is because we can have different views on the same region, some of |
| 121 | which are essentially the same view with different sugar type names. |
| 122 | |
| 123 | To canonicalize a region, we get the canonical types for all TypedViewRegions |
| 124 | along the way up to the root region, and make new TypedViewRegions with those |
| 125 | canonical types. |
| 126 | |
| 127 | All bindings and retrievings are done on the canonicalized regions. |
| 128 | |
| 129 | Canonicalization is transparent outside the region store manager, and more |
| 130 | specifically, unaware outside the Bind() and Retrieve() method. We don't need to |
| 131 | consider region canonicalization when doing pointer cast. |