blob: 6767cfd5790359ba778e5ec836c4f9cdce580dcc [file] [log] [blame]
Ted Kremenekd76e0a62009-04-03 01:38:55 +00001<html>
2<head>
3<title>Static Analyzer Design Document: Memory Regions</title>
4</head>
5<body>
6
7<h1>Static Analyzer Design Document: Memory Regions</h1>
8
9<h3>Authors</h3>
10
11<p>Ted Kremenek, <tt>kremenek at apple</tt><br>
12Zhongxing Xu, <tt>xuzhongzhing at gmail</tt></p>
13
14<h2 id="intro">Introduction</h2>
15
16<p>The path-sensitive analysis engine in libAnalysis employs an extensible API
17for abstractly modeling the memory of an analyzed program. This API employs the
18concept of "memory regions" to abstractly model chunks of program memory such as
19program variables and dynamically allocated memory such as those returned from
20'malloc' and 'alloca'. Regions are hierarchical, with subregions modeling
21subtyping relationships, field and array offsets into larger chunks of memory,
22and so on.</p>
23
24<p>The region API consists of two components:</p>
25
26<ul> <li>A taxonomy and representation of regions themselves within the analyzer
27engine. The primary definitions and interfaces are described in <tt><a
28href="http://clang.llvm.org/doxygen/MemRegion_8h-source.html">MemRegion.h</a></tt>.
29At the root of the region hierarchy is the class <tt>MemRegion</tt> with
30specific subclasses refining the region concept for variables, heap allocated
31memory, and so forth.</li> <li>The modeling of binding of values to regions. For
32example, modeling the value stored to a local variable <tt>x</tt> consists of
33recording the binding between the region for <tt>x</tt> (which represents the
34raw memory associated with <tt>x</tt>) and the value stored to <tt>x</tt>. This
35binding relationship is captured with the notion of &quot;symbolic
36stores.&quot;</li> </ul>
37
38<p>Symbolic stores, which can be thought of as representing the relation
39<tt>regions -> values</tt>, are implemented by subclasses of the
40<tt>StoreManager</tt> class (<tt><a
41href="http://clang.llvm.org/doxygen/Store_8h-source.html">Store.h</a></tt>). A
42particular StoreManager implementation has complete flexibility concerning the
43following:
44
45<ul>
46<li><em>How</em> to model the binding between regions and values</li>
47<li><em>What</em> bindings are recorded
48</ul>
49
50<p>Together, both points allow different StoreManagers to tradeoff between
51different levels of analysis precision and scalability concerning the reasoning
52of program memory. Meanwhile, the core path-sensitive engine makes no
53assumptions about either points, and queries a StoreManager about the bindings
54to a memory region through a generic interface that all StoreManagers share. If
55a particular StoreManager cannot reason about the potential bindings of a given
56memory region (e.g., '<tt>BasicStoreManager</tt>' does not reason about fields
57of structures) then the StoreManager can simply return 'unknown' (represented by
58'<tt>UnknownVal</tt>') for a particular region-binding. This separation of
59concerns not only isolates the core analysis engine from the details of
60reasoning about program memory but also facilities the option of a client of the
61path-sensitive engine to easily swap in different StoreManager implementations
62that internally reason about program memory in very different ways.</pp>
63
64<p>The rest of this document is divided into two parts. We first discuss region
65taxonomy and the semantics of regions. We then discuss the StoreManager
66interface, and details of how the currently available StoreManager classes
67implement region bindings.</p>
68
69<h2 id="regions">Memory Regions and Region Taxonomy</h2>
70
71<h3>Pointers</h3>
72
73<p>Before talking about the memory regions, we would talk about the pointers
74since memory regions are essentially used to represent pointer values.</p>
75
76<p>The pointer is a type of values. Pointer values have two semantic aspects.
77One is its physical value, which is an address or location. The other is the
78type of the memory object residing in the address.</p>
79
80<p>Memory regions are designed to abstract these two properties of the pointer.
81The physical value of a pointer is represented by MemRegion pointers. The rvalue
82type of the region corresponds to the type of the pointee object.</p>
83
84<p>One complication is that we could have different view regions on the same
85memory chunk. They represent the same memory location, but have different
86abstract location, i.e., MemRegion pointers. Thus we need to canonicalize the
87abstract locations to get a unique abstract location for one physical
88location.</p>
89
90<p>Furthermore, these different view regions may or may not represent memory
91objects of different types. Some different types are semantically the same,
92for example, 'struct s' and 'my_type' are the same type.</p>
93
94<pre>
95struct s;
96typedef struct s my_type;
97</pre>
98
99<p>But <tt>char</tt> and <tt>int</tt> are not the same type in the code below:</p>
100
101<pre>
102void *p;
103int *q = (int*) p;
104char *r = (char*) p;
105</pre
106
107<p>Thus we need to canonicalize the MemRegion which is used in binding and
108retrieving.</p>
109
110<h3>Symbolic Regions</h3>
111
112<p>A symbolic region is a map of the concept of symbolic values into the domain
113of regions. It is the way that we represent symbolic pointers. Whenever a
114symbolic pointer value is needed, a symbolic region is created to represent
115it.</p>
116
117<p>A symbolic region has no type. It wraps a SymbolData. But sometimes we have
118type information associated with a symbolic region. For this case, a
119TypedViewRegion is created to layer the type information on top of the symbolic
120region. The reason we do not carry type information with the symbolic region is
121that the symbolic regions can have no type. To be consistent, we don't let them
122to carry type information.</p>
123
124<p>Like a symbolic pointer, a symbolic region may be NULL, has unknown extent,
125and represents a generic chunk of memory.</p>
126
127<p><em><b>NOTE</b>: We plan not to use loc::SymbolVal in RegionStore and remove it
128 gradually.</em></p>
129
130<p>Symbolic regions get their rvalue types through the following ways:</p>
131
132<ul>
133<li>Through the parameter or global variable that points to it, e.g.:
134<pre>
135void f(struct s* p) {
136 ...
137}
138</pre>
139
140<p>The symbolic region pointed to by <tt>p</tt> has type <tt>struct
141s</tt>.</p></li>
142
143<li>Through explicit or implicit casts, e.g.:
144<pre>
145void f(void* p) {
146 struct s* q = (struct s*) p;
147 ...
148}
149</pre>
150</li>
151</ul>
152
153<p>We attach the type information to the symbolic region lazily. For the first
154case above, we create the <tt>TypedViewRegion</tt> only when the pointer is
155actually used to access the pointee memory object, that is when the element or
156field region is created. For the cast case, the <tt>TypedViewRegion</tt> is
157created when visiting the <tt>CastExpr</tt>.</p>
158
159<p>The reason for doing lazy typing is that symbolic regions are sometimes only
160used to do location comparison.</p>
161
162<h3>Pointer Casts</h3>
163
164<p>Pointer casts allow people to impose different 'views' onto a chunk of
165memory.</p>
166
167<p>Usually we have two kinds of casts. One kind of casts cast down with in the
168type hierarchy. It imposes more specific views onto more generic memory regions.
169The other kind of casts cast up with in the type hierarchy. It strips away more
170specific views on top of the more generic memory regions.</p>
171
172<p>We simulate the down casts by layering another <tt>TypedViewRegion</tt> on
173top of the original region. We simulate the up casts by striping away the top
174<tt>TypedViewRegion</tt>. Down casts is usually simple. For up casts, if the
175there is no <tt>TypedViewRegion</tt> to be stripped, we return the original
176region. If the underlying region is of the different type than the cast-to type,
177we flag an error state.</p>
178
179<p>For toll-free bridging casts, we return the original region.</p>
180
181<p>We can set up a partial order for pointer types, with the most general type
182<tt>void*</tt> at the top. The partial order forms a tree with <tt>void*</tt> as
183its root node.</p>
184
185<p>Every <tt>MemRegion</tt> has a root position in the type tree. For example,
186the pointee region of <tt>void *p</tt> has its root position at the root node of
187the tree. <tt>VarRegion</tt> of <tt>int x</tt> has its root position at the 'int
188type' node.</p>
189
190<p><tt>TypedViewRegion</tt> is used to move the region down or up in the tree.
191Moving down in the tree adds a <tt>TypedViewRegion</tt>. Moving up in the tree
192removes a <Tt>TypedViewRegion</tt>.</p>
193
194<p>Do we want to allow moving up beyond the root position? This happens
195when:</p> <pre> int x; void *p = &amp;x; </pre>
196
197<p>The region of <tt>x</tt> has its root position at 'int*' node. the cast to
198void* moves that region up to the 'void*' node. I propose to not allow such
199casts, and assign the region of <tt>x</tt> for <tt>p</tt>.<p>
200
201<h3>Region Bindings</h3>
202
203<p>The following region kinds are boundable: VarRegion, CompoundLiteralRegion,
204StringRegion, ElementRegion, FieldRegion, and ObjCIvarRegion.</p>
205
206<p>When binding regions, we perform canonicalization on element regions and field
207regions. This is because we can have different views on the same region, some
208of which are essentially the same view with different sugar type names.</p>
209
210<p>To canonicalize a region, we get the canonical types for all TypedViewRegions
211along the way up to the root region, and make new TypedViewRegions with those
212canonical types.</p>
213
214<p>For Objective-C and C++, perhaps another canonicalization rule should be
215added: for FieldRegion, the least derived class that has the field is used as
216the type of the super region of the FieldRegion.</p>
217
218<p>All bindings and retrievings are done on the canonicalized regions.</p>
219
220<p>Canonicalization is transparent outside the region store manager, and more
221specifically, unaware outside the Bind() and Retrieve() method. We don't need to
222consider region canonicalization when doing pointer cast.</p>
223
224<h3>Constraint Manager</h3>
225
226<p>The constraint manager reasons about the abstract location of memory objects.
227We can have different views on a region, but none of these views changes the
228location of that object. Thus we should get the same abstract location for those
229regions.</p>
230
231</body>
232</html>