Blame - docs/analyzer/IPA.txt - fp2-dev/platform/external/clang

blob: d585c3c4bb93dca7e1cbc70dc6a8b7ecf7a6d755 [file] [log] [blame]

Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	1	Inlining
				2	========
				3
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	4	-analyzer-ipa=none - All inlining is disabled. This is the only mode available
				5	in LLVM 3.1 and earlier and in Xcode 4.3 and earlier.
				6
				7	-analyzer-ipa=basic-inlining - Turns on inlining for C functions, C++ static
				8	member functions, and blocks -- essentially, the calls that behave like
				9	simple C function calls. This is essentially the mode used in Xcode 4.4.
				10
				11	-analyzer-ipa=inlining - Turns on inlining when we can confidently find the
				12	function/method body corresponding to the call. (C functions, static
				13	functions, devirtualized C++ methods, Objective-C class methods, Objective-C
				14	instance methods when ExprEngine is confident about the dynamic type of the
				15	instance).
				16
				17	-analyzer-ipa=dynamic - Inline instance methods for which the type is
				18	determined at runtime and we are not 100% sure that our type info is
				19	correct. For virtual calls, inline the most plausible definition.
				20
				21	-analyzer-ipa=dynamic-bifurcate - Same as -analyzer-ipa=dynamic, but the path
				22	is split. We inline on one branch and do not inline on the other. This mode
				23	does not drop the coverage in cases when the parent class has code that is
				24	only exercised when some of its methods are overriden.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	25
Jordan Rose	c568e2f	2012-08-21 21:44:21 +0000	[diff] [blame]	26	Currently, -analyzer-ipa=basic-inlining is the default mode.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	27
				28	Basics of Implementation
				29	-----------------------
				30
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	31	The low-level mechanism of inlining a function is handled in
				32	ExprEngine::inlineCall and ExprEngine::processCallExit.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	33
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	34	If the conditions are right for inlining, a CallEnter node is created and added
				35	to the analysis work list. The CallEnter node marks the change to a new
				36	LocationContext representing the called function, and its state includes the
				37	contents of the new stack frame. When the CallEnter node is actually processed,
				38	its single successor will be a edge to the first CFG block in the function.
				39
				40	Exiting an inlined function is a bit more work, fortunately broken up into
				41	reasonable steps:
				42
				43	1. The CoreEngine realizes we're at the end of an inlined call and generates a
				44	CallExitBegin node.
				45
				46	2. ExprEngine takes over (in processCallExit) and finds the return value of the
				47	function, if it has one. This is bound to the expression that triggered the
				48	call. (In the case of calls without origin expressions, such as destructors,
				49	this step is skipped.)
				50
				51	3. Dead symbols and bindings are cleaned out from the state, including any local
				52	bindings.
				53
				54	4. A CallExitEnd node is generated, which marks the transition back to the
				55	caller's LocationContext.
				56
				57	5. Custom post-call checks are processed and the final nodes are pushed back
				58	onto the work list, so that evaluation of the caller can continue.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	59
				60	Retry Without Inlining
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	61	----------------------
				62
				63	In some cases, we would like to retry analyzes without inlining the particular
				64	call.
				65
				66	Currently, we use this technique to recover the coverage in case we stop
				67	analyzing a path due to exceeding the maximum block count inside an inlined
				68	function.
				69
				70	When this situation is detected, we walk up the path to find the first node
				71	before inlining was started and enqueue it on the WorkList with a special
				72	ReplayWithoutInlining bit added to it (ExprEngine::replayWithoutInlining). The
				73	path is then re-analyzed from that point without inlining that particular call.
				74
				75	Deciding When to Inline
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	76	-----------------------
				77
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	78	In general, the analyzer attempts to inline as much as possible, since it
				79	provides a better summary of what actually happens in the program. There are
				80	some cases, however, where the analyzer chooses not to inline:
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	81
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	82	- If there is no definition available for the called function or method. In
				83	this case, there is no opportunity to inline.
				84
				85	- If we the CFG cannot be constructed for a called function, or the liveness
				86	cannot be computed. These are prerequisites for analyzing a function body,
				87	with or without inlining.
				88
				89	- If the LocationContext chain for a given ExplodedNode reaches a maximum cutoff
				90	depth. This prevents unbounded analysis due to infinite recursion, but also
				91	serves as a useful cutoff for performance reasons.
				92
				93	- If the function is variadic. This is not a hard limitation, but an engineering
				94	limitation.
				95
				96	Tracked by: <rdar://problem/12147064> Support inlining of variadic functions
				97
				98	- In C++, ExprEngine does not inline constructors unless the destructor is
				99	guaranteed to be inlined as well.
				100
				101	TMK/COMMENT This needs to be a bit more precise. How do we know the
				102	destructor is guaranteed to be inlined?
				103
				104	- In C++, ExprEngine does not inline custom implementations of operator 'new'
				105	implementations). This is due to a lack of complete handling of destructors.
				106
				107	- Calls resulting in "dynamic dispatch" are specially handled. See more below.
				108
				109	- Engine::FunctionSummaries map stores additional information about
				110	declarations, some of which is collected at runtime based on previous analyzes
				111	of the function. We do not inline functions which were not profitable to
				112	inline in a different context (for example, if the maximum block count was
				113	exceeded, see Retry Without Inlining).
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	114
				115
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	116	Dynamic Calls and Devirtualization
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	117	----------------------------------
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	118
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	119	"Dynamic" calls are those that are resolved at runtime, such as C++ virtual
				120	method calls and Objective-C message sends. Due to the path-sensitive nature of
				121	the analyzer, the analyzer may be able to reason about the dynamic type of the
				122	object whose method is being called and thus "devirtualize" the call.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	123
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	124	This path-sensitive devirtualization occurs when the analyzer can determine what
				125	method would actually be called at runtime. This is possible when the type
				126	information is constrained enough for a simulated C++/Objective-C object in
				127	order to make such a decision.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	128
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	129	== RuntimeDefinition ==
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	130
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	131	The basis of this devirtualization is CallEvent's getRuntimeDefinition() method,
				132	which returns a RuntimeDefinition object. The "runtime" + "defintion"
				133	corresponds to the definition of the called method as would be computed at
				134	runtime. In the case of no dynamic dispatch, this object resolves to a Decl*
				135	for the called function. In the case of dynamic dispatch, the RuntimeDefinition
				136	object also includes an optional MemRegion* corresponding to the object being
				137	called (i.e., the "receiver" in Objective-C parlance). This information is
				138	later consulted by ExprEngine (along with tracked dynamic type information) to
				139	potentially resolve the called method.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	140
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	141	== DynamicTypeInfo ==
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	142
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	143	In addition to RuntimeDefinition, the analyzer needs to track the potential
				144	runtime type of a simulated C++/Objective-C object. As the analyzer analyzes a
				145	path, it may accrue more information to refine the knowledge about the type of
				146	an object. This can then be used to make better decisions about the target
				147	method of a call.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	148
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	149	Such type information is tracked as DynamicTypeInfo. This is path-sensitive
				150	data that is stored in ProgramState, which defines a mapping from MemRegions to
				151	an (optional) DynamicTypeInfo.
				152
				153	If no DynamicTypeInfo has been explicitly set for a MemRegion, it will be lazily
				154	inferred from the region's type or associated symbol. Information from symbolic
				155	regions is weaker than from true typed regions.
				156
				157	EXAMPLE: A C++ object declared "A obj" is known to have the class 'A', but a
				158	reference "A &ref" may dynamically be a subclass of 'A'.
				159
				160	The DynamicTypePropagation checker gathers and propagates DynamicTypeInfo,
				161	updating it as information is observed along a path that can refine that type
				162	information for a region.
				163
				164	WARNING: Not all of the existing analyzer code has been retrofitted to use
				165	DynamicTypeInfo, nor is it universally appropriate. In particular,
				166	DynamicTypeInfo always applies to a region with all casts stripped
				167	off, but sometimes the information provided by casts can be useful.)
				168
				169
				170	When asked to provide a definition, the CallEvents for dynamic calls will use
				171	the DynamicTypeInfo in their ProgramState to provide the best definition of the
				172	method to be called. In some cases this devirtualization can be perfect or
				173	near-perfect, and the analyzer can inline the definition as usual. In other
				174	cases ExprEngine can make a guess, but report that our guess may not be the
				175	method actually called at runtime.
				176
				177	TMK/COMMENT: what does it mean to "report" that our guess may not be the
				178	method actually called?
				179
				180	The -analyzer-ipa option has four different modes: none, inlining, dynamic, and
				181	dynamic-bifurcate. Under -analyzer-ipa=dynamic, all dynamic calls are inlined,
				182	whether we are certain or not that this will actually be the definition used at
				183	runtime. Under -analyzer-ipa=inlining, only "near-perfect" devirtualized calls
				184	are inlined*, and other dynamic calls are evaluated conservatively (as if no
				185	definition were available).
				186
				187	* Currently, no Objective-C messages are not inlined under
				188	-analyzer-ipa=inlining, even if we are reasonably confident of the type of the
				189	receiver. We plan to enable this once we have tested our heuristics more
				190	thoroughly.
				191
				192	The last option, -analyzer-ipa=dynamic-bifurcate, behaves similarly to
				193	"dynamic", but performs a conservative invalidation in the general virtual case
				194	in addition to inlining. The details of this are discussed below.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	195
				196	Bifurcation
				197	-----------
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	198
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	199	ExprEngine::BifurcateCall implements the -analyzer-ipa=dynamic-bifurcate
				200	mode.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	201
Anna Zaks	2eed8cc	2012-08-22 05:38:38 +0000	[diff] [blame]	202	When a call is made on a region with imprecise dynamic type information
				203	(RuntimeDefinition::mayHaveOtherDefinitions() evaluates to TRUE), ExprEngine
				204	bifurcates the path and marks the MemRegion (derived from a RuntimeDefinition
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	205	object) with a path-sensitive "mode" in the ProgramState.
				206
				207	Currently, there are 2 modes:
				208
				209	DynamicDispatchModeInlined - Models the case where the dynamic type information
Anna Zaks	2eed8cc	2012-08-22 05:38:38 +0000	[diff] [blame]	210	of the receiver (MemoryRegion) is assumed to be perfectly constrained so
				211	that a given definition of a method is expected to be the code actually
				212	called. When this mode is set, ExprEngine uses the Decl from
				213	RuntimeDefinition to inline any dynamically dispatched call sent to this
				214	receiver because the function definition is considered to be fully resolved.
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	215
				216	DynamicDispatchModeConservative - Models the case where the dynamic type
Anna Zaks	2eed8cc	2012-08-22 05:38:38 +0000	[diff] [blame]	217	information is assumed to be incorrect, for example, implies that the method
				218	definition is overriden in a subclass. In such cases, ExprEngine does not
				219	inline the methods sent to the receiver (MemoryRegion), even if a candidate
				220	definition is available. This mode is conservative about simulating the
				221	effects of a call.
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	222
Anna Zaks	2eed8cc	2012-08-22 05:38:38 +0000	[diff] [blame]	223	Going forward along the symbolic execution path, ExprEngine consults the mode
				224	of the receiver's MemRegion to make decisions on whether the calls should be
				225	inlined or not, which ensures that there is at most one split per region.
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	226
				227	At a high level, "bifurcation mode" allows for increased semantic coverage in
				228	cases where the parent method contains code which is only executed when the
				229	class is subclassed. The disadvantages of this mode are a (considerable?)
				230	performance hit and the possibility of false positives on the path where the
				231	conservative mode is used.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	232
				233	Objective-C Message Heuristics
				234	------------------------------
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	235
Anna Zaks	2eed8cc	2012-08-22 05:38:38 +0000	[diff] [blame]	236	ExprEngine relies on a set of heuristics to partition the set of Objective-C
				237	method calls into those that require bifurcation and those that do not. Below
				238	are the cases when the DynamicTypeInfo of the object is considered precise
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	239	(cannot be a subclass):
				240
				241	- If the object was created with +alloc or +new and initialized with an -init
				242	method.
				243
				244	- If the calls are property accesses using dot syntax. This is based on the
				245	assumption that children rarely override properties, or do so in an
				246	essentially compatible way.
				247
				248	- If the class interface is declared inside the main source file. In this case
				249	it is unlikely that it will be subclassed.
				250
				251	- If the method is not declared outside of main source file, either by the
				252	receiver's class or by any superclasses.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	253
				254	C++ Inlining Caveats
				255	--------------------
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	256
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	257	C++11 [class.cdtor]p4 describes how the vtable of an object is modified as it is
				258	being constructed or destructed; that is, the type of the object depends on
				259	which base constructors have been completed. This is tracked using
				260	DynamicTypeInfo in the DynamicTypePropagation checker.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	261
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	262	There are several limitations in the current implementation:
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	263
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	264	- Temporaries are poorly modelled right now because we're not confident in the
				265	placement
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	266
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	267	- 'new' is poorly modelled due to some nasty CFG/design issues. This is tracked
				268	in PR12014. 'delete' is not modelled at all.
				269
				270	- Arrays of objects are modeled very poorly right now. ExprEngine currently
				271	only simualtes the first constructor and first destructor. Because of this,
				272	ExprEngine does not inline any constructors or destructors for arrays.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	273
				274	CallEvent
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	275	---------
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	276
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	277	A CallEvent represents a specific call to a function, method, or other body of
				278	code. It is path-sensitive, containing both the current state (ProgramStateRef)
				279	and stack space (LocationContext), and provides uniform access to the argument
				280	values and return type of a call, no matter how the call is written in the
				281	source or what sort of code body is being invoked.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	282
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	283	NOTE: For those familiar with Cocoa, CallEvent is roughly equivalent to
				284	NSInvocation.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	285
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	286	CallEvent should be used whenever there is logic dealing with function calls
				287	that does not care how the call occurred.
Jordan Rose	25c0400	2012-08-17 02:11:35 +0000	[diff] [blame]	288
Ted Kremenek	77df8d9	2012-08-22 01:20:05 +0000	[diff] [blame]	289	Examples include checking that arguments satisfy preconditions (such as
				290	__attribute__((nonnull))), and attempting to inline a call.
				291
				292	CallEvents are reference-counted objects managed by a CallEventManager. While
				293	there is no inherent issue with persisting them (say, in a ProgramState's GDM),
				294	they are intended for short-lived use, and can be recreated from CFGElements or
				295	StackFrameContexts fairly easily.