clang/docs/analyzer/DesignDiscussions/InitializerLists.rst - toolchain/llvm-project - Gitiles

 This discussion took place in https://reviews.llvm.org/D35216
 "Escape symbols when creating std::initializer_list".

 It touches problems of modelling C++ standard library constructs in general,
 including modelling implementation-defined fields within C++ standard library
 objects, in particular constructing objects into pointers held by such fields,
 and separation of responsibilities between analyzer's core and checkers.

 **Artem:**

 I've seen a few false positives that appear because we construct
 C++11 std::initializer_list objects with brace initializers, and such
 construction is not properly modeled. For instance, if a new object is
 constructed on the heap only to be put into a brace-initialized STL container,
 the object is reported to be leaked.

 Approach (0): This can be trivially fixed by this patch, which causes pointers
 passed into initializer list expressions to immediately escape.

 This fix is overly conservative though. So i did a bit of investigation as to
 how model std::initializer_list better.

 According to the standard, std::initializer_list<T> is an object that has
 methods begin(), end(), and size(), where begin() returns a pointer to continuous
 array of size() objects of type T, and end() is equal to begin() plus size().
 The standard does hint that it should be possible to implement
 std::initializer_list<T> as a pair of pointers, or as a pointer and a size
 integer, however specific fields that the object would contain are an
 implementation detail.

 Ideally, we should be able to model the initializer list's methods precisely.
 Or, at least, it should be possible to explain to the analyzer that the list
 somehow "takes hold" of the values put into it. Initializer lists can also be
 copied, which is a separate story that i'm not trying to address here.

 The obvious approach to modeling std::initializer_list in a checker would be to
 construct a SymbolMetadata for the memory region of the initializer list object,
 which would be of type T* and represent begin(), so we'd trivially model begin()
 as a function that returns this symbol. The array pointed to by that symbol
 would be bindLoc()ed to contain the list's contents (probably as a CompoundVal
 to produce less bindings in the store). Extent of this array would represent
 size() and would be equal to the length of the list as written.

 So this sounds good, however apparently it does nothing to address our false
 positives: when the list escapes, our RegionStoreManager is not magically
 guessing that the metadata symbol attached to it, together with its contents,
 should also escape. In fact, it's impossible to trigger a pointer escape from
 within the checker.

 Approach (1): If only we enabled ProgramState::bindLoc(..., notifyChanges=true)
 to cause pointer escapes (not only region changes) (which sounds like the right
 thing to do anyway) such checker would be able to solve the false positives by
 triggering escapes when binding list elements to the list. However, it'd be as
 conservative as the current patch's solution. Ideally, we do not want escapes to
 happen so early. Instead, we'd prefer them to be delayed until the list itself
 escapes.

 So i believe that escaping metadata symbols whenever their base regions escape
 would be the right thing to do. Currently we didn't think about that because we
 had neither pointer-type metadatas nor non-pointer escapes.

 Approach (2): We could teach the Store to scan itself for bindings to
 metadata-symbolic-based regions during scanReachableSymbols() whenever a region
 turns out to be reachable. This requires no work on checker side, but it sounds
 performance-heavy.

 Approach (3): We could let checkers maintain the set of active metadata symbols
 in the program state (ideally somewhere in the Store, which sounds weird but
 causes the smallest amount of layering violations), so that the core knew what
 to escape. This puts a stress on the checkers, but with a smart data map it
 wouldn't be a problem.

 Approach (4): We could allow checkers to trigger pointer escapes in arbitrary
 moments. If we allow doing this within checkPointerEscape callback itself, we
 would be able to express facts like "when this region escapes, that metadata
 symbol attached to it should also escape". This sounds like an ultimate freedom,
 with maximum stress on the checkers - still not too much stress when we have
 smart data maps.

 I'm personally liking the approach (2) - it should be possible to avoid
 performance overhead, and clarity seems nice.

 **Gabor:**

 At this point, I am a bit wondering about two questions.

 - When should something belong to a checker and when should something belong
 to the engine? Sometimes we model library aspects in the engine and model
 language constructs in checkers.
 - What is the checker programming model that we are aiming for? Maximum
 freedom or more easy checker development?

 I think if we aim for maximum freedom, we do not need to worry about the
 potential stress on checkers, and we can introduce abstractions to mitigate that
 later on.
 If we want to simplify the API, then maybe it makes more sense to move language
 construct modeling to the engine when the checker API is not sufficient instead
 of complicating the API.

 Right now I have no preference or objections between the alternatives but there
 are some random thoughts:

 - Maybe it would be great to have a guideline how to evolve the analyzer and
 follow it, so it can help us to decide in similar situations
 - I do care about performance in this case. The reason is that we have a
 limited performance budget. And I think we should not expect most of the checker
 writers to add modeling of language constructs. So, in my opinion, it is ok to
 have less nice/more verbose API for language modeling if we can have better
 performance this way, since it only needs to be done once, and is done by the
 framework developers.

 **Artem:** These are some great questions, i guess it'd be better to discuss
 them more openly. As a quick dump of my current mood:

 - To me it seems obvious that we need to aim for a checker API that is both
 simple and powerful. This can probably by keeping the API as powerful as
 necessary while providing a layer of simple ready-made solutions on top of it.
 Probably a few reusable components for assembling checkers. And this layer
 should ideally be pleasant enough to work with, so that people would prefer to
 extend it when something is lacking, instead of falling back to the complex
 omnipotent API. I'm thinking of AST matchers vs. AST visitors as a roughly
 similar situation: matchers are not omnipotent, but they're so nice.

 - Separation between core and checkers is usually quite strange. Once we have
 shared state traits, i generally wouldn't mind having region store or range
 constraint manager as checkers (though it's probably not worth it to transform
 them - just a mood). The main thing to avoid here would be the situation when
 the checker overwrites stuff written by the core because it thinks it has a
 better idea what's going on, so the core should provide a good default behavior.

 - Yeah, i totally care about performance as well, and if i try to implement
 approach, i'd make sure it's good.

 **Artem:**

 > Approach (2): We could teach the Store to scan itself for bindings to
 > metadata-symbolic-based regions during scanReachableSymbols() whenever
 > a region turns out to be reachable. This requires no work on checker side,
 > but it sounds performance-heavy.

 Nope, this approach is wrong. Metadata symbols may become out-of-date: when the
 object changes, metadata symbols attached to it aren't changing (because symbols
 simply don't change). The same metadata may have different symbols to denote its
 value in different moments of time, but at most one of them represents the
 actual metadata value. So we'd be escaping more stuff than necessary.

 If only we had "ghost fields"
 (http://lists.llvm.org/pipermail/cfe-dev/2016-May/049000.html), it would have
 been much easier, because the ghost field would only contain the actual
 metadata, and the Store would always know about it. This example adds to my
 belief that ghost fields are exactly what we need for most C++ checkers.

 **Devin:**

 In this case, I would be fine with some sort of
 AbstractStorageMemoryRegion that meant "here is a memory region and somewhere
 reachable from here exists another region of type T". Or even multiple regions
 with different identifiers. This wouldn't specify how the memory is reachable,
 but it would allow for transfer functions to get at those regions and it would
 allow for invalidation.

 For std::initializer_list this reachable region would the region for the backing
 array and the transfer functions for begin() and end() yield the beginning and
 end element regions for it.

 In my view this differs from ghost variables in that (1) this storage does
 actually exist (it is just a library implementation detail where that storage
 lives) and (2) it is perfectly valid for a pointer into that storage to be
 returned and for another part of the program to read or write from that storage.
 (Well, in this case just read since it is allowed to be read-only memory).

 What I'm not OK with is modeling abstract analysis state (for example, the count
 of a NSMutableArray or the typestate of a file handle) as a value stored in some
 ginned up region in the store. This takes an easy problem that the analyzer does
 well at (modeling typestate) and turns it into a hard one that the analyzer is
 bad at (reasoning about the contents of the heap).

 I think the key criterion here is: "is the region accessible from outside the
 library". That is, does the library expose the region as a pointer that can be
 read to or written from in the client program? If so, then it makes sense for
 this to be in the store: we are modeling reachable storage as storage. But if
 we're just modeling arbitrary analysis facts that need to be invalidated when a
 pointer escapes then we shouldn't try to gin up storage for them just to get
 invalidation for free.

 **Artem:**

 > In this case, I would be fine with some sort of AbstractStorageMemoryRegion
 > that meant "here is a memory region and somewhere reachable from here exists
 > another region of type T". Or even multiple regions with different
 > identifiers. This wouldn't specify how the memory is reachable, but it would
 > allow for transfer functions to get at those regions and it would allow for
 > invalidation.

 Yeah, this is what we can easily implement now as a
 symbolic-region-based-on-a-metadata-symbol (though we can make a new region
 class for that if we eg. want it typed). The problem is that the relation
 between such storage region and its parent object region is essentially
 immaterial, similarly to the relation between SymbolRegionValue and its parent
 region. Region contents are mutable: today the abstract storage is reachable
 from its parent object, tomorrow it's not, and maybe something else becomes
 reachable, something that isn't even abstract. So the parent region for the
 abstract storage is most of the time at best a "nice to know" thing - we cannot
 rely on it to do any actual work. We'd anyway need to rely on the checker to do
 the job.

 > For std::initializer_list this reachable region would the region for the
 > backing array and the transfer functions for begin() and end() yield the
 > beginning and end element regions for it.

 So maybe in fact for std::initializer_list it may work fine because you cannot
 change the data after the object is constructed - so this region's contents are
 essentially immutable. For the future, i feel as if it is a dead end.

 I'd like to consider another funny example. Suppose we're trying to model
 std::unique_ptr. Consider::

   void bar(const std::unique_ptr<int> &x);

   void foo(std::unique_ptr<int> &x) {
     int *a = x.get();   // (a, 0, direct): &AbstractStorageRegion
     *a = 1;             // (AbstractStorageRegion, 0, direct): 1 S32b
     int *b = new int;
     *b = 2;             // (SymRegion{conj_$0<int *>}, 0 ,direct): 2 S32b
     x.reset(b);         // Checker map: x -> SymRegion{conj_$0<int *>}
     bar(x);             // 'a' doesn't escape (the pointer was unique), 'b' does.
     clang_analyzer_eval(*a == 1); // Making this true is up to the checker.
     clang_analyzer_eval(*b == 2); // Making this unknown is up to the checker.
   }

 The checker doesn't totally need to ensure that *a == 1 passes - even though the
 pointer was unique, it could theoretically have .get()-ed above and the code
 could of course break the uniqueness invariant (though we'd probably want it).
 The checker can say that "even if *a did escape, it was not because it was
 stuffed directly into bar()".

 The checker's direct responsibility, however, is to solve the *b == 2 thing
 (which is in fact the problem we're dealing with in this patch - escaping the
 storage region of the object).

 So we're talking about one more operation over the program state (scanning
 reachable symbols and regions) that cannot work without checker support.

 We can probably add a new callback "checkReachableSymbols" to solve this. This
 is in fact also related to the dead symbols problem (we're scanning for live
 symbols in the store and in the checkers separately, but we need to do so
 simultaneously with a single worklist). Hmm, in fact this sounds like a good
 idea; we can replace checkLiveSymbols with checkReachableSymbols.

 Or we could just have ghost member variables, and no checker support required at
 all. For ghost member variables, the relation with their parent region (which
 would be their superregion) is actually useful, the mutability of their contents
 is expressed naturally, and the store automagically sees reachable symbols, live
 symbols, escapes, invalidations, whatever.

 > In my view this differs from ghost variables in that (1) this storage does
 > actually exist (it is just a library implementation detail where that storage
 > lives) and (2) it is perfectly valid for a pointer into that storage to be
 > returned and for another part of the program to read or write from that
 > storage. (Well, in this case just read since it is allowed to be read-only
 > memory).

 > What I'm not OK with is modeling abstract analysis state (for example, the
 > count of a NSMutableArray or the typestate of a file handle) as a value stored
 > in some ginned up region in the store.This takes an easy problem that the
 > analyzer does well at (modeling typestate) and turns it into a hard one that
 > the analyzer is bad at (reasoning about the contents of the heap).

 Yeah, i tend to agree on that. For simple typestates, this is probably an
 overkill, so let's definitely put aside the idea of "ghost symbolic regions"
 that i had earlier.

 But, to summarize a bit, in our current case, however, the typestate we're
 looking for is the contents of the heap. And when we try to model such
 typestates (complex in this specific manner, i.e. heap-like) in any checker, we
 have a choice between re-doing this modeling in every such checker (which is
 something analyzer is indeed good at, but at a price of making checkers heavy)
 or instead relying on the Store to do exactly what it's designed to do.

 > I think the key criterion here is: "is the region accessible from outside
 > the library". That is, does the library expose the region as a pointer that
 > can be read to or written from in the client program? If so, then it makes
 > sense for this to be in the store: we are modeling reachable storage as
 > storage. But if we're just modeling arbitrary analysis facts that need to be
 > invalidated when a pointer escapes then we shouldn't try to gin up storage
 > for them just to get invalidation for free.

 As a metaphor, i'd probably compare it to body farms - the difference between
 ghost member variables and metadata symbols seems to me like the difference
 between body farms and evalCall. Both are nice to have, and body farms are very
 pleasant to work with, even if not omnipotent. I think it's fine for a
 FunctionDecl's body in a body farm to have a local variable, even if such
 variable doesn't actually exist, even if it cannot be seen from outside the
 function call. I'm not seeing immediate practical difference between "it does
 actually exist" and "it doesn't actually exist, just a handy abstraction".
 Similarly, i think it's fine if we have a CXXRecordDecl with
 implementation-defined contents, and try to farm up a member variable as a handy
 abstraction (we don't even need to know its name or offset, only that it's there
 somewhere).

 **Artem:**

 We've discussed it in person with Devin, and he provided more points to think
 about:

 - If the initializer list consists of non-POD data, constructors of list's
 objects need to take the sub-region of the list's region as this-region In the
 current (v2) version of this patch, these objects are constructed elsewhere and
 then trivial-copied into the list's metadata pointer region, which may be
 incorrect. This is our overall problem with C++ constructors, which manifests in
 this case as well. Additionally, objects would need to be constructed in the
 analyzer's core, which would not be able to predict that it needs to take a
 checker-specific region as this-region, which makes it harder, though it might
 be mitigated by sharing the checker state traits.

 - Because "ghost variables" are not material to the user, we need to somehow
 make super sure that they don't make it into the diagnostic messages.

 So, because this needs further digging into overall C++ support and rises too
 many questions, i'm delaying a better approach to this problem and will fall
 back to the original trivial patch.
	This discussion took place in https://reviews.llvm.org/D35216
	"Escape symbols when creating std::initializer_list".

	It touches problems of modelling C++ standard library constructs in general,
	including modelling implementation-defined fields within C++ standard library
	objects, in particular constructing objects into pointers held by such fields,
	and separation of responsibilities between analyzer's core and checkers.

	Artem:

	I've seen a few false positives that appear because we construct
	C++11 std::initializer_list objects with brace initializers, and such
	construction is not properly modeled. For instance, if a new object is
	constructed on the heap only to be put into a brace-initialized STL container,
	the object is reported to be leaked.

	Approach (0): This can be trivially fixed by this patch, which causes pointers
	passed into initializer list expressions to immediately escape.

	This fix is overly conservative though. So i did a bit of investigation as to
	how model std::initializer_list better.

	According to the standard, std::initializer_list<T> is an object that has
	methods begin(), end(), and size(), where begin() returns a pointer to continuous
	array of size() objects of type T, and end() is equal to begin() plus size().
	The standard does hint that it should be possible to implement
	std::initializer_list<T> as a pair of pointers, or as a pointer and a size
	integer, however specific fields that the object would contain are an
	implementation detail.

	Ideally, we should be able to model the initializer list's methods precisely.
	Or, at least, it should be possible to explain to the analyzer that the list
	somehow "takes hold" of the values put into it. Initializer lists can also be
	copied, which is a separate story that i'm not trying to address here.

	The obvious approach to modeling std::initializer_list in a checker would be to
	construct a SymbolMetadata for the memory region of the initializer list object,
	which would be of type T* and represent begin(), so we'd trivially model begin()
	as a function that returns this symbol. The array pointed to by that symbol
	would be bindLoc()ed to contain the list's contents (probably as a CompoundVal
	to produce less bindings in the store). Extent of this array would represent
	size() and would be equal to the length of the list as written.

	So this sounds good, however apparently it does nothing to address our false
	positives: when the list escapes, our RegionStoreManager is not magically
	guessing that the metadata symbol attached to it, together with its contents,
	should also escape. In fact, it's impossible to trigger a pointer escape from
	within the checker.

	Approach (1): If only we enabled ProgramState::bindLoc(..., notifyChanges=true)
	to cause pointer escapes (not only region changes) (which sounds like the right
	thing to do anyway) such checker would be able to solve the false positives by
	triggering escapes when binding list elements to the list. However, it'd be as
	conservative as the current patch's solution. Ideally, we do not want escapes to
	happen so early. Instead, we'd prefer them to be delayed until the list itself
	escapes.

	So i believe that escaping metadata symbols whenever their base regions escape
	would be the right thing to do. Currently we didn't think about that because we
	had neither pointer-type metadatas nor non-pointer escapes.

	Approach (2): We could teach the Store to scan itself for bindings to
	metadata-symbolic-based regions during scanReachableSymbols() whenever a region
	turns out to be reachable. This requires no work on checker side, but it sounds
	performance-heavy.

	Approach (3): We could let checkers maintain the set of active metadata symbols
	in the program state (ideally somewhere in the Store, which sounds weird but
	causes the smallest amount of layering violations), so that the core knew what
	to escape. This puts a stress on the checkers, but with a smart data map it
	wouldn't be a problem.

	Approach (4): We could allow checkers to trigger pointer escapes in arbitrary
	moments. If we allow doing this within checkPointerEscape callback itself, we
	would be able to express facts like "when this region escapes, that metadata
	symbol attached to it should also escape". This sounds like an ultimate freedom,
	with maximum stress on the checkers - still not too much stress when we have
	smart data maps.

	I'm personally liking the approach (2) - it should be possible to avoid
	performance overhead, and clarity seems nice.

	Gabor:

	At this point, I am a bit wondering about two questions.

	- When should something belong to a checker and when should something belong
	to the engine? Sometimes we model library aspects in the engine and model
	language constructs in checkers.
	- What is the checker programming model that we are aiming for? Maximum
	freedom or more easy checker development?

	I think if we aim for maximum freedom, we do not need to worry about the
	potential stress on checkers, and we can introduce abstractions to mitigate that
	later on.
	If we want to simplify the API, then maybe it makes more sense to move language
	construct modeling to the engine when the checker API is not sufficient instead
	of complicating the API.

	Right now I have no preference or objections between the alternatives but there
	are some random thoughts:

	- Maybe it would be great to have a guideline how to evolve the analyzer and
	follow it, so it can help us to decide in similar situations
	- I do care about performance in this case. The reason is that we have a
	limited performance budget. And I think we should not expect most of the checker
	writers to add modeling of language constructs. So, in my opinion, it is ok to
	have less nice/more verbose API for language modeling if we can have better
	performance this way, since it only needs to be done once, and is done by the
	framework developers.

	Artem: These are some great questions, i guess it'd be better to discuss
	them more openly. As a quick dump of my current mood:

	- To me it seems obvious that we need to aim for a checker API that is both
	simple and powerful. This can probably by keeping the API as powerful as
	necessary while providing a layer of simple ready-made solutions on top of it.
	Probably a few reusable components for assembling checkers. And this layer
	should ideally be pleasant enough to work with, so that people would prefer to
	extend it when something is lacking, instead of falling back to the complex
	omnipotent API. I'm thinking of AST matchers vs. AST visitors as a roughly
	similar situation: matchers are not omnipotent, but they're so nice.

	- Separation between core and checkers is usually quite strange. Once we have
	shared state traits, i generally wouldn't mind having region store or range
	constraint manager as checkers (though it's probably not worth it to transform
	them - just a mood). The main thing to avoid here would be the situation when
	the checker overwrites stuff written by the core because it thinks it has a
	better idea what's going on, so the core should provide a good default behavior.

	- Yeah, i totally care about performance as well, and if i try to implement
	approach, i'd make sure it's good.

	Artem:

	> Approach (2): We could teach the Store to scan itself for bindings to
	> metadata-symbolic-based regions during scanReachableSymbols() whenever
	> a region turns out to be reachable. This requires no work on checker side,
	> but it sounds performance-heavy.

	Nope, this approach is wrong. Metadata symbols may become out-of-date: when the
	object changes, metadata symbols attached to it aren't changing (because symbols
	simply don't change). The same metadata may have different symbols to denote its
	value in different moments of time, but at most one of them represents the
	actual metadata value. So we'd be escaping more stuff than necessary.

	If only we had "ghost fields"
	(http://lists.llvm.org/pipermail/cfe-dev/2016-May/049000.html), it would have
	been much easier, because the ghost field would only contain the actual
	metadata, and the Store would always know about it. This example adds to my
	belief that ghost fields are exactly what we need for most C++ checkers.

	Devin:

	In this case, I would be fine with some sort of
	AbstractStorageMemoryRegion that meant "here is a memory region and somewhere
	reachable from here exists another region of type T". Or even multiple regions
	with different identifiers. This wouldn't specify how the memory is reachable,
	but it would allow for transfer functions to get at those regions and it would
	allow for invalidation.

	For std::initializer_list this reachable region would the region for the backing
	array and the transfer functions for begin() and end() yield the beginning and
	end element regions for it.

	In my view this differs from ghost variables in that (1) this storage does
	actually exist (it is just a library implementation detail where that storage
	lives) and (2) it is perfectly valid for a pointer into that storage to be
	returned and for another part of the program to read or write from that storage.
	(Well, in this case just read since it is allowed to be read-only memory).

	What I'm not OK with is modeling abstract analysis state (for example, the count
	of a NSMutableArray or the typestate of a file handle) as a value stored in some
	ginned up region in the store. This takes an easy problem that the analyzer does
	well at (modeling typestate) and turns it into a hard one that the analyzer is
	bad at (reasoning about the contents of the heap).

	I think the key criterion here is: "is the region accessible from outside the
	library". That is, does the library expose the region as a pointer that can be
	read to or written from in the client program? If so, then it makes sense for
	this to be in the store: we are modeling reachable storage as storage. But if
	we're just modeling arbitrary analysis facts that need to be invalidated when a
	pointer escapes then we shouldn't try to gin up storage for them just to get
	invalidation for free.

	Artem:

	> In this case, I would be fine with some sort of AbstractStorageMemoryRegion
	> that meant "here is a memory region and somewhere reachable from here exists
	> another region of type T". Or even multiple regions with different
	> identifiers. This wouldn't specify how the memory is reachable, but it would
	> allow for transfer functions to get at those regions and it would allow for
	> invalidation.

	Yeah, this is what we can easily implement now as a
	symbolic-region-based-on-a-metadata-symbol (though we can make a new region
	class for that if we eg. want it typed). The problem is that the relation
	between such storage region and its parent object region is essentially
	immaterial, similarly to the relation between SymbolRegionValue and its parent
	region. Region contents are mutable: today the abstract storage is reachable
	from its parent object, tomorrow it's not, and maybe something else becomes
	reachable, something that isn't even abstract. So the parent region for the
	abstract storage is most of the time at best a "nice to know" thing - we cannot
	rely on it to do any actual work. We'd anyway need to rely on the checker to do
	the job.

	> For std::initializer_list this reachable region would the region for the
	> backing array and the transfer functions for begin() and end() yield the
	> beginning and end element regions for it.

	So maybe in fact for std::initializer_list it may work fine because you cannot
	change the data after the object is constructed - so this region's contents are
	essentially immutable. For the future, i feel as if it is a dead end.

	I'd like to consider another funny example. Suppose we're trying to model
	std::unique_ptr. Consider::

	void bar(const std::unique_ptr<int> &x);

	void foo(std::unique_ptr<int> &x) {
	int *a = x.get(); // (a, 0, direct): &AbstractStorageRegion
	*a = 1; // (AbstractStorageRegion, 0, direct): 1 S32b
	int *b = new int;
	b = 2; // (SymRegion{conj_$0<int >}, 0 ,direct): 2 S32b
	x.reset(b); // Checker map: x -> SymRegion{conj_$0<int *>}
	bar(x); // 'a' doesn't escape (the pointer was unique), 'b' does.
	clang_analyzer_eval(*a == 1); // Making this true is up to the checker.
	clang_analyzer_eval(*b == 2); // Making this unknown is up to the checker.
	}

	The checker doesn't totally need to ensure that *a == 1 passes - even though the
	pointer was unique, it could theoretically have .get()-ed above and the code
	could of course break the uniqueness invariant (though we'd probably want it).
	The checker can say that "even if *a did escape, it was not because it was
	stuffed directly into bar()".

	The checker's direct responsibility, however, is to solve the *b == 2 thing
	(which is in fact the problem we're dealing with in this patch - escaping the
	storage region of the object).

	So we're talking about one more operation over the program state (scanning
	reachable symbols and regions) that cannot work without checker support.

	We can probably add a new callback "checkReachableSymbols" to solve this. This
	is in fact also related to the dead symbols problem (we're scanning for live
	symbols in the store and in the checkers separately, but we need to do so
	simultaneously with a single worklist). Hmm, in fact this sounds like a good
	idea; we can replace checkLiveSymbols with checkReachableSymbols.

	Or we could just have ghost member variables, and no checker support required at
	all. For ghost member variables, the relation with their parent region (which
	would be their superregion) is actually useful, the mutability of their contents
	is expressed naturally, and the store automagically sees reachable symbols, live
	symbols, escapes, invalidations, whatever.

	> In my view this differs from ghost variables in that (1) this storage does
	> actually exist (it is just a library implementation detail where that storage
	> lives) and (2) it is perfectly valid for a pointer into that storage to be
	> returned and for another part of the program to read or write from that
	> storage. (Well, in this case just read since it is allowed to be read-only
	> memory).

	> What I'm not OK with is modeling abstract analysis state (for example, the
	> count of a NSMutableArray or the typestate of a file handle) as a value stored
	> in some ginned up region in the store.This takes an easy problem that the
	> analyzer does well at (modeling typestate) and turns it into a hard one that
	> the analyzer is bad at (reasoning about the contents of the heap).

	Yeah, i tend to agree on that. For simple typestates, this is probably an
	overkill, so let's definitely put aside the idea of "ghost symbolic regions"
	that i had earlier.

	But, to summarize a bit, in our current case, however, the typestate we're
	looking for is the contents of the heap. And when we try to model such
	typestates (complex in this specific manner, i.e. heap-like) in any checker, we
	have a choice between re-doing this modeling in every such checker (which is
	something analyzer is indeed good at, but at a price of making checkers heavy)
	or instead relying on the Store to do exactly what it's designed to do.

	> I think the key criterion here is: "is the region accessible from outside
	> the library". That is, does the library expose the region as a pointer that
	> can be read to or written from in the client program? If so, then it makes
	> sense for this to be in the store: we are modeling reachable storage as
	> storage. But if we're just modeling arbitrary analysis facts that need to be
	> invalidated when a pointer escapes then we shouldn't try to gin up storage
	> for them just to get invalidation for free.

	As a metaphor, i'd probably compare it to body farms - the difference between
	ghost member variables and metadata symbols seems to me like the difference
	between body farms and evalCall. Both are nice to have, and body farms are very
	pleasant to work with, even if not omnipotent. I think it's fine for a
	FunctionDecl's body in a body farm to have a local variable, even if such
	variable doesn't actually exist, even if it cannot be seen from outside the
	function call. I'm not seeing immediate practical difference between "it does
	actually exist" and "it doesn't actually exist, just a handy abstraction".
	Similarly, i think it's fine if we have a CXXRecordDecl with
	implementation-defined contents, and try to farm up a member variable as a handy
	abstraction (we don't even need to know its name or offset, only that it's there
	somewhere).

	Artem:

	We've discussed it in person with Devin, and he provided more points to think
	about:

	- If the initializer list consists of non-POD data, constructors of list's
	objects need to take the sub-region of the list's region as this-region In the
	current (v2) version of this patch, these objects are constructed elsewhere and
	then trivial-copied into the list's metadata pointer region, which may be
	incorrect. This is our overall problem with C++ constructors, which manifests in
	this case as well. Additionally, objects would need to be constructed in the
	analyzer's core, which would not be able to predict that it needs to take a
	checker-specific region as this-region, which makes it harder, though it might
	be mitigated by sharing the checker state traits.

	- Because "ghost variables" are not material to the user, we need to somehow
	make super sure that they don't make it into the diagnostic messages.

	So, because this needs further digging into overall C++ support and rises too
	many questions, i'm delaying a better approach to this problem and will fall
	back to the original trivial patch.