[analyzer] Creating standard Sphinx documentation

The lack of documentation has been a long-standing issue in the Static Analyzer,
and one of the leading reasons behind this was the lack of good documentation
infrastructure.

This led to serious drawbacks, such as
* Not having proper release notes for years
* Not being able to have sensible auto-generated checker documentation (which
led to most checkers not having any)
* The HTML website, which has to be updated manually, being a chore to maintain
and outdated for a long while
* Many design discussions now being hidden in Phabricator revisions

This patch implements a new documentation infrastructure using Sphinx, like most
of the other subprojects in LLVM. It converts some pages as a proof of
concept, with many others to follow in later patches. The eventual goal is to
preserve the original website's (https://clang-analyzer.llvm.org/) frontpage,
but move everything else to the new format.

Some other ideas, like creating a unipage for each checker (similar to how
clang-tidy works now), are also being discussed.

Patch by Dániel Krupp!

Differential Revision: https://reviews.llvm.org/D54429

llvm-svn: 353126
diff --git a/clang/docs/analyzer/developer-docs/DebugChecks.rst b/clang/docs/analyzer/developer-docs/DebugChecks.rst
new file mode 100644
index 0000000..bb2f37f
--- /dev/null
+++ b/clang/docs/analyzer/developer-docs/DebugChecks.rst
@@ -0,0 +1,287 @@
+============
+Debug Checks
+============
+
+.. contents::
+   :local:
+
+The analyzer contains a number of checkers which can aid in debugging. Enable
+them by using the "-analyzer-checker=" flag, followed by the name of the
+checker.
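+
+For example, to dump the CFG of every top-level function in a file (when going
+through the driver rather than ``clang -cc1``, the frontend flag has to be
+forwarded with ``-Xclang``; the file name is just a placeholder)::
+
+    clang --analyze -Xclang -analyzer-checker=debug.DumpCFG test.c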
+
+
+General Analysis Dumpers
+========================
+
+These checkers are used to dump the results of various infrastructural analyses
+to stderr. Some checkers also have "view" variants, which will display a graph
+using a 'dot' format viewer (such as Graphviz on OS X) instead.
+
+- debug.DumpCallGraph, debug.ViewCallGraph: Show the call graph generated for
+  the current translation unit. This is used to determine the order in which to
+  analyze functions when inlining is enabled.
+
+- debug.DumpCFG, debug.ViewCFG: Show the CFG generated for each top-level
+  function being analyzed.
+
+- debug.DumpDominators: Shows the dominance tree for the CFG of each top-level
+  function.
+
+- debug.DumpLiveVars: Show the results of live variable analysis for each
+  top-level function being analyzed.
+
+- debug.DumpLiveStmts: Show the results of live statement analysis for each
+  top-level function being analyzed.
+
+- debug.ViewExplodedGraph: Show the Exploded Graphs generated for the
+  analysis of different functions in the input translation unit. When there
+  are several functions analyzed, display one graph per function. Beware
+  that these graphs may grow very large, even for small functions.
+
+Path Tracking
+=============
+
+These checkers print information about the path taken by the analyzer engine.
+
+- debug.DumpCalls: Prints out every function or method call encountered during a
+  path traversal. The output is indented to show the call stack, but the checker
+  does NOT do any special handling of branches, meaning different paths could
+  end up interleaved.
+
+- debug.DumpTraversal: Prints the name of each branch statement encountered
+  during a path traversal ("IfStmt", "WhileStmt", etc). Currently used to check
+  whether the analysis engine is doing BFS or DFS.
+
+
+State Checking
+==============
+
+These checkers will print out information about the analyzer state in the form
+of analysis warnings. They are intended for use with the -verify functionality
+in regression tests.
+
+- debug.TaintTest: Prints out the word "tainted" for every expression that
+  carries taint. At the time of this writing, taint was only introduced by the
+  checks under experimental.security.taint.TaintPropagation; this checker may
+  eventually move to the security.taint package.
+
+- debug.ExprInspection: Responds to certain function calls, which are modeled
+  after builtins. These function calls should not affect the program state other
+  than the evaluation of their arguments; to use them, you will need to declare
+  them within your test file. The available functions are described below.
+
+(FIXME: debug.ExprInspection should probably be renamed, since it no longer only
+inspects expressions.)
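+
+As noted above, the functions are recognized by name, so a test file simply
+declares the ones it uses. A minimal sketch::
+
+    void clang_analyzer_eval(int);
+    void clang_analyzer_warnIfReached(void);
+
+    void test(int x) {
+      if (x > 0)
+        clang_analyzer_eval(x > 0); // expected-warning{{TRUE}}
+      else
+        clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
+    }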
+
+
+ExprInspection checks
+---------------------
+
+- ``void clang_analyzer_eval(bool);``
+
+  Prints TRUE if the argument is known to have a non-zero value, FALSE if the
+  argument is known to have a zero or null value, and UNKNOWN if the argument
+  isn't sufficiently constrained on this path.  You can use this to test other
+  values by using expressions like "x == 5".  Note that this functionality is
+  currently DISABLED in inlined functions, since different calls to the same
+  inlined function could provide different information, making it difficult to
+  write proper -verify directives.
+
+  In C, the argument can be typed as 'int' or as '_Bool'.
+
+  Example usage::
+
+    clang_analyzer_eval(x); // expected-warning{{UNKNOWN}}
+    if (!x) return;
+    clang_analyzer_eval(x); // expected-warning{{TRUE}}
+
+
+- ``void clang_analyzer_checkInlined(bool);``
+
+  If a call occurs within an inlined function, prints TRUE or FALSE according to
+  the value of its argument. If a call occurs outside an inlined function,
+  nothing is printed.
+
+  The intended use of this checker is to assert that a function is inlined at
+  least once (by passing 'true' and expecting a warning), or to assert that a
+  function is never inlined (by passing 'false' and expecting no warning). The
+  argument is technically unnecessary but is intended to clarify intent.
+
+  You might wonder why we can't print TRUE if a function is ever inlined and
+  FALSE if it is not. The problem is that any inlined function could conceivably
+  also be analyzed as a top-level function (in which case both TRUE and FALSE
+  would be printed), depending on the value of the -analyzer-inlining option.
+
+  In C, the argument can be typed as 'int' or as '_Bool'.
+
+  Example usage::
+
+    int inlined() {
+      clang_analyzer_checkInlined(true); // expected-warning{{TRUE}}
+      return 42;
+    }
+    
+    void topLevel() {
+      clang_analyzer_checkInlined(false); // no-warning (not inlined)
+      int value = inlined();
+      // This assertion will not be valid if the previous call was not inlined.
+      clang_analyzer_eval(value == 42); // expected-warning{{TRUE}}
+    }
+
+- ``void clang_analyzer_warnIfReached();``
+
+  Generate a warning if this line of code gets reached by the analyzer.
+
+  Example usage::
+
+    if (true) {
+      clang_analyzer_warnIfReached();  // expected-warning{{REACHABLE}}
+    }
+    else {
+      clang_analyzer_warnIfReached();  // no-warning
+    }
+
+- ``void clang_analyzer_numTimesReached();``
+
+  Same as above, but include the number of times this call expression
+  gets reached by the analyzer during the current analysis.
+
+  Example usage::
+
+    for (int x = 0; x < 3; ++x) {
+      clang_analyzer_numTimesReached(); // expected-warning{{3}}
+    }
+
+- ``void clang_analyzer_warnOnDeadSymbol(int);``
+
+  Subscribe for a delayed warning when the symbol that represents the value of
+  the argument is garbage-collected by the analyzer.
+
+  When calling 'clang_analyzer_warnOnDeadSymbol(x)', if the value of 'x' is a
+  symbol, then this symbol is marked by the ExprInspection checker. Then,
+  during each garbage collection run, the checker sees whether the marked
+  symbol is being collected, and issues the 'SYMBOL DEAD' warning if it is.
+  This way you know exactly, down to the line of code, where the symbol dies.
+
+  It is unlikely that you would call this function after the symbol is already
+  dead, because the very reference to it as the function argument prevents it
+  from dying. However, if the argument is not a symbol but a concrete value,
+  no warning will be issued.
+
+  Example usage::
+
+    do {
+      int x = generate_some_integer();
+      clang_analyzer_warnOnDeadSymbol(x);
+    } while(0);  // expected-warning{{SYMBOL DEAD}}
+
+
+- ``void clang_analyzer_explain(a single argument of any type);``
+
+  This function explains the value of its argument in a human-readable manner
+  in the warning message. You can declare as many overloads of its prototype
+  in the test code as necessary to explain various integral, pointer,
+  or even record-type values. To simplify usage in C code (where overloading
+  the function declaration is not allowed), you may append an arbitrary suffix
+  to the function name, without affecting functionality.
+
+  Example usage::
+
+    void clang_analyzer_explain(int);
+    void clang_analyzer_explain(void *);
+
+    // Useful in C code
+    void clang_analyzer_explain_int(int);
+
+    void foo(int param, void *ptr) {
+      clang_analyzer_explain(param); // expected-warning{{argument 'param'}}
+      clang_analyzer_explain_int(param); // expected-warning{{argument 'param'}}
+      if (!ptr)
+        clang_analyzer_explain(ptr); // expected-warning{{memory address '0'}}
+    }
+
+- ``void clang_analyzer_dump( /* a single argument of any type */);``
+
+  Similar to clang_analyzer_explain, but produces a raw dump of the value,
+  same as SVal::dump().
+
+  Example usage::
+
+    void clang_analyzer_dump(int);
+    void foo(int x) {
+      clang_analyzer_dump(x); // expected-warning{{reg_$0<x>}}
+    }
+
+- ``size_t clang_analyzer_getExtent(void *);``
+
+  This function returns the value that represents the extent of the memory region
+  pointed to by the argument. This value is often difficult to obtain otherwise,
+  because there is no valid code that produces it. However, it may be useful
+  for testing purposes, to see how well the analyzer models region extents.
+
+  Example usage::
+
+    void foo() {
+      int x, *y;
+      size_t xs = clang_analyzer_getExtent(&x);
+      clang_analyzer_explain(xs); // expected-warning{{'4'}}
+      size_t ys = clang_analyzer_getExtent(&y);
+      clang_analyzer_explain(ys); // expected-warning{{'8'}}
+    }
+
+- ``void clang_analyzer_printState();``
+
+  Dumps the current ProgramState to stderr. This lets you quickly look up the
+  program state at any execution point without ViewExplodedGraph or
+  re-compiling the program. It is not very useful for writing tests (apart
+  from testing how ProgramState gets printed), but useful for debugging tests.
+  Also, this function doesn't produce a warning, so its output appears on the
+  console before all other ExprInspection warnings.
+
+  Example usage::
+
+    void foo() {
+      int x = 1;
+      clang_analyzer_printState(); // Read the stderr!
+    }
+
+- ``void clang_analyzer_hashDump(int);``
+
+  The analyzer can generate a hash to identify reports. To debug what information
+  is used to calculate this hash, it is possible to dump the hashed string for
+  an arbitrary expression as a warning, using this function.
+
+  Example usage::
+
+    void foo() {
+      int x = 1;
+      clang_analyzer_hashDump(x); // expected-warning{{hashed string for x}}
+    }
+
+- ``void clang_analyzer_denote(int, const char *);``
+
+  Denotes symbols with strings. A subsequent call to clang_analyzer_express()
+  will express another symbol in terms of these strings. Useful for testing
+  relationships between different symbols.
+
+  Example usage::
+
+    void foo(int x) {
+      clang_analyzer_denote(x, "$x");
+      clang_analyzer_express(x + 1); // expected-warning{{$x + 1}}
+    }
+
+- ``void clang_analyzer_express(int);``
+
+  See clang_analyzer_denote().
+
+Statistics
+==========
+
+The debug.Stats checker collects various information about the analysis of each
+function, such as how many blocks were reached and if the analyzer timed out.
+
+There is also an additional -analyzer-stats flag, which enables various
+statistics within the analyzer engine. Note that the Stats checker (which
+produces at least one bug report per function) may actually change the values
+reported by -analyzer-stats.
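+
+Both mechanisms are enabled from the command line; an illustrative invocation
+through the driver could look like::
+
+    clang --analyze -Xclang -analyzer-checker=debug.Stats -Xclang -analyzer-stats test.c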
diff --git a/clang/docs/analyzer/developer-docs/IPA.rst b/clang/docs/analyzer/developer-docs/IPA.rst
new file mode 100644
index 0000000..2e8fe37
--- /dev/null
+++ b/clang/docs/analyzer/developer-docs/IPA.rst
@@ -0,0 +1,396 @@
+Inlining
+========
+
+There are several options that control which calls the analyzer will consider for
+inlining. The major one is ``-analyzer-config ipa``:
+
+* ``analyzer-config ipa=none`` - All inlining is disabled. This is the only mode 
+  available in LLVM 3.1 and earlier and in Xcode 4.3 and earlier.
+
+* ``analyzer-config ipa=basic-inlining`` - Turns on inlining for C functions, C++ 
+   static member functions, and blocks -- essentially, the calls that behave 
+   like simple C function calls. This is essentially the mode used in 
+   Xcode 4.4.
+
+* ``analyzer-config ipa=inlining`` - Turns on inlining when we can confidently find
+    the function/method body corresponding to the call. (C functions, static
+    functions, devirtualized C++ methods, Objective-C class methods, Objective-C
+    instance methods when ExprEngine is confident about the dynamic type of the
+    instance).
+
+* ``analyzer-config ipa=dynamic`` - Inline instance methods for which the type is
+   determined at runtime and we are not 100% sure that our type info is
+   correct. For virtual calls, inline the most plausible definition.
+
+* ``analyzer-config ipa=dynamic-bifurcate`` - Same as -analyzer-config ipa=dynamic,
+   but the path is split. We inline on one branch and do not inline on the 
+   other. This mode does not drop the coverage in cases when the parent class 
+   has code that is only exercised when some of its methods are overridden.
+
+Currently, ``-analyzer-config ipa=dynamic-bifurcate`` is the default mode.
+
+While ``-analyzer-config ipa`` determines in general how aggressively the analyzer
+will try to inline functions, several additional options control which types of
+functions can be inlined, in an all-or-nothing way. These options use the
+analyzer's configuration table, so they are all specified as follows:
+
+    ``-analyzer-config OPTION=VALUE``
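+
+For example, to change the C++ inlining mode through the driver (an
+illustrative invocation; ``-analyzer-config`` and its value are separate
+arguments, so each needs its own ``-Xclang``)::
+
+    clang --analyze -Xclang -analyzer-config -Xclang c++-inlining=constructors test.cpp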
+
+c++-inlining
+------------
+
+This option controls which C++ member functions may be inlined.
+
+    ``-analyzer-config c++-inlining=[none | methods | constructors | destructors]``
+
+Each of these modes implies that all the previous member function kinds will be
+inlined as well; it doesn't make sense to inline destructors without inlining
+constructors, for example.
+
+The default c++-inlining mode is 'destructors', meaning that all member
+functions with visible definitions will be considered for inlining. In some
+cases the analyzer may still choose not to inline the function.
+
+Note that under 'constructors', constructors for types with non-trivial
+destructors will not be inlined. Additionally, no C++ member functions will be 
+inlined under -analyzer-config ipa=none or -analyzer-config ipa=basic-inlining,
+regardless of the setting of the c++-inlining mode.
+
+c++-template-inlining
+^^^^^^^^^^^^^^^^^^^^^
+
+This option controls whether C++ templated functions may be inlined.
+
+    ``-analyzer-config c++-template-inlining=[true | false]``
+
+Currently, template functions are considered for inlining by default.
+
+The motivation behind this option is that very generic code can be a source
+of false positives, either by considering paths that the caller considers
+impossible (by some unstated precondition), or by inlining some but not all
+of a deep implementation of a function.
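+
+A hypothetical illustration of the first kind of false positive (none of these
+names come from real code): the caller relies on a precondition that is never
+written down, so inlining the template lets the analyzer walk a path the author
+considers impossible.
+
+.. code-block:: cpp
+
+  #include <vector>
+
+  // Callers promise (informally) never to pass an empty vector.
+  template <typename T>
+  T firstOrDefault(const std::vector<T> &v) {
+    return v.empty() ? T() : v.front();
+  }
+
+  int scale(const std::vector<int> &weights) {
+    // If firstOrDefault() is inlined, the analyzer also explores the
+    // empty-vector branch, where the result is value-initialized to 0, and may
+    // report a division by zero on a path the author believes cannot happen.
+    return 1000 / firstOrDefault(weights);
+  }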
+
+c++-stdlib-inlining
+^^^^^^^^^^^^^^^^^^^
+
+This option controls whether functions from the C++ standard library, including
+methods of the container classes in the Standard Template Library, should be
+considered for inlining.
+
+    ``-analyzer-config c++-stdlib-inlining=[true | false]``
+
+Currently, C++ standard library functions are considered for inlining by 
+default.
+
+The standard library functions and the STL in particular are used ubiquitously
+enough that our tolerance for false positives is even lower here. A false
+positive due to poor modeling of the STL leads to a poor user experience, since
+most users would not be comfortable adding assertions to system headers in order
+to silence analyzer warnings.
+
+c++-container-inlining
+^^^^^^^^^^^^^^^^^^^^^^
+
+This option controls whether constructors and destructors of "container" types
+should be considered for inlining.
+
+    ``-analyzer-config c++-container-inlining=[true | false]``
+
+Currently, these constructors and destructors are NOT considered for inlining
+by default.
+
+The current implementation of this setting checks whether a type has a member
+named 'iterator' or a member named 'begin'; these names are idiomatic in C++,
+with the latter specified in the C++11 standard. The analyzer currently does a
+fairly poor job of modeling certain data structure invariants of container-like
+objects. For example, these three expressions should be equivalent:
+
+
+.. code-block:: cpp
+   
+ std::distance(c.begin(), c.end()) == 0
+ c.begin() == c.end()
+ c.empty()
+
+Many of these issues are avoided if containers always have unknown, symbolic
+state, which is what happens when their constructors are treated as opaque.
+In the future, we may decide specific containers are "safe" to model through
+inlining, or choose to model them directly using checkers instead.
+
+
+Basics of Implementation
+------------------------
+
+The low-level mechanism of inlining a function is handled in
+ExprEngine::inlineCall and ExprEngine::processCallExit.
+
+If the conditions are right for inlining, a CallEnter node is created and added
+to the analysis work list. The CallEnter node marks the change to a new
+LocationContext representing the called function, and its state includes the
+contents of the new stack frame. When the CallEnter node is actually processed,
+its single successor will be an edge to the first CFG block in the function.
+
+Exiting an inlined function is a bit more work, fortunately broken up into
+reasonable steps:
+
+1. The CoreEngine realizes we're at the end of an inlined call and generates a
+   CallExitBegin node.
+
+2. ExprEngine takes over (in processCallExit) and finds the return value of the
+   function, if it has one. This is bound to the expression that triggered the
+   call. (In the case of calls without origin expressions, such as destructors,
+   this step is skipped.)
+
+3. Dead symbols and bindings are cleaned out from the state, including any local
+   bindings.
+
+4. A CallExitEnd node is generated, which marks the transition back to the
+   caller's LocationContext.
+
+5. Custom post-call checks are processed and the final nodes are pushed back
+   onto the work list, so that evaluation of the caller can continue.
+
+Retry Without Inlining
+^^^^^^^^^^^^^^^^^^^^^^
+
+In some cases, we would like to retry analysis without inlining a particular
+call.
+
+Currently, we use this technique to recover coverage in case we stop
+analyzing a path due to exceeding the maximum block count inside an inlined
+function.
+
+When this situation is detected, we walk up the path to find the first node
+before inlining was started and enqueue it on the WorkList with a special
+ReplayWithoutInlining bit added to it (ExprEngine::replayWithoutInlining).  The
+path is then re-analyzed from that point without inlining that particular call.
+
+Deciding When to Inline
+^^^^^^^^^^^^^^^^^^^^^^^
+
+In general, the analyzer attempts to inline as much as possible, since it
+provides a better summary of what actually happens in the program.  There are
+some cases, however, where the analyzer chooses not to inline:
+
+- If there is no definition available for the called function or method.  In
+  this case, there is no opportunity to inline.
+
+- If the CFG cannot be constructed for a called function, or the liveness
+  cannot be computed.  These are prerequisites for analyzing a function body,
+  with or without inlining.
+
+- If the LocationContext chain for a given ExplodedNode reaches a maximum cutoff
+  depth.  This prevents unbounded analysis due to infinite recursion, but also
+  serves as a useful cutoff for performance reasons.
+
+- If the function is variadic.  This is not a hard limitation, but an engineering
+  limitation.
+
+  Tracked by: <rdar://problem/12147064> Support inlining of variadic functions
+
+- In C++, constructors are not inlined unless the destructor call will be
+  processed by the ExprEngine. Thus, if the CFG was built without nodes for
+  implicit destructors, or if the destructors for the given object are not
+  represented in the CFG, the constructor will not be inlined. (As an exception,
+  constructors for objects with trivial constructors can still be inlined.)
+  See "C++ Caveats" below.
+
+- In C++, ExprEngine does not inline custom implementations of operator 'new'
+  or operator 'delete', nor does it inline the constructors and destructors
+  associated with these. See "C++ Caveats" below.
+
+- Calls resulting in "dynamic dispatch" are specially handled.  See more below.
+
+- The FunctionSummaries map stores additional information about declarations,
+  some of which is collected at runtime based on previous analyses.
+  We do not inline functions which were not profitable to inline in a different
+  context (for example, if the maximum block count was exceeded; see
+  "Retry Without Inlining").
+
+
+Dynamic Calls and Devirtualization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+"Dynamic" calls are those that are resolved at runtime, such as C++ virtual
+method calls and Objective-C message sends. Due to the path-sensitive nature of
+the analysis, the analyzer may be able to reason about the dynamic type of the
+object whose method is being called and thus "devirtualize" the call. 
+
+This path-sensitive devirtualization occurs when the analyzer can determine what
+method would actually be called at runtime.  This is possible when the type
+information is constrained enough for a simulated C++/Objective-C object that
+the analyzer can make such a decision.
+
+DynamicTypeInfo
+^^^^^^^^^^^^^^^
+
+As the analyzer analyzes a path, it may accrue information to refine the
+knowledge about the type of an object.  This can then be used to make better
+decisions about the target method of a call.
+
+Such type information is tracked as DynamicTypeInfo.  This is path-sensitive
+data that is stored in ProgramState, which defines a mapping from MemRegions to
+an (optional) DynamicTypeInfo.
+
+If no DynamicTypeInfo has been explicitly set for a MemRegion, it will be lazily
+inferred from the region's type or associated symbol. Information from symbolic
+regions is weaker than from true typed regions.
+
+  EXAMPLE: A C++ object declared "A obj" is known to have the class 'A', but a
+           reference "A &ref" may dynamically be a subclass of 'A'.
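+
+A hypothetical C++ sketch of the same distinction (names invented for
+illustration; the expected results assume inlining is enabled and use the
+ExprInspection functions described in the debug-checks document):
+
+.. code-block:: cpp
+
+  void clang_analyzer_eval(bool);
+
+  struct A { virtual int f() { return 1; } };
+  struct B : A { int f() override { return 2; } };
+
+  void knownDynamicType() {
+    A obj;
+    A &ref = obj;
+    // 'ref' is known to refer to an object whose dynamic type is exactly 'A',
+    // so the virtual call can be devirtualized and inlined; one would expect
+    // TRUE here.
+    clang_analyzer_eval(ref.f() == 1);
+  }
+
+  void unknownDynamicType(A &ref) {
+    // Here 'ref' may dynamically be a 'B'; without stronger DynamicTypeInfo the
+    // call is evaluated conservatively or bifurcated, depending on the ipa mode.
+    int r = ref.f();
+  }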
+
+The DynamicTypePropagation checker gathers and propagates DynamicTypeInfo,
+updating it as information is observed along a path that can refine that type
+information for a region.
+
+  WARNING: Not all of the existing analyzer code has been retrofitted to use
+           DynamicTypeInfo, nor is it universally appropriate. In particular,
+           DynamicTypeInfo always applies to a region with all casts stripped
+           off, but sometimes the information provided by casts can be useful.
+
+
+RuntimeDefinition
+^^^^^^^^^^^^^^^^^
+
+The basis of devirtualization is CallEvent's getRuntimeDefinition() method,
+which returns a RuntimeDefinition object.  When asked to provide a definition,
+the CallEvents for dynamic calls will use the DynamicTypeInfo in their
+ProgramState to attempt to devirtualize the call.  In the case of no dynamic
+dispatch, or perfectly constrained devirtualization, the resulting
+RuntimeDefinition contains a Decl corresponding to the definition of the called
+function, and RuntimeDefinition::mayHaveOtherDefinitions will return FALSE.
+
+In the case of dynamic dispatch where our information is not perfect, CallEvent
+can make a guess, but RuntimeDefinition::mayHaveOtherDefinitions will return
+TRUE. The RuntimeDefinition object will then also include a MemRegion
+corresponding to the object being called (i.e., the "receiver" in Objective-C
+parlance), which ExprEngine uses to decide whether or not the call should be
+inlined.
+
+Inlining Dynamic Calls
+^^^^^^^^^^^^^^^^^^^^^^ 
+
+The -analyzer-config ipa option has five different modes: none, basic-inlining,
+inlining, dynamic, and dynamic-bifurcate. Under -analyzer-config ipa=dynamic,
+all dynamic calls are inlined, whether we are certain or not that this will
+actually be the definition used at runtime. Under -analyzer-config ipa=inlining,
+only "near-perfect" devirtualized calls are inlined*, and other dynamic calls
+are evaluated conservatively (as if no definition were available). 
+
+* Currently, Objective-C messages are not inlined under
+  -analyzer-config ipa=inlining, even if we are reasonably confident of the type
+  of the receiver. We plan to enable this once we have tested our heuristics
+  more thoroughly.
+
+The last option, -analyzer-config ipa=dynamic-bifurcate, behaves similarly to
+"dynamic", but performs a conservative invalidation in the general virtual case
+in *addition* to inlining. The details of this are discussed below.
+
+As stated above, -analyzer-config ipa=basic-inlining does not inline any C++ 
+member functions or Objective-C method calls, even if they are non-virtual or 
+can be safely devirtualized.
+
+
+Bifurcation
+^^^^^^^^^^^
+
+ExprEngine::BifurcateCall implements the ``-analyzer-config ipa=dynamic-bifurcate``
+mode.
+
+When a call is made on an object with imprecise dynamic type information 
+(RuntimeDefinition::mayHaveOtherDefinitions() evaluates to TRUE), ExprEngine
+bifurcates the path and marks the object's region (retrieved from the
+RuntimeDefinition object) with a path-sensitive "mode" in the ProgramState.
+
+Currently, there are 2 modes: 
+
+* ``DynamicDispatchModeInlined`` - Models the case where the dynamic type information
+   of the receiver (MemoryRegion) is assumed to be perfectly constrained so 
+   that a given definition of a method is expected to be the code actually 
+   called. When this mode is set, ExprEngine uses the Decl from 
+   RuntimeDefinition to inline any dynamically dispatched call sent to this 
+   receiver because the function definition is considered to be fully resolved.
+
+* ``DynamicDispatchModeConservative`` - Models the case where the dynamic type
+   information is assumed to be incorrect, for example, because the method
+   definition is overridden in a subclass. In such cases, ExprEngine does not
+   inline the methods sent to the receiver (MemoryRegion), even if a candidate
+   definition is available. This mode is conservative about simulating the
+   effects of a call.
+
+Going forward along the symbolic execution path, ExprEngine consults the mode 
+of the receiver's MemRegion to make decisions on whether the calls should be 
+inlined or not, which ensures that there is at most one split per region.
+
+At a high level, "bifurcation mode" allows for increased semantic coverage in
+cases where the parent method contains code which is only executed when the
+class is subclassed. The disadvantages of this mode are a (considerable?)
+performance hit and the possibility of false positives on the path where the
+conservative mode is used.
+
+Objective-C Message Heuristics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ExprEngine relies on a set of heuristics to partition the set of Objective-C 
+method calls into those that require bifurcation and those that do not. Below 
+are the cases when the DynamicTypeInfo of the object is considered precise
+(cannot be a subclass):
+
+ - If the object was created with +alloc or +new and initialized with an -init
+   method.
+
+ - If the calls are property accesses using dot syntax. This is based on the
+   assumption that children rarely override properties, or do so in an
+   essentially compatible way.
+
+ - If the class interface is declared inside the main source file. In this case
+   it is unlikely that it will be subclassed.
+
+ - If the method is not declared outside of the main source file, either by the
+   receiver's class or by any superclasses.
+
+C++ Caveats
+^^^^^^^^^^^
+
+C++11 [class.cdtor]p4 describes how the vtable of an object is modified as it is
+being constructed or destructed; that is, the type of the object depends on
+which base constructors have been completed. This is tracked using
+DynamicTypeInfo in the DynamicTypePropagation checker.
+
+There are several limitations in the current implementation:
+
+* Temporaries are poorly modeled right now because we're not confident in the
+  placement of their destructors in the CFG. We currently won't inline their
+  constructors unless the destructor is trivial, and don't process their
+  destructors at all, not even to invalidate the region.
+
+* 'new' is poorly modeled due to some nasty CFG/design issues.  This is tracked
+  in PR12014.  'delete' is not modeled at all.
+
+* Arrays of objects are modeled very poorly right now.  ExprEngine currently
+  only simulates the first constructor and first destructor. Because of this,
+  ExprEngine does not inline any constructors or destructors for arrays.
+
+
+CallEvent
+^^^^^^^^^
+
+A CallEvent represents a specific call to a function, method, or other body of
+code. It is path-sensitive, containing both the current state (ProgramStateRef)
+and stack space (LocationContext), and provides uniform access to the argument
+values and return type of a call, no matter how the call is written in the
+source or what sort of code body is being invoked.
+
+  NOTE: For those familiar with Cocoa, CallEvent is roughly equivalent to
+        NSInvocation.
+
+CallEvent should be used whenever there is logic dealing with function calls
+that does not care how the call occurred.
+
+Examples include checking that arguments satisfy preconditions (such as
+__attribute__((nonnull))), and attempting to inline a call.
+
+CallEvents are reference-counted objects managed by a CallEventManager. While
+there is no inherent issue with persisting them (say, in a ProgramState's GDM),
+they are intended for short-lived use, and can be recreated from CFGElements or
+non-top-level StackFrameContexts fairly easily.
diff --git a/clang/docs/analyzer/developer-docs/InitializerLists.rst b/clang/docs/analyzer/developer-docs/InitializerLists.rst
new file mode 100644
index 0000000..c9dc7a0
--- /dev/null
+++ b/clang/docs/analyzer/developer-docs/InitializerLists.rst
@@ -0,0 +1,327 @@
+================
+Initializer List
+================
+This discussion took place in https://reviews.llvm.org/D35216
+"Escape symbols when creating std::initializer_list".
+
+It touches on problems of modelling C++ standard library constructs in general,
+including modelling implementation-defined fields within C++ standard library
+objects, in particular constructing objects into pointers held by such fields,
+and the separation of responsibilities between the analyzer's core and checkers.
+
+**Artem:**
+
+I've seen a few false positives that appear because we construct
+C++11 std::initializer_list objects with brace initializers, and such
+construction is not properly modeled. For instance, if a new object is
+constructed on the heap only to be put into a brace-initialized STL container,
+the object is reported to be leaked.
+
+Approach (0): This can be trivially fixed by this patch, which causes pointers
+passed into initializer list expressions to immediately escape.
+
+This fix is overly conservative though. So i did a bit of investigation as to
+how to model std::initializer_list better.
+
+According to the standard, ``std::initializer_list<T>`` is an object that has
+methods ``begin(), end(), and size()``, where ``begin()`` returns a pointer to a
+contiguous array of ``size()`` objects of type T, and end() is equal to begin() plus size().
+The standard does hint that it should be possible to implement
+``std::initializer_list<T>`` as a pair of pointers, or as a pointer and a size
+integer, however specific fields that the object would contain are an
+implementation detail.
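+
+A hedged sketch of one layout the standard hints at (the real field names and
+layout are implementation details and differ between standard libraries): a
+pointer to the first element plus an element count.
+
+.. code-block:: cpp
+
+  #include <cstddef>
+
+  template <typename T>
+  class initializer_list_sketch {
+    const T *Begin;     // what begin() would return
+    std::size_t Length; // what size() would return
+
+  public:
+    initializer_list_sketch(const T *B, std::size_t N) : Begin(B), Length(N) {}
+    const T *begin() const { return Begin; }
+    const T *end() const { return Begin + Length; }
+    std::size_t size() const { return Length; }
+  };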
+
+Ideally, we should be able to model the initializer list's methods precisely.
+Or, at least, it should be possible to explain to the analyzer that the list
+somehow "takes hold" of the values put into it. Initializer lists can also be
+copied, which is a separate story that i'm not trying to address here.
+
+The obvious approach to modeling ``std::initializer_list`` in a checker would be to
+construct a SymbolMetadata for the memory region of the initializer list object,
+which would be of type ``T*`` and represent ``begin()``, so we'd trivially model ``begin()``
+as a function that returns this symbol. The array pointed to by that symbol
+would be ``bindLoc()``ed to contain the list's contents (probably as a ``CompoundVal``
+to produce fewer bindings in the store). The extent of this array would represent
+``size()`` and would be equal to the length of the list as written.
+
+So this sounds good, however apparently it does nothing to address our false
+positives: when the list escapes, our ``RegionStoreManager`` is not magically
+guessing that the metadata symbol attached to it, together with its contents,
+should also escape. In fact, it's impossible to trigger a pointer escape from
+within the checker.
+
+Approach (1): If only we enabled ``ProgramState::bindLoc(..., notifyChanges=true)``
+to cause pointer escapes (not only region changes) (which sounds like the right
+thing to do anyway) such checker would be able to solve the false positives by
+triggering escapes when binding list elements to the list. However, it'd be as
+conservative as the current patch's solution. Ideally, we do not want escapes to
+happen so early. Instead, we'd prefer them to be delayed until the list itself
+escapes.
+
+So i believe that escaping metadata symbols whenever their base regions escape
+would be the right thing to do. Currently we didn't think about that because we
+had neither pointer-type metadatas nor non-pointer escapes.
+
+Approach (2): We could teach the Store to scan itself for bindings to
+metadata-symbolic-based regions during scanReachableSymbols() whenever a region
+turns out to be reachable. This requires no work on checker side, but it sounds
+performance-heavy.
+
+Approach (3): We could let checkers maintain the set of active metadata symbols
+in the program state (ideally somewhere in the Store, which sounds weird but
+causes the smallest amount of layering violations), so that the core knew what
+to escape. This puts a stress on the checkers, but with a smart data map it
+wouldn't be a problem.
+
+Approach (4): We could allow checkers to trigger pointer escapes in arbitrary
+moments. If we allow doing this within ``checkPointerEscape`` callback itself, we
+would be able to express facts like "when this region escapes, that metadata
+symbol attached to it should also escape". This sounds like an ultimate freedom,
+with maximum stress on the checkers - still not too much stress when we have
+smart data maps.
+
+I'm personally liking the approach (2) - it should be possible to avoid
+performance overhead, and clarity seems nice.
+
+**Gabor:**
+
+At this point, I am a bit wondering about two questions.
+
+* When should something belong to a checker and when should something belong to the engine? 
+  Sometimes we model library aspects in the engine and model language constructs in checkers.
+
+* What is the checker programming model that we are aiming for? Maximum freedom or more easy checker development?
+
+I think if we aim for maximum freedom, we do not need to worry about the
+potential stress on checkers, and we can introduce abstractions to mitigate that
+later on.
+If we want to simplify the API, then maybe it makes more sense to move language
+construct modeling to the engine when the checker API is not sufficient instead
+of complicating the API.
+
+Right now I have no preference or objections between the alternatives but there
+are some random thoughts:
+
+* Maybe it would be great to have a guideline on how to evolve the analyzer and
+  follow it, so it can help us to decide in similar situations
+
+* I do care about performance in this case. The reason is that we have a
+  limited performance budget. And I think we should not expect most of the checker
+  writers to add modeling of language constructs. So, in my opinion, it is ok to
+  have less nice/more verbose API for language modeling if we can have better
+  performance this way, since it only needs to be done once, and is done by the
+  framework developers.
+
+**Artem:** These are some great questions, i guess it'd be better to discuss
+them more openly. As a quick dump of my current mood:
+
+* To me it seems obvious that we need to aim for a checker API that is both
+  simple and powerful. This can probably be achieved by keeping the API as
+  powerful as necessary while providing a layer of simple ready-made solutions
+  on top of it.
+  Probably a few reusable components for assembling checkers. And this layer
+  should ideally be pleasant enough to work with, so that people would prefer to
+  extend it when something is lacking, instead of falling back to the complex
+  omnipotent API. I'm thinking of AST matchers vs. AST visitors as a roughly
+  similar situation: matchers are not omnipotent, but they're so nice.
+
+* Separation between core and checkers is usually quite strange. Once we have
+  shared state traits, i generally wouldn't mind having region store or range
+  constraint manager as checkers (though it's probably not worth it to transform
+  them - just a mood). The main thing to avoid here would be the situation when
+  the checker overwrites stuff written by the core because it thinks it has a
+  better idea what's going on, so the core should provide a good default behavior.
+
+* Yeah, i totally care about performance as well, and if i try to implement
+  this approach, i'd make sure it's good.
+
+**Artem:**
+
+> Approach (2): We could teach the Store to scan itself for bindings to
+> metadata-symbolic-based regions during scanReachableSymbols() whenever
+> a region turns out to be reachable. This requires no work on checker side,
+> but it sounds performance-heavy.
+
+Nope, this approach is wrong. Metadata symbols may become out-of-date: when the
+object changes, metadata symbols attached to it aren't changing (because symbols
+simply don't change). The same metadata may have different symbols to denote its
+value in different moments of time, but at most one of them represents the
+actual metadata value. So we'd be escaping more stuff than necessary.
+
+If only we had "ghost fields"
+(https://lists.llvm.org/pipermail/cfe-dev/2016-May/049000.html), it would have
+been much easier, because the ghost field would only contain the actual
+metadata, and the Store would always know about it. This example adds to my
+belief that ghost fields are exactly what we need for most C++ checkers.
+
+**Devin:**
+
+In this case, I would be fine with some sort of
+AbstractStorageMemoryRegion that meant "here is a memory region and somewhere
+reachable from here exists another region of type T". Or even multiple regions
+with different identifiers. This wouldn't specify how the memory is reachable,
+but it would allow for transfer functions to get at those regions and it would
+allow for invalidation.
+
+For ``std::initializer_list`` this reachable region would be the region for the
+backing array, and the transfer functions for begin() and end() would yield the
+beginning and end element regions for it.
+
+In my view this differs from ghost variables in that (1) this storage does
+actually exist (it is just a library implementation detail where that storage
+lives) and (2) it is perfectly valid for a pointer into that storage to be
+returned and for another part of the program to read or write from that storage.
+(Well, in this case just read since it is allowed to be read-only memory).
+
+What I'm not OK with is modeling abstract analysis state (for example, the count
+of a NSMutableArray or the typestate of a file handle) as a value stored in some
+ginned up region in the store. This takes an easy problem that the analyzer does
+well at (modeling typestate) and turns it into a hard one that the analyzer is
+bad at (reasoning about the contents of the heap).
+
+I think the key criterion here is: "is the region accessible from outside the
+library". That is, does the library expose the region as a pointer that can be
+read to or written from in the client program? If so, then it makes sense for
+this to be in the store: we are modeling reachable storage as storage. But if
+we're just modeling arbitrary analysis facts that need to be invalidated when a
+pointer escapes then we shouldn't try to gin up storage for them just to get
+invalidation for free.
+
+**Artem:**
+
+> In this case, I would be fine with some sort of ``AbstractStorageMemoryRegion``
+> that meant "here is a memory region and somewhere reachable from here exists
+> another region of type T". Or even multiple regions with different
+> identifiers. This wouldn't specify how the memory is reachable, but it would
+> allow for transfer functions to get at those regions and it would allow for
+> invalidation.
+
+Yeah, this is what we can easily implement now as a
+symbolic-region-based-on-a-metadata-symbol (though we can make a new region
+class for that if we eg. want it typed). The problem is that the relation
+between such storage region and its parent object region is essentially
+immaterial, similarly to the relation between ``SymbolRegionValue`` and its parent
+region. Region contents are mutable: today the abstract storage is reachable
+from its parent object, tomorrow it's not, and maybe something else becomes
+reachable, something that isn't even abstract. So the parent region for the
+abstract storage is most of the time at best a "nice to know" thing - we cannot
+rely on it to do any actual work. We'd anyway need to rely on the checker to do
+the job.
+
+> For std::initializer_list this reachable region would be the region for the
+> backing array and the transfer functions for begin() and end() would yield
+> the beginning and end element regions for it.
+
+So maybe in fact for std::initializer_list it may work fine because you cannot
+change the data after the object is constructed - so this region's contents are
+essentially immutable. For the future, i feel as if it is a dead end.
+
+I'd like to consider another funny example. Suppose we're trying to model
+``std::unique_ptr``. Consider:
+
+.. code-block:: cpp
+
+   void bar(const std::unique_ptr<int> &x);
+ 
+   void foo(std::unique_ptr<int> &x) {
+     int *a = x.get();   // (a, 0, direct): &AbstractStorageRegion
+     *a = 1;             // (AbstractStorageRegion, 0, direct): 1 S32b
+     int *b = new int;
+     *b = 2;             // (SymRegion{conj_$0<int *>}, 0 ,direct): 2 S32b
+     x.reset(b);         // Checker map: x -> SymRegion{conj_$0<int *>}
+     bar(x);             // 'a' doesn't escape (the pointer was unique), 'b' does.
+     clang_analyzer_eval(*a == 1); // Making this true is up to the checker.
+     clang_analyzer_eval(*b == 2); // Making this unknown is up to the checker.
+   }
+ 
+The checker doesn't totally need to ensure that ``*a == 1`` passes - even though the
+pointer was unique, it could theoretically have ``.get()``-ed above and the code
+could of course break the uniqueness invariant (though we'd probably want it).
+The checker can say that "even if ``*a`` did escape, it was not because it was
+stuffed directly into bar()".
+
+The checker's direct responsibility, however, is to solve the ``*b == 2`` thing
+(which is in fact the problem we're dealing with in this patch - escaping the
+storage region of the object).
+
+So we're talking about one more operation over the program state (scanning
+reachable symbols and regions) that cannot work without checker support.
+
+We can probably add a new callback "checkReachableSymbols" to solve this. This
+is in fact also related to the dead symbols problem (we're scanning for live
+symbols in the store and in the checkers separately, but we need to do so
+simultaneously with a single worklist). Hmm, in fact this sounds like a good
+idea; we can replace checkLiveSymbols with checkReachableSymbols.
+
+Or we could just have ghost member variables, and no checker support required at
+all. For ghost member variables, the relation with their parent region (which
+would be their superregion) is actually useful, the mutability of their contents
+is expressed naturally, and the store automagically sees reachable symbols, live
+symbols, escapes, invalidations, whatever.
+
+> In my view this differs from ghost variables in that (1) this storage does
+> actually exist (it is just a library implementation detail where that storage
+> lives) and (2) it is perfectly valid for a pointer into that storage to be
+> returned and for another part of the program to read or write from that
+> storage. (Well, in this case just read since it is allowed to be read-only
+> memory).
+
+> What I'm not OK with is modeling abstract analysis state (for example, the
+> count of a NSMutableArray or the typestate of a file handle) as a value stored
+> in some ginned up region in the store. This takes an easy problem that the
+> analyzer does well at (modeling typestate) and turns it into a hard one that
+> the analyzer is bad at (reasoning about the contents of the heap).
+
+Yeah, i tend to agree on that. For simple typestates, this is probably an
+overkill, so let's definitely put aside the idea of "ghost symbolic regions"
+that i had earlier.
+
+But, to summarize a bit, in our current case, however, the typestate we're
+looking for is the contents of the heap. And when we try to model such
+typestates (complex in this specific manner, i.e. heap-like) in any checker, we
+have a choice between re-doing this modeling in every such checker (which is
+something analyzer is indeed good at, but at a price of making checkers heavy)
+or instead relying on the Store to do exactly what it's designed to do.
+
+> I think the key criterion here is: "is the region accessible from outside
+> the library". That is, does the library expose the region as a pointer that
+> can be read to or written from in the client program? If so, then it makes
+> sense for this to be in the store: we are modeling reachable storage as
+> storage. But if we're just modeling arbitrary analysis facts that need to be
+> invalidated when a pointer escapes then we shouldn't try to gin up storage
+> for them just to get invalidation for free.
+
+As a metaphor, i'd probably compare it to body farms - the difference between
+ghost member variables and metadata symbols seems to me like the difference
+between body farms and evalCall. Both are nice to have, and body farms are very
+pleasant to work with, even if not omnipotent. I think it's fine for a
+FunctionDecl's body in a body farm to have a local variable, even if such
+variable doesn't actually exist, even if it cannot be seen from outside the
+function call. I'm not seeing immediate practical difference between "it does
+actually exist" and "it doesn't actually exist, just a handy abstraction".
+Similarly, i think it's fine if we have a ``CXXRecordDecl`` with
+implementation-defined contents, and try to farm up a member variable as a handy
+abstraction (we don't even need to know its name or offset, only that it's there
+somewhere).
+
+**Artem:**
+
+We've discussed it in person with Devin, and he provided more points to think
+about:
+
+* If the initializer list consists of non-POD data, constructors of list's
+  objects need to take the sub-region of the list's region as this-region. In the
+  current (v2) version of this patch, these objects are constructed elsewhere and
+  then trivially copied into the list's metadata pointer region, which may be
+  incorrect. This is our overall problem with C++ constructors, which manifests in
+  this case as well. Additionally, objects would need to be constructed in the
+  analyzer's core, which would not be able to predict that it needs to take a
+  checker-specific region as this-region, which makes it harder, though it might
+  be mitigated by sharing the checker state traits.
+
+* Because "ghost variables" are not material to the user, we need to somehow
+  make super sure that they don't make it into the diagnostic messages.
+
+So, because this needs further digging into overall C++ support and raises too
+many questions, i'm delaying a better approach to this problem and will fall
+back to the original trivial patch.
diff --git a/clang/docs/analyzer/developer-docs/RegionStore.rst b/clang/docs/analyzer/developer-docs/RegionStore.rst
new file mode 100644
index 0000000..c963e5b
--- /dev/null
+++ b/clang/docs/analyzer/developer-docs/RegionStore.rst
@@ -0,0 +1,183 @@
+============
+Region Store
+============
+The analyzer "Store" represents the contents of memory regions. It is an opaque
+functional data structure stored in each ``ProgramState``; the only class that
+can modify the store is its associated StoreManager.
+
+Currently (Feb. 2013), the only StoreManager implementation being used is
+``RegionStoreManager``. This store records bindings to memory regions using a
+"base region + offset" key. (This allows ``*p`` and ``p[0]`` to map to the same
+location, among other benefits.)
+
+Regions are grouped into "clusters", which roughly correspond to "regions with
+the same base region". This allows certain operations to be more efficient,
+such as invalidation.
+
+Regions that do not have a known offset use a special "symbolic" offset. These
+keys store both the original region, and the "concrete offset region" -- the
+last region whose offset is entirely concrete. (For example, in the expression
+``foo.bar[1][i].baz``, the concrete offset region is the array ``foo.bar[1]``,
+since that has a known offset from the start of the top-level ``foo`` struct.)
+
+
+Binding Invalidation
+--------------------
+
+Supporting both concrete and symbolic offsets makes things a bit tricky. Here's
+an example:
+
+.. code-block:: cpp
+
+  foo[0] = 0;
+  foo[1] = 1;
+  foo[i] = i;
+
+After the third assignment, nothing can be said about the value of ``foo[0]``,
+because ``foo[i]`` may have overwritten it! Thus, *binding to a region with a
+symbolic offset invalidates the entire concrete offset region.* We know
+``foo[i]`` is somewhere within ``foo``, so we don't have to invalidate
+anything else, but we do have to be conservative about all other bindings within
+``foo``.
+
+Continuing the example:
+
+.. code-block:: cpp
+
+  foo[i] = i;
+  foo[0] = 0;
+
+After this latest assignment, nothing can be said about the value of ``foo[i]``,
+because ``foo[0]`` may have overwritten it! *Binding to a region R with a
+concrete offset invalidates any symbolic offset bindings whose concrete offset
+region is a super-region **or** sub-region of R.* All we know about ``foo[i]``
+is that it is somewhere within ``foo``, so changing *anything* within ``foo``
+might change ``foo[i]``, and changing *all* of ``foo`` (or its base region) will
+*definitely* change ``foo[i]``.
+
+This logic could be improved by using the current constraints on ``i``, at the
+cost of speed. The latter case could also be improved by matching region kinds,
+i.e. changing ``foo[0].a`` is unlikely to affect ``foo[i].b``, no matter what
+``i`` is.
+
+For more detail, read through ``RegionStoreManager::removeSubRegionBindings`` in
+RegionStore.cpp.
+
+
+ObjCIvarRegions
+---------------
+
+Objective-C instance variables require a bit of special handling. Like struct
+fields, they are not base regions, and when their parent object region is
+invalidated, all the instance variables must be invalidated as well. However,
+they have no concrete compile-time offsets (in the modern, "non-fragile"
+runtime), and so cannot easily be represented as an offset from the start of
+the object in the analyzer. Moreover, this means that invalidating a single
+instance variable should *not* invalidate the rest of the object, since unlike
+struct fields or array elements there is no way to perform pointer arithmetic
+to access another instance variable.
+
+Consequently, although the base region of an ObjCIvarRegion is the entire
+object, RegionStore offsets are computed from the start of the instance
+variable. Thus it is not valid to assume that all bindings with non-symbolic
+offsets start from the base region!
+
+
+Region Invalidation
+-------------------
+
+Unlike binding invalidation, region invalidation occurs when the entire
+contents of a region may have changed---say, because it has been passed to a
+function the analyzer can model, like memcpy, or because its address has
+escaped, usually as an argument to an opaque function call. In these cases we
+need to throw away not just all bindings within the region itself, but within
+its entire cluster, since neighboring regions may be accessed via pointer
+arithmetic.
+
+Region invalidation typically does even more than this, however. Because it
+usually represents the complete escape of a region from the analyzer's model,
+its *contents* must also be transitively invalidated. (For example, if a region
+``p`` of type ``int **`` is invalidated, the contents of ``*p`` and ``**p`` may
+have changed as well.) The algorithm that traverses this transitive closure of
+accessible regions is known as ClusterAnalysis, and is also used for finding
+all live bindings in the store (in order to throw away the dead ones). The name
+"ClusterAnalysis" predates the cluster-based organization of bindings, but
+refers to the same concept: during invalidation and liveness analysis, all
+bindings within a cluster must be treated in the same way for a conservative
+model of program behavior.
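+
+A hypothetical illustration (``opaque`` has no visible definition, so passing a
+pointer to it lets the pointed-to region escape; the expected ExprInspection
+results are shown as comments):
+
+.. code-block:: cpp
+
+  void clang_analyzer_eval(int);
+  void opaque(int **p); // body not visible to the analyzer
+
+  void test() {
+    int x = 1;
+    int *px = &x;
+    clang_analyzer_eval(x == 1); // TRUE
+    opaque(&px);
+    // The region of 'px' has escaped, and everything transitively reachable
+    // from it ('x', through the stored pointer) is invalidated along with its
+    // cluster, so nothing is known about 'x' anymore.
+    clang_analyzer_eval(x == 1); // UNKNOWN
+  }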
+
+
+Default Bindings
+----------------
+
+Most bindings in RegionStore are simple scalar values -- integers and pointers.
+These are known as "Direct" bindings. However, RegionStore supports a second
+type of binding called a "Default" binding. These are used to provide values to
+all the elements of an aggregate type (struct or array) without having to
+explicitly specify a binding for each individual element.
+
+When there is no Direct binding for a particular region, the store manager
+looks at each super-region in turn to see if there is a Default binding. If so,
+this value is used as the value of the original region. The search ends when
+the base region is reached, at which point the RegionStore will pick an
+appropriate default value for the region (usually a symbolic value, but
+sometimes zero, for static data, or "uninitialized", for stack variables).
+
+.. code-block:: cpp
+
+  int manyInts[10];
+  manyInts[1] = 42;   // Creates a Direct binding for manyInts[1].
+  print(manyInts[1]); // Retrieves the Direct binding for manyInts[1];
+  print(manyInts[0]); // There is no Direct binding for manyInts[0].
+                      // Is there a Default binding for the entire array?
+                      // There is not, but it is a stack variable, so we use
+                      // "uninitialized" as the default value (and emit a
+                      // diagnostic!).
+
+NOTE: The fact that bindings are stored as a base region plus an offset limits
+the Default Binding strategy, because in C aggregates can contain other
+aggregates. In the current implementation of RegionStore, there is no way to
+distinguish a Default binding for an entire aggregate from a Default binding
+for the sub-aggregate at offset 0.
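+
+A hypothetical illustration of that limitation:
+
+.. code-block:: cpp
+
+  struct Inner { int a, b; };
+  struct Outer { Inner first; int c; }; // 'first' lives at offset 0
+
+  void copyFirst(Outer &dst, const Outer &src) {
+    // A Default binding created for 'dst.first' is keyed as (base region of
+    // 'dst', offset 0), which cannot be distinguished from a Default binding
+    // that covers all of 'dst'.
+    dst.first = src.first;
+  }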
+
+
+Lazy Bindings (LazyCompoundVal)
+-------------------------------
+
+RegionStore implements an optimization for copying aggregates (structs and
+arrays) called "lazy bindings", implemented using a special SVal called
+LazyCompoundVal. When the store is asked for the "binding" for an entire
+aggregate (i.e. for an lvalue-to-rvalue conversion), it returns a
+LazyCompoundVal instead. When this value is then stored into a variable, it is
+bound as a Default value. This makes copying arrays and structs much cheaper
+than if they had required memberwise access.
+
+Under the hood, a LazyCompoundVal is implemented as a uniqued pair of (region,
+store), representing "the value of the region during this 'snapshot' of the
+store". This has important implications for any sort of liveness or
+reachability analysis, which must take the bindings in the old store into
+account.
+
+Retrieving a value from a lazy binding happens in the same way as any other
+Default binding: since there is no direct binding, the store manager falls back
+to super-regions to look for an appropriate default binding. LazyCompoundVal
+differs from a normal default binding, however, in that it contains several
+different values, instead of one value that will appear several times. Because
+of this, the store manager has to reconstruct the subregion chain on top of the
+LazyCompoundVal region, and look up *that* region in the previous store.
+
+Here's a concrete example:
+
+.. code-block:: cpp
+
+  CGPoint p;
+  p.x = 42;       // A Direct binding is made to the FieldRegion 'p.x'.
+  CGPoint p2 = p; // A LazyCompoundVal is created for 'p', along with a
+                  // snapshot of the current store state. This value is then
+                  // used as a Default binding for the VarRegion 'p2'.
+  return p2.x;    // The binding for FieldRegion 'p2.x' is requested.
+                  // There is no Direct binding, so we look for a Default
+                  // binding to 'p2' and find the LCV.
+                  // Because it's a LCV, we look at our requested region
+                  // and see that it's the '.x' field. We ask for the value
+                  // of 'p.x' within the snapshot, and get back 42.
diff --git a/clang/docs/analyzer/developer-docs/nullability.rst b/clang/docs/analyzer/developer-docs/nullability.rst
new file mode 100644
index 0000000..be6f473
--- /dev/null
+++ b/clang/docs/analyzer/developer-docs/nullability.rst
@@ -0,0 +1,107 @@
+==================
+Nullability Checks
+==================
+
+This document is a high level description of the nullability checks.
+These checks are intended to use the annotations described in this
+RFC: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-March/041798.html.
+
+Let's consider the following 2 categories:
+
+**1) nullable**
+
+If a pointer ``p`` has a nullable annotation and no explicit null check or assert, we should warn in the following cases:
+
+* ``p`` gets implicitly converted into a nonnull pointer, for example, when it is passed to a function that takes a nonnull parameter.
+* ``p`` gets dereferenced
+
+Taking a branch on a nullable pointer is handled the same way as taking a branch on a null unspecified pointer.
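+
+For illustration, here is a minimal sketch of the two warning cases above (the
+function name and declarations are hypothetical, using the ``__nullable`` /
+``__nonnull`` spellings used elsewhere in this document):
+
+.. code-block:: cpp
+
+  void takesNonnullPtr(int * __nonnull q);  // hypothetical declaration
+
+  void example(int * __nullable p) {
+    takesNonnullPtr(p); // 'p' is implicitly converted to a nonnull parameter
+                        // without a null check; the checker should warn here.
+    int x = *p;         // 'p' is dereferenced while it may still be null;
+                        // the checker should warn here as well.
+  }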
+
+Explicit cast from nullable to nonnull:
+
+.. code-block:: cpp
+
+  __nullable id foo;
+  id bar = foo;
+  takesNonNull((__nonnull) bar); // should not warn here (backward compatibility hack)
+  anotherTakesNonNull(bar); // would be great to warn here, but not necessary(*)
+
+Because ``bar`` corresponds to the same symbol all the time, it is not easy to implement the checker in a way where the cast suppresses only the first call but not the second. For this reason, in the first implementation, after a contradictory cast happens I will treat ``bar`` as nullable unspecified; this way all of the warnings will be suppressed. Treating the symbol as nullable unspecified also has the advantage that, in case the ``takesNonNull`` function body is inlined, there will be no warning when the symbol is dereferenced. If I have time after the initial version, I might spend additional time trying to find a more sophisticated solution, in which we would produce the second warning (*).
+ 
+**2) nonnull**
+
+* Dereferencing a nonnull pointer, or sending a message to it, is ok.
+* Converting nonnull to nullable is ok.
+* When there is an explicit cast from nonnull to nullable I will trust the cast (it is probably there for a reason, because this cast does not suppress any warnings or errors).
+* But what should we do about null checks?
+
+.. code-block:: cpp
+
+  __nonnull id takesNonnull(__nonnull id x) {
+      if (x == nil) {
+          // Defensive backward compatible code:
+          ....
+          return nil; // Should the analyzer cover this piece of code? Should we require the cast (__nonnull)nil?
+      }
+      ....
+  }
+
+There are several possible directions:
+
+* We can either take the branch; this way the branch is analyzed.
+* Should we warn about nullability issues in that branch? Probably not; it is ok to break the nullability postconditions when the nullability preconditions are violated.
+* Or we can assume that these pointers are not null, in which case we lose coverage with the analyzer. (This can be implemented either in the constraint solver or in the checker itself.)
+
+Other Issues to keep in mind/take care of:
+
+* Messaging:
+
+  * Sending a message to a nullable pointer
+
+    * Even though the method might return a nonnull pointer, when the message is sent to a nullable pointer the return value will be nullable.
+    * The result is nullable unless the receiver is known to be non-null.
+
+  * Sending a message to an unspecified or nonnull pointer
+
+    * If the pointer is not assumed to be nil, we should be optimistic and use the nullability implied by the method.
+
+      * This will not happen automatically, since the AST will have null unspecified in this case.
+
+Inlining
+--------
+
+A symbol may need to be treated differently inside an inlined body. For example, consider these conversions from nonnull to nullable in the presence of inlining:
+
+.. code-block:: cpp
+
+  id obj = getNonnull();
+  takesNullable(obj);
+  takesNonnull(obj);
+  
+  void takesNullable(__nullable id obj) {
+     obj->ivar; // we should assume obj is nullable and warn here
+  }
+           
+With no special treatment, when ``takesNullable`` is inlined the analyzer will not warn when the ``obj`` symbol is dereferenced. One solution is to reanalyze ``takesNullable`` as a top level function to catch the possible violations. The alternative, deducing nullability information from the arguments after inlining, is not robust enough (for example, there might be more parameters with different nullability, but in the given path two parameters might end up being the same symbol, or there can be nested functions that take a different view of the nullability of the same symbol). So the symbol will remain nonnull to avoid false positives, but the functions that take nullable parameters will also be analyzed separately, without inlining.
+
+Annotations on multi level pointers
+-----------------------------------
+
+Tracking multiple levels of annotations for pointers pointing to pointers would make the checker more complicated, because a vector of nullability qualifiers would need to be tracked for each symbol. This is not a big caveat, since once the top level pointer is dereferenced, the symbol for the inner pointer will have the nullability information. The lack of multi level annotation tracking is only observable when multiple levels of pointers are passed to a function which has a parameter with multiple levels of annotations. So for now the checker supports the top level nullability qualifiers only:
+
+.. code-block:: cpp
+
+  int * __nonnull * __nullable p;
+  int ** q = p;
+  takesStarNullableStarNullable(q);
+
+Implementation notes
+--------------------
+
+What to track?
+
+* The checker would track memory regions, attaching to each relevant region a nullability qualifier which is either nullable, nonnull, or null unspecified (or contradicted, to suppress warnings for a specific region). A sketch of this state is shown after this list.
+* On a branch where a nullable pointer is known to be non-null, the checker treats it the same way as a pointer annotated as nonnull.
+* When there is an explicit cast from a null unspecified to either nonnull or nullable I will trust the cast.
+* Unannotated pointers are treated the same way as pointers annotated with the nullability unspecified qualifier, unless the region is wrapped in ASSUME_NONNULL macros.
+* We might want to implement a callback for entry points to top level functions, where the pointer nullability assumptions would be made.
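+
+One possible shape for this tracked state, sketched below with illustrative
+names (a sketch under the assumptions above, not necessarily the final checker
+code):
+
+.. code-block:: cpp
+
+  #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+
+  using namespace clang;
+  using namespace ento;
+
+  namespace {
+  enum class Nullability : char { Contradicted, Nullable, Unspecified, Nonnull };
+
+  // Small wrapper so the value type can be stored in the immutable
+  // program-state map (it needs Profile() and operator==).
+  class NullabilityState {
+    Nullability Nullab;
+
+  public:
+    NullabilityState(Nullability Nullab) : Nullab(Nullab) {}
+    Nullability getValue() const { return Nullab; }
+    bool operator==(const NullabilityState &Other) const {
+      return Nullab == Other.Nullab;
+    }
+    void Profile(llvm::FoldingSetNodeID &ID) const {
+      ID.AddInteger(static_cast<char>(Nullab));
+    }
+  };
+  } // end anonymous namespace
+
+  // Map each tracked region to its current nullability qualifier.
+  REGISTER_MAP_WITH_PROGRAMSTATE(NullabilityMap, const MemRegion *,
+                                 NullabilityState)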