Update Clang for rebase to r212749.

This also fixes a small issue with arm_neon.h not being generated always.

Includes a cherry-pick of:
r213450 - fixes mac-specific header issue
r213126 - removes a default -Bsymbolic on Android

Change-Id: I2a790a0f5d3b2aab11de596fc3a74e7cbc99081d
diff --git a/docs/AttributeReference.rst b/docs/AttributeReference.rst
index 34a812d..c12f6d8 100644
--- a/docs/AttributeReference.rst
+++ b/docs/AttributeReference.rst
@@ -219,14 +219,14 @@
 The ``carries_dependency`` attribute specifies dependency propagation into and
 out of functions.
 
-When specified on a function or Objective-C method, the ``carries_depedency``
+When specified on a function or Objective-C method, the ``carries_dependency``
 attribute means that the return value carries a dependency out of the function, 
 so that the implementation need not constrain ordering upon return from that
 function. Implementations of the function and its caller may choose to preserve
 dependencies instead of emitting memory ordering instructions such as fences.
 
 Note, this attribute does not change the meaning of the program, but may result
-in generatation of more efficient code.
+in generation of more efficient code.
 
 
 enable_if
@@ -312,6 +312,18 @@
 Query for this feature with ``__has_attribute(enable_if)``.
 
 
+flatten (gnu::flatten)
+----------------------
+.. csv-table:: Supported Syntaxes
+   :header: "GNU", "C++11", "__declspec", "Keyword"
+
+   "X","X","",""
+
+The ``flatten`` attribute causes calls within the attributed function to
+be inlined unless it is impossible to do so, for example if the body of the
+callee is unavailable or if the callee has the ``noinline`` attribute.
+
+
 format (gnu::format)
 --------------------
 .. csv-table:: Supported Syntaxes
@@ -467,6 +479,18 @@
 tool to avoid false positives and provide meaningful stack traces.
 
 
+no_split_stack (gnu::no_split_stack)
+------------------------------------
+.. csv-table:: Supported Syntaxes
+   :header: "GNU", "C++11", "__declspec", "Keyword"
+
+   "X","X","",""
+
+The ``no_split_stack`` attribute disables the emission of the split stack
+preamble for a particular function. It has no effect if ``-fsplit-stack``
+is not specified.
+
+
 objc_method_family
 ------------------
 .. csv-table:: Supported Syntaxes
@@ -545,6 +569,24 @@
                       ^
 
 
+optnone (clang::optnone)
+------------------------
+.. csv-table:: Supported Syntaxes
+   :header: "GNU", "C++11", "__declspec", "Keyword"
+
+   "X","X","",""
+
+The ``optnone`` attribute suppresses essentially all optimizations
+on a function or method, regardless of the optimization level applied to
+the compilation unit as a whole.  This is particularly useful when you
+need to debug a particular function, but it is infeasible to build the
+entire application without optimization.  Avoiding optimization on the
+specified function can improve the quality of the debugging information
+for that function.
+
+This attribute is incompatible with the ``always_inline`` attribute.
+
+
 overloadable
 ------------
 .. csv-table:: Supported Syntaxes
@@ -632,6 +674,18 @@
 Query for this feature with ``__has_extension(attribute_overloadable)``.
 
 
+pcs (gnu::pcs)
+--------------
+.. csv-table:: Supported Syntaxes
+   :header: "GNU", "C++11", "__declspec", "Keyword"
+
+   "X","X","",""
+
+On ARM targets, this can attribute can be used to select calling conventions,
+similar to ``stdcall`` on x86. Valid parameter values are "aapcs" and
+"aapcs-vfp".
+
+
 release_capability (release_shared_capability, clang::release_capability, clang::release_shared_capability)
 -----------------------------------------------------------------------------------------------------------
 .. csv-table:: Supported Syntaxes
@@ -659,6 +713,17 @@
 ===================
 
 
+section (gnu::section, __declspec(allocate))
+--------------------------------------------
+.. csv-table:: Supported Syntaxes
+   :header: "GNU", "C++11", "__declspec", "Keyword"
+
+   "X","X","X",""
+
+The ``section`` attribute allows you to specify a specific section a
+global variable or function should be in after translation.
+
+
 tls_model (gnu::tls_model)
 --------------------------
 .. csv-table:: Supported Syntaxes
@@ -677,6 +742,26 @@
 TLS models are mutually exclusive.
 
 
+thread
+------
+.. csv-table:: Supported Syntaxes
+   :header: "GNU", "C++11", "__declspec", "Keyword"
+
+   "","","X",""
+
+The ``__declspec(thread)`` attribute declares a variable with thread local
+storage.  It is available under the ``-fms-extensions`` flag for MSVC
+compatibility.  Documentation for the Visual C++ attribute is available on MSDN_.
+
+.. _MSDN: http://msdn.microsoft.com/en-us/library/9w1sdazb.aspx
+
+In Clang, ``__declspec(thread)`` is generally equivalent in functionality to the
+GNU ``__thread`` keyword.  The variable must not have a destructor and must have
+a constant initializer, if any.  The attribute only applies to variables
+declared with static storage duration, such as globals, class static data
+members, and static locals.
+
+
 Type Attributes
 ===============
 
diff --git a/docs/ClangFormatStyleOptions.rst b/docs/ClangFormatStyleOptions.rst
index 0483bd7..910192b 100644
--- a/docs/ClangFormatStyleOptions.rst
+++ b/docs/ClangFormatStyleOptions.rst
@@ -211,9 +211,12 @@
   the parentheses of a function call with that name. If there is no name,
   a zero-length name is assumed.
 
-**DerivePointerBinding** (``bool``)
-  If ``true``, analyze the formatted file for the most common binding
-  and use ``PointerBindsToType`` only as fallback.
+**DerivePointerAlignment** (``bool``)
+  If ``true``, analyze the formatted file for the most common
+  alignment of & and *. ``PointerAlignment`` is then used only as fallback.
+
+**DisableFormat** (``bool``)
+  Disables formatting at all.
 
 **ExperimentalAutoDetectBinPacking** (``bool``)
   If ``true``, clang-format detects whether function calls and
@@ -245,13 +248,13 @@
   When ``false``, use the same indentation level as for the switch statement.
   Switch statement body is always indented one level more than case labels.
 
-**IndentFunctionDeclarationAfterType** (``bool``)
-  If ``true``, indent when breaking function declarations which
-  are not also definitions after the type.
-
 **IndentWidth** (``unsigned``)
   The number of columns to use for indentation.
 
+**IndentWrappedFunctionNames** (``bool``)
+  Indent if a function definition or declaration is wrapped after the
+  type.
+
 **KeepEmptyLinesAtTheStartOfBlocks** (``bool``)
   If true, empty lines at the start of blocks are kept.
 
@@ -314,8 +317,18 @@
   Penalty for putting the return type of a function onto its own
   line.
 
-**PointerBindsToType** (``bool``)
-  Set whether & and * bind to the type as opposed to the variable.
+**PointerAlignment** (``PointerAlignmentStyle``)
+  Pointer and reference alignment style.
+
+  Possible values:
+
+  * ``PAS_Left`` (in configuration: ``Left``)
+    Align pointer to the left.
+  * ``PAS_Right`` (in configuration: ``Right``)
+    Align pointer to the right.
+  * ``PAS_Middle`` (in configuration: ``Middle``)
+    Align pointer in the middle.
+
 
 **SpaceBeforeAssignmentOperators** (``bool``)
   If ``false``, spaces will be removed before assignment operators.
diff --git a/docs/InternalsManual.rst b/docs/InternalsManual.rst
index 5830a7e..8e047db 100644
--- a/docs/InternalsManual.rst
+++ b/docs/InternalsManual.rst
@@ -1852,13 +1852,14 @@
    * Make sure that ``children()`` visits all of the subexpressions.  This is
      important for a number of features (e.g., IDE support, C++ variadic
      templates).  If you have sub-types, you'll also need to visit those
-     sub-types in the ``RecursiveASTVisitor``.
-   * Add printing support (``StmtPrinter.cpp``) and dumping support
-     (``StmtDumper.cpp``) for your expression.
+     sub-types in ``RecursiveASTVisitor`` and ``DataRecursiveASTVisitor``.
+   * Add printing support (``StmtPrinter.cpp``) for your expression.
    * Add profiling support (``StmtProfile.cpp``) for your AST node, noting the
      distinguishing (non-source location) characteristics of an instance of
      your expression.  Omitting this step will lead to hard-to-diagnose
      failures regarding matching of template declarations.
+   * Add serialization support (``ASTReaderStmt.cpp``, ``ASTWriterStmt.cpp``)
+     for your AST node.
 
 #. Teach semantic analysis to build your AST node.  At this point, you can wire
    up your ``Sema::BuildXXX`` function to actually create your AST.  A few
diff --git a/docs/LanguageExtensions.rst b/docs/LanguageExtensions.rst
index 5e957cb..ad7821a 100644
--- a/docs/LanguageExtensions.rst
+++ b/docs/LanguageExtensions.rst
@@ -1446,6 +1446,24 @@
     return __builtin_addressof(value);
   }
 
+``__builtin_operator_new`` and ``__builtin_operator_delete``
+------------------------------------------------------------
+
+``__builtin_operator_new`` allocates memory just like a non-placement non-class
+*new-expression*. This is exactly like directly calling the normal
+non-placement ``::operator new``, except that it allows certain optimizations
+that the C++ standard does not permit for a direct function call to
+``::operator new`` (in particular, removing ``new`` / ``delete`` pairs and
+merging allocations).
+
+Likewise, ``__builtin_operator_delete`` deallocates memory just like a
+non-class *delete-expression*, and is exactly like directly calling the normal
+``::operator delete``, except that it permits optimizations. Only the unsized
+form of ``__builtin_operator_delete`` is currently available.
+
+These builtins are intended for use in the implementation of ``std::allocator``
+and other similar allocation libraries, and are only available in C++.
+
 Multiprecision Arithmetic Builtins
 ----------------------------------
 
@@ -1562,7 +1580,9 @@
 .. code-block:: c
 
   T __builtin_arm_ldrex(const volatile T *addr);
+  T __builtin_arm_ldaex(const volatile T *addr);
   int __builtin_arm_strex(T val, volatile T *addr);
+  int __builtin_arm_stlex(T val, volatile T *addr);
   void __builtin_arm_clrex(void);
 
 The types ``T`` currently supported are:
@@ -1571,11 +1591,11 @@
 * Pointer types.
 
 Note that the compiler does not guarantee it will not insert stores which clear
-the exclusive monitor in between an ``ldrex`` and its paired ``strex``. In
-practice this is only usually a risk when the extra store is on the same cache
-line as the variable being modified and Clang will only insert stack stores on
-its own, so it is best not to use these operations on variables with automatic
-storage duration.
+the exclusive monitor in between an ``ldrex`` type operation and its paired
+``strex``. In practice this is only usually a risk when the extra store is on
+the same cache line as the variable being modified and Clang will only insert
+stack stores on its own, so it is best not to use these operations on variables
+with automatic storage duration.
 
 Also, loads and stores may be implicit in code written between the ``ldrex`` and
 ``strex``. Clang will not necessarily mitigate the effects of these either, so
@@ -1667,3 +1687,190 @@
 
 Use ``__has_feature(memory_sanitizer)`` to check if the code is being built
 with :doc:`MemorySanitizer`.
+
+
+Extensions for selectively disabling optimization
+=================================================
+
+Clang provides a mechanism for selectively disabling optimizations in functions
+and methods.
+
+To disable optimizations in a single function definition, the GNU-style or C++11
+non-standard attribute ``optnone`` can be used.
+
+.. code-block:: c++
+
+  // The following functions will not be optimized.
+  // GNU-style attribute
+  __attribute__((optnone)) int foo() {
+    // ... code
+  }
+  // C++11 attribute
+  [[clang::optnone]] int bar() {
+    // ... code
+  }
+
+To facilitate disabling optimization for a range of function definitions, a
+range-based pragma is provided. Its syntax is ``#pragma clang optimize``
+followed by ``off`` or ``on``.
+
+All function definitions in the region between an ``off`` and the following
+``on`` will be decorated with the ``optnone`` attribute unless doing so would
+conflict with explicit attributes already present on the function (e.g. the
+ones that control inlining).
+
+.. code-block:: c++
+
+  #pragma clang optimize off
+  // This function will be decorated with optnone.
+  int foo() {
+    // ... code
+  }
+
+  // optnone conflicts with always_inline, so bar() will not be decorated.
+  __attribute__((always_inline)) int bar() {
+    // ... code
+  }
+  #pragma clang optimize on
+
+If no ``on`` is found to close an ``off`` region, the end of the region is the
+end of the compilation unit.
+
+Note that a stray ``#pragma clang optimize on`` does not selectively enable
+additional optimizations when compiling at low optimization levels. This feature
+can only be used to selectively disable optimizations.
+
+The pragma has an effect on functions only at the point of their definition; for
+function templates, this means that the state of the pragma at the point of an
+instantiation is not necessarily relevant. Consider the following example:
+
+.. code-block:: c++
+
+  template<typename T> T twice(T t) {
+    return 2 * t;
+  }
+
+  #pragma clang optimize off
+  template<typename T> T thrice(T t) {
+    return 3 * t;
+  }
+
+  int container(int a, int b) {
+    return twice(a) + thrice(b);
+  }
+  #pragma clang optimize on
+
+In this example, the definition of the template function ``twice`` is outside
+the pragma region, whereas the definition of ``thrice`` is inside the region.
+The ``container`` function is also in the region and will not be optimized, but
+it causes the instantiation of ``twice`` and ``thrice`` with an ``int`` type; of
+these two instantiations, ``twice`` will be optimized (because its definition
+was outside the region) and ``thrice`` will not be optimized.
+
+Extensions for loop hint optimizations
+======================================
+
+The ``#pragma clang loop`` directive is used to specify hints for optimizing the
+subsequent for, while, do-while, or c++11 range-based for loop. The directive
+provides options for vectorization, interleaving, and unrolling. Loop hints can
+be specified before any loop and will be ignored if the optimization is not safe
+to apply.
+
+Vectorization and Interleaving
+------------------------------
+
+A vectorized loop performs multiple iterations of the original loop
+in parallel using vector instructions. The instruction set of the target
+processor determines which vector instructions are available and their vector
+widths. This restricts the types of loops that can be vectorized. The vectorizer
+automatically determines if the loop is safe and profitable to vectorize. A
+vector instruction cost model is used to select the vector width.
+
+Interleaving multiple loop iterations allows modern processors to further
+improve instruction-level parallelism (ILP) using advanced hardware features,
+such as multiple execution units and out-of-order execution. The vectorizer uses
+a cost model that depends on the register pressure and generated code size to
+select the interleaving count.
+
+Vectorization is enabled by ``vectorize(enable)`` and interleaving is enabled
+by ``interleave(enable)``. This is useful when compiling with ``-Os`` to
+manually enable vectorization or interleaving.
+
+.. code-block:: c++
+
+  #pragma clang loop vectorize(enable)
+  #pragma clang loop interleave(enable)
+  for(...) {
+    ...
+  }
+
+The vector width is specified by ``vectorize_width(_value_)`` and the interleave
+count is specified by ``interleave_count(_value_)``, where
+_value_ is a positive integer. This is useful for specifying the optimal
+width/count of the set of target architectures supported by your application.
+
+.. code-block:: c++
+
+  #pragma clang loop vectorize_width(2)
+  #pragma clang loop interleave_count(2)
+  for(...) {
+    ...
+  }
+
+Specifying a width/count of 1 disables the optimization, and is equivalent to
+``vectorize(disable)`` or ``interleave(disable)``.
+
+Loop Unrolling
+--------------
+
+Unrolling a loop reduces the loop control overhead and exposes more
+opportunities for ILP. Loops can be fully or partially unrolled. Full unrolling
+eliminates the loop and replaces it with an enumerated sequence of loop
+iterations. Full unrolling is only possible if the loop trip count is known at
+compile time. Partial unrolling replicates the loop body within the loop and
+reduces the trip count.
+
+If ``unroll(enable)`` is specified the unroller will attempt to fully unroll the
+loop if the trip count is known at compile time. If the loop count is not known
+or the fully unrolled code size is greater than the limit specified by the
+`-pragma-unroll-threshold` command line option the loop will be partially
+unrolled subject to the same limit.
+
+.. code-block:: c++
+
+  #pragma clang loop unroll(enable)
+  for(...) {
+    ...
+  }
+
+The unroll count can be specified explicitly with ``unroll_count(_value_)`` where
+_value_ is a positive integer. If this value is greater than the trip count the
+loop will be fully unrolled. Otherwise the loop is partially unrolled subject
+to the `-pragma-unroll-threshold` limit.
+
+.. code-block:: c++
+
+  #pragma clang loop unroll_count(8)
+  for(...) {
+    ...
+  }
+
+Unrolling of a loop can be prevented by specifying ``unroll(disable)``.
+
+Additional Information
+----------------------
+
+For convenience multiple loop hints can be specified on a single line.
+
+.. code-block:: c++
+
+  #pragma clang loop vectorize_width(4) interleave_count(8)
+  for(...) {
+    ...
+  }
+
+If an optimization cannot be applied any hints that apply to it will be ignored.
+For example, the hint ``vectorize_width(4)`` is ignored if the loop is not
+proven safe to vectorize. To identify and diagnose optimization issues use
+`-Rpass`, `-Rpass-missed`, and `-Rpass-analysis` command line options. See the
+user guide for details.
diff --git a/docs/LibASTMatchersReference.html b/docs/LibASTMatchersReference.html
index c308635..4988053 100644
--- a/docs/LibASTMatchersReference.html
+++ b/docs/LibASTMatchersReference.html
@@ -1966,6 +1966,31 @@
 </pre></td></tr>
 
 
+<tr><td>Matcher&lt<a href="http://clang.llvm.org/doxygen/classclang_1_1VarDecl.html">VarDecl</a>&gt;</td><td class="name" onclick="toggle('hasGlobalStorage0')"><a name="hasGlobalStorage0Anchor">hasGlobalStorage</a></td><td></td></tr>
+<tr><td colspan="4" class="doc" id="hasGlobalStorage0"><pre>Matches a variable declaration that does not have local storage.
+
+Example matches y and z (matcher = varDecl(hasGlobalStorage())
+void f() {
+  int x;
+  static int y;
+}
+int z;
+</pre></td></tr>
+
+
+<tr><td>Matcher&lt<a href="http://clang.llvm.org/doxygen/classclang_1_1VarDecl.html">VarDecl</a>&gt;</td><td class="name" onclick="toggle('hasLocalStorage0')"><a name="hasLocalStorage0Anchor">hasLocalStorage</a></td><td></td></tr>
+<tr><td colspan="4" class="doc" id="hasLocalStorage0"><pre>Matches a variable declaration that has function scope and is a
+non-static local variable.
+
+Example matches x (matcher = varDecl(hasLocalStorage())
+void f() {
+  int x;
+  static int y;
+}
+int z;
+</pre></td></tr>
+
+
 <tr><td>Matcher&lt<a href="http://clang.llvm.org/doxygen/classclang_1_1VarDecl.html">VarDecl</a>&gt;</td><td class="name" onclick="toggle('isDefinition1')"><a name="isDefinition1Anchor">isDefinition</a></td><td></td></tr>
 <tr><td colspan="4" class="doc" id="isDefinition1"><pre>Matches if a declaration has a body attached.
 
diff --git a/docs/MSVCCompatibility.rst b/docs/MSVCCompatibility.rst
index 683ce93..32efb76 100644
--- a/docs/MSVCCompatibility.rst
+++ b/docs/MSVCCompatibility.rst
@@ -19,7 +19,7 @@
 
 First, Clang attempts to be ABI-compatible, meaning that Clang-compiled code
 should be able to link against MSVC-compiled code successfully.  However, C++
-ABIs are particular large and complicated, and Clang's support for MSVC's C++
+ABIs are particularly large and complicated, and Clang's support for MSVC's C++
 ABI is a work in progress.  If you don't require MSVC ABI compatibility or don't
 want to use Microsoft's C and C++ runtimes, the mingw32 toolchain might be a
 better fit for your project.
@@ -43,14 +43,14 @@
 
 The status of major ABI-impacting C++ features:
 
-* Record layout: :good:`Mostly complete`.  We've tested this with a fuzzer, and
-  most of the remaining failures involve ``#pragma pack``,
-  ``__declspec(align(N))``, or other pragmas.
+* Record layout: :good:`Complete`.  We've tested this with a fuzzer and have
+  fixed all known bugs.
 
 * Class inheritance: :good:`Mostly complete`.  This covers all of the standard
   OO features you would expect: virtual method inheritance, multiple
   inheritance, and virtual inheritance.  Every so often we uncover a bug where
-  our tables are incompatible, but this is pretty well in hand.
+  our tables are incompatible, but this is pretty well in hand.  This feature
+  has also been fuzz tested.
 
 * Name mangling: :good:`Ongoing`.  Every new C++ feature generally needs its own
   mangling.  For example, member pointer template arguments have an interesting
@@ -78,47 +78,64 @@
   enabling stack traces in all modern Windows debuggers.  Clang does not emit
   any type info or description of variable layout.
 
-* `RTTI`_: :none:`Unstarted`.  See the bug for a discussion of what needs to
-  happen first.
-
-.. _RTTI: http://llvm.org/PR18951
+* RTTI: :good:`Complete`.  Generation of RTTI data structures has been
+  finished, along with support for the ``/GR`` flag.
 
 * Exceptions and SEH: :none:`Unstarted`.  Clang can parse both constructs, but
-  does not know how to emit compatible handlers.  This depends on RTTI.
+  does not know how to emit compatible handlers.
 
 * Thread-safe initialization of local statics: :none:`Unstarted`.  We are ABI
-  compatible with MSVC 2012, which does not support thread-safe local statics.
-  MSVC 2013 changed the ABI to make initialization of local statics thread safe,
+  compatible with MSVC 2013, which does not support thread-safe local statics.
+  MSVC "14" changed the ABI to make initialization of local statics thread safe,
   and we have not yet implemented this.
 
-* Lambdas in ABI boundaries: :none:`Infeasible`.  It is unlikely that we will
-  ever be fully ABI compatible with lambdas declared in inline functions due to
-  what appears to be a hash code in the name mangling.  Lambdas that are not
-  externally visible should work fine.
+* Lambdas: :good:`Mostly complete`.  Clang is compatible with Microsoft's
+  implementation of lambdas except for providing overloads for conversion to
+  function pointer for different calling conventions.  However, Microsoft's
+  extension is non-conforming.
 
 Template instantiation and name lookup
 ======================================
 
-In addition to the usual `dependent name lookup FAQs`_, Clang is often unable to
-parse certain invalid C++ constructs that MSVC allows.  As of this writing,
-Clang will reject code with missing ``typename`` annotations:
+MSVC allows many invalid constructs in class templates that Clang has
+historically rejected.  In order to parse widely distributed headers for
+libraries such as the Active Template Library (ATL) and Windows Runtime Library
+(WRL), some template rules have been relaxed or extended in Clang on Windows.
 
-.. _dependent name lookup FAQs:
+The first major semantic difference is that MSVC appears to defer all parsing
+an analysis of inline method bodies in class templates until instantiation
+time.  By default on Windows, Clang attempts to follow suit.  This behavior is
+controlled by the ``-fdelayed-template-parsing`` flag.  While Clang delays
+parsing of method bodies, it still parses the bodies *before* template argument
+substitution, which is not what MSVC does.  The following compatibility tweaks
+are necessary to parse the the template in those cases.
+
+MSVC allows some name lookup into dependent base classes.  Even on other
+platforms, this has been a `frequently asked question`_ for Clang users.  A
+dependent base class is a base class that depends on the value of a template
+parameter.  Clang cannot see any of the names inside dependent bases while it
+is parsing your template, so the user is sometimes required to use the
+``typename`` keyword to assist the parser.  On Windows, Clang attempts to
+follow the normal lookup rules, but if lookup fails, it will assume that the
+user intended to find the name in a dependent base.  While parsing the
+following program, Clang will recover as if the user had written the
+commented-out code:
+
+.. _frequently asked question:
   http://clang.llvm.org/compatibility.html#dep_lookup
 
 .. code-block:: c++
 
-  struct X {
-    typedef int type;
+  template <typename T>
+  struct Foo : T {
+    void f() {
+      /*typename*/ T::UnknownType x =  /*this->*/unknownMember;
+    }
   };
-  template<typename T> int f() {
-    // missing typename keyword
-    return sizeof(/*typename*/ T::type);
-  }
-  template void f<X>();
 
-Accepting code like this is ongoing work.  Ultimately, it may be cleaner to
-`implement a token-based template instantiation mode`_ than it is to add
-compatibility hacks to the existing AST-based instantiation.
+After recovery, Clang warns the user that this code is non-standard and issues
+a hint suggesting how to fix the problem.
 
-.. _implement a token-based template instantiation mode: http://llvm.org/PR18714
+As of this writing, Clang is able to compile a simple ATL hello world
+application.  There are still issues parsing WRL headers for modern Windows 8
+apps, but they should be addressed soon.
diff --git a/docs/PTHInternals.rst b/docs/PTHInternals.rst
index 10dda61..7401cf9 100644
--- a/docs/PTHInternals.rst
+++ b/docs/PTHInternals.rst
@@ -135,11 +135,11 @@
 an algorithmic level, especially when one considers header files of
 arbitrary size.
 
-There are plans to potentially implement an complementary PCH
-implementation for Clang based on the lazy deserialization of ASTs. This
-approach would theoretically have the same constant-time algorithmic
-advantages just mentioned but would also retain some of the strengths of
-PTH such as reduced memory pressure (ideal for multi-core builds).
+There is also a PCH implementation for Clang based on the lazy
+deserialization of ASTs. This approach theoretically has the same
+constant-time algorithmic advantages just mentioned but also retains some
+of the strengths of PTH such as reduced memory pressure (ideal for
+multi-core builds).
 
 Internal PTH Optimizations
 --------------------------
diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst
index df27af9..c99262e 100644
--- a/docs/ReleaseNotes.rst
+++ b/docs/ReleaseNotes.rst
@@ -51,10 +51,10 @@
   GCC 4.7 changed the mingw ABI. Clang 3.4 and older use the GCC 4.6
   ABI. Clang 3.5 and newer use the GCC 4.7 abi.
 
-- The __has_attribute feature test is now target-aware. Older versions of Clang 
-  would return true when the attribute spelling was known, regardless of whether 
-  the attribute was available to the specific target. Clang now returns true only 
-  when the attribute pertains to the current compilation target.
+- The __has_attribute feature test is now target-aware. Older versions of Clang
+  would return true when the attribute spelling was known, regardless of whether
+  the attribute was available to the specific target. Clang now returns true
+  only when the attribute pertains to the current compilation target.
 
 
 Improvements to Clang's diagnostics
@@ -92,6 +92,20 @@
 `-fcatch-undefined-behavior` and `-fbounds-checking` were removed in favor of
 `-fsanitize=` family of flags.
 
+It is now possible to get optimization reports from the major transformation
+passes via three new flags: `-Rpass`, `-Rpass-missed` and `-Rpass-analysis`.
+These flags take a POSIX regular expression which indicates the name
+of the pass (or passes) that should emit optimization remarks.
+
+New Pragmas in Clang
+-----------------------
+
+Loop optimization hints can be specified using the new `#pragma clang loop`
+directive just prior to the desired loop. The directive allows vectorization,
+interleaving, and unrolling to be enabled or disabled. Vector width as well
+as interleave and unrolling count can be manually specified.  See language
+extensions for details.
+
 C Language Changes in Clang
 ---------------------------
 
@@ -139,6 +153,16 @@
 Static Analyzer
 ---------------
 
+The `-analyzer-config` options are now passed from scan-build through to
+ccc-analyzer and then to Clang.
+
+With the option `-analyzer-config stable-report-filename=true`,
+instead of `report-XXXXXX.html`, scan-build/clang analyzer generate
+`report-<filename>-<function, method name>-<function position>-<id>.html`.
+(id = i++ for several issues found in the same function/method).
+
+List the function/method name in the index page of scan-build.
+
 ...
 
 Core Analysis Improvements
diff --git a/docs/UsersManual.rst b/docs/UsersManual.rst
index 19603d4..90a0e53 100644
--- a/docs/UsersManual.rst
+++ b/docs/UsersManual.rst
@@ -531,6 +531,70 @@
 The -fno-crash-diagnostics flag can be helpful for speeding the process
 of generating a delta reduced test case.
 
+Options to Emit Optimization Reports
+------------------------------------
+
+Optimization reports trace, at a high-level, all the major decisions
+done by compiler transformations. For instance, when the inliner
+decides to inline function ``foo()`` into ``bar()``, or the loop unroller
+decides to unroll a loop N times, or the vectorizer decides to
+vectorize a loop body.
+
+Clang offers a family of flags which the optimizers can use to emit
+a diagnostic in three cases:
+
+1. When the pass makes a transformation (:option:`-Rpass`).
+
+2. When the pass fails to make a transformation (:option:`-Rpass-missed`).
+
+3. When the pass determines whether or not to make a transformation
+   (:option:`-Rpass-analysis`).
+
+NOTE: Although the discussion below focuses on :option:`-Rpass`, the exact
+same options apply to :option:`-Rpass-missed` and :option:`-Rpass-analysis`.
+
+Since there are dozens of passes inside the compiler, each of these flags
+take a regular expression that identifies the name of the pass which should
+emit the associated diagnostic. For example, to get a report from the inliner,
+compile the code with:
+
+.. code-block:: console
+
+   $ clang -O2 -Rpass=inline code.cc -o code
+   code.cc:4:25: remark: foo inlined into bar [-Rpass=inline]
+   int bar(int j) { return foo(j, j - 2); }
+                           ^
+
+Note that remarks from the inliner are identified with `[-Rpass=inline]`.
+To request a report from every optimization pass, you should use
+:option:`-Rpass=.*` (in fact, you can use any valid POSIX regular
+expression). However, do not expect a report from every transformation
+made by the compiler. Optimization remarks do not really make sense
+outside of the major transformations (e.g., inlining, vectorization,
+loop optimizations) and not every optimization pass supports this
+feature.
+
+Current limitations
+^^^^^^^^^^^^^^^^^^^
+
+1. For :option:`-Rpass` to provide column information, you
+   need to enable it explicitly. That is, you need to add
+   :option:`-gcolumn-info`. If you omit this, remarks will only show
+   line information.
+
+2. Optimization remarks that refer to function names will display the
+   mangled name of the function. Since these remarks are emitted by the
+   back end of the compiler, it does not know anything about the input
+   language, nor its mangling rules.
+
+3. Some source locations are not displayed correctly. The front end has
+   a more detailed source location tracking than the locations included
+   in the debug info (e.g., the front end can locate code inside macro
+   expansions). However, the locations used by :option:`-Rpass` are
+   translated from debug annotations. That translation can be lossy,
+   which results in some remarks having no location information.
+
+
 Language and Target-Independent Features
 ========================================
 
@@ -872,10 +936,6 @@
       ``-fsanitize=address``:
       :doc:`AddressSanitizer`, a memory error
       detector.
-   -  ``-fsanitize=init-order``: Make AddressSanitizer check for
-      dynamic initialization order problems. Implied by ``-fsanitize=address``.
-   -  ``-fsanitize=address-full``: AddressSanitizer with all the
-      experimental features listed below.
    -  ``-fsanitize=integer``: Enables checks for undefined or
       suspicious integer behavior.
    -  .. _opt_fsanitize_thread:
@@ -958,14 +1018,6 @@
    -  ``-fno-sanitize-blacklist``: don't use blacklist file, if it was
       specified earlier in the command line.
 
-   Experimental features of AddressSanitizer (not ready for widespread
-   use, require explicit ``-fsanitize=address``):
-
-   -  ``-fsanitize=use-after-return``: Check for use-after-return
-      errors (accessing local variable after the function exit).
-   -  ``-fsanitize=use-after-scope``: Check for use-after-scope errors
-      (accesing local variable after it went out of scope).
-
    Extra features of MemorySanitizer (require explicit
    ``-fsanitize=memory``):
 
@@ -1065,8 +1117,29 @@
    only. This only applies to the AArch64 architecture.
 
 
-Using Sampling Profilers for Optimization
------------------------------------------
+Profile Guided Optimization
+---------------------------
+
+Profile information enables better optimization. For example, knowing that a
+branch is taken very frequently helps the compiler make better decisions when
+ordering basic blocks. Knowing that a function ``foo`` is called more
+frequently than another function ``bar`` helps the inliner.
+
+Clang supports profile guided optimization with two different kinds of
+profiling. A sampling profiler can generate a profile with very low runtime
+overhead, or you can build an instrumented version of the code that collects
+more detailed profile information. Both kinds of profiles can provide execution
+counts for instructions in the code and information on branches taken and
+function invocation.
+
+Regardless of which kind of profiling you use, be careful to collect profiles
+by running your code with inputs that are representative of the typical
+behavior. Code that is not exercised in the profile will be optimized as if it
+is unimportant, and the compiler may make poor optimization choices for code
+that is disproportionately used while profiling.
+
+Using Sampling Profilers
+^^^^^^^^^^^^^^^^^^^^^^^^
 
 Sampling profilers are used to collect runtime information, such as
 hardware counters, while your application executes. They are typically
@@ -1074,14 +1147,6 @@
 sample data collected by the profiler can be used during compilation
 to determine what the most executed areas of the code are.
 
-In particular, sample profilers can provide execution counts for all
-instructions in the code and information on branches taken and function
-invocation. The compiler can use this information in its optimization
-cost models. For example, knowing that a branch is taken very
-frequently helps the compiler make better decisions when ordering
-basic blocks. Knowing that a function ``foo`` is called more
-frequently than another function ``bar`` helps the inliner.
-
 Using the data from a sample profiler requires some changes in the way
 a program is built. Before the compiler can use profiling information,
 the code needs to execute under the profiler. The following is the
@@ -1141,7 +1206,7 @@
 
 
 Sample Profile Format
-^^^^^^^^^^^^^^^^^^^^^
+"""""""""""""""""""""
 
 If you are not using Linux Perf to collect profiles, you will need to
 write a conversion tool from your profiler to LLVM's format. This section
@@ -1225,6 +1290,60 @@
    with ``baz()`` being the relatively more frequently called target.
 
 
+Profiling with Instrumentation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Clang also supports profiling via instrumentation. This requires building a
+special instrumented version of the code and has some runtime
+overhead during the profiling, but it provides more detailed results than a
+sampling profiler. It also provides reproducible results, at least to the
+extent that the code behaves consistently across runs.
+
+Here are the steps for using profile guided optimization with
+instrumentation:
+
+1. Build an instrumented version of the code by compiling and linking with the
+   ``-fprofile-instr-generate`` option.
+
+   .. code-block:: console
+
+     $ clang++ -O2 -fprofile-instr-generate code.cc -o code
+
+2. Run the instrumented executable with inputs that reflect the typical usage.
+   By default, the profile data will be written to a ``default.profraw`` file
+   in the current directory. You can override that default by setting the
+   ``LLVM_PROFILE_FILE`` environment variable to specify an alternate file.
+   Any instance of ``%p`` in that file name will be replaced by the process
+   ID, so that you can easily distinguish the profile output from multiple
+   runs.
+
+   .. code-block:: console
+
+     $ LLVM_PROFILE_FILE="code-%p.profraw" ./code
+
+3. Combine profiles from multiple runs and convert the "raw" profile format to
+   the input expected by clang. Use the ``merge`` command of the llvm-profdata
+   tool to do this.
+
+   .. code-block:: console
+
+     $ llvm-profdata merge -output=code.profdata code-*.profraw
+
+   Note that this step is necessary even when there is only one "raw" profile,
+   since the merge operation also changes the file format.
+
+4. Build the code again using the ``-fprofile-instr-use`` option to specify the
+   collected profile data.
+
+   .. code-block:: console
+
+     $ clang++ -O2 -fprofile-instr-use=code.profdata code.cc -o code
+
+   You can repeat step 4 as often as you like without regenerating the
+   profile. As you make changes to your code, clang may no longer be able to
+   use the profile data. It will warn you when this happens.
+
+
 Controlling Size of Debug Information
 -------------------------------------
 
@@ -1244,6 +1363,28 @@
   doesn't contain any other data (e.g. description of local variables or
   function parameters).
 
+.. option:: -fstandalone-debug
+
+  Clang supports a number of optimizations to reduce the size of debug
+  information in the binary. They work based on the assumption that
+  the debug type information can be spread out over multiple
+  compilation units.  For instance, Clang will not emit type
+  definitions for types that are not needed by a module and could be
+  replaced with a forward declaration.  Further, Clang will only emit
+  type info for a dynamic C++ class in the module that contains the
+  vtable for the class.
+
+  The **-fstandalone-debug** option turns off these optimizations.
+  This is useful when working with 3rd-party libraries that don't come
+  with debug information.  Note that Clang will never emit type
+  information for types that are not referenced at all by the program.
+
+.. option:: -fno-standalone-debug
+
+   On Darwin **-fstandalone-debug** is enabled by default. The
+   **-fno-standalone-debug** option can be used to get to turn on the
+   vtable-based optimization described above.
+
 .. option:: -g
 
   Generate complete debug info.
diff --git a/docs/tools/clang.pod b/docs/tools/clang.pod
index 6ccdbba..f7d2eaf 100644
--- a/docs/tools/clang.pod
+++ b/docs/tools/clang.pod
@@ -324,8 +324,9 @@
 
 The B<-fstandalone-debug> option turns off these optimizations.  This
 is useful when working with 3rd-party libraries that don't come with
-debug information.  Note that Clang will never emit type information
-for types that are not referenced at all by the program.
+debug information.  This is the default on Darwin.  Note that Clang
+will never emit type information for types that are not referenced at
+all by the program.
 
 =item B<-fexceptions>