Update Clang for rebase to r212749.

This also fixes a small issue with arm_neon.h not always being generated.

Includes a cherry-pick of:
r213450 - fixes mac-specific header issue
r213126 - removes a default -Bsymbolic on Android

Change-Id: I2a790a0f5d3b2aab11de596fc3a74e7cbc99081d
diff --git a/docs/UsersManual.rst b/docs/UsersManual.rst
index 19603d4..90a0e53 100644
--- a/docs/UsersManual.rst
+++ b/docs/UsersManual.rst
@@ -531,6 +531,70 @@
 The -fno-crash-diagnostics flag can be helpful for speeding the process
 of generating a delta reduced test case.
 
+Options to Emit Optimization Reports
+------------------------------------
+
+Optimization reports trace, at a high level, all the major decisions
+made by compiler transformations. For instance, when the inliner
+decides to inline function ``foo()`` into ``bar()``, or the loop unroller
+decides to unroll a loop N times, or the vectorizer decides to
+vectorize a loop body.
+
+Clang offers a family of flags that instruct the optimizers to emit
+a diagnostic in three cases:
+
+1. When the pass makes a transformation (:option:`-Rpass`).
+
+2. When the pass fails to make a transformation (:option:`-Rpass-missed`).
+
+3. When the pass determines whether or not to make a transformation
+   (:option:`-Rpass-analysis`).
+
+NOTE: Although the discussion below focuses on :option:`-Rpass`, the same
+usage applies to :option:`-Rpass-missed` and :option:`-Rpass-analysis`.
+
+Since there are dozens of passes inside the compiler, each of these flags
+takes a regular expression that identifies the name of the pass which should
+emit the associated diagnostic. For example, to get a report from the inliner,
+compile the code with:
+
+.. code-block:: console
+
+   $ clang -O2 -Rpass=inline code.cc -o code
+   code.cc:4:25: remark: foo inlined into bar [-Rpass=inline]
+   int bar(int j) { return foo(j, j - 2); }
+                           ^
+
+Note that remarks from the inliner are identified with ``[-Rpass=inline]``.
+To request a report from every optimization pass, you should use
+:option:`-Rpass=.*` (in fact, you can use any valid POSIX regular
+expression). However, do not expect a report from every transformation
+made by the compiler. Optimization remarks do not really make sense
+outside of the major transformations (e.g., inlining, vectorization,
+loop optimizations), and not every optimization pass supports this
+feature.
+
+Current Limitations
+^^^^^^^^^^^^^^^^^^^
+
+1. For :option:`-Rpass` to provide column information, you must
+   enable it explicitly by adding :option:`-gcolumn-info`.
+   If you omit this option, remarks will only show line
+   information.
+
+2. Optimization remarks that refer to function names will display the
+   mangled name of the function. These remarks are emitted by the
+   back end of the compiler, which does not know anything about the
+   input language or its mangling rules.
+
+3. Some source locations are not displayed correctly. The front end has
+   more detailed source location tracking than the locations included
+   in the debug info (e.g., the front end can locate code inside macro
+   expansions). However, the locations used by :option:`-Rpass` are
+   translated from debug annotations. That translation can be lossy,
+   which results in some remarks having no location information.
+
+
 Language and Target-Independent Features
 ========================================
 
@@ -872,10 +936,6 @@
       ``-fsanitize=address``:
       :doc:`AddressSanitizer`, a memory error
       detector.
-   -  ``-fsanitize=init-order``: Make AddressSanitizer check for
-      dynamic initialization order problems. Implied by ``-fsanitize=address``.
-   -  ``-fsanitize=address-full``: AddressSanitizer with all the
-      experimental features listed below.
    -  ``-fsanitize=integer``: Enables checks for undefined or
       suspicious integer behavior.
    -  .. _opt_fsanitize_thread:
@@ -958,14 +1018,6 @@
    -  ``-fno-sanitize-blacklist``: don't use blacklist file, if it was
       specified earlier in the command line.
 
-   Experimental features of AddressSanitizer (not ready for widespread
-   use, require explicit ``-fsanitize=address``):
-
-   -  ``-fsanitize=use-after-return``: Check for use-after-return
-      errors (accessing local variable after the function exit).
-   -  ``-fsanitize=use-after-scope``: Check for use-after-scope errors
-      (accesing local variable after it went out of scope).
-
    Extra features of MemorySanitizer (require explicit
    ``-fsanitize=memory``):
 
@@ -1065,8 +1117,29 @@
    only. This only applies to the AArch64 architecture.
 
 
-Using Sampling Profilers for Optimization
------------------------------------------
+Profile Guided Optimization
+---------------------------
+
+Profile information enables better optimization. For example, knowing that a
+branch is taken very frequently helps the compiler make better decisions when
+ordering basic blocks. Knowing that a function ``foo`` is called more
+frequently than another function ``bar`` helps the inliner.
+
+Clang supports profile guided optimization with two different kinds of
+profiling. A sampling profiler can generate a profile with very low runtime
+overhead, or you can build an instrumented version of the code that collects
+more detailed profile information. Both kinds of profiles can provide execution
+counts for instructions in the code and information on branches taken and
+function invocation.
+
+Regardless of which kind of profiling you use, be careful to collect profiles
+by running your code with inputs that are representative of the typical
+behavior. Code that is not exercised in the profile will be optimized as if it
+is unimportant, and the compiler may make poor optimization choices for code
+that is disproportionately used while profiling.
+
+Using Sampling Profilers
+^^^^^^^^^^^^^^^^^^^^^^^^
 
 Sampling profilers are used to collect runtime information, such as
 hardware counters, while your application executes. They are typically
@@ -1074,14 +1147,6 @@
 sample data collected by the profiler can be used during compilation
 to determine what the most executed areas of the code are.
 
-In particular, sample profilers can provide execution counts for all
-instructions in the code and information on branches taken and function
-invocation. The compiler can use this information in its optimization
-cost models. For example, knowing that a branch is taken very
-frequently helps the compiler make better decisions when ordering
-basic blocks. Knowing that a function ``foo`` is called more
-frequently than another function ``bar`` helps the inliner.
-
 Using the data from a sample profiler requires some changes in the way
 a program is built. Before the compiler can use profiling information,
 the code needs to execute under the profiler. The following is the
@@ -1141,7 +1206,7 @@
 
 
 Sample Profile Format
-^^^^^^^^^^^^^^^^^^^^^
+"""""""""""""""""""""
 
 If you are not using Linux Perf to collect profiles, you will need to
 write a conversion tool from your profiler to LLVM's format. This section
@@ -1225,6 +1290,60 @@
    with ``baz()`` being the relatively more frequently called target.
 
 
+Profiling with Instrumentation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Clang also supports profiling via instrumentation. This requires building a
+special instrumented version of the code and has some runtime
+overhead during the profiling, but it provides more detailed results than a
+sampling profiler. It also provides reproducible results, at least to the
+extent that the code behaves consistently across runs.
+
+Here are the steps for using profile guided optimization with
+instrumentation:
+
+1. Build an instrumented version of the code by compiling and linking with the
+   ``-fprofile-instr-generate`` option.
+
+   .. code-block:: console
+
+     $ clang++ -O2 -fprofile-instr-generate code.cc -o code
+
+2. Run the instrumented executable with inputs that reflect the typical usage.
+   By default, the profile data will be written to a ``default.profraw`` file
+   in the current directory. You can override that default by setting the
+   ``LLVM_PROFILE_FILE`` environment variable to specify an alternate file.
+   Any instance of ``%p`` in that file name will be replaced by the process
+   ID, so that you can easily distinguish the profile output from multiple
+   runs.
+
+   .. code-block:: console
+
+     $ LLVM_PROFILE_FILE="code-%p.profraw" ./code
+
+3. Combine profiles from multiple runs and convert the "raw" profile format to
+   the input expected by clang. Use the ``merge`` command of the
+   ``llvm-profdata`` tool to do this.
+
+   .. code-block:: console
+
+     $ llvm-profdata merge -output=code.profdata code-*.profraw
+
+   Note that this step is necessary even when there is only one "raw" profile,
+   since the merge operation also changes the file format.
+
+4. Build the code again using the ``-fprofile-instr-use`` option to specify the
+   collected profile data.
+
+   .. code-block:: console
+
+     $ clang++ -O2 -fprofile-instr-use=code.profdata code.cc -o code
+
+   You can repeat step 4 as often as you like without regenerating the
+   profile. As you make changes to your code, clang may no longer be able to
+   use the profile data. It will warn you when this happens.
+
+
 Controlling Size of Debug Information
 -------------------------------------
 
@@ -1244,6 +1363,28 @@
   doesn't contain any other data (e.g. description of local variables or
   function parameters).
 
+.. option:: -fstandalone-debug
+
+  Clang supports a number of optimizations to reduce the size of debug
+  information in the binary. They work based on the assumption that
+  the debug type information can be spread out over multiple
+  compilation units.  For instance, Clang will not emit type
+  definitions for types that are not needed by a module and could be
+  replaced with a forward declaration.  Further, Clang will only emit
+  type info for a dynamic C++ class in the module that contains the
+  vtable for the class.
+
+  The **-fstandalone-debug** option turns off these optimizations.
+  This is useful when working with third-party libraries that don't come
+  with debug information.  Note that Clang will never emit type
+  information for types that are not referenced at all by the program.
+
+.. option:: -fno-standalone-debug
+
+  On Darwin, **-fstandalone-debug** is enabled by default. The
+  **-fno-standalone-debug** option can be used to turn on the
+  vtable-based optimization described above.
+
 .. option:: -g
 
   Generate complete debug info.