Docs: HWC2 updates, new section+subsections
      Removing Framebuffer bullet
      Making OpenGL 3.x drivers optional
      Fixing horrendous typo in nav
      Adding Clay's feedback

Bug: 28419158

Change-Id: I19ce49d22f7bdb54229363580d9075f71037ef9c
diff --git a/src/devices/devices_toc.cs b/src/devices/devices_toc.cs
index 75462ad..86b4315 100644
--- a/src/devices/devices_toc.cs
+++ b/src/devices/devices_toc.cs
@@ -145,7 +145,19 @@
         </div>
         <ul>
           <li><a href="<?cs var:toroot ?>devices/graphics/architecture.html">Architecture</a></li>
-          <li><a href="<?cs var:toroot ?>devices/graphics/implement.html">Implementation</a></li>
+          <li class="nav-section">
+            <div class="nav-section-header">
+              <a href="<?cs var:toroot ?>devices/graphics/implement.html">
+                <span class="en">Implementing</span>
+              </a>
+            </div>
+            <ul>
+              <li><a href="<?cs var:toroot ?>devices/graphics/implement-hwc.html">Hardware Composer HAL</a></li>
+              <li><a href="<?cs var:toroot ?>devices/graphics/implement-vsync.html">VSYNC</a></li>
+              <li><a href="<?cs var:toroot ?>devices/graphics/implement-vulkan.html">Vulkan</a></li>
+              <li><a href="<?cs var:toroot ?>devices/graphics/implement-vdisplays.html">Virtual Displays</a></li>
+            </ul>
+          </li>
          <li class="nav-section">
             <div class="nav-section-header">
               <a href="<?cs var:toroot ?>devices/graphics/testing.html">
diff --git a/src/devices/graphics/architecture.jd b/src/devices/graphics/architecture.jd
index a61d3b8..47cc9cc 100644
--- a/src/devices/graphics/architecture.jd
+++ b/src/devices/graphics/architecture.jd
@@ -226,8 +226,8 @@
 to HWC, and lets HWC handle the rest.</li>
 </ol>
 
-<p>Since the decision-making code can be custom tailored by the hardware vendor,
-it's possible to get the best performance out of every device.</p>
+<p>Since hardware vendors can custom-tailor decision-making code, it's possible
+to get the best performance out of every device.</p>
 
 <p>Overlay planes may be less efficient than GL composition when nothing on the
 screen is changing. This is particularly true when overlay contents have
@@ -242,160 +242,26 @@
 composition for some of them, meaning the number of layers used by an app can
 have a measurable impact on power consumption and performance.</p>
 
-<p>You can see exactly what SurfaceFlinger is up to with the command <code>adb
-shell dumpsys SurfaceFlinger</code>. The output is verbose; the relevant section
-is HWC summary that appears near the bottom of the output:</p>
-
-<pre>
-    type    |          source crop              |           frame           name
-------------+-----------------------------------+--------------------------------
-        HWC | [    0.0,    0.0,  320.0,  240.0] | [   48,  411, 1032, 1149] SurfaceView
-        HWC | [    0.0,   75.0, 1080.0, 1776.0] | [    0,   75, 1080, 1776] com.android.grafika/com.android.grafika.PlayMovieSurfaceActivity
-        HWC | [    0.0,    0.0, 1080.0,   75.0] | [    0,    0, 1080,   75] StatusBar
-        HWC | [    0.0,    0.0, 1080.0,  144.0] | [    0, 1776, 1080, 1920] NavigationBar
-  FB TARGET | [    0.0,    0.0, 1080.0, 1920.0] | [    0,    0, 1080, 1920] HWC_FRAMEBUFFER_TARGET
-</pre>
-
-<p>The summary includes what layers are on screen and whether they are handled
-with overlays (HWC) or OpenGL ES composition (GLES). It also includes other data
-you likely don't care about (handle, hints, flags, etc.) and which has been
-trimmed from the snippet above; source crop and frame values will be examined
-more closely later on.</p>
-
-<p>The FB_TARGET layer is where GLES composition output goes. Since all layers
-shown above are using overlays, FB_TARGET isn’t being used for this frame. The
-layer's name is indicative of its original role: On a device with
-<code>/dev/graphics/fb0</code> and no overlays, all composition would be done
-with GLES, and the output would be written to the framebuffer. On newer devices,
-generally is no simple framebuffer so the FB_TARGET layer is a scratch buffer.</p>
-
-<p class="note"><strong>Note:</strong> This is why screen grabbers written for
-older versions of Android no longer work: They are trying to read from the
-Framebuffer, but there is no such thing.</p>
-
-<p>The overlay planes have another important role: They're the only way to
-display DRM content. DRM-protected buffers cannot be accessed by SurfaceFlinger
-or the GLES driver, which means your video will disappear if HWC switches to
-GLES composition.</p>
-
-<h3 id="triple-buffering">Triple-Buffering</h3>
-
-<p>To avoid tearing on the display, the system needs to be double-buffered: the
-front buffer is displayed while the back buffer is being prepared. At VSYNC, if
-the back buffer is ready, you quickly switch them. This works reasonably well
-in a system where you're drawing directly into the framebuffer, but there's a
-hitch in the flow when a composition step is added. Because of the way
-SurfaceFlinger is triggered, our double-buffered pipeline will have a bubble.</p>
-
-<p>Suppose frame N is being displayed, and frame N+1 has been acquired by
-SurfaceFlinger for display on the next VSYNC. (Assume frame N is composited
-with an overlay, so we can't alter the buffer contents until the display is done
-with it.)  When VSYNC arrives, HWC flips the buffers.  While the app is starting
-to render frame N+2 into the buffer that used to hold frame N, SurfaceFlinger is
-scanning the layer list, looking for updates.  SurfaceFlinger won't find any new
-buffers, so it prepares to show frame N+1 again after the next VSYNC.  A little
-while later, the app finishes rendering frame N+2 and queues it for
-SurfaceFlinger, but it's too late.  This has effectively cut our maximum frame
-rate in half.</p>
-
-<p>We can fix this with triple-buffering.  Just before VSYNC, frame N is being
-displayed, frame N+1 has been composited (or scheduled for an overlay) and is
-ready to be displayed, and frame N+2 is queued up and ready to be acquired by
-SurfaceFlinger.  When the screen flips, the buffers rotate through the stages
-with no bubble.  The app has just less than a full VSYNC period (16.7ms at 60fps) to
-do its rendering and queue the buffer. And SurfaceFlinger / HWC has a full VSYNC
-period to figure out the composition before the next flip.  The downside is
-that it takes at least two VSYNC periods for anything that the app does to
-appear on the screen.  As the latency increases, the device feels less
-responsive to touch input.</p>
-
-<img src="images/surfaceflinger_bufferqueue.png" alt="SurfaceFlinger with BufferQueue" />
-
-<p class="img-caption"><strong>Figure 1.</strong> SurfaceFlinger + BufferQueue</p>
-
-<p>The diagram above depicts the flow of SurfaceFlinger and BufferQueue. During
-frame:</p>
-
-<ol>
-<li>red buffer fills up, then slides into BufferQueue</li>
-<li>after red buffer leaves app, blue buffer slides in, replacing it</li>
-<li>green buffer and systemUI* shadow-slide into HWC (showing that SurfaceFlinger
-still has the buffers, but now HWC has prepared them for display via overlay on
-the next VSYNC).</li>
-</ol>
-
-<p>The blue buffer is referenced by both the display and the BufferQueue.  The
-app is not allowed to render to it until the associated sync fence signals.</p>
-
-<p>On VSYNC, all of these happen at once:</p>
-
-<ul>
-<li>red buffer leaps into SurfaceFlinger, replacing green buffer</li>
-<li>green buffer leaps into Display, replacing blue buffer, and a dotted-line
-green twin appears in the BufferQueue</li>
-<li>the blue buffer’s fence is signaled, and the blue buffer in App empties**</li>
-<li>display rect changes from &lt;blue + SystemUI&gt; to &lt;green +
-SystemUI&gt;</li>
-</ul>
-
-<p><strong>*</strong> - The System UI process is providing the status and nav
-bars, which for our purposes here aren’t changing, so SurfaceFlinger keeps using
-the previously-acquired buffer.  In practice there would be two separate
-buffers, one for the status bar at the top, one for the navigation bar at the
-bottom, and they would be sized to fit their contents.  Each would arrive on its
-own BufferQueue.</p>
-
-<p><strong>**</strong> - The buffer doesn’t actually “empty”; if you submit it
-without drawing on it you’ll get that same blue again.  The emptying is the
-result of clearing the buffer contents, which the app should do before it starts
-drawing.</p>
-
-<p>We can reduce the latency by noting layer composition should not require a
-full VSYNC period.  If composition is performed by overlays, it takes essentially
-zero CPU and GPU time. But we can't count on that, so we need to allow a little
-time.  If the app starts rendering halfway between VSYNC signals, and
-SurfaceFlinger defers the HWC setup until a few milliseconds before the signal
-is due to arrive, we can cut the latency from 2 frames to perhaps 1.5.  In
-theory you could render and composite in a single period, allowing a return to
-double-buffering; but getting it down that far is difficult on current devices.
-Minor fluctuations in rendering and composition time, and switching from
-overlays to GLES composition, can cause us to miss a swap deadline and repeat
-the previous frame.</p>
-
-<p>SurfaceFlinger's buffer handling demonstrates the fence-based buffer
-management mentioned earlier.  If we're animating at full speed, we need to
-have an acquired buffer for the display ("front") and an acquired buffer for
-the next flip ("back").  If we're showing the buffer on an overlay, the
-contents are being accessed directly by the display and must not be touched.
-But if you look at an active layer's BufferQueue state in the <code>dumpsys
-SurfaceFlinger</code> output, you'll see one acquired buffer, one queued buffer, and
-one free buffer.  That's because, when SurfaceFlinger acquires the new "back"
-buffer, it releases the current "front" buffer to the queue.  The "front"
-buffer is still in use by the display, so anything that dequeues it must wait
-for the fence to signal before drawing on it.  So long as everybody follows
-the fencing rules, all of the queue-management IPC requests can happen in
-parallel with the display.</p>
-
 <h3 id="virtual-displays">Virtual Displays</h3>
 
-<p>SurfaceFlinger supports a "primary" display, i.e. what's built into your phone
-or tablet, and an "external" display, such as a television connected through
-HDMI.  It also supports a number of "virtual" displays, which make composited
-output available within the system.  Virtual displays can be used to record the
+<p>SurfaceFlinger supports a primary display (i.e., the display built into your
+phone or tablet), an external display (such as a television connected through
+HDMI), and a number of virtual displays that can make composited
+output available within the system. Virtual displays can be used to record the
 screen or send it over a network.</p>
 
 <p>Virtual displays may share the same set of layers as the main display
-(the "layer stack") or have its own set.  There is no VSYNC for a virtual
+(the layer stack) or have their own set. There is no VSYNC for a virtual
 display, so the VSYNC for the primary display is used to trigger composition for
 all displays.</p>
 
-<p>In the past, virtual displays were always composited with GLES.  The Hardware
-Composer managed composition for only the primary display.  In Android 4.4, the
+<p>In the past, virtual displays were always composited with GLES; the Hardware
+Composer managed composition for only the primary display. In Android 4.4, the
 Hardware Composer gained the ability to participate in virtual display
 composition.</p>
 
-<p>As you might expect, the frames generated for a virtual display are written to a
-BufferQueue.</p>
+<p>As you might expect, the frames generated for a virtual display are written
+to a BufferQueue.</p>
 
 <h3 id="screenrecord">Case study: screenrecord</h3>
 
diff --git a/src/devices/graphics/implement-hwc.jd b/src/devices/graphics/implement-hwc.jd
new file mode 100644
index 0000000..63387d9
--- /dev/null
+++ b/src/devices/graphics/implement-hwc.jd
@@ -0,0 +1,318 @@
+page.title=Implementing the Hardware Composer HAL
+@jd:body
+
+<!--
+    Copyright 2016 The Android Open Source Project
+
+    Licensed under the Apache License, Version 2.0 (the "License");
+    you may not use this file except in compliance with the License.
+    You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<div id="qv-wrapper">
+  <div id="qv">
+    <h2>In this document</h2>
+    <ol id="auto-toc">
+    </ol>
+  </div>
+</div>
+
+
+<p>The Hardware Composer HAL (HWC) is used by SurfaceFlinger to composite
+surfaces to the screen. The HWC abstracts objects such as overlays and 2D
+blitters and helps offload some work that would normally be done with OpenGL.</p>
+
+<p>Android 7.0 includes a new version of HWC (HWC2) used by SurfaceFlinger to
+talk to specialized window composition hardware. SurfaceFlinger contains a
+fallback path that uses the 3D graphics processor (GPU) to perform the task of
+window composition, but this path is not ideal for a couple of reasons:</p>
+
+<ul>
+  <li>Typically, GPUs are not optimized for this use case and may use more power
+  than necessary to perform composition.</li>
+  <li>Any time SurfaceFlinger is using the GPU for composition is time that
+  applications cannot use the GPU for their own rendering, so it is
+  preferable to use specialized hardware for composition instead of the GPU
+  whenever possible.</li>
+</ul>
+
+<h2 id="guidance">General guidance</h2>
+
+<p>As the physical display hardware behind the Hardware Composer abstraction
+layer can vary from device to device, it's difficult to give recommendations on
+specific features. In general, use the following guidance:</p>
+
+<ul>
+  <li>The HWC should support at least four overlays (status bar, system bar,
+  application, and wallpaper/background).</li>
+  <li>Layers can be bigger than the screen, so the HWC should be able to handle
+  layers that are larger than the display (for example, a wallpaper).</li>
+  <li>Pre-multiplied per-pixel alpha blending and per-plane alpha blending
+  should be supported at the same time.</li>
+  <li>The HWC should be able to consume the same buffers the GPU, camera, and
+  video decoder are producing, so supporting some of the following
+  properties is helpful:
+  <ul>
+    <li>RGBA packing order</li>
+    <li>YUV formats</li>
+    <li>Tiling, swizzling, and stride properties</li>
+  </ul></li>
+  <li>To support protected content, a hardware path for protected video playback
+  must be present.</li>
+</ul>
+
+<p>The general recommendation is to implement a non-operational HWC first; after
+the structure is complete, implement a simple algorithm to delegate composition
+to the HWC (for example, delegate only the first three or four surfaces to the
+overlay hardware of the HWC).</p>
+
+<p>Next, focus on optimizations, such as intelligently selecting the surfaces
+to send to the overlay hardware in a way that maximizes the load taken off the
+GPU. Another
+optimization is to detect whether the screen is updating; if it isn't, delegate
+composition to OpenGL instead of the HWC to save power. When the screen updates
+again, continue to offload composition to the HWC.</p>
+
+<p>Prepare for common use cases, such as:</p>
+
+<ul>
+  <li>Full-screen games in portrait and landscape mode</li>
+  <li>Full-screen video with closed captioning and playback control</li>
+  <li>The home screen (compositing the status bar, system bar, application
+  window, and live wallpapers)</li>
+  <li>Protected video playback</li>
+  <li>Multiple display support</li>
+</ul>
+
+<p>These use cases should address regular, predictable uses rather than edge
+cases that are rarely encountered (otherwise, optimizations will have little
+benefit). Implementations must balance two competing goals: animation smoothness
+and interaction latency.</p>
+
+
+<h2 id="interface_activities">HWC2 interface activities</h2>
+
+<p>HWC2 provides a few primitives (layer, display) to represent composition work
+and its interaction with the display hardware.</p>
+<p>A <em>layer</em> is the most important unit of composition; every layer has a
+set of properties that define how it interacts with other layers. Property
+categories include the following:</p>
+
+<ul>
+<li><strong>Positional</strong>. Defines where the layer appears on its display.
+Includes information such as the positions of a layer's edges and its <em>Z
+order</em> relative to other layers (whether it should be in front of or behind
+other layers).</li>
+<li><strong>Content</strong>. Defines how content displayed on the layer should
+be presented within the bounds defined by the positional properties. Includes
+information such as crop (to expand a portion of the content to fill the bounds
+of the layer) and transform (to show rotated or flipped content).</li>
+<li><strong>Composition</strong>. Defines how the layer should be composited
+with other layers. Includes information such as blending mode and a layer-wide
+alpha value for
+<a href="https://en.wikipedia.org/wiki/Alpha_compositing#Alpha_blending">alpha
+compositing</a>.</li>
+<li><strong>Optimization</strong>. Provides information not strictly necessary
+to correctly composite the layer, but which can be used by the HWC device to
+optimize how it performs composition. Includes information such as the visible
+region of the layer and which portion of the layer has been updated since the
+previous frame.</li>
+</ul>
+
+<p>A <em>display</em> is another important unit of composition. Every layer can
+be present on only one display. A system can have multiple displays, and
+displays can be added or removed during normal system operations. This
+addition/removal can come at the request of the HWC device (typically in
+response to an external display being plugged into or removed from the device,
+called <em>hotplugging</em>), or at the request of the client, which permits the
+creation of <em>virtual displays</em> whose contents are rendered into an
+off-screen buffer instead of to a physical display.</p>
+<p>HWC2 provides functions to determine the properties of a given display, to
+switch between different configurations (e.g., 4k or 1080p resolution) and color
+modes (e.g., native color or true sRGB), and to turn the display on, off, or
+into a low-power mode if supported.</p>
+<p>In addition to layers and displays, HWC2 also provides control over the
+hardware vertical sync (VSYNC) signal along with a callback into the client to
+notify it when a VSYNC event has occurred.</p>
+
+<h3 id="func_pointers">Function pointers</h3>
+<p>In this section and in HWC2 header comments, HWC interface functions are
+referred to by lowerCamelCase names that do not actually exist in the interface
+as named fields. Instead, almost every function is loaded by requesting a
+function pointer using <code>getFunction</code> provided by
+<code>hwc2_device_t</code>. For example, the function <code>createLayer</code>
+is a function pointer of type <code>HWC2_PFN_CREATE_LAYER</code>, which is
+returned when the enumerated value <code>HWC2_FUNCTION_CREATE_LAYER</code> is
+passed into <code>getFunction</code>.</p>
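+<p>As an illustrative sketch (assuming a valid <code>hwc2_device_t*</code>
+named <code>device</code>), a client such as SurfaceFlinger might load
+<code>createLayer</code> as follows:</p>
+
+<pre>
+#include &lt;hardware/hwcomposer2.h&gt;
+
+/* Request the createLayer entry point from the device; the device
+ * returns NULL if it does not provide this function. */
+HWC2_PFN_CREATE_LAYER createLayer = (HWC2_PFN_CREATE_LAYER)
+        device-&gt;getFunction(device, HWC2_FUNCTION_CREATE_LAYER);
+</pre>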
+<p>For detailed documentation on functions (including functions required for
+every HWC2 implementation), refer to the HWC2 header.</p>
+
+<h3 id="layer_display_handles">Layer and display handles</h3>
+<p>Layers and displays are manipulated by opaque handles.</p>
+<p>When SurfaceFlinger wants to create a new layer, it calls the
+<code>createLayer</code> function, which then returns an opaque handle of type
+<code>hwc2_layer_t</code>. From that point on, any time SurfaceFlinger wants to
+modify a property of that layer, it passes that <code>hwc2_layer_t</code> value
+into the appropriate modification function, along with any other information
+needed to make the modification. The <code>hwc2_layer_t</code> type was made
+large enough to be able to hold either a pointer or an index, and it will be
+treated as opaque by SurfaceFlinger to provide HWC implementers maximum
+flexibility.</p>
+<p>Most of the above also applies to display handles, though handles are created
+differently depending on whether they are hotplugged (where the handle is passed
+through the hotplug callback) or requested by the client as a virtual display
+(where the handle is returned from <code>createVirtualDisplay</code>).</p>
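+<p>A minimal sketch of handle usage, assuming the <code>createLayer</code>
+pointer loaded above and a display handle <code>display</code> received through
+the hotplug callback:</p>
+
+<pre>
+hwc2_layer_t layer;
+int32_t err = createLayer(device, display, &amp;layer);
+if (err == HWC2_ERROR_NONE) {
+    /* The opaque handle is passed back into modification functions,
+     * e.g. the setLayerBuffer or setLayerZOrder function pointers. */
+}
+</pre>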
+
+<h2 id="display_comp_ops">Display composition operations</h2>
+<p>Once per hardware VSYNC, SurfaceFlinger wakes if it has new content to
+composite. This new content could be new image buffers from applications or just
+a change in the properties of one or more layers. When it wakes, it performs the
+following steps:</p>
+
+<ol>
+<li>Apply transactions, if present. Includes changes in the properties of layers
+specified by the window manager but not changes in the contents of layers (i.e.,
+graphic buffers from applications).</li>
+<li>Latch new graphic buffers (acquire their handles from their respective
+applications), if present.</li>
+<li>If step 1 or 2 resulted in a change to the display contents, perform a new
+composition (described below).</li>
+</ol>
+
+<p>Steps 1 and 2 have some nuances (such as deferred transactions and
+presentation timestamps) that are outside the scope of this section. However,
+step 3 involves the HWC interface and is detailed below.</p>
+<p>At the beginning of the composition process, SurfaceFlinger will create and
+destroy layers or modify layer state as applicable. It will also update the
+layers with their current contents, using calls such as
+<code>setLayerBuffer</code> or <code>setLayerColor</code>. After all layers have
+been updated, it will call <code>validateDisplay</code>, which tells the device
+to examine the state of the various layers and determine how composition will
+proceed. By default, SurfaceFlinger usually attempts to configure every layer
+such that it will be composited by the device, though there may be some
+circumstances where it mandates that certain layers be composited by the
+client.</p>
+<p>After the call to <code>validateDisplay</code>, SurfaceFlinger will follow up
+with a call to <code>getChangedCompositionTypes</code> to see if the device
+wants any of the layers' composition types changed before performing the actual
+composition. SurfaceFlinger may choose to:</p>
+
+<ul>
+<li>Change some of the layer composition types and re-validate the display.</li>
+</ul>
+
+<blockquote><strong><em>OR</em></strong></blockquote>
+
+<ul>
+<li>Call <code>acceptDisplayChanges</code>, which has the same effect as
+changing the composition types as requested by the device and re-validating
+without actually having to call <code>validateDisplay</code> again.</li>
+</ul>
+
+<p>In practice, SurfaceFlinger always takes the latter path (calling
+<code>acceptDisplayChanges</code>), though this behavior may change in the
+future.</p>
+<p>At this point, the behavior differs depending on whether any of the layers
+have been marked for client composition. If any (or all) layers have been marked
+for client composition, SurfaceFlinger will now composite all of those layers
+into the client target buffer. This buffer will be provided to the device using
+the <code>setClientTarget</code> call so that it may be either displayed
+directly on the screen or further composited with layers that have not been
+marked for client composition. If no layers have been marked for client
+composition, then the client composition step is bypassed.</p>
+<p>Finally, after all of the state has been validated and client composition has
+been performed if needed, SurfaceFlinger will call <code>presentDisplay</code>.
+This is the HWC device's cue to complete the composition process and display the
+final result.</p>
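+<p>The flow above can be summarized with the following sketch (error handling
+omitted; all function names are pointers loaded via <code>getFunction</code>,
+and the buffer and fence variables are placeholders):</p>
+
+<pre>
+/* Update layer contents and properties. */
+setLayerBuffer(device, display, layer, buffer, acquireFence);
+
+/* Ask the device how composition will proceed. */
+uint32_t numTypes = 0, numRequests = 0;
+validateDisplay(device, display, &amp;numTypes, &amp;numRequests);
+
+if (numTypes &gt; 0) {
+    /* The device wants some composition types changed; accept them. */
+    acceptDisplayChanges(device, display);
+}
+
+/* If any layers were marked for client composition, composite them into
+ * the client target buffer and hand that buffer to the device. */
+setClientTarget(device, display, clientTarget, targetAcquireFence,
+        dataspace, damage);
+
+/* Cue the device to complete composition and display the result. */
+int32_t retireFence = -1;
+presentDisplay(device, display, &amp;retireFence);
+</pre>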
+
+<h2 id="multiple_displays">Multiple displays in Android N</h2>
+<p>While the HWC2 interface is quite flexible when it comes to the number of
+displays in the system, the rest of the Android framework is not yet as
+flexible. When designing an HWC2 implementation intended for use on Android N,
+there are some additional restrictions not present in the HWC definition itself:
+</p>
+
+<ul>
+<li>It is assumed that there is exactly one <em>primary</em> display; that is,
+that there is one physical display that will be hotplugged immediately during
+the initialization of the device (specifically after the hotplug callback is
+registered).</li>
+<li>In addition to the primary display, exactly one <em>external</em> display
+may be hotplugged during normal operation of the device.</li>
+</ul>
+
+<p>While the SurfaceFlinger operations described above are performed per-display
+(the eventual goal is to composite displays independently of each other),
+they are currently performed sequentially for all active displays, even if only
+the contents of one display are updated.</p>
+<p>For example, if only the external display is updated, the sequence is:</p>
+
+<pre>
+// Update state for internal display
+// Update state for external display
+validateDisplay(&lt;internal display&gt;)
+validateDisplay(&lt;external display&gt;)
+presentDisplay(&lt;internal display&gt;)
+presentDisplay(&lt;external display&gt;)
+</pre>
+
+
+<h2 id="sync_fences">Synchronization fences</h2>
+<p>Synchronization (sync) fences are a crucial aspect of the Android graphics
+system. Fences allow CPU work to proceed independently from concurrent GPU work,
+blocking only when there is a true dependency.</p>
+<p>For example, when an application submits a buffer that is being produced on
+the GPU, it will also submit a fence object; this fence signals only when the
+GPU has finished writing into the buffer. Since the only part of the system that
+truly needs the GPU write to have finished is the display hardware (the hardware
+abstracted by the HWC HAL), the graphics pipeline is able to pass this fence
+along with the buffer through SurfaceFlinger to the HWC device. Only immediately
+before that buffer would be displayed does the device need to actually check
+that the fence has signaled.</p>
+<p>Sync fences are integrated tightly into HWC2 and organized in the following
+categories:</p>
+
+<ol>
+<li>Acquire fences are passed along with input buffers to the
+<code>setLayerBuffer</code> and <code>setClientTarget</code> calls. These
+represent a pending write into the buffer and must signal before the HWC client
+or device attempts to read from the associated buffer to perform composition.
+</li>
+<li>Release fences are retrieved after the call to <code>presentDisplay</code>
+using the <code>getReleaseFences</code> call and are passed back to the
+application along with buffers that will be replaced during the next
+composition. These represent a pending read from the buffer, and must signal
+before the application attempts to write new contents into the buffer.</li>
+<li>Retire fences are returned, one per frame, as part of the call to
+<code>presentDisplay</code> and represent when the composition of this frame
+has completed, or alternately, when the composition result of the prior frame is
+no longer needed. For physical displays, this is when the current frame appears
+on the screen and can also be interpreted as the time after which it is safe to
+write to the client target buffer again (if applicable). For virtual displays,
+this is the time when it is safe to read from the output buffer.</li>
+</ol>
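+<p>A sketch of fence handling around <code>presentDisplay</code> (again
+assuming function pointers loaded via <code>getFunction</code>); release fences
+are queried with NULL output arrays first to learn the element count:</p>
+
+<pre>
+int32_t retireFence = -1;
+presentDisplay(device, display, &amp;retireFence);
+
+/* First call returns the number of elements; second call fills them. */
+uint32_t numElements = 0;
+getReleaseFences(device, display, &amp;numElements, NULL, NULL);
+hwc2_layer_t layers[numElements];
+int32_t fences[numElements];
+getReleaseFences(device, display, &amp;numElements, layers, fences);
+/* Pass each release fence back to the corresponding buffer producer. */
+</pre>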
+
+<h3 id="hwc2_changes">Changes in HWC2</h3>
+<p>The meaning of sync fences in HWC 2.0 has changed significantly relative to
+previous versions of the HAL.</p>
+<p>In HWC v1.x, the release and retire fences were speculative. A release fence
+for a buffer or a retire fence for the display retrieved in frame N would not
+signal any sooner than frame N + 1. In other words, the meaning of the fence
+was "the content of the buffer you provided for frame N is no longer needed."
+This is speculative because in theory SurfaceFlinger may not run again after
+frame N for an indeterminate period of time, which would leave those fences
+unsignaled for the same period.</p>
+<p>In HWC 2.0, release and retire fences are non-speculative. A release or
+retire fence retrieved in frame N will signal as soon as the content of the
+associated buffers replaces the contents of the buffers from frame N - 1, or in
+other words, the meaning of the fence is "the content of the buffer you provided
+for frame N has now replaced the previous content." This is non-speculative,
+since this fence should signal shortly after <code>presentDisplay</code> is
+called, as soon as the hardware presents this frame's content.</p>
+<p>For implementation details, refer to the HWC2 header.</p>
diff --git a/src/devices/graphics/implement-vdisplays.jd b/src/devices/graphics/implement-vdisplays.jd
new file mode 100644
index 0000000..177a79f
--- /dev/null
+++ b/src/devices/graphics/implement-vdisplays.jd
@@ -0,0 +1,81 @@
+page.title=Implementing Virtual Displays
+@jd:body
+
+<!--
+    Copyright 2016 The Android Open Source Project
+
+    Licensed under the Apache License, Version 2.0 (the "License");
+    you may not use this file except in compliance with the License.
+    You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<div id="qv-wrapper">
+  <div id="qv">
+    <h2>In this document</h2>
+    <ol id="auto-toc">
+    </ol>
+  </div>
+</div>
+
+<p>Android added platform support for virtual displays in Hardware Composer
+v1.3 (this support can be used by Miracast). Virtual display composition is
+similar to physical display composition: Input layers are described in
+<code>prepare()</code>, SurfaceFlinger conducts GPU composition, and the layers
+and GPU framebuffer are provided to Hardware Composer in <code>set()</code>.</p>
+
+<p>Instead of the output going to the screen, it is sent to a gralloc buffer.
+Hardware Composer writes output to a buffer and provides the completion fence.
+The buffer is sent to an arbitrary consumer: video encoder, GPU, CPU, etc.
+Virtual displays can use 2D/blitter or overlays if the display pipeline can
+write to memory.</p>
+
+<h2 id=modes>Modes</h2>
+
+<p>Each frame is in one of three modes after <code>prepare()</code>:</p>
+
+<ul>
+<li><em>GLES</em>. All layers are composited by the GPU, which writes directly
+to the output buffer while Hardware Composer does nothing. This is equivalent
+to virtual display composition with Hardware Composer versions older than
+v1.3.</li>
+<li><em>MIXED</em>. The GPU composites some layers into the framebuffer, and
+Hardware Composer composites the framebuffer with the remaining layers. The GPU
+writes to a scratch buffer (the framebuffer); Hardware Composer reads the
+scratch buffer and writes to the output buffer. Buffers may have different
+formats, e.g., RGBA and YCbCr.</li>
+<li><em>HWC</em>. All layers are composited by Hardware Composer, which writes
+directly to the output buffer.</li>
+</ul>
+
+<h2 id=output_format>Output format</h2>
+<p>Output format depends on the mode:</p>
+
+<ul>
+<li><em>MIXED and HWC modes</em>. If the consumer needs CPU access, the
+consumer chooses the format. Otherwise, the format is IMPLEMENTATION_DEFINED,
+and gralloc chooses the best format based on the usage flags (for example, a
+YCbCr format if the consumer is a video encoder and Hardware Composer can write
+that format efficiently), as sketched below.</li>
+<li><em>GLES mode</em>. The EGL driver chooses the output buffer format in
+<code>dequeueBuffer()</code>, typically RGBA8888. The consumer must be able to
+accept this format.</li>
+</ul>
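+<p>As an illustrative sketch (the exact flags depend on the actual consumer), a
+virtual display output buffer destined for a video encoder might be allocated
+with gralloc usage flags such as:</p>
+
+<pre>
+/* Gralloc picks a format (e.g., a YCbCr variant) suited to these flags. */
+int usage = GRALLOC_USAGE_HW_COMPOSER | GRALLOC_USAGE_HW_VIDEO_ENCODER;
+int format = HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED;
+</pre>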
+
+<h2 id=egl_requirement>EGL requirement</h2>
+
+<p>Hardware Composer v1.3 virtual displays require that
+<code>eglSwapBuffers()</code> does not dequeue the next buffer immediately but
+instead defers dequeueing the buffer until rendering begins. Otherwise, EGL
+always owns the next output buffer and SurfaceFlinger can’t get the output
+buffer for Hardware Composer in MIXED/HWC modes.</p>
+
+<p>If Hardware Composer always sends all virtual display layers to GPU, all
+frames will be in GLES mode. Although not recommended, you may use this
+method if you need to support Hardware Composer v1.3 for some other reason but
+can’t conduct virtual display composition.</p>
diff --git a/src/devices/graphics/implement-vsync.jd b/src/devices/graphics/implement-vsync.jd
new file mode 100644
index 0000000..3db2a51
--- /dev/null
+++ b/src/devices/graphics/implement-vsync.jd
@@ -0,0 +1,394 @@
+page.title=Implementing VSYNC
+@jd:body
+
+<!--
+    Copyright 2016 The Android Open Source Project
+
+    Licensed under the Apache License, Version 2.0 (the "License");
+    you may not use this file except in compliance with the License.
+    You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<div id="qv-wrapper">
+  <div id="qv">
+    <h2>In this document</h2>
+    <ol id="auto-toc">
+    </ol>
+  </div>
+</div>
+
+
+<p>VSYNC synchronizes certain events to the refresh cycle of the display.
+Applications always start drawing on a VSYNC boundary, and SurfaceFlinger
+always composites on a VSYNC boundary. This eliminates stutters and improves
+visual performance of graphics.</p>
+
+<p>The Hardware Composer (HWC) has a function pointer indicating the function
+to implement for VSYNC:</p>
+
+<pre class=prettyprint>int (*waitForVsync)(int64_t *timestamp)</pre>
+
+<p>This function blocks until a VSYNC occurs and returns the timestamp of the
+actual VSYNC. A message must be sent every time VSYNC occurs. A client can
+receive a VSYNC timestamp once, at specified intervals, or continuously (at
+intervals of 1). You must implement VSYNC with a maximum 1 ms lag (0.5 ms or
+less is recommended); timestamps returned must be extremely accurate.</p>
+
+<h2 id=explicit_synchronization>Explicit synchronization</h2>
+
+<p>Explicit synchronization is required and provides a mechanism for Gralloc
+buffers to be acquired and released in a synchronized way. Explicit
+synchronization allows producers and consumers of graphics buffers to signal
+when they are done with a buffer. This allows Android to asynchronously queue
+buffers to be read or written with the certainty that another consumer or
+producer does not currently need them. For details, see
+<a href="{@docRoot}devices/graphics/index.html#synchronization_framework">Synchronization
+framework</a>.</p>
+
+<p>The benefits of explicit synchronization include less behavior variation
+between devices, better debugging support, and improved testing metrics. For
+instance, the sync framework output readily identifies problem areas and root
+causes, and centralized SurfaceFlinger presentation timestamps show when events
+occur in the normal flow of the system.</p>
+
+<p>This communication is facilitated by the use of synchronization fences,
+which are required when requesting a buffer for consuming or producing. The
+synchronization framework consists of three main building blocks:
+<code>sync_timeline</code>, <code>sync_pt</code>, and <code>sync_fence</code>.</p>
+
+<h3 id=sync_timeline>sync_timeline</h3>
+
+<p>A <code>sync_timeline</code> is a monotonically increasing timeline that
+should be implemented for each driver instance, such as a GL context, display
+controller, or 2D blitter. This is essentially a counter of jobs submitted to
+the kernel for a particular piece of hardware. It provides guarantees about the
+order of operations and allows hardware-specific implementations.</p>
+
+<p>The <code>sync_timeline</code> has a CPU-only reference implementation
+called <code>sw_sync</code> (software sync). If possible, use
+<code>sw_sync</code> instead of a custom <code>sync_timeline</code> to save
+resources and avoid complexity. If you’re not employing a hardware resource,
+<code>sw_sync</code> should be sufficient.</p>
+
+<p>If you must implement a <code>sync_timeline</code>, use the
+<code>sw_sync</code> driver as a starting point. Follow these guidelines:</p>
+
+<ul>
+<li>Provide useful names for all drivers, timelines, and fences. This simplifies
+debugging.</li>
+<li>Implement <code>timeline_value_str</code> and <code>pt_value_str</code>
+operators in your timelines to make debugging output more readable.</li>
+<li>If you want your userspace libraries (such as the GL library) to have access
+to the private data of your timelines, implement the
+<code>fill_driver_data</code> operator. This lets you get information about the
+immutable <code>sync_fence</code> and <code>sync_pts</code> so you can build
+command lines based upon them.</li>
+</ul>
+
+<p>When implementing a <code>sync_timeline</code>, <strong>do not</strong>:</p>
+
+<ul>
+<li>Base it on any real view of time, such as when a wall clock or other piece
+of work might finish. It is better to create an abstract timeline that you can
+control.</li>
+<li>Allow userspace to explicitly create or signal a fence. This can result in
+one piece of the user pipeline creating a denial-of-service attack that halts
+all functionality. This is because the userspace cannot make promises on behalf
+of the kernel.</li>
+<li>Access <code>sync_timeline</code>, <code>sync_pt</code>, or
+<code>sync_fence</code> elements explicitly, as the API should provide all
+required functions.</li>
+</ul>
+
+<h3 id=sync_pt>sync_pt</h3>
+
+<p>A <code>sync_pt</code> is a single value or point on a
+<code>sync_timeline</code>. A point has three states: active, signaled, and
+error. Points start in the active state and transition to the signaled or error
+states. For instance, when a buffer is no longer needed by an image consumer,
+the <code>sync_pt</code> is signaled so image producers know it is okay to
+write into the buffer again.</p>
+
+<h3 id=sync_fence>sync_fence</h3>
+
+<p>A <code>sync_fence</code> is a collection of <code>sync_pts</code> that often
+have different <code>sync_timeline</code> parents (such as for the display
+controller and GPU). These are the main primitives over which drivers and
+userspace communicate their dependencies. A fence is a promise from the kernel,
+given upon accepting queued work, that the work will complete in a finite
+amount of time.</p>
+
+<p>Fences allow multiple consumers or producers to signal they are using a
+buffer, and allow this information to be communicated with one function
+parameter. Fences are backed by a file descriptor and can be passed from
+kernel space to userspace. For instance, a fence can contain two
+<code>sync_pts</code> that signify when two separate image consumers are done
+reading a buffer. When the fence is signaled, the image producers know both
+consumers are done consuming.</p>
+
+<p>Fences, like <code>sync_pts</code>, start active and then change state based
+upon the state of their points. If all <code>sync_pts</code> become signaled,
+the <code>sync_fence</code> becomes signaled. If one <code>sync_pt</code> falls
+into an error state, the entire <code>sync_fence</code> has an error state.</p>
+
+<p>Membership in the <code>sync_fence</code> is immutable after the fence is
+created. As a <code>sync_pt</code> can be in only one fence, it is included as a
+copy. Even if two points have the same value, there will be two copies of the
+<code>sync_pt</code> in the fence. To get more than one point in a fence, a
+merge operation is conducted where points from two distinct fences are added to
+a third fence. If one of those points was signaled in the originating fence and
+the other was not, the third fence will also not be in a signaled state.</p>
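+
+<p>A short sketch using the userspace libsync helpers (assuming two valid
+fence file descriptors <code>fd1</code> and <code>fd2</code>):</p>
+
+<pre class=prettyprint>
+#include &lt;sync/sync.h&gt;
+
+/* Merge two fences; the result signals only when both inputs signal. */
+int merged = sync_merge("merged_fence", fd1, fd2);
+
+/* Block for up to 1000 ms waiting for the merged fence to signal. */
+int err = sync_wait(merged, 1000);
+</pre>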
+
+<p>To implement explicit synchronization, provide the following:</p>
+
+<ul>
+<li>A kernel-space driver that implements a synchronization timeline for a
+particular piece of hardware. Drivers that need to be fence-aware are generally
+anything that accesses or communicates with the Hardware Composer. Key files
+include:
+<ul>
+<li>Core implementation:
+<ul>
+ <li><code>kernel/common/include/linux/sync.h</code></li>
+ <li><code>kernel/common/drivers/base/sync.c</code></li>
+</ul></li>
+<li><code>sw_sync</code>:
+<ul>
+ <li><code>kernel/common/include/linux/sw_sync.h</code></li>
+ <li><code>kernel/common/drivers/base/sw_sync.c</code></li>
+</ul></li>
+<li>Documentation at <code>kernel/common/Documentation/sync.txt</code>.</li>
+<li>Library to communicate with the kernel-space in
+ <code>platform/system/core/libsync</code>.</li>
+</ul></li>
+<li>A Hardware Composer HAL module (v1.3 or higher) that supports the new
+synchronization functionality. You must provide the appropriate synchronization
+fences as parameters to the <code>set()</code> and <code>prepare()</code>
+functions in the HAL.</li>
+<li>Two fence-related GL extensions (<code>EGL_ANDROID_native_fence_sync</code>
+and <code>EGL_ANDROID_wait_sync</code>) and fence support in your graphics
+drivers.</li>
+</ul>
+
+<p>For example, to use the API supporting the synchronization function, you
+might develop a display driver that has a display buffer function. Before the
+synchronization framework existed, this function would receive dma-bufs, put
+those buffers on the display, and block while the buffer is visible. For
+example:</p>
+
+<pre class=prettyprint>/*
+ * assumes buf is ready to be displayed.  returns when buffer is no longer on
+ * screen.
+ */
+void display_buffer(struct dma_buf *buf);
+</pre>
+
+<p>With the synchronization framework, the API call is slightly more complex.
+While putting a buffer on display, you associate it with a fence that says when
+the buffer will be ready. You can queue up the work and initiate after the fence
+clears.</p>
+
+<p>In this manner, you are not blocking anything. You immediately return your
+own fence, which is a guarantee of when the buffer will be off of the display.
+As you queue up buffers, the kernel will list dependencies with the
+synchronization framework:</p>
+
+<pre class=prettyprint>/*
+ * will display buf when fence is signaled.  returns immediately with a fence
+ * that will signal when buf is no longer displayed.
+ */
+struct sync_fence* display_buffer(struct dma_buf *buf, struct sync_fence
+*fence);
+</pre>
+
+
+<h2 id=sync_integration>Sync integration</h2>
+<p>This section explains how to integrate the low-level sync framework with
+different parts of the Android framework and the drivers that must communicate
+with one another.</p>
+
+<h3 id=integration_conventions>Integration conventions</h3>
+
+<p>The Android HAL interfaces for graphics follow consistent conventions so
+when file descriptors are passed across a HAL interface, ownership of the file
+descriptor is always transferred. This means:</p>
+
+<ul>
+<li>If you receive a fence file descriptor from the sync framework, you must
+close it.</li>
+<li>If you return a fence file descriptor to the sync framework, the framework
+will close it.</li>
+<li>To continue using the fence file descriptor, you must duplicate the
+descriptor.</li>
+</ul>
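+
+<p>For example, a consumer that must keep using a fence file descriptor it is
+about to hand back across a HAL boundary might duplicate it first (a sketch):</p>
+
+<pre class=prettyprint>
+#include &lt;unistd.h&gt;
+
+int keepFd = dup(fenceFd);   /* private copy for continued local use */
+/* fenceFd is returned across the interface; the recipient closes it. */
+/* ... use keepFd, then close it when done ... */
+close(keepFd);
+</pre>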
+
+<p>Every time a fence passes through BufferQueue (such as when a window passes
+a fence to BufferQueue saying when its new contents will be ready), the fence
+object is renamed. Since kernel fence support allows fences to have strings for
+names, the sync framework uses the window name and buffer index being queued to
+name the fence (for example, <code>SurfaceView:0</code>). This is helpful in
+debugging to identify the source of a deadlock, as the names appear in the
+output of <code>/d/sync</code> and in bug reports.</p>
+
+<h3 id=anativewindow_integration>ANativeWindow integration</h3>
+
+<p>ANativeWindow is fence-aware; <code>dequeueBuffer</code>,
+<code>queueBuffer</code>, and <code>cancelBuffer</code> all have fence
+parameters.</p>
+
+<h3 id=opengl_es_integration>OpenGL ES integration</h3>
+
+<p>OpenGL ES sync integration relies upon two EGL extensions:</p>
+
+<ul>
+<li><code>EGL_ANDROID_native_fence_sync</code>. Provides a way to either
+wrap or create native Android fence file descriptors in EGLSyncKHR objects.</li>
+<li><code>EGL_ANDROID_wait_sync</code>. Allows stalls on the GPU rather than
+the CPU by making the GPU wait for an EGLSyncKHR. This is essentially the same
+as the <code>EGL_KHR_wait_sync</code> extension (refer to that specification
+for details).</li>
+
+<p>These extensions can be used independently and are controlled by a compile
+flag in libgui. To use them, first implement the
+<code>EGL_ANDROID_native_fence_sync</code> extension along with the associated
+kernel support. Next, add ANativeWindow support for fences to your driver,
+then turn on support in libgui to make use of the
+<code>EGL_ANDROID_native_fence_sync</code> extension.</p>
+
+<p>In a second pass, enable the <code>EGL_ANDROID_wait_sync</code>
+extension in your driver and turn it on separately. The
+<code>EGL_ANDROID_native_fence_sync</code> extension defines a distinct
+native fence EGLSync object type, so extensions that apply to existing EGLSync
+object types don’t necessarily apply to <code>EGL_ANDROID_native_fence</code>
+objects; this avoids unwanted interactions.</p>
+
+<p>The <code>EGL_ANDROID_native_fence_sync</code> extension employs a
+corresponding native fence file descriptor attribute that can be set only at
+creation time and cannot be queried directly from an existing sync object. This
+attribute can be set to one of two modes:</p>
+
+<ul>
+<li><em>A valid fence file descriptor</em>. Wraps an existing native Android
+fence file descriptor in an EGLSyncKHR object.</li>
+<li><em>-1</em>. Creates a native Android fence file descriptor from an
+EGLSyncKHR object.</li>
+</ul>
+
+<p>The <code>eglDupNativeFenceFDANDROID</code> function call is used to extract
+a native Android fence file descriptor from an EGLSyncKHR object. This has the
+same result as querying the attribute that was set but adheres to the
+convention that the recipient closes the fence (hence the duplicate operation).
+Finally, destroying the EGLSync object should close the internal fence file
+descriptor.</p>
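+
+<p>A sketch of both modes (assuming a current <code>EGLDisplay</code> named
+<code>dpy</code>, a valid fence file descriptor <code>fenceFd</code>, and the
+extension entry points already loaded via <code>eglGetProcAddress</code>):</p>
+
+<pre class=prettyprint>
+#include &lt;EGL/egl.h&gt;
+#include &lt;EGL/eglext.h&gt;
+
+/* Wrap an existing native fence fd; EGL takes ownership of fenceFd. */
+EGLint attribs[] = { EGL_SYNC_NATIVE_FENCE_FD_ANDROID, fenceFd, EGL_NONE };
+EGLSyncKHR inSync = eglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID,
+        attribs);
+
+/* Or create a sync object (the fd attribute defaults to -1) and extract
+ * a dup'd native fence fd; the recipient must close it. */
+EGLSyncKHR outSync = eglCreateSyncKHR(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID,
+        NULL);
+int outFd = eglDupNativeFenceFDANDROID(dpy, outSync);
+</pre>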
+
+<h3 id=hardware_composer_integration>Hardware Composer integration</h3>
+
+<p>The Hardware Composer handles three types of sync fences:</p>
+
+<ul>
+<li><em>Acquire fence</em>. One per layer, set before calling
+<code>HWC::set</code>. It signals when Hardware Composer may read the buffer.</li>
+<li><em>Release fence</em>. One per layer, filled in by the driver in
+<code>HWC::set</code>. It signals when Hardware Composer is done reading the
+buffer so the framework can start using that buffer again for that particular
+layer.</li>
+<li><em>Retire fence</em>. One for the entire frame, filled in by the driver
+each time <code>HWC::set</code> is called. This covers all layers for the set
+operation and signals to the framework when all effects of this set operation
+have completed. The retire fence signals when the next set operation takes place
+on the screen.</li>
+</ul>
+
+<p>The retire fence can be used to determine how long each frame appears on the
+screen. This is useful in identifying the location and source of delays, such
+as a stuttering animation.</p>
+
+<h2 id=vsync_offset>VSYNC offset</h2>
+
+<p>Application and SurfaceFlinger render loops should be synchronized to the
+hardware VSYNC. On a VSYNC event, the display begins showing frame N while
+SurfaceFlinger begins compositing windows for frame N+1. The app handles
+pending input and generates frame N+2.</p>
+
+<p>Synchronizing with VSYNC delivers consistent latency. It reduces errors in
+apps and SurfaceFlinger and the drifting of displays in and out of phase with
+each other. This, however, does assume application and SurfaceFlinger per-frame
+times don’t vary widely. Nevertheless, the latency is at least two frames.</p>
+
+<p>To remedy this, you can employ VSYNC offsets to reduce the input-to-display
+latency by offsetting the application and composition signals relative to hardware
+VSYNC. This is possible because application plus composition usually takes less
+than 33 ms.</p>
+
+<p>The result of VSYNC offset is three signals with the same period but offset
+phases:</p>
+
+<ul>
+<li><code>HW_VSYNC_0</code>. Display begins showing next frame.</li>
+<li><code>VSYNC</code>. App reads input and generates next frame.</li>
+<li><code>SF VSYNC</code>. SurfaceFlinger begins compositing for next frame.</li>
+</ul>
+
+<p>With VSYNC offset, SurfaceFlinger receives the buffer and composites the
+frame, while the application processes the input and renders the frame, all
+within a single frame of time.</p>
+
+<p class="note"><strong>Note:</strong> VSYNC offsets reduce the time available
+for app and composition and therefore provide a greater chance for error.</p>
+
+<h3 id=dispsync>DispSync</h3>
+
+<p>DispSync maintains a model of the periodic hardware-based VSYNC events of a
+display and uses that model to execute periodic callbacks at specific phase
+offsets from the hardware VSYNC events.</p>
+
+<p>DispSync is essentially a software phase-locked loop (PLL) that generates
+the VSYNC and SF VSYNC signals used by Choreographer and SurfaceFlinger, even
+if not offset from hardware VSYNC.</p>
+
+<img src="images/dispsync.png" alt="DispSync flow">
+
+<p class="img-caption"><strong>Figure 1.</strong> DispSync flow</p>
+
+<p>DispSync has the following qualities:</p>
+
+<ul>
+<li><em>Reference</em>. HW_VSYNC_0.</li>
+<li><em>Output</em>. VSYNC and SF VSYNC.</li>
+<li><em>Feedback</em>. Retire fence signal timestamps from Hardware Composer.
+</li>
+</ul>
+
+<h3 id=vsync_retire_offset>VSYNC/Retire offset</h3>
+
+<p>The signal timestamp of retire fences must match HW VSYNC even on devices
+that don’t use the offset phase. Otherwise, errors appear more severe than they
+really are. Smart panels often have a delta: The retire fence is the end of
+direct memory access (DMA) to display memory, but the actual display switch and
+HW VSYNC occur some time later.</p>
+
+<p><code>PRESENT_TIME_OFFSET_FROM_VSYNC_NS</code> is set in the device’s
+BoardConfig.mk make file. It is based upon the display controller and panel
+characteristics and represents the time from the retire fence timestamp to the
+HW VSYNC signal, measured in nanoseconds.</p>
+
+<h3 id=vsync_and_sf_vsync_offsets>VSYNC and SF_VSYNC offsets</h3>
+
+<p>The <code>VSYNC_EVENT_PHASE_OFFSET_NS</code> and
+<code>SF_VSYNC_EVENT_PHASE_OFFSET_NS</code> values are set conservatively based on
+high-load use cases, such as partial GPU composition during window transition
+or Chrome scrolling through a webpage containing animations. These offsets
+allow for long application render time and long GPU composition time.</p>
+
+<p>More than a millisecond or two of latency is noticeable. We recommend
+integrating thorough automated error testing to minimize latency without
+significantly increasing error counts.</p>
+
+<p class="note"><strong>Note:</strong> Theses offsets are also configured in the
+device’s BoardConfig.mk file. Both settings are offset in nanoseconds after
+HW_VSYNC_0, default to zero (if not set), and can be negative.</p>
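+
+<p>For example, a device might configure the offsets in its BoardConfig.mk as
+follows (values are illustrative only and must be tuned per device):</p>
+
+<pre>
+# Nanoseconds after HW_VSYNC_0 at which the app and SurfaceFlinger wake.
+VSYNC_EVENT_PHASE_OFFSET_NS := 7500000
+SF_VSYNC_EVENT_PHASE_OFFSET_NS := 5000000
+PRESENT_TIME_OFFSET_FROM_VSYNC_NS := 0
+</pre>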
diff --git a/src/devices/graphics/implement-vulkan.jd b/src/devices/graphics/implement-vulkan.jd
new file mode 100644
index 0000000..d69ec4b
--- /dev/null
+++ b/src/devices/graphics/implement-vulkan.jd
@@ -0,0 +1,34 @@
+page.title=Implementing Vulkan
+@jd:body
+
+<!--
+    Copyright 2016 The Android Open Source Project
+
+    Licensed under the Apache License, Version 2.0 (the "License");
+    you may not use this file except in compliance with the License.
+    You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+<div id="qv-wrapper">
+  <div id="qv">
+    <h2>In this document</h2>
+    <ol id="auto-toc">
+    </ol>
+  </div>
+</div>
+
+
+<p>Vulkan is a low-overhead, cross-platform API for high-performance 3D graphics.
+Like OpenGL ES, Vulkan provides tools for creating high-quality, real-time
+graphics in applications. Vulkan advantages include reductions in CPU overhead
+and support for the SPIR-V Binary Intermediate language.</p>
+
+<p>Details coming soon!</p>
diff --git a/src/devices/graphics/implement.jd b/src/devices/graphics/implement.jd
index 3f3654a..178f4b8 100644
--- a/src/devices/graphics/implement.jd
+++ b/src/devices/graphics/implement.jd
@@ -26,580 +26,140 @@
 </div>
 
 
-<p>Follow the instructions here to implement the Android graphics HAL.</p>
+<p>To implement the Android graphics HAL, review the following requirements,
+implementation details, and testing advice.</p>
 
 <h2 id=requirements>Requirements</h2>
 
-<p>The following list and sections describe what you need to provide to support
-graphics in your product:</p>
+<p>Android graphics support requires the following components:</p>
 
-<ul> <li> OpenGL ES 1.x Driver <li> OpenGL ES 2.0 Driver <li> OpenGL ES 3.0
-Driver (optional) <li> EGL Driver <li> Gralloc HAL implementation <li> Hardware
-Composer HAL implementation <li> Framebuffer HAL implementation </ul>
+<ul>
+    <li>EGL driver</li>
+    <li>OpenGL ES 1.x driver</li>
+    <li>OpenGL ES 2.0 driver</li>
+    <li>OpenGL ES 3.x driver (optional)</li>
+    <li>Gralloc HAL implementation</li>
+    <li>Hardware Composer HAL implementation</li>
+</ul>
 
 <h2 id=implementation>Implementation</h2>
 
 <h3 id=opengl_and_egl_drivers>OpenGL and EGL drivers</h3>
 
-<p>You must provide drivers for OpenGL ES 1.x, OpenGL ES 2.0, and EGL. Here are
-some key considerations:</p>
+<p>You must provide drivers for EGL, OpenGL ES 1.x, and OpenGL ES 2.0 (support
+for OpenGL ES 3.x is optional). Key considerations include:</p>
 
-<ul> <li> The GL driver needs to be robust and conformant to OpenGL ES
-standards.  <li> Do not limit the number of GL contexts. Because Android allows
-apps in the background and tries to keep GL contexts alive, you should not
-limit the number of contexts in your driver.  <li> It is not uncommon to have
-20-30 active GL contexts at once, so you should also be careful with the amount
-of memory allocated for each context.  <li> Support the YV12 image format and
-any other YUV image formats that come from other components in the system such
-as media codecs or the camera.  <li> Support the mandatory extensions:
-<code>GL_OES_texture_external</code>,
-<code>EGL_ANDROID_image_native_buffer</code>, and
-<code>EGL_ANDROID_recordable</code>. The
-<code>EGL_ANDROID_framebuffer_target</code> extension is required for Hardware
-Composer 1.1 and higher, as well.  <li> We highly recommend also supporting
-<code>EGL_ANDROID_blob_cache</code>, <code>EGL_KHR_fence_sync</code>,
-<code>EGL_KHR_wait_sync</code>, and <code>EGL_ANDROID_native_fence_sync</code>.
-</ul>
+<ul>
+    <li>GL driver must be robust and conformant to OpenGL ES standards.</li>
+    <li>Do not limit the number of GL contexts. Because Android allows apps in
+    the background and tries to keep GL contexts alive, you should not limit the
+    number of contexts in your driver.</li>
+    <li>It is common to have 20-30 active GL contexts at once, so be
+    mindful of the amount of memory allocated for each context.</li>
+    <li>Support the YV12 image format and other YUV image formats that come from
+    other components in the system, such as media codecs or the camera.</li>
+    <li>Support the mandatory extensions: <code>GL_OES_texture_external</code>,
+    <code>EGL_ANDROID_image_native_buffer</code>, and
+    <code>EGL_ANDROID_recordable</code>. In addition, the
+    <code>EGL_ANDROID_framebuffer_target</code> extension is required for
+    Hardware Composer v1.1 and higher.</li>
+</ul>
+<p>We highly recommend also supporting <code>EGL_ANDROID_blob_cache</code>,
+<code>EGL_KHR_fence_sync</code>, <code>EGL_KHR_wait_sync</code>, and <code>EGL_ANDROID_native_fence_sync</code>.</p>
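+
+<p>As a bring-up sanity check, you can verify the driver advertises these
+extensions by querying the EGL extension string. The following minimal sketch
+(the helper name and structure are illustrative, not part of any Android API)
+assumes an initialized <code>EGLDisplay</code>:</p>
+
+<pre class=prettyprint>
+/* Sketch: check the EGL extension string for a given extension.
+ * Assumes eglInitialize() has already succeeded for dpy. */
+#include &lt;string.h&gt;
+#include &lt;EGL/egl.h&gt;
+
+static int has_egl_extension(EGLDisplay dpy, const char *name) {
+    const char *exts = eglQueryString(dpy, EGL_EXTENSIONS);
+    /* Note: strstr() can false-positive on extension-name prefixes; a
+     * production check should match whole, space-delimited tokens. */
+    return exts != NULL &amp;&amp; strstr(exts, name) != NULL;
+}
+
+/* Example: has_egl_extension(dpy, "EGL_ANDROID_native_fence_sync") */
+</pre>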
 
-<p>Note the OpenGL API exposed to app developers is different from the OpenGL
-interface that you are implementing. Apps do not have access to the GL driver
-layer and must go through the interface provided by the APIs.</p>
+<p class="note"><strong>Note</strong>: The OpenGL API exposed to app developers
+differs from the OpenGL implemented on the device. Apps cannot directly access
+the GL driver layer and must go through the interface provided by the APIs.</p>
 
 <h3 id=pre-rotation>Pre-rotation</h3>
 
-<p>Many hardware overlays do not support rotation, and even if they do it costs
-processing power. So the solution is to pre-transform the buffer before it
-reaches SurfaceFlinger. A query hint in <code>ANativeWindow</code> was added
-(<code>NATIVE_WINDOW_TRANSFORM_HINT</code>) that represents the most likely
-transform to be applied to the buffer by SurfaceFlinger. Your GL driver can use
-this hint to pre-transform the buffer before it reaches SurfaceFlinger so when
-the buffer arrives, it is correctly transformed.</p>
+<p>Many hardware overlays do not support rotation (and even when they do, it
+costs processing power); the solution is to pre-transform the buffer before it reaches
+SurfaceFlinger. Android supports a query hint
+(<code>NATIVE_WINDOW_TRANSFORM_HINT</code>) in <code>ANativeWindow</code> to
+represent the most likely transform to be applied to the buffer by
+SurfaceFlinger. GL drivers can use this hint to pre-transform the buffer
+before it reaches SurfaceFlinger so when the buffer arrives, it is correctly
+transformed.</p>
 
-<p>For example, you may receive a hint to rotate 90 degrees. You must generate
-a matrix and apply it to the buffer to prevent it from running off the end of
-the page. To save power, this should be done in pre-rotation. See the
-<code>ANativeWindow</code> interface defined in
-<code>system/core/include/system/window.h</code> for more details.</p>
+<p>For example, when receiving a hint to rotate 90 degrees, generate and apply a
+matrix to the buffer to prevent the content from running off the edge of the
+display. To save power, apply this transform during pre-rotation. For details,
+see the <code>ANativeWindow</code> interface defined in
+<code>system/core/include/system/window.h</code>.</p>
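+
+<p>A minimal sketch of reading the hint follows (the function name is
+illustrative; the <code>query</code> hook and
+<code>NATIVE_WINDOW_TRANSFORM_HINT</code> constant come from
+<code>window.h</code>):</p>
+
+<pre class=prettyprint>
+/* Sketch: ask the window for the transform SurfaceFlinger is likely to
+ * apply, so rendering can be pre-rotated to match. */
+#include &lt;system/window.h&gt;
+
+static void choose_pre_rotation(struct ANativeWindow *win) {
+    int hint = 0;
+    if (win->query(win, NATIVE_WINDOW_TRANSFORM_HINT, &amp;hint) != 0)
+        return; /* no hint available; render without pre-rotation */
+    if (hint &amp; HAL_TRANSFORM_ROT_90) {
+        /* Rotate the projection matrix 90 degrees so the buffer reaches
+         * SurfaceFlinger already in display orientation. */
+    }
+}
+</pre>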
 
 <h3 id=gralloc_hal>Gralloc HAL</h3>
 
-<p>The graphics memory allocator is needed to allocate memory that is requested
-by image producers. You can find the interface definition of the HAL at:
-<code>hardware/libhardware/modules/gralloc.h</code></p>
+<p>The graphics memory allocator allocates memory requested by image producers.
+You can find the interface definition of the HAL at
+<code>hardware/libhardware/include/hardware/gralloc.h</code>.</p>
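+
+<p>As an orientation aid, the allocation entry point a gralloc module provides
+has the following shape (signature from <code>gralloc.h</code>; the body is a
+placeholder sketch, not a reference implementation):</p>
+
+<pre class=prettyprint>
+/* Sketch: the alloc() hook of alloc_device_t. A real implementation picks a
+ * backing store (e.g., ion or a carveout) based on the usage flags, then
+ * fills in *handle and *stride. */
+#include &lt;hardware/gralloc.h&gt;
+
+static int my_gralloc_alloc(alloc_device_t *dev, int w, int h, int format,
+                            int usage, buffer_handle_t *handle, int *stride) {
+    /* ...allocate memory appropriate to format and usage... */
+    return 0; /* 0 on success, negative errno on failure */
+}
+</pre>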
 
 <h3 id=protected_buffers>Protected buffers</h3>
 
 <p>The gralloc usage flag <code>GRALLOC_USAGE_PROTECTED</code> allows the
 graphics buffer to be displayed only through a hardware-protected path. These
-overlay planes are the only way to display DRM content. DRM-protected buffers
-cannot be accessed by SurfaceFlinger or the OpenGL ES driver.</p>
+overlay planes are the only way to display DRM content (DRM-protected buffers
+cannot be accessed by SurfaceFlinger or the OpenGL ES driver).</p>
 
 <p>DRM-protected video can be presented only on an overlay plane. Video players
 that support protected content must be implemented with SurfaceView. Software
-running on unprotected hardware cannot read or write the buffer.
-Hardware-protected paths must appear on the Hardware Composer overlay. For
-instance, protected videos will disappear from the display if Hardware Composer
-switches to OpenGL ES composition.</p>
+running on unprotected hardware cannot read or write the buffer;
+hardware-protected paths must appear on the Hardware Composer overlay (e.g.,
+protected videos will disappear from the display if Hardware Composer switches
+to OpenGL ES composition).</p>
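+
+<p>One practical consequence is that a protected allocation must never be
+combined with CPU access. A hypothetical usage-validation check inside a
+gralloc implementation might look like this:</p>
+
+<pre class=prettyprint>
+/* Sketch: reject allocations that request both a protected buffer and
+ * software (CPU) access, since unprotected software must not map it. */
+#include &lt;errno.h&gt;
+#include &lt;hardware/gralloc.h&gt;
+
+static int validate_usage(int usage) {
+    int sw_access = GRALLOC_USAGE_SW_READ_MASK | GRALLOC_USAGE_SW_WRITE_MASK;
+    if ((usage &amp; GRALLOC_USAGE_PROTECTED) &amp;&amp; (usage &amp; sw_access))
+        return -EINVAL;
+    return 0;
+}
+</pre>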
 
-<p>See the <a href="{@docRoot}devices/drm.html">DRM</a> page for a description
-of protected content.</p>
+<p>For details on protected content, see
+<a href="{@docRoot}devices/drm.html">DRM</a>.</p>
 
 <h3 id=hardware_composer_hal>Hardware Composer HAL</h3>
 
-<p>The Hardware Composer HAL is used by SurfaceFlinger to composite surfaces to
-the screen. The Hardware Composer abstracts objects like overlays and 2D
-blitters and helps offload some work that would normally be done with
-OpenGL.</p>
-
-<p>We recommend you start using version 1.3 of the Hardware Composer HAL as it
-will provide support for the newest features (explicit synchronization,
-external displays, and more). Because the physical display hardware behind the
-Hardware Composer abstraction layer can vary from device to device, it is
-difficult to define recommended features. But here is some guidance:</p>
-
-<ul> <li> The Hardware Composer should support at least four overlays (status
-bar, system bar, application, and wallpaper/background).  <li> Layers can be
-bigger than the screen, so the Hardware Composer should be able to handle
-layers that are larger than the display (for example, a wallpaper).  <li>
-Pre-multiplied per-pixel alpha blending and per-plane alpha blending should be
-supported at the same time.  <li> The Hardware Composer should be able to
-consume the same buffers that the GPU, camera, video decoder, and Skia buffers
-are producing, so supporting some of the following properties is helpful: <ul>
-<li> RGBA packing order <li> YUV formats <li> Tiling, swizzling, and stride
-properties </ul> <li> A hardware path for protected video playback must be
-present if you want to support protected content.  </ul>
-
-<p>The general recommendation when implementing your Hardware Composer is to
-implement a non-operational Hardware Composer first. Once you have the
-structure done, implement a simple algorithm to delegate composition to the
-Hardware Composer. For example, just delegate the first three or four surfaces
-to the overlay hardware of the Hardware Composer.</p>
-
-<p>Focus on optimization, such as intelligently selecting the surfaces to send
-to the overlay hardware that maximizes the load taken off of the GPU. Another
-optimization is to detect whether the screen is updating. If not, delegate
-composition to OpenGL instead of the Hardware Composer to save power. When the
-screen updates again, continue to offload composition to the Hardware
-Composer.</p>
-
-<p>Devices must report the display mode (or resolution). Android uses the first
-mode reported by the device. To support televisions, have the TV device report
-the mode selected for it by the manufacturer to Hardware Composer. See
-hwcomposer.h for more details.</p>
-
-<p>Prepare for common use cases, such as:</p>
-
-<ul> <li> Full-screen games in portrait and landscape mode <li> Full-screen
-video with closed captioning and playback control <li> The home screen
-(compositing the status bar, system bar, application window, and live
-wallpapers) <li> Protected video playback <li> Multiple display support </ul>
-
-<p>These use cases should address regular, predictable uses rather than edge
-cases that are rarely encountered. Otherwise, any optimization will have little
-benefit. Implementations must balance two competing goals: animation smoothness
-and interaction latency.</p>
-
-<p>Further, to make best use of Android graphics, you must develop a robust
-clocking strategy. Performance matters little if clocks have been turned down
-to make every operation slow. You need a clocking strategy that puts the clocks
-at high speed when needed, such as to make animations seamless, and then slows
-the clocks whenever the increased speed is no longer needed.</p>
-
-<p>Use the <code>adb shell dumpsys SurfaceFlinger</code> command to see
-precisely what SurfaceFlinger is doing. See the <a
-href="{@docRoot}devices/graphics/architecture.html#hwcomposer">Hardware
-Composer</a> section of the Architecture page for example output and a
-description of relevant fields.</p>
-
-<p>You can find the HAL for the Hardware Composer and additional documentation
-in: <code>hardware/libhardware/include/hardware/hwcomposer.h
-hardware/libhardware/include/hardware/hwcomposer_defs.h</code></p>
-
-<p>A stub implementation is available in the
-<code>hardware/libhardware/modules/hwcomposer</code> directory.</p>
+<p>The Hardware Composer HAL (HWC) is used by SurfaceFlinger to composite
+surfaces to the screen. It abstracts objects such as overlays and 2D blitters
+and helps offload some work that would normally be done with OpenGL. For details
+on the HWC, see <a href="{@docRoot}devices/graphics/implement-hwc.html">Hardware
+Composer HAL</a>.</p>
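+
+<p>For orientation, the two central entry points SurfaceFlinger drives are
+<code>prepare()</code> and <code>set()</code>; their shapes (from
+<code>hardware/libhardware/include/hardware/hwcomposer.h</code>, HWC v1.x)
+are:</p>
+
+<pre class=prettyprint>
+/* From hwc_composer_device_1: SurfaceFlinger describes candidate layers in
+ * prepare(), then hands over buffers (and sync fences) in set(). */
+int (*prepare)(struct hwc_composer_device_1 *dev,
+               size_t numDisplays, hwc_display_contents_1_t **displays);
+int (*set)(struct hwc_composer_device_1 *dev,
+           size_t numDisplays, hwc_display_contents_1_t **displays);
+</pre>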
 
 <h3 id=vsync>VSYNC</h3>
 
 <p>VSYNC synchronizes certain events to the refresh cycle of the display.
-Applications always start drawing on a VSYNC boundary, and SurfaceFlinger
-always composites on a VSYNC boundary. This eliminates stutters and improves
-visual performance of graphics. The Hardware Composer has a function
-pointer:</p>
-
-<pre class=prettyprint> int (waitForVsync*) (int64_t *timestamp) </pre>
-
-
-<p>This points to a function you must implement for VSYNC. This function blocks
-until a VSYNC occurs and returns the timestamp of the actual VSYNC. A message
-must be sent every time VSYNC occurs. A client can receive a VSYNC timestamp
-once, at specified intervals, or continuously (interval of 1). You must
-implement VSYNC to have no more than a 1ms lag at the maximum (0.5ms or less is
-recommended), and the timestamps returned must be extremely accurate.</p>
-
-<h4 id=explicit_synchronization>Explicit synchronization</h4>
-
-<p>Explicit synchronization is required and provides a mechanism for Gralloc
-buffers to be acquired and released in a synchronized way. Explicit
-synchronization allows producers and consumers of graphics buffers to signal
-when they are done with a buffer. This allows the Android system to
-asynchronously queue buffers to be read or written with the certainty that
-another consumer or producer does not currently need them. See the
-<a href="{@docRoot}devices/graphics/index.html#synchronization_framework">Synchronization
-framework</a> section for an overview of this mechanism.</p>
-
-<p>The benefits of explicit synchronization include less behavior variation
-between devices, better debugging support, and improved testing metrics. For
-instance, the sync framework output readily identifies problem areas and root
-causes. And centralized SurfaceFlinger presentation timestamps show when events
-occur in the normal flow of the system.</p>
-
-<p>This communication is facilitated by the use of synchronization fences,
-which are now required when requesting a buffer for consuming or producing. The
-synchronization framework consists of three main building blocks:
-sync_timeline, sync_pt, and sync_fence.</p>
-
-<h5 id=sync_timeline>sync_timeline</h5>
-
-<p>A sync_timeline is a monotonically increasing timeline that should be
-implemented for each driver instance, such as a GL context, display controller,
-or 2D blitter. This is essentially a counter of jobs submitted to the kernel
-for a particular piece of hardware. It provides guarantees about the order of
-operations and allows hardware-specific implementations.</p>
-
-<p>Please note, the sync_timeline is offered as a CPU-only reference
-implementation called sw_sync (which stands for software sync). If possible,
-use sw_sync instead of a sync_timeline to save resources and avoid complexity.
-If you’re not employing a hardware resource, sw_sync should be sufficient.</p>
-
-<p>If you must implement a sync_timeline, use the sw_sync driver as a starting
-point. Follow these guidelines:</p>
-
-<ul> <li> Provide useful names for all drivers, timelines, and fences. This
-simplifies debugging.  <li> Implement timeline_value str and pt_value_str
-operators in your timelines as they make debugging output much more readable.
-<li> If you want your userspace libraries (such as the GL library) to have
-access to the private data of your timelines, implement the fill driver_data
-operator. This lets you get information about the immutable sync_fence and
-sync_pts so you might build command lines based upon them.  </ul>
-
-<p>When implementing a sync_timeline, <strong>don’t</strong>:</p>
-
-<ul> <li> Base it on any real view of time, such as when a wall clock or other
-piece of work might finish. It is better to create an abstract timeline that
-you can control.  <li> Allow userspace to explicitly create or signal a fence.
-This can result in one piece of the user pipeline creating a denial-of-service
-attack that halts all functionality. This is because the userspace cannot make
-promises on behalf of the kernel.  <li> Access sync_timeline, sync_pt, or
-sync_fence elements explicitly, as the API should provide all required
-functions.  </ul>
-
-<h5 id=sync_pt>sync_pt</h5>
-
-<p>A sync_pt is a single value or point on a sync_timeline. A point has three
-states: active, signaled, and error. Points start in the active state and
-transition to the signaled or error states. For instance, when a buffer is no
-longer needed by an image consumer, this sync_point is signaled so that image
-producers know it is okay to write into the buffer again.</p>
-
-<h5 id=sync_fence>sync_fence</h5>
-
-<p>A sync_fence is a collection of sync_pts that often have different
-sync_timeline parents (such as for the display controller and GPU). These are
-the main primitives over which drivers and userspace communicate their
-dependencies. A fence is a promise from the kernel that it gives upon accepting
-work that has been queued and assures completion in a finite amount of
-time.</p>
-
-<p>This allows multiple consumers or producers to signal they are using a
-buffer and to allow this information to be communicated with one function
-parameter. Fences are backed by a file descriptor and can be passed from
-kernel-space to user-space. For instance, a fence can contain two sync_points
-that signify when two separate image consumers are done reading a buffer. When
-the fence is signaled, the image producers know both consumers are done
-consuming.
-
-Fences, like sync_pts, start active and then change state based upon the state
-of their points. If all sync_pts become signaled, the sync_fence becomes
-signaled. If one sync_pt falls into an error state, the entire sync_fence has
-an error state.
-
-Membership in the sync_fence is immutable once the fence is created. And since
-a sync_pt can be in only one fence, it is included as a copy. Even if two
-points have the same value, there will be two copies of the sync_pt in the
-fence.
-
-To get more than one point in a fence, a merge operation is conducted. In the
-merge, the points from two distinct fences are added to a third fence. If one
-of those points was signaled in the originating fence, and the other was not,
-the third fence will also not be in a signaled state.</p>
-
-<p>To implement explicit synchronization, you need to provide the
-following:</p>
-
-<ul> <li> A kernel-space driver that implements a synchronization timeline for
-a particular piece of hardware. Drivers that need to be fence-aware are
-generally anything that accesses or communicates with the Hardware Composer.
-Here are the key files (found in the android-3.4 kernel branch): <ul> <li> Core
-implementation: <ul> <li> <code>kernel/common/include/linux/sync.h</code> <li>
-<code>kernel/common/drivers/base/sync.c</code> </ul> <li> sw_sync: <ul> <li>
-<code>kernel/common/include/linux/sw_sync.h</code> <li>
-<code>kernel/common/drivers/base/sw_sync.c</code> </ul> <li> Documentation:
-<li> <code>kernel/common//Documentation/sync.txt</code> Finally, the
-<code>platform/system/core/libsync</code> directory includes a library to
-communicate with the kernel-space.  </ul> <li> A Hardware Composer HAL module
-(version 1.3 or later) that supports the new synchronization functionality. You
-will need to provide the appropriate synchronization fences as parameters to
-the set() and prepare() functions in the HAL.  <li> Two GL-specific extensions
-related to fences, <code>EGL_ANDROID_native_fence_sync</code> and
-<code>EGL_ANDROID_wait_sync</code>, along with incorporating fence support into
-your graphics drivers.  </ul>
-
-<p>For example, to use the API supporting the synchronization function, you
-might develop a display driver that has a display buffer function. Before the
-synchronization framework existed, this function would receive dma-bufs, put
-those buffers on the display, and block while the buffer is visible, like
-so:</p>
-
-<pre class=prettyprint>
-/*
- * assumes buf is ready to be displayed.  returns when buffer is no longer on
- * screen.
- */
-void display_buffer(struct dma_buf *buf); </pre>
-
-
-<p>With the synchronization framework, the API call is slightly more complex.
-While putting a buffer on display, you associate it with a fence that says when
-the buffer will be ready. So you queue up the work, which you will initiate
-once the fence clears.</p>
-
-<p>In this manner, you are not blocking anything. You immediately return your
-own fence, which is a guarantee of when the buffer will be off of the display.
-As you queue up buffers, the kernel will list dependencies. With the
-synchronization framework:</p>
-
-<pre class=prettyprint>
-/*
- * will display buf when fence is signaled.  returns immediately with a fence
- * that will signal when buf is no longer displayed.
- */
-struct sync_fence* display_buffer(struct dma_buf *buf, struct sync_fence
-*fence); </pre>
-
-
-<h4 id=sync_integration>Sync integration</h4>
-
-<h5 id=integration_conventions>Integration conventions</h5>
-
-<p>This section explains how to integrate the low-level sync framework with
-different parts of the Android framework and the drivers that need to
-communicate with one another.</p>
-
-<p>The Android HAL interfaces for graphics follow consistent conventions so
-when file descriptors are passed across a HAL interface, ownership of the file
-descriptor is always transferred. This means:</p>
-
-<ul> <li> if you receive a fence file descriptor from the sync framework, you
-must close it.  <li> if you return a fence file descriptor to the sync
-framework, the framework will close it.  <li> if you want to continue using the
-fence file descriptor, you must duplicate the descriptor.  </ul>
-
-<p>Every time a fence is passed through BufferQueue - such as for a window that
-passes a fence to BufferQueue saying when its new contents will be ready - the
-fence object is renamed. Since kernel fence support allows fences to have
-strings for names, the sync framework uses the window name and buffer index
-that is being queued to name the fence, for example:
-<code>SurfaceView:0</code></p>
-
-<p>This is helpful in debugging to identify the source of a deadlock. Those
-names appear in the output of <code>/d/sync</code> and bug reports when
-taken.</p>
-
-<h5 id=anativewindow_integration>ANativeWindow integration</h5>
-
-<p>ANativeWindow is fence aware. <code>dequeueBuffer</code>,
-<code>queueBuffer</code>, and <code>cancelBuffer</code> have fence
-parameters.</p>
-
-<h5 id=opengl_es_integration>OpenGL ES integration</h5>
-
-<p>OpenGL ES sync integration relies upon these two EGL extensions:</p>
-
-<ul> <li> <code>EGL_ANDROID_native_fence_sync</code> - provides a way to either
-wrap or create native Android fence file descriptors in EGLSyncKHR objects.
-<li> <code>EGL_ANDROID_wait_sync</code> - allows GPU-side stalls rather than in
-CPU, making the GPU wait for an EGLSyncKHR. This is essentially the same as the
-<code>EGL_KHR_wait_sync</code> extension. See the
-<code>EGL_KHR_wait_sync</code> specification for details.  </ul>
-
-<p>These extensions can be used independently and are controlled by a compile
-flag in libgui. To use them, first implement the
-<code>EGL_ANDROID_native_fence_sync</code> extension along with the associated
-kernel support. Next add a ANativeWindow support for fences to your driver and
-then turn on support in libgui to make use of the
-<code>EGL_ANDROID_native_fence_sync</code> extension.</p>
-
-<p>Then, as a second pass, enable the <code>EGL_ANDROID_wait_sync</code>
-extension in your driver and turn it on separately. The
-<code>EGL_ANDROID_native_fence_sync</code> extension consists of a distinct
-native fence EGLSync object type so extensions that apply to existing EGLSync
-object types don’t necessarily apply to <code>EGL_ANDROID_native_fence</code>
-objects to avoid unwanted interactions.</p>
-
-<p>The EGL_ANDROID_native_fence_sync extension employs a corresponding native
-fence file descriptor attribute that can be set only at creation time and
-cannot be directly queried onward from an existing sync object. This attribute
-can be set to one of two modes:</p>
-
-<ul> <li> A valid fence file descriptor - wraps an existing native Android
-fence file descriptor in an EGLSyncKHR object.  <li> -1 - creates a native
-Android fence file descriptor from an EGLSyncKHR object.  </ul>
-
-<p>The DupNativeFenceFD function call is used to extract the EGLSyncKHR object
-from the native Android fence file descriptor. This has the same result as
-querying the attribute that was set but adheres to the convention that the
-recipient closes the fence (hence the duplicate operation). Finally, destroying
-the EGLSync object should close the internal fence attribute.</p>
-
-<h5 id=hardware_composer_integration>Hardware Composer integration</h5>
-
-<p>Hardware Composer handles three types of sync fences:</p>
-
-<ul> <li> <em>Acquire fence</em> - one per layer, this is set before calling
-HWC::set. It signals when Hardware Composer may read the buffer.  <li>
-<em>Release fence</em> - one per layer, this is filled in by the driver in
-HWC::set. It signals when Hardware Composer is done reading the buffer so the
-framework can start using that buffer again for that particular layer.  <li>
-<em>Retire fence</em> - one per the entire frame, this is filled in by the
-driver each time HWC::set is called. This covers all of the layers for the set
-operation. It signals to the framework when all of the effects of this set
-operation has completed. The retire fence signals when the next set operation
-takes place on the screen.  </ul>
-
-<p>The retire fence can be used to determine how long each frame appears on the
-screen. This is useful in identifying the location and source of delays, such
-as a stuttering animation. </p>
-
-<h4 id=vsync_offset>VSYNC Offset</h4>
-
-<p>Application and SurfaceFlinger render loops should be synchronized to the
-hardware VSYNC. On a VSYNC event, the display begins showing frame N while
-SurfaceFlinger begins compositing windows for frame N+1. The app handles
-pending input and generates frame N+2.</p>
-
-<p>Synchronizing with VSYNC delivers consistent latency. It reduces errors in
-apps and SurfaceFlinger and the drifting of displays in and out of phase with
-each other. This, however, does assume application and SurfaceFlinger per-frame
-times don’t vary widely. Nevertheless, the latency is at least two frames.</p>
-
-<p>To remedy this, you may employ VSYNC offsets to reduce the input-to-display
-latency by making application and composition signal relative to hardware
-VSYNC. This is possible because application plus composition usually takes less
-than 33 ms.</p>
-
-<p>The result of VSYNC offset is three signals with same period, offset
-phase:</p>
-
-<ul> <li> <em>HW_VSYNC_0</em> - Display begins showing next frame <li>
-<em>VSYNC</em> - App reads input and generates next frame <li> <em>SF
-VSYNC</em> - SurfaceFlinger begins compositing for next frame </ul>
-
-<p>With VSYNC offset, SurfaceFlinger receives the buffer and composites the
-frame, while the application processes the input and renders the frame, all
-within a single frame of time.</p>
-
-<p>Please note, VSYNC offsets reduce the time available for app and composition
-and therefore provide a greater chance for error.</p>
-
-<h5 id=dispsync>DispSync</h5>
-
-<p>DispSync maintains a model of the periodic hardware-based VSYNC events of a
-display and uses that model to execute periodic callbacks at specific phase
-offsets from the hardware VSYNC events.</p>
-
-<p>DispSync is essentially a software phase lock loop (PLL) that generates the
-VSYNC and SF VSYNC signals used by Choreographer and SurfaceFlinger, even if
-not offset from hardware VSYNC.</p>
-
-<img src="images/dispsync.png" alt="DispSync flow">
-
-<p class="img-caption"><strong>Figure 4.</strong> DispSync flow</p>
-
-<p>DispSync has these qualities:</p>
-
-<ul> <li> <em>Reference</em> - HW_VSYNC_0 <li> <em>Output</em> - VSYNC and SF
-VSYNC <li> <em>Feedback</em> - Retire fence signal timestamps from Hardware
-Composer </ul>
-
-<h5 id=vsync_retire_offset>VSYNC/Retire Offset</h5>
-
-<p>The signal timestamp of retire fences must match HW VSYNC even on devices
-that don’t use the offset phase. Otherwise, errors appear to have greater
-severity than reality.</p>
-
-<p>“Smart” panels often have a delta. Retire fence is the end of direct memory
-access (DMA) to display memory. The actual display switch and HW VSYNC is some
-time later.</p>
-
-<p><code>PRESENT_TIME_OFFSET_FROM_VSYNC_NS</code> is set in the device’s
-BoardConfig.mk make file. It is based upon the display controller and panel
-characteristics. Time from retire fence timestamp to HW Vsync signal is
-measured in nanoseconds.</p>
-
-<h5 id=vsync_and_sf_vsync_offsets>VSYNC and SF_VSYNC Offsets</h5>
-
-<p>The <code>VSYNC_EVENT_PHASE_OFFSET_NS</code> and
-<code>SF_VSYNC_EVENT_PHASE_OFFSET_NS</code> are set conservatively based on
-high-load use cases, such as partial GPU composition during window transition
-or Chrome scrolling through a webpage containing animations. These offsets
-allow for long application render time and long GPU composition time.</p>
-
-<p>More than a millisecond or two of latency is noticeable. We recommend
-integrating thorough automated error testing to minimize latency without
-significantly increasing error counts.</p>
-
-<p>Note these offsets are also set in the device’s BoardConfig.mk make file.
-The default if not set is zero offset. Both settings are offset in nanoseconds
-after HW_VSYNC_0. Either can be negative.</p>
+Applications always start drawing on a VSYNC boundary, and SurfaceFlinger always
+composites on a VSYNC boundary. This eliminates stutters and improves visual
+performance of graphics. For details on VSYNC, see
+<a href="{@docRoot}devices/graphics/implement-vsync.html">Implementing
+VSYNC</a>.</p>
 
 <h3 id=virtual_displays>Virtual displays</h3>
 
-<p>Android added support for virtual displays to Hardware Composer in version
-1.3. This support was implemented in the Android platform and can be used by
-Miracast.</p>
-
-<p>The virtual display composition is similar to the physical display: Input
+<p>Android added platform support for virtual displays in Hardware Composer v1.3.
+Virtual display composition is similar to physical display composition: Input
 layers are described in prepare(), SurfaceFlinger conducts GPU composition, and
-layers and GPU framebuffer are  provided to Hardware Composer in set().</p>
-
-<p>Instead of the output going to the screen, it is sent to a gralloc buffer.
-Hardware Composer writes output to a buffer and provides the completion fence.
-The buffer is sent to an arbitrary consumer: video encoder, GPU, CPU, etc.
-Virtual displays can use 2D/blitter or overlays if the display pipeline can
-write to memory.</p>
-
-<h4 id=modes>Modes</h4>
-
-<p>Each frame is in one of three modes after prepare():</p>
-
-<ul> <li> <em>GLES</em> - All layers composited by GPU. GPU writes directly to
-the output buffer while Hardware Composer does nothing. This is equivalent to
-virtual display composition with Hardware Composer <1.3.  <li> <em>MIXED</em> -
-GPU composites some layers to framebuffer, and Hardware Composer composites
-framebuffer and remaining layers. GPU writes to scratch buffer (framebuffer).
-Hardware Composer reads scratch buffer and writes to the output buffer. Buffers
-may have different formats, e.g. RGBA and YCbCr.  <li> <em>HWC</em> - All
-layers composited by Hardware Composer. Hardware Composer writes directly to
-the output buffer.  </ul>
-
-<h4 id=output_format>Output format</h4>
-
-<p><em>MIXED and HWC modes</em>: If the consumer needs CPU access, the consumer
-chooses the format. Otherwise, the format is IMPLEMENTATION_DEFINED. Gralloc
-can choose best format based on usage flags. For example, choose a YCbCr format
-if the consumer is video encoder, and Hardware Composer can write the format
-efficiently.</p>
-
-<p><em>GLES mode</em>: EGL driver chooses output buffer format in
-dequeueBuffer(), typically RGBA8888. The consumer must be able to accept this
-format.</p>
-
-<h4 id=egl_requirement>EGL requirement</h4>
-
-<p>Hardware Composer 1.3 virtual displays require that eglSwapBuffers() does
-not dequeue the next buffer immediately. Instead, it should defer dequeueing
-the buffer until rendering begins. Otherwise, EGL always owns the “next” output
-buffer. SurfaceFlinger can’t get the output buffer for Hardware Composer in
-MIXED/HWC mode. </p>
-
-<p>If Hardware Composer always sends all virtual display layers to GPU, all
-frames will be in GLES mode. Although it is not recommended, you may use this
-method if you need to support Hardware Composer 1.3 for some other reason but
-can’t conduct virtual display composition.</p>
+layers and the GPU framebuffer are provided to Hardware Composer in set(). For
+details on virtual displays, see
+<a href="{@docRoot}devices/graphics/implement-vdisplays.html">Implementing
+Virtual Displays</a>.</p>
 
 <h2 id=testing>Testing</h2>
 
-<p>For benchmarking, we suggest following this flow by phase:</p>
+<p>For benchmarking, use the following phased flow:</p>
 
-<ul> <li> <em>Specification</em> - When initially specifying the device, such
-as when using immature drivers, you should use predefined (fixed) clocks and
-workloads to measure the frames per second rendered. This gives a clear view of
-what the hardware is capable of doing.  <li> <em>Development</em> - In the
-development phase as drivers mature, you should use a fixed set of user actions
-to measure the number of visible stutters (janks) in animations.  <li>
-<em>Production</em> - Once the device is ready for production and you want to
-compare against competitors, you should increase the workload until stutters
-increase. Determine if the current clock settings can keep up with the load.
-This can help you identify where you might be able to slow the clocks and
-reduce power use.  </ul>
+<ul>
+  <li><em>Specification</em>. When initially specifying the device (such as when
+  using immature drivers), use predefined (fixed) clocks and workloads to
+  measure frames per second (fps) rendered. This gives a clear view of hardware
+  capabilities.</li>
+  <li><em>Development</em>. As drivers mature, use a fixed set of user actions
+  to measure the number of visible stutters (janks) in animations.</li>
+  <li><em>Production</em>. When the device is ready for production and you want
+  to compare it against competitors, increase the workload until stutters
+  increase. Determine if the current clock settings can keep up with the load.
+  This can help you identify where to slow the clocks and reduce power use.</li>
+</ul>
 
-<p>For the specification phase, Android offers the Flatland tool to help derive
-device capabilities. It can be found at:
-<code>platform/frameworks/native/cmds/flatland/</code></p>
+<p>For help deriving device capabilities during the specification phase, use the
+Flatland tool at <code>platform/frameworks/native/cmds/flatland/</code>.
+Flatland relies upon fixed clocks and shows the throughput achievable with
+composition-based workloads. It uses gralloc buffers to simulate multiple window
+scenarios, filling the windows with GL and then measuring the compositing.</p>
 
-<p>Flatland relies upon fixed clocks and shows the throughput that can be
-achieved with composition-based workloads. It uses gralloc buffers to simulate
-multiple window scenarios, filling in the window with GL and then measuring the
-compositing. Please note, Flatland uses the synchronization framework to
-measure time. So you must support the synchronization framework to readily use
-Flatland.</p>
+<p class="note"><strong>Note:</strong> Flatland uses the synchronization
+framework to measure time, so your implementation must support the
+synchronization framework.</p>
diff --git a/src/devices/graphics/index.jd b/src/devices/graphics/index.jd
index 4ce174f..b618909 100644
--- a/src/devices/graphics/index.jd
+++ b/src/devices/graphics/index.jd
@@ -206,7 +206,7 @@
 implemented their own implicit synchronization within their own drivers. This
 is no longer required with the Android graphics synchronization framework. See
 the
-<a href="{@docRoot}devices/graphics/implement.html#explicit_synchronization">Explicit
+<a href="{@docRoot}devices/graphics/implement-vsync.html#explicit_synchronization">Explicit
 synchronization</a> section for implementation instructions.</p>
 
 <p>The synchronization framework explicitly describes dependencies between