Heidi von Markham | fd022c7 | 2016-06-30 10:15:28 -0700 | [diff] [blame] | 1 | page.title=Implementing VSYNC |
@jd:body

<!--
Copyright 2016 The Android Open Source Project

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<div id="qv-wrapper">
<div id="qv">
<h2>In this document</h2>
<ol id="auto-toc">
</ol>
</div>
</div>

<p>VSYNC synchronizes certain events to the refresh cycle of the display.
Applications always start drawing on a VSYNC boundary, and SurfaceFlinger
always composites on a VSYNC boundary. This eliminates stutter and improves
the visual performance of graphics.</p>

<p>The Hardware Composer (HWC) has a function pointer indicating the function
to implement for VSYNC:</p>

<pre class=prettyprint>int (*waitForVsync)(int64_t *timestamp);</pre>

<p>This function blocks until a VSYNC occurs and returns the timestamp of the
actual VSYNC. A message must be sent every time VSYNC occurs. A client can
receive a VSYNC timestamp once at specified intervals or continuously (at an
interval of 1). You must implement VSYNC with a maximum lag of 1 ms (0.5 ms or
less is recommended), and the timestamps returned must be extremely
accurate.</p>
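
<p>To illustrate these requirements, the following sketch implements a blocking wait against a software model of a fixed-period display. It computes the next VSYNC boundary on <code>CLOCK_MONOTONIC</code>, sleeps until that absolute time, and reports the boundary itself rather than the wake-up time. This is a userspace sketch under stated assumptions, not an HWC implementation. The <code>vsync_model</code> struct, <code>next_vsync_ns()</code>, and <code>sketch_wait_for_vsync()</code> are hypothetical, and real hardware would block on a display interrupt instead of a timer.</p>

<pre class=prettyprint>#define _POSIX_C_SOURCE 200809L
#include &lt;errno.h&gt;
#include &lt;stdint.h&gt;
#include &lt;time.h&gt;

/* Hypothetical software model of a display (not an HWC structure). */
struct vsync_model {
    int64_t ref_ns;    /* time of some past VSYNC on CLOCK_MONOTONIC */
    int64_t period_ns; /* refresh period, e.g. 16666667 for 60 Hz */
};

static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &amp;ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Returns the first modeled VSYNC boundary strictly after t. */
int64_t next_vsync_ns(const struct vsync_model *m, int64_t t) {
    int64_t elapsed = t - m->ref_ns;
    int64_t n = elapsed / m->period_ns;
    if (elapsed % m->period_ns &lt; 0)
        n -= 1;  /* floor division when t is before ref_ns */
    return m->ref_ns + (n + 1) * m->period_ns;
}

/* Blocks until the next modeled VSYNC and stores its time in *timestamp.
 * Returns 0 on success or the error number from clock_nanosleep(). */
int sketch_wait_for_vsync(const struct vsync_model *m, int64_t *timestamp) {
    int64_t target = next_vsync_ns(m, now_ns());
    struct timespec ts = {
        .tv_sec = (time_t)(target / 1000000000LL),
        .tv_nsec = (long)(target % 1000000000LL),
    };
    int err;
    do {
        err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &amp;ts, NULL);
    } while (err == EINTR);
    if (err != 0)
        return err;
    *timestamp = target;  /* the VSYNC time itself, not the wake-up time */
    return 0;
}
</pre>

<p>Subtracting <code>*timestamp</code> from <code>now_ns()</code> right after the wait returns gives the wake-up lag. A test could compare that lag against the 1 ms (ideally 0.5 ms) budget above.</p>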

<h2 id=explicit_synchronization>Explicit synchronization</h2>

<p>Explicit synchronization is required and provides a mechanism for Gralloc
buffers to be acquired and released in a synchronized way. Explicit
synchronization allows producers and consumers of graphics buffers to signal
when they are done with a buffer. This allows Android to asynchronously queue
buffers to be read or written with the certainty that another consumer or
producer does not currently need them. For details, see
<a href="{@docRoot}devices/graphics/index.html#synchronization_framework">Synchronization
framework</a>.</p>

<p>The benefits of explicit synchronization include less behavior variation
between devices, better debugging support, and improved testing metrics. For
instance, the sync framework output readily identifies problem areas and root
causes, and centralized SurfaceFlinger presentation timestamps show when events
occur in the normal flow of the system.</p>

<p>This communication is facilitated by the use of synchronization fences,
which are required when requesting a buffer for consuming or producing. The
synchronization framework consists of three main building blocks:
<code>sync_timeline</code>, <code>sync_pt</code>, and <code>sync_fence</code>.</p>

<h3 id=sync_timeline>sync_timeline</h3>

<p>A <code>sync_timeline</code> is a monotonically increasing timeline that
should be implemented for each driver instance, such as a GL context, display
controller, or 2D blitter. This is essentially a counter of jobs submitted to
the kernel for a particular piece of hardware. It provides guarantees about the
order of operations and allows hardware-specific implementations.</p>
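
<p>The following small userspace model makes the counter semantics concrete. It is not the kernel API; <code>model_timeline</code> and its functions are hypothetical. Submitting a job returns the timeline value at which that job will be complete. The driver advances the counter as the hardware finishes work, and any point whose value is at or below the current counter counts as signaled.</p>

<pre class=prettyprint>#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;

/* Hypothetical model of one timeline per driver instance. */
struct model_timeline {
    const char *name;    /* a useful name simplifies debugging */
    uint32_t value;      /* jobs completed so far; never decreases */
    uint32_t submitted;  /* jobs submitted so far */
};

/* At submission: returns the value that signals this job's completion. */
uint32_t timeline_submit(struct model_timeline *tl) {
    return ++tl->submitted;
}

/* From the driver (e.g. its completion interrupt): count finished jobs.
 * The value is clamped so it never passes the last submitted job. */
void timeline_advance(struct model_timeline *tl, uint32_t count) {
    uint32_t pending = tl->submitted - tl->value;
    tl->value += (count > pending) ? pending : count;
}

/* A point on this timeline is signaled once the counter reaches it. */
bool timeline_point_signaled(const struct model_timeline *tl, uint32_t pt_value) {
    return tl->value >= pt_value;
}
</pre>

<p>The model ignores 32-bit counter wraparound, which a real driver would need to handle.</p>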

<p>A CPU-only reference implementation of <code>sync_timeline</code> is
provided as <code>sw_sync</code> (software sync). If possible, use it instead of
implementing your own <code>sync_timeline</code> to save resources and avoid
complexity. If you’re not employing a hardware resource, <code>sw_sync</code>
should be sufficient.</p>

<p>If you must implement a <code>sync_timeline</code>, use the
<code>sw_sync</code> driver as a starting point. Follow these guidelines:</p>

<ul>
<li>Provide useful names for all drivers, timelines, and fences. This simplifies
debugging.</li>
<li>Implement <code>timeline_value_str</code> and <code>pt_value_str</code>
operators in your timelines to make debugging output more readable.</li>
<li>If you want your userspace libraries (such as the GL library) to have access
to the private data of your timelines, implement the
<code>fill_driver_data</code> operator. This lets you get information about the
immutable <code>sync_fence</code> and <code>sync_pts</code> so you can build
command lines based upon them.</li>
</ul>

<p>When implementing a <code>sync_timeline</code>, <strong>do not</strong>:</p>

<ul>
<li>Base it on any real view of time, such as when a wall clock or other piece
of work might finish. It is better to create an abstract timeline that you can
control.</li>
<li>Allow userspace to explicitly create or signal a fence. Userspace cannot
make promises on behalf of the kernel, so allowing this could let one piece of
the user pipeline mount a denial-of-service attack that halts all
functionality.</li>
<li>Access <code>sync_timeline</code>, <code>sync_pt</code>, or
<code>sync_fence</code> elements explicitly, as the API should provide all
required functions.</li>
</ul>

<h3 id=sync_pt>sync_pt</h3>

<p>A <code>sync_pt</code> is a single value or point on a
<code>sync_timeline</code>. A point has three states: active, signaled, and
error. Points start in the active state and transition to the signaled or error
states. For instance, when a buffer is no longer needed by an image consumer,
its <code>sync_pt</code> is signaled so image producers know it is okay to
write into the buffer again.</p>

<h3 id=sync_fence>sync_fence</h3>

<p>A <code>sync_fence</code> is a collection of <code>sync_pts</code> that often
have different <code>sync_timeline</code> parents (such as for the display
controller and GPU). These are the main primitives over which drivers and
userspace communicate their dependencies. A fence is a promise from the kernel,
given when it accepts queued work, that the work will complete in a finite
amount of time.</p>

<p>This allows multiple consumers or producers to signal that they are using a
buffer and lets that information be communicated with a single function
parameter. Fences are backed by a file descriptor and can be passed from kernel
space to user space. For instance, a fence can contain two
<code>sync_pts</code> that signify when two separate image consumers are done
reading a buffer. When the fence is signaled, the image producers know both
consumers are done consuming.</p>

<p>Fences, like <code>sync_pts</code>, start active and then change state based
upon the state of their points. If all <code>sync_pts</code> become signaled,
the <code>sync_fence</code> becomes signaled. If one <code>sync_pt</code> falls
into an error state, the entire <code>sync_fence</code> has an error state.</p>
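
<p>The state rules above fit in a few lines of C. The sketch below is an illustrative model with hypothetical names (<code>pt_state</code>, <code>fence_state()</code>). In the kernel, the sync framework tracks these states itself. A fence reports an error if any point has an error, is signaled only when every point is signaled, and is otherwise active.</p>

<pre class=prettyprint>#include &lt;stddef.h&gt;

enum pt_state { PT_ACTIVE, PT_SIGNALED, PT_ERROR };

/* Hypothetical: derives a fence's state from the states of its n points
 * (the model assumes n >= 1). */
enum pt_state fence_state(const enum pt_state *pts, size_t n) {
    size_t signaled = 0;
    for (size_t i = 0; i &lt; n; i++) {
        if (pts[i] == PT_ERROR)
            return PT_ERROR;       /* one failed point fails the whole fence */
        if (pts[i] == PT_SIGNALED)
            signaled++;
    }
    return signaled == n ? PT_SIGNALED : PT_ACTIVE;
}
</pre>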

<p>Membership in the <code>sync_fence</code> is immutable after the fence is
created. As a <code>sync_pt</code> can be in only one fence, it is included as a
copy. Even if two points have the same value, there will be two copies of the
<code>sync_pt</code> in the fence. To get more than one point in a fence, a
merge operation is performed, in which points from two distinct fences are
added to a third fence. If one of those points was signaled in the originating
fence and the other was not, the third fence will also not be in a signaled
state.</p>
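
<p>A merge can be modeled as allocating a new fence and copying every point from both inputs into it. In the hypothetical model below, a point consists of a pointer to its parent timeline’s counter plus the value it waits for. The merged fence holds copies, and its membership never changes after creation, so it signals only when every copied point has signaled, whatever the state of the originating fences. All types and functions here are illustrative and are not the kernel API.</p>

<pre class=prettyprint>#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Hypothetical point: a pointer to its parent timeline's counter and the
 * value it waits for. It is signaled once *timeline_value >= value. */
struct model_pt {
    const uint32_t *timeline_value;
    uint32_t value;
};

/* Hypothetical fence: membership is fixed when the fence is allocated. */
struct model_fence {
    size_t num_pts;
    struct model_pt pts[];
};

struct model_fence *fence_create(const uint32_t *timeline_value, uint32_t value) {
    struct model_fence *f = malloc(sizeof(*f) + sizeof(f->pts[0]));
    if (f == NULL)
        return NULL;
    f->num_pts = 1;
    f->pts[0].timeline_value = timeline_value;
    f->pts[0].value = value;
    return f;
}

/* Returns a new fence holding copies of every point in a and b, or NULL. */
struct model_fence *fence_merge(const struct model_fence *a,
                                const struct model_fence *b) {
    size_t n = a->num_pts + b->num_pts;
    struct model_fence *f = malloc(sizeof(*f) + n * sizeof(f->pts[0]));
    if (f == NULL)
        return NULL;
    f->num_pts = n;
    memcpy(f->pts, a->pts, a->num_pts * sizeof(a->pts[0]));
    memcpy(f->pts + a->num_pts, b->pts, b->num_pts * sizeof(b->pts[0]));
    return f;
}

/* A fence is signaled only when all of its points are signaled. */
bool fence_signaled(const struct model_fence *f) {
    for (size_t i = 0; i &lt; f->num_pts; i++)
        if (*f->pts[i].timeline_value &lt; f->pts[i].value)
            return false;
    return true;
}
</pre>

<p>The caller releases every fence, including merged ones, with <code>free()</code>. The model omits the error state for brevity.</p>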

<p>To implement explicit synchronization, provide the following:</p>

<ul>
<li>A kernel-space driver that implements a synchronization timeline for a
particular piece of hardware. Drivers that need to be fence-aware are generally
anything that accesses or communicates with the Hardware Composer. Key files
include:
<ul>
<li>Core implementation:
<ul>
<li><code>kernel/common/include/linux/sync.h</code></li>
<li><code>kernel/common/drivers/base/sync.c</code></li>
</ul></li>
<li><code>sw_sync</code>:
<ul>
<li><code>kernel/common/include/linux/sw_sync.h</code></li>
<li><code>kernel/common/drivers/base/sw_sync.c</code></li>
</ul></li>
<li>Documentation at <code>kernel/common/Documentation/sync.txt</code>.</li>
<li>Library to communicate with kernel space in
<code>platform/system/core/libsync</code>.</li>
</ul></li>
<li>A Hardware Composer HAL module (v1.3 or higher) that supports the new
synchronization functionality. You must provide the appropriate synchronization
fences as parameters to the <code>set()</code> and <code>prepare()</code>
functions in the HAL.</li>
<li>Two fence-related EGL extensions (<code>EGL_ANDROID_native_fence_sync</code>
and <code>EGL_ANDROID_wait_sync</code>) and fence support in your graphics
drivers.</li>
</ul>

<p>For example, to use the API supporting the synchronization function, you
might develop a display driver that has a display buffer function. Before the
synchronization framework existed, this function would receive dma-bufs, put
those buffers on the display, and block while the buffer is visible:</p>

<pre class=prettyprint>/*
 * Assumes buf is ready to be displayed. Returns when buf is no longer on
 * screen.
 */
void display_buffer(struct dma_buf *buf);
</pre>

<p>With the synchronization framework, the API call is slightly more complex.
While putting a buffer on display, you associate it with a fence that says when
the buffer will be ready. You can queue up the work and initiate it after the
fence clears.</p>

<p>In this manner, you are not blocking anything. You immediately return your
own fence, which is a guarantee of when the buffer will be off of the display.
As you queue up buffers, the kernel will list dependencies with the
synchronization framework:</p>

<pre class=prettyprint>/*
 * Will display buf when fence is signaled. Returns immediately with a fence
 * that will signal when buf is no longer displayed.
 */
struct sync_fence *display_buffer(struct dma_buf *buf,
                                  struct sync_fence *fence);
</pre>

<h2 id=sync_integration>Sync integration</h2>
<p>This section explains how to integrate the low-level sync framework with
different parts of the Android framework and the drivers that must communicate
with one another.</p>

<h3 id=integration_conventions>Integration conventions</h3>

<p>The Android HAL interfaces for graphics follow a consistent convention: when
a file descriptor is passed across a HAL interface, ownership of the file
descriptor is always transferred. This means:</p>

<ul>
<li>If you receive a fence file descriptor from the sync framework, you must
close it.</li>
<li>If you return a fence file descriptor to the sync framework, the framework
will close it.</li>
<li>To continue using the fence file descriptor, you must duplicate the
descriptor.</li>
</ul>
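
<p>The following sketch shows these ownership rules from a driver’s point of view, using plain POSIX <code>dup()</code> and <code>close()</code>. The <code>layer_state</code> struct and its three functions are hypothetical and are not HAL entry points. The sketch assumes, as in the Hardware Composer HAL, that a descriptor value of -1 means "no fence." Any file descriptor, such as one end of a pipe, can stand in for a fence when exercising it.</p>

<pre class=prettyprint>#include &lt;unistd.h&gt;

/* Hypothetical per-layer driver state; -1 means "no fence held". */
struct layer_state {
    int acquire_fd;  /* fence received from the framework */
    int release_fd;  /* driver's own copy of a fence given to the framework */
};

/* Receiving: ownership moves to the driver, which must close the
 * descriptor once it is done with it (here, in release_layer). */
void receive_acquire_fence(struct layer_state *layer, int fence_fd) {
    if (layer->acquire_fd >= 0)
        close(layer->acquire_fd);
    layer->acquire_fd = fence_fd;  /* may be -1: no fence */
}

/* Returning: the framework will close *out_fd, so the driver keeps a
 * duplicate to continue using the fence. Returns 0, or -1 if dup() fails,
 * in which case nothing is transferred. */
int return_release_fence(struct layer_state *layer, int fence_fd, int *out_fd) {
    int copy = dup(fence_fd);
    if (copy &lt; 0)
        return -1;
    if (layer->release_fd >= 0)
        close(layer->release_fd);
    layer->release_fd = copy;
    *out_fd = fence_fd;  /* ownership of fence_fd passes to the caller */
    return 0;
}

/* Closes every descriptor the driver still owns. */
void release_layer(struct layer_state *layer) {
    if (layer->acquire_fd >= 0)
        close(layer->acquire_fd);
    if (layer->release_fd >= 0)
        close(layer->release_fd);
    layer->acquire_fd = -1;
    layer->release_fd = -1;
}
</pre>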

<p>Every time a fence passes through BufferQueue (such as for a window that
passes a fence to BufferQueue saying when its new contents will be ready), the
fence object is renamed. Since kernel fence support allows fences to have
strings for names, the sync framework uses the window name and buffer index
that is being queued to name the fence (e.g., <code>SurfaceView:0</code>). This
is helpful in debugging to identify the source of a deadlock, as the names
appear in the output of <code>/d/sync</code> and bug reports.</p>

<h3 id=anativewindow_integration>ANativeWindow integration</h3>

<p>ANativeWindow is fence-aware: <code>dequeueBuffer</code>,
<code>queueBuffer</code>, and <code>cancelBuffer</code> all have fence
parameters.</p>

<h3 id=opengl_es_integration>OpenGL ES integration</h3>

<p>OpenGL ES sync integration relies upon two EGL extensions:</p>

<ul>
<li><code>EGL_ANDROID_native_fence_sync</code>. Provides a way to either
wrap or create native Android fence file descriptors in EGLSyncKHR objects.</li>
<li><code>EGL_ANDROID_wait_sync</code>. Allows GPU-side stalls rather than
CPU-side stalls, making the GPU wait for an EGLSyncKHR. This is essentially the
same as the <code>EGL_KHR_wait_sync</code> extension (refer to that
specification for details).</li>
</ul>

<p>These extensions can be used independently and are controlled by a compile
flag in libgui. To use them, first implement the
<code>EGL_ANDROID_native_fence_sync</code> extension along with the associated
kernel support. Next, add ANativeWindow support for fences to your driver, then
turn on support in libgui to make use of the
<code>EGL_ANDROID_native_fence_sync</code> extension.</p>

<p>In a second pass, enable the <code>EGL_ANDROID_wait_sync</code>
extension in your driver and turn it on separately. The
<code>EGL_ANDROID_native_fence_sync</code> extension defines a distinct
native fence EGLSync object type, so extensions that apply to existing EGLSync
object types don’t necessarily apply to <code>EGL_ANDROID_native_fence</code>
objects. This avoids unwanted interactions.</p>

<p>The <code>EGL_ANDROID_native_fence_sync</code> extension employs a
corresponding native fence file descriptor attribute that can be set only at
creation time and cannot be directly queried afterward from an existing sync
object. This attribute can be set to one of two modes:</p>

<ul>
<li><em>A valid fence file descriptor</em>. Wraps an existing native Android
fence file descriptor in an EGLSyncKHR object.</li>
<li><em>-1</em>. Creates a native Android fence file descriptor from an
EGLSyncKHR object.</li>
</ul>

<p>The <code>eglDupNativeFenceFDANDROID</code> function call is used to extract
the native Android fence file descriptor from the EGLSyncKHR object. This has
the same result as querying the attribute that was set but adheres to the
convention that the recipient closes the fence (hence the duplicate operation).
Finally, destroying the EGLSync object should close the internal fence
attribute.</p>

<h3 id=hardware_composer_integration>Hardware Composer integration</h3>

<p>The Hardware Composer handles three types of sync fences:</p>

<ul>
<li><em>Acquire fence</em>. One per layer, set before calling
<code>HWC::set</code>. It signals when Hardware Composer may read the buffer.</li>
<li><em>Release fence</em>. One per layer, filled in by the driver in
<code>HWC::set</code>. It signals when Hardware Composer is done reading the
buffer so the framework can start using that buffer again for that particular
layer.</li>
<li><em>Retire fence</em>. One per frame, filled in by the driver each time
<code>HWC::set</code> is called. This covers all layers for the set operation
and signals to the framework when all effects of this set operation have
completed. The retire fence signals when the next set operation takes place on
the screen.</li>
</ul>

<p>The retire fence can be used to determine how long each frame appears on the
screen. This is useful in identifying the location and source of delays, such
as a stuttering animation.</p>
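
<p>For example, given the signal timestamps of consecutive retire fences, each frame’s time on screen is the difference between successive timestamps. A frame that stayed up noticeably longer than one refresh period usually means the next frame arrived late, as in a stuttering animation. The function below is a hypothetical sketch. It assumes timestamps in nanoseconds, and its 1.5-period threshold is an arbitrary choice for illustration.</p>

<pre class=prettyprint>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Hypothetical: counts frames that stayed on screen for more than 1.5
 * refresh periods, given n retire fence signal times in nanoseconds. */
size_t count_long_frames(const int64_t *retire_ns, size_t n, int64_t period_ns) {
    size_t late = 0;
    for (size_t i = 1; i &lt; n; i++) {
        int64_t on_screen_ns = retire_ns[i] - retire_ns[i - 1];
        if (2 * on_screen_ns > 3 * period_ns)  /* on screen > 1.5 periods */
            late++;
    }
    return late;
}
</pre>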

<h2 id=vsync_offset>VSYNC offset</h2>

<p>Application and SurfaceFlinger render loops should be synchronized to the
hardware VSYNC. On a VSYNC event, the display begins showing frame N while
SurfaceFlinger begins compositing windows for frame N+1. The app handles
pending input and generates frame N+2.</p>

<p>Synchronizing with VSYNC delivers consistent latency. It reduces errors in
apps and SurfaceFlinger and the drifting of displays in and out of phase with
each other. This does, however, assume that application and SurfaceFlinger
per-frame times don’t vary widely. Even so, the latency is at least two
frames.</p>

<p>To remedy this, you can use VSYNC offsets to reduce the input-to-display
latency by offsetting the application and composition signals relative to
hardware VSYNC. This is possible because application plus composition usually
takes less than 33 ms.</p>

<p>The result of VSYNC offset is three signals with the same period but offset
phases:</p>

<ul>
<li><code>HW_VSYNC_0</code>. Display begins showing next frame.</li>
<li><code>VSYNC</code>. App reads input and generates next frame.</li>
<li><code>SF VSYNC</code>. SurfaceFlinger begins compositing for next frame.</li>
</ul>

<p>With VSYNC offset, SurfaceFlinger receives the buffer and composites the
frame, while the application processes the input and renders the frame, all
within a single frame of time.</p>

<p class="note"><strong>Note:</strong> VSYNC offsets reduce the time available
for app and composition and therefore provide a greater chance for error.</p>

<h3 id=dispsync>DispSync</h3>

<p>DispSync maintains a model of the periodic hardware-based VSYNC events of a
display and uses that model to execute periodic callbacks at specific phase
offsets from the hardware VSYNC events.</p>

<p>DispSync is essentially a software phase-locked loop (PLL) that generates the
VSYNC and SF VSYNC signals used by Choreographer and SurfaceFlinger, even if
they are not offset from hardware VSYNC.</p>
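
<p>A heavily simplified model can illustrate the core idea. It estimates the period as the mean interval between recent HW_VSYNC_0 timestamps, anchors the phase on the most recent sample, and fires each output signal at a fixed offset from the predicted hardware VSYNC. This sketch is not the DispSync implementation. The names are hypothetical, and the model omits the retire fence feedback listed below as well as any filtering of outlier samples.</p>

<pre class=prettyprint>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Hypothetical model: a period plus the time of one known HW VSYNC. */
struct vsync_estimate {
    int64_t period_ns;
    int64_t ref_ns;
};

/* Fits the model to n increasing HW_VSYNC_0 timestamps.
 * Returns 0 on success, or -1 if fewer than two samples are given. */
int estimate_from_samples(const int64_t *hw_ns, size_t n,
                          struct vsync_estimate *out) {
    if (n &lt; 2)
        return -1;
    /* The mean of successive intervals telescopes to (last - first) / (n - 1). */
    out->period_ns = (hw_ns[n - 1] - hw_ns[0]) / (int64_t)(n - 1);
    out->ref_ns = hw_ns[n - 1];
    return 0;
}

/* Next time strictly after now_ns at which a signal with the given phase
 * offset (which may be negative) should fire. */
int64_t next_event_ns(const struct vsync_estimate *e, int64_t offset_ns,
                      int64_t now_ns) {
    int64_t base = e->ref_ns + offset_ns;
    int64_t d = now_ns - base;
    int64_t k = d / e->period_ns;
    if (d % e->period_ns &lt; 0)
        k -= 1;  /* floor division for times before base */
    return base + (k + 1) * e->period_ns;
}
</pre>

<p>For instance, with samples at 1000, 17000, 33000, and 49000 ns, the estimated period is 16000 ns. At time 49000 ns, a signal with a +2000 ns offset would next fire at 51000 ns.</p>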

<img src="images/dispsync.png" alt="DispSync flow">

<p class="img-caption"><strong>Figure 1.</strong> DispSync flow</p>

<p>DispSync has the following qualities:</p>

<ul>
<li><em>Reference</em>. HW_VSYNC_0.</li>
<li><em>Output</em>. VSYNC and SF VSYNC.</li>
<li><em>Feedback</em>. Retire fence signal timestamps from Hardware Composer.
</li>
</ul>

<h3 id=vsync_retire_offset>VSYNC/Retire offset</h3>

<p>The signal timestamp of retire fences must match HW VSYNC even on devices
that don’t use the offset phase. Otherwise, errors appear more severe than they
actually are. Smart panels often have a delta: the retire fence marks the end of
direct memory access (DMA) to display memory, but the actual display switch
and HW VSYNC occur some time later.</p>

<p><code>PRESENT_TIME_OFFSET_FROM_VSYNC_NS</code> is set in the device’s
BoardConfig.mk make file. Its value is the time from the retire fence timestamp
to the HW VSYNC signal, measured in nanoseconds, and is based upon the display
controller and panel characteristics.</p>

<h3 id=vsync_and_sf_vsync_offsets>VSYNC and SF_VSYNC offsets</h3>

<p>The <code>VSYNC_EVENT_PHASE_OFFSET_NS</code> and
<code>SF_VSYNC_EVENT_PHASE_OFFSET_NS</code> are set conservatively based on
high-load use cases, such as partial GPU composition during window transition
or Chrome scrolling through a webpage containing animations. These offsets
allow for long application render time and long GPU composition time.</p>

<p>More than a millisecond or two of latency is noticeable. We recommend
integrating thorough automated error testing to minimize latency without
significantly increasing error counts.</p>

<p class="note"><strong>Note:</strong> These offsets are also configured in the
device’s BoardConfig.mk file. Both settings are offsets in nanoseconds after
HW_VSYNC_0, default to zero if not set, and can be negative.</p>