page.title=Implementing graphics
@jd:body

<!--
Copyright 2014 The Android Open Source Project

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<div id="qv-wrapper">
  <div id="qv">
    <h2>In this document</h2>
    <ol id="auto-toc">
    </ol>
  </div>
</div>

<p>Follow the instructions here to implement the Android graphics HAL.</p>

<h2 id=requirements>Requirements</h2>

<p>The following list and sections describe what you need to provide to support
graphics in your product:</p>

<ul>
  <li> OpenGL ES 1.x driver
  <li> OpenGL ES 2.0 driver
  <li> OpenGL ES 3.0 driver (optional)
  <li> EGL driver
  <li> Gralloc HAL implementation
  <li> Hardware Composer HAL implementation
  <li> Framebuffer HAL implementation
</ul>

<h2 id=implementation>Implementation</h2>

<h3 id=opengl_and_egl_drivers>OpenGL and EGL drivers</h3>

<p>You must provide drivers for OpenGL ES 1.x, OpenGL ES 2.0, and EGL. Here are
some key considerations:</p>

<ul>
  <li> The GL driver needs to be robust and conformant to the OpenGL ES
  standards.
  <li> Do not limit the number of GL contexts. Because Android allows apps in
  the background and tries to keep GL contexts alive, it is not uncommon to
  have 20-30 active GL contexts at once. You should therefore also be careful
  with the amount of memory allocated for each context.
  <li> Support the YV12 image format and any other YUV image formats that come
  from other components in the system, such as media codecs or the camera.
  <li> Support the mandatory extensions: <code>GL_OES_texture_external</code>,
  <code>EGL_ANDROID_image_native_buffer</code>, and
  <code>EGL_ANDROID_recordable</code>. The
  <code>EGL_ANDROID_framebuffer_target</code> extension is also required for
  Hardware Composer 1.1 and higher.
  <li> We highly recommend also supporting
  <code>EGL_ANDROID_blob_cache</code>, <code>EGL_KHR_fence_sync</code>,
  <code>EGL_KHR_wait_sync</code>, and <code>EGL_ANDROID_native_fence_sync</code>.
</ul>

<p>Note that the OpenGL API exposed to app developers is different from the
OpenGL interface you implement. Apps do not have access to the GL driver
layer and must go through the interface provided by the APIs.</p>

<h3 id=pre-rotation>Pre-rotation</h3>

<p>Many hardware overlays do not support rotation, and even when they do, it
costs processing power. The solution is to pre-transform the buffer before it
reaches SurfaceFlinger. A query hint in <code>ANativeWindow</code>
(<code>NATIVE_WINDOW_TRANSFORM_HINT</code>) represents the most likely
transform SurfaceFlinger will apply to the buffer. Your GL driver can use this
hint to pre-transform the buffer so that when the buffer arrives at
SurfaceFlinger, it is already correctly transformed.</p>

<p>For example, you may receive a hint to rotate 90 degrees. You must generate
a matrix and apply it to the buffer so the content does not run off the edge
of the display. To save power, this should be done in pre-rotation. See the
<code>ANativeWindow</code> interface defined in
<code>system/core/include/system/window.h</code> for more details.</p>
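<p>To illustrate, here is a minimal, self-contained C sketch of how a driver
might turn the hint into a 2x2 rotation matrix folded into its rendering. The
constant values mirror the <code>HAL_TRANSFORM_*</code> flags in
<code>system/window.h</code>, but <code>prerotation_matrix</code> is a
hypothetical helper, the rotation direction is illustrative (counterclockwise),
and in a real driver the hint comes from the window's <code>query()</code> hook
with <code>NATIVE_WINDOW_TRANSFORM_HINT</code>:</p>

```c
#include <assert.h>

/* Transform flags mirroring system/core/include/system/window.h
 * (HAL_TRANSFORM_* values); repeated here so the sketch is self-contained. */
enum {
    TRANSFORM_NONE    = 0x00,
    TRANSFORM_ROT_180 = 0x03,
    TRANSFORM_ROT_90  = 0x04,
    TRANSFORM_ROT_270 = 0x07,
};

/* Column-major 2x2 rotation a driver could bake into its vertex transform
 * before rendering, so the buffer arrives at SurfaceFlinger pre-rotated. */
void prerotation_matrix(int hint, float m[4]) {
    switch (hint) {
    case TRANSFORM_ROT_90:  m[0] =  0; m[1] =  1; m[2] = -1; m[3] =  0; break;
    case TRANSFORM_ROT_180: m[0] = -1; m[1] =  0; m[2] =  0; m[3] = -1; break;
    case TRANSFORM_ROT_270: m[0] =  0; m[1] = -1; m[2] =  1; m[3] =  0; break;
    default:                m[0] =  1; m[1] =  0; m[2] =  0; m[3] =  1; break;
    }
}
```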

<h3 id=gralloc_hal>Gralloc HAL</h3>

<p>The graphics memory allocator is needed to allocate the memory requested by
image producers. You can find the interface definition of the HAL at
<code>hardware/libhardware/include/hardware/gralloc.h</code> and the default
implementation in the <code>hardware/libhardware/modules/gralloc</code>
directory.</p>

<h3 id=protected_buffers>Protected buffers</h3>

<p>The gralloc usage flag <code>GRALLOC_USAGE_PROTECTED</code> allows the
graphics buffer to be displayed only through a hardware-protected path. These
overlay planes are the only way to display DRM content; DRM-protected buffers
cannot be accessed by SurfaceFlinger or the OpenGL ES driver.</p>

<p>DRM-protected video can be presented only on an overlay plane. Video players
that support protected content must be implemented with SurfaceView. Software
running on unprotected hardware cannot read or write the buffer, and
hardware-protected paths must appear on a Hardware Composer overlay. For
instance, protected videos will disappear from the display if Hardware Composer
switches to OpenGL ES composition.</p>

<p>See the <a href="{@docRoot}devices/drm.html">DRM</a> page for a description
of protected content.</p>

<h3 id=hardware_composer_hal>Hardware Composer HAL</h3>

<p>The Hardware Composer HAL is used by SurfaceFlinger to composite surfaces to
the screen. The Hardware Composer abstracts objects such as overlays and 2D
blitters and helps offload work that would normally be done with OpenGL.</p>

<p>We recommend you start with version 1.3 of the Hardware Composer HAL, as it
provides support for the newest features (explicit synchronization, external
displays, and more). Because the physical display hardware behind the Hardware
Composer abstraction layer can vary from device to device, it is difficult to
define recommended features. But here is some guidance:</p>

<ul>
  <li> The Hardware Composer should support at least four overlays (status
  bar, system bar, application, and wallpaper/background).
  <li> Layers can be bigger than the screen, so the Hardware Composer should be
  able to handle layers that are larger than the display (for example, a
  wallpaper).
  <li> Pre-multiplied per-pixel alpha blending and per-plane alpha blending
  should be supported at the same time.
  <li> The Hardware Composer should be able to consume the same buffers that
  the GPU, camera, video decoder, and Skia are producing, so supporting some
  of the following properties is helpful:
  <ul>
    <li> RGBA packing order
    <li> YUV formats
    <li> Tiling, swizzling, and stride properties
  </ul>
  <li> A hardware path for protected video playback must be present if you
  want to support protected content.
</ul>

<p>The general recommendation when implementing your Hardware Composer is to
implement a non-operational Hardware Composer first. Once you have the
structure done, implement a simple algorithm to delegate composition to the
Hardware Composer. For example, delegate just the first three or four surfaces
to the overlay hardware of the Hardware Composer.</p>

<p>After that, focus on optimization, such as intelligently selecting the
surfaces to send to the overlay hardware to maximize the load taken off the
GPU. Another optimization is to detect whether the screen is updating. If not,
delegate composition to OpenGL instead of the Hardware Composer to save power.
When the screen updates again, continue to offload composition to the Hardware
Composer.</p>

<p>Devices must report the display mode (or resolution). Android uses the first
mode reported by the device. To support televisions, have the TV device report
the mode selected for it by the manufacturer to the Hardware Composer. See
<code>hwcomposer.h</code> for more details.</p>

<p>Prepare for common use cases, such as:</p>

<ul>
  <li> Full-screen games in portrait and landscape mode
  <li> Full-screen video with closed captioning and playback control
  <li> The home screen (compositing the status bar, system bar, application
  window, and live wallpapers)
  <li> Protected video playback
  <li> Multiple display support
</ul>

<p>These use cases should address regular, predictable uses rather than edge
cases that are rarely encountered; otherwise, any optimization will have
little benefit. Implementations must balance two competing goals: animation
smoothness and interaction latency.</p>

<p>Further, to make the best use of Android graphics, you must develop a robust
clocking strategy. Performance matters little if clocks have been turned down
so far that every operation is slow. You need a clocking strategy that puts
the clocks at high speed when needed, such as to make animations seamless, and
then slows the clocks whenever the increased speed is no longer needed.</p>

<p>Use the <code>adb shell dumpsys SurfaceFlinger</code> command to see
precisely what SurfaceFlinger is doing. See the <a
href="{@docRoot}devices/graphics/architecture.html#hwcomposer">Hardware
Composer</a> section of the Architecture page for example output and a
description of relevant fields.</p>

<p>You can find the HAL for the Hardware Composer and additional documentation
in <code>hardware/libhardware/include/hardware/hwcomposer.h</code> and
<code>hardware/libhardware/include/hardware/hwcomposer_defs.h</code>.</p>

<p>A stub implementation is available in the
<code>hardware/libhardware/modules/hwcomposer</code> directory.</p>

<h3 id=vsync>VSYNC</h3>

<p>VSYNC synchronizes certain events to the refresh cycle of the display.
Applications always start drawing on a VSYNC boundary, and SurfaceFlinger
always composites on a VSYNC boundary. This eliminates stutters and improves
the visual performance of graphics. The Hardware Composer has a function
pointer:</p>

<pre class=prettyprint>
int (*waitForVsync)(int64_t *timestamp)
</pre>

<p>This points to a function you must implement for VSYNC. This function blocks
until a VSYNC occurs and returns the timestamp of the actual VSYNC. A message
must be sent every time VSYNC occurs. A client can receive a VSYNC timestamp
once, at specified intervals, or continuously (interval of 1). You must
implement VSYNC with no more than 1 ms of lag (0.5 ms or less is recommended),
and the timestamps returned must be extremely accurate.</p>
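<p>The block-until-VSYNC-and-return-a-timestamp behavior can be sketched with a
condition variable signaled from the display interrupt path. This is a
hypothetical skeleton, not the HAL's actual implementation: the names are ours,
and the interrupt is simulated by an ordinary thread so the sketch is
self-contained:</p>

```c
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical driver state: the display interrupt handler records the
 * VSYNC timestamp and wakes any waiters. */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_cond = PTHREAD_COND_INITIALIZER;
static int64_t g_last_vsync_ns = 0;

/* Called from the (here, simulated) VSYNC interrupt path. */
static void vsync_irq(int64_t timestamp_ns) {
    pthread_mutex_lock(&g_lock);
    g_last_vsync_ns = timestamp_ns;
    pthread_cond_broadcast(&g_cond);
    pthread_mutex_unlock(&g_lock);
}

/* Blocks until the next VSYNC occurs, then returns its timestamp. */
int wait_for_vsync(int64_t *timestamp) {
    pthread_mutex_lock(&g_lock);
    int64_t seen = g_last_vsync_ns;
    while (g_last_vsync_ns == seen)          /* wait for a new edge */
        pthread_cond_wait(&g_cond, &g_lock);
    *timestamp = g_last_vsync_ns;
    pthread_mutex_unlock(&g_lock);
    return 0;
}

/* Simulates one hardware VSYNC arriving ~5 ms from now. */
static void *sim_vsync_thread(void *arg) {
    (void)arg;
    struct timespec ts = {0, 5 * 1000 * 1000};
    nanosleep(&ts, NULL);
    vsync_irq(16666667);  /* arbitrary timestamp for the sketch */
    return NULL;
}
```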

<h4 id=explicit_synchronization>Explicit synchronization</h4>

<p>Explicit synchronization is required. It provides a mechanism for gralloc
buffers to be acquired and released in a synchronized way, allowing producers
and consumers of graphics buffers to signal when they are done with a buffer.
This lets the Android system asynchronously queue buffers to be read or
written with the certainty that another consumer or producer does not
currently need them. See the <a
href="#synchronization_framework">Synchronization framework</a> section for an
overview of this mechanism.</p>

<p>The benefits of explicit synchronization include less behavior variation
between devices, better debugging support, and improved testing metrics. For
instance, the sync framework output readily identifies problem areas and root
causes, and centralized SurfaceFlinger presentation timestamps show when
events occur in the normal flow of the system.</p>

<p>This communication is facilitated by synchronization fences, which are
required when requesting a buffer for consuming or producing. The
synchronization framework consists of three main building blocks:
sync_timeline, sync_pt, and sync_fence.</p>

<h5 id=sync_timeline>sync_timeline</h5>

<p>A sync_timeline is a monotonically increasing timeline that should be
implemented for each driver instance, such as a GL context, display controller,
or 2D blitter. It is essentially a counter of jobs submitted to the kernel for
a particular piece of hardware. It provides guarantees about the order of
operations and allows hardware-specific implementations.</p>

<p>Note that a CPU-only reference implementation of sync_timeline, called
sw_sync (software sync), is provided. If possible, use sw_sync instead of a
custom sync_timeline to save resources and avoid complexity. If you are not
synchronizing against a hardware resource, sw_sync should be sufficient.</p>

<p>If you must implement a sync_timeline, use the sw_sync driver as a starting
point. Follow these guidelines:</p>

<ul>
  <li> Provide useful names for all drivers, timelines, and fences. This
  simplifies debugging.
  <li> Implement the <code>timeline_value_str</code> and
  <code>pt_value_str</code> operators in your timelines, as they make debugging
  output much more readable.
  <li> If you want your userspace libraries (such as the GL library) to have
  access to the private data of your timelines, implement the
  <code>fill_driver_data</code> operator. This lets you get information about
  the immutable sync_fence and sync_pts so you can build command lines based
  upon them.
</ul>

<p>When implementing a sync_timeline, <strong>don’t</strong>:</p>

<ul>
  <li> Base it on any real view of time, such as when a wall clock or other
  piece of work might finish. It is better to create an abstract timeline that
  you can control.
  <li> Allow userspace to explicitly create or signal a fence. This would let
  one piece of the user pipeline mount a denial-of-service attack that halts
  all functionality, because userspace cannot make promises on behalf of the
  kernel.
  <li> Access sync_timeline, sync_pt, or sync_fence elements explicitly, as the
  API provides all required functions.
</ul>

<h5 id=sync_pt>sync_pt</h5>

<p>A sync_pt is a single value or point on a sync_timeline. A point has three
states: active, signaled, and error. Points start in the active state and
transition to the signaled or error state. For instance, when a buffer is no
longer needed by an image consumer, its sync_pt is signaled so that image
producers know it is okay to write into the buffer again.</p>

<h5 id=sync_fence>sync_fence</h5>

<p>A sync_fence is a collection of sync_pts that often have different
sync_timeline parents (such as the display controller and GPU). Sync fences
are the main primitives through which drivers and userspace communicate their
dependencies. A fence is a promise the kernel makes upon accepting queued
work: that the work will complete in a finite amount of time.</p>

<p>This allows multiple consumers or producers to signal that they are using a
buffer, and allows that information to be communicated with a single function
parameter. Fences are backed by a file descriptor and can be passed from
kernel space to userspace. For instance, a fence can contain two sync_pts that
signify when two separate image consumers are done reading a buffer. When the
fence is signaled, the image producer knows both consumers are done
consuming.</p>

<p>Fences, like sync_pts, start active and then change state based upon the
state of their points. If all sync_pts become signaled, the sync_fence becomes
signaled. If one sync_pt falls into an error state, the entire sync_fence has
an error state.</p>

<p>Membership in a sync_fence is immutable once the fence is created. And
because a sync_pt can be in only one fence, it is included as a copy; even if
two points have the same value, there will be two copies of the sync_pt in the
fence.</p>

<p>To get more than one point into a fence, a merge operation is conducted: the
points from two distinct fences are added to a third fence. If one of those
points was signaled in the originating fence and the other was not, the third
fence will not be in a signaled state either.</p>
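<p>The state and merge rules above can be captured in a few lines of C. This is
a toy model for illustration only (the type and function names are ours); the
real primitives live in the kernel sync driver and are reached from userspace
through <code>system/core/libsync</code>:</p>

```c
#include <assert.h>
#include <string.h>

/* Toy model of the semantics described above, not the kernel API: a fence
 * is a small bag of point states, and points are copied in, never shared. */
enum pt_state { PT_ACTIVE, PT_SIGNALED, PT_ERROR };

struct model_fence {
    enum pt_state pts[8];
    int count;
};

/* Fence state derives from its points: any error poisons the fence, any
 * active point keeps it active, and all-signaled means signaled. */
enum pt_state fence_state(const struct model_fence *f) {
    enum pt_state s = PT_SIGNALED;
    for (int i = 0; i < f->count; i++) {
        if (f->pts[i] == PT_ERROR) return PT_ERROR;
        if (f->pts[i] == PT_ACTIVE) s = PT_ACTIVE;
    }
    return s;
}

/* Merge copies the points of both fences into a third fence. */
struct model_fence fence_merge(const struct model_fence *a,
                               const struct model_fence *b) {
    struct model_fence out;
    out.count = a->count + b->count;
    memcpy(out.pts, a->pts, a->count * sizeof(a->pts[0]));
    memcpy(out.pts + a->count, b->pts, b->count * sizeof(b->pts[0]));
    return out;
}
```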

<p>To implement explicit synchronization, you need to provide the
following:</p>

<ul>
  <li> A kernel-space driver that implements a synchronization timeline for a
  particular piece of hardware. Drivers that need to be fence-aware are
  generally anything that accesses or communicates with the Hardware Composer.
  Here are the key files (found in the android-3.4 kernel branch):
  <ul>
    <li> Core implementation:
    <ul>
      <li> <code>kernel/common/include/linux/sync.h</code>
      <li> <code>kernel/common/drivers/base/sync.c</code>
    </ul>
    <li> sw_sync:
    <ul>
      <li> <code>kernel/common/include/linux/sw_sync.h</code>
      <li> <code>kernel/common/drivers/base/sw_sync.c</code>
    </ul>
    <li> Documentation:
    <code>kernel/common/Documentation/sync.txt</code>
  </ul>
  Finally, the <code>platform/system/core/libsync</code> directory includes a
  library to communicate with the kernel space.
  <li> A Hardware Composer HAL module (version 1.3 or later) that supports the
  new synchronization functionality. You will need to provide the appropriate
  synchronization fences as parameters to the <code>set()</code> and
  <code>prepare()</code> functions in the HAL.
  <li> Two GL-specific extensions related to fences,
  <code>EGL_ANDROID_native_fence_sync</code> and
  <code>EGL_ANDROID_wait_sync</code>, along with incorporating fence support
  into your graphics drivers.
</ul>

<p>For example, to use the API supporting the synchronization function, you
might develop a display driver that has a display buffer function. Before the
synchronization framework existed, this function would receive dma-bufs, put
those buffers on the display, and block while the buffer was visible, like
so:</p>

<pre class=prettyprint>
/*
 * assumes buf is ready to be displayed. returns when buffer is no longer on
 * screen.
 */
void display_buffer(struct dma_buf *buf);
</pre>

<p>With the synchronization framework, the API call is slightly more complex.
While putting a buffer on display, you associate it with a fence that says
when the buffer will be ready. So you queue up the work, which you initiate
once the fence clears.</p>

<p>In this manner, you are not blocking anything. You immediately return your
own fence, which is a guarantee of when the buffer will be off of the display.
As you queue up buffers, the kernel will list the dependencies. With the
synchronization framework:</p>

<pre class=prettyprint>
/*
 * will display buf when fence is signaled. returns immediately with a fence
 * that will signal when buf is no longer displayed.
 */
struct sync_fence* display_buffer(struct dma_buf *buf,
                                  struct sync_fence *fence);
</pre>

<h4 id=sync_integration>Sync integration</h4>

<h5 id=integration_conventions>Integration conventions</h5>

<p>This section explains how to integrate the low-level sync framework with
different parts of the Android framework and the drivers that need to
communicate with one another.</p>

<p>The Android HAL interfaces for graphics follow consistent conventions, so
when file descriptors are passed across a HAL interface, ownership of the file
descriptor is always transferred. This means:</p>

<ul>
  <li> If you receive a fence file descriptor from the sync framework, you
  must close it.
  <li> If you return a fence file descriptor to the sync framework, the
  framework will close it.
  <li> If you want to continue using the fence file descriptor, you must
  duplicate the descriptor.
</ul>
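<p>Since fence file descriptors are ordinary POSIX descriptors, the third rule
is a plain <code>dup()</code>. A minimal sketch (the function name is
illustrative) of a consumer that takes ownership of a passed-in fence fd but
needs to keep watching the fence after the call returns:</p>

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical consumer: per the HAL convention it owns acquire_fd and
 * must close it, so it duplicates the descriptor first to keep a copy. */
int consume_fence(int acquire_fd) {
    int kept_fd = dup(acquire_fd);  /* private copy for continued use */
    close(acquire_fd);              /* we own the passed-in fd: close it */
    return kept_fd;                 /* our caller now owns the copy */
}
```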

<p>Every time a fence passes through BufferQueue (such as for a window that
passes a fence to BufferQueue saying when its new contents will be ready), the
fence object is renamed. Since kernel fence support allows fences to have
string names, the sync framework names the fence using the window name and the
index of the buffer being queued, for example: <code>SurfaceView:0</code></p>

<p>This is helpful in debugging to identify the source of a deadlock, as the
names appear in the output of <code>/d/sync</code> and in bug reports.</p>

<h5 id=anativewindow_integration>ANativeWindow integration</h5>

<p>ANativeWindow is fence-aware: <code>dequeueBuffer</code>,
<code>queueBuffer</code>, and <code>cancelBuffer</code> all have fence
parameters.</p>

<h5 id=opengl_es_integration>OpenGL ES integration</h5>

<p>OpenGL ES sync integration relies upon these two EGL extensions:</p>

<ul>
  <li> <code>EGL_ANDROID_native_fence_sync</code> provides a way to either wrap
  or create native Android fence file descriptors in EGLSyncKHR objects.
  <li> <code>EGL_ANDROID_wait_sync</code> allows the GPU, rather than the CPU,
  to stall, making the GPU wait for an EGLSyncKHR. This is essentially the
  same as the <code>EGL_KHR_wait_sync</code> extension; see the
  <code>EGL_KHR_wait_sync</code> specification for details.
</ul>

<p>These extensions can be used independently and are controlled by a compile
flag in libgui. To use them, first implement the
<code>EGL_ANDROID_native_fence_sync</code> extension along with the associated
kernel support. Next, add ANativeWindow support for fences to your driver, and
then turn on support in libgui to make use of the
<code>EGL_ANDROID_native_fence_sync</code> extension.</p>

<p>Then, as a second pass, enable the <code>EGL_ANDROID_wait_sync</code>
extension in your driver and turn it on separately. The
<code>EGL_ANDROID_native_fence_sync</code> extension defines a distinct native
fence EGLSync object type, so extensions that apply to existing EGLSync object
types don’t necessarily apply to <code>EGL_ANDROID_native_fence</code>
objects, avoiding unwanted interactions.</p>

<p>The EGL_ANDROID_native_fence_sync extension employs a corresponding native
fence file descriptor attribute that can be set only at creation time and
cannot be directly queried from an existing sync object. This attribute can be
set to one of two modes:</p>

<ul>
  <li> A valid fence file descriptor wraps an existing native Android fence
  file descriptor in an EGLSyncKHR object.
  <li> -1 creates a native Android fence file descriptor from an EGLSyncKHR
  object.
</ul>

<p>The <code>eglDupNativeFenceFDANDROID</code> function call is used to extract
the native Android fence file descriptor from the EGLSyncKHR object. This has
the same result as querying the attribute that was set, but adheres to the
convention that the recipient closes the fence (hence the duplicate
operation). Finally, destroying the EGLSync object should close the internal
fence attribute.</p>

<h5 id=hardware_composer_integration>Hardware Composer integration</h5>

<p>Hardware Composer handles three types of sync fences:</p>

<ul>
  <li> <em>Acquire fence</em> - One per layer, this is set before calling
  <code>HWC::set</code>. It signals when Hardware Composer may read the
  buffer.
  <li> <em>Release fence</em> - One per layer, this is filled in by the driver
  in <code>HWC::set</code>. It signals when Hardware Composer is done reading
  the buffer, so the framework can start using that buffer again for that
  particular layer.
  <li> <em>Retire fence</em> - One per frame, this is filled in by the driver
  each time <code>HWC::set</code> is called, and it covers all of the layers
  for the set operation. It signals to the framework when all of the effects
  of this set operation have completed; the retire fence signals when the
  next set operation takes place on the screen.
</ul>

<p>The retire fence can be used to determine how long each frame appears on
the screen. This is useful in identifying the location and source of delays,
such as a stuttering animation.</p>

<h4 id=vsync_offset>VSYNC Offset</h4>

<p>Application and SurfaceFlinger render loops should be synchronized to the
hardware VSYNC. On a VSYNC event, the display begins showing frame N while
SurfaceFlinger begins compositing windows for frame N+1. The app handles
pending input and generates frame N+2.</p>

<p>Synchronizing with VSYNC delivers consistent latency. It reduces errors in
apps and SurfaceFlinger and the drifting of displays in and out of phase with
each other. This, however, assumes that application and SurfaceFlinger
per-frame times don’t vary widely. Nevertheless, the latency is at least two
frames.</p>

<p>To remedy this, you may employ VSYNC offsets to reduce the input-to-display
latency by making the application and composition signals relative to hardware
VSYNC. This is possible because application plus composition usually takes
less than 33 ms.</p>

<p>The result of VSYNC offsets is three signals with the same period but
offset phases:</p>

<ul>
  <li> <em>HW_VSYNC_0</em> - The display begins showing the next frame.
  <li> <em>VSYNC</em> - The app reads input and generates the next frame.
  <li> <em>SF VSYNC</em> - SurfaceFlinger begins compositing for the next
  frame.
</ul>

<p>With VSYNC offsets, SurfaceFlinger receives the buffer and composites the
frame, while the application processes the input and renders the frame, all
within a single frame of time.</p>

<p>Note that VSYNC offsets reduce the time available for app and composition
and therefore provide a greater chance for error.</p>

<h5 id=dispsync>DispSync</h5>

<p>DispSync maintains a model of the periodic hardware-based VSYNC events of a
display and uses that model to execute periodic callbacks at specific phase
offsets from the hardware VSYNC events.</p>

<p>DispSync is essentially a software phase-locked loop (PLL) that generates
the VSYNC and SF VSYNC signals used by Choreographer and SurfaceFlinger, even
if they are not offset from hardware VSYNC.</p>
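<p>The scheduling a model like this performs reduces to modular arithmetic:
given the modeled period and phase plus a signed offset, find the next event
time at or after "now". The sketch below shows only that arithmetic (the
function name and signature are ours, not the SurfaceFlinger implementation):</p>

```c
#include <assert.h>
#include <stdint.h>

/* Next callback time at offset_ns from a modeled VSYNC edge, at or after
 * now_ns. phase_ns is the modeled timestamp of some past hardware VSYNC,
 * and offset_ns may be negative (an event shortly before each edge). */
int64_t next_callback_ns(int64_t now_ns, int64_t phase_ns,
                         int64_t period_ns, int64_t offset_ns) {
    int64_t ref = phase_ns + offset_ns;          /* offset timeline origin */
    int64_t n = (now_ns - ref) / period_ns;      /* whole periods elapsed */
    int64_t t = ref + n * period_ns;
    while (t < now_ns)                           /* step up to >= now */
        t += period_ns;
    return t;
}
```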

<img src="images/dispsync.png" alt="DispSync flow">

<p class="img-caption"><strong>Figure 4.</strong> DispSync flow</p>

<p>DispSync has these qualities:</p>

<ul>
  <li> <em>Reference</em> - HW_VSYNC_0
  <li> <em>Output</em> - VSYNC and SF VSYNC
  <li> <em>Feedback</em> - Retire fence signal timestamps from Hardware
  Composer
</ul>

<h5 id=vsync_retire_offset>VSYNC/Retire Offset</h5>

<p>The signal timestamp of retire fences must match HW VSYNC, even on devices
that don’t use the offset phase. Otherwise, errors appear more severe than
they really are.</p>

<p>“Smart” panels often have a delta: the retire fence marks the end of direct
memory access (DMA) to display memory, but the actual display switch and HW
VSYNC occur some time later.</p>

<p><code>PRESENT_TIME_OFFSET_FROM_VSYNC_NS</code> is set in the device’s
BoardConfig.mk make file. It is based upon the display controller and panel
characteristics, and gives the time from the retire fence timestamp to the HW
VSYNC signal, in nanoseconds.</p>

<h5 id=vsync_and_sf_vsync_offsets>VSYNC and SF_VSYNC Offsets</h5>

<p>The <code>VSYNC_EVENT_PHASE_OFFSET_NS</code> and
<code>SF_VSYNC_EVENT_PHASE_OFFSET_NS</code> values are set conservatively
based on high-load use cases, such as partial GPU composition during window
transitions or Chrome scrolling through a webpage containing animations. These
offsets allow for long application render times and long GPU composition
times.</p>

<p>More than a millisecond or two of latency is noticeable. We recommend
integrating thorough automated error testing to minimize latency without
significantly increasing error counts.</p>

<p>Note that these offsets are also set in the device’s BoardConfig.mk make
file. The default, if not set, is zero offset. Both settings are offsets in
nanoseconds after HW_VSYNC_0; either can be negative.</p>
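<p>For reference, this tuning lands in BoardConfig.mk as ordinary make
variables. The values below are placeholders for illustration only, not
recommendations; real values must be derived from measurements on the actual
panel and display controller:</p>

```make
# Illustrative BoardConfig.mk fragment; placeholder values only.
VSYNC_EVENT_PHASE_OFFSET_NS := 7500000
SF_VSYNC_EVENT_PHASE_OFFSET_NS := 5000000
PRESENT_TIME_OFFSET_FROM_VSYNC_NS := 0
```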

<h3 id=virtual_displays>Virtual displays</h3>

<p>Android added support for virtual displays to Hardware Composer in version
1.3. This support is implemented in the Android platform and can be used by
Miracast.</p>

<p>Virtual display composition is similar to physical display composition:
input layers are described in <code>prepare()</code>, SurfaceFlinger conducts
GPU composition, and the layers and GPU framebuffer are provided to Hardware
Composer in <code>set()</code>.</p>

<p>Instead of the output going to the screen, it is sent to a gralloc buffer.
Hardware Composer writes output to the buffer and provides the completion
fence. The buffer is sent to an arbitrary consumer: video encoder, GPU, CPU,
and so on. Virtual displays can use the 2D/blitter or overlays if the display
pipeline can write to memory.</p>
| 540 | |
| 541 | <h4 id=modes>Modes</h4> |
| 542 | |
| 543 | <p>Each frame is in one of three modes after prepare():</p> |
| 544 | |
| 545 | <ul> <li> <em>GLES</em> - All layers composited by GPU. GPU writes directly to |
| 546 | the output buffer while Hardware Composer does nothing. This is equivalent to |
| 547 | virtual display composition with Hardware Composer <1.3. <li> <em>MIXED</em> - |
| 548 | GPU composites some layers to framebuffer, and Hardware Composer composites |
| 549 | framebuffer and remaining layers. GPU writes to scratch buffer (framebuffer). |
| 550 | Hardware Composer reads scratch buffer and writes to the output buffer. Buffers |
| 551 | may have different formats, e.g. RGBA and YCbCr. <li> <em>HWC</em> - All |
| 552 | layers composited by Hardware Composer. Hardware Composer writes directly to |
| 553 | the output buffer. </ul> |
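The three modes follow directly from how prepare() partitions the layer list
between the GPU and the overlay engine. The sketch below is a simplified,
hypothetical classification in C; the names are local stand-ins for the real
HWC 1.3 types (`hwc_layer_1_t` and its `compositionType` field in
hwcomposer.h), not the actual HAL API:

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for hwc_layer_1_t composition types (hypothetical names). */
enum composition_type { COMP_FRAMEBUFFER, COMP_OVERLAY };
enum frame_mode { MODE_GLES, MODE_MIXED, MODE_HWC };

/* After prepare(), classify the frame by how many layers were left to the
 * GPU (COMP_FRAMEBUFFER) versus claimed by overlays (COMP_OVERLAY). */
static enum frame_mode classify_frame(const enum composition_type *layers,
                                      size_t count)
{
    size_t gpu = 0, hwc = 0;
    for (size_t i = 0; i < count; i++) {
        if (layers[i] == COMP_FRAMEBUFFER)
            gpu++;
        else
            hwc++;
    }
    if (hwc == 0)
        return MODE_GLES;  /* GPU composites all layers to the output buffer */
    if (gpu == 0)
        return MODE_HWC;   /* Hardware Composer writes the output directly */
    return MODE_MIXED;     /* GPU scratch buffer plus remaining overlays */
}
```
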
| 554 | |
| 555 | <h4 id=output_format>Output format</h4> |
| 556 | |
<p><em>MIXED and HWC modes</em>: If the consumer needs CPU access, the consumer
chooses the format. Otherwise, the format is IMPLEMENTATION_DEFINED, and
gralloc can choose the best format based on the usage flags. For example,
gralloc may choose a YCbCr format if the consumer is a video encoder and
Hardware Composer can write that format efficiently.</p>
| 562 | |
<p><em>GLES mode</em>: The EGL driver chooses the output buffer format in
dequeueBuffer(), typically RGBA8888. The consumer must be able to accept this
format.</p>
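The usage-driven format choice described above can be sketched as a small
decision function. This is an assumed illustration, not gralloc's actual
implementation; the flag and format macros are local stand-ins modeled on the
GRALLOC_USAGE_* and HAL_PIXEL_FORMAT_* constants:

```c
/* Local stand-ins for gralloc usage bits and pixel formats (hypothetical). */
#define USAGE_SW_READ_OFTEN    (1u << 0)  /* consumer wants CPU access */
#define USAGE_HW_VIDEO_ENCODER (1u << 1)  /* consumer is a video encoder */

#define FORMAT_RGBA_8888     1
#define FORMAT_YCbCr_420_888 2

/* Resolve an IMPLEMENTATION_DEFINED virtual-display buffer format from
 * usage flags, preferring YCbCr for video-encoder consumers. */
static int choose_format(unsigned usage)
{
    /* A CPU consumer picks its own format; default to RGBA8888 here. */
    if (usage & USAGE_SW_READ_OFTEN)
        return FORMAT_RGBA_8888;
    /* A video encoder is better served by a YCbCr layout, provided
     * Hardware Composer can write that format efficiently. */
    if (usage & USAGE_HW_VIDEO_ENCODER)
        return FORMAT_YCbCr_420_888;
    return FORMAT_RGBA_8888;
}
```
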
| 566 | |
| 567 | <h4 id=egl_requirement>EGL requirement</h4> |
| 568 | |
<p>Hardware Composer 1.3 virtual displays require that eglSwapBuffers() not
dequeue the next buffer immediately; instead, it should defer dequeueing until
rendering begins. Otherwise, EGL always owns the &ldquo;next&rdquo; output
buffer, and SurfaceFlinger cannot get the output buffer for Hardware Composer
in MIXED/HWC mode.</p>
| 574 | |
<p>If Hardware Composer always sends all virtual display layers to the GPU, all
frames will be in GLES mode. Although it is not recommended, you may use this
approach if you need to support Hardware Composer 1.3 for some other reason but
cannot conduct virtual display composition.</p>
| 579 | |
| 580 | <h2 id=testing>Testing</h2> |
| 581 | |
| 582 | <p>For benchmarking, we suggest following this flow by phase:</p> |
| 583 | |
| 584 | <ul> <li> <em>Specification</em> - When initially specifying the device, such |
| 585 | as when using immature drivers, you should use predefined (fixed) clocks and |
| 586 | workloads to measure the frames per second rendered. This gives a clear view of |
| 587 | what the hardware is capable of doing. <li> <em>Development</em> - In the |
| 588 | development phase as drivers mature, you should use a fixed set of user actions |
| 589 | to measure the number of visible stutters (janks) in animations. <li> |
| 590 | <em>Production</em> - Once the device is ready for production and you want to |
| 591 | compare against competitors, you should increase the workload until stutters |
| 592 | increase. Determine if the current clock settings can keep up with the load. |
| 593 | This can help you identify where you might be able to slow the clocks and |
| 594 | reduce power use. </ul> |
| 595 | |
| 596 | <p>For the specification phase, Android offers the Flatland tool to help derive |
| 597 | device capabilities. It can be found at: |
| 598 | <code>platform/frameworks/native/cmds/flatland/</code></p> |
| 599 | |
<p>Flatland relies upon fixed clocks and shows the throughput that can be
achieved with composition-based workloads. It uses gralloc buffers to simulate
multiple window scenarios, filling the windows with GL and then measuring the
compositing. Note that Flatland uses the synchronization framework to measure
time, so you must support the synchronization framework to readily use
Flatland.</p>