Heidi von Markham	fd022c7	2016-06-30 10:15:28 -0700	[diff] [blame]	1	page.title=Implementing VSYNC
				2	@jd:body
				3
				4	<!--
				5	Copyright 2016 The Android Open Source Project
				6
				7	Licensed under the Apache License, Version 2.0 (the "License");
				8	you may not use this file except in compliance with the License.
				9	You may obtain a copy of the License at
				10
				11	http://www.apache.org/licenses/LICENSE-2.0
				12
				13	Unless required by applicable law or agreed to in writing, software
				14	distributed under the License is distributed on an "AS IS" BASIS,
				15	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
				16	See the License for the specific language governing permissions and
				17	limitations under the License.
				18	-->
				19
				20	<div id="qv-wrapper">
				21	<div id="qv">
				22	<h2>In this document</h2>
				23	<ol id="auto-toc">
				24	</ol>
				25	</div>
				26	</div>
				27
				28
				29	<p>VSYNC synchronizes certain events to the refresh cycle of the display.
				30	Applications always start drawing on a VSYNC boundary, and SurfaceFlinger
				31	always composites on a VSYNC boundary. This eliminates stutters and improves
				32	visual performance of graphics.</p>
				33
				34	<p>The Hardware Composer (HWC) has a function pointer indicating the function
				35	to implement for VSYNC:</p>
				36
<pre class=prettyprint>int (*waitForVsync)(int64_t *timestamp);</pre>
				38
				39	<p>This function blocks until a VSYNC occurs and returns the timestamp of the
				40	actual VSYNC. A message must be sent every time VSYNC occurs. A client can
				41	receive a VSYNC timestamp once at specified intervals or continuously at
				42	intervals of 1. You must implement VSYNC with a maximum 1 ms lag (0.5 ms or less
				43	is recommended); timestamps returned must be extremely accurate.</p>
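<p>As a rough illustration of the timing requirements above, the following
sketch computes when the next VSYNC is due and checks a reported timestamp
against the 1 ms limit. The helper names <code>next_vsync_ns</code> and
<code>vsync_lag_ok</code> are hypothetical and are not part of the HWC
interface; a real implementation would take its timestamps from the display
hardware.</p>

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the HWC interface): returns the time of
 * the first VSYNC strictly after now_ns, given one known VSYNC timestamp
 * (ref_ns) and the refresh period. Assumes now_ns >= ref_ns.
 */
int64_t next_vsync_ns(int64_t now_ns, int64_t ref_ns, int64_t period_ns)
{
    assert(period_ns > 0 && now_ns >= ref_ns);
    return ref_ns + ((now_ns - ref_ns) / period_ns + 1) * period_ns;
}

/*
 * Hypothetical check: returns nonzero if a reported timestamp is within
 * 1 ms of the actual VSYNC time, the maximum lag stated in the text.
 */
int vsync_lag_ok(int64_t reported_ns, int64_t actual_ns)
{
    const int64_t max_lag_ns = 1000000; /* 1 ms */
    int64_t lag = reported_ns - actual_ns;

    if (lag < 0)
        lag = -lag;
    return lag <= max_lag_ns;
}
```

<p>For a 60 Hz panel the period is roughly 16,666,667 ns, so a caller at
time 20 ms with a reference VSYNC at time 0 would expect the next VSYNC at
about 33.3 ms.</p>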
				44
				45	<h2 id=explicit_synchronization>Explicit synchronization</h2>
				46
				47	<p>Explicit synchronization is required and provides a mechanism for Gralloc
				48	buffers to be acquired and released in a synchronized way. Explicit
				49	synchronization allows producers and consumers of graphics buffers to signal
				50	when they are done with a buffer. This allows Android to asynchronously queue
				51	buffers to be read or written with the certainty that another consumer or
				52	producer does not currently need them. For details, see
				53	<a href="{@docRoot}devices/graphics/index.html#synchronization_framework">Synchronization
				54	framework</a>.</p>
				55
				56	<p>The benefits of explicit synchronization include less behavior variation
				57	between devices, better debugging support, and improved testing metrics. For
				58	instance, the sync framework output readily identifies problem areas and root
				59	causes, and centralized SurfaceFlinger presentation timestamps show when events
				60	occur in the normal flow of the system.</p>
				61
				62	<p>This communication is facilitated by the use of synchronization fences,
				63	which are required when requesting a buffer for consuming or producing. The
				64	synchronization framework consists of three main building blocks:
				65	<code>sync_timeline</code>, <code>sync_pt</code>, and <code>sync_fence</code>.</p>
				66
				67	<h3 id=sync_timeline>sync_timeline</h3>
				68
				69	<p>A <code>sync_timeline</code> is a monotonically increasing timeline that
				70	should be implemented for each driver instance, such as a GL context, display
				71	controller, or 2D blitter. This is essentially a counter of jobs submitted to
				72	the kernel for a particular piece of hardware. It provides guarantees about the
				73	order of operations and allows hardware-specific implementations.</p>
				74
				75	<p>The sync_timeline is offered as a CPU-only reference implementation called
				76	<code>sw_sync</code> (software sync). If possible, use this instead of a
				77	<code>sync_timeline</code> to save resources and avoid complexity. If you’re not
				78	employing a hardware resource, <code>sw_sync</code> should be sufficient.</p>
				79
				80	<p>If you must implement a <code>sync_timeline</code>, use the
				81	<code>sw_sync</code> driver as a starting point. Follow these guidelines:</p>
				82
				83	<ul>
				84	<li>Provide useful names for all drivers, timelines, and fences. This simplifies
				85	debugging.</li>
				86	<li>Implement <code>timeline_value_str</code> and <code>pt_value_str</code>
				87	operators in your timelines to make debugging output more readable.</li>
<li>If you want your userspace libraries (such as the GL library) to have access
to the private data of your timelines, implement the
<code>fill_driver_data</code> operator. This lets you get information about the
immutable <code>sync_fence</code> and <code>sync_pts</code> so you can build
command lines based upon them.</li>
				92	</ul>
				93
				94	<p>When implementing a <code>sync_timeline</code>, <strong>do not</strong>:</p>
				95
				96	<ul>
				97	<li>Base it on any real view of time, such as when a wall clock or other piece
				98	of work might finish. It is better to create an abstract timeline that you can
				99	control.</li>
				100	<li>Allow userspace to explicitly create or signal a fence. This can result in
				101	one piece of the user pipeline creating a denial-of-service attack that halts
				102	all functionality. This is because the userspace cannot make promises on behalf
				103	of the kernel.</li>
				104	<li>Access <code>sync_timeline</code>, <code>sync_pt</code>, or
				105	<code>sync_fence</code> elements explicitly, as the API should provide all
				106	required functions.</li>
				107	</ul>
				108
				109	<h3 id=sync_pt>sync_pt</h3>
				110
<p>A <code>sync_pt</code> is a single value or point on a
<code>sync_timeline</code>. A point has three states: active, signaled, and
error. Points start in the active state and transition to the signaled or error
state. For instance, when a buffer is no longer needed by an image consumer,
this <code>sync_pt</code> is signaled so image producers know it is okay to
write into the buffer again.</p>
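<p>The relationship between a timeline counter and the state of a point can be
sketched in plain userspace C. The types and functions below
(<code>toy_timeline</code>, <code>toy_pt</code>, and so on) are illustrative
assumptions and are not the kernel structures; they only model the state rules
described above.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Toy userspace model; these types are illustrative, not the kernel's. */
enum pt_state { PT_ACTIVE, PT_SIGNALED, PT_ERROR };

struct toy_timeline {
    uint32_t value;   /* monotonically increasing count of completed jobs */
};

struct toy_pt {
    const struct toy_timeline *parent;
    uint32_t value;   /* timeline value at which this point signals */
    int error;        /* nonzero once the associated job has failed */
};

/* Advances the timeline; the counter only ever moves forward. */
void toy_timeline_inc(struct toy_timeline *tl, uint32_t count)
{
    assert(tl);
    tl->value += count;
}

/* A point is signaled once its parent timeline has reached its value. */
enum pt_state toy_pt_state(const struct toy_pt *pt)
{
    assert(pt && pt->parent);
    if (pt->error)
        return PT_ERROR;
    /* The signed difference keeps the comparison valid across wraparound. */
    if ((int32_t)(pt->parent->value - pt->value) >= 0)
        return PT_SIGNALED;
    return PT_ACTIVE;
}
```

<p>In this model a point with value 2 stays active until the timeline has
been incremented twice, which matches the idea of the timeline as a counter of
completed jobs.</p>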
				116
				117	<h3 id=sync_fence>sync_fence</h3>
				118
				119	<p>A <code>sync_fence</code> is a collection of <code>sync_pts</code> that often
				120	have different <code>sync_timeline</code> parents (such as for the display
				121	controller and GPU). These are the main primitives over which drivers and
				122	userspace communicate their dependencies. A fence is a promise from the kernel
				123	given upon accepting work that has been queued and assures completion in a
				124	finite amount of time.</p>
				125
<p>This allows multiple consumers or producers to signal that they are using a
buffer and allows this information to be communicated with one function
parameter. Fences are backed by a file descriptor and can be passed from
kernel space to userspace. For instance, a fence can contain two
<code>sync_pts</code> that signify when two separate image consumers are done
reading a buffer. When the fence is signaled, the image producers know both
consumers are done consuming.</p>
				133
				134	<p>Fences, like <code>sync_pts</code>, start active and then change state based
				135	upon the state of their points. If all <code>sync_pts</code> become signaled,
				136	the <code>sync_fence</code> becomes signaled. If one <code>sync_pt</code> falls
				137	into an error state, the entire sync_fence has an error state.</p>
				138
				139	<p>Membership in the <code>sync_fence</code> is immutable after the fence is
				140	created. As a <code>sync_pt</code> can be in only one fence, it is included as a
				141	copy. Even if two points have the same value, there will be two copies of the
				142	<code>sync_pt</code> in the fence. To get more than one point in a fence, a
				143	merge operation is conducted where points from two distinct fences are added to
				144	a third fence. If one of those points was signaled in the originating fence and
				145	the other was not, the third fence will also not be in a signaled state.</p>
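<p>The state and merge rules above can be modeled with a small userspace
sketch. Everything here (<code>toy_fence</code>, <code>toy_fence_state</code>,
<code>toy_fence_merge</code>, and the fixed capacity of 8 points) is a
hypothetical simplification: point states are stored as plain values, whereas
real points keep tracking their parent timelines.</p>

```c
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_ACTIVE, TOY_SIGNALED, TOY_ERROR };

#define TOY_FENCE_MAX_PTS 8   /* arbitrary capacity for this sketch */

struct toy_fence {
    enum toy_state pts[TOY_FENCE_MAX_PTS]; /* copies of the member points */
    size_t count;
};

/* Error if any point is in error; signaled only if all points are signaled. */
enum toy_state toy_fence_state(const struct toy_fence *f)
{
    enum toy_state result = TOY_SIGNALED;
    size_t i;

    assert(f && f->count <= TOY_FENCE_MAX_PTS);
    for (i = 0; i < f->count; i++) {
        if (f->pts[i] == TOY_ERROR)
            return TOY_ERROR;
        if (f->pts[i] == TOY_ACTIVE)
            result = TOY_ACTIVE;
    }
    return result;
}

/*
 * Builds a third fence holding copies of the points of a and b, leaving
 * both inputs unchanged. Returns 0 on success, -1 if out cannot hold them.
 */
int toy_fence_merge(const struct toy_fence *a, const struct toy_fence *b,
                    struct toy_fence *out)
{
    size_t i;

    assert(a && b && out);
    if (a->count + b->count > TOY_FENCE_MAX_PTS)
        return -1;
    out->count = 0;
    for (i = 0; i < a->count; i++)
        out->pts[out->count++] = a->pts[i];
    for (i = 0; i < b->count; i++)
        out->pts[out->count++] = b->pts[i];
    return 0;
}
```

<p>Merging a fence whose only point is signaled with a fence whose only point
is still active yields a fence that is not signaled, as described above.</p>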
				146
				147	<p>To implement explicit synchronization, provide the following:</p>
				148
				149	<ul>
				150	<li>A kernel-space driver that implements a synchronization timeline for a
				151	particular piece of hardware. Drivers that need to be fence-aware are generally
				152	anything that accesses or communicates with the Hardware Composer. Key files
				153	include:
				154	<ul>
				155	<li>Core implementation:
				156	<ul>
				157	<li><code>kernel/common/include/linux/sync.h</code></li>
				158	<li><code>kernel/common/drivers/base/sync.c</code></li>
				159	</ul></li>
				160	<li><code>sw_sync</code>:
				161	<ul>
				162	<li><code>kernel/common/include/linux/sw_sync.h</code></li>
				163	<li><code>kernel/common/drivers/base/sw_sync.c</code></li>
				164	</ul></li>
<li>Documentation at <code>kernel/common/Documentation/sync.txt</code>.</li>
				166	<li>Library to communicate with the kernel-space in
				167	<code>platform/system/core/libsync</code>.</li>
				168	</ul></li>
				169	<li>A Hardware Composer HAL module (v1.3 or higher) that supports the new
				170	synchronization functionality. You must provide the appropriate synchronization
				171	fences as parameters to the <code>set()</code> and <code>prepare()</code>
				172	functions in the HAL.</li>
				173	<li>Two fence-related GL extensions (<code>EGL_ANDROID_native_fence_sync</code>
				174	and <code>EGL_ANDROID_wait_sync</code>) and fence support in your graphics
				175	drivers.</li>
				176	</ul>
				177
<p>For example, to use the API supporting the synchronization function, you
might develop a display driver that has a display buffer function. Before the
synchronization framework existed, this function would receive dma-bufs, put
those buffers on the display, and block while the buffer was visible:</p>
				183
				184	<pre class=prettyprint>/*
				185	* assumes buf is ready to be displayed. returns when buffer is no longer on
				186	* screen.
				187	*/
				188	void display_buffer(struct dma_buf *buf);
				189	</pre>
				190
<p>With the synchronization framework, the API call is slightly more complex.
While putting a buffer on display, you associate it with a fence that says when
the buffer will be ready. You can queue up the work and initiate it after the
fence clears.</p>
				195
				196	<p>In this manner, you are not blocking anything. You immediately return your
				197	own fence, which is a guarantee of when the buffer will be off of the display.
				198	As you queue up buffers, the kernel will list dependencies with the
				199	synchronization framework:</p>
				200
				201	<pre class=prettyprint>/*
				202	* will display buf when fence is signaled. returns immediately with a fence
				203	* that will signal when buf is no longer displayed.
				204	*/
				205	struct sync_fence* display_buffer(struct dma_buf *buf, struct sync_fence
				206	*fence);
				207	</pre>
				208
				209
				210	<h2 id=sync_integration>Sync integration</h2>
				211	<p>This section explains how to integrate the low-level sync framework with
				212	different parts of the Android framework and the drivers that must communicate
				213	with one another.</p>
				214
				215	<h3 id=integration_conventions>Integration conventions</h3>
				216
				217	<p>The Android HAL interfaces for graphics follow consistent conventions so
				218	when file descriptors are passed across a HAL interface, ownership of the file
				219	descriptor is always transferred. This means:</p>
				220
				221	<ul>
				222	<li>If you receive a fence file descriptor from the sync framework, you must
				223	close it.</li>
				224	<li>If you return a fence file descriptor to the sync framework, the framework
				225	will close it.</li>
				226	<li>To continue using the fence file descriptor, you must duplicate the
				227	descriptor.</li>
				228	</ul>
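<p>The ownership rules above can be sketched with ordinary POSIX file
descriptors. The functions <code>hypothetical_consume_fence</code> and
<code>pass_fence_and_keep</code> are made-up names for illustration; the first
stands in for any interface that takes ownership of a fence file
descriptor.</p>

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns nonzero if fd refers to an open file descriptor. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/*
 * Hypothetical stand-in for an interface that receives a fence file
 * descriptor. Ownership is transferred, so the receiver must close it.
 */
void hypothetical_consume_fence(int fence_fd)
{
    /* ... wait on or forward the fence here ... */
    close(fence_fd);
}

/*
 * Hands a fence to the consumer while keeping the caller's descriptor
 * usable: the consumer gets (and closes) a duplicate. Returns 0 on
 * success, -1 if the descriptor could not be duplicated.
 */
int pass_fence_and_keep(int fence_fd)
{
    int copy = dup(fence_fd);

    if (copy < 0)
        return -1;
    hypothetical_consume_fence(copy); /* ownership of copy is transferred */
    return 0;                         /* fence_fd still belongs to caller */
}
```

<p>After <code>pass_fence_and_keep()</code> returns, the caller still owns the
original descriptor and remains responsible for closing it.</p>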
				229
<p>Every time a fence passes through BufferQueue (such as for a window that
passes a fence to BufferQueue saying when its new contents will be ready), the
fence object is renamed. Since kernel fence support allows fences to have
strings for names, the sync framework uses the window name and buffer index
that is being queued to name the fence (e.g., <code>SurfaceView:0</code>). This
is helpful in debugging to identify the source of a deadlock, as the names
appear in the output of <code>/d/sync</code> and bug reports.</p>
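<p>The naming scheme can be illustrated with a short formatting helper. The
function <code>format_fence_name</code> is hypothetical and is not the
framework's actual code; it only shows the
<code>&lt;window name&gt;:&lt;buffer index&gt;</code> pattern described
above.</p>

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: writes "<window_name>:<buffer_index>" into out.
 * Returns 0 on success, -1 if the name did not fit in out_size bytes.
 */
int format_fence_name(char *out, size_t out_size, const char *window_name,
                      int buffer_index)
{
    int n;

    assert(out && window_name && out_size > 0);
    n = snprintf(out, out_size, "%s:%d", window_name, buffer_index);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

<p>Calling it with the window name <code>SurfaceView</code> and buffer index
0 produces the string <code>SurfaceView:0</code>.</p>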
				237
				238	<h3 id=anativewindow_integration>ANativeWindow integration</h3>
				239
				240	<p>ANativeWindow is fence aware and <code>dequeueBuffer</code>,
				241	<code>queueBuffer</code>, and <code>cancelBuffer</code> have fence parameters.
				242	</p>
				243
				244	<h3 id=opengl_es_integration>OpenGL ES integration</h3>
				245
				246	<p>OpenGL ES sync integration relies upon two EGL extensions:</p>
				247
				248	<ul>
				249	<li><code>EGL_ANDROID_native_fence_sync</code>. Provides a way to either
				250	wrap or create native Android fence file descriptors in EGLSyncKHR objects.</li>
				251	<li><code>EGL_ANDROID_wait_sync</code>. Allows GPU-side stalls rather than in
				252	CPU, making the GPU wait for an EGLSyncKHR. This is essentially the same as the
				253	<code>EGL_KHR_wait_sync</code> extension (refer to that specification for
				254	details).</li>
				255	</ul>
				256
<p>These extensions can be used independently and are controlled by a compile
flag in libgui. To use them, first implement the
<code>EGL_ANDROID_native_fence_sync</code> extension along with the associated
kernel support. Next, add ANativeWindow support for fences to your driver, then
turn on support in libgui to make use of the
<code>EGL_ANDROID_native_fence_sync</code> extension.</p>
				263
<p>In a second pass, enable the <code>EGL_ANDROID_wait_sync</code>
extension in your driver and turn it on separately. The
<code>EGL_ANDROID_native_fence_sync</code> extension defines a distinct native
fence EGLSync object type. To avoid unwanted interactions, extensions that
apply to existing EGLSync object types don’t necessarily apply to
<code>EGL_ANDROID_native_fence_sync</code> objects.</p>
				270
<p>The <code>EGL_ANDROID_native_fence_sync</code> extension employs a
corresponding native fence file descriptor attribute that can be set only at
creation time and cannot be directly queried afterward from an existing sync
object. This attribute can be set to one of two modes:</p>
				275
				276	<ul>
				277	<li><em>A valid fence file descriptor</em>. Wraps an existing native Android
				278	fence file descriptor in an EGLSyncKHR object.</li>
				279	<li><em>-1</em>. Creates a native Android fence file descriptor from an
				280	EGLSyncKHR object.</li>
				281	</ul>
				282
<p>The DupNativeFenceFD function call is used to extract the native Android
fence file descriptor from the EGLSyncKHR object. This has the same result as
querying the attribute that was set but adheres to the convention that the
recipient closes the fence (hence the duplicate operation). Finally, destroying
the EGLSync object should close the internal fence attribute.</p>
				288
				289	<h3 id=hardware_composer_integration>Hardware Composer integration</h3>
				290
				291	<p>The Hardware Composer handles three types of sync fences:</p>
				292
				293	<ul>
				294	<li><em>Acquire fence</em>. One per layer, set before calling
				295	<code>HWC::set</code>. It signals when Hardware Composer may read the buffer.</li>
				296	<li><em>Release fence</em>. One per layer, filled in by the driver in
				297	<code>HWC::set</code>. It signals when Hardware Composer is done reading the
				298	buffer so the framework can start using that buffer again for that particular
				299	layer.</li>
<li><em>Retire fence</em>. One for the entire frame, filled in by the driver
each time <code>HWC::set</code> is called. This covers all layers for the set
operation and signals to the framework when all effects of this set operation
have completed. The retire fence signals when the next set operation takes place
on the screen.</li>
				305	</ul>
				306
				307	<p>The retire fence can be used to determine how long each frame appears on the
				308	screen. This is useful in identifying the location and source of delays, such
				309	as a stuttering animation.</p>
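<p>One way to use retire fence timestamps for this purpose is sketched below.
The function <code>count_long_frames</code> and its tolerance parameter are
assumptions made for illustration; it treats the gap between consecutive
retire timestamps as the time a frame stayed on screen.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given n retire fence signal timestamps in ascending
 * order, counts the frames that stayed on screen longer than one refresh
 * period plus a tolerance. Each gap between consecutive timestamps is
 * treated as one frame's on-screen time.
 */
size_t count_long_frames(const int64_t *retire_ns, size_t n,
                         int64_t period_ns, int64_t tolerance_ns)
{
    size_t i;
    size_t long_frames = 0;

    assert(retire_ns || n == 0);
    for (i = 1; i < n; i++) {
        int64_t on_screen_ns = retire_ns[i] - retire_ns[i - 1];

        if (on_screen_ns > period_ns + tolerance_ns)
            long_frames++;
    }
    return long_frames;
}
```

<p>A nonzero count for an animation that should update every refresh suggests
that some frames were shown for two or more refresh periods.</p>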
				310
				311	<h2 id=vsync_offset>VSYNC offset</h2>
				312
				313	<p>Application and SurfaceFlinger render loops should be synchronized to the
				314	hardware VSYNC. On a VSYNC event, the display begins showing frame N while
				315	SurfaceFlinger begins compositing windows for frame N+1. The app handles
				316	pending input and generates frame N+2.</p>
				317
<p>Synchronizing with VSYNC delivers consistent latency. It reduces errors in
apps and SurfaceFlinger and the drifting of displays in and out of phase with
each other. This assumes that application and SurfaceFlinger per-frame times
don’t vary widely. Even so, the latency is at least two frames.</p>
				322
<p>To remedy this, you can employ VSYNC offsets to reduce the input-to-display
latency by timing the application and composition signals relative to hardware
VSYNC. This is possible because application plus composition usually takes less
than 33 ms.</p>
				327
<p>The result of VSYNC offset is three signals with the same period but offset
phases:</p>
				330
				331	<ul>
				332	<li><code>HW_VSYNC_0</code>. Display begins showing next frame.</li>
				333	<li><code>VSYNC</code>. App reads input and generates next frame.</li>
				334	<li><code>SF VSYNC</code>. SurfaceFlinger begins compositing for next frame.</li>
				335	</ul>
				336
				337	<p>With VSYNC offset, SurfaceFlinger receives the buffer and composites the
				338	frame, while the application processes the input and renders the frame, all
				339	within a single frame of time.</p>
				340
				341	<p class="note"><strong>Note:</strong> VSYNC offsets reduce the time available
				342	for app and composition and therefore provide a greater chance for error.</p>
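<p>The relationship between the three signals can be written down as simple
arithmetic. The sketch below is an assumption-based model, not SurfaceFlinger
code: <code>vsync_signals_at</code> and <code>phase_in_period</code> are
hypothetical names, and the offsets are treated as plain nanosecond values
added to the hardware VSYNC time.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Times of the three signals for one refresh cycle (illustrative model). */
struct vsync_signals {
    int64_t hw_vsync_ns;   /* display begins showing the next frame */
    int64_t app_vsync_ns;  /* app reads input and generates a frame */
    int64_t sf_vsync_ns;   /* SurfaceFlinger begins compositing */
};

/* Derives the app and SurfaceFlinger signal times from one hardware VSYNC. */
struct vsync_signals vsync_signals_at(int64_t hw_vsync_ns,
                                      int64_t app_offset_ns,
                                      int64_t sf_offset_ns)
{
    struct vsync_signals s;

    s.hw_vsync_ns = hw_vsync_ns;
    s.app_vsync_ns = hw_vsync_ns + app_offset_ns;
    s.sf_vsync_ns = hw_vsync_ns + sf_offset_ns;
    return s;
}

/* Normalizes an offset, which may be negative, into [0, period_ns). */
int64_t phase_in_period(int64_t offset_ns, int64_t period_ns)
{
    int64_t r;

    assert(period_ns > 0);
    r = offset_ns % period_ns;
    return r < 0 ? r + period_ns : r;
}
```

<p>Because all three signals share one period, a negative offset is equivalent
to a positive phase later in the previous cycle, which is what
<code>phase_in_period</code> computes.</p>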
				343
				344	<h3 id=dispsync>DispSync</h3>
				345
				346	<p>DispSync maintains a model of the periodic hardware-based VSYNC events of a
				347	display and uses that model to execute periodic callbacks at specific phase
				348	offsets from the hardware VSYNC events.</p>
				349
<p>DispSync is essentially a software phase-locked loop (PLL) that generates
the VSYNC and SF VSYNC signals used by Choreographer and SurfaceFlinger, even
if not offset from hardware VSYNC.</p>
				353
				354	<img src="images/dispsync.png" alt="DispSync flow">
				355
				356	<p class="img-caption"><strong>Figure 1.</strong> DispSync flow</p>
				357
				358	<p>DispSync has the following qualities:</p>
				359
				360	<ul>
				361	<li><em>Reference</em>. HW_VSYNC_0.</li>
				362	<li><em>Output</em>. VSYNC and SF VSYNC.</li>
				363	<li><em>Feedback</em>. Retire fence signal timestamps from Hardware Composer.
				364	</li>
				365	</ul>
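<p>The idea of modeling hardware VSYNC and scheduling callbacks at a phase
offset can be sketched as follows. This is a deliberately simplified model
under stated assumptions, not the DispSync implementation: the names
<code>toy_dispsync</code>, <code>toy_dispsync_fit</code>, and
<code>toy_dispsync_next</code> are hypothetical, the period is estimated as a
plain average of the sample spacing, and the retire fence feedback path is not
modeled.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a display's periodic VSYNC (illustrative only). */
struct toy_dispsync {
    int64_t period_ns;  /* estimated refresh period */
    int64_t ref_ns;     /* most recent hardware VSYNC sample */
};

/*
 * Fits the model to n ascending hardware VSYNC timestamps by averaging
 * the spacing between the first and last sample. Returns 0 on success,
 * -1 if there are too few samples or the estimated period is not positive.
 */
int toy_dispsync_fit(struct toy_dispsync *m, const int64_t *hw_vsync_ns,
                     size_t n)
{
    assert(m && hw_vsync_ns);
    if (n < 2)
        return -1;
    m->period_ns = (hw_vsync_ns[n - 1] - hw_vsync_ns[0]) / (int64_t)(n - 1);
    m->ref_ns = hw_vsync_ns[n - 1];
    return m->period_ns > 0 ? 0 : -1;
}

/*
 * Returns the first predicted event time strictly after now_ns for a
 * callback that runs phase_offset_ns after each modeled VSYNC.
 */
int64_t toy_dispsync_next(const struct toy_dispsync *m, int64_t now_ns,
                          int64_t phase_offset_ns)
{
    int64_t base = m->ref_ns + phase_offset_ns;
    int64_t delta = now_ns - base;
    int64_t k = delta / m->period_ns;

    if (delta % m->period_ns != 0 && delta < 0)
        k--; /* round toward negative infinity (floor division) */
    return base + (k + 1) * m->period_ns;
}
```

<p>Once the model is fitted, callbacks for different listeners can be
predicted from the same period by passing each listener's own phase
offset.</p>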
				366
				367	<h3 id=vsync_retire_offset>VSYNC/Retire offset</h3>
				368
<p>The signal timestamp of retire fences must match HW VSYNC, even on devices
that don’t use the offset phase. Otherwise, errors appear more severe than they
really are. Smart panels often have a delta: the retire fence marks the end of
direct memory access (DMA) to display memory, but the actual display switch and
HW VSYNC occur some time later.</p>
				374
<p><code>PRESENT_TIME_OFFSET_FROM_VSYNC_NS</code> is set in the device’s
BoardConfig.mk makefile. It is based upon the display controller and panel
characteristics. It specifies the time from the retire fence timestamp to the
HW VSYNC signal, measured in nanoseconds.</p>
				379
				380	<h3 id=vsync_and_sf_vsync_offsets>VSYNC and SF_VSYNC offsets</h3>
				381
				382	<p>The <code>VSYNC_EVENT_PHASE_OFFSET_NS</code> and
				383	<code>SF_VSYNC_EVENT_PHASE_OFFSET_NS</code> are set conservatively based on
				384	high-load use cases, such as partial GPU composition during window transition
				385	or Chrome scrolling through a webpage containing animations. These offsets
				386	allow for long application render time and long GPU composition time.</p>
				387
				388	<p>More than a millisecond or two of latency is noticeable. We recommend
				389	integrating thorough automated error testing to minimize latency without
				390	significantly increasing error counts.</p>
				391
<p class="note"><strong>Note:</strong> These offsets are also configured in the
device’s BoardConfig.mk file. Both settings are offsets in nanoseconds after
HW_VSYNC_0, default to zero (if not set), and can be negative.</p>