page.title=Audio Latency
@jd:body
<!--
Copyright 2010 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div id="qv-wrapper">
<div id="qv">
<h2>In this document</h2>
<ol id="auto-toc">
</ol>
</div>
</div>
<p>Audio latency is the time delay as an audio signal passes through a system.
For a complete description of audio latency for the purposes of Android
compatibility, see <em>Section 5.4 Audio Latency</em>
in the <a href="http://source.android.com/compatibility/index.html">Android CDD</a>.
</p>
<h2 id="contributors">Contributors to Latency</h2>
<p>
This section focuses on the contributors to output latency,
but a similar discussion applies to input latency.
</p>
<p>
Assuming that the analog circuitry does not contribute significantly,
the major surface-level contributors to audio latency are the following:
</p>
<ul>
<li>Application</li>
<li>Total number of buffers in pipeline</li>
<li>Size of each buffer, in frames</li>
<li>Additional latency after the app processor, such as from a DSP</li>
</ul>
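<p>
For a rough sense of scale, the buffering portion of output latency can be
estimated from the buffer count, buffer size, and sample rate. The sketch
below uses illustrative values, not figures from any particular device:
</p>
<pre class="prettyprint">
#include &lt;stdio.h&gt;

/*
 * Sketch: estimate the buffering contribution to output latency.
 * The buffer count, buffer size, and sample rate are illustrative
 * values, not taken from a real device.
 */
int main(void) {
    int buffer_count = 2;          /* total buffers in the pipeline */
    int frames_per_buffer = 256;   /* size of each buffer, in frames */
    int sample_rate = 48000;       /* frames per second */

    double latency_ms =
        1000.0 * buffer_count * frames_per_buffer / sample_rate;
    printf("buffering latency: %.1f ms\n", latency_ms);  /* about 10.7 ms */
    return 0;
}
</pre>
<p>
Any latency added after the application processor, such as by a DSP, comes on
top of this figure.
</p>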
<p>
As accurate as the above list of contributors may be, it is also misleading.
The reason is that buffer count and buffer size are more of an
<em>effect</em> than a <em>cause</em>. What usually happens is that
a given buffer scheme is implemented and tested, but during testing, an audio
underrun is heard as a "click" or "pop". To compensate, the
system designer then increases buffer sizes or buffer counts.
This has the desired result of eliminating the underruns, but it also
has the undesired side effect of increasing latency.
</p>
<p>
A better approach is to understand the underlying causes of the
underruns and then correct those. This eliminates the
audible artifacts and may even permit smaller or fewer buffers,
thus reducing latency.
</p>
<p>
In our experience, the most common causes of underruns include:
</p>
<ul>
<li>Linux CFS (Completely Fair Scheduler)</li>
<li>High-priority threads with <code>SCHED_FIFO</code> scheduling</li>
<li>Long scheduling latency</li>
<li>Long-running interrupt handlers</li>
<li>Long interrupt disable time</li>
</ul>
<h3>Linux CFS and SCHED_FIFO scheduling</h3>
<p>
The Linux CFS is designed to be fair to competing workloads sharing a common CPU
resource. This fairness is represented by a per-thread <em>nice</em> parameter.
The nice value ranges from -20 (least nice, or most CPU time allocated)
to 19 (nicest, or least CPU time allocated). In general, all threads with a given
nice value receive approximately equal CPU time and threads with a
numerically lower nice value should expect to
receive more CPU time. However, CFS is "fair" only over relatively long
periods of observation. Over short-term observation windows,
CFS may allocate the CPU resource in unexpected ways. For example, it
may take the CPU away from a thread with numerically low niceness
and give it to a thread with numerically high niceness. In the case of audio,
this can result in an underrun.
</p>
<p>
The obvious solution is to avoid CFS for high-performance audio
threads. Beginning with Android 4.1 (Jelly Bean), such threads use the
<code>SCHED_FIFO</code> scheduling policy rather than the <code>SCHED_NORMAL</code> (also called
<code>SCHED_OTHER</code>) scheduling policy implemented by CFS.
</p>
<p>
Though the high-performance audio threads now use <code>SCHED_FIFO</code>, they
can still be preempted by other, higher-priority <code>SCHED_FIFO</code> threads.
These are typically kernel worker threads, but there may also be a few
non-audio user threads with policy <code>SCHED_FIFO</code>. The available <code>SCHED_FIFO</code>
priorities range from 1 to 99. The audio threads run at priority
2 or 3. This leaves priority 1 available for lower priority threads,
and priorities 4 to 99 for higher priority threads. We recommend that
you use priority 1 whenever possible, and reserve priorities 4 to 99 for
those threads that are guaranteed to complete within a bounded amount
of time, and are known to not interfere with scheduling of audio threads.
</p>
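<p>
As an illustration, the sketch below shows how a non-audio thread could request
<code>SCHED_FIFO</code> priority 1 through the standard pthread API. On Android,
changing the scheduling policy normally requires privileges that ordinary
applications do not have, so treat this as a sketch for platform or vendor code:
</p>
<pre class="prettyprint">
#include &lt;pthread.h&gt;
#include &lt;sched.h&gt;
#include &lt;stdio.h&gt;

/*
 * Sketch: request SCHED_FIFO priority 1 for the calling thread, leaving
 * priorities 2 and 3 to the platform's audio threads. Returns 0 on success.
 */
static int request_fifo_priority_1(void) {
    struct sched_param param = { .sched_priority = 1 };
    int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &amp;param);
    if (err != 0) {
        fprintf(stderr, "pthread_setschedparam failed: %d\n", err);
        return -1;
    }
    return 0;
}
</pre>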
<h3>Scheduling latency</h3>
<p>
Scheduling latency is the time between when a thread becomes
ready to run, and when the resulting context switch completes so that the
thread actually runs on a CPU. The shorter the latency, the better;
anything over two milliseconds causes problems for audio. Long scheduling
latency is most likely to occur during mode transitions, such as
bringing up or shutting down a CPU, switching between a security kernel
and the normal kernel, switching from full power to low-power mode,
or adjusting the CPU clock frequency and voltage.
</p>
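<p>
The sketch below shows a crude way to observe wakeup latency from user space:
one thread timestamps just before signaling a condition variable, and the woken
thread timestamps as soon as it runs. The measured value also includes the mutex
handoff, so it slightly over-approximates pure scheduling latency:
</p>
<pre class="prettyprint">
#include &lt;pthread.h&gt;
#include &lt;stdio.h&gt;
#include &lt;time.h&gt;
#include &lt;unistd.h&gt;

/* Crude scheduling-latency probe: measure the time from signaling a
 * condition variable to the woken thread actually running. */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static struct timespec signaled;
static int ready = 0;

static long long elapsed_ns(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

static void *waiter(void *arg) {
    (void)arg;
    struct timespec woke;
    pthread_mutex_lock(&amp;lock);
    while (!ready)
        pthread_cond_wait(&amp;cond, &amp;lock);
    clock_gettime(CLOCK_MONOTONIC, &amp;woke);
    pthread_mutex_unlock(&amp;lock);
    printf("approximate wakeup latency: %lld ns\n",
           elapsed_ns(signaled, woke));
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&amp;t, NULL, waiter, NULL);
    sleep(1);  /* let the waiter block on the condition variable */
    pthread_mutex_lock(&amp;lock);
    ready = 1;
    clock_gettime(CLOCK_MONOTONIC, &amp;signaled);
    pthread_cond_signal(&amp;cond);
    pthread_mutex_unlock(&amp;lock);
    pthread_join(t, NULL);
    return 0;
}
</pre>
<p>
Running such a probe repeatedly while the device goes through the mode
transitions listed above can help expose long scheduling latencies.
</p>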
<h3>Interrupts</h3>
<p>
In many designs, CPU 0 services all external interrupts. So a
long-running interrupt handler may delay other interrupts, in particular
audio DMA completion interrupts. Design interrupt handlers
to finish quickly and defer any lengthy work to a thread (preferably
a CFS thread or <code>SCHED_FIFO</code> thread of priority 1).
</p>
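<p>
As a kernel-side sketch, a driver can keep the hard interrupt handler minimal
and defer lengthy work to a handler thread with
<code>request_threaded_irq()</code>. The function and device names below are
placeholders, not symbols from a real driver:
</p>
<pre class="prettyprint">
#include &lt;linux/interrupt.h&gt;

/* Hard IRQ handler: acknowledge the hardware quickly, then hand off. */
static irqreturn_t my_audio_irq(int irq, void *dev)
{
        return IRQ_WAKE_THREAD;
}

/* Handler thread: lengthy work runs here, in a schedulable context. */
static irqreturn_t my_audio_irq_thread(int irq, void *dev)
{
        return IRQ_HANDLED;
}

/* During probe (placeholder irq number and device pointer):
 *
 *   request_threaded_irq(irq, my_audio_irq, my_audio_irq_thread,
 *                        IRQF_ONESHOT, "my_audio", my_dev);
 */
</pre>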
<p>
Similarly, disabling interrupts on CPU 0 for a long period
delays the servicing of audio interrupts.
Long interrupt disable times typically happen while waiting for a kernel
<i>spin lock</i>. Review these spin locks to ensure that
they are bounded.
</p>
<h2 id="measuringOutput">Measuring Output Latency</h2>
<p>
There are several techniques available to measure output latency,
with varying degrees of accuracy and ease of running.
</p>
<h3>LED and oscilloscope test</h3>
<p>
This test measures latency in relation to the device's LED indicator.
If your production device does not have an LED, you can install the
LED on a prototype form factor device. For even better accuracy
on prototype devices with exposed circuitry, connect one
oscilloscope probe to the LED directly to bypass the light
sensor latency.
</p>
<p>
If you cannot install an LED on either your production or prototype device,
try the following workarounds:
</p>
<ul>
<li>Use a General Purpose Input/Output (GPIO) pin for the same purpose</li>
<li>Use JTAG or another debugging port</li>
<li>Use the screen backlight. This might be risky as the
backlight may have a non-negligible latency, and can contribute to
an inaccurate latency reading.
</li>
</ul>
<p>To conduct this test:</p>
<ol>
<li>Run an app that periodically pulses the LED at
the same time it outputs audio.
<p class="note"><b>Note:</b> To get useful results, it is crucial to use the correct
APIs in the test app so that you're exercising the fast audio output path.
See the separate document "Application developer guidelines for reduced
audio latency". <!-- where is this ?-->
</p>
</li>
<li>Place a light sensor next to the LED.</li>
<li>Connect the probes of a dual-channel oscilloscope to both the wired headphone
jack (line output) and light sensor.</li>
<li>Use the oscilloscope to measure
the time difference between observing the line output signal versus the light
sensor signal.</li>
</ol>
<p>The difference in time is the approximate audio output latency,
assuming that the LED latency and light sensor latency are both zero.
Typically, the LED and light sensor each have a relatively low latency,
on the order of 1 millisecond or less, which is low enough
to ignore.</p>
<h3>Larsen test</h3>
<p>
One of the easiest latency tests is an audio feedback
(Larsen effect) test. This provides a crude measure of combined output
and input latency by timing an impulse response loop. This test is not very useful
by itself because it cannot separate output latency from input latency, but it is
easy to run and requires no special equipment.</p>
<p>To conduct this test:</p>
<ol>
<li>Run an app that captures audio from the microphone and immediately plays the
captured data back over the speaker.</li>
<li>Create a sound externally,
such as tapping a pencil by the microphone. This noise generates a feedback loop.</li>
<li>Measure the time between feedback pulses to get the sum of the output latency, input latency, and application overhead.</li>
</ol>
<p>This method does not break down the
component times, which is important when the output latency
and input latency are independent. So this method is not recommended for
measuring output latency by itself, but it can be useful in combination with
other methods, for example to help determine input latency as described below.</p>
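<p>
A sketch of the timing step, assuming the captured audio is available as 16-bit
mono PCM: detect feedback pulses with a simple amplitude threshold and average
the interval between them. The threshold and hold-off values are illustrative:
</p>
<pre class="prettyprint">
#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;

/*
 * Sketch: estimate round-trip latency (output + input + app overhead) in
 * milliseconds from captured 16-bit mono PCM by averaging the spacing
 * between feedback pulses. Threshold and hold-off are illustrative values.
 */
static double estimate_round_trip_ms(const int16_t *pcm, size_t frames,
                                     int sample_rate) {
    const int threshold = 8000;                           /* pulse amplitude */
    const size_t holdoff = (size_t)(0.020 * sample_rate); /* 20 ms */
    size_t prev = 0, count = 0;
    double sum_ms = 0.0;

    for (size_t i = 0; i &lt; frames; i++) {
        if (abs(pcm[i]) &gt; threshold &amp;&amp; (prev == 0 || i - prev &gt; holdoff)) {
            if (prev != 0) {
                sum_ms += (double)(i - prev) * 1000.0 / sample_rate;
                count++;
            }
            prev = i;
        }
    }
    return count ? sum_ms / count : 0.0;  /* 0.0 if fewer than two pulses */
}
</pre>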
<h2 id="measuringInput">Measuring Input Latency</h2>
<p>
Input latency is more difficult to measure than output latency. The following
tests might help.
</p>
<p>
One approach is to first determine the output latency
using the LED and oscilloscope method and then use
the audio feedback (Larsen) test to determine the sum of output
latency and input latency. The difference between these two
measurements is the input latency.
</p>
<p>
Another technique is to use a GPIO pin on a prototype device.
Externally, pulse a GPIO input at the same time that you present
an audio signal to the device. Run an app that compares the
difference in arrival times of the GPIO signal and audio data.
</p>
<h2 id="reducing">Reducing Latency</h2>
<p>To achieve low audio latency, pay special attention throughout the
system to scheduling, interrupt handling, power management, and device
driver design. Your goal is to prevent any part of the platform from
blocking a <code>SCHED_FIFO</code> audio thread for more than a couple
of milliseconds. By adopting such a systematic approach, you can reduce
audio latency and get the side benefit of more predictable performance
overall.
</p>
<p>
Audio underruns, when they do occur, are often detectable only under certain
conditions or only at the transitions. Try stressing the system by launching
new apps and scrolling quickly through various displays. But be aware
that some test conditions are so stressful as to be beyond the design
goals. For example, taking a bugreport puts such enormous load on the
system that it may be acceptable to have an underrun in that case.
</p>
<p>
When testing for underruns:
</p>
<ul>
<li>Configure any DSP after the app processor so that it adds
minimal latency</li>
<li>Run tests under different conditions
such as having the screen on or off, USB plugged in or unplugged,
Wi-Fi on or off, Bluetooth on or off, and telephony and data radios
on or off.</li>
<li>Select relatively quiet music that you're very familiar with and in which
underruns are easy to hear.</li>
<li>Use wired headphones for extra sensitivity.</li>
<li>Give yourself breaks so that you don't experience "ear fatigue".</li>
</ul>
<p>
Once you find and fix the underlying causes of underruns, reduce
the buffer counts and sizes to take advantage of those fixes.
The eager approach of reducing buffer counts and sizes <i>before</i>
analyzing underruns and fixing their causes only
results in frustration.
</p>
<h3 id="tools">Tools</h3>
<p>
<code>systrace</code> is an excellent general-purpose tool
for diagnosing system-level performance glitches.
</p>
<p>
The output of <code>dumpsys media.audio_flinger</code> also contains a
useful section called "simple moving statistics". This has a summary
of the variability of elapsed times for each audio mix and I/O cycle.
Ideally, all the time measurements should be about equal to the mean or
nominal cycle time. A very low minimum or very high maximum indicates
a problem, probably long scheduling latency or a long interrupt
disable time. The <i>tail</i> part of the output is especially helpful,
as it highlights the variability beyond +/- 3 standard deviations.
</p>
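<p>
As an illustration of the kind of statistics this section reports, the sketch
below computes the mean and standard deviation of a set of measured cycle times
and flags cycles more than three standard deviations from the mean. The input
array is assumed to come from your own instrumentation:
</p>
<pre class="prettyprint">
#include &lt;math.h&gt;
#include &lt;stdio.h&gt;

/*
 * Sketch: report the mean and standard deviation of measured mix/I/O cycle
 * times (in milliseconds), and flag outliers beyond +/- 3 standard
 * deviations, similar in spirit to the "tail" of the dumpsys output.
 */
static void report_cycle_stats(const double *ms, int n) {
    if (n &lt;= 0)
        return;

    double sum = 0.0, sumsq = 0.0;
    for (int i = 0; i &lt; n; i++) {
        sum += ms[i];
        sumsq += ms[i] * ms[i];
    }
    double mean = sum / n;
    double variance = sumsq / n - mean * mean;
    double sd = variance &gt; 0.0 ? sqrt(variance) : 0.0;

    printf("mean %.2f ms, standard deviation %.2f ms\n", mean, sd);
    for (int i = 0; i &lt; n; i++) {
        if (fabs(ms[i] - mean) &gt; 3.0 * sd)
            printf("outlier: cycle %d took %.2f ms\n", i, ms[i]);
    }
}
</pre>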