Robert Ly | 35f2fda | 2013-01-29 16:27:05 -0800 | [diff] [blame^] | 1 | page.title=Audio Latency |
| 2 | @jd:body |
| 3 | |
| 4 | <!-- |
| 5 | Copyright 2010 The Android Open Source Project |
| 6 | |
| 7 | Licensed under the Apache License, Version 2.0 (the "License"); |
| 8 | you may not use this file except in compliance with the License. |
| 9 | You may obtain a copy of the License at |
| 10 | |
| 11 | http://www.apache.org/licenses/LICENSE-2.0 |
| 12 | |
| 13 | Unless required by applicable law or agreed to in writing, software |
| 14 | distributed under the License is distributed on an "AS IS" BASIS, |
| 15 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 16 | See the License for the specific language governing permissions and |
| 17 | limitations under the License. |
| 18 | --> |
| 19 | <div id="qv-wrapper"> |
| 20 | <div id="qv"> |
| 21 | <h2>In this document</h2> |
| 22 | <ol id="auto-toc"> |
| 23 | </ol> |
| 24 | </div> |
| 25 | </div> |
| 26 | |
| 27 | <p>Audio latency is the time delay as an audio signal passes through a system. |
| 28 | For a complete description of audio latency for the purposes of Android |
| 29 | compatibility, see <em>Section 5.4 Audio Latency</em> |
| 30 | in the <a href="http://source.android.com/compatibility/index.html">Android CDD</a>. |
| 31 | </p> |
| 32 | |
| 33 | <h2 id="contributors">Contributors to Latency</h2> |
| 34 | |
| 35 | <p> |
| 36 | This section focuses on the contributors to output latency, |
| 37 | but a similar discussion applies to input latency. |
| 38 | </p> |
| 39 | <p> |
Assuming that the analog circuitry does not contribute significantly,
the major surface-level contributors to audio latency are the following:
| 42 | </p> |
| 43 | |
| 44 | <ul> |
| 45 | <li>Application</li> |
| 46 | <li>Total number of buffers in pipeline</li> |
| 47 | <li>Size of each buffer, in frames</li> |
| 48 | <li>Additional latency after the app processor, such as from a DSP</li> |
| 49 | </ul> |
| 50 | |
| 51 | <p> |
| 52 | As accurate as the above list of contributors may be, it is also misleading. |
| 53 | The reason is that buffer count and buffer size are more of an |
| 54 | <em>effect</em> than a <em>cause</em>. What usually happens is that |
| 55 | a given buffer scheme is implemented and tested, but during testing, an audio |
| 56 | underrun is heard as a "click" or "pop". To compensate, the |
| 57 | system designer then increases buffer sizes or buffer counts. |
| 58 | This has the desired result of eliminating the underruns, but it also |
| 59 | has the undesired side effect of increasing latency. |
| 60 | </p> |
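<p>
As a rough illustration of why larger or more numerous buffers raise latency,
the sketch below computes the delay contributed by a chain of equally sized
buffers. It assumes every buffer is full in steady state, so the result is
closer to an upper bound than to a measurement. The function name and the
sample configurations are hypothetical and do not come from any Android source.
</p>

```python
def buffer_latency_ms(buffer_count, frames_per_buffer, sample_rate_hz):
    """Return the latency, in milliseconds, added by a chain of full buffers.

    Assumption: all buffers hold the same number of frames and are full.
    """
    if buffer_count < 0 or frames_per_buffer < 0:
        raise ValueError("buffer_count and frames_per_buffer must be non-negative")
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    total_frames = buffer_count * frames_per_buffer
    return 1000.0 * total_frames / sample_rate_hz


# Hypothetical configurations at a 48 kHz sample rate.
for count, frames in [(2, 256), (4, 256), (4, 1024)]:
    latency = buffer_latency_ms(count, frames, 48000)
    print(f"{count} buffers x {frames} frames: {latency:.2f} ms")
```

<p>
Doubling either the buffer count or the buffer size doubles the computed
latency, which is why compensating for underruns this way is costly.
</p>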
| 61 | |
| 62 | <p> |
A better approach is to understand the underlying causes of the
underruns and then correct those. This eliminates the
audible artifacts and may even permit smaller or fewer buffers,
thus reducing latency.
| 67 | </p> |
| 68 | |
| 69 | <p> |
| 70 | In our experience, the most common causes of underruns include: |
| 71 | </p> |
| 72 | <ul> |
| 73 | <li>Linux CFS (Completely Fair Scheduler)</li> |
| 74 | <li>high-priority threads with SCHED_FIFO scheduling</li> |
| 75 | <li>long scheduling latency</li> |
| 76 | <li>long-running interrupt handlers</li> |
| 77 | <li>long interrupt disable time</li> |
| 78 | </ul> |
| 79 | |
| 80 | <h3>Linux CFS and SCHED_FIFO scheduling</h3> |
| 81 | <p> |
| 82 | The Linux CFS is designed to be fair to competing workloads sharing a common CPU |
| 83 | resource. This fairness is represented by a per-thread <em>nice</em> parameter. |
The nice value ranges from -20 (least nice, or most CPU time allocated)
to 19 (nicest, or least CPU time allocated). In general, all threads with a given
nice value receive approximately equal CPU time, and threads with a
numerically lower nice value should expect to
receive more CPU time. However, CFS is "fair" only over relatively long
periods of observation. Over short-term observation windows,
CFS may allocate the CPU resource in unexpected ways. For example, it
may take the CPU away from a thread with numerically low niceness
and give it to a thread with numerically high niceness. In the case of audio,
this can result in an underrun.
| 94 | </p> |
| 95 | |
| 96 | <p> |
| 97 | The obvious solution is to avoid CFS for high-performance audio |
| 98 | threads. Beginning with Android 4.1 (Jelly Bean), such threads now use the |
| 99 | <code>SCHED_FIFO</code> scheduling policy rather than the <code>SCHED_NORMAL</code> (also called |
| 100 | <code>SCHED_OTHER</code>) scheduling policy implemented by CFS. |
| 101 | </p> |
| 102 | |
| 103 | <p> |
| 104 | Though the high-performance audio threads now use <code>SCHED_FIFO</code>, they |
| 105 | are still susceptible to other higher priority <code>SCHED_FIFO</code> threads. |
| 106 | These are typically kernel worker threads, but there may also be a few |
| 107 | non-audio user threads with policy <code>SCHED_FIFO</code>. The available <code>SCHED_FIFO</code> |
| 108 | priorities range from 1 to 99. The audio threads run at priority |
| 109 | 2 or 3. This leaves priority 1 available for lower priority threads, |
| 110 | and priorities 4 to 99 for higher priority threads. We recommend that |
| 111 | you use priority 1 whenever possible, and reserve priorities 4 to 99 for |
| 112 | those threads that are guaranteed to complete within a bounded amount |
| 113 | of time, and are known to not interfere with scheduling of audio threads. |
| 114 | </p> |
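<p>
The priority guidance above can be sketched as a small check plus a guarded
request for real-time scheduling. This is only an illustration: the names
<code>relation_to_audio</code> and <code>try_set_fifo</code> are hypothetical,
and the constants simply restate the numbers in the text. The
<code>os.sched_setscheduler</code> call is available only on some Unix
platforms and normally fails without real-time scheduling privileges, so the
sketch reports failure instead of raising.
</p>

```python
import os

FIFO_MIN, FIFO_MAX = 1, 99       # SCHED_FIFO range stated in the text
AUDIO_FIFO_PRIORITIES = (2, 3)   # priorities used by the audio threads


def relation_to_audio(priority):
    """Classify a SCHED_FIFO priority relative to the audio threads."""
    if not FIFO_MIN <= priority <= FIFO_MAX:
        raise ValueError(f"SCHED_FIFO priority must be in {FIFO_MIN}..{FIFO_MAX}")
    if priority < min(AUDIO_FIFO_PRIORITIES):
        return "below audio"
    if priority > max(AUDIO_FIFO_PRIORITIES):
        return "above audio"
    return "same as audio"


def try_set_fifo(priority=1, pid=0):
    """Try to switch a task (0 means the caller) to SCHED_FIFO.

    Returns False if the platform lacks the call or the caller is not
    permitted to use real-time scheduling.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:
        return False


# Priority 1 stays below the audio threads; priority 50 can preempt them.
print(relation_to_audio(1))
print(relation_to_audio(50))
```

<p>
A thread classified as "above audio" can preempt the audio threads, so it
should be used only for work that is known to complete within a bounded time.
</p>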
| 115 | |
| 116 | <h3>Scheduling latency</h3> |
| 117 | <p> |
Scheduling latency is the time between when a thread becomes
ready to run and when the resulting context switch completes so that the
thread actually runs on a CPU. The shorter the latency the better, and
anything over two milliseconds causes problems for audio. Long scheduling
| 122 | latency is most likely to occur during mode transitions, such as |
| 123 | bringing up or shutting down a CPU, switching between a security kernel |
| 124 | and the normal kernel, switching from full power to low-power mode, |
| 125 | or adjusting the CPU clock frequency and voltage. |
| 126 | </p> |
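<p>
One crude way to get a feel for wakeup delays from user space is to sleep until
periodic deadlines and record how late each wakeup is. The sketch below does
this with Python's monotonic clock. The measured overshoot also includes timer
resolution and interpreter overhead, so it overstates true scheduling latency.
Treat it as a rough indicator rather than a replacement for kernel tracing. The
function name and default parameters are hypothetical.
</p>

```python
import time


def measure_wakeup_overshoot_ms(period_ms=5.0, iterations=50):
    """Sleep until periodic deadlines and return each wakeup's lateness in ms."""
    period_ns = int(period_ms * 1_000_000)
    overshoots = []
    deadline = time.monotonic_ns() + period_ns
    for _ in range(iterations):
        remaining_ns = deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        # Lateness relative to the intended wakeup time.
        overshoots.append((time.monotonic_ns() - deadline) / 1e6)
        deadline += period_ns
    return overshoots


samples = measure_wakeup_overshoot_ms()
# The two-millisecond threshold comes from the text above.
late = [s for s in samples if s > 2.0]
print(f"worst wakeup: {max(samples):.3f} ms, wakeups over 2 ms: {len(late)}")
```

<p>
Running the sketch while the device changes CPU frequency or power state may
show the larger overshoots described above.
</p>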
| 127 | |
| 128 | <h3>Interrupts</h3> |
| 129 | <p> |
| 130 | In many designs, CPU 0 services all external interrupts. So a |
| 131 | long-running interrupt handler may delay other interrupts, in particular |
| 132 | audio DMA completion interrupts. Design interrupt handlers |
| 133 | to finish quickly and defer any lengthy work to a thread (preferably |
| 134 | a CFS thread or <code>SCHED_FIFO</code> thread of priority 1). |
| 135 | </p> |
| 136 | |
| 137 | <p> |
| 138 | Equivalently, disabling interrupts on CPU 0 for a long period |
| 139 | has the same result of delaying the servicing of audio interrupts. |
| 140 | Long interrupt disable times typically happen while waiting for a kernel |
| 141 | <i>spin lock</i>. Review these spin locks to ensure that |
| 142 | they are bounded. |
| 143 | </p> |
| 144 | |
| 145 | |
| 146 | |
| 147 | <h2 id="measuringOutput">Measuring Output Latency</h2> |
| 148 | |
| 149 | <p> |
| 150 | There are several techniques available to measure output latency, |
| 151 | with varying degrees of accuracy and ease of running. |
| 152 | </p> |
| 153 | |
| 154 | <h3>LED and oscilloscope test</h3> |
| 155 | <p> |
| 156 | This test measures latency in relation to the device's LED indicator. |
| 157 | If your production device does not have an LED, you can install the |
LED on a prototype form factor device. For even better accuracy
on prototype devices with exposed circuitry, connect one
| 160 | oscilloscope probe to the LED directly to bypass the light |
| 161 | sensor latency. |
| 162 | </p> |
| 163 | |
| 164 | <p> |
| 165 | If you cannot install an LED on either your production or prototype device, |
| 166 | try the following workarounds: |
| 167 | </p> |
| 168 | |
| 169 | <ul> |
| 170 | <li>Use a General Purpose Input/Output (GPIO) pin for the same purpose</li> |
| 171 | <li>Use JTAG or another debugging port</li> |
<li>Use the screen backlight. This might be risky because the
backlight may have a non-negligible latency, which can contribute to
an inaccurate latency reading.
| 175 | </li> |
| 176 | </ul> |
| 177 | |
| 178 | <p>To conduct this test:</p> |
| 179 | |
| 180 | <ol> |
| 181 | <li>Run an app that periodically pulses the LED at |
| 182 | the same time it outputs audio. |
| 183 | |
| 184 | <p class="note"><b>Note:</b> To get useful results, it is crucial to use the correct |
| 185 | APIs in the test app so that you're exercising the fast audio output path. |
| 186 | See the separate document "Application developer guidelines for reduced |
audio latency".
| 188 | </p> |
| 189 | </li> |
| 190 | <li>Place a light sensor next to the LED.</li> |
| 191 | <li>Connect the probes of a dual-channel oscilloscope to both the wired headphone |
| 192 | jack (line output) and light sensor.</li> |
| 193 | <li>Use the oscilloscope to measure |
| 194 | the time difference between observing the line output signal versus the light |
| 195 | sensor signal.</li> |
| 196 | </ol> |
| 197 | |
| 198 | <p>The difference in time is the approximate audio output latency, |
| 199 | assuming that the LED latency and light sensor latency are both zero. |
Typically, the LED and light sensor each have a relatively low latency
on the order of 1 millisecond or less, which is low enough
to ignore.</p>
| 203 | |
| 204 | <h3>Larsen test</h3> |
| 205 | <p> |
| 206 | One of the easiest latency tests is an audio feedback |
| 207 | (Larsen effect) test. This provides a crude measure of combined output |
and input latency by timing an impulse response loop. This test is not very useful
by itself because it cannot separate output latency from input latency, but
it can be useful in combination with other measurements.</p>
| 210 | |
| 211 | <p>To conduct this test:</p> |
| 212 | <ol> |
| 213 | <li>Run an app that captures audio from the microphone and immediately plays the |
| 214 | captured data back over the speaker.</li> |
| 215 | <li>Create a sound externally, |
| 216 | such as tapping a pencil by the microphone. This noise generates a feedback loop.</li> |
| 217 | <li>Measure the time between feedback pulses to get the sum of the output latency, input latency, and application overhead.</li> |
| 218 | </ol> |
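<p>
If the feedback pulses are recorded, the time between them can be estimated
with a simple threshold detector, as in the sketch below. It assumes the
recording is a sequence of samples normalized to the range -1.0 to 1.0 and
that each pulse is much shorter than the loop time. The function name, the
threshold, and the hold-off time are hypothetical and would need tuning for a
real recording.
</p>

```python
def pulse_intervals_ms(samples, sample_rate_hz, threshold=0.5, holdoff_ms=20.0):
    """Return the intervals, in ms, between successive pulse onsets.

    A pulse onset is the first sample whose magnitude reaches the threshold.
    Further samples are ignored for holdoff_ms so that one pulse is counted once.
    """
    holdoff_frames = int(holdoff_ms * sample_rate_hz / 1000)
    onsets = []
    next_allowed = 0
    for index, value in enumerate(samples):
        if index >= next_allowed and abs(value) >= threshold:
            onsets.append(index)
            next_allowed = index + holdoff_frames
    return [1000.0 * (b - a) / sample_rate_hz
            for a, b in zip(onsets, onsets[1:])]


# Synthetic recording: three short pulses spaced 4800 frames apart at 48 kHz.
signal = [0.0] * 12000
for start in (1000, 5800, 10600):
    for i in range(start, start + 10):
        signal[i] = 0.9

print(pulse_intervals_ms(signal, 48000))
```

<p>
Each interval approximates one trip around the loop, that is, the sum of the
output latency, input latency, and application overhead.
</p>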
| 219 | |
<p>This method does not break down the
component times, which is important when the output latency
and input latency are independent. For that reason, this method is not
recommended for measuring output latency on its own, but it might be useful
to help measure input latency.</p>
| 224 | |
| 225 | <h2 id="measuringInput">Measuring Input Latency</h2> |
| 226 | |
| 227 | <p> |
| 228 | Input latency is more difficult to measure than output latency. The following |
| 229 | tests might help. |
| 230 | </p> |
| 231 | |
| 232 | <p> |
| 233 | One approach is to first determine the output latency |
| 234 | using the LED and oscilloscope method and then use |
| 235 | the audio feedback (Larsen) test to determine the sum of output |
| 236 | latency and input latency. The difference between these two |
| 237 | measurements is the input latency. |
| 238 | </p> |
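<p>
The subtraction described above can be written out as a short calculation.
Because the feedback test also includes application overhead, the sketch
accepts an optional estimate of that overhead. All names and numbers here are
hypothetical.
</p>

```python
def input_latency_ms(round_trip_ms, output_latency_ms, app_overhead_ms=0.0):
    """Estimate input latency from a feedback round trip and a known output latency."""
    estimate = round_trip_ms - output_latency_ms - app_overhead_ms
    if estimate < 0:
        raise ValueError("components exceed the measured round-trip time")
    return estimate


# Hypothetical measurements: 100 ms loop time, 40 ms output latency.
print(input_latency_ms(100.0, 40.0))
```

<p>
If the application overhead is unknown and left at zero, the result is an
upper bound on the input latency rather than an exact value.
</p>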
| 239 | |
| 240 | <p> |
| 241 | Another technique is to use a GPIO pin on a prototype device. |
| 242 | Externally, pulse a GPIO input at the same time that you present |
| 243 | an audio signal to the device. Run an app that compares the |
| 244 | difference in arrival times of the GPIO signal and audio data. |
| 245 | </p> |
| 246 | |
| 247 | <h2 id="reducing">Reducing Latency</h2> |
| 248 | |
| 249 | <p>To achieve low audio latency, pay special attention throughout the |
| 250 | system to scheduling, interrupt handling, power management, and device |
| 251 | driver design. Your goal is to prevent any part of the platform from |
| 252 | blocking a <code>SCHED_FIFO</code> audio thread for more than a couple |
| 253 | of milliseconds. By adopting such a systematic approach, you can reduce |
| 254 | audio latency and get the side benefit of more predictable performance |
| 255 | overall. |
| 256 | </p> |
| 257 | |
| 258 | |
| 259 | <p> |
| 260 | Audio underruns, when they do occur, are often detectable only under certain |
| 261 | conditions or only at the transitions. Try stressing the system by launching |
| 262 | new apps and scrolling quickly through various displays. But be aware |
| 263 | that some test conditions are so stressful as to be beyond the design |
| 264 | goals. For example, taking a bugreport puts such enormous load on the |
| 265 | system that it may be acceptable to have an underrun in that case. |
| 266 | </p> |
| 267 | |
| 268 | <p> |
| 269 | When testing for underruns: |
| 270 | </p> |
| 271 | <ul> |
| 272 | <li>Configure any DSP after the app processor so that it adds |
| 273 | minimal latency</li> |
| 274 | <li>Run tests under different conditions |
| 275 | such as having the screen on or off, USB plugged in or unplugged, |
| 276 | WiFi on or off, Bluetooth on or off, and telephony and data radios |
| 277 | on or off.</li> |
<li>Select relatively quiet music that you're very familiar with, and in which
underruns are easy to hear.</li>
| 280 | <li>Use wired headphones for extra sensitivity.</li> |
| 281 | <li>Give yourself breaks so that you don't experience "ear fatigue".</li> |
| 282 | </ul> |
| 283 | |
| 284 | <p> |
Once you find and fix the underlying causes of underruns, reduce
the buffer counts and sizes to take advantage of the improvement.
| 287 | The eager approach of reducing buffer counts and sizes <i>before</i> |
| 288 | analyzing underruns and fixing the causes of underruns only |
| 289 | results in frustration. |
| 290 | </p> |
| 291 | |
| 292 | <h3 id="tools">Tools</h3> |
| 293 | <p> |
| 294 | <code>systrace</code> is an excellent general-purpose tool |
| 295 | for diagnosing system-level performance glitches. |
| 296 | </p> |
| 297 | |
| 298 | <p> |
| 299 | The output of <code>dumpsys media.audio_flinger</code> also contains a |
| 300 | useful section called "simple moving statistics". This has a summary |
| 301 | of the variability of elapsed times for each audio mix and I/O cycle. |
| 302 | Ideally, all the time measurements should be about equal to the mean or |
| 303 | nominal cycle time. If you see a very low minimum or high maximum, this is an |
| 304 | indication of a problem, which is probably a high scheduling latency or interrupt |
| 305 | disable time. The <i>tail</i> part of the output is especially helpful, |
| 306 | as it highlights the variability beyond +/- 3 standard deviations. |
| 307 | </p> |
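<p>
To make the idea of a statistical tail concrete, the sketch below summarizes a
list of cycle times and picks out the values that fall more than three standard
deviations from the mean. It does not parse <code>dumpsys</code> output, and it
is not how AudioFlinger computes its statistics. The function name and the
sample data are hypothetical.
</p>

```python
import statistics


def summarize_cycle_times(cycle_times_ms):
    """Return mean, min, max, standard deviation, and the 3-sigma tail."""
    mean = statistics.mean(cycle_times_ms)
    stdev = statistics.pstdev(cycle_times_ms)
    low, high = mean - 3 * stdev, mean + 3 * stdev
    tail = [t for t in cycle_times_ms if t < low or t > high]
    return {
        "mean": mean,
        "min": min(cycle_times_ms),
        "max": max(cycle_times_ms),
        "stdev": stdev,
        "tail": tail,
    }


# Hypothetical data: 99 cycles at the nominal 20 ms and one 60 ms outlier.
summary = summarize_cycle_times([20.0] * 99 + [60.0])
print(summary["mean"], summary["max"], summary["tail"])
```

<p>
A maximum far above the mean, as with the outlier in this sample, is the kind
of result that points to high scheduling latency or interrupt disable time.
</p>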