Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 1 | page.title=Avoiding Priority Inversion |
| 2 | @jd:body |
| 3 | |
Clay Murphy | bc92aea | 2014-10-16 10:13:18 -0700 | [diff] [blame] | 4 | <!-- |
| 5 | Copyright 2013 The Android Open Source Project |
| 6 | |
| 7 | Licensed under the Apache License, Version 2.0 (the "License"); |
| 8 | you may not use this file except in compliance with the License. |
| 9 | You may obtain a copy of the License at |
| 10 | |
| 11 | http://www.apache.org/licenses/LICENSE-2.0 |
| 12 | |
| 13 | Unless required by applicable law or agreed to in writing, software |
| 14 | distributed under the License is distributed on an "AS IS" BASIS, |
| 15 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 16 | See the License for the specific language governing permissions and |
| 17 | limitations under the License. |
| 18 | --> |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 19 | <div id="qv-wrapper"> |
| 20 | <div id="qv"> |
| 21 | <h2>In this document</h2> |
| 22 | <ol id="auto-toc"> |
| 23 | </ol> |
| 24 | </div> |
| 25 | </div> |
| 26 | |
| 27 | <p> |
| 28 | This article explains how the Android's audio system attempts to avoid |
Glenn Kasten | 978bec8 | 2014-12-23 15:15:20 -0800 | [diff] [blame] | 29 | priority inversion, |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 30 | and highlights techniques that you can use too. |
| 31 | </p> |
| 32 | |
| 33 | <p> |
| 34 | These techniques may be useful to developers of high-performance |
| 35 | audio apps, OEMs, and SoC providers who are implementing an audio |
Clay Murphy | c28f237 | 2013-09-25 16:13:40 -0700 | [diff] [blame] | 36 | HAL. Please note implementing these techniques is not |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 37 | guaranteed to prevent glitches or other failures, particularly if |
| 38 | used outside of the audio context. |
Clay Murphy | c28f237 | 2013-09-25 16:13:40 -0700 | [diff] [blame] | 39 | Your results may vary, and you should conduct your own |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 40 | evaluation and testing. |
| 41 | </p> |
| 42 | |
| 43 | <h2 id="background">Background</h2> |
| 44 | |
| 45 | <p> |
Clay Murphy | c28f237 | 2013-09-25 16:13:40 -0700 | [diff] [blame] | 46 | The Android AudioFlinger audio server and AudioTrack/AudioRecord |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 47 | client implementation are being re-architected to reduce latency. |
Glenn Kasten | 978bec8 | 2014-12-23 15:15:20 -0800 | [diff] [blame] | 48 | This work started in Android 4.1, and continued with further improvements |
| 49 | in 4.2, 4.3, 4.4, and 5.0. |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 50 | </p> |
| 51 | |
| 52 | <p> |
Clay Murphy | c28f237 | 2013-09-25 16:13:40 -0700 | [diff] [blame] | 53 | To achieve this lower latency, many changes were needed throughout the system. One |
| 54 | important change is to assign CPU resources to time-critical |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 55 | threads with a more predictable scheduling policy. Reliable scheduling |
Clay Murphy | c28f237 | 2013-09-25 16:13:40 -0700 | [diff] [blame] | 56 | allows the audio buffer sizes and counts to be reduced while still |
Glenn Kasten | 978bec8 | 2014-12-23 15:15:20 -0800 | [diff] [blame] | 57 | avoiding underruns and overruns. |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 58 | </p> |
| 59 | |
Clay Murphy | 3a7af3a | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 60 | <h2 id="priorityInversion">Priority inversion</h2> |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 61 | |
| 62 | <p> |
| 63 | <a href="http://en.wikipedia.org/wiki/Priority_inversion">Priority inversion</a> |
| 64 | is a classic failure mode of real-time systems, |
| 65 | where a higher-priority task is blocked for an unbounded time waiting |
Clay Murphy | 3a7af3a | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 66 | for a lower-priority task to release a resource such as (shared |
| 67 | state protected by) a |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 68 | <a href="http://en.wikipedia.org/wiki/Mutual_exclusion">mutex</a>. |
| 69 | </p> |
| 70 | |
| 71 | <p> |
| 72 | In an audio system, priority inversion typically manifests as a |
| 73 | <a href="http://en.wikipedia.org/wiki/Glitch">glitch</a> |
| 74 | (click, pop, dropout), |
| 75 | <a href="http://en.wikipedia.org/wiki/Max_Headroom_(character)">repeated audio</a> |
| 76 | when circular buffers |
| 77 | are used, or delay in responding to a command. |
| 78 | </p> |
| 79 | |
| 80 | <p> |
Glenn Kasten | 5b7c17f | 2015-05-21 13:04:10 -0700 | [diff] [blame] | 81 | A common workaround for priority inversion is to increase audio buffer sizes. |
| 82 | However, this method increases latency and merely hides the problem |
| 83 | instead of solving it. It is better to understand and prevent priority |
| 84 | inversion, as seen below. |
| 85 | </p> |
| 86 | |
| 87 | <p> |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 88 | In the Android audio implementation, priority inversion is most |
Clay Murphy | 3a7af3a | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 89 | likely to occur in these places. And so you should focus your attention here: |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 90 | </p> |
| 91 | |
| 92 | <ul> |
| 93 | |
| 94 | <li> |
| 95 | between normal mixer thread and fast mixer thread in AudioFlinger |
| 96 | </li> |
| 97 | |
| 98 | <li> |
| 99 | between application callback thread for a fast AudioTrack and |
| 100 | fast mixer thread (they both have elevated priority, but slightly |
| 101 | different priorities) |
| 102 | </li> |
| 103 | |
| 104 | <li> |
Glenn Kasten | 978bec8 | 2014-12-23 15:15:20 -0800 | [diff] [blame] | 105 | between application callback thread for a fast AudioRecord and |
| 106 | fast capture thread (similar to previous) |
| 107 | </li> |
| 108 | |
| 109 | <li> |
Clay Murphy | c28f237 | 2013-09-25 16:13:40 -0700 | [diff] [blame] | 110 | within the audio Hardware Abstraction Layer (HAL) implementation, e.g. for telephony or echo cancellation |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 111 | </li> |
| 112 | |
| 113 | <li> |
| 114 | within the audio driver in kernel |
| 115 | </li> |
| 116 | |
| 117 | <li> |
Glenn Kasten | 978bec8 | 2014-12-23 15:15:20 -0800 | [diff] [blame] | 118 | between AudioTrack or AudioRecord callback thread and other app threads (this is out of our control) |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 119 | </li> |
| 120 | |
| 121 | </ul> |
| 122 | |
Clay Murphy | 3a7af3a | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 123 | <h2 id="commonSolutions">Common solutions</h2> |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 124 | |
| 125 | <p> |
Clay Murphy | 3a7af3a | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 126 | The typical solutions include: |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 127 | </p> |
| 128 | |
| 129 | <ul> |
| 130 | |
| 131 | <li> |
| 132 | disabling interrupts |
| 133 | </li> |
| 134 | |
| 135 | <li> |
| 136 | priority inheritance mutexes |
| 137 | </li> |
| 138 | |
| 139 | </ul> |
| 140 | |
| 141 | <p> |
| 142 | Disabling interrupts is not feasible in Linux user space, and does |
Clay Murphy | c28f237 | 2013-09-25 16:13:40 -0700 | [diff] [blame] | 143 | not work for Symmetric Multi-Processors (SMP). |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 144 | </p> |
| 145 | |
| 146 | |
| 147 | <p> |
| 148 | Priority inheritance |
| 149 | <a href="http://en.wikipedia.org/wiki/Futex">futexes</a> |
| 150 | (fast user-space mutexes) are available |
| 151 | in Linux kernel, but are not currently exposed by the Android C |
| 152 | runtime library |
| 153 | <a href="http://en.wikipedia.org/wiki/Bionic_(software)">Bionic</a>. |
Clay Murphy | 3a7af3a | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 154 | They are not used in the audio system because they are relatively heavyweight, |
| 155 | and because they rely on a trusted client. |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 156 | </p> |
| 157 | |
| 158 | <h2 id="androidTechniques">Techniques used by Android</h2> |
| 159 | |
| 160 | <p> |
Clay Murphy | 3a7af3a | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 161 | Experiments started with "try lock" and lock with timeout. These are |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 162 | non-blocking and bounded blocking variants of the mutex lock |
Clay Murphy | 3a7af3a | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 163 | operation. Try lock and lock with timeout worked fairly well but were |
| 164 | susceptible to a couple of obscure failure modes: the |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 165 | server was not guaranteed to be able to access the shared state if |
| 166 | the client happened to be busy, and the cumulative timeout could |
| 167 | be too long if there was a long sequence of unrelated locks that |
| 168 | all timed out. |
| 169 | </p> |
| 170 | |
| 171 | |
| 172 | <p> |
| 173 | We also use |
| 174 | <a href="http://en.wikipedia.org/wiki/Linearizability">atomic operations</a> |
| 175 | such as: |
| 176 | </p> |
| 177 | |
| 178 | <ul> |
| 179 | <li>increment</li> |
| 180 | <li>bitwise "or"</li> |
| 181 | <li>bitwise "and"</li> |
| 182 | </ul> |
| 183 | |
| 184 | <p> |
Clay Murphy | c28f237 | 2013-09-25 16:13:40 -0700 | [diff] [blame] | 185 | All of these return the previous value and include the necessary |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 186 | SMP barriers. The disadvantage is they can require unbounded retries. |
| 187 | In practice, we've found that the retries are not a problem. |
| 188 | </p> |
| 189 | |
Clay Murphy | 3a7af3a | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 190 | <p class="note"><strong>Note:</strong> Atomic operations and their interactions with memory barriers |
Clay Murphy | 5d83ab4 | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 191 | are notoriously badly misunderstood and used incorrectly. We include these methods |
| 192 | here for completeness but recommend you also read the article |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 193 | <a href="https://developer.android.com/training/articles/smp.html"> |
| 194 | SMP Primer for Android</a> |
| 195 | for further information. |
| 196 | </p> |
| 197 | |
| 198 | <p> |
| 199 | We still have and use most of the above tools, and have recently |
| 200 | added these techniques: |
| 201 | </p> |
| 202 | |
| 203 | <ul> |
| 204 | |
| 205 | <li> |
| 206 | Use non-blocking single-reader single-writer |
| 207 | <a href="http://en.wikipedia.org/wiki/Circular_buffer">FIFO queues</a> |
| 208 | for data. |
| 209 | </li> |
| 210 | |
| 211 | <li> |
| 212 | Try to |
| 213 | <i>copy</i> |
| 214 | state rather than |
| 215 | <i>share</i> |
| 216 | state between high- and |
| 217 | low-priority modules. |
| 218 | </li> |
| 219 | |
| 220 | <li> |
| 221 | When state does need to be shared, limit the state to the |
| 222 | maximum-size |
| 223 | <a href="http://en.wikipedia.org/wiki/Word_(computer_architecture)">word</a> |
Clay Murphy | c28f237 | 2013-09-25 16:13:40 -0700 | [diff] [blame] | 224 | that can be accessed atomically in one-bus operation |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 225 | without retries. |
| 226 | </li> |
| 227 | |
| 228 | <li> |
| 229 | For complex multi-word state, use a state queue. A state queue |
| 230 | is basically just a non-blocking single-reader single-writer FIFO |
| 231 | queue used for state rather than data, except the writer collapses |
| 232 | adjacent pushes into a single push. |
| 233 | </li> |
| 234 | |
| 235 | <li> |
| 236 | Pay attention to |
| 237 | <a href="http://en.wikipedia.org/wiki/Memory_barrier">memory barriers</a> |
| 238 | for SMP correctness. |
| 239 | </li> |
| 240 | |
| 241 | <li> |
| 242 | <a href="http://en.wikipedia.org/wiki/Trust,_but_verify">Trust, but verify</a>. |
| 243 | When sharing |
| 244 | <i>state</i> |
| 245 | between processes, don't |
| 246 | assume that the state is well-formed. For example, check that indices |
| 247 | are within bounds. This verification isn't needed between threads |
| 248 | in the same process, between mutual trusting processes (which |
| 249 | typically have the same UID). It's also unnecessary for shared |
| 250 | <i>data</i> |
| 251 | such as PCM audio where a corruption is inconsequential. |
| 252 | </li> |
| 253 | |
| 254 | </ul> |
| 255 | |
Clay Murphy | 3a7af3a | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 256 | <h2 id="nonBlockingAlgorithms">Non-blocking algorithms</h2> |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 257 | |
| 258 | <p> |
| 259 | <a href="http://en.wikipedia.org/wiki/Non-blocking_algorithm">Non-blocking algorithms</a> |
| 260 | have been a subject of much recent study. |
| 261 | But with the exception of single-reader single-writer FIFO queues, |
| 262 | we've found them to be complex and error-prone. |
| 263 | </p> |
| 264 | |
| 265 | <p> |
Clay Murphy | c28f237 | 2013-09-25 16:13:40 -0700 | [diff] [blame] | 266 | Starting in Android 4.2, you can find our non-blocking, |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 267 | single-reader/writer classes in these locations: |
| 268 | </p> |
| 269 | |
| 270 | <ul> |
| 271 | |
| 272 | <li> |
| 273 | frameworks/av/include/media/nbaio/ |
| 274 | </li> |
| 275 | |
| 276 | <li> |
| 277 | frameworks/av/media/libnbaio/ |
| 278 | </li> |
| 279 | |
| 280 | <li> |
| 281 | frameworks/av/services/audioflinger/StateQueue* |
| 282 | </li> |
| 283 | |
| 284 | </ul> |
| 285 | |
| 286 | <p> |
| 287 | These were designed specifically for AudioFlinger and are not |
| 288 | general-purpose. Non-blocking algorithms are notorious for being |
Clay Murphy | c28f237 | 2013-09-25 16:13:40 -0700 | [diff] [blame] | 289 | difficult to debug. You can look at this code as a model. But be |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 290 | aware there may be bugs, and the classes are not guaranteed to be |
| 291 | suitable for other purposes. |
| 292 | </p> |
| 293 | |
| 294 | <p> |
Clay Murphy | 5d83ab4 | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 295 | For developers, some of the sample OpenSL ES application code should be updated to |
Clay Murphy | 3a7af3a | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 296 | use non-blocking algorithms or reference a non-Android open source library. |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 297 | </p> |
| 298 | |
Glenn Kasten | 7351200 | 2015-01-15 10:06:31 -0800 | [diff] [blame] | 299 | <p> |
| 300 | We have published an example non-blocking FIFO implementation that is specifically designed for |
| 301 | application code. See these files located in the platform source directory |
| 302 | <code>frameworks/av/audio_utils</code>: |
| 303 | </p> |
| 304 | <ul> |
Glenn Kasten | 3045cc5 | 2015-10-29 18:10:15 -0700 | [diff] [blame] | 305 | <li><a href="https://android.googlesource.com/platform/system/media/+/master/audio_utils/include/audio_utils/fifo.h">include/audio_utils/fifo.h</a></li> |
| 306 | <li><a href="https://android.googlesource.com/platform/system/media/+/master/audio_utils/fifo.c">fifo.c</a></li> |
| 307 | <li><a href="https://android.googlesource.com/platform/system/media/+/master/audio_utils/include/audio_utils/roundup.h">include/audio_utils/roundup.h</a></li> |
| 308 | <li><a href="https://android.googlesource.com/platform/system/media/+/master/audio_utils/roundup.c">roundup.c</a></li> |
Glenn Kasten | 7351200 | 2015-01-15 10:06:31 -0800 | [diff] [blame] | 309 | </ul> |
| 310 | |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 311 | <h2 id="tools">Tools</h2> |
| 312 | |
| 313 | <p> |
| 314 | To the best of our knowledge, there are no automatic tools for |
| 315 | finding priority inversion, especially before it happens. Some |
| 316 | research static code analysis tools are capable of finding priority |
| 317 | inversions if able to access the entire codebase. Of course, if |
| 318 | arbitrary user code is involved (as it is here for the application) |
| 319 | or is a large codebase (as for the Linux kernel and device drivers), |
| 320 | static analysis may be impractical. The most important thing is to |
| 321 | read the code very carefully and get a good grasp on the entire |
| 322 | system and the interactions. Tools such as |
| 323 | <a href="http://developer.android.com/tools/help/systrace.html">systrace</a> |
| 324 | and |
| 325 | <code>ps -t -p</code> |
| 326 | are useful for seeing priority inversion after it occurs, but do |
| 327 | not tell you in advance. |
| 328 | </p> |
| 329 | |
Clay Murphy | 3a7af3a | 2014-09-09 17:29:09 -0700 | [diff] [blame] | 330 | <h2 id="aFinalWord">A final word</h2> |
Glenn Kasten | 98afa53 | 2013-04-15 14:02:36 -0700 | [diff] [blame] | 331 | |
| 332 | <p> |
| 333 | After all of this discussion, don't be afraid of mutexes. Mutexes |
| 334 | are your friend for ordinary use, when used and implemented correctly |
| 335 | in ordinary non-time-critical use cases. But between high- and |
| 336 | low-priority tasks and in time-sensitive systems mutexes are more |
| 337 | likely to cause trouble. |
| 338 | </p> |