blob: 602c545b8325ca5d767d4f15b1c6462687560628 [file] [log] [blame]
Glenn Kasten98afa532013-04-15 14:02:36 -07001page.title=Avoiding Priority Inversion
2@jd:body
3
Clay Murphybc92aea2014-10-16 10:13:18 -07004<!--
5 Copyright 2013 The Android Open Source Project
6
7 Licensed under the Apache License, Version 2.0 (the "License");
8 you may not use this file except in compliance with the License.
9 You may obtain a copy of the License at
10
11 http://www.apache.org/licenses/LICENSE-2.0
12
13 Unless required by applicable law or agreed to in writing, software
14 distributed under the License is distributed on an "AS IS" BASIS,
15 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16 See the License for the specific language governing permissions and
17 limitations under the License.
18-->
Glenn Kasten98afa532013-04-15 14:02:36 -070019<div id="qv-wrapper">
20 <div id="qv">
21 <h2>In this document</h2>
22 <ol id="auto-toc">
23 </ol>
24 </div>
25</div>
26
27<p>
28This article explains how the Android's audio system attempts to avoid
Glenn Kasten978bec82014-12-23 15:15:20 -080029priority inversion,
Glenn Kasten98afa532013-04-15 14:02:36 -070030and highlights techniques that you can use too.
31</p>
32
33<p>
34These techniques may be useful to developers of high-performance
35audio apps, OEMs, and SoC providers who are implementing an audio
Clay Murphyc28f2372013-09-25 16:13:40 -070036HAL. Please note implementing these techniques is not
Glenn Kasten98afa532013-04-15 14:02:36 -070037guaranteed to prevent glitches or other failures, particularly if
38used outside of the audio context.
Clay Murphyc28f2372013-09-25 16:13:40 -070039Your results may vary, and you should conduct your own
Glenn Kasten98afa532013-04-15 14:02:36 -070040evaluation and testing.
41</p>
42
43<h2 id="background">Background</h2>
44
45<p>
Clay Murphyc28f2372013-09-25 16:13:40 -070046The Android AudioFlinger audio server and AudioTrack/AudioRecord
Glenn Kasten98afa532013-04-15 14:02:36 -070047client implementation are being re-architected to reduce latency.
Glenn Kasten978bec82014-12-23 15:15:20 -080048This work started in Android 4.1, and continued with further improvements
49in 4.2, 4.3, 4.4, and 5.0.
Glenn Kasten98afa532013-04-15 14:02:36 -070050</p>
51
52<p>
Clay Murphyc28f2372013-09-25 16:13:40 -070053To achieve this lower latency, many changes were needed throughout the system. One
54important change is to assign CPU resources to time-critical
Glenn Kasten98afa532013-04-15 14:02:36 -070055threads with a more predictable scheduling policy. Reliable scheduling
Clay Murphyc28f2372013-09-25 16:13:40 -070056allows the audio buffer sizes and counts to be reduced while still
Glenn Kasten978bec82014-12-23 15:15:20 -080057avoiding underruns and overruns.
Glenn Kasten98afa532013-04-15 14:02:36 -070058</p>
59
Clay Murphy3a7af3a2014-09-09 17:29:09 -070060<h2 id="priorityInversion">Priority inversion</h2>
Glenn Kasten98afa532013-04-15 14:02:36 -070061
62<p>
63<a href="http://en.wikipedia.org/wiki/Priority_inversion">Priority inversion</a>
64is a classic failure mode of real-time systems,
65where a higher-priority task is blocked for an unbounded time waiting
Clay Murphy3a7af3a2014-09-09 17:29:09 -070066for a lower-priority task to release a resource such as (shared
67state protected by) a
Glenn Kasten98afa532013-04-15 14:02:36 -070068<a href="http://en.wikipedia.org/wiki/Mutual_exclusion">mutex</a>.
69</p>
70
71<p>
72In an audio system, priority inversion typically manifests as a
73<a href="http://en.wikipedia.org/wiki/Glitch">glitch</a>
74(click, pop, dropout),
75<a href="http://en.wikipedia.org/wiki/Max_Headroom_(character)">repeated audio</a>
76when circular buffers
77are used, or delay in responding to a command.
78</p>
79
80<p>
Glenn Kasten5b7c17f2015-05-21 13:04:10 -070081A common workaround for priority inversion is to increase audio buffer sizes.
82However, this method increases latency and merely hides the problem
83instead of solving it. It is better to understand and prevent priority
84inversion, as seen below.
85</p>
86
87<p>
Glenn Kasten98afa532013-04-15 14:02:36 -070088In the Android audio implementation, priority inversion is most
Clay Murphy3a7af3a2014-09-09 17:29:09 -070089likely to occur in these places. And so you should focus your attention here:
Glenn Kasten98afa532013-04-15 14:02:36 -070090</p>
91
92<ul>
93
94<li>
95between normal mixer thread and fast mixer thread in AudioFlinger
96</li>
97
98<li>
99between application callback thread for a fast AudioTrack and
100fast mixer thread (they both have elevated priority, but slightly
101different priorities)
102</li>
103
104<li>
Glenn Kasten978bec82014-12-23 15:15:20 -0800105between application callback thread for a fast AudioRecord and
106fast capture thread (similar to previous)
107</li>
108
109<li>
Clay Murphyc28f2372013-09-25 16:13:40 -0700110within the audio Hardware Abstraction Layer (HAL) implementation, e.g. for telephony or echo cancellation
Glenn Kasten98afa532013-04-15 14:02:36 -0700111</li>
112
113<li>
114within the audio driver in kernel
115</li>
116
117<li>
Glenn Kasten978bec82014-12-23 15:15:20 -0800118between AudioTrack or AudioRecord callback thread and other app threads (this is out of our control)
Glenn Kasten98afa532013-04-15 14:02:36 -0700119</li>
120
121</ul>
122
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700123<h2 id="commonSolutions">Common solutions</h2>
Glenn Kasten98afa532013-04-15 14:02:36 -0700124
125<p>
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700126The typical solutions include:
Glenn Kasten98afa532013-04-15 14:02:36 -0700127</p>
128
129<ul>
130
131<li>
132disabling interrupts
133</li>
134
135<li>
136priority inheritance mutexes
137</li>
138
139</ul>
140
141<p>
142Disabling interrupts is not feasible in Linux user space, and does
Clay Murphyc28f2372013-09-25 16:13:40 -0700143not work for Symmetric Multi-Processors (SMP).
Glenn Kasten98afa532013-04-15 14:02:36 -0700144</p>
145
146
147<p>
148Priority inheritance
149<a href="http://en.wikipedia.org/wiki/Futex">futexes</a>
150(fast user-space mutexes) are available
151in Linux kernel, but are not currently exposed by the Android C
152runtime library
153<a href="http://en.wikipedia.org/wiki/Bionic_(software)">Bionic</a>.
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700154They are not used in the audio system because they are relatively heavyweight,
155and because they rely on a trusted client.
Glenn Kasten98afa532013-04-15 14:02:36 -0700156</p>
157
158<h2 id="androidTechniques">Techniques used by Android</h2>
159
160<p>
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700161Experiments started with "try lock" and lock with timeout. These are
Glenn Kasten98afa532013-04-15 14:02:36 -0700162non-blocking and bounded blocking variants of the mutex lock
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700163operation. Try lock and lock with timeout worked fairly well but were
164susceptible to a couple of obscure failure modes: the
Glenn Kasten98afa532013-04-15 14:02:36 -0700165server was not guaranteed to be able to access the shared state if
166the client happened to be busy, and the cumulative timeout could
167be too long if there was a long sequence of unrelated locks that
168all timed out.
169</p>
170
171
172<p>
173We also use
174<a href="http://en.wikipedia.org/wiki/Linearizability">atomic operations</a>
175such as:
176</p>
177
178<ul>
179<li>increment</li>
180<li>bitwise "or"</li>
181<li>bitwise "and"</li>
182</ul>
183
184<p>
Clay Murphyc28f2372013-09-25 16:13:40 -0700185All of these return the previous value and include the necessary
Glenn Kasten98afa532013-04-15 14:02:36 -0700186SMP barriers. The disadvantage is they can require unbounded retries.
187In practice, we've found that the retries are not a problem.
188</p>
189
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700190<p class="note"><strong>Note:</strong> Atomic operations and their interactions with memory barriers
Clay Murphy5d83ab42014-09-09 17:29:09 -0700191are notoriously badly misunderstood and used incorrectly. We include these methods
192here for completeness but recommend you also read the article
Glenn Kasten98afa532013-04-15 14:02:36 -0700193<a href="https://developer.android.com/training/articles/smp.html">
194SMP Primer for Android</a>
195for further information.
196</p>
197
198<p>
199We still have and use most of the above tools, and have recently
200added these techniques:
201</p>
202
203<ul>
204
205<li>
206Use non-blocking single-reader single-writer
207<a href="http://en.wikipedia.org/wiki/Circular_buffer">FIFO queues</a>
208for data.
209</li>
210
211<li>
212Try to
213<i>copy</i>
214state rather than
215<i>share</i>
216state between high- and
217low-priority modules.
218</li>
219
220<li>
221When state does need to be shared, limit the state to the
222maximum-size
223<a href="http://en.wikipedia.org/wiki/Word_(computer_architecture)">word</a>
Clay Murphyc28f2372013-09-25 16:13:40 -0700224that can be accessed atomically in one-bus operation
Glenn Kasten98afa532013-04-15 14:02:36 -0700225without retries.
226</li>
227
228<li>
229For complex multi-word state, use a state queue. A state queue
230is basically just a non-blocking single-reader single-writer FIFO
231queue used for state rather than data, except the writer collapses
232adjacent pushes into a single push.
233</li>
234
235<li>
236Pay attention to
237<a href="http://en.wikipedia.org/wiki/Memory_barrier">memory barriers</a>
238for SMP correctness.
239</li>
240
241<li>
242<a href="http://en.wikipedia.org/wiki/Trust,_but_verify">Trust, but verify</a>.
243When sharing
244<i>state</i>
245between processes, don't
246assume that the state is well-formed. For example, check that indices
247are within bounds. This verification isn't needed between threads
248in the same process, between mutual trusting processes (which
249typically have the same UID). It's also unnecessary for shared
250<i>data</i>
251such as PCM audio where a corruption is inconsequential.
252</li>
253
254</ul>
255
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700256<h2 id="nonBlockingAlgorithms">Non-blocking algorithms</h2>
Glenn Kasten98afa532013-04-15 14:02:36 -0700257
258<p>
259<a href="http://en.wikipedia.org/wiki/Non-blocking_algorithm">Non-blocking algorithms</a>
260have been a subject of much recent study.
261But with the exception of single-reader single-writer FIFO queues,
262we've found them to be complex and error-prone.
263</p>
264
265<p>
Clay Murphyc28f2372013-09-25 16:13:40 -0700266Starting in Android 4.2, you can find our non-blocking,
Glenn Kasten98afa532013-04-15 14:02:36 -0700267single-reader/writer classes in these locations:
268</p>
269
270<ul>
271
272<li>
273frameworks/av/include/media/nbaio/
274</li>
275
276<li>
277frameworks/av/media/libnbaio/
278</li>
279
280<li>
281frameworks/av/services/audioflinger/StateQueue*
282</li>
283
284</ul>
285
286<p>
287These were designed specifically for AudioFlinger and are not
288general-purpose. Non-blocking algorithms are notorious for being
Clay Murphyc28f2372013-09-25 16:13:40 -0700289difficult to debug. You can look at this code as a model. But be
Glenn Kasten98afa532013-04-15 14:02:36 -0700290aware there may be bugs, and the classes are not guaranteed to be
291suitable for other purposes.
292</p>
293
294<p>
Clay Murphy5d83ab42014-09-09 17:29:09 -0700295For developers, some of the sample OpenSL ES application code should be updated to
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700296use non-blocking algorithms or reference a non-Android open source library.
Glenn Kasten98afa532013-04-15 14:02:36 -0700297</p>
298
Glenn Kasten73512002015-01-15 10:06:31 -0800299<p>
300We have published an example non-blocking FIFO implementation that is specifically designed for
301application code. See these files located in the platform source directory
302<code>frameworks/av/audio_utils</code>:
303</p>
304<ul>
Glenn Kasten3045cc52015-10-29 18:10:15 -0700305 <li><a href="https://android.googlesource.com/platform/system/media/+/master/audio_utils/include/audio_utils/fifo.h">include/audio_utils/fifo.h</a></li>
306 <li><a href="https://android.googlesource.com/platform/system/media/+/master/audio_utils/fifo.c">fifo.c</a></li>
307 <li><a href="https://android.googlesource.com/platform/system/media/+/master/audio_utils/include/audio_utils/roundup.h">include/audio_utils/roundup.h</a></li>
308 <li><a href="https://android.googlesource.com/platform/system/media/+/master/audio_utils/roundup.c">roundup.c</a></li>
Glenn Kasten73512002015-01-15 10:06:31 -0800309</ul>
310
Glenn Kasten98afa532013-04-15 14:02:36 -0700311<h2 id="tools">Tools</h2>
312
313<p>
314To the best of our knowledge, there are no automatic tools for
315finding priority inversion, especially before it happens. Some
316research static code analysis tools are capable of finding priority
317inversions if able to access the entire codebase. Of course, if
318arbitrary user code is involved (as it is here for the application)
319or is a large codebase (as for the Linux kernel and device drivers),
320static analysis may be impractical. The most important thing is to
321read the code very carefully and get a good grasp on the entire
322system and the interactions. Tools such as
323<a href="http://developer.android.com/tools/help/systrace.html">systrace</a>
324and
325<code>ps -t -p</code>
326are useful for seeing priority inversion after it occurs, but do
327not tell you in advance.
328</p>
329
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700330<h2 id="aFinalWord">A final word</h2>
Glenn Kasten98afa532013-04-15 14:02:36 -0700331
332<p>
333After all of this discussion, don't be afraid of mutexes. Mutexes
334are your friend for ordinary use, when used and implemented correctly
335in ordinary non-time-critical use cases. But between high- and
336low-priority tasks and in time-sensitive systems mutexes are more
337likely to cause trouble.
338</p>