blob: a7aab755698e098f9e3bc7825cd48eb4d8b49ef6 [file] [log] [blame]
Glenn Kasten98afa532013-04-15 14:02:36 -07001page.title=Avoiding Priority Inversion
2@jd:body
3
Clay Murphybc92aea2014-10-16 10:13:18 -07004<!--
5 Copyright 2013 The Android Open Source Project
6
7 Licensed under the Apache License, Version 2.0 (the "License");
8 you may not use this file except in compliance with the License.
9 You may obtain a copy of the License at
10
11 http://www.apache.org/licenses/LICENSE-2.0
12
13 Unless required by applicable law or agreed to in writing, software
14 distributed under the License is distributed on an "AS IS" BASIS,
15 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16 See the License for the specific language governing permissions and
17 limitations under the License.
18-->
Glenn Kasten98afa532013-04-15 14:02:36 -070019<div id="qv-wrapper">
20 <div id="qv">
21 <h2>In this document</h2>
22 <ol id="auto-toc">
23 </ol>
24 </div>
25</div>
26
27<p>
28This article explains how the Android's audio system attempts to avoid
Clay Murphyc28f2372013-09-25 16:13:40 -070029priority inversion, as of the Android 4.1 release,
Glenn Kasten98afa532013-04-15 14:02:36 -070030and highlights techniques that you can use too.
31</p>
32
33<p>
34These techniques may be useful to developers of high-performance
35audio apps, OEMs, and SoC providers who are implementing an audio
Clay Murphyc28f2372013-09-25 16:13:40 -070036HAL. Please note implementing these techniques is not
Glenn Kasten98afa532013-04-15 14:02:36 -070037guaranteed to prevent glitches or other failures, particularly if
38used outside of the audio context.
Clay Murphyc28f2372013-09-25 16:13:40 -070039Your results may vary, and you should conduct your own
Glenn Kasten98afa532013-04-15 14:02:36 -070040evaluation and testing.
41</p>
42
43<h2 id="background">Background</h2>
44
45<p>
Clay Murphyc28f2372013-09-25 16:13:40 -070046The Android AudioFlinger audio server and AudioTrack/AudioRecord
Glenn Kasten98afa532013-04-15 14:02:36 -070047client implementation are being re-architected to reduce latency.
Clay Murphyc28f2372013-09-25 16:13:40 -070048This work started in Android 4.1, continued in 4.2 and 4.3, and now more
49improvements exist in version 4.4.
Glenn Kasten98afa532013-04-15 14:02:36 -070050</p>
51
52<p>
Clay Murphyc28f2372013-09-25 16:13:40 -070053To achieve this lower latency, many changes were needed throughout the system. One
54important change is to assign CPU resources to time-critical
Glenn Kasten98afa532013-04-15 14:02:36 -070055threads with a more predictable scheduling policy. Reliable scheduling
Clay Murphyc28f2372013-09-25 16:13:40 -070056allows the audio buffer sizes and counts to be reduced while still
Glenn Kasten98afa532013-04-15 14:02:36 -070057avoiding artifacts due to underruns.
58</p>
59
60<h2 id="priorityInversion">Priority Inversion</h2>
61
62<p>
63<a href="http://en.wikipedia.org/wiki/Priority_inversion">Priority inversion</a>
64is a classic failure mode of real-time systems,
65where a higher-priority task is blocked for an unbounded time waiting
66for a lower-priority task to release a resource such as [shared
67state protected by] a
68<a href="http://en.wikipedia.org/wiki/Mutual_exclusion">mutex</a>.
69</p>
70
71<p>
72In an audio system, priority inversion typically manifests as a
73<a href="http://en.wikipedia.org/wiki/Glitch">glitch</a>
74(click, pop, dropout),
75<a href="http://en.wikipedia.org/wiki/Max_Headroom_(character)">repeated audio</a>
76when circular buffers
77are used, or delay in responding to a command.
78</p>
79
80<p>
81In the Android audio implementation, priority inversion is most
Clay Murphyc28f2372013-09-25 16:13:40 -070082likely to occur in these places. And so we focus attention here:
Glenn Kasten98afa532013-04-15 14:02:36 -070083</p>
84
85<ul>
86
87<li>
88between normal mixer thread and fast mixer thread in AudioFlinger
89</li>
90
91<li>
92between application callback thread for a fast AudioTrack and
93fast mixer thread (they both have elevated priority, but slightly
94different priorities)
95</li>
96
97<li>
Clay Murphyc28f2372013-09-25 16:13:40 -070098within the audio Hardware Abstraction Layer (HAL) implementation, e.g. for telephony or echo cancellation
Glenn Kasten98afa532013-04-15 14:02:36 -070099</li>
100
101<li>
102within the audio driver in kernel
103</li>
104
105<li>
106between AudioTrack callback thread and other app threads (this is out of our control)
107</li>
108
109</ul>
110
111<p>
112As of this writing, reduced latency for AudioRecord is planned but
113not yet implemented. The likely priority inversion spots will be
114similar to those for AudioTrack.
115</p>
116
117<h2 id="commonSolutions">Common Solutions</h2>
118
119<p>
120The typical solutions listed in the Wikipedia article include:
121</p>
122
123<ul>
124
125<li>
126disabling interrupts
127</li>
128
129<li>
130priority inheritance mutexes
131</li>
132
133</ul>
134
135<p>
136Disabling interrupts is not feasible in Linux user space, and does
Clay Murphyc28f2372013-09-25 16:13:40 -0700137not work for Symmetric Multi-Processors (SMP).
Glenn Kasten98afa532013-04-15 14:02:36 -0700138</p>
139
140
141<p>
142Priority inheritance
143<a href="http://en.wikipedia.org/wiki/Futex">futexes</a>
144(fast user-space mutexes) are available
145in Linux kernel, but are not currently exposed by the Android C
146runtime library
147<a href="http://en.wikipedia.org/wiki/Bionic_(software)">Bionic</a>.
148We chose not to use them in the audio system
149because they are relatively heavyweight, and because they rely on
150a trusted client.
151</p>
152
153<h2 id="androidTechniques">Techniques used by Android</h2>
154
155<p>
156We started with "try lock" and lock with timeout. These are
157non-blocking and bounded blocking variants of the mutex lock
158operation. Try lock and lock with timeout worked fairly well for
159us, but were susceptible to a couple of obscure failure modes: the
160server was not guaranteed to be able to access the shared state if
161the client happened to be busy, and the cumulative timeout could
162be too long if there was a long sequence of unrelated locks that
163all timed out.
164</p>
165
166
167<p>
168We also use
169<a href="http://en.wikipedia.org/wiki/Linearizability">atomic operations</a>
170such as:
171</p>
172
173<ul>
174<li>increment</li>
175<li>bitwise "or"</li>
176<li>bitwise "and"</li>
177</ul>
178
179<p>
Clay Murphyc28f2372013-09-25 16:13:40 -0700180All of these return the previous value and include the necessary
Glenn Kasten98afa532013-04-15 14:02:36 -0700181SMP barriers. The disadvantage is they can require unbounded retries.
182In practice, we've found that the retries are not a problem.
183</p>
184
185<p>
Clay Murphyc28f2372013-09-25 16:13:40 -0700186<strong>Note</strong>: Atomic operations and their interactions with memory barriers
Glenn Kasten98afa532013-04-15 14:02:36 -0700187are notoriously badly misunderstood and used incorrectly. We include
Clay Murphyc28f2372013-09-25 16:13:40 -0700188these methods here for completeness but recommend you also read the article
Glenn Kasten98afa532013-04-15 14:02:36 -0700189<a href="https://developer.android.com/training/articles/smp.html">
190SMP Primer for Android</a>
191for further information.
192</p>
193
194<p>
195We still have and use most of the above tools, and have recently
196added these techniques:
197</p>
198
199<ul>
200
201<li>
202Use non-blocking single-reader single-writer
203<a href="http://en.wikipedia.org/wiki/Circular_buffer">FIFO queues</a>
204for data.
205</li>
206
207<li>
208Try to
209<i>copy</i>
210state rather than
211<i>share</i>
212state between high- and
213low-priority modules.
214</li>
215
216<li>
217When state does need to be shared, limit the state to the
218maximum-size
219<a href="http://en.wikipedia.org/wiki/Word_(computer_architecture)">word</a>
Clay Murphyc28f2372013-09-25 16:13:40 -0700220that can be accessed atomically in one-bus operation
Glenn Kasten98afa532013-04-15 14:02:36 -0700221without retries.
222</li>
223
224<li>
225For complex multi-word state, use a state queue. A state queue
226is basically just a non-blocking single-reader single-writer FIFO
227queue used for state rather than data, except the writer collapses
228adjacent pushes into a single push.
229</li>
230
231<li>
232Pay attention to
233<a href="http://en.wikipedia.org/wiki/Memory_barrier">memory barriers</a>
234for SMP correctness.
235</li>
236
237<li>
238<a href="http://en.wikipedia.org/wiki/Trust,_but_verify">Trust, but verify</a>.
239When sharing
240<i>state</i>
241between processes, don't
242assume that the state is well-formed. For example, check that indices
243are within bounds. This verification isn't needed between threads
244in the same process, between mutual trusting processes (which
245typically have the same UID). It's also unnecessary for shared
246<i>data</i>
247such as PCM audio where a corruption is inconsequential.
248</li>
249
250</ul>
251
252<h2 id="nonBlockingAlgorithms">Non-Blocking Algorithms</h2>
253
254<p>
255<a href="http://en.wikipedia.org/wiki/Non-blocking_algorithm">Non-blocking algorithms</a>
256have been a subject of much recent study.
257But with the exception of single-reader single-writer FIFO queues,
258we've found them to be complex and error-prone.
259</p>
260
261<p>
Clay Murphyc28f2372013-09-25 16:13:40 -0700262Starting in Android 4.2, you can find our non-blocking,
Glenn Kasten98afa532013-04-15 14:02:36 -0700263single-reader/writer classes in these locations:
264</p>
265
266<ul>
267
268<li>
269frameworks/av/include/media/nbaio/
270</li>
271
272<li>
273frameworks/av/media/libnbaio/
274</li>
275
276<li>
277frameworks/av/services/audioflinger/StateQueue*
278</li>
279
280</ul>
281
282<p>
283These were designed specifically for AudioFlinger and are not
284general-purpose. Non-blocking algorithms are notorious for being
Clay Murphyc28f2372013-09-25 16:13:40 -0700285difficult to debug. You can look at this code as a model. But be
Glenn Kasten98afa532013-04-15 14:02:36 -0700286aware there may be bugs, and the classes are not guaranteed to be
287suitable for other purposes.
288</p>
289
290<p>
291For developers, we may update some of the sample OpenSL ES application
Clay Murphyc28f2372013-09-25 16:13:40 -0700292code to use non-blocking algorithms or reference a non-Android open source
Glenn Kasten98afa532013-04-15 14:02:36 -0700293library.
294</p>
295
296<h2 id="tools">Tools</h2>
297
298<p>
299To the best of our knowledge, there are no automatic tools for
300finding priority inversion, especially before it happens. Some
301research static code analysis tools are capable of finding priority
302inversions if able to access the entire codebase. Of course, if
303arbitrary user code is involved (as it is here for the application)
304or is a large codebase (as for the Linux kernel and device drivers),
305static analysis may be impractical. The most important thing is to
306read the code very carefully and get a good grasp on the entire
307system and the interactions. Tools such as
308<a href="http://developer.android.com/tools/help/systrace.html">systrace</a>
309and
310<code>ps -t -p</code>
311are useful for seeing priority inversion after it occurs, but do
312not tell you in advance.
313</p>
314
315<h2 id="aFinalWord">A Final Word</h2>
316
317<p>
318After all of this discussion, don't be afraid of mutexes. Mutexes
319are your friend for ordinary use, when used and implemented correctly
320in ordinary non-time-critical use cases. But between high- and
321low-priority tasks and in time-sensitive systems mutexes are more
322likely to cause trouble.
323</p>
324