blob: a8cd208c814b4b9c7beff5ca08881838e49b4a7f [file] [log] [blame]
Glenn Kasten98afa532013-04-15 14:02:36 -07001page.title=Avoiding Priority Inversion
2@jd:body
3
4<div id="qv-wrapper">
5 <div id="qv">
6 <h2>In this document</h2>
7 <ol id="auto-toc">
8 </ol>
9 </div>
10</div>
11
12<p>
13This article explains how the Android's audio system attempts to avoid
Clay Murphyc28f2372013-09-25 16:13:40 -070014priority inversion, as of the Android 4.1 release,
Glenn Kasten98afa532013-04-15 14:02:36 -070015and highlights techniques that you can use too.
16</p>
17
18<p>
19These techniques may be useful to developers of high-performance
20audio apps, OEMs, and SoC providers who are implementing an audio
Clay Murphyc28f2372013-09-25 16:13:40 -070021HAL. Please note implementing these techniques is not
Glenn Kasten98afa532013-04-15 14:02:36 -070022guaranteed to prevent glitches or other failures, particularly if
23used outside of the audio context.
Clay Murphyc28f2372013-09-25 16:13:40 -070024Your results may vary, and you should conduct your own
Glenn Kasten98afa532013-04-15 14:02:36 -070025evaluation and testing.
26</p>
27
28<h2 id="background">Background</h2>
29
30<p>
Clay Murphyc28f2372013-09-25 16:13:40 -070031The Android AudioFlinger audio server and AudioTrack/AudioRecord
Glenn Kasten98afa532013-04-15 14:02:36 -070032client implementation are being re-architected to reduce latency.
Clay Murphyc28f2372013-09-25 16:13:40 -070033This work started in Android 4.1, continued in 4.2 and 4.3, and now more
34improvements exist in version 4.4.
Glenn Kasten98afa532013-04-15 14:02:36 -070035</p>
36
37<p>
Clay Murphyc28f2372013-09-25 16:13:40 -070038To achieve this lower latency, many changes were needed throughout the system. One
39important change is to assign CPU resources to time-critical
Glenn Kasten98afa532013-04-15 14:02:36 -070040threads with a more predictable scheduling policy. Reliable scheduling
Clay Murphyc28f2372013-09-25 16:13:40 -070041allows the audio buffer sizes and counts to be reduced while still
Glenn Kasten98afa532013-04-15 14:02:36 -070042avoiding artifacts due to underruns.
43</p>
44
45<h2 id="priorityInversion">Priority Inversion</h2>
46
47<p>
48<a href="http://en.wikipedia.org/wiki/Priority_inversion">Priority inversion</a>
49is a classic failure mode of real-time systems,
50where a higher-priority task is blocked for an unbounded time waiting
51for a lower-priority task to release a resource such as [shared
52state protected by] a
53<a href="http://en.wikipedia.org/wiki/Mutual_exclusion">mutex</a>.
54</p>
55
56<p>
57In an audio system, priority inversion typically manifests as a
58<a href="http://en.wikipedia.org/wiki/Glitch">glitch</a>
59(click, pop, dropout),
60<a href="http://en.wikipedia.org/wiki/Max_Headroom_(character)">repeated audio</a>
61when circular buffers
62are used, or delay in responding to a command.
63</p>
64
65<p>
66In the Android audio implementation, priority inversion is most
Clay Murphyc28f2372013-09-25 16:13:40 -070067likely to occur in these places. And so we focus attention here:
Glenn Kasten98afa532013-04-15 14:02:36 -070068</p>
69
70<ul>
71
72<li>
73between normal mixer thread and fast mixer thread in AudioFlinger
74</li>
75
76<li>
77between application callback thread for a fast AudioTrack and
78fast mixer thread (they both have elevated priority, but slightly
79different priorities)
80</li>
81
82<li>
Clay Murphyc28f2372013-09-25 16:13:40 -070083within the audio Hardware Abstraction Layer (HAL) implementation, e.g. for telephony or echo cancellation
Glenn Kasten98afa532013-04-15 14:02:36 -070084</li>
85
86<li>
87within the audio driver in kernel
88</li>
89
90<li>
91between AudioTrack callback thread and other app threads (this is out of our control)
92</li>
93
94</ul>
95
96<p>
97As of this writing, reduced latency for AudioRecord is planned but
98not yet implemented. The likely priority inversion spots will be
99similar to those for AudioTrack.
100</p>
101
102<h2 id="commonSolutions">Common Solutions</h2>
103
104<p>
105The typical solutions listed in the Wikipedia article include:
106</p>
107
108<ul>
109
110<li>
111disabling interrupts
112</li>
113
114<li>
115priority inheritance mutexes
116</li>
117
118</ul>
119
120<p>
121Disabling interrupts is not feasible in Linux user space, and does
Clay Murphyc28f2372013-09-25 16:13:40 -0700122not work for Symmetric Multi-Processors (SMP).
Glenn Kasten98afa532013-04-15 14:02:36 -0700123</p>
124
125
126<p>
127Priority inheritance
128<a href="http://en.wikipedia.org/wiki/Futex">futexes</a>
129(fast user-space mutexes) are available
130in Linux kernel, but are not currently exposed by the Android C
131runtime library
132<a href="http://en.wikipedia.org/wiki/Bionic_(software)">Bionic</a>.
133We chose not to use them in the audio system
134because they are relatively heavyweight, and because they rely on
135a trusted client.
136</p>
137
138<h2 id="androidTechniques">Techniques used by Android</h2>
139
140<p>
141We started with "try lock" and lock with timeout. These are
142non-blocking and bounded blocking variants of the mutex lock
143operation. Try lock and lock with timeout worked fairly well for
144us, but were susceptible to a couple of obscure failure modes: the
145server was not guaranteed to be able to access the shared state if
146the client happened to be busy, and the cumulative timeout could
147be too long if there was a long sequence of unrelated locks that
148all timed out.
149</p>
150
151
152<p>
153We also use
154<a href="http://en.wikipedia.org/wiki/Linearizability">atomic operations</a>
155such as:
156</p>
157
158<ul>
159<li>increment</li>
160<li>bitwise "or"</li>
161<li>bitwise "and"</li>
162</ul>
163
164<p>
Clay Murphyc28f2372013-09-25 16:13:40 -0700165All of these return the previous value and include the necessary
Glenn Kasten98afa532013-04-15 14:02:36 -0700166SMP barriers. The disadvantage is they can require unbounded retries.
167In practice, we've found that the retries are not a problem.
168</p>
169
170<p>
Clay Murphyc28f2372013-09-25 16:13:40 -0700171<strong>Note</strong>: Atomic operations and their interactions with memory barriers
Glenn Kasten98afa532013-04-15 14:02:36 -0700172are notoriously badly misunderstood and used incorrectly. We include
Clay Murphyc28f2372013-09-25 16:13:40 -0700173these methods here for completeness but recommend you also read the article
Glenn Kasten98afa532013-04-15 14:02:36 -0700174<a href="https://developer.android.com/training/articles/smp.html">
175SMP Primer for Android</a>
176for further information.
177</p>
178
179<p>
180We still have and use most of the above tools, and have recently
181added these techniques:
182</p>
183
184<ul>
185
186<li>
187Use non-blocking single-reader single-writer
188<a href="http://en.wikipedia.org/wiki/Circular_buffer">FIFO queues</a>
189for data.
190</li>
191
192<li>
193Try to
194<i>copy</i>
195state rather than
196<i>share</i>
197state between high- and
198low-priority modules.
199</li>
200
201<li>
202When state does need to be shared, limit the state to the
203maximum-size
204<a href="http://en.wikipedia.org/wiki/Word_(computer_architecture)">word</a>
Clay Murphyc28f2372013-09-25 16:13:40 -0700205that can be accessed atomically in one-bus operation
Glenn Kasten98afa532013-04-15 14:02:36 -0700206without retries.
207</li>
208
209<li>
210For complex multi-word state, use a state queue. A state queue
211is basically just a non-blocking single-reader single-writer FIFO
212queue used for state rather than data, except the writer collapses
213adjacent pushes into a single push.
214</li>
215
216<li>
217Pay attention to
218<a href="http://en.wikipedia.org/wiki/Memory_barrier">memory barriers</a>
219for SMP correctness.
220</li>
221
222<li>
223<a href="http://en.wikipedia.org/wiki/Trust,_but_verify">Trust, but verify</a>.
224When sharing
225<i>state</i>
226between processes, don't
227assume that the state is well-formed. For example, check that indices
228are within bounds. This verification isn't needed between threads
229in the same process, between mutual trusting processes (which
230typically have the same UID). It's also unnecessary for shared
231<i>data</i>
232such as PCM audio where a corruption is inconsequential.
233</li>
234
235</ul>
236
237<h2 id="nonBlockingAlgorithms">Non-Blocking Algorithms</h2>
238
239<p>
240<a href="http://en.wikipedia.org/wiki/Non-blocking_algorithm">Non-blocking algorithms</a>
241have been a subject of much recent study.
242But with the exception of single-reader single-writer FIFO queues,
243we've found them to be complex and error-prone.
244</p>
245
246<p>
Clay Murphyc28f2372013-09-25 16:13:40 -0700247Starting in Android 4.2, you can find our non-blocking,
Glenn Kasten98afa532013-04-15 14:02:36 -0700248single-reader/writer classes in these locations:
249</p>
250
251<ul>
252
253<li>
254frameworks/av/include/media/nbaio/
255</li>
256
257<li>
258frameworks/av/media/libnbaio/
259</li>
260
261<li>
262frameworks/av/services/audioflinger/StateQueue*
263</li>
264
265</ul>
266
267<p>
268These were designed specifically for AudioFlinger and are not
269general-purpose. Non-blocking algorithms are notorious for being
Clay Murphyc28f2372013-09-25 16:13:40 -0700270difficult to debug. You can look at this code as a model. But be
Glenn Kasten98afa532013-04-15 14:02:36 -0700271aware there may be bugs, and the classes are not guaranteed to be
272suitable for other purposes.
273</p>
274
275<p>
276For developers, we may update some of the sample OpenSL ES application
Clay Murphyc28f2372013-09-25 16:13:40 -0700277code to use non-blocking algorithms or reference a non-Android open source
Glenn Kasten98afa532013-04-15 14:02:36 -0700278library.
279</p>
280
281<h2 id="tools">Tools</h2>
282
283<p>
284To the best of our knowledge, there are no automatic tools for
285finding priority inversion, especially before it happens. Some
286research static code analysis tools are capable of finding priority
287inversions if able to access the entire codebase. Of course, if
288arbitrary user code is involved (as it is here for the application)
289or is a large codebase (as for the Linux kernel and device drivers),
290static analysis may be impractical. The most important thing is to
291read the code very carefully and get a good grasp on the entire
292system and the interactions. Tools such as
293<a href="http://developer.android.com/tools/help/systrace.html">systrace</a>
294and
295<code>ps -t -p</code>
296are useful for seeing priority inversion after it occurs, but do
297not tell you in advance.
298</p>
299
300<h2 id="aFinalWord">A Final Word</h2>
301
302<p>
303After all of this discussion, don't be afraid of mutexes. Mutexes
304are your friend for ordinary use, when used and implemented correctly
305in ordinary non-time-critical use cases. But between high- and
306low-priority tasks and in time-sensitive systems mutexes are more
307likely to cause trouble.
308</p>
309