blob: 49b901e86d7a0128b9ae79a68a98a18fe1f8fe9f [file] [log] [blame]
Glenn Kasten98afa532013-04-15 14:02:36 -07001page.title=Avoiding Priority Inversion
2@jd:body
3
4<div id="qv-wrapper">
5 <div id="qv">
6 <h2>In this document</h2>
7 <ol id="auto-toc">
8 </ol>
9 </div>
10</div>
11
12<p>
13This article explains how the Android's audio system attempts to avoid
Clay Murphyc28f2372013-09-25 16:13:40 -070014priority inversion, as of the Android 4.1 release,
Glenn Kasten98afa532013-04-15 14:02:36 -070015and highlights techniques that you can use too.
16</p>
17
18<p>
19These techniques may be useful to developers of high-performance
20audio apps, OEMs, and SoC providers who are implementing an audio
Clay Murphyc28f2372013-09-25 16:13:40 -070021HAL. Please note implementing these techniques is not
Glenn Kasten98afa532013-04-15 14:02:36 -070022guaranteed to prevent glitches or other failures, particularly if
23used outside of the audio context.
Clay Murphyc28f2372013-09-25 16:13:40 -070024Your results may vary, and you should conduct your own
Glenn Kasten98afa532013-04-15 14:02:36 -070025evaluation and testing.
26</p>
27
28<h2 id="background">Background</h2>
29
30<p>
Clay Murphyc28f2372013-09-25 16:13:40 -070031The Android AudioFlinger audio server and AudioTrack/AudioRecord
Glenn Kasten98afa532013-04-15 14:02:36 -070032client implementation are being re-architected to reduce latency.
Clay Murphyc28f2372013-09-25 16:13:40 -070033This work started in Android 4.1, continued in 4.2 and 4.3, and now more
34improvements exist in version 4.4.
Glenn Kasten98afa532013-04-15 14:02:36 -070035</p>
36
37<p>
Clay Murphyc28f2372013-09-25 16:13:40 -070038To achieve this lower latency, many changes were needed throughout the system. One
39important change is to assign CPU resources to time-critical
Glenn Kasten98afa532013-04-15 14:02:36 -070040threads with a more predictable scheduling policy. Reliable scheduling
Clay Murphyc28f2372013-09-25 16:13:40 -070041allows the audio buffer sizes and counts to be reduced while still
Glenn Kasten98afa532013-04-15 14:02:36 -070042avoiding artifacts due to underruns.
43</p>
44
Clay Murphy3a7af3a2014-09-09 17:29:09 -070045<h2 id="priorityInversion">Priority inversion</h2>
Glenn Kasten98afa532013-04-15 14:02:36 -070046
47<p>
48<a href="http://en.wikipedia.org/wiki/Priority_inversion">Priority inversion</a>
49is a classic failure mode of real-time systems,
50where a higher-priority task is blocked for an unbounded time waiting
Clay Murphy3a7af3a2014-09-09 17:29:09 -070051for a lower-priority task to release a resource such as (shared
52state protected by) a
Glenn Kasten98afa532013-04-15 14:02:36 -070053<a href="http://en.wikipedia.org/wiki/Mutual_exclusion">mutex</a>.
54</p>
55
56<p>
57In an audio system, priority inversion typically manifests as a
58<a href="http://en.wikipedia.org/wiki/Glitch">glitch</a>
59(click, pop, dropout),
60<a href="http://en.wikipedia.org/wiki/Max_Headroom_(character)">repeated audio</a>
61when circular buffers
62are used, or delay in responding to a command.
63</p>
64
65<p>
66In the Android audio implementation, priority inversion is most
Clay Murphy3a7af3a2014-09-09 17:29:09 -070067likely to occur in these places. And so you should focus your attention here:
Glenn Kasten98afa532013-04-15 14:02:36 -070068</p>
69
70<ul>
71
72<li>
73between normal mixer thread and fast mixer thread in AudioFlinger
74</li>
75
76<li>
77between application callback thread for a fast AudioTrack and
78fast mixer thread (they both have elevated priority, but slightly
79different priorities)
80</li>
81
82<li>
Clay Murphyc28f2372013-09-25 16:13:40 -070083within the audio Hardware Abstraction Layer (HAL) implementation, e.g. for telephony or echo cancellation
Glenn Kasten98afa532013-04-15 14:02:36 -070084</li>
85
86<li>
87within the audio driver in kernel
88</li>
89
90<li>
91between AudioTrack callback thread and other app threads (this is out of our control)
92</li>
93
94</ul>
95
96<p>
97As of this writing, reduced latency for AudioRecord is planned but
98not yet implemented. The likely priority inversion spots will be
99similar to those for AudioTrack.
100</p>
101
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700102<h2 id="commonSolutions">Common solutions</h2>
Glenn Kasten98afa532013-04-15 14:02:36 -0700103
104<p>
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700105The typical solutions include:
Glenn Kasten98afa532013-04-15 14:02:36 -0700106</p>
107
108<ul>
109
110<li>
111disabling interrupts
112</li>
113
114<li>
115priority inheritance mutexes
116</li>
117
118</ul>
119
120<p>
121Disabling interrupts is not feasible in Linux user space, and does
Clay Murphyc28f2372013-09-25 16:13:40 -0700122not work for Symmetric Multi-Processors (SMP).
Glenn Kasten98afa532013-04-15 14:02:36 -0700123</p>
124
125
126<p>
127Priority inheritance
128<a href="http://en.wikipedia.org/wiki/Futex">futexes</a>
129(fast user-space mutexes) are available
130in Linux kernel, but are not currently exposed by the Android C
131runtime library
132<a href="http://en.wikipedia.org/wiki/Bionic_(software)">Bionic</a>.
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700133They are not used in the audio system because they are relatively heavyweight,
134and because they rely on a trusted client.
Glenn Kasten98afa532013-04-15 14:02:36 -0700135</p>
136
137<h2 id="androidTechniques">Techniques used by Android</h2>
138
139<p>
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700140Experiments started with "try lock" and lock with timeout. These are
Glenn Kasten98afa532013-04-15 14:02:36 -0700141non-blocking and bounded blocking variants of the mutex lock
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700142operation. Try lock and lock with timeout worked fairly well but were
143susceptible to a couple of obscure failure modes: the
Glenn Kasten98afa532013-04-15 14:02:36 -0700144server was not guaranteed to be able to access the shared state if
145the client happened to be busy, and the cumulative timeout could
146be too long if there was a long sequence of unrelated locks that
147all timed out.
148</p>
149
150
151<p>
152We also use
153<a href="http://en.wikipedia.org/wiki/Linearizability">atomic operations</a>
154such as:
155</p>
156
157<ul>
158<li>increment</li>
159<li>bitwise "or"</li>
160<li>bitwise "and"</li>
161</ul>
162
163<p>
Clay Murphyc28f2372013-09-25 16:13:40 -0700164All of these return the previous value and include the necessary
Glenn Kasten98afa532013-04-15 14:02:36 -0700165SMP barriers. The disadvantage is they can require unbounded retries.
166In practice, we've found that the retries are not a problem.
167</p>
168
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700169<p class="note"><strong>Note:</strong> Atomic operations and their interactions with memory barriers
170are notoriously badly misunderstood and used incorrectly. These methods are
171included here for completeness but recommend you also read the article
Glenn Kasten98afa532013-04-15 14:02:36 -0700172<a href="https://developer.android.com/training/articles/smp.html">
173SMP Primer for Android</a>
174for further information.
175</p>
176
177<p>
178We still have and use most of the above tools, and have recently
179added these techniques:
180</p>
181
182<ul>
183
184<li>
185Use non-blocking single-reader single-writer
186<a href="http://en.wikipedia.org/wiki/Circular_buffer">FIFO queues</a>
187for data.
188</li>
189
190<li>
191Try to
192<i>copy</i>
193state rather than
194<i>share</i>
195state between high- and
196low-priority modules.
197</li>
198
199<li>
200When state does need to be shared, limit the state to the
201maximum-size
202<a href="http://en.wikipedia.org/wiki/Word_(computer_architecture)">word</a>
Clay Murphyc28f2372013-09-25 16:13:40 -0700203that can be accessed atomically in one-bus operation
Glenn Kasten98afa532013-04-15 14:02:36 -0700204without retries.
205</li>
206
207<li>
208For complex multi-word state, use a state queue. A state queue
209is basically just a non-blocking single-reader single-writer FIFO
210queue used for state rather than data, except the writer collapses
211adjacent pushes into a single push.
212</li>
213
214<li>
215Pay attention to
216<a href="http://en.wikipedia.org/wiki/Memory_barrier">memory barriers</a>
217for SMP correctness.
218</li>
219
220<li>
221<a href="http://en.wikipedia.org/wiki/Trust,_but_verify">Trust, but verify</a>.
222When sharing
223<i>state</i>
224between processes, don't
225assume that the state is well-formed. For example, check that indices
226are within bounds. This verification isn't needed between threads
227in the same process, between mutual trusting processes (which
228typically have the same UID). It's also unnecessary for shared
229<i>data</i>
230such as PCM audio where a corruption is inconsequential.
231</li>
232
233</ul>
234
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700235<h2 id="nonBlockingAlgorithms">Non-blocking algorithms</h2>
Glenn Kasten98afa532013-04-15 14:02:36 -0700236
237<p>
238<a href="http://en.wikipedia.org/wiki/Non-blocking_algorithm">Non-blocking algorithms</a>
239have been a subject of much recent study.
240But with the exception of single-reader single-writer FIFO queues,
241we've found them to be complex and error-prone.
242</p>
243
244<p>
Clay Murphyc28f2372013-09-25 16:13:40 -0700245Starting in Android 4.2, you can find our non-blocking,
Glenn Kasten98afa532013-04-15 14:02:36 -0700246single-reader/writer classes in these locations:
247</p>
248
249<ul>
250
251<li>
252frameworks/av/include/media/nbaio/
253</li>
254
255<li>
256frameworks/av/media/libnbaio/
257</li>
258
259<li>
260frameworks/av/services/audioflinger/StateQueue*
261</li>
262
263</ul>
264
265<p>
266These were designed specifically for AudioFlinger and are not
267general-purpose. Non-blocking algorithms are notorious for being
Clay Murphyc28f2372013-09-25 16:13:40 -0700268difficult to debug. You can look at this code as a model. But be
Glenn Kasten98afa532013-04-15 14:02:36 -0700269aware there may be bugs, and the classes are not guaranteed to be
270suitable for other purposes.
271</p>
272
273<p>
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700274For developers, some of the sample OpenSL ES application code may be updated to
275use non-blocking algorithms or reference a non-Android open source library.
Glenn Kasten98afa532013-04-15 14:02:36 -0700276</p>
277
278<h2 id="tools">Tools</h2>
279
280<p>
281To the best of our knowledge, there are no automatic tools for
282finding priority inversion, especially before it happens. Some
283research static code analysis tools are capable of finding priority
284inversions if able to access the entire codebase. Of course, if
285arbitrary user code is involved (as it is here for the application)
286or is a large codebase (as for the Linux kernel and device drivers),
287static analysis may be impractical. The most important thing is to
288read the code very carefully and get a good grasp on the entire
289system and the interactions. Tools such as
290<a href="http://developer.android.com/tools/help/systrace.html">systrace</a>
291and
292<code>ps -t -p</code>
293are useful for seeing priority inversion after it occurs, but do
294not tell you in advance.
295</p>
296
Clay Murphy3a7af3a2014-09-09 17:29:09 -0700297<h2 id="aFinalWord">A final word</h2>
Glenn Kasten98afa532013-04-15 14:02:36 -0700298
299<p>
300After all of this discussion, don't be afraid of mutexes. Mutexes
301are your friend for ordinary use, when used and implemented correctly
302in ordinary non-time-critical use cases. But between high- and
303low-priority tasks and in time-sensitive systems mutexes are more
304likely to cause trouble.
305</p>
306