compress_offload.txt
=====================
Pierre-Louis.Bossart <pierre-louis.bossart@linux.intel.com>
Vinod Koul <vinod.koul@linux.intel.com>

Overview

Since its early days, the ALSA API was defined with PCM support or
constant-bitrate payloads such as IEC61937 in mind. Arguments and
return values expressed in frames are the norm, making it a challenge
to extend the existing API to compressed data streams.

In recent years, audio digital signal processors (DSPs) have been
integrated in system-on-chip designs, and DSPs are also integrated in
audio codecs. Processing compressed data on such DSPs results in a
dramatic reduction of power consumption compared to host-based
processing. Support for such hardware has not been very good in
Linux, mostly because of the lack of a generic API available in the
mainline kernel.

Rather than requiring a compatibility break with an API change of the
ALSA PCM interface, a new 'Compressed Data' API is introduced to
provide a control and data-streaming interface for audio DSPs.

The design of this API was inspired by the 2-year experience with the
Intel Moorestown SOC, with many corrections required to upstream the
API in the mainline kernel instead of the staging tree and make it
usable by others.

Requirements

The main requirements are:

- Separation between byte counts and time. Compressed formats may have
  a header per file, per frame, or no header at all. The payload size
  may vary from frame to frame. As a result, it is not possible to
  reliably estimate the duration of audio buffers when handling
  compressed data. Dedicated mechanisms are required to allow for
  reliable audio-video synchronization, which requires precise
  reporting of the number of samples rendered at any given time.

- Handling of multiple formats. PCM data only requires a specification
  of the sampling rate, number of channels and bits per sample. In
  contrast, compressed data comes in a variety of formats. Audio DSPs
  may also provide support for a limited number of audio encoders and
  decoders embedded in firmware, or may support more choices through
  dynamic download of libraries.

- Focus on main formats. This API provides support for the most
  popular formats used for audio and video capture and playback. It is
  likely that as audio compression technology advances, new formats
  will be added.

- Handling of multiple configurations. Even for a given format like
  AAC, some implementations may support AAC multichannel but only
  HE-AAC stereo. Likewise, WMA10 level M3 may require too much memory
  and too many CPU cycles. The new API needs to provide a generic way
  of listing these formats.

- Rendering/Grabbing only. This API does not provide any means of
  hardware acceleration, where PCM samples are provided back to
  user-space for additional processing. This API focuses instead on
  streaming compressed data to a DSP, with the assumption that the
  decoded samples are routed to a physical output or logical back-end.

- Complexity hiding. Existing user-space multimedia frameworks all
  have existing enums/structures for each compressed format. This new
  API assumes the existence of a platform-specific compatibility layer
  to expose, translate and make use of the capabilities of the audio
  DSP, e.g. an Android HAL or PulseAudio sinks. By construction,
  regular applications are not supposed to make use of this API.


Design

The new API shares a number of concepts with the PCM API for flow
control. Start, pause, resume, drain and stop commands have the same
semantics no matter what the content is.

The concept of a memory ring buffer divided into a set of fragments is
borrowed from the ALSA PCM API. However, only sizes in bytes can be
specified.

Seeks/trick modes are assumed to be handled by the host.

The notion of rewinds/forwards is not supported. Data committed to the
ring buffer cannot be invalidated, except when dropping all buffers.

The Compressed Data API does not make any assumptions on how the data
is transmitted to the audio DSP. DMA transfers from main memory to an
embedded audio cluster or to an SPI interface for external DSPs are
possible. As in the ALSA PCM case, a core set of routines is exposed;
each driver implementer will have to write support for a set of
mandatory routines and possibly make use of optional ones.

The main additions are:

- get_caps
This routine returns the list of audio formats supported. Querying the
codecs on a capture stream will return encoders; decoders will be
listed for playback streams.

- get_codec_caps
For each codec, this routine returns a list of capabilities. The
intent is to make sure all the capabilities correspond to valid
settings, and to minimize the risk of configuration failures. For
example, for a complex codec such as AAC, the number of channels
supported may depend on a specific profile. If the capabilities were
exposed with a single descriptor, a specific combination of
profiles/channels/formats might not be supported. Likewise, embedded
DSPs have limited memory and CPU cycles, so it is likely that some
implementations will make the list of capabilities dynamic and
dependent on existing workloads. In addition to codec settings, this
routine returns the minimum buffer size handled by the implementation.
This information can be a function of the DMA buffer sizes, the number
of bytes required to synchronize, etc., and can be used by user-space
to define how much needs to be written in the ring buffer before
playback can start.

- set_params
This routine sets the configuration chosen for a specific codec. The
most important field in the parameters is the codec type; in most
cases decoders will ignore other fields, while encoders will strictly
comply with the settings.

- get_params
This routine returns the actual settings used by the DSP. Changes to
the settings should remain the exception.

- get_timestamp
The timestamp becomes a multiple-field structure. It lists the number
of bytes transferred, the number of samples processed and the number
of samples rendered/grabbed. All these values can be used to determine
the average bitrate, to figure out if the ring buffer needs to be
refilled, or to derive the delay due to decoding/encoding/IO on the
DSP.

Note that the list of codecs/profiles/modes was derived from the
OpenMAX AL specification instead of reinventing the wheel.
Modifications include:
- Addition of FLAC and IEC formats
- Merge of encoder/decoder capabilities
- Profiles/modes listed as bitmasks to make descriptors more compact
- Addition of set_params for decoders (missing in OpenMAX AL)
- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
- Addition of format information for WMA
- Addition of encoding options when required (derived from OpenMAX IL)
- Addition of rateControlSupported (missing in OpenMAX AL)

Gapless Playback
================
When playing through an album, the decoders have the ability to skip
the encoder delay and padding and move directly from the content of
one track to the next. The end user perceives this as gapless
playback, as there is no silence while switching from one track to
another.

Also, there might be low-intensity noises due to encoding. Perfect
gapless playback is difficult to achieve with all types of compressed
data, but it works fine with most music content. The decoder needs to
know the encoder delay and encoder padding, so these need to be passed
to the DSP. This metadata is extracted from ID3/MP4 headers and is not
present by default in the bitstream, hence the need for a new
interface to pass this information to the DSP. The DSP and user-space
also need to switch from one track to another and start using the data
of the second track.

The main additions are:

- set_metadata
This routine sets the encoder delay and encoder padding. This can be
used by the decoder to strip the silence. This needs to be set before
the data in the track is written.

- set_next_track
This routine tells the DSP that the metadata and write operations sent
after this call correspond to the subsequent track.

- partial drain
This is called when the end of file is reached. User-space can inform
the DSP that EOF is reached, and the DSP can then start skipping the
padding delay. Also, the next data written will belong to the next
track.

- set_next_track_param
This routine is called to send the DSP the codec-specific data of the
subsequent track in gapless playback, before the first write for that
track.


Sequence flow for gapless would be:
- Open
- Get caps / codec caps
- Set params
- Set metadata of the first track
- Fill data of the first track
- Trigger start
- User-space finishes sending all data
- Indicate next track data by sending set_next_track
- Set metadata of the next track
- Then call partial_drain to flush most of the buffer in the DSP
- Set the codec-specific data of the subsequent track
- Fill data of the next track
- The DSP switches to the second track
(note: the order of partial_drain and the writes for the next track
can be reversed as well)

Not supported:

- Support for VoIP/circuit-switched calls is not the target of this
  API. Support for dynamic bit-rate changes would require a tight
  coupling between the DSP and the host stack, limiting power savings.

- Packet-loss concealment is not supported. This would require an
  additional interface to let the decoder synthesize data when frames
  are lost during transmission. This may be added in the future.

- Volume control/routing is not handled by this API. Devices exposing
  a compressed data interface will be considered as regular ALSA
  devices; volume changes and routing information will be provided
  with regular ALSA kcontrols.

- Embedded audio effects. Such effects should be enabled in the same
  manner, no matter whether the input is PCM or compressed.

- Multichannel IEC encoding. It is unclear if this is required.

- Encoding/decoding acceleration is not supported, as mentioned
  above. It is possible to route the output of a decoder to a capture
  stream, or even to implement transcoding capabilities. This routing
  would be enabled with ALSA kcontrols.

- Audio policy/resource management. This API does not provide any
  hooks to query the utilization of the audio DSP, nor any preemption
  mechanisms.

- No notion of underrun/overrun. Since the bytes written are
  compressed in nature and the data written/read does not translate
  directly to rendered output in time, this API does not deal with
  underrun/overrun; this may be dealt with in a user-space library.


Credits:
- Mark Brown and Liam Girdwood for discussions on the need for this API
- Harsha Priya for her work on the intel_sst compressed API
- Rakesh Ughreja for valuable feedback
- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for
  demonstrating and quantifying the benefits of audio offload on a
  real platform.