Pierre-Louis Bossart | 57bd9b8 | 2011-12-23 10:36:35 +0530 | [diff] [blame] | 1 | compress_offload.txt |
| 2 | ===================== |
| 3 | Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> |
| 4 | Vinod Koul <vinod.koul@linux.intel.com> |
| 5 | |
| 6 | Overview |
| 7 | |
| 8 | Since its early days, the ALSA API was defined with PCM support or |
| 9 | constant-bitrate payloads such as IEC61937 in mind. Arguments and |
| 10 | returned values in frames are the norm, making it a challenge to |
| 11 | extend the existing API to compressed data streams. |
| 12 | |
| 13 | In recent years, audio digital signal processors (DSP) were integrated |
| 14 | in system-on-chip designs, and DSPs are also integrated in audio |
| 15 | codecs. Processing compressed data on such DSPs results in a dramatic |
| 16 | reduction of power consumption compared to host-based |
| 17 | processing. Support for such hardware has not been very good in Linux, |
| 18 | mostly because of the lack of a generic API available in the mainline |
| 19 | kernel. |
| 20 | |
Masanari Iida | c94bed8e | 2012-04-10 00:22:13 +0900 | [diff] [blame] | 21 | Rather than requiring a compatibility break with an API change of the |
| 22 | ALSA PCM interface, a new 'Compressed Data' API is introduced to |
| 23 | provide a control and data-streaming interface for audio DSPs. |
| 24 | |
| 25 | The design of this API was inspired by the 2-year experience with the |
| 26 | Intel Moorestown SoC, with many corrections required to upstream the |
| 27 | API in the mainline kernel instead of the staging tree and make it |
| 28 | usable by others. |
| 29 | |
| 30 | Requirements |
| 31 | |
| 32 | The main requirements are: |
| 33 | |
| 34 | - Separation between byte counts and time. Compressed formats may have |
| 35 | a header per file, per frame, or no header at all. The payload size |
| 36 | may vary from frame to frame. As a result, it is not possible to |
| 37 | reliably estimate the duration of audio buffers when handling |
| 38 | compressed data. Dedicated mechanisms are required to allow for |
| 39 | reliable audio-video synchronization, which requires precise |
| 40 | reporting of the number of samples rendered at any given time. |
| 41 | |
| 42 | - Handling of multiple formats. PCM data only requires a specification |
| 43 | of the sampling rate, number of channels and bits per sample. In |
| 44 | contrast, compressed data comes in a variety of formats. Audio DSPs |
| 45 | may also provide support for a limited number of audio encoders and |
| 46 | decoders embedded in firmware, or may support more choices through |
| 47 | dynamic download of libraries. |
| 48 | |
| 49 | - Focus on main formats. This API provides support for the most |
| 50 | popular formats used for audio and video capture and playback. It is |
| 51 | likely that as audio compression technology advances, new formats |
| 52 | will be added. |
| 53 | |
| 54 | - Handling of multiple configurations. Even for a given format like |
| 55 | AAC, some implementations may support AAC multichannel but HE-AAC |
| 56 | stereo. Likewise WMA10 level M3 may require too much memory and CPU |
| 57 | cycles. The new API needs to provide a generic way of listing these |
| 58 | formats. |
| 59 | |
| 60 | - Rendering/Grabbing only. This API does not provide any means of |
| 61 | hardware acceleration, where PCM samples are provided back to |
| 62 | user-space for additional processing. This API focuses instead on |
| 63 | streaming compressed data to a DSP, with the assumption that the |
| 64 | decoded samples are routed to a physical output or logical back-end. |
| 65 | |
| 66 | - Complexity hiding. Existing user-space multimedia frameworks all |
| 67 | have existing enums/structures for each compressed format. This new |
| 68 | API assumes the existence of a platform-specific compatibility layer |
| 69 | to expose, translate and make use of the capabilities of the audio |
| 70 | DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular |
| 71 | applications are not supposed to make use of this API. |
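The first requirement above (byte counts cannot stand in for time) can be illustrated with a small user-space model. This is only a sketch: the frame sizes are made up, and the frame length and sampling rate are assumed values chosen for the example, not something this API defines.

```python
# Hypothetical stream: every compressed frame decodes to the same number of
# PCM samples, but the payload size varies from frame to frame.
SAMPLES_PER_FRAME = 1152   # assumed value (MPEG-1 Layer III frame length)
SAMPLE_RATE = 44100        # assumed sampling rate in Hz


def duration_seconds(frame_sizes):
    """Exact duration: depends on the frame count, not on the byte count."""
    return len(frame_sizes) * SAMPLES_PER_FRAME / SAMPLE_RATE


def naive_duration_seconds(total_bytes, nominal_bitrate):
    """Duration guessed from a byte count and an assumed constant bitrate."""
    return total_bytes * 8 / nominal_bitrate


quiet_passage = [104, 104, 209, 104]   # made-up frame sizes in bytes
loud_passage = [626, 731, 626, 835]    # made-up frame sizes in bytes
```

Both passages contain four frames and therefore last exactly as long, yet a guess based on their byte counts gives two very different durations. This is why the API reports bytes and samples separately.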
| 72 | |
| 73 | |
| 74 | Design |
| 75 | |
Masanari Iida | c9f3f2d | 2013-07-18 01:29:12 +0900 | [diff] [blame] | 76 | The new API shares a number of concepts with the PCM API for flow |
| 77 | control. Start, pause, resume, drain and stop commands have the same |
| 78 | semantics no matter what the content is. |
| 79 | |
| 80 | The concept of a memory ring buffer divided into a set of fragments is |
| 81 | borrowed from the ALSA PCM API. However, only sizes in bytes can be |
| 82 | specified. |
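A minimal user-space sketch of a byte-oriented ring buffer divided into fragments is given here. The class and method names (FragmentRing, avail, fragment_elapsed) are hypothetical and only model the bookkeeping; they are not the kernel implementation.

```python
class FragmentRing:
    """Byte-oriented ring buffer split into equal-sized fragments."""

    def __init__(self, fragment_size, fragments):
        self.fragment_size = fragment_size
        self.size = fragment_size * fragments
        self.written = 0    # cumulative bytes written by the application
        self.consumed = 0   # cumulative bytes consumed by the DSP

    def avail(self):
        """Bytes the application may write without overwriting pending data."""
        return self.size - (self.written - self.consumed)

    def write(self, nbytes):
        """Commit up to nbytes; return how many bytes were accepted."""
        accepted = min(nbytes, self.avail())
        self.written += accepted
        return accepted

    def fragment_elapsed(self):
        """Model the DSP consuming one fragment (or whatever is left)."""
        pending = self.written - self.consumed
        done = min(self.fragment_size, pending)
        self.consumed += done
        return done
```

All quantities are byte counts; nothing in this model knows how many audio frames a fragment holds, which mirrors the restriction that only sizes in bytes can be specified.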
| 83 | |
| 84 | Seeks/trick modes are assumed to be handled by the host. |
| 85 | |
| 86 | The notion of rewinds/forwards is not supported. Data committed to the |
| 87 | ring buffer cannot be invalidated, except when dropping all buffers. |
| 88 | |
| 89 | The Compressed Data API does not make any assumptions on how the data |
| 90 | is transmitted to the audio DSP. DMA transfers from main memory to an |
| 91 | embedded audio cluster or to a SPI interface for external DSPs are |
| 92 | possible. As in the ALSA PCM case, a core set of routines is exposed; |
| 93 | each driver implementer will have to write support for a set of |
| 94 | mandatory routines and possibly make use of optional ones. |
| 95 | |
| 96 | The main additions are: |
| 97 | |
| 98 | - get_caps |
| 99 | This routine returns the list of audio formats supported. Querying the |
| 100 | codecs on a capture stream will return encoders; decoders will be |
| 101 | listed for playback streams. |
| 102 | |
| 103 | - get_codec_caps |
| 104 | For each codec, this routine returns a list of capabilities. The intent |
| 105 | is to make sure all the capabilities correspond to valid settings, and |
| 106 | to minimize the risks of configuration failures. For example, for a |
| 107 | complex codec such as AAC, the number of channels supported may depend |
| 108 | on a specific profile. If the capabilities were exposed with a single |
| 109 | descriptor, a specific combination of profiles/channels/formats might |
| 110 | not be supported. Likewise, embedded DSPs have limited memory and CPU |
| 111 | cycles, so it is likely that some implementations make the list of |
| 112 | capabilities dynamic and dependent on existing workloads. In addition |
| 113 | to codec settings, this routine returns the minimum buffer size handled |
| 114 | by the implementation. This information can be a function of the DMA |
| 115 | buffer sizes, the number of bytes required to synchronize, etc., and |
| 116 | can be used by userspace to define how much needs to be written in the |
| 117 | ring buffer before playback can start. |
| 118 | |
| 119 | - set_params |
| 120 | This routine sets the configuration chosen for a specific codec. The |
| 121 | most important field in the parameters is the codec type; in most |
| 122 | cases decoders will ignore other fields, while encoders will strictly |
| 123 | comply with the settings. |
| 124 | |
| 125 | - get_params |
| 126 | This routine returns the actual settings used by the DSP. Changes to |
| 127 | the settings should remain the exception. |
| 128 | |
| 129 | - get_timestamp |
| 130 | The timestamp becomes a multiple-field structure. It lists the number |
| 131 | of bytes transferred, the number of samples processed and the number |
| 132 | of samples rendered/grabbed. All these values can be used to determine |
Stefan Huber | 298a439 | 2013-06-27 12:54:50 +0200 | [diff] [blame] | 133 | the average bitrate, figure out if the ring buffer needs to be |
| 134 | refilled or the delay due to decoding/encoding/I/O on the DSP. |
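As an illustration of how the timestamp values described for get_timestamp might be combined, here is a sketch in user-space Python. The field names are hypothetical stand-ins for the three counters listed in the text, and the sampling_rate field is an assumption added so that samples can be converted to seconds.

```python
from dataclasses import dataclass


@dataclass
class Timestamp:
    bytes_transferred: int   # bytes moved from the ring buffer to the DSP
    samples_processed: int   # samples decoded so far
    samples_rendered: int    # samples that actually reached the output
    sampling_rate: int       # Hz (assumed field, see above)


def average_bitrate(ts):
    """Approximate bits per second of the compressed data handled so far."""
    seconds_decoded = ts.samples_processed / ts.sampling_rate
    return ts.bytes_transferred * 8 / seconds_decoded


def dsp_delay_seconds(ts):
    """Time represented by samples decoded but not yet rendered."""
    return (ts.samples_processed - ts.samples_rendered) / ts.sampling_rate


def needs_refill(bytes_written, ts, low_watermark):
    """True when fewer than low_watermark bytes are still queued."""
    return bytes_written - ts.bytes_transferred < low_watermark
```

The bitrate is only an approximation, since some transferred bytes may not have been decoded yet. The low_watermark threshold is a policy choice left to the caller.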
| 135 | |
| 136 | Note that the list of codecs/profiles/modes was derived from the |
| 137 | OpenMAX AL specification instead of reinventing the wheel. |
| 138 | Modifications include: |
| 139 | - Addition of FLAC and IEC formats |
| 140 | - Merge of encoder/decoder capabilities |
| 141 | - Profiles/modes listed as bitmasks to make descriptors more compact |
| 142 | - Addition of set_params for decoders (missing in OpenMAX AL) |
| 143 | - Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL) |
| 144 | - Addition of format information for WMA |
| 145 | - Addition of encoding options when required (derived from OpenMAX IL) |
| 146 | - Addition of rateControlSupported (missing in OpenMAX AL) |
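The idea of listing profiles as bitmasks, with one descriptor per valid combination, can be sketched as follows. The bit values, dictionary keys and the is_supported helper are hypothetical and chosen for illustration; the sketch mirrors the earlier example of an implementation that handles AAC multichannel but only HE-AAC stereo.

```python
# Hypothetical profile bits, for illustration only.
PROFILE_LC = 1 << 0
PROFILE_HE = 1 << 1
PROFILE_HE_PS = 1 << 2

# One descriptor per valid combination, so that every descriptor listed
# corresponds to settings the DSP can really accept.
DESCRIPTORS = [
    {"max_channels": 6, "profiles": PROFILE_LC},
    {"max_channels": 2, "profiles": PROFILE_HE | PROFILE_HE_PS},
]


def is_supported(profile, channels, descriptors=DESCRIPTORS):
    """Check a requested profile/channel pair against every descriptor."""
    return any(
        (d["profiles"] & profile) != 0 and channels <= d["max_channels"]
        for d in descriptors
    )
```

With a single merged descriptor (all three profiles, six channels) the unsupported HE-AAC 5.1 combination would wrongly appear valid; splitting the list into several descriptors avoids that.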
| 147 | |
Jeeja KP | 9727b49 | 2013-02-14 16:52:51 +0530 | [diff] [blame] | 148 | Gapless Playback |
| 149 | ================ |
| 150 | When playing through an album, the decoders have the ability to skip the encoder |
| 151 | delay and padding and directly move from one track content to another. The end |
| 152 | user can perceive this as gapless playback as we don't have silence while |
| 153 | switching from one track to another. |
| 154 | |
| 155 | Also, there might be low-intensity noises due to encoding. Perfect gapless playback |
| 156 | is difficult to reach with all types of compressed data, but works fine with most |
| 157 | music content. The decoder needs to know the encoder delay and encoder padding, |
| 158 | so we need to pass this to the DSP. This metadata is extracted from ID3/MP4 headers |
| 159 | and is not present by default in the bitstream, hence the need for a new |
| 160 | interface to pass this information to the DSP. Also, the DSP and userspace need to |
| 161 | switch from one track to another and start using data for the second track. |
| 162 | |
| 163 | The main additions are: |
| 164 | |
| 165 | - set_metadata |
| 166 | This routine sets the encoder delay and encoder padding. This can be used by |
| 167 | the decoder to strip the silence. This needs to be set before the data in the track |
| 168 | is written. |
| 169 | |
| 170 | - set_next_track |
| 171 | This routine tells the DSP that metadata and write operations sent after this |
| 172 | would correspond to the subsequent track. |
| 173 | |
| 174 | - partial_drain |
| 175 | This is called when end of file is reached. Userspace can inform the DSP that |
| 176 | EOF is reached, and the DSP can now start skipping the padding delay. Also, the |
| 177 | data written next would belong to the next track. |
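What a decoder can do with the encoder delay and encoder padding is sketched here on plain Python lists. The trim_gapless helper is hypothetical; an actual DSP would apply the same idea to its decoded output internally.

```python
def trim_gapless(decoded, encoder_delay, encoder_padding):
    """Drop priming samples at the start and padding samples at the end."""
    end = len(decoded) - encoder_padding
    return decoded[encoder_delay:end]


def join_tracks(tracks):
    """Concatenate trimmed tracks so that no inserted silence remains.

    Each entry is a (decoded_samples, encoder_delay, encoder_padding) tuple.
    """
    output = []
    for decoded, delay, padding in tracks:
        output.extend(trim_gapless(decoded, delay, padding))
    return output
```

For instance, two decoded tracks that each carry two silent samples of delay and one of padding join into a continuous run of audible samples once trimmed.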
| 178 | |
| 179 | Sequence flow for gapless would be: |
| 180 | - Open |
| 181 | - Get caps / codec caps |
| 182 | - Set params |
| 183 | - Set metadata of the first track |
| 184 | - Fill data of the first track |
| 185 | - Trigger start |
| 186 | - User-space finished sending all data of the first track |
| 187 | - Indicate next track data by sending set_next_track |
| 188 | - Set metadata of the next track |
| 189 | - Then call partial_drain to flush most of the buffer in the DSP |
| 190 | - Fill data of the next track |
| 191 | - DSP switches to second track |
| 192 | (note: order for partial_drain and write for next track can be reversed as well) |
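The sequence above can be modelled with a toy session object that records the calls it receives. GaplessSession and play_two_tracks are hypothetical names; the model leaves out open, capability queries and set_params, and enforces only the one ordering rule stated in the text: metadata must be set before the data of its track is written.

```python
class GaplessSession:
    """Toy model of a compressed stream that logs the calls it receives."""

    def __init__(self):
        self.log = []
        self.track_has_data = False

    def set_metadata(self, encoder_delay, encoder_padding):
        # Metadata has to arrive before any data of the track it describes.
        if self.track_has_data:
            raise RuntimeError("set_metadata called after track data was written")
        self.log.append("set_metadata")

    def write(self, data):
        self.track_has_data = True
        self.log.append("write")

    def start(self):
        self.log.append("start")

    def set_next_track(self):
        # Metadata and writes issued after this call belong to the next track.
        self.track_has_data = False
        self.log.append("set_next_track")

    def partial_drain(self):
        self.log.append("partial_drain")


def play_two_tracks(session, first, second):
    """Issue the calls in the order listed above (open/caps/params omitted)."""
    session.set_metadata(first["delay"], first["padding"])
    session.write(first["data"])
    session.start()
    session.set_next_track()
    session.set_metadata(second["delay"], second["padding"])
    session.partial_drain()
    session.write(second["data"])
    return session.log
```

As the note above says, partial_drain and the write for the next track may also be issued in the opposite order; the model accepts both.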
| 193 | |
| 194 | Not supported: |
| 195 | |
| 196 | - Support for VoIP/circuit-switched calls is not the target of this |
| 197 | API. Support for dynamic bit-rate changes would require a tight |
| 198 | coupling between the DSP and the host stack, limiting power savings. |
| 199 | |
| 200 | - Packet-loss concealment is not supported. This would require an |
| 201 | additional interface to let the decoder synthesize data when frames |
| 202 | are lost during transmission. This may be added in the future. |
| 203 | |
| 204 | - Volume control/routing is not handled by this API. Devices exposing a |
| 205 | compressed data interface will be considered as regular ALSA devices; |
| 206 | volume changes and routing information will be provided with regular |
| 207 | ALSA kcontrols. |
| 208 | |
| 209 | - Embedded audio effects. Such effects should be enabled in the same |
| 210 | manner, no matter if the input was PCM or compressed. |
| 211 | |
| 212 | - Multichannel IEC encoding. Unclear if this is required. |
| 213 | |
| 214 | - Encoding/decoding acceleration is not supported as mentioned |
| 215 | above. It is possible to route the output of a decoder to a capture |
| 216 | stream, or even implement transcoding capabilities. This routing |
| 217 | would be enabled with ALSA kcontrols. |
| 218 | |
| 219 | - Audio policy/resource management. This API does not provide any |
Masanari Iida | b327d25 | 2013-10-29 12:05:02 +0900 | [diff] [blame] | 220 | hooks to query the utilization of the audio DSP, nor any preemption |
| 221 | mechanisms. |
| 222 | |
| 223 | - No notion of underrun/overrun. Since the bytes written are compressed |
| 224 | in nature and data written/read doesn't translate directly to |
| 225 | rendered output in time, this API does not deal with underrun/overrun; |
| 226 | this may be dealt with in a user-space library. |
| 227 | |
| 228 | Credits: |
| 229 | - Mark Brown and Liam Girdwood for discussions on the need for this API |
| 230 | - Harsha Priya for her work on intel_sst compressed API |
| 231 | - Rakesh Ughreja for valuable feedback |
| 232 | - Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for |
| 233 | demonstrating and quantifying the benefits of audio offload on a |
| 234 | real platform. |