compress_offload.txt
====================
Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Vinod Koul <vinod.koul@linux.intel.com>

Overview

Since its early days, the ALSA API was defined with PCM support or
constant-bitrate payloads such as IEC61937 in mind. Arguments and
returned values in frames are the norm, making it a challenge to
extend the existing API to compressed data streams.

In recent years, audio digital signal processors (DSPs) have been
integrated in system-on-chip designs as well as in audio codecs.
Processing compressed data on such DSPs results in a dramatic
reduction of power consumption compared to host-based processing.
Linux support for such hardware has been limited, mostly because the
mainline kernel lacks a generic API.

Rather than requiring a compatibility break with an API change of the
ALSA PCM interface, a new 'Compressed Data' API is introduced to
provide a control and data-streaming interface for audio DSPs.

The design of this API was inspired by two years of experience with
the Intel Moorestown SoC, with many corrections required to upstream
the API in the mainline kernel instead of the staging tree and make it
usable by others.

Requirements

The main requirements are:

- Separation between byte counts and time. Compressed formats may have
  a header per file, per frame, or no header at all. The payload size
  may vary from frame to frame. As a result, it is not possible to
  reliably estimate the duration of audio buffers when handling
  compressed data. Dedicated mechanisms are required to allow for
  reliable audio-video synchronization, which requires precise
  reporting of the number of samples rendered at any given time.

- Handling of multiple formats. PCM data only requires a specification
  of the sampling rate, number of channels and bits per sample. In
  contrast, compressed data comes in a variety of formats. Audio DSPs
  may also provide support for a limited number of audio encoders and
  decoders embedded in firmware, or may support more choices through
  dynamic download of libraries.

- Focus on main formats. This API provides support for the most
  popular formats used for audio and video capture and playback. It is
  likely that as audio compression technology advances, new formats
  will be added.

- Handling of multiple configurations. Even for a given format like
  AAC, some implementations may support AAC multichannel but HE-AAC
  only in stereo. Likewise, WMA10 level M3 may require too much memory
  and too many CPU cycles. The new API needs to provide a generic way
  of listing these formats.

- Rendering/Grabbing only. This API does not provide any means of
  hardware acceleration, where PCM samples are provided back to
  user-space for additional processing. This API focuses instead on
  streaming compressed data to a DSP, with the assumption that the
  decoded samples are routed to a physical output or logical back-end.

- Complexity hiding. Existing user-space multimedia frameworks all
  have existing enums/structures for each compressed format. This new
  API assumes the existence of a platform-specific compatibility layer
  to expose, translate and make use of the capabilities of the audio
  DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular
  applications are not supposed to make use of this API.
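To illustrate the first requirement, the sketch below shows why a byte
count alone says little about duration. It is not part of the API: the
helper names (frames_in_buffer, frames_duration_us) are hypothetical,
and it assumes a format in which every frame decodes to the same number
of samples while the payload size varies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the whole frames that fit in buf_bytes,
 * given the payload size of each frame. */
static size_t frames_in_buffer(const uint32_t *frame_bytes, size_t n_frames,
                               size_t buf_bytes)
{
    size_t used = 0, count = 0;

    while (count < n_frames && used + frame_bytes[count] <= buf_bytes) {
        used += frame_bytes[count];
        count++;
    }
    return count;
}

/* Hypothetical helper: duration in microseconds of `frames` frames,
 * assuming each frame decodes to samples_per_frame samples. */
static uint64_t frames_duration_us(size_t frames, uint32_t samples_per_frame,
                                   uint32_t rate)
{
    assert(rate > 0);
    return (uint64_t)frames * samples_per_frame * 1000000u / rate;
}
```

With these assumptions, a 2048-byte buffer holding eight 256-byte
frames lasts four times longer than the same buffer holding two
1024-byte frames, even though the byte count is identical.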


Design

The new API shares a number of concepts with the PCM API for flow
control. Start, pause, resume, drain and stop commands have the same
semantics no matter what the content is.

The concept of a memory ring buffer divided into a set of fragments is
borrowed from the ALSA PCM API. However, only sizes in bytes can be
specified.
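As a minimal sketch of this idea, a byte-addressed ring buffer split
into fragments could be modelled as follows. The structure and function
names are hypothetical and do not correspond to kernel structures; the
assumption that the DSP side consumes one whole fragment at a time is
made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a fragment-based ring buffer; all sizes and
 * positions are in bytes, never in frames. */
struct ring {
    size_t fragment_size;   /* bytes per fragment */
    size_t fragments;       /* number of fragments */
    size_t app_ptr;         /* total bytes committed by user space */
    size_t hw_ptr;          /* total bytes consumed by the DSP */
};

/* Total capacity in bytes. */
static size_t ring_size(const struct ring *r)
{
    return r->fragment_size * r->fragments;
}

/* Bytes user space may still write without overwriting unread data. */
static size_t ring_avail(const struct ring *r)
{
    assert(r->app_ptr >= r->hw_ptr);
    return ring_size(r) - (r->app_ptr - r->hw_ptr);
}

/* Commit up to len bytes and return how many were accepted. */
static size_t ring_commit(struct ring *r, size_t len)
{
    size_t avail = ring_avail(r);
    size_t n = len < avail ? len : avail;

    r->app_ptr += n;
    return n;
}

/* Assumption: the DSP side consumes one whole fragment at a time.
 * Returns the number of bytes consumed (0 if less than a fragment
 * is pending). */
static size_t ring_consume_fragment(struct ring *r)
{
    if (r->app_ptr - r->hw_ptr < r->fragment_size)
        return 0;
    r->hw_ptr += r->fragment_size;
    return r->fragment_size;
}
```

The model has no operation that moves app_ptr backwards: once bytes
are committed they stay committed.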

Seeks/trick modes are assumed to be handled by the host.

The notion of rewinds/forwards is not supported. Data committed to the
ring buffer cannot be invalidated, except when dropping all buffers.

The Compressed Data API does not make any assumptions about how the
data is transmitted to the audio DSP. DMA transfers from main memory to
an embedded audio cluster or to a SPI interface for external DSPs are
possible. As in the ALSA PCM case, a core set of routines is exposed;
each driver implementer will have to write support for a set of
mandatory routines and possibly make use of optional ones.

The main additions are:

- get_caps
This routine returns the list of audio formats supported. Querying the
codecs on a capture stream will return encoders; decoders will be
listed for playback streams.

- get_codec_caps
For each codec, this routine returns a list of capabilities. The
intent is to make sure all the capabilities correspond to valid
settings, and to minimize the risks of configuration failures. For
example, for a complex codec such as AAC, the number of channels
supported may depend on a specific profile. If the capabilities were
exposed with a single descriptor, it may happen that a specific
combination of profiles/channels/formats may not be supported.
Likewise, embedded DSPs have limited memory and CPU cycles, so it is
likely that some implementations make the list of capabilities dynamic
and dependent on existing workloads. In addition to codec settings,
this routine returns the minimum buffer size handled by the
implementation. This information can be a function of the DMA buffer
sizes, the number of bytes required to synchronize, etc., and can be
used by userspace to define how much needs to be written in the ring
buffer before playback can start.

- set_params
This routine sets the configuration chosen for a specific codec. The
most important field in the parameters is the codec type; in most
cases decoders will ignore other fields, while encoders will strictly
comply with the settings.

- get_params
This routine returns the actual settings used by the DSP. Changes to
the settings should remain the exception.

- get_timestamp
The timestamp becomes a multi-field structure. It lists the number
of bytes transferred, the number of samples processed and the number
of samples rendered/grabbed. All these values can be used to determine
the average bitrate, to figure out if the ring buffer needs to be
refilled, or to estimate the delay due to decoding/encoding/I/O on the
DSP.
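The timestamp fields described for get_timestamp can be combined as in
the rough sketch below. The structure is a hypothetical model, not the
kernel's definition, and the sampling_rate field is an assumption added
here so that sample counts can be converted to time. The bitrate figure
is only an estimate, since it treats all transferred bytes as already
decoded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the multi-field timestamp. */
struct tstamp_model {
    uint64_t bytes_transferred;   /* bytes handed to the DSP */
    uint64_t samples_processed;   /* samples decoded so far */
    uint64_t samples_rendered;    /* samples actually played out */
    uint32_t sampling_rate;       /* assumption: rate in Hz */
};

/* Approximate average bitrate in bits per second. */
static uint64_t avg_bitrate_bps(const struct tstamp_model *t)
{
    assert(t->sampling_rate > 0);
    if (t->samples_processed == 0)
        return 0;
    return t->bytes_transferred * 8 * t->sampling_rate /
           t->samples_processed;
}

/* Delay in milliseconds between decoding and rendering on the DSP. */
static uint64_t dsp_delay_ms(const struct tstamp_model *t)
{
    assert(t->sampling_rate > 0);
    assert(t->samples_processed >= t->samples_rendered);
    return (t->samples_processed - t->samples_rendered) * 1000 /
           t->sampling_rate;
}
```

For instance, 160000 bytes transferred for 441000 samples processed at
44100 Hz corresponds to an average of 128000 bits per second.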

Note that the list of codecs/profiles/modes was derived from the
OpenMAX AL specification instead of reinventing the wheel.
Modifications include:
- Addition of FLAC and IEC formats
- Merge of encoder/decoder capabilities
- Profiles/modes listed as bitmasks to make descriptors more compact
- Addition of set_params for decoders (missing in OpenMAX AL)
- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
- Addition of format information for WMA
- Addition of encoding options when required (derived from OpenMAX IL)
- Addition of rateControlSupported (missing in OpenMAX AL)
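The following sketch shows how profiles listed as bitmasks, combined
with several descriptors per codec, can express that only certain
combinations are valid. The macro, structure and function names are
hypothetical and do not match the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical profile bits, one bit per profile. */
#define PROFILE_AAC_LC  (1u << 0)
#define PROFILE_HE_AAC  (1u << 1)

/* Hypothetical capability descriptor: every profile in the mask is
 * valid for up to max_channels channels. */
struct codec_desc_model {
    uint32_t max_channels;
    uint32_t profiles;      /* bitmask of PROFILE_* values */
};

/* Return true if any descriptor allows this profile/channel pair. */
static bool desc_supports(const struct codec_desc_model *d, size_t n,
                          uint32_t profile, uint32_t channels)
{
    for (size_t i = 0; i < n; i++) {
        if ((d[i].profiles & profile) && channels <= d[i].max_channels)
            return true;
    }
    return false;
}
```

Two descriptors, one allowing AAC-LC up to six channels and one
allowing HE-AAC up to two, reject a six-channel HE-AAC request. A
single merged descriptor could not express that restriction.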

Gapless Playback
================
When playing through an album, the decoders have the ability to skip
the encoder delay and padding and directly move from one track's
content to another. The end user perceives this as gapless playback,
as there is no silence while switching from one track to another.

Also, there might be low-intensity noises due to encoding. Perfect
gapless is difficult to reach with all types of compressed data, but
works fine with most music content. The decoder needs to know the
encoder delay and encoder padding, so these need to be passed to the
DSP. This metadata is extracted from ID3/MP4 headers and is not present
by default in the bitstream, hence the need for a new interface to pass
this information to the DSP. The DSP and userspace also need to switch
from one track to another and start using data for the second track.

The main additions are:

- set_metadata
This routine sets the encoder delay and encoder padding. This can be
used by the decoder to strip the silence. This needs to be set before
the data in the track is written.

- set_next_track
This routine tells the DSP that the metadata and write operations sent
after this call correspond to the subsequent track.

- partial_drain
This is called when end of file is reached. Userspace can inform the
DSP that EOF is reached, and the DSP can then start skipping the
padding delay. The next data written belongs to the next track.
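The effect of the encoder delay and padding passed via set_metadata can
be sketched as below. The structure and helper are hypothetical models
that only show the arithmetic a decoder might apply when stripping
silence; they do not reflect the kernel's metadata layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the per-track gapless metadata. */
struct gapless_model {
    uint32_t encoder_delay;     /* samples to drop at track start */
    uint32_t encoder_padding;   /* samples to drop at track end */
};

/* Number of audible samples left once delay and padding are trimmed
 * from a track that decoded to `decoded` samples. */
static uint64_t audible_samples(const struct gapless_model *g,
                                uint64_t decoded)
{
    uint64_t trim = (uint64_t)g->encoder_delay + g->encoder_padding;

    return decoded > trim ? decoded - trim : 0;
}
```

With a delay of 576 samples and a padding of 1728 samples, a track that
decodes to 115200 samples keeps 112896 audible samples.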

Sequence flow for gapless would be:
- Open
- Get caps / codec caps
- Set params
- Set metadata of the first track
- Fill data of the first track
- Trigger start
- User-space finishes sending all data of the first track
- Indicate next track data by sending set_next_track
- Set metadata of the next track
- Call partial_drain to flush most of the buffer in the DSP
- Fill data of the next track
- DSP switches to second track
(Note: the order of partial_drain and the write for the next track can
be reversed.)
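The sequence above can be encoded as a small order checker, shown here
as a sketch only. The enum and function are hypothetical; each fill of
a track is simplified to a single STEP_WRITE, and the checker accepts
partial_drain and the write for the next track in either order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step identifiers for the calls in the sequence. */
enum step {
    STEP_OPEN,
    STEP_GET_CAPS,
    STEP_SET_PARAMS,
    STEP_SET_METADATA,
    STEP_WRITE,
    STEP_START,
    STEP_SET_NEXT_TRACK,
    STEP_PARTIAL_DRAIN,
};

/* Return true if s[] follows the two-track gapless order. */
static bool gapless_order_ok(const enum step *s, size_t n)
{
    static const enum step head[] = {
        STEP_OPEN, STEP_GET_CAPS, STEP_SET_PARAMS,
        STEP_SET_METADATA,      /* first track */
        STEP_WRITE,             /* first track data */
        STEP_START,
        STEP_SET_NEXT_TRACK,
        STEP_SET_METADATA,      /* next track */
    };
    const size_t head_len = sizeof(head) / sizeof(head[0]);

    if (n != head_len + 2)
        return false;
    for (size_t i = 0; i < head_len; i++) {
        if (s[i] != head[i])
            return false;
    }
    /* partial_drain and the next-track write may come in either order. */
    return (s[head_len] == STEP_PARTIAL_DRAIN &&
            s[head_len + 1] == STEP_WRITE) ||
           (s[head_len] == STEP_WRITE &&
            s[head_len + 1] == STEP_PARTIAL_DRAIN);
}
```

A sequence that omits the metadata of the first track, for example, is
rejected by this checker.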

Not supported:

- Support for VoIP/circuit-switched calls is not the target of this
  API. Support for dynamic bit-rate changes would require a tight
  coupling between the DSP and the host stack, limiting power savings.

- Packet-loss concealment is not supported. This would require an
  additional interface to let the decoder synthesize data when frames
  are lost during transmission. This may be added in the future.

- Volume control/routing is not handled by this API. Devices exposing a
  compressed data interface will be considered as regular ALSA devices;
  volume changes and routing information will be provided with regular
  ALSA kcontrols.

- Embedded audio effects. Such effects should be enabled in the same
  manner, no matter if the input was PCM or compressed.

- Multichannel IEC encoding. It is unclear whether this is required.

- Encoding/decoding acceleration is not supported, as mentioned
  above. It is possible to route the output of a decoder to a capture
  stream, or even implement transcoding capabilities. This routing
  would be enabled with ALSA kcontrols.

- Audio policy/resource management. This API does not provide any
  hooks to query the utilization of the audio DSP, nor any preemption
  mechanisms.

- No notion of underrun/overrun. Since the bytes written are compressed
  in nature and data written/read doesn't translate directly to
  rendered output in time, this API does not deal with underrun/overrun;
  these may be dealt with in a user-space library.

Credits:
- Mark Brown and Liam Girdwood for discussions on the need for this API
- Harsha Priya for her work on intel_sst compressed API
- Rakesh Ughreja for valuable feedback
- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for
  demonstrating and quantifying the benefits of audio offload on a
  real platform.