Blame - doc/draft-ietf-codec-opus-update.xml - platform/external/libopus

<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-ietf-codec-opus-update-02"
ipr="trust200902">
<front>
<title abbrev="Opus Update">Updates to the Opus Audio Codec</title>

<author initials="JM" surname="Valin" fullname="Jean-Marc Valin">
<organization>Mozilla Corporation</organization>
<address>
<postal>
<street>331 E. Evelyn Avenue</street>
<city>Mountain View</city>
<region>CA</region>
<code>94041</code>
<country>USA</country>
</postal>
<phone>+1 650 903-0800</phone>
<email>jmvalin@jmvalin.ca</email>
</address>
</author>

<author initials="K." surname="Vos" fullname="Koen Vos">
<organization>vocTone</organization>
<address>
<postal>
<street></street>
<city></city>
<region></region>
<code></code>
<country></country>
</postal>
<phone></phone>
<email>koenvos74@gmail.com</email>
</address>
</author>



<date day="1" month="July" year="2016" />

<abstract>
<t>This document addresses minor issues that were found in the specification
of the Opus audio codec in <xref target="RFC6716">RFC 6716</xref>.</t>
</abstract>
</front>

<middle>
<section title="Introduction">
<t>This document addresses minor issues that were discovered in the reference
implementation of the Opus codec that serves as the specification in
<xref target="RFC6716">RFC 6716</xref>. Only issues affecting the decoder are
listed here. An up-to-date implementation of the Opus encoder can be found at
http://opus-codec.org/. The updated specification remains fully compatible with
the original specification, and only one of the changes results in any difference
in the audio output.
</t>
</section>

<section title="Terminology">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref
target="RFC2119">RFC 2119</xref>.</t>
</section>

<section title="Stereo State Reset in SILK">
<t>The reference implementation does not reinitialize the stereo state
during a mode switch. The old stereo memory can produce a brief impulse
(i.e., a single sample) in the decoded audio. This can be fixed by changing
silk/dec_API.c at line 72:
</t>
<figure>
<artwork><![CDATA[
     for( n = 0; n < DECODER_NUM_CHANNELS; n++ ) {
         ret = silk_init_decoder( &channel_state[ n ] );
     }
+    silk_memset(&((silk_decoder *)decState)->sStereo, 0,
+                sizeof(((silk_decoder *)decState)->sStereo));
+    /* Not strictly needed, but it's cleaner that way */
+    ((silk_decoder *)decState)->prev_decode_only_middle = 0;

     return ret;
 }
]]></artwork>
</figure>
<t>
This change affects the normative part of the decoder, although the
amount of change is too small to make a significant impact on the test vectors.
</t>
</section>

<section anchor="padding" title="Parsing of the Opus Packet Padding">
<t>It was discovered that some invalid packets of very large size could trigger
an out-of-bounds read in the Opus packet parsing code responsible for padding.
This is due to an integer overflow if the signaled padding exceeds 2^31-1 bytes
(the actual packet may be smaller). The code can be fixed by applying the following
changes at line 596 of src/opus_decoder.c:
</t>
<figure>
<artwork><![CDATA[
       /* Padding flag is bit 6 */
       if (ch&0x40)
       {
-         int padding=0;
          int p;
          do {
             if (len<=0)
                return OPUS_INVALID_PACKET;
             p = *data++;
             len--;
-            padding += p==255 ? 254: p;
+            len -= p==255 ? 254: p;
          } while (p==255);
-         len -= padding;
       }
]]></artwork>
</figure>
<t>This packet parsing issue is limited to reading memory up
to about 60 kB beyond the compressed buffer. This can only be triggered
by a compressed packet more than about 16 MB long, so it's not a problem
for RTP. In theory, it <spanx style="emph">could</spanx> crash a file
decoder (e.g. Opus in Ogg) if the memory just after the incoming packet
is out-of-range, but our attempts to trigger such a crash in a production
application built using an affected version of the Opus decoder failed.</t>
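<t>As an illustration only, the fixed loop can be sketched as a stand-alone
C function. The names sketch_skip_padding and SKETCH_INVALID_PACKET are
hypothetical and are not part of the reference implementation, and the final
check for a negative length is an assumption of the sketch rather than a copy
of the reference code. The sketch shows that subtracting each padding byte from
len immediately keeps len close to zero before the loop gives up, so no
separate running total can grow large enough to overflow.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the decoder's error code. */
#define SKETCH_INVALID_PACKET (-4)

/* Reads the padding length bytes at the start of `data`.
 * Returns the number of bytes left once the length bytes and the
 * padding itself are removed, or SKETCH_INVALID_PACKET.
 * `*consumed` receives the number of length bytes that were read. */
static int sketch_skip_padding(const unsigned char *data, int len,
                               int *consumed)
{
    int p;
    int used = 0;

    assert(data != NULL && consumed != NULL);
    do {
        if (len <= 0)
            return SKETCH_INVALID_PACKET;
        p = data[used++];
        len--;
        /* Shrink len right away instead of summing into a second
         * variable: len can drop at most 254 below zero before the
         * check at the top of the loop rejects the packet. */
        len -= (p == 255) ? 254 : p;
    } while (p == 255);

    /* Assumption of this sketch: reject padding longer than the data. */
    if (len < 0)
        return SKETCH_INVALID_PACKET;

    *consumed = used;
    return len;
}
```
]]></artwork>
</figure>
<t>For example, an 8-byte buffer whose first byte is 3 leaves 4 bytes of
payload after one length byte, while a buffer consisting only of bytes with
value 255 is rejected on the second iteration.</t>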
</section>

<section anchor="resampler" title="Resampler buffer">
<t>The SILK resampler had the following issues:
<list style="numbers">
<t>The calls to memcpy() were using sizeof(opus_int32), but the type of the
local buffer was opus_int16.</t>
<t>Because the size was wrong, this potentially allowed the source
and destination regions of the memcpy() to overlap.
We <spanx style="emph">believe</spanx> that nSamplesIn is at least fs_in_khZ,
which is at least 8.
Since RESAMPLER_ORDER_FIR_12 is only 8, that should not be a problem once
the type size is fixed.</t>
<t>The size of the buffer used RESAMPLER_MAX_BATCH_SIZE_IN, but the
data stored in it was actually _twice_ the input batch size
(nSamplesIn&lt;&lt;1).</t>
</list></t>
<t>
The fact that the code never produced any error in testing (including when run under the
Valgrind memory debugger) suggests that in practice
the batch sizes are reasonable enough that none of the issues above
was ever a problem. However, proving that is non-obvious.
</t>
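<t>The buffer sizing and the overlap concern can be illustrated with a small
stand-alone sketch. The constants SKETCH_ORDER and SKETCH_MAX_BATCH_IN and the
function sketch_carry_tail are hypothetical, with sizes chosen only to keep the
example short. The sketch sizes the copy by the element type of the buffer and
uses memmove(), which stays well defined even when the source and destination
regions overlap.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, not the values of the reference code. */
enum { SKETCH_ORDER = 8, SKETCH_MAX_BATCH_IN = 16 };

/* The buffer holds SKETCH_ORDER history samples followed by the
 * 2x-upsampled batch, hence twice the input batch size. */
enum { SKETCH_BUF_LEN = 2 * SKETCH_MAX_BATCH_IN + SKETCH_ORDER };

/* Moves the last SKETCH_ORDER samples produced for this batch to
 * the start of the buffer so the next batch can use them. */
static void sketch_carry_tail(int16_t *buf, int n_samples_in)
{
    assert(buf != NULL);
    assert(n_samples_in >= 1 && n_samples_in <= SKETCH_MAX_BATCH_IN);
    /* sizeof(buf[0]) follows the element type, so the byte count
     * cannot drift from the buffer's declaration. memmove() is
     * used because the regions overlap whenever 2*n_samples_in is
     * smaller than SKETCH_ORDER. */
    memmove(buf, &buf[n_samples_in << 1],
            SKETCH_ORDER * sizeof(buf[0]));
}
```
]]></artwork>
</figure>
<t>With n_samples_in set to 2, the source region starts at index 4 and
overlaps the destination; memmove() still copies the eight samples intact,
which memcpy() does not guarantee.</t>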
<t>The code can be fixed by applying the following changes to line 70 of silk/resampler_private_IIR_FIR.c:
</t>
<figure>
<artwork><![CDATA[
 )
 {
     silk_resampler_state_struct *S = \
       (silk_resampler_state_struct *)SS;
     opus_int32 nSamplesIn;
     opus_int32 max_index_Q16, index_increment_Q16;
-    opus_int16 buf[ RESAMPLER_MAX_BATCH_SIZE_IN + \
       RESAMPLER_ORDER_FIR_12 ];
+    opus_int16 buf[ 2*RESAMPLER_MAX_BATCH_SIZE_IN + \
       RESAMPLER_ORDER_FIR_12 ];

     /* Copy buffered samples to start of buffer */
-    silk_memcpy( buf, S->sFIR, RESAMPLER_ORDER_FIR_12 \
       * sizeof( opus_int32 ) );
+    silk_memcpy( buf, S->sFIR, RESAMPLER_ORDER_FIR_12 \
       * sizeof( opus_int16 ) );

     /* Iterate over blocks of frameSizeIn input samples */
     index_increment_Q16 = S->invRatio_Q16;
     while( 1 ) {
         nSamplesIn = silk_min( inLen, S->batchSize );

         /* Upsample 2x */
         silk_resampler_private_up2_HQ( S->sIIR, &buf[ \
           RESAMPLER_ORDER_FIR_12 ], in, nSamplesIn );

         max_index_Q16 = silk_LSHIFT32( nSamplesIn, 16 + 1 \
           ); /* + 1 because 2x upsampling */
         out = silk_resampler_private_IIR_FIR_INTERPOL( out, \
           buf, max_index_Q16, index_increment_Q16 );
         in += nSamplesIn;
         inLen -= nSamplesIn;

         if( inLen > 0 ) {
             /* More iterations to do; copy last part of \
               filtered signal to beginning of buffer */
-            silk_memcpy( buf, &buf[ nSamplesIn << 1 ], \
               RESAMPLER_ORDER_FIR_12 * sizeof( opus_int32 ) );
+            silk_memmove( buf, &buf[ nSamplesIn << 1 ], \
               RESAMPLER_ORDER_FIR_12 * sizeof( opus_int16 ) );
         } else {
             break;
         }
     }

     /* Copy last part of filtered signal to the state for \
       the next call */
-    silk_memcpy( S->sFIR, &buf[ nSamplesIn << 1 ], \
       RESAMPLER_ORDER_FIR_12 * sizeof( opus_int32 ) );
+    silk_memcpy( S->sFIR, &buf[ nSamplesIn << 1 ], \
       RESAMPLER_ORDER_FIR_12 * sizeof( opus_int16 ) );
 }
]]></artwork>
</figure>
<t>
Note: due to RFC formatting conventions, lines exceeding the column width
in the patch above are split using a backslash character. The backslashes
at the end of a line and the white space at the beginning
of the following line are not part of the patch. A properly formatted patch
including the three changes above is available at
<eref target="http://jmvalin.ca/misc_stuff/opus_update.patch"/>.
</t>
</section>

<section title="Downmix to Mono" anchor="stereo">
<t>The last issue is not strictly a bug, but it is an issue that has been reported
when downmixing an Opus decoded stream to mono, whether this is done inside the decoder
or as a post-processing step on the stereo decoder output. Opus intensity stereo allows
optionally coding the two channels 180 degrees out of phase on a per-band basis.
This provides better stereo quality than forcing the two channels to be in phase,
but when the output is downmixed to mono, the energy in the affected bands is cancelled,
sometimes resulting in audible artefacts.
</t>
<t>As a work-around for this issue, the decoder MAY choose not to apply the 180-degree
phase shift when the output is meant to be downmixed (inside or
outside of the decoder).
</t>
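<t>The cancellation can be seen in a deliberately simplified time-domain
sketch. The function sketch_downmix is hypothetical, and averaging the two
channels is only one possible downmix; the reference decoder applies the phase
inversion per band rather than on whole signals. When the right channel is the
exact negative of the left, the average is zero, whereas identical channels
pass through unchanged.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Averages the left and right channels into a mono signal. */
static void sketch_downmix(const float *left, const float *right,
                           float *mono, size_t n)
{
    size_t i;

    assert(left != NULL && right != NULL && mono != NULL);
    for (i = 0; i < n; i++) {
        /* If right[i] == -left[i] (180 degrees out of phase),
         * the two terms cancel and the output sample is zero. */
        mono[i] = 0.5f * (left[i] + right[i]);
    }
}
```
]]></artwork>
</figure>
<t>A band coded with the phase inversion behaves like the out-of-phase case
in this sketch, which is why skipping the inversion avoids the loss of energy
when the output is meant for a mono downmix.</t>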
</section>

<section title="Hybrid Folding" anchor="folding">
<t>When encoding in hybrid mode at low bitrate, we sometimes only have
enough bits to code a single CELT band (8 - 9.6 kHz). When that happens,
the second band (CELT band 18, from 9.6 to 12 kHz) cannot use folding
because it is wider than the amount already coded, and falls back to
LCG noise. Because it can also happen on transients (e.g. stops), it
can cause audible pre-echo.
</t>
<t>
To address the issue, we change the folding behaviour so that it is
never forced to fall back to LCG noise for lack of folding data. This
is achieved by simply repeating part of the first band in the folding
of the second band. This changes the code in celt/bands.c around line 237:
</t>
<figure>
<artwork><![CDATA[
          b = 0;
       }

-      if (resynth && M*eBands[i]-N >= M*eBands[start] && \
         (update_lowband || lowband_offset==0))
+      if (resynth && (M*eBands[i]-N >= M*eBands[start] || \
         i==start+1) && (update_lowband || lowband_offset==0))
             lowband_offset = i;

+      if (i == start+1)
+      {
+         int n1, n2;
+         int offset;
+         n1 = M*(eBands[start+1]-eBands[start]);
+         n2 = M*(eBands[start+2]-eBands[start+1]);
+         offset = M*eBands[start];
+         /* Duplicate enough of the first band folding data to \
           be able to fold the second band.
+            Copies no data for CELT-only mode. */
+         OPUS_COPY(&norm[offset+n1], &norm[offset+2*n1 - n2], n2-n1);
+         if (C==2)
+            OPUS_COPY(&norm2[offset+n1], &norm2[offset+2*n1 - n2], \
               n2-n1);
+      }
+
       tf_change = tf_res[i];
       if (i>=m->effEBands)
       {
]]></artwork>
</figure>

<t>
as well as line 260:
</t>

<figure>
<artwork><![CDATA[
          fold_start = lowband_offset;
          while(M*eBands[--fold_start] > effective_lowband);
          fold_end = lowband_offset-1;
-         while(M*eBands[++fold_end] < effective_lowband+N);
+         while(++fold_end < i && M*eBands[fold_end] < \
            effective_lowband+N);
          x_cm = y_cm = 0;
          fold_i = fold_start; do {
            x_cm |= collapse_masks[fold_i*C+0];

]]></artwork>
</figure>
<t>
The fix does not impact compatibility, because the improvement does
not depend on the encoder doing anything special. There is also no
reasonable way for an encoder to use the original behaviour to
improve quality over the proposed change.
</t>
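<t>The index arithmetic of the added copy can be checked with a small
stand-alone sketch. The table sketch_eBands, the struct sketch_copy, and the
function sketch_fold_copy are hypothetical; the table values are an assumption
chosen so that band 17 spans 8 to 9.6 kHz and band 18 spans 9.6 to 12 kHz at
200 Hz per unit, matching the frequencies quoted above. The sketch only
computes where the copy would read and write, without touching any signal
data.</t>
<figure>
<artwork><![CDATA[
```c
#include <assert.h>

/* Assumed band edges in units of 200 Hz (illustration only). */
static const short sketch_eBands[22] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16,
    20, 24, 28, 34, 40, 48, 60, 78, 100
};

/* Destination index, source index, and element count of the copy. */
typedef struct {
    int dst;
    int src;
    int count;
} sketch_copy;

/* Mirrors the n1/n2/offset arithmetic of the patch for the first
 * coded band `start` and frame size multiplier M. */
static sketch_copy sketch_fold_copy(int start, int M)
{
    sketch_copy c;
    int n1, n2, offset;

    assert(start >= 0 && start + 2 < 22 && M >= 1);
    n1 = M * (sketch_eBands[start + 1] - sketch_eBands[start]);
    n2 = M * (sketch_eBands[start + 2] - sketch_eBands[start + 1]);
    offset = M * sketch_eBands[start];

    c.dst = offset + n1;            /* just past the first band */
    c.src = offset + 2 * n1 - n2;   /* tail of the first band */
    c.count = n2 - n1;              /* zero when the widths match */
    return c;
}
```
]]></artwork>
</figure>
<t>Under these assumptions, start = 17 and M = 8 give n1 = 64 and n2 = 96, so
the last 32 coefficients of the first band are repeated directly after it,
making 96 coefficients available for folding the second band. With start = 0,
the first two bands have the same width and the count is zero, consistent with
the comment that no data is copied in CELT-only mode.</t>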
</section>

<section title="New Test Vectors">
<t>Changes in <xref target="stereo"/> and <xref target="folding"/> have
sufficient impact on the test vectors to make them fail. For this reason,
this document also updates the Opus test vectors. The new test vectors now
include two decoded outputs for the same bitstream. The outputs with
suffix 'm' do not apply the CELT 180-degree phase shift as allowed in
<xref target="stereo"/>, while the outputs with suffix 's' do. An
implementation is compliant as long as it passes either the 'm' or the
's' set of vectors.
</t>
<t>
In addition, any Opus implementation
that passes the original test vectors from <xref target="RFC6716">RFC 6716</xref>
is still compliant with the Opus specification. However, newer implementations
SHOULD be based on the new test vectors rather than the old ones.
</t>
<t>The new test vectors are located at
<eref target="https://jmvalin.ca/misc_stuff/opus_newvectors.tar.gz"/>. (EDITOR:
change link to ietf.org when ready).
</t>
</section>

<section anchor="IANA" title="IANA Considerations">
<t>This document makes no request of IANA.</t>

<t>Note to RFC Editor: this section may be removed on publication as an
RFC.</t>
</section>

<section anchor="Acknowledgements" title="Acknowledgements">
<t>We would like to thank Juri Aedla for reporting the issue with the parsing of
the Opus padding.</t>
</section>
</middle>

<back>
<references title="References">
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6716.xml"?>


</references>
</back>
</rfc>