Blame - doc/draft-ietf-payload-rtp-opus.xml - platform/external/libopus


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
<!ENTITY rfc3389 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3389.xml'>
<!ENTITY rfc3550 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml'>
<!ENTITY rfc3711 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3711.xml'>
<!ENTITY rfc3551 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3551.xml'>
<!ENTITY rfc4288 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4288.xml'>
<!ENTITY rfc4855 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4855.xml'>
<!ENTITY rfc4566 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4566.xml'>
<!ENTITY rfc3264 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml'>
<!ENTITY rfc2974 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2974.xml'>
<!ENTITY rfc2326 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2326.xml'>
<!ENTITY rfc3555 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3555.xml'>
<!ENTITY rfc5576 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5576.xml'>
<!ENTITY rfc6562 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6562.xml'>
<!ENTITY rfc6716 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6716.xml'>
<!ENTITY nbsp "&#160;">
]>

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc strict="yes" ?>
<?rfc toc="yes" ?>
<?rfc tocdepth="3" ?>
<?rfc tocappendix='no' ?>
<?rfc tocindent='yes' ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc compact="no" ?>
<?rfc subcompact="yes" ?>
<?rfc iprnotified="yes" ?>

<front>

RTP Payload Format for Opus Speech and Audio Codec

</title>

<email>jspittka@gmail.com</email>
</address>
</author>
<organization>vocTone</organization>
</postal>
<email>koenvos74@gmail.com</email>
</address>
</author>
<organization>Mozilla</organization>
<street>2 Harrison Street</street>
<city>San Francisco</city>
</postal>
<email>jmvalin@jmvalin.ca</email>
</address>
</author>

<t>

This document defines the Real-time Transport Protocol (RTP) payload
format for packetization of Opus-encoded
speech and audio data, which is necessary to integrate the codec in the
most compatible way. It also describes media type registrations
for the RTP payload format.

</t>

</abstract>

</front>

<t>

The Opus codec is a speech and audio codec developed within the
IETF Internet Wideband Audio Codec working group (codec). The codec
has a very low algorithmic delay and
is highly scalable in terms of audio bandwidth, bitrate, and
complexity. Further, it provides different modes to efficiently encode speech signals
as well as music signals, thus making it the codec of choice for
various applications using the Internet or similar networks.
</t>
<t>
This document defines the Real-time Transport Protocol (RTP)
<xref target="RFC3550"/> payload format for packetization
of Opus-encoded speech and audio data, which is necessary to
integrate the Opus codec in the
most compatible way. Further, media type registrations are described for
the RTP payload format. More information on the Opus
codec can be obtained from <xref target="RFC6716"/>.

</t>

</section>

<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref target="RFC2119"/>.</t>
<t>
<list style="hanging">
<t hangText="CBR:"> Constant bitrate</t>
<t hangText="CPU:"> Central Processing Unit</t>
<t hangText="DTX:"> Discontinuous transmission</t>
<t hangText="FEC:"> Forward error correction</t>
<t hangText="IP:"> Internet Protocol</t>
<t hangText="samples:"> Speech or audio samples (usually per channel)</t>
<t hangText="SDP:"> Session Description Protocol</t>
<t hangText="VBR:"> Variable bitrate</t>

</list>

</t>

<t>
Throughout this document, we refer to the following definitions:
</t>
<ttcol align='center'>Abbreviation</ttcol>
<ttcol align='center'>Bandwidth</ttcol>
<ttcol align='center'>Sampling</ttcol>

<c>Narrowband</c>

<c>Mediumband</c>

<c>Wideband</c>

<c>Super-wideband</c>

<c>Fullband</c>

Audio bandwidth naming

</postamble>

</texttable>

</section>

</section>

<t>

The Opus <xref target="RFC6716"/> speech and audio codec has been developed to encode speech
signals as well as audio signals. One of two modes, a voice mode
or an audio mode, can be chosen to allow the most efficient coding
depending on the type of input signal, the sampling frequency of the
input signal, and the specific application.
</t>
<t>
The voice mode allows efficient encoding of voice signals at lower
bitrates, while the audio mode is optimized for audio signals at medium and
higher bitrates.

</t>

<t>

The Opus speech and audio codec is highly scalable in terms of audio
bandwidth, bitrate, and complexity. Further, Opus allows
transmitting stereo signals.

</t>

<t>

Opus supports all bitrates from 6 kb/s to 510 kb/s.
The bitrate can be changed dynamically within that range.
All other parameters being
equal, a higher bitrate results in higher quality.

</t>

<t>

For a frame size of 20 ms, these
are the bitrate "sweet spots" for Opus in various configurations:
<t>8-12 kb/s for NB speech,</t>
<t>16-20 kb/s for WB speech,</t>
<t>28-40 kb/s for FB speech,</t>
<t>48-64 kb/s for FB mono music, and</t>
<t>64-128 kb/s for FB stereo music.</t>

</list>

</t>
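<t>The list above can be captured as a small lookup table. The following is a minimal sketch, not part of the payload format: the function name suggest_bitrate_bps and the choice of the range midpoint are illustrative assumptions, and only the ranges themselves come from the list above.</t>

```python
# Bitrate "sweet spots" for 20 ms frames, in kb/s, copied from the list above.
SWEET_SPOTS_KBPS = {
    ("NB", "speech"): (8, 12),
    ("WB", "speech"): (16, 20),
    ("FB", "speech"): (28, 40),
    ("FB", "mono music"): (48, 64),
    ("FB", "stereo music"): (64, 128),
}


def suggest_bitrate_bps(bandwidth, content):
    """Return the midpoint of the sweet-spot range in bits per second.

    Picking the midpoint is a hypothetical policy chosen for illustration;
    any value inside the range would be equally consistent with the list.
    """
    low, high = SWEET_SPOTS_KBPS[(bandwidth, content)]
    return (low + high) * 1000 // 2
```

<t>An application would typically treat such a value only as a starting point and then adapt it to the available network capacity.</t>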

</section>

<t>

For the same average bitrate, variable bitrate (VBR) can achieve higher quality
than constant bitrate (CBR). For the majority of voice transmission applications, VBR
is the best choice. One reason for choosing CBR is the potential
information leak that <spanx style='emph'>may</spanx> occur when encrypting the
compressed stream. See <xref target="RFC6562"/> for guidelines on when VBR is
appropriate for encrypted audio communications. If an existing
VBR stream needs to be converted to CBR for security reasons, the Opus padding
mechanism described in <xref target="RFC6716"/> is the RECOMMENDED way to achieve padding
because the RTP padding bit is unencrypted.</t>

<t>
The bitrate can be adjusted at any point in time. To avoid congestion,
the average bitrate SHOULD be adjusted to the available
network capacity. If no target bitrate is specified, the bitrates specified in
<xref target='bitrate_by_bandwidth'/> are RECOMMENDED.

</t>

</section>

<t>

The Opus codec may, as described in <xref target='variable-vs-constant-bitrate'/>,
be operated with an adaptive bitrate. In that case, the bitrate
will automatically be reduced for certain input signals, such as periods
of silence. During continuous transmission the bitrate will be
reduced when the input signal allows it, but the transmission
to the receiver itself will never be interrupted. Therefore, the
received signal will maintain the same high level of quality over the
full duration of a transmission while minimizing the average
bitrate over time.

</t>

<t>

In cases where the bitrate of Opus needs to be reduced even
further or in cases where only constant bitrate is available,
the Opus encoder may be set to use discontinuous
transmission (DTX), where parts of the encoded signal that
correspond to periods of silence in the input speech or audio signal
are not transmitted to the receiver. A receiver can distinguish
between DTX and packet loss by looking for gaps in the sequence
number, as described by Section 4.1
of <xref target="RFC3551"/>.

</t>

<t>

On the receiving side, the non-transmitted parts will be handled by a
frame loss concealment unit in the Opus decoder, which generates a
comfort noise signal to replace the non-transmitted parts of the
speech or audio signal. Use of <xref target="RFC3389"/> Comfort
Noise (CN) with Opus is discouraged.
The transmitter MUST drop whole frames only,
based on the size of the last transmitted frame,
to ensure successive RTP timestamps differ by a multiple of 120 and
to allow the receiver to use whole frames for concealment.

</t>
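<t>The distinction between DTX and packet loss can be sketched as follows. This is a minimal illustration under stated assumptions: packets are examined in arrival order, the caller tracks the duration of the previous packet in 48 kHz ticks, and the function name classify_gap and its return labels are hypothetical. It relies on the behaviour described in Section 4.1 of RFC 3551, where sequence numbers keep incrementing by one across a silence period while the timestamp jumps.</t>

```python
SEQ_MOD = 2 ** 16  # RTP sequence numbers are 16 bits wide
TS_MOD = 2 ** 32   # RTP timestamps are 32 bits wide


def classify_gap(prev_seq, prev_ts, prev_ts_incr, seq, ts):
    """Classify the packet (seq, ts) relative to the previous one.

    prev_ts_incr is the duration of the previous packet in 48 kHz ticks.
    Returns "duplicate", "loss", "dtx" or "continuous" (hypothetical labels).
    """
    seq_delta = (seq - prev_seq) % SEQ_MOD
    ts_delta = (ts - prev_ts) % TS_MOD
    if seq_delta == 0:
        return "duplicate"
    if seq_delta != 1:
        # A gap in the sequence number means packets went missing
        # (or arrived out of order, which this sketch does not handle).
        return "loss"
    if ts_delta > prev_ts_incr:
        # Consecutive sequence numbers with a timestamp jump: the sender
        # suppressed whole frames, so this is DTX rather than loss.
        return "dtx"
    return "continuous"
```

<t>In the DTX case the skipped interval, ts_delta minus prev_ts_incr, is expected to be a multiple of 120 ticks because the transmitter drops whole frames only.</t>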

<t>

The DTX mode of Opus will have a slightly lower speech or audio
quality than the continuous mode. Therefore, it is RECOMMENDED to
use Opus in the continuous mode unless constraints on network
capacity are severe. The DTX mode can be engaged for operation
with either adaptive or constant bitrate.

</t>

</section>

</section>

<t>

Complexity can be scaled to optimize for CPU resources in real-time, mostly as
a trade-off between audio quality and bitrate. Also, different modes of Opus have different complexity.

</t>

</section>

<t>

The voice mode of Opus allows for "in-band" forward error correction (FEC)
data to be embedded into the bit stream of Opus. This FEC scheme adds
redundant information about the previous packet (n-1) to the current
output packet n. For
each frame, the encoder decides whether to use FEC based on (1) an
externally-provided estimate of the channel's packet loss rate; (2) an
externally-provided estimate of the channel's capacity; (3) the
sensitivity of the audio or speech signal to packet loss; and (4) whether
the receiving decoder has indicated it can take advantage of "in-band"
FEC information. The decision to send "in-band" FEC information is
entirely controlled by the encoder and therefore no special precautions
for the payload have to be taken.

</t>

<t>

On the receiving side, the decoder can take advantage of this
additional information when, in case of a packet loss, the next packet
is available. In order to use the FEC data, the jitter buffer needs
to provide access to payloads with the FEC data. The decoder API function
has a flag to indicate that a FEC frame rather than a regular frame should
be decoded. If no FEC data is available for the current frame, the decoder
considers the frame lost and invokes the frame loss concealment.

</t>
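<t>The receiver-side decision described above can be sketched as follows. This is an illustrative model only: the jitter buffer is assumed to be a plain mapping from sequence number to payload, and the function name plan_decode and the labels "normal", "fec" and "plc" are hypothetical. In libopus the "fec" case corresponds to calling opus_decode with the decode_fec argument set on the payload of packet n+1.</t>

```python
SEQ_MOD = 2 ** 16  # RTP sequence numbers wrap at 16 bits


def plan_decode(jitter_buffer, seq):
    """Decide how to produce audio for the packet with sequence number seq.

    jitter_buffer maps sequence numbers to payload bytes (an assumption of
    this sketch). Returns a (mode, payload) pair.
    """
    if seq in jitter_buffer:
        # The packet arrived: decode it as a regular frame.
        return ("normal", jitter_buffer[seq])
    next_seq = (seq + 1) % SEQ_MOD
    if next_seq in jitter_buffer:
        # Packet n is missing but packet n+1 is available: ask the decoder
        # to decode the in-band FEC data carried in packet n+1. If that
        # packet carries no FEC data, the decoder conceals the loss itself.
        return ("fec", jitter_buffer[next_seq])
    # Nothing usable: fall back to frame loss concealment.
    return ("plc", None)
```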

<t>

If the FEC scheme is not implemented on the receiving side, FEC
SHOULD NOT be used, as it leads to an inefficient usage of network
resources. Decoder support for FEC SHOULD be indicated at the time a
session is set up.

</t>

</section>

<t>

Opus allows for transmission of stereo audio signals. This operation
is signaled in-band in the Opus payload and no special arrangement
is required in the payload format. Any implementation of the Opus
decoder MUST be capable of receiving stereo signals, although it MAY
decode those signals as mono.
</t>
<t>
If a decoder cannot take advantage of the benefits of a stereo signal,
this SHOULD be indicated at the time a session is set up. In that case
the sending side SHOULD NOT send stereo signals as it leads to an
inefficient usage of the network.

</t>

</section>

</section>

<t>The payload format for Opus consists of the RTP header and Opus payload
data.</t>
<t>The format of the RTP header is specified in <xref target="RFC3550"/>. The Opus
payload format uses the fields of the RTP header consistent with this
specification.</t>
<t>The payload length of Opus is an integer number of octets and
therefore no padding is required. The payload MAY be padded by an
integer number of octets according to <xref target="RFC3550"/>.</t>

<t>The timestamp, sequence number, and marker bit (M) of the RTP header
are used in accordance with Section 4.1
of <xref target="RFC3551"/>.</t>
<t>The RTP payload type for Opus has not been assigned statically and is
expected to be assigned dynamically.</t>
<t>The receiving side MUST be prepared to receive duplicates of RTP
packets. Only one of those payloads MUST be provided to the Opus decoder
for decoding and others MUST be discarded.</t>
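<t>One way to discard duplicates before they reach the decoder is to remember recently seen sequence numbers. The sketch below is an assumption-laden illustration: the class name DuplicateFilter and the window size are hypothetical, and a real receiver would keep one such filter per synchronization source.</t>

```python
class DuplicateFilter:
    """Remember recently seen RTP sequence numbers (hypothetical helper)."""

    def __init__(self, window=512):
        self.window = window  # how many sequence numbers to remember
        self.seen = []        # recently seen numbers, oldest first

    def accept(self, seq):
        """Return True the first time seq is seen within the window."""
        if seq in self.seen:
            return False  # duplicate: the caller discards this payload
        self.seen.append(seq)
        if len(self.seen) > self.window:
            self.seen.pop(0)  # forget the oldest entry
        return True
```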

<t>Opus supports 5 different audio bandwidths, which may be adjusted during
the duration of a call. The RTP timestamp clock frequency is defined as
the highest supported sampling frequency of Opus, i.e. 48000 Hz, for all
modes and sampling rates of Opus. The unit
for the timestamp is samples per single (mono) channel. The RTP timestamp corresponds to the
sample time of the first encoded sample in the encoded frame. For sampling
rates lower than 48000 Hz the number of samples has to be multiplied by
a multiplier according to <xref target="fs-upsample-factors"/> to determine
the RTP timestamp.</t>

<ttcol align='center'>Multiplier</ttcol>

</texttable>
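<t>The conversion from a sample count at the codec sampling rate to RTP timestamp units can be sketched as follows. The multiplier is derived here as the ratio between the 48000 Hz RTP clock and the sampling rate, which is an assumption of this sketch rather than a copy of the table; the function names are hypothetical, and the list of sampling rates is taken from the Opus codec specification.</t>

```python
RTP_CLOCK_HZ = 48000
# Sampling rates supported by Opus, in Hz.
OPUS_SAMPLE_RATES_HZ = (8000, 12000, 16000, 24000, 48000)


def timestamp_multiplier(fs_hz):
    """Return the factor that scales samples at fs_hz to 48 kHz ticks."""
    if fs_hz not in OPUS_SAMPLE_RATES_HZ:
        raise ValueError("unsupported Opus sampling rate: %r" % (fs_hz,))
    return RTP_CLOCK_HZ // fs_hz


def samples_to_rtp_ticks(num_samples, fs_hz):
    """Convert a per-channel sample count at fs_hz to RTP timestamp units."""
    return num_samples * timestamp_multiplier(fs_hz)
```

<t>For example, a 20 ms frame holds 160 samples per channel at 8000 Hz, which corresponds to 960 RTP timestamp units, the same increment as a 20 ms frame at 48000 Hz.</t>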

</section>

<t>

The Opus encoder can be set to output encoded frames representing 2.5, 5, 10, 20,
40, or 60 ms of speech or audio data. Further, an arbitrary number of frames can be
combined into a packet. The maximum packet length is limited to the amount of encoded
data representing 120 ms of speech or audio data. The packetization of encoded data
is done entirely by the Opus encoder, and therefore only one packet output from the Opus
encoder MUST be used as a payload.

</t>
<t><xref target='payload-structure'/> shows the structure combined with the RTP header.</t>
<figure anchor="payload-structure"
title="Payload Structure with RTP header">
<artwork>
<![CDATA[
+----------+--------------+
|RTP Header| Opus Payload |
+----------+--------------+

]]>

</artwork>

</figure>

<t>

<xref target='opus-packetization'/> shows supported frame sizes in
milliseconds of encoded speech or audio data for speech and audio mode
(Mode) and sampling rates (fs) of Opus and how the timestamp needs to
be incremented for packetization (ts incr). If the Opus encoder
outputs multiple encoded frames into a single packet, the timestamp
increments have to be added up according to the combined frames.

</t>

<texttable anchor='opus-packetization' title="Supported Opus frame sizes and timestamp increments">

<c>voice</c>

<c></c>

<c></c>

<c>audio</c>

<c></c>

<c></c>


</texttable>
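<t>The timestamp increment for a packet that combines several frames can be sketched as follows. This is an illustration under stated assumptions: all frames in a packet are taken to have the same duration, the 48 kHz RTP clock gives 48 ticks per millisecond, and the function name packet_ts_increment is hypothetical.</t>

```python
from fractions import Fraction

# Frame durations in milliseconds that the Opus encoder can output.
FRAME_SIZES_MS = (Fraction(5, 2), Fraction(5), Fraction(10),
                  Fraction(20), Fraction(40), Fraction(60))
MAX_PACKET_MS = 120  # a packet may carry at most 120 ms of audio
TICKS_PER_MS = 48    # 48000 Hz RTP clock


def packet_ts_increment(frame_ms, frame_count):
    """Return the RTP timestamp increment for one packet.

    frame_ms is the duration of each frame and frame_count the number of
    frames combined into the packet. Raises ValueError for invalid input.
    """
    frame = Fraction(frame_ms)
    if frame not in FRAME_SIZES_MS:
        raise ValueError("unsupported frame size: %r ms" % (frame_ms,))
    total_ms = frame * frame_count
    if frame_count > 0 and MAX_PACKET_MS >= total_ms:
        return int(total_ms * TICKS_PER_MS)
    raise ValueError("packet must carry between one frame and 120 ms")
```

<t>Using exact fractions avoids rounding problems with the 2.5 ms frame size, whose increment is 120 timestamp units.</t>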

</section>

</section>

<t>The adaptive nature of the Opus codec allows for efficient
congestion control.</t>
<t>The target bitrate of Opus can be adjusted at any point in time,
thus allowing for efficient congestion control. Furthermore, the amount
of speech or audio data encoded in a
single packet can be used for congestion control, since the packet transmission
rate is inversely proportional to the packet duration. A lower packet
transmission rate reduces the amount of header overhead, but at the same
time increases latency and error sensitivity, and should be used with care.</t>
<t>It is RECOMMENDED that congestion control is applied during the
transmission of Opus encoded data.</t>
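<t>The relation between packet duration and header overhead can be illustrated with a short calculation. The header size used here is an assumption of this sketch (IPv4, UDP and RTP headers without options, extensions or CSRC entries), and the function names are hypothetical.</t>

```python
# Assumed per-packet header size in bytes: IPv4 (20) + UDP (8) + RTP (12).
HEADER_BYTES = 20 + 8 + 12


def packets_per_second(packet_ms):
    """Packet transmission rate for a given packet duration in ms."""
    return 1000 / packet_ms


def header_overhead_bps(packet_ms, header_bytes=HEADER_BYTES):
    """Bitrate spent on headers alone, in bits per second."""
    return header_bytes * 8 * packets_per_second(packet_ms)
```

<t>Under these assumptions, 20 ms packets spend 16000 b/s on headers, while 60 ms packets spend about a third of that at the cost of 40 ms of additional packetization delay.</t>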

</section>

<t>One media subtype (audio/opus) has been defined and registered as
described in the following section.</t>
<t>Media type registration is done according to <xref
target="RFC4288"/> and <xref target="RFC4855"/>.<vspace
blankLines='1'/></t>
<t>Type name: audio<vspace blankLines='1'/></t>
<t>Subtype name: opus<vspace blankLines='1'/></t>
<t>Required parameters:</t>
<t><list style="hanging">
<t hangText="rate:"> the RTP timestamp is incremented with a
48000 Hz clock rate for all modes of Opus and all sampling
rates. For audio sampling rates other than 48000 Hz the rate
has to be adjusted to 48000 Hz according to <xref target="fs-upsample-factors"/>.

</t>

</list></t>

<t>Optional parameters:</t>

<t><list style="hanging">
<t hangText="maxplaybackrate:">
a hint about the maximum output sampling rate, in Hz, that the receiver is
capable of rendering.
The decoder MUST be capable of decoding
any audio bandwidth but due to hardware limitations only signals
up to the specified sampling rate can be played back. Sending signals
with higher audio bandwidth results in higher than necessary network
usage and encoding complexity, so an encoder SHOULD NOT encode
frequencies above the audio bandwidth specified by maxplaybackrate.
This parameter can take any value between 8000 and 48000, although
commonly the value will match one of the Opus bandwidths
(<xref target="bandwidth_definitions"/>).
By default, the receiver is assumed to have no limitations, i.e. 48000.

</t>

<t hangText="sprop-maxcapturerate:">
a hint about the maximum input sampling rate that the sender is likely to produce.
This is not a guarantee that the sender will never send any higher bandwidth
(e.g. it could send a pre-recorded prompt that uses a higher bandwidth), but it
indicates to the receiver that frequencies above this maximum can safely be discarded.
This parameter is useful to avoid wasting receiver resources by operating the audio
processing pipeline (e.g. echo cancellation) at a higher rate than necessary.
This parameter can take any value between 8000 and 48000, although
commonly the value will match one of the Opus bandwidths
(<xref target="bandwidth_definitions"/>).
By default, the sender is assumed to have no limitations, i.e. 48000.

</t>

<t hangText="maxptime:"> the decoder's maximum length of time in
milliseconds rounded up to the next full integer value represented
by the media in a packet that can be
encapsulated in a received packet according to Section 6 of
<xref target="RFC4566"/>. Possible values are 3, 5, 10, 20, 40,
and 60 or an arbitrary multiple of Opus frame sizes rounded up to
the next full integer value up to a maximum value of 120 as
defined in <xref target='opus-rtp-payload-format'/>. If no value is
specified, 120 is assumed as default. This value is a recommendation
by the decoding side to ensure the best
performance for the decoder. The decoder MUST be
capable of accepting any allowed packet sizes to
ensure maximum compatibility.
</t>

<t hangText="ptime:"> the decoder's recommended length of time in
milliseconds rounded up to the next full integer value represented
by the media in a packet according to
Section 6 of <xref target="RFC4566"/>. Possible values are
3, 5, 10, 20, 40, or 60 or an arbitrary multiple of Opus frame sizes
rounded up to the next full integer value up to a maximum
value of 120 as defined in <xref
target='opus-rtp-payload-format'/>. If no value is
specified, 20 is assumed as default. If ptime is greater than
maxptime, ptime MUST be ignored. This parameter MAY be changed
during a session. This value is a recommendation by the decoding
side to ensure the best
performance for the decoder. The decoder MUST be
capable of accepting any allowed packet sizes to
ensure maximum compatibility.
</t>

<t hangText="minptime:"> the decoder's minimum length of time in
milliseconds rounded up to the next full integer value represented
by the media in a packet that SHOULD
be encapsulated in a received packet according to Section 6 of <xref
target="RFC4566"/>. Possible values are 3, 5, 10, 20, 40, and 60
or an arbitrary multiple of Opus frame sizes rounded up to the next
full integer value up to a maximum value of 120
as defined in <xref target='opus-rtp-payload-format'/>. If no value is
specified, 3 is assumed as default. This value is a recommendation
by the decoding side to ensure the best
performance for the decoder. The decoder MUST be
capable of accepting any allowed packet sizes to
ensure maximum compatibility.
</t>

<t hangText="maxaveragebitrate:"> specifies the maximum average
receive bitrate of a session in bits per second (b/s). The actual
value of the bitrate may vary as it is dependent on the
characteristics of the media in a packet. Note that the maximum
average bitrate MAY be modified dynamically during a session. Any
positive integer is allowed but values outside the range between
6000 and 510000 SHOULD be ignored. If no value is specified, the
maximum value specified in <xref target='bitrate_by_bandwidth'/>
for the corresponding mode of Opus and corresponding maxplaybackrate
will be the default.<vspace blankLines='1'/></t>

<t hangText="stereo:">
specifies whether the decoder prefers receiving stereo or mono signals.
Possible values are 1 and 0 where 1 specifies that stereo signals are preferred
and 0 specifies that only mono signals are preferred.
Independent of the stereo parameter every receiver MUST be able to receive and
decode stereo signals but sending stereo signals to a receiver that signaled a
preference for mono signals may result in higher than necessary network
utilisation and encoding complexity. If no value is specified, mono
is assumed (stereo=0).<vspace blankLines='1'/>
</t>

<t hangText="sprop-stereo:">
specifies whether the sender is likely to produce stereo audio.
Possible values are 1 and 0 where 1 specifies that stereo signals are likely to
be sent, and 0 specifies that the sender will likely only send mono.
This is not a guarantee that the sender will never send stereo audio
(e.g. it could send a pre-recorded prompt that uses stereo), but it
indicates to the receiver that the received signal can be safely downmixed to mono.
This parameter is useful to avoid wasting receiver resources by operating the audio
processing pipeline (e.g. echo cancellation) in stereo when not necessary.
If no value is specified, mono
is assumed (sprop-stereo=0).<vspace blankLines='1'/>
</t>

<t hangText="cbr:">
specifies if the decoder prefers the use of a constant bitrate versus
variable bitrate. Possible values are 1 and 0 where 1 specifies constant
bitrate and 0 specifies variable bitrate. If no value is specified, cbr
is assumed to be 0. Note that the maximum average bitrate may still be
changed, e.g. to adapt to changing network conditions.<vspace blankLines='1'/>
</t>

<t hangText="useinbandfec:"> specifies that the decoder has the capability to
take advantage of the Opus in-band FEC. Possible values are 1 and 0. It is RECOMMENDED to provide
0 in case FEC cannot be utilized on the receiving side. If no
value is specified, useinbandfec is assumed to be 0.
This parameter is only a preference and the receiver MUST be able to process
packets that include FEC information, even if it means the FEC part is discarded.
</t>

<t hangText="usedtx:"> specifies if the decoder prefers the use of
DTX. Possible values are 1 and 0. If no value is specified, usedtx
is assumed to be 0.<vspace blankLines='1'/></t>

</list></t>
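<t>Some of the rules attached to these parameters can be sketched in code. The helper names below are hypothetical and the sketch covers only three rules stated above: packet durations are rounded up to the next integer millisecond, a ptime greater than maxptime is ignored, and maxaveragebitrate values outside the range 6000 to 510000 are ignored. Returning None to mean "ignored" is a choice made for this illustration.</t>

```python
import math


def ptime_for_duration(duration_ms):
    """Round a packet duration in ms up to the next full integer."""
    return math.ceil(duration_ms)


def usable_ptime(ptime, maxptime=120):
    """Return ptime, or None when it exceeds maxptime and is ignored."""
    if ptime > maxptime:
        return None
    return ptime


def usable_maxaveragebitrate(value):
    """Return value in b/s, or None when it is outside 6000..510000."""
    if value is not None and 510000 >= value >= 6000:
        return value
    return None
```

<t>For instance, a 2.5 ms frame is signaled as 3, which is why 3 appears among the possible values of the ptime parameters.</t>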

<t>Encoding considerations:<vspace blankLines='1'/></t>
<t>Opus media type is framed and consists of binary data according
to Section 4.8 in <xref target="RFC4288"/>.</t>
</list></t>
<t>Security considerations: </t>
<t>See <xref target='security-considerations'/> of this document.</t>
</list></t>
<t>Interoperability considerations: none<vspace blankLines='1'/></t>
<t>Published specification: none<vspace blankLines='1'/></t>
<t>Applications that use this media type: </t>
<t>Any application that requires the transport of
speech or audio data may use this media type. Examples include,
but are not limited to, audio and video conferencing, Voice over IP, and
media streaming.</t>

</list></t>

<t>Person &amp; email address to contact for further information:</t>
<t>SILK Support silksupport@skype.net</t>
<t>Jean-Marc Valin jmvalin@jmvalin.ca</t>
</list></t>
<t>Intended usage: COMMON<vspace blankLines='1'/></t>
<t>Restrictions on usage:<vspace blankLines='1'/></t>
<t>For transfer over RTP, the RTP payload format (<xref
target='opus-rtp-payload-format'/> of this document) SHALL be
used.</t>

</list></t>

<t>Author:</t>

<t>Julian Spittka jspittka@gmail.com<vspace blankLines='1'/></t>
<t>Koen Vos koenvos74@gmail.com<vspace blankLines='1'/></t>
<t>Jean-Marc Valin jmvalin@jmvalin.ca<vspace blankLines='1'/></t>
</list></t>

<t> Change controller: TBD</t>

</section>

<t>The information described in the media type specification has a
specific mapping to fields in the Session Description Protocol (SDP)
<xref target="RFC4566"/>, which is commonly used to describe RTP
sessions. When SDP is used to specify sessions employing the Opus codec,
the mapping is as follows:</t>

<t>

<t>The media type ("audio") goes in SDP "m=" as the media name.</t>
<t>The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
name. The RTP clock rate in "a=rtpmap" MUST be 48000 and the number of
channels MUST be 2.</t>

<t>The OPTIONAL media type parameters "ptime" and "maxptime" are
mapped to "a=ptime" and "a=maxptime" attributes, respectively, in the
SDP.</t>

<t>The OPTIONAL media type parameters "maxaveragebitrate",
"maxplaybackrate", "minptime", "stereo", "cbr", "useinbandfec", and
"usedtx", when present, MUST be included in the "a=fmtp" attribute
in the SDP, expressed as a media type string in the form of a
semicolon-separated list of parameter=value pairs (e.g.,
maxaveragebitrate=20000). They MUST NOT be specified in an
SSRC-specific "fmtp" source-level attribute (as defined in
Section 6.3 of <xref target="RFC5576"/>).</t>

<t>The OPTIONAL media type parameters "sprop-maxcapturerate"
and "sprop-stereo" MAY be mapped to the "a=fmtp" SDP attribute by
copying them directly from the media type parameter string as part
of the semicolon-separated list of parameter=value pairs (e.g.,
sprop-stereo=1). These same OPTIONAL media type parameters MAY also
be specified using an SSRC-specific "fmtp" source-level attribute
as described in Section 6.3 of <xref target="RFC5576"/>.
They MAY be specified in both places, in which case the parameter
in the source-level attribute overrides the one found on the
"a=fmtp" line. The value of any parameter that is not specified in
a source-level attribute MUST be taken from the "a=fmtp"
line, if it is present there.</t>


</list>

</t>

<t>Below are some examples of SDP session descriptions for Opus:</t>

<t>Example 1: Standard mono session with 48000 Hz clock rate</t>

<![CDATA[
m=audio 54312 RTP/AVP 101
a=rtpmap:101 opus/48000/2
]]>

</artwork>

</figure>

<t>Example 2: 16000 Hz maximum playback and capture rates (the RTP
clock rate in "a=rtpmap" is still 48000), maximum packet duration of 40 ms,
recommended packet duration of 40 ms, maximum average bitrate of 20000 bps,
prefers to receive stereo but only plans to send mono, FEC is allowed,
DTX is not allowed</t>

<![CDATA[
m=audio 54312 RTP/AVP 101
a=rtpmap:101 opus/48000/2
a=fmtp:101 maxplaybackrate=16000; sprop-maxcapturerate=16000;
maxaveragebitrate=20000; stereo=1; useinbandfec=1; usedtx=0
a=ptime:40
a=maxptime:40
]]>

</artwork>

</figure>

<t>Example 3: Two-way full-band stereo preferred</t>

<![CDATA[
m=audio 54312 RTP/AVP 101
a=rtpmap:101 opus/48000/2
a=fmtp:101 stereo=1; sprop-stereo=1
]]>

</artwork>

</figure>
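
<t>As a further illustrative sketch (not one of the original examples),
the following shows "sprop-stereo" given both on the "a=fmtp" line and in
an SSRC-specific "fmtp" source-level attribute in the form described in
Section 6.3 of <xref target="RFC5576"/>. The SSRC value 12345 is an
arbitrary placeholder. Under the override rule given in the mapping above,
the source with SSRC 12345 would be described by sprop-stereo=1, while any
other source would take sprop-stereo=0 from the "a=fmtp" line.</t>

<figure>
<artwork>
<![CDATA[
m=audio 54312 RTP/AVP 101
a=rtpmap:101 opus/48000/2
a=fmtp:101 stereo=1; sprop-stereo=0
a=ssrc:12345 fmtp:101 sprop-stereo=1
]]>
</artwork>
</figure>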

<t>When using the offer-answer procedure described in <xref
target="RFC3264"/> to negotiate the use of Opus, the following
considerations apply:</t>

<t>Opus supports several clock rates. For signaling purposes, only
the highest, i.e., 48000, is used. The actual clock rate of the
corresponding media is signaled inside the payload and is not
subject to this payload format description. The decoder MUST be
capable of decoding every received clock rate. An example
is shown below:

<![CDATA[
m=audio 54312 RTP/AVP 100
a=rtpmap:100 opus/48000/2
]]>

</artwork>

</figure>

</t>

<t>The "ptime" and "maxptime" parameters are unidirectional
receive-only parameters and typically will not compromise
interoperability; however, depending on the values chosen for these
parameters, the performance of the application may suffer. <xref
target="RFC3264"/> defines the SDP offer-answer handling of the
"ptime" parameter. The "maxptime" parameter MUST be handled in the
same way.</t>

<t>
The "minptime" parameter is a unidirectional
receive-only parameter and typically will not compromise
interoperability; however, depending on the value chosen, the
performance of the application may suffer, so the parameter should be
set with care.
</t>

<t>
The "maxplaybackrate" parameter is a unidirectional receive-only
parameter that reflects limitations of the local receiver. The sender
on the other side SHOULD NOT send with an audio bandwidth higher than
"maxplaybackrate", as this would lead to inefficient use of network resources.
The "maxplaybackrate" parameter does not
affect interoperability. Also, this parameter SHOULD NOT be used
to adjust the audio bandwidth as a function of the bitrate, as this
is the responsibility of the Opus encoder implementation.
</t>

<t>The "maxaveragebitrate" parameter is a unidirectional receive-only
parameter that reflects limitations of the local receiver. The sender
on the other side MUST NOT send with an average bitrate higher than
"maxaveragebitrate", as it might overload the network and/or
receiver. The "maxaveragebitrate" parameter typically will not
compromise interoperability; however, depending on the value chosen,
the performance of the application may suffer, so the parameter should
be set with care.</t>

<t>The "sprop-maxcapturerate" and "sprop-stereo" parameters are
unidirectional sender-only parameters that reflect limitations of
the sender side.
They allow the receiver to set up a reduced-complexity audio
processing pipeline if the sender is not planning to use the full
range of Opus's capabilities.
Neither "sprop-maxcapturerate" nor "sprop-stereo" affects
interoperability, and the receiver MUST be capable of receiving any signal.
</t>

<t>
The "stereo" parameter is a unidirectional receive-only
parameter.
</t>

<t>
The "cbr" parameter is a unidirectional receive-only
parameter.
</t>

<t>The "useinbandfec" parameter is a unidirectional receive-only
parameter.</t>

<t>The "usedtx" parameter is a unidirectional receive-only
parameter.</t>

<t>Any unknown parameter in an offer MUST be ignored by the receiver
and MUST be removed from the answer.</t>

</list></t>
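
<t>The following non-normative sketch illustrates these considerations.
The parameter name "x-example" is hypothetical and stands for any
parameter the answerer does not recognize; the port numbers and parameter
values are arbitrary. The answerer ignores "x-example" and omits it from
the answer. Because the remaining parameters are unidirectional
receive-only parameters, each side states only what it prefers to receive,
so the values in the offer and the answer are assumed here to differ.</t>

<figure>
<artwork>
<![CDATA[
Offer:
m=audio 54312 RTP/AVP 101
a=rtpmap:101 opus/48000/2
a=fmtp:101 maxplaybackrate=16000; stereo=1; x-example=1

Answer:
m=audio 49170 RTP/AVP 101
a=rtpmap:101 opus/48000/2
a=fmtp:101 maxplaybackrate=48000; stereo=0
]]>
</artwork>
</figure>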

</section>

<t>For declarative use of SDP with Opus, such as in the Session
Announcement Protocol (SAP) <xref target="RFC2974"/> and RTSP
<xref target="RFC2326"/>, the following needs to be considered:</t>

<t>The values for "maxptime", "ptime", "minptime", "maxplaybackrate", and
"maxaveragebitrate" should be selected carefully to ensure
reasonable performance for the participants of a session.</t>

<t>
The values for "maxptime", "ptime", and "minptime" of the payload
format configuration are recommendations by the decoding side to ensure
the best performance for the decoder. The decoder MUST be
capable of accepting any allowed packet size to
ensure maximum compatibility.
</t>

<t>All other parameters of the payload format configuration are declarative
and a participant MUST use the configurations that are provided for
the session. More than one configuration may be provided if necessary
by declaring multiple RTP payload types; however, the number of types
should be kept small.</t>

</list></t>
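
<t>As a non-normative sketch of the last point, a declarative session
description could provide two configurations by declaring two RTP payload
types. The payload type numbers 101 and 102 and the parameter values below
are arbitrary choices made for illustration only.</t>

<figure>
<artwork>
<![CDATA[
m=audio 54312 RTP/AVP 101 102
a=rtpmap:101 opus/48000/2
a=fmtp:101 stereo=0; sprop-stereo=0
a=rtpmap:102 opus/48000/2
a=fmtp:102 stereo=1; sprop-stereo=1
]]>
</artwork>
</figure>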

</section>

</section>

</section>

<t>All RTP packets using the payload format defined in this specification
are subject to the general security considerations discussed in the RTP
specification <xref target="RFC3550"/> and in any applicable profile,
e.g., <xref target="RFC3711"/> or <xref target="RFC3551"/>.</t>

<t>This payload format transports Opus-encoded speech or audio data;
hence, security issues include confidentiality, integrity protection, and
authentication of the speech or audio itself. The Opus payload format does
not have any built-in security mechanisms. Any suitable external
mechanisms, such as SRTP <xref target="RFC3711"/>, MAY be used.</t>

<t>This payload format and the Opus encoding do not exhibit any
significant non-uniformity in the receiver-end computational load and thus
are unlikely to pose a denial-of-service threat due to the receipt of
pathological datagrams.</t>

</section>

</section>

</middle>

<back>

&rfc2119;

&rfc3389;
&rfc3550;
&rfc3711;
&rfc3551;
&rfc4288;
&rfc4855;
&rfc4566;
&rfc3264;
&rfc2974;
&rfc2326;
&rfc5576;
&rfc6562;
&rfc6716;

</references>
