<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
<!ENTITY rfc3550 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml'>
<!ENTITY rfc3711 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3711.xml'>
<!ENTITY rfc3551 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3551.xml'>
<!ENTITY rfc4288 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4288.xml'>
<!ENTITY rfc4855 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4855.xml'>
<!ENTITY rfc4566 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4566.xml'>
<!ENTITY rfc3264 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3264.xml'>
<!ENTITY rfc2974 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2974.xml'>
<!ENTITY rfc2326 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2326.xml'>
<!ENTITY rfc3555 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3555.xml'>
<!ENTITY rfc5576 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5576.xml'>
<!ENTITY rfc6562 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6562.xml'>
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc strict="yes" ?>
<?rfc toc="yes" ?>
<?rfc tocdepth="3" ?>
<?rfc tocappendix='no' ?>
<?rfc tocindent='yes' ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc compact="no" ?>
<?rfc subcompact="yes" ?>
<?rfc iprnotified="yes" ?>

<front>
RTP Payload Format for Opus Speech and Audio Codec
</title>
<organization>Skype Technologies S.A.</organization>
<street>3210 Porter Drive</street>
</postal>
<email>julian.spittka@skype.net</email>
</address>
</author>
<organization>Skype Technologies S.A.</organization>
<street>3210 Porter Drive</street>
</postal>
<email>koen.vos@skype.net</email>
</address>
</author>
<organization>Mozilla</organization>
<street>650 Castro Street</street>
<city>Mountain View</city>
</postal>
<email>jmvalin@jmvalin.ca</email>
</address>
</author>
<t>
This document defines the Real-time Transport Protocol (RTP) payload
format for packetization of Opus-encoded
speech and audio data, which is essential for integrating the codec in the
most compatible way. It also describes the media type registrations
for the RTP payload format.
</t>
</abstract>
</front>

<t>
The Opus codec is a speech and audio codec developed within the
IETF Internet Wideband Audio Codec working group [codec]. The codec
has a very low algorithmic delay and is
highly scalable in terms of audio bandwidth, bitrate, and
complexity. Further, it provides different modes to efficiently encode speech signals
as well as music signals, making it the codec of choice for
various applications using the Internet or similar networks.
</t>
<t>
This document defines the Real-time Transport Protocol (RTP)
<xref target="RFC3550"/> payload format for packetization
of Opus-encoded speech and audio data, which is essential for
integrating the Opus codec in the
most compatible way. Further, media type registrations are described for
the RTP payload format. More information on the Opus
codec can be obtained from the following IETF draft
[Opus].
</t>

</section>

<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref target="RFC2119"/>.</t>
<t>
<t hangText="CPU:"> Central Processing Unit</t>
<t hangText="IP:"> Internet Protocol</t>
<t hangText="PSTN:"> Public Switched Telephone Network</t>
<t hangText="samples:"> Speech or audio samples</t>
<t hangText="SDP:"> Session Description Protocol</t>
</list>
</t>

<t>
Throughout this document, we refer to the following definitions:
</t>
<ttcol align='center'>Abbreviation</ttcol>
<ttcol align='center'>Bandwidth</ttcol>
<ttcol align='center'>Sampling</ttcol>

<c>Narrowband</c>

<c>Mediumband</c>

<c>Wideband</c>

<c>Super-wideband</c>

<c>Fullband</c>

Audio bandwidth naming

</postamble>

</texttable>

</section>

</section>

<t>
The Opus [Opus] speech and audio codec has been developed to encode speech
signals as well as audio signals. Two different modes, a voice mode
or an audio mode, may be chosen to allow the most efficient coding
depending on the type of input signal, the sampling frequency of the
input signal, and the specific application.
</t>
<t>
The voice mode allows efficient encoding of voice signals at lower
bitrates, while the audio mode is optimized for audio signals at medium and
higher bitrates.
</t>
<t>
The Opus speech and audio codec is highly scalable in terms of audio
bandwidth, bitrate, and complexity. Further, Opus allows
transmitting stereo signals.
</t>

<t>
Opus supports all bitrates from 6 kb/s to 510 kb/s.
The bitrate can be changed dynamically within that range.
All other parameters being
equal, a higher bitrate results in higher quality.
</t>
<t>
For a frame size of 20 ms, these
are the bitrate "sweet spots" for Opus in various configurations:
<t>8-12 kb/s for NB speech,</t>
<t>16-20 kb/s for WB speech,</t>
<t>28-40 kb/s for FB speech,</t>
<t>48-64 kb/s for FB mono music, and</t>
<t>64-128 kb/s for FB stereo music.</t>
</list>
</t>

</section>

<t>
For the same average bitrate, variable bitrate (VBR) can achieve higher quality
than constant bitrate (CBR). For the majority of voice transmission applications, VBR
is the best choice. One reason for choosing CBR is the potential
information leak that <spanx style='emph'>may</spanx> occur when encrypting the
compressed stream. See <xref target="RFC6562"/> for guidelines on when VBR is
appropriate for encrypted audio communications. When an existing
VBR stream needs to be converted to CBR for security reasons, the Opus padding
mechanism described in [Opus] is the RECOMMENDED way to achieve padding
because the RTP padding bit is unencrypted.</t>
<t>
The bitrate can be adjusted at any point in time. To avoid congestion,
the average bitrate SHOULD be adjusted to the available
network capacity. If no target bitrate is specified, the average bitrate
may go up to the highest bitrate specified in
<xref target='bitrate_by_bandwidth'/>.
</t>

</section>

<t>
The Opus codec may, as described in <xref target='variable-vs-constant-bitrate'/>,
be operated with an adaptive bitrate. In that case, the bitrate
will automatically be reduced for certain input signals, such as periods
of silence. During continuous transmission the bitrate will be
reduced when the input signal allows it, but the transmission
to the receiver itself will never be interrupted. Therefore, the
received signal will maintain the same high level of quality over the
full duration of a transmission while minimizing the average
bitrate over time.
</t>
<t>
In cases where the bitrate of Opus needs to be reduced even
further, or in cases where only constant bitrate is available,
the Opus encoder may be set to use discontinuous
transmission (DTX), where parts of the encoded signal that
correspond to periods of silence in the input speech or audio signal
are not transmitted to the receiver.
</t>
<t>
On the receiving side, the non-transmitted parts will be handled by a
frame loss concealment unit in the Opus decoder, which generates a
comfort noise signal to replace the non-transmitted parts of the
speech or audio signal.
</t>
<t>
The DTX mode of Opus will have a slightly lower speech or audio
quality than the continuous mode. Therefore, it is RECOMMENDED to
use Opus in the continuous mode unless constraints on network
capacity are severe. The DTX mode can be engaged for operation
with either adaptive or constant bitrate.
</t>

</section>

</section>

<t>
Complexity can be scaled to optimize for CPU resources in real-time, mostly as
a trade-off between audio quality and bitrate. Also, different modes of Opus have different complexity.
</t>

</section>

<t>
The voice mode of Opus allows for "in-band" forward error correction (FEC)
data to be embedded into the bit stream of Opus. This FEC scheme adds
redundant information about the previous packet (n-1) to the current
output packet n. For
each frame, the encoder decides whether to use FEC based on (1) an
externally-provided estimate of the channel's packet loss rate; (2) an
externally-provided estimate of the channel's capacity; (3) the
sensitivity of the audio or speech signal to packet loss; and (4) whether
the receiving decoder has indicated it can take advantage of "in-band"
FEC information. The decision to send "in-band" FEC information is
entirely controlled by the encoder and therefore no special precautions
for the payload have to be taken.
</t>
<t>
On the receiving side, the decoder can take advantage of this
additional information when, in case of a packet loss, the next packet
is available. In order to use the FEC data, the jitter buffer needs
to provide access to payloads with the FEC data. The decoder API function
has a flag to indicate that a FEC frame rather than a regular frame should
be decoded. If no FEC data is available for the current frame, the decoder
will consider the frame lost and invoke the frame loss concealment.
</t>
<t>
If the FEC scheme is not implemented on the receiving side, FEC
SHOULD NOT be used, as it leads to an inefficient usage of network
resources. Decoder support for FEC SHOULD be indicated at the time a
session is set up.
</t>

</section>

<t>
Opus allows for transmission of stereo audio signals. This operation
is signaled in-band in the Opus payload and no special arrangement
is required in the payload format. Any implementation of the Opus
decoder MUST be capable of receiving stereo signals, although it MAY
decode those signals as mono.
</t>
<t>
If a decoder cannot take advantage of the benefits of a stereo signal,
this SHOULD be indicated at the time a session is set up. In that case
the sending side SHOULD NOT send stereo signals as it leads to an
inefficient usage of the network.
</t>

</section>

</section>

<t>The payload format for Opus consists of the RTP header and Opus payload
data.</t>
<t>The format of the RTP header is specified in <xref target="RFC3550"/>. The Opus
payload format uses the fields of the RTP header consistent with this
specification.</t>
<t>The payload length of Opus is an integer number of octets and
therefore no padding is required. The payload MAY be padded by an
integer number of octets according to <xref target="RFC3550"/>.</t>
<t>The marker bit (M) of the RTP header is used in accordance with
Section 4.1 of <xref target="RFC3551"/>.</t>
<t>The RTP payload type for Opus has not been assigned statically and is
expected to be assigned dynamically.</t>
<t>The receiving side MUST be prepared to receive duplicates of RTP
packets. Exactly one of those payloads MUST be provided to the Opus decoder
for decoding, and the others MUST be discarded.</t>
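<figure>
<preamble>The duplicate handling described above can be sketched as follows. This is a minimal illustration rather than a normative procedure: the function name drop_duplicates and the (sequence number, payload) tuple layout are assumptions made for the example, and a real receiver would also bound the memory of seen sequence numbers and handle 16-bit wraparound.</preamble>
<artwork><![CDATA[
```python
def drop_duplicates(packets):
    """Yield each RTP packet once, keyed by its sequence number.

    `packets` is an iterable of (sequence_number, payload) tuples;
    this layout is hypothetical and chosen only for illustration.
    """
    seen = set()
    for seq, payload in packets:
        if seq in seen:
            # Duplicate: it must not be handed to the Opus decoder.
            continue
        seen.add(seq)
        yield seq, payload
```
]]></artwork>
<postamble>Only the first copy of each sequence number reaches the decoder; later copies are dropped.</postamble>
</figure>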

<t>Opus supports 5 different audio bandwidths, which may be adjusted during
the duration of a call. The RTP timestamp clock frequency is defined as
the highest supported sampling frequency of Opus, i.e. 48000 Hz, for all
modes and sampling rates of Opus. The unit
for the timestamp is samples per single (mono) channel. The RTP timestamp corresponds to the
sample time of the first encoded sample in the encoded frame. For sampling
rates lower than 48000 Hz, the number of samples has to be multiplied by
a multiplier according to <xref target="fs-upsample-factors"/> to determine
the RTP timestamp.</t>

<ttcol align='center'>Multiplier</ttcol>

fs specifies the audio sampling frequency in Hertz (Hz); Multiplier is the
value by which the number of samples has to be multiplied to calculate
the RTP timestamp.

</postamble>

</texttable>
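<figure>
<preamble>As a sketch of the multiplier rule, the following assumes that the multiplier is simply 48000 divided by the audio sampling frequency, which holds for sampling frequencies that divide 48000 evenly. The function names upsample_multiplier and rtp_timestamp_increment are illustrative and not part of any Opus API.</preamble>
<artwork><![CDATA[
```python
RTP_CLOCK_HZ = 48000  # RTP timestamp clock rate used for all Opus modes


def upsample_multiplier(fs_hz):
    """Return the factor that scales a sample count at fs_hz to 48000 Hz."""
    if fs_hz <= 0 or RTP_CLOCK_HZ % fs_hz != 0:
        raise ValueError("sampling frequency must divide 48000 evenly")
    return RTP_CLOCK_HZ // fs_hz


def rtp_timestamp_increment(num_samples, fs_hz):
    """Convert per-channel samples at fs_hz into RTP timestamp units."""
    return num_samples * upsample_multiplier(fs_hz)
```
]]></artwork>
<postamble>For example, 160 samples at 8000 Hz (20 ms of audio) advance the RTP timestamp by 960, the same amount as 960 samples at 48000 Hz.</postamble>
</figure>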

</section>

<t>
The Opus encoder can be set to output encoded frames representing 2.5, 5, 10, 20,
40, or 60 ms of speech or audio data. Further, an arbitrary number of frames can be
combined into a packet. The maximum packet length is limited to the amount of encoded
data representing 120 ms of speech or audio data. The packetization of encoded data
is done purely by the Opus encoder, and therefore exactly one packet output from the Opus
encoder MUST be used as a payload.
</t>
<t><xref target='payload-structure'/> shows the structure combined with the RTP header.</t>
<figure anchor="payload-structure"
title="Payload Structure with RTP header">
<![CDATA[
+----------+--------------+
|RTP Header| Opus Payload |
+----------+--------------+
]]>
</artwork>
</figure>
<t>
<xref target='opus-packetization'/> shows supported frame sizes for different modes
and sampling rates of Opus and how the timestamp needs to be incremented for
packetization.
</t>

<c>voice</c>

<c></c>

<c></c>

<c>audio</c>

<c></c>

<c></c>

Mode specifies the Opus mode of operation; fs specifies the audio sampling
frequency in Hertz (Hz); 2.5, 5, 10, 20, 40, and 60 represent the duration of
encoded speech or audio data in a packet; ts incr specifies the
value by which the timestamp needs to be incremented for the corresponding packet size.
For multiple frames in a packet these values have to be multiplied by the
respective number of frames.

</postamble>

</texttable>
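<figure>
<preamble>The timestamp increment per packet can be sketched as follows, assuming the 48000 Hz RTP clock (48 ticks per millisecond), the frame durations listed above, and the 120 ms packet limit. The helper name ts_increment is hypothetical, and the sketch does not model which frame sizes each mode supports.</preamble>
<artwork><![CDATA[
```python
from fractions import Fraction

FRAME_DURATIONS_MS = (Fraction(5, 2), 5, 10, 20, 40, 60)
MAX_PACKET_MS = 120
TICKS_PER_MS = 48  # 48000 Hz RTP clock


def ts_increment(frame_ms, frames_per_packet=1):
    """Return the RTP timestamp increment for one packet."""
    frame = Fraction(frame_ms)  # exact, so 2.5 ms is handled without rounding
    if frame not in FRAME_DURATIONS_MS:
        raise ValueError("unsupported frame duration")
    total_ms = frame * frames_per_packet
    if frames_per_packet < 1 or total_ms > MAX_PACKET_MS:
        raise ValueError("packet must hold 1 or more frames, at most 120 ms")
    return int(total_ms * TICKS_PER_MS)
```
]]></artwork>
<postamble>A single 20 ms frame advances the timestamp by 960; three 20 ms frames in one packet advance it by 2880.</postamble>
</figure>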

</section>

</section>

<t>The adaptive nature of the Opus codec allows for efficient
congestion control.</t>
<t>The target bitrate of Opus can be adjusted at any point in time,
thus allowing for efficient congestion control. Furthermore, the amount
of speech or audio data encoded in a
single packet can be used for congestion control since the packet transmission
rate is inversely proportional to the packet duration. A lower packet
transmission rate reduces the amount of header overhead but at the same
time increases latency and error sensitivity and should be done with care.</t>
<t>It is RECOMMENDED that congestion control is applied during the
transmission of Opus encoded data.</t>
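<figure>
<preamble>To illustrate why the packet duration affects header overhead, the following sketch assumes an IPv4 header of 20 octets, a UDP header of 8 octets, and the fixed 12-octet RTP header, with no header extensions, CSRC entries, or security overhead. These sizes and the function name header_overhead_bps are assumptions for the example only.</preamble>
<artwork><![CDATA[
```python
# Assumed per-packet header size: IPv4 (20) + UDP (8) + fixed RTP header (12).
HEADER_BYTES = 20 + 8 + 12


def header_overhead_bps(packet_ms):
    """Return the header overhead in bits per second for a packet duration."""
    if packet_ms <= 0:
        raise ValueError("packet duration must be positive")
    packets_per_second = 1000 / packet_ms
    return packets_per_second * HEADER_BYTES * 8
```
]]></artwork>
<postamble>Under these assumptions, 20 ms packets cost 16000 b/s in headers, while 40 ms packets cost 8000 b/s, at the price of higher latency and more audio lost per dropped packet.</postamble>
</figure>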

</section>

<t>One media subtype (audio/opus) has been defined and registered as
described in the following section.</t>
<t>Media type registration is done according to <xref
target="RFC4288"/> and <xref target="RFC4855"/>.<vspace
blankLines='1'/></t>
<t>Type name: audio<vspace blankLines='1'/></t>
<t>Subtype name: opus<vspace blankLines='1'/></t>
<t>Required parameters:</t>
<t hangText="rate:"> the RTP timestamp is incremented with a
48000 Hz clock rate for all modes of Opus and all sampling
frequencies. For audio sampling rates other than 48000 Hz the rate
has to be adjusted to 48000 Hz according to <xref target="fs-upsample-factors"/>.

</t>

</list></t>

<t>Optional parameters:</t>
<t hangText="maxplaybackrate:">
a hint about the maximum output sampling rate that the receiver is
capable of rendering, in Hz.
The decoder MUST be capable of decoding
any audio bandwidth, but due to hardware limitations only signals
up to the specified sampling rate can be played back. Sending signals
with higher audio bandwidth results in higher than necessary network
usage and encoding complexity, so an encoder SHOULD NOT encode
frequencies above the audio bandwidth specified by maxplaybackrate.
This parameter can take any value between 8000 and 48000, although
commonly the value will match one of the Opus bandwidths
(<xref target="bandwidth_definitions"/>).
By default, the receiver is assumed to have no limitations, i.e. 48000.
</t>
<t hangText="sprop-maxcapturerate:">
a hint about the maximum input sampling rate that the sender is likely to produce.
This is not a guarantee that the sender will never send any higher bandwidth
(e.g. it could send a pre-recorded prompt that uses a higher bandwidth), but it
indicates to the receiver that frequencies above this maximum can safely be discarded.
This parameter is useful to avoid wasting receiver resources by operating the audio
processing pipeline (e.g. echo cancellation) at a higher rate than necessary.
This parameter can take any value between 8000 and 48000, although
commonly the value will match one of the Opus bandwidths
(<xref target="bandwidth_definitions"/>).
By default, the sender is assumed to have no limitations, i.e. 48000.
</t>

<t hangText="maxptime:"> the decoder's maximum length of time in
milliseconds rounded up to the next full integer value represented
by the media in a packet that can be
encapsulated in a received packet according to Section 6 of
<xref target="RFC4566"/>. Possible values are 3, 5, 10, 20, 40,
and 60 or an arbitrary multiple of Opus frame sizes rounded up to
the next full integer value up to a maximum value of 120 as
defined in <xref target='opus-rtp-payload-format'/>. If no value is
specified, 120 is assumed as default. This value is a recommendation
by the decoding side to ensure the best
performance for the decoder. The decoder MUST be
capable of accepting any allowed packet sizes to
ensure maximum compatibility.<vspace blankLines='1'/></t>
<t hangText="ptime:"> the decoder's recommended length of time in
milliseconds rounded up to the next full integer value represented
by the media in a packet according to
Section 6 of <xref target="RFC4566"/>. Possible values are
3, 5, 10, 20, 40, or 60 or an arbitrary multiple of Opus frame sizes
rounded up to the next full integer value up to a maximum
value of 120 as defined in <xref
target='opus-rtp-payload-format'/>. If no value is
specified, 20 is assumed as default. If ptime is greater than
maxptime, ptime MUST be ignored. This parameter MAY be changed
during a session. This value is a recommendation by the decoding
side to ensure the best
performance for the decoder. The decoder MUST be
capable of accepting any allowed packet sizes to
ensure maximum compatibility.<vspace blankLines='1'/></t>
<t hangText="minptime:"> the decoder's minimum length of time in
milliseconds rounded up to the next full integer value represented
by the media in a packet that SHOULD
be encapsulated in a received packet according to Section 6 of <xref
target="RFC4566"/>. Possible values are 3, 5, 10, 20, 40, and 60
or an arbitrary multiple of Opus frame sizes rounded up to the next
full integer value up to a maximum value of 120
as defined in <xref target='opus-rtp-payload-format'/>. If no value is
specified, 3 is assumed as default. This value is a recommendation
by the decoding side to ensure the best
performance for the decoder. The decoder MUST be
capable of accepting any allowed packet sizes to
ensure maximum compatibility.<vspace blankLines='1'/></t>
<t hangText="maxaveragebitrate:"> specifies the maximum average
receive bitrate of a session in bits per second (b/s). The actual
value of the bitrate may vary as it is dependent on the
characteristics of the media in a packet. Note that the maximum
average bitrate MAY be modified dynamically during a session. Any
positive integer is allowed but values outside the range between
6000 and 510000 SHOULD be ignored. If no value is specified, the
maximum value specified in <xref target='bitrate_by_bandwidth'/>
for the corresponding mode of Opus and corresponding maxplaybackrate
will be the default.<vspace blankLines='1'/></t>

<t hangText="stereo:">
specifies whether the decoder prefers receiving stereo or mono signals.
Possible values are 1 and 0, where 1 specifies that stereo signals are preferred
and 0 specifies that only mono signals are preferred.
Independent of the stereo parameter, every receiver MUST be able to receive and
decode stereo signals, but sending stereo signals to a receiver that signaled a
preference for mono signals may result in higher than necessary network
utilisation and encoding complexity. If no value is specified, mono
is assumed (stereo=0).<vspace blankLines='1'/>
</t>
<t hangText="sprop-stereo:">
specifies whether the sender is likely to produce stereo audio.
Possible values are 1 and 0, where 1 specifies that stereo signals are likely to
be sent, and 0 specifies that the sender will likely only send mono.
This is not a guarantee that the sender will never send stereo audio
(e.g. it could send a pre-recorded prompt that uses stereo), but it
indicates to the receiver that the received signal can be safely downmixed to mono.
This parameter is useful to avoid wasting receiver resources by operating the audio
processing pipeline (e.g. echo cancellation) in stereo when not necessary.
If no value is specified, mono
is assumed (sprop-stereo=0).<vspace blankLines='1'/>
</t>
<t hangText="cbr:">
specifies if the decoder prefers the use of a constant bitrate versus
variable bitrate. Possible values are 1 and 0, where 1 specifies constant
bitrate and 0 specifies variable bitrate. If no value is specified, cbr
is assumed to be 0. Note that the maximum average bitrate may still be
changed, e.g. to adapt to changing network conditions.<vspace blankLines='1'/>
</t>
<t hangText="useinbandfec:"> specifies that Opus in-band FEC is
supported by the decoder and MAY be used during a
session. Possible values are 1 and 0. It is RECOMMENDED to provide
0 in case FEC is not implemented on the receiving side. If no
value is specified, useinbandfec is assumed to be 1.<vspace blankLines='1'/></t>
<t hangText="usedtx:"> specifies if the decoder prefers the use of
DTX. Possible values are 1 and 0. If no value is specified, usedtx
is assumed to be 0.<vspace blankLines='1'/></t>
</list></t>
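<figure>
<preamble>The defaults and range rules for the packet-time and bitrate parameters above can be sketched as follows. The function names are hypothetical, and representing an ignored value as None is a choice made for this example; the text above does not prescribe what a sender uses once ptime has been ignored.</preamble>
<artwork><![CDATA[
```python
def resolve_packet_times(ptime=None, maxptime=None):
    """Apply the stated defaults (ptime 20, maxptime 120), in milliseconds.

    Returns (ptime, maxptime); ptime is None when it exceeds maxptime
    and therefore has to be ignored.
    """
    maxptime = 120 if maxptime is None else maxptime
    ptime = 20 if ptime is None else ptime
    if ptime > maxptime:
        ptime = None  # ptime greater than maxptime MUST be ignored
    return ptime, maxptime


def usable_maxaveragebitrate(value):
    """Return value if within 6000..510000 b/s, else None (ignored)."""
    return value if 6000 <= value <= 510000 else None
```
]]></artwork>
<postamble>With ptime=40 and maxptime=40 both values are kept; with ptime=60 and maxptime=40 the ptime value is dropped.</postamble>
</figure>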

<t>Encoding considerations:<vspace blankLines='1'/></t>
<t>Opus media type is framed and consists of binary data according
to Section 4.8 in <xref target="RFC4288"/>.</t>
</list></t>
<t>Security considerations: </t>
<t>See <xref target='security-considerations'/> of this document.</t>
</list></t>
<t>Interoperability considerations: none<vspace blankLines='1'/></t>
<t>Published specification: none<vspace blankLines='1'/></t>
<t>Applications that use this media type: </t>
<t>Any application that requires the transport of
speech or audio data may use this media type. Examples include,
but are not limited to, audio and video conferencing, Voice over IP, and
media streaming.</t>
</list></t>
<t>Person &amp; email address to contact for further information:</t>
<t>SILK Support silksupport@skype.net</t>
<t>Jean-Marc Valin jmvalin@jmvalin.ca</t>
</list></t>
<t>Intended usage: COMMON<vspace blankLines='1'/></t>
<t>Restrictions on usage:<vspace blankLines='1'/></t>
<t>For transfer over RTP, the RTP payload format (<xref
target='opus-rtp-payload-format'/> of this document) SHALL be
used.</t>
</list></t>
<t>Author:</t>
<t>Julian Spittka julian.spittka@skype.net<vspace blankLines='1'/></t>
<t>Koen Vos koen.vos@skype.net<vspace blankLines='1'/></t>
<t>Jean-Marc Valin jmvalin@jmvalin.ca<vspace blankLines='1'/></t>
</list></t>
<t> Change controller: TBD</t>

</section>

<t>The information described in the media type specification has a
specific mapping to fields in the Session Description Protocol (SDP)
<xref target="RFC4566"/>, which is commonly used to describe RTP
sessions. When SDP is used to specify sessions employing the Opus codec,
the mapping is as follows:</t>
<t>
<t>The media type ("audio") goes in SDP "m=" as the media name.</t>
<t>The media subtype ("opus") goes in SDP "a=rtpmap" as the encoding
name. The RTP clock rate in "a=rtpmap" MUST be 48000 and the number of
channels MUST be 2.</t>
<t>The OPTIONAL media type parameters "ptime" and "maxptime" are
mapped to "a=ptime" and "a=maxptime" attributes, respectively, in the
SDP.</t>
<t>The OPTIONAL media type parameters "maxaveragebitrate",
"minptime", "stereo", "cbr", "useinbandfec", and "usedtx", when
present, MUST be included in the "a=fmtp" attribute in the SDP,
expressed as a media type string in the form of a
semicolon-separated list of parameter=value pairs (e.g.,
maxaveragebitrate=20000). They MUST NOT be specified in an
SSRC-specific "fmtp" source-level attribute (as defined in
Section 6.3 of <xref target="RFC5576"/>).</t>
<t>The OPTIONAL media type parameters "sprop-maxcapturerate"
and "sprop-stereo" MAY be mapped to the "a=fmtp" SDP attribute by
copying them directly from the media type parameter string as part
of the semicolon-separated list of parameter=value pairs (e.g.,
sprop-stereo=1). These same OPTIONAL media type parameters MAY also
be specified using an SSRC-specific "fmtp" source-level attribute
as described in Section 6.3 of <xref target="RFC5576"/>.
They MAY be specified in both places, in which case the parameter
in the source-level attribute overrides the one found on the
"a=fmtp" line. The value of any parameter which is not specified in
a source-level attribute MUST be taken from the "a=fmtp"
line, if it is present there.</t>
</list>
</t>

<t>Below are some examples of SDP session descriptions for Opus:</t>
<t>Example 1: Standard session with 48000 Hz clock rate</t>
<![CDATA[
m=audio 54312 RTP/AVP 101
a=rtpmap:101 opus/48000/2
]]>
</artwork>
</figure>
<t>Example 2: maximum playback rate of 16000 Hz, maximum packet size of 40 ms,
recommended packet size of 40 ms, maximum average bitrate of 20000 bps,
stereo signals are preferred, FEC is allowed, DTX is not allowed</t>
<![CDATA[
m=audio 54312 RTP/AVP 101
a=rtpmap:101 opus/48000/2
a=fmtp:101 maxplaybackrate=16000; maxaveragebitrate=20000;
stereo=1; useinbandfec=1; usedtx=0
a=ptime:40
a=maxptime:40
]]>
</artwork>
</figure>
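<figure>
<preamble>As an illustration of the semicolon-separated parameter=value form, the following sketch parses an "a=fmtp" line into a payload type and a dictionary of parameter strings. The function name parse_fmtp is hypothetical, the sketch expects the attribute on a single line (the wrapped example above would be joined first), and it performs no validation of parameter names or values.</preamble>
<artwork><![CDATA[
```python
def parse_fmtp(line):
    """Split an SDP "a=fmtp" line into (payload_type, parameters)."""
    prefix, _, rest = line.strip().partition(":")
    if prefix != "a=fmtp":
        raise ValueError("not an a=fmtp line")
    payload_type, _, param_text = rest.strip().partition(" ")
    params = {}
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip()
    return int(payload_type), params
```
]]></artwork>
<postamble>Values are returned as strings; converting them to integers and applying defaults is left to the caller.</postamble>
</figure>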

<t>When using the offer-answer procedure described in <xref

769

target="RFC3264"/> to negotiate the use of Opus, the following

770

considerations apply:</t>

<t>Opus supports several clock rates. For signaling purposes only

775

the highest, i.e. 48000, is used. The actual clock rate of the

776

corresponding media is signaled inside the payload and is not

777

subject to this payload format description. The decoder MUST be

778

capable to decode every received clock rate. An example

is shown below:

<![CDATA[

Jean-Marc Valin

2012-11-09 14:30:25 -0500

[diff] [blame]

784

m=audio 54312 RTP/AVP 100

785

a=rtpmap:100 opus/48000/2

Gregory Maxwell

2012-06-19 09:11:40 -0400

[diff] [blame]

]]>

</artwork>

</figure>

</t>

<t>The parameters "ptime" and "maxptime" are unidirectional

792

receive-only parameters and typically will not compromise

793

interoperability; however, dependent on the set values of the

794

parameters the performance of the application may suffer. <xref

795

target="RFC3264"/> defines the SDP offer-answer handling of the

796

"ptime" parameter. The "maxptime" parameter MUST be handled in the

same way.</t>

<t>

The parameter "minptime" is a unidirectional

801

receive-only parameters and typically will not compromise

802

interoperability; however, dependent on the set values of the

803

parameter the performance of the application may suffer and should be

set with care.

</t>

<t>
The parameter "maxplaybackrate" is a unidirectional receive-only
parameter that reflects limitations of the local receiver. The remote
sender SHOULD NOT send with an audio bandwidth higher than
"maxplaybackrate", as this would lead to inefficient use of network resources.
The "maxplaybackrate" parameter does not
affect interoperability. Also, this parameter SHOULD NOT be used
to adjust the audio bandwidth as a function of the bitrate, as this
is the responsibility of the Opus encoder implementation.
</t>

<t>The parameter "maxaveragebitrate" is a unidirectional receive-only
parameter that reflects limitations of the local receiver. The remote
sender MUST NOT send with an average bitrate higher than
"maxaveragebitrate", as it might overload the network and/or
receiver. The parameter "maxaveragebitrate" typically will not
compromise interoperability; however, depending on its value,
the performance of the application may suffer, so it should
be set with care.</t>

<t>If the parameter "maxaveragebitrate" is below the range specified
in <xref target='bitrate_by_bandwidth'/>, the session MUST be rejected.</t>

<t>

The "stereo" parameter is a unidirectional receive-only

parameter.

</t>

<t>

The "cbr" parameter is a unidirectional receive-only

parameter.

</t>

<t>The "useinbandfec" parameter is a unidirectional receive-only

parameter.</t>

<t>The "usedtx" parameter is a unidirectional receive-only

parameter.</t>

<t>Any unknown parameter in an offer MUST be ignored by the receiver

and MUST be removed from the answer.</t>

</list></t>
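<t>As a non-normative illustration of the considerations above, the
following sketch shows one possible offer and a matching answer. The port
numbers, the payload type, the parameter values, and the parameter name
"x-example" are assumptions chosen for this example only; "x-example"
stands in for any parameter the answerer does not recognize. Because the
parameters are receive-only, the answer states the answerer's own
preferences rather than echoing the offer, and the unknown parameter is
absent from the answer.

<figure>
<artwork>
<![CDATA[
Offer:
    m=audio 54312 RTP/AVP 101
    a=rtpmap:101 opus/48000/2
    a=fmtp:101 maxplaybackrate=16000; maxaveragebitrate=20000; x-example=1

Answer:
    m=audio 49170 RTP/AVP 101
    a=rtpmap:101 opus/48000/2
    a=fmtp:101 maxplaybackrate=48000; stereo=1; useinbandfec=1
]]>
</artwork>
</figure>
</t>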

</section>

<t>For declarative use of SDP, such as in the Session Announcement Protocol
(SAP) <xref target="RFC2974"/> and RTSP <xref target="RFC2326"/>,
the following needs to be considered for Opus:</t>

<t>The values for "maxptime", "ptime", "minptime", "maxplaybackrate", and
"maxaveragebitrate" should be selected carefully to ensure that
reasonable performance can be achieved for the participants of a session.</t>

<t>
The values for "maxptime", "ptime", and "minptime" of the payload
format configuration are recommendations by the decoding side to ensure
the best performance for the decoder. The decoder MUST be
capable of accepting any allowed packet size to
ensure maximum compatibility.
</t>

<t>All other parameters of the payload format configuration are declarative

and a participant MUST use the configurations that are provided for

the session. More than one configuration may be provided if necessary

by declaring multiple RTP payload types; however, the number of types

should be kept small.</t>

</list></t>

</section>

</section>

</section>

<t>All RTP packets using the payload format defined in this specification
are subject to the general security considerations discussed in the RTP
specification <xref target="RFC3550"/> and in any applicable profile,
e.g. <xref target="RFC3711"/> or <xref target="RFC3551"/>.</t>

<t>This payload format transports Opus-encoded speech or audio data;
hence, security issues include confidentiality, integrity protection, and
authentication of the speech or audio itself. The Opus payload format does
not have any built-in security mechanisms. Any suitable external
mechanisms, such as SRTP <xref target="RFC3711"/>, MAY be used.</t>

<t>This payload format and the Opus encoding do not exhibit any

significant non-uniformity in the receiver-end computational load and thus

are unlikely to pose a denial-of-service threat due to the receipt of

pathological datagrams.</t>

</section>

</section>

</middle>

<back>

&rfc2119;

&rfc3550;

&rfc3711;

&rfc3551;

&rfc4288;

&rfc4855;

&rfc4566;

&rfc3264;

&rfc2974;

&rfc2326;

&rfc5576;

&rfc6562;
