Network Working Group                                        S. Pfeiffer
Request for Comments: 3533                                         CSIRO
Category: Informational                                         May 2003


                 The Ogg Encapsulation Format Version 0
| 13 | |
| 14 | Status of this Memo |
| 15 | |
| 16 | This memo provides information for the Internet community. It does |
| 17 | not specify an Internet standard of any kind. Distribution of this |
| 18 | memo is unlimited. |
| 19 | |
| 20 | Copyright Notice |
| 21 | |
| 22 | Copyright (C) The Internet Society (2003). All Rights Reserved. |
| 23 | |
| 24 | Abstract |
| 25 | |
| 26 | This document describes the Ogg bitstream format version 0, which is |
| 27 | a general, freely-available encapsulation format for media streams. |
| 28 | It is able to encapsulate any kind and number of video and audio |
| 29 | encoding formats as well as other data streams in a single bitstream. |
| 30 | |
| 31 | Terminology |
| 32 | |
| 33 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", |
| 34 | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this |
| 35 | document are to be interpreted as described in BCP 14, RFC 2119 [2]. |
| 36 | |
| 37 | Table of Contents |
| 38 | |
| 39 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2 |
| 40 | 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 2 |
| 41 | 3. Requirements for a generic encapsulation format . . . . . . . 3 |
| 42 | 4. The Ogg bitstream format . . . . . . . . . . . . . . . . . . . 3 |
| 43 | 5. The encapsulation process . . . . . . . . . . . . . . . . . . 6 |
| 44 | 6. The Ogg page format . . . . . . . . . . . . . . . . . . . . . 9 |
| 45 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 11 |
| 46 | 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12 |
| 47 | A. Glossary of terms and abbreviations . . . . . . . . . . . . . 13 |
| 48 | B. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 14 |
| 49 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . 14 |
| 50 | Full Copyright Statement . . . . . . . . . . . . . . . . . . . 15 |
| 51 | |
| 52 | |
| 53 | |
| 54 | |
| 55 | |
| 56 | |
| 57 | |
| 58 | Pfeiffer Informational [Page 1] |
| 59 | |
| 60 | RFC 3533 OGG May 2003 |
| 61 | |
| 62 | |
| 63 | 1. Introduction |
| 64 | |
| 65 | The Ogg bitstream format has been developed as a part of a larger |
| 66 | project aimed at creating a set of components for the coding and |
| 67 | decoding of multimedia content (codecs) which are to be freely |
| 68 | available and freely re-implementable, both in software and in |
| 69 | hardware for the computing community at large, including the Internet |
| 70 | community. It is the intention of the Ogg developers represented by |
| 71 | Xiph.Org that it be usable without intellectual property concerns. |
| 72 | |
| 73 | This document describes the Ogg bitstream format and how to use it to |
| 74 | encapsulate one or several media bitstreams created by one or several |
| 75 | encoders. The Ogg transport bitstream is designed to provide |
| 76 | framing, error protection and seeking structure for higher-level |
| 77 | codec streams that consist of raw, unencapsulated data packets, such |
| 78 | as the Vorbis audio codec or the upcoming Tarkin and Theora video |
| 79 | codecs. It is capable of interleaving different binary media and |
| 80 | other time-continuous data streams that are prepared by an encoder as |
| 81 | a sequence of data packets. Ogg provides enough information to |
| 82 | properly separate data back into such encoder created data packets at |
| 83 | the original packet boundaries without relying on decoding to find |
| 84 | packet boundaries. |
| 85 | |
| 86 | Please note that the MIME type application/ogg has been registered |
| 87 | with the IANA [1]. |
| 88 | |
| 89 | 2. Definitions |
| 90 | |
| 91 | For describing the Ogg encapsulation process, a set of terms will be |
| 92 | used whose meaning needs to be well understood. Therefore, some of |
| 93 | the most fundamental terms are defined now before we start with the |
| 94 | description of the requirements for a generic media stream |
| 95 | encapsulation format, the process of encapsulation, and the concrete |
| 96 | format of the Ogg bitstream. See the Appendix for a more complete |
| 97 | glossary. |
| 98 | |
| 99 | The result of an Ogg encapsulation is called the "Physical (Ogg) |
| 100 | Bitstream". It encapsulates one or several encoder-created |
| 101 | bitstreams, which are called "Logical Bitstreams". A logical |
| 102 | bitstream, provided to the Ogg encapsulation process, has a |
| 103 | structure, i.e., it is split up into a sequence of so-called |
| 104 | "Packets". The packets are created by the encoder of that logical |
| 105 | bitstream and represent meaningful entities for that encoder only |
| 106 | (e.g., an uncompressed stream may use video frames as packets). They |
| 107 | do not contain boundary information - strung together they appear to |
| 108 | be streams of random bytes with no landmarks. |
| 109 | |
| 119 | Please note that the term "packet" is not used in this document to |
| 120 | signify entities for transport over a network. |
| 121 | |
| 122 | 3. Requirements for a generic encapsulation format |
| 123 | |
| 124 | The design idea behind Ogg was to provide a generic, linear media |
| 125 | transport format to enable both file-based storage and stream-based |
| 126 | transmission of one or several interleaved media streams independent |
| 127 | of the encoding format of the media data. Such an encapsulation |
| 128 | format needs to provide: |
| 129 | |
| 130 | o framing for logical bitstreams. |
| 131 | |
| 132 | o interleaving of different logical bitstreams. |
| 133 | |
| 134 | o detection of corruption. |
| 135 | |
| 136 | o recapture after a parsing error. |
| 137 | |
| 138 | o position landmarks for direct random access of arbitrary positions |
| 139 | in the bitstream. |
| 140 | |
| 141 | o streaming capability (i.e., no seeking is needed to build a 100% |
| 142 | complete bitstream). |
| 143 | |
| 144 | o small overhead (i.e., use no more than approximately 1-2% of |
| 145 | bitstream bandwidth for packet boundary marking, high-level |
| 146 | framing, sync and seeking). |
| 147 | |
| 148 | o simplicity to enable fast parsing. |
| 149 | |
| 150 | o simple concatenation mechanism of several physical bitstreams. |
| 151 | |
   Ogg takes all of these design considerations into account.  It
   supports framing and interleaving of logical bitstreams, seeking
   landmarks, detection of corruption, and stream resynchronisation
   after a parsing error with no more than approximately 1-2% overhead.
   It is a generic framework to perform encapsulation of time-continuous
   bitstreams.  It does not know any specifics about the codec data that
   it encapsulates and is thus independent of any media codec.
| 160 | |
| 161 | 4. The Ogg bitstream format |
| 162 | |
| 163 | A physical Ogg bitstream consists of multiple logical bitstreams |
| 164 | interleaved in so-called "Pages". Whole pages are taken in order |
| 165 | from multiple logical bitstreams multiplexed at the page level. The |
| 166 | logical bitstreams are identified by a unique serial number in the |
| 175 | header of each page of the physical bitstream. This unique serial |
| 176 | number is created randomly and does not have any connection to the |
| 177 | content or encoder of the logical bitstream it represents. Pages of |
| 178 | all logical bitstreams are concurrently interleaved, but they need |
| 179 | not be in a regular order - they are only required to be consecutive |
| 180 | within the logical bitstream. Ogg demultiplexing reconstructs the |
| 181 | original logical bitstreams from the physical bitstream by taking the |
| 182 | pages in order from the physical bitstream and redirecting them into |
| 183 | the appropriate logical decoding entity. |
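
   As a minimal sketch of the demultiplexing step described above,
   pages can be routed by their serial number.  Representing a page as
   a (serial_number, payload) tuple and the name demultiplex are
   assumptions made for illustration; they are not part of the format.

```python
from collections import defaultdict


def demultiplex(pages):
    """Group page payloads by bitstream serial number, keeping page order."""
    streams = defaultdict(list)
    for serial_number, payload in pages:
        # Pages of one logical bitstream keep the order in which they
        # appeared in the physical bitstream.
        streams[serial_number].append(payload)
    return dict(streams)


# Two interleaved logical bitstreams with the (arbitrary) serials 7 and 9.
physical = [(7, b"a1"), (9, b"b1"), (7, b"a2"), (9, b"b2"), (7, b"a3")]
logical = demultiplex(physical)
```

   The interleaving pattern of the input does not matter; only the
   relative order of pages with the same serial number is preserved.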
| 184 | |
| 185 | Each Ogg page contains only one type of data as it belongs to one |
| 186 | logical bitstream only. Pages are of variable size and have a page |
| 187 | header containing encapsulation and error recovery information. Each |
| 188 | logical bitstream in a physical Ogg bitstream starts with a special |
| 189 | start page (bos=beginning of stream) and ends with a special page |
| 190 | (eos=end of stream). |
| 191 | |
| 192 | The bos page contains information to uniquely identify the codec type |
| 193 | and MAY contain information to set up the decoding process. The bos |
| 194 | page SHOULD also contain information about the encoded media - for |
| 195 | example, for audio, it should contain the sample rate and number of |
| 196 | channels. By convention, the first bytes of the bos page contain |
| 197 | magic data that uniquely identifies the required codec. It is the |
| 198 | responsibility of anyone fielding a new codec to make sure it is |
| 199 | possible to reliably distinguish his/her codec from all other codecs |
| 200 | in use. There is no fixed way to detect the end of the codec- |
| 201 | identifying marker. The format of the bos page is dependent on the |
| 202 | codec and therefore MUST be given in the encapsulation specification |
| 203 | of that logical bitstream type. Ogg also allows but does not require |
| 204 | secondary header packets after the bos page for logical bitstreams |
| 205 | and these must also precede any data packets in any logical |
| 206 | bitstream. These subsequent header packets are framed into an |
| 207 | integral number of pages, which will not contain any data packets. |
| 208 | So, a physical bitstream begins with the bos pages of all logical |
| 209 | bitstreams containing one initial header packet per page, followed by |
| 210 | the subsidiary header packets of all streams, followed by pages |
| 211 | containing data packets. |
| 212 | |
| 213 | The encapsulation specification for one or more logical bitstreams is |
| 214 | called a "media mapping". An example for a media mapping is "Ogg |
| 215 | Vorbis", which uses the Ogg framework to encapsulate Vorbis-encoded |
| 216 | audio data for stream-based storage (such as files) and transport |
| 217 | (such as TCP streams or pipes). Ogg Vorbis provides the name and |
| 218 | revision of the Vorbis codec, the audio rate and the audio quality on |
| 219 | the Ogg Vorbis bos page. It also uses two additional header pages |
| 220 | per logical bitstream. The Ogg Vorbis bos page starts with the byte |
| 221 | 0x01, followed by "vorbis" (a total of 7 bytes of identifier). |
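
   A decoder could use this identifier to recognise an Ogg Vorbis
   logical bitstream.  The following is only a sketch: the names
   VORBIS_MAGIC and looks_like_vorbis are hypothetical, and the input
   is assumed to be the content of a bos page with the page header
   already removed.

```python
# The byte 0x01 followed by "vorbis": 7 bytes of identifier in total.
VORBIS_MAGIC = b"\x01vorbis"


def looks_like_vorbis(bos_payload):
    """Return True if the bos page content starts with the Vorbis marker."""
    return bos_payload.startswith(VORBIS_MAGIC)
```

   Other codecs define their own markers in their media mappings, so a
   real demultiplexer would try one such check per supported codec.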
| 222 | |
| 231 | Ogg knows two types of multiplexing: concurrent multiplexing (so- |
| 232 | called "Grouping") and sequential multiplexing (so-called |
| 233 | "Chaining"). Grouping defines how to interleave several logical |
| 234 | bitstreams page-wise in the same physical bitstream. Grouping is for |
| 235 | example needed for interleaving a video stream with several |
| 236 | synchronised audio tracks using different codecs in different logical |
| 237 | bitstreams. Chaining on the other hand, is defined to provide a |
| 238 | simple mechanism to concatenate physical Ogg bitstreams, as is often |
| 239 | needed for streaming applications. |
| 240 | |
| 241 | In grouping, all bos pages of all logical bitstreams MUST appear |
| 242 | together at the beginning of the Ogg bitstream. The media mapping |
| 243 | specifies the order of the initial pages. For example, the grouping |
| 244 | of a specific Ogg video and Ogg audio bitstream may specify that the |
| 245 | physical bitstream MUST begin with the bos page of the logical video |
| 246 | bitstream, followed by the bos page of the audio bitstream. Unlike |
| 247 | bos pages, eos pages for the logical bitstreams need not all occur |
| 248 | contiguously. Eos pages may be 'nil' pages, that is, pages |
| 249 | containing no content but simply a page header with position |
| 250 | information and the eos flag set in the page header. Each grouped |
| 251 | logical bitstream MUST have a unique serial number within the scope |
| 252 | of the physical bitstream. |
| 253 | |
| 254 | In chaining, complete logical bitstreams are concatenated. The |
| 255 | bitstreams do not overlap, i.e., the eos page of a given logical |
| 256 | bitstream is immediately followed by the bos page of the next. Each |
| 257 | chained logical bitstream MUST have a unique serial number within the |
| 258 | scope of the physical bitstream. |
| 259 | |
| 260 | It is possible to consecutively chain groups of concurrently |
| 261 | multiplexed bitstreams. The groups, when unchained, MUST stand on |
| 262 | their own as a valid concurrently multiplexed bitstream. The |
| 263 | following diagram shows a schematic example of such a physical |
| 264 | bitstream that obeys all the rules of both grouped and chained |
| 265 | multiplexed bitstreams. |
| 266 | |
                    physical bitstream with pages of
            different logical bitstreams grouped and chained
      -------------------------------------------------------------
      |*A*|*B*|*C*|A|A|C|B|A|B|#A#|C|...|B|C|#B#|#C#|*D*|D|...|#D#|
      -------------------------------------------------------------
       bos bos bos             eos           eos eos bos       eos
| 273 | |
| 274 | In this example, there are two chained physical bitstreams, the first |
| 275 | of which is a grouped stream of three logical bitstreams A, B, and C. |
| 276 | The second physical bitstream is chained after the end of the grouped |
| 277 | bitstream, which ends after the last eos page of all its grouped |
| 278 | logical bitstreams. As can be seen, grouped bitstreams begin |
| 287 | together - all of the bos pages MUST appear before any data pages. |
| 288 | It can also be seen that pages of concurrently multiplexed bitstreams |
| 289 | need not conform to a regular order. And it can be seen that a |
| 290 | grouped bitstream can end long before the other bitstreams in the |
| 291 | group end. |
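
   The rule that all bos pages of a group precede its other pages
   makes chain boundaries easy to find.  The sketch below assumes a
   well-formed stream and a hypothetical (serial, is_bos, is_eos)
   tuple per page; it performs no validation and the function names
   are illustrative only.

```python
def split_chain(pages):
    """Split a sequence of (serial, is_bos, is_eos) pages into chain links."""
    links = []
    current = []
    seen_non_bos = False
    for entry in pages:
        _serial, is_bos, _is_eos = entry
        if is_bos and seen_non_bos:
            # A bos page after a non-bos page can only belong to the
            # next chained bitstream.
            links.append(current)
            current = []
            seen_non_bos = False
        if not is_bos:
            seen_non_bos = True
        current.append(entry)
    if current:
        links.append(current)
    return links


def from_diagram(name):
    """Translate the diagram notation: *X* is a bos page, #X# an eos page."""
    return (name.strip("*#"), name.startswith("*"), name.startswith("#"))


# The page sequence of the diagram, with the "..." placeholders left out.
diagram = "*A* *B* *C* A A C B A B #A# C B C #B# #C# *D* D #D#"
links = split_chain([from_diagram(name) for name in diagram.split()])
```

   For the diagram this yields two links: the group of A, B and C,
   followed by the single logical bitstream D.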
| 292 | |
| 293 | Ogg does not know any specifics about the codec data except that each |
| 294 | logical bitstream belongs to a different codec, the data from the |
| 295 | codec comes in order and has position markers (so-called "Granule |
| 296 | positions"). Ogg does not have a concept of 'time': it only knows |
| 297 | about sequentially increasing, unitless position markers. An |
| 298 | application can only get temporal information through higher layers |
| 299 | which have access to the codec APIs to assign and convert granule |
| 300 | positions or time. |
| 301 | |
| 302 | A specific definition of a media mapping using Ogg may put further |
| 303 | constraints on its specific use of the Ogg bitstream format. For |
| 304 | example, a specific media mapping may require that all the eos pages |
| 305 | for all grouped bitstreams need to appear in direct sequence. An |
| 306 | example for a media mapping is the specification of "Ogg Vorbis". |
| 307 | Another example is the upcoming "Ogg Theora" specification which |
| 308 | encapsulates Theora-encoded video data and usually comes multiplexed |
| 309 | with a Vorbis stream for an Ogg containing synchronised audio and |
| 310 | video. As Ogg does not specify temporal relationships between the |
| 311 | encapsulated concurrently multiplexed bitstreams, the temporal |
| 312 | synchronisation between the audio and video stream will be specified |
| 313 | in this media mapping. To enable streaming, pages from various |
| 314 | logical bitstreams will typically be interleaved in chronological |
| 315 | order. |
| 316 | |
| 317 | 5. The encapsulation process |
| 318 | |
| 319 | The process of multiplexing different logical bitstreams happens at |
| 320 | the level of pages as described above. The bitstreams provided by |
| 321 | encoders are however handed over to Ogg as so-called "Packets" with |
| 322 | packet boundaries dependent on the encoding format. The process of |
| 323 | encapsulating packets into pages will be described now. |
| 324 | |
| 325 | From Ogg's perspective, packets can be of any arbitrary size. A |
| 326 | specific media mapping will define how to group or break up packets |
| 327 | from a specific media encoder. As Ogg pages have a maximum size of |
| 328 | about 64 kBytes, sometimes a packet has to be distributed over |
| 329 | several pages. To simplify that process, Ogg divides each packet |
| 330 | into 255 byte long chunks plus a final shorter chunk. These chunks |
| 331 | are called "Ogg Segments". They are only a logical construct and do |
| 332 | not have a header for themselves. |
| 333 | |
   A group of contiguous segments is wrapped into a variable length page
   preceded by a header.  A segment table in the page header lists the
   "Lacing values" (sizes) of the segments included in the page.  A flag
   in the page header tells whether a page contains a packet continued
   from a previous page.  Note that a lacing value of 255 implies that a
   second lacing value follows in the packet, and a value of less than
   255 marks the end of the packet after that many additional bytes.  A
   packet of 255 bytes (or a multiple of 255 bytes) is terminated by a
   lacing value of 0.  Note also that a 'nil' (zero length) packet is
   not an error; it consists of nothing more than a lacing value of zero
   in the header.
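
   The lacing rule can be illustrated with a short sketch.  The
   function name lacing_values is an assumption made for this example;
   only the rule itself comes from the text above.

```python
def lacing_values(packet_length):
    """Return the lacing values that describe a packet of the given size."""
    if packet_length < 0:
        raise ValueError("packet length must not be negative")
    full_segments, remainder = divmod(packet_length, 255)
    # Every full segment is laced as 255.  The final value is below 255
    # (possibly 0) and marks the end of the packet.
    return [255] * full_segments + [remainder]
```

   Under this rule, a 100 byte packet is laced as [100], a 300 byte
   packet as [255, 45], a 255 byte packet as [255, 0], and a nil
   packet as [0].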
| 354 | |
   The encoding is optimized for speed and for the expected case that
   the majority of packets are between 50 and 200 bytes large.  This is
   a design justification rather than a recommendation.  The encoding
   avoids imposing a maximum packet size and, at the same time, imposes
   minimum overhead on small packets.  In contrast, simply using two
   bytes at the head of every packet and having a maximum packet size of
   32 kBytes would always penalize small packets (< 255 bytes, the
   typical case) with twice the segmentation overhead.  Using the lacing
   values as suggested, small packets see the minimum possible
   byte-aligned overhead (1 byte) and large packets (> 512 bytes) see a
   fairly constant ~0.5% overhead on encoding space.
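
   The overhead figures can be checked with a small calculation.  This
   sketch counts lacing values only and ignores the fixed part of the
   page header; the function name lacing_overhead is hypothetical.

```python
def lacing_overhead(packet_length):
    """Fraction of extra bytes spent on lacing values for one packet."""
    if packet_length <= 0:
        raise ValueError("packet length must be positive")
    # One lacing value per started 255 byte segment, plus the terminating
    # value when the length is an exact multiple of 255.
    lacing_bytes = packet_length // 255 + 1
    return lacing_bytes / packet_length


small = lacing_overhead(100)    # 1 lacing byte for 100 bytes of data
large = lacing_overhead(10000)  # 40 lacing bytes for 10000 bytes of data
```

   A 100 byte packet costs a single extra byte (1%), while a 10000
   byte packet costs 40 extra bytes (0.4%), in line with the roughly
   constant overhead quoted for large packets.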
| 366 | |
| 399 | The following diagram shows a schematic example of a media mapping |
| 400 | using Ogg and grouped logical bitstreams: |
| 401 | |
            logical bitstream with packet boundaries
 -----------------------------------------------------------------
 > |       packet_1             | packet_2         | packet_3 |  <
 -----------------------------------------------------------------

                     |segmentation (logically only)
                     v

      packet_1 (5 segments)          packet_2 (4 segs)   p_3 (2 segs)
     ------------------------------ -------------------- ------------
 ..  |seg_1|seg_2|seg_3|seg_4|s_5 | |seg_1|seg_2|seg_3|| |seg_1|s_2 | ..
     ------------------------------ -------------------- ------------

                     | page encapsulation
                     v

 page_1 (packet_1 data)  page_2 (pket_1 data) page_3 (packet_2 data)
------------------------  ----------------  ------------------------
|H|------------------- |  |H|----------- |  |H|------------------- |
|D||seg_1|seg_2|seg_3| |  |D|seg_4|s_5 | |  |D||seg_1|seg_2|seg_3| | ...
|R|------------------- |  |R|----------- |  |R|------------------- |
------------------------  ----------------  ------------------------

                    |
pages of            |
other    --------|  |
logical         -------
bitstreams      | MUX |
                -------
                   |
                   v

              page_1  page_2          page_3
      ------  ------  -------  -----  -------
 ...  ||   |  ||   |  ||    |  ||  |  ||    |  ...
      ------  ------  -------  -----  -------
              physical Ogg bitstream
| 439 | |
   This example shows a snapshot of the encapsulation process of one
   logical bitstream.  Part of that bitstream's subdivision into
   packets, as provided by the codec, is visible.  The Ogg encapsulation
   process chops the packets up into segments.  The packets in this
   example are rather large: packet_1 is split into 5 segments (4
   segments of 255 bytes and a final smaller one), packet_2 is split
   into 4 segments (3 segments of 255 bytes and a final very small one),
   and packet_3 is split into two segments.  The encapsulation process
   then creates pages, which are quite small in this example.  Page_1
   consists of the first three segments of packet_1, page_2 contains the
   remaining 2 segments of packet_1, and page_3 contains the first three
   segments of packet_2.  Finally, this logical bitstream is multiplexed
   into a physical Ogg bitstream with pages of other logical bitstreams.
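
   The reverse direction, recovering packets from pages, can be
   sketched as follows.  The function split_packets and its carried
   parameter are assumptions made for illustration; the sizes mirror
   packet_1 from the example, which spans page_1 and page_2.

```python
def split_packets(segment_table, body, carried=b""):
    """Split one page body into complete packets plus an unfinished tail.

    carried is the unfinished tail returned for the previous page of the
    same logical bitstream (empty if the page starts a fresh packet).
    """
    packets = []
    current = bytearray(carried)
    offset = 0
    for lacing_value in segment_table:
        current += body[offset:offset + lacing_value]
        offset += lacing_value
        if lacing_value < 255:
            # A lacing value below 255 ends the current packet.
            packets.append(bytes(current))
            current = bytearray()
    return packets, bytes(current)


# packet_1: four segments of 255 bytes and a final one of 80 bytes.
packet_1 = bytes(range(256)) * 4 + bytes(76)
done_1, tail_1 = split_packets([255, 255, 255], packet_1[:765])
done_2, tail_2 = split_packets([255, 80], packet_1[765:], carried=tail_1)
```

   After page_1 no packet is complete and 765 bytes are carried over;
   page_2 then completes packet_1 and leaves nothing behind.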
| 462 | |
| 463 | 6. The Ogg page format |
| 464 | |
| 465 | A physical Ogg bitstream consists of a sequence of concatenated |
| 466 | pages. Pages are of variable size, usually 4-8 kB, maximum 65307 |
| 467 | bytes. A page header contains all the information needed to |
| 468 | demultiplex the logical bitstreams out of the physical bitstream and |
| 469 | to perform basic error recovery and landmarks for seeking. Each page |
| 470 | is a self-contained entity such that the page decode mechanism can |
| 471 | recognize, verify, and handle single pages at a time without |
| 472 | requiring the overall bitstream. |
| 473 | |
| 474 | The Ogg page header has the following format: |
| 475 | |
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | capture_pattern: Magic number for page start "OggS"           | 0-3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | version       | header_type   | granule_position              | 4-7
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               | 8-11
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               | bitstream_serial_number       | 12-15
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               | page_sequence_number          | 16-19
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               | CRC_checksum                  | 20-23
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |page_segments  | segment_table | 24-27
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                                                           | 28-
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 495 | |
| 496 | The LSb (least significant bit) comes first in the Bytes. Fields |
| 497 | with more than one byte length are encoded LSB (least significant |
| 498 | byte) first. |
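
   The fixed part of the header can be decoded with a single
   little-endian layout description.  This is a sketch using Python's
   struct module; the names FIXED_HEADER, FIELD_NAMES and
   parse_fixed_header are hypothetical, and reading granule_position
   as a signed integer is a choice made here for convenience.

```python
import struct

# "<" selects least-significant-byte-first order without padding.
# Field order follows the diagram above; the layout is 27 bytes long.
FIXED_HEADER = struct.Struct("<4sBBqIIIB")

FIELD_NAMES = (
    "capture_pattern", "version", "header_type", "granule_position",
    "bitstream_serial_number", "page_sequence_number", "CRC_checksum",
    "page_segments",
)


def parse_fixed_header(data):
    """Decode the first 27 bytes of a page into a dict of header fields."""
    fields = dict(zip(FIELD_NAMES, FIXED_HEADER.unpack_from(data)))
    if fields["capture_pattern"] != b"OggS":
        raise ValueError("capture pattern not found")
    return fields


# Hand-built header: version 0, header_type 0x02, granule position -1,
# serial number 0x12345678, sequence number 0, CRC left at 0, 1 segment.
sample = (b"OggS" + bytes([0, 0x02]) + b"\xff" * 8
          + bytes([0x78, 0x56, 0x34, 0x12]) + bytes(4) + bytes(4)
          + bytes([1]))
fields = parse_fixed_header(sample)
```

   The segment table follows directly after these 27 bytes and has
   page_segments entries of one byte each.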
| 499 | |
| 511 | The fields in the page header have the following meaning: |
| 512 | |
| 513 | 1. capture_pattern: a 4 Byte field that signifies the beginning of a |
| 514 | page. It contains the magic numbers: |
| 515 | |
| 516 | 0x4f 'O' |
| 517 | |
| 518 | 0x67 'g' |
| 519 | |
| 520 | 0x67 'g' |
| 521 | |
| 522 | 0x53 'S' |
| 523 | |
| 524 | It helps a decoder to find the page boundaries and regain |
| 525 | synchronisation after parsing a corrupted stream. Once the |
| 526 | capture pattern is found, the decoder verifies page sync and |
| 527 | integrity by computing and comparing the checksum. |
| 528 | |
| 529 | 2. stream_structure_version: 1 Byte signifying the version number of |
| 530 | the Ogg file format used in this stream (this document specifies |
| 531 | version 0). |
| 532 | |
| 533 | 3. header_type_flag: the bits in this 1 Byte field identify the |
| 534 | specific type of this page. |
| 535 | |
| 536 | * bit 0x01 |
| 537 | |
| 538 | set: page contains data of a packet continued from the previous |
| 539 | page |
| 540 | |
| 541 | unset: page contains a fresh packet |
| 542 | |
| 543 | * bit 0x02 |
| 544 | |
| 545 | set: this is the first page of a logical bitstream (bos) |
| 546 | |
| 547 | unset: this page is not a first page |
| 548 | |
| 549 | * bit 0x04 |
| 550 | |
| 551 | set: this is the last page of a logical bitstream (eos) |
| 552 | |
| 553 | unset: this page is not a last page |
| 554 | |
| 555 | 4. granule_position: an 8 Byte field containing position information. |
| 556 | For example, for an audio stream, it MAY contain the total number |
| 557 | of PCM samples encoded after including all frames finished on this |
| 558 | page. For a video stream it MAY contain the total number of video |
| 567 | frames encoded after this page. This is a hint for the decoder |
| 568 | and gives it some timing and position information. Its meaning is |
| 569 | dependent on the codec for that logical bitstream and specified in |
| 570 | a specific media mapping. A special value of -1 (in two's |
| 571 | complement) indicates that no packets finish on this page. |
| 572 | |
| 573 | 5. bitstream_serial_number: a 4 Byte field containing the unique |
| 574 | serial number by which the logical bitstream is identified. |
| 575 | |
| 576 | 6. page_sequence_number: a 4 Byte field containing the sequence |
| 577 | number of the page so the decoder can identify page loss. This |
| 578 | sequence number is increasing on each logical bitstream |
| 579 | separately. |
| 580 | |
| 581 | 7. CRC_checksum: a 4 Byte field containing a 32 bit CRC checksum of |
| 582 | the page (including header with zero CRC field and page content). |
| 583 | The generator polynomial is 0x04c11db7. |
| 584 | |
| 585 | 8. number_page_segments: 1 Byte giving the number of segment entries |
| 586 | encoded in the segment table. |
| 587 | |
| 588 | 9. segment_table: number_page_segments Bytes containing the lacing |
| 589 | values of all segments in this page. Each Byte contains one |
| 590 | lacing value. |
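
   The checksum of field 7 can be sketched as a bitwise CRC.  Only the
   generator polynomial and the zeroed CRC field are taken from the
   description above.  The initial value of zero, the most significant
   bit first processing, and the absence of a final XOR are assumptions
   based on common Ogg implementations, as are the names ogg_crc and
   page_crc.

```python
def ogg_crc(data):
    """Bitwise CRC-32 with generator polynomial 0x04c11db7."""
    crc = 0  # assumed initial value
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc  # assumed: no final XOR


def page_crc(page):
    """Checksum of a whole page with the CRC field (bytes 22-25) zeroed."""
    return ogg_crc(page[:22] + bytes(4) + page[26:])


# A tiny page: header with one segment of 3 bytes, then the content.
page = b"OggS" + bytes(22) + bytes([1]) + bytes([3]) + b"abc"
crc = page_crc(page)
stored = page[:22] + crc.to_bytes(4, "little") + page[26:]
```

   A decoder would recompute page_crc over the received page and
   compare it with the stored field; a table-driven variant is the
   usual choice when speed matters.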
| 591 | |
| 592 | The total header size in bytes is given by: |
| 593 | header_size = number_page_segments + 27 [Byte] |
| 594 | |
| 595 | The total page size in Bytes is given by: |
| 596 | page_size = header_size + sum(lacing_values: 1..number_page_segments) |
| 597 | [Byte] |
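
   The two formulas translate directly into code.  In this sketch the
   segment table is assumed to be a list of lacing values, and the
   function names are chosen to match the formulas.

```python
def header_size(segment_table):
    """27 fixed header bytes plus one byte per segment table entry."""
    return 27 + len(segment_table)


def page_size(segment_table):
    """Header size plus the sum of all lacing values in the page."""
    return header_size(segment_table) + sum(segment_table)


# The largest page: 255 segments of 255 bytes each.
largest = page_size([255] * 255)
```

   The largest case works out to 27 + 255 + 255 * 255 = 65307 bytes,
   which is the maximum page size given at the start of this section.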
| 598 | |
| 599 | 7. Security Considerations |
| 600 | |
| 601 | The Ogg encapsulation format is a container format and only |
| 602 | encapsulates content (such as Vorbis-encoded audio). It does not |
| 603 | provide for any generic encryption or signing of itself or its |
| 604 | contained content bitstreams. However, it encapsulates any kind of |
| 605 | content bitstream as long as there is a codec for it, and is thus |
| 606 | able to contain encrypted and signed content data. It is also |
| 607 | possible to add an external security mechanism that encrypts or signs |
| 608 | an Ogg physical bitstream and thus provides content confidentiality |
| 609 | and authenticity. |
| 610 | |
| 611 | As Ogg encapsulates binary data, it is possible to include executable |
| 612 | content in an Ogg bitstream. This can be an issue with applications |
| 613 | that are implemented using the Ogg format, especially when Ogg is |
| 614 | used for streaming or file transfer in a networking scenario. As |
| 623 | such, Ogg does not pose a threat there. However, an application |
| 624 | decoding Ogg and its encapsulated content bitstreams has to ensure |
| 625 | correct handling of manipulated bitstreams, of buffer overflows and |
| 626 | the like. |
| 627 | |
| 628 | 8. References |
| 629 | |
| 630 | [1] Walleij, L., "The application/ogg Media Type", RFC 3534, May |
| 631 | 2003. |
| 632 | |
| 633 | [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement |
| 634 | Levels", BCP 14, RFC 2119, March 1997. |
| 635 | |
| 679 | Appendix A. Glossary of terms and abbreviations |
| 680 | |
| 681 | bos page: The initial page (beginning of stream) of a logical |
| 682 | bitstream which contains information to identify the codec type |
| 683 | and other decoding-relevant information. |
| 684 | |
| 685 | chaining (or sequential multiplexing): Concatenation of two or more |
| 686 | complete physical Ogg bitstreams. |
| 687 | |
| 688 | eos page: The final page (end of stream) of a logical bitstream. |
| 689 | |
| 690 | granule position: An increasing position number for a specific |
| 691 | logical bitstream stored in the page header. Its meaning is |
| 692 | dependent on the codec for that logical bitstream and specified in |
| 693 | a specific media mapping. |
| 694 | |
| 695 | grouping (or concurrent multiplexing): Interleaving of pages of |
| 696 | several logical bitstreams into one complete physical Ogg |
| 697 | bitstream under the restriction that all bos pages of all grouped |
| 698 | logical bitstreams MUST appear before any data pages. |
| 699 | |
| 700 | lacing value: An entry in the segment table of a page header |
| 701 | representing the size of the related segment. |
| 702 | |
| 703 | logical bitstream: A sequence of bits being the result of an encoded |
| 704 | media stream. |
| 705 | |
| 706 | media mapping: A specific use of the Ogg encapsulation format |
| 707 | together with a specific (set of) codec(s). |
| 708 | |
| 709 | (Ogg) packet: A subpart of a logical bitstream that is created by the |
| 710 | encoder for that bitstream and represents a meaningful entity for |
| 711 | the encoder, but only a sequence of bits to the Ogg encapsulation. |
| 712 | |
| 713 | (Ogg) page: The unit from which a physical bitstream is built; each |
| 714 | page contains data of one logical bitstream only. A page usually |
| 715 | contains a group of contiguous segments of one packet only, but |
| 716 | packets that are too large need to be split over several |
| 717 | pages. |
| 718 | |
| 719 | physical (Ogg) bitstream: The sequence of bits resulting from an Ogg |
| 720 | encapsulation of one or several logical bitstreams. It consists |
| 721 | of a sequence of pages from the logical bitstreams with the |
| 722 | restriction that the pages of one logical bitstream MUST come in |
| 723 | their correct temporal order. |
| 724 | |
| 735 | (Ogg) segment: The Ogg encapsulation process splits each packet into |
| 736 | chunks of 255 bytes plus a last fractional chunk of fewer than 255 |
| 737 | bytes (possibly zero bytes). These chunks are called segments. |
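
The relationship between packets, segments and lacing values defined in this glossary can be illustrated with a short sketch. It is not taken from any reference implementation; the function names `lacing_values` and `packet_lengths` are hypothetical and chosen for illustration only. The sketch assumes that a lacing value below 255 terminates a packet, so that a packet whose length is an exact multiple of 255 ends with a lacing value of 0. It does not enforce the limit on the number of segments per page, and it ignores a packet that continues onto the next page.

```python
from typing import List


def lacing_values(packet_len: int) -> List[int]:
    """Return the lacing values for one packet of packet_len bytes."""
    if packet_len < 0:
        raise ValueError("packet length must be non-negative")
    full, last = divmod(packet_len, 255)
    # Every full segment is recorded as 255. The final value is below 255
    # (possibly 0) and marks the end of the packet.
    return [255] * full + [last]


def packet_lengths(segment_table: List[int]) -> List[int]:
    """Recover the lengths of the packets that end within a segment table."""
    lengths = []
    current = 0
    for value in segment_table:
        current += value
        if value < 255:
            # A lacing value below 255 closes the current packet.
            lengths.append(current)
            current = 0
    # Trailing 255s belong to a packet continued on a later page; this
    # sketch leaves that unfinished packet out of the result.
    return lengths


if __name__ == "__main__":
    print(lacing_values(600))
    print(packet_lengths([255, 255, 90, 255, 255, 0, 10]))
```

Under these assumptions, a 600-byte packet is described by two full segments and one 90-byte segment, and a segment table can be scanned in a single pass to find the packet boundaries.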
| 738 | |
| 739 | Appendix B. Acknowledgements |
| 740 | |
| 741 | The author gratefully acknowledges the work that Christopher |
| 742 | Montgomery and the Xiph.Org Foundation have done in defining the Ogg |
| 743 | multimedia project and, as part of it, the open file format described |
| 744 | in this document. The author hopes that providing this document to |
| 745 | the Internet community will help in promoting the Ogg multimedia |
| 746 | project at http://www.xiph.org/. Many thanks also for the many |
| 747 | technical and typo corrections that C. Montgomery and the Ogg |
| 748 | community provided as feedback to this RFC. |
| 749 | |
| 750 | Author's Address |
| 751 | |
| 752 | Silvia Pfeiffer |
| 753 | CSIRO, Australia |
| 754 | Locked Bag 17 |
| 755 | North Ryde, NSW 2113 |
| 756 | Australia |
| 757 | |
| 758 | Phone: +61 2 9325 3141 |
| 759 | EMail: Silvia.Pfeiffer@csiro.au |
| 760 | URI: http://www.cmis.csiro.au/Silvia.Pfeiffer/ |
| 761 | |
| 791 | Full Copyright Statement |
| 792 | |
| 793 | Copyright (C) The Internet Society (2003). All Rights Reserved. |
| 794 | |
| 795 | This document and translations of it may be copied and furnished to |
| 796 | others, and derivative works that comment on or otherwise explain it |
| 797 | or assist in its implementation may be prepared, copied, published |
| 798 | and distributed, in whole or in part, without restriction of any |
| 799 | kind, provided that the above copyright notice and this paragraph are |
| 800 | included on all such copies and derivative works. However, this |
| 801 | document itself may not be modified in any way, such as by removing |
| 802 | the copyright notice or references to the Internet Society or other |
| 803 | Internet organizations, except as needed for the purpose of |
| 804 | developing Internet standards in which case the procedures for |
| 805 | copyrights defined in the Internet Standards process must be |
| 806 | followed, or as required to translate it into languages other than |
| 807 | English. |
| 808 | |
| 809 | The limited permissions granted above are perpetual and will not be |
| 810 | revoked by the Internet Society or its successors or assigns. |
| 811 | |
| 812 | This document and the information contained herein is provided on an |
| 813 | "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING |
| 814 | TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING |
| 815 | BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION |
| 816 | HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF |
| 817 | MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. |
| 818 | |
| 819 | Acknowledgement |
| 820 | |
| 821 | Funding for the RFC Editor function is currently provided by the |
| 822 | Internet Society. |
| 823 | |