Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 1 | IJG JPEG LIBRARY: SYSTEM ARCHITECTURE |
| 2 | |
DRC | da13af6 | 2014-05-18 17:52:06 +0000 | [diff] [blame] | 3 | This file was part of the Independent JPEG Group's software: |
Guido Vollbeding | 5829cb2 | 2012-01-15 00:00:00 +0000 | [diff] [blame] | 4 | Copyright (C) 1991-2012, Thomas G. Lane, Guido Vollbeding. |
DRC | a73e870 | 2012-12-31 02:52:30 +0000 | [diff] [blame] | 5 | It was modified by The libjpeg-turbo Project to include only information |
| 6 | relevant to libjpeg-turbo. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 7 | For conditions of distribution and use, see the accompanying README file. |
| 8 | |
| 9 | |
| 10 | This file provides an overview of the architecture of the IJG JPEG software; |
| 11 | that is, the functions of the various modules in the system and the interfaces |
| 12 | between modules. For more precise details about any data structure or calling |
| 13 | convention, see the include files and comments in the source code. |
| 14 | |
| 15 | We assume that the reader is already somewhat familiar with the JPEG standard. |
| 16 | The README file includes references for learning about JPEG. The file |
Guido Vollbeding | 5996a25 | 2009-06-27 00:00:00 +0000 | [diff] [blame] | 17 | libjpeg.txt describes the library from the viewpoint of an application |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 18 | programmer using the library; it's best to read that file before this one. |
Guido Vollbeding | 5996a25 | 2009-06-27 00:00:00 +0000 | [diff] [blame] | 19 | Also, the file coderules.txt describes the coding style conventions we use. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 20 | |
| 21 | In this document, JPEG-specific terminology follows the JPEG standard: |
| 22 | A "component" means a color channel, e.g., Red or Luminance. |
| 23 | A "sample" is a single component value (i.e., one number in the image data). |
| 24 | A "coefficient" is a frequency coefficient (a DCT transform output number). |
| 25 | A "block" is an 8x8 group of samples or coefficients. |
| 26 | An "MCU" (minimum coded unit) is an interleaved set of blocks of size |
DRC | e5eaf37 | 2014-05-09 18:00:32 +0000 | [diff] [blame] | 27 | determined by the sampling factors, or a single block in a |
| 28 | noninterleaved scan. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 29 | We do not use the terms "pixel" and "sample" interchangeably. When we say |
| 30 | pixel, we mean an element of the full-size image, while a sample is an element |
| 31 | of the downsampled image. Thus the number of samples may vary across |
| 32 | components while the number of pixels does not. (This terminology is not used |
| 33 | rigorously throughout the code, but it is used in places where confusion would |
| 34 | otherwise result.) |
| 35 | |
| 36 | |
| 37 | *** System features *** |
| 38 | |
| 39 | The IJG distribution contains two parts: |
| 40 | * A subroutine library for JPEG compression and decompression. |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 41 | * cjpeg/djpeg, two sample applications that use the library to transform |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 42 | JFIF JPEG files to and from several other image formats. |
| 43 | cjpeg/djpeg are of no great intellectual complexity: they merely add a simple |
| 44 | command-line user interface and I/O routines for several uncompressed image |
| 45 | formats. This document concentrates on the library itself. |
| 46 | |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 47 | We desire the library to be capable of supporting all JPEG baseline, extended |
| 48 | sequential, and progressive DCT processes. Hierarchical processes are not |
| 49 | supported. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 50 | |
| 51 | The library does not support the lossless (spatial) JPEG process. Lossless |
| 52 | JPEG shares little or no code with lossy JPEG, and would normally be used |
| 53 | without the extensive pre- and post-processing provided by this library. |
| 54 | We feel that lossless JPEG is better handled by a separate library. |
| 55 | |
| 56 | Within these limits, any set of compression parameters allowed by the JPEG |
| 57 | spec should be readable for decompression. (We can be more restrictive about |
| 58 | what formats we can generate.) Although the system design allows for all |
| 59 | parameter values, some uncommon settings are not yet implemented and may |
| 60 | never be; nonintegral sampling ratios are the prime example. Furthermore, |
| 61 | we treat 8-bit vs. 12-bit data precision as a compile-time switch, not a |
| 62 | run-time option, because most machines can store 8-bit pixels much more |
| 63 | compactly than 12-bit. |
| 64 | |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 65 | By itself, the library handles only interchange JPEG datastreams --- in |
| 66 | particular the widely used JFIF file format. The library can be used by |
| 67 | surrounding code to process interchange or abbreviated JPEG datastreams that |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 68 | are embedded in more complex file formats. (For example, libtiff uses this |
| 69 | library to implement JPEG compression within the TIFF file format.) |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 70 | |
| 71 | The library includes a substantial amount of code that is not covered by the |
| 72 | JPEG standard but is necessary for typical applications of JPEG. These |
| 73 | functions preprocess the image before JPEG compression or postprocess it after |
| 74 | decompression. They include colorspace conversion, downsampling/upsampling, |
| 75 | and color quantization. This code can be omitted if not needed. |
| 76 | |
| 77 | A wide range of quality vs. speed tradeoffs are possible in JPEG processing, |
| 78 | and even more so in decompression postprocessing. The decompression library |
| 79 | provides multiple implementations that cover most of the useful tradeoffs, |
| 80 | ranging from very-high-quality down to fast-preview operation. On the |
| 81 | compression side we have generally not provided low-quality choices, since |
| 82 | compression is normally less time-critical. It should be understood that the |
| 83 | low-quality modes may not meet the JPEG standard's accuracy requirements; |
| 84 | nonetheless, they are useful for viewers. |
| 85 | |
| 86 | |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 87 | *** System overview *** |
| 88 | |
| 89 | The compressor and decompressor are each divided into two main sections: |
| 90 | the JPEG compressor or decompressor proper, and the preprocessing or |
| 91 | postprocessing functions. The interface between these two sections is the |
| 92 | image data that the official JPEG spec regards as its input or output: this |
| 93 | data is in the colorspace to be used for compression, and it is downsampled |
| 94 | to the sampling factors to be used. The preprocessing and postprocessing |
| 95 | steps are responsible for converting a normal image representation to or from |
| 96 | this form. (Those few applications that want to deal with YCbCr downsampled |
| 97 | data can skip the preprocessing or postprocessing step.) |
| 98 | |
| 99 | Looking more closely, the compressor library contains the following main |
| 100 | elements: |
| 101 | |
| 102 | Preprocessing: |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 103 | * Color space conversion (e.g., RGB to YCbCr). |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 104 | * Edge expansion and downsampling. Optionally, this step can do simple |
| 105 | smoothing --- this is often helpful for low-quality source data. |
| 106 | JPEG proper: |
| 107 | * MCU assembly, DCT, quantization. |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 108 | * Entropy coding (sequential or progressive, Huffman or arithmetic). |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 109 | |
| 110 | In addition to these modules we need overall control, marker generation, |
| 111 | and support code (memory management & error handling). There is also a |
| 112 | module responsible for physically writing the output data --- typically |
| 113 | this is just an interface to fwrite(), but some applications may need to |
| 114 | do something else with the data. |
| 115 | |
| 116 | The decompressor library contains the following main elements: |
| 117 | |
| 118 | JPEG proper: |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 119 | * Entropy decoding (sequential or progressive, Huffman or arithmetic). |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 120 | * Dequantization, inverse DCT, MCU disassembly. |
| 121 | Postprocessing: |
| 122 | * Upsampling. Optionally, this step may be able to do more general |
| 123 | rescaling of the image. |
| 124 | * Color space conversion (e.g., YCbCr to RGB). This step may also |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 125 | provide gamma adjustment [ currently it does not ]. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 126 | * Optional color quantization (e.g., reduction to 256 colors). |
| 127 | * Optional color precision reduction (e.g., 24-bit to 15-bit color). |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 128 | [This feature is not currently implemented.] |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 129 | |
| 130 | We also need overall control, marker parsing, and a data source module. |
| 131 | The support code (memory management & error handling) can be shared with |
| 132 | the compression half of the library. |
| 133 | |
| 134 | There may be several implementations of each of these elements, particularly |
| 135 | in the decompressor, where a wide range of speed/quality tradeoffs is very |
| 136 | useful. It must be understood that some of the best speedups involve |
| 137 | merging adjacent steps in the pipeline. For example, upsampling, color space |
| 138 | conversion, and color quantization might all be done at once when using a |
| 139 | low-quality ordered-dither technique. The system architecture is designed to |
| 140 | allow such merging where appropriate. |
| 141 | |
| 142 | |
| 143 | Note: it is convenient to regard edge expansion (padding to block boundaries) |
| 144 | as a preprocessing/postprocessing function, even though the JPEG spec includes |
| 145 | it in compression/decompression. We do this because downsampling/upsampling |
| 146 | can be simplified a little if they work on padded data: it's not necessary to |
| 147 | have special cases at the right and bottom edges. Therefore the interface |
| 148 | buffer is always an integral number of blocks wide and high, and we expect |
| 149 | compression preprocessing to pad the source data properly. Padding will occur |
| 150 | only to the next block (8-sample) boundary. In an interleaved-scan situation, |
| 151 | additional dummy blocks may be used to fill out MCUs, but the MCU assembly and |
| 152 | disassembly logic will create or discard these blocks internally. (This is |
| 153 | advantageous for speed reasons, since we avoid DCTing the dummy blocks. |
| 154 | It also permits a small reduction in file size, because the compressor can |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 155 | choose dummy block contents so as to minimize their size in compressed form. |
| 156 | Finally, it makes the interface buffer specification independent of whether |
| 157 | the file is actually interleaved or not.) Applications that wish to deal |
| 158 | directly with the downsampled data must provide similar buffering and padding |
| 159 | for odd-sized images. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 160 | |
| 161 | |
| 162 | *** Poor man's object-oriented programming *** |
| 163 | |
| 164 | It should be clear by now that we have a lot of quasi-independent processing |
| 165 | steps, many of which have several possible behaviors. To avoid cluttering the |
| 166 | code with lots of switch statements, we use a simple form of object-style |
| 167 | programming to separate out the different possibilities. |
| 168 | |
| 169 | For example, two different color quantization algorithms could be implemented |
| 170 | as two separate modules that present the same external interface; at runtime, |
| 171 | the calling code will access the proper module indirectly through an "object". |
| 172 | |
| 173 | We can get the limited features we need while staying within portable C. |
| 174 | The basic tool is a function pointer. An "object" is just a struct |
| 175 | containing one or more function pointer fields, each of which corresponds to |
| 176 | a method name in real object-oriented languages. During initialization we |
| 177 | fill in the function pointers with references to whichever module we have |
| 178 | determined we need to use in this run. Then invocation of the module is done |
| 179 | by indirecting through a function pointer; on most machines this is no more |
| 180 | expensive than a switch statement, which would be the only other way of |
| 181 | making the required run-time choice. The really significant benefit, of |
| 182 | course, is keeping the source code clean and well structured. |
| 183 | |
| 184 | We can also arrange to have private storage that varies between different |
| 185 | implementations of the same kind of object. We do this by making all the |
| 186 | module-specific object structs be separately allocated entities, which will |
| 187 | be accessed via pointers in the master compression or decompression struct. |
| 188 | The "public" fields or methods for a given kind of object are specified by |
| 189 | a commonly known struct. But a module's initialization code can allocate |
| 190 | a larger struct that contains the common struct as its first member, plus |
| 191 | additional private fields. With appropriate pointer casting, the module's |
| 192 | internal functions can access these private fields. (For a simple example, |
| 193 | see jdatadst.c, which implements the external interface specified by struct |
| 194 | jpeg_destination_mgr, but adds extra fields.) |
| 195 | |
| 196 | (Of course this would all be a lot easier if we were using C++, but we are |
| 197 | not yet prepared to assume that everyone has a C++ compiler.) |
| 198 | |
| 199 | An important benefit of this scheme is that it is easy to provide multiple |
| 200 | versions of any method, each tuned to a particular case. While a lot of |
| 201 | precalculation might be done to select an optimal implementation of a method, |
| 202 | the cost per invocation is constant. For example, the upsampling step might |
| 203 | have a "generic" method, plus one or more "hardwired" methods for the most |
| 204 | popular sampling factors; the hardwired methods would be faster because they'd |
| 205 | use straight-line code instead of for-loops. The cost to determine which |
| 206 | method to use is paid only once, at startup, and the selection criteria are |
| 207 | hidden from the callers of the method. |
| 208 | |
| 209 | This plan differs a little bit from usual object-oriented structures, in that |
| 210 | only one instance of each object class will exist during execution. The |
| 211 | reason for having the class structure is that on different runs we may create |
| 212 | different instances (choose to execute different modules). You can think of |
| 213 | the term "method" as denoting the common interface presented by a particular |
| 214 | set of interchangeable functions, and "object" as denoting a group of related |
| 215 | methods, or the total shared interface behavior of a group of modules. |
| 216 | |
| 217 | |
| 218 | *** Overall control structure *** |
| 219 | |
| 220 | We previously mentioned the need for overall control logic in the compression |
| 221 | and decompression libraries. In IJG implementations prior to v5, overall |
| 222 | control was mostly provided by "pipeline control" modules, which proved to be |
| 223 | large, unwieldy, and hard to understand. To improve the situation, the |
| 224 | control logic has been subdivided into multiple modules. The control modules |
| 225 | consist of: |
| 226 | |
| 227 | 1. Master control for module selection and initialization. This has two |
| 228 | responsibilities: |
| 229 | |
| 230 | 1A. Startup initialization at the beginning of image processing. |
| 231 | The individual processing modules to be used in this run are selected |
| 232 | and given initialization calls. |
| 233 | |
| 234 | 1B. Per-pass control. This determines how many passes will be performed |
| 235 | and calls each active processing module to configure itself |
| 236 | appropriately at the beginning of each pass. End-of-pass processing, |
DRC | e5eaf37 | 2014-05-09 18:00:32 +0000 | [diff] [blame] | 237 | where necessary, is also invoked from the master control module. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 238 | |
| 239 | Method selection is partially distributed, in that a particular processing |
| 240 | module may contain several possible implementations of a particular method, |
| 241 | which it will select among when given its initialization call. The master |
| 242 | control code need only be concerned with decisions that affect more than |
| 243 | one module. |
DRC | e5eaf37 | 2014-05-09 18:00:32 +0000 | [diff] [blame] | 244 | |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 245 | 2. Data buffering control. A separate control module exists for each |
| 246 | inter-processing-step data buffer. This module is responsible for |
| 247 | invoking the processing steps that write or read that data buffer. |
| 248 | |
| 249 | Each buffer controller sees the world as follows: |
| 250 | |
| 251 | input data => processing step A => buffer => processing step B => output data |
| 252 | | | | |
| 253 | ------------------ controller ------------------ |
| 254 | |
| 255 | The controller knows the dataflow requirements of steps A and B: how much data |
| 256 | they want to accept in one chunk and how much they output in one chunk. Its |
| 257 | function is to manage its buffer and call A and B at the proper times. |
| 258 | |
| 259 | A data buffer control module may itself be viewed as a processing step by a |
| 260 | higher-level control module; thus the control modules form a binary tree with |
| 261 | elementary processing steps at the leaves of the tree. |
| 262 | |
| 263 | The control modules are objects. A considerable amount of flexibility can |
| 264 | be had by replacing implementations of a control module. For example: |
| 265 | * Merging of adjacent steps in the pipeline is done by replacing a control |
| 266 | module and its pair of processing-step modules with a single processing- |
| 267 | step module. (Hence the possible merges are determined by the tree of |
| 268 | control modules.) |
| 269 | * In some processing modes, a given interstep buffer need only be a "strip" |
| 270 | buffer large enough to accommodate the desired data chunk sizes. In other |
| 271 | modes, a full-image buffer is needed and several passes are required. |
| 272 | The control module determines which kind of buffer is used and manipulates |
| 273 | virtual array buffers as needed. One or both processing steps may be |
| 274 | unaware of the multi-pass behavior. |
| 275 | |
| 276 | In theory, we might be able to make all of the data buffer controllers |
| 277 | interchangeable and provide just one set of implementations for all. In |
| 278 | practice, each one contains considerable special-case processing for its |
| 279 | particular job. The buffer controller concept should be regarded as an |
| 280 | overall system structuring principle, not as a complete description of the |
| 281 | task performed by any one controller. |
| 282 | |
| 283 | |
| 284 | *** Compression object structure *** |
| 285 | |
| 286 | Here is a sketch of the logical structure of the JPEG compression library: |
| 287 | |
| 288 | |-- Colorspace conversion |
| 289 | |-- Preprocessing controller --| |
| 290 | | |-- Downsampling |
| 291 | Main controller --| |
| 292 | | |-- Forward DCT, quantize |
| 293 | |-- Coefficient controller --| |
| 294 | |-- Entropy encoding |
| 295 | |
| 296 | This sketch also describes the flow of control (subroutine calls) during |
| 297 | typical image data processing. Each of the components shown in the diagram is |
| 298 | an "object" which may have several different implementations available. One |
| 299 | or more source code files contain the actual implementation(s) of each object. |
| 300 | |
| 301 | The objects shown above are: |
| 302 | |
| 303 | * Main controller: buffer controller for the subsampled-data buffer, which |
| 304 | holds the preprocessed input data. This controller invokes preprocessing to |
| 305 | fill the subsampled-data buffer, and JPEG compression to empty it. There is |
| 306 | usually no need for a full-image buffer here; a strip buffer is adequate. |
| 307 | |
| 308 | * Preprocessing controller: buffer controller for the downsampling input data |
| 309 | buffer, which lies between colorspace conversion and downsampling. Note |
| 310 | that a unified conversion/downsampling module would probably replace this |
| 311 | controller entirely. |
| 312 | |
| 313 | * Colorspace conversion: converts application image data into the desired |
| 314 | JPEG color space; also changes the data from pixel-interleaved layout to |
| 315 | separate component planes. Processes one pixel row at a time. |
| 316 | |
| 317 | * Downsampling: performs reduction of chroma components as required. |
| 318 | Optionally may perform pixel-level smoothing as well. Processes a "row |
| 319 | group" at a time, where a row group is defined as Vmax pixel rows of each |
| 320 | component before downsampling, and Vk sample rows afterwards (remember Vk |
| 321 | differs across components). Some downsampling or smoothing algorithms may |
| 322 | require context rows above and below the current row group; the |
| 323 | preprocessing controller is responsible for supplying these rows via proper |
| 324 | buffering. The downsampler is responsible for edge expansion at the right |
| 325 | edge (i.e., extending each sample row to a multiple of 8 samples); but the |
| 326 | preprocessing controller is responsible for vertical edge expansion (i.e., |
| 327 | duplicating the bottom sample row as needed to make a multiple of 8 rows). |
| 328 | |
| 329 | * Coefficient controller: buffer controller for the DCT-coefficient data. |
| 330 | This controller handles MCU assembly, including insertion of dummy DCT |
| 331 | blocks when needed at the right or bottom edge. When performing |
| 332 | Huffman-code optimization or emitting a multiscan JPEG file, this |
Thomas G. Lane | a8b67c4 | 1995-03-15 00:00:00 +0000 | [diff] [blame] | 333 | controller is responsible for buffering the full image. The equivalent of |
| 334 | one fully interleaved MCU row of subsampled data is processed per call, |
| 335 | even when the JPEG file is noninterleaved. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 336 | |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 337 | * Forward DCT and quantization: Perform DCT, quantize, and emit coefficients. |
| 338 | Works on one or more DCT blocks at a time. (Note: the coefficients are now |
| 339 | emitted in normal array order, which the entropy encoder is expected to |
| 340 | convert to zigzag order as necessary. Prior versions of the IJG code did |
| 341 | the conversion to zigzag order within the quantization step.) |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 342 | |
| 343 | * Entropy encoding: Perform Huffman or arithmetic entropy coding and emit the |
| 344 | coded data to the data destination module. Works on one MCU per call. |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 345 | For progressive JPEG, the same DCT blocks are fed to the entropy coder |
| 346 | during each pass, and the coder must emit the appropriate subset of |
| 347 | coefficients. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 348 | |
| 349 | In addition to the above objects, the compression library includes these |
| 350 | objects: |
| 351 | |
| 352 | * Master control: determines the number of passes required, controls overall |
| 353 | and per-pass initialization of the other modules. |
| 354 | |
| 355 | * Marker writing: generates JPEG markers (except for RSTn, which is emitted |
| 356 | by the entropy encoder when needed). |
| 357 | |
| 358 | * Data destination manager: writes the output JPEG datastream to its final |
| 359 | destination (e.g., a file). The destination manager supplied with the |
Guido Vollbeding | 5829cb2 | 2012-01-15 00:00:00 +0000 | [diff] [blame] | 360 | library knows how to write to a stdio stream or to a memory buffer; |
| 361 | for other behaviors, the surrounding application may provide its own |
| 362 | destination manager. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 363 | |
| 364 | * Memory manager: allocates and releases memory, controls virtual arrays |
| 365 | (with backing store management, where required). |
| 366 | |
| 367 | * Error handler: performs formatting and output of error and trace messages; |
| 368 | determines handling of nonfatal errors. The surrounding application may |
| 369 | override some or all of this object's methods to change error handling. |
| 370 | |
| 371 | * Progress monitor: supports output of "percent-done" progress reports. |
| 372 | This object represents an optional callback to the surrounding application: |
| 373 | if wanted, it must be supplied by the application. |
| 374 | |
| 375 | The error handler, destination manager, and progress monitor objects are |
| 376 | defined as separate objects in order to simplify application-specific |
| 377 | customization of the JPEG library. A surrounding application may override |
| 378 | individual methods or supply its own all-new implementation of one of these |
| 379 | objects. The object interfaces for these objects are therefore treated as |
| 380 | part of the application interface of the library, whereas the other objects |
| 381 | are internal to the library. |
| 382 | |
| 383 | The error handler and memory manager are shared by JPEG compression and |
| 384 | decompression; the progress monitor, if used, may be shared as well. |
| 385 | |
| 386 | |
| 387 | *** Decompression object structure *** |
| 388 | |
| 389 | Here is a sketch of the logical structure of the JPEG decompression library: |
| 390 | |
| 391 | |-- Entropy decoding |
| 392 | |-- Coefficient controller --| |
| 393 | | |-- Dequantize, Inverse DCT |
| 394 | Main controller --| |
| 395 | | |-- Upsampling |
| 396 | |-- Postprocessing controller --| |-- Colorspace conversion |
| 397 | |-- Color quantization |
| 398 | |-- Color precision reduction |
| 399 | |
| 400 | As before, this diagram also represents typical control flow. The objects |
| 401 | shown are: |
| 402 | |
| 403 | * Main controller: buffer controller for the subsampled-data buffer, which |
| 404 | holds the output of JPEG decompression proper. This controller's primary |
| 405 | task is to feed the postprocessing procedure. Some upsampling algorithms |
| 406 | may require context rows above and below the current row group; when this |
| 407 | is true, the main controller is responsible for managing its buffer so as |
| 408 | to make context rows available. In the current design, the main buffer is |
| 409 | always a strip buffer; a full-image buffer is never required. |
| 410 | |
| 411 | * Coefficient controller: buffer controller for the DCT-coefficient data. |
| 412 | This controller handles MCU disassembly, including deletion of any dummy |
| 413 | DCT blocks at the right or bottom edge. When reading a multiscan JPEG |
| 414 | file, this controller is responsible for buffering the full image. |
| 415 | (Buffering DCT coefficients, rather than samples, is necessary to support |
| 416 | progressive JPEG.) The equivalent of one fully interleaved MCU row of |
| 417 | subsampled data is processed per call, even when the source JPEG file is |
| 418 | noninterleaved. |
| 419 | |
| 420 | * Entropy decoding: Read coded data from the data source module and perform |
| 421 | Huffman or arithmetic entropy decoding. Works on one MCU per call. |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 422 | For progressive JPEG decoding, the coefficient controller supplies the prior |
| 423 | coefficients of each MCU (initially all zeroes), which the entropy decoder |
| 424 | modifies in each scan. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 425 | |
| 426 | * Dequantization and inverse DCT: like it says. Note that the coefficients |
| 427 | buffered by the coefficient controller have NOT been dequantized; we |
| 428 | merge dequantization and inverse DCT into a single step for speed reasons. |
| 429 | When scaled-down output is asked for, simplified DCT algorithms may be used |
DRC | 66a69f0 | 2012-01-31 10:19:29 +0000 | [diff] [blame] | 430 | that emit fewer samples per DCT block, not the full 8x8. Works on one DCT |
| 431 | block at a time. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 432 | |
| 433 | * Postprocessing controller: buffer controller for the color quantization |
| 434 | input buffer, when quantization is in use. (Without quantization, this |
| 435 | controller just calls the upsampler.) For two-pass quantization, this |
| 436 | controller is responsible for buffering the full-image data. |
| 437 | |
| 438 | * Upsampling: restores chroma components to full size. (May support more |
| 439 | general output rescaling, too. Note that if undersized DCT outputs have |
| 440 | been emitted by the DCT module, this module must adjust so that properly |
| 441 | sized outputs are created.) Works on one row group at a time. This module |
| 442 | also calls the color conversion module, so its top level is effectively a |
| 443 | buffer controller for the upsampling->color conversion buffer. However, in |
| 444 | all but the highest-quality operating modes, upsampling and color |
| 445 | conversion are likely to be merged into a single step. |
| 446 | |
| 447 | * Colorspace conversion: convert from JPEG color space to output color space, |
| 448 | and change data layout from separate component planes to pixel-interleaved. |
| 449 | Works on one pixel row at a time. |
| 450 | |
| 451 | * Color quantization: reduce the data to colormapped form, using either an |
| 452 | externally specified colormap or an internally generated one. This module |
| 453 | is not used for full-color output. Works on one pixel row at a time; may |
| 454 | require two passes to generate a color map. Note that the output will |
| 455 | always be a single component representing colormap indexes. In the current |
| 456 | design, the output values are JSAMPLEs, so an 8-bit compilation cannot |
| 457 | quantize to more than 256 colors. This is unlikely to be a problem in |
| 458 | practice. |
| 459 | |
| 460 | * Color reduction: this module handles color precision reduction, e.g., |
| 461 | generating 15-bit color (5 bits/primary) from JPEG's 24-bit output. |
| 462 | Not quite clear yet how this should be handled... should we merge it with |
| 463 | colorspace conversion??? |
| 464 | |
| 465 | Note that some high-speed operating modes might condense the entire |
| 466 | postprocessing sequence to a single module (upsample, color convert, and |
| 467 | quantize in one step). |
| 468 | |
| 469 | In addition to the above objects, the decompression library includes these |
| 470 | objects: |
| 471 | |
| 472 | * Master control: determines the number of passes required, controls overall |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 473 | and per-pass initialization of the other modules. This is subdivided into |
| 474 | input and output control: jdinput.c controls only input-side processing, |
| 475 | while jdmaster.c handles overall initialization and output-side control. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 476 | |
| 477 | * Marker reading: decodes JPEG markers (except for RSTn). |
| 478 | |
| 479 | * Data source manager: supplies the input JPEG datastream. The source |
Guido Vollbeding | 5829cb2 | 2012-01-15 00:00:00 +0000 | [diff] [blame] | 480 | manager supplied with the library knows how to read from a stdio stream |
| 481 | or from a memory buffer; for other behaviors, the surrounding application |
| 482 | may provide its own source manager. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 483 | |
| 484 | * Memory manager: same as for compression library. |
| 485 | |
| 486 | * Error handler: same as for compression library. |
| 487 | |
| 488 | * Progress monitor: same as for compression library. |
| 489 | |
| 490 | As with compression, the data source manager, error handler, and progress |
| 491 | monitor are candidates for replacement by a surrounding application. |
| 492 | |
| 493 | |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 494 | *** Decompression input and output separation *** |
| 495 | |
| 496 | To support efficient incremental display of progressive JPEG files, the |
| 497 | decompressor is divided into two sections that can run independently: |
| 498 | |
| 499 | 1. Data input includes marker parsing, entropy decoding, and input into the |
| 500 | coefficient controller's DCT coefficient buffer. Note that this |
| 501 | processing is relatively cheap and fast. |
| 502 | |
| 503 | 2. Data output reads from the DCT coefficient buffer and performs the IDCT |
| 504 | and all postprocessing steps. |
| 505 | |
| 506 | For a progressive JPEG file, the data input processing is allowed to get |
| 507 | arbitrarily far ahead of the data output processing. (This occurs only |
| 508 | if the application calls jpeg_consume_input(); otherwise input and output |
| 509 | run in lockstep, since the input section is called only when the output |
| 510 | section needs more data.) In this way the application can avoid making |
| 511 | extra display passes when data is arriving faster than the display pass |
| 512 | can run. Furthermore, it is possible to abort an output pass without |
| 513 | losing anything, since the coefficient buffer is read-only as far as the |
Guido Vollbeding | 5996a25 | 2009-06-27 00:00:00 +0000 | [diff] [blame] | 514 | output section is concerned. See libjpeg.txt for more detail. |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 515 | |
| 516 | A full-image coefficient array is only created if the JPEG file has multiple |
| 517 | scans (or if the application specifies buffered-image mode anyway). When |
| 518 | reading a single-scan file, the coefficient controller normally creates only |
| 519 | a one-MCU buffer, so input and output processing must run in lockstep in this |
| 520 | case. jpeg_consume_input() is effectively a no-op in this situation. |
| 521 | |
| 522 | The main impact of dividing the decompressor in this fashion is that we must |
| 523 | be very careful with shared variables in the cinfo data structure. Each |
| 524 | variable that can change during the course of decompression must be |
| 525 | classified as belonging to data input or data output, and each section must |
| 526 | look only at its own variables. For example, the data output section may not |
| 527 | depend on any of the variables that describe the current scan in the JPEG |
| 528 | file, because these may change as the data input section advances into a new |
| 529 | scan. |
| 530 | |
| 531 | The progress monitor is (somewhat arbitrarily) defined to treat input of the |
| 532 | file as one pass when buffered-image mode is not used, and to ignore data |
| 533 | input work completely when buffered-image mode is used. Note that the |
| 534 | library has no reliable way to predict the number of passes when dealing |
| 535 | with a progressive JPEG file, nor can it predict the number of output passes |
| 536 | in buffered-image mode. So the work estimate is inherently bogus anyway. |
| 537 | |
| 538 | No comparable division is currently made in the compression library, because |
| 539 | there isn't any real need for it. |
| 540 | |
| 541 | |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 542 | *** Data formats *** |
| 543 | |
| 544 | Arrays of pixel sample values use the following data structure: |
| 545 | |
DRC | e5eaf37 | 2014-05-09 18:00:32 +0000 | [diff] [blame] | 546 | typedef something JSAMPLE; a pixel component value, 0..MAXJSAMPLE |
| 547 | typedef JSAMPLE *JSAMPROW; ptr to a row of samples |
| 548 | typedef JSAMPROW *JSAMPARRAY; ptr to a list of rows |
| 549 | typedef JSAMPARRAY *JSAMPIMAGE; ptr to a list of color-component arrays |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 550 | |
| 551 | The basic element type JSAMPLE will typically be one of unsigned char, |
| 552 | (signed) char, or short. Short will be used if samples wider than 8 bits are |
| 553 | to be supported (this is a compile-time option). Otherwise, unsigned char is |
| 554 | used if possible. If the compiler only supports signed chars, then it is |
| 555 | necessary to mask off the value when reading. Thus, all reads of JSAMPLE |
| 556 | values must be coded as "GETJSAMPLE(value)", where the macro will be defined |
| 557 | as "((value) & 0xFF)" on signed-char machines and "((int) (value))" elsewhere. |
| 558 | |
| 559 | With these conventions, JSAMPLE values can be assumed to be >= 0. This helps |
| 560 | simplify correct rounding during downsampling, etc. The JPEG standard's |
| 561 | specification that sample values run from -128..127 is accommodated by |
Guido Vollbeding | 5996a25 | 2009-06-27 00:00:00 +0000 | [diff] [blame] | 562 | subtracting 128 from the sample value in the DCT step. Similarly, during |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 563 | decompression the output of the IDCT step will be immediately shifted back to |
| 564 | 0..255. (NB: different values are required when 12-bit samples are in use. |
| 565 | The code is written in terms of MAXJSAMPLE and CENTERJSAMPLE, which will be |
| 566 | defined as 255 and 128 respectively in an 8-bit implementation, and as 4095 |
| 567 | and 2048 in a 12-bit implementation.) |
| 568 | |
| 569 | We use a pointer per row, rather than a two-dimensional JSAMPLE array. This |
| 570 | choice costs only a small amount of memory and has several benefits: |
| 571 | * Code using the data structure doesn't need to know the allocated width of |
| 572 | the rows. This simplifies edge expansion/compression, since we can work |
| 573 | in an array that's wider than the logical picture width. |
| 574 | * Indexing doesn't require multiplication; this is a performance win on many |
| 575 | machines. |
| 576 | * Arrays with more than 64K total elements can be supported even on machines |
| 577 | where malloc() cannot allocate chunks larger than 64K. |
| 578 | * The rows forming a component array may be allocated at different times |
| 579 | without extra copying. This trick allows some speedups in smoothing steps |
| 580 | that need access to the previous and next rows. |
| 581 | |
| 582 | Note that each color component is stored in a separate array; we don't use the |
| 583 | traditional layout in which the components of a pixel are stored together. |
| 584 | This simplifies coding of modules that work on each component independently, |
| 585 | because they don't need to know how many components there are. Furthermore, |
| 586 | we can read or write each component to a temporary file independently, which |
| 587 | is helpful when dealing with noninterleaved JPEG files. |
| 588 | |
| 589 | In general, a specific sample value is accessed by code such as |
DRC | e5eaf37 | 2014-05-09 18:00:32 +0000 | [diff] [blame] | 590 | GETJSAMPLE(image[colorcomponent][row][col]) |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 591 | where col is measured from the image left edge, but row is measured from the |
| 592 | first sample row currently in memory. Either of the first two indexings can |
| 593 | be precomputed by copying the relevant pointer. |
| 594 | |
| 595 | |
| 596 | Since most image-processing applications prefer to work on images in which |
| 597 | the components of a pixel are stored together, the data passed to or from the |
| 598 | surrounding application uses the traditional convention: a single pixel is |
| 599 | represented by N consecutive JSAMPLE values, and an image row is an array of |
| 600 | (# of color components)*(image width) JSAMPLEs. One or more rows of data can |
| 601 | be represented by a pointer of type JSAMPARRAY in this scheme. This scheme is |
| 602 | converted to component-wise storage inside the JPEG library. (Applications |
| 603 | that want to skip JPEG preprocessing or postprocessing will have to contend |
| 604 | with component-wise storage.) |
| 605 | |
| 606 | |
| 607 | Arrays of DCT-coefficient values use the following data structure: |
| 608 | |
DRC | e5eaf37 | 2014-05-09 18:00:32 +0000 | [diff] [blame] | 609 | typedef short JCOEF; a 16-bit signed integer |
| 610 | typedef JCOEF JBLOCK[DCTSIZE2]; an 8x8 block of coefficients |
| 611 | typedef JBLOCK *JBLOCKROW; ptr to one horizontal row of 8x8 blocks |
| 612 | typedef JBLOCKROW *JBLOCKARRAY; ptr to a list of such rows |
| 613 | typedef JBLOCKARRAY *JBLOCKIMAGE; ptr to a list of color component arrays |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 614 | |
| 615 | The underlying type is at least a 16-bit signed integer; while "short" is big |
| 616 | enough on all machines of interest, on some machines it is preferable to use |
| 617 | "int" for speed reasons, despite the storage cost. Coefficients are grouped |
| 618 | into 8x8 blocks (but we always use #defines DCTSIZE and DCTSIZE2 rather than |
Thomas G. Lane | 489583f | 1996-02-07 00:00:00 +0000 | [diff] [blame] | 619 | "8" and "64"). |
| 620 | |
| 621 | The contents of a coefficient block may be in either "natural" or zigzagged |
| 622 | order, and may be true values or divided by the quantization coefficients, |
| 623 | depending on where the block is in the processing pipeline. In the current |
| 624 | library, coefficient blocks are kept in natural order everywhere; the entropy |
| 625 | codecs zigzag or dezigzag the data as it is written or read. The blocks |
| 626 | contain quantized coefficients everywhere outside the DCT/IDCT subsystems. |
| 627 | (This latter decision may need to be revisited to support variable |
| 628 | quantization a la JPEG Part 3.) |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 629 | |
| 630 | Notice that the allocation unit is now a row of 8x8 blocks, corresponding to |
| 631 | eight rows of samples. Otherwise the structure is much the same as for |
| 632 | samples, and for the same reasons. |
| 633 | |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 634 | |
| 635 | *** Suspendable processing *** |
| 636 | |
| 637 | In some applications it is desirable to use the JPEG library as an |
| 638 | incremental, memory-to-memory filter. In this situation the data source or |
| 639 | destination may be a limited-size buffer, and we can't rely on being able to |
| 640 | empty or refill the buffer at arbitrary times. Instead the application would |
| 641 | like to have control return from the library at buffer overflow/underrun, and |
| 642 | then resume compression or decompression at a later time. |
| 643 | |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 644 | This scenario is supported for simple cases. (For anything more complex, we |
| 645 | recommend that the application "bite the bullet" and develop real multitasking |
Guido Vollbeding | 5996a25 | 2009-06-27 00:00:00 +0000 | [diff] [blame] | 646 | capability.) The libjpeg.txt file goes into more detail about the usage and |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 647 | limitations of this capability; here we address the implications for library |
| 648 | structure. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 649 | |
| 650 | The essence of the problem is that the entropy codec (coder or decoder) must |
| 651 | be prepared to stop at arbitrary times. In turn, the controllers that call |
| 652 | the entropy codec must be able to stop before having produced or consumed all |
| 653 | the data that they normally would handle in one call. That part is reasonably |
| 654 | straightforward: we make the controller call interfaces include "progress |
| 655 | counters" which indicate the number of data chunks successfully processed, and |
| 656 | we require callers to test the counter rather than just assume all of the data |
| 657 | was processed. |
| 658 | |
| 659 | Rather than trying to restart at an arbitrary point, the current Huffman |
| 660 | codecs are designed to restart at the beginning of the current MCU after a |
| 661 | suspension due to buffer overflow/underrun. At the start of each call, the |
| 662 | codec's internal state is loaded from permanent storage (in the JPEG object |
| 663 | structures) into local variables. On successful completion of the MCU, the |
| 664 | permanent state is updated. (This copying is not very expensive, and may even |
| 665 | lead to *improved* performance if the local variables can be registerized.) |
| 666 | If a suspension occurs, the codec simply returns without updating the state, |
| 667 | thus effectively reverting to the start of the MCU. Note that this implies |
| 668 | leaving some data unprocessed in the source/destination buffer (ie, the |
| 669 | compressed partial MCU). The data source/destination module interfaces are |
| 670 | specified so as to make this possible. This also implies that the data buffer |
| 671 | must be large enough to hold a worst-case compressed MCU; a couple thousand |
| 672 | bytes should be enough. |
| 673 | |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 674 | In a successive-approximation AC refinement scan, the progressive Huffman |
| 675 | decoder has to be able to undo assignments of newly nonzero coefficients if it |
| 676 | suspends before the MCU is complete, since decoding requires distinguishing |
| 677 | previously-zero and previously-nonzero coefficients. This is a bit tedious |
| 678 | but probably won't have much effect on performance. Other variants of Huffman |
| 679 | decoding need not worry about this, since they will just store the same values |
| 680 | again if forced to repeat the MCU. |
| 681 | |
| 682 | This approach would probably not work for an arithmetic codec, since its |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 683 | modifiable state is quite large and couldn't be copied cheaply. Instead it |
| 684 | would have to suspend and resume exactly at the point of the buffer end. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 685 | |
| 686 | The JPEG marker reader is designed to cope with suspension at an arbitrary |
| 687 | point. It does so by backing up to the start of the marker parameter segment, |
| 688 | so the data buffer must be big enough to hold the largest marker of interest. |
| 689 | Again, a couple KB should be adequate. (A special "skip" convention is used |
| 690 | to bypass COM and APPn markers, so these can be larger than the buffer size |
| 691 | without causing problems; otherwise a 64K buffer would be needed in the worst |
| 692 | case.) |
| 693 | |
Guido Vollbeding | 5996a25 | 2009-06-27 00:00:00 +0000 | [diff] [blame] | 694 | The JPEG marker writer currently does *not* cope with suspension. |
| 695 | We feel that this is not necessary; it is much easier simply to require |
| 696 | the application to ensure there is enough buffer space before starting. (An |
| 697 | empty 2K buffer is more than sufficient for the header markers; and ensuring |
| 698 | there are a dozen or two bytes available before calling jpeg_finish_compress() |
| 699 | will suffice for the trailer.) This would not work for writing multi-scan |
| 700 | JPEG files, but we simply do not intend to support that capability with |
| 701 | suspension. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 702 | |
| 703 | |
| 704 | *** Memory manager services *** |
| 705 | |
| 706 | The JPEG library's memory manager controls allocation and deallocation of |
| 707 | memory, and it manages large "virtual" data arrays on machines where the |
| 708 | operating system does not provide virtual memory. Note that the same |
| 709 | memory manager serves both compression and decompression operations. |
| 710 | |
| 711 | In all cases, allocated objects are tied to a particular compression or |
| 712 | decompression master record, and they will be released when that master |
| 713 | record is destroyed. |
| 714 | |
| 715 | The memory manager does not provide explicit deallocation of objects. |
| 716 | Instead, objects are created in "pools" of free storage, and a whole pool |
| 717 | can be freed at once. This approach helps prevent storage-leak bugs, and |
| 718 | it speeds up operations whenever malloc/free are slow (as they often are). |
| 719 | The pools can be regarded as lifetime identifiers for objects. Two |
| 720 | pools/lifetimes are defined: |
DRC | e5eaf37 | 2014-05-09 18:00:32 +0000 | [diff] [blame] | 721 | * JPOOL_PERMANENT lasts until master record is destroyed |
| 722 | * JPOOL_IMAGE lasts until done with image (JPEG datastream) |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 723 | Permanent lifetime is used for parameters and tables that should be carried |
| 724 | across from one datastream to another; this includes all application-visible |
| 725 | parameters. Image lifetime is used for everything else. (A third lifetime, |
| 726 | JPOOL_PASS = one processing pass, was originally planned. However it was |
| 727 | dropped as not being worthwhile. The actual usage patterns are such that the |
| 728 | peak memory usage would be about the same anyway; and having per-pass storage |
| 729 | substantially complicates the virtual memory allocation rules --- see below.) |
| 730 | |
| 731 | The memory manager deals with three kinds of object: |
| 732 | 1. "Small" objects. Typically these require no more than 10K-20K total. |
| 733 | 2. "Large" objects. These may require tens to hundreds of K depending on |
| 734 | image size. Semantically they behave the same as small objects, but we |
DRC | 5033f3e | 2014-05-18 18:33:44 +0000 | [diff] [blame] | 735 | distinguish them because pool allocation heuristics may differ for large and |
| 736 | small objects (historically, large objects were also referenced by far |
| 737 | pointers on MS-DOS machines.) Note that individual "large" objects cannot |
| 738 | exceed the size allowed by type size_t, which may be 64K or less on some |
| 739 | machines. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 740 | 3. "Virtual" objects. These are large 2-D arrays of JSAMPLEs or JBLOCKs |
| 741 | (typically large enough for the entire image being processed). The |
| 742 | memory manager provides stripwise access to these arrays. On machines |
| 743 | without virtual memory, the rest of the array may be swapped out to a |
| 744 | temporary file. |
| 745 | |
| 746 | (Note: JSAMPARRAY and JBLOCKARRAY data structures are a combination of large |
| 747 | objects for the data proper and small objects for the row pointers. For |
| 748 | convenience and speed, the memory manager provides single routines to create |
| 749 | these structures. Similarly, virtual arrays include a small control block |
| 750 | and a JSAMPARRAY or JBLOCKARRAY working buffer, all created with one call.) |
| 751 | |
| 752 | In the present implementation, virtual arrays are only permitted to have image |
| 753 | lifespan. (Permanent lifespan would not be reasonable, and pass lifespan is |
| 754 | not very useful since a virtual array's raison d'etre is to store data for |
| 755 | multiple passes through the image.) We also expect that only "small" objects |
| 756 | will be given permanent lifespan, though this restriction is not required by |
| 757 | the memory manager. |
| 758 | |
| 759 | In a non-virtual-memory machine, some performance benefit can be gained by |
| 760 | making the in-memory buffers for virtual arrays be as large as possible. |
| 761 | (For small images, the buffers might fit entirely in memory, so blind |
| 762 | swapping would be very wasteful.) The memory manager will adjust the height |
| 763 | of the buffers to fit within a prespecified maximum memory usage. In order |
| 764 | to do this in a reasonably optimal fashion, the manager needs to allocate all |
| 765 | of the virtual arrays at once. Therefore, there isn't a one-step allocation |
| 766 | routine for virtual arrays; instead, there is a "request" routine that simply |
| 767 | allocates the control block, and a "realize" routine (called just once) that |
| 768 | determines space allocation and creates all of the actual buffers. The |
| 769 | realize routine must allow for space occupied by non-virtual large objects. |
| 770 | (We don't bother to factor in the space needed for small objects, on the |
| 771 | grounds that it isn't worth the trouble.) |
| 772 | |
| 773 | To support all this, we establish the following protocol for doing business |
| 774 | with the memory manager: |
| 775 | 1. Modules must request virtual arrays (which may have only image lifespan) |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 776 | during the initial setup phase, i.e., in their jinit_xxx routines. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 777 | 2. All "large" objects (including JSAMPARRAYs and JBLOCKARRAYs) must also be |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 778 | allocated during initial setup. |
| 779 | 3. realize_virt_arrays will be called at the completion of initial setup. |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 780 | The above conventions ensure that sufficient information is available |
| 781 | for it to choose a good size for virtual array buffers. |
| 782 | Small objects of any lifespan may be allocated at any time. We expect that |
| 783 | the total space used for small objects will be small enough to be negligible |
| 784 | in the realize_virt_arrays computation. |
| 785 | |
| 786 | In a virtual-memory machine, we simply pretend that the available space is |
| 787 | infinite, thus causing realize_virt_arrays to decide that it can allocate all |
| 788 | the virtual arrays as full-size in-memory buffers. The overhead of the |
| 789 | virtual-array access protocol is very small when no swapping occurs. |
| 790 | |
Thomas G. Lane | bc79e06 | 1995-08-02 00:00:00 +0000 | [diff] [blame] | 791 | A virtual array can be specified to be "pre-zeroed"; when this flag is set, |
| 792 | never-yet-written sections of the array are set to zero before being made |
| 793 | available to the caller. If this flag is not set, never-written sections |
| 794 | of the array contain garbage. (This feature exists primarily because the |
| 795 | equivalent logic would otherwise be needed in jdcoefct.c for progressive |
| 796 | JPEG mode; we may as well make it available for possible other uses.) |
| 797 | |
| 798 | The first write pass on a virtual array is required to occur in top-to-bottom |
| 799 | order; read passes, as well as any write passes after the first one, may |
| 800 | access the array in any order. This restriction exists partly to simplify |
| 801 | the virtual array control logic, and partly because some file systems may not |
| 802 | support seeking beyond the current end-of-file in a temporary file. The main |
| 803 | implication of this restriction is that rearrangement of rows (such as |
| 804 | converting top-to-bottom data order to bottom-to-top) must be handled while |
| 805 | reading data out of the virtual array, not while putting it in. |
| 806 | |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 807 | |
| 808 | *** Memory manager internal structure *** |
| 809 | |
| 810 | To isolate system dependencies as much as possible, we have broken the |
| 811 | memory manager into two parts. There is a reasonably system-independent |
| 812 | "front end" (jmemmgr.c) and a "back end" that contains only the code |
| 813 | likely to change across systems. All of the memory management methods |
| 814 | outlined above are implemented by the front end. The back end provides |
| 815 | the following routines for use by the front end (none of these routines |
| 816 | are known to the rest of the JPEG code): |
| 817 | |
DRC | e5eaf37 | 2014-05-09 18:00:32 +0000 | [diff] [blame] | 818 | jpeg_mem_init, jpeg_mem_term system-dependent initialization/shutdown |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 819 | |
DRC | e5eaf37 | 2014-05-09 18:00:32 +0000 | [diff] [blame] | 820 | jpeg_get_small, jpeg_free_small interface to malloc and free library routines |
| 821 | (or their equivalents) |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 822 | |
DRC | 5033f3e | 2014-05-18 18:33:44 +0000 | [diff] [blame] | 823 | jpeg_get_large, jpeg_free_large historically was used to interface with |
| 824 | FAR malloc/free on MS-DOS machines; now the |
| 825 | same as jpeg_get_small/jpeg_free_small |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 826 | |
DRC | e5eaf37 | 2014-05-09 18:00:32 +0000 | [diff] [blame] | 827 | jpeg_mem_available estimate available memory |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 828 | |
DRC | e5eaf37 | 2014-05-09 18:00:32 +0000 | [diff] [blame] | 829 | jpeg_open_backing_store create a backing-store object |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 830 | |
DRC | e5eaf37 | 2014-05-09 18:00:32 +0000 | [diff] [blame] | 831 | read_backing_store, manipulate a backing-store object |
Thomas G. Lane | 36a4ccc | 1994-09-24 00:00:00 +0000 | [diff] [blame] | 832 | write_backing_store, |
| 833 | close_backing_store |
| 834 | |
| 835 | On some systems there will be more than one type of backing-store object |
| 836 | (specifically, in MS-DOS a backing store file might be an area of extended |
| 837 | memory as well as a disk file). jpeg_open_backing_store is responsible for |
| 838 | choosing how to implement a given object. The read/write/close routines |
| 839 | are method pointers in the structure that describes a given object; this |
| 840 | lets them be different for different object types. |
| 841 | |
| 842 | It may be necessary to ensure that backing store objects are explicitly |
| 843 | released upon abnormal program termination. For example, MS-DOS won't free |
| 844 | extended memory by itself. To support this, we will expect the main program |
| 845 | or surrounding application to arrange to call self_destruct (typically via |
| 846 | jpeg_destroy) upon abnormal termination. This may require a SIGINT signal |
| 847 | handler or equivalent. We don't want to have the back end module install its |
| 848 | own signal handler, because that would pre-empt the surrounding application's |
| 849 | ability to control signal handling. |
| 850 | |
| 851 | The IJG distribution includes several memory manager back end implementations. |
| 852 | Usually the same back end should be suitable for all applications on a given |
| 853 | system, but it is possible for an application to supply its own back end at |
| 854 | need. |
| 855 | |
| 856 | |
| 857 | *** Implications of DNL marker *** |
| 858 | |
| 859 | Some JPEG files may use a DNL marker to postpone definition of the image |
| 860 | height (this would be useful for a fax-like scanner's output, for instance). |
| 861 | In these files the SOF marker claims the image height is 0, and you only |
| 862 | find out the true image height at the end of the first scan. |
| 863 | |
| 864 | We could read these files as follows: |
| 865 | 1. Upon seeing zero image height, replace it by 65535 (the maximum allowed). |
| 866 | 2. When the DNL is found, update the image height in the global image |
| 867 | descriptor. |
| 868 | This implies that control modules must avoid making copies of the image |
| 869 | height, and must re-test for termination after each MCU row. This would |
| 870 | be easy enough to do. |
| 871 | |
| 872 | In cases where image-size data structures are allocated, this approach will |
| 873 | result in very inefficient use of virtual memory or much-larger-than-necessary |
| 874 | temporary files. This seems acceptable for something that probably won't be a |
| 875 | mainstream usage. People might have to forgo use of memory-hogging options |
| 876 | (such as two-pass color quantization or noninterleaved JPEG files) if they |
| 877 | want efficient conversion of such files. (One could improve efficiency by |
| 878 | demanding a user-supplied upper bound for the height, less than 65536; in most |
| 879 | cases it could be much less.) |
| 880 | |
| 881 | The standard also permits the SOF marker to overestimate the image height, |
| 882 | with a DNL to give the true, smaller height at the end of the first scan. |
| 883 | This would solve the space problems if the overestimate wasn't too great. |
| 884 | However, it implies that you don't even know whether DNL will be used. |
| 885 | |
| 886 | This leads to a couple of very serious objections: |
| 887 | 1. Testing for a DNL marker must occur in the inner loop of the decompressor's |
| 888 | Huffman decoder; this implies a speed penalty whether the feature is used |
| 889 | or not. |
| 890 | 2. There is no way to hide the last-minute change in image height from an |
| 891 | application using the decoder. Thus *every* application using the IJG |
| 892 | library would suffer a complexity penalty whether it cared about DNL or |
| 893 | not. |
| 894 | We currently do not support DNL because of these problems. |
| 895 | |
| 896 | A different approach is to insist that DNL-using files be preprocessed by a |
| 897 | separate program that reads ahead to the DNL, then goes back and fixes the SOF |
| 898 | marker. This is a much simpler solution and is probably far more efficient. |
| 899 | Even if one wants piped input, buffering the first scan of the JPEG file needs |
| 900 | a lot smaller temp file than is implied by the maximum-height method. For |
| 901 | this approach we'd simply treat DNL as a no-op in the decompressor (at most, |
| 902 | check that it matches the SOF image height). |
| 903 | |
| 904 | We will not worry about making the compressor capable of outputting DNL. |
| 905 | Something similar to the first scheme above could be applied if anyone ever |
| 906 | wants to make that work. |