\section{\module{zlib} ---
         Compression compatible with \program{gzip}}

\declaremodule{builtin}{zlib}
\modulesynopsis{Low-level interface to compression and decompression
                routines compatible with \program{gzip}.}


For applications that require data compression, the functions in this
module allow compression and decompression, using the zlib library.
The zlib library has its own home page at
\url{http://www.gzip.org/zlib/}.  Version 1.1.3 is the
most recent version as of September 2000; use a later version if one
is available.  There are known incompatibilities between the Python
module and earlier versions of the zlib library.

The available exception and functions in this module are:

\begin{excdesc}{error}
Exception raised on compression and decompression errors.
\end{excdesc}


\begin{funcdesc}{adler32}{string\optional{, value}}
Computes an Adler-32 checksum of \var{string}.  (An Adler-32
checksum is almost as reliable as a CRC32 but can be computed much
more quickly.)  If \var{value} is present, it is used as the
starting value of the checksum; otherwise, a fixed default value is
used.  This allows computing a running checksum over the
concatenation of several input strings.  The algorithm is not
cryptographically strong, and should not be used for
authentication or digital signatures.  Since the algorithm is
designed for use as a checksum algorithm, it is not suitable for
use as a general hash algorithm.
\end{funcdesc}
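The running-checksum behaviour described above can be sketched as
follows.  This is a minimal illustration, assuming a Python version in
which the module operates on byte strings (hence the \code{b"..."}
literals); the sample data is made up.

```python
import zlib

# Hypothetical sample data, split into pieces as it might arrive.
chunks = [b"spam, ", b"eggs, ", b"and more spam"]

# Checksum the first piece, then pass each result back in as the
# starting value for the next piece.
running = zlib.adler32(chunks[0])
for chunk in chunks[1:]:
    running = zlib.adler32(chunk, running)

# Checksum of the concatenated data, computed in one call.
whole = zlib.adler32(b"".join(chunks))
```

Both \code{running} and \code{whole} should hold the same value, since
feeding the previous result back in continues the same checksum.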

\begin{funcdesc}{compress}{string\optional{, level}}
Compresses the data in \var{string}, returning a string containing
the compressed data.  \var{level} is an integer from \code{1} to
\code{9} controlling the level of compression; \code{1} is fastest
and produces the least compression, \code{9} is slowest and produces
the most.  The default value is \code{6}.  Raises the
\exception{error} exception if any error occurs.
\end{funcdesc}
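As a rough illustration of the \var{level} argument, the sketch below
compresses the same made-up data at the fastest, default, and highest
levels.  It assumes byte-string input; the actual sizes depend on the
data and on the zlib version, so none are shown.

```python
import zlib

# Hypothetical, highly repetitive sample data.
data = b"The quick brown fox jumps over the lazy dog. " * 200

fast = zlib.compress(data, 1)   # fastest, least compression
default = zlib.compress(data)   # default level
best = zlib.compress(data, 9)   # slowest, most compression

# Collect the compressed sizes for comparison.
sizes = {"fast": len(fast), "default": len(default), "best": len(best)}

# Whatever the level, decompressing gives back the original data.
round_trips = [zlib.decompress(blob) == data
               for blob in (fast, default, best)]
```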

\begin{funcdesc}{compressobj}{\optional{level}}
Returns a compression object, to be used for compressing data streams
that won't fit into memory at once.  \var{level} is an integer from
\code{1} to \code{9} controlling the level of compression; \code{1} is
fastest and produces the least compression, \code{9} is slowest and
produces the most.  The default value is \code{6}.
\end{funcdesc}

\begin{funcdesc}{crc32}{string\optional{, value}}
Computes a CRC (Cyclic Redundancy Check)%
\index{Cyclic Redundancy Check}
\index{checksum!Cyclic Redundancy Check}
checksum of \var{string}.  If
\var{value} is present, it is used as the starting value of the
checksum; otherwise, a fixed default value is used.  This allows
computing a running checksum over the concatenation of several
input strings.  The algorithm is not cryptographically strong, and
should not be used for authentication or digital signatures.  Since
the algorithm is designed for use as a checksum algorithm, it is not
suitable for use as a general hash algorithm.
\end{funcdesc}
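One use of the \var{value} argument is checksumming data that is read
piece by piece.  The helper \code{crc32\_of\_stream()} below is
hypothetical and not part of the module; it is a sketch that assumes a
file-like object returning byte strings, simulated here with
\code{io.BytesIO}.

```python
import io
import zlib


def crc32_of_stream(stream, chunk_size=4096):
    """Return the CRC-32 of everything readable from stream.

    Hypothetical helper for illustration; not part of zlib.
    """
    checksum = zlib.crc32(b"")  # checksum of no data at all
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Continue the running checksum with the next chunk.
        checksum = zlib.crc32(chunk, checksum)
    return checksum


payload = b"0123456789" * 1000
streamed = crc32_of_stream(io.BytesIO(payload), chunk_size=256)
direct = zlib.crc32(payload)
```

Reading in chunks keeps memory use bounded while \code{streamed}
should still equal \code{direct}.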

\begin{funcdesc}{decompress}{string\optional{, wbits\optional{, bufsize}}}
Decompresses the data in \var{string}, returning a string containing
the uncompressed data.  The \var{wbits} parameter controls the size of
the window buffer.  If \var{bufsize} is given, it is used as the
initial size of the output buffer.  Raises the \exception{error}
exception if any error occurs.

The absolute value of \var{wbits} is the base two logarithm of the
size of the history buffer (the ``window size'') used when compressing
data.  Its absolute value should be between 8 and 15 for the most
recent versions of the zlib library, larger values resulting in better
compression at the expense of greater memory usage.  The default value
is 15.  When \var{wbits} is negative, the standard
zlib header is suppressed; this is an undocumented feature
of the zlib library, used for compatibility with \program{unzip}'s
compression file format.

\var{bufsize} is the initial size of the buffer used to hold
decompressed data.  If more space is required, the buffer size will be
increased as needed, so you don't have to get this value exactly
right; tuning it will only save a few calls to \cfunction{malloc()}.  The
default size is 16384.

\end{funcdesc}
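The sketch below shows the default case, a headerless stream read with
a negative \var{wbits}, and the exception raised on bad input.  It
assumes byte-string data.  Producing the headerless stream relies on
extra \function{compressobj()} arguments (a method constant and
\var{wbits}) that are not described in this section, so treat that
part as an assumption about the installed Python version.

```python
import zlib

data = b"hello, hello, hello, hello " * 10  # hypothetical sample

# Default case: a stream with the usual header round-trips as-is.
packed = zlib.compress(data)
restored = zlib.decompress(packed)

# Headerless stream: assumes compressobj() accepts method and wbits
# arguments, which this section does not document.
raw_compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
raw = raw_compressor.compress(data) + raw_compressor.flush()

# The same negative wbits tells decompress() not to expect a header.
restored_raw = zlib.decompress(raw, -15)

# Input that is not compressed data raises the module's exception.
try:
    zlib.decompress(b"this is not compressed data")
except zlib.error:
    failed = True
else:
    failed = False
```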

\begin{funcdesc}{decompressobj}{\optional{wbits}}
Returns a decompression object, to be used for decompressing data
streams that won't fit into memory at once.  The \var{wbits}
parameter controls the size of the window buffer.
\end{funcdesc}

Compression objects support the following methods:

\begin{methoddesc}[Compress]{compress}{string}
Compress \var{string}, returning a string containing compressed data
for at least part of the data in \var{string}.  This data should be
concatenated to the output produced by any preceding calls to the
\method{compress()} method.  Some input may be kept in internal buffers
for later processing.
\end{methoddesc}

\begin{methoddesc}[Compress]{flush}{\optional{mode}}
All pending input is processed, and a string containing the remaining
compressed output is returned.  \var{mode} can be selected from the
constants \constant{Z_SYNC_FLUSH}, \constant{Z_FULL_FLUSH}, or
\constant{Z_FINISH}, defaulting to \constant{Z_FINISH}.  \constant{Z_SYNC_FLUSH} and
\constant{Z_FULL_FLUSH} allow compressing further strings of data and
are used to allow partial error recovery on decompression, while
\constant{Z_FINISH} finishes the compressed stream and
prevents compressing any more data.  After calling
\method{flush()} with \var{mode} set to \constant{Z_FINISH}, the
\method{compress()} method cannot be called again; the only realistic
action is to delete the object.
\end{methoddesc}
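The two methods above might be combined as in the following sketch,
which compresses made-up data block by block.  It assumes byte-string
input; the intermediate \constant{Z_SYNC_FLUSH} calls are optional and
are shown only to illustrate that the object stays usable afterwards.

```python
import zlib

# Hypothetical blocks of data arriving one at a time.
chunks = [b"first block of data; " * 50,
          b"second block of data; " * 50]

compressor = zlib.compressobj(9)
pieces = []
for chunk in chunks:
    # compress() may return only part of the output, or none at all.
    pieces.append(compressor.compress(chunk))
    # Z_SYNC_FLUSH emits the pending output but keeps the object usable.
    pieces.append(compressor.flush(zlib.Z_SYNC_FLUSH))

# The default mode, Z_FINISH, ends the stream; do not reuse the object.
pieces.append(compressor.flush())

stream = b"".join(pieces)
```

Concatenating every returned piece in order gives a complete stream
that \function{decompress()} can read in one call.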

Decompression objects support the following methods, and two attributes:

\begin{memberdesc}{unused_data}
A string which contains any unused data from the last string fed to
this decompression object.  If the whole string turned out to contain
compressed data, this is \code{""}, the empty string.

The only way to determine where a string of compressed data ends is by
actually decompressing it.  This means that when compressed data is
contained in part of a larger file, you can only find the end of it by
reading data and feeding it into a decompression object's
\method{decompress()} method until the \member{unused_data} attribute is
no longer the empty string.
\end{memberdesc}
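The procedure just described can be sketched as follows.  The
compressed section and the trailing bytes are made up, the data is
assumed to be byte strings, and slicing \code{blob} stands in for
reading a larger file a little at a time.

```python
import zlib

payload = b"compressed section " * 20   # hypothetical content
trailer = b"TRAILING-BYTES"             # unrelated data after it
blob = zlib.compress(payload) + trailer

decompressor = zlib.decompressobj()
output = []
step = 16       # small reads, to mimic reading from a file
position = 0

# Feed data until the object reports bytes it did not use.
while position < len(blob) and not decompressor.unused_data:
    piece = blob[position:position + step]
    output.append(decompressor.decompress(piece))
    position += step

recovered = b"".join(output)
# Everything after the compressed section: the unused bytes plus
# whatever has not been read yet.
rest = decompressor.unused_data + blob[position:]
```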

\begin{memberdesc}{unconsumed_tail}
A string that contains any data that was not consumed by the last
\method{decompress()} call because it exceeded the limit for the
uncompressed data buffer.
\end{memberdesc}

\begin{methoddesc}[Decompress]{decompress}{string\optional{, max_length}}
Decompress \var{string}, returning a string containing the
uncompressed data corresponding to at least part of the data in
\var{string}.  This data should be concatenated to the output produced
by any preceding calls to the
\method{decompress()} method.  Some of the input data may be preserved
in internal buffers for later processing.

If the optional parameter \var{max_length} is supplied then the return value
will be no longer than \var{max_length}.  This may mean that not all of the
compressed input can be processed, and any unconsumed data will be stored
in the attribute \member{unconsumed_tail}.  This string must be passed
to a subsequent call to \method{decompress()} if decompression is to
continue.  If \var{max_length} is not supplied then the whole input is
decompressed, and \member{unconsumed_tail} is an empty string.
\end{methoddesc}
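A loop along the following lines is one way to use \var{max_length}
to bound the size of each returned piece.  It is a sketch assuming
byte-string data; the limit of 4096 and the sample input are arbitrary
choices.

```python
import zlib

original = b"A" * 100000            # hypothetical, very compressible
packed = zlib.compress(original)

decompressor = zlib.decompressobj()
limit = 4096                        # arbitrary cap per call
pending = packed
pieces = []

while pending:
    # At most `limit` bytes of output are returned per call.
    pieces.append(decompressor.decompress(pending, limit))
    # Input that was not processed must be passed in again.
    pending = decompressor.unconsumed_tail

# Collect anything still held in internal buffers.
pieces.append(decompressor.flush())
result = b"".join(pieces)
```

Each piece can be handled and discarded before the next call, so a
small compressed input cannot force a very large result to be built
in a single step.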

\begin{methoddesc}[Decompress]{flush}{}
All pending input is processed, and a string containing the remaining
uncompressed output is returned.  After calling \method{flush()}, the
\method{decompress()} method cannot be called again; the only realistic
action is to delete the object.
\end{methoddesc}

\begin{seealso}
  \seemodule{gzip}{Reading and writing \program{gzip}-format files.}
  \seeurl{http://www.gzip.org/zlib/}{The zlib library home page.}
\end{seealso}