\section{\module{zlib} ---
         Compression compatible with \program{gzip}}

\declaremodule{builtin}{zlib}
\modulesynopsis{Low-level interface to compression and decompression
                routines compatible with \program{gzip}.}


For applications that require data compression, the functions in this
module allow compression and decompression, using the zlib library.
The zlib library has its own home page at \url{http://www.zlib.net}.
There are known incompatibilities between the Python module and
versions of the zlib library earlier than 1.1.3; 1.1.3 has a security
vulnerability, so we recommend using 1.1.4 or later.

zlib's functions have many options and often need to be used in a
particular order.  This documentation doesn't attempt to cover all of
the permutations; consult the zlib manual at
\url{http://www.zlib.net/manual.html} for authoritative information.

The available exception and functions in this module are:

\begin{excdesc}{error}
  Exception raised on compression and decompression errors.
\end{excdesc}


\begin{funcdesc}{adler32}{string\optional{, value}}
  Computes an Adler-32 checksum of \var{string}.  (An Adler-32
  checksum is almost as reliable as a CRC32 but can be computed much
  more quickly.)  If \var{value} is present, it is used as the
  starting value of the checksum; otherwise, a fixed default value is
  used.  This allows computing a running checksum over the
  concatenation of several input strings.  The algorithm is not
  cryptographically strong, and should not be used for
  authentication or digital signatures.  Since the algorithm is
  designed for use as a checksum algorithm, it is not suitable for
  use as a general hash algorithm.
\end{funcdesc}
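
The running-checksum use of \var{value} can be sketched as follows.
This is only an illustration: it assumes a current Python 3
interpreter, where the data arguments are bytes objects rather than
strings, and the sample data is made up.

```python
import zlib

# Assumption: Python 3, so the data is passed as bytes objects.
chunks = [b"hello, ", b"world"]

# Checksum of the concatenated data, computed in a single call.
whole = zlib.adler32(b"".join(chunks))

# Running checksum: start from the default value, then pass each
# previous result back in as the `value` argument.
running = zlib.adler32(chunks[0])
for chunk in chunks[1:]:
    running = zlib.adler32(chunk, running)

print(running == whole)
```

The two results agree, so a large input can be checksummed piece by
piece without ever holding the whole of it in memory.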

\begin{funcdesc}{compress}{string\optional{, level}}
  Compresses the data in \var{string}, returning a string containing
  compressed data.  \var{level} is an integer from \code{1} to
  \code{9} controlling the level of compression; \code{1} is fastest
  and produces the least compression, \code{9} is slowest and produces
  the most.  The default value is \code{6}.  Raises the
  \exception{error} exception if any error occurs.
\end{funcdesc}
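
A minimal sketch of one-shot compression at different levels,
assuming a current Python 3 interpreter (data passed as bytes).  The
sample data is invented for the illustration, and the sizes printed
will depend on the input and on the zlib version in use.

```python
import zlib

# Assumption: Python 3, so the data is passed as bytes objects.
data = b"spam and eggs " * 1000

fast = zlib.compress(data, 1)      # fastest, least compression
default = zlib.compress(data)      # default level
small = zlib.compress(data, 9)     # slowest, most compression

for name, blob in [("level 1", fast), ("default", default), ("level 9", small)]:
    print(name, len(blob), "bytes")

# Every level produces output that decompresses to the original data.
restored = zlib.decompress(small)

# Errors are reported through the module's `error` exception.
try:
    zlib.decompress(b"this is not compressed data")
    failed = False
except zlib.error:
    failed = True
```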

\begin{funcdesc}{compressobj}{\optional{level}}
  Returns a compression object, to be used for compressing data streams
  that won't fit into memory at once.  \var{level} is an integer from
  \code{1} to \code{9} controlling the level of compression; \code{1} is
  fastest and produces the least compression, \code{9} is slowest and
  produces the most.  The default value is \code{6}.
\end{funcdesc}

\begin{funcdesc}{crc32}{string\optional{, value}}
  Computes a CRC (Cyclic Redundancy Check)%
  \index{Cyclic Redundancy Check}
  \index{checksum!Cyclic Redundancy Check}
  checksum of \var{string}.  If
  \var{value} is present, it is used as the starting value of the
  checksum; otherwise, a fixed default value is used.  This allows
  computing a running checksum over the concatenation of several
  input strings.  The algorithm is not cryptographically strong, and
  should not be used for authentication or digital signatures.  Since
  the algorithm is designed for use as a checksum algorithm, it is not
  suitable for use as a general hash algorithm.
\end{funcdesc}
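
As a sketch of how a running CRC might be wrapped up for data that
arrives in pieces, the hypothetical helper \code{crc32\_of\_chunks()}
below (not part of this module) folds \function{crc32()} over an
iterable.  It assumes a current Python 3 interpreter, where the data
is bytes and the default starting value is \code{0}.

```python
import zlib

def crc32_of_chunks(chunks):
    """Hypothetical helper: CRC-32 of the concatenation of `chunks`."""
    # Assumption: 0 is the default starting value in current Python 3.
    crc = 0
    for chunk in chunks:
        # Feed the previous result back in as the starting value.
        crc = zlib.crc32(chunk, crc)
    return crc

pieces = [b"first block, ", b"second block, ", b"third block"]
streamed = crc32_of_chunks(pieces)
one_shot = zlib.crc32(b"".join(pieces))
print(streamed == one_shot)
```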

\begin{funcdesc}{decompress}{string\optional{, wbits\optional{, bufsize}}}
  Decompresses the data in \var{string}, returning a string containing
  the uncompressed data.  The \var{wbits} parameter controls the size of
  the window buffer.  If \var{bufsize} is given, it is used as the
  initial size of the output buffer.  Raises the \exception{error}
  exception if any error occurs.

The absolute value of \var{wbits} is the base two logarithm of the
size of the history buffer (the ``window size'') used when compressing
data.  Its absolute value should be between 8 and 15 for the most
recent versions of the zlib library, larger values resulting in better
compression at the expense of greater memory usage.  The default value
is 15.  When \var{wbits} is negative, the standard
\program{gzip} header is suppressed; this is an undocumented feature
of the zlib library, used for compatibility with \program{unzip}'s
compression file format.

\var{bufsize} is the initial size of the buffer used to hold
decompressed data.  If more space is required, the buffer size will be
increased as needed, so you don't have to get this value exactly
right; tuning it will only save a few calls to \cfunction{malloc()}.  The
default size is 16384.

\end{funcdesc}
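
The effect of a negative \var{wbits} and of \var{bufsize} can be
illustrated with the sketch below.  It assumes a current Python 3
interpreter, where data is bytes and \function{compressobj()} also
accepts \var{method} and \var{wbits} arguments after \var{level};
those extra arguments are not described in this section, so treat
their use here as an assumption.

```python
import zlib

# Assumption: Python 3, so the data is passed as bytes objects.
data = b"raw deflate example " * 50

# Default format: the compressed stream carries a header and trailer.
wrapped = zlib.compress(data)

# Headerless stream: negative wbits on the compressing side
# (assumed signature: compressobj(level, method, wbits)).
co = zlib.compressobj(6, zlib.DEFLATED, -15)
raw = co.compress(data) + co.flush()

# Each stream must be decompressed with the matching wbits value.
from_wrapped = zlib.decompress(wrapped)
from_raw = zlib.decompress(raw, -15)

# A deliberately small bufsize still works; the buffer grows as needed.
from_small_buffer = zlib.decompress(raw, -15, 64)
```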

\begin{funcdesc}{decompressobj}{\optional{wbits}}
  Returns a decompression object, to be used for decompressing data
  streams that won't fit into memory at once.  The \var{wbits}
  parameter controls the size of the window buffer.
\end{funcdesc}

Compression objects support the following methods:

\begin{methoddesc}[Compress]{compress}{string}
Compress \var{string}, returning a string containing compressed data
for at least part of the data in \var{string}.  This data should be
concatenated to the output produced by any preceding calls to the
\method{compress()} method.  Some input may be kept in internal buffers
for later processing.
\end{methoddesc}

\begin{methoddesc}[Compress]{flush}{\optional{mode}}
All pending input is processed, and a string containing the remaining
compressed output is returned.  \var{mode} can be selected from the
constants \constant{Z_SYNC_FLUSH},  \constant{Z_FULL_FLUSH},  or
\constant{Z_FINISH}, defaulting to \constant{Z_FINISH}.  \constant{Z_SYNC_FLUSH} and
\constant{Z_FULL_FLUSH} allow compressing further strings of data, while
\constant{Z_FINISH} finishes the compressed stream and
prevents compressing any more data.  After calling
\method{flush()} with \var{mode} set to \constant{Z_FINISH}, the
\method{compress()} method cannot be called again; the only realistic
action is to delete the object.
\end{methoddesc}
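
The interplay of \method{compress()} and \method{flush()} might be
sketched as below.  The generator \code{compress\_chunks()} is a
hypothetical helper written for this illustration, and the code
assumes a current Python 3 interpreter (data passed as bytes).

```python
import zlib

def compress_chunks(chunks, level=6):
    """Hypothetical helper: compress an iterable of bytes incrementally."""
    co = zlib.compressobj(level)
    for chunk in chunks:
        out = co.compress(chunk)
        if out:                  # may be empty while input is buffered
            yield out
    yield co.flush()             # Z_FINISH: emit the rest, end the stream

chunks = [b"line %d\n" % i for i in range(1000)]
compressed = b"".join(compress_chunks(chunks))
round_trip = zlib.decompress(compressed)

# Z_SYNC_FLUSH pushes out everything compressed so far but leaves the
# stream open, so a reader can decode part1 before part2 exists.
co = zlib.compressobj()
part1 = co.compress(b"first ") + co.flush(zlib.Z_SYNC_FLUSH)
part2 = co.compress(b"second") + co.flush()

do = zlib.decompressobj()
seen_first = do.decompress(part1)
seen_second = do.decompress(part2)
```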

\begin{methoddesc}[Compress]{copy}{}
Returns a copy of the compression object.  This can be used to efficiently
compress a set of data that share a common initial prefix.
\versionadded{2.5}
\end{methoddesc}
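
One way the common-prefix use of \method{copy()} could look is
sketched here.  The prefix and suffixes are invented sample data, and
the code assumes a current Python 3 interpreter (data passed as
bytes).

```python
import zlib

# Assumption: Python 3, so the data is passed as bytes objects.
prefix = b"<header>common preamble</header>" * 100
suffixes = [b"document A", b"document B"]

# Compress the shared prefix once.
co = zlib.compressobj()
head = co.compress(prefix)

results = []
for suffix in suffixes:
    # Each copy continues from the state reached after the prefix,
    # leaving the original object untouched for the next suffix.
    branch = co.copy()
    results.append(head + branch.compress(suffix) + branch.flush())

restored = [zlib.decompress(r) for r in results]
```

Each entry in \code{results} is a complete compressed stream, yet the
work of compressing the prefix was done only once.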

Decompression objects support the following methods, and two attributes:

\begin{memberdesc}{unused_data}
A string which contains any bytes past the end of the compressed data.
That is, this remains \code{""} until the last byte that contains
compression data is available.  If the whole string turned out to
contain compressed data, this is \code{""}, the empty string.

The only way to determine where a string of compressed data ends is by
actually decompressing it.  This means that when compressed data is
part of a larger file, you can only find the end of it by
reading data and feeding it followed by some non-empty string into a
decompression object's \method{decompress()} method until the
\member{unused_data} attribute is no longer the empty string.
\end{memberdesc}
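
Locating the end of embedded compressed data, as described above,
might be sketched like this.  The trailer bytes and the 4-byte read
size are arbitrary choices for the illustration, and the code assumes
a current Python 3 interpreter (data passed as bytes).

```python
import zlib

# Assumption: Python 3, so the data is passed as bytes objects.
payload = zlib.compress(b"embedded data")
blob = payload + b"TRAILER"          # compressed data inside a larger file

do = zlib.decompressobj()
pieces = []
pos = 0
# Feed small reads until bytes past the end of the stream show up.
while not do.unused_data and pos < len(blob):
    pieces.append(do.decompress(blob[pos:pos + 4]))
    pos += 4

output = b"".join(pieces)
# Everything fed so far, minus the unused bytes, was compressed data.
compressed_length = pos - len(do.unused_data)
print(compressed_length == len(payload))
```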

\begin{memberdesc}{unconsumed_tail}
A string that contains any data that was not consumed by the last
\method{decompress()} call because it exceeded the limit for the
uncompressed data buffer.  This data has not yet been seen by the zlib
machinery, so you must feed it (possibly with further data
concatenated to it) back to a subsequent \method{decompress()} method
call in order to get correct output.
\end{memberdesc}


\begin{methoddesc}[Decompress]{decompress}{string\optional{, max_length}}
Decompress \var{string}, returning a string containing the
uncompressed data corresponding to at least part of the data in
\var{string}.  This data should be concatenated to the output produced
by any preceding calls to the
\method{decompress()} method.  Some of the input data may be preserved
in internal buffers for later processing.

If the optional parameter \var{max_length} is supplied then the return value
will be no longer than \var{max_length}.  This may mean that not all of the
compressed input can be processed, and any unconsumed data will be stored
in the attribute \member{unconsumed_tail}.  This string must be passed
to a subsequent call to \method{decompress()} if decompression is to
continue.  If \var{max_length} is not supplied then the whole input is
decompressed, and \member{unconsumed_tail} is an empty string.
\end{methoddesc}
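
A sketch of decompressing with a bound on each returned piece, using
\var{max_length} together with \member{unconsumed_tail}.  The
generator \code{bounded\_decompress()} is a hypothetical helper, not
part of this module, and the code assumes a current Python 3
interpreter (data passed as bytes).

```python
import zlib

def bounded_decompress(data, max_length=1024):
    """Hypothetical helper: yield decompressed pieces of limited size."""
    do = zlib.decompressobj()
    buf = data
    while buf:
        # Each call returns at most max_length bytes of output.
        piece = do.decompress(buf, max_length)
        if piece:
            yield piece
        # Input that was not consumed must be fed back in.
        buf = do.unconsumed_tail
    # Collect anything still pending; this final piece is not bounded.
    tail = do.flush()
    if tail:
        yield tail

original = b"0123456789" * 10000
pieces = list(bounded_decompress(zlib.compress(original), 1024))
restored = b"".join(pieces)
```

Bounding the output this way is useful when the uncompressed size is
not known in advance and memory use per step has to stay small.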

\begin{methoddesc}[Decompress]{flush}{\optional{length}}
All pending input is processed, and a string containing the remaining
uncompressed output is returned.  After calling \method{flush()}, the
\method{decompress()} method cannot be called again; the only realistic
action is to delete the object.

The optional parameter \var{length} sets the initial size of the
output buffer.
\end{methoddesc}

\begin{methoddesc}[Decompress]{copy}{}
Returns a copy of the decompression object.  This can be used to save the
state of the decompressor midway through the data stream in order to speed up
random seeks into the stream at a future point.
\versionadded{2.5}
\end{methoddesc}
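
Saving decompressor state with \method{copy()} might look like the
sketch below.  The choice of the halfway point as the checkpoint is
arbitrary, and the code assumes a current Python 3 interpreter (data
passed as bytes).

```python
import zlib

# Assumption: Python 3, so the data is passed as bytes objects.
data = bytes(range(256)) * 200
compressed = zlib.compress(data)
half = len(compressed) // 2

do = zlib.decompressobj()
first = do.decompress(compressed[:half])

# Snapshot the state after `half` compressed bytes have been fed in.
checkpoint = do.copy()

# Finish the stream with the original object...
rest_a = do.decompress(compressed[half:]) + do.flush()

# ...and later resume from the checkpoint without re-reading the
# first half of the compressed stream.
rest_b = checkpoint.decompress(compressed[half:]) + checkpoint.flush()
```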

\begin{seealso}
  \seemodule{gzip}{Reading and writing \program{gzip}-format files.}
  \seeurl{http://www.zlib.net}{The zlib library home page.}
  \seeurl{http://www.zlib.net/manual.html}{The zlib manual explains
    the semantics and usage of the library's many functions.}
\end{seealso}