
:mod:`zlib` --- Compression compatible with :program:`gzip`
===========================================================

.. module:: zlib
   :synopsis: Low-level interface to compression and decompression routines compatible with
              gzip.


For applications that require data compression, the functions in this module
allow compression and decompression, using the zlib library. The zlib library
has its own home page at http://www.zlib.net. There are known
incompatibilities between the Python module and versions of the zlib library
earlier than 1.1.3; 1.1.3 has a security vulnerability, so we recommend using
1.1.4 or later.

zlib's functions have many options and often need to be used in a particular
order. This documentation doesn't attempt to cover all of the permutations;
consult the zlib manual at http://www.zlib.net/manual.html for authoritative
information.

For reading and writing ``.gz`` files see the :mod:`gzip` module.

The available exception and functions in this module are:


.. exception:: error

   Exception raised on compression and decompression errors.


.. function:: adler32(data[, value])

   Computes an Adler-32 checksum of *data*. (An Adler-32 checksum is almost as
   reliable as a CRC32 but can be computed much more quickly.) If *value* is
   present, it is used as the starting value of the checksum; otherwise, a fixed
   default value is used. This allows computing a running checksum over the
   concatenation of several inputs. The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures. Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.

   This function always returns an integer object.

   .. note::
      To generate the same numeric value across all Python versions and
      platforms, use ``adler32(data) & 0xffffffff``. If you are only using
      the checksum in packed binary format, this is not necessary, as the
      return value is the correct 32-bit binary representation
      regardless of sign.

   .. versionchanged:: 2.6
      The return value is in the range [-2**31, 2**31-1]
      regardless of platform. In older versions the value is
      signed on some platforms and unsigned on others.

   .. versionchanged:: 3.0
      The return value is unsigned and in the range [0, 2**32-1]
      regardless of platform.

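As a minimal sketch of the running checksum described for :func:`adler32`
(the sample byte strings are made up for illustration), the result of one call
can be passed as *value* to the next call, and the final result masked so that
it is the same unsigned number on every version:

```python
import zlib

part1 = b"hello, "
part2 = b"world"

# Checksum of the whole input computed in a single call.
whole = zlib.adler32(part1 + part2) & 0xffffffff

# Running checksum: the previous result becomes the starting value.
running = zlib.adler32(part1)
running = zlib.adler32(part2, running) & 0xffffffff

# Both approaches describe the same concatenated input.
assert running == whole
```
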

.. function:: compress(string[, level])

   Compresses the data in *string*, returning a string containing compressed data.
   *level* is an integer from ``0`` to ``9`` controlling the level of compression;
   ``1`` is fastest and produces the least compression, ``9`` is slowest and
   produces the most. ``0`` is no compression. The default value is ``6``.
   Raises the :exc:`error` exception if any error occurs.

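The effect of *level* can be checked with a short sketch such as the following.
The sample data is invented, and the exact output sizes depend on the zlib
build, so only relationships that follow from the description above are
asserted:

```python
import zlib

data = b"spam and eggs " * 1000

fast = zlib.compress(data, 1)    # fastest, least compression
best = zlib.compress(data, 9)    # slowest, most compression
stored = zlib.compress(data, 0)  # no compression at all

# Every level round-trips to the original input.
for blob in (fast, best, stored):
    assert zlib.decompress(blob) == data

# Level 0 only wraps the data in the stream format, so the result is
# slightly larger than the input; level 1 already shrinks this input.
assert len(stored) > len(data)
assert len(fast) < len(data)
```
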

.. function:: compressobj([level[, method[, wbits[, memlevel[, strategy]]]]])

   Returns a compression object, to be used for compressing data streams that won't
   fit into memory at once. *level* is an integer from ``0`` to ``9`` or ``-1``,
   controlling the level of compression; ``1`` is fastest and produces the least
   compression, ``9`` is slowest and produces the most. ``0`` is no compression.
   The default value is ``-1`` (``Z_DEFAULT_COMPRESSION``), which represents a
   default compromise between speed and compression (currently equivalent to
   level 6).

   *method* is the compression algorithm. Currently, the only supported value is
   ``DEFLATED``.

   The *wbits* argument controls the size of the history buffer (or the
   "window size") used when compressing data, and whether a header and
   trailer are included in the output. It can take several ranges of values.
   The default is 15.

   * +9 to +15: The base-two logarithm of the window size, which
     therefore ranges between 512 and 32768. Larger values produce
     better compression at the expense of greater memory usage. The
     resulting output will include a zlib-specific header and trailer.

   * −9 to −15: Uses the absolute value of *wbits* as the
     window size logarithm, while producing a raw output stream with no
     header or trailing checksum.

   * +25 to +31 = 16 + (9 to 15): Uses the low 4 bits of the value as the
     window size logarithm, while including a basic :program:`gzip` header
     and trailing checksum in the output.

   *memlevel* controls the amount of memory used for internal compression state.
   Valid values range from ``1`` to ``9``. Higher values use more memory,
   but are faster and produce smaller output. The default is 8.

   *strategy* is used to tune the compression algorithm. Possible values are
   ``Z_DEFAULT_STRATEGY``, ``Z_FILTERED``, and ``Z_HUFFMAN_ONLY``. The default
   is ``Z_DEFAULT_STRATEGY``.

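The three *wbits* ranges listed for :func:`compressobj` can be compared with a
sketch like the one below. The helper name ``deflate`` and the sample data are
illustrative assumptions; the arguments are passed positionally in the order
*level*, *method*, *wbits*:

```python
import zlib

data = b"abc" * 100

def deflate(payload, wbits):
    """Compress *payload* in one go using the given *wbits* value."""
    c = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return c.compress(payload) + c.flush()

zlib_stream = deflate(data, 15)       # zlib header and trailer
raw_stream = deflate(data, -15)       # raw stream, no header or trailer
gzip_stream = deflate(data, 16 + 15)  # gzip header and trailing checksum

# A gzip stream starts with the magic bytes 0x1f 0x8b.
assert gzip_stream[:2] == b"\x1f\x8b"

# Each format has to be decompressed with a matching wbits value.
assert zlib.decompress(zlib_stream, 15) == data
assert zlib.decompress(raw_stream, -15) == data
assert zlib.decompress(gzip_stream, 16 + 15) == data
```
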

.. function:: crc32(data[, value])

   .. index::
      single: Cyclic Redundancy Check
      single: checksum; Cyclic Redundancy Check

   Computes a CRC (Cyclic Redundancy Check) checksum of *data*. If *value* is
   present, it is used as the starting value of the checksum; otherwise, a fixed
   default value is used. This allows computing a running checksum over the
   concatenation of several inputs. The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures. Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.

   This function always returns an integer object.

   .. note::
      To generate the same numeric value across all Python versions and
      platforms, use ``crc32(data) & 0xffffffff``. If you are only using
      the checksum in packed binary format, this is not necessary, as the
      return value is the correct 32-bit binary representation
      regardless of sign.

   .. versionchanged:: 2.6
      The return value is in the range [-2**31, 2**31-1]
      regardless of platform. In older versions the value would be
      signed on some platforms and unsigned on others.

   .. versionchanged:: 3.0
      The return value is unsigned and in the range [0, 2**32-1]
      regardless of platform.

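A running CRC over several inputs, as described for :func:`crc32`, might be
wrapped in a small helper like this one. The name ``crc32_of_chunks`` is
hypothetical, and the helper starts from ``0``, which is the value
:func:`crc32` uses when *value* is omitted:

```python
import zlib

def crc32_of_chunks(chunks):
    """Return the unsigned CRC-32 of the concatenation of *chunks*."""
    value = 0
    for chunk in chunks:
        # Feed the previous result back in as the starting value.
        value = zlib.crc32(chunk, value)
    # Mask so the result is the same unsigned number on every version.
    return value & 0xffffffff

checksum = crc32_of_chunks([b"1234", b"56789"])
assert checksum == zlib.crc32(b"123456789") & 0xffffffff
```
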

.. function:: decompress(string[, wbits[, bufsize]])

   Decompresses the data in *string*, returning a string containing the
   uncompressed data. The *wbits* parameter depends on
   the format of *string*, and is discussed further below.
   If *bufsize* is given, it is used as the initial size of the output
   buffer. Raises the :exc:`error` exception if any error occurs.

   .. _decompress-wbits:

   The *wbits* parameter controls the size of the history buffer
   (or "window size"), and what header and trailer format is expected.
   It is similar to the parameter for :func:`compressobj`, but accepts
   more ranges of values:

   * +8 to +15: The base-two logarithm of the window size. The input
     must include a zlib header and trailer.

   * 0: Automatically determine the window size from the zlib header.
     Only supported since zlib 1.2.3.5.

   * −8 to −15: Uses the absolute value of *wbits* as the window size
     logarithm. The input must be a raw stream with no header or trailer.

   * +24 to +31 = 16 + (8 to 15): Uses the low 4 bits of the value as
     the window size logarithm. The input must include a gzip header and
     trailer.

   * +40 to +47 = 32 + (8 to 15): Uses the low 4 bits of the value as
     the window size logarithm, and automatically accepts either
     the zlib or gzip format.

   When decompressing a stream, the window size must not be smaller
   than the size originally used to compress the stream; using a too-small
   value may result in an :exc:`error` exception. The default *wbits* value
   is 15, which corresponds to the largest window size and requires a zlib
   header and trailer to be included.

   *bufsize* is the initial size of the buffer used to hold decompressed data. If
   more space is required, the buffer size will be increased as needed, so you
   don't have to get this value exactly right; tuning it will only save a few calls
   to :c:func:`malloc`. The default size is 16384.

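The following sketch illustrates how the *wbits* value passed to
:func:`decompress` has to match the input format. The sample data is made up,
and the gzip-framed input is produced with :func:`compressobj` purely for the
sake of the example:

```python
import zlib

data = b"the quick brown fox " * 50

zlib_blob = zlib.compress(data)
c = zlib.compressobj(6, zlib.DEFLATED, 16 + 15)
gzip_blob = c.compress(data) + c.flush()

# 32 + 15 accepts either the zlib or the gzip format.
assert zlib.decompress(zlib_blob, 32 + 15) == data
assert zlib.decompress(gzip_blob, 32 + 15) == data

# The default wbits value (15) requires a zlib header, so gzip input
# is rejected with the module's error exception.
try:
    zlib.decompress(gzip_blob)
except zlib.error:
    rejected = True
else:
    rejected = False
assert rejected
```
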

.. function:: decompressobj([wbits])

   Returns a decompression object, to be used for decompressing data streams that
   won't fit into memory at once.

   The *wbits* parameter controls the size of the history buffer (or the
   "window size"), and what header and trailer format is expected. It has
   the same meaning as `described for decompress() <#decompress-wbits>`__.

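A decompression object is typically fed one piece of input at a time. The
generator below is a sketch of that pattern, not a definitive implementation:
the name ``decompress_stream`` is hypothetical, and the default of ``32 + 15``
(accept zlib or gzip framing) is an assumption chosen for the example:

```python
import zlib

def decompress_stream(chunks, wbits=32 + 15):
    """Yield decompressed pieces for an iterable of compressed chunks."""
    d = zlib.decompressobj(wbits)
    for chunk in chunks:
        piece = d.decompress(chunk)
        if piece:
            yield piece
    # Collect whatever is still held in internal buffers.
    tail = d.flush()
    if tail:
        yield tail

data = b"streaming example " * 200
blob = zlib.compress(data)
chunks = [blob[i:i + 7] for i in range(0, len(blob), 7)]
assert b"".join(decompress_stream(chunks)) == data
```
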
Compression objects support the following methods:


.. method:: Compress.compress(string)

   Compress *string*, returning a string containing compressed data for at least
   part of the data in *string*. This data should be concatenated to the output
   produced by any preceding calls to the :meth:`compress` method. Some input may
   be kept in internal buffers for later processing.

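Because some input may stay in internal buffers, the output of every
:meth:`compress` call has to be concatenated and followed by a final
:meth:`flush`. A minimal sketch, with the hypothetical helper name
``compress_stream``:

```python
import zlib

def compress_stream(chunks, level=6):
    """Yield compressed pieces for an iterable of input chunks."""
    c = zlib.compressobj(level)
    for chunk in chunks:
        piece = c.compress(chunk)
        # compress() may return an empty string while it buffers input.
        if piece:
            yield piece
    # flush() emits everything that is still buffered and ends the stream.
    yield c.flush()

parts = [b"alpha " * 100, b"beta " * 100, b"gamma " * 100]
blob = b"".join(compress_stream(parts))
assert zlib.decompress(blob) == b"".join(parts)
```
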

.. method:: Compress.flush([mode])

   All pending input is processed, and a string containing the remaining compressed
   output is returned. *mode* can be selected from the constants
   :const:`Z_SYNC_FLUSH`, :const:`Z_FULL_FLUSH`, or :const:`Z_FINISH`,
   defaulting to :const:`Z_FINISH`. :const:`Z_SYNC_FLUSH` and
   :const:`Z_FULL_FLUSH` allow compressing further strings of data, while
   :const:`Z_FINISH` finishes the compressed stream and prevents compressing any
   more data. After calling :meth:`flush` with *mode* set to :const:`Z_FINISH`,
   the :meth:`compress` method cannot be called again; the only realistic action is
   to delete the object.

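The difference between the flush modes can be seen in a sketch like the
following, where the two messages are invented sample data. After a
:const:`Z_SYNC_FLUSH` the receiver can decode everything sent so far, and the
compressor remains usable:

```python
import zlib

c = zlib.compressobj()
d = zlib.decompressobj()

# Z_SYNC_FLUSH pushes out all pending output but keeps the stream open.
first = c.compress(b"first message") + c.flush(zlib.Z_SYNC_FLUSH)
assert d.decompress(first) == b"first message"

# More data can still be compressed; the default mode (Z_FINISH)
# then terminates the stream.
second = c.compress(b" second message") + c.flush()
assert d.decompress(second) == b" second message"

# The concatenated pieces form one complete zlib stream.
assert zlib.decompress(first + second) == b"first message second message"
```
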

.. method:: Compress.copy()

   Returns a copy of the compression object. This can be used to efficiently
   compress a set of data that share a common initial prefix.

   .. versionadded:: 2.5

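The shared-prefix use of :meth:`Compress.copy` could look roughly like this.
The prefix, the record contents and the helper name ``finish`` are all
illustrative assumptions:

```python
import zlib

prefix = b"HEADER: common to every record\n" * 20

# Compress the shared prefix once.
base = zlib.compressobj()
head = base.compress(prefix)

def finish(tail):
    """Complete a stream that starts with the shared prefix."""
    c = base.copy()  # the copy carries the state reached after the prefix
    return head + c.compress(tail) + c.flush()

record_a = finish(b"record A")
record_b = finish(b"record B")

assert zlib.decompress(record_a) == prefix + b"record A"
assert zlib.decompress(record_b) == prefix + b"record B"
```

Each call to ``finish`` works on its own copy, so ``base`` itself is never
finished and can be reused for any number of records.
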
Decompression objects support the following methods, and two attributes:


.. attribute:: Decompress.unused_data

   A string which contains any bytes past the end of the compressed data. That is,
   this remains ``""`` until the last byte that contains compression data is
   available. If the whole string turned out to contain compressed data, this is
   ``""``, the empty string.

   The only way to determine where a string of compressed data ends is by actually
   decompressing it. This means that when compressed data is contained as part of a
   larger file, you can only find the end of it by reading data and feeding it
   followed by some non-empty string into a decompression object's
   :meth:`decompress` method until the :attr:`unused_data` attribute is no longer
   the empty string.

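Finding the end of compressed data embedded in a larger string, as described
for :attr:`unused_data`, might be sketched as follows. The helper name
``split_stream`` and the tiny chunk size are assumptions made for the example:

```python
import zlib

def split_stream(blob, chunk_size=8):
    """Return (decompressed data, bytes following the compressed stream)."""
    d = zlib.decompressobj()
    out = []
    for start in range(0, len(blob), chunk_size):
        out.append(d.decompress(blob[start:start + chunk_size]))
        if d.unused_data:
            # The stream has ended; everything after it is not compressed.
            rest = d.unused_data + blob[start + chunk_size:]
            return b"".join(out), rest
    return b"".join(out) + d.flush(), b""

payload = zlib.compress(b"compressed part")
data, rest = split_stream(payload + b"TRAILING BYTES")
assert data == b"compressed part"
assert rest == b"TRAILING BYTES"
```
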

.. attribute:: Decompress.unconsumed_tail

   A string that contains any data that was not consumed by the last
   :meth:`decompress` call because it exceeded the limit for the uncompressed data
   buffer. This data has not yet been seen by the zlib machinery, so you must feed
   it (possibly with further data concatenated to it) back to a subsequent
   :meth:`decompress` method call in order to get correct output.


.. method:: Decompress.decompress(string[, max_length])

   Decompress *string*, returning a string containing the uncompressed data
   corresponding to at least part of the data in *string*. This data should be
   concatenated to the output produced by any preceding calls to the
   :meth:`decompress` method. Some of the input data may be preserved in internal
   buffers for later processing.

   If the optional parameter *max_length* is non-zero then the return value will be
   no longer than *max_length*. This may mean that not all of the compressed input
   can be processed, and any unconsumed data will be stored in the attribute
   :attr:`unconsumed_tail`. This string must be passed to a subsequent call to
   :meth:`decompress` if decompression is to continue. If *max_length* is not
   supplied then the whole input is decompressed, and :attr:`unconsumed_tail` is an
   empty string.

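The interplay of *max_length* and :attr:`unconsumed_tail` can be sketched as a
loop that keeps feeding the unconsumed input back in. The generator name
``bounded_decompress`` and the limit of 1024 are illustrative assumptions:

```python
import zlib

def bounded_decompress(blob, max_length=1024):
    """Yield decompressed pieces of at most *max_length* bytes each."""
    d = zlib.decompressobj()
    pending = blob
    while pending:
        piece = d.decompress(pending, max_length)
        # Input that did not fit under the limit must be passed in again.
        pending = d.unconsumed_tail
        if piece:
            yield piece
    # flush() is not limited by max_length; it returns what is left.
    tail = d.flush()
    if tail:
        yield tail

data = b"x" * 100000
pieces = list(bounded_decompress(zlib.compress(data)))
assert b"".join(pieces) == data
```

This bounds the memory used per step when the input is small but expands to a
very large output.
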

.. method:: Decompress.flush([length])

   All pending input is processed, and a string containing the remaining
   uncompressed output is returned. After calling :meth:`flush`, the
   :meth:`decompress` method cannot be called again; the only realistic action is
   to delete the object.

   The optional parameter *length* sets the initial size of the output buffer.


.. method:: Decompress.copy()

   Returns a copy of the decompression object. This can be used to save the state
   of the decompressor midway through the data stream in order to speed up random
   seeks into the stream at a future point.

   .. versionadded:: 2.5

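Saving the decompressor state midway through a stream with
:meth:`Decompress.copy` could be sketched like this. The sample data and the
choice of the halfway point as a checkpoint are assumptions for illustration:

```python
import zlib

data = b"0123456789" * 1000
blob = zlib.compress(data)
half = len(blob) // 2

d = zlib.decompressobj()
first = d.decompress(blob[:half])
checkpoint = d.copy()  # remember the state reached at this point

# Continue with the original object ...
rest_a = d.decompress(blob[half:]) + d.flush()
# ... and later resume from the checkpoint without re-reading the start.
rest_b = checkpoint.decompress(blob[half:]) + checkpoint.flush()

assert first + rest_a == data
assert rest_a == rest_b
```
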

.. seealso::

   Module :mod:`gzip`
      Reading and writing :program:`gzip`\ -format files.

   http://www.zlib.net
      The zlib library home page.

   http://www.zlib.net/manual.html
      The zlib manual explains the semantics and usage of the library's many
      functions.
