:mod:`zlib` --- Compression compatible with :program:`gzip`
===========================================================

.. module:: zlib
   :synopsis: Low-level interface to compression and decompression routines
              compatible with gzip.


For applications that require data compression, the functions in this module
allow compression and decompression, using the zlib library. The zlib library
has its own home page at http://www.zlib.net. There are known
incompatibilities between the Python module and versions of the zlib library
earlier than 1.1.3; 1.1.3 has a security vulnerability, so we recommend using
1.1.4 or later.

zlib's functions have many options and often need to be used in a particular
order. This documentation doesn't attempt to cover all of the permutations;
consult the zlib manual at http://www.zlib.net/manual.html for authoritative
information.

For reading and writing ``.gz`` files see the :mod:`gzip` module.

The available exception and functions in this module are:


.. exception:: error

   Exception raised on compression and decompression errors.


.. function:: adler32(data[, value])

   Computes an Adler-32 checksum of *data*. (An Adler-32 checksum is almost as
   reliable as a CRC32 but can be computed much more quickly.) If *value* is
   present, it is used as the starting value of the checksum; otherwise, a fixed
   default value is used. This allows computing a running checksum over the
   concatenation of several inputs. The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures. Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.

   Always returns an unsigned 32-bit integer.

   .. note::
      To generate the same numeric value across all Python versions and
      platforms use ``adler32(data) & 0xffffffff``. If you are only using
      the checksum in packed binary format this is not necessary as the
      return value is the correct 32-bit binary representation
      regardless of sign.

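The running-checksum behaviour of :func:`adler32` can be sketched as follows.
This is only an illustration: the chunk contents and the names ``chunks``,
``running`` and ``whole`` are made up for the example.

```python
import zlib

# Hypothetical input, split into pieces as it might arrive from a stream.
chunks = [b"hello, ", b"zlib ", b"world"]

# Start from the checksum of empty input, then pass each result back in
# as the starting value for the next chunk.
running = zlib.adler32(b"")
for chunk in chunks:
    running = zlib.adler32(chunk, running)

# Checksum of the same bytes computed in a single call.
whole = zlib.adler32(b"".join(chunks))

# Mask as the note above suggests, so the value is the same everywhere.
running &= 0xffffffff
whole &= 0xffffffff
```

Both ``running`` and ``whole`` hold the same value, so the data never has to be
held in memory all at once to be checksummed.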

.. function:: compress(data[, level])

   Compresses the bytes in *data*, returning a bytes object containing compressed data.
   *level* is an integer from ``1`` to ``9`` controlling the level of compression;
   ``1`` is fastest and produces the least compression, ``9`` is slowest and
   produces the most. The default value is ``6``. Raises the :exc:`error`
   exception if any error occurs.

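A minimal sketch of one-shot compression at different levels. The sample data
and variable names are assumptions chosen for the example; the sizes obtained
depend on the input and on the zlib version in use.

```python
import zlib

# Hypothetical, highly repetitive sample data.
data = b"spam and eggs " * 1000

fastest = zlib.compress(data, 1)   # least effort
default = zlib.compress(data)      # default level
smallest = zlib.compress(data, 9)  # most effort

# Report the size obtained at each level.
for label, blob in [("level 1", fastest), ("default", default), ("level 9", smallest)]:
    print(label, len(blob), "bytes")

# Whatever the level, decompress() recovers the original bytes.
restored = zlib.decompress(smallest)
```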

.. function:: compressobj([level])

   Returns a compression object, to be used for compressing data streams that won't
   fit into memory at once. *level* is an integer from ``1`` to ``9`` controlling
   the level of compression; ``1`` is fastest and produces the least compression,
   ``9`` is slowest and produces the most. The default value is ``6``.


.. function:: crc32(data[, value])

   .. index::
      single: Cyclic Redundancy Check
      single: checksum; Cyclic Redundancy Check

   Computes a CRC (Cyclic Redundancy Check) checksum of *data*. If *value* is
   present, it is used as the starting value of the checksum; otherwise, a fixed
   default value is used. This allows computing a running checksum over the
   concatenation of several inputs. The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures. Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.

   Always returns an unsigned 32-bit integer.

   .. note::
      To generate the same numeric value across all Python versions and
      platforms use ``crc32(data) & 0xffffffff``. If you are only using
      the checksum in packed binary format this is not necessary as the
      return value is the correct 32-bit binary representation
      regardless of sign.

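As a sketch of how the *value* argument of :func:`crc32` might be used, the
helper below checksums a file-like object a few bytes at a time. The function
name ``stream_crc32`` and its ``chunk_size`` parameter are hypothetical and not
part of this module.

```python
import io
import zlib


def stream_crc32(fileobj, chunk_size=4):
    """Return the CRC-32 of everything readable from *fileobj*."""
    value = 0  # starting value that crc32() uses when none is given
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # Continue the checksum from the previous chunk's result.
        value = zlib.crc32(chunk, value)
    # Mask so the result is the same on every version and platform.
    return value & 0xffffffff


checksum = stream_crc32(io.BytesIO(b"123456789"))
```

Here ``checksum`` equals ``zlib.crc32(b"123456789") & 0xffffffff``, even though
the input was read in pieces.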

.. function:: decompress(data[, wbits[, bufsize]])

   Decompresses the bytes in *data*, returning a bytes object containing the
   uncompressed data. The *wbits* parameter controls the size of the window
   buffer, and is discussed further below.
   If *bufsize* is given, it is used as the initial size of the output
   buffer. Raises the :exc:`error` exception if any error occurs.

   The absolute value of *wbits* is the base two logarithm of the size of the
   history buffer (the "window size") used when compressing data. Its absolute
   value should be between 8 and 15 for the most recent versions of the zlib
   library, larger values resulting in better compression at the expense of greater
   memory usage. When decompressing a stream, *wbits* must not be smaller
   than the size originally used to compress the stream; using a too-small
   value will result in an exception. The default value is therefore the
   highest value, 15. When *wbits* is negative, the standard zlib header and
   trailer are not expected, and *data* is treated as a raw deflate stream.

   *bufsize* is the initial size of the buffer used to hold decompressed data. If
   more space is required, the buffer size will be increased as needed, so you
   don't have to get this value exactly right; tuning it will only save a few calls
   to :c:func:`malloc`. The default size is 16384.

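The effect of a negative *wbits* can be sketched as below. The example assumes
the standard zlib container layout (a 2-byte header, the deflate data, and a
4-byte Adler-32 trailer) and strips the wrapper by slicing, which is done here
purely for illustration; the sample data and variable names are made up.

```python
import zlib

# Hypothetical sample data.
original = b"raw deflate example " * 20

# compress() produces a zlib stream: header, deflate data, checksum trailer.
wrapped = zlib.compress(original)

# Illustration only: drop the 2-byte header and the 4-byte trailer to
# leave a bare deflate stream.
raw = wrapped[2:-4]

# A negative wbits tells decompress() not to expect the header and trailer.
restored = zlib.decompress(raw, -15)

# With the default wbits, the same bytes are rejected.
try:
    zlib.decompress(raw)
    rejected = False
except zlib.error:
    rejected = True
```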

.. function:: decompressobj([wbits])

   Returns a decompression object, to be used for decompressing data streams that
   won't fit into memory at once. The *wbits* parameter controls the size of the
   window buffer.

Compression objects support the following methods:


.. method:: Compress.compress(data)

   Compress *data*, returning a bytes object containing compressed data for at least
   part of the data in *data*. This data should be concatenated to the output
   produced by any preceding calls to the :meth:`compress` method. Some input may
   be kept in internal buffers for later processing.


.. method:: Compress.flush([mode])

   All pending input is processed, and a bytes object containing the remaining compressed
   output is returned. *mode* can be selected from the constants
   :const:`Z_SYNC_FLUSH`, :const:`Z_FULL_FLUSH`, or :const:`Z_FINISH`,
   defaulting to :const:`Z_FINISH`. :const:`Z_SYNC_FLUSH` and
   :const:`Z_FULL_FLUSH` allow compressing further bytestrings of data, while
   :const:`Z_FINISH` finishes the compressed stream and prevents compressing any
   more data. After calling :meth:`flush` with *mode* set to :const:`Z_FINISH`,
   the :meth:`compress` method cannot be called again; the only realistic action is
   to delete the object.

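A minimal sketch of incremental compression with :meth:`compress` and
:meth:`flush`. The generator name ``compress_chunks`` and the sample chunks are
hypothetical; the point is only that every piece returned must be kept, in
order, and that the final :meth:`flush` completes the stream.

```python
import zlib


def compress_chunks(chunks, level=6):
    """Yield compressed pieces for an iterable of bytes objects."""
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        piece = compressor.compress(chunk)
        # compress() may return b"" while input is buffered internally.
        if piece:
            yield piece
    # Z_FINISH (the default mode) emits whatever is still pending.
    yield compressor.flush()


# Hypothetical input arriving in three pieces.
parts = [b"first chunk, ", b"second chunk, ", b"third chunk"]
stream = b"".join(compress_chunks(parts))
```

The concatenated ``stream`` is an ordinary zlib stream, so
``zlib.decompress(stream)`` returns the joined input.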

.. method:: Compress.copy()

   Returns a copy of the compression object. This can be used to efficiently
   compress a set of data that share a common initial prefix.

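The shared-prefix use of :meth:`Compress.copy` might look like the sketch
below. The prefix, the bodies and all variable names are invented for the
example.

```python
import zlib

# Hypothetical data: every record starts with the same long prefix.
prefix = b"HEADER: common to every record\n" * 50
bodies = [b"first body", b"second body"]

# Compress the prefix once and keep the output produced so far.
base = zlib.compressobj()
head = base.compress(prefix)

results = []
for body in bodies:
    # Each copy continues from the state reached after the prefix.
    clone = base.copy()
    results.append(head + clone.compress(body) + clone.flush())
```

Each entry in ``results`` is a complete stream that decompresses to the prefix
followed by the corresponding body, while the prefix itself was only compressed
once.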

Decompression objects support the following methods, and two attributes:


.. attribute:: Decompress.unused_data

   A bytes object which contains any bytes past the end of the compressed data. That is,
   this remains ``b""`` until the last byte that contains compression data is
   available. If the whole bytestring turned out to contain compressed data, this is
   ``b""``, an empty bytes object.

   The only way to determine where a bytestring of compressed data ends is by actually
   decompressing it. This means that when compressed data is part of a
   larger file, you can only find the end of it by reading data and feeding it
   followed by some non-empty bytestring into a decompression object's
   :meth:`decompress` method until the :attr:`unused_data` attribute is no longer
   empty.

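A short sketch of :attr:`unused_data` in action, assuming a blob in which
compressed data is followed by unrelated bytes. The contents and names are made
up for the example.

```python
import zlib

# Hypothetical blob: a zlib stream followed by bytes that are not part of it.
payload = zlib.compress(b"compressed part")
blob = payload + b"TRAILING BYTES"

d = zlib.decompressobj()
text = d.decompress(blob)

# Everything after the end of the compressed stream is left untouched here.
leftover = d.unused_data
```

After the call, ``text`` holds the decompressed payload and ``leftover`` holds
the bytes that followed it.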

.. attribute:: Decompress.unconsumed_tail

   A bytes object that contains any data that was not consumed by the last
   :meth:`decompress` call because it exceeded the limit for the uncompressed data
   buffer. This data has not yet been seen by the zlib machinery, so you must feed
   it (possibly with further data concatenated to it) back to a subsequent
   :meth:`decompress` method call in order to get correct output.


.. method:: Decompress.decompress(data[, max_length])

   Decompress *data*, returning a bytes object containing the uncompressed data
   corresponding to at least part of the data in *data*. This data should be
   concatenated to the output produced by any preceding calls to the
   :meth:`decompress` method. Some of the input data may be preserved in internal
   buffers for later processing.

   If the optional parameter *max_length* is supplied then the return value will be
   no longer than *max_length*. This may mean that not all of the compressed input
   can be processed, and unconsumed data will be stored in the attribute
   :attr:`unconsumed_tail`. This bytestring must be passed to a subsequent call to
   :meth:`decompress` if decompression is to continue. If *max_length* is not
   supplied then the whole input is decompressed, and :attr:`unconsumed_tail` is
   empty.

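The interplay of *max_length* and :attr:`unconsumed_tail` can be sketched with
a loop like the one below. The generator name ``decompress_limited`` and the
sample data are hypothetical, and the loop is one possible arrangement rather
than a prescribed pattern.

```python
import zlib


def decompress_limited(data, max_length=1024):
    """Yield decompressed pieces obtained with a max_length limit."""
    d = zlib.decompressobj()
    pending = data
    while pending:
        # Each call returns at most max_length bytes of output.
        piece = d.decompress(pending, max_length)
        if piece:
            yield piece
        # Input that was not processed must be fed back in.
        pending = d.unconsumed_tail
    # Collect anything still held by the decompressor.
    tail = d.flush()
    if tail:
        yield tail


# Hypothetical data that expands to far more than one piece.
original = b"abcdefghij" * 5000
pieces = list(decompress_limited(zlib.compress(original), 1024))
```

Joining ``pieces`` gives back ``original``, while each call to
:meth:`decompress` was limited to ``max_length`` bytes of output.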

.. method:: Decompress.flush([length])

   All pending input is processed, and a bytes object containing the remaining
   uncompressed output is returned. After calling :meth:`flush`, the
   :meth:`decompress` method cannot be called again; the only realistic action is
   to delete the object.

   The optional parameter *length* sets the initial size of the output buffer.


.. method:: Decompress.copy()

   Returns a copy of the decompression object. This can be used to save the state
   of the decompressor midway through the data stream in order to speed up random
   seeks into the stream at a future point.

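As a rough sketch of saving decompressor state with :meth:`Decompress.copy`,
the example below takes a checkpoint halfway through a stream and resumes from
it later. The split point and all names are assumptions made for the
illustration.

```python
import zlib

# Hypothetical stream, split at an arbitrary point.
original = b"0123456789" * 1000
stream = zlib.compress(original)
half = len(stream) // 2

d = zlib.decompressobj()
first = d.decompress(stream[:half])

# Save the state reached after the first half of the input.
checkpoint = d.copy()

# Finish with the original object...
rest_a = d.decompress(stream[half:]) + d.flush()

# ...and again from the checkpoint, without re-reading the first half.
rest_b = checkpoint.decompress(stream[half:]) + checkpoint.flush()
```

Both ``rest_a`` and ``rest_b`` contain the same remaining output, and either
one appended to ``first`` reproduces ``original``.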

.. seealso::

   Module :mod:`gzip`
      Reading and writing :program:`gzip`\ -format files.

   http://www.zlib.net
      The zlib library home page.

   http://www.zlib.net/manual.html
      The zlib manual explains the semantics and usage of the library's many
      functions.
