:mod:`zlib` --- Compression compatible with :program:`gzip`
===========================================================

.. module:: zlib
   :synopsis: Low-level interface to compression and decompression routines compatible with
              gzip.


For applications that require data compression, the functions in this module
allow compression and decompression, using the zlib library. The zlib library
has its own home page at http://www.zlib.net. There are known
incompatibilities between the Python module and versions of the zlib library
earlier than 1.1.3; 1.1.3 has a security vulnerability, so we recommend using
1.1.4 or later.

zlib's functions have many options and often need to be used in a particular
order. This documentation doesn't attempt to cover all of the permutations;
consult the zlib manual at http://www.zlib.net/manual.html for authoritative
information.

For reading and writing ``.gz`` files see the :mod:`gzip` module.

The available exception and functions in this module are:


.. exception:: error

   Exception raised on compression and decompression errors.


.. function:: adler32(data[, value])

   Computes an Adler-32 checksum of *data*. (An Adler-32 checksum is almost as
   reliable as a CRC32 but can be computed much more quickly.) If *value* is
   present, it is used as the starting value of the checksum; otherwise, a fixed
   default value is used. This allows computing a running checksum over the
   concatenation of several inputs. The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures. Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.

   This function always returns an integer object.

.. note::
   To generate the same numeric value across all Python versions and
   platforms, use ``adler32(data) & 0xffffffff``. If you are only using
   the checksum in packed binary format, this is not necessary, as the
   return value is the correct 32-bit binary representation
   regardless of sign.

.. versionchanged:: 2.6
   The return value is in the range [-2**31, 2**31-1]
   regardless of platform. In older versions the value is
   signed on some platforms and unsigned on others.

.. versionchanged:: 3.0
   The return value is unsigned and in the range [0, 2**32-1]
   regardless of platform.


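As a minimal sketch of the running-checksum use of :func:`adler32`, the
following feeds each result back in as *value* and then applies the
``& 0xffffffff`` mask. The sample chunks are made up for illustration.

```python
import zlib

# Illustrative input, split into pieces as it might arrive from a network.
chunks = [b"hello, ", b"world"]

# Checksum the first chunk, then pass each result as the starting value
# for the next chunk.
running = zlib.adler32(chunks[0])
for chunk in chunks[1:]:
    running = zlib.adler32(chunk, running)

# Mask so the value is the same unsigned number on every version/platform.
running &= 0xffffffff

# The running checksum matches the checksum of the concatenated data.
whole = zlib.adler32(b"".join(chunks)) & 0xffffffff
```
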
.. function:: compress(string[, level])

   Compresses the data in *string*, returning a string containing compressed data.
   *level* is an integer from ``0`` to ``9`` controlling the level of compression;
   ``1`` is fastest and produces the least compression, ``9`` is slowest and
   produces the most. ``0`` is no compression. The default value is ``6``.
   Raises the :exc:`error` exception if any error occurs.


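A short sketch of a one-shot round trip with :func:`compress` and
:func:`decompress`, trying two different levels. The repeated sample data is
an assumption chosen only because it compresses well.

```python
import zlib

# Illustrative, highly repetitive input.
original = b"spam and eggs " * 100

fast = zlib.compress(original, 1)    # quickest, least compression
small = zlib.compress(original, 9)   # slowest, most compression

# Either result decompresses back to the original data.
restored = zlib.decompress(small)
```
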
.. function:: compressobj([level[, method[, wbits[, memlevel[, strategy]]]]])

   Returns a compression object, to be used for compressing data streams that won't
   fit into memory at once. *level* is an integer from
   ``0`` to ``9`` or ``-1``, controlling
   the level of compression; ``1`` is fastest and produces the least compression,
   ``9`` is slowest and produces the most. ``0`` is no compression. The default
   value is ``-1`` (Z_DEFAULT_COMPRESSION). Z_DEFAULT_COMPRESSION represents a default
   compromise between speed and compression (currently equivalent to level 6).

   *method* is the compression algorithm. Currently, the only supported value is
   ``DEFLATED``.

   *wbits* is the base two logarithm of the size of the window buffer. This
   should be an integer from ``8`` to ``15``. Higher values give better
   compression, but use more memory. The default is 15.

   *memlevel* controls the amount of memory used for internal compression state.
   Valid values range from ``1`` to ``9``. Higher values use more memory,
   but are faster and produce smaller output. The default is 8.

   *strategy* is used to tune the compression algorithm. Possible values are
   ``Z_DEFAULT_STRATEGY``, ``Z_FILTERED``, and ``Z_HUFFMAN_ONLY``. The default
   is ``Z_DEFAULT_STRATEGY``.


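The sketch below shows one way a compression object might be used to compress
data piece by piece. The helper name ``compress_chunks`` and the sample chunks
are hypothetical and not part of the module.

```python
import zlib

def compress_chunks(chunks, level=6):
    """Hypothetical helper: compress an iterable of byte strings one at a time."""
    compressor = zlib.compressobj(level)
    pieces = []
    for chunk in chunks:
        # compress() may return an empty string while data is still buffered.
        pieces.append(compressor.compress(chunk))
    # flush() emits whatever is still buffered and finishes the stream.
    pieces.append(compressor.flush())
    return b"".join(pieces)

chunks = [b"first block ", b"second block ", b"third block"]
stream = compress_chunks(chunks)
```

The concatenated output forms a single compressed stream, so it can be passed
to :func:`decompress` in one call.
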
.. function:: crc32(data[, value])

   .. index::
      single: Cyclic Redundancy Check
      single: checksum; Cyclic Redundancy Check

   Computes a CRC (Cyclic Redundancy Check) checksum of *data*. If *value* is
   present, it is used as the starting value of the checksum; otherwise, a fixed
   default value is used. This allows computing a running checksum over the
   concatenation of several inputs. The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures. Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.

   This function always returns an integer object.

.. note::
   To generate the same numeric value across all Python versions and
   platforms, use ``crc32(data) & 0xffffffff``. If you are only using
   the checksum in packed binary format, this is not necessary, as the
   return value is the correct 32-bit binary representation
   regardless of sign.

.. versionchanged:: 2.6
   The return value is in the range [-2**31, 2**31-1]
   regardless of platform. In older versions the value would be
   signed on some platforms and unsigned on others.

.. versionchanged:: 3.0
   The return value is unsigned and in the range [0, 2**32-1]
   regardless of platform.


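As an illustration of a running CRC, this sketch checksums a file-like object
block by block. The helper ``crc32_of_stream`` and its *blocksize* parameter
are hypothetical; the sketch also assumes that a starting value of ``0`` is
equivalent to omitting *value*.

```python
import io
import zlib

def crc32_of_stream(fileobj, blocksize=4096):
    """Hypothetical helper: CRC-32 of a binary file object, read in blocks."""
    crc = 0  # assumed to match the default starting value
    while True:
        block = fileobj.read(blocksize)
        if not block:
            break
        # Carry the previous result forward as the starting value.
        crc = zlib.crc32(block, crc)
    # Mask for a consistent unsigned result across versions and platforms.
    return crc & 0xffffffff

data = b"The quick brown fox jumps over the lazy dog"
checksum = crc32_of_stream(io.BytesIO(data), blocksize=8)
```
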
.. function:: decompress(string[, wbits[, bufsize]])

   Decompresses the data in *string*, returning a string containing the
   uncompressed data. The *wbits* parameter controls the size of the window
   buffer, and is discussed further below.
   If *bufsize* is given, it is used as the initial size of the output
   buffer. Raises the :exc:`error` exception if any error occurs.

   The absolute value of *wbits* is the base two logarithm of the size of the
   history buffer (the "window size") used when compressing data. Its absolute
   value should be between 8 and 15 for the most recent versions of the zlib
   library, larger values resulting in better compression at the expense of greater
   memory usage. When decompressing a stream, *wbits* must not be smaller
   than the size originally used to compress the stream; using a too-small
   value will result in an exception. The default value is therefore the
   highest value, 15. When *wbits* is negative, the standard header is
   suppressed and the data is treated as a raw deflate stream.

   *bufsize* is the initial size of the buffer used to hold decompressed data. If
   more space is required, the buffer size will be increased as needed, so you
   don't have to get this value exactly right; tuning it will only save a few calls
   to :c:func:`malloc`. The default size is 16384.


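A sketch of how a negative *wbits* might be used with :func:`decompress`. It
assumes that :func:`compressobj` also accepts a negative *wbits* to produce a
stream without the usual header and trailer, which is behaviour of the
underlying zlib library rather than something stated on this page.

```python
import zlib

payload = b"raw deflate example " * 20

# Assumption: a negative wbits makes the compressor omit the header and
# trailer, producing a raw deflate stream.
compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
raw = compressor.compress(payload) + compressor.flush()

# The matching negative wbits tells decompress() not to expect a header.
restored = zlib.decompress(raw, -15)
```
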
.. function:: decompressobj([wbits])

   Returns a decompression object, to be used for decompressing data streams that
   won't fit into memory at once. The *wbits* parameter controls the size of the
   window buffer.

Compression objects support the following methods:


.. method:: Compress.compress(string)

   Compress *string*, returning a string containing compressed data for at least
   part of the data in *string*. This data should be concatenated to the output
   produced by any preceding calls to the :meth:`compress` method. Some input may
   be kept in internal buffers for later processing.


.. method:: Compress.flush([mode])

   All pending input is processed, and a string containing the remaining compressed
   output is returned. *mode* can be selected from the constants
   :const:`Z_SYNC_FLUSH`, :const:`Z_FULL_FLUSH`, or :const:`Z_FINISH`,
   defaulting to :const:`Z_FINISH`. :const:`Z_SYNC_FLUSH` and
   :const:`Z_FULL_FLUSH` allow compressing further strings of data, while
   :const:`Z_FINISH` finishes the compressed stream and prevents compressing any
   more data. After calling :meth:`flush` with *mode* set to :const:`Z_FINISH`,
   the :meth:`compress` method cannot be called again; the only realistic action is
   to delete the object.


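The difference between the flush modes can be sketched as follows: a
:const:`Z_SYNC_FLUSH` makes everything compressed so far available to the
receiving side while leaving the stream open, and :const:`Z_FINISH` ends it.
The two messages are illustrative data only.

```python
import zlib

compressor = zlib.compressobj()
decompressor = zlib.decompressobj()

# Z_SYNC_FLUSH pushes out all pending output but keeps the stream open.
part1 = compressor.compress(b"first message") + compressor.flush(zlib.Z_SYNC_FLUSH)
received1 = decompressor.decompress(part1)

# The same compression object still accepts more data afterwards.
part2 = compressor.compress(b"second message") + compressor.flush(zlib.Z_FINISH)
received2 = decompressor.decompress(part2) + decompressor.flush()
```
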
.. method:: Compress.copy()

   Returns a copy of the compression object. This can be used to efficiently
   compress a set of data that share a common initial prefix.

   .. versionadded:: 2.5

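A minimal sketch of the shared-prefix use of :meth:`Compress.copy`: the prefix
is compressed once, and each copy continues from that state. The helper
``finish_with`` and the sample data are hypothetical.

```python
import zlib

# Illustrative prefix shared by every output stream.
prefix = b"HEADER: common to every record\n" * 10

base = zlib.compressobj()
head = base.compress(prefix)   # output produced so far (may be short or empty)

def finish_with(suffix):
    """Hypothetical helper: complete a stream from a copy of the shared state."""
    branch = base.copy()       # the prefix is not compressed again
    return head + branch.compress(suffix) + branch.flush()

stream_a = finish_with(b"record A")
stream_b = finish_with(b"record B")
```
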
Decompression objects support the following methods, and two attributes:


.. attribute:: Decompress.unused_data

   A string which contains any bytes past the end of the compressed data. That is,
   this remains ``""`` until the last byte that contains compression data is
   available. If the whole string turned out to contain compressed data, this is
   ``""``, the empty string.

   The only way to determine where a string of compressed data ends is by actually
   decompressing it. This means that when compressed data is contained as part of a
   larger file, you can only find the end of it by reading data and feeding it
   followed by some non-empty string into a decompression object's
   :meth:`decompress` method until the :attr:`unused_data` attribute is no longer
   the empty string.


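The following sketch shows how :attr:`unused_data` might be used to locate the
end of a compressed stream embedded in a larger byte string. The trailing
bytes are made-up sample data.

```python
import zlib

compressed = zlib.compress(b"payload inside a larger file")
# Simulate compressed data followed by unrelated bytes.
blob = compressed + b"TRAILING BYTES"

decompressor = zlib.decompressobj()
payload = decompressor.decompress(blob)

# Everything after the end of the compressed stream ends up in unused_data.
trailing = decompressor.unused_data
compressed_length = len(blob) - len(trailing)
```
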
.. attribute:: Decompress.unconsumed_tail

   A string that contains any data that was not consumed by the last
   :meth:`decompress` call because it exceeded the limit for the uncompressed data
   buffer. This data has not yet been seen by the zlib machinery, so you must feed
   it (possibly with further data concatenated to it) back to a subsequent
   :meth:`decompress` method call in order to get correct output.


.. method:: Decompress.decompress(string[, max_length])

   Decompress *string*, returning a string containing the uncompressed data
   corresponding to at least part of the data in *string*. This data should be
   concatenated to the output produced by any preceding calls to the
   :meth:`decompress` method. Some of the input data may be preserved in internal
   buffers for later processing.

   If the optional parameter *max_length* is non-zero then the return value will be
   no longer than *max_length*. This may mean that not all of the compressed input
   can be processed, in which case the unconsumed data is stored in the attribute
   :attr:`unconsumed_tail`. This string must be passed to a subsequent call to
   :meth:`decompress` if decompression is to continue. If *max_length* is not
   supplied then the whole input is decompressed, and :attr:`unconsumed_tail` is an
   empty string.


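One possible way to combine *max_length* with :attr:`unconsumed_tail` is
sketched below. The generator ``decompress_limited`` is a hypothetical helper,
and the final :meth:`flush` call may return a piece that is not bounded by
*max_length*.

```python
import zlib

def decompress_limited(data, max_length=1024):
    """Hypothetical helper: yield decompressed pieces of a complete stream.

    Pieces returned by decompress() are at most max_length bytes long.
    """
    decompressor = zlib.decompressobj()
    pending = data
    while pending:
        piece = decompressor.decompress(pending, max_length)
        if piece:
            yield piece
        # Input that did not fit within the limit must be fed back in.
        pending = decompressor.unconsumed_tail
    # flush() returns any output still held internally; it is not size-limited.
    tail = decompressor.flush()
    if tail:
        yield tail

data = b"abc" * 10000
pieces = list(decompress_limited(zlib.compress(data), 1024))
```

Bounding each call this way limits how much output a single call can produce,
which is useful when the input comes from an untrusted source.
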
.. method:: Decompress.flush([length])

   All pending input is processed, and a string containing the remaining
   uncompressed output is returned. After calling :meth:`flush`, the
   :meth:`decompress` method cannot be called again; the only realistic action is
   to delete the object.

   The optional parameter *length* sets the initial size of the output buffer.


.. method:: Decompress.copy()

   Returns a copy of the decompression object. This can be used to save the state
   of the decompressor midway through the data stream in order to speed up random
   seeks into the stream at a future point.

   .. versionadded:: 2.5


.. seealso::

   Module :mod:`gzip`
      Reading and writing :program:`gzip`\ -format files.

   http://www.zlib.net
      The zlib library home page.

   http://www.zlib.net/manual.html
      The zlib manual explains the semantics and usage of the library's many
      functions.

265