:mod:`zlib` --- Compression compatible with :program:`gzip`
===========================================================

.. module:: zlib
   :synopsis: Low-level interface to compression and decompression routines
              compatible with gzip.

--------------

For applications that require data compression, the functions in this module
allow compression and decompression, using the zlib library. The zlib library
has its own home page at http://www.zlib.net. There are known
incompatibilities between the Python module and versions of the zlib library
earlier than 1.1.3; 1.1.3 has a security vulnerability, so we recommend using
1.1.4 or later.

zlib's functions have many options and often need to be used in a particular
order. This documentation doesn't attempt to cover all of the permutations;
consult the zlib manual at http://www.zlib.net/manual.html for authoritative
information.

For reading and writing ``.gz`` files see the :mod:`gzip` module.

The available exception and functions in this module are:


.. exception:: error

   Exception raised on compression and decompression errors.


.. function:: adler32(data[, value])

   Computes an Adler-32 checksum of *data*. (An Adler-32 checksum is almost as
   reliable as a CRC32 but can be computed much more quickly.) The result
   is an unsigned 32-bit integer. If *value* is present, it is used as
   the starting value of the checksum; otherwise, a default value of 1
   is used. Passing in *value* allows computing a running checksum over the
   concatenation of several inputs. The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures. Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.

   .. versionchanged:: 3.0
      Always returns an unsigned value.
      To generate the same numeric value across all Python versions and
      platforms, use ``adler32(data) & 0xffffffff``.

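As an illustration of the running checksum described for :func:`adler32`, the
following minimal sketch feeds each intermediate result back in as *value*.
The ``chunks`` list is made-up sample data, not part of the module.

```python
import zlib

# Made-up sample input, split into pieces as it might arrive from a stream.
chunks = [b"hello, ", b"zlib ", b"checksums"]

# Running checksum: pass the previous result back in as *value*.
running = 1  # the documented default starting value for Adler-32
for chunk in chunks:
    running = zlib.adler32(chunk, running)

# One-shot checksum over the concatenation, for comparison.
one_shot = zlib.adler32(b"".join(chunks))

print(running == one_shot)
```

Both approaches should yield the same unsigned 32-bit integer, so the chunked
form can be used when the whole input is not available at once.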

.. function:: compress(data, level=-1)

   Compresses the bytes in *data*, returning a bytes object containing compressed data.
   *level* is an integer from ``0`` to ``9`` or ``-1`` controlling the level of compression;
   ``1`` is fastest and produces the least compression, ``9`` is slowest and
   produces the most. ``0`` is no compression. The default value is ``-1``
   (Z_DEFAULT_COMPRESSION). Z_DEFAULT_COMPRESSION represents a default
   compromise between speed and compression (currently equivalent to level 6).
   Raises the :exc:`error` exception if any error occurs.

   .. versionchanged:: 3.6
      *level* can now be used as a keyword parameter.

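The effect of *level* on :func:`compress` can be explored with a short sketch
like the one below. The input is invented, highly repetitive sample data; real
data will compress differently, and the exact sizes depend on the zlib build.

```python
import zlib

# Invented, highly repetitive sample input.
data = b"spam and eggs, " * 1000

stored = zlib.compress(data, 0)   # level 0: no compression
fast = zlib.compress(data, 1)     # fastest, least compression
best = zlib.compress(data, 9)     # slowest, most compression
default = zlib.compress(data)     # level -1 (Z_DEFAULT_COMPRESSION)

for name, blob in [("0", stored), ("1", fast), ("9", best), ("-1", default)]:
    print("level", name, "->", len(blob), "bytes")
```

Level ``0`` still wraps the data in the zlib container, so its output is
slightly larger than the input.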

.. function:: compressobj(level=-1, method=DEFLATED, wbits=15, memLevel=8, strategy=Z_DEFAULT_STRATEGY[, zdict])

   Returns a compression object, to be used for compressing data streams that won't
   fit into memory at once.

   *level* is the compression level -- an integer from ``0`` to ``9`` or ``-1``.
   A value of ``1`` is fastest and produces the least compression, while a value of
   ``9`` is slowest and produces the most. ``0`` is no compression. The default
   value is ``-1`` (Z_DEFAULT_COMPRESSION). Z_DEFAULT_COMPRESSION represents a default
   compromise between speed and compression (currently equivalent to level 6).

   *method* is the compression algorithm. Currently, the only supported value is
   ``DEFLATED``.

   The *wbits* argument controls the size of the history buffer (or the
   "window size") used when compressing data, and whether a header and
   trailer are included in the output. It can take several ranges of values:

   * +9 to +15: The base-two logarithm of the window size, which
     therefore ranges between 512 and 32768. Larger values produce
     better compression at the expense of greater memory usage. The
     resulting output will include a zlib-specific header and trailer.

   * −9 to −15: Uses the absolute value of *wbits* as the
     window size logarithm, while producing a raw output stream with no
     header or trailing checksum.

   * +25 to +31 = 16 + (9 to 15): Uses the low 4 bits of the value as the
     window size logarithm, while including a basic :program:`gzip` header
     and trailing checksum in the output.

   The *memLevel* argument controls the amount of memory used for the
   internal compression state. Valid values range from ``1`` to ``9``.
   Higher values use more memory, but are faster and produce smaller output.

   *strategy* is used to tune the compression algorithm. Possible values are
   ``Z_DEFAULT_STRATEGY``, ``Z_FILTERED``, and ``Z_HUFFMAN_ONLY``.

   *zdict* is a predefined compression dictionary. This is a sequence of bytes
   (such as a :class:`bytes` object) containing subsequences that are expected
   to occur frequently in the data that is to be compressed. Those subsequences
   that are expected to be most common should come at the end of the dictionary.

   .. versionchanged:: 3.3
      Added the *zdict* parameter and keyword argument support.

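The three *wbits* ranges accepted by :func:`compressobj` select the container
format of the output. The sketch below produces one stream of each kind; the
helper name ``deflate`` and the sample data are hypothetical and exist only
for this example.

```python
import zlib

# Invented sample input.
data = b"window bits select the container format " * 50


def deflate(payload, wbits):
    """Compress *payload* in one go using the given *wbits* value."""
    comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, wbits)
    return comp.compress(payload) + comp.flush()


zlib_stream = deflate(data, 15)       # zlib header and trailer
raw_stream = deflate(data, -15)       # raw stream, no header or checksum
gzip_stream = deflate(data, 16 + 15)  # basic gzip header and trailer

# Each stream needs the matching *wbits* value when it is decompressed.
print(zlib.decompress(zlib_stream, 15) == data)
print(zlib.decompress(raw_stream, -15) == data)
print(zlib.decompress(gzip_stream, 16 + 15) == data)
```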

.. function:: crc32(data[, value])

   .. index::
      single: Cyclic Redundancy Check
      single: checksum; Cyclic Redundancy Check

   Computes a CRC (Cyclic Redundancy Check) checksum of *data*. The
   result is an unsigned 32-bit integer. If *value* is present, it is used
   as the starting value of the checksum; otherwise, a default value of 0
   is used. Passing in *value* allows computing a running checksum over the
   concatenation of several inputs. The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures. Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.

   .. versionchanged:: 3.0
      Always returns an unsigned value.
      To generate the same numeric value across all Python versions and
      platforms, use ``crc32(data) & 0xffffffff``.

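A running CRC is useful when the input is read in blocks. The following sketch
assumes a file-like object with a ``read()`` method; the helper name
``crc32_of_stream`` and the chunk size are illustrative choices, not part of
the module.

```python
import io
import zlib


def crc32_of_stream(fileobj, chunk_size=4096):
    """Return the CRC-32 of everything readable from *fileobj*."""
    crc = 0  # the documented default starting value for crc32()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # Feed the previous result back in as *value*.
        crc = zlib.crc32(chunk, crc)
    return crc


payload = b"The quick brown fox jumps over the lazy dog"
checksum = crc32_of_stream(io.BytesIO(payload), chunk_size=8)
print(checksum == zlib.crc32(payload))
```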

.. function:: decompress(data, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)

   Decompresses the bytes in *data*, returning a bytes object containing the
   uncompressed data. The *wbits* parameter depends on
   the format of *data*, and is discussed further below.
   If *bufsize* is given, it is used as the initial size of the output
   buffer. Raises the :exc:`error` exception if any error occurs.

   .. _decompress-wbits:

   The *wbits* parameter controls the size of the history buffer
   (or "window size"), and what header and trailer format is expected.
   It is similar to the parameter for :func:`compressobj`, but accepts
   more ranges of values:

   * +8 to +15: The base-two logarithm of the window size. The input
     must include a zlib header and trailer.

   * 0: Automatically determine the window size from the zlib header.
     Only supported since zlib 1.2.3.5.

   * −8 to −15: Uses the absolute value of *wbits* as the window size
     logarithm. The input must be a raw stream with no header or trailer.

   * +24 to +31 = 16 + (8 to 15): Uses the low 4 bits of the value as
     the window size logarithm. The input must include a gzip header and
     trailer.

   * +40 to +47 = 32 + (8 to 15): Uses the low 4 bits of the value as
     the window size logarithm, and automatically accepts either
     the zlib or gzip format.

   When decompressing a stream, the window size must not be smaller
   than the size originally used to compress the stream; using a too-small
   value may result in an :exc:`error` exception. The default *wbits* value
   corresponds to the largest window size and requires a zlib header and
   trailer to be included.

   *bufsize* is the initial size of the buffer used to hold decompressed data. If
   more space is required, the buffer size will be increased as needed, so you
   don't have to get this value exactly right; tuning it will only save a few calls
   to :c:func:`malloc`.

   .. versionchanged:: 3.6
      *wbits* and *bufsize* can be used as keyword arguments.
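
The *wbits* ranges listed for :func:`decompress` can be tried out as follows.
This is a minimal sketch: the sample data and the ``AUTO_DETECT`` name are
made up, and :func:`gzip.compress` is used only to obtain a gzip-format input.

```python
import gzip
import zlib

# Invented sample input in both container formats.
data = b"either container format works " * 20
zlib_blob = zlib.compress(data)
gzip_blob = gzip.compress(data)

# 32 + (8 to 15): accept either the zlib or the gzip format.
AUTO_DETECT = 32 + zlib.MAX_WBITS
from_zlib = zlib.decompress(zlib_blob, AUTO_DETECT)
from_gzip = zlib.decompress(gzip_blob, AUTO_DETECT)

# The default wbits value requires a zlib header, so gzip input is rejected.
rejected = None
try:
    zlib.decompress(gzip_blob)
except zlib.error as exc:
    rejected = exc

print(from_zlib == data, from_gzip == data, rejected is not None)
```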

.. function:: decompressobj(wbits=15[, zdict])

   Returns a decompression object, to be used for decompressing data streams that
   won't fit into memory at once.

   The *wbits* parameter controls the size of the history buffer (or the
   "window size"), and what header and trailer format is expected. It has
   the same meaning as `described for decompress() <#decompress-wbits>`__.

   The *zdict* parameter specifies a predefined compression dictionary. If
   provided, this must be the same dictionary as was used by the compressor that
   produced the data that is to be decompressed.

   .. note::

      If *zdict* is a mutable object (such as a :class:`bytearray`), you must not
      modify its contents between the call to :func:`decompressobj` and the first
      call to the decompressor's ``decompress()`` method.

   .. versionchanged:: 3.3
      Added the *zdict* parameter.

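A predefined dictionary must be shared by both sides. The sketch below passes
the same *zdict* to :func:`compressobj` and :func:`decompressobj`; the
dictionary contents and the message are hypothetical examples of data with
predictable substrings.

```python
import zlib

# Hypothetical dictionary of substrings expected in the messages.
zdict = b'"user": ""status": "ok"'
message = b'{"user": "alice", "status": "ok"}'

# Compress with the dictionary.
comp = zlib.compressobj(zdict=zdict)
packed = comp.compress(message) + comp.flush()

# Decompress with the very same dictionary.
decomp = zlib.decompressobj(zdict=zdict)
unpacked = decomp.decompress(packed) + decomp.flush()

print(unpacked == message)
```

How much a dictionary helps depends on how well it matches the data, so it is
worth measuring on representative inputs.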

Compression objects support the following methods:


.. method:: Compress.compress(data)

   Compress *data*, returning a bytes object containing compressed data for at least
   part of the data in *data*. This data should be concatenated to the output
   produced by any preceding calls to the :meth:`compress` method. Some input may
   be kept in internal buffers for later processing.


.. method:: Compress.flush([mode])

   All pending input is processed, and a bytes object containing the remaining compressed
   output is returned. *mode* can be selected from the constants
   :const:`Z_SYNC_FLUSH`, :const:`Z_FULL_FLUSH`, or :const:`Z_FINISH`,
   defaulting to :const:`Z_FINISH`. :const:`Z_SYNC_FLUSH` and
   :const:`Z_FULL_FLUSH` allow compressing further bytestrings of data, while
   :const:`Z_FINISH` finishes the compressed stream and prevents compressing any
   more data. After calling :meth:`flush` with *mode* set to :const:`Z_FINISH`,
   the :meth:`compress` method cannot be called again; the only realistic action is
   to delete the object.

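The usual pattern for :meth:`Compress.compress` and :meth:`Compress.flush` is
to concatenate every piece of output and finish with a single flush. The
generator name ``compress_chunks`` and the sample chunks below are
illustrative assumptions.

```python
import zlib


def compress_chunks(chunks, level=-1):
    """Yield compressed output for an iterable of bytes objects."""
    comp = zlib.compressobj(level)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:  # may be empty while input is held in internal buffers
            yield out
    # Z_FINISH is the default mode: emit whatever is still buffered.
    yield comp.flush()


# Invented sample input, delivered in many small pieces.
chunks = [b"line %d\n" % i for i in range(1000)]
stream = b"".join(compress_chunks(chunks))

print(zlib.decompress(stream) == b"".join(chunks))
```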

.. method:: Compress.copy()

   Returns a copy of the compression object. This can be used to efficiently
   compress a set of data that share a common initial prefix.

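As a sketch of the shared-prefix use case for :meth:`Compress.copy`, the
prefix is compressed once and each copy continues from that state. The prefix
and suffixes are invented sample data.

```python
import zlib

# Invented sample data: one common prefix, several different endings.
prefix = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n" * 20
suffixes = [b"first body", b"second body"]

base = zlib.compressobj()
head = base.compress(prefix)  # output produced so far (may be empty)

streams = []
for suffix in suffixes:
    clone = base.copy()  # independent copy of the state after the prefix
    streams.append(head + clone.compress(suffix) + clone.flush())

for suffix, stream in zip(suffixes, streams):
    print(zlib.decompress(stream) == prefix + suffix)
```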

Decompression objects support the following methods and attributes:


.. attribute:: Decompress.unused_data

   A bytes object which contains any bytes past the end of the compressed data. That is,
   this remains ``b""`` until the last byte that contains compression data is
   available. If the whole bytestring turned out to contain compressed data, this is
   ``b""``, an empty bytes object.


.. attribute:: Decompress.unconsumed_tail

   A bytes object that contains any data that was not consumed by the last
   :meth:`decompress` call because it exceeded the limit for the uncompressed data
   buffer. This data has not yet been seen by the zlib machinery, so you must feed
   it (possibly with further data concatenated to it) back to a subsequent
   :meth:`decompress` method call in order to get correct output.


.. attribute:: Decompress.eof

   A boolean indicating whether the end of the compressed data stream has been
   reached.

   This makes it possible to distinguish between a properly-formed compressed
   stream, and an incomplete or truncated one.

   .. versionadded:: 3.3

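The :attr:`Decompress.unused_data` and :attr:`Decompress.eof` attributes can
be observed with a sketch like this one. The trailing bytes and the way the
stream is truncated are made up for the example.

```python
import zlib

payload = zlib.compress(b"complete message")
trailing = b"NEXT RECORD"  # hypothetical bytes that follow the stream

# A complete stream followed by unrelated data.
decomp = zlib.decompressobj()
result = decomp.decompress(payload + trailing)
print(decomp.eof, decomp.unused_data)

# The same stream with its 4-byte checksum trailer cut off.
truncated = zlib.decompressobj()
partial = truncated.decompress(payload[:-4])
print(truncated.eof)
```

In the first case the end of the stream is reached and the extra bytes are
left in ``unused_data``; in the second case ``eof`` stays false because the
stream never finishes.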

.. method:: Decompress.decompress(data, max_length=0)

   Decompress *data*, returning a bytes object containing the uncompressed data
   corresponding to at least part of the data in *data*. This data should be
   concatenated to the output produced by any preceding calls to the
   :meth:`decompress` method. Some of the input data may be preserved in internal
   buffers for later processing.

   If the optional parameter *max_length* is non-zero then the return value will be
   no longer than *max_length*. This may mean that not all of the compressed input
   can be processed, and unconsumed data will be stored in the attribute
   :attr:`unconsumed_tail`. This bytestring must be passed to a subsequent call to
   :meth:`decompress` if decompression is to continue. If *max_length* is zero
   then the whole input is decompressed, and :attr:`unconsumed_tail` is empty.

   .. versionchanged:: 3.6
      *max_length* can be used as a keyword argument.

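One possible way to use *max_length* is a loop that keeps feeding
:attr:`unconsumed_tail` back into :meth:`Decompress.decompress` until
:attr:`Decompress.eof` is set. The generator name ``iter_decompressed`` and
the choice of :exc:`ValueError` for truncated input are assumptions made for
this sketch.

```python
import zlib


def iter_decompressed(blob, max_length=1024):
    """Yield decompressed pieces of at most *max_length* bytes each."""
    decomp = zlib.decompressobj()
    pending = blob
    while not decomp.eof:
        piece = decomp.decompress(pending, max_length)
        # Input that did not fit within the output limit must be fed back in.
        pending = decomp.unconsumed_tail
        if piece:
            yield piece
        elif not pending:
            # No output and no input left, but the stream has not ended.
            raise ValueError("compressed stream is truncated")


# Invented sample input that expands far beyond max_length.
data = b"a" * 100000
pieces = list(iter_decompressed(zlib.compress(data), 1000))

print(max(len(p) for p in pieces) <= 1000, b"".join(pieces) == data)
```

Bounding each piece this way keeps memory use predictable when the input is
untrusted or expands to much more than its compressed size.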

.. method:: Decompress.flush([length])

   All pending input is processed, and a bytes object containing the remaining
   uncompressed output is returned. After calling :meth:`flush`, the
   :meth:`decompress` method cannot be called again; the only realistic action is
   to delete the object.

   The optional parameter *length* sets the initial size of the output buffer.


.. method:: Decompress.copy()

   Returns a copy of the decompression object. This can be used to save the state
   of the decompressor midway through the data stream in order to speed up random
   seeks into the stream at a future point.


Information about the version of the zlib library in use is available through
the following constants:


.. data:: ZLIB_VERSION

   The version string of the zlib library that was used for building the module.
   This may be different from the zlib library actually used at runtime, which
   is available as :const:`ZLIB_RUNTIME_VERSION`.


.. data:: ZLIB_RUNTIME_VERSION

   The version string of the zlib library actually loaded by the interpreter.

   .. versionadded:: 3.3

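A quick way to compare the two version constants is sketched below; the
message text is arbitrary.

```python
import zlib

print("built against zlib:", zlib.ZLIB_VERSION)
print("running with zlib: ", zlib.ZLIB_RUNTIME_VERSION)

# The two strings can differ when the shared library was upgraded
# after Python was built.
if zlib.ZLIB_VERSION != zlib.ZLIB_RUNTIME_VERSION:
    print("note: build-time and runtime zlib versions differ")
```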

.. seealso::

   Module :mod:`gzip`
      Reading and writing :program:`gzip`\ -format files.

   http://www.zlib.net
      The zlib library home page.

   http://www.zlib.net/manual.html
      The zlib manual explains the semantics and usage of the library's many
      functions.
