\section{\module{zlib} ---
         Compression compatible with \program{gzip}}

\declaremodule{builtin}{zlib}
\modulesynopsis{Low-level interface to compression and decompression
                routines compatible with \program{gzip}.}


For applications that require data compression, the functions in this
module allow compression and decompression, using the zlib library.
The zlib library has its own home page at \url{http://www.zlib.net}.
There are known incompatibilities between the Python module and
versions of the zlib library earlier than 1.1.3; 1.1.3 has a security
vulnerability, so we recommend using 1.1.4 or later.

zlib's functions have many options and often need to be used in a
particular order.  This documentation doesn't attempt to cover all of
the permutations; consult the zlib manual at
\url{http://www.zlib.net/manual.html} for authoritative information.

The available exception and functions in this module are:

\begin{excdesc}{error}
  Exception raised on compression and decompression errors.
\end{excdesc}


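As a minimal sketch of how the \exception{error} exception surfaces,
the following feeds bytes that are not a zlib stream to
\function{decompress()}.  It assumes a current Python 3 interpreter,
where the data are \code{bytes} objects rather than strings; the
variable names are illustrative only.

```python
import zlib

# Arbitrary bytes that do not form a valid zlib stream.
bogus = b"this is not a zlib stream"

try:
    zlib.decompress(bogus)
    failed = False
except zlib.error as exc:
    # The message text comes from the underlying zlib library.
    failed = True
    message = str(exc)
```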
\begin{funcdesc}{adler32}{string\optional{, value}}
  Computes an Adler-32 checksum of \var{string}.  (An Adler-32
  checksum is almost as reliable as a CRC32 but can be computed much
  more quickly.)  If \var{value} is present, it is used as the
  starting value of the checksum; otherwise, a fixed default value is
  used.  This allows computing a running checksum over the
  concatenation of several input strings.  The algorithm is not
  cryptographically strong, and should not be used for
  authentication or digital signatures.  Since the algorithm is
  designed for use as a checksum algorithm, it is not suitable for
  use as a general hash algorithm.
\end{funcdesc}

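The running-checksum use of \var{value} can be sketched as follows.
This assumes a current Python 3 interpreter, where the inputs are
\code{bytes} objects; the chunk contents are made up for illustration.

```python
import zlib

chunks = [b"first part, ", b"second part, ", b"third part"]

# Checksum the first chunk, then pass each result back in as the
# starting value for the next chunk.
running = zlib.adler32(chunks[0])
for chunk in chunks[1:]:
    running = zlib.adler32(chunk, running)

# Checksum of the same data supplied in one piece, for comparison.
whole = zlib.adler32(b"".join(chunks))
```

Both values should agree, which is what makes it possible to checksum
data that arrives piece by piece.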
\begin{funcdesc}{compress}{string\optional{, level}}
  Compresses the data in \var{string}, returning a string containing
  compressed data.  \var{level} is an integer from \code{1} to
  \code{9} controlling the level of compression; \code{1} is fastest
  and produces the least compression, \code{9} is slowest and produces
  the most.  The default value is \code{6}.  Raises the
  \exception{error} exception if any error occurs.
\end{funcdesc}

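A short sketch of one-shot compression at different levels, assuming a
current Python 3 interpreter where the data are \code{bytes} objects.
The sample data is deliberately repetitive so that it compresses well.

```python
import zlib

data = b"spam and eggs, " * 1000

fast = zlib.compress(data, 1)   # quickest, least compression
normal = zlib.compress(data)    # default level
best = zlib.compress(data, 9)   # slowest, most compression

# Whatever the level, decompress() recovers the original data.
restored = zlib.decompress(best)
```

The exact compressed sizes depend on the data and on the zlib version,
so they are not shown here.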
\begin{funcdesc}{compressobj}{\optional{level}}
  Returns a compression object, to be used for compressing data streams
  that won't fit into memory at once.  \var{level} is an integer from
  \code{1} to \code{9} controlling the level of compression; \code{1} is
  fastest and produces the least compression, \code{9} is slowest and
  produces the most.  The default value is \code{6}.
\end{funcdesc}

\begin{funcdesc}{crc32}{string\optional{, value}}
  Computes a CRC (Cyclic Redundancy Check)%
  \index{Cyclic Redundancy Check}
  \index{checksum!Cyclic Redundancy Check}
  checksum of \var{string}.  If
  \var{value} is present, it is used as the starting value of the
  checksum; otherwise, a fixed default value is used.  This allows
  computing a running checksum over the concatenation of several
  input strings.  The algorithm is not cryptographically strong, and
  should not be used for authentication or digital signatures.  Since
  the algorithm is designed for use as a checksum algorithm, it is not
  suitable for use as a general hash algorithm.
\end{funcdesc}

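As with \function{adler32()}, the \var{value} argument lets a CRC be
accumulated over data read in blocks.  The sketch below assumes a
current Python 3 interpreter with \code{bytes} input;
\code{crc32_of_blocks} is a hypothetical helper, not part of the module.

```python
import zlib

def crc32_of_blocks(blocks):
    """Hypothetical helper: CRC-32 of the concatenation of the blocks."""
    blocks = iter(blocks)
    # Start with the default initial value by checksumming the first
    # block alone (an empty block if there is no input at all).
    crc = zlib.crc32(next(blocks, b""))
    for block in blocks:
        crc = zlib.crc32(block, crc)
    return crc

blocks = [b"block one;", b"block two;", b"block three"]
running = crc32_of_blocks(blocks)
whole = zlib.crc32(b"".join(blocks))
```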
\begin{funcdesc}{decompress}{string\optional{, wbits\optional{, bufsize}}}
  Decompresses the data in \var{string}, returning a string containing
  the uncompressed data.  The \var{wbits} parameter controls the size of
  the window buffer.  If \var{bufsize} is given, it is used as the
  initial size of the output buffer.  Raises the \exception{error}
  exception if any error occurs.

The absolute value of \var{wbits} is the base two logarithm of the
size of the history buffer (the ``window size'') that was used when
the data was compressed.  Its absolute value should be between 8 and 15
for the most recent versions of the zlib library, larger values
resulting in better compression at the expense of greater memory usage.
The default value is 15.  When \var{wbits} is negative, the standard
zlib header is suppressed; this is an undocumented feature
of the zlib library, used for compatibility with \program{unzip}'s
compression file format.

\var{bufsize} is the initial size of the buffer used to hold
decompressed data.  If more space is required, the buffer size will be
increased as needed, so you don't have to get this value exactly
right; tuning it will only save a few calls to \cfunction{malloc()}.  The
default size is 16384.

\end{funcdesc}

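The effect of a negative \var{wbits} can be sketched as follows, under
two assumptions that go beyond this section: a current Python 3
interpreter with \code{bytes} data, and a \function{compressobj()} that
also accepts compression method and \var{wbits} arguments (used here
only to produce header-less input for \function{decompress()}).

```python
import zlib

data = b"raw deflate example " * 50

# Assumption: compressobj(level, method, wbits) is available; a
# negative wbits asks for a stream without the usual header.
compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
raw = compressor.compress(data) + compressor.flush()

# The same negative wbits must be given when decompressing.
restored = zlib.decompress(raw, -15)

# With the default wbits the header-less stream is rejected.
try:
    zlib.decompress(raw)
    rejected = False
except zlib.error:
    rejected = True
```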
\begin{funcdesc}{decompressobj}{\optional{wbits}}
  Returns a decompression object, to be used for decompressing data
  streams that won't fit into memory at once.  The \var{wbits}
  parameter controls the size of the window buffer.
\end{funcdesc}

Compression objects support the following methods:

\begin{methoddesc}[Compress]{compress}{string}
Compress \var{string}, returning a string containing compressed data
for at least part of the data in \var{string}.  This data should be
concatenated to the output produced by any preceding calls to the
\method{compress()} method.  Some input may be kept in internal buffers
for later processing.
\end{methoddesc}

\begin{methoddesc}[Compress]{flush}{\optional{mode}}
All pending input is processed, and a string containing the remaining
compressed output is returned.  \var{mode} can be selected from the
constants \constant{Z_SYNC_FLUSH},  \constant{Z_FULL_FLUSH},  or
\constant{Z_FINISH}, defaulting to \constant{Z_FINISH}.  \constant{Z_SYNC_FLUSH} and
\constant{Z_FULL_FLUSH} allow compressing further strings of data, while
\constant{Z_FINISH} finishes the compressed stream and
prevents compressing any more data.  After calling
\method{flush()} with \var{mode} set to \constant{Z_FINISH}, the
\method{compress()} method cannot be called again; the only realistic
action is to delete the object.
\end{methoddesc}

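The difference between the flush modes can be sketched with a
compression object and a decompression object working side by side.
The example assumes a current Python 3 interpreter with \code{bytes}
data; the variable names are illustrative.

```python
import zlib

compressor = zlib.compressobj()
decompressor = zlib.decompressobj()

# Z_SYNC_FLUSH emits everything compressed so far but leaves the
# stream open, so more data can still be compressed afterwards.
part1 = compressor.compress(b"hello ") + compressor.flush(zlib.Z_SYNC_FLUSH)
first = decompressor.decompress(part1)

# The default mode, Z_FINISH, ends the stream.
part2 = compressor.compress(b"world") + compressor.flush()
second = decompressor.decompress(part2) + decompressor.flush()

# The concatenated output is a single complete compressed stream.
stream = part1 + part2
```

After the synchronizing flush the receiving side can already recover
the data written so far, which is useful for interactive protocols.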
\begin{methoddesc}[Compress]{copy}{}
Returns a copy of the compression object.  This can be used to efficiently
compress a set of data that share a common initial prefix.
\versionadded{2.5}
\end{methoddesc}

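A sketch of the common-prefix use case: the prefix is compressed once,
and each record continues from a copy of that state.  This assumes a
current Python 3 interpreter with \code{bytes} data; the prefix and
record contents are invented for the example.

```python
import zlib

prefix = b"HEADER: common to every record\n" * 20
suffixes = [b"record one", b"record two", b"record three"]

# Compress the shared prefix a single time.
base = zlib.compressobj()
head = base.compress(prefix)

streams = []
for suffix in suffixes:
    # Each copy starts from the state reached after the prefix.
    branch = base.copy()
    streams.append(head + branch.compress(suffix) + branch.flush())

restored = [zlib.decompress(s) for s in streams]
```

Each element of \code{streams} is a complete compressed stream for the
prefix followed by one record.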
Decompression objects support the following methods and two attributes:

\begin{memberdesc}{unused_data}
A string which contains any bytes past the end of the compressed data.
That is, this remains \code{""} until the last byte that contains
compression data is available.  If the whole string turned out to
contain compressed data, this is \code{""}, the empty string.

The only way to determine where a string of compressed data ends is by
actually decompressing it.  This means that when compressed data is
contained in part of a larger file, you can only find the end of it by
reading data and feeding it followed by some non-empty string into a
decompression object's \method{decompress()} method until the
\member{unused_data} attribute is no longer the empty string.
\end{memberdesc}

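Locating the end of compressed data embedded in a larger block can be
sketched as follows.  The example assumes a current Python 3
interpreter with \code{bytes} data; the trailer and the 16-byte piece
size are arbitrary choices for illustration.

```python
import zlib

payload = b"compressed section " * 20
trailer = b"TRAILING BYTES"      # unrelated data following the stream
blob = zlib.compress(payload) + trailer

decompressor = zlib.decompressobj()
pieces = []
pos = 0
# Feed small pieces until unused_data is no longer empty.
while pos < len(blob) and not decompressor.unused_data:
    piece = blob[pos:pos + 16]
    pieces.append(decompressor.decompress(piece))
    pos += len(piece)

restored = b"".join(pieces)
# Bytes fed so far, minus the ones the decompressor did not need.
end_of_stream = pos - len(decompressor.unused_data)
```

Everything from \code{end\_of\_stream} onwards in \code{blob} belongs
to whatever follows the compressed data.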
\begin{memberdesc}{unconsumed_tail}
A string that contains any data that was not consumed by the last
\method{decompress()} call because it exceeded the limit for the
uncompressed data buffer.  This data has not yet been seen by the zlib
machinery, so you must feed it (possibly with further data
concatenated to it) back to a subsequent \method{decompress()} method
call in order to get correct output.
\end{memberdesc}


\begin{methoddesc}[Decompress]{decompress}{string\optional{, max_length}}
Decompress \var{string}, returning a string containing the
uncompressed data corresponding to at least part of the data in
\var{string}.  This data should be concatenated to the output produced
by any preceding calls to the
\method{decompress()} method.  Some of the input data may be preserved
in internal buffers for later processing.

If the optional parameter \var{max_length} is supplied then the return value
will be no longer than \var{max_length}.  This may mean that not all of the
compressed input can be processed, in which case any unconsumed data is
stored in the attribute \member{unconsumed_tail}.  This string must be passed
to a subsequent call to \method{decompress()} if decompression is to
continue.  If \var{max_length} is not supplied then the whole input is
decompressed, and \member{unconsumed_tail} is an empty string.
\end{methoddesc}

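The interplay of \var{max_length} and \member{unconsumed_tail} can be
sketched as a loop that never asks for more than a fixed amount of
output at a time.  The example assumes a current Python 3 interpreter
with \code{bytes} data; the limit of 1024 is an arbitrary choice.

```python
import zlib

data = b"x" * 100000
compressed = zlib.compress(data)

LIMIT = 1024   # illustrative cap on the output of each call

decompressor = zlib.decompressobj()
pieces = []
pending = compressed
while pending:
    # At most LIMIT bytes come back; input that could not be
    # processed yet is left in unconsumed_tail.
    pieces.append(decompressor.decompress(pending, LIMIT))
    pending = decompressor.unconsumed_tail
# Collect anything still held inside the decompression object.
pieces.append(decompressor.flush())

restored = b"".join(pieces)
```

This keeps memory use bounded even when a small input expands to a
very large output.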
\begin{methoddesc}[Decompress]{flush}{\optional{length}}
All pending input is processed, and a string containing the remaining
uncompressed output is returned.  After calling \method{flush()}, the
\method{decompress()} method cannot be called again; the only realistic
action is to delete the object.

The optional parameter \var{length} sets the initial size of the
output buffer.
\end{methoddesc}

\begin{methoddesc}[Decompress]{copy}{}
Returns a copy of the decompression object.  This can be used to save the
state of the decompressor midway through the data stream in order to speed up
random seeks into the stream at a future point.
\versionadded{2.5}
\end{methoddesc}

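Saving the decompressor state midway through a stream can be sketched
as follows, assuming a current Python 3 interpreter with \code{bytes}
data.  The split at the halfway point of the compressed data is an
arbitrary choice for the example.

```python
import zlib

data = b"".join(b"line %d\n" % i for i in range(2000))
compressed = zlib.compress(data)
half = len(compressed) // 2

decompressor = zlib.decompressobj()
first = decompressor.decompress(compressed[:half])

# Snapshot of the state after the first half of the input.
checkpoint = decompressor.copy()

# Carry on with the original object ...
rest = decompressor.decompress(compressed[half:]) + decompressor.flush()

# ... and later resume from the snapshot without reprocessing the
# first half of the compressed data.
again = checkpoint.decompress(compressed[half:]) + checkpoint.flush()
```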
\begin{seealso}
  \seemodule{gzip}{Reading and writing \program{gzip}-format files.}
  \seeurl{http://www.zlib.net}{The zlib library home page.}
  \seeurl{http://www.zlib.net/manual.html}{The zlib manual explains
    the semantics and usage of the library's many functions.}
\end{seealso}