\section{\module{zlib} ---
         Compression compatible with \program{gzip}}

\declaremodule{builtin}{zlib}
\modulesynopsis{Low-level interface to compression and decompression
                routines compatible with \program{gzip}.}


For applications that require data compression, the functions in this
module allow compression and decompression, using the zlib library.
The zlib library has its own home page at \url{http://www.gzip.org/zlib/}.
There are known incompatibilities between the Python module and
versions of the zlib library earlier than 1.1.3; 1.1.3 has a security
vulnerability, so we recommend using 1.1.4 or later.

The available exception and functions in this module are:

\begin{excdesc}{error}
  Exception raised on compression and decompression errors.
\end{excdesc}


\begin{funcdesc}{adler32}{string\optional{, value}}
  Computes an Adler-32 checksum of \var{string}.  (An Adler-32
  checksum is almost as reliable as a CRC32 but can be computed much
  more quickly.)  If \var{value} is present, it is used as the
  starting value of the checksum; otherwise, a fixed default value is
  used.  This allows computing a running checksum over the
  concatenation of several input strings.  The algorithm is not
  cryptographically strong, and should not be used for
  authentication or digital signatures.  Since the algorithm is
  designed for use as a checksum algorithm, it is not suitable for
  use as a general hash algorithm.
\end{funcdesc}
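
The running checksum described for \function{adler32()} might be
sketched as follows.  The helper name \function{running_adler32()} is
hypothetical, and the byte-string literals assume a Python version in
which the data is passed as bytes.

```python
import zlib

def running_adler32(chunks):
    """Fold adler32() over an iterable of byte strings (hypothetical helper)."""
    # Checksumming the empty string yields the fixed default starting value.
    value = zlib.adler32(b"")
    for chunk in chunks:
        # Pass the previous result back in as the starting value.
        value = zlib.adler32(chunk, value)
    return value

parts = [b"spam, ", b"eggs, ", b"and spam"]

# The running checksum matches the checksum of the concatenated input.
assert running_adler32(parts) == zlib.adler32(b"".join(parts))
```

This avoids building the concatenated string in memory when the input
arrives in pieces.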

\begin{funcdesc}{compress}{string\optional{, level}}
  Compresses the data in \var{string}, returning a string containing
  compressed data.  \var{level} is an integer from \code{1} to
  \code{9} controlling the level of compression; \code{1} is fastest
  and produces the least compression, \code{9} is slowest and produces
  the most.  The default value is \code{6}.  Raises the
  \exception{error} exception if any error occurs.
\end{funcdesc}
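
As an illustration of \function{compress()} and the \exception{error}
exception, the following sketch compresses a sample string at several
levels and checks that each result can be restored.  The sample data
is made up for the example, and the byte-string literals assume a
Python version in which the data is passed as bytes.

```python
import zlib

data = b"abcdefgh" * 1000          # repetitive sample input (made up)

fast = zlib.compress(data, 1)      # fastest, least compression
default = zlib.compress(data)      # default level
best = zlib.compress(data, 9)      # slowest, most compression

# Whatever the level, decompress() restores the original data.
for packed in (fast, default, best):
    assert zlib.decompress(packed) == data

# Input that is not a valid compressed stream raises zlib.error.
try:
    zlib.decompress(b"this is not a zlib stream")
except zlib.error:
    rejected = True
else:
    rejected = False
```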

\begin{funcdesc}{compressobj}{\optional{level}}
  Returns a compression object, to be used for compressing data streams
  that won't fit into memory at once.  \var{level} is an integer from
  \code{1} to \code{9} controlling the level of compression; \code{1} is
  fastest and produces the least compression, \code{9} is slowest and
  produces the most.  The default value is \code{6}.
\end{funcdesc}

\begin{funcdesc}{crc32}{string\optional{, value}}
  Computes a CRC (Cyclic Redundancy Check)%
  \index{Cyclic Redundancy Check}
  \index{checksum!Cyclic Redundancy Check}
  checksum of \var{string}.  If
  \var{value} is present, it is used as the starting value of the
  checksum; otherwise, a fixed default value is used.  This allows
  computing a running checksum over the concatenation of several
  input strings.  The algorithm is not cryptographically strong, and
  should not be used for authentication or digital signatures.  Since
  the algorithm is designed for use as a checksum algorithm, it is not
  suitable for use as a general hash algorithm.
\end{funcdesc}
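
One possible use of the \var{value} argument of \function{crc32()} is
to checksum a file-like object block by block.  The sketch below is
only an illustration: the helper \function{crc32_of_stream()} and its
\var{blocksize} parameter are hypothetical, and the starting value
\code{0} is assumed to match the module's default.  The result is
masked to 32 bits so that it has the same sign on every platform.

```python
import io
import zlib

def crc32_of_stream(fileobj, blocksize=4096):
    """Return the CRC-32 of everything readable from fileobj (hypothetical helper)."""
    value = 0                          # assumed equal to the default starting value
    while True:
        block = fileobj.read(blocksize)
        if not block:                  # end of input
            break
        value = zlib.crc32(block, value)
    return value & 0xffffffff          # normalize to an unsigned 32-bit value

sample = b"x" * 10000
assert crc32_of_stream(io.BytesIO(sample), 1000) == zlib.crc32(sample) & 0xffffffff
```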

\begin{funcdesc}{decompress}{string\optional{, wbits\optional{, bufsize}}}
  Decompresses the data in \var{string}, returning a string containing
  the uncompressed data.  The \var{wbits} parameter controls the size of
  the window buffer.  If \var{bufsize} is given, it is used as the
  initial size of the output buffer.  Raises the \exception{error}
  exception if any error occurs.

The absolute value of \var{wbits} is the base two logarithm of the
size of the history buffer (the ``window size'') used when compressing
data.  Its absolute value should be between 8 and 15 for the most
recent versions of the zlib library, larger values resulting in better
compression at the expense of greater memory usage.  The default value
is 15.  When \var{wbits} is negative, the standard
zlib header is suppressed; this is an undocumented feature
of the zlib library, used for compatibility with \program{unzip}'s
compression file format.

\var{bufsize} is the initial size of the buffer used to hold
decompressed data.  If more space is required, the buffer size will be
increased as needed, so you don't have to get this value exactly
right; tuning it will only save a few calls to \cfunction{malloc()}.  The
default size is 16384.

\end{funcdesc}
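
The effect of a negative \var{wbits} can be sketched as follows.  This
example rests on an assumption not documented in this section: that
\function{compressobj()} also accepts \var{method} and \var{wbits}
arguments after \var{level}, which is what produces the headerless
stream here.

```python
import zlib

data = b"raw deflate example " * 50

# Assumption: compressobj(level, method, wbits) is available; a negative
# wbits asks for a stream without the usual header.
packer = zlib.compressobj(6, zlib.DEFLATED, -15)
raw = packer.compress(data) + packer.flush()

# The same negative wbits is needed to read the stream back; 4096 is just
# an initial output buffer size and does not limit the result.
assert zlib.decompress(raw, -15, 4096) == data

# With the default wbits a header is expected, so the headerless stream
# is rejected.
try:
    zlib.decompress(raw)
except zlib.error:
    header_expected = True
else:
    header_expected = False
```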

\begin{funcdesc}{decompressobj}{\optional{wbits}}
  Returns a decompression object, to be used for decompressing data
  streams that won't fit into memory at once.  The \var{wbits}
  parameter controls the size of the window buffer.
\end{funcdesc}

Compression objects support the following methods:

\begin{methoddesc}[Compress]{compress}{string}
Compress \var{string}, returning a string containing compressed data
for at least part of the data in \var{string}.  This data should be
concatenated to the output produced by any preceding calls to the
\method{compress()} method.  Some input may be kept in internal buffers
for later processing.
\end{methoddesc}

\begin{methoddesc}[Compress]{flush}{\optional{mode}}
All pending input is processed, and a string containing the remaining
compressed output is returned.  \var{mode} can be selected from the
constants \constant{Z_SYNC_FLUSH}, \constant{Z_FULL_FLUSH}, or
\constant{Z_FINISH}, defaulting to \constant{Z_FINISH}.  \constant{Z_SYNC_FLUSH} and
\constant{Z_FULL_FLUSH} allow compressing further strings of data and
are used to allow partial error recovery on decompression, while
\constant{Z_FINISH} finishes the compressed stream and
prevents compressing any more data.  After calling
\method{flush()} with \var{mode} set to \constant{Z_FINISH}, the
\method{compress()} method cannot be called again; the only realistic
action is to delete the object.
\end{methoddesc}
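
The two compression-object methods are typically used together.  The
following is a minimal sketch, not a definitive recipe: the generator
name \function{compress_chunks()} is hypothetical, and the byte-string
literals assume a Python version in which the data is passed as bytes.
The second half shows \constant{Z_SYNC_FLUSH} leaving the stream open
for more data.

```python
import zlib

def compress_chunks(chunks, level=6):
    """Yield compressed pieces for an iterable of byte strings (hypothetical helper)."""
    packer = zlib.compressobj(level)
    for chunk in chunks:
        piece = packer.compress(chunk)
        if piece:              # may be empty while input sits in internal buffers
            yield piece
    yield packer.flush()       # Z_FINISH by default: ends the stream

chunks = [b"first block ", b"second block ", b"third block"]
stream = b"".join(compress_chunks(chunks))
assert zlib.decompress(stream) == b"".join(chunks)

# Z_SYNC_FLUSH pushes out everything seen so far but keeps the stream open.
packer = zlib.compressobj()
unpacker = zlib.decompressobj()
part = packer.compress(b"hello ") + packer.flush(zlib.Z_SYNC_FLUSH)
assert unpacker.decompress(part) == b"hello "
part = packer.compress(b"world") + packer.flush()   # now finish the stream
assert unpacker.decompress(part) == b"world"
```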

Decompression objects support the following methods, and two attributes:

\begin{memberdesc}{unused_data}
A string which contains any bytes past the end of the compressed data.
That is, this remains \code{""} until the last byte that contains
compression data is available.  If the whole string turned out to
contain compressed data, this is \code{""}, the empty string.

The only way to determine where a string of compressed data ends is by
actually decompressing it.  This means that when compressed data is
contained in part of a larger file, you can only find the end of it by
reading data and feeding it, followed by some non-empty string, into a
decompression object's \method{decompress()} method until the
\member{unused_data} attribute is no longer the empty string.
\end{memberdesc}
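
A small sketch of how \member{unused_data} behaves when compressed data
is followed by other bytes.  The trailer contents are made up for the
example, and the byte-string literals assume a Python version in which
the data is passed as bytes.

```python
import zlib

payload = zlib.compress(b"compressed part")
blob = payload + b"TRAILER"            # compressed data followed by other bytes

# While only part of the compressed data has been fed in, nothing is unused.
partial = zlib.decompressobj()
partial.decompress(blob[:5])
assert partial.unused_data == b""

# Once the end of the compressed data is reached, the rest shows up
# in unused_data instead of being decompressed.
unpacker = zlib.decompressobj()
text = unpacker.decompress(blob)
assert text == b"compressed part"
assert unpacker.unused_data == b"TRAILER"
```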

\begin{memberdesc}{unconsumed_tail}
A string that contains any data that was not consumed by the last
\method{decompress()} call because it exceeded the limit for the
uncompressed data buffer.  This data has not yet been seen by the zlib
machinery, so you must feed it (possibly with further data
concatenated to it) back to a subsequent \method{decompress()} method
call in order to get correct output.
\end{memberdesc}


\begin{methoddesc}[Decompress]{decompress}{string\optional{, max_length}}
Decompress \var{string}, returning a string containing the
uncompressed data corresponding to at least part of the data in
\var{string}.  This data should be concatenated to the output produced
by any preceding calls to the
\method{decompress()} method.  Some of the input data may be preserved
in internal buffers for later processing.

If the optional parameter \var{max_length} is supplied then the return value
will be no longer than \var{max_length}.  This may mean that not all of the
compressed input can be processed, and any unconsumed data will be stored
in the attribute \member{unconsumed_tail}.  This string must be passed
to a subsequent call to \method{decompress()} if decompression is to
continue.  If \var{max_length} is not supplied then the whole input is
decompressed, and \member{unconsumed_tail} is an empty string.
\end{methoddesc}
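
The interplay of \var{max_length} and \member{unconsumed_tail} can be
sketched as a loop.  The helper name \function{decompress_bounded()} is
hypothetical, and the byte-string literals assume a Python version in
which the data is passed as bytes.  Each piece returned by
\method{decompress()} is at most \var{max_length} long; the final
\method{flush()} call collects whatever output is still pending.

```python
import zlib

def decompress_bounded(data, max_length=1024):
    """Return decompressed output as a list of pieces (hypothetical helper)."""
    unpacker = zlib.decompressobj()
    pieces = []
    pending = data
    while pending:
        # At most max_length bytes come back from each call.
        pieces.append(unpacker.decompress(pending, max_length))
        # Input that did not fit must be fed back in on the next call.
        pending = unpacker.unconsumed_tail
    pieces.append(unpacker.flush())    # any remaining output
    return pieces

original = b"0123456789" * 5000
pieces = decompress_bounded(zlib.compress(original), 512)
assert b"".join(pieces) == original
```

Bounding the size of each piece is useful when the decompressed size
of untrusted input is not known in advance.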

\begin{methoddesc}[Decompress]{flush}{}
All pending input is processed, and a string containing the remaining
uncompressed output is returned.  After calling \method{flush()}, the
\method{decompress()} method cannot be called again; the only realistic
action is to delete the object.
\end{methoddesc}
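
For compressed data that arrives in pieces, \method{decompress()} and
\method{flush()} might be combined as in this sketch.  The generator
name \function{decompress_chunks()} and the 7-byte slicing are
illustrative assumptions only.

```python
import zlib

def decompress_chunks(chunks):
    """Yield decompressed pieces for an iterable of compressed chunks (hypothetical helper)."""
    unpacker = zlib.decompressobj()
    for chunk in chunks:
        piece = unpacker.decompress(chunk)
        if piece:                  # may be empty until enough input has arrived
            yield piece
    tail = unpacker.flush()        # after this the object is not used again
    if tail:
        yield tail

original = b"abc" * 10000
packed = zlib.compress(original)

# Simulate a stream delivered a few bytes at a time.
chunks = [packed[i:i + 7] for i in range(0, len(packed), 7)]
assert b"".join(decompress_chunks(chunks)) == original
```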

\begin{seealso}
  \seemodule{gzip}{Reading and writing \program{gzip}-format files.}
  \seeurl{http://www.gzip.org/zlib/}{The zlib library home page.}
\end{seealso}