\section{\module{zlib} ---
Compression compatible with \program{gzip}}

\declaremodule{builtin}{zlib}
\modulesynopsis{Low-level interface to compression and decompression
routines compatible with \program{gzip}.}


For applications that require data compression, the functions in this
module allow compression and decompression, using the zlib library.
The zlib library has its own home page at \url{http://www.gzip.org/zlib/}.
There are known incompatibilities between the Python module and
versions of the zlib library earlier than 1.1.3; 1.1.3 has a security
vulnerability, so we recommend using 1.1.4 or later.

The available exception and functions in this module are:

\begin{excdesc}{error}
Exception raised on compression and decompression errors.
\end{excdesc}
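The following sketch shows one way to handle \exception{error} when the input may not be valid zlib data. It assumes a Python 3 interpreter, where these functions take and return \class{bytes} objects rather than strings; the helper name \function{safe_decompress()} is hypothetical.

\begin{verbatim}
import zlib

def safe_decompress(data):
    """Return the decompressed bytes, or None if data is not valid zlib data."""
    try:
        return zlib.decompress(data)
    except zlib.error as exc:
        # The exception message carries zlib's own diagnostic text.
        print("decompression failed:", exc)
        return None

good = safe_decompress(zlib.compress(b"hello"))
bad = safe_decompress(b"this is not zlib data")
\end{verbatim}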


\begin{funcdesc}{adler32}{string\optional{, value}}
Computes an Adler-32 checksum of \var{string}. (An Adler-32
checksum is almost as reliable as a CRC32 but can be computed much
more quickly.) If \var{value} is present, it is used as the
starting value of the checksum; otherwise, a fixed default value is
used. This allows computing a running checksum over the
concatenation of several input strings. The algorithm is not
cryptographically strong, and should not be used for
authentication or digital signatures. Since the algorithm is
designed for use as a checksum algorithm, it is not suitable for
use as a general hash algorithm.
\end{funcdesc}
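A minimal sketch of the running checksum described above, assuming Python 3 (\class{bytes} input): each call passes the previous result back in as \var{value}, and the final result matches the checksum of the concatenated input.

\begin{verbatim}
import zlib

chunks = [b"The quick brown fox ", b"jumps over ", b"the lazy dog"]

# Start with the first chunk, then feed each later chunk together with
# the checksum computed so far.
running = zlib.adler32(chunks[0])
for chunk in chunks[1:]:
    running = zlib.adler32(chunk, running)

# The running value equals the checksum of the whole concatenated input.
whole = zlib.adler32(b"".join(chunks))
print(running == whole)
\end{verbatim}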

\begin{funcdesc}{compress}{string\optional{, level}}
Compresses the data in \var{string}, returning a string containing
compressed data. \var{level} is an integer from \code{1} to
\code{9} controlling the level of compression; \code{1} is fastest
and produces the least compression, \code{9} is slowest and produces
the most. The default value is \code{6}. Raises the
\exception{error} exception if any error occurs.
\end{funcdesc}
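As a rough illustration of the \var{level} argument (again assuming Python 3 and \class{bytes} input), the sketch below compresses the same input at the two extreme levels and at the default, then checks that each result round-trips. Actual sizes depend on the input and on the zlib version in use, so none are quoted here.

\begin{verbatim}
import zlib

data = b"zlib compresses repetitive input very well. " * 200

fast = zlib.compress(data, 1)    # fastest, least compression
best = zlib.compress(data, 9)    # slowest, most compression
default = zlib.compress(data)    # default level

for name, blob in [("level 1", fast), ("level 9", best), ("default", default)]:
    print(name, len(data), "->", len(blob))
\end{verbatim}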

\begin{funcdesc}{compressobj}{\optional{level}}
Returns a compression object, to be used for compressing data streams
that won't fit into memory at once. \var{level} is an integer from
\code{1} to \code{9} controlling the level of compression; \code{1} is
fastest and produces the least compression, \code{9} is slowest and
produces the most. The default value is \code{6}.
\end{funcdesc}
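The sketch below shows the usual pattern for a compression object: feed input a piece at a time to \method{compress()}, write out whatever it returns, and finish with \method{flush()}. It assumes Python 3 and file-like objects opened in binary mode; \function{compress_stream()} and its \var{chunk_size} parameter are hypothetical names chosen for this example.

\begin{verbatim}
import io
import zlib

def compress_stream(src, dst, level=6, chunk_size=64 * 1024):
    """Copy file-like src to dst, compressing on the fly."""
    comp = zlib.compressobj(level)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # compress() may return an empty result while it buffers input.
        dst.write(comp.compress(chunk))
    # flush() defaults to Z_FINISH: emit the rest and end the stream.
    dst.write(comp.flush())

src = io.BytesIO(b"line of text\n" * 100000)
dst = io.BytesIO()
compress_stream(src, dst, chunk_size=4096)
print(len(src.getvalue()), "->", len(dst.getvalue()))
\end{verbatim}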

\begin{funcdesc}{crc32}{string\optional{, value}}
Computes a CRC (Cyclic Redundancy Check)%
\index{Cyclic Redundancy Check}
\index{checksum!Cyclic Redundancy Check}
checksum of \var{string}. If
\var{value} is present, it is used as the starting value of the
checksum; otherwise, a fixed default value is used. This allows
computing a running checksum over the concatenation of several
input strings. The algorithm is not cryptographically strong, and
should not be used for authentication or digital signatures. Since
the algorithm is designed for use as a checksum algorithm, it is not
suitable for use as a general hash algorithm.
\end{funcdesc}
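A sketch of an incremental CRC-32 over several pieces of input, assuming Python 3; \function{crc32_of_chunks()} is a hypothetical helper. The final mask is a common idiom: older Python versions could return a signed value from \function{crc32()}, and masking with \code{0xffffffff} yields the same unsigned 32-bit number on every version.

\begin{verbatim}
import zlib

def crc32_of_chunks(chunks):
    """CRC-32 of the concatenation of chunks, computed incrementally."""
    crc = zlib.crc32(b"")              # checksum of empty input: 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    # Normalize to an unsigned 32-bit value (older Pythons could be signed).
    return crc & 0xffffffff

# "123456789" is the standard CRC-32 check input.
print(hex(crc32_of_chunks([b"123", b"456789"])))   # 0xcbf43926
\end{verbatim}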

\begin{funcdesc}{decompress}{string\optional{, wbits\optional{, bufsize}}}
Decompresses the data in \var{string}, returning a string containing
the uncompressed data. The \var{wbits} parameter controls the size of
the window buffer. If \var{bufsize} is given, it is used as the
initial size of the output buffer. Raises the \exception{error}
exception if any error occurs.

The absolute value of \var{wbits} is the base two logarithm of the
size of the history buffer (the ``window size'') that was used when
the data was compressed. Its absolute value should be between 8 and 15
for the most recent versions of the zlib library; larger values result
in better compression at the expense of greater memory usage. The
default value is 15. When \var{wbits} is negative, the standard zlib
header and trailer are suppressed, so the input must be a raw deflate
stream; this is used for compatibility with \program{unzip}'s
compression file format.

\var{bufsize} is the initial size of the buffer used to hold
decompressed data. If more space is required, the buffer size will be
increased as needed, so you don't have to get this value exactly
right; tuning it will only save a few calls to \cfunction{malloc()}. The
default size is 16384.

\end{funcdesc}
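The sketch below exercises both optional arguments under Python 3. To obtain a raw deflate stream without relying on any other API, it slices the two-byte zlib header and the four-byte Adler-32 trailer off the output of \function{compress()}; this shortcut assumes no preset dictionary is in use, which holds for \function{compress()}. A negative \var{wbits} then tells \function{decompress()} not to expect the header.

\begin{verbatim}
import zlib

data = b"raw deflate example " * 50
wrapped = zlib.compress(data)

# zlib framing: 2-byte header + deflate data + 4-byte Adler-32 trailer.
raw = wrapped[2:-4]

# Negative wbits: expect raw deflate data with no header or trailer.
restored = zlib.decompress(raw, -15)

# bufsize is only the initial output buffer size; a small value still
# works because the buffer grows as needed.
regrown = zlib.decompress(wrapped, 15, 64)

print(restored == data, regrown == data)
\end{verbatim}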

\begin{funcdesc}{decompressobj}{\optional{wbits}}
Returns a decompression object, to be used for decompressing data
streams that won't fit into memory at once. The \var{wbits}
parameter controls the size of the window buffer.
\end{funcdesc}
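A matching sketch for decompression objects, assuming Python 3: the compressed stream arrives as arbitrary pieces, each piece goes through \method{decompress()}, and a final \method{flush()} collects anything still buffered. The helper \function{decompress_chunks()} is hypothetical.

\begin{verbatim}
import zlib

def decompress_chunks(chunks):
    """Decompress a zlib stream delivered as an iterable of bytes objects."""
    d = zlib.decompressobj()
    out = []
    for chunk in chunks:
        out.append(d.decompress(chunk))
    out.append(d.flush())          # anything still held in internal buffers
    return b"".join(out)

original = b"streamed data " * 1000
compressed = zlib.compress(original)
# Split the compressed stream at arbitrary 10-byte boundaries.
pieces = [compressed[i:i + 10] for i in range(0, len(compressed), 10)]
result = decompress_chunks(pieces)
\end{verbatim}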

Compression objects support the following methods:

\begin{methoddesc}[Compress]{compress}{string}
Compress \var{string}, returning a string containing compressed data
for at least part of the data in \var{string}. This data should be
concatenated to the output produced by any preceding calls to the
\method{compress()} method. Some input may be kept in internal buffers
for later processing.
\end{methoddesc}

\begin{methoddesc}[Compress]{flush}{\optional{mode}}
All pending input is processed, and a string containing the remaining
compressed output is returned. \var{mode} can be selected from the
constants \constant{Z_SYNC_FLUSH}, \constant{Z_FULL_FLUSH}, or
\constant{Z_FINISH}, defaulting to \constant{Z_FINISH}.
\constant{Z_SYNC_FLUSH} and \constant{Z_FULL_FLUSH} allow compressing
further strings of data; \constant{Z_FULL_FLUSH} additionally resets
the compression state so that decompression can restart from that
point, which allows partial error recovery if earlier compressed data
is damaged. \constant{Z_FINISH} finishes the compressed stream and
prevents compressing any more data. After calling \method{flush()}
with \var{mode} set to \constant{Z_FINISH}, the \method{compress()}
method cannot be called again; the only realistic action is to delete
the object.
\end{methoddesc}
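The sketch below, assuming Python 3, shows why \constant{Z_SYNC_FLUSH} is useful for message-oriented protocols: after a sync flush the receiver can decode everything sent so far, while the same compression object carries on with the next message. Only the final \constant{Z_FINISH} ends the stream.

\begin{verbatim}
import zlib

comp = zlib.compressobj()
decomp = zlib.decompressobj()

# Z_SYNC_FLUSH emits all output for the first message without ending
# the compressed stream.
part1 = comp.compress(b"first message\n") + comp.flush(zlib.Z_SYNC_FLUSH)
out1 = decomp.decompress(part1)      # receiver can decode it right away

# The compression object keeps going; Z_FINISH then ends the stream.
part2 = comp.compress(b"second message\n") + comp.flush(zlib.Z_FINISH)
out2 = decomp.decompress(part2)

print(out1, out2)
\end{verbatim}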

Decompression objects support the following methods and two attributes:

\begin{memberdesc}{unused_data}
A string which contains any bytes past the end of the compressed data.
That is, this remains \code{""} until the last byte that contains
compression data is available. If the whole string turned out to
contain compressed data, this is \code{""}, the empty string.

The only way to determine where a string of compressed data ends is by
actually decompressing it. This means that when compressed data is
only part of a larger file, you can only find the end of it by
reading data and feeding it, followed by some non-empty string, into a
decompression object's \method{decompress()} method until the
\member{unused_data} attribute is no longer the empty string.
\end{memberdesc}
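A sketch of the technique just described, assuming Python 3 and a binary file-like object that starts with one complete zlib stream followed by unrelated bytes. The helper \function{split_compressed()} and its \var{chunk_size} parameter are hypothetical. Feeding stops as soon as \member{unused_data} becomes non-empty; if the compressed data happens to end exactly on a chunk boundary, the next chunk read supplies the non-empty string the text calls for.

\begin{verbatim}
import io
import zlib

def split_compressed(stream, chunk_size=16):
    """Return (decompressed, rest) for a stream that begins with a zlib
    stream and continues with other data."""
    d = zlib.decompressobj()
    out = []
    while not d.unused_data:
        chunk = stream.read(chunk_size)
        if not chunk:
            break                    # end of file before the stream ended
        out.append(d.decompress(chunk))
    # Bytes already read past the end, plus whatever is left unread.
    rest = d.unused_data + stream.read()
    return b"".join(out), rest

payload = b"payload " * 100
blob = zlib.compress(payload) + b"TRAILER"
text, rest = split_compressed(io.BytesIO(blob))
\end{verbatim}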

\begin{memberdesc}{unconsumed_tail}
A string that contains any data that was not consumed by the last
\method{decompress()} call because it exceeded the limit for the
uncompressed data buffer. This data has not yet been seen by the zlib
machinery, so you must feed it (possibly with further data
concatenated to it) back to a subsequent \method{decompress()} method
call in order to get correct output.
\end{memberdesc}


\begin{methoddesc}[Decompress]{decompress}{string\optional{, max_length}}
Decompress \var{string}, returning a string containing the
uncompressed data corresponding to at least part of the data in
\var{string}. This data should be concatenated to the output produced
by any preceding calls to the
\method{decompress()} method. Some of the input data may be preserved
in internal buffers for later processing.

If the optional parameter \var{max_length} is supplied, then the return
value will be no longer than \var{max_length}. This may mean that not
all of the compressed input can be processed; unconsumed data will be
stored in the attribute \member{unconsumed_tail}. This string must be
passed to a subsequent call to \method{decompress()} if decompression
is to continue. If \var{max_length} is not supplied, then the whole
input is decompressed, and \member{unconsumed_tail} is an empty string.
\end{methoddesc}
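A sketch of bounded decompression using \var{max_length} and \member{unconsumed_tail}, assuming Python 3; \function{iter_decompress()} is a hypothetical generator. Each piece returned by \method{decompress()} is at most \var{max_length} bytes. Once all input has been consumed, zlib may still hold some output internally, so the sketch ends with \method{flush()}; note that \method{flush()} is not limited by \var{max_length}.

\begin{verbatim}
import zlib

def iter_decompress(compressed, max_length=1024):
    """Yield the uncompressed form of compressed in pieces.

    Pieces produced by decompress() are at most max_length bytes; the
    final piece, if any, comes from flush(), which has no such limit.
    """
    d = zlib.decompressobj()
    pending = compressed
    while pending:
        piece = d.decompress(pending, max_length)
        if piece:
            yield piece
        # Input that could not be processed within max_length is kept
        # here and must be passed back in on the next call.
        pending = d.unconsumed_tail
    tail = d.flush()
    if tail:
        yield tail

data = "".join("line %d\n" % i for i in range(5000)).encode("ascii")
pieces = list(iter_decompress(zlib.compress(data), 1024))
\end{verbatim}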
Guido van Rossum | 04bc9d6 | 1997-04-30 18:12:27 +0000 | [diff] [blame] | 164 | |
Fred Drake | 74810d5 | 1998-04-03 06:49:26 +0000 | [diff] [blame] | 165 | \begin{methoddesc}[Decompress]{flush}{} |
Guido van Rossum | 04bc9d6 | 1997-04-30 18:12:27 +0000 | [diff] [blame] | 166 | All pending input is processed, and a string containing the remaining |
Fred Drake | ed79783 | 1998-01-22 16:11:18 +0000 | [diff] [blame] | 167 | uncompressed output is returned. After calling \method{flush()}, the |
| 168 | \method{decompress()} method cannot be called again; the only realistic |
Guido van Rossum | 04bc9d6 | 1997-04-30 18:12:27 +0000 | [diff] [blame] | 169 | action is to delete the object. |
Fred Drake | 74810d5 | 1998-04-03 06:49:26 +0000 | [diff] [blame] | 170 | \end{methoddesc} |
Guido van Rossum | 04bc9d6 | 1997-04-30 18:12:27 +0000 | [diff] [blame] | 171 | |
Guido van Rossum | e47da0a | 1997-07-17 16:34:52 +0000 | [diff] [blame] | 172 | \begin{seealso} |
Fred Drake | ba0a989 | 2000-10-18 17:43:06 +0000 | [diff] [blame] | 173 | \seemodule{gzip}{Reading and writing \program{gzip}-format files.} |
Fred Drake | b037d33 | 2001-06-25 15:30:13 +0000 | [diff] [blame] | 174 | \seeurl{http://www.gzip.org/zlib/}{The zlib library home page.} |
Guido van Rossum | e47da0a | 1997-07-17 16:34:52 +0000 | [diff] [blame] | 175 | \end{seealso} |