.. Revision 8ec7f65 by Georg Brandl, 2007-08-15 14:28:01 +0000

:mod:`zlib` --- Compression compatible with :program:`gzip`
===========================================================

.. module:: zlib
   :synopsis: Low-level interface to compression and decompression routines compatible with
              gzip.


For applications that require data compression, the functions in this module
allow compression and decompression, using the zlib library.  The zlib library
has its own home page at http://www.zlib.net.  There are known
incompatibilities between the Python module and versions of the zlib library
earlier than 1.1.3; 1.1.3 has a security vulnerability, so we recommend using
1.1.4 or later.

zlib's functions have many options and often need to be used in a particular
order.  This documentation doesn't attempt to cover all of the permutations;
consult the zlib manual at http://www.zlib.net/manual.html for authoritative
information.

.. Revision aea6e59 by Mark Summerfield, 2007-11-05 09:22:48 +0000

For reading and writing ``.gz`` files see the :mod:`gzip` module.  For
other archive formats, see the :mod:`bz2`, :mod:`zipfile`, and
:mod:`tarfile` modules.

The available exception and functions in this module are:


.. exception:: error

   Exception raised on compression and decompression errors.


.. function:: adler32(string[, value])

   Computes an Adler-32 checksum of *string*.  (An Adler-32 checksum is almost as
   reliable as a CRC32 but can be computed much more quickly.)  If *value* is
   present, it is used as the starting value of the checksum; otherwise, a fixed
   default value is used.  This allows computing a running checksum over the
   concatenation of several input strings.  The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures.  Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.
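
As a minimal sketch of the running checksum described above (the sample data
is made up, and the ``b`` prefix assumes Python 2.6 or later; on Python 3 the
arguments must be bytes objects), the result of one call can be fed to the next
as *value*:

```python
import zlib

# Assumed sample input, split into two pieces.
part1 = b"hello, "
part2 = b"world"

# Checksum of the whole input computed in a single call.
whole = zlib.adler32(part1 + part2)

# Running checksum: the result for part1 is the starting value for part2.
running = zlib.adler32(part2, zlib.adler32(part1))

# Mask to 32 bits so the comparison does not depend on whether the
# interpreter reports the checksum as a signed or unsigned integer.
same = (whole & 0xffffffff) == (running & 0xffffffff)
print(same)
```

Both routes should give the same checksum, so the input never has to be held
in memory as a single string.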


.. function:: compress(string[, level])

   Compresses the data in *string*, returning a string containing compressed data.
   *level* is an integer from ``1`` to ``9`` controlling the level of compression;
   ``1`` is fastest and produces the least compression, ``9`` is slowest and
   produces the most.  The default value is ``6``.  Raises the :exc:`error`
   exception if any error occurs.
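
The following sketch compresses the same input at several levels and restores
it again.  The sample data is an assumption chosen to be highly repetitive; the
sizes printed depend on the input and on the zlib version in use.

```python
import zlib

# Assumed sample input; repetitive data compresses well.
data = b"spam and eggs " * 200

fast = zlib.compress(data, 1)     # fastest, least compression
default = zlib.compress(data)     # default level
best = zlib.compress(data, 9)     # slowest, most compression

# Compare the original size with the three compressed sizes.
print(len(data), len(fast), len(default), len(best))

# Every level yields a stream that decompress() can read back.
restored = zlib.decompress(best)
```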


.. function:: compressobj([level])

   Returns a compression object, to be used for compressing data streams that won't
   fit into memory at once.  *level* is an integer from ``1`` to ``9`` controlling
   the level of compression; ``1`` is fastest and produces the least compression,
   ``9`` is slowest and produces the most.  The default value is ``6``.


.. function:: crc32(string[, value])

   .. index::
      single: Cyclic Redundancy Check
      single: checksum; Cyclic Redundancy Check

   Computes a CRC (Cyclic Redundancy Check) checksum of *string*.  If *value* is
   present, it is used as the starting value of the checksum; otherwise, a fixed
   default value is used.  This allows computing a running checksum over the
   concatenation of several input strings.  The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures.  Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.
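
A running CRC over data that arrives in pieces might look like the sketch
below.  The chunk list stands in for blocks read from a file or socket and is
purely illustrative.

```python
import zlib

# Assumed sample input, delivered in three pieces.
chunks = [b"first chunk, ", b"second chunk, ", b"third chunk"]

# Start with the checksum of the first piece, then pass each result
# back in as the starting value for the next piece.
crc = zlib.crc32(chunks[0])
for chunk in chunks[1:]:
    crc = zlib.crc32(chunk, crc)

# Mask to 32 bits so the value is an unsigned integer on every version.
crc &= 0xffffffff

# The same value results from checksumming the concatenated input.
one_shot = zlib.crc32(b"".join(chunks)) & 0xffffffff
print(crc == one_shot)
```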


.. function:: decompress(string[, wbits[, bufsize]])

   Decompresses the data in *string*, returning a string containing the
   uncompressed data.  The *wbits* parameter controls the size of the window
   buffer.  If *bufsize* is given, it is used as the initial size of the output
   buffer.  Raises the :exc:`error` exception if any error occurs.

   The absolute value of *wbits* is the base two logarithm of the size of the
   history buffer (the "window size") used when compressing data.  Its absolute
   value should be between 8 and 15 for the most recent versions of the zlib
   library, larger values resulting in better compression at the expense of greater
   memory usage.  The default value is 15.  When *wbits* is negative, the standard
   zlib header and trailer are suppressed and the input is treated as a raw
   deflate stream; this is used for compatibility with :program:`unzip`'s
   compression file format.

   *bufsize* is the initial size of the buffer used to hold decompressed data.  If
   more space is required, the buffer size will be increased as needed, so you
   don't have to get this value exactly right; tuning it will only save a few calls
   to :cfunc:`malloc`.  The default size is 16384.
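
The sketch below illustrates the optional arguments.  It assumes that
:func:`compress` emits an ordinary zlib stream without a preset dictionary,
that is, a 2-byte header, the deflate data, and a 4-byte Adler-32 trailer, so
slicing those off leaves the headerless form that a negative *wbits* expects.
The sample data is made up.

```python
import zlib

# Assumed sample input.
data = b"raw deflate example " * 50
stream = zlib.compress(data)

# Strip the 2-byte zlib header and the 4-byte Adler-32 trailer to get
# the bare deflate data.
raw = stream[2:-4]

# A negative wbits tells zlib not to expect a header or trailer.
restored = zlib.decompress(raw, -15)

# wbits and bufsize can also be given for an ordinary zlib stream;
# bufsize only sets the initial output buffer size.
same = zlib.decompress(stream, 15, 1024)

# Damaged or truncated input raises zlib.error.
try:
    zlib.decompress(stream[:-1])
    truncated_ok = True
except zlib.error:
    truncated_ok = False
```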


.. function:: decompressobj([wbits])

   Returns a decompression object, to be used for decompressing data streams that
   won't fit into memory at once.  The *wbits* parameter controls the size of the
   window buffer.

Compression objects support the following methods:


.. method:: Compress.compress(string)

   Compress *string*, returning a string containing compressed data for at least
   part of the data in *string*.  This data should be concatenated to the output
   produced by any preceding calls to the :meth:`compress` method.  Some input may
   be kept in internal buffers for later processing.


.. method:: Compress.flush([mode])

   All pending input is processed, and a string containing the remaining compressed
   output is returned.  *mode* can be selected from the constants
   :const:`Z_SYNC_FLUSH`, :const:`Z_FULL_FLUSH`, or :const:`Z_FINISH`,
   defaulting to :const:`Z_FINISH`.  :const:`Z_SYNC_FLUSH` and
   :const:`Z_FULL_FLUSH` allow compressing further strings of data, while
   :const:`Z_FINISH` finishes the compressed stream and prevents compressing any
   more data.  After calling :meth:`flush` with *mode* set to :const:`Z_FINISH`,
   the :meth:`compress` method cannot be called again; the only realistic action is
   to delete the object.
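
The two methods are typically used together, as in the sketch below.  The
helper name ``compress_chunks`` and the sample chunks are hypothetical; the
second half shows :const:`Z_SYNC_FLUSH` keeping a stream open between messages.

```python
import zlib

def compress_chunks(chunks, level=6):
    """Yield compressed pieces for an iterable of byte strings.

    Hypothetical helper, not part of the zlib module.
    """
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        piece = compressor.compress(chunk)
        if piece:  # may be empty while input is held in internal buffers
            yield piece
    # Z_FINISH is the default mode: it ends the compressed stream.
    yield compressor.flush()

# Assumed sample input, processed one chunk at a time.
chunks = [b"alpha " * 1000, b"beta " * 1000, b"gamma " * 1000]
compressed = b"".join(compress_chunks(chunks))
restored = zlib.decompress(compressed)

# Z_SYNC_FLUSH emits everything compressed so far but keeps the stream
# open, so a receiver can decode one message before the next is sent.
sender = zlib.compressobj()
receiver = zlib.decompressobj()
first = sender.compress(b"first message") + sender.flush(zlib.Z_SYNC_FLUSH)
first_decoded = receiver.decompress(first)
second = sender.compress(b"second message") + sender.flush()
second_decoded = receiver.decompress(second)
```

Flushing with :const:`Z_SYNC_FLUSH` after every message costs some compression
efficiency, so it is best reserved for cases where the receiver must not wait.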


.. method:: Compress.copy()

   Returns a copy of the compression object.  This can be used to efficiently
   compress a set of data that share a common initial prefix.

   .. versionadded:: 2.5
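
One way to use :meth:`copy` for data with a shared prefix is sketched here.
The prefix and record contents are invented for illustration.

```python
import zlib

# Assumed sample data: a common prefix followed by different bodies.
prefix = b"HEADER: shared by every record\n" * 20
bodies = [b"record one", b"record two"]

# Compress the shared prefix once.
base = zlib.compressobj()
head = base.compress(prefix)

streams = []
for body in bodies:
    # Each copy continues from the state reached after the prefix.
    branch = base.copy()
    streams.append(head + branch.compress(body) + branch.flush())

# Every stream is complete and decompresses on its own.
restored = [zlib.decompress(stream) for stream in streams]
```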

Decompression objects support the following methods, and two attributes:


.. attribute:: Decompress.unused_data

   A string which contains any bytes past the end of the compressed data.  That is,
   this remains ``""`` until the last byte that contains compression data is
   available.  If the whole string turned out to contain compressed data, this is
   ``""``, the empty string.

   The only way to determine where a string of compressed data ends is by actually
   decompressing it.  This means that when compressed data is contained as part of
   a larger file, you can only find the end of it by reading data and feeding it,
   followed by some non-empty string, into a decompression object's
   :meth:`decompress` method until the :attr:`unused_data` attribute is no longer
   the empty string.
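
The search for the end of the compressed data could be sketched as follows.
The payload, the trailing bytes, and the 16-byte read size are all assumptions
made for the example.

```python
import zlib

# Assumed sample input: compressed data followed by unrelated bytes.
payload = b"compressed section " * 20
trailer = b"TRAILING BYTES"
blob = zlib.compress(payload) + trailer

decompressor = zlib.decompressobj()
pieces = []
remainder = b""

# Feed the blob in small pieces, as if reading from a file.
for start in range(0, len(blob), 16):
    pieces.append(decompressor.decompress(blob[start:start + 16]))
    if decompressor.unused_data:
        # The compressed stream has ended; whatever follows is not
        # part of it.
        remainder = decompressor.unused_data + blob[start + 16:]
        break

restored = b"".join(pieces)
```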


.. attribute:: Decompress.unconsumed_tail

   A string that contains any data that was not consumed by the last
   :meth:`decompress` call because it exceeded the limit for the uncompressed data
   buffer.  This data has not yet been seen by the zlib machinery, so you must feed
   it (possibly with further data concatenated to it) back to a subsequent
   :meth:`decompress` method call in order to get correct output.


.. method:: Decompress.decompress(string[, max_length])

   Decompress *string*, returning a string containing the uncompressed data
   corresponding to at least part of the data in *string*.  This data should be
   concatenated to the output produced by any preceding calls to the
   :meth:`decompress` method.  Some of the input data may be preserved in internal
   buffers for later processing.

   If the optional parameter *max_length* is supplied then the return value will be
   no longer than *max_length*.  This may mean that not all of the compressed input
   can be processed, and unconsumed data will be stored in the attribute
   :attr:`unconsumed_tail`.  This string must be passed to a subsequent call to
   :meth:`decompress` if decompression is to continue.  If *max_length* is not
   supplied then the whole input is decompressed, and :attr:`unconsumed_tail` is an
   empty string.
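
A bounded-memory loop built on *max_length* and :attr:`unconsumed_tail` might
look like this sketch.  The 4096-byte limit and the sample data are arbitrary
choices, not recommendations.

```python
import zlib

# Assumed sample input that expands greatly when decompressed.
data = b"\0" * 100000
compressed = zlib.compress(data)

decompressor = zlib.decompressobj()
pieces = []
pending = compressed

while pending:
    # Produce at most 4096 bytes of output per call.
    pieces.append(decompressor.decompress(pending, 4096))
    # Input that was not processed must be fed back in.
    pending = decompressor.unconsumed_tail

# Collect any output still held inside the decompression object.
pieces.append(decompressor.flush())
restored = b"".join(pieces)
```

In a real program each piece would be written out or processed as it is
produced instead of being collected in a list.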


.. method:: Decompress.flush([length])

   All pending input is processed, and a string containing the remaining
   uncompressed output is returned.  After calling :meth:`flush`, the
   :meth:`decompress` method cannot be called again; the only realistic action is
   to delete the object.

   The optional parameter *length* sets the initial size of the output buffer.


.. method:: Decompress.copy()

   Returns a copy of the decompression object.  This can be used to save the state
   of the decompressor midway through the data stream in order to speed up random
   seeks into the stream at a future point.

   .. versionadded:: 2.5
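
Saving and resuming decompressor state can be sketched as below.  The sample
data and the choice of the midpoint as the save position are illustrative only.

```python
import zlib

# Assumed sample input.
data = b"0123456789" * 1000
compressed = zlib.compress(data)
half = len(compressed) // 2

# Decompress the first half of the stream.
decompressor = zlib.decompressobj()
first = decompressor.decompress(compressed[:half])

# Save the state reached at this point in the stream.
snapshot = decompressor.copy()

# Both objects can now continue independently from the same position.
rest_a = decompressor.decompress(compressed[half:]) + decompressor.flush()
rest_b = snapshot.decompress(compressed[half:]) + snapshot.flush()
```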


.. seealso::

   Module :mod:`gzip`
      Reading and writing :program:`gzip`\ -format files.

   http://www.zlib.net
      The zlib library home page.

   http://www.zlib.net/manual.html
      The zlib manual explains the semantics and usage of the library's many
      functions.