:mod:`bz2` --- Support for :program:`bzip2` compression
=======================================================

.. module:: bz2
   :synopsis: Interfaces for bzip2 compression and decompression.

.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>

**Source code:** :source:`Lib/bz2.py`

--------------

This module provides a comprehensive interface for compressing and
decompressing data using the bzip2 compression algorithm.

The :mod:`bz2` module contains:

* The :func:`.open` function and :class:`BZ2File` class for reading and
  writing compressed files.
* The :class:`BZ2Compressor` and :class:`BZ2Decompressor` classes for
  incremental (de)compression.
* The :func:`compress` and :func:`decompress` functions for one-shot
  (de)compression.

All of the classes in this module may safely be accessed from multiple threads.


(De)compression of files
------------------------

.. function:: open(filename, mode='r', compresslevel=9, encoding=None, errors=None, newline=None)

   Open a bzip2-compressed file in binary or text mode, returning a :term:`file
   object`.

   As with the constructor for :class:`BZ2File`, the *filename* argument can be
   an actual filename (a :class:`str` or :class:`bytes` object), or an existing
   file object to read from or write to.

   The *mode* argument can be any of ``'r'``, ``'rb'``, ``'w'``, ``'wb'``,
   ``'x'``, ``'xb'``, ``'a'`` or ``'ab'`` for binary mode, or ``'rt'``,
   ``'wt'``, ``'xt'``, or ``'at'`` for text mode. The default is ``'rb'``.

   The *compresslevel* argument is an integer from 1 to 9, as for the
   :class:`BZ2File` constructor.

   For binary mode, this function is equivalent to the :class:`BZ2File`
   constructor: ``BZ2File(filename, mode, compresslevel=compresslevel)``. In
   this case, the *encoding*, *errors* and *newline* arguments must not be
   provided.

   For text mode, a :class:`BZ2File` object is created, and wrapped in an
   :class:`io.TextIOWrapper` instance with the specified encoding, error
   handling behavior, and line ending(s).

   .. versionadded:: 3.3

   .. versionchanged:: 3.4
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

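As a minimal sketch of the text and binary modes described above, the
following writes a small file in text mode and reads it back both ways. The
file name ``example.txt.bz2`` and the temporary directory are assumptions made
for illustration only.

```python
import bz2
import os
import tempfile

# Hypothetical file name inside a throwaway directory, for illustration only.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.txt.bz2")

    # Text mode: str data is encoded as UTF-8 and then compressed.
    with bz2.open(path, "wt", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    # Text mode again: decompressed bytes are decoded back to str.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        text_lines = f.readlines()

    # Binary mode (the default, 'rb'): the same content comes back as bytes.
    with bz2.open(path) as f:
        raw = f.read()
```

Passing *encoding* only makes sense in the text-mode calls; the binary-mode
call omits it, as required above.
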

.. class:: BZ2File(filename, mode='r', buffering=None, compresslevel=9)

   Open a bzip2-compressed file in binary mode.

   If *filename* is a :class:`str` or :class:`bytes` object, open the named file
   directly. Otherwise, *filename* should be a :term:`file object`, which will
   be used to read or write the compressed data.

   The *mode* argument can be either ``'r'`` for reading (default), ``'w'`` for
   overwriting, ``'x'`` for exclusive creation, or ``'a'`` for appending. These
   can equivalently be given as ``'rb'``, ``'wb'``, ``'xb'`` and ``'ab'``
   respectively.

   If *filename* is a file object (rather than an actual file name), a mode of
   ``'w'`` does not truncate the file, and is instead equivalent to ``'a'``.

   The *buffering* argument is ignored. Its use is deprecated.

   If *mode* is ``'w'``, ``'x'`` or ``'a'``, *compresslevel* can be an integer
   between ``1`` and ``9`` specifying the level of compression: ``1`` produces
   the least compression, and ``9`` (default) produces the most compression.

   If *mode* is ``'r'``, the input file may be the concatenation of multiple
   compressed streams.

   :class:`BZ2File` provides all of the members specified by
   :class:`io.BufferedIOBase`, except for :meth:`detach` and :meth:`truncate`.
   Iteration and the :keyword:`with` statement are supported.

   :class:`BZ2File` also provides the following method:

   .. method:: peek([n])

      Return buffered data without advancing the file position. At least one
      byte of data will be returned (unless at EOF). The exact number of bytes
      returned is unspecified.

      .. note:: While calling :meth:`peek` does not change the file position of
         the :class:`BZ2File`, it may change the position of the underlying file
         object (e.g. if the :class:`BZ2File` was constructed by passing a file
         object for *filename*).

      .. versionadded:: 3.3

   .. versionchanged:: 3.1
      Support for the :keyword:`with` statement was added.

   .. versionchanged:: 3.3
      The :meth:`fileno`, :meth:`readable`, :meth:`seekable`, :meth:`writable`,
      :meth:`read1` and :meth:`readinto` methods were added.

   .. versionchanged:: 3.3
      Support was added for *filename* being a :term:`file object` instead of an
      actual filename.

   .. versionchanged:: 3.3
      The ``'a'`` (append) mode was added, along with support for reading
      multi-stream files.

   .. versionchanged:: 3.4
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.5
      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
      ``None``.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

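The file-object behavior described above can be sketched with an in-memory
buffer. This is an illustration rather than a recommended pattern: the
:class:`io.BytesIO` object stands in for any binary file object, and the
payloads are made up.

```python
import bz2
import io

# An in-memory buffer stands in for any binary file object.
buffer = io.BytesIO()

# First stream. Closing the BZ2File finishes the stream but leaves the
# underlying file object open.
with bz2.BZ2File(buffer, "w") as f:
    f.write(b"hello ")

# A second 'w' on the same file object does not truncate it: the new
# stream is written at the buffer's current position, after the first.
with bz2.BZ2File(buffer, "w", compresslevel=1) as f:
    f.write(b"world")

buffer.seek(0)
with bz2.BZ2File(buffer) as f:
    head = f.peek()      # some buffered bytes; the BZ2File position is unchanged
    contents = f.read()  # both streams, decompressed back to back
```

Because reading supports multi-stream input, ``contents`` holds the data from
both writes, and ``head`` is some non-empty prefix of it.
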

Incremental (de)compression
---------------------------

.. class:: BZ2Compressor(compresslevel=9)

   Create a new compressor object. This object may be used to compress data
   incrementally. For one-shot compression, use the :func:`compress` function
   instead.

   *compresslevel*, if given, must be a number between ``1`` and ``9``. The
   default is ``9``.

   .. method:: compress(data)

      Provide data to the compressor object. Returns a chunk of compressed data
      if possible, or an empty byte string otherwise.

      When you have finished providing data to the compressor, call the
      :meth:`flush` method to finish the compression process.


   .. method:: flush()

      Finish the compression process. Returns the compressed data left in
      internal buffers.

      The compressor object may not be used after this method has been called.

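A minimal sketch of the :meth:`compress` / :meth:`flush` protocol follows. The
helper name ``compress_chunks`` and the sample data are assumptions made for
illustration and are not part of the module.

```python
import bz2


def compress_chunks(chunks, compresslevel=9):
    """Compress an iterable of bytes objects into a single bzip2 stream.

    Hypothetical helper, shown only to illustrate the calling sequence.
    """
    compressor = bz2.BZ2Compressor(compresslevel)
    pieces = []
    for chunk in chunks:
        # May return b'' while the compressor is still buffering input.
        pieces.append(compressor.compress(chunk))
    # flush() emits whatever is left; the compressor is unusable afterwards.
    pieces.append(compressor.flush())
    return b"".join(pieces)


chunks = [b"spam " * 1000, b"eggs " * 1000, b"ham " * 1000]
compressed = compress_chunks(chunks)
```

The result is one complete stream, so it can be handed to :func:`decompress`
or written to a file as-is.
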

.. class:: BZ2Decompressor()

   Create a new decompressor object. This object may be used to decompress data
   incrementally. For one-shot decompression, use the :func:`decompress`
   function instead.

   .. note::
      This class does not transparently handle inputs containing multiple
      compressed streams, unlike :func:`decompress` and :class:`BZ2File`. If
      you need to decompress a multi-stream input with :class:`BZ2Decompressor`,
      you must use a new decompressor for each stream.

   .. method:: decompress(data, max_length=-1)

      Decompress *data* (a :term:`bytes-like object`), returning
      uncompressed data as bytes. Some of *data* may be buffered
      internally, for use in later calls to :meth:`decompress`. The
      returned data should be concatenated with the output of any
      previous calls to :meth:`decompress`.

      If *max_length* is nonnegative, returns at most *max_length*
      bytes of decompressed data. If this limit is reached and further
      output can be produced, the :attr:`~.needs_input` attribute will
      be set to ``False``. In this case, the next call to
      :meth:`~.decompress` may provide *data* as ``b''`` to obtain
      more of the output.

      If all of the input data was decompressed and returned (either
      because this was less than *max_length* bytes, or because
      *max_length* was negative), the :attr:`~.needs_input` attribute
      will be set to ``True``.

      Attempting to decompress data after the end of stream is reached
      raises an :exc:`EOFError`. Any data found after the end of the
      stream is ignored and saved in the :attr:`~.unused_data` attribute.

      .. versionchanged:: 3.5
         Added the *max_length* parameter.

   .. attribute:: eof

      ``True`` if the end-of-stream marker has been reached.

      .. versionadded:: 3.3


   .. attribute:: unused_data

      Data found after the end of the compressed stream.

      If this attribute is accessed before the end of the stream has been
      reached, its value will be ``b''``.

   .. attribute:: needs_input

      ``False`` if the :meth:`.decompress` method can provide more
      decompressed data before requiring new compressed input.

      .. versionadded:: 3.5

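The interplay of *max_length*, :attr:`needs_input`, :attr:`eof` and
:attr:`unused_data` can be sketched as a generator that bounds the size of
each output piece and starts a new decompressor for every stream. The name
``iter_decompressed`` and the 1024-byte default are assumptions made for
illustration; the sketch requires Python 3.5 or later for *max_length*.

```python
import bz2


def iter_decompressed(data, max_length=1024):
    """Yield decompressed pieces of at most max_length bytes each.

    Hypothetical helper. Input made of several concatenated streams is
    handled by starting a new decompressor on the unused_data left over
    by the previous one.
    """
    while data:
        decompressor = bz2.BZ2Decompressor()
        piece = decompressor.decompress(data, max_length)
        if piece:
            yield piece
        while not decompressor.eof:
            if decompressor.needs_input:
                # All input was consumed without reaching the end of stream.
                raise EOFError("compressed data ended before the "
                               "end-of-stream marker was reached")
            # Output was capped by max_length: ask for more without new input.
            piece = decompressor.decompress(b"", max_length)
            if piece:
                yield piece
        # Anything after the end of this stream is treated as the next stream.
        data = decompressor.unused_data


two_streams = bz2.compress(b"a" * 5000) + bz2.compress(b"b" * 3000)
pieces = list(iter_decompressed(two_streams, max_length=1024))
```

Unlike :func:`decompress`, this sketch makes no attempt to tolerate trailing
data that is not a bzip2 stream; such data would cause the new decompressor to
raise an error.
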

One-shot (de)compression
------------------------

.. function:: compress(data, compresslevel=9)

   Compress *data*.

   *compresslevel*, if given, must be a number between ``1`` and ``9``. The
   default is ``9``.

   For incremental compression, use a :class:`BZ2Compressor` instead.


.. function:: decompress(data)

   Decompress *data*.

   If *data* is the concatenation of multiple compressed streams, decompress
   all of the streams.

   For incremental decompression, use a :class:`BZ2Decompressor` instead.

   .. versionchanged:: 3.3
      Support for multi-stream inputs was added.

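A short round trip illustrates the one-shot functions, including the
multi-stream behavior of :func:`decompress`. The sample data and variable
names are made up for illustration.

```python
import bz2

original = b"to be or not to be, that is the question\n" * 100

fast = bz2.compress(original, compresslevel=1)
small = bz2.compress(original)  # compresslevel defaults to 9

restored = bz2.decompress(small)

# Concatenated streams decompress to the concatenation of their contents.
joined = bz2.decompress(fast + small)
```

Both compression levels produce valid streams that decompress to the same
bytes; the level only trades speed and memory against output size.
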