:mod:`bz2` --- Compression compatible with :program:`bzip2`
===========================================================

.. module:: bz2
   :synopsis: Interface to compression and decompression routines
              compatible with bzip2.
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>


This module provides a comprehensive interface for the bz2 compression library.
It implements a complete file interface, one-shot (de)compression functions, and
types for sequential (de)compression.

Here is a summary of the features offered by the bz2 module:

* The :class:`BZ2File` class implements a complete file interface, including
  :meth:`~BZ2File.readline`, :meth:`~BZ2File.readlines`,
  :meth:`~BZ2File.writelines`, :meth:`~BZ2File.seek`, etc.;

* The :class:`BZ2File` class implements emulated :meth:`~BZ2File.seek` support;

* The :class:`BZ2File` class implements universal newline support;

* The :class:`BZ2File` class offers optimized line iteration using a readahead
  algorithm;

* Sequential (de)compression is supported by the :class:`BZ2Compressor` and
  :class:`BZ2Decompressor` classes;

* One-shot (de)compression is supported by the :func:`compress` and
  :func:`decompress` functions;

* Thread safety is provided through per-object locking.


(De)compression of files
------------------------

Handling of compressed files is offered by the :class:`BZ2File` class.


.. index::
   single: universal newlines; bz2.BZ2File class

.. class:: BZ2File(filename, mode='r', buffering=0, compresslevel=9)

   Open a bz2 file. The *mode* argument can be either ``'r'`` or ``'w'``, for
   reading (default) or writing. When opened for writing, the file is created
   if it does not exist, and truncated otherwise. If *buffering* is given,
   ``0`` means unbuffered, and larger numbers specify the buffer size; the
   default is ``0``. If *compresslevel* is given, it must be an integer between
   ``1`` and ``9``; the default is ``9``. Add ``'U'`` to *mode* to open the
   file for input in :term:`universal newlines` mode. Any line ending in the
   input file will be seen as ``'\n'`` in Python. Also, a file so opened gains
   the attribute :attr:`newlines`; the value for this attribute is one of
   ``None`` (no newline read yet), ``'\r'``, ``'\n'``, ``'\r\n'`` or a tuple
   containing all the newline types seen. Universal newlines are available
   only when reading. Instances support iteration in the same way as other
   :term:`file objects <file object>`.

   :class:`BZ2File` supports the :keyword:`with` statement.

   .. versionchanged:: 3.1
      Support for the :keyword:`with` statement was added.
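
   A minimal sketch of a write-then-read round trip using the :keyword:`with`
   statement. It assumes a scratch directory from :mod:`tempfile`; the file
   name ``example.bz2`` is hypothetical:

   .. code-block:: python

      import bz2
      import os
      import tempfile

      with tempfile.TemporaryDirectory() as tmp:
          path = os.path.join(tmp, "example.bz2")  # hypothetical file name

          # Mode 'w' creates (or truncates) the file; writes are compressed.
          with bz2.BZ2File(path, "w") as f:
              f.write(b"first line\nsecond line\n")

          # Mode 'r' is the default; reads are decompressed transparently,
          # and iterating yields one line (as bytes) at a time.
          with bz2.BZ2File(path) as f:
              lines = list(f)

      print(lines)  # [b'first line\n', b'second line\n']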


   .. note::

      This class does not support input files containing multiple streams (such
      as those produced by the :program:`pbzip2` tool). When reading such an
      input file, only the first stream will be accessible. If you require
      support for multi-stream files, consider using the third-party
      :mod:`bz2file` module (available from
      `PyPI <http://pypi.python.org/pypi/bz2file>`_). That module provides a
      backport of Python 3.3's :class:`BZ2File` class, which does support
      multi-stream files.
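
   If installing a third-party module is not an option, concatenated streams
   held in memory can still be decoded with :class:`BZ2Decompressor`. The
   approach is to start a fresh decompressor on whatever is left in its
   :attr:`unused_data` attribute. The following sketch assumes the whole input
   fits in memory. The helper name ``decompress_multistream`` is hypothetical
   and is not part of :mod:`bz2`:

   .. code-block:: python

      import bz2

      def decompress_multistream(data):
          """Hypothetical helper: decompress one or more concatenated streams."""
          results = []
          while data:
              decompressor = bz2.BZ2Decompressor()
              results.append(decompressor.decompress(data))
              # Bytes after the end of this stream start the next stream.
              data = decompressor.unused_data
          return b"".join(results)

      # Two streams back to back, a stand-in for multi-stream tool output.
      joined = bz2.compress(b"part one, ") + bz2.compress(b"part two")
      print(decompress_multistream(joined))  # b'part one, part two'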


   .. method:: close()

      Close the file and set the :attr:`closed` attribute to true. A closed
      file cannot be used for further I/O operations. :meth:`close` may be
      called more than once without error.
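
      A minimal sketch of this behaviour. It assumes a throwaway file in a
      temporary directory, and the name ``data.bz2`` is hypothetical:

      .. code-block:: python

         import bz2
         import os
         import tempfile

         with tempfile.TemporaryDirectory() as tmp:
             # hypothetical file name
             f = bz2.BZ2File(os.path.join(tmp, "data.bz2"), "w")
             f.write(b"payload")
             f.close()
             f.close()  # calling close() again is harmless
             closed_flag = f.closed

             write_failed = False
             try:
                 f.write(b"more")  # I/O on a closed file is rejected
             except ValueError:
                 write_failed = True

         print(closed_flag, write_failed)  # True True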


   .. method:: read([size])

      Read at most *size* uncompressed bytes, returned as a byte string. If the
      *size* argument is negative or omitted, read until EOF is reached.
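
      For example, here is a bounded read followed by an unbounded one, using
      made-up file contents:

      .. code-block:: python

         import bz2
         import os
         import tempfile

         with tempfile.TemporaryDirectory() as tmp:
             path = os.path.join(tmp, "digits.bz2")  # hypothetical file name
             with bz2.BZ2File(path, "w") as f:
                 f.write(b"0123456789")

             with bz2.BZ2File(path) as f:
                 head = f.read(4)  # at most 4 uncompressed bytes
                 rest = f.read()   # everything up to EOF
                 tail = f.read()   # already at EOF: empty byte string

         print(head, rest, tail)  # b'0123' b'456789' b''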


   .. method:: readline([size])

      Return the next line from the file as a byte string, retaining the
      trailing newline. A non-negative *size* argument limits the maximum
      number of bytes to return, in which case an incomplete line may be
      returned. Return an empty byte string at EOF.
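
      The following short sketch shows the effect of the *size* limit, again
      using made-up contents:

      .. code-block:: python

         import bz2
         import os
         import tempfile

         with tempfile.TemporaryDirectory() as tmp:
             path = os.path.join(tmp, "lines.bz2")  # hypothetical file name
             with bz2.BZ2File(path, "w") as f:
                 f.write(b"alpha\nbeta\n")

             with bz2.BZ2File(path) as f:
                 partial = f.readline(3)   # limit hit first: incomplete line
                 remainder = f.readline()  # rest of that line, newline kept
                 second = f.readline()
                 at_eof = f.readline()     # empty byte string at EOF

         print(partial, remainder, second, at_eof)
         # b'alp' b'ha\n' b'beta\n' b''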


   .. method:: readlines([size])

      Return a list of lines read. The optional *size* argument, if given, is an
      approximate bound on the total number of bytes in the lines returned.
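
      The following sketch contrasts an unbounded call with a bounded one.
      Because the bound is only approximate, the example checks just that the
      bounded result is a prefix of the full list:

      .. code-block:: python

         import bz2
         import os
         import tempfile

         with tempfile.TemporaryDirectory() as tmp:
             path = os.path.join(tmp, "many.bz2")  # hypothetical file name
             with bz2.BZ2File(path, "w") as f:
                 f.write(b"".join(("line %d\n" % i).encode("ascii")
                                  for i in range(100)))

             with bz2.BZ2File(path) as f:
                 all_lines = f.readlines()
             with bz2.BZ2File(path) as f:
                 some_lines = f.readlines(20)  # stops after roughly 20 bytes

         print(len(all_lines))  # 100
         print(some_lines == all_lines[:len(some_lines)])  # True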


   .. method:: seek(offset[, whence])

      Move to a new file position. The *offset* argument is a count of
      uncompressed bytes. The optional argument *whence* defaults to
      ``os.SEEK_SET`` or ``0`` (offset from start of file; offset should be
      ``>= 0``); other values are ``os.SEEK_CUR`` or ``1`` (move relative to
      current position; offset can be positive or negative), and
      ``os.SEEK_END`` or ``2`` (move relative to end of file; offset is usually
      negative, although many platforms allow seeking beyond the end of a file).

      Note that seeking of bz2 files is emulated by decompressing data up to
      the target position. Depending on the parameters, the operation may
      therefore be extremely slow. In particular, seeking backwards may require
      decompressing the file again from the beginning.
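
      The following sketch exercises the three *whence* modes on a file opened
      for reading. The file contents are made up for illustration:

      .. code-block:: python

         import bz2
         import os
         import tempfile

         with tempfile.TemporaryDirectory() as tmp:
             path = os.path.join(tmp, "letters.bz2")  # hypothetical file name
             with bz2.BZ2File(path, "w") as f:
                 f.write(b"abcdefghij")

             with bz2.BZ2File(path) as f:
                 f.seek(4)                 # os.SEEK_SET: absolute position
                 a = f.read(2)
                 f.seek(-4, os.SEEK_CUR)   # backwards seek: emulated, may be slow
                 b = f.read(2)
                 f.seek(-3, os.SEEK_END)   # relative to the end of the data
                 c = f.read()

         print(a, b, c)  # b'ef' b'cd' b'hij'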


   .. method:: tell()

      Return the current position in the uncompressed data, as an integer.
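
      File positions count uncompressed bytes, so :meth:`tell` can differ
      greatly from the size of the compressed file on disk. The following
      sketch uses deliberately repetitive sample data, chosen as an assumption
      because it compresses well:

      .. code-block:: python

         import bz2
         import os
         import tempfile

         data = b"x" * 1000  # highly compressible sample input
         with tempfile.TemporaryDirectory() as tmp:
             path = os.path.join(tmp, "repeat.bz2")  # hypothetical file name
             with bz2.BZ2File(path, "w") as f:
                 f.write(data)
             on_disk = os.path.getsize(path)

             with bz2.BZ2File(path) as f:
                 f.read()
                 position = f.tell()

         print(position, on_disk < position)  # 1000 True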


   .. method:: write(data)

      Write the byte string *data* to the file. Note that due to buffering,
      :meth:`close` may be needed before the file on disk reflects the data
      written.
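
      The following minimal sketch encodes text to bytes before writing it.
      The compressed file is complete only after :meth:`close` has run:

      .. code-block:: python

         import bz2
         import os
         import tempfile

         with tempfile.TemporaryDirectory() as tmp:
             path = os.path.join(tmp, "out.bz2")  # hypothetical file name
             f = bz2.BZ2File(path, "w")
             f.write(b"hello, ")
             f.write("world".encode("utf-8"))  # str must be encoded first
             f.close()  # flushes the compressor's remaining output

             with open(path, "rb") as raw:
                 restored = bz2.decompress(raw.read())

         print(restored)  # b'hello, world'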


   .. method:: writelines(sequence_of_byte_strings)

      Write the sequence of byte strings to the file. Note that newlines are not
      added. The sequence can be any iterable object producing byte strings.
      This is equivalent to calling :meth:`write` for each byte string.
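
      In the following example (contents made up), a generator serves as the
      iterable. Items are written back to back with nothing inserted between
      them:

      .. code-block:: python

         import bz2
         import os
         import tempfile

         with tempfile.TemporaryDirectory() as tmp:
             path = os.path.join(tmp, "words.bz2")  # hypothetical file name
             with bz2.BZ2File(path, "w") as f:
                 f.writelines(word + b"\n" for word in (b"red", b"green"))
                 f.writelines([b"no", b"newline"])  # no separator is added

             with bz2.BZ2File(path) as f:
                 content = f.read()

         print(content)  # b'red\ngreen\nnonewline'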


Sequential (de)compression
--------------------------

Sequential compression and decompression are done using the classes
:class:`BZ2Compressor` and :class:`BZ2Decompressor`.


.. class:: BZ2Compressor(compresslevel=9)

   Create a new compressor object. This object may be used to compress data
   sequentially. If you want to compress data in one shot, use the
   :func:`compress` function instead. The *compresslevel* parameter, if given,
   must be an integer between ``1`` and ``9``; the default is ``9``.

   .. method:: compress(data)

      Provide more data to the compressor object. It returns chunks of
      compressed data whenever possible, and an empty byte string otherwise.
      When you have finished providing data to compress, call the
      :meth:`flush` method to finish the compression process and return what
      is left in internal buffers.


   .. method:: flush()

      Finish the compression process and return what is left in internal
      buffers. You must not use the compressor object after calling this method.
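
   The following minimal sketch shows sequential compression, assuming the
   input arrives as a series of byte-string chunks. The ``chunks`` list is a
   hypothetical stand-in for a real source such as a socket or a large file:

   .. code-block:: python

      import bz2

      # hypothetical input, standing in for data arriving piece by piece
      chunks = [b"spam " * 1000, b"eggs " * 1000, b"ham " * 1000]

      compressor = bz2.BZ2Compressor(9)  # level 1-9; 9 is also the default
      pieces = []
      for chunk in chunks:
          # Often b'' while the compressor is still filling its current block.
          pieces.append(compressor.compress(chunk))
      pieces.append(compressor.flush())  # remaining output and end of stream

      compressed = b"".join(pieces)
      print(bz2.decompress(compressed) == b"".join(chunks))  # True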


.. class:: BZ2Decompressor()

   Create a new decompressor object. This object may be used to decompress data
   sequentially. If you want to decompress data in one shot, use the
   :func:`decompress` function instead.

   .. method:: decompress(data)

      Provide more data to the decompressor object. It returns chunks of
      decompressed data whenever possible. If you try to decompress data after
      the end of stream is found, :exc:`EOFError` is raised. Any data found
      after the end of stream is ignored and saved in the :attr:`unused_data`
      attribute.
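
      The following sketch, using made-up data, covers the three behaviours
      described above: chunked input, the :attr:`unused_data` attribute, and
      :exc:`EOFError`:

      .. code-block:: python

         import bz2

         compressed = bz2.compress(b"stream payload " * 100)

         # Feed the stream in 16-byte chunks, as if it arrived over a network.
         decompressor = bz2.BZ2Decompressor()
         restored = b"".join(decompressor.decompress(compressed[i:i + 16])
                             for i in range(0, len(compressed), 16))

         # Bytes after the end of stream are kept, not decompressed.
         trailing = bz2.BZ2Decompressor()
         trailing.decompress(compressed + b"TRAILER")
         leftover = trailing.unused_data

         # Once the end of stream has been seen, further input is an error.
         hit_eof = False
         try:
             trailing.decompress(b"more")
         except EOFError:
             hit_eof = True

         print(restored == b"stream payload " * 100, leftover, hit_eof)
         # True b'TRAILER' True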


One-shot (de)compression
------------------------

One-shot compression and decompression are provided through the :func:`compress`
and :func:`decompress` functions.


.. function:: compress(data, compresslevel=9)

   Compress *data* in one shot. If you want to compress data sequentially, use
   an instance of :class:`BZ2Compressor` instead. The *compresslevel* parameter,
   if given, must be an integer between ``1`` and ``9``; the default is ``9``.
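
   The following example uses repetitive sample data chosen for illustration.
   The bzip2 stream header is ``BZh`` followed by a digit giving the level:

   .. code-block:: python

      import bz2

      data = b"All work and no play makes Jack a dull boy. " * 500

      fastest = bz2.compress(data, 1)  # level 1: smallest block size
      default = bz2.compress(data)     # level 9

      print(fastest[:4], default[:4])  # b'BZh1' b'BZh9'
      print(len(default) < len(data))  # True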


.. function:: decompress(data)

   Decompress *data* in one shot. If you want to decompress data sequentially,
   use an instance of :class:`BZ2Decompressor` instead.
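
   The following round-trip sketch relies on :func:`decompress` returning
   bytes. Text must therefore be encoded before compression and decoded
   afterwards. UTF-8 is an assumption here, not a requirement of the module:

   .. code-block:: python

      import bz2

      text = "naïve café, compressed and restored"
      compressed = bz2.compress(text.encode("utf-8"))
      restored = bz2.decompress(compressed).decode("utf-8")

      print(restored == text)  # True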