:mod:`lzma` --- Compression using the LZMA algorithm
====================================================

.. module:: lzma
   :synopsis: A Python wrapper for the liblzma compression library.

.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>
.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>

.. versionadded:: 3.3

**Source code:** :source:`Lib/lzma.py`

--------------

This module provides classes and convenience functions for compressing and
decompressing data using the LZMA compression algorithm. Also included is a file
interface supporting the ``.xz`` and legacy ``.lzma`` file formats used by the
:program:`xz` utility, as well as raw compressed streams.

The interface provided by this module is very similar to that of the :mod:`bz2`
module. However, note that :class:`LZMAFile` is *not* thread-safe, unlike
:class:`bz2.BZ2File`, so if you need to use a single :class:`LZMAFile` instance
from multiple threads, it is necessary to protect it with a lock.

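The locking advice above can be sketched as follows. This is only an
illustration: the ``write_record`` helper, the record contents and the use of an
in-memory :class:`io.BytesIO` target are assumptions made for the example, not
part of the module.

```python
import io
import lzma
import threading

buffer = io.BytesIO()
lock = threading.Lock()

def write_record(lzf, record):
    # Hypothetical helper: only one thread at a time may touch the LZMAFile.
    with lock:
        lzf.write(record)

with lzma.LZMAFile(buffer, "w") as lzf:
    threads = [
        threading.Thread(target=write_record, args=(lzf, b"record %d\n" % i))
        for i in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# The order of the records depends on thread scheduling, so sort before use.
lines = sorted(lzma.decompress(buffer.getvalue()).splitlines())
```

The lock serializes whole ``write()`` calls, so each record arrives intact, but
the order in which records are written still depends on the scheduler.
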

.. exception:: LZMAError

   This exception is raised when an error occurs during compression or
   decompression, or while initializing the compressor/decompressor state.

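As a minimal sketch of handling this exception, the example below truncates a
valid stream and catches the resulting error. The payload and the amount of
truncation are arbitrary choices for illustration.

```python
import lzma

good = lzma.compress(b"payload")
truncated = good[:-4]  # drop the end of the stream to simulate a damaged file

try:
    lzma.decompress(truncated)
except lzma.LZMAError as exc:
    error = exc      # decompression failed; keep the exception for inspection
else:
    error = None

restored = lzma.decompress(good)
```
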

Reading and writing compressed files
------------------------------------

.. function:: open(filename, mode="rb", \*, format=None, check=-1, preset=None, filters=None, encoding=None, errors=None, newline=None)

   Open an LZMA-compressed file in binary or text mode, returning a :term:`file
   object`.

   The *filename* argument can be either an actual file name (given as a
   :class:`str`, :class:`bytes` or :term:`path-like <path-like object>` object), in
   which case the named file is opened, or it can be an existing file object
   to read from or write to.

   The *mode* argument can be any of ``"r"``, ``"rb"``, ``"w"``, ``"wb"``,
   ``"x"``, ``"xb"``, ``"a"`` or ``"ab"`` for binary mode, or ``"rt"``,
   ``"wt"``, ``"xt"``, or ``"at"`` for text mode. The default is ``"rb"``.

   When opening a file for reading, the *format* and *filters* arguments have
   the same meanings as for :class:`LZMADecompressor`. In this case, the *check*
   and *preset* arguments should not be used.

   When opening a file for writing, the *format*, *check*, *preset* and
   *filters* arguments have the same meanings as for :class:`LZMACompressor`.

   For binary mode, this function is equivalent to the :class:`LZMAFile`
   constructor: ``LZMAFile(filename, mode, ...)``. In this case, the *encoding*,
   *errors* and *newline* arguments must not be provided.

   For text mode, a :class:`LZMAFile` object is created, and wrapped in an
   :class:`io.TextIOWrapper` instance with the specified encoding, error
   handling behavior, and line ending(s).

   .. versionchanged:: 3.4
      Added support for the ``"x"``, ``"xb"`` and ``"xt"`` modes.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

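The difference between text and binary mode can be sketched as follows. The
file name ``notes.txt.xz`` and the temporary directory are assumptions made so
that the example is self-contained.

```python
import lzma
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "notes.txt.xz")  # hypothetical file name

    # Text mode: the LZMAFile is wrapped in an io.TextIOWrapper.
    with lzma.open(path, "wt", encoding="utf-8", newline="\n") as f:
        f.write("first line\n")
        f.write("second line\n")

    with lzma.open(path, "rt", encoding="utf-8") as f:
        lines = f.readlines()

    # Binary mode: the same file yields the encoded bytes.
    with lzma.open(path, "rb") as f:
        raw = f.read()
```

Passing *encoding* explicitly avoids depending on the platform default when the
file is read back on another system.
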

.. class:: LZMAFile(filename=None, mode="r", \*, format=None, check=-1, preset=None, filters=None)

   Open an LZMA-compressed file in binary mode.

   An :class:`LZMAFile` can wrap an already-open :term:`file object`, or operate
   directly on a named file. The *filename* argument specifies either the file
   object to wrap, or the name of the file to open (as a :class:`str`,
   :class:`bytes` or :term:`path-like <path-like object>` object). When wrapping an
   existing file object, the wrapped file will not be closed when the
   :class:`LZMAFile` is closed.

   The *mode* argument can be either ``"r"`` for reading (default), ``"w"`` for
   overwriting, ``"x"`` for exclusive creation, or ``"a"`` for appending. These
   can equivalently be given as ``"rb"``, ``"wb"``, ``"xb"`` and ``"ab"``
   respectively.

   If *filename* is a file object (rather than an actual file name), a mode of
   ``"w"`` does not truncate the file, and is instead equivalent to ``"a"``.

   When opening a file for reading, the input file may be the concatenation of
   multiple separate compressed streams. These are transparently decoded as a
   single logical stream.

   When opening a file for reading, the *format* and *filters* arguments have
   the same meanings as for :class:`LZMADecompressor`. In this case, the *check*
   and *preset* arguments should not be used.

   When opening a file for writing, the *format*, *check*, *preset* and
   *filters* arguments have the same meanings as for :class:`LZMACompressor`.

   :class:`LZMAFile` supports all the members specified by
   :class:`io.BufferedIOBase`, except for :meth:`detach` and :meth:`truncate`.
   Iteration and the :keyword:`with` statement are supported.

   The following method is also provided:

   .. method:: peek(size=-1)

      Return buffered data without advancing the file position. At least one
      byte of data will be returned, unless EOF has been reached. The exact
      number of bytes returned is unspecified (the *size* argument is ignored).

      .. note:: While calling :meth:`peek` does not change the file position of
         the :class:`LZMAFile`, it may change the position of the underlying
         file object (e.g. if the :class:`LZMAFile` was constructed by passing a
         file object for *filename*).

   .. versionchanged:: 3.4
      Added support for the ``"x"`` and ``"xb"`` modes.

   .. versionchanged:: 3.5
      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
      ``None``.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

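Several of the behaviors described for :class:`LZMAFile` can be seen together in
a short sketch. It assumes an in-memory :class:`io.BytesIO` object as the
wrapped file, chosen only to keep the example self-contained.

```python
import io
import lzma

buffer = io.BytesIO()

# Two separate streams written to the same file object. With a file object,
# mode "w" does not truncate, so the second stream follows the first.
with lzma.LZMAFile(buffer, "w") as lzf:
    lzf.write(b"first stream\n")
with lzma.LZMAFile(buffer, "w") as lzf:
    lzf.write(b"second stream\n")

# Closing the LZMAFile leaves the wrapped file object open.
still_open = not buffer.closed

buffer.seek(0)
with lzma.LZMAFile(buffer) as lzf:
    head = lzf.peek()                  # look ahead without consuming data
    position_after_peek = lzf.tell()   # the LZMAFile position is unchanged
    content = lzf.read()               # both streams, decoded as one
```
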

Compressing and decompressing data in memory
--------------------------------------------

.. class:: LZMACompressor(format=FORMAT_XZ, check=-1, preset=None, filters=None)

   Create a compressor object, which can be used to compress data incrementally.

   For a more convenient way of compressing a single chunk of data, see
   :func:`compress`.

   The *format* argument specifies what container format should be used.
   Possible values are:

   * :const:`FORMAT_XZ`: The ``.xz`` container format.
      This is the default format.

   * :const:`FORMAT_ALONE`: The legacy ``.lzma`` container format.
      This format is more limited than ``.xz`` -- it does not support integrity
      checks or multiple filters.

   * :const:`FORMAT_RAW`: A raw data stream, not using any container format.
      This format specifier does not support integrity checks, and requires that
      you always specify a custom filter chain (for both compression and
      decompression). Additionally, data compressed in this manner cannot be
      decompressed using :const:`FORMAT_AUTO` (see :class:`LZMADecompressor`).

   The *check* argument specifies the type of integrity check to include in the
   compressed data. This check is used when decompressing, to ensure that the
   data has not been corrupted. Possible values are:

   * :const:`CHECK_NONE`: No integrity check.
     This is the default (and the only acceptable value) for
     :const:`FORMAT_ALONE` and :const:`FORMAT_RAW`.

   * :const:`CHECK_CRC32`: 32-bit Cyclic Redundancy Check.

   * :const:`CHECK_CRC64`: 64-bit Cyclic Redundancy Check.
     This is the default for :const:`FORMAT_XZ`.

   * :const:`CHECK_SHA256`: 256-bit Secure Hash Algorithm.

   If the specified check is not supported, an :class:`LZMAError` is raised.

   The compression settings can be specified either as a preset compression
   level (with the *preset* argument), or in detail as a custom filter chain
   (with the *filters* argument).

   The *preset* argument (if provided) should be an integer between ``0`` and
   ``9`` (inclusive), optionally OR-ed with the constant
   :const:`PRESET_EXTREME`. If neither *preset* nor *filters* is given, the
   default behavior is to use :const:`PRESET_DEFAULT` (preset level ``6``).
   Higher presets produce smaller output, but make the compression process
   slower.

   .. note::

      In addition to being more CPU-intensive, compression with higher presets
      also requires much more memory (and produces output that needs more memory
      to decompress). With preset ``9`` for example, the overhead for an
      :class:`LZMACompressor` object can be as high as 800 MiB. For this reason,
      it is generally best to stick with the default preset.

   The *filters* argument (if provided) should be a filter chain specifier.
   See :ref:`filter-chain-specs` for details.

   .. method:: compress(data)

      Compress *data* (a :class:`bytes` object), returning a :class:`bytes`
      object containing compressed data for at least part of the input. Some of
      *data* may be buffered internally, for use in later calls to
      :meth:`compress` and :meth:`flush`. The returned data should be
      concatenated with the output of any previous calls to :meth:`compress`.

   .. method:: flush()

      Finish the compression process, returning a :class:`bytes` object
      containing any data stored in the compressor's internal buffers.

      The compressor cannot be used after this method has been called.

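The *format*, *check* and *preset* arguments can be combined as in the sketch
below. The chunk contents and the particular preset are arbitrary choices for
illustration, not recommendations.

```python
import lzma

chunks = [b"alpha " * 100, b"beta " * 100, b"gamma " * 100]
data = b"".join(chunks)

# An .xz stream with a CRC32 check and a low preset OR-ed with PRESET_EXTREME.
compressor = lzma.LZMACompressor(
    format=lzma.FORMAT_XZ,
    check=lzma.CHECK_CRC32,
    preset=3 | lzma.PRESET_EXTREME,
)
parts = [compressor.compress(chunk) for chunk in chunks]
parts.append(compressor.flush())   # the compressor is unusable after this
xz_data = b"".join(parts)

# The legacy .lzma container has no integrity check, so *check* is left alone.
legacy_data = lzma.compress(data, format=lzma.FORMAT_ALONE)

# The decompressor reports which check the .xz stream carries.
decompressor = lzma.LZMADecompressor()
restored = decompressor.decompress(xz_data)
```

Individual calls to ``compress()`` may return ``b""`` while input is being
buffered, which is why the pieces are joined only after ``flush()``.
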

.. class:: LZMADecompressor(format=FORMAT_AUTO, memlimit=None, filters=None)

   Create a decompressor object, which can be used to decompress data
   incrementally.

   For a more convenient way of decompressing an entire compressed stream at
   once, see :func:`decompress`.

   The *format* argument specifies the container format that should be used. The
   default is :const:`FORMAT_AUTO`, which can decompress both ``.xz`` and
   ``.lzma`` files. Other possible values are :const:`FORMAT_XZ`,
   :const:`FORMAT_ALONE`, and :const:`FORMAT_RAW`.

   The *memlimit* argument specifies a limit (in bytes) on the amount of memory
   that the decompressor can use. When this argument is used, decompression will
   fail with an :class:`LZMAError` if it is not possible to decompress the input
   within the given memory limit.

   The *filters* argument specifies the filter chain that was used to create
   the stream being decompressed. This argument is required if *format* is
   :const:`FORMAT_RAW`, but should not be used for other formats.
   See :ref:`filter-chain-specs` for more information about filter chains.

   .. note::
      This class does not transparently handle inputs containing multiple
      compressed streams, unlike :func:`decompress` and :class:`LZMAFile`. To
      decompress a multi-stream input with :class:`LZMADecompressor`, you must
      create a new decompressor for each stream.

   .. method:: decompress(data, max_length=-1)

      Decompress *data* (a :term:`bytes-like object`), returning
      uncompressed data as bytes. Some of *data* may be buffered
      internally, for use in later calls to :meth:`decompress`. The
      returned data should be concatenated with the output of any
      previous calls to :meth:`decompress`.

      If *max_length* is nonnegative, returns at most *max_length*
      bytes of decompressed data. If this limit is reached and further
      output can be produced, the :attr:`~.needs_input` attribute will
      be set to ``False``. In this case, the next call to
      :meth:`~.decompress` may provide *data* as ``b''`` to obtain
      more of the output.

      If all of the input data was decompressed and returned (either
      because this was less than *max_length* bytes, or because
      *max_length* was negative), the :attr:`~.needs_input` attribute
      will be set to ``True``.

      Attempting to decompress data after the end of stream is reached
      raises an :exc:`EOFError`. Any data found after the end of the
      stream is ignored and saved in the :attr:`~.unused_data` attribute.

      .. versionchanged:: 3.5
         Added the *max_length* parameter.

   .. attribute:: check

      The ID of the integrity check used by the input stream. This may be
      :const:`CHECK_UNKNOWN` until enough of the input has been decoded to
      determine what integrity check it uses.

   .. attribute:: eof

      ``True`` if the end-of-stream marker has been reached.

   .. attribute:: unused_data

      Data found after the end of the compressed stream.

      Before the end of the stream is reached, this will be ``b""``.

   .. attribute:: needs_input

      ``False`` if the :meth:`.decompress` method can provide more
      decompressed data before requiring new compressed input.

      .. versionadded:: 3.5

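The interaction between *max_length* and :attr:`~LZMADecompressor.needs_input`
described above can be sketched as a generator that bounds the size of each
piece of output. The ``bounded_decompress`` helper, its parameter names and the
chunk sizes are assumptions made for illustration.

```python
import lzma

def bounded_decompress(compressed, max_length=4096, chunk_size=1024):
    """Hypothetical helper: yield pieces of at most max_length bytes."""
    decompressor = lzma.LZMADecompressor()
    offset = 0
    while not decompressor.eof:
        if decompressor.needs_input:
            # The decompressor has used up its input; feed the next chunk.
            chunk = compressed[offset:offset + chunk_size]
            offset += chunk_size
            if not chunk:
                raise EOFError("compressed data ended before the end of stream")
        else:
            # More output is pending from earlier input; supply no new data.
            chunk = b""
        piece = decompressor.decompress(chunk, max_length)
        if piece:
            yield piece

original = b"0123456789" * 10000
compressed = lzma.compress(original)
pieces = list(bounded_decompress(compressed))
```

Bounding each piece keeps memory use predictable when the input may expand to
far more data than the caller wants to hold at once.
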
.. function:: compress(data, format=FORMAT_XZ, check=-1, preset=None, filters=None)

   Compress *data* (a :class:`bytes` object), returning the compressed data as a
   :class:`bytes` object.

   See :class:`LZMACompressor` above for a description of the *format*, *check*,
   *preset* and *filters* arguments.


.. function:: decompress(data, format=FORMAT_AUTO, memlimit=None, filters=None)

   Decompress *data* (a :class:`bytes` object), returning the uncompressed data
   as a :class:`bytes` object.

   If *data* is the concatenation of multiple distinct compressed streams,
   decompress all of these streams, and return the concatenation of the results.

   See :class:`LZMADecompressor` above for a description of the *format*,
   *memlimit* and *filters* arguments.

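The multi-stream behavior of :func:`decompress` can be compared with the
per-stream approach that :class:`LZMADecompressor` requires. The
``decompress_streams`` helper below is a hypothetical sketch, not part of the
module.

```python
import lzma

streams = lzma.compress(b"first ") + lzma.compress(b"second")

# decompress() handles the concatenated streams transparently.
combined = lzma.decompress(streams)

def decompress_streams(data):
    """Hypothetical helper: use one LZMADecompressor per stream."""
    results = []
    while data:
        decompressor = lzma.LZMADecompressor()
        results.append(decompressor.decompress(data))
        if not decompressor.eof:
            raise lzma.LZMAError("compressed data ended before the end of stream")
        # Whatever follows the finished stream belongs to the next one.
        data = decompressor.unused_data
    return results

per_stream = decompress_streams(streams)
```

The manual loop is mainly useful when the boundaries between streams matter to
the caller; otherwise :func:`decompress` is simpler.
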

Miscellaneous
-------------

.. function:: is_check_supported(check)

   Return ``True`` if the given integrity check is supported on this system.

   :const:`CHECK_NONE` and :const:`CHECK_CRC32` are always supported.
   :const:`CHECK_CRC64` and :const:`CHECK_SHA256` may be unavailable if you are
   using a version of :program:`liblzma` that was compiled with a limited
   feature set.

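One way to use this function is to pick the strongest check that the local
:program:`liblzma` offers. The preference order below is an assumption made for
illustration.

```python
import lzma

# Strongest first; CHECK_CRC32 is always supported, so the search cannot fail.
preferred = [lzma.CHECK_SHA256, lzma.CHECK_CRC64, lzma.CHECK_CRC32]
check = next(c for c in preferred if lzma.is_check_supported(c))

compressed = lzma.compress(b"checked payload", check=check)

decompressor = lzma.LZMADecompressor()
payload = decompressor.decompress(compressed)
```
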

.. _filter-chain-specs:

Specifying custom filter chains
-------------------------------

A filter chain specifier is a sequence of dictionaries, where each dictionary
contains the ID and options for a single filter. Each dictionary must contain
the key ``"id"``, and may contain additional keys to specify filter-dependent
options. Valid filter IDs are as follows:

* Compression filters:

  * :const:`FILTER_LZMA1` (for use with :const:`FORMAT_ALONE`)
  * :const:`FILTER_LZMA2` (for use with :const:`FORMAT_XZ` and :const:`FORMAT_RAW`)

* Delta filter:

  * :const:`FILTER_DELTA`

* Branch-Call-Jump (BCJ) filters:

  * :const:`FILTER_X86`
  * :const:`FILTER_IA64`
  * :const:`FILTER_ARM`
  * :const:`FILTER_ARMTHUMB`
  * :const:`FILTER_POWERPC`
  * :const:`FILTER_SPARC`

A filter chain can consist of up to 4 filters, and cannot be empty. The last
filter in the chain must be a compression filter, and any other filters must be
delta or BCJ filters.

Compression filters support the following options (specified as additional
entries in the dictionary representing the filter):

* ``preset``: A compression preset to use as a source of default values for
  options that are not specified explicitly.
* ``dict_size``: Dictionary size in bytes. This should be between 4 KiB and
  1.5 GiB (inclusive).
* ``lc``: Number of literal context bits.
* ``lp``: Number of literal position bits. The sum ``lc + lp`` must be at
  most 4.
* ``pb``: Number of position bits; must be at most 4.
* ``mode``: :const:`MODE_FAST` or :const:`MODE_NORMAL`.
* ``nice_len``: What should be considered a "nice length" for a match.
  This should be 273 or less.
* ``mf``: What match finder to use -- :const:`MF_HC3`, :const:`MF_HC4`,
  :const:`MF_BT2`, :const:`MF_BT3`, or :const:`MF_BT4`.
* ``depth``: Maximum search depth used by match finder. 0 (default) means to
  select automatically based on other filter options.

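The options above can be combined in a single compression filter, as in the
following sketch. The specific values are arbitrary and chosen only to stay
within the documented limits; they are not tuning advice. Because
:const:`FORMAT_RAW` stores no header, the same chain has to be supplied again
for decompression.

```python
import lzma

filters = [
    {
        "id": lzma.FILTER_LZMA2,
        "preset": 6,            # source of defaults for unspecified options
        "dict_size": 1 << 20,   # 1 MiB
        "lc": 3,
        "lp": 0,                # lc + lp must be at most 4
        "pb": 2,
        "mode": lzma.MODE_NORMAL,
        "nice_len": 128,
        "mf": lzma.MF_BT4,
        "depth": 0,             # let liblzma choose the search depth
    },
]

data = b"raw stream example " * 200
raw = lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters)
restored = lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=filters)
```
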
The delta filter stores the differences between bytes, producing more repetitive
input for the compressor in certain circumstances. It supports one option,
``dist``. This indicates the distance between bytes to be subtracted. The
default is 1, i.e. take the differences between adjacent bytes.

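As a rough illustration of the delta filter, the sketch below compresses a
sequence of 32-bit counters with ``dist`` set to the counter width. The sample
data and the pure-Python ``delta_encode`` function are assumptions made for the
example; ``delta_encode`` only mirrors the byte-wise subtraction described
above and is not the code used by :program:`liblzma`.

```python
import lzma
import struct

# 32-bit little-endian counters that grow by a constant step.
samples = b"".join(struct.pack("<I", 1000 + 3 * i) for i in range(5000))

delta_filters = [
    {"id": lzma.FILTER_DELTA, "dist": 4},    # compare bytes 4 positions apart
    {"id": lzma.FILTER_LZMA2, "preset": 6},  # compression filter goes last
]
with_delta = lzma.compress(samples, filters=delta_filters)
without_delta = lzma.compress(samples)

def delta_encode(data, dist=1):
    """Illustrative transform: subtract the byte `dist` positions earlier."""
    return bytes(
        (data[i] - (data[i - dist] if i >= dist else 0)) % 256
        for i in range(len(data))
    )

encoded = delta_encode(bytes([10, 12, 14, 16]), dist=1)
restored = lzma.decompress(with_delta)
```

On regularly stepping data like this the delta-filtered output is usually the
smaller of the two, but it is worth comparing ``len(with_delta)`` and
``len(without_delta)`` on real data before adopting the filter.
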
The BCJ filters are intended to be applied to machine code. They convert
relative branches, calls and jumps in the code to use absolute addressing, with
the aim of increasing the redundancy that can be exploited by the compressor.
These filters support one option, ``start_offset``. This specifies the address
that should be mapped to the beginning of the input data. The default is 0.


Examples
--------

Reading in a compressed file::

   import lzma
   with lzma.open("file.xz") as f:
       file_content = f.read()

Creating a compressed file::

   import lzma
   data = b"Insert Data Here"
   with lzma.open("file.xz", "w") as f:
       f.write(data)

Compressing data in memory::

   import lzma
   data_in = b"Insert Data Here"
   data_out = lzma.compress(data_in)

Incremental compression::

   import lzma
   lzc = lzma.LZMACompressor()
   out1 = lzc.compress(b"Some data\n")
   out2 = lzc.compress(b"Another piece of data\n")
   out3 = lzc.compress(b"Even more data\n")
   out4 = lzc.flush()
   # Concatenate all the partial results:
   result = b"".join([out1, out2, out3, out4])

Writing compressed data to an already-open file::

   import lzma
   with open("file.xz", "wb") as f:
       f.write(b"This data will not be compressed\n")
       with lzma.open(f, "w") as lzf:
           lzf.write(b"This *will* be compressed\n")
       f.write(b"Not compressed\n")

Creating a compressed file using a custom filter chain::

   import lzma
   my_filters = [
       {"id": lzma.FILTER_DELTA, "dist": 5},
       {"id": lzma.FILTER_LZMA2, "preset": 7 | lzma.PRESET_EXTREME},
   ]
   with lzma.open("file.xz", "w", filters=my_filters) as f:
       f.write(b"blah blah blah")