:mod:`lzma` --- Compression using the LZMA algorithm
====================================================

.. module:: lzma
   :synopsis: A Python wrapper for the liblzma compression library.
.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>
.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com>

.. versionadded:: 3.3


This module provides classes and convenience functions for compressing and
decompressing data using the LZMA compression algorithm. Also included is a file
interface supporting the ``.xz`` and legacy ``.lzma`` file formats used by the
:program:`xz` utility, as well as raw compressed streams.

The interface provided by this module is very similar to that of the :mod:`bz2`
module. However, note that :class:`LZMAFile` is not thread-safe, unlike
:class:`bz2.BZ2File`, so if you need to use a single :class:`LZMAFile` instance
from multiple threads, it is necessary to protect it with a lock.


.. exception:: LZMAError

   This exception is raised when an error occurs during compression or
   decompression, or while initializing the compressor/decompressor state.
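
As a minimal sketch of handling this exception, the hypothetical helper ``safe_decompress`` below returns ``None`` instead of propagating :class:`LZMAError`. Cutting a few bytes off the end of a valid ``.xz`` stream is only a convenient way to produce bad input for illustration.

```python
import lzma


def safe_decompress(blob):
    """Return the decompressed bytes, or None if *blob* is not a complete,
    valid LZMA stream. (Hypothetical helper, for illustration only.)"""
    try:
        return lzma.decompress(blob)
    except lzma.LZMAError:
        # Raised for corrupt, unrecognized or truncated input.
        return None


good = lzma.compress(b"hello")
print(safe_decompress(good))       # b'hello'
print(safe_decompress(good[:-5]))  # None: the end of the stream is missing
```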


Reading and writing compressed files
------------------------------------

.. class:: LZMAFile(filename=None, mode="r", \*, fileobj=None, format=None, check=-1, preset=None, filters=None)

   Open an LZMA-compressed file.

   An :class:`LZMAFile` can wrap an existing :term:`file object` (given by
   *fileobj*), or operate directly on a named file (named by *filename*).
   Exactly one of these two parameters should be provided. If *fileobj* is
   provided, it is not closed when the :class:`LZMAFile` is closed.

   The *mode* argument can be either ``"r"`` for reading (default), ``"w"`` for
   overwriting, or ``"a"`` for appending. If *fileobj* is provided, a mode of
   ``"w"`` does not truncate the file, and is instead equivalent to ``"a"``.

   When opening a file for reading, the input file may be the concatenation of
   multiple separate compressed streams. These are transparently decoded as a
   single logical stream.

   When opening a file for reading, the *format* and *filters* arguments have
   the same meanings as for :class:`LZMADecompressor`. In this case, the *check*
   and *preset* arguments should not be used.

   When opening a file for writing, the *format*, *check*, *preset* and
   *filters* arguments have the same meanings as for :class:`LZMACompressor`.

   :class:`LZMAFile` supports all the members specified by
   :class:`io.BufferedIOBase`, except for :meth:`detach` and :meth:`truncate`.
   Iteration and the :keyword:`with` statement are supported.

   The following method is also provided:

   .. method:: peek(size=-1)

      Return buffered data without advancing the file position. At least one
      byte of data will be returned, unless EOF has been reached. The exact
      number of bytes returned is unspecified (the *size* argument is ignored).
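
The sketch below exercises the appending, multi-stream and :meth:`peek` behavior described above: mode ``"a"`` adds a second, independent ``.xz`` stream to an existing file, and reading decodes both streams as one. The file name ``log.xz`` is hypothetical, and a temporary directory keeps the example self-contained.

```python
import lzma
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.xz")  # hypothetical file name

    # Mode "w" creates the file containing one complete .xz stream.
    with lzma.LZMAFile(path, "w") as f:
        f.write(b"first stream\n")

    # Mode "a" appends a second, separate stream to the same file.
    with lzma.LZMAFile(path, "a") as f:
        f.write(b"second stream\n")

    # Reading decodes both streams transparently as a single logical stream.
    with lzma.LZMAFile(path) as f:
        head = f.peek()     # at least one byte; the position is not advanced
        content = f.read()  # so read() still starts from the beginning

print(content)  # b'first stream\nsecond stream\n'
```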


Compressing and decompressing data in memory
--------------------------------------------

.. class:: LZMACompressor(format=FORMAT_XZ, check=-1, preset=None, filters=None)

   Create a compressor object, which can be used to compress data incrementally.

   For a more convenient way of compressing a single chunk of data, see
   :func:`compress`.

   The *format* argument specifies what container format should be used.
   Possible values are:

   * :const:`FORMAT_XZ`: The ``.xz`` container format.
     This is the default format.

   * :const:`FORMAT_ALONE`: The legacy ``.lzma`` container format.
     This format is more limited than ``.xz`` -- it does not support integrity
     checks or multiple filters.

   * :const:`FORMAT_RAW`: A raw data stream, not using any container format.
     This format specifier does not support integrity checks, and requires that
     you always specify a custom filter chain (for both compression and
     decompression). Additionally, data compressed in this manner cannot be
     decompressed using :const:`FORMAT_AUTO` (see :class:`LZMADecompressor`).

   The *check* argument specifies the type of integrity check to include in the
   compressed data. This check is used when decompressing, to ensure that the
   data has not been corrupted. Possible values are:

   * :const:`CHECK_NONE`: No integrity check.
     This is the default (and the only acceptable value) for
     :const:`FORMAT_ALONE` and :const:`FORMAT_RAW`.

   * :const:`CHECK_CRC32`: 32-bit Cyclic Redundancy Check.

   * :const:`CHECK_CRC64`: 64-bit Cyclic Redundancy Check.
     This is the default for :const:`FORMAT_XZ`.

   * :const:`CHECK_SHA256`: 256-bit Secure Hash Algorithm.

   If the specified check is not supported, an :class:`LZMAError` is raised.

   The compression settings can be specified either as a preset compression
   level (with the *preset* argument), or in detail as a custom filter chain
   (with the *filters* argument).

   The *preset* argument (if provided) should be an integer between ``0`` and
   ``9`` (inclusive), optionally OR-ed with the constant
   :const:`PRESET_EXTREME`. If neither *preset* nor *filters* are given, the
   default behavior is to use :const:`PRESET_DEFAULT` (preset level ``6``).
   Higher presets produce smaller output, but make the compression process
   slower.

   .. note::

      In addition to being more CPU-intensive, compression with higher presets
      also requires much more memory (and produces output that needs more memory
      to decompress). With preset ``9`` for example, the overhead for an
      :class:`LZMACompressor` object can be as high as 800MiB. For this reason,
      it is generally best to stick with the default preset.

   The *filters* argument (if provided) should be a filter chain specifier.
   See :ref:`filter-chain-specs` for details.

   .. method:: compress(data)

      Compress *data* (a :class:`bytes` object), returning a :class:`bytes`
      object containing compressed data for at least part of the input. Some of
      *data* may be buffered internally, for use in later calls to
      :meth:`compress` and :meth:`flush`. The returned data should be
      concatenated with the output of any previous calls to :meth:`compress`.

   .. method:: flush()

      Finish the compression process, returning a :class:`bytes` object
      containing any data stored in the compressor's internal buffers.

      The compressor cannot be used after this method has been called.
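
As a minimal sketch of the :const:`FORMAT_RAW` requirements described above, the example below compresses incrementally into a raw stream and then decodes it. Because a raw stream has no container header, the same filter chain has to be given to both the compressor and the decompressor. The single LZMA2 filter at preset 6 is an arbitrary choice for illustration.

```python
import lzma

# A raw stream stores no header, so the filter chain is not recorded in it.
raw_filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

comp = lzma.LZMACompressor(format=lzma.FORMAT_RAW, filters=raw_filters)
chunks = [comp.compress(b"raw payload\n") for _ in range(10)]
chunks.append(comp.flush())  # the compressor cannot be used after this
raw_stream = b"".join(chunks)

# The decompressor must be given exactly the same chain.
decomp = lzma.LZMADecompressor(format=lzma.FORMAT_RAW, filters=raw_filters)
restored = decomp.decompress(raw_stream)
print(restored == b"raw payload\n" * 10)  # True
```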


.. class:: LZMADecompressor(format=FORMAT_AUTO, memlimit=None, filters=None)

   Create a decompressor object, which can be used to decompress data
   incrementally.

   For a more convenient way of decompressing an entire compressed stream at
   once, see :func:`decompress`.

   The *format* argument specifies the container format that should be used. The
   default is :const:`FORMAT_AUTO`, which can decompress both ``.xz`` and
   ``.lzma`` files. Other possible values are :const:`FORMAT_XZ`,
   :const:`FORMAT_ALONE`, and :const:`FORMAT_RAW`.

   The *memlimit* argument specifies a limit (in bytes) on the amount of memory
   that the decompressor can use. When this argument is used, decompression will
   fail with an :class:`LZMAError` if it is not possible to decompress the input
   within the given memory limit.

   The *filters* argument specifies the filter chain that was used to create
   the stream being decompressed. This argument is required if *format* is
   :const:`FORMAT_RAW`, but should not be used for other formats.
   See :ref:`filter-chain-specs` for more information about filter chains.

   .. note::

      This class does not transparently handle inputs containing multiple
      compressed streams, unlike :func:`decompress` and :class:`LZMAFile`. To
      decompress a multi-stream input with :class:`LZMADecompressor`, you must
      create a new decompressor for each stream.

   .. method:: decompress(data)

      Decompress *data* (a :class:`bytes` object), returning a :class:`bytes`
      object containing the decompressed data for at least part of the input.
      Some of *data* may be buffered internally, for use in later calls to
      :meth:`decompress`. The returned data should be concatenated with the
      output of any previous calls to :meth:`decompress`.

   .. attribute:: check

      The ID of the integrity check used by the input stream. This may be
      :const:`CHECK_UNKNOWN` until enough of the input has been decoded to
      determine what integrity check it uses.

   .. attribute:: eof

      ``True`` if the end-of-stream marker has been reached.

   .. attribute:: unused_data

      Data found after the end of the compressed stream.

      Before the end of the stream is reached, this will be ``b""``.
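
The note above says that a new decompressor is needed for each stream. A minimal sketch of that loop, driven by the :attr:`eof` and :attr:`unused_data` attributes, follows; the helper name ``decompress_all_streams`` is hypothetical, and the module-level :func:`decompress` already performs this loop for you.

```python
import lzma


def decompress_all_streams(data):
    """Decode every stream in a multi-stream input (hypothetical helper)."""
    results = []
    while data:
        decomp = lzma.LZMADecompressor()  # a fresh decompressor per stream
        results.append(decomp.decompress(data))
        if not decomp.eof:
            raise lzma.LZMAError("input ended before the end-of-stream marker")
        # Anything after this stream's end-of-stream marker is the next stream.
        data = decomp.unused_data
    return b"".join(results)


two_streams = lzma.compress(b"abc") + lzma.compress(b"def")

# A single decompressor stops after the first stream.
single = lzma.LZMADecompressor()
first = single.decompress(two_streams)
print(first, single.eof, single.check == lzma.CHECK_CRC64)  # b'abc' True True

print(decompress_all_streams(two_streams))  # b'abcdef'
```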


.. function:: compress(data, format=FORMAT_XZ, check=-1, preset=None, filters=None)

   Compress *data* (a :class:`bytes` object), returning the compressed data as a
   :class:`bytes` object.

   See :class:`LZMACompressor` above for a description of the *format*, *check*,
   *preset* and *filters* arguments.
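
A short sketch of passing these arguments through :func:`compress`. The payload and the particular preset and check values are arbitrary choices for illustration; :const:`CHECK_CRC32` is used for the non-default check because it is always supported.

```python
import lzma

payload = b"example payload " * 50

# Defaults: .xz container, CRC64 check, preset 6.
xz_blob = lzma.compress(payload)
# Legacy .lzma container (no integrity check).
alone_blob = lzma.compress(payload, format=lzma.FORMAT_ALONE)
# Always-available CRC32 check with the fastest preset.
fast_blob = lzma.compress(payload, check=lzma.CHECK_CRC32, preset=0)
# A preset level OR-ed with PRESET_EXTREME.
extreme_blob = lzma.compress(payload, preset=6 | lzma.PRESET_EXTREME)

# Every .xz stream begins with the same 6-byte magic number.
print(xz_blob[:6])  # b'\xfd7zXZ\x00'
```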


.. function:: decompress(data, format=FORMAT_AUTO, memlimit=None, filters=None)

   Decompress *data* (a :class:`bytes` object), returning the uncompressed data
   as a :class:`bytes` object.

   If *data* is the concatenation of multiple distinct compressed streams,
   decompress all of these streams, and return the concatenation of the results.

   See :class:`LZMADecompressor` above for a description of the *format*,
   *memlimit* and *filters* arguments.
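
A minimal sketch of both behaviors described above: concatenated streams (here one ``.xz`` stream followed by one legacy ``.lzma`` stream) come back as a single result, and a *memlimit* that is too small makes decompression fail with :class:`LZMAError`. The 64 KiB limit is an arbitrary value chosen to be far below what a default-preset stream needs.

```python
import lzma

# One .xz stream followed by one legacy .lzma stream.
mixed = (lzma.compress(b"xz part, ")
         + lzma.compress(b"lzma part", format=lzma.FORMAT_ALONE))
print(lzma.decompress(mixed))  # b'xz part, lzma part'

small_limit = 64 * 1024  # 64 KiB, chosen only for illustration
try:
    lzma.decompress(lzma.compress(b"data"), memlimit=small_limit)
    hit_limit = False
except lzma.LZMAError:
    hit_limit = True  # decompression refused to exceed the memory limit
print(hit_limit)  # True
```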


Miscellaneous
-------------

.. function:: is_check_supported(check)

   Return ``True`` if the given integrity check is supported on this system.

   :const:`CHECK_NONE` and :const:`CHECK_CRC32` are always supported.
   :const:`CHECK_CRC64` and :const:`CHECK_SHA256` may be unavailable if you are
   using a version of :program:`liblzma` that was compiled with a limited
   feature set.
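
A minimal sketch of choosing a check at run time: use the strongest check the local :program:`liblzma` supports, falling back to :const:`CHECK_CRC32`. The preference order is an assumption made for illustration, not a recommendation from this module.

```python
import lzma

# Prefer the strongest integrity check this liblzma build supports;
# CHECK_CRC32 is always available, so the search cannot come up empty.
preferred = [lzma.CHECK_SHA256, lzma.CHECK_CRC64, lzma.CHECK_CRC32]
best = next(c for c in preferred if lzma.is_check_supported(c))

blob = lzma.compress(b"important data", check=best)

# After decoding, the decompressor reports the check found in the stream.
decomp = lzma.LZMADecompressor()
data = decomp.decompress(blob)
print(data, decomp.check == best)  # b'important data' True
```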


.. function:: encode_filter_properties(filter)

   Return a :class:`bytes` object encoding the options (properties) of the
   filter specified by *filter* (a dictionary).

   *filter* is interpreted as a filter specifier, as described in
   :ref:`filter-chain-specs`.

   The returned data does not include the filter ID itself, only the options.

   This function is primarily of interest to users implementing custom file
   formats.


.. function:: decode_filter_properties(filter_id, encoded_props)

   Return a dictionary describing a filter with ID *filter_id*, and options
   (properties) decoded from the :class:`bytes` object *encoded_props*.

   The returned dictionary is a filter specifier, as described in
   :ref:`filter-chain-specs`.

   This function is primarily of interest to users implementing custom file
   formats.


.. _filter-chain-specs:

Specifying custom filter chains
-------------------------------

A filter chain specifier is a sequence of dictionaries, where each dictionary
contains the ID and options for a single filter. Each dictionary must contain
the key ``"id"``, and may contain additional keys to specify filter-dependent
options. Valid filter IDs are as follows:

* Compression filters:

  * :const:`FILTER_LZMA1` (for use with :const:`FORMAT_ALONE`)
  * :const:`FILTER_LZMA2` (for use with :const:`FORMAT_XZ` and :const:`FORMAT_RAW`)

* Delta filter:

  * :const:`FILTER_DELTA`

* Branch-Call-Jump (BCJ) filters:

  * :const:`FILTER_X86`
  * :const:`FILTER_IA64`
  * :const:`FILTER_ARM`
  * :const:`FILTER_ARMTHUMB`
  * :const:`FILTER_POWERPC`
  * :const:`FILTER_SPARC`

A filter chain can consist of up to 4 filters, and cannot be empty. The last
filter in the chain must be a compression filter, and any other filters must be
delta or BCJ filters.
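
As an illustration of these rules, the sketch below builds a three-filter chain (a delta filter and a BCJ filter, with the compression filter last) and writes it into an ``.xz`` stream. The chain and the sample data are arbitrary; the point is that the ``.xz`` container records the chain, so :func:`decompress` needs no *filters* argument.

```python
import lzma

chain = [
    {"id": lzma.FILTER_DELTA, "dist": 2},  # non-last position: delta filter
    {"id": lzma.FILTER_X86},               # non-last position: BCJ filter
    {"id": lzma.FILTER_LZMA2},             # last position: compression filter
]
data = bytes(range(256)) * 16

blob = lzma.compress(data, format=lzma.FORMAT_XZ, filters=chain)

# The .xz header stores the filter chain, so no filters argument is needed here.
print(lzma.decompress(blob) == data)  # True
```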

Compression filters support the following options (specified as additional
entries in the dictionary representing the filter):

* ``preset``: A compression preset to use as a source of default values for
  options that are not specified explicitly.
* ``dict_size``: Dictionary size in bytes. This should be between 4KiB and
  1.5GiB (inclusive).
* ``lc``: Number of literal context bits.
* ``lp``: Number of literal position bits. The sum ``lc + lp`` must be at
  most 4.
* ``pb``: Number of position bits; must be at most 4.
* ``mode``: :const:`MODE_FAST` or :const:`MODE_NORMAL`.
* ``nice_len``: What should be considered a "nice length" for a match.
  This should be 273 or less.
* ``mf``: What match finder to use -- :const:`MF_HC3`, :const:`MF_HC4`,
  :const:`MF_BT2`, :const:`MF_BT3`, or :const:`MF_BT4`.
* ``depth``: Maximum search depth used by the match finder. 0 (default) means to
  select automatically based on other filter options.
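
A minimal sketch of a compression filter that sets every option listed above explicitly. The values are illustrative and chosen only to satisfy the stated constraints; they are not tuned recommendations.

```python
import lzma

lzma2 = {
    "id": lzma.FILTER_LZMA2,
    "preset": 6,           # source of defaults for any omitted option
    "dict_size": 1 << 20,  # 1 MiB, inside the 4 KiB to 1.5 GiB range
    "lc": 3,
    "lp": 0,               # lc + lp must be at most 4
    "pb": 2,               # must be at most 4
    "mode": lzma.MODE_NORMAL,
    "nice_len": 64,        # must be 273 or less
    "mf": lzma.MF_BT4,
    "depth": 0,            # 0 lets liblzma choose automatically
}
text = b"The quick brown fox jumps over the lazy dog. " * 200
blob = lzma.compress(text, filters=[lzma2])
print(lzma.decompress(blob) == text)  # True
```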

The delta filter stores the differences between bytes, producing more repetitive
input for the compressor in certain circumstances. It supports only one option,
``dist``. This indicates the distance between bytes to be subtracted. The
default is 1, i.e. take the differences between adjacent bytes.
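
To illustrate the ``dist`` option: in a hypothetical array of 32-bit integers, each byte lines up with the corresponding byte of the previous value 4 positions earlier, so the sketch below sets ``dist`` to 4. Whether the delta filter actually shrinks the output depends on the data, so the example prints the sizes for comparison and only relies on the round trip being lossless.

```python
import lzma
import struct

# Hypothetical input: 1000 slowly increasing 32-bit little-endian integers.
samples = struct.pack("<1000I", *range(0, 3000, 3))

delta_chain = [
    {"id": lzma.FILTER_DELTA, "dist": 4},  # subtract the byte 4 positions back
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]
with_delta = lzma.compress(samples, filters=delta_chain)
without_delta = lzma.compress(samples)

print(len(samples), len(with_delta), len(without_delta))
print(lzma.decompress(with_delta) == samples)  # True
```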

The BCJ filters are intended to be applied to machine code. They convert
relative branches, calls and jumps in the code to use absolute addressing, with
the aim of increasing the redundancy that can be exploited by the compressor.
These filters support one option, ``start_offset``. This specifies the address
that should be mapped to the beginning of the input data. The default is 0.
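
A sketch of a BCJ filter with a non-default ``start_offset``, written as a raw stream. The input bytes are a made-up repetition of an x86 ``call``/``ret`` pattern rather than real machine code, and the load address ``0x1000`` is hypothetical. Since a raw stream records nothing about its filters, the same chain, including ``start_offset``, must be supplied to decompress it.

```python
import lzma

bcj_chain = [
    # Hypothetical address at which the code would be loaded.
    {"id": lzma.FILTER_X86, "start_offset": 0x1000},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]
# Made-up bytes: "call rel32" (E8 + 4-byte offset) followed by "ret" (C3).
code = bytes.fromhex("e8 10 00 00 00 c3") * 100

raw = lzma.compress(code, format=lzma.FORMAT_RAW, filters=bcj_chain)
restored = lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=bcj_chain)
print(restored == code)  # True
```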


Examples
--------

Reading in a compressed file::

   import lzma
   with lzma.LZMAFile("file.xz") as f:
       file_content = f.read()

Creating a compressed file::

   import lzma
   data = b"Insert Data Here"
   with lzma.LZMAFile("file.xz", "w") as f:
       f.write(data)

Compressing data in memory::

   import lzma
   data_in = b"Insert Data Here"
   data_out = lzma.compress(data_in)

Incremental compression::

   import lzma
   lzc = lzma.LZMACompressor()
   out1 = lzc.compress(b"Some data\n")
   out2 = lzc.compress(b"Another piece of data\n")
   out3 = lzc.compress(b"Even more data\n")
   out4 = lzc.flush()
   # Concatenate all the partial results:
   result = b"".join([out1, out2, out3, out4])

Writing compressed data to an already-open file::

   import lzma
   with open("file.xz", "wb") as f:
       f.write(b"This data will not be compressed\n")
       with lzma.LZMAFile(fileobj=f, mode="w") as lzf:
           lzf.write(b"This will be compressed\n")
       f.write(b"Not compressed\n")

Creating a compressed file using a custom filter chain::

   import lzma
   my_filters = [
       {"id": lzma.FILTER_DELTA, "dist": 5},
       {"id": lzma.FILTER_LZMA2, "preset": 7 | lzma.PRESET_EXTREME},
   ]
   with lzma.LZMAFile("file.xz", "w", filters=my_filters) as f:
       f.write(b"blah blah blah")
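
Incremental decompression, as a minimal sketch: the input is fed to an :class:`LZMADecompressor` in 64-byte pieces (an arbitrary size standing in for data arriving from a pipe or socket), and the partial outputs are concatenated.

```python
import lzma

compressed = lzma.compress(b"Some data\n" * 1000)

decomp = lzma.LZMADecompressor()
pieces = []
for start in range(0, len(compressed), 64):
    # Each call returns whatever output the new input makes available.
    pieces.append(decomp.decompress(compressed[start:start + 64]))

result = b"".join(pieces)
print(decomp.eof, result == b"Some data\n" * 1000)  # True True
```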