:mod:`gzip` --- Support for :program:`gzip` files
=================================================

.. module:: gzip
   :synopsis: Interfaces for gzip compression and decompression using file objects.

**Source code:** :source:`Lib/gzip.py`

--------------

This module provides a simple interface to compress and decompress files just
like the GNU programs :program:`gzip` and :program:`gunzip` would.

The data compression is provided by the :mod:`zlib` module.

The :mod:`gzip` module provides the :class:`GzipFile` class, as well as the
:func:`.open`, :func:`compress` and :func:`decompress` convenience functions.
The :class:`GzipFile` class reads and writes :program:`gzip`\ -format files,
automatically compressing or decompressing the data so that it looks like an
ordinary :term:`file object`.

Note that additional file formats which can be decompressed by the
:program:`gzip` and :program:`gunzip` programs, such as those produced by
:program:`compress` and :program:`pack`, are not supported by this module.

The module defines the following items:


.. function:: open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None)

   Open a gzip-compressed file in binary or text mode, returning a :term:`file
   object`.

   The *filename* argument can be an actual filename (a :class:`str` or
   :class:`bytes` object), or an existing file object to read from or write to.

   The *mode* argument can be any of ``'r'``, ``'rb'``, ``'a'``, ``'ab'``,
   ``'w'``, ``'wb'``, ``'x'`` or ``'xb'`` for binary mode, or ``'rt'``,
   ``'at'``, ``'wt'``, or ``'xt'`` for text mode. The default is ``'rb'``.

   The *compresslevel* argument is an integer from 0 to 9, as for the
   :class:`GzipFile` constructor.

   For binary mode, this function is equivalent to the :class:`GzipFile`
   constructor: ``GzipFile(filename, mode, compresslevel)``. In this case, the
   *encoding*, *errors* and *newline* arguments must not be provided.

   For text mode, a :class:`GzipFile` object is created, and wrapped in an
   :class:`io.TextIOWrapper` instance with the specified encoding, error
   handling behavior, and line ending(s).

   .. versionchanged:: 3.3
      Added support for *filename* being a file object, support for text mode,
      and the *encoding*, *errors* and *newline* arguments.

   .. versionchanged:: 3.4
      Added support for the ``'x'``, ``'xb'`` and ``'xt'`` modes.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.
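
The difference between text mode and binary mode described for :func:`.open` can be illustrated with a small round trip. This is a minimal sketch, not part of the module: the file name ``notes.txt.gz`` and the temporary directory are assumptions chosen for the example.

```python
import gzip
import os
import tempfile

# Hypothetical file name inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt.gz")

    # 'wt' wraps the GzipFile in an io.TextIOWrapper, so str is written.
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
        f.write("first line\nsecond line\n")

    # 'rt' decodes the decompressed bytes back into str.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        lines = f.read().splitlines()

    # The same file read in binary mode yields the encoded bytes.
    with gzip.open(path, "rb") as f:
        raw = f.read()
```

Passing *encoding* explicitly avoids depending on the platform default, which is usually the safer choice for files that move between systems.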

.. exception:: BadGzipFile

   An exception raised for invalid gzip files. It inherits from :exc:`OSError`.
   :exc:`EOFError` and :exc:`zlib.error` can also be raised for invalid gzip
   files.

   .. versionadded:: 3.8

.. class:: GzipFile(filename=None, mode=None, compresslevel=9, fileobj=None, mtime=None)

   Constructor for the :class:`GzipFile` class, which simulates most of the
   methods of a :term:`file object`, with the exception of the :meth:`truncate`
   method. At least one of *fileobj* and *filename* must be given a non-trivial
   value.

   The new class instance is based on *fileobj*, which can be a regular file, an
   :class:`io.BytesIO` object, or any other object which simulates a file. It
   defaults to ``None``, in which case *filename* is opened to provide a file
   object.

   When *fileobj* is not ``None``, the *filename* argument is used only for
   inclusion in the :program:`gzip` file header, which may include the original
   filename of the uncompressed file. It defaults to the filename of *fileobj*, if
   discernible; otherwise, it defaults to the empty string, and in this case the
   original filename is not included in the header.

   The *mode* argument can be any of ``'r'``, ``'rb'``, ``'a'``, ``'ab'``, ``'w'``,
   ``'wb'``, ``'x'``, or ``'xb'``, depending on whether the file will be read or
   written. The default is the mode of *fileobj* if discernible; otherwise, the
   default is ``'rb'``. In future Python releases the mode of *fileobj* will
   not be used. It is better to always specify *mode* for writing.

   Note that the file is always opened in binary mode. To open a compressed file
   in text mode, use :func:`.open` (or wrap your :class:`GzipFile` with an
   :class:`io.TextIOWrapper`).

   The *compresslevel* argument is an integer from ``0`` to ``9`` controlling
   the level of compression; ``1`` is fastest and produces the least
   compression, and ``9`` is slowest and produces the most compression. ``0``
   is no compression. The default is ``9``.

   The *mtime* argument is an optional numeric timestamp to be written to
   the last modification time field in the stream when compressing. It
   should only be provided in compression mode. If omitted or ``None``, the
   current time is used. See the :attr:`mtime` attribute for more details.

   Calling a :class:`GzipFile` object's :meth:`close` method does not close
   *fileobj*, since you might wish to append more material after the compressed
   data. This also allows you to pass an :class:`io.BytesIO` object opened for
   writing as *fileobj*, and retrieve the resulting memory buffer using the
   :class:`io.BytesIO` object's :meth:`~io.BytesIO.getvalue` method.

   :class:`GzipFile` supports the :class:`io.BufferedIOBase` interface,
   including iteration and the :keyword:`with` statement. Only the
   :meth:`truncate` method isn't implemented.

   :class:`GzipFile` also provides the following method and attribute:

   .. method:: peek(n)

      Read *n* uncompressed bytes without advancing the file position.
      At most one single read on the compressed stream is done to satisfy
      the call. The number of bytes returned may be more or less than
      requested.

      .. note:: While calling :meth:`peek` does not change the file position of
         the :class:`GzipFile`, it may change the position of the underlying
         file object (e.g. if the :class:`GzipFile` was constructed with the
         *fileobj* parameter).

      .. versionadded:: 3.2

   .. attribute:: mtime

      When decompressing, the value of the last modification time field in
      the most recently read header may be read from this attribute, as an
      integer. The initial value before reading any headers is ``None``.

      All :program:`gzip` compressed streams are required to contain this
      timestamp field. Some programs, such as :program:`gunzip`\ , make use
      of the timestamp. The format is the same as the return value of
      :func:`time.time` and the :attr:`~os.stat_result.st_mtime` attribute of
      the object returned by :func:`os.stat`.

   .. versionchanged:: 3.1
      Support for the :keyword:`with` statement was added, along with the
      *mtime* constructor argument and :attr:`mtime` attribute.

   .. versionchanged:: 3.2
      Support for zero-padded and unseekable files was added.

   .. versionchanged:: 3.3
      The :meth:`io.BufferedIOBase.read1` method is now implemented.

   .. versionchanged:: 3.4
      Added support for the ``'x'`` and ``'xb'`` modes.

   .. versionchanged:: 3.5
      Added support for writing arbitrary
      :term:`bytes-like objects <bytes-like object>`.
      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
      ``None``.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

   .. deprecated:: 3.9
      Opening :class:`GzipFile` for writing without specifying the *mode*
      argument is deprecated.

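The in-memory use of :class:`GzipFile` described above, together with the *mtime* argument and attribute, might look like the following. It is only a sketch: the payload and the fixed timestamp ``0`` are arbitrary choices made for the example.

```python
import gzip
import io

payload = b"Lots of content here"

# Compress into memory; mtime=0 is an arbitrary fixed timestamp for the header.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb", mtime=0) as gz:
    gz.write(payload)

# Closing the GzipFile leaves the BytesIO open, so getvalue() still works.
compressed = buffer.getvalue()

# Decompress from memory and inspect the header timestamp.
with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as gz:
    mtime_before = gz.mtime   # None until a header has been read
    restored = gz.read()
    mtime_after = gz.mtime    # timestamp stored in the header
```

Specifying *mode* explicitly when writing follows the recommendation above and avoids the deprecated behaviour of inferring it from *fileobj*.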

.. function:: compress(data, compresslevel=9, *, mtime=None)

   Compress the *data*, returning a :class:`bytes` object containing
   the compressed data. *compresslevel* and *mtime* have the same meaning as in
   the :class:`GzipFile` constructor above.

   .. versionadded:: 3.2

   .. versionchanged:: 3.8
      Added the *mtime* parameter for reproducible output.

.. function:: decompress(data)

   Decompress the *data*, returning a :class:`bytes` object containing the
   uncompressed data.

   .. versionadded:: 3.2

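As a rough illustration of :func:`compress` and :func:`decompress` working together, the sketch below compresses the same data twice with a fixed *mtime* and then shows how invalid input is reported. The sample data and the timestamp ``0`` are assumptions made for the example, and :exc:`BadGzipFile` requires Python 3.8 or later.

```python
import gzip

data = b"Lots of content here" * 100

# A fixed mtime (0 is an arbitrary choice) makes the output reproducible.
first = gzip.compress(data, compresslevel=9, mtime=0)
second = gzip.compress(data, compresslevel=9, mtime=0)

restored = gzip.decompress(first)

# Input that does not start with the gzip magic number is rejected.
try:
    gzip.decompress(b"this is not gzip data")
except gzip.BadGzipFile as exc:
    error = exc
else:
    error = None
```

Because :exc:`BadGzipFile` inherits from :exc:`OSError`, code written for older releases that catches :exc:`OSError` handles this case as well.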

.. _gzip-usage-examples:

Examples of usage
-----------------

Example of how to read a compressed file::

   import gzip
   with gzip.open('/home/joe/file.txt.gz', 'rb') as f:
       file_content = f.read()

Example of how to create a compressed GZIP file::

   import gzip
   content = b"Lots of content here"
   with gzip.open('/home/joe/file.txt.gz', 'wb') as f:
       f.write(content)

Example of how to GZIP compress an existing file::

   import gzip
   import shutil
   with open('/home/joe/file.txt', 'rb') as f_in:
       with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
           shutil.copyfileobj(f_in, f_out)

Example of how to GZIP compress a binary string::

   import gzip
   s_in = b"Lots of content here"
   s_out = gzip.compress(s_in)

.. seealso::

   Module :mod:`zlib`
      The basic data compression module needed to support the :program:`gzip` file
      format.


.. program:: gzip

Command Line Interface
----------------------

The :mod:`gzip` module provides a simple command line interface to compress or
decompress files.

Once executed, the :mod:`gzip` module keeps the input file(s).

.. versionchanged:: 3.8

   Added a new command line interface with a usage message.
   By default, the CLI uses compression level 6.

Command line options
^^^^^^^^^^^^^^^^^^^^

.. cmdoption:: file

   If *file* is not specified, read from :attr:`sys.stdin`.

.. cmdoption:: --fast

   Indicates the fastest compression method (less compression).

.. cmdoption:: --best

   Indicates the slowest compression method (best compression).

.. cmdoption:: -d, --decompress

   Decompress the given file.

.. cmdoption:: -h, --help

   Show the help message.

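The command line interface can also be driven from a script. The sketch below is one possible way to do so with :mod:`subprocess`, assuming Python 3.8 or later so that the options listed above are available. The file name ``example.txt`` is hypothetical, and the ``.gz`` suffix on the output is the behaviour of CPython's implementation rather than something stated in this section.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical input file created only for this demonstration.
    source = os.path.join(tmp_dir, "example.txt")
    original = b"Lots of content here\n" * 50
    with open(source, "wb") as f:
        f.write(original)

    # Equivalent to running: python -m gzip --best example.txt
    subprocess.run([sys.executable, "-m", "gzip", "--best", source], check=True)
    compressed_exists = os.path.exists(source + ".gz")
    source_kept = os.path.exists(source)

    # Remove the input, then restore it with: python -m gzip -d example.txt.gz
    os.remove(source)
    subprocess.run([sys.executable, "-m", "gzip", "-d", source + ".gz"], check=True)
    with open(source, "rb") as f:
        round_tripped = f.read()
```

Using ``sys.executable`` runs the module with the same interpreter as the calling script, which avoids picking up a different Python from ``PATH``.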