:mod:`gzip` --- Support for :program:`gzip` files
=================================================

.. module:: gzip
   :synopsis: Interfaces for gzip compression and decompression using file objects.

**Source code:** :source:`Lib/gzip.py`

--------------

This module provides a simple interface to compress and decompress files just
like the GNU programs :program:`gzip` and :program:`gunzip` would.

The data compression is provided by the :mod:`zlib` module.

The :mod:`gzip` module provides the :class:`GzipFile` class, as well as the
:func:`.open`, :func:`compress` and :func:`decompress` convenience functions.
The :class:`GzipFile` class reads and writes :program:`gzip`\ -format files,
automatically compressing or decompressing the data so that it looks like an
ordinary :term:`file object`.

Note that additional file formats which can be decompressed by the
:program:`gzip` and :program:`gunzip` programs, such as those produced by
:program:`compress` and :program:`pack`, are not supported by this module.

The module defines the following items:


.. function:: open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None)

   Open a gzip-compressed file in binary or text mode, returning a :term:`file
   object`.

   The *filename* argument can be an actual filename (a :class:`str` or
   :class:`bytes` object), or an existing file object to read from or write to.

   The *mode* argument can be any of ``'r'``, ``'rb'``, ``'a'``, ``'ab'``,
   ``'w'``, ``'wb'``, ``'x'``, or ``'xb'`` for binary mode, or ``'rt'``,
   ``'at'``, ``'wt'``, or ``'xt'`` for text mode. The default is ``'rb'``.

   The *compresslevel* argument is an integer from ``0`` to ``9``, as for the
   :class:`GzipFile` constructor.

   For binary mode, this function is equivalent to the :class:`GzipFile`
   constructor: ``GzipFile(filename, mode, compresslevel)``, or
   ``GzipFile(fileobj=filename, mode=mode, compresslevel=compresslevel)`` when
   *filename* is a file object. In binary mode, the *encoding*, *errors* and
   *newline* arguments must not be provided.

   For text mode, a :class:`GzipFile` object is created and wrapped in an
   :class:`io.TextIOWrapper` instance with the specified encoding, error
   handling behavior, and line ending(s).
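
   A minimal sketch of text mode over an existing file object, assuming an
   in-memory :class:`io.BytesIO` buffer stands in for a real binary file (the
   sample text and the UTF-8 encoding are illustrative only)::

      import gzip
      import io

      # An in-memory buffer plays the role of an existing binary file object.
      buf = io.BytesIO()

      # Text mode: str is encoded by an io.TextIOWrapper around a GzipFile.
      # newline='\n' keeps line endings untranslated on every platform.
      with gzip.open(buf, 'wt', encoding='utf-8', newline='\n') as f:
          f.write('héllo\nwörld\n')

      # Closing the wrapper closes the GzipFile, but not buf itself.
      compressed = buf.getvalue()

      # Reading back in text mode yields str lines.
      with gzip.open(io.BytesIO(compressed), 'rt', encoding='utf-8') as f:
          lines = f.readlines()

      # Binary mode rejects the text-only arguments.
      try:
          gzip.open(io.BytesIO(compressed), 'rb', encoding='utf-8')
          rejected = False
      except ValueError:
          rejected = True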

   .. versionchanged:: 3.3
      Added support for *filename* being a file object, support for text mode,
      and the *encoding*, *errors* and *newline* arguments.

   .. versionchanged:: 3.4
      Added support for the ``'x'``, ``'xb'`` and ``'xt'`` modes.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

.. class:: GzipFile(filename=None, mode=None, compresslevel=9, fileobj=None, mtime=None)

   Constructor for the :class:`GzipFile` class, which simulates most of the
   methods of a :term:`file object`, with the exception of the :meth:`truncate`
   method. At least one of *fileobj* and *filename* must be given a non-trivial
   value.

   The new class instance is based on *fileobj*, which can be a regular file, an
   :class:`io.BytesIO` object, or any other object which simulates a file. It
   defaults to ``None``, in which case *filename* is opened to provide a file
   object.

   When *fileobj* is not ``None``, the *filename* argument is used only for
   inclusion in the :program:`gzip` file header, which may record the original
   filename of the uncompressed file. It defaults to the filename of *fileobj*, if
   discernible; otherwise, it defaults to the empty string, and in this case the
   original filename is not included in the header.
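
   The effect on the header can be inspected directly. This sketch assumes the
   layout from :rfc:`1952`: a 10-byte fixed header whose fourth byte holds the
   flags, with the ``FNAME`` bit (``0x08``) signalling that a NUL-terminated
   name follows. The ``header_name`` helper and the ``'data.txt'`` name are
   hypothetical::

      import gzip
      import io

      FNAME = 0x08  # RFC 1952 flag bit: an original file name is present

      def header_name(blob):
          """Return the stored file name, or None if FNAME is not set."""
          if not blob[3] & FNAME:
              return None
          # The name starts right after the 10-byte fixed header.
          end = blob.index(b'\x00', 10)
          return blob[10:end]

      # fileobj is given, so filename is only recorded in the header.
      named = io.BytesIO()
      with gzip.GzipFile(filename='data.txt', mode='wb', fileobj=named) as f:
          f.write(b'payload')

      # No filename, and io.BytesIO has no name attribute to fall back on.
      unnamed = io.BytesIO()
      with gzip.GzipFile(mode='wb', fileobj=unnamed) as f:
          f.write(b'payload')

      print(header_name(named.getvalue()))    # b'data.txt'
      print(header_name(unnamed.getvalue()))  # None

   Both streams decompress to the same data; only the header differs.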

   The *mode* argument can be any of ``'r'``, ``'rb'``, ``'a'``, ``'ab'``, ``'w'``,
   ``'wb'``, ``'x'``, or ``'xb'``, depending on whether the file will be read or
   written. The default is the mode of *fileobj* if discernible; otherwise, the
   default is ``'rb'``.

   Note that the file is always opened in binary mode. To open a compressed file
   in text mode, use :func:`.open` (or wrap your :class:`GzipFile` with an
   :class:`io.TextIOWrapper`).

   The *compresslevel* argument is an integer from ``0`` to ``9`` controlling
   the level of compression; ``1`` is fastest and produces the least
   compression, and ``9`` is slowest and produces the most compression. ``0``
   means no compression. The default is ``9``.
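
   To see the trade-off, one can compress the same input at several levels and
   compare the output sizes. The sample data below is hypothetical, and the
   exact sizes depend on the input and the zlib build, so none are quoted::

      import gzip

      # Repetitive sample data, chosen only for illustration.
      data = b'gzip example line\n' * 2000

      sizes = {}
      for level in (0, 1, 9):
          blob = gzip.compress(data, compresslevel=level)
          # Every level yields a valid stream that decompresses to the input.
          assert gzip.decompress(blob) == data
          sizes[level] = len(blob)

      # Level 0 stores the data uncompressed, so it is larger than the input.
      print(len(data), sizes)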

   The *mtime* argument is an optional numeric timestamp to be written to
   the last modification time field in the stream when compressing. It
   should only be provided in compression mode. If omitted or ``None``, the
   current time is used. See the :attr:`mtime` attribute for more details.
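
   Passing a fixed *mtime* is one way to make the output reproducible, for
   example when compressed files are checksummed. With ``mtime=0`` and a
   nameless in-memory buffer, the header no longer depends on when the code
   runs. The ``gzip_bytes`` helper is hypothetical::

      import gzip
      import io

      def gzip_bytes(data, mtime):
          """Compress data into memory with a fixed header timestamp."""
          buf = io.BytesIO()
          with gzip.GzipFile(fileobj=buf, mode='wb', mtime=mtime) as f:
              f.write(data)
          return buf.getvalue()

      # Same input and same mtime give byte-identical output.
      first = gzip_bytes(b'same input', mtime=0)
      second = gzip_bytes(b'same input', mtime=0)
      print(first == second)  # True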

   Calling a :class:`GzipFile` object's :meth:`close` method does not close
   *fileobj*, since you might wish to append more material after the compressed
   data. This also allows you to pass an :class:`io.BytesIO` object opened for
   writing as *fileobj*, and retrieve the resulting memory buffer using the
   :class:`io.BytesIO` object's :meth:`~io.BytesIO.getvalue` method.
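
   Because *fileobj* stays open, several compressed members can be written one
   after another into the same buffer, and :func:`decompress` reads them back as
   one continuous stream. A minimal sketch using an in-memory buffer::

      import gzip
      import io

      buf = io.BytesIO()

      # Write two separate gzip members into the same buffer.
      for chunk in (b'first member\n', b'second member\n'):
          with gzip.GzipFile(fileobj=buf, mode='wb') as f:
              f.write(chunk)

      # Closing each GzipFile left buf open, so it can still be used.
      print(buf.closed)  # False

      # Both members are decompressed, in order.
      print(gzip.decompress(buf.getvalue()))  # b'first member\nsecond member\n'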

   :class:`GzipFile` supports the :class:`io.BufferedIOBase` interface,
   including iteration and the :keyword:`with` statement. Only the
   :meth:`truncate` method isn't implemented.
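
   A short sketch of both points: iterating yields :class:`bytes` lines, and
   calling :meth:`truncate` fails. That it fails with
   :exc:`io.UnsupportedOperation` reflects the default
   :class:`io.IOBase` behaviour rather than anything documented for
   :class:`GzipFile`, so treat it as an assumption::

      import gzip
      import io

      blob = gzip.compress(b'alpha\nbeta\ngamma\n')

      # Iterating over a GzipFile yields bytes lines, like any binary file.
      with gzip.GzipFile(fileobj=io.BytesIO(blob)) as f:
          lines = [line.rstrip(b'\n') for line in f]

      print(lines)  # [b'alpha', b'beta', b'gamma']

      # truncate() is the one BufferedIOBase method that is not implemented.
      with gzip.GzipFile(fileobj=io.BytesIO(), mode='wb') as f:
          try:
              f.truncate()
              truncate_supported = True
          except io.UnsupportedOperation:
              truncate_supported = False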

   :class:`GzipFile` also provides the following method and attribute:

   .. method:: peek(n)

      Read *n* uncompressed bytes without advancing the file position.
      At most one read on the compressed stream is done to satisfy
      the call. The number of bytes returned may be more or less than
      requested.
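
      A minimal sketch, using an in-memory buffer: after :meth:`peek` the
      :class:`GzipFile` position is unchanged, so a following read starts
      from the same place::

         import gzip
         import io

         raw = io.BytesIO(gzip.compress(b'HEADER:rest of the data'))

         with gzip.GzipFile(fileobj=raw) as f:
             head = f.peek(7)    # may hold more (or fewer) than 7 bytes
             pos = f.tell()      # still 0: peek() did not advance the GzipFile
             first = f.read(7)   # read() starts from the same position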

      .. note:: While calling :meth:`peek` does not change the file position of
         the :class:`GzipFile`, it may change the position of the underlying
         file object (e.g. if the :class:`GzipFile` was constructed with the
         *fileobj* parameter).

      .. versionadded:: 3.2

   .. attribute:: mtime

      When decompressing, the value of the last modification time field in
      the most recently read header may be read from this attribute, as an
      integer. The initial value before reading any headers is ``None``.

      All :program:`gzip`-compressed streams are required to contain this
      timestamp field. Some programs, such as :program:`gunzip`, make use
      of the timestamp. It is expressed in seconds since the epoch, like the
      return value of :func:`time.time` and the :attr:`~os.stat_result.st_mtime`
      attribute of the object returned by :func:`os.stat`.
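
      A small round trip shows both behaviours described above: the attribute
      is ``None`` until a header has been read, and afterwards holds the
      integer stored by the writer. The timestamp value is arbitrary::

         import gzip
         import io
         import time

         stamp = 1000000000  # arbitrary example: 2001-09-09 UTC

         buf = io.BytesIO()
         with gzip.GzipFile(fileobj=buf, mode='wb', mtime=stamp) as f:
             f.write(b'data')

         with gzip.GzipFile(fileobj=io.BytesIO(buf.getvalue())) as f:
             before = f.mtime   # None: no header has been read yet
             f.read()
             after = f.mtime    # the integer taken from the header

         print(before, after, time.gmtime(after).tm_year)  # None 1000000000 2001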

   .. versionchanged:: 3.1
      Support for the :keyword:`with` statement was added, along with the
      *mtime* constructor argument and :attr:`mtime` attribute.

   .. versionchanged:: 3.2
      Support for zero-padded and unseekable files was added.

   .. versionchanged:: 3.3
      The :meth:`io.BufferedIOBase.read1` method is now implemented.

   .. versionchanged:: 3.4
      Added support for the ``'x'`` and ``'xb'`` modes.

   .. versionchanged:: 3.5
      Added support for writing arbitrary
      :term:`bytes-like objects <bytes-like object>`.
      The :meth:`~io.BufferedIOBase.read` method now accepts an argument of
      ``None``.

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.


.. function:: compress(data, compresslevel=9)

   Compress the *data*, returning a :class:`bytes` object containing
   the compressed data. *compresslevel* has the same meaning as in
   the :class:`GzipFile` constructor above.
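
   Since the compression itself comes from :mod:`zlib`, the output of
   :func:`compress` can also be decoded by :func:`zlib.decompress` when its
   *wbits* argument selects the gzip container (``16 + zlib.MAX_WBITS``).
   A minimal sketch with illustrative data::

      import gzip
      import zlib

      data = b'Lots of content here' * 100
      blob = gzip.compress(data, compresslevel=6)

      # The output starts with the gzip magic number from RFC 1952.
      print(blob[:2] == b'\x1f\x8b')  # True

      # Adding 16 to wbits tells zlib to expect a gzip header and trailer.
      print(zlib.decompress(blob, 16 + zlib.MAX_WBITS) == data)  # True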

   .. versionadded:: 3.2

.. function:: decompress(data)

   Decompress the *data*, returning a :class:`bytes` object containing the
   uncompressed data.
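
   Invalid input raises an exception, and the exception type depends on how
   the data is damaged. The sketch below catches the ones raised by the
   current CPython implementation (a bad header or CRC, a truncated stream,
   corrupt deflate data); this list is an observation, not a documented
   guarantee, and the ``safe_decompress`` helper is hypothetical::

      import gzip
      import zlib

      def safe_decompress(blob):
          """Return the decompressed bytes, or None if blob is not valid gzip."""
          try:
              return gzip.decompress(blob)
          except (OSError, EOFError, zlib.error):
              # OSError: bad header or CRC; EOFError: truncated stream;
              # zlib.error: corrupt deflate data.
              return None

      good = safe_decompress(gzip.compress(b'ok'))
      bad_header = safe_decompress(b'not gzip data')
      truncated = safe_decompress(gzip.compress(b'hello world')[:-4])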

   .. versionadded:: 3.2


.. _gzip-usage-examples:

Examples of usage
-----------------

Example of how to read a compressed file::

   import gzip
   with gzip.open('/home/joe/file.txt.gz', 'rb') as f:
       file_content = f.read()

Example of how to create a compressed gzip file::

   import gzip
   content = b"Lots of content here"
   with gzip.open('/home/joe/file.txt.gz', 'wb') as f:
       f.write(content)

Example of how to gzip-compress an existing file::

   import gzip
   import shutil
   with open('/home/joe/file.txt', 'rb') as f_in:
       with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
           shutil.copyfileobj(f_in, f_out)

Example of how to gzip-compress a binary string::

   import gzip
   s_in = b"Lots of content here"
   s_out = gzip.compress(s_in)

.. seealso::

   Module :mod:`zlib`
      The basic data compression module needed to support the :program:`gzip` file
      format.
