:mod:`gzip` --- Support for :program:`gzip` files
=================================================

.. module:: gzip
   :synopsis: Interfaces for gzip compression and decompression using file objects.

**Source code:** :source:`Lib/gzip.py`

--------------

This module provides a simple interface to compress and decompress files just
like the GNU programs :program:`gzip` and :program:`gunzip` would.

The data compression is provided by the :mod:`zlib` module.

The :mod:`gzip` module provides the :class:`GzipFile` class, as well as the
:func:`.open`, :func:`compress` and :func:`decompress` convenience functions.
The :class:`GzipFile` class reads and writes :program:`gzip`\ -format files,
automatically compressing or decompressing the data so that it looks like an
ordinary :term:`file object`.

Note that additional file formats which can be decompressed by the
:program:`gzip` and :program:`gunzip` programs, such as those produced by
:program:`compress` and :program:`pack`, are not supported by this module.

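Input that is not in :program:`gzip` format is rejected with an exception
rather than passed through. The following is a minimal sketch of a check built
on that behaviour; ``is_gzip_data`` is a hypothetical helper, not part of the
module, and it decompresses the whole input, so it is only suitable for small
buffers.

```python
import gzip
import zlib


def is_gzip_data(data: bytes) -> bool:
    """Return True if *data* decompresses as gzip (hypothetical helper)."""
    try:
        gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        # OSError covers a missing gzip header, EOFError a truncated
        # stream, and zlib.error a corrupt compressed payload.
        return False
    return True


looks_gzipped = is_gzip_data(gzip.compress(b"hello"))
looks_plain = is_gzip_data(b"plain text, not gzip")
print(looks_gzipped, looks_plain)
```
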
For other compression and archive formats, see the :mod:`bz2`, :mod:`zipfile`,
and :mod:`tarfile` modules.

The module defines the following items:


.. class:: GzipFile(filename=None, mode=None, compresslevel=9, fileobj=None, mtime=None)

   Constructor for the :class:`GzipFile` class, which simulates most of the
   methods of a :term:`file object`, with the exception of the :meth:`truncate`
   method. At least one of *fileobj* and *filename* must be given a non-trivial
   value.

   The new class instance is based on *fileobj*, which can be a regular file, an
   :class:`io.BytesIO` object, or any other object which simulates a binary
   file. It defaults to ``None``, in which case *filename* is opened to provide
   a file object.

   When *fileobj* is not ``None``, the *filename* argument is used only for the
   :program:`gzip` file header, which may include the original filename of the
   uncompressed file. It defaults to the filename of *fileobj*, if discernible;
   otherwise, it defaults to the empty string, and in this case the original
   filename is not included in the header.

   The *mode* argument can be any of ``'r'``, ``'rb'``, ``'a'``, ``'ab'``, ``'w'``,
   or ``'wb'``, depending on whether the file will be read or written. The default
   is the mode of *fileobj* if discernible; otherwise, the default is ``'rb'``. If
   the ``'b'`` flag is not given, it is added to the mode so that the file is
   opened in binary mode, for cross-platform portability.

   The *compresslevel* argument is an integer from ``1`` to ``9`` controlling the
   level of compression; ``1`` is fastest and produces the least compression, and
   ``9`` is slowest and produces the most compression. The default is ``9``.

   The *mtime* argument is an optional numeric timestamp to be written to
   the stream when compressing. All :program:`gzip` compressed streams are
   required to contain a timestamp. If omitted or ``None``, the current
   time is used. This module ignores the timestamp when decompressing;
   however, some programs, such as :program:`gunzip`, make use of it.
   The format of the timestamp is the same as that of the return value of
   ``time.time()`` and of the ``st_mtime`` member of the object returned
   by ``os.stat()``.

   Calling a :class:`GzipFile` object's :meth:`close` method does not close
   *fileobj*, since you might wish to append more material after the compressed
   data. This also allows you to pass a :class:`io.BytesIO` object opened for
   writing as *fileobj*, and retrieve the resulting memory buffer using the
   :class:`io.BytesIO` object's :meth:`~io.BytesIO.getvalue` method.

   :class:`GzipFile` supports the :class:`io.BufferedIOBase` interface,
   including iteration and the :keyword:`with` statement. Only the
   :meth:`read1` and :meth:`truncate` methods aren't implemented.

   :class:`GzipFile` also provides the following method:

   .. method:: peek([n])

      Read *n* uncompressed bytes without advancing the file position.
      At most one single read on the compressed stream is done to satisfy
      the call. The number of bytes returned may be more or less than
      requested.

      .. versionadded:: 3.2

   .. versionchanged:: 3.1
      Support for the :keyword:`with` statement was added.

   .. versionchanged:: 3.2
      Support for zero-padded files was added.

   .. versionchanged:: 3.2
      Support for unseekable files was added.


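The sketch below ties together several points from the :class:`GzipFile`
description: compressing into an :class:`io.BytesIO` buffer passed as
*fileobj*, the buffer staying open after the :class:`GzipFile` is closed, a
fixed *mtime* giving repeatable output, and :meth:`peek` leaving the read
position unchanged. The helper name ``gzip_to_bytes`` and the sample payload
are assumptions made for illustration.

```python
import gzip
import io

payload = b"Lots of content here\n" * 3


def gzip_to_bytes(data: bytes) -> bytes:
    """Compress *data* in memory (hypothetical helper)."""
    buf = io.BytesIO()
    # mtime=0 pins the header timestamp instead of using the current time.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(data)
    # Closing the GzipFile does not close the underlying buffer.
    assert not buf.closed
    return buf.getvalue()


first = gzip_to_bytes(payload)
second = gzip_to_bytes(payload)

with gzip.GzipFile(fileobj=io.BytesIO(first), mode="rb") as gz:
    # peek() may return more or fewer bytes than requested.
    head = gz.peek(4)
    # The position was not advanced, so read() still returns everything.
    restored = gz.read()

print(first == second, restored == payload)
```

Pinning *mtime* is one way to get byte-for-byte identical output across runs;
with the default, the header timestamp reflects the time of compression.
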
.. function:: open(filename, mode='rb', compresslevel=9)

   This is a shorthand for ``GzipFile(filename, mode, compresslevel)``.
   The *filename* argument is required; *mode* defaults to ``'rb'`` and
   *compresslevel* defaults to ``9``.

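As an illustration of the *mode* and *compresslevel* arguments, here is a
small sketch that writes a file, appends to it, and reads it back using the
default mode. The temporary directory and the file name ``example.txt.gz``
are assumptions chosen so that the snippet runs anywhere.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.txt.gz")  # hypothetical file name

    # Create the file with a moderate compression level.
    with gzip.open(path, "wb", compresslevel=6) as f:
        f.write(b"first line\n")

    # Appending adds more compressed data after the existing data.
    with gzip.open(path, "ab") as f:
        f.write(b"second line\n")

    # mode defaults to 'rb'; reading yields all the data in order.
    with gzip.open(path) as f:
        content = f.read()

print(content)
```
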
.. function:: compress(data, compresslevel=9)

   Compress the *data*, returning a :class:`bytes` object containing
   the compressed data. *compresslevel* has the same meaning as in
   the :class:`GzipFile` constructor above.

   .. versionadded:: 3.2

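To see the effect of *compresslevel*, one can compress the same data at the
two extremes and compare sizes. This is only a sketch: the sample data is an
assumption, and the exact sizes depend on the input and on the underlying
:mod:`zlib` version, so no particular numbers are claimed here.

```python
import gzip

# Highly repetitive sample data (an assumption for illustration).
data = b"Lots of content here. " * 1000

fast = gzip.compress(data, compresslevel=1)   # fastest, least compression
small = gzip.compress(data, compresslevel=9)  # slowest, most compression

print(len(data), len(fast), len(small))
```
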
.. function:: decompress(data)

   Decompress the *data*, returning a :class:`bytes` object containing the
   uncompressed data.

   .. versionadded:: 3.2


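A short round-trip sketch for :func:`decompress`. It also shows that two
compressed streams placed back to back decompress to the concatenation of
their contents; the variable names are illustrative only.

```python
import gzip

original = b"Lots of content here"
round_trip = gzip.decompress(gzip.compress(original))

# Two independently compressed streams, concatenated.
part_one = gzip.compress(b"Lots of ")
part_two = gzip.compress(b"content here")
joined = gzip.decompress(part_one + part_two)

print(round_trip == original, joined == original)
```
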
.. _gzip-usage-examples:

Examples of usage
-----------------

Example of how to read a compressed file::

   import gzip
   with gzip.open('/home/joe/file.txt.gz', 'rb') as f:
       file_content = f.read()

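Because :class:`GzipFile` supports iteration, a compressed file can also be
read line by line instead of all at once. The sketch below first creates a
small file in a temporary directory so that it is self-contained; the file
name ``log.txt.gz`` and its contents are assumptions.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "log.txt.gz")  # hypothetical file name

    with gzip.open(path, "wb") as f:
        f.write(b"alpha\nbeta\ngamma\n")

    # Iterating yields one bytes object per line of uncompressed data.
    with gzip.open(path, "rb") as f:
        lines = [line.rstrip(b"\n") for line in f]

print(lines)
```
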
Example of how to create a compressed GZIP file::

   import gzip
   content = b"Lots of content here"
   with gzip.open('/home/joe/file.txt.gz', 'wb') as f:
       f.write(content)

Example of how to GZIP compress an existing file::

   import gzip
   import shutil
   with open('/home/joe/file.txt', 'rb') as f_in:
       with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
           # Copy in fixed-size chunks rather than splitting binary data on newlines.
           shutil.copyfileobj(f_in, f_out)

Example of how to GZIP compress a binary string::

   import gzip
   s_in = b"Lots of content here"
   s_out = gzip.compress(s_in)

.. seealso::

   Module :mod:`zlib`
      The basic data compression module needed to support the :program:`gzip` file
      format.
