:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

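
The mode strings accepted by :func:`tarfile.open` can be tried out without
touching the file system. The following is a minimal sketch, assuming an
in-memory :class:`io.BytesIO` buffer and an illustrative member name
``greeting.txt`` (neither comes from this documentation). It writes with
``'w:gz'``, reads back with the transparent ``'r'`` mode, and then reads the
same bytes again as a forward-only stream with ``'r|*'``.

```python
import io
import tarfile

payload = b"hello, tar\n"

# Write a gzip-compressed archive into an in-memory buffer.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="greeting.txt")  # illustrative member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Mode 'r' detects the compression transparently.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()
    data = tar.extractfile("greeting.txt").read()

# The same bytes can also be consumed as a stream of blocks.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r|*") as tar:
    stream_names = [member.name for member in tar]
```

The buffer has to be rewound with ``seek(0)`` before each read because
*fileobj* is expected to be at position 0.
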

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

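
As a small illustration of :func:`is_tarfile`, the sketch below creates one
real archive and one plain text file in a temporary directory and checks both.
The file names ``sample.tar`` and ``notes.txt`` are assumptions made for this
example only.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "sample.tar")  # illustrative names
    text_path = os.path.join(tmp, "notes.txt")

    # A small but genuine archive with one empty member.
    with tarfile.open(archive_path, "w") as tar:
        tar.addfile(tarfile.TarInfo("empty.txt"))

    # An ordinary text file that merely sits next to it.
    with open(text_path, "w") as f:
        f.write("plain text, not a tar archive\n")

    archive_ok = tarfile.is_tarfile(archive_path)
    text_ok = tarfile.is_tarfile(text_path)
```

Here ``archive_ok`` is true and ``text_ok`` is false, so the function can serve
as a cheap check before calling :func:`tarfile.open`.
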

The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

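
Because every exception above derives from :exc:`TarError`, callers can catch
either the specific class or the base class. A minimal sketch, assuming a
buffer of arbitrary bytes that is not a tar archive:

```python
import io
import tarfile

# Arbitrary bytes that are neither a tar archive nor a compressed stream.
garbage = io.BytesIO(b"this is definitely not a tar archive" * 100)

try:
    tarfile.open(fileobj=garbage, mode="r")
except tarfile.ReadError as exc:
    caught = exc
else:
    caught = None
```

Catching :exc:`TarError` instead of :exc:`ReadError` would handle this case
too, along with the other errors defined by the module.
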

The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`PAX_FORMAT`.

   .. versionchanged:: 3.8
      The default format for new archives was changed to
      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.

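
The format constants are passed through the *format* keyword when writing. The
sketch below is only an illustration: it writes one archive per format into
memory and reads each back without naming the format. The labels and the
member name ``member.txt`` are assumptions made for the example.

```python
import io
import tarfile

formats = {
    "ustar": tarfile.USTAR_FORMAT,
    "gnu": tarfile.GNU_FORMAT,
    "pax": tarfile.PAX_FORMAT,
}

names_read_back = {}
for label, fmt in formats.items():
    buf = io.BytesIO()
    # The format is chosen explicitly when writing ...
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo("member.txt"))
    # ... and does not have to be specified when reading.
    buf.seek(0)
    with tarfile.open(fileobj=buf) as tar:
        names_read_back[label] = tar.getnames()
```
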

.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object; see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

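
A short sketch of the context manager behaviour described above, assuming an
in-memory buffer as the target. It records whether the :class:`TarFile` is
closed inside and after the :keyword:`with` block, and whether the
caller-supplied buffer is still open afterwards.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.addfile(tarfile.TarInfo("empty.txt"))
    open_inside = not tar.closed

# Leaving the block closes (and, on success, finalizes) the archive.
closed_after = tar.closed

# A file object supplied by the caller is left open.
buffer_still_open = not buf.closed
archive_size = len(buf.getvalue())
```
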
.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format for writing. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level. When reading, the format will be automatically detected,
   even if different formats are present in a single archive.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.

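
The note on repeated members can be seen in a small experiment. In this sketch
the helper ``add_bytes`` and the member name ``config.ini`` are hypothetical.
The same name is stored twice, and :meth:`~TarFile.getmember` resolves it to
the later entry.

```python
import io
import tarfile


def add_bytes(tar, name, data):
    """Hypothetical helper: store *data* in *tar* under *name*."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "config.ini", b"version = 1\n")
    add_bytes(tar, "config.ini", b"version = 2\n")

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    all_names = tar.getnames()  # both entries are still listed
    latest = tar.extractfile(tar.getmember("config.ini")).read()
    try:
        tar.getmember("missing.ini")
    except KeyError:
        missing_raises = True
    else:
        missing_raises = False
```
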

.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

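
One possible form of the "prior inspection" mentioned in the warning is to
pass :meth:`~TarFile.extractall` only those members whose resolved path stays
inside the destination. The sketch below is not a complete defence: the helper
``members_inside`` is hypothetical and checks member names only, not the
targets of symbolic or hard links.

```python
import io
import os
import tarfile
import tempfile


def members_inside(tar, dest):
    """Hypothetical helper: return members whose path stays below *dest*.

    Only member names are checked; link targets are not inspected.
    """
    root = os.path.realpath(dest)
    kept = []
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) == root:
            kept.append(member)
    return kept


# Build an archive that mixes a harmless member with two suspicious ones.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("ok.txt", "../escape.txt", "/abs.txt"):
        tar.addfile(tarfile.TarInfo(name))

buf.seek(0)
with tempfile.TemporaryDirectory() as dest, tarfile.open(fileobj=buf) as tar:
    kept = members_inside(tar, dest)
    tar.extractall(path=dest, members=kept)
    kept_names = [member.name for member in kept]
    extracted = sorted(os.listdir(dest))
```

Only ``ok.txt`` is extracted. The members with ``..`` or a leading ``/`` in
their names are filtered out before :meth:`~TarFile.extractall` sees them.
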

.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.

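
:meth:`~TarFile.extractfile` reads member data without writing anything to
disk. A minimal sketch, assuming an in-memory archive with an illustrative
directory ``docs`` and file ``docs/readme.txt``:

```python
import io
import tarfile

text = b"first line\nsecond line\n"

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    folder = tarfile.TarInfo("docs")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

    readme = tarfile.TarInfo("docs/readme.txt")
    readme.size = len(text)
    tar.addfile(readme, io.BytesIO(text))

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    reader = tar.extractfile("docs/readme.txt")
    first_line = reader.readline()
    rest = reader.read()
    # Directories carry no data, so there is nothing to read.
    folder_reader = tar.extractfile("docs")
```

The returned reader behaves like any other binary file object. For the
directory member the result is :const:`None`, so callers should check for that
before reading.
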

.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. Recursion adds entries in sorted order.
   If *filter* is given, it
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None`, the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. versionchanged:: 3.7
      Recursion adds entries in sorted order.

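
The *filter* argument of :meth:`~TarFile.add` can both rewrite and drop
members. In the sketch below, the function ``anonymize`` and the directory
layout are assumptions made for illustration. It resets the ownership fields
and excludes files ending in ``.log``.

```python
import os
import tarfile
import tempfile


def anonymize(info):
    """Hypothetical filter: drop *.log files and reset ownership."""
    if info.name.endswith(".log"):
        return None  # returning None excludes the member
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "project")
    os.mkdir(src)
    for filename in ("b.txt", "a.txt", "debug.log"):
        with open(os.path.join(src, filename), "w") as f:
            f.write(filename)

    archive = os.path.join(tmp, "project.tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(src, arcname="project", filter=anonymize)

    with tarfile.open(archive) as tar:
        names = tar.getnames()
        owners = {member.uname for member in tar.getmembers()}
```

On Python 3.7 and later the directory entries are added in sorted order, so
``a.txt`` is stored before ``b.txt`` regardless of creation order.
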

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.

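
Since :meth:`~TarFile.addfile` reads exactly ``tarinfo.size`` bytes, the size
has to be set before the call when a :class:`TarInfo` is created by hand. The
following sketch uses an illustrative member name and a fixed timestamp; both
are assumptions for the example.

```python
import io
import tarfile

data = b"generated at runtime\n"

info = tarfile.TarInfo(name="reports/summary.txt")  # illustrative name
info.size = len(data)    # addfile() reads exactly this many bytes
info.mtime = 1700000000  # fixed timestamp, chosen for the example
info.mode = 0o644

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    member = tar.getmember("reports/summary.txt")
    stored = tar.extractfile(member).read()
```
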

.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor.
   *name* may be a :term:`path-like object`. If
   given, *arcname* specifies an alternative name for the file in the
   archive; otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


557A ``TarInfo`` object has the following public data attributes:
558
559
560.. attribute:: TarInfo.name
561
562 Name of the archive member.
563
564
565.. attribute:: TarInfo.size
566
567 Size in bytes.
568
569
570.. attribute:: TarInfo.mtime
571
572 Time of last modification.
573
574
575.. attribute:: TarInfo.mode
576
577 Permission bits.
578
579
580.. attribute:: TarInfo.type
581
582 File type. *type* is usually one of these constants: :const:`REGTYPE`,
583 :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
584 :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
585 :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
Raymond Hettingerf7f64f92014-05-23 00:03:45 +0100586 more conveniently, use the ``is*()`` methods below.
Georg Brandl116aa622007-08-15 14:28:22 +0000587
588
589.. attribute:: TarInfo.linkname
590
591 Name of the target file name, which is only present in :class:`TarInfo` objects
592 of type :const:`LNKTYPE` and :const:`SYMTYPE`.
593
594
595.. attribute:: TarInfo.uid
596
597 User ID of the user who originally stored this member.
598
599
600.. attribute:: TarInfo.gid
601
602 Group ID of the user who originally stored this member.
603
604
605.. attribute:: TarInfo.uname
606
607 User name.
608
609
610.. attribute:: TarInfo.gname
611
612 Group name.
613
614
615.. attribute:: TarInfo.pax_headers
616
617 A dictionary containing key-value pairs of an associated pax extended header.
618
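The attributes above can be set before a member is written and inspected after
it is read back. The following is a minimal sketch that works on an in-memory
archive; the member name, owner names and payload are placeholders chosen for
this example.

```python
import io
import tarfile

# Describe one member; all values here are placeholders.
payload = b"Hello, tar!"
info = tarfile.TarInfo(name="docs/hello.txt")
info.size = len(payload)
info.mtime = 0
info.mode = 0o644
info.uname = "alice"
info.gname = "staff"

# Write the member into an archive held in memory.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    tar.addfile(info, io.BytesIO(payload))

# Read the archive back and inspect the stored metadata.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    members = tar.getmembers()

for member in members:
    print(member.name, member.size, oct(member.mode), member.uname, member.gname)
```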

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is a character device, block device or FIFO.

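The query methods are typically used to branch on the kind of member. The
sketch below classifies the members of a small in-memory archive; the helper
``describe()`` and the member names are hypothetical and exist only for this
example.

```python
import io
import tarfile

def describe(member):
    """Return a short label for a TarInfo object (hypothetical helper)."""
    if member.isfile():
        return "regular file"
    if member.isdir():
        return "directory"
    if member.issym():
        return "symbolic link"
    if member.islnk():
        return "hard link"
    if member.isdev():
        return "device or FIFO"
    return "other"

# Build an archive with a directory, a regular file and a symbolic link.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    directory = tarfile.TarInfo("pkg")
    directory.type = tarfile.DIRTYPE
    tar.addfile(directory)

    regular = tarfile.TarInfo("pkg/data.bin")
    regular.size = 3
    tar.addfile(regular, io.BytesIO(b"abc"))

    link = tarfile.TarInfo("pkg/latest")
    link.type = tarfile.SYMTYPE
    link.linkname = "data.bin"
    tar.addfile(link)

# Classify every member on the way back in.
buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    labels = {member.name: describe(member) for member in tar}
print(labels)
```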

.. _tarfile-commandline:
.. program:: tarfile

Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar

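The same commands can be driven from a script. The following sketch runs the
command-line interface through :mod:`subprocess` inside a temporary directory;
the helper ``run_tarfile_cli()`` is hypothetical, and the file names simply
reuse those from the examples above.

```python
import pathlib
import subprocess
import sys
import tempfile

def run_tarfile_cli(*args, cwd):
    """Run ``python -m tarfile`` with *args* inside *cwd* (hypothetical helper)."""
    return subprocess.run(
        [sys.executable, "-m", "tarfile", *args],
        cwd=cwd, capture_output=True, text=True, check=True,
    )

with tempfile.TemporaryDirectory() as workdir:
    # Create two placeholder files to archive.
    for name in ("spam.txt", "eggs.txt"):
        pathlib.Path(workdir, name).write_text("demo\n")

    # Create the archive, list it, then extract it into another directory.
    run_tarfile_cli("-c", "monty.tar", "spam.txt", "eggs.txt", cwd=workdir)
    listing = run_tarfile_cli("-l", "monty.tar", cwd=workdir).stdout
    run_tarfile_cli("-e", "monty.tar", "other-dir", cwd=workdir)
    extracted = sorted(
        path.name for path in pathlib.Path(workdir, "other-dir").iterdir()
    )

print(listing)
print(extracted)
```

Using ``sys.executable`` keeps the child process on the same interpreter as the
calling script.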

Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps a space between "is" and the description that follows.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters.
  The maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. Modern tar implementations,
  including GNU tar, bsdtar/libarchive and star, fully support extended *pax*
  features; some old or unmaintained libraries may not, but should treat
  *pax* archives as if they were in the universally-supported *ustar* format.
  It is the current default format for new archives.

  It extends the existing *ustar* format with extra headers for information
  that cannot be stored otherwise. There are two flavours of pax headers:
  extended headers only affect the subsequent file header, whereas global
  headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.
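
The practical difference between the three writable formats shows up with long
member names. The sketch below is an illustration only: it tries to store a
300-character name, which cannot be split across the ustar name fields, in each
format. The helper ``roundtrip_name()`` and the name itself are made up for
this example.

```python
import io
import tarfile

def roundtrip_name(name, fmt):
    """Write one empty member called *name* using *fmt* and read its name back."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(name))
    buffer.seek(0)
    with tarfile.open(fileobj=buffer) as tar:
        return tar.getnames()[0]

# A 300-character name without "/" cannot be split into the ustar
# prefix and name fields, so ustar has no way to store it.
long_name = "x" * 300

results = {}
for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                   ("gnu", tarfile.GNU_FORMAT),
                   ("pax", tarfile.PAX_FORMAT)]:
    try:
        results[label] = roundtrip_name(long_name, fmt) == long_name
    except ValueError as exc:
        # tarfile raises ValueError when a field does not fit the format.
        results[label] = f"rejected: {exc}"
print(results)
```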

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible with it.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is the result of :func:`sys.getfilesystemencoding`,
or ``'ascii'`` as a fallback. Depending on whether the archive is read or
written, the metadata must be either decoded or encoded. If *encoding* is not
set appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
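
The effect of *encoding* can be seen by writing the same non-ASCII name in two
formats and reading it back with a different encoding. This is a minimal
sketch; the helpers ``write_archive()`` and ``read_name()`` and the member name
are hypothetical, and ``iso8859-1`` merely stands in for any non-UTF-8
encoding.

```python
import io
import tarfile

def write_archive(name, fmt, encoding):
    """Return the bytes of an archive holding one empty member called *name*."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=fmt,
                      encoding=encoding) as tar:
        tar.addfile(tarfile.TarInfo(name))
    return buffer.getvalue()

def read_name(data, encoding):
    """Return the first member name, decoding metadata with *encoding*."""
    with tarfile.open(fileobj=io.BytesIO(data), encoding=encoding) as tar:
        return tar.getnames()[0]

name = "motörhead.txt"  # placeholder name with one non-ASCII character

# GNU format: the name is stored in the writer's encoding, so the
# reader has to be told which encoding that was.
gnu = write_archive(name, tarfile.GNU_FORMAT, "iso8859-1")
gnu_same = read_name(gnu, "iso8859-1")
gnu_other = read_name(gnu, "utf-8")  # undecodable byte becomes a surrogate

# pax format: the name travels in a UTF-8 extended header, so the
# reader's *encoding* does not matter for it.
pax = write_archive(name, tarfile.PAX_FORMAT, "iso8859-1")
pax_other = read_name(pax, "utf-8")

print(gnu_same == name, gnu_other == name, pax_other == name)
```

With the default ``'surrogateescape'`` handler the mismatched read does not
fail; the undecodable byte is kept as a lone surrogate in the returned name.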