:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that appending to a compressed archive (``'a:gz'``, ``'a:bz2'`` or
   ``'a:xz'``) is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

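As a minimal sketch of the mode strings listed for :func:`tarfile.open`, the
following writes a gzip-compressed archive with ``'w:gz'``, reads it back with
the transparent ``'r'`` mode, and shows that ``'r:'`` refuses the compressed
file. The temporary directory and the names ``hello.txt`` and
``example.tar.gz`` are assumptions made for the illustration.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "hello.txt")
    with open(source, "w") as f:
        f.write("hello")

    archive = os.path.join(tmp, "example.tar.gz")

    # 'w:gz' creates a gzip-compressed archive.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname="hello.txt")

    # 'r' (the same as 'r:*') detects the compression automatically.
    with tarfile.open(archive, "r") as tar:
        names = tar.getnames()

    # 'r:' insists on an uncompressed archive, so the gzip file is rejected.
    try:
        tarfile.open(archive, "r:")
    except tarfile.ReadError:
        plain_mode_failed = True
    else:
        plain_mode_failed = False

print(names, plain_mode_failed)
```

Using plain ``'r'`` for reading is usually the simplest choice, because the
caller does not need to know how the archive was compressed.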

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read. *name* may be a :class:`str`, file, or file-like object.

   .. versionchanged:: 3.9
      Support for file and file-like objects.

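A short sketch of :func:`is_tarfile` used as a pre-check before opening a
file. Only path strings are passed here, so the example does not depend on the
file-object support added in 3.9. The file names are assumptions chosen for
the illustration.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    member = os.path.join(tmp, "data.txt")
    with open(member, "w") as f:
        f.write("some data")

    archive = os.path.join(tmp, "backup.tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(member, arcname="data.txt")

    # A real archive is recognised, a plain text file is not.
    archive_ok = tarfile.is_tarfile(archive)
    text_ok = tarfile.is_tarfile(member)

print(archive_ok, text_ok)
```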

The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

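Because every exception above derives from :exc:`TarError`, a caller can catch
the base class and still inspect the concrete type. The sketch below feeds
bytes that are not an archive to :func:`tarfile.open`; the in-memory buffer
and its content are assumptions for the illustration.

```python
import io
import tarfile

# Arbitrary bytes that do not form a tar archive in any supported format.
garbage = io.BytesIO(b"this is not a tar archive " * 40)

caught = None
try:
    tarfile.open(fileobj=garbage, mode="r")
except tarfile.TarError as exc:
    # ReadError is a subclass of TarError, so it is caught here.
    caught = type(exc).__name__

print(caught)
```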

The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`PAX_FORMAT`.

   .. versionchanged:: 3.8
      The default format for new archives was changed to
      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object; see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

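The context manager behaviour can be sketched as follows. The archive is
written to an in-memory :class:`io.BytesIO` buffer, which is an assumption
made to keep the example free of file system side effects; ``data.bin`` is a
made-up member name.

```python
import io
import tarfile

buffer = io.BytesIO()

# When the block exits normally the archive is finalized and closed.
with tarfile.open(fileobj=buffer, mode="w") as tar:
    data = b"payload"
    info = tarfile.TarInfo(name="data.bin")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

closed_after_block = tar.closed
# A file object supplied by the caller is left open.
buffer_still_open = not buffer.closed
archive_size = len(buffer.getvalue())

print(closed_after_block, buffer_still_open, archive_size)
```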

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format for writing. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level. When reading, the format will be automatically detected, even
   if different formats are present in a single archive.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

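The *format* and *pax_headers* arguments can be illustrated with a small round
trip. This is a sketch that passes both as keyword arguments through
:func:`tarfile.open`; the header key ``comment``, its value and the member
name ``empty.txt`` are assumptions chosen for the example.

```python
import io
import tarfile

buffer = io.BytesIO()
global_headers = {"comment": "nightly build"}

# Keyword arguments are forwarded to the TarFile constructor.
with tarfile.open(fileobj=buffer, mode="w",
                  format=tarfile.PAX_FORMAT,
                  pax_headers=global_headers) as tar:
    tar.addfile(tarfile.TarInfo(name="empty.txt"))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()
    # The global header is exposed through the pax_headers attribute.
    read_back = dict(tar.pax_headers)

print(names, read_back)
```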

.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.

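The three lookup methods can be compared side by side. The sketch below builds
a two-member archive in memory (the names ``a.txt`` and ``b.txt`` and their
contents are assumptions) and then queries it.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name, data in [("a.txt", b"first"), ("b.txt", b"second!")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()                                # archive order
    sizes = [member.size for member in tar.getmembers()]  # same order
    b_info = tar.getmember("b.txt")
    try:
        tar.getmember("missing.txt")
    except KeyError:
        missing_raises = True
    else:
        missing_raises = False

print(names, sizes, b_info.size, missing_raises)
```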

.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.

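Calling :meth:`~TarFile.next` until it returns :const:`None` is one way to walk
an archive member by member, which also works for archives opened in stream
mode. The following sketch assumes an in-memory archive with three empty,
made-up members.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ("one.txt", "two.txt", "three.txt"):
        tar.addfile(tarfile.TarInfo(name=name))

buffer.seek(0)
seen = []
# 'r|' reads the archive as a stream of uncompressed blocks.
with tarfile.open(fileobj=buffer, mode="r|") as tar:
    member = tar.next()
    while member is not None:
        seen.append(member.name)
        member = tar.next()

print(seen)
```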

.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

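One possible way to act on the warning for :meth:`~TarFile.extractall` is to
inspect the members first and pass only the accepted ones through *members*.
The helper ``safe_members`` below is hypothetical and not part of the module.
It is deliberately strict and should be read as a sketch, not as a complete
defence against malicious archives.

```python
import io
import os
import tarfile
import tempfile


def safe_members(tar):
    """Hypothetical helper: keep regular files and directories whose
    names are relative and contain no '..' component."""
    accepted = []
    for member in tar.getmembers():
        parts = member.name.replace("\\", "/").split("/")
        if member.name.startswith("/") or ".." in parts:
            continue  # absolute path or parent-directory reference
        if not (member.isfile() or member.isdir()):
            continue  # skip links, devices and FIFOs
        accepted.append(member)
    return accepted


# Build an in-memory archive with one harmless and one suspicious member.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ("docs/readme.txt", "../escape.txt"):
        data = b"content"
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buffer.seek(0)
with tempfile.TemporaryDirectory() as target:
    with tarfile.open(fileobj=buffer, mode="r") as tar:
        members = safe_members(tar)
        accepted_names = [member.name for member in members]
        tar.extractall(path=target, members=members)
    extracted = sorted(os.listdir(target))
    readme_exists = os.path.isfile(os.path.join(target, "docs", "readme.txt"))

print(accepted_names, extracted, readme_exists)
```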

.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.

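:meth:`~TarFile.extractfile` reads a member without writing anything to disk.
The sketch below assumes an in-memory archive holding a regular file
``log.txt`` and a directory ``logs`` (both made-up names) to show the two
possible return values.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    data = b"line one\nline two\n"
    info = tarfile.TarInfo(name="log.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

    directory = tarfile.TarInfo(name="logs")
    directory.type = tarfile.DIRTYPE
    tar.addfile(directory)

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    # Regular files yield a readable binary file object.
    with tar.extractfile("log.txt") as member_file:
        lines = member_file.read().decode("ascii").splitlines()
    # Directories carry no data, so None is returned for them.
    directory_file = tar.extractfile("logs")

print(lines, directory_file)
```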

.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. Recursion adds entries in sorted order.
   If *filter* is given, it
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None`, the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. versionchanged:: 3.7
      Recursion adds entries in sorted order.

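A *filter* function for :meth:`~TarFile.add` can both modify and exclude
members. In the sketch below, ``anonymize`` is a hypothetical filter that drops
``*.log`` files and resets ownership; the directory layout is created in a
temporary directory purely for the illustration.

```python
import io
import os
import tarfile
import tempfile


def anonymize(tarinfo):
    """Hypothetical filter: skip *.log files and reset ownership."""
    if tarinfo.name.endswith(".log"):
        return None  # returning None excludes the member
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo


buffer = io.BytesIO()
with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "project")
    os.mkdir(project)
    for filename in ("main.py", "debug.log"):
        with open(os.path.join(project, filename), "w") as f:
            f.write("x")

    with tarfile.open(fileobj=buffer, mode="w") as tar:
        # The directory is added recursively; the filter sees every member.
        tar.add(project, arcname="project", filter=anonymize)

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()
    owners = {member.uname for member in tar.getmembers()}

print(names, owners)
```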

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.

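The role of ``tarinfo.size`` in :meth:`~TarFile.addfile` can be seen in a small
sketch: only that many bytes are taken from *fileobj*, regardless of how much
data it holds. The member name ``digits.txt`` and the payload are assumptions
for the example.

```python
import io
import tarfile

source = io.BytesIO(b"0123456789")

info = tarfile.TarInfo(name="digits.txt")
info.size = 4  # addfile() reads exactly this many bytes from the source

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    tar.addfile(info, source)

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    stored = tar.extractfile("digits.txt").read()

print(stored)
```

In practice ``size`` is normally set to the full length of the data, as a
directly created :class:`TarInfo` starts with a size of ``0``.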

.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor.
   *name* may be a :term:`path-like object`. If
   given, *arcname* specifies an alternative name for the file in the
   archive; otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

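A common pattern is to combine :meth:`~TarFile.gettarinfo` with
:meth:`~TarFile.addfile` so that attributes can be adjusted before the member
is stored. The sketch below assumes a temporary file ``report.txt`` and the
archive name ``reports/q1.txt``, both chosen for the illustration.

```python
import io
import os
import tarfile
import tempfile

buffer = io.BytesIO()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")
    with open(path, "wb") as f:
        f.write(b"quarterly numbers")

    with tarfile.open(fileobj=buffer, mode="w") as tar:
        # Size, mtime and ownership are taken from the file on disk.
        info = tar.gettarinfo(path, arcname="reports/q1.txt")
        info.mode = 0o600  # adjust an attribute before adding the member
        with open(path, "rb") as f:
            tar.addfile(info, f)

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    member = tar.getmember("reports/q1.txt")
    size, mode = member.size, member.mode

print(size, oct(mode))
```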

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

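:meth:`~TarInfo.tobuf` and :meth:`~TarInfo.frombuf` are inverse operations on
a raw header block. The following sketch serializes a header in the ustar
format and parses it again; the member name and size are arbitrary values
chosen for the example.

```python
import tarfile

info = tarfile.TarInfo(name="notes.txt")
info.size = 42

# Serialize the header; a plain ustar header is one 512-byte block.
header = info.tobuf(format=tarfile.USTAR_FORMAT)

# Parse the block back into a new TarInfo object.
parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")

# A buffer that is not a valid header block is rejected.
try:
    tarfile.TarInfo.frombuf(b"\x00" * 10, tarfile.ENCODING, "surrogateescape")
except tarfile.HeaderError:
    invalid_rejected = True
else:
    invalid_rejected = False

print(len(header), parsed.name, parsed.size, invalid_rejected)
```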
Georg Brandl116aa622007-08-15 14:28:22 +0000559
A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


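
The attributes above can be explored without touching the file system. The
sketch below writes a single member to an in-memory archive and reads its
metadata back; the member name, owner names and IDs are made-up values chosen
for illustration.

```python
import io
import tarfile

payload = b"spam and eggs"

# Build a one-member archive in memory (names and IDs are made up).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    member = tarfile.TarInfo(name="menu.txt")
    member.size = len(payload)
    member.mtime = 1000000000
    member.mode = 0o644
    member.uid = member.gid = 1000
    member.uname = member.gname = "brian"
    tar.addfile(member, io.BytesIO(payload))

# Read the archive back and inspect the stored metadata.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    info = tar.getmember("menu.txt")
    print(info.name, info.size, info.mtime, oct(info.mode))
    print(info.uid, info.gid, info.uname, info.gname)
    print(info.type == tarfile.REGTYPE, repr(info.linkname))
```
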
A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


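
One possible way to use the query methods is to classify members before
deciding how to handle them. In the sketch below, ``describe()`` is a
hypothetical helper (not part of :mod:`tarfile`), and the members are
constructed by hand rather than read from an archive.

```python
import tarfile

def describe(info):
    """Return a short label for a TarInfo object using its is*() methods."""
    if info.isfile():
        return "regular file"
    if info.isdir():
        return "directory"
    if info.issym():
        return "symbolic link"
    if info.islnk():
        return "hard link"
    if info.isdev():
        return "device or FIFO"
    return "other"

# Hypothetical members, constructed without touching the file system.
entries = []
for name, kind in [("spam.txt", tarfile.REGTYPE),
                   ("recipes", tarfile.DIRTYPE),
                   ("latest", tarfile.SYMTYPE),
                   ("pipe", tarfile.FIFOTYPE)]:
    info = tarfile.TarInfo(name)
    info.type = kind
    entries.append(info)

for info in entries:
    print(info.name, "->", describe(info))
```
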
.. _tarfile-commandline:
.. program:: tarfile

Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar  spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar  other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

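
The command-line interface can also be driven from a script, for example in a
test. The following sketch runs the ``-c`` and ``-l`` options through
:mod:`subprocess` in a temporary directory; the file names are made up, and
the use of :data:`sys.executable` to locate the interpreter is an assumption
of this example rather than a requirement of the interface.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a small source file to archive (the file names are made up).
    with open(os.path.join(workdir, "spam.txt"), "w") as f:
        f.write("spam\n")

    # Equivalent to: python -m tarfile -c monty.tar spam.txt
    subprocess.run([sys.executable, "-m", "tarfile", "-c", "monty.tar", "spam.txt"],
                   cwd=workdir, check=True)

    # Equivalent to: python -m tarfile -l monty.tar
    listing = subprocess.run([sys.executable, "-m", "tarfile", "-l", "monty.tar"],
                             cwd=workdir, check=True,
                             stdout=subprocess.PIPE, text=True).stdout

print(listing)
```
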
.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  of at best 256 characters and linknames of up to 100 characters.
  The maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. Modern tar implementations,
  including GNU tar, bsdtar/libarchive and star, fully support extended *pax*
  features; some old or unmaintained libraries may not, but should treat
  *pax* archives as if they were in the universally-supported *ustar* format.
  It is the current default format for new archives.

  It extends the existing *ustar* format with extra headers for information
  that cannot be stored otherwise. There are two flavours of pax headers:
  extended headers only affect the subsequent file header, while global
  headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

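
The difference between the three writable formats shows up as soon as a name
exceeds the ustar limits. The sketch below serializes the header of a made-up
member with a 300-character name in each format using :meth:`TarInfo.tobuf`;
the name is an illustrative assumption and contains no ``/`` separators.

```python
import tarfile

# A hypothetical 300-character member name without any "/" separators.
info = tarfile.TarInfo(name="x" * 300)

# ustar cannot represent the name, so header creation fails.
try:
    info.tobuf(format=tarfile.USTAR_FORMAT)
    ustar_ok = True
except ValueError as exc:
    ustar_ok = False
    print("ustar:", exc)

# GNU and pax emit extra blocks ahead of the regular 512-byte header.
gnu_header = info.tobuf(format=tarfile.GNU_FORMAT)
pax_header = info.tobuf(format=tarfile.PAX_FORMAT)
print("gnu:", len(gnu_header), "bytes")
print("pax:", len(pax_header), "bytes")
```

Both the GNU and the pax result are longer than one block because the full
name is carried in an additional header that precedes the regular one.
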
.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem with the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
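
The effect of a mismatched *encoding* can be reproduced in memory. In the
following sketch, ``roundtrip()`` is a hypothetical helper that writes one
member with a non-ASCII name as *UTF-8* and reads it back as *Latin-1*; the
member name is made up for illustration.

```python
import io
import tarfile

# Hypothetical member name with one non-ASCII character ("e" with acute accent).
name = "caf\u00e9.txt"

def roundtrip(fmt, read_encoding):
    """Write *name* in format *fmt* as UTF-8, then read the archive back
    using *read_encoding* and return the decoded member name."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=fmt, encoding="utf-8") as tar:
        tar.addfile(tarfile.TarInfo(name))
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r:", encoding=read_encoding) as tar:
        return tar.getnames()[0]

gnu_name = roundtrip(tarfile.GNU_FORMAT, "latin-1")
pax_name = roundtrip(tarfile.PAX_FORMAT, "latin-1")
print(repr(gnu_name), gnu_name == name)
print(repr(pax_name), pax_name == name)
```

The GNU archive yields a damaged name because its bytes are reinterpreted as
*Latin-1*, whereas the pax archive carries the name in a *UTF-8* encoded
extended header and is decoded correctly regardless of *encoding*.
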