:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

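As a rough illustration of how the *mode* strings of :func:`tarfile.open`
behave, the sketch below writes a gzip-compressed archive into an in-memory
buffer, reads it back with transparent compression, and then shows that
``'r:'`` rejects compressed data. The member name ``greeting.txt`` and the
use of :class:`io.BytesIO` are assumptions made for the example only.

```python
import io
import tarfile

# Build a small gzip-compressed archive in memory (the member name is made up).
payload = b"hello tar\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz", compresslevel=6) as tar:
    info = tarfile.TarInfo(name="greeting.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Reopen with mode 'r' so the compression is detected transparently.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()
    data = tar.extractfile("greeting.txt").read()

# Mode 'r:' insists on an uncompressed archive, so it rejects the gzip data.
buffer.seek(0)
try:
    tarfile.open(fileobj=buffer, mode="r:")
    plain_read_failed = False
except tarfile.ReadError:
    plain_read_failed = True
```

Using ``'r'`` (or ``'r:*'``) is usually the simplest choice for reading,
since the caller does not need to know how the archive was compressed.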

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

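A minimal sketch of :func:`is_tarfile` in use, assuming a temporary directory
and two made-up file names (``sample.tar`` and ``notes.txt``): one file is a
real archive, the other is plain text.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Both file names are hypothetical and exist only for this example.
    archive_path = os.path.join(tmp, "sample.tar")
    text_path = os.path.join(tmp, "notes.txt")

    with open(text_path, "w") as f:
        f.write("just text, not an archive\n")
    with tarfile.open(archive_path, "w") as tar:
        tar.add(text_path, arcname="notes.txt")

    archive_ok = tarfile.is_tarfile(archive_path)  # a readable tar archive
    text_ok = tarfile.is_tarfile(text_path)        # ordinary text file
```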

The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

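The following sketch shows two of these exceptions being triggered on
purpose. The byte string and the ``'r:zip'`` mode are made-up inputs chosen
only to provoke the errors; they are not meaningful archives or modes.

```python
import io
import tarfile

# Data that is not a tar archive triggers ReadError on open.
not_a_tar = io.BytesIO(b"this is not a tar archive")
try:
    tarfile.open(fileobj=not_a_tar, mode="r")
    outcome = "opened"
    is_tar_error = False
except tarfile.ReadError as exc:
    outcome = "ReadError"
    # ReadError derives from the common base class TarError.
    is_tar_error = isinstance(exc, tarfile.TarError)

# An unknown compression suffix in the mode string triggers CompressionError.
try:
    tarfile.open(fileobj=io.BytesIO(), mode="r:zip")
    compression_outcome = "opened"
except tarfile.CompressionError:
    compression_outcome = "CompressionError"
```

Catching :exc:`TarError` is a reasonable default when the caller does not
need to tell the individual failure modes apart.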

The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`PAX_FORMAT`.

   .. versionchanged:: 3.8
      The default format for new archives was changed to
      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.

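A format constant can be passed through :func:`tarfile.open` as the *format*
keyword argument. The short sketch below, which writes to an in-memory buffer
purely for illustration, selects the GNU format explicitly instead of relying
on :const:`DEFAULT_FORMAT`.

```python
import io
import tarfile

buffer = io.BytesIO()
# Request the GNU tar format explicitly rather than the module default.
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
    chosen_format = tar.format

known_formats = (tarfile.USTAR_FORMAT, tarfile.GNU_FORMAT, tarfile.PAX_FORMAT)
```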


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.
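
A minimal sketch of the context-manager usage described above, writing one
made-up member (``note.txt``) into an in-memory buffer. The buffer stands in
for a real file and is an assumption of this example.

```python
import io
import tarfile

buffer = io.BytesIO()
payload = b"managed by a with block\n"

with tarfile.open(fileobj=buffer, mode="w") as tar:
    info = tarfile.TarInfo(name="note.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Leaving the block closed the TarFile and finalized the archive.
closed_after_block = tar.closed
archive_size = len(buffer.getvalue())
```

Because the archive is only finalized on a clean exit from the block, code
that may raise while writing should treat a partially written archive as
unusable.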

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.

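The lookup methods above can be tried on a small in-memory archive. In this
sketch the member names and contents are invented, and ``a.txt`` is stored
twice on purpose to show that :meth:`getmember` returns the last occurrence.

```python
import io
import tarfile

# Hypothetical members; "a.txt" deliberately appears twice.
entries = [("a.txt", b"first"), ("b.txt", b"bee"), ("a.txt", b"second!")]

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name, payload in entries:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()            # archive order, duplicates included
    latest = tar.getmember("a.txt")   # the last occurrence is returned
    try:
        tar.getmember("missing.txt")
        missing_raised = False
    except KeyError:
        missing_raised = True
```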

.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

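One possible form of the "prior inspection" mentioned in the warning for
:meth:`TarFile.extractall` is sketched below. The helper
``suspicious_members`` is hypothetical (it is not part of :mod:`tarfile`),
it assumes POSIX-style paths, and it only checks member names; link targets
and other attributes are not examined, so it is not a complete safeguard.

```python
import io
import os
import tarfile
import tempfile


def suspicious_members(tar, dest):
    """Return names of members whose path would resolve outside *dest*.

    Hypothetical helper: checks names only, not link targets.
    """
    dest = os.path.realpath(dest)
    flagged = []
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(dest, member.name))
        if os.path.commonpath([dest, target]) != dest:
            flagged.append(member.name)
    return flagged


# Build an in-memory archive with one harmless and two escaping names.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ("ok/file.txt", "../escape.txt", "/abs.txt"):
        tar.addfile(tarfile.TarInfo(name=name))

buffer.seek(0)
dest = os.path.join(tempfile.gettempdir(), "unpack-example")
with tarfile.open(fileobj=buffer, mode="r") as tar:
    flagged = suspicious_members(tar, dest)
```

An archive for which the helper returns a non-empty list should not be
passed to :meth:`extractall` as-is.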

.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.

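A short sketch of :meth:`TarFile.extractfile`, using an in-memory archive
with one directory and one regular file. The member names ``logs`` and
``logs/app.log`` are invented for the example.

```python
import io
import tarfile

payload = b"line one\nline two\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    directory = tarfile.TarInfo(name="logs")
    directory.type = tarfile.DIRTYPE
    tar.addfile(directory)

    regular = tarfile.TarInfo(name="logs/app.log")
    regular.size = len(payload)
    tar.addfile(regular, io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    # A regular file yields a readable binary file object.
    reader = tar.extractfile("logs/app.log")
    first_line = reader.readline()
    # A directory has no data, so None is returned instead.
    directory_reader = tar.extractfile("logs")
```

Since the return value may be :const:`None`, callers that iterate over all
members should check it before reading.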

.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. Recursion adds entries in sorted order.
   If *filter* is given, it
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. versionchanged:: 3.7
      Recursion adds entries in sorted order.

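The *filter* argument of :meth:`TarFile.add` can both modify and exclude
members. The sketch below assumes a throwaway directory named ``project``
containing two placeholder files; the ``anonymize`` function is a
hypothetical filter written for this example.

```python
import os
import tarfile
import tempfile


def anonymize(tarinfo):
    """Hypothetical filter: skip *.pyc files, reset ownership on the rest."""
    if tarinfo.name.endswith(".pyc"):
        return None          # returning None excludes the member
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "project")
    os.mkdir(src)
    for filename in ("main.py", "main.pyc"):
        with open(os.path.join(src, filename), "w") as f:
            f.write("# placeholder\n")

    archive_path = os.path.join(tmp, "project.tar")
    with tarfile.open(archive_path, "w") as tar:
        # The directory is added recursively; the filter sees every member.
        tar.add(src, arcname="project", filter=anonymize)

    with tarfile.open(archive_path) as tar:
        names = tar.getnames()
        owners = {member.uname for member in tar.getmembers()}
```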

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor.
   *name* may be a :term:`path-like object`. If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

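The combination of :meth:`TarFile.gettarinfo` and :meth:`TarFile.addfile`
can be sketched as follows. The file ``report.txt``, its contents, and the
archive name ``2019/report.txt`` are assumptions made for this example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "report.txt")     # hypothetical input file
    with open(source, "wb") as f:
        f.write(b"quarterly numbers\n")

    archive_path = os.path.join(tmp, "reports.tar")
    with tarfile.open(archive_path, "w") as tar:
        # Start from the file's real metadata, then adjust it before adding.
        info = tar.gettarinfo(source, arcname="report.txt")
        info.name = "2019/report.txt"
        info.mtime = 0
        with open(source, "rb") as f:
            tar.addfile(info, f)

    with tarfile.open(archive_path) as tar:
        member = tar.getmember("2019/report.txt")
        stored = tar.extractfile(member).read()
        stored_size = member.size
        stored_mtime = member.mtime
```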

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

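A round trip through :meth:`TarInfo.tobuf` and :meth:`TarInfo.frombuf` might
look like the sketch below. The member name and size are arbitrary, and the
ustar format is chosen here so that the header is a single 512-byte block;
in Python 3 the buffer is a :class:`bytes` object.

```python
import tarfile

info = tarfile.TarInfo(name="example.txt")
info.size = 1234

# Serialize the header, then parse it back into a new TarInfo object.
header = info.tobuf(format=tarfile.USTAR_FORMAT)
parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")

# A buffer of the wrong length is rejected with a HeaderError.
try:
    tarfile.TarInfo.frombuf(b"too short", tarfile.ENCODING, "surrogateescape")
    bad_header_rejected = False
except tarfile.HeaderError:
    bad_header_rejected = True
```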

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file name, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


619A :class:`TarInfo` object also provides some convenient query methods:
620
621
622.. method:: TarInfo.isfile()
623
624 Return :const:`True` if the :class:`Tarinfo` object is a regular file.
625
626
627.. method:: TarInfo.isreg()
628
629 Same as :meth:`isfile`.
630
631
632.. method:: TarInfo.isdir()
633
634 Return :const:`True` if it is a directory.
635
636
637.. method:: TarInfo.issym()
638
639 Return :const:`True` if it is a symbolic link.
640
641
642.. method:: TarInfo.islnk()
643
644 Return :const:`True` if it is a hard link.
645
646
647.. method:: TarInfo.ischr()
648
649 Return :const:`True` if it is a character device.
650
651
652.. method:: TarInfo.isblk()
653
654 Return :const:`True` if it is a block device.
655
656
657.. method:: TarInfo.isfifo()
658
659 Return :const:`True` if it is a FIFO.
660
661
662.. method:: TarInfo.isdev()
663
664 Return :const:`True` if it is one of character device, block device or FIFO.
665
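A minimal sketch of how the query methods can be combined to classify
members. The helper ``describe()`` and the member names are hypothetical and
exist only for this example; the archive is built in memory:

```python
import io
import tarfile

def describe(member):
    """Return a short label for a TarInfo object based on its is*() methods."""
    if member.isfile():
        return "regular file"
    if member.isdir():
        return "directory"
    if member.issym():
        return "symbolic link"
    if member.islnk():
        return "hard link"
    if member.isdev():
        return "device or FIFO"
    return "other"

# Build an archive holding a directory, a regular file and a symbolic link.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    directory = tarfile.TarInfo("demo")
    directory.type = tarfile.DIRTYPE
    tar.addfile(directory)

    regular = tarfile.TarInfo("demo/spam.txt")
    regular.size = 4
    tar.addfile(regular, io.BytesIO(b"spam"))

    link = tarfile.TarInfo("demo/link")
    link.type = tarfile.SYMTYPE
    link.linkname = "spam.txt"
    tar.addfile(link)

buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    labels = {member.name: describe(member) for member in tar}

for name, label in labels.items():
    print(name, "is a", label)
```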

.. _tarfile-commandline:
.. program:: tarfile

Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar

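For comparison, the same operations can be approximated from Python code. The
sketch below is only a rough counterpart of the create, list and test
commands, not a description of how the command-line interface is implemented;
it works in a temporary directory and the file names are placeholders:

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a small source file to archive (placeholder content).
    source = os.path.join(workdir, "spam.txt")
    with open(source, "w") as f:
        f.write("spam\n")
    archive = os.path.join(workdir, "monty.tar")

    # Roughly "python -m tarfile -c monty.tar spam.txt".
    with tarfile.open(archive, "w") as tar:
        tar.add(source, arcname="spam.txt")

    # Roughly "python -m tarfile -l monty.tar".
    with tarfile.open(archive) as tar:
        names = tar.getnames()
        tar.list(verbose=False)

    # Roughly "python -m tarfile -t monty.tar".
    valid = tarfile.is_tarfile(archive)
    print("valid archive:", valid)
```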

Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       # Yield only the members whose names end in ".py".
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip-compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       # Replace the owner recorded for each member with root.
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters.
  The maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. Modern tar implementations,
  including GNU tar, bsdtar/libarchive and star, fully support extended *pax*
  features; some older or unmaintained libraries may not, but should treat
  *pax* archives as if they were in the universally-supported *ustar* format.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

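To make the filename limits above more concrete, the following sketch stores
the same overlong member name in each of the three writable formats. The
helper ``roundtrip()`` and the member name are assumptions made for this
example, and the archives only exist in memory:

```python
import io
import tarfile

# A made-up name of 308 characters, beyond what ustar can represent.
long_name = "d/" * 150 + "file.txt"

def roundtrip(fmt):
    """Write one empty member in the given format and read it back."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(long_name))
    buffer.seek(0)
    with tarfile.open(fileobj=buffer) as tar:
        return tar.getmembers()[0]

results = {}
for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                   ("gnu", tarfile.GNU_FORMAT),
                   ("pax", tarfile.PAX_FORMAT)]:
    try:
        member = roundtrip(fmt)
    except ValueError as exc:
        # The name cannot be stored in this format.
        results[label] = f"rejected: {exc}"
    else:
        keys = sorted(member.pax_headers)
        results[label] = f"stored, pax_headers keys: {keys}"
    print(label, "->", results[label])
```

In this sketch the ustar format is expected to refuse the name, while the GNU
and pax formats store it by means of their respective extensions; for pax the
full name travels in an extended header, which is why it shows up in
:attr:`~TarInfo.pax_headers`.
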
.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.

For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
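
The effect of *encoding* can be sketched with a small in-memory experiment.
The helper ``written_then_read()`` and the member name are made up for this
illustration; the example writes one non-ASCII name as *Latin-1* and reads it
back with matching and mismatching settings:

```python
import io
import tarfile

name = "f\u00f6\u00f6.txt"  # a made-up member name with non-ASCII characters

def written_then_read(fmt, write_encoding, read_encoding):
    """Store one empty member, then return its name as the reader sees it."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=fmt,
                      encoding=write_encoding) as tar:
        tar.addfile(tarfile.TarInfo(name))
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, encoding=read_encoding) as tar:
        return tar.getnames()[0]

gnu_mismatch = written_then_read(tarfile.GNU_FORMAT, "iso8859-1", "utf-8")
gnu_match = written_then_read(tarfile.GNU_FORMAT, "iso8859-1", "iso8859-1")
pax_any = written_then_read(tarfile.PAX_FORMAT, "iso8859-1", "utf-8")

# ascii() is used because surrogate-escaped names cannot always be printed.
for label, value in [("GNU, mismatched encodings", gnu_mismatch),
                     ("GNU, matching encodings", gnu_match),
                     ("pax, mismatched encodings", pax_any)]:
    print(label, "->", ascii(value), "intact" if value == name else "damaged")
```

With the default ``'surrogateescape'`` error handler the mismatched GNU case
does not raise an exception; the undecodable bytes are kept as lone surrogates
in the name instead, whereas the pax archive is unaffected by the mismatch.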