:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r'`` or       | Open for reading with transparent           |
   | ``'r:*'``        | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a'`` or       | Open for appending with no compression. The |
   | ``'a:'``         | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w'`` or       | Open for uncompressed writing.              |
   | ``'w:'``         |                                             |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that the modes ``'a:gz'``, ``'a:bz2'`` and ``'a:xz'`` are not possible.
   If *mode* is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.
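
   The mode strings and the *compresslevel* keyword can be tried out with a
   minimal in-memory sketch. The member name ``greeting.txt`` and the level
   ``6`` are arbitrary choices made for this illustration:

   .. code-block:: python

      import io
      import tarfile

      buffer = io.BytesIO()
      payload = b"hello tarfile\n"

      # 'w:gz' creates a gzip-compressed archive; compresslevel is optional.
      with tarfile.open(fileobj=buffer, mode="w:gz", compresslevel=6) as archive:
          info = tarfile.TarInfo(name="greeting.txt")
          info.size = len(payload)
          archive.addfile(info, io.BytesIO(payload))

      # fileobj is expected to be at position 0 before reading.
      buffer.seek(0)

      # 'r:*' (the same as 'r') detects the compression transparently.
      with tarfile.open(fileobj=buffer, mode="r:*") as archive:
          names = archive.getnames()

   Reading with ``'r:*'`` avoids having to know in advance whether the data is
   gzip, bzip2 or lzma compressed.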

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes are:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
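
   As a rough sketch of the stream variant, the following writes and then
   reads an archive without any seeking. :class:`io.BytesIO` objects stand in
   for a pipe or socket here, and the member name ``data.txt`` is made up for
   the example:

   .. code-block:: python

      import io
      import tarfile

      payload = b"streamed data\n"

      # 'w|gz' writes the archive as a forward-only stream of blocks.
      sink = io.BytesIO()
      with tarfile.open(fileobj=sink, mode="w|gz") as archive:
          info = tarfile.TarInfo(name="data.txt")
          info.size = len(payload)
          archive.addfile(info, io.BytesIO(payload))

      # 'r|*' reads a stream; members have to be consumed in archive order.
      source = io.BytesIO(sink.getvalue())
      contents = {}
      with tarfile.open(fileobj=source, mode="r|*") as archive:
          for member in archive:
              if member.isfile():
                  contents[member.name] = archive.extractfile(member).read()

   Each member is read while it is the current one, because a stream cannot
   be rewound to an earlier member.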

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.
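
   A short sketch of the check, using a temporary directory. The file names
   ``notes.txt`` and ``sample.tar`` are placeholders chosen for the example:

   .. code-block:: python

      import os
      import tarfile
      import tempfile

      with tempfile.TemporaryDirectory() as workdir:
          text_path = os.path.join(workdir, "notes.txt")
          archive_path = os.path.join(workdir, "sample.tar")

          with open(text_path, "w") as handle:
              handle.write("plain text, not an archive\n")

          with tarfile.open(archive_path, "w") as archive:
              archive.add(text_path, arcname="notes.txt")

          archive_ok = tarfile.is_tarfile(archive_path)  # a real tar archive
          text_ok = tarfile.is_tarfile(text_path)        # an ordinary text file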


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.
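
As an illustration of the exception hierarchy, the sketch below feeds bytes
that are not a tar archive to :func:`tarfile.open` and catches the resulting
:exc:`ReadError`. The input data is invented for the example:

.. code-block:: python

   import io
   import tarfile

   not_an_archive = io.BytesIO(b"this is definitely not tar data")

   try:
       tarfile.open(fileobj=not_an_archive, mode="r")
   except tarfile.ReadError as exc:
       # ReadError derives from TarError, so catching TarError works too.
       outcome = "ReadError: {}".format(exc)
   else:
       outcome = "opened"

Catching :exc:`TarError` instead handles every exception defined by the
module in one place.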


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object; see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.
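
A minimal sketch of the context manager form, writing to an in-memory buffer.
The member name ``spam.txt`` is only an example:

.. code-block:: python

   import io
   import tarfile

   buffer = io.BytesIO()
   payload = b"spam\n"

   # The archive is finalized and closed when the block exits normally.
   with tarfile.open(fileobj=buffer, mode="w") as archive:
       info = tarfile.TarInfo(name="spam.txt")
       info.size = len(payload)
       archive.addfile(info, io.BytesIO(payload))

   archive_closed = archive.closed   # the TarFile itself is closed
   buffer_closed = buffer.closed     # a caller-supplied fileobj stays open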

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.
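
   The following sketch stores the same name twice and shows which copy
   :meth:`getmember` selects. The name ``config.ini`` and both payloads are
   invented for the example:

   .. code-block:: python

      import io
      import tarfile

      buffer = io.BytesIO()
      with tarfile.open(fileobj=buffer, mode="w") as archive:
          for payload in (b"first version\n", b"second, newer version\n"):
              info = tarfile.TarInfo(name="config.ini")
              info.size = len(payload)
              archive.addfile(info, io.BytesIO(payload))

      buffer.seek(0)
      missing = False
      with tarfile.open(fileobj=buffer, mode="r") as archive:
          member = archive.getmember("config.ini")   # last occurrence
          latest = archive.extractfile(member).read()
          count = len(archive.getmembers())          # both copies are listed
          try:
              archive.getmember("missing.txt")
          except KeyError:
              missing = True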


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object when the
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.
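
   One possible way to inspect members before extraction is sketched below.
   ``safe_members`` is a hypothetical helper written for this example, not
   part of the module, and its checks are deliberately conservative rather
   than a complete defence:

   .. code-block:: python

      import io
      import os
      import tarfile
      import tempfile

      def safe_members(archive):
          """Yield plain files and directories whose names stay below *path*."""
          for member in archive.getmembers():
              parts = member.name.replace("\\", "/").split("/")
              if member.name.startswith("/") or os.path.isabs(member.name):
                  continue  # absolute name
              if ".." in parts:
                  continue  # would climb out of the destination directory
              if not (member.isfile() or member.isdir()):
                  continue  # skip links and device nodes in this sketch
              yield member

      # Build a test archive with one harmless and one suspicious member.
      buffer = io.BytesIO()
      with tarfile.open(fileobj=buffer, mode="w") as archive:
          for name in ("ok/data.txt", "../evil.txt"):
              payload = name.encode("utf-8")
              info = tarfile.TarInfo(name=name)
              info.size = len(payload)
              archive.addfile(info, io.BytesIO(payload))

      buffer.seek(0)
      with tempfile.TemporaryDirectory() as dest:
          with tarfile.open(fileobj=buffer, mode="r") as archive:
              archive.extractall(path=dest, members=list(safe_members(archive)))
          extracted = sorted(os.listdir(dest))
          data_exists = os.path.isfile(os.path.join(dest, "ok", "data.txt"))

   Passing the vetted list as *members* keeps the post-extraction handling of
   directory attributes that :meth:`extractall` provides.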


.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.
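
   A small sketch of both return values, reading a regular file and a
   directory from an in-memory archive. The member names are placeholders:

   .. code-block:: python

      import io
      import tarfile

      buffer = io.BytesIO()
      with tarfile.open(fileobj=buffer, mode="w") as archive:
          payload = b"line one\nline two\n"
          info = tarfile.TarInfo(name="docs/readme.txt")
          info.size = len(payload)
          archive.addfile(info, io.BytesIO(payload))

          directory = tarfile.TarInfo(name="docs/empty")
          directory.type = tarfile.DIRTYPE
          archive.addfile(directory)

      buffer.seek(0)
      with tarfile.open(fileobj=buffer, mode="r") as archive:
          reader = archive.extractfile("docs/readme.txt")
          lines = reader.read().decode("utf-8").splitlines()
          nothing = archive.extractfile("docs/empty")  # not a file or link

   Nothing is written to disk; the data is read straight from the archive.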
Georg Brandl116aa622007-08-15 14:28:22 +0000446
447
Serhiy Storchaka4f76fb12017-01-13 13:25:24 +0200448.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000449
Raymond Hettingera63a3122011-01-26 20:34:14 +0000450 Add the file *name* to the archive. *name* may be any type of file
451 (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
452 alternative name for the file in the archive. Directories are added
453 recursively by default. This can be avoided by setting *recursive* to
Bernhard M. Wiedemann84521042018-01-31 11:17:10 +0100454 :const:`False`. Recursion adds entries in sorted order.
455 If *filter* is given, it
Raymond Hettingera63a3122011-01-26 20:34:14 +0000456 should be a function that takes a :class:`TarInfo` object argument and
457 returns the changed :class:`TarInfo` object. If it instead returns
458 :const:`None` the :class:`TarInfo` object will be excluded from the
459 archive. See :ref:`tar-examples` for an example.
Lars Gustäbel049d2aa2009-09-12 10:44:00 +0000460
461 .. versionchanged:: 3.2
462 Added the *filter* parameter.
463
Bernhard M. Wiedemann84521042018-01-31 11:17:10 +0100464 .. versionchanged:: 3.7
465 Recursion adds entries in sorted order.
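
   A sketch of a *filter* function that both modifies and excludes members.
   ``anonymize`` and the file names are hypothetical, chosen only to
   illustrate the two possible return values:

   .. code-block:: python

      import os
      import tarfile
      import tempfile

      def anonymize(tarinfo):
          """Drop hidden files and reset ownership on everything else."""
          if os.path.basename(tarinfo.name).startswith("."):
              return None  # returning None excludes the member
          tarinfo.uid = tarinfo.gid = 0
          tarinfo.uname = tarinfo.gname = "root"
          return tarinfo

      with tempfile.TemporaryDirectory() as workdir:
          source = os.path.join(workdir, "project")
          os.mkdir(source)
          for filename in ("main.py", ".secret"):
              with open(os.path.join(source, filename), "w") as handle:
                  handle.write("content\n")

          archive_path = os.path.join(workdir, "project.tar")
          with tarfile.open(archive_path, "w") as archive:
              archive.add(source, arcname="project", filter=anonymize)

          with tarfile.open(archive_path, "r") as archive:
              names = archive.getnames()
              owners = {member.uname for member in archive.getmembers()}

   The filter is called for the directory itself and for every entry found
   while recursing into it.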


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.
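
   A sketch of adding data that exists only in memory. The :class:`TarInfo`
   object is created directly, so its *size* has to be set by hand; the name
   ``report.txt`` and the mode are example values:

   .. code-block:: python

      import io
      import tarfile
      import time

      payload = "generated at runtime\n".encode("utf-8")

      info = tarfile.TarInfo(name="report.txt")
      info.size = len(payload)        # addfile() reads this many bytes
      info.mtime = int(time.time())
      info.mode = 0o644

      buffer = io.BytesIO()
      with tarfile.open(fileobj=buffer, mode="w") as archive:
          archive.addfile(info, io.BytesIO(payload))

      buffer.seek(0)
      with tarfile.open(fileobj=buffer, mode="r") as archive:
          stored = archive.getmember("report.txt")
          data = archive.extractfile(stored).read()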
Georg Brandl116aa622007-08-15 14:28:22 +0000474
475
Benjamin Petersona37cfc62008-05-26 13:48:34 +0000476.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000477
Martin Panterf817a482016-02-19 23:34:56 +0000478 Create a :class:`TarInfo` object from the result of :func:`os.stat` or
479 equivalent on an existing file. The file is either named by *name*, or
Serhiy Storchakac45cd162017-03-08 10:32:44 +0200480 specified as a :term:`file object` *fileobj* with a file descriptor.
481 *name* may be a :term:`path-like object`. If
Martin Panterf817a482016-02-19 23:34:56 +0000482 given, *arcname* specifies an alternative name for the file in the
483 archive, otherwise, the name is taken from *fileobj*’s
484 :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
485 should be a text string.
486
487 You can modify
488 some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
489 If the file object is not an ordinary file object positioned at the
490 beginning of the file, attributes such as :attr:`~TarInfo.size` may need
491 modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
492 The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
493 could be a dummy string.
Georg Brandl116aa622007-08-15 14:28:22 +0000494
Serhiy Storchakac45cd162017-03-08 10:32:44 +0200495 .. versionchanged:: 3.6
496 The *name* parameter accepts a :term:`path-like object`.
497
Georg Brandl116aa622007-08-15 14:28:22 +0000498
499.. method:: TarFile.close()
500
501 Close the :class:`TarFile`. In write mode, two finishing zero blocks are
502 appended to the archive.
503
504
Georg Brandl116aa622007-08-15 14:28:22 +0000505.. attribute:: TarFile.pax_headers
506
507 A dictionary containing key-value pairs of pax global headers.
508
Georg Brandl116aa622007-08-15 14:28:22 +0000509
Georg Brandl116aa622007-08-15 14:28:22 +0000510
511.. _tarinfo-objects:
512
513TarInfo Objects
514---------------
515
516A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
517from storing all required attributes of a file (like file type, size, time,
518permissions, owner etc.), it provides some useful methods to determine its type.
519It does *not* contain the file's data itself.
520
521:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
522:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.
523
524
Benjamin Petersona37cfc62008-05-26 13:48:34 +0000525.. class:: TarInfo(name="")
Georg Brandl116aa622007-08-15 14:28:22 +0000526
527 Create a :class:`TarInfo` object.
528
529
Berker Peksag37de9102015-04-19 04:37:35 +0300530.. classmethod:: TarInfo.frombuf(buf, encoding, errors)
Georg Brandl116aa622007-08-15 14:28:22 +0000531
532 Create and return a :class:`TarInfo` object from string buffer *buf*.
533
Berker Peksag37de9102015-04-19 04:37:35 +0300534 Raises :exc:`HeaderError` if the buffer is invalid.
Georg Brandl116aa622007-08-15 14:28:22 +0000535
536
Berker Peksag37de9102015-04-19 04:37:35 +0300537.. classmethod:: TarInfo.fromtarfile(tarfile)
Georg Brandl116aa622007-08-15 14:28:22 +0000538
539 Read the next member from the :class:`TarFile` object *tarfile* and return it as
540 a :class:`TarInfo` object.
541
Georg Brandl116aa622007-08-15 14:28:22 +0000542
Victor Stinnerde629d42010-05-05 21:43:57 +0000543.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')
Georg Brandl116aa622007-08-15 14:28:22 +0000544
545 Create a string buffer from a :class:`TarInfo` object. For information on the
546 arguments see the constructor of the :class:`TarFile` class.
547
Victor Stinnerde629d42010-05-05 21:43:57 +0000548 .. versionchanged:: 3.2
549 Use ``'surrogateescape'`` as the default for the *errors* argument.
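
   A round trip through :meth:`tobuf` and :meth:`frombuf` can be sketched as
   follows. The name ``notes.txt`` and the size ``42`` are arbitrary, and
   :const:`GNU_FORMAT` is passed explicitly so the example does not depend on
   the default format:

   .. code-block:: python

      import tarfile

      info = tarfile.TarInfo(name="notes.txt")
      info.size = 42

      # Serialize the header and parse it back again.
      block = info.tobuf(format=tarfile.GNU_FORMAT)
      parsed = tarfile.TarInfo.frombuf(block, tarfile.ENCODING, "surrogateescape")

      # A block consisting only of NUL bytes is not a valid header.
      try:
          tarfile.TarInfo.frombuf(b"\0" * 512, tarfile.ENCODING, "surrogateescape")
      except tarfile.HeaderError:
          rejected = True
      else:
          rejected = False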


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.
661
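As a rough illustration of how the query methods relate to ``TarInfo.type``,
the sketch below writes members of several types into an in-memory archive and
classifies them after reading the archive back. The helper names
``make_member`` and ``describe`` and all member names are hypothetical.

```python
import io
import tarfile

def make_member(name, type_, linkname=""):
    """Create an empty member of the given type (illustrative helper)."""
    info = tarfile.TarInfo(name=name)
    info.type = type_
    info.linkname = linkname
    return info

def describe(member):
    """Map a TarInfo object to a short description using the is*() methods."""
    if member.isfile():
        return "regular file"
    if member.isdir():
        return "directory"
    if member.issym():
        return "symbolic link to " + member.linkname
    if member.islnk():
        return "hard link to " + member.linkname
    if member.isdev():
        return "device or FIFO"
    return "something else"

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    tar.addfile(make_member("docs", tarfile.DIRTYPE))
    tar.addfile(make_member("docs/readme.txt", tarfile.REGTYPE))
    tar.addfile(make_member("docs/latest", tarfile.SYMTYPE, "readme.txt"))
    tar.addfile(make_member("docs/copy.txt", tarfile.LNKTYPE, "docs/readme.txt"))
    tar.addfile(make_member("pipe", tarfile.FIFOTYPE))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    kinds = {member.name: describe(member) for member in tar}

for name, kind in kinds.items():
    print(name, "->", kind)
```

Note that ``isdev()`` is also true for FIFOs, so a caller that needs to tell
them apart should test ``isfifo()`` first.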

.. _tarfile-commandline:
.. program:: tarfile

Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

   $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

   $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

   $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

   $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

   $ python -m tarfile -l monty.tar

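The same commands can be driven from a script. The sketch below is only one
possible way to do this: it runs the command-line interface through
:mod:`subprocess` inside a temporary directory, using the file and directory
names from the shell examples above. The helper ``run_cli`` is hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    work = Path(workdir)
    (work / "spam.txt").write_text("spam\n")
    (work / "eggs.txt").write_text("eggs\n")

    def run_cli(*args):
        """Run ``python -m tarfile`` with the given arguments in *work*."""
        return subprocess.run(
            [sys.executable, "-m", "tarfile", *args],
            cwd=work, capture_output=True, text=True, check=True,
        )

    # Create the archive, list its members, then extract it elsewhere.
    run_cli("-c", "monty.tar", "spam.txt", "eggs.txt")
    listing = run_cli("-l", "monty.tar").stdout.split()
    run_cli("-e", "monty.tar", "other-dir")
    extracted = sorted(path.name for path in (work / "other-dir").iterdir())

print(listing)
print(extracted)
```

Using ``sys.executable`` keeps the child process on the same interpreter as the
calling script.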

Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into *output_dir*, or into the current directory if
   *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the description on the same line, separated by a space.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  of up to 256 characters at best and linknames of up to 100 characters. The
  maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

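The difference between the three formats shows up when a name does not fit
into the classic header. As a rough sketch, the code below tries to store a
member whose name is a single 300-character path component in each format,
using the *format* argument of :func:`tarfile.open`. The helper ``try_format``
and the chosen name are hypothetical.

```python
import io
import tarfile

# A single path component that is longer than the ustar header can hold.
long_name = "x" * 300

def try_format(fmt):
    """Try to store *long_name* in an in-memory archive of the given format."""
    buffer = io.BytesIO()
    try:
        with tarfile.open(fileobj=buffer, mode="w", format=fmt) as tar:
            tar.addfile(tarfile.TarInfo(name=long_name))
    except ValueError as exc:
        return "rejected ({})".format(exc)
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r") as tar:
        stored = tar.getnames()[0]
    return "stored" if stored == long_name else "truncated"

results = {
    "ustar": try_format(tarfile.USTAR_FORMAT),
    "gnu": try_format(tarfile.GNU_FORMAT),
    "pax": try_format(tarfile.PAX_FORMAT),
}
for label, outcome in results.items():
    print(label, "->", outcome)
```
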
There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem with the original format (which is the basis of all other formats) is
that it has no concept of different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

For :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
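
The effect of *encoding* can be seen with a small round trip. The sketch below
is an illustration under stated assumptions, not a recommended workflow: it
writes a member with a non-ASCII name using *Latin-1*, then reads the archive
back with a matching and a mismatching *encoding*. The helper ``roundtrip`` and
the member name are hypothetical.

```python
import io
import tarfile

name = "caf\u00e9.txt"  # "café.txt", a name with one non-ASCII character

def roundtrip(fmt, write_encoding, read_encoding):
    """Write *name* with one encoding and read it back with another."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=fmt,
                      encoding=write_encoding) as tar:
        tar.addfile(tarfile.TarInfo(name=name))
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r",
                      encoding=read_encoding) as tar:
        return tar.getnames()[0]

gnu_match = roundtrip(tarfile.GNU_FORMAT, "latin-1", "latin-1")
gnu_mismatch = roundtrip(tarfile.GNU_FORMAT, "latin-1", "utf-8")
pax_mismatch = roundtrip(tarfile.PAX_FORMAT, "latin-1", "utf-8")

# ascii() is used because the mismatching name contains a lone surrogate,
# which cannot be printed on a UTF-8 terminal.
print(ascii(gnu_match))
print(ascii(gnu_mismatch))
print(ascii(pax_mismatch))
```

With the GNU format, the mismatching read does not fail: the default
``'surrogateescape'`` handler maps the undecodable byte to a surrogate
character, so the name appears damaged. The pax archive is unaffected because
the name is stored in a *UTF-8* encoded pax header.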