:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is expected to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.

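As a rough illustration of the *mode* strings accepted by :func:`tarfile.open`,
the sketch below writes a gzip compressed archive to an in-memory buffer, reads
it back with transparent compression detection, and shows that forcing the
uncompressed mode ``'r:'`` on the same data fails. The member name
``hello.txt`` and the use of :class:`io.BytesIO` as *fileobj* are assumptions
made for the example.

```python
import io
import tarfile

# Build a small gzip compressed archive in memory (member name is made up).
payload = b"hello tar"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Re-open it and let tarfile detect the compression on its own.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:*") as tar:
    names = tar.getnames()

# Insisting on "no compression" for gzip data raises ReadError.
buffer.seek(0)
try:
    tarfile.open(fileobj=buffer, mode="r:")
except tarfile.ReadError:
    wrong_mode_failed = True
else:
    wrong_mode_failed = False
```

Using ``'r'`` or ``'r:*'`` when reading avoids having to know the compression
method in advance.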

.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

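A minimal sketch of :func:`is_tarfile` in use, assuming two throwaway files
created in a temporary directory (the file names are invented for the
example):

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "notes.txt")
    archive = os.path.join(tmp, "notes.tar")

    # A plain text file and an archive that contains it.
    with open(plain, "w") as f:
        f.write("just text\n")
    with tarfile.open(archive, "w") as tar:
        tar.add(plain, arcname="notes.txt")

    archive_is_tar = tarfile.is_tarfile(archive)
    plain_is_tar = tarfile.is_tarfile(plain)
```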

The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

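Because every exception above derives from :exc:`TarError`, a single
``except`` clause can cover all of them. The following sketch feeds bytes that
are not a tar archive to :func:`tarfile.open`; the input data is invented for
the example.

```python
import io
import tarfile

garbage = io.BytesIO(b"this is not a tar archive")

try:
    tarfile.open(fileobj=garbage, mode="r")
except tarfile.TarError as exc:
    # ReadError is a subclass of TarError, so it is caught here.
    caught = type(exc).__name__
else:
    caught = None
```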

The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.

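One practical difference between the formats is how long a member name may
be. The sketch below tries to store the same made-up 304-character name
(containing no ``/``) in each format; it relies on the ustar format rejecting
names it cannot fit into its header, while the GNU and pax formats store them
through their extensions.

```python
import io
import tarfile

# Hypothetical member name, far longer than a ustar header field allows.
long_name = "d" * 300 + ".txt"

formats = {
    "ustar": tarfile.USTAR_FORMAT,
    "gnu": tarfile.GNU_FORMAT,
    "pax": tarfile.PAX_FORMAT,
}

results = {}
for label, fmt in formats.items():
    buffer = io.BytesIO()
    try:
        with tarfile.open(fileobj=buffer, mode="w", format=fmt) as tar:
            tar.addfile(tarfile.TarInfo(long_name))
    except ValueError:
        results[label] = "rejected"
    else:
        results[label] = "stored"
```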

.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.

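Keyword arguments given to :func:`tarfile.open` are handed on to the
:class:`TarFile` constructor. The sketch below passes *format* and
*pax_headers* that way and also checks that an external *fileobj* stays open
after the archive is closed. The header text and member name are invented for
the example.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT,
                  pax_headers={"comment": "nightly backup"}) as tar:
    tar.addfile(tarfile.TarInfo("placeholder.txt"))

# The TarFile is closed now, but the caller-supplied buffer is not.
buffer_still_open = not buffer.closed

# Reading the archive back exposes the global header again.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    stored_headers = dict(tar.pax_headers)
```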

.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.

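The lookup rules above can be tried out on a small in-memory archive that
deliberately stores ``a.txt`` twice. The member names and contents are made up
for this sketch.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name, data in [("a.txt", b"old"), ("b.txt", b"bee"), ("a.txt", b"newer")]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()                  # archive order, duplicates included
    latest_size = tar.getmember("a.txt").size   # the last occurrence is returned
    try:
        tar.getmember("missing.txt")
    except KeyError:
        missing_raises = True
    else:
        missing_raises = False
```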

.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there is no more
   available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

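One way to act on the warning above is to inspect member names before
extraction and pass only the acceptable ones as *members*. The helper
``safe_members`` below is hypothetical and not part of :mod:`tarfile`; it is a
sketch that skips links entirely and drops any member whose resolved path
would leave the destination directory. It should not be taken as a complete
defence against hostile archives.

```python
import io
import os
import tarfile
import tempfile

def safe_members(tar, dest):
    """Yield members whose target path stays inside *dest* (hypothetical helper)."""
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        if member.issym() or member.islnk():
            continue  # links need their own checks; this sketch skips them
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) == root:
            yield member

# Demo archive with one harmless and two suspicious (made-up) member names.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ["ok.txt", "../evil.txt", "/abs.txt"]:
        tar.addfile(tarfile.TarInfo(name))

buffer.seek(0)
with tempfile.TemporaryDirectory() as dest, tarfile.open(fileobj=buffer) as tar:
    tar.extractall(path=dest, members=safe_members(tar, dest))
    extracted = sorted(os.listdir(dest))
```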

.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.

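:meth:`extractfile` reads a member without writing anything to disk. The
sketch below uses an in-memory archive with one regular file and one
directory entry; both names are invented for the example.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    data = b"spam and eggs\n"
    info = tarfile.TarInfo("menu.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

    folder = tarfile.TarInfo("docs")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    # Regular files come back as a readable binary file object.
    with tar.extractfile("menu.txt") as member_file:
        text = member_file.read().decode("ascii")
    # A directory has no data, so None is returned.
    directory_result = tar.extractfile("docs")
```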

.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *exclude* is given, it must be a function that takes one
   filename argument and returns a boolean value. Depending on this value the
   respective file is either excluded (:const:`True`) or added
   (:const:`False`). If *filter* is specified it must be a keyword argument. It
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.

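A sketch of the *filter* parameter: the hypothetical function ``reset_owner``
drops ``.log`` files by returning :const:`None` and resets the ownership
fields of everything else. The directory layout and file names are made up
and created in a temporary directory.

```python
import os
import tarfile
import tempfile

def reset_owner(tarinfo):
    """Exclude log files and anonymise ownership (hypothetical filter)."""
    if tarinfo.name.endswith(".log"):
        return None          # returning None leaves the member out
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "project")
    os.mkdir(src)
    for filename in ("main.py", "debug.log"):
        with open(os.path.join(src, filename), "w") as f:
            f.write("example\n")

    archive = os.path.join(tmp, "project.tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(src, arcname="project", filter=reset_owner)

    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
        owners = {member.uname for member in tar.getmembers()}
```

The filter is called for the directory entry as well as for each file below
it.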

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.

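:meth:`addfile` is convenient for data that never existed as a file on disk.
In this sketch the content, the member name and the timestamp are all chosen
for illustration; the important step is setting ``size`` before the call.

```python
import io
import tarfile
import time

data = b"generated on the fly\n"

info = tarfile.TarInfo(name="generated.txt")
info.size = len(data)            # addfile() reads exactly this many bytes
info.mtime = int(time.time())

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    tar.addfile(info, io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    stored_size = tar.getmember("generated.txt").size
    stored_data = tar.extractfile("generated.txt").read()
```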

.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor. If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

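The combination of :meth:`gettarinfo` and :meth:`addfile` can be sketched as
follows. The file name, archive name and the replacement user name are
assumptions made for the example.

```python
import os
import tarfile
import tempfile

content = b"quarterly numbers\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")
    with open(path, "wb") as f:
        f.write(content)

    archive = os.path.join(tmp, "out.tar")
    with tarfile.open(archive, "w") as tar:
        # Build the header from the file on disk, stored under another name.
        info = tar.gettarinfo(path, arcname="reports/q1.txt")
        info.uname = "archiver"      # adjust an attribute before adding
        with open(path, "rb") as f:
            tar.addfile(info, f)

    with tarfile.open(archive) as tar:
        member = tar.getmember("reports/q1.txt")
        result = (member.size, member.uname)
```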

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

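:meth:`~TarInfo.tobuf` and :meth:`~TarInfo.frombuf` can be combined into a
round trip, sketched below with a made-up member name and size. In Python 3
the buffer is a :class:`bytes` object; with :const:`USTAR_FORMAT` it is a
single 512-byte header block.

```python
import tarfile

info = tarfile.TarInfo("notes.txt")
info.size = 42

# Serialise the header, then parse it back.
header = info.tobuf(format=tarfile.USTAR_FORMAT)
parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")

# A block that is not a valid header is rejected.
try:
    tarfile.TarInfo.frombuf(b"x" * 512, tarfile.ENCODING, "surrogateescape")
except tarfile.HeaderError:
    invalid_rejected = True
else:
    invalid_rejected = False
```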

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is a character device, a block device or a FIFO.

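The query methods can be combined into a small classifier. ``describe`` is a
hypothetical helper written for this sketch, and the sample
:class:`TarInfo` objects are built by hand rather than read from an archive.

```python
import tarfile

def describe(member):
    """Return a short label for a TarInfo object (hypothetical helper)."""
    if member.isdir():
        return "directory"
    if member.issym():
        return "symbolic link"
    if member.islnk():
        return "hard link"
    if member.isdev():
        return "device or FIFO"
    if member.isfile():
        return "regular file"
    return "other"

samples = {}
for label, member_type in [("regular", tarfile.REGTYPE),
                           ("folder", tarfile.DIRTYPE),
                           ("symlink", tarfile.SYMTYPE),
                           ("pipe", tarfile.FIFOTYPE)]:
    info = tarfile.TarInfo(label)
    info.type = member_type
    samples[label] = describe(info)
```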
Georg Brandl116aa622007-08-15 14:28:22 +0000646
.. _tarfile-commandline:

Command Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included::

   $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable::

   $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option::

   $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name::

   $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option::

   $ python -m tarfile -l monty.tar


Command line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

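The command line interface can also be driven from a script. The sketch below
is only an illustration: the helper ``run_tarfile_cli`` is a made-up name, the
file names are placeholders, and it assumes Python 3.7 or later for the
``capture_output`` and ``text`` arguments of :func:`subprocess.run`. It relies
on the exit status being zero on success, which is how the options above are
expected to behave.

```python
import os
import subprocess
import sys
import tempfile


def run_tarfile_cli(args, cwd):
    """Run ``python -m tarfile`` with *args* inside directory *cwd*."""
    return subprocess.run(
        [sys.executable, "-m", "tarfile"] + list(args),
        cwd=cwd, capture_output=True, text=True,
    )


with tempfile.TemporaryDirectory() as workdir:
    # A small placeholder file to put into the archive.
    with open(os.path.join(workdir, "spam.txt"), "w") as f:
        f.write("spam\n")

    created = run_tarfile_cli(["-c", "monty.tar", "spam.txt"], workdir)
    listed = run_tarfile_cli(["-l", "monty.tar"], workdir)
    tested = run_tarfile_cli(["-t", "monty.tar"], workdir)
    # spam.txt is plain text, so testing it as an archive should fail.
    rejected = run_tarfile_cli(["-t", "spam.txt"], workdir)
```

After this runs, ``listed.stdout`` holds the member listing, and the
``returncode`` attribute of each result tells whether the command succeeded.
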
.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps a space between "is" and the description that follows.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

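The difference between the three formats shows up with long member names. The
sketch below is an illustration under stated assumptions: ``LONG_NAME`` and
``roundtrip`` are made-up names, and the member name deliberately contains no
slash, so the ustar writer has no place to split it and is expected to refuse
it with :exc:`ValueError`.

```python
import io
import tarfile

# Hypothetical member name: longer than 100 characters and without any "/".
LONG_NAME = "a" * 150 + ".txt"


def roundtrip(fmt):
    """Write one empty member named LONG_NAME using *fmt*, then read its name back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(LONG_NAME))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()[0]


results = {}
for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                   ("gnu", tarfile.GNU_FORMAT),
                   ("pax", tarfile.PAX_FORMAT)]:
    try:
        # True if the name survived writing and reading unchanged.
        results[label] = roundtrip(fmt) == LONG_NAME
    except ValueError as exc:
        # Raised by the writer when the format cannot store the name.
        results[label] = "ValueError: {}".format(exc)
```

With this name, the GNU and pax entries in ``results`` should be ``True``,
while the ustar entry should record the error raised by the writer.
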
There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem with the original format (which is the basis of all other formats) is
that it has no concept of different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

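The effect of *encoding* and *errors* can be seen by reading one archive in
several ways. This is a minimal sketch: the member name and the helper
``first_name`` are made up for illustration, and the archive is written in the
ustar format so that the name is stored only in the plain header.

```python
import io
import tarfile

NAME = "sp\u00e4m.txt"   # hypothetical non-ASCII member name ("a" with diaeresis)

# Write a ustar archive whose header stores the name as Latin-1 bytes.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT,
                  encoding="latin-1") as tar:
    tar.addfile(tarfile.TarInfo(NAME))
raw = buf.getvalue()


def first_name(encoding, errors="surrogateescape"):
    """Return the first member name, decoded with *encoding* and *errors*."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r",
                      encoding=encoding, errors=errors) as tar:
        return tar.getnames()[0]


matching = first_name("latin-1")                  # same encoding as the writer
escaped = first_name("utf-8")                     # byte 0xE4 is not valid UTF-8 here
replaced = first_name("utf-8", errors="replace")  # undecodable byte is replaced
```

With the matching encoding the name is read back unchanged. With the default
``'surrogateescape'`` handler the undecodable byte becomes a lone surrogate
that can be encoded back to the original byte, whereas ``'replace'``
substitutes U+FFFD and loses the original byte.
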
In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
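
As a rough illustration of this behaviour, the sketch below writes a pax
archive with a non-ASCII member name and reads it back with a deliberately
unsuitable *encoding*. The member name is a made-up example; the name is
expected to survive because it travels in a UTF-8 encoded pax header.

```python
import io
import tarfile

NAME = "sp\u00e4m.txt"   # hypothetical non-ASCII member name

# Write the member using the pax format.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    tar.addfile(tarfile.TarInfo(NAME))

# Read it back with an encoding that cannot represent the name.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r", encoding="ascii") as tar:
    member = tar.getmembers()[0]
    name_read = member.name
    pax_path = member.pax_headers.get("path")
```

Here ``name_read`` should equal the original name, and the ``path`` entry in
:attr:`TarInfo.pax_headers` shows where it was stored.
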