:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Appending to a compressed archive (``'a:gz'``, ``'a:bz2'`` or ``'a:xz'``) is
   not possible. If *mode* is not suitable to open a certain (compressed) file
   for reading, :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is expected to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes are:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

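The mode strings accepted by :func:`tarfile.open` can be tried out without
touching the file system. The following is a minimal sketch, assuming an
in-memory :class:`io.BytesIO` buffer stands in for a real file; the member name
``greeting.txt`` and its contents are made up for illustration.

```python
import io
import tarfile

payload = b"hello from tarfile\n"

# Write a gzip-compressed archive into an in-memory buffer.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz", compresslevel=6) as tar:
    info = tarfile.TarInfo(name="greeting.txt")  # hypothetical member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# 'r' is the same as 'r:*': the compression is detected transparently.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()

# 'r|*' reads the same data as a forward-only stream of blocks.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r|*") as tar:
    streamed = [(member.name, member.size) for member in tar]
```

The buffer is rewound with ``seek(0)`` before each read because *fileobj* is
expected to be at position 0.
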
.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


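As an illustration of :func:`is_tarfile`, the sketch below checks one real
archive and one plain text file. The file names and the temporary directory are
assumptions made for the example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical file names used only for this demonstration.
    archive_path = os.path.join(workdir, "sample.tar")
    text_path = os.path.join(workdir, "notes.txt")

    with open(text_path, "w") as f:
        f.write("just text, not an archive\n")
    with tarfile.open(archive_path, "w") as tar:
        tar.add(text_path, arcname="notes.txt")

    archive_ok = tarfile.is_tarfile(archive_path)  # a readable tar archive
    text_ok = tarfile.is_tarfile(text_path)        # not a tar archive
```
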
The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


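Because every exception above derives from :exc:`TarError`, callers can catch
a specific failure first and fall back to the base class. A minimal sketch,
assuming some arbitrary bytes that are not a tar archive:

```python
import io
import tarfile

# Arbitrary bytes that do not form a tar archive (made up for the example).
not_an_archive = io.BytesIO(b"this is plain text, not a tar archive" * 20)

try:
    tarfile.open(fileobj=not_an_archive, mode="r")
except tarfile.ReadError as exc:
    # Raised because no supported format could open the data.
    outcome = "unreadable: {}".format(exc)
except tarfile.TarError as exc:
    # Any other tarfile-specific failure.
    outcome = "other tarfile problem: {}".format(exc)
else:
    outcome = "opened"
```
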
The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object; see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

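The *format* and *pax_headers* arguments can be exercised together. The sketch
below writes a pax archive with one global header and reads it back through the
:attr:`pax_headers` attribute. The ``comment`` key, its value and the member
name are illustrative assumptions, and the archive lives in an in-memory buffer.

```python
import io
import tarfile

buffer = io.BytesIO()
global_headers = {"comment": "nightly build"}  # hypothetical key and value

# The keyword arguments are forwarded to the TarFile constructor.
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT,
                  pax_headers=global_headers) as tar:
    tar.addfile(tarfile.TarInfo("placeholder"))  # an empty regular file

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    recovered = dict(tar.pax_headers)
```
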
.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


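The three lookup methods above can be compared on a small archive that stores
the same name twice. This is a sketch only; the member name ``config.ini`` and
its contents are invented for the example.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    # Store "config.ini" twice to show which copy getmember() returns.
    for content in (b"version=1\n", b"version=2, newer\n"):
        info = tarfile.TarInfo("config.ini")  # hypothetical member name
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()                        # archive order, duplicates kept
    sizes = [m.size for m in tar.getmembers()]    # same order as getnames()
    latest_size = tar.getmember("config.ini").size  # last occurrence wins

    missing = False
    try:
        tar.getmember("missing.txt")
    except KeyError:
        missing = True
```
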
.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.


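One possible form of the "prior inspection" mentioned in the warning is to pass
:meth:`TarFile.extractall` only those members whose resolved target stays below
the destination directory. The helper ``safe_members`` below is a hypothetical
name, not part of the module, and the sketch checks member names only: it does
not examine link targets or device files, so it should not be read as a
complete defence.

```python
import io
import os
import tarfile
import tempfile

def safe_members(tar, dest):
    """Yield only members whose resolved path stays inside *dest*."""
    root = os.path.realpath(dest)
    for member in tar:
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) == root:
            yield member

# Build a demonstration archive with one harmless and one suspicious name.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ("docs/readme.txt", "../escape.txt"):
        data = b"example\n"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buffer.seek(0)
with tempfile.TemporaryDirectory() as dest:
    with tarfile.open(fileobj=buffer, mode="r") as tar:
        tar.extractall(path=dest, members=safe_members(tar, dest))
    # Collect what was actually written, relative to the destination.
    extracted = sorted(
        os.path.relpath(os.path.join(folder, name), dest)
        for folder, _, files in os.walk(dest)
        for name in files
    )
```
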
.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.


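The sketch below reads a member's contents with :meth:`TarFile.extractfile`
without writing anything to disk, and shows the :const:`None` result for a
directory member. The member names and contents are assumptions made for the
example.

```python
import io
import tarfile

text = b"line one\nline two\n"

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    folder = tarfile.TarInfo("logs")  # hypothetical directory member
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

    entry = tarfile.TarInfo("logs/app.log")  # hypothetical regular file
    entry.size = len(text)
    tar.addfile(entry, io.BytesIO(text))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    reader = tar.extractfile("logs/app.log")   # file-like object
    lines = reader.read().decode("utf-8").splitlines()
    directory_reader = tar.extractfile("logs")  # None: not a file or link
```
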
.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *exclude* is given, it must be a function that takes one
   filename argument and returns a boolean value. Depending on this value the
   respective file is either excluded (:const:`True`) or added
   (:const:`False`). If *filter* is specified, it must be passed as a keyword
   argument. It should be a function that takes a :class:`TarInfo` object
   argument and returns the changed :class:`TarInfo` object. If it instead
   returns :const:`None`, the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.


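A *filter* function can both modify and drop members. The sketch below assumes
a hypothetical ``anonymize`` helper and a made-up project layout: it skips
``.pyc`` files and resets the ownership fields of everything else.

```python
import io
import os
import tarfile
import tempfile

def anonymize(tarinfo):
    """Drop compiled files and reset ownership on everything else."""
    if tarinfo.name.endswith(".pyc"):
        return None  # returning None excludes the member
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo

with tempfile.TemporaryDirectory() as workdir:
    project = os.path.join(workdir, "project")  # hypothetical layout
    os.mkdir(project)
    for filename in ("main.py", "main.pyc"):
        with open(os.path.join(project, filename), "w") as f:
            f.write("# placeholder\n")

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        # The directory is added recursively; each member passes the filter.
        tar.add(project, arcname="project", filter=anonymize)

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    stored = sorted(tar.getnames())
    owners = {member.uname for member in tar.getmembers()}
```
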
.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid problems with the file size.


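:meth:`TarFile.addfile` also works for data that never existed as a file. In
the following sketch the :class:`TarInfo` object is built by hand rather than
with :meth:`gettarinfo`; the member name, contents and permission bits are
illustrative assumptions.

```python
import io
import tarfile
import time

data = "generated at runtime\n".encode("utf-8")

info = tarfile.TarInfo(name="report.txt")  # hypothetical member name
info.size = len(data)          # addfile() reads this many bytes from fileobj
info.mtime = int(time.time())  # set a modification time explicitly
info.mode = 0o644

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    tar.addfile(info, fileobj=io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    member = tar.getmember("report.txt")
    roundtrip = tar.extractfile(member).read()
```
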
.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the :term:`file
   object` *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


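:meth:`TarInfo.tobuf` and :meth:`TarInfo.frombuf` are inverse operations on a
single header. The sketch below round-trips one header and then feeds
:meth:`frombuf` a block of arbitrary bytes to trigger :exc:`HeaderError`. The
member name and size are made up, and :const:`GNU_FORMAT` is chosen explicitly
so the result does not depend on :const:`DEFAULT_FORMAT`.

```python
import tarfile

info = tarfile.TarInfo("data.bin")  # hypothetical member name
info.size = 1024

# Serialise the header into a buffer.
header = info.tobuf(format=tarfile.GNU_FORMAT)

# Parse it back; encoding and errors mirror the tobuf() defaults.
parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")

# A block that is not a valid header is rejected with HeaderError.
try:
    tarfile.TarInfo.frombuf(b"x" * 512, tarfile.ENCODING, "surrogateescape")
except tarfile.HeaderError:
    rejected = True
else:
    rejected = False
```
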
A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


Serhiy Storchakad27b4552013-11-24 01:53:29 +0200640.. _tarfile-commandline:
641
642Command Line Interface
643----------------------
644
645.. versionadded:: 3.4
646
647The :mod:`tarfile` module provides a simple command line interface to interact
648with tar archives.
649
650If you want to create a new tar archive, specify its name after the :option:`-c`
651option and then list the filename(s) that should be included::
652
653 $ python -m tarfile -c monty.tar spam.txt eggs.txt
654
655Passing a directory is also acceptable::
656
657 $ python -m tarfile -c monty.tar life-of-brian_1979/
658
659If you want to extract a tar archive into the current directory, use
660the :option:`-e` option::
661
662 $ python -m tarfile -e monty.tar
663
664You can also extract a tar archive into a different directory by passing the
665directory's name::
666
667 $ python -m tarfile -e monty.tar other-dir/
668
669For a list of the files in a tar archive, use the :option:`-l` option::
670
671 $ python -m tarfile -l monty.tar


Command line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.
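
A validity check similar in spirit to ``-t`` can be sketched in Python with
:func:`tarfile.is_tarfile` followed by reading every member header. The helper
name ``looks_valid`` is hypothetical, and the check is an assumption about
what "valid" should mean, not a statement of what the command line interface
does internally.

```python
import os
import tarfile
import tempfile


def looks_valid(path):
    """Return True if path opens as a tar archive and all headers can be read."""
    if not tarfile.is_tarfile(path):
        return False
    try:
        with tarfile.open(path, "r:*") as tar:
            tar.getmembers()  # forces every header to be parsed
    except (tarfile.TarError, OSError):
        return False
    return True


with tempfile.TemporaryDirectory() as workdir:
    good = os.path.join(workdir, "good.tar")
    bad = os.path.join(workdir, "bad.tar")

    # A real archive with one empty member.
    with tarfile.open(good, "w") as tar:
        tar.addfile(tarfile.TarInfo("empty.txt"))

    # A file that merely has a .tar suffix.
    with open(bad, "wb") as f:
        f.write(b"this is not a tar archive")

    good_ok = looks_valid(good)
    bad_ok = looks_valid(bad)
```

Note that this only inspects headers; it does not verify the contents of the
archived files.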

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       # Yield only the members whose name ends in ".py".
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps both parts of the sentence on one line.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       # Replace the owner information with root before the member is added.
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  of at best 256 characters and linknames of up to 100 characters. The
  maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.
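
The filename limits of the three writable formats can be illustrated with a
small in-memory experiment. This is a sketch under stated assumptions: the
helper ``roundtrip_name`` and the 300-character name are hypothetical, and the
name deliberately contains no ``/`` so that the ustar format cannot store it.

```python
import io
import tarfile


def roundtrip_name(name, fmt):
    """Write one empty member called name using fmt and read its name back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(name))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()[0]


# Longer than any ustar header can hold.
long_name = "x" * 300

results = {}
for label, fmt in [
    ("ustar", tarfile.USTAR_FORMAT),
    ("gnu", tarfile.GNU_FORMAT),
    ("pax", tarfile.PAX_FORMAT),
]:
    try:
        results[label] = roundtrip_name(long_name, fmt) == long_name
    except ValueError:
        # Raised when the name does not fit into the chosen format.
        results[label] = False
```

With these inputs only the GNU and pax formats preserve the name; writing it
in the ustar format raises :exc:`ValueError`.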

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

For :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
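
The effect of *encoding* can be sketched with in-memory archives. The helpers
``write_name`` and ``read_name`` and the sample filename are hypothetical and
used only for illustration. The *errors* argument is left at its default,
which this section gives as ``'surrogateescape'``.

```python
import io
import tarfile


def write_name(name, fmt, encoding):
    """Return the bytes of an archive holding one empty member called name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding=encoding) as tar:
        tar.addfile(tarfile.TarInfo(name))
    return buf.getvalue()


def read_name(data, encoding):
    """Return the first member name, decoded using encoding."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r", encoding=encoding) as tar:
        return tar.getnames()[0]


name = "caf\u00e9.txt"  # contains one non-ASCII character

gnu_data = write_name(name, tarfile.GNU_FORMAT, "latin-1")
pax_data = write_name(name, tarfile.PAX_FORMAT, "latin-1")

# GNU format: the reader has to know the encoding used by the writer.
gnu_matching = read_name(gnu_data, "latin-1")
gnu_mismatched = read_name(gnu_data, "utf-8")

# pax format: the name is stored as UTF-8 in an extended header,
# so the reader's encoding does not matter here.
pax_any = read_name(pax_data, "ascii")
```

With a mismatched encoding the GNU archive still opens, but the undecodable
byte comes back as a lone surrogate, so the name no longer equals the
original.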