:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

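As a rough illustration of the mode strings accepted by :func:`tarfile.open`,
the following sketch writes a gzip-compressed archive, reads it back with
transparent compression detection, reads it again as a forward-only stream,
and finally shows that exclusive creation refuses to overwrite it. The file
names (``hello.txt``, ``demo.tar.gz``) are assumptions made up for the
example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "hello.txt")
    with open(source, "w") as f:
        f.write("hello\n")

    # 'w:gz' creates a gzip-compressed archive, replacing any existing file.
    archive = os.path.join(workdir, "demo.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname="hello.txt")

    # 'r' (the same as 'r:*') detects the compression automatically.
    with tarfile.open(archive, "r") as tar:
        seekable_names = tar.getnames()

    # 'r|*' treats the input as a forward-only stream of blocks.
    with open(archive, "rb") as raw:
        with tarfile.open(fileobj=raw, mode="r|*") as tar:
            stream_names = [member.name for member in tar]

    # 'x:gz' refuses to overwrite an archive that already exists.
    try:
        tarfile.open(archive, "x:gz")
    except FileExistsError:
        exclusive_failed = True
    else:
        exclusive_failed = False

print(seekable_names, stream_names, exclusive_failed)
```

In the stream variant the members can only be visited once and in order, which
is why the example collects the names in a single pass.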

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

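A minimal sketch of :func:`is_tarfile` used as a cheap pre-check before
opening a file. The two file names are assumptions chosen for the example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    text_file = os.path.join(workdir, "notes.txt")
    with open(text_file, "w") as f:
        f.write("just some text\n")

    # Build a small, valid archive that contains the text file.
    archive = os.path.join(workdir, "notes.tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(text_file, arcname="notes.txt")

    archive_ok = tarfile.is_tarfile(archive)
    text_ok = tarfile.is_tarfile(text_file)

print(archive_ok, text_ok)
```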

The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

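Because every exception above derives from :exc:`TarError`, a caller can
handle the specific case first and fall back to the base class. The sketch
below assumes a hypothetical helper named ``describe`` and a made-up file
``bogus.tar`` that does not contain tar data.

```python
import os
import tarfile
import tempfile

def describe(path):
    """Return a short status string for *path* (hypothetical helper)."""
    try:
        with tarfile.open(path, "r") as tar:
            return "ok: %d member(s)" % len(tar.getmembers())
    except tarfile.ReadError as exc:
        # The file is not a tar archive or is damaged beyond recognition.
        return "not a readable tar archive: %s" % exc
    except tarfile.TarError as exc:
        # Any other tarfile-specific problem.
        return "tar error: %s" % exc

with tempfile.TemporaryDirectory() as workdir:
    bogus = os.path.join(workdir, "bogus.tar")
    with open(bogus, "wb") as f:
        f.write(b"this is not a tar archive")
    status = describe(bogus)

print(status)
```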

The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.

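A format constant can be passed explicitly instead of relying on
:const:`DEFAULT_FORMAT`. The sketch below requests the pax format and stores a
global header through the *pax_headers* keyword of the :class:`TarFile`
constructor; the header value and the member name are assumptions made up for
the example.

```python
import io
import tarfile

buffer = io.BytesIO()

# Request the pax format explicitly and attach one global header.
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT,
                  pax_headers={"comment": "demo archive"}) as tar:
    payload = b"hello\n"
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# The file object is expected to be at position 0 when reading.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    global_headers = dict(tar.pax_headers)

print(global_headers)
```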

.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.

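The three lookup methods above can be compared on a small in-memory archive.
In this sketch the member names and payloads are assumptions made up for the
example; ``config.txt`` is deliberately stored twice to show which copy
:meth:`getmember` returns.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    # Store "config.txt" twice; the second copy supersedes the first.
    for name, payload in [("config.txt", b"old"), ("data.bin", b"1234"),
                          ("config.txt", b"newer")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()                            # archive order
    sizes = [member.size for member in tar.getmembers()]
    latest_size = tar.getmember("config.txt").size    # last occurrence
    try:
        tar.getmember("missing.txt")
    except KeyError:
        missing = True
    else:
        missing = False

print(names, sizes, latest_size, missing)
```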

.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.

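One possible way to do the "prior inspection" that the warning for
:meth:`extractall` asks for is to pass a pre-filtered *members* iterable. The
sketch below assumes a hypothetical helper named ``safe_members`` and follows
a deliberately strict policy: only regular files and directories whose
resolved path stays inside the destination are kept. It illustrates the idea
and is not a complete defence.

```python
import io
import os
import tarfile
import tempfile

def safe_members(tar, dest):
    """Yield regular files and directories that stay inside *dest*.

    Hypothetical helper: links and device nodes are skipped entirely.
    """
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        try:
            inside = os.path.commonpath([root, target]) == root
        except ValueError:          # e.g. paths on different drives
            inside = False
        if inside and (member.isfile() or member.isdir()):
            yield member

# Build an archive with one harmless and one malicious member name.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ("docs/readme.txt", "../escape.txt"):
        payload = b"data"
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

buffer.seek(0)
with tempfile.TemporaryDirectory() as dest:
    with tarfile.open(fileobj=buffer, mode="r") as tar:
        tar.extractall(path=dest, members=safe_members(tar, dest))
    extracted = sorted(
        os.path.relpath(os.path.join(folder, name), dest).replace(os.sep, "/")
        for folder, _, files in os.walk(dest) for name in files
    )

print(extracted)
```

Only ``docs/readme.txt`` is written; the member whose name starts with ``..``
is never handed to :meth:`extractall`.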

.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.

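:meth:`extractfile` allows reading a member without writing anything to disk.
The following sketch uses an in-memory archive with made-up member names
(``notes.txt`` and a directory ``logs``) and shows both possible return
values.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    payload = b"first line\nsecond line\n"
    info = tarfile.TarInfo("notes.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

    folder = tarfile.TarInfo("logs")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    # The reader must be used while the archive is still open.
    reader = tar.extractfile("notes.txt")
    first_line = reader.readline()
    rest = reader.read()
    directory_reader = tar.extractfile("logs")   # not a file: None

print(first_line, rest, directory_reader)
```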

.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *filter* is given, it
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

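A small sketch of the *filter* argument of :meth:`add`: the hypothetical
function ``anonymize`` drops editor backup files by returning ``None`` and
resets the ownership of everything else. The directory layout is an assumption
made up for the example.

```python
import os
import tarfile
import tempfile

def anonymize(tarinfo):
    """Drop editor backups and reset ownership (hypothetical filter)."""
    if tarinfo.name.endswith("~"):
        return None                      # exclude this member
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo

with tempfile.TemporaryDirectory() as workdir:
    project = os.path.join(workdir, "project")
    os.mkdir(project)
    for filename in ("main.py", "main.py~"):
        with open(os.path.join(project, filename), "w") as f:
            f.write("print('hi')\n")

    archive = os.path.join(workdir, "project.tar")
    with tarfile.open(archive, "w") as tar:
        # The directory is added recursively; every member passes the filter.
        tar.add(project, arcname="project", filter=anonymize)

    with tarfile.open(archive, "r") as tar:
        names = tar.getnames()
        owners = {member.uname for member in tar.getmembers()}

print(names, owners)
```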

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.

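:meth:`addfile` is handy for data that never existed as a file on disk. The
sketch below builds a :class:`TarInfo` object by hand and stores a bytes
payload; the member name and contents are assumptions made up for the example.

```python
import io
import tarfile
import time

payload = "generated on the fly\n".encode("utf-8")

info = tarfile.TarInfo(name="reports/summary.txt")
info.size = len(payload)          # addfile() reads exactly this many bytes
info.mtime = int(time.time())
info.mode = 0o644

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    tar.addfile(info, fileobj=io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    stored = tar.extractfile("reports/summary.txt").read()

print(stored == payload)
```

Setting ``info.size`` matters here: a freshly created :class:`TarInfo` has a
size of zero, so no data would be copied without it.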

.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor.
   *name* may be a :term:`path-like object`. If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.

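The combination of :meth:`gettarinfo` and :meth:`addfile` described above can
be sketched as follows. The file names and the new permission bits are
assumptions made up for the example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "draft.txt")
    with open(source, "wb") as f:
        f.write(b"final text\n")

    archive = os.path.join(workdir, "out.tar")
    with tarfile.open(archive, "w") as tar:
        # Size, mode, mtime and ownership are taken from the existing file.
        info = tar.gettarinfo(source, arcname="draft.txt")
        stat_size = info.size

        # Adjust attributes before the header is written.
        info.name = "published/final.txt"
        info.mode = 0o444

        with open(source, "rb") as f:
            tar.addfile(info, f)

    with tarfile.open(archive, "r") as tar:
        member = tar.getmember("published/final.txt")
        result = (member.size, oct(member.mode))

print(stat_size, result)
```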

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

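:meth:`tobuf` and :meth:`frombuf` are inverse operations on a raw header
block, which in practice is a :class:`bytes` object. The sketch below assumes
a short ASCII member name, so that the header fits in a single 512-byte block,
and selects :const:`GNU_FORMAT` explicitly.

```python
import tarfile

info = tarfile.TarInfo("hello.txt")
info.size = 5

# Serialize the header; a short ASCII name fits in one 512-byte block.
header = info.tobuf(format=tarfile.GNU_FORMAT)

# Parse the block back into a TarInfo object.
parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")

# A block that is not a valid header is rejected with HeaderError.
try:
    tarfile.TarInfo.frombuf(b"x" * 512, tarfile.ENCODING, "surrogateescape")
except tarfile.HeaderError:
    rejected = True
else:
    rejected = False

print(len(header), parsed.name, parsed.size, rejected)
```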

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.

Georg Brandl116aa622007-08-15 14:28:22 +0000658
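The attributes and query methods above can be combined to classify archive
members. The following is a minimal sketch, not part of the module: the helper
``describe()`` and the member names are made up for illustration, and the
archive is built in memory so that no files on disk are needed.

```python
import io
import tarfile


def describe(tarinfo):
    """Hypothetical helper: label a TarInfo using its is*() query methods."""
    if tarinfo.isreg():
        return "regular file"
    if tarinfo.isdir():
        return "directory"
    if tarinfo.issym():
        return "symbolic link -> " + tarinfo.linkname
    if tarinfo.islnk():
        return "hard link -> " + tarinfo.linkname
    if tarinfo.isdev():
        return "device or FIFO"
    return "something else"


# Build a tiny archive in memory (illustrative member names).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"spam and eggs\n"
    info = tarfile.TarInfo("docs/readme.txt")
    info.size = len(data)              # size must match the data written
    tar.addfile(info, io.BytesIO(data))

    folder = tarfile.TarInfo("docs")
    folder.type = tarfile.DIRTYPE
    folder.mode = 0o755
    tar.addfile(folder)

    link = tarfile.TarInfo("latest")
    link.type = tarfile.SYMTYPE
    link.linkname = "docs/readme.txt"  # only meaningful for link types
    tar.addfile(link)

# Read the archive back and inspect each member.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    summary = {member.name: describe(member) for member in tar}

for name, kind in summary.items():
    print(name, "is a", kind)
```

Checking ``issym()`` and ``islnk()`` before reading ``linkname`` mirrors the
description above: the attribute is only present for the two link types.
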
.. _tarfile-commandline:
.. program:: tarfile

Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


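The same operations are available from Python code. The sketch below shows API
calls that roughly correspond to the ``-c``, ``-l`` and ``-e`` options; it is an
illustration under that assumption, not the implementation of the command-line
interface. The file names are borrowed from the commands above and are created
in a temporary directory.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Create two small input files so the sketch is self-contained.
    for name in ("spam.txt", "eggs.txt"):
        with open(os.path.join(workdir, name), "w") as f:
            f.write(name + "\n")

    archive = os.path.join(workdir, "monty.tar")

    # Roughly: python -m tarfile -c monty.tar spam.txt eggs.txt
    with tarfile.open(archive, "w") as tar:
        for name in ("spam.txt", "eggs.txt"):
            # arcname keeps the temporary directory out of the member names.
            tar.add(os.path.join(workdir, name), arcname=name)

    # Roughly: python -m tarfile -l monty.tar
    with tarfile.open(archive, "r") as tar:
        listed = tar.getnames()
    print(listed)

    # Roughly: python -m tarfile -e monty.tar other-dir/
    # Only extract archives you trust; this one was created just above.
    outdir = os.path.join(workdir, "other-dir")
    os.makedirs(outdir)
    with tarfile.open(archive, "r") as tar:
        tar.extractall(path=outdir)
    extracted = sorted(os.listdir(outdir))
    print(extracted)
```
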
Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the sentence on one line with a space before the type.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, whereas
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

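The filename limits of the three writable formats can be observed by passing the
*format* argument to :func:`tarfile.open`. The following is a small sketch under
stated assumptions: ``long_name`` and the helper ``try_format()`` are invented
for illustration, and the archives are written to memory. The name is chosen so
that it cannot be split to fit a ustar header, in which case :mod:`tarfile`
raises :exc:`ValueError`.

```python
import io
import tarfile

# Illustrative member name: four 90-character components, 363 characters total.
long_name = "/".join(["d" * 90] * 4)


def try_format(fmt):
    """Hypothetical helper: store an empty member named long_name.

    Returns the name read back from the archive, or None if the format
    cannot represent the name.
    """
    buf = io.BytesIO()
    try:
        with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
            tar.addfile(tarfile.TarInfo(long_name))
    except ValueError:
        # Raised when the name does not fit the format's header fields.
        return None
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()[0]


results = {
    "ustar": try_format(tarfile.USTAR_FORMAT),
    "gnu": try_format(tarfile.GNU_FORMAT),
    "pax": try_format(tarfile.PAX_FORMAT),
}
for label, name in results.items():
    print(label, "stored the long name" if name else "rejected the long name")
```
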
.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem with the original format (which is the basis of all other formats) is
that it has no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

In the case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
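
The effect of a mismatched *encoding* can be reproduced in memory. This is a
minimal sketch, assuming an illustrative member name and a made-up helper
``roundtrip()``: a GNU-format archive written with UTF-8 metadata is read back
with a Latin-1 *encoding*, and the same experiment is repeated with the pax
format.

```python
import io
import tarfile

name = "caf\u00e9.txt"  # illustrative non-ASCII member name


def roundtrip(fmt, write_encoding, read_encoding):
    """Hypothetical helper: write one empty member, then read its name back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt,
                      encoding=write_encoding) as tar:
        tar.addfile(tarfile.TarInfo(name))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r", encoding=read_encoding) as tar:
        return tar.getnames()[0]


# Matching encodings: the name survives.
gnu_same = roundtrip(tarfile.GNU_FORMAT, "utf-8", "utf-8")

# Mismatched encodings: the UTF-8 bytes are decoded as Latin-1 and the
# name appears damaged.
gnu_mismatch = roundtrip(tarfile.GNU_FORMAT, "utf-8", "iso-8859-1")

# The pax format stores the name as UTF-8 in an extended header, so the
# reader's *encoding* does not matter here.
pax_mismatch = roundtrip(tarfile.PAX_FORMAT, "utf-8", "iso-8859-1")

print(repr(gnu_same), repr(gnu_mismatch), repr(pax_mismatch))
```
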