:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

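As a rough illustration of the ``'filemode[:compression]'`` modes listed for
:func:`tarfile.open`, the following sketch creates a gzip-compressed archive
with ``'x:gz'``, reads it back with plain ``'r'``, and shows what happens when
the mode does not fit the file. The file names and the temporary directory are
made up for the example, and it assumes the :mod:`gzip` module is available.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical input file used only for this example.
    src = os.path.join(tmp, "spam.txt")
    with open(src, "w") as f:
        f.write("spam and eggs\n")

    archive = os.path.join(tmp, "monty.tar.gz")

    # 'x:gz' creates a gzip-compressed archive, but only if it does not exist.
    with tarfile.open(archive, "x:gz") as tar:
        tar.add(src, arcname="spam.txt")

    # Creating the same archive exclusively a second time fails.
    try:
        tarfile.open(archive, "x:gz")
        created_twice = True
    except FileExistsError:
        created_twice = False

    # Plain 'r' detects the compression transparently.
    with tarfile.open(archive, "r") as tar:
        names = tar.getnames()

    # 'r:' insists on an uncompressed archive, so the gzip file is rejected.
    try:
        tarfile.open(archive, "r:")
        opened_uncompressed = True
    except tarfile.ReadError:
        opened_uncompressed = False

print(names, created_twice, opened_uncompressed)
```

Using ``'r'`` for reading keeps the calling code independent of how the
archive was compressed, which is why the table marks it as recommended.
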
.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


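A minimal sketch of :func:`is_tarfile` in use, assuming two throwaway files
created in a temporary directory (the names ``notes.txt`` and ``notes.tar`` are
invented for the example):

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A plain text file that is not a tar archive.
    text_path = os.path.join(tmp, "notes.txt")
    with open(text_path, "w") as f:
        f.write("just some text\n")

    # An uncompressed tar archive containing that file.
    tar_path = os.path.join(tmp, "notes.tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(text_path, arcname="notes.txt")

    text_is_tar = tarfile.is_tarfile(text_path)
    tar_is_tar = tarfile.is_tarfile(tar_path)

print(text_is_tar, tar_is_tar)
```
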
The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


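Because every exception above derives from :exc:`TarError`, callers can catch
the base class and still find out which specific error occurred. The sketch
below feeds deliberately invalid bytes to :func:`tarfile.open`; the in-memory
buffer stands in for a damaged file and is purely illustrative.

```python
import io
import tarfile

# Bytes that are neither a tar archive nor a known compressed stream.
garbage = io.BytesIO(b"this is not a tar archive" * 100)

try:
    tarfile.open(fileobj=garbage, mode="r")
except tarfile.TarError as exc:
    # Catching the base class still exposes the concrete exception type.
    caught = type(exc).__name__
else:
    caught = None

print(caught)
```
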
The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


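The format constants are passed through the *format* keyword argument when an
archive is opened for writing. The following sketch writes the same single
member in each of the three formats and reads it back; the helper functions
``build`` and ``names`` are hypothetical and exist only for this example.

```python
import io
import tarfile


def build(fmt):
    """Write a one-member archive in the given format and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        payload = b"hello\n"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def names(data):
    """Return the member names stored in an in-memory archive."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        return tar.getnames()


archives = {
    "ustar": build(tarfile.USTAR_FORMAT),
    "gnu": build(tarfile.GNU_FORMAT),
    "pax": build(tarfile.PAX_FORMAT),
}

for label, data in archives.items():
    print(label, names(data))
```

Reading does not require the format to be specified; it only matters when an
archive is created.
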
.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

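A short sketch of the context-manager usage described above, writing to an
in-memory buffer so that nothing touches the file system. The member name
``readme.txt`` is an assumption made for the example.

```python
import io
import tarfile

buf = io.BytesIO()

# The archive is finalized and closed when the block exits normally.
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"context managers close the archive\n"
    info = tarfile.TarInfo("readme.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# The buffer itself stays open, so the finished archive can be read back.
with tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r") as tar:
    stored = tar.getnames()

print(stored)
```
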
.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


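The three lookup methods above can be compared on a small in-memory archive
that stores one name twice. This is only a sketch; ``add_bytes`` is a
hypothetical helper and the member names are invented.

```python
import io
import tarfile


def add_bytes(tar, name, payload):
    """Hypothetical helper: store *payload* under *name* in an open archive."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "a.txt", b"first")
    add_bytes(tar, "b.txt", b"other")
    add_bytes(tar, "a.txt", b"second version")  # same name stored again

buf.seek(0)  # the file object must be at position 0 for reading
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()                       # archive order, duplicates kept
    sizes = [m.size for m in tar.getmembers()]   # same order as getnames()
    latest = tar.getmember("a.txt")              # last occurrence wins
    try:
        tar.getmember("missing.txt")
        missing_found = True
    except KeyError:
        missing_found = False

print(names, sizes, latest.size, missing_found)
```
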
.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


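Since :meth:`TarFile.list` prints rather than returns its result, one way to
use the listing programmatically is to redirect ``sys.stdout`` temporarily.
The sketch below does that with :func:`contextlib.redirect_stdout`; the archive
contents are invented, and the exact spacing of the printed lines is not relied
upon.

```python
import contextlib
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, payload in (("a.txt", b"alpha"), ("b.txt", b"beta")):
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

buf.seek(0)
captured = io.StringIO()
with tarfile.open(fileobj=buf, mode="r") as tar:
    with contextlib.redirect_stdout(captured):
        # Names only, for every member.
        tar.list(verbose=False)
        # Names only, restricted to a subset of getmembers().
        tar.list(verbose=False, members=tar.getmembers()[:1])

printed_names = captured.getvalue().split()
print(printed_names)
```
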
.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


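A minimal sketch of walking an archive member by member with
:meth:`TarFile.next`, stopping at the :const:`None` sentinel. The archive is
built in memory with made-up names.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("a.txt", "b.txt"):
        tar.addfile(tarfile.TarInfo(name))  # empty members are enough here

buf.seek(0)
seen = []
with tarfile.open(fileobj=buf, mode="r") as tar:
    member = tar.next()
    while member is not None:  # None signals the end of the archive
        seen.append(member.name)
        member = tar.next()

print(seen)
```
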
.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.


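One possible form of the "prior inspection" that the warning asks for is to
pass :meth:`extractall` only those members whose target path stays inside the
destination directory. The sketch below is an illustration under stated
assumptions, not a complete defence: ``is_within``, ``safe_members`` and
``add_bytes`` are hypothetical helpers, links are simply skipped, and other
risks (device files, permissions) are not considered.

```python
import io
import os
import tarfile
import tempfile


def is_within(directory, target):
    """Return True if *target* resolves to a location inside *directory*."""
    directory = os.path.realpath(directory)
    target = os.path.realpath(target)
    try:
        return os.path.commonpath([directory, target]) == directory
    except ValueError:  # e.g. paths on different Windows drives
        return False


def safe_members(tar, dest):
    """Hypothetical helper: keep only members that stay inside *dest*."""
    kept = []
    for member in tar.getmembers():
        if member.issym() or member.islnk():
            continue  # this sketch skips links instead of checking targets
        # Absolute names and names containing ".." resolve outside *dest*.
        if is_within(dest, os.path.join(dest, member.name)):
            kept.append(member)
    return kept


def add_bytes(tar, name, payload):
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


# Build a small archive in memory, including one member that tries to escape.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "ok.txt", b"fine")
    add_bytes(tar, "sub/ok2.txt", b"also fine")
    add_bytes(tar, "../evil.txt", b"escapes the target directory")

with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, "out")
    os.mkdir(dest)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        members = safe_members(tar, dest)
        tar.extractall(path=dest, members=members)
    kept_names = [m.name for m in members]
    extracted = sorted(os.listdir(dest))
    escaped = os.path.exists(os.path.join(tmp, "evil.txt"))

print(kept_names, extracted, escaped)
```
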
.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.


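:meth:`TarFile.extractfile` makes it possible to read a member's contents
without writing anything to disk. The following sketch reads one regular file
and shows the :const:`None` result for a directory member; the archive layout
is invented for the example.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    # A directory member carries no data.
    folder = tarfile.TarInfo("docs")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

    payload = b"hello from the archive\n"
    info = tarfile.TarInfo("docs/greeting.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    reader = tar.extractfile("docs/greeting.txt")
    text = reader.read().decode("utf-8")
    dir_reader = tar.extractfile("docs")  # not a regular file or link

print(repr(text), dir_reader)
```
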
.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *filter* is given, it
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.


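A sketch of a *filter* function for :meth:`TarFile.add` that both modifies and
excludes members. The function name ``anonymize``, the directory layout and the
choice to drop ``.log`` files are assumptions made for the example.

```python
import os
import tarfile
import tempfile


def anonymize(tarinfo):
    """Example filter: drop log files and reset ownership information."""
    if tarinfo.name.endswith(".log"):
        return None  # returning None excludes the member
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo


with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "project")
    os.mkdir(project)
    for filename in ("main.py", "debug.log"):
        with open(os.path.join(project, filename), "w") as f:
            f.write("x\n")

    archive = os.path.join(tmp, "project.tar")
    with tarfile.open(archive, "w") as tar:
        # The directory is added recursively; the filter sees every member.
        tar.add(project, arcname="project", filter=anonymize)

    with tarfile.open(archive, "r") as tar:
        names = sorted(tar.getnames())
        owners = {m.uname for m in tar.getmembers()}

print(names, owners)
```
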
.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor. If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.


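The two ways of obtaining a :class:`TarInfo` for :meth:`addfile` can be
sketched side by side: one derived from an existing file with
:meth:`gettarinfo`, and one created directly for data that exists only in
memory. File names and contents below are invented for the example.

```python
import io
import os
import tarfile
import tempfile

file_payload = b"quarterly numbers\n"
memory_payload = b"generated in memory\n"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "report.txt")
    with open(src, "wb") as f:
        f.write(file_payload)

    archive = os.path.join(tmp, "reports.tar")
    with tarfile.open(archive, "w") as tar:
        # Derive the header from the file on disk, then adjust it.
        info = tar.gettarinfo(src, arcname="2016/report.txt")
        info.mtime = 0  # attributes may be modified before adding
        with open(src, "rb") as f:
            tar.addfile(info, f)

        # Build a header by hand; size must match the data to be read.
        generated = tarfile.TarInfo("2016/summary.txt")
        generated.size = len(memory_payload)
        tar.addfile(generated, io.BytesIO(memory_payload))

    with tarfile.open(archive, "r") as tar:
        listing = [(m.name, m.size) for m in tar.getmembers()]

print(listing)
```
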
.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


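The query methods are usually more convenient than comparing
:attr:`TarInfo.type` against the type constants. The sketch below classifies
the members of a small in-memory archive; the ``describe`` helper and the
member names are hypothetical.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    folder = tarfile.TarInfo("docs")
    folder.type = tarfile.DIRTYPE
    folder.mode = 0o755
    tar.addfile(folder)

    payload = b"hello"
    regular = tarfile.TarInfo("docs/a.txt")
    regular.size = len(payload)
    tar.addfile(regular, io.BytesIO(payload))

    link = tarfile.TarInfo("latest")
    link.type = tarfile.SYMTYPE
    link.linkname = "docs/a.txt"
    tar.addfile(link)


def describe(member):
    """Hypothetical helper: summarize a member using the is*() methods."""
    if member.isdir():
        return "directory"
    if member.issym():
        return "symlink -> " + member.linkname
    if member.islnk():
        return "hard link -> " + member.linkname
    if member.isfile():
        return "file (%d bytes)" % member.size
    if member.isdev():
        return "device or FIFO"
    return "other"


buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    kinds = {m.name: describe(m) for m in tar.getmembers()}

for name, kind in kinds.items():
    print(name, "=", kind)
```
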
.. _tarfile-commandline:
.. program:: tarfile

Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


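For comparison, the same operations can be approximated from Python code. This
is only a rough sketch of library calls with a similar effect and makes no claim
about how the command-line interface is implemented. The file names are
hypothetical and are created in a temporary directory, and the *filter*
argument to :meth:`TarFile.extractall` is an assumption about newer Python
versions, so it is passed only when the module appears to support it::

   import os
   import tarfile
   import tempfile

   names = ("spam.txt", "eggs.txt")

   with tempfile.TemporaryDirectory() as workdir:
       # Hypothetical input files, created only for this sketch.
       for name in names:
           with open(os.path.join(workdir, name), "w") as f:
               f.write(name)

       archive = os.path.join(workdir, "monty.tar")

       # Roughly "python -m tarfile -c monty.tar spam.txt eggs.txt".
       with tarfile.open(archive, "w") as tar:
           for name in names:
               tar.add(os.path.join(workdir, name), arcname=name)

       # Roughly "python -m tarfile -l monty.tar".
       with tarfile.open(archive) as tar:
           listed = tar.getnames()

       # Roughly "python -m tarfile -e monty.tar other-dir/".
       out_dir = os.path.join(workdir, "other-dir")
       os.makedirs(out_dir)
       # Assumption: extraction filters exist only in newer Python versions.
       extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
       with tarfile.open(archive) as tar:
           tar.extractall(out_dir, **extra)
       extracted = sorted(os.listdir(out_dir))

   print(listed)
   print(extracted)
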
Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # The trailing space keeps the two parts of the sentence apart.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, whereas
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

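The two flavours of pax headers can be sketched as follows. This is an
illustration under stated assumptions rather than a reference: the member name
and the ``comment`` values are hypothetical, the *pax_headers* argument of
:func:`tarfile.open` is used for the global header, and the
:attr:`TarInfo.pax_headers` attribute for the extended header of one member::

   import io
   import tarfile

   buf = io.BytesIO()

   # pax_headers passed when opening for writing become a global header.
   with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                     pax_headers={"comment": "whole archive"}) as tar:
       member = tarfile.TarInfo("tagged.txt")  # hypothetical empty member
       # pax_headers set on a member become an extended header for it alone.
       member.pax_headers = {"comment": "just this member"}
       tar.addfile(member)

   buf.seek(0)
   with tarfile.open(fileobj=buf, mode="r:") as tar:
       read_back = tar.getmembers()[0]
       member_comment = read_back.pax_headers.get("comment")
       global_comment = tar.pax_headers.get("comment")

   print("member:", member_comment)
   print("archive:", global_comment)
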
There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

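The name-length limits of the three writable formats described above can be
observed with a small experiment. This is a minimal sketch: the helper
``can_store()`` and the 300-character member name are made up for illustration,
and the sketch assumes that an unrepresentable name is reported as a
:exc:`ValueError` when the member is added::

   import io
   import tarfile

   # A name without any "/" that is longer than ustar can represent.
   long_name = "x" * 300

   def can_store(fmt):
       """Return True if an empty member called long_name survives a round trip."""
       buf = io.BytesIO()
       try:
           with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
               tar.addfile(tarfile.TarInfo(long_name))
       except ValueError:
           # Assumption: this is how a too-long name is rejected.
           return False
       buf.seek(0)
       with tarfile.open(fileobj=buf, mode="r:") as tar:
           return tar.getnames() == [long_name]

   results = {
       "ustar": can_store(tarfile.USTAR_FORMAT),
       "gnu": can_store(tarfile.GNU_FORMAT),
       "pax": can_store(tarfile.PAX_FORMAT),
   }
   print(results)
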
.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.

In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
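
The effect of *encoding* and *errors* can be illustrated with a small round
trip. This is a sketch under stated assumptions: the helper ``roundtrip()`` and
the member name are hypothetical, the archives are kept in memory, and *errors*
is left at its default of ``'surrogateescape'``::

   import io
   import tarfile

   name = "caf\u00e9.txt"  # hypothetical name with one non-ASCII character

   def roundtrip(fmt, write_encoding, read_encoding):
       """Write one empty member called name, then read its name back."""
       buf = io.BytesIO()
       with tarfile.open(fileobj=buf, mode="w", format=fmt,
                         encoding=write_encoding) as tar:
           tar.addfile(tarfile.TarInfo(name))
       buf.seek(0)
       with tarfile.open(fileobj=buf, mode="r:",
                         encoding=read_encoding) as tar:
           return tar.getnames()[0]

   # GNU format: the reader has to know the encoding used by the writer.
   gnu_match = roundtrip(tarfile.GNU_FORMAT, "iso8859-1", "iso8859-1")
   gnu_mismatch = roundtrip(tarfile.GNU_FORMAT, "iso8859-1", "utf-8")

   # pax format: the name is stored as UTF-8 in an extended header.
   pax_any = roundtrip(tarfile.PAX_FORMAT, "iso8859-1", "ascii")

   print(ascii(gnu_match), ascii(gnu_mismatch), ascii(pax_any))

With a mismatched *encoding*, the undecodable byte of the GNU archive is kept
as a lone surrogate instead of raising an error, while the pax archive yields
the original name regardless of the *encoding* given when reading.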