:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'`` and
   defaults to ``'r'``. Here is a full list of mode combinations:

   +----------------------+---------------------------------------------+
   | Mode                 | Action                                      |
   +======================+=============================================+
   | ``'r'`` or ``'r:*'`` | Open for reading with transparent           |
   |                      | compression (recommended).                  |
   +----------------------+---------------------------------------------+
   | ``'r:'``             | Open for reading exclusively without        |
   |                      | compression.                                |
   +----------------------+---------------------------------------------+
   | ``'r:gz'``           | Open for reading with gzip compression.     |
   +----------------------+---------------------------------------------+
   | ``'r:bz2'``          | Open for reading with bzip2 compression.    |
   +----------------------+---------------------------------------------+
   | ``'r:xz'``           | Open for reading with lzma compression.     |
   +----------------------+---------------------------------------------+
   | ``'x'`` or ``'x:'``  | Create a tarfile exclusively without        |
   |                      | compression.                                |
   |                      | Raise a :exc:`FileExistsError` exception    |
   |                      | if it already exists.                       |
   +----------------------+---------------------------------------------+
   | ``'x:gz'``           | Create a tarfile with gzip compression.     |
   |                      | Raise a :exc:`FileExistsError` exception    |
   |                      | if it already exists.                       |
   +----------------------+---------------------------------------------+
   | ``'x:bz2'``          | Create a tarfile with bzip2 compression.    |
   |                      | Raise a :exc:`FileExistsError` exception    |
   |                      | if it already exists.                       |
   +----------------------+---------------------------------------------+
   | ``'x:xz'``           | Create a tarfile with lzma compression.     |
   |                      | Raise a :exc:`FileExistsError` exception    |
   |                      | if it already exists.                       |
   +----------------------+---------------------------------------------+
   | ``'a'`` or ``'a:'``  | Open for appending with no compression. The |
   |                      | file is created if it does not exist.       |
   +----------------------+---------------------------------------------+
   | ``'w'`` or ``'w:'``  | Open for uncompressed writing.              |
   +----------------------+---------------------------------------------+
   | ``'w:gz'``           | Open for gzip compressed writing.           |
   +----------------------+---------------------------------------------+
   | ``'w:bz2'``          | Open for bzip2 compressed writing.          |
   +----------------------+---------------------------------------------+
   | ``'w:xz'``           | Open for lzma compressed writing.           |
   +----------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` and ``'a:xz'`` are not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes are:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

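The mode strings described for :func:`tarfile.open` can be tried out entirely
in memory. The following is a minimal sketch, not part of the module: the
helper names ``archive_bytes`` and ``member_names`` and the member name
``hello.txt`` are made up for illustration. It writes one member with a given
write mode and reads it back with a given read mode, covering both the
``':'`` and the stream (``'|'``) forms.

```python
import io
import tarfile


def archive_bytes(mode, payload=b"hello\n"):
    """Build a one-member archive in memory using the given write mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        info = tarfile.TarInfo("hello.txt")  # hypothetical member name
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def member_names(data, mode):
    """List member names, reading the archive bytes with the given mode."""
    with tarfile.open(fileobj=io.BytesIO(data), mode=mode) as tar:
        return [member.name for member in tar]


gz = archive_bytes("w:gz")
print(member_names(gz, "r"))        # compression is detected transparently
print(member_names(gz, "r:gz"))     # compression is named explicitly
print(member_names(archive_bytes("w|bz2"), "r|*"))  # stream variant
```

Opening the gzip data with ``'r:'`` instead would raise :exc:`ReadError`,
because that mode accepts uncompressed archives only.
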
.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

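As a small illustration of :func:`is_tarfile`, the sketch below checks a
freshly written archive and a plain text file. The ``demo`` function and the
file names are placeholders chosen for this example.

```python
import io
import os
import tarfile
import tempfile


def demo():
    """Return is_tarfile() results for a real archive and a text file."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "sample.tar")
        with tarfile.open(archive, "w") as tar:
            info = tarfile.TarInfo("hello.txt")
            info.size = 5
            tar.addfile(info, io.BytesIO(b"hello"))

        text = os.path.join(tmp, "notes.txt")
        with open(text, "w") as f:
            f.write("just some text\n")

        return tarfile.is_tarfile(archive), tarfile.is_tarfile(text)


print(demo())
```
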

The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

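One way to use the exception hierarchy is to catch :exc:`ReadError` around
:func:`tarfile.open`. The sketch below is only an illustration: the helper
``list_or_explain`` and the file name ``bogus.tar`` are invented here, and the
exact wording of the error message is left to the module.

```python
import os
import tarfile
import tempfile


def list_or_explain(path):
    """Return the member names, or a short message if path is not a tar file."""
    try:
        with tarfile.open(path) as tar:
            return tar.getnames()
    except tarfile.ReadError as exc:
        name = os.path.basename(path)
        return "cannot read {!r} as a tar archive: {}".format(name, exc)


def demo():
    """Call list_or_explain() on a file that only pretends to be an archive."""
    with tempfile.TemporaryDirectory() as tmp:
        bogus = os.path.join(tmp, "bogus.tar")
        with open(bogus, "wb") as f:
            f.write(b"this is not a tar archive" * 100)
        return list_or_explain(bogus)


print(demo())
```

Catching :exc:`TarError` instead would also cover the other exceptions listed
above, since it is their common base class.
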

The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

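A brief sketch of the context manager protocol follows. The helper ``pack``
and the file names are hypothetical; the point is only that the archive is
closed when the :keyword:`with` block ends.

```python
import os
import tarfile
import tempfile


def pack(paths, archive_path):
    """Write the given files into a new archive and report whether it closed."""
    with tarfile.open(archive_path, "w") as tar:
        for path in paths:
            # Store each file under its base name only.
            tar.add(path, arcname=os.path.basename(path))
    # Leaving the block closed the archive and wrote the end-of-archive blocks.
    return tar.closed


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "spam.txt")
        with open(source, "w") as f:
            f.write("spam\n")
        archive = os.path.join(tmp, "monty.tar")
        closed = pack([source], archive)
        with tarfile.open(archive) as tar:
            return closed, tar.getnames()


print(demo())
```
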
.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.

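The relationship between :meth:`~TarFile.getnames`,
:meth:`~TarFile.getmembers` and :meth:`~TarFile.getmember` can be sketched
with an in-memory archive that stores ``a.txt`` twice. The helpers
``build_archive`` and ``summarize`` and the member names are made up for this
example.

```python
import io
import tarfile


def build_archive():
    """Create an in-memory archive in which 'a.txt' is stored twice."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [("a.txt", b"old"), ("b.txt", b"bee"), ("a.txt", b"newer")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def summarize(buf):
    """Return member names, member sizes and the size getmember() reports."""
    with tarfile.open(fileobj=buf, mode="r") as tar:
        names = tar.getnames()
        sizes = [member.size for member in tar.getmembers()]
        latest = tar.getmember("a.txt").size  # the last occurrence is used
    return names, sizes, latest


print(summarize(build_archive()))
```

Both lists keep the archive order, while :meth:`~TarFile.getmember` returns
the five-byte second copy of ``a.txt``.
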

.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

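The warning for :meth:`~TarFile.extractall` asks for inspection before
extraction. One possible form of such an inspection is sketched below. It is
an assumption-laden example rather than a complete defense: the helpers
``is_within`` and ``checked_members`` are invented here, links and device
nodes are simply skipped, and :func:`os.path.commonpath` raises
:exc:`ValueError` on Windows when the two paths are on different drives.

```python
import io
import os
import tarfile
import tempfile


def is_within(directory, target):
    """Return True if target resolves to a location inside directory."""
    directory = os.path.realpath(directory)
    target = os.path.realpath(target)
    return os.path.commonpath([directory, target]) == directory


def checked_members(tar, dest):
    """Yield only regular members whose paths stay below dest."""
    for member in tar.getmembers():
        if member.issym() or member.islnk() or member.isdev():
            continue  # skip links, devices and FIFOs in this sketch
        if is_within(dest, os.path.join(dest, member.name)):
            yield member


def demo():
    """Extract a deliberately hostile in-memory archive and list the result."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("ok.txt", "../escape.txt", "/absolute.txt"):
            info = tarfile.TarInfo(name)
            info.size = 1
            tar.addfile(info, io.BytesIO(b"x"))
    buf.seek(0)
    with tempfile.TemporaryDirectory() as dest:
        with tarfile.open(fileobj=buf, mode="r") as tar:
            tar.extractall(path=dest, members=checked_members(tar, dest))
        return sorted(os.listdir(dest))


print(demo())
```

Only ``ok.txt`` is extracted; the member with ``".."`` and the member with an
absolute name are filtered out before :meth:`~TarFile.extractall` sees them.
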

.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.

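Because :meth:`~TarFile.extractfile` returns :const:`None` for members that
are not files or links, callers usually check the result before reading. A
minimal sketch, with the hypothetical helpers ``build_archive`` and
``read_member`` and made-up member names:

```python
import io
import tarfile


def build_archive():
    """Create an in-memory archive with one directory and one regular file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        directory = tarfile.TarInfo("docs")
        directory.type = tarfile.DIRTYPE
        directory.mode = 0o755
        tar.addfile(directory)

        data = b"hello"
        info = tarfile.TarInfo("docs/readme.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf


def read_member(buf, name):
    """Return the bytes of a member, or None if it carries no file data."""
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        fileobj = tar.extractfile(name)
        if fileobj is None:  # e.g. a directory
            return None
        with fileobj:
            return fileobj.read()


archive = build_archive()
print(read_member(archive, "docs/readme.txt"))
print(read_member(archive, "docs"))
```
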

.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *exclude* is given, it must be a function that takes one
   filename argument and returns a boolean value. Depending on this value the
   respective file is either excluded (:const:`True`) or added
   (:const:`False`). If *filter* is specified it must be a keyword argument. It
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.

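A *filter* function for :meth:`~TarFile.add` can both modify and exclude
members. The sketch below assumes a hypothetical source tree containing
``main.py`` and ``main.pyc``; the filter name ``strip_owner`` is also made up.
It drops ``.pyc`` files and resets the ownership fields of everything else.

```python
import os
import tarfile
import tempfile


def strip_owner(tarinfo):
    """Exclude .pyc files and reset ownership on all other members."""
    if tarinfo.name.endswith(".pyc"):
        return None  # returning None excludes the member
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.mkdir(src)
        for name in ("main.py", "main.pyc"):
            with open(os.path.join(src, name), "w") as f:
                f.write("# placeholder\n")

        archive = os.path.join(tmp, "project.tar")
        with tarfile.open(archive, "w") as tar:
            tar.add(src, arcname="project", filter=strip_owner)

        with tarfile.open(archive) as tar:
            return sorted((m.name, m.uname, m.uid) for m in tar.getmembers())


print(demo())
```
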

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.

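Creating a :class:`TarInfo` object directly is convenient for data that never
existed as a file on disk. In this sketch the helper ``add_bytes``, the member
name and the chosen mode and timestamp are all assumptions made for
illustration; the important step is setting ``size`` before calling
:meth:`~TarFile.addfile`.

```python
import io
import tarfile
import time


def add_bytes(tar, name, data, mtime=None):
    """Add in-memory bytes to an open archive as a regular file member."""
    info = tarfile.TarInfo(name)
    info.size = len(data)  # addfile() reads exactly this many bytes
    info.mtime = int(time.time()) if mtime is None else mtime
    info.mode = 0o644
    tar.addfile(info, io.BytesIO(data))
    return info


def demo():
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        add_bytes(tar, "config/settings.ini", b"[main]\n", mtime=1000000000)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        member = tar.getmember("config/settings.ini")
        data = tar.extractfile(member).read()
        return member.size, member.mode, member.mtime, data


print(demo())
```
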

.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor. If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*’s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`’s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is a character device, block device or FIFO.

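The query methods make it possible to classify members without comparing
:attr:`~TarInfo.type` against the constants by hand. A small sketch, in which
the function ``kind`` and its labels are invented for illustration:

```python
import tarfile


def kind(member):
    """Return a short human-readable label for a TarInfo object."""
    if member.isdir():
        return "directory"
    if member.issym():
        return "symbolic link"
    if member.islnk():
        return "hard link"
    if member.isdev():
        return "device or FIFO"
    if member.isfile():
        return "regular file"
    return "other"


def make(name, type_):
    """Build a bare TarInfo object of the given type (no archive needed)."""
    info = tarfile.TarInfo(name)
    info.type = type_
    return info


for member in (make("notes.txt", tarfile.REGTYPE),
               make("docs", tarfile.DIRTYPE),
               make("latest", tarfile.SYMTYPE),
               make("pipe", tarfile.FIFOTYPE)):
    print(member.name, "->", kind(member))
```
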
Georg Brandl116aa622007-08-15 14:28:22 +0000645
.. _tarfile-commandline:

Command Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included::

   $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable::

   $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option::

   $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name::

   $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option::

   $ python -m tarfile -l monty.tar


Command line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

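
The command line interface can also be driven from a script. The following
sketch creates an archive with ``-c`` and lists it with ``-l`` inside a
temporary directory. The ``run_tarfile_cli`` helper and the file names are
assumptions made for this example; only the options documented above are
used::

   import os
   import subprocess
   import sys
   import tempfile

   def run_tarfile_cli(args, cwd):
       """Run ``python -m tarfile`` with *args* in *cwd*; return its stdout.

       Hypothetical helper, not part of the tarfile module.
       """
       cmd = [sys.executable, "-m", "tarfile"] + list(args)
       return subprocess.check_output(cmd, cwd=cwd, universal_newlines=True)

   with tempfile.TemporaryDirectory() as workdir:
       # Create a small source file to archive.
       with open(os.path.join(workdir, "spam.txt"), "w") as f:
           f.write("spam\n")

       # Equivalent to: python -m tarfile -c monty.tar spam.txt
       run_tarfile_cli(["-c", "monty.tar", "spam.txt"], cwd=workdir)
       created = os.path.exists(os.path.join(workdir, "monty.tar"))

       # Equivalent to: python -m tarfile -l monty.tar
       listing = run_tarfile_cli(["-l", "monty.tar"], cwd=workdir)

   print(listing)

Using :data:`sys.executable` ensures that the same interpreter that runs the
script also runs the :mod:`tarfile` command line interface.
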
.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the sentence on one line with a space before the type.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  of up to 256 characters at best and linknames of up to 100 characters. The
  maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

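
The filename limits described above can be observed with a small experiment.
The sketch below tries to store a 300-character name without any slash in each
of the three writable formats, using an in-memory archive. The ``write_member``
helper and the chosen name are assumptions for this example; the ustar attempt
is expected to raise :exc:`ValueError` because the name fits neither the name
field nor a prefix/name split::

   import io
   import tarfile

   def write_member(name, fmt):
       """Write one empty member called *name* in tar format *fmt*,
       then read the archive back and return the stored member names.

       Hypothetical helper, not part of the tarfile module.
       """
       buf = io.BytesIO()
       with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
           tar.addfile(tarfile.TarInfo(name))
       buf.seek(0)
       with tarfile.open(fileobj=buf, mode="r") as tar:
           return tar.getnames()

   long_name = "x" * 300   # no slash, so it cannot be split into prefix/name

   results = {}
   for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                      ("gnu", tarfile.GNU_FORMAT),
                      ("pax", tarfile.PAX_FORMAT)]:
       try:
           # True if the name survived the round trip unchanged.
           results[label] = write_member(long_name, fmt) == [long_name]
       except ValueError:
           # The name does not fit into the header of this format.
           results[label] = False

   for label in ("ustar", "gnu", "pax"):
       print(label, "stored the long name:", results[label])

Passing *format* explicitly makes the sketch independent of the module's
default format.
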
.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

For :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
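
The effect of *encoding* can be illustrated with a short round trip. The
sketch below stores a member with a non-ASCII name in an in-memory archive and
reads it back with a matching and a mismatching encoding. The
``roundtrip_name`` helper and the file name are assumptions for this example;
the default ``'surrogateescape'`` error handler is left in place::

   import io
   import tarfile

   def roundtrip_name(name, fmt, write_encoding, read_encoding):
       """Store an empty member called *name* and return the name read back.

       Hypothetical helper, not part of the tarfile module.
       """
       buf = io.BytesIO()
       with tarfile.open(fileobj=buf, mode="w", format=fmt,
                         encoding=write_encoding) as tar:
           tar.addfile(tarfile.TarInfo(name))
       buf.seek(0)
       with tarfile.open(fileobj=buf, mode="r", encoding=read_encoding) as tar:
           return tar.getnames()[0]

   name = "caf\u00e9.txt"

   # GNU format: the name is stored as raw bytes in the writer's encoding.
   same = roundtrip_name(name, tarfile.GNU_FORMAT, "iso8859-1", "iso8859-1")
   damaged = roundtrip_name(name, tarfile.GNU_FORMAT, "iso8859-1", "utf-8")

   # pax format: the non-ASCII name is stored as UTF-8 in an extended header.
   pax = roundtrip_name(name, tarfile.PAX_FORMAT, "iso8859-1", "ascii")

   # ascii() avoids encoding errors when printing surrogate characters.
   print("matching encoding:   ", ascii(same))
   print("mismatching encoding:", ascii(damaged))
   print("pax format:          ", ascii(pax))

With the mismatching encoding the undecodable byte is kept as a lone surrogate
character instead of raising an error, which is what allows such names to be
written back unchanged later.
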