:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` and ``'a:xz'`` are not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'`` and
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

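
As a minimal sketch of the mode strings in the first table, the following
creates a gzip-compressed archive with ``'w:gz'``, reads it back with plain
``'r'`` (transparent compression), and shows ``'x:gz'`` refusing to overwrite
the existing file. The temporary directory and the file names are illustrative
assumptions, not part of the module.

```python
import os
import tarfile
import tempfile

# Work in a throwaway directory so the example leaves nothing behind.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "spam.txt")
with open(source, "w") as f:
    f.write("spam and eggs\n")

archive = os.path.join(workdir, "sample.tar.gz")

# 'w:gz' creates (or overwrites) a gzip-compressed archive.
with tarfile.open(archive, "w:gz") as tar:
    tar.add(source, arcname="spam.txt")

# Plain 'r' detects the compression on its own.
with tarfile.open(archive, "r") as tar:
    names = tar.getnames()

# 'x:gz' only creates new archives; the file already exists here.
try:
    tarfile.open(archive, "x:gz")
except FileExistsError:
    already_exists = True
else:
    already_exists = False

print(names, already_exists)
```

Opening with ``'r'`` rather than ``'r:gz'`` keeps the reading side independent
of how the archive happened to be compressed.
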
.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


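
The following sketch shows how :func:`is_tarfile` and :exc:`ReadError` relate
when a file is not a tar archive. The file name and its contents are made up
for the illustration.

```python
import os
import tarfile
import tempfile

# Create a plain text file that is definitely not a tar archive.
workdir = tempfile.mkdtemp()
bogus = os.path.join(workdir, "not-a-tar.txt")
with open(bogus, "w") as f:
    f.write("this is plain text, not a tar archive\n")

# is_tarfile() answers the question without raising.
looks_like_tar = tarfile.is_tarfile(bogus)

# Opening the same file for reading raises ReadError instead.
try:
    tarfile.open(bogus)
except tarfile.ReadError as exc:
    error = exc
else:
    error = None

# ReadError derives from TarError, so a broader handler catches it too.
caught_as_base = isinstance(error, tarfile.TarError)
print(looks_like_tar, type(error).__name__, caught_as_base)
```

Catching :exc:`TarError` is the broader option when the caller does not need to
tell the individual failure modes apart.
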
The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


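
To illustrate how :meth:`getmember`, :meth:`getmembers` and :meth:`getnames`
treat a name that is stored twice, here is a small sketch that builds an
archive in memory. ``add_bytes`` is a hypothetical helper written for this
example, and the member names are invented.

```python
import io
import tarfile


def add_bytes(tar, name, data):
    """Hypothetical helper: store *data* under *name* in the open archive."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


# Store the same name twice; tar archives allow this.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    add_bytes(tar, "notes.txt", b"first draft")
    add_bytes(tar, "notes.txt", b"second, longer draft")

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()                               # archive order
    sizes = [member.size for member in tar.getmembers()]
    latest = tar.getmember("notes.txt")                  # last occurrence
    try:
        tar.getmember("missing.txt")
    except KeyError:
        missing = True
    else:
        missing = False

print(names, sizes, latest.size, missing)
```
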
.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.


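
One possible way to perform the inspection mentioned in the warning is to pass
a filtered generator as *members*. The sketch below is only an assumption about
what such a check might look like: ``safe_members`` is a hypothetical helper,
it covers just the two cases named in the warning (absolute names and ``".."``
components), and it does not examine link targets.

```python
import io
import os
import tarfile
import tempfile


def safe_members(tar):
    """Hypothetical helper: skip names that could land outside the target."""
    for member in tar.getmembers():
        parts = member.name.replace("\\", "/").split("/")
        if member.name.startswith("/") or ".." in parts:
            continue  # absolute name or parent-directory reference
        yield member


# Build a small in-memory archive mixing harmless and suspicious names.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ["ok.txt", "../escape.txt", "/abs.txt"]:
        data = b"payload"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buffer.seek(0)
target = tempfile.mkdtemp()
with tarfile.open(fileobj=buffer) as tar:
    accepted = [member.name for member in safe_members(tar)]
    tar.extractall(path=target, members=safe_members(tar))

extracted = sorted(os.listdir(target))
print(accepted, extracted)
```

Skipping suspicious members silently is a design choice of this sketch; raising
an exception instead may be more appropriate when the archive is expected to be
well formed.
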
.. method:: TarFile.extract(member, path="", set_attrs=True)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is false.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.


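
A short sketch of :meth:`TarFile.extractfile` follows: it reads a regular
member without writing anything to disk and shows that a directory member
yields :const:`None`. The archive is built in memory and the member names are
invented for the example.

```python
import io
import tarfile

# Build an archive holding one regular file and one directory entry.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    data = b"Nobody expects the Spanish Inquisition!\n"
    info = tarfile.TarInfo("sketch.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

    folder = tarfile.TarInfo("scripts")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

buffer.seek(0)
with tarfile.open(fileobj=buffer) as tar:
    # Regular file: a binary file object that can be read like any other.
    with tar.extractfile("sketch.txt") as member_file:
        text = member_file.read().decode("ascii")
    # Directory: there is no data to read, so None comes back.
    directory_result = tar.extractfile("scripts")

print(text, directory_result)
```
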
.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *exclude* is given, it must be a function that takes one
   filename argument and returns a boolean value. Depending on this value the
   respective file is either excluded (:const:`True`) or added
   (:const:`False`). If *filter* is specified it must be a keyword argument. It
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid confusion about the file size.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the :term:`file
   object` *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


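
The interplay of :meth:`gettarinfo` and :meth:`addfile` can be sketched as
follows: the :class:`TarInfo` is created from a file on disk, a few attributes
are changed, and the data is then added from a file object opened in ``'rb'``
mode. Paths, the archive name and the attribute values are assumptions chosen
for the illustration.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "report.txt")
with open(source, "wb") as f:
    f.write(b"quarterly numbers\n")

archive = os.path.join(workdir, "report.tar")
with tarfile.open(archive, "w") as tar:
    # Describe the file, storing it under a different name in the archive.
    info = tar.gettarinfo(source, arcname="docs/report.txt")
    # Adjust metadata before the member is written.
    info.mtime = 0
    info.uname = info.gname = "nobody"
    # Binary mode, so exactly info.size bytes are read on every platform.
    with open(source, "rb") as f:
        tar.addfile(info, f)

with tarfile.open(archive) as tar:
    stored = tar.getmember("docs/report.txt")

print(stored.name, stored.size, stored.mtime, stored.uname)
```
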
.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file name, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tarfile-commandline:

Command Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included::

   $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable::

   $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option::

   $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name::

   $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option::

   $ python -m tarfile -l monty.tar


Command line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> <sourceN>
               --create <tarfile> <source1> <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

Georg Brandl116aa622007-08-15 14:28:22 +0000687.. _tar-examples:
688
689Examples
690--------
691
692How to extract an entire tar archive to the current working directory::
693
694 import tarfile
695 tar = tarfile.open("sample.tar.gz")
696 tar.extractall()
697 tar.close()
698
Benjamin Petersona37cfc62008-05-26 13:48:34 +0000699How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
700a generator function instead of a list::
701
702 import os
703 import tarfile
704
705 def py_files(members):
706 for tarinfo in members:
707 if os.path.splitext(tarinfo.name)[1] == ".py":
708 yield tarinfo
709
710 tar = tarfile.open("sample.tar.gz")
711 tar.extractall(members=py_files(tar))
712 tar.close()
713
How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # Trailing space keeps the two parts of the sentence apart.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

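The filename limits of the three writable formats can be seen with a small
in-memory experiment. This is a sketch under stated assumptions: the helpers
``make_archive`` and ``read_names`` and the 300-character member name are
invented for illustration, and the archive lives in an :class:`io.BytesIO`
buffer rather than on disk:

```python
import io
import tarfile


def make_archive(fmt, name, data=b"spam"):
    """Return the bytes of a one-member archive written in format *fmt*."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_names(raw):
    """Return the member names stored in the archive bytes *raw*."""
    with tarfile.open(fileobj=io.BytesIO(raw)) as tar:
        return tar.getnames()


# A 300-character name without any "/" does not fit into a ustar header.
long_name = "x" * 300

results = {}
for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                   ("gnu", tarfile.GNU_FORMAT),
                   ("pax", tarfile.PAX_FORMAT)]:
    try:
        # True if the name survives a write/read round trip unchanged.
        results[label] = read_names(make_archive(fmt, long_name)) == [long_name]
    except ValueError as exc:
        results[label] = "rejected: {}".format(exc)

print(results)
```

The GNU and pax formats store the long name in an extra header, while writing
the same member in the ustar format fails with a :exc:`ValueError`.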
There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.

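The effect of *encoding* and of the default ``'surrogateescape'`` error
handler can be illustrated with in-memory archives. This is only a sketch: the
helpers ``write_archive`` and ``first_name`` and the sample filename are
assumptions made for the example:

```python
import io
import tarfile


def write_archive(name, fmt, encoding):
    """Return the bytes of an archive holding one empty file called *name*."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt,
                      encoding=encoding) as tar:
        tar.addfile(tarfile.TarInfo(name))
    return buf.getvalue()


def first_name(raw, encoding):
    """Read archive bytes *raw* with *encoding*; return the first member name."""
    with tarfile.open(fileobj=io.BytesIO(raw), encoding=encoding) as tar:
        return tar.getnames()[0]


name = "t\u00e4st.txt"   # contains one non-ASCII character

# GNU format: the name is stored as raw bytes in the chosen encoding.
gnu = write_archive(name, tarfile.GNU_FORMAT, "iso8859-1")
right = first_name(gnu, "iso8859-1")   # decoded with the matching encoding
wrong = first_name(gnu, "utf-8")       # undecodable byte becomes a surrogate

# pax format: the non-ASCII name is stored as UTF-8 in an extended header,
# so the *encoding* used for reading does not matter here.
pax = write_archive(name, tarfile.PAX_FORMAT, "iso8859-1")
pax_name = first_name(pax, "ascii")

# ascii() is used because a string with surrogates cannot always be printed.
print(ascii(right), ascii(wrong), ascii(pax_name))
```

Reading the GNU archive with the wrong encoding does not raise an error;
the byte that cannot be decoded is preserved as a lone surrogate in the name.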