:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'`` and
   defaults to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` and ``'a:xz'`` are not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
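
The stream modes listed above can be tried without a socket or tape device. The
following is a minimal sketch that uses an in-memory ``io.BytesIO`` buffer as a
stand-in for such a stream; the member name ``hello.txt`` and its payload are
made up for illustration.

```python
import io
import tarfile

payload = b"streamed data\n"  # illustrative content

# Write a gzip compressed archive as a stream; the target only needs write().
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w|gz") as tar:
    info = tarfile.TarInfo("hello.txt")  # hypothetical member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read it back as a stream with transparent compression detection.
buffer.seek(0)
contents = {}
with tarfile.open(fileobj=buffer, mode="r|*") as tar:
    for member in tar:
        # Members have to be consumed in archive order; a stream cannot
        # seek backwards to revisit an earlier member.
        contents[member.name] = tar.extractfile(member).read()

print(contents)
```

Reading each member while iterating keeps all access strictly sequential, which
is the only access pattern a stream-like :class:`TarFile` supports.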


.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.
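
As a small illustration of :func:`is_tarfile`, the sketch below checks two files
created in a temporary directory. The file names are assumptions chosen for the
example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "sample.tar")  # hypothetical names
    text_path = os.path.join(tmp, "notes.txt")

    with open(text_path, "w") as f:
        f.write("plain text, not an archive\n")

    # Put the text file into a small uncompressed archive.
    with tarfile.open(archive_path, "w") as tar:
        tar.add(text_path, arcname="notes.txt")

    archive_ok = tarfile.is_tarfile(archive_path)
    text_ok = tarfile.is_tarfile(text_path)

print(archive_ok, text_ok)
```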


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.
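
Because every exception above derives from :exc:`TarError`, callers can catch a
specific error first and fall back to the base class. A minimal sketch, assuming
an in-memory buffer of arbitrary bytes as the invalid input:

```python
import io
import tarfile

# 1024 bytes that do not form a tar header in any supported format.
garbage = io.BytesIO(b"x" * 1024)

try:
    tarfile.open(fileobj=garbage, mode="r")
except tarfile.ReadError:
    outcome = "ReadError"
except tarfile.TarError:
    # Any other tarfile-specific failure would land here.
    outcome = "other TarError"
else:
    outcome = "opened"

print(outcome)
```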


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


The following variables are available on module level:


.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows,
   :func:`sys.getfilesystemencoding` otherwise.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context manager protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.
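
The effect of *ignore_zeros* on concatenated archives can be sketched as
follows. The helper ``make_archive`` and the member names are hypothetical and
exist only for this illustration; keyword arguments given to
:func:`tarfile.open` are passed on to the :class:`TarFile` constructor.

```python
import io
import tarfile

def make_archive(name, data):
    """Return the bytes of an uncompressed archive holding one member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Two complete archives glued together, as "cat a.tar b.tar" would produce.
combined = make_archive("a.txt", b"first") + make_archive("b.txt", b"second")

# Default behaviour: the zero blocks ending the first archive stop the reader.
with tarfile.open(fileobj=io.BytesIO(combined), mode="r:") as tar:
    default_names = tar.getnames()

# With ignore_zeros=True the reader skips the empty blocks and carries on.
with tarfile.open(fileobj=io.BytesIO(combined), mode="r:",
                  ignore_zeros=True) as tar:
    lenient_names = tar.getnames()

print(default_names, lenient_names)
```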


.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.
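
The lookup rules of :meth:`getmember` can be observed with an archive that
stores the same name twice. This is a minimal sketch; the member name
``config.txt`` and the two payloads are assumptions for the example.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    # Store the same (hypothetical) name twice with different content.
    for version in (b"v1", b"v2"):
        info = tarfile.TarInfo("config.txt")
        info.size = len(version)
        tar.addfile(info, io.BytesIO(version))

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = tar.getnames()  # both occurrences are listed
    latest = tar.extractfile(tar.getmember("config.txt")).read()
    try:
        tar.getmember("missing.txt")
        missing_found = True
    except KeyError:
        missing_found = False

print(names, latest, missing_found)
```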


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.
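
One way to perform the prior inspection the warning asks for is to pass
:meth:`extractall` only those members whose target stays inside the destination
directory. The generator ``safe_members`` below is a hypothetical helper, not
part of :mod:`tarfile`, and the directory name ``extract_here`` is made up. It
is a sketch that covers the cases named in the warning and, as an extra
precaution, skips links; it should not be read as a complete defence.

```python
import io
import os
import tarfile

def safe_members(tar, dest):
    """Yield only members whose resolved path stays inside *dest*."""
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        inside = os.path.commonpath([root, target]) == root
        # Links are skipped as well, since they can point outside *dest*.
        if inside and not (member.issym() or member.islnk()):
            yield member

# Build a small in-memory archive with one harmless and two hostile names.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("ok.txt", "../evil.txt", "/abs.txt"):
        tar.addfile(tarfile.TarInfo(name))

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    accepted = [m.name for m in safe_members(tar, "extract_here")]
    # The actual extraction would then look like this:
    # tar.extractall(path="extract_here",
    #                members=safe_members(tar, "extract_here"))

print(accepted)
```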


.. method:: TarFile.extract(member, path="", set_attrs=True)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is false.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a :term:`file-like
   object` is returned. If *member* is a link, a file-like object is constructed from
   the link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.
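
A minimal sketch of :meth:`extractfile`, reading a member's data without writing
anything to disk. The archive is built in memory and the member names are
assumptions for the example.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    # A directory member carries no data.
    directory = tarfile.TarInfo("docs")
    directory.type = tarfile.DIRTYPE
    tar.addfile(directory)

    text = b"line one\nline two\n"
    info = tarfile.TarInfo("docs/readme.txt")
    info.size = len(text)
    tar.addfile(info, io.BytesIO(text))

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    dir_result = tar.extractfile("docs")              # not a file or link
    member_file = tar.extractfile("docs/readme.txt")  # read-only file object
    lines = member_file.readlines()

print(dir_result, lines)
```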
Georg Brandl116aa622007-08-15 14:28:22 +0000387
388
Raymond Hettingera63a3122011-01-26 20:34:14 +0000389.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000390
Raymond Hettingera63a3122011-01-26 20:34:14 +0000391 Add the file *name* to the archive. *name* may be any type of file
392 (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
393 alternative name for the file in the archive. Directories are added
394 recursively by default. This can be avoided by setting *recursive* to
395 :const:`False`. If *exclude* is given, it must be a function that takes one
396 filename argument and returns a boolean value. Depending on this value the
397 respective file is either excluded (:const:`True`) or added
398 (:const:`False`). If *filter* is specified it must be a keyword argument. It
399 should be a function that takes a :class:`TarInfo` object argument and
400 returns the changed :class:`TarInfo` object. If it instead returns
401 :const:`None` the :class:`TarInfo` object will be excluded from the
402 archive. See :ref:`tar-examples` for an example.
Lars Gustäbel049d2aa2009-09-12 10:44:00 +0000403
404 .. versionchanged:: 3.2
405 Added the *filter* parameter.
406
407 .. deprecated:: 3.2
408 The *exclude* parameter is deprecated, please use the *filter* parameter
409 instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000410
Georg Brandl116aa622007-08-15 14:28:22 +0000411
Benjamin Petersona37cfc62008-05-26 13:48:34 +0000412.. method:: TarFile.addfile(tarinfo, fileobj=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000413
414 Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
415 ``tarinfo.size`` bytes are read from it and added to the archive. You can
416 create :class:`TarInfo` objects using :meth:`gettarinfo`.
417
418 .. note::
419
420 On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
421 avoid irritation about the file size.
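
:meth:`addfile` also accepts a :class:`TarInfo` object built by hand, which is
useful for data that never existed as a file. A minimal sketch, with an assumed
member name and payload; note that ``size`` has to be set explicitly because
only ``tarinfo.size`` bytes are read from *fileobj*.

```python
import io
import tarfile
import time

data = "generated at runtime\n".encode("utf-8")  # illustrative payload

info = tarfile.TarInfo(name="report.txt")  # hypothetical member name
info.size = len(data)          # addfile() copies exactly this many bytes
info.mtime = int(time.time())
info.mode = 0o644

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.addfile(info, fileobj=io.BytesIO(data))

# Read the archive back to check what was stored.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    stored = tar.getmember("report.txt")
    stored_size = stored.size
    stored_mode = stored.mode
    stored_data = tar.extractfile(stored).read()

print(stored_size, oct(stored_mode))
```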


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the :term:`file
   object` *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.
483
Georg Brandl116aa622007-08-15 14:28:22 +0000484
485A ``TarInfo`` object has the following public data attributes:
486
487
488.. attribute:: TarInfo.name
489
490 Name of the archive member.
491
492
493.. attribute:: TarInfo.size
494
495 Size in bytes.
496
497
498.. attribute:: TarInfo.mtime
499
500 Time of last modification.
501
502
503.. attribute:: TarInfo.mode
504
505 Permission bits.
506
507
508.. attribute:: TarInfo.type
509
510 File type. *type* is usually one of these constants: :const:`REGTYPE`,
511 :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
512 :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
513 :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
514 more conveniently, use the ``is_*()`` methods below.
515
516
517.. attribute:: TarInfo.linkname
518
519 Name of the target file name, which is only present in :class:`TarInfo` objects
520 of type :const:`LNKTYPE` and :const:`SYMTYPE`.
521
522
523.. attribute:: TarInfo.uid
524
525 User ID of the user who originally stored this member.
526
527
528.. attribute:: TarInfo.gid
529
530 Group ID of the user who originally stored this member.
531
532
533.. attribute:: TarInfo.uname
534
535 User name.
536
537
538.. attribute:: TarInfo.gname
539
540 Group name.
541
542
543.. attribute:: TarInfo.pax_headers
544
545 A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is a character device, a block device or a FIFO.


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       # Yield only the members whose name ends in ".py".
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the sentence on one line with a separating space.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of 256 characters in the best case and linknames up to 100
  characters. The maximum file size is 8 gigabytes. This is an old and limited
  but widely supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.
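
The filename limits of the three creatable formats can be compared with a name
that is too long for ustar. The sketch below is illustrative only; the helper
``round_trip`` and the 300-character name are assumptions made for the example.

```python
import io
import tarfile

# 300 characters without any "/" to split on, so ustar cannot store it.
long_name = "x" * 300

def round_trip(fmt):
    """Write one empty member in format *fmt* and return the names read back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(long_name))
    buf.seek(0)
    with tarfile.open(fileobj=buf) as tar:
        return tar.getnames()

results = {}
for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                   ("gnu", tarfile.GNU_FORMAT),
                   ("pax", tarfile.PAX_FORMAT)]:
    try:
        results[label] = round_trip(fmt) == [long_name]
    except ValueError:
        # Raised when the name does not fit into the header format.
        results[label] = False

print(results)
```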

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`codec-base-classes`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.

In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
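
The encoding mismatch described above can be reproduced in a few lines. This is
a minimal sketch that writes a GNU format archive with a *Latin-1* encoded
member name and reads it back twice; the member name is an assumption for the
example, and :const:`GNU_FORMAT` is requested explicitly so that the name is
stored in the chosen encoding rather than in a pax header.

```python
import io
import tarfile

name = "f\u00fc.txt"  # "fü.txt", a hypothetical non-ASCII member name

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                  encoding="iso-8859-1") as tar:
    tar.addfile(tarfile.TarInfo(name))
raw = buf.getvalue()

# Reading with the encoding used for writing restores the name.
with tarfile.open(fileobj=io.BytesIO(raw), encoding="iso-8859-1") as tar:
    matching = tar.getnames()

# Reading with a different encoding and the default 'surrogateescape'
# error handler turns the undecodable byte into a lone surrogate.
with tarfile.open(fileobj=io.BytesIO(raw), encoding="utf-8") as tar:
    mismatched = tar.getnames()

# ascii() is used because a lone surrogate cannot be printed directly.
print(ascii(matching), ascii(mismatched))
```

The surrogate keeps the original byte recoverable, so the damaged name can
still be encoded back to the bytes stored in the archive.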