:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` and ``'a:bz2'`` are not possible. If *mode* is not suitable
   for opening a particular (compressed) file for reading, :exc:`ReadError` is
   raised. Use *mode* ``'r'`` to avoid this. If a compression method is not
   supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is expected to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with, e.g., ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes are:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

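The sketch below exercises both forms of *mode*. It is only an illustration: the archive name ``sample.tar.gz``, the member name ``hello.txt`` and its contents are placeholders, and a temporary directory and an in-memory :class:`io.BytesIO` buffer stand in for real files. It writes with ``'w:gz'``, reads back with the transparent ``'r:*'`` mode, shows that ``'r:'`` rejects the compressed file with :exc:`ReadError`, and then round-trips the same data through the stream modes ``'w|gz'`` and ``'r|*'``.

```python
import io
import os
import tarfile
import tempfile

payload = b"hello, tar\n"  # arbitrary example data


def make_member(name, data):
    """Build a TarInfo describing an in-memory regular file."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    return info


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.tar.gz")

    # 'w:gz': random-access archive, gzip compressed.
    with tarfile.open(path, "w:gz") as tar:
        tar.addfile(make_member("hello.txt", payload), io.BytesIO(payload))

    # 'r:*' (same as 'r'): the compression is detected transparently.
    with tarfile.open(path, "r:*") as tar:
        names = tar.getnames()

    # 'r:' insists on an uncompressed archive, so this raises ReadError.
    try:
        tarfile.open(path, "r:")
    except tarfile.ReadError:
        plain_mode_failed = True
    else:
        plain_mode_failed = False

# Stream modes ('|') never seek backwards, so a plain buffer is enough.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w|gz") as tar:
    tar.addfile(make_member("hello.txt", payload), io.BytesIO(payload))

buf.seek(0)
streamed = {}
with tarfile.open(fileobj=buf, mode="r|*") as tar:
    for member in tar:  # members are visited strictly in archive order
        streamed[member.name] = tar.extractfile(member).read()

print(names, plain_mode_failed, streamed)
```

In stream mode each member's data is read while that member is current, before the loop moves on, because the object cannot seek back to earlier members.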

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

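A quick check with :func:`is_tarfile`, assuming two hypothetical files created in a temporary directory: a real archive and a plain text file whose name merely ends in ``.tar``. The function opens the file and looks at its contents, not at its name.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A real (uncompressed) tar archive with a single empty member.
    archive_path = os.path.join(tmp, "example.tar")
    with tarfile.open(archive_path, "w") as tar:
        tar.addfile(tarfile.TarInfo("empty.txt"), io.BytesIO(b""))

    # An ordinary text file that only has a misleading name.
    text_path = os.path.join(tmp, "not-really.tar")
    with open(text_path, "w") as f:
        f.write("just some text\n")

    results = {
        "example.tar": tarfile.is_tarfile(archive_path),
        "not-really.tar": tarfile.is_tarfile(text_path),
    }

print(results)
```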

The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical of stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

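Because all of these exceptions derive from :exc:`TarError`, callers can catch that single class. The following minimal sketch (the helper ``try_open`` is hypothetical) feeds bytes that are not an archive to :func:`tarfile.open` and reports which exception was raised; with the default *mode* ``'r'`` every decompression method is tried in turn and the final result is :exc:`ReadError`.

```python
import io
import tarfile

# Every specific tarfile exception derives from TarError.
for exc in (tarfile.ReadError, tarfile.CompressionError,
            tarfile.StreamError, tarfile.ExtractError, tarfile.HeaderError):
    assert issubclass(exc, tarfile.TarError)


def try_open(data):
    """Return the exception class raised when opening *data*, or None."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data)):
            return None
    except tarfile.TarError as exc:  # catches ReadError and its siblings
        return type(exc)


caught = try_open(b"this is not a tar archive")
print(caught)
```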


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.

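The format constants matter mostly for long or unusual names. The sketch below writes the same member in each format; the 150-character member name is an arbitrary example, chosen because it contains no ``/`` at which ustar could split it. With :const:`USTAR_FORMAT` the name cannot be stored and :exc:`ValueError` is raised while writing, whereas the GNU and pax formats store it and read it back intact. The format is passed explicitly rather than relying on :const:`DEFAULT_FORMAT`.

```python
import io
import tarfile

long_name = "x" * 150 + ".txt"  # hypothetical: >100 chars, no "/" to split on


def write_member(fmt):
    """Write one empty member named *long_name* in format *fmt*; return bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(long_name))
    return buf.getvalue()


results = {}
for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                   ("gnu", tarfile.GNU_FORMAT),
                   ("pax", tarfile.PAX_FORMAT)]:
    try:
        raw = write_member(fmt)
    except ValueError as exc:  # ustar has no way to store this name
        results[label] = "error: %s" % exc
        continue
    with tarfile.open(fileobj=io.BytesIO(raw)) as tar:
        results[label] = tar.getnames() == [long_name]

print(results)
```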

The following variables are available at module level:


.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows,
   :func:`sys.getfilesystemencoding` otherwise.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object; see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Note
that in the event of an exception, an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context manager protocol.

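A small illustration of the block structure and of the :keyword:`with` statement, writing to an in-memory buffer with :const:`GNU_FORMAT` (an arbitrary choice that keeps a short name in a single header block). In this sketch a 700-byte member occupies one 512-byte header block plus two data blocks, the second one zero-padded. When the block exits, the archive is finalized, and because a *fileobj* was passed, the buffer itself stays open.

```python
import io
import tarfile

BLOCK = 512  # size of one tar block in bytes

data = b"x" * 700  # needs two data blocks, since 700 > 512
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
    info = tarfile.TarInfo("member.bin")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
    # So far: one header block plus two (padded) data blocks.
    position_after_member = buf.tell()

# Leaving the block closed the archive and appended the end-of-archive
# zero blocks; the caller-supplied BytesIO was not closed.
raw = buf.getvalue()
print(position_after_member, len(raw), len(raw) % BLOCK, buf.closed)
```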

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All of the following arguments are optional and can also be accessed as
   instance attributes.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

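A sketch of passing constructor arguments through :func:`tarfile.open`, which accepts the :class:`TarFile` keyword arguments. It writes a :const:`PAX_FORMAT` archive whose global header is built from *pax_headers* (the ``comment`` key, its value and the member name are arbitrary examples) and reads it back; the global header then shows up in :attr:`TarFile.pax_headers` of the reading object.

```python
import io
import tarfile

buf = io.BytesIO()
# Extra keyword arguments of tarfile.open() go to the TarFile constructor.
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                  pax_headers={"comment": "built by example.py"}) as tar:
    content = b"see the comment header\n"
    info = tarfile.TarInfo("readme.txt")
    info.size = len(content)
    tar.addfile(info, io.BytesIO(content))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    # The global header is parsed while reading and exposed on the TarFile.
    global_headers = dict(tar.pax_headers)
    names = tar.getnames()

print(global_headers, names)
```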

.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.

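The following sketch (member names and contents are made up) stores ``notes.txt`` twice to show how :meth:`getnames` and :meth:`getmembers` keep every occurrence in archive order, while :meth:`getmember` returns the last one and raises :exc:`KeyError` for unknown names. The helper ``add_bytes`` is hypothetical.

```python
import io
import tarfile


def add_bytes(tar, name, data):
    """Append an in-memory regular file called *name* to *tar*."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "notes.txt", b"first version\n")
    add_bytes(tar, "other.txt", b"unrelated\n")
    add_bytes(tar, "notes.txt", b"second version\n")  # same name again

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = tar.getnames()                   # archive order, duplicates kept
    sizes = [m.size for m in tar.getmembers()]
    latest = tar.getmember("notes.txt")      # last occurrence wins
    latest_data = tar.extractfile(latest).read()
    missing_raises = False
    try:
        tar.getmember("missing.txt")
    except KeyError:
        missing_raises = True

print(names, sizes, latest_data, missing_raises)
```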

.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.

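Calling :meth:`next` repeatedly walks the archive member by member until it returns :const:`None`. A minimal sketch with three empty placeholder members:

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("a.txt", "b.txt", "c.txt"):  # hypothetical member names
        tar.addfile(tarfile.TarInfo(name))    # empty regular files

buf.seek(0)
seen = []
with tarfile.open(fileobj=buf, mode="r") as tar:
    member = tar.next()
    while member is not None:  # next() returns None after the last member
        seen.append(member.name)
        member = tar.next()

print(seen)
```

Iterating over the :class:`TarFile` object, as in :ref:`tar-examples`, yields the same sequence with less code.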

.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

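One way to perform the recommended inspection is to skip members whose resolved destination falls outside *path* before they reach :meth:`extractall`. This is a simplified sketch, not a complete defence: it inspects member names only and does not, for example, check where symbolic or hard links point. The helper name ``safe_members`` is hypothetical; it plugs into *members* the same way as the generator in :ref:`tar-examples`.

```python
import io
import os
import tarfile
import tempfile


def safe_members(tar, dest):
    """Yield members whose target path stays inside *dest* (names only)."""
    dest_root = os.path.realpath(dest)
    for member in tar:
        # Absolute names and ".." components resolve outside dest_root.
        target = os.path.realpath(os.path.join(dest_root, member.name))
        if os.path.commonpath([dest_root, target]) != dest_root:
            print("skipping suspicious member:", member.name)
            continue
        yield member


# Build an archive with one harmless and one path-traversal member.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("docs/ok.txt", "../escape.txt"):
        tar.addfile(tarfile.TarInfo(name))

buf.seek(0)
with tempfile.TemporaryDirectory() as dest:
    with tarfile.open(fileobj=buf) as tar:
        tar.extractall(path=dest, members=safe_members(tar, dest))
    extracted = sorted(
        os.path.relpath(os.path.join(root, f), dest)
        for root, _, files in os.walk(dest) for f in files
    )

print(extracted)
```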

.. method:: TarFile.extract(member, path="", set_attrs=True)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is :const:`False`.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues,
      such as the directory attribute problems described for :meth:`extractall`.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a :term:`file-like
   object` is returned. If *member* is a link, a file-like object is constructed from
   the link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.

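Reading member data without touching the file system, as a sketch: an in-memory archive holds a placeholder text member and a directory member. The text member's file-like object supports iteration, :meth:`seek` and :meth:`readline`, while the directory yields :const:`None`.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    text = b"line one\nline two\n"
    info = tarfile.TarInfo("poem.txt")
    info.size = len(text)
    tar.addfile(info, io.BytesIO(text))

    directory = tarfile.TarInfo("folder")
    directory.type = tarfile.DIRTYPE
    tar.addfile(directory)

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    f = tar.extractfile("poem.txt")        # read-only file-like object
    lines = [line for line in f]           # iteration yields bytes lines
    f.seek(0)
    first = f.readline()
    dir_result = tar.extractfile("folder")  # neither a regular file nor a link

print(lines, first, dir_result)
```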

.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *exclude* is given, it must be a function that takes one
   filename argument and returns a boolean value. Depending on this value the
   respective file is either excluded (:const:`True`) or added
   (:const:`False`). If *filter* is specified, it must be passed as a keyword
   argument. It should be a function that takes a :class:`TarInfo` object
   argument and returns the changed :class:`TarInfo` object. If it instead
   returns :const:`None`, the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated; please use the *filter* parameter
      instead.

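A sketch of *arcname* and *filter* working together on a throw-away directory tree (``project`` and its files are hypothetical). The filter drops ``*.pyc`` files by returning :const:`None` and resets ownership on everything else; *arcname* keeps the temporary path out of the stored names.

```python
import os
import tarfile
import tempfile


def skip_pyc(tarinfo):
    """Filter for TarFile.add(): drop *.pyc files, normalise ownership."""
    if tarinfo.name.endswith(".pyc"):
        return None                 # excluded from the archive
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo                  # the (modified) member is added


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "project")  # hypothetical source tree
    os.makedirs(os.path.join(src, "pkg"))
    for rel in ("pkg/mod.py", "pkg/mod.pyc", "README"):
        with open(os.path.join(src, rel), "w") as f:
            f.write("example\n")

    archive = os.path.join(tmp, "project.tar")
    with tarfile.open(archive, "w") as tar:
        # arcname stores the tree under "project" instead of the temp path.
        tar.add(src, arcname="project", filter=skip_pyc)

    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
        owners = {m.uname for m in tar.getmembers()}

print(names, owners)
```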

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` so
      that newline translation does not change the number of bytes read.

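Since :meth:`addfile` only needs ``tarinfo.size`` bytes from *fileobj*, data that never existed on disk can be archived by wrapping it in :class:`io.BytesIO`. The helper ``add_bytes`` and the member names below are hypothetical:

```python
import io
import tarfile
import time


def add_bytes(tar, name, data, mode=0o644):
    """Hypothetical helper: store *data* under *name* without a real file."""
    info = tarfile.TarInfo(name)
    info.size = len(data)  # addfile() reads exactly this many bytes
    info.mode = mode
    info.mtime = int(time.time())
    tar.addfile(info, io.BytesIO(data))


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "config/settings.ini", b"[main]\ndebug = false\n")
    add_bytes(tar, "bin/run.sh", b"#!/bin/sh\necho hi\n", mode=0o755)

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    modes = {m.name: oct(m.mode) for m in tar.getmembers()}
    settings = tar.extractfile("config/settings.ini").read()

print(modes, settings)
```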

.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the :term:`file
   object` *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.

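A sketch of the :meth:`gettarinfo` then :meth:`addfile` workflow, using a temporary file (``data.csv`` and ``export/data.csv`` are placeholders): the :class:`TarInfo` object is created from an open binary file, renamed through *arcname*, given different permission bits, and then written together with the file's data.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.csv")  # hypothetical input file
    with open(src, "wb") as f:
        f.write(b"a,b\n1,2\n")

    archive = os.path.join(tmp, "out.tar")
    with tarfile.open(archive, "w") as tar:
        with open(src, "rb") as f:       # binary mode, as recommended for addfile()
            info = tar.gettarinfo(fileobj=f, arcname="export/data.csv")
            info.mode = 0o600            # adjust metadata before adding
            tar.addfile(info, f)

    with tarfile.open(archive) as tar:
        member = tar.getmember("export/data.csv")
        stored = (member.name, member.size, oct(member.mode))

print(stored)
```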

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from the bytes buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a bytes buffer from a :class:`TarInfo` object. For information on the
   arguments, see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification, in seconds since the epoch.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is_*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is a character device, a block device or a FIFO.

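The query methods make it easy to classify members without comparing :attr:`TarInfo.type` against the constants directly. This sketch builds stand-alone :class:`TarInfo` objects (the names are arbitrary) and classifies them with a hypothetical ``describe`` helper; note that :meth:`isdev` is also true for FIFOs.

```python
import tarfile


def describe(member):
    """Classify a TarInfo object with the is*() query methods."""
    if member.isfile():
        return "regular file"
    if member.isdir():
        return "directory"
    if member.issym():
        return "symbolic link -> " + member.linkname
    if member.islnk():
        return "hard link -> " + member.linkname
    if member.isdev():
        return "device or FIFO"
    return "other"


examples = []
for name, type_, linkname in [
    ("file.txt", tarfile.REGTYPE, ""),
    ("dir", tarfile.DIRTYPE, ""),
    ("latest", tarfile.SYMTYPE, "file.txt"),
    ("copy.txt", tarfile.LNKTYPE, "file.txt"),
    ("pipe", tarfile.FIFOTYPE, ""),
]:
    info = tarfile.TarInfo(name)  # stand-alone TarInfo, no archive needed
    info.type = type_
    info.linkname = linkname
    examples.append((name, describe(info)))

for name, kind in examples:
    print(f"{name}: {kind}")
```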

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header;
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem with the original format (which is the basis of all other formats) is
that it has no concept of different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is the module-level :data:`ENCODING`. Depending on
whether the archive is read or written, the metadata must be either decoded or
encoded. If *encoding* is not set appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`codec-base-classes`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

In the case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.

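The difference between the formats can be seen with a single non-ASCII name (``café.txt`` is only an example). In this sketch the archive is written with *encoding* ``'utf-8'`` and read back with ``'latin-1'``: the GNU format yields a damaged name, while the pax format restores it, because the name travels as UTF-8 in an extended header. The helper ``roundtrip`` is hypothetical.

```python
import io
import tarfile

name = "caf\u00e9.txt"  # "café.txt": contains one non-ASCII character


def roundtrip(fmt, write_encoding, read_encoding):
    """Write one member named *name*, read it back, return the decoded name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt,
                      encoding=write_encoding) as tar:
        tar.addfile(tarfile.TarInfo(name))
    buf.seek(0)
    with tarfile.open(fileobj=buf, encoding=read_encoding) as tar:
        return tar.getnames()[0]


# GNU format stores the raw bytes in the writer's encoding; a reader that
# assumes a different encoding sees damaged metadata.
gnu_name = roundtrip(tarfile.GNU_FORMAT, "utf-8", "latin-1")

# pax stores the non-ASCII name as UTF-8 in an extended header, so the
# reader's *encoding* does not matter for it.
pax_name = roundtrip(tarfile.PAX_FORMAT, "utf-8", "latin-1")

print(repr(gnu_name), repr(pax_name))
```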