:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>


The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
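
As an illustration of the two mode families, the following sketch writes a
gzip-compressed archive into memory and reads it back once with a seekable
mode and once as a stream. The member name ``hello.txt`` and the in-memory
buffer are assumptions made for this example only.

```python
import io
import tarfile

payload = b"hello tar\n"

# Write a gzip-compressed archive into memory using the 'w:gz' mode.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# 'r:*' (like plain 'r') detects the compression transparently.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:*") as tar:
    seekable_names = tar.getnames()

# 'r|*' treats the input as a forward-only stream of blocks, so the
# members are consumed in the order in which they appear.
buf.seek(0)
stream_names = []
with tarfile.open(fileobj=buf, mode="r|*") as tar:
    for member in tar:
        stream_names.append(member.name)

print(seekable_names, stream_names)
```

The stream variant is the one to reach for when the source cannot seek, such
as a pipe or a socket; for ordinary files the ``':'`` modes are less
restrictive.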


.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.
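
A short sketch of :func:`is_tarfile` in use. The file names and the temporary
directory are assumptions chosen for the example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    tar_path = os.path.join(tmp, "sample.tar")
    txt_path = os.path.join(tmp, "notes.txt")

    # Create a small archive with a single (empty) member.
    with tarfile.open(tar_path, "w") as tar:
        tar.addfile(tarfile.TarInfo(name="placeholder"))

    # Create a plain text file that is not an archive.
    with open(txt_path, "w") as f:
        f.write("just some text\n")

    results = {
        "sample.tar": tarfile.is_tarfile(tar_path),
        "notes.txt": tarfile.is_tarfile(txt_path),
    }

print(results)
```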


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Raised for the limitations that are typical of stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.
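
The exception hierarchy can be sketched as follows. The byte string passed in
is arbitrary data invented for the example; any input that is not a tar
archive should behave the same way.

```python
import io
import tarfile

not_a_tar = io.BytesIO(b"this is definitely not a tar archive" * 100)

try:
    tarfile.open(fileobj=not_a_tar, mode="r")
except tarfile.ReadError as exc:
    outcome = "ReadError"
    # ReadError derives from TarError, so `except tarfile.TarError`
    # would have caught it as well.
    is_tar_error = isinstance(exc, tarfile.TarError)
else:
    outcome = "opened"
    is_tar_error = False

print(outcome, is_tar_error)
```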


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


The following variables are available on module level:


.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows,
   :func:`sys.getfilesystemencoding` otherwise.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context manager protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.
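
A minimal sketch of the *format* and *pax_headers* arguments, passed through
:func:`tarfile.open` as keyword arguments. The header keys and values and the
member name ``placeholder`` are invented for the example.

```python
import io
import tarfile

global_headers = {"comment": "nightly build", "origin": "example"}

# Write a pax archive whose global header carries the dictionary above.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                  pax_headers=global_headers) as tar:
    tar.addfile(tarfile.TarInfo(name="placeholder"))

# On reading, the global header is exposed via the pax_headers attribute.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    headers_read_back = dict(tar.pax_headers)

print(headers_read_back)
```

Because *buf* is passed as *fileobj*, it stays open after each
:keyword:`with` block and can be rewound for reading.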


.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.
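
The lookup methods above can be tried on a small in-memory archive. The helper
``add_bytes`` and the member names are assumptions made for this sketch; the
archive deliberately stores ``config.ini`` twice.

```python
import io
import tarfile

def add_bytes(tar, name, data):
    """Hypothetical helper: store *data* in *tar* under *name*."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "config.ini", b"v1")
    add_bytes(tar, "readme.txt", b"hello")
    add_bytes(tar, "config.ini", b"version 2")  # stored a second time

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()                      # archive order, duplicates kept
    sizes = [m.size for m in tar.getmembers()]  # same order as getnames()
    latest = tar.getmember("config.ini")        # last occurrence wins
    try:
        tar.getmember("missing.txt")
    except KeyError:
        missing_raises = True
    else:
        missing_raises = False

print(names, sizes, latest.size, missing_raises)
```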


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.
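
One possible form of the "prior inspection" mentioned in the warning is to
pass :meth:`extractall` only those members whose resolved path stays below
the destination directory. The function ``members_inside`` is a hypothetical
helper written for this sketch, not part of :mod:`tarfile`, and it only checks
member names: it does not examine where symbolic or hard links point, so it
should not be mistaken for a complete defence.

```python
import io
import os
import tarfile
import tempfile

def members_inside(tar, dest):
    """Yield only members whose target path stays inside *dest*."""
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) == root:
            yield member

# Build an in-memory archive with one harmless and two suspicious names.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ["ok.txt", "../escape.txt", "/tmp/absolute.txt"]:
        data = b"data"
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tempfile.TemporaryDirectory() as dest, \
        tarfile.open(fileobj=buf, mode="r") as tar:
    accepted = [m.name for m in members_inside(tar, dest)]
    tar.extractall(path=dest, members=members_inside(tar, dest))
    extracted = sorted(os.listdir(dest))

print(accepted, extracted)
```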


.. method:: TarFile.extract(member, path="", set_attrs=True)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is :const:`False`.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a :term:`file-like
   object` is returned. If *member* is a link, a file-like object is constructed from
   the link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.
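
A short sketch of :meth:`extractfile`, reading a member without writing
anything to disk. The member names ``notes.txt`` and ``docs`` are assumptions
made for the example.

```python
import io
import tarfile

text = b"first line\nsecond line\n"

# Build an archive with one regular file and one directory entry.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="notes.txt")
    info.size = len(text)
    tar.addfile(info, io.BytesIO(text))
    folder = tarfile.TarInfo(name="docs")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    member_file = tar.extractfile("notes.txt")
    lines = member_file.readlines()
    member_file.seek(0)              # the returned object supports seeking
    head = member_file.read(5)
    # A directory is neither a regular file nor a link.
    directory_result = tar.extractfile("docs")

print(lines, head, directory_result)
```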
Georg Brandl116aa622007-08-15 14:28:22 +0000371
372
Raymond Hettingera63a3122011-01-26 20:34:14 +0000373.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000374
Raymond Hettingera63a3122011-01-26 20:34:14 +0000375 Add the file *name* to the archive. *name* may be any type of file
376 (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
377 alternative name for the file in the archive. Directories are added
378 recursively by default. This can be avoided by setting *recursive* to
379 :const:`False`. If *exclude* is given, it must be a function that takes one
380 filename argument and returns a boolean value. Depending on this value the
381 respective file is either excluded (:const:`True`) or added
382 (:const:`False`). If *filter* is specified it must be a keyword argument. It
383 should be a function that takes a :class:`TarInfo` object argument and
384 returns the changed :class:`TarInfo` object. If it instead returns
385 :const:`None` the :class:`TarInfo` object will be excluded from the
386 archive. See :ref:`tar-examples` for an example.
Lars Gustäbel049d2aa2009-09-12 10:44:00 +0000387
388 .. versionchanged:: 3.2
389 Added the *filter* parameter.
390
391 .. deprecated:: 3.2
392 The *exclude* parameter is deprecated, please use the *filter* parameter
393 instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000394
Georg Brandl116aa622007-08-15 14:28:22 +0000395
Benjamin Petersona37cfc62008-05-26 13:48:34 +0000396.. method:: TarFile.addfile(tarinfo, fileobj=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000397
398 Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
399 ``tarinfo.size`` bytes are read from it and added to the archive. You can
400 create :class:`TarInfo` objects using :meth:`gettarinfo`.
401
402 .. note::
403
404 On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
405 avoid irritation about the file size.
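
The interplay of :meth:`gettarinfo` and :meth:`addfile` might look like the
following sketch. The paths, the archive name ``reports/q1.txt`` and the user
name ``nobody`` are placeholders chosen for the example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "report.txt")
    with open(src, "wb") as f:
        f.write(b"quarterly numbers\n")

    archive = os.path.join(tmp, "out.tar")
    with tarfile.open(archive, "w") as tar:
        # Collect the file's metadata, then adjust it before adding.
        info = tar.gettarinfo(src, arcname="reports/q1.txt")
        info.uname = info.gname = "nobody"
        with open(src, "rb") as f:  # binary mode, so size and data agree
            tar.addfile(info, f)

    with tarfile.open(archive) as tar:
        stored = tar.getmember("reports/q1.txt")
        stored_summary = (stored.name, stored.size, stored.uname)

print(stored_summary)
```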


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the :term:`file
   object` *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is_*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps a space between "is" and the description that follows.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile

   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo

   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters (reached only when the path can be
  split at a ``/`` into a prefix of at most 155 characters and a name of at
  most 100 characters) and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.
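
The difference in name length limits can be observed by serialising a single
header with :meth:`TarInfo.tobuf` in each format. The 300-character name is an
artificial value picked for this sketch; it contains no ``/``, so the ustar
format has nowhere to split it.

```python
import tarfile

long_name = "a" * 300          # no '/' at which ustar could split the name
info = tarfile.TarInfo(name=long_name)

header_sizes = {}
for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                   ("gnu", tarfile.GNU_FORMAT),
                   ("pax", tarfile.PAX_FORMAT)]:
    try:
        header_sizes[label] = len(info.tobuf(format=fmt))
    except ValueError:
        header_sizes[label] = None   # the format cannot represent this name

print(header_sizes)
```

The GNU and pax results are larger than one 512-byte block because both
formats emit additional blocks to carry the long name.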

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`codec-base-classes`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.

In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
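
The effect of a mismatched *encoding* can be sketched with a GNU format
archive written in Latin-1 and read back twice. The file name and the helper
``first_name`` are assumptions made for the example.

```python
import io
import tarfile

name = "gr\u00fc\u00dfe.txt"   # "grüße.txt", contains non-ASCII characters

# The GNU format stores the name in whatever encoding the writer chose.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                  encoding="iso8859-1") as tar:
    tar.addfile(tarfile.TarInfo(name=name))

def first_name(encoding):
    """Hypothetical helper: reopen the archive and return the first name."""
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r", encoding=encoding) as tar:
        return tar.getnames()[0]

matching = first_name("iso8859-1")
# With the default 'surrogateescape' handler, bytes that are not valid
# UTF-8 are turned into lone surrogates instead of raising an error.
mismatched = first_name("utf-8")

print(ascii(matching), ascii(mismatched))
```

Reading with the writer's encoding recovers the original name, whereas the
mismatched read yields a string that differs from it.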