:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>


The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes are:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

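As a minimal sketch of the two mode families, the following writes a gzip
compressed archive into an in-memory buffer and reads it back, first with
transparent compression detection and then as a forward-only stream. The member
name ``hello.txt`` and its contents are made up for the illustration.

```python
import io
import tarfile

payload = b"hello tar\n"

# Write a gzip-compressed archive using the 'filemode:compression' form.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="hello.txt")  # hypothetical member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Plain 'r' detects the compression transparently.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names_random = tar.getnames()

# 'r|*' processes the same bytes as a stream of blocks, without seeking.
stream_names = []
with tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r|*") as tar:
    for member in tar:
        stream_names.append(member.name)

# 'r:' insists on an uncompressed archive, so the gzip data is rejected.
try:
    tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r:")
    wrong_mode_error = None
except tarfile.ReadError as exc:
    wrong_mode_error = exc
```

Because the buffer is supplied through *fileobj*, it stays open after each
:class:`TarFile` is closed and can be reused.
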

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

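A short sketch of :func:`is_tarfile`, assuming two throwaway files in a
temporary directory (both file names are invented for the example):

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    text_path = os.path.join(tmpdir, "notes.txt")      # hypothetical file names
    archive_path = os.path.join(tmpdir, "sample.tar")

    # A plain text file, and an archive that contains it.
    with open(text_path, "w") as f:
        f.write("this is not a tar archive\n")
    with tarfile.open(archive_path, "w") as tar:
        tar.add(text_path, arcname="notes.txt")

    archive_ok = tarfile.is_tarfile(archive_path)
    text_ok = tarfile.is_tarfile(text_path)
```
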

The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

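Since every exception above derives from :exc:`TarError`, a single ``except``
clause can cover them all. The sketch below feeds made-up garbage bytes to
:func:`tarfile.open` and records what is raised:

```python
import io
import tarfile

garbage = io.BytesIO(b"definitely not a tar archive" * 40)  # made-up input

try:
    tarfile.open(fileobj=garbage, mode="r")
except tarfile.TarError as exc:   # base class catches ReadError as well
    caught = exc
else:
    caught = None

caught_is_read_error = isinstance(caught, tarfile.ReadError)
```
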


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


The following variables are available on module level:


.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows,
   :func:`sys.getfilesystemencoding` otherwise.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context manager protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

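To illustrate *ignore_zeros*, the sketch below concatenates two small archives
in memory and reads the result with and without the option. The helper
``make_archive`` and the member names are inventions of this example, not part
of the module.

```python
import io
import tarfile

def make_archive(name, data):
    """Return the bytes of an uncompressed archive holding one member (sketch helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Two complete archives glued together, each ending in its own empty blocks.
combined = make_archive("a.txt", b"first") + make_archive("b.txt", b"second")

# By default, reading stops at the first empty block.
with tarfile.open(fileobj=io.BytesIO(combined), mode="r:") as tar:
    default_names = tar.getnames()

# With ignore_zeros=True the empty blocks are skipped and reading continues.
with tarfile.open(fileobj=io.BytesIO(combined), mode="r:", ignore_zeros=True) as tar:
    all_names = tar.getnames()
```
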

.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.

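The behaviour of :meth:`getmember` for repeated names can be sketched as
follows; the member name ``config.ini`` and both payloads are made up:

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    # Store the same name twice with different contents.
    for data in (b"old contents", b"new contents"):
        info = tarfile.TarInfo(name="config.ini")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:") as tar:
    occurrences = tar.getnames()                       # both entries are listed
    latest = tar.extractfile(tar.getmember("config.ini")).read()
    try:
        tar.getmember("missing.txt")
        missing_raises = False
    except KeyError:
        missing_raises = True
```
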

.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

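One possible way to carry out the inspection mentioned in the warning is
sketched below. ``suspicious_members`` is a hypothetical helper written for
this example; it only checks member names for the two patterns named above and
is not a complete safety check (link targets and device files are not
examined).

```python
import tarfile

def suspicious_members(members):
    """Yield members whose names are absolute or contain a '..' component."""
    for tarinfo in members:
        name = tarinfo.name.replace("\\", "/")
        if name.startswith("/") or ".." in name.split("/"):
            yield tarinfo

# TarInfo objects standing in for the result of TarFile.getmembers().
candidates = [
    tarfile.TarInfo("docs/readme.txt"),
    tarfile.TarInfo("/etc/passwd"),
    tarfile.TarInfo("../outside.txt"),
]
flagged = [t.name for t in suspicious_members(candidates)]
```

An archive for which this generator yields anything could be rejected before
:meth:`extractall` is called.
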

.. method:: TarFile.extract(member, path="", set_attrs=True)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is False.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a :term:`file-like
   object` is returned. If *member* is a link, a file-like object is constructed from
   the link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.

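A minimal sketch of :meth:`extractfile`, reading a member straight from the
archive without writing it to disk. The archive is built in memory and the
member names are made up:

```python
import io
import tarfile

text = b"line one\nline two\n"

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="notes.txt")      # a regular file
    info.size = len(text)
    tar.addfile(info, io.BytesIO(text))
    directory = tarfile.TarInfo(name="emptydir")  # a directory carries no data
    directory.type = tarfile.DIRTYPE
    tar.addfile(directory)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:") as tar:
    member_file = tar.extractfile("notes.txt")
    lines = member_file.readlines()               # bytes, since the object is binary
    dir_result = tar.extractfile("emptydir")      # neither file nor link
```
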

.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given,
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
   be a function that takes a :class:`TarInfo` object argument and returns the
   changed :class:`TarInfo` object. If it instead returns :const:`None` the :class:`TarInfo`
   object will be excluded from the archive. See :ref:`tar-examples` for an
   example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid problems with the file size.

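:meth:`addfile` also works with data that never existed on disk. In this
sketch the :class:`TarInfo` object is built by hand rather than with
:meth:`gettarinfo`; the member name is made up, and the key point is that
``size`` must be set before the call.

```python
import io
import tarfile
import time

data = b"generated in memory\n"

info = tarfile.TarInfo(name="generated.txt")  # hypothetical member name
info.size = len(data)          # addfile() reads exactly this many bytes
info.mtime = int(time.time())  # a fresh TarInfo has mtime 0

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:") as tar:
    stored = tar.getmember("generated.txt")
    stored_size = stored.size
    stored_data = tar.extractfile(stored).read()
```
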

.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the :term:`file
   object` *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.

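The combination of :meth:`gettarinfo` and :meth:`addfile` might look like the
following sketch. The file names and the ``"nobody"`` owner are assumptions
made for the example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    source = os.path.join(tmpdir, "report.txt")  # hypothetical file
    with open(source, "wb") as f:
        f.write(b"quarterly numbers\n")

    archive = os.path.join(tmpdir, "out.tar")
    with tarfile.open(archive, "w") as tar:
        # Build the header from the file on disk, stored under another name.
        info = tar.gettarinfo(source, arcname="docs/report.txt")
        info.uname = info.gname = "nobody"       # adjust attributes before adding
        with open(source, "rb") as f:
            tar.addfile(info, f)

    with tarfile.open(archive) as tar:
        member = tar.getmember("docs/report.txt")
        member_summary = (member.name, member.size, member.uname)
```
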

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is_*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.

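The query methods can be combined into a simple classifier, sketched here with
hand-built :class:`TarInfo` objects. ``describe`` is a hypothetical helper and
the member names are made up.

```python
import tarfile

def describe(tarinfo):
    """Return a short label for a member's type (helper for this sketch)."""
    if tarinfo.isfile():
        return "file"
    if tarinfo.isdir():
        return "directory"
    if tarinfo.issym():
        return "symlink"
    if tarinfo.islnk():
        return "hardlink"
    if tarinfo.isdev():
        return "device"
    return "other"

regular = tarfile.TarInfo("data.bin")   # REGTYPE is the default type
folder = tarfile.TarInfo("pkg")
folder.type = tarfile.DIRTYPE
link = tarfile.TarInfo("latest")
link.type = tarfile.SYMTYPE
link.linkname = "data.bin"
pipe = tarfile.TarInfo("queue")
pipe.type = tarfile.FIFOTYPE            # isdev() also covers FIFOs

labels = [describe(t) for t in (regular, folder, link, pipe)]
```
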

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: Extended headers only affect the subsequent file header; global
  headers are valid for the complete archive and affect all following files. All
  the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

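The filename limit of the ustar format can be observed with a short
experiment. This sketch stores one empty member whose name is a single
150-character path component (an arbitrary choice for the example) in each of
the three formats. ``try_format`` is a hypothetical helper.

```python
import io
import tarfile

long_name = "x" * 150   # one path component, longer than ustar can split and store

def try_format(fmt):
    """Write one empty member in the given format; return the name read back, or the error."""
    info = tarfile.TarInfo(name=long_name)
    buf = io.BytesIO()
    try:
        with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
            tar.addfile(info)
    except ValueError as exc:
        return exc
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:") as tar:
        return tar.getnames()[0]

ustar_result = try_format(tarfile.USTAR_FORMAT)  # rejected: name is too long
gnu_result = try_format(tarfile.GNU_FORMAT)      # stored via the longname extension
pax_result = try_format(tarfile.PAX_FORMAT)      # stored in a pax extended header
```
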
.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`codec-base-classes`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.

In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.

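The effect described above can be reproduced in a few lines. This sketch
writes a made-up non-ASCII member name with ``encoding="utf-8"`` and reads it
back with ``encoding="iso-8859-1"``, once per format. ``roundtrip`` is a
hypothetical helper.

```python
import io
import tarfile

name = "h\u00e4user.txt"   # 'häuser.txt', a made-up non-ASCII member name

def roundtrip(fmt):
    """Write the name as UTF-8 in the given format, then read it back as Latin-1."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding="utf-8") as tar:
        tar.addfile(tarfile.TarInfo(name=name))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:", encoding="iso-8859-1") as tar:
        return tar.getnames()[0]

gnu_name = roundtrip(tarfile.GNU_FORMAT)   # raw bytes are reinterpreted: the name is damaged
pax_name = roundtrip(tarfile.PAX_FORMAT)   # the pax header is UTF-8, so the name survives
```

In the GNU case the two UTF-8 bytes of ``ä`` come back as two separate Latin-1
characters, which is the kind of damage the pax format avoids.
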