:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Appending to a compressed archive (``'a:gz'`` or ``'a:bz2'``) is not possible.
   If *mode* is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a compression
   method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is expected to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
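
The following sketch exercises a few of the modes above on an in-memory archive: ``'w:gz'`` to write, ``'r'`` to detect the compression, ``'r:'`` to provoke :exc:`ReadError`, and ``'r|*'`` to read the same bytes as a forward-only stream. It assumes an :class:`io.BytesIO` buffer as *fileobj*; the member name ``hello.txt`` is made up for illustration.

```python
import io
import tarfile


def build_gz_archive() -> bytes:
    """Write a one-member, gzip-compressed archive ('w:gz') into memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"hello, tar\n"
        info = tarfile.TarInfo(name="hello.txt")  # hypothetical member name
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


archive = build_gz_archive()

# 'r' (the same as 'r:*') detects the gzip compression transparently.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
    names = tar.getnames()

# 'r:' insists on an uncompressed archive, so gzip data raises ReadError.
try:
    tarfile.open(fileobj=io.BytesIO(archive), mode="r:")
    plain_mode_rejected = False
except tarfile.ReadError:
    plain_mode_rejected = True

# 'r|*' reads the same bytes as a forward-only stream of blocks.
stream_contents = {}
with tarfile.open(fileobj=io.BytesIO(archive), mode="r|*") as tar:
    for member in tar:  # members are handled strictly in archive order
        stream_contents[member.name] = tar.extractfile(member).read()
```

In stream mode each member's data has to be read while iterating, before moving on to the next member, because the underlying object is never seeked backwards.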


.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.
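
A minimal sketch of :func:`is_tarfile`, assuming a scratch directory from :mod:`tempfile` and two hypothetical files: a plain text file and an archive built from it.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "notes.txt")  # hypothetical plain file
    with open(text_path, "w") as f:
        f.write("just some text\n")

    tar_path = os.path.join(tmp, "notes.tar")   # archive built from it
    with tarfile.open(tar_path, "w") as tar:
        tar.add(text_path, arcname="notes.txt")

    results = {
        "notes.txt": tarfile.is_tarfile(text_path),
        "notes.tar": tarfile.is_tarfile(tar_path),
    }
```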


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.
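
Because every module-specific exception derives from :exc:`TarError`, a single ``except`` clause can catch them all. The helper ``list_members`` below is hypothetical and only meant as a sketch:

```python
import io
import tarfile


def list_members(data: bytes):
    """Hypothetical helper: member names, or None if *data* is unreadable.

    Catching TarError covers ReadError, CompressionError, HeaderError
    and the other tarfile-specific exceptions in one clause.
    """
    try:
        with tarfile.open(fileobj=io.BytesIO(data)) as tar:
            return tar.getnames()
    except tarfile.TarError:
        return None


# Every exception documented above derives from TarError.
all_derive_from_tarerror = all(
    issubclass(exc, tarfile.TarError)
    for exc in (tarfile.ReadError, tarfile.CompressionError,
                tarfile.StreamError, tarfile.ExtractError,
                tarfile.HeaderError)
)

not_an_archive = list_members(b"definitely not a tar archive")
```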



Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.
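
The practical difference between the formats shows up with long names. This sketch assumes an in-memory buffer and an arbitrary 300-character member name without slashes; it expects :const:`USTAR_FORMAT` to reject the name with :exc:`ValueError`, while :const:`GNU_FORMAT` and :const:`PAX_FORMAT` store it intact.

```python
import io
import tarfile

LONG_NAME = "x" * 300  # arbitrary name, too long for the ustar name fields


def archive_with_long_name(fmt) -> bytes:
    """Store one empty member called LONG_NAME using archive format *fmt*."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(LONG_NAME))
    return buf.getvalue()


try:
    archive_with_long_name(tarfile.USTAR_FORMAT)
    ustar_accepted = True
except ValueError:  # ustar cannot split a slash-free 300-character name
    ustar_accepted = False

names_by_format = {}
for label, fmt in (("gnu", tarfile.GNU_FORMAT), ("pax", tarfile.PAX_FORMAT)):
    with tarfile.open(fileobj=io.BytesIO(archive_with_long_name(fmt))) as tar:
        names_by_format[label] = tar.getnames()
```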


The following variables are available at module level:


.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows,
   :func:`sys.getfilesystemencoding` otherwise.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context manager protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All of the following arguments are optional and can also be accessed as
   instance attributes.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.
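
A sketch of the *pax_headers* argument, assuming an in-memory archive and a made-up ``comment`` key. When the archive is read back, the global header shows up in :attr:`TarFile.pax_headers`. One regular member is included as well so that the archive has something to list.

```python
import io
import tarfile

buf = io.BytesIO()
# pax_headers is only honoured together with format=PAX_FORMAT.
# The "comment" key and its value are made up for this sketch.
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                  pax_headers={"comment": "built by an example"}) as tar:
    data = b"payload"
    info = tarfile.TarInfo("data.bin")  # hypothetical member
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()
    global_headers = dict(tar.pax_headers)  # filled from the global header
```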


.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If the optional *members* argument is given, it must be a
   subset of the list returned by :meth:`getmembers`. Directory information like
   owner, modification time and permissions is set after all members have been
   extracted. This works around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files into it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.
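
One way to do that inspection is to keep only members whose destination resolves inside *path*. The sketch below assumes :func:`os.path.commonpath` (Python 3.5 and later); the helpers ``is_within`` and ``safe_members`` are hypothetical, and the check looks at member names only, so link targets would still need separate scrutiny.

```python
import io
import os
import tarfile
import tempfile


def is_within(directory, target):
    """True if *target* resolves to a location inside *directory*."""
    base = os.path.realpath(directory)
    dest = os.path.realpath(target)
    return os.path.commonpath([base, dest]) == base


def safe_members(tar, path):
    """Yield only members whose destination stays inside *path*.

    Minimal sketch: member names are checked, link targets are not.
    """
    for member in tar.getmembers():
        if is_within(path, os.path.join(path, member.name)):
            yield member


# Build an archive with one harmless and one escaping member name.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("ok.txt", "../evil.txt"):
        data = name.encode()
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tempfile.TemporaryDirectory() as dest, tarfile.open(fileobj=buf) as tar:
    kept = [m.name for m in safe_members(tar, dest)]
    tar.extractall(dest, members=safe_members(tar, dest))
    extracted = sorted(os.listdir(dest))
```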


.. method:: TarFile.extract(member, path="", set_attrs=True)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is :const:`False`.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues,
      such as setting directory attributes only after the directory's contents
      have been extracted. In most cases you should consider using the
      :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a :term:`file-like
   object` is returned. If *member* is a link, a file-like object is constructed from
   the link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.
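
A sketch of reading member data without writing anything to disk, assuming an in-memory archive with hypothetical member names. It uses line iteration and :meth:`seek` on the returned object, and shows that a directory member yields :const:`None`.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    folder = tarfile.TarInfo("notes")          # hypothetical directory member
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

    text = "line one\nline two\n".encode("utf-8")
    info = tarfile.TarInfo("notes/todo.txt")   # hypothetical file member
    info.size = len(text)
    tar.addfile(info, io.BytesIO(text))

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    f = tar.extractfile("notes/todo.txt")
    lines = [line.decode("utf-8") for line in f]  # iteration yields bytes lines
    f.seek(0)
    head = f.read(4)

    directory = next(m for m in tar.getmembers() if m.isdir())
    directory_result = tar.extractfile(directory)  # neither file nor link
```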
Georg Brandl116aa622007-08-15 14:28:22 +0000375
376
Raymond Hettingera63a3122011-01-26 20:34:14 +0000377.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000378
Raymond Hettingera63a3122011-01-26 20:34:14 +0000379 Add the file *name* to the archive. *name* may be any type of file
380 (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
381 alternative name for the file in the archive. Directories are added
382 recursively by default. This can be avoided by setting *recursive* to
383 :const:`False`. If *exclude* is given, it must be a function that takes one
384 filename argument and returns a boolean value. Depending on this value the
385 respective file is either excluded (:const:`True`) or added
386 (:const:`False`). If *filter* is specified it must be a keyword argument. It
387 should be a function that takes a :class:`TarInfo` object argument and
388 returns the changed :class:`TarInfo` object. If it instead returns
389 :const:`None` the :class:`TarInfo` object will be excluded from the
390 archive. See :ref:`tar-examples` for an example.
Lars Gustäbel049d2aa2009-09-12 10:44:00 +0000391
392 .. versionchanged:: 3.2
393 Added the *filter* parameter.
394
395 .. deprecated:: 3.2
396 The *exclude* parameter is deprecated, please use the *filter* parameter
397 instead.
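
A sketch of a *filter* callback that drops some members by returning :const:`None` and rewrites the rest. The ``pkg`` tree, the ``.pyc`` rule and the ``"nobody"`` owner name are illustrative assumptions.

```python
import os
import tarfile
import tempfile


def skip_bytecode(tarinfo):
    """Illustrative filter: drop *.pyc members, relabel the owner of the rest."""
    if tarinfo.name.endswith(".pyc"):
        return None  # returning None excludes the member from the archive
    tarinfo.uname = tarinfo.gname = "nobody"  # arbitrary example value
    return tarinfo


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "pkg")  # hypothetical source tree
    os.mkdir(src)
    for fname in ("mod.py", "mod.pyc"):
        with open(os.path.join(src, fname), "w") as f:
            f.write("# placeholder\n")

    archive = os.path.join(tmp, "pkg.tar")
    with tarfile.open(archive, "w") as tar:
        # Directories are added recursively; the filter sees every member.
        tar.add(src, arcname="pkg", filter=skip_bytecode)

    with tarfile.open(archive) as tar:
        stored = sorted(tar.getnames())
        owners = {m.uname for m in tar.getmembers()}
```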


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid discrepancies between the recorded file size and the data actually read.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the :term:`file
   object` *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.
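
A sketch of the :meth:`gettarinfo` and :meth:`addfile` pair: build a header from an existing file, adjust it, then copy exactly ``info.size`` bytes from a file opened in binary mode. The file name, its contents and the ``0o600`` mode are made up.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.csv")  # hypothetical input file
    with open(path, "wb") as f:
        f.write(b"a,b\n1,2\n")

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Build the header from the file's stat data, then adjust it.
        info = tar.gettarinfo(path, arcname="data/report.csv")
        info.mode = 0o600
        with open(path, "rb") as f:  # binary mode, as the note advises
            tar.addfile(info, f)     # reads exactly info.size bytes

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    member = tar.getmember("data/report.csv")
    stored = (member.size, member.mode, tar.extractfile(member).read())
```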


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*. For
   information on *encoding* and *errors*, see the constructor of the
   :class:`TarFile` class.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.
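
A round trip through :meth:`TarInfo.tobuf` and :meth:`TarInfo.frombuf`, as a sketch with a hypothetical member. It passes :data:`ENCODING` and ``'surrogateescape'`` explicitly to :meth:`frombuf` and shows that a block of garbage is rejected with :exc:`HeaderError`.

```python
import tarfile

info = tarfile.TarInfo("hello.txt")  # hypothetical member
info.size = 5
info.mtime = 1_000_000

# A plain ustar header occupies exactly one 512-byte block.
block = info.tobuf(format=tarfile.USTAR_FORMAT)

# frombuf() is given the encoding and error handler for the text fields.
parsed = tarfile.TarInfo.frombuf(block, tarfile.ENCODING, "surrogateescape")

try:
    tarfile.TarInfo.frombuf(b"x" * 512, tarfile.ENCODING, "surrogateescape")
    garbage_rejected = False
except tarfile.HeaderError:  # e.g. the checksum field is not an octal number
    garbage_rejected = True
```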


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.
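
A sketch that builds an archive with a directory, a regular file and a symbolic link (all hypothetical) and labels each member with the query methods. Member names are compared with trailing slashes stripped, in case directory names are stored with one.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    folder = tarfile.TarInfo("docs")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

    body = b"read me"
    readme = tarfile.TarInfo("docs/readme.txt")
    readme.size = len(body)
    tar.addfile(readme, io.BytesIO(body))

    link = tarfile.TarInfo("latest")
    link.type = tarfile.SYMTYPE
    link.linkname = "docs/readme.txt"  # symlink members carry no data
    tar.addfile(link)


def describe(member):
    """Map a TarInfo to a short label using the is*() query methods."""
    if member.isfile():
        return "file"
    if member.isdir():
        return "directory"
    if member.issym():
        return "symlink -> " + member.linkname
    return "other"


buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    kinds = {m.name.rstrip("/"): describe(m) for m in tar}
```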


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the rest of the sentence on the same line
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.
691
Georg Brandl116aa622007-08-15 14:28:22 +0000692.. _tar-unicode:
693
694Unicode issues
695--------------
696
697The tar format was originally conceived to make backups on tape drives with the
698main focus on preserving file system information. Nowadays tar archives are
699commonly used for file distribution and exchanging archives over networks. One
Lars Gustäbel3741eff2007-08-21 12:17:05 +0000700problem of the original format (which is the basis of all other formats) is
701that there is no concept of supporting different character encodings. For
Georg Brandl116aa622007-08-15 14:28:22 +0000702example, an ordinary tar archive created on a *UTF-8* system cannot be read
Lars Gustäbel3741eff2007-08-21 12:17:05 +0000703correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
704metadata (like filenames, linknames, user/group names) will appear damaged.
705Unfortunately, there is no way to autodetect the encoding of an archive. The
706pax format was designed to solve this problem. It stores non-ASCII metadata
707using the universal character encoding *UTF-8*.
Georg Brandl116aa622007-08-15 14:28:22 +0000708
Lars Gustäbel3741eff2007-08-21 12:17:05 +0000709The details of character conversion in :mod:`tarfile` are controlled by the
710*encoding* and *errors* keyword arguments of the :class:`TarFile` class.
Georg Brandl116aa622007-08-15 14:28:22 +0000711
Lars Gustäbel3741eff2007-08-21 12:17:05 +0000712*encoding* defines the character encoding to use for the metadata in the
713archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
714as a fallback. Depending on whether the archive is read or written, the
715metadata must be either decoded or encoded. If *encoding* is not set
716appropriately, this conversion may fail.
Georg Brandl116aa622007-08-15 14:28:22 +0000717
718The *errors* argument defines how characters are treated that cannot be
Victor Stinnerde629d42010-05-05 21:43:57 +0000719converted. Possible values are listed in section :ref:`codec-base-classes`.
720The default scheme is ``'surrogateescape'`` which Python also uses for its
721file system calls, see :ref:`os-filenames`.
Georg Brandl116aa622007-08-15 14:28:22 +0000722
Lars Gustäbel1465cc22010-05-17 18:02:50 +0000723In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
724because all the metadata is stored using *UTF-8*. *encoding* is only used in
725the rare cases when binary pax headers are decoded or when strings with
726surrogate characters are stored.
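
A sketch of the effects described above, assuming a hypothetical filename ``café.txt``. The GNU archive stores the name in whatever *encoding* it was written with, so reading it back with a different encoding under ``'surrogateescape'`` yields surrogate characters, while the pax archive carries the name as *UTF-8* and reads back correctly even with ``encoding='ascii'``.

```python
import io
import tarfile

NAME = "café.txt"  # hypothetical non-ASCII filename


def write(fmt, encoding):
    """Store an empty member called NAME and return the archive bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt,
                      encoding=encoding) as tar:
        tar.addfile(tarfile.TarInfo(NAME))
    return buf.getvalue()


def read_name(data, encoding):
    """Return the first member name, decoded with *encoding*."""
    with tarfile.open(fileobj=io.BytesIO(data), encoding=encoding,
                      errors="surrogateescape") as tar:
        return tar.getnames()[0]


gnu_latin1 = write(tarfile.GNU_FORMAT, "latin-1")
pax_archive = write(tarfile.PAX_FORMAT, "latin-1")

results = {
    "gnu read as latin-1": read_name(gnu_latin1, "latin-1"),
    "gnu read as utf-8": read_name(gnu_latin1, "utf-8"),  # undecodable byte
    "pax read as ascii": read_name(pax_archive, "ascii"),
}
```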