:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` and ``'a:xz'`` are not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is expected to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, :func:`tarfile.open`
   accepts the keyword argument *compresslevel* to specify the compression level of
   the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes are:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+


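
As a minimal sketch of the modes above, the following writes a small gzip-compressed archive into an in-memory :class:`io.BytesIO` buffer (passed as *fileobj*) and reads it back twice: once with ``'r:*'``, which detects the compression, and once as a forward-only stream with ``'r|gz'``. The member name ``hello.txt`` and its contents are arbitrary placeholders, and *compresslevel* is set only to show where it goes.

```python
import io
import tarfile

payload = b"hello, tar\n"

# Write a gzip-compressed archive into an in-memory buffer.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz", compresslevel=1) as tar:
    info = tarfile.TarInfo("hello.txt")  # arbitrary member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
archive_bytes = buf.getvalue()

# 'r:*' detects the gzip compression and allows random access.
with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
    random_access_names = tar.getnames()

# 'r|gz' reads the same bytes as a forward-only stream of blocks.
with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r|gz") as tar:
    stream_names = [member.name for member in tar]

print(random_access_names, stream_names)  # ['hello.txt'] ['hello.txt']
```

In stream mode the members are consumed in order while iterating; calling methods that need to seek backwards is what raises :exc:`StreamError`.
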
.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


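
A small sketch of :func:`is_tarfile`, assuming a temporary directory as scratch space; the file names ``notes.txt`` and ``notes.tar`` are placeholders. Only the freshly built archive is reported as readable:

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "notes.txt")  # hypothetical file names
    archive_path = os.path.join(tmp, "notes.tar")

    with open(data_path, "w") as f:
        f.write("not an archive\n")

    # Build a small uncompressed archive containing the text file.
    with tarfile.open(archive_path, "w") as tar:
        tar.add(data_path, arcname="notes.txt")

    archive_is_tar = tarfile.is_tarfile(archive_path)
    text_is_tar = tarfile.is_tarfile(data_path)

print(archive_is_tar, text_is_tar)  # True False
```
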
The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


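
A sketch of how these exceptions surface in practice, assuming arbitrary non-archive bytes wrapped in :class:`io.BytesIO`. Opening them with the default transparent mode fails for every compression method, and the resulting :exc:`ReadError` could equally be caught through the common base class :exc:`TarError`:

```python
import io
import tarfile

bogus = io.BytesIO(b"this is plainly not a tar archive")

try:
    tarfile.open(fileobj=bogus, mode="r")
except tarfile.ReadError as exc:
    caught = exc  # keep a reference; 'exc' is unbound after the block
else:
    caught = None

# Every tarfile exception derives from TarError, so a single
# "except tarfile.TarError" handler would also have caught this one.
print(type(caught).__name__, isinstance(caught, tarfile.TarError))
```
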
The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object; see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

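
A short sketch of the :keyword:`with` form, writing to a caller-supplied :class:`io.BytesIO` buffer rather than a named file (an assumption made to keep the example self-contained). Because the buffer is passed as *fileobj*, closing the archive does not close it, so the finished archive can be read back from the same object. The member name is arbitrary.

```python
import io
import tarfile

buf = io.BytesIO()

# Leaving the with block finalizes the archive (end-of-archive blocks).
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"spam and eggs"
    info = tarfile.TarInfo("menu.txt")  # arbitrary member name
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# The caller-supplied buffer is not closed, so it can be read back.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()

print(buf.closed, names)  # False ['menu.txt']
```
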
.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are
   handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.


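
To illustrate *ignore_zeros*, the sketch below concatenates two complete archives, as ``cat a.tar b.tar`` would. The helper ``make_archive()`` and the member names are hypothetical. With the default setting, reading stops at the end-of-archive zero blocks of the first archive; with ``ignore_zeros=True`` both members are found:

```python
import io
import tarfile

def make_archive(name, data):
    """Return the bytes of a one-member uncompressed archive (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Two complete archives glued together, as `cat a.tar b.tar` would produce.
joined = make_archive("a.txt", b"first") + make_archive("b.txt", b"second")

# Default: the first run of zero blocks is treated as the end of the archive.
with tarfile.open(fileobj=io.BytesIO(joined), mode="r:") as tar:
    default_names = tar.getnames()

# ignore_zeros=True skips the zero blocks and keeps reading.
with tarfile.open(fileobj=io.BytesIO(joined), mode="r:", ignore_zeros=True) as tar:
    all_names = tar.getnames()

print(default_names, all_names)  # ['a.txt'] ['a.txt', 'b.txt']
```
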
.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


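
A sketch of the duplicate-member rule in the note above, assuming an in-memory archive in which the hypothetical member ``config.ini`` is stored twice. :meth:`getnames` lists both occurrences, while :meth:`getmember` returns the later one, and an unknown name raises :exc:`KeyError`:

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    # Store the same member name twice with different contents.
    for version in (b"version 1", b"version 2"):
        info = tarfile.TarInfo("config.ini")  # hypothetical member name
        info.size = len(version)
        tar.addfile(info, io.BytesIO(version))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()  # both occurrences are listed
    latest = tar.getmember("config.ini")  # the last occurrence wins
    content = tar.extractfile(latest).read()
    try:
        tar.getmember("missing.txt")
    except KeyError:
        missing_raises = True
    else:
        missing_raises = False

print(names, content, missing_raises)
```
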
.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


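
A sketch of :meth:`TarFile.list` with the *members* parameter, capturing the output with :func:`contextlib.redirect_stdout` so it can be inspected. The member names are placeholders, and the exact whitespace printed around each name may differ between Python versions, so the captured text is split into words:

```python
import contextlib
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("README", "setup.py", "docs.txt"):  # arbitrary names
        tar.addfile(tarfile.TarInfo(name))  # empty members, no data needed

buf.seek(0)
out = io.StringIO()
with tarfile.open(fileobj=buf, mode="r") as tar:
    # Restrict the listing to a subset of getmembers().
    subset = [m for m in tar.getmembers() if m.name.endswith(".py")]
    with contextlib.redirect_stdout(out):
        tar.list(verbose=False, members=subset)

listed = out.getvalue().split()
print(listed)  # ['setup.py']
```
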
.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files into it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.


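
Acting on this warning usually means filtering the members before extraction. The following is a minimal sketch, not a complete defence: the generator ``within_directory()`` is hypothetical and only checks that each member's resolved target path stays inside the destination directory (it does not, for example, inspect link targets). The in-memory archive deliberately contains one member whose name escapes via ``..``:

```python
import io
import os
import tarfile
import tempfile

def within_directory(members, dest):
    """Yield only members whose resolved path stays inside *dest* (hypothetical helper)."""
    dest = os.path.realpath(dest)
    for member in members:
        target = os.path.realpath(os.path.join(dest, member.name))
        if os.path.commonpath([dest, target]) == dest:
            yield member

# Build an archive with one harmless member and one that escapes via "..".
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("good.txt", "../evil.txt"):
        data = name.encode()
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, "out")
    os.mkdir(dest)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        tar.extractall(path=dest, members=within_directory(tar.getmembers(), dest))
    extracted = sorted(os.listdir(dest))
    escaped = os.path.exists(os.path.join(tmp, "evil.txt"))

print(extracted, escaped)  # ['good.txt'] False
```
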
.. method:: TarFile.extract(member, path="", set_attrs=True)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is false.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues,
      such as the directory modification time and permission problems described
      for :meth:`extractall`. In most cases you should consider using the
      :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.


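
A sketch of reading member data without touching the file system, assuming an in-memory archive built by hand with a directory entry ``pkg`` and a file ``pkg/main.py`` (both hypothetical). The file's bytes come back through :meth:`extractfile`, while the directory, which has no data, yields :const:`None`:

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    # A directory entry, built by hand (no data blocks).
    folder = tarfile.TarInfo("pkg")
    folder.type = tarfile.DIRTYPE
    folder.mode = 0o755
    tar.addfile(folder)

    # A regular file inside it.
    data = b"print('hi')\n"
    info = tarfile.TarInfo("pkg/main.py")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    # Read the file's bytes into memory; nothing is written to disk.
    source = tar.extractfile("pkg/main.py").read()
    # Directories carry no data, so extractfile() returns None.
    folder_obj = tar.extractfile("pkg")

print(source, folder_obj)
```
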
.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *exclude* is given, it must be a function that takes one
   filename argument and returns a boolean value. Depending on this value the
   respective file is either excluded (:const:`True`) or added
   (:const:`False`). If *filter* is specified, it must be a keyword argument. It
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None`, the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.


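
A sketch of a *filter* function that both drops entries and edits the rest, assuming a throwaway project directory created in a temporary location (the layout and the name ``skip_bytecode`` are hypothetical). Returning :const:`None` excludes ``app.pyc``; every other entry is kept with its owner names replaced:

```python
import os
import tarfile
import tempfile

def skip_bytecode(tarinfo):
    """Drop compiled files and anonymize ownership of everything else."""
    if tarinfo.name.endswith(".pyc"):
        return None  # excluded from the archive
    tarinfo.uname = tarinfo.gname = "nobody"
    return tarinfo

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "project")  # hypothetical project layout
    os.mkdir(src)
    for name in ("app.py", "app.pyc"):
        with open(os.path.join(src, name), "w") as f:
            f.write("# placeholder\n")

    archive = os.path.join(tmp, "project.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # Directories are added recursively; the filter sees every entry.
        tar.add(src, arcname="project", filter=skip_bytecode)

    with tarfile.open(archive, "r:gz") as tar:
        names = sorted(tar.getnames())
        owners = {m.uname for m in tar.getmembers()}

print(names, owners)  # ['project', 'project/app.py'] {'nobody'}
```
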
.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` so
      that newline translation cannot make the data read differ from the file size.


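
When the data to archive only exists in memory, a :class:`TarInfo` can be built by hand and paired with an :class:`io.BytesIO` object. A minimal sketch, assuming an arbitrary member name; *size* must be set explicitly because :meth:`addfile` reads exactly ``tarinfo.size`` bytes from *fileobj*:

```python
import io
import tarfile
import time

text = "generated at build time\n".encode("utf-8")

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("build/info.txt")  # arbitrary member name
    info.size = len(text)  # addfile() reads exactly this many bytes
    info.mtime = int(time.time())  # a bare TarInfo() has mtime 0
    info.mode = 0o644
    tar.addfile(info, io.BytesIO(text))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    member = tar.getmember("build/info.txt")
    stored = tar.extractfile(member).read()

print(member.size, stored == text)
```
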
.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the :term:`file
   object` *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


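
A sketch combining :meth:`gettarinfo` and :meth:`addfile`, assuming a scratch file ``report.csv`` in a temporary directory (a hypothetical name). The attributes are filled in from the file's status information, one of them is changed, and the file is then stored under a different *arcname*:

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.csv")  # hypothetical input file
    with open(path, "wb") as f:
        f.write(b"id,value\n1,42\n")

    archive = os.path.join(tmp, "reports.tar")
    with tarfile.open(archive, "w") as tar:
        # Size, mtime, mode and ownership come from os.lstat() on the file.
        info = tar.gettarinfo(path, arcname="data/report.csv")
        info.mode = 0o600  # adjust an attribute before adding the member
        with open(path, "rb") as f:  # binary mode, as recommended for addfile()
            tar.addfile(info, f)

    with tarfile.open(archive) as tar:
        member = tar.getmember("data/report.csv")
        summary = (member.name, member.size, oct(member.mode))

print(summary)  # ('data/report.csv', 14, '0o600')
```
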
.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



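
A sketch of global pax headers, assuming an in-memory archive and an arbitrary value for the pax ``comment`` keyword. The *pax_headers* passed when writing are stored only because *format* is :const:`PAX_FORMAT`; once the archive is opened for reading, they appear in :attr:`TarFile.pax_headers`:

```python
import io
import tarfile

buf = io.BytesIO()
# A global pax header is written only when format is PAX_FORMAT.
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                  pax_headers={"comment": "nightly build"}) as tar:
    data = b"payload"
    info = tarfile.TarInfo("payload.bin")  # arbitrary member name
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    # Opening the archive reads the first header, including the global one.
    global_headers = dict(tar.pax_headers)
    names = tar.getnames()

print(global_headers, names)
```
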
.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from bytes buffer *buf*, using
   *encoding* and *errors* to decode its fields.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a bytes buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


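
A round-trip sketch for :meth:`TarInfo.tobuf` and :meth:`TarInfo.frombuf`, assuming an arbitrary member name and size. For a short name in ustar format, :meth:`tobuf` produces a single 512-byte header block, which :meth:`frombuf` parses back; a block of garbage is rejected with :exc:`HeaderError`:

```python
import tarfile

info = tarfile.TarInfo("hello.txt")  # arbitrary member name
info.size = 5

header = info.tobuf(format=tarfile.USTAR_FORMAT)  # one 512-byte header block
restored = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")

# A block of garbage fails the header field and checksum checks.
try:
    tarfile.TarInfo.frombuf(b"x" * len(header), tarfile.ENCODING, "surrogateescape")
except tarfile.HeaderError:
    garbage_rejected = True
else:
    garbage_rejected = False

print(len(header), restored.name, restored.size, garbage_rejected)
```
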
A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification, in seconds since the epoch.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


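
A sketch that classifies members with these query methods, assuming a hand-built in-memory archive containing a directory, a regular file and a symbolic link (all names and the link target are hypothetical; no data is stored):

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    entries = [
        ("docs", tarfile.DIRTYPE, ""),
        ("docs/guide.txt", tarfile.REGTYPE, ""),
        ("latest", tarfile.SYMTYPE, "docs/guide.txt"),
    ]
    for name, kind, target in entries:  # hypothetical names and link target
        info = tarfile.TarInfo(name)
        info.type = kind
        info.linkname = target
        tar.addfile(info)  # size 0: no data blocks needed

def describe(member):
    """Classify a member with the TarInfo query methods."""
    if member.isdir():
        return "directory"
    if member.issym():
        return "symlink -> " + member.linkname
    if member.isfile():
        return "regular file"
    return "other"

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    kinds = {m.name: describe(m) for m in tar.getmembers()}

print(kinds)
```
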
.. _tarfile-commandline:

Command Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included::

   $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable::

   $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option::

   $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name::

   $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option::

   $ python -m tarfile -l monty.tar


Command line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> <sourceN>
               --create <tarfile> <source1> <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

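
The same commands can be driven from a script. A sketch, assuming the placeholder files ``spam.txt`` and ``eggs.txt`` are created in a temporary directory and that the current interpreter (:data:`sys.executable`) should run the module; ``run_tarfile_cli()`` is a hypothetical helper:

```python
import os
import subprocess
import sys
import tempfile

def run_tarfile_cli(cwd, *args):
    """Run `python -m tarfile ARGS` in *cwd* and return its standard output."""
    result = subprocess.run(
        [sys.executable, "-m", "tarfile"] + list(args),
        cwd=cwd, stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout

with tempfile.TemporaryDirectory() as tmp:
    for name in ("spam.txt", "eggs.txt"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write(name + "\n")

    run_tarfile_cli(tmp, "-c", "monty.tar", "spam.txt", "eggs.txt")  # create
    created = os.path.isfile(os.path.join(tmp, "monty.tar"))
    listing = run_tarfile_cli(tmp, "-l", "monty.tar").split()  # list names

print(created, listing)  # True ['spam.txt', 'eggs.txt']
```

Passing ``check=True`` turns a non-zero exit status (for instance from ``-t`` on an invalid archive) into a :exc:`subprocess.CalledProcessError`.
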
.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  of up to 256 characters at best and linknames of up to 100 characters. The
  maximum file size is 8 GiB. This is an old and limited, but widely
  supported, format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames and
  large files, and it stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, whereas
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

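As a rough illustration of these limits, the following sketch writes one small
member with a 300-character name into an in-memory archive in each writable
format. The helper ``can_store()`` and the sample names are hypothetical, chosen
only for this example. In ustar, a name longer than 100 characters must be split
at a ``/`` into a prefix of at most 155 characters and a name of at most 100
characters; a name without a suitable ``/`` cannot be stored, and
:mod:`tarfile` reports this as a :exc:`ValueError`.

```python
import io
import tarfile


def can_store(fmt, name):
    """Return True if a member called *name* can be written in format *fmt*.

    Hypothetical helper for this example; the archive only lives in memory.
    """
    data = b"hello\n"
    info = tarfile.TarInfo(name)
    info.size = len(data)
    try:
        with tarfile.open(fileobj=io.BytesIO(), mode="w", format=fmt) as tar:
            tar.addfile(info, io.BytesIO(data))
    except ValueError:
        # tarfile raises ValueError when the name does not fit the header.
        return False
    return True


# 300 characters and no "/" at which ustar could split the name.
long_name = "x" * 300

for label in ("USTAR_FORMAT", "GNU_FORMAT", "PAX_FORMAT"):
    print(label, can_store(getattr(tarfile, label), long_name))
```

Only the ustar attempt fails. A 251-character name such as
``"d" * 150 + "/" + "f" * 100`` does fit in ustar because it can be split at
the slash, which is where the "at best 256 characters" figure comes from
(155 + 1 + 100). The 8 GiB size limit has a similar origin: the ustar size field
holds 11 octal digits, so sizes must stay below ``8 ** 11`` bytes, exactly 8 GiB.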
There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible with it.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem with the original format (on which all other formats are based) is
that it has no concept of different character encodings. For example, an
ordinary tar archive created on a *UTF-8* system cannot be read correctly on a
*Latin-1* system if it contains non-*ASCII* characters: textual metadata (such
as filenames, linknames and user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

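The damage described above can be reproduced entirely in memory. The following
sketch (the helpers ``ustar_bytes()`` and ``read_names()`` are hypothetical)
stores a non-ASCII filename in a ustar archive using UTF-8, then reads the
archive back once as UTF-8 and once as Latin-1. Because the ustar header only
holds raw bytes with no record of their encoding, the Latin-1 reader silently
gets a different name instead of an error.

```python
import io
import tarfile


def ustar_bytes(name, encoding):
    """Return a ustar archive (as bytes) holding one empty member *name*."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT,
                      encoding=encoding) as tar:
        tar.addfile(tarfile.TarInfo(name), io.BytesIO(b""))
    return buf.getvalue()


def read_names(archive, encoding):
    """List the member names of *archive*, decoding headers with *encoding*."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:",
                      encoding=encoding) as tar:
        return tar.getnames()


archive = ustar_bytes("café.txt", "utf-8")
print(read_names(archive, "utf-8"))    # ['café.txt']
print(read_names(archive, "latin-1"))  # ['cafÃ©.txt']
```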
The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

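To see why ``'surrogateescape'`` is a sensible default, consider a ustar member
whose name was written as Latin-1 bytes but is read back as UTF-8. In the
following sketch (the archive is built in memory purely for illustration), the
undecodable byte turns into a lone surrogate instead of raising an error, and
encoding the name with the same error handler restores the original bytes
exactly, so nothing is lost on the way through.

```python
import io
import tarfile

# Build a ustar archive whose member name is stored as Latin-1 bytes.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT,
                  encoding="latin-1") as tar:
    tar.addfile(tarfile.TarInfo("café.txt"), io.BytesIO(b""))

# Read it back as UTF-8, keeping the default errors="surrogateescape".
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:", encoding="utf-8") as tar:
    name = tar.getnames()[0]

print(ascii(name))  # 'caf\udce9.txt': byte 0xE9 is carried as U+DCE9
# Re-encoding with the same scheme gives back the original header bytes.
print(name.encode("utf-8", "surrogateescape"))  # b'caf\xe9.txt'
```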
In the case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.

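The following sketch (the helper ``pax_names()`` is hypothetical) writes a
non-ASCII name with :const:`PAX_FORMAT` and reads it back with deliberately
unsuitable *encoding* values. Because :mod:`tarfile` puts the name into a UTF-8
``path`` record of a pax extended header, the reader's *encoding* setting does
not affect the result, in contrast to the ustar behaviour shown earlier in this
section.

```python
import io
import tarfile


def pax_names(name, read_encoding):
    """Write *name* to a pax archive, then list names read with *read_encoding*."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        tar.addfile(tarfile.TarInfo(name), io.BytesIO(b""))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:", encoding=read_encoding) as tar:
        return tar.getnames()


# The non-ASCII name travels in a UTF-8 pax "path" record, so both
# calls return the name unchanged.
print(pax_names("café.txt", "latin-1"))
print(pax_names("café.txt", "ascii"))
```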