:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is expected to be positioned at offset 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, :func:`tarfile.open`
   accepts the keyword argument *compresslevel* to specify the compression level of
   the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+


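As a rough illustration of the stream modes, the following sketch writes a
gzip-compressed archive into an in-memory buffer with ``'w|gz'`` and reads it
back with ``'r|*'``. The buffer and the member name ``greeting.txt`` are
assumptions made for the example; a real program would more likely pass a
socket file object or ``sys.stdin.buffer``.

```python
import io
import tarfile

# Write a small gzip-compressed archive into memory using stream mode.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w|gz") as tar:
    payload = b"hello stream\n"
    info = tarfile.TarInfo(name="greeting.txt")  # hypothetical member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read it back as a stream. Members have to be consumed in archive order,
# because a stream-mode TarFile never seeks backwards.
buffer.seek(0)
stream_contents = {}
with tarfile.open(fileobj=buffer, mode="r|*") as tar:
    for member in tar:
        stream_contents[member.name] = tar.extractfile(member).read()
```

Each member is read while iterating, before moving on to the next one, which
is the access pattern that stream mode supports.
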
.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


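A minimal sketch of :func:`is_tarfile`, assuming two throwaway files created
in a temporary directory (the file names are made up for the example):

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "sample.tar")  # hypothetical file names
    text_path = os.path.join(tmp, "notes.txt")

    # Create a one-member archive and an ordinary text file.
    with tarfile.open(archive_path, "w") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    with open(text_path, "w") as handle:
        handle.write("just some text\n")

    archive_is_tar = tarfile.is_tarfile(archive_path)
    text_is_tar = tarfile.is_tarfile(text_path)
```
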
The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


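The sketch below shows one way these exceptions might be handled. It feeds
plain text to :func:`tarfile.open` and catches the resulting
:exc:`ReadError`; the in-memory buffer merely stands in for a file that is
not an archive.

```python
import io
import tarfile

# Plain text that is neither a tar archive nor compressed data.
not_an_archive = io.BytesIO(b"this is plain text, not a tar archive")

try:
    tarfile.open(fileobj=not_an_archive, mode="r")
    outcome = "opened"
except tarfile.ReadError:
    # ReadError derives from TarError, so catching TarError would work too.
    outcome = "ReadError"
```
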
The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


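To make the lookup rules concrete, here is a small sketch that stores one
name twice and then queries the archive. The helper ``add_bytes`` and all
member names are hypothetical and exist only for this illustration.

```python
import io
import tarfile

def add_bytes(tar, name, data):
    """Hypothetical helper: store *data* under *name* in an open archive."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    add_bytes(tar, "config.ini", b"version=1\n")
    add_bytes(tar, "readme.txt", b"hello\n")
    add_bytes(tar, "config.ini", b"version=2\n")  # same name, stored again

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()                # archive order, duplicates included
    latest = tar.getmember("config.ini")  # resolves to the last occurrence
    latest_data = tar.extractfile(latest).read()
    try:
        tar.getmember("missing.txt")
        missing_raises = False
    except KeyError:
        missing_raises = True
```
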
.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.


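One possible form of the "prior inspection" mentioned in the warning is
sketched below. The generator ``safe_members`` is a hypothetical helper, not
part of :mod:`tarfile`. It is deliberately strict (links, devices and FIFOs
are skipped outright) and should be read as a starting point rather than a
complete defence.

```python
import io
import os
import tarfile
import tempfile

def safe_members(tar, dest):
    """Yield regular files and directories whose path stays inside *dest*.

    Hypothetical helper: anything that resolves outside *dest*, and every
    link, device or FIFO, is silently dropped.
    """
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        inside = target.startswith(root + os.sep)
        if inside and (member.isfile() or member.isdir()):
            yield member

# Build a demonstration archive with one harmless and two hostile names.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ["ok.txt", "../escape.txt", "/absolute.txt"]:
        data = b"demo\n"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buffer.seek(0)
with tempfile.TemporaryDirectory() as dest:
    with tarfile.open(fileobj=buffer, mode="r") as tar:
        tar.extractall(path=dest, members=safe_members(tar, dest))
    extracted = sorted(os.listdir(dest))
```

Only ``ok.txt`` ends up in the destination directory; the two other members
are filtered out before :meth:`extractall` ever sees them.
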
.. method:: TarFile.extract(member, path="", set_attrs=True)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is false.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.


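The following sketch reads a member's contents through :meth:`extractfile`
without writing anything to disk, and shows the :const:`None` result for a
directory entry. The archive is built in memory and the member names are
invented for the example.

```python
import io
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    data = b"line one\nline two\n"
    notes = tarfile.TarInfo("notes.txt")  # hypothetical regular file
    notes.size = len(data)
    tar.addfile(notes, io.BytesIO(data))

    folder = tarfile.TarInfo("docs")      # hypothetical directory entry
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    # A regular file yields a readable binary file object.
    lines = tar.extractfile("notes.txt").read().splitlines()
    # A directory has no data to read, so None comes back instead.
    directory_result = tar.extractfile("docs")
```
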
.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *exclude* is given, it must be a function that takes one
   filename argument and returns a boolean value. Depending on this value the
   respective file is either excluded (:const:`True`) or added
   (:const:`False`). If *filter* is specified it must be a keyword argument. It
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` so
      that the number of bytes read matches the recorded file size.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the :term:`file
   object` *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


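A short sketch of how :meth:`gettarinfo` and :meth:`addfile` can be combined:
the metadata is taken from a file on disk, adjusted, and then stored together
with the file's data. The paths, the archive name and the replacement owner
``nobody`` are assumptions chosen for the example.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "report.txt")  # hypothetical input file
    with open(source, "wb") as handle:
        handle.write(b"quarterly numbers\n")

    archive = os.path.join(tmp, "out.tar")
    with tarfile.open(archive, "w") as tar:
        # Collect metadata from the file, then adjust it before storing.
        info = tar.gettarinfo(source, arcname="reports/report.txt")
        info.uname = info.gname = "nobody"
        info.mtime = 0
        # Open in binary mode so exactly info.size bytes are read.
        with open(source, "rb") as handle:
            tar.addfile(info, handle)

    with tarfile.open(archive) as tar:
        stored = tar.getmember("reports/report.txt")
        stored_summary = (stored.size, stored.uname, stored.mtime)
```
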
.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


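As an informal illustration of the query methods, the sketch below classifies
a few hand-made :class:`TarInfo` objects. The function ``describe`` and the
member names are hypothetical; in practice the objects would come from an
opened archive.

```python
import tarfile

def describe(info):
    """Hypothetical helper: name the kind of member *info* represents."""
    if info.isfile():
        return "regular file"
    if info.isdir():
        return "directory"
    if info.issym():
        return "symbolic link"
    if info.islnk():
        return "hard link"
    if info.isdev():
        return "device or FIFO"
    return "other"

samples = {}
for name, kind in [("a.txt", tarfile.REGTYPE), ("docs", tarfile.DIRTYPE),
                   ("latest", tarfile.SYMTYPE), ("pipe", tarfile.FIFOTYPE)]:
    info = tarfile.TarInfo(name)
    info.type = kind  # set the type by hand instead of reading an archive
    samples[name] = describe(info)
```
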
.. _tarfile-commandline:

Command Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included::

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable::

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option::

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name::

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option::

    $ python -m tarfile -l monty.tar


Command line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> <sourceN>
               --create <tarfile> <source1> <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       # Yield only the members whose name ends in ".py".
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the two halves of the sentence on one line.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       # Store every member as owned by root, whatever the source file says.
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


731.. _tar-formats:
732
733Supported tar formats
734---------------------
735
There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  of at most 256 characters (fewer if the path cannot be split suitably into a
  155-character prefix and a 100-character name) and linknames up to 100
  characters. The maximum file size is 8 GiB. This is an old and limited but
  widely supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers affect only the subsequent file header, whereas
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

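As a small sketch of the two flavours of pax headers, the snippet below writes
an in-memory archive with one global header and one extended header and reads
both back. The keys ``comment`` and ``DEMO.note`` and the member name
``hello.txt`` are made up for illustration; the *pax_headers* argument of
:func:`tarfile.open` and the :attr:`TarInfo.pax_headers` attribute belong to
the module.

```python
import io
import tarfile

buf = io.BytesIO()

# Global header: passed via pax_headers when the archive is opened for writing.
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                  pax_headers={"comment": "demo archive"}) as tar:
    member = tarfile.TarInfo("hello.txt")   # hypothetical, empty member
    # Extended header: applies to this member only.
    member.pax_headers = {"DEMO.note": "first"}
    tar.addfile(member)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    members = tar.getmembers()
    global_headers = dict(tar.pax_headers)
    member_headers = dict(members[0].pax_headers)

print(global_headers.get("comment"))
print(member_headers.get("DEMO.note"))
```

The global entry is read back from the archive object, while the extended
entry is attached to the individual member.
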
There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

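The difference between the three writable formats can be seen with a member
name that is too long for ustar. The following is a minimal sketch that works
entirely in memory; the 300-character name and the helper ``try_format`` are
invented for the illustration, and the format is selected with the *format*
argument of :func:`tarfile.open`.

```python
import io
import tarfile

# No slash in the name, so ustar cannot split it into prefix and name.
long_name = "x" * 300

def try_format(fmt):
    """Write one empty member called long_name and return the names read back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(long_name))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()

results = {}
for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                   ("gnu", tarfile.GNU_FORMAT),
                   ("pax", tarfile.PAX_FORMAT)]:
    try:
        results[label] = (try_format(fmt) == [long_name])
    except ValueError:
        # Raised while writing when the name does not fit into the header.
        results[label] = "ValueError"

print(results)
```

In this sketch the ustar attempt is expected to fail with :exc:`ValueError`,
while the GNU and pax archives store the name and return it unchanged.
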
.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that it has no concept of character encodings. For example, an ordinary tar
archive created on a *UTF-8* system cannot be read correctly on a *Latin-1*
system if it contains non-*ASCII* characters. Textual metadata (like filenames,
linknames, user/group names) will appear damaged. Unfortunately, there is no
way to autodetect the encoding of an archive. The pax format was designed to
solve this problem. It stores non-ASCII metadata using the universal character
encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.

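The effect of *encoding* and of the default ``'surrogateescape'`` error handler
can be sketched with an in-memory GNU format archive. The member name
``café.txt`` and the helper ``read_name`` are assumptions made for this
example; only the *encoding* keyword argument comes from the text above.

```python
import io
import tarfile

name = "café.txt"   # hypothetical member name with one non-ASCII character

# Write the name as Latin-1 bytes into a GNU format archive.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                  encoding="latin-1") as tar:
    tar.addfile(tarfile.TarInfo(name))

def read_name(encoding):
    """Return the first member name, decoded with the given encoding."""
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r", encoding=encoding) as tar:
        return tar.getnames()[0]

matching = read_name("latin-1")     # same encoding as used for writing
mismatched = read_name("utf-8")     # wrong guess, errors="surrogateescape"

print(matching == name)
print(ascii(mismatched))
```

With the matching encoding the name is restored. With the wrong one the
undecodable byte is kept as a lone surrogate, so the original bytes can still
be recovered with ``mismatched.encode("utf-8", "surrogateescape")``.
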
In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.

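A minimal sketch of this behaviour, assuming the same made-up member name as a
test case: the pax archive is written with one *encoding* and read with a
different one, and the name is expected to survive because it travels in a
*UTF-8* pax header.

```python
import io
import tarfile

name = "café.txt"   # hypothetical member name with one non-ASCII character

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                  encoding="latin-1") as tar:
    tar.addfile(tarfile.TarInfo(name))

# Deliberately read with a different encoding than the one used for writing.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r", encoding="ascii") as tar:
    pax_name = tar.getnames()[0]

print(pax_name == name)
```
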