:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it
   defaults to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` and ``'a:xz'`` are not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is expected to be at position 0.

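   The following sketch contrasts transparent opening with ``'r'`` against an
   explicit compression suffix. It assumes an in-memory :class:`io.BytesIO`
   passed as *fileobj*; the helper ``make_gz_archive`` and the member name
   ``hello.txt`` are hypothetical, and the exact text of the :exc:`ReadError`
   message is deliberately not relied on:

```python
import io
import tarfile


def make_gz_archive():
    """Hypothetical helper: return the bytes of a gzip-compressed archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"hello"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


archive = make_gz_archive()

# 'r' (the default) detects the gzip compression by itself.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
    names = tar.getnames()

# An explicit but wrong compression suffix is rejected with ReadError.
try:
    tarfile.open(fileobj=io.BytesIO(archive), mode="r:bz2")
except tarfile.ReadError:
    wrong_mode_failed = True
else:
    wrong_mode_failed = False

print(names, wrong_mode_failed)  # ['hello.txt'] True
```

   This is why ``'r'`` is the recommended reading mode when the compression
   of an archive is not known in advance.
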
   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes are:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+


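A minimal sketch of the stream modes, assuming :class:`io.BytesIO` objects
stand in for a pipe or socket. The helpers ``write_stream`` and
``read_stream`` are hypothetical. Reading each member's data before moving on
to the next one is what lets :meth:`TarFile.extractfile` work without random
seeking:

```python
import io
import tarfile


def write_stream(files):
    """Hypothetical helper: write a dict of name -> bytes as a 'w|gz' stream."""
    out = io.BytesIO()
    # 'w|gz' never seeks, so *fileobj* only needs a write() method.
    with tarfile.open(fileobj=out, mode="w|gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return out.getvalue()


def read_stream(raw):
    """Hypothetical helper: read members strictly in order with 'r|*'."""
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r|*") as tar:
        for member in tar:
            # Read this member's data now; going back to an earlier
            # member in a stream raises StreamError.
            f = tar.extractfile(member)
            contents[member.name] = f.read() if f is not None else None
    return contents


stream = write_stream({"a.txt": b"first", "b.txt": b"second"})
print(read_stream(stream))  # {'a.txt': b'first', 'b.txt': b'second'}
```
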
.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


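A small check of :func:`is_tarfile` against a real archive and a plain text
file, assuming a temporary directory; the file names and the helper
``demo_is_tarfile`` are hypothetical:

```python
import io
import os
import tarfile
import tempfile


def demo_is_tarfile():
    """Return is_tarfile() results for an archive and for a text file."""
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = os.path.join(tmp, "sample.tar")
        text_path = os.path.join(tmp, "notes.txt")

        with tarfile.open(archive_path, "w") as tar:
            data = b"hello"
            info = tarfile.TarInfo("hello.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

        with open(text_path, "w") as f:
            f.write("just some text\n")

        return tarfile.is_tarfile(archive_path), tarfile.is_tarfile(text_path)


print(demo_is_tarfile())  # (True, False)
```
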
The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.



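One way to use the exception hierarchy, sketched with a hypothetical helper
``try_open``: catch :exc:`ReadError` for input that is not a readable archive
and fall back to :exc:`TarError` for any other :mod:`tarfile` failure. The
input bytes are made up for illustration, and the exception message is not
relied on:

```python
import io
import tarfile


def try_open(raw):
    """Hypothetical helper: open *raw* bytes as an archive and report the outcome."""
    try:
        with tarfile.open(fileobj=io.BytesIO(raw)) as tar:
            return "ok: %d member(s)" % len(tar.getmembers())
    except tarfile.ReadError as exc:
        # Not a tar archive at all, or one that tarfile cannot handle.
        return "unreadable: %s" % exc
    except tarfile.TarError as exc:
        # Any other tarfile-specific failure, e.g. CompressionError.
        return "tar error: %s" % exc


print(try_open(b"definitely not a tar archive"))
```
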
Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


The following variables are available at module level:


.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows,
   :func:`sys.getfilesystemencoding` otherwise.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context manager protocol.

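The finalization caveat above can be observed with an in-memory archive. This
sketch assumes a caller-supplied :class:`io.BytesIO`, which :class:`TarFile`
leaves open, and a hypothetical helper ``build``. A finished archive ends in
zero-filled blocks, while an interrupted one stops right after the data of the
last member written:

```python
import io
import tarfile


def build(fail=False):
    """Hypothetical helper: write a one-member archive, optionally failing mid-way."""
    buf = io.BytesIO()
    try:
        with tarfile.open(fileobj=buf, mode="w") as tar:
            data = b"hello"
            info = tarfile.TarInfo("hello.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            if fail:
                raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    # buf was supplied by the caller, so it is still open in both cases.
    return buf.getvalue()


finished = build()
interrupted = build(fail=True)

# A finalized archive ends with (at least) two zero-filled 512-byte blocks.
print(finished.endswith(bytes(1024)), interrupted.endswith(bytes(1024)))  # True False
```
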
.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All of the following arguments are optional and can also be accessed as
   instance attributes.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file, overwriting an
   existing one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are
   handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.


.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


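The note above can be seen by storing the same name twice. This sketch uses a
hypothetical helper ``add_bytes`` and in-memory data:

```python
import io
import tarfile


def add_bytes(tar, name, data):
    """Hypothetical helper: add *data* to *tar* as a regular file called *name*."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "config.txt", b"old")
    add_bytes(tar, "config.txt", b"new")

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = tar.getnames()                # both copies are listed
    latest = tar.getmember("config.txt")  # but lookup returns the last one
    content = tar.extractfile(latest).read()

print(names, content)  # ['config.txt', 'config.txt'] b'new'
```
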
.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


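Iterating with :meth:`TarFile.next` until it returns :const:`None` might look
like the following sketch (member names are made up; iterating over the
:class:`TarFile` object directly is usually simpler):

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("a.txt", "b.txt"):
        tar.addfile(tarfile.TarInfo(name))  # empty members, size defaults to 0

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    seen = []
    member = tar.next()
    while member is not None:  # next() returns None after the last member
        seen.append(member.name)
        member = tar.next()

print(seen)  # ['a.txt', 'b.txt']
```
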
.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.


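In line with the warning above, the following is a partial safeguard, not a
complete one: a hypothetical ``members_within`` generator that passes on only
members whose resolved path stays inside the destination directory. It does
not inspect link targets or special files, so treat it as a sketch under that
assumption rather than a security boundary:

```python
import io
import os
import tarfile
import tempfile


def members_within(tar, dest):
    """Hypothetical helper: yield only members that would land inside *dest*.

    Only member names are checked; link targets are not, so this is not a
    complete defence against hostile archives.
    """
    root = os.path.realpath(dest)
    for member in tar:
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) == root:
            yield member


with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, "out")
    os.mkdir(dest)

    # Build an archive with one harmless and one escaping member name.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("ok.txt", "../escaped.txt"):
            info = tarfile.TarInfo(name)
            info.size = 2
            tar.addfile(info, io.BytesIO(b"hi"))

    buf.seek(0)
    with tarfile.open(fileobj=buf) as tar:
        tar.extractall(dest, members=members_within(tar, dest))

    extracted = sorted(os.listdir(dest))
    escaped = os.path.exists(os.path.join(tmp, "escaped.txt"))

print(extracted, escaped)  # ['ok.txt'] False
```
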
.. method:: TarFile.extract(member, path="", set_attrs=True)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is :const:`False`.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.


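Reading member data without touching the file system, sketched with in-memory
data and made-up member names. The regular file comes back as a binary
file-like object, while the directory member yields :const:`None`:

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"line one\nline two\n"
    info = tarfile.TarInfo("notes.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

    folder = tarfile.TarInfo("docs")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    f = tar.extractfile("notes.txt")   # binary, file-like object
    lines = f.readlines()
    no_data = tar.extractfile("docs")  # directories have no data

print(lines, no_data)  # [b'line one\n', b'line two\n'] None
```
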
.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *exclude* is given, it must be a function that takes one
   filename argument and returns a boolean value. Depending on this value the
   respective file is either excluded (:const:`True`) or added
   (:const:`False`). If *filter* is given, it must be passed as a keyword
   argument. It should be a function that takes a :class:`TarInfo` object
   argument and returns the changed :class:`TarInfo` object. If it instead
   returns :const:`None`, the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.


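A sketch of a *filter* that excludes members by returning :const:`None`,
assuming a throw-away directory with hypothetical ``mod.py`` and ``mod.pyc``
files:

```python
import os
import tarfile
import tempfile


def skip_bytecode(tarinfo):
    """filter callback: drop compiled files, pass everything else through."""
    if tarinfo.name.endswith(".pyc"):
        return None  # returning None excludes the member
    return tarinfo


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "pkg")
    os.mkdir(src)
    for name in ("mod.py", "mod.pyc"):
        with open(os.path.join(src, name), "w") as f:
            f.write("# placeholder\n")

    archive = os.path.join(tmp, "pkg.tar")
    with tarfile.open(archive, "w") as tar:
        # arcname keeps the temporary path out of the member names.
        tar.add(src, arcname="pkg", filter=skip_bytecode)

    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())

print(names)  # ['pkg', 'pkg/mod.py']
```
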
.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'``
      so that the number of bytes read matches the file size.


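Adding data that does not exist as a file on disk: build a :class:`TarInfo`
by hand, set its ``size`` and pass a binary file object. The helper
``add_text`` and the member name are hypothetical:

```python
import io
import tarfile
import time


def add_text(tar, name, text, encoding="utf-8"):
    """Hypothetical helper: store *text* as a regular file member called *name*."""
    data = text.encode(encoding)
    info = tarfile.TarInfo(name)
    info.size = len(data)          # addfile() copies exactly this many bytes
    info.mtime = int(time.time())
    info.mode = 0o644
    tar.addfile(info, io.BytesIO(data))


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_text(tar, "README", "generated in memory\n")

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    readme = tar.extractfile("README").read()

print(readme)  # b'generated in memory\n'
```
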
.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the :term:`file
   object` *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


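A sketch of adjusting a :class:`TarInfo` returned by :meth:`gettarinfo`
before writing it, assuming a temporary file; the names ``data.bin`` and
``renamed.bin`` are made up:

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"\x00\x01\x02")

    archive = os.path.join(tmp, "out.tar")
    with tarfile.open(archive, "w") as tar:
        info = tar.gettarinfo(path, arcname="renamed.bin")
        # Normalise ownership before the header is written.
        info.uid = info.gid = 0
        info.uname = info.gname = "root"
        with open(path, "rb") as f:  # binary mode, see the note on addfile()
            tar.addfile(info, f)

    with tarfile.open(archive) as tar:
        member = tar.getmember("renamed.bin")
        summary = (member.name, member.size, member.uname)

print(summary)  # ('renamed.bin', 3, 'root')
```
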
.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from the bytes buffer *buf*,
   decoding its text fields with *encoding* and *errors*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a bytes buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


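A round trip through :meth:`TarInfo.tobuf` and :meth:`TarInfo.frombuf`. This
sketch assumes Python 3's ``frombuf(buf, encoding, errors)`` calling
convention and uses :const:`USTAR_FORMAT`, so that a short ASCII name produces
exactly one 512-byte header block. The garbage buffer used to trigger
:exc:`HeaderError` is made up:

```python
import tarfile

info = tarfile.TarInfo("hello.txt")
info.size = 5
info.mode = 0o600

# One 512-byte header block for a short ASCII name in ustar format.
header = info.tobuf(format=tarfile.USTAR_FORMAT)

# frombuf() also needs the encoding and error handler to decode with.
parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")

try:
    tarfile.TarInfo.frombuf(b"x" * 512, tarfile.ENCODING, "surrogateescape")
except tarfile.HeaderError:
    bad_header_rejected = True
else:
    bad_header_rejected = False

print(len(header), parsed.name, parsed.size, oct(parsed.mode), bad_header_rejected)
# 512 hello.txt 5 0o600 True
```
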
A :class:`TarInfo` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


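The attributes and query methods together, sketched with a made-up directory,
regular file and symbolic link built in memory; the helper ``describe`` is
hypothetical:

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    folder = tarfile.TarInfo("docs")
    folder.type = tarfile.DIRTYPE
    folder.mode = 0o755
    tar.addfile(folder)

    data = b"text"
    regular = tarfile.TarInfo("docs/a.txt")
    regular.size = len(data)
    tar.addfile(regular, io.BytesIO(data))

    link = tarfile.TarInfo("latest.txt")
    link.type = tarfile.SYMTYPE
    link.linkname = "docs/a.txt"  # only meaningful for link types
    tar.addfile(link)


def describe(member):
    """Hypothetical helper: classify a member with the TarInfo query methods."""
    if member.isdir():
        return "directory"
    if member.issym():
        return "symlink -> " + member.linkname
    if member.isfile():
        return "file (%d bytes)" % member.size
    return "other"


buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    kinds = {m.name: describe(m) for m in tar}

print(kinds)
# {'docs': 'directory', 'docs/a.txt': 'file (4 bytes)', 'latest.txt': 'symlink -> docs/a.txt'}
```
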
.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.

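The practical difference between the formats shows up with long names. This
sketch assumes a single 154-character path component, which fits neither the
ustar name field nor a name/prefix split, and a hypothetical helper
``roundtrip``. In the sketch, a :exc:`ValueError` from
:meth:`TarFile.addfile` is treated as "this format cannot store the name":

```python
import io
import tarfile

LONG_NAME = "x" * 150 + ".txt"  # one path component longer than 100 characters


def roundtrip(fmt):
    """Hypothetical helper: store LONG_NAME in format *fmt*, return the name read back."""
    buf = io.BytesIO()
    try:
        with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
            tar.addfile(tarfile.TarInfo(LONG_NAME))
    except ValueError:
        return None  # the format cannot represent this name
    buf.seek(0)
    with tarfile.open(fileobj=buf) as tar:
        return tar.getnames()[0]


for label in ("USTAR_FORMAT", "GNU_FORMAT", "PAX_FORMAT"):
    result = roundtrip(getattr(tarfile, label))
    print(label, "ok" if result == LONG_NAME else "cannot store it")
# USTAR_FORMAT cannot store it
# GNU_FORMAT ok
# PAX_FORMAT ok
```
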
There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters; there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that it has no concept of different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`codec-base-classes`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.

In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.

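A sketch of the *encoding* behaviour described above, assuming the
deliberately restrictive encoding ``'ascii'`` and a made-up file name. The
ustar header has to hold the name in the archive's encoding and fails, while
pax stores it as *UTF-8* in an extended header. The helper ``write_and_list``
is hypothetical:

```python
import io
import tarfile

NAME = "café.txt"  # contains a non-ASCII character


def write_and_list(fmt, encoding):
    """Hypothetical helper: write one member called NAME, return names or the error type."""
    buf = io.BytesIO()
    try:
        with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding=encoding) as tar:
            tar.addfile(tarfile.TarInfo(NAME))
    except UnicodeEncodeError as exc:
        return type(exc).__name__
    buf.seek(0)
    with tarfile.open(fileobj=buf, encoding=encoding) as tar:
        return tar.getnames()


# ustar must encode the name itself, and ASCII cannot represent it.
print(write_and_list(tarfile.USTAR_FORMAT, "ascii"))  # UnicodeEncodeError
# pax keeps non-ASCII metadata as UTF-8, whatever *encoding* says.
print(write_and_list(tarfile.PAX_FORMAT, "ascii"))    # ['café.txt']
```
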