:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* must be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

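The stream variant of *mode* can be tried out without a socket or tape device.
The following is only a minimal sketch: it assumes an in-memory
:class:`io.BytesIO` buffer as a stand-in for the real stream, and the member
names and payloads are invented for illustration. Members are read strictly in
archive order, which is the only access pattern a stream allows.

```python
import io
import tarfile

# Hypothetical payloads; any bytes would do.
payloads = [("a.txt", b"alpha"), ("b.txt", b"beta")]

# Write a gzip-compressed archive as a stream of blocks ('w|gz').
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w|gz") as tar:
    for name, data in payloads:
        info = tarfile.TarInfo(name)
        info.size = len(data)          # size must match the bytes supplied
        tar.addfile(info, io.BytesIO(data))

# Read it back as a stream ('r|gz'); no seeking backwards is attempted.
buf.seek(0)
contents = {}
with tarfile.open(fileobj=buf, mode="r|gz") as tar:
    for member in tar:                 # sequential iteration only
        contents[member.name] = tar.extractfile(member).read()

print(contents)
```

Each member is read while the iteration is positioned on it; going back to an
earlier member is the kind of access a stream-like object does not support.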

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

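As a quick illustration of :func:`is_tarfile` together with the transparent
``'r'`` mode, the sketch below writes one small archive and one plain text file
into a temporary directory. The file names are made up for the example.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.tar.gz")   # hypothetical file names
    plain = os.path.join(tmp, "notes.txt")

    # Create a small gzip-compressed archive with a single member.
    with tarfile.open(archive, "w:gz") as tar:
        data = b"hello"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    # Create an ordinary text file that is not an archive.
    with open(plain, "w") as f:
        f.write("plain text, not an archive\n")

    results = {os.path.basename(p): tarfile.is_tarfile(p)
               for p in (archive, plain)}

    # Mode 'r' detects the compression transparently.
    with tarfile.open(archive, "r") as tar:
        names = tar.getnames()

print(results, names)
```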

The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

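A short sketch of how these exceptions might be handled in practice. The helper
``try_open`` is hypothetical and not part of the module; it simply treats
:exc:`ReadError` as "this is not a readable archive".

```python
import io
import tarfile


def try_open(data):
    """Return the member names found in *data*, or None if unreadable.

    Hypothetical helper for illustration only.
    """
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
            return tar.getnames()
    except tarfile.ReadError:
        # Raised when no supported format or compression matches.
        return None


garbage = b"x" * 1024          # 1 KiB of bytes that form no valid tar header
result = try_open(garbage)
print(result)
print(issubclass(tarfile.ReadError, tarfile.TarError))
```

Catching :exc:`TarError` instead would cover every exception listed above at
once, since it is their common base class.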


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


The following variables are available on module level:


.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows,
   :func:`sys.getfilesystemencoding` otherwise.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context manager protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

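To make the *format* and *pax_headers* arguments more concrete, here is a
hedged sketch that writes a pax archive with one global header and reads the
header back through the :attr:`pax_headers` attribute. The header key, its
value, and the member name are arbitrary choices for the example.

```python
import io
import tarfile

buf = io.BytesIO()

# Keyword arguments given to tarfile.open() are passed on to TarFile.
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                  pax_headers={"comment": "nightly build"}) as tar:
    data = b"payload"
    info = tarfile.TarInfo("data.bin")      # hypothetical member name
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# fileobj is used from its current position, so rewind before reading.
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = tar.getnames()
    global_headers = dict(tar.pax_headers)

print(names, global_headers)
```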

.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.

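The three lookup methods can be compared side by side. This sketch assumes an
in-memory archive in which the invented name ``a.txt`` is stored twice, to show
that :meth:`getmember` picks the last occurrence while :meth:`getnames` keeps
archive order.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    # "a.txt" is deliberately stored twice with different contents.
    for name, data in [("a.txt", b"old"), ("b.txt", b"bee"), ("a.txt", b"newer")]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    names = tar.getnames()                       # archive order, duplicates kept
    sizes = [m.size for m in tar.getmembers()]   # same order as getnames()
    latest = tar.getmember("a.txt")              # last occurrence wins
    try:
        tar.getmember("missing.txt")
        found_missing = True
    except KeyError:
        found_missing = False

print(names, sizes, latest.size, found_missing)
```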

.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

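One possible way to perform the "prior inspection" mentioned in the warning is
to pass a filtered generator as *members*. The helper ``safe_members`` below is
hypothetical and deliberately conservative: it keeps only regular files and
directories whose resolved path stays inside the destination, and it is a
sketch rather than a complete defence against hostile archives.

```python
import io
import os
import tarfile
import tempfile


def safe_members(tar, dest):
    """Yield regular files and directories that resolve inside *dest*.

    Hypothetical helper; links and device members are skipped entirely.
    """
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        inside = os.path.commonpath([root, target]) == root
        if inside and (member.isfile() or member.isdir()):
            yield member


# Build a test archive with one harmless and two suspicious member names.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ["ok/readme.txt", "../evil.txt", "/abs/evil.txt"]:
        data = b"x"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tempfile.TemporaryDirectory() as dest:
    with tarfile.open(fileobj=buf) as tar:
        kept = [m.name for m in safe_members(tar, dest)]
        tar.extractall(path=dest, members=safe_members(tar, dest))
    extracted_ok = os.path.exists(os.path.join(dest, "ok", "readme.txt"))

print(kept, extracted_ok)
```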

.. method:: TarFile.extract(member, path="", set_attrs=True)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is :const:`False`.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a :term:`file-like
   object` is returned. If *member* is a link, a file-like object is constructed from
   the link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.

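A small sketch of :meth:`extractfile`, reading a member's content without
writing anything to disk. The archive is built in memory and the member names
are placeholders; the directory member shows the case where :const:`None` is
returned.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"hello\nworld\n"
    info = tarfile.TarInfo("greeting.txt")   # hypothetical regular file
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

    folder = tarfile.TarInfo("docs")         # hypothetical directory member
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    member_file = tar.extractfile("greeting.txt")
    lines = [line.decode("ascii").rstrip("\n") for line in member_file]
    for_directory = tar.extractfile("docs")  # not a file or link

print(lines, for_directory)
```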

.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *exclude* is given, it must be a function that takes one
   filename argument and returns a boolean value. Depending on this value the
   respective file is either excluded (:const:`True`) or added
   (:const:`False`). If *filter* is specified it must be a keyword argument. It
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid irritation about the file size.

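:meth:`addfile` is also handy for data that never existed as a file on disk.
The sketch below assumes a hypothetical helper ``add_bytes`` that builds a
:class:`TarInfo` by hand; the important detail is that ``size`` must be set
before calling :meth:`addfile`, because exactly that many bytes are read.

```python
import io
import tarfile
import time


def add_bytes(tar, name, data):
    """Store *data* (bytes) in *tar* under *name*. Hypothetical helper."""
    info = tarfile.TarInfo(name)
    info.size = len(data)            # addfile() reads exactly this many bytes
    info.mtime = int(time.time())    # a fresh TarInfo has mtime 0 otherwise
    tar.addfile(info, io.BytesIO(data))
    return info


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "report.txt", b"generated in memory\n")

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    member = tar.getmember("report.txt")
    stored = tar.extractfile(member).read()

print(member.size, stored)
```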

.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the :term:`file
   object` *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

482A ``TarInfo`` object has the following public data attributes:
483
484
485.. attribute:: TarInfo.name
486
487 Name of the archive member.
488
489
490.. attribute:: TarInfo.size
491
492 Size in bytes.
493
494
495.. attribute:: TarInfo.mtime
496
497 Time of last modification.
498
499
500.. attribute:: TarInfo.mode
501
502 Permission bits.
503
504
505.. attribute:: TarInfo.type
506
507 File type. *type* is usually one of these constants: :const:`REGTYPE`,
508 :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
509 :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
510 :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
511 more conveniently, use the ``is_*()`` methods below.
512
513
514.. attribute:: TarInfo.linkname
515
516 Name of the target file name, which is only present in :class:`TarInfo` objects
517 of type :const:`LNKTYPE` and :const:`SYMTYPE`.
518
519
520.. attribute:: TarInfo.uid
521
522 User ID of the user who originally stored this member.
523
524
525.. attribute:: TarInfo.gid
526
527 Group ID of the user who originally stored this member.
528
529
530.. attribute:: TarInfo.uname
531
532 User name.
533
534
535.. attribute:: TarInfo.gname
536
537 Group name.
538
539
540.. attribute:: TarInfo.pax_headers
541
542 A dictionary containing key-value pairs of an associated pax extended header.
543
Georg Brandl116aa622007-08-15 14:28:22 +0000544

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.

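The query methods are usually combined into a small classifier. The function
``describe`` below is a hypothetical example of that pattern, and the members
are constructed by hand in memory, so nothing here touches the file system.

```python
import io
import tarfile


def describe(member):
    """Return a short label for a TarInfo object. Hypothetical helper."""
    if member.isfile():
        return "file"
    if member.isdir():
        return "directory"
    if member.issym():
        return "symlink"
    if member.islnk():
        return "hard link"
    if member.isdev():           # character device, block device or FIFO
        return "device"
    return "other"


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    regular = tarfile.TarInfo("file.txt")
    regular.size = 4
    tar.addfile(regular, io.BytesIO(b"data"))

    # Members without data only need their type (and linkname for links).
    for name, kind in [("dir", tarfile.DIRTYPE),
                       ("link", tarfile.SYMTYPE),
                       ("pipe", tarfile.FIFOTYPE)]:
        info = tarfile.TarInfo(name)
        info.type = kind
        if kind == tarfile.SYMTYPE:
            info.linkname = "file.txt"
        tar.addfile(info)

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    labels = [(m.name, describe(m)) for m in tar]

print(labels)
```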

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the two halves of the sentence on one line,
       # separated by a space.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: Extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

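The name-length limits of the three writable formats can be observed directly.
This is a sketch under the assumption that a 300-character name without any
``/`` separator cannot be split to fit the ustar header, so the ustar attempt
is expected to fail with :exc:`ValueError`; ``write_one`` is a hypothetical
helper.

```python
import io
import tarfile

long_name = "a" * 300    # longer than ustar allows, with no "/" to split on


def write_one(fmt):
    """Write a single empty member named *long_name* and read it back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(long_name))
    buf.seek(0)
    with tarfile.open(fileobj=buf) as tar:
        return tar.getnames()


outcome = {}
for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                   ("gnu", tarfile.GNU_FORMAT),
                   ("pax", tarfile.PAX_FORMAT)]:
    try:
        outcome[label] = (write_one(fmt) == [long_name])
    except ValueError:
        # The name does not fit into this format's header fields.
        outcome[label] = "rejected"

print(outcome)
```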

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`codec-base-classes`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.

In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.

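The effect described above can be reproduced in a few lines. This sketch
assumes an invented file name containing one non-ASCII character; it writes the
name once in GNU format and once in pax format, then reads both archives back
with a deliberately mismatched *encoding*.

```python
import io
import tarfile

name = "caf\u00e9.txt"     # hypothetical member name with one non-ASCII letter


def roundtrip(fmt, write_encoding, read_encoding):
    """Write one empty member, then return its name as seen by the reader."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt,
                      encoding=write_encoding) as tar:
        tar.addfile(tarfile.TarInfo(name))
    buf.seek(0)
    with tarfile.open(fileobj=buf, encoding=read_encoding) as tar:
        return tar.getnames()[0]


# GNU format stores raw bytes, so the reader must know the writer's encoding.
gnu_matching = roundtrip(tarfile.GNU_FORMAT, "utf-8", "utf-8")
gnu_mismatch = roundtrip(tarfile.GNU_FORMAT, "utf-8", "latin-1")

# The pax format records the name as UTF-8 in an extended header.
pax_mismatch = roundtrip(tarfile.PAX_FORMAT, "utf-8", "latin-1")

print(ascii(gnu_matching), ascii(gnu_mismatch), ascii(pax_mismatch))
```

With the mismatched reader the GNU archive yields a damaged name, while the
pax archive still yields the original one.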