:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. versionadded:: 2.3

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

  .. versionadded:: 2.6

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, :func:`tarfile.open`
   accepts the keyword argument *compresslevel* to specify the compression level of
   the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

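As an illustration of the stream modes of :func:`tarfile.open`, the following
sketch writes a gzip compressed archive with ``'w|gz'`` and reads it back with
``'r|*'``. An in-memory ``io.BytesIO`` buffer stands in for a pipe or socket,
and the member name ``stream.txt`` is made up for the example; neither is
prescribed by the module.

```python
import io
import tarfile

# Write a gzip compressed archive as a forward-only stream into a buffer.
# (The buffer is a stand-in for a pipe or socket; "stream.txt" is made up.)
payload = b"streamed data"
sink = io.BytesIO()
tar = tarfile.open(fileobj=sink, mode="w|gz")
info = tarfile.TarInfo("stream.txt")
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# Read it back as a stream. Members have to be consumed in archive order,
# because no seeking is done on the underlying file object.
source = io.BytesIO(sink.getvalue())
tar = tarfile.open(fileobj=source, mode="r|*")
contents = {}
for member in tar:
    contents[member.name] = tar.extractfile(member).read()
tar.close()
```

After the loop, ``contents`` maps ``"stream.txt"`` to the original payload.
Reading each member inside the loop matters here: once the stream has moved
past a member, its data can no longer be reached.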

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

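A small sketch of :func:`is_tarfile` in use: the check looks at the file's
content, not at its name. The temporary directory and both file names below
are invented for the demonstration.

```python
import io
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()

# A genuine archive with a single small member ...
archive_path = os.path.join(workdir, "sample.tar")
tar = tarfile.open(archive_path, "w")
info = tarfile.TarInfo("hello.txt")
info.size = 5
tar.addfile(info, io.BytesIO(b"hello"))
tar.close()

# ... and a plain file that merely carries a misleading extension.
fake_path = os.path.join(workdir, "fake.tar")
with open(fake_path, "wb") as f:
    f.write(b"this is not a tar archive")

results = {
    "sample.tar": tarfile.is_tarfile(archive_path),
    "fake.tar": tarfile.is_tarfile(fake_path),
}
```

Here ``results["sample.tar"]`` is true and ``results["fake.tar"]`` is false.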

.. class:: TarFileCompat(filename, mode='r', compression=TAR_PLAIN)

   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
   Please consult the documentation of the :mod:`zipfile` module for more details.
   *compression* must be one of the following constants:


   .. data:: TAR_PLAIN

      Constant for an uncompressed tar archive.


   .. data:: TAR_GZIPPED

      Constant for a :mod:`gzip` compressed tar archive.


   .. deprecated:: 2.6
      The :class:`TarFileCompat` class has been removed in Python 3.


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

   .. versionadded:: 2.6

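Because :exc:`TarError` is the base class of the module's exceptions, a caller
that only wants to know whether an archive could be opened can catch that one
class. The sketch below assumes in-memory input; the helper name ``describe``
and the sample data are hypothetical.

```python
import io
import tarfile

def describe(fileobj):
    """Hypothetical helper: summarise an archive or report why it was rejected."""
    try:
        tar = tarfile.open(fileobj=fileobj, mode="r")
    except tarfile.TarError as exc:
        # ReadError, CompressionError etc. all derive from TarError.
        return "rejected: " + type(exc).__name__
    names = tar.getnames()
    tar.close()
    return "ok: %d member(s)" % len(names)

# A valid archive with one empty member.
good = io.BytesIO()
tar = tarfile.open(fileobj=good, mode="w")
tar.addfile(tarfile.TarInfo("empty.txt"))
tar.close()
good.seek(0)

# Bytes that are not a tar archive in any supported compression.
bad = io.BytesIO(b"x" * 1024)

good_result = describe(good)
bad_result = describe(bad)
```

With this input, ``good_result`` is ``"ok: 1 member(s)"`` and ``bad_result`` is
``"rejected: ReadError"``.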

Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.

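The practical difference between the format constants shows up with long member
names (the limits are described in :ref:`tar-formats`). The sketch below builds
header blocks with :meth:`TarInfo.tobuf` for a deliberately over-long, invented
path; it is meant as an illustration, not as a recommended way to pick a format.

```python
import tarfile

# An invented path of more than 256 characters, too long for plain ustar.
info = tarfile.TarInfo(name="d/" * 150 + "file.txt")

try:
    info.tobuf(format=tarfile.USTAR_FORMAT)
    ustar_ok = True
except ValueError:
    # ustar cannot represent the name at all.
    ustar_ok = False

# GNU and pax store the long name in an additional header, so the result
# spans several 512-byte blocks instead of one.
gnu_header = info.tobuf(format=tarfile.GNU_FORMAT)
pax_header = info.tobuf(format=tarfile.PAX_FORMAT)
block_counts = (len(gnu_header) // 512, len(pax_header) // 512)
```

``ustar_ok`` ends up false, while both other formats produce more than one
header block for the same member.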

The following variables are available on module level:


.. data:: ENCODING

   The default character encoding, i.e. the value from either
   :func:`sys.getfilesystemencoding` or :func:`sys.getdefaultencoding`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 2.7
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   .. versionadded:: 2.6

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   .. versionadded:: 2.6

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments control the way strings are converted to
   unicode objects and vice versa. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionadded:: 2.6

   The *pax_headers* argument is an optional dictionary of unicode strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionadded:: 2.6

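The effect of *ignore_zeros* is easiest to see with concatenated archives. The
sketch below glues two small archives together in memory, roughly what
``cat a.tar b.tar`` would produce, and opens the result twice. The helper
``make_archive`` and the member names are made up for the example.

```python
import io
import tarfile

def make_archive(name, data):
    """Hypothetical helper: bytes of an uncompressed archive with one member."""
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w")
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
    tar.close()          # buf stays open: fileobj is not closed by TarFile
    return buf.getvalue()

# Two complete archives, one after the other.
joined = make_archive("first.txt", b"1") + make_archive("second.txt", b"2")

# Default behaviour: the empty blocks ending the first archive stop the reader.
tar = tarfile.open(fileobj=io.BytesIO(joined), mode="r:")
default_names = tar.getnames()
tar.close()

# With ignore_zeros the empty blocks are skipped and reading continues.
tar = tarfile.open(fileobj=io.BytesIO(joined), mode="r:", ignore_zeros=True)
all_names = tar.getnames()
tar.close()
```

``default_names`` contains only ``"first.txt"``, whereas ``all_names`` lists
both members.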

.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.

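A short sketch of how :meth:`TarFile.getmember` treats duplicates and missing
names. The archive is built in memory and the member name ``config.ini`` is
invented.

```python
import io
import tarfile

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
for data in [b"old", b"new"]:            # same member name, stored twice
    info = tarfile.TarInfo("config.ini")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
all_names = tar.getnames()               # both occurrences are listed
latest = tar.extractfile(tar.getmember("config.ini")).read()
try:
    tar.getmember("missing.ini")
    missing_found = True
except KeyError:
    missing_found = False
tar.close()
```

``latest`` holds the data of the second occurrence, and looking up
``missing.ini`` raises :exc:`KeyError`.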

.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionadded:: 2.5

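One possible form of the "prior inspection" that the warning for
:meth:`TarFile.extractall` asks for is to pass only vetted members via
*members*. The sketch below is an assumption about what a reasonable check
looks like, not a complete security measure: the helper ``safe_members`` is
hypothetical, it rejects names that resolve outside the destination, and it
simply skips links because their targets are not examined.

```python
import io
import os
import tarfile
import tempfile

def safe_members(tar, dest):
    """Hypothetical helper: yield members whose path stays inside *dest*."""
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        inside = target == root or target.startswith(root + os.sep)
        # Links are skipped too, since their targets are not checked here.
        if inside and not (member.issym() or member.islnk()):
            yield member

# Build a test archive in memory: one harmless member, one trying to escape.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
for name in ["ok.txt", "../evil.txt"]:
    data = b"hello"
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
tar.close()

buf.seek(0)
dest = tempfile.mkdtemp()
tar = tarfile.open(fileobj=buf, mode="r")
accepted = [m.name for m in safe_members(tar, dest)]
tar.extractall(path=dest, members=safe_members(tar, dest))
tar.close()
extracted = sorted(os.listdir(dest))
```

Only ``ok.txt`` is accepted and extracted; the member with ``..`` in its name
never reaches :meth:`extractall`.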

.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only.  It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.

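:meth:`TarFile.extractfile` makes it possible to read a member without writing
anything to disk. In the sketch below the archive lives in memory, and the
member names ``notes.txt`` and ``docs`` are invented for the example.

```python
import io
import tarfile

# Build an archive holding one regular file and one directory entry.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
text = b"first line\nsecond line\n"
info = tarfile.TarInfo("notes.txt")
info.size = len(text)
tar.addfile(info, io.BytesIO(text))
folder = tarfile.TarInfo("docs")
folder.type = tarfile.DIRTYPE
tar.addfile(folder)
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
member_file = tar.extractfile("notes.txt")   # read-only file-like object
lines = member_file.readlines()
not_a_file = tar.extractfile("docs")         # neither a file nor a link
tar.close()
```

``lines`` contains the two lines of ``notes.txt``, and ``not_a_file`` is
:const:`None` because a directory has no data to read.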

.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
   be a function that takes a :class:`TarInfo` object argument and returns the
   changed :class:`TarInfo` object. If it instead returns :const:`None` the :class:`TarInfo`
   object will be excluded from the archive. See :ref:`tar-examples` for an
   example.

   .. versionchanged:: 2.6
      Added the *exclude* parameter.

   .. versionchanged:: 2.7
      Added the *filter* parameter.

   .. deprecated:: 2.7
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.  For maximum portability, *filter* should be used as a keyword
      argument rather than as a positional argument so that code won't be
      affected when *exclude* is ultimately removed.

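The *filter* parameter of :meth:`TarFile.add` can also drop members by
returning :const:`None`. The sketch below assumes a throwaway project tree
created in a temporary directory; the function ``skip_bytecode`` and all file
names are made up.

```python
import os
import tarfile
import tempfile

def skip_bytecode(tarinfo):
    """Drop *.pyc members; pass everything else through unchanged."""
    if tarinfo.name.endswith(".pyc"):
        return None
    return tarinfo

# Hypothetical project tree, created only for this demonstration.
workdir = tempfile.mkdtemp()
project = os.path.join(workdir, "project")
os.mkdir(project)
for filename in ["main.py", "main.pyc"]:
    with open(os.path.join(project, filename), "wb") as f:
        f.write(b"# placeholder\n")

archive_path = os.path.join(workdir, "project.tar")
tar = tarfile.open(archive_path, "w")
# filter is passed by keyword, as the deprecation note recommends.
tar.add(project, arcname="project", filter=skip_bytecode)
tar.close()

tar = tarfile.open(archive_path)
names = sorted(tar.getnames())
tar.close()
```

``names`` contains the directory ``project`` and ``project/main.py``, but not
the ``.pyc`` file.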

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive.  You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid confusion about the file size.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the file object
   *fileobj* (using :func:`os.fstat` on its file descriptor).  You can modify some
   of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.

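:meth:`gettarinfo` and :meth:`addfile` are typically used together when a
file's metadata should be adjusted before it is stored. The following is a
minimal sketch under the assumption that normalising the timestamp and owner
names is what is wanted; the paths and the name ``nobody`` are illustrative.

```python
import io
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "report.txt")
with open(source_path, "wb") as f:
    f.write(b"quarterly numbers")

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")

# Take the metadata from the file on disk, then adjust it before storing.
info = tar.gettarinfo(source_path, arcname="reports/summary.txt")
info.mtime = 0                          # normalise the timestamp
info.uname = info.gname = "nobody"      # hide the local account names
with open(source_path, "rb") as f:      # binary mode, see the addfile() note
    tar.addfile(info, f)
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
stored = tar.getmember("reports/summary.txt")
summary = (stored.size, stored.mtime, stored.uname)
tar.close()
```

The stored member keeps the real size of the file (17 bytes) but carries the
modified timestamp and user name.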

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.posix

   Setting this to :const:`True` is equivalent to setting the :attr:`format`
   attribute to :const:`USTAR_FORMAT`, :const:`False` is equivalent to
   :const:`GNU_FORMAT`.

   .. versionchanged:: 2.4
      *posix* defaults to :const:`False`.

   .. deprecated:: 2.6
      Use the :attr:`format` attribute instead.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.

   .. versionadded:: 2.6


.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   .. versionadded:: 2.6
      Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.

   .. versionadded:: 2.6


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 2.6
      The arguments were added.

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.

   .. versionadded:: 2.6

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
       if tarinfo.isreg():
           print "a regular file."
       elif tarinfo.isdir():
           print "a directory."
       else:
           print "something else."
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (that all other formats are merely variants of)
is that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-ASCII characters. Names (i.e.
filenames, linknames, user/group names) containing these characters will appear
damaged. Unfortunately, there is no way to autodetect the encoding of an
archive.

The pax format was designed to solve this problem. It stores non-ASCII names
using the universal character encoding *UTF-8*. When a pax archive is read,
these *UTF-8* names are converted to the encoding of the local file system.

The details of unicode conversion are controlled by the *encoding* and *errors*
keyword arguments of the :class:`TarFile` class.

The default value for *encoding* is the local character encoding. It is deduced
from :func:`sys.getfilesystemencoding` and :func:`sys.getdefaultencoding`. In
read mode, *encoding* is used exclusively to convert unicode names from a pax
archive to strings in the local character encoding. In write mode, the use of
*encoding* depends on the chosen archive format. In case of :const:`PAX_FORMAT`,
input names that contain non-ASCII characters need to be decoded before being
stored as *UTF-8* strings. The other formats do not make use of *encoding*
unless unicode objects are used as input names. These are converted to 8-bit
character strings before they are added to the archive.

The *errors* argument defines how characters are treated that cannot be
converted to or from *encoding*. Possible values are listed in section
:ref:`codec-base-classes`. In read mode, there is an additional scheme
``'utf-8'`` which means that bad characters are replaced by their *UTF-8*
representation. This is the default scheme. In write mode the default value for
*errors* is ``'strict'`` to ensure that name information is not altered
unnoticed.

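The way the pax format stores non-ASCII names can be observed by looking at the
raw archive bytes. The sketch below is written against Python 3 semantics,
where member names are always text strings; under Python 2 the names read back
are byte strings in *encoding* instead, so the final comparison would differ.
The member name is invented.

```python
import io
import tarfile

name = u"caf\u00e9.txt"   # a member name with one non-ASCII character

# Write an (empty) member with that name into a pax archive.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT)
tar.addfile(tarfile.TarInfo(name))
tar.close()

# The extended header carries the name as UTF-8, whatever the local encoding.
raw = buf.getvalue()
stored_as_utf8 = name.encode("utf-8") in raw

# Reading converts the UTF-8 name back according to *encoding*.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r", encoding="utf-8")
names = tar.getnames()
tar.close()
```

On Python 3, ``stored_as_utf8`` is true and ``names`` contains the original
name unchanged.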