:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. versionadded:: 2.3

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

  .. versionadded:: 2.6

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is expected to be positioned at offset 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object does not support random
   access; see :ref:`tar-examples`. The currently possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

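As a rough illustration of the two mode families described for
:func:`tarfile.open`, the following sketch builds a small gzip compressed
archive in memory and reads it back in three ways. The member name
``greeting.txt`` and the use of an in-memory buffer are assumptions made for
the example; a real program would usually pass a file name instead.

```python
import io
import tarfile

# Build a small gzip compressed archive in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"hello tar"
    info = tarfile.TarInfo(name="greeting.txt")  # hypothetical member name
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Mode 'r' detects the compression transparently.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()

# Naming the wrong compression explicitly is expected to raise ReadError.
buf.seek(0)
try:
    tarfile.open(fileobj=buf, mode="r:bz2")
    wrong_mode_failed = False
except tarfile.ReadError:
    wrong_mode_failed = True

# Stream mode reads the blocks strictly in order, without seeking.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r|*") as tar:
    stream_names = [member.name for member in tar]

print(names)
print(stream_names)
```

In stream mode the members should be processed in the order in which they are
iterated, because the underlying object is never rewound.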

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

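A minimal sketch of :func:`is_tarfile`, assuming a temporary directory and two
hypothetical file names (``sample.tar`` and ``notes.txt``) that exist only for
this example:

```python
import os
import shutil
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "sample.tar")
text_path = os.path.join(workdir, "notes.txt")

# An ordinary text file that is not an archive.
with open(text_path, "w") as f:
    f.write("plain text, not an archive\n")

# A real tar archive that contains the text file.
with tarfile.open(archive_path, "w") as tar:
    tar.add(text_path, arcname="notes.txt")

archive_ok = tarfile.is_tarfile(archive_path)
text_ok = tarfile.is_tarfile(text_path)
print(archive_ok)
print(text_ok)

shutil.rmtree(workdir)
```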

.. class:: TarFileCompat(filename, mode='r', compression=TAR_PLAIN)

   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
   Please consult the documentation of the :mod:`zipfile` module for more details.
   *compression* must be one of the following constants:


   .. data:: TAR_PLAIN

      Constant for an uncompressed tar archive.


   .. data:: TAR_GZIPPED

      Constant for a :mod:`gzip` compressed tar archive.


   .. deprecated:: 2.6
      The :class:`TarFileCompat` class has been deprecated for removal in Python 3.0.


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

   .. versionadded:: 2.6


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


The following variables are available on module level:


.. data:: ENCODING

   The default character encoding, i.e. the value from either
   :func:`sys.getfilesystemencoding` or :func:`sys.getdefaultencoding`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 2.7
   Added support for the context manager protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   .. versionadded:: 2.6

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   .. versionadded:: 2.6

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments control the way strings are converted to
   unicode objects and vice versa. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionadded:: 2.6

   The *pax_headers* argument is an optional dictionary of unicode strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionadded:: 2.6

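The effect of *ignore_zeros* can be sketched with two archives that are simply
concatenated. The helper ``make_archive`` and the member names are hypothetical
and exist only for this example; the archives are kept in memory so that the
sketch is self-contained.

```python
import io
import tarfile

def make_archive(member_name, payload):
    """Return the bytes of an uncompressed archive holding one member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

# Two complete archives glued together, as "cat a.tar b.tar" would produce.
combined = make_archive("first.txt", b"one") + make_archive("second.txt", b"two")

# By default, reading stops at the empty blocks that end the first archive.
with tarfile.open(fileobj=io.BytesIO(combined), mode="r:") as tar:
    default_names = tar.getnames()

# With ignore_zeros=True the empty blocks are skipped and reading continues.
with tarfile.open(fileobj=io.BytesIO(combined), mode="r:", ignore_zeros=True) as tar:
    all_names = tar.getnames()

print(default_names)
print(all_names)
```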

.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.

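The behaviour of :meth:`getmember` for repeated and missing names can be
sketched as follows. The helper ``add_bytes`` and the member name
``config.txt`` are assumptions made for the example.

```python
import io
import tarfile

def add_bytes(tar, name, payload):
    """Store *payload* in *tar* under *name* (hypothetical helper)."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Store the same name twice; a tar archive allows this.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "config.txt", b"old settings")
    add_bytes(tar, "config.txt", b"new settings")

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    occurrences = len(tar.getmembers())
    # getmember() picks the last occurrence of the name.
    latest = tar.extractfile(tar.getmember("config.txt")).read()
    try:
        tar.getmember("missing.txt")
        missing_raises = False
    except KeyError:
        missing_raises = True

print(occurrences)
print(latest)
```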

.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionadded:: 2.5

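One possible way to inspect an archive before extraction is to pass only
vetted members to :meth:`extractall`. The generator ``safe_members`` below is a
hypothetical helper, not part of the module, and the directory name
``unpacked`` is an assumption. It is a sketch of the idea rather than a
complete defence: it only rejects links, special files and names that resolve
outside the destination.

```python
import io
import os
import tarfile

def safe_members(tar, dest):
    """Yield only regular files and directories that stay inside *dest*."""
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        inside = target == root or target.startswith(root + os.sep)
        if inside and (member.isreg() or member.isdir()):
            yield member

# Build a small archive with one harmless and two suspicious names.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ["ok.txt", "../escape.txt", "/absolute.txt"]:
        tar.addfile(tarfile.TarInfo(name=name))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    accepted = [member.name for member in safe_members(tar, "unpacked")]
    # A real program would now extract only the vetted members:
    # tar.extractall(path="unpacked", members=safe_members(tar, "unpacked"))

print(accepted)
```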

.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.

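A short sketch of :meth:`extractfile`, which reads a member's data without
writing anything to disk. The member names are assumptions made for the
example, and the archive is built in memory.

```python
import io
import tarfile

# Build an archive with one regular file and one directory entry.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    payload = b"line one\nline two\n"
    info = tarfile.TarInfo(name="docs/readme.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

    folder = tarfile.TarInfo(name="docs/empty")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    # A regular file yields a read-only file-like object.
    member_file = tar.extractfile("docs/readme.txt")
    lines = member_file.readlines()
    member_file.close()
    # A directory has no data, so None is returned.
    directory_result = tar.extractfile("docs/empty")

print(lines)
print(directory_result)
```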

.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
   be a function that takes a :class:`TarInfo` object argument and returns the
   changed :class:`TarInfo` object. If it instead returns :const:`None` the :class:`TarInfo`
   object will be excluded from the archive. See :ref:`tar-examples` for an
   example.

   .. versionchanged:: 2.6
      Added the *exclude* parameter.

   .. versionchanged:: 2.7
      Added the *filter* parameter.

   .. deprecated:: 2.7
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead. For maximum portability, *filter* should be used as a keyword
      argument rather than as a positional argument so that code won't be
      affected when *exclude* is ultimately removed.

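A *filter* function can also leave members out by returning :const:`None`.
The sketch below assumes a temporary ``project`` directory with two
placeholder files; the function ``skip_bytecode`` and all file names are
hypothetical.

```python
import os
import shutil
import tarfile
import tempfile

def skip_bytecode(tarinfo):
    """Return None for .pyc files so that add() leaves them out."""
    if tarinfo.name.endswith(".pyc"):
        return None
    return tarinfo

# Create a small directory tree to archive.
workdir = tempfile.mkdtemp()
project = os.path.join(workdir, "project")
os.mkdir(project)
for filename in ["main.py", "main.pyc"]:
    with open(os.path.join(project, filename), "w") as f:
        f.write("# placeholder\n")

archive_path = os.path.join(workdir, "project.tar")
with tarfile.open(archive_path, "w") as tar:
    # filter is passed as a keyword argument, as recommended above.
    tar.add(project, arcname="project", filter=skip_bytecode)

with tarfile.open(archive_path) as tar:
    archived = sorted(tar.getnames())

print(archived)
shutil.rmtree(workdir)
```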

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid a mismatch between the recorded file size and the data actually read.

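:meth:`addfile` is also a convenient way to store data that never existed as a
file. In the following sketch the member name ``report.txt`` and the chosen
permission bits are assumptions. Note that ``size`` has to be set by hand,
because exactly ``tarinfo.size`` bytes are read from *fileobj*.

```python
import io
import tarfile
import time

payload = b"generated at run time\n"

# Describe the member by hand instead of using gettarinfo().
info = tarfile.TarInfo(name="report.txt")
info.size = len(payload)        # without this, no data would be stored
info.mtime = int(time.time())
info.mode = 0o644

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.addfile(info, io.BytesIO(payload))

# Read the archive back to check what was stored.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    stored = tar.getmember("report.txt")
    stored_size = stored.size
    stored_payload = tar.extractfile(stored).read()

print(stored_size)
```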

.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the file object
   *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify some
   of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.posix

   Setting this to :const:`True` is equivalent to setting the :attr:`format`
   attribute to :const:`USTAR_FORMAT`, :const:`False` is equivalent to
   :const:`GNU_FORMAT`.

   .. versionchanged:: 2.4
      *posix* defaults to :const:`False`.

   .. deprecated:: 2.6
      Use the :attr:`format` attribute instead.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.

   .. versionadded:: 2.6


.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   .. versionadded:: 2.6
      Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.

   .. versionadded:: 2.6


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 2.6
      The arguments were added.

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is_*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.

   .. versionadded:: 2.6

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.

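The query methods can be combined into a small classifier. The function
``describe`` and its labels are hypothetical; the :class:`TarInfo` objects are
created by hand here, whereas in practice they would come from an archive.

```python
import tarfile

def describe(tarinfo):
    """Return a short label for the kind of member (hypothetical helper)."""
    if tarinfo.isreg():
        return "regular file"
    if tarinfo.isdir():
        return "directory"
    if tarinfo.issym():
        return "symbolic link"
    if tarinfo.islnk():
        return "hard link"
    if tarinfo.isdev():
        return "device or FIFO"
    return "other"

# Build one TarInfo per type constant and classify it.
samples = {}
for label, member_type in [
    ("file", tarfile.REGTYPE),
    ("dir", tarfile.DIRTYPE),
    ("sym", tarfile.SYMTYPE),
    ("fifo", tarfile.FIFOTYPE),
]:
    info = tarfile.TarInfo(name=label)
    info.type = member_type
    samples[label] = describe(info)

print(samples)
```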

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       if tarinfo.isreg():
           kind = "a regular file."
       elif tarinfo.isdir():
           kind = "a directory."
       else:
           kind = "something else."
       # A single formatted string prints the same way on Python 2 and 3.
       print("%s is %d bytes in size and is %s" % (tarinfo.name, tarinfo.size, kind))
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: Extended headers only affect the subsequent file header; global
  headers are valid for the complete archive and affect all following files. All
  the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

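The different name limits of the three writable formats can be observed with a
deliberately long member name. The helper ``try_format`` and the 300-character
name are assumptions made for this sketch; the archives are kept in memory.

```python
import io
import tarfile

long_name = "x" * 300  # longer than the ustar format can store

def try_format(archive_format):
    """Return the stored names, or None if the format rejects the name."""
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w", format=archive_format)
    try:
        tar.addfile(tarfile.TarInfo(name=long_name))
    except ValueError:
        # Raised when the name does not fit into the header fields.
        return None
    finally:
        tar.close()
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as archive:
        return archive.getnames()

ustar_result = try_format(tarfile.USTAR_FORMAT)
gnu_result = try_format(tarfile.GNU_FORMAT)
pax_result = try_format(tarfile.PAX_FORMAT)

print(ustar_result is None)
print(gnu_result == [long_name])
print(pax_result == [long_name])
```

The GNU and pax formats store the long name through their respective
extensions, so it should survive a round trip unchanged.
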

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (of which all other formats are merely variants)
is that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-ASCII characters. Names (i.e.
filenames, linknames, user/group names) containing these characters will appear
damaged. Unfortunately, there is no way to autodetect the encoding of an
archive.

The pax format was designed to solve this problem. It stores non-ASCII names
using the universal character encoding *UTF-8*. When a pax archive is read,
these *UTF-8* names are converted to the encoding of the local file system.

The details of unicode conversion are controlled by the *encoding* and *errors*
keyword arguments of the :class:`TarFile` class.

The default value for *encoding* is the local character encoding. It is deduced
from :func:`sys.getfilesystemencoding` and :func:`sys.getdefaultencoding`. In
read mode, *encoding* is used exclusively to convert unicode names from a pax
archive to strings in the local character encoding. In write mode, the use of
*encoding* depends on the chosen archive format. In case of :const:`PAX_FORMAT`,
input names that contain non-ASCII characters need to be decoded before being
stored as *UTF-8* strings. The other formats do not make use of *encoding*
unless unicode objects are used as input names. These are converted to 8-bit
character strings before they are added to the archive.

The *errors* argument defines how characters that cannot be converted to or
from *encoding* are treated. Possible values are listed in section
:ref:`codec-base-classes`. In read mode, there is an additional scheme
``'utf-8'`` which means that bad characters are replaced by their *UTF-8*
representation. This is the default scheme. In write mode the default value for
*errors* is ``'strict'`` to ensure that name information is not altered
unnoticed.
777