:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>


The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is expected to be at position 0.
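
   A minimal sketch of how the compression part of *mode* behaves, using an
   in-memory archive so that it is self-contained. The member name
   ``hello.txt`` and the :class:`io.BytesIO` buffer are illustrative
   assumptions, not something the module requires::

      import io
      import tarfile

      # Build a small gzip-compressed archive in memory.
      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w:gz") as tar:
          data = b"hello"
          info = tarfile.TarInfo(name="hello.txt")
          info.size = len(data)
          tar.addfile(info, io.BytesIO(data))

      # 'r' (the same as 'r:*') detects the compression transparently.
      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r") as tar:
          names = tar.getnames()

      # 'r:' accepts only uncompressed archives, so the gzip data is rejected.
      buf.seek(0)
      try:
          tarfile.open(fileobj=buf, mode="r:")
          rejected = False
      except tarfile.ReadError:
          rejected = True

   Catching :exc:`ReadError` like this is mainly useful when the mode is
   dictated by something outside the program; otherwise plain ``'r'`` is the
   simpler choice.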

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
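
   The stream modes can be tried out without a pipe or socket. The sketch
   below assumes :class:`io.BytesIO` objects as stand-ins for the stream
   endpoints and a made-up member name ``data.txt``; each member is read as
   it is encountered, because a stream cannot be rewound::

      import io
      import tarfile

      payload = b"streamed data\n"

      # Write a gzip-compressed stream; no seeking is done on 'sink'.
      sink = io.BytesIO()
      with tarfile.open(fileobj=sink, mode="w|gz") as tar:
          info = tarfile.TarInfo(name="data.txt")
          info.size = len(payload)
          tar.addfile(info, io.BytesIO(payload))

      # Read it back as a stream, handling members strictly in order.
      source = io.BytesIO(sink.getvalue())
      contents = {}
      with tarfile.open(fileobj=source, mode="r|*") as tar:
          for member in tar:
              extracted = tar.extractfile(member)
              if extracted is not None:
                  contents[member.name] = extracted.read()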


.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.
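
   As a small illustration, the following creates one archive and one plain
   text file in a temporary directory and checks both. The file names are
   arbitrary choices for this sketch::

      import os
      import tarfile
      import tempfile

      with tempfile.TemporaryDirectory() as tmp:
          archive = os.path.join(tmp, "sample.tar")
          plain = os.path.join(tmp, "notes.txt")

          # A real archive with a single, empty member.
          with tarfile.open(archive, "w") as tar:
              tar.addfile(tarfile.TarInfo(name="empty.txt"))

          # An ordinary text file that merely sits next to it.
          with open(plain, "w") as f:
              f.write("just some text\n")

          results = (tarfile.is_tarfile(archive), tarfile.is_tarfile(plain))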


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.



Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


The following variables are available on module level:


.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows,
   :func:`sys.getfilesystemencoding` otherwise.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context manager protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.
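
   The effect of *ignore_zeros* on concatenated archives can be sketched as
   follows. ``make_archive`` is a hypothetical helper written for this
   example, and the archives live in memory only::

      import io
      import tarfile

      def make_archive(name, data):
          """Return the bytes of an uncompressed archive with one member."""
          buf = io.BytesIO()
          with tarfile.open(fileobj=buf, mode="w") as tar:
              info = tarfile.TarInfo(name=name)
              info.size = len(data)
              tar.addfile(info, io.BytesIO(data))
          return buf.getvalue()

      # Two complete archives glued together, end markers included.
      combined = make_archive("a.txt", b"A") + make_archive("b.txt", b"B")

      # By default, reading stops at the first end-of-archive marker.
      with tarfile.open(fileobj=io.BytesIO(combined)) as tar:
          default_names = tar.getnames()

      # With ignore_zeros=True the empty blocks are skipped instead.
      with tarfile.open(fileobj=io.BytesIO(combined), ignore_zeros=True) as tar:
          all_names = tar.getnames()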

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.


.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.
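
   A short sketch of that rule, storing the same name twice in an in-memory
   archive. ``add_bytes`` is a hypothetical helper for this example only::

      import io
      import tarfile

      def add_bytes(tar, name, data):
          """Store *data* under *name* in an archive opened for writing."""
          info = tarfile.TarInfo(name=name)
          info.size = len(data)
          tar.addfile(info, io.BytesIO(data))

      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          add_bytes(tar, "config.ini", b"version = 1\n")
          add_bytes(tar, "config.ini", b"version = 2\n")

      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r") as tar:
          all_names = tar.getnames()             # both copies, in archive order
          latest = tar.getmember("config.ini")   # the last occurrence
          latest_data = tar.extractfile(latest).read()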


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This works around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files into it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.
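
   One possible form of such an inspection is sketched below. ``safe_members``
   is a hypothetical helper, not part of the module, and it is deliberately
   incomplete: it only looks at member names and does not examine link
   targets or device files, so it should be read as a starting point rather
   than a full defence::

      import io
      import os
      import tarfile

      def safe_members(tar, dest):
          """Yield only members whose extraction path stays inside *dest*."""
          root = os.path.realpath(dest)
          for member in tar.getmembers():
              target = os.path.realpath(os.path.join(root, member.name))
              if os.path.commonpath([root, target]) == root:
                  yield member

      # Demonstration with an in-memory archive holding two suspicious names.
      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          for name in ["ok.txt", "../evil.txt", "/abs.txt"]:
              tar.addfile(tarfile.TarInfo(name=name))

      buf.seek(0)
      with tarfile.open(fileobj=buf) as tar:
          accepted = [m.name for m in safe_members(tar, "unpacked")]

   The generator can be passed directly as the *members* argument, for
   example ``tar.extractall("unpacked", members=safe_members(tar, "unpacked"))``.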


.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.
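
   The following sketch shows both outcomes on an in-memory archive that
   holds one directory and one regular file. The member names are made up
   for the example::

      import io
      import tarfile

      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          folder = tarfile.TarInfo(name="docs")
          folder.type = tarfile.DIRTYPE
          tar.addfile(folder)

          text = b"first line\nsecond line\n"
          readme = tarfile.TarInfo(name="docs/readme.txt")
          readme.size = len(text)
          tar.addfile(readme, io.BytesIO(text))

      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r") as tar:
          directory_obj = tar.extractfile("docs")   # neither a file nor a link
          member_file = tar.extractfile("docs/readme.txt")
          lines = list(member_file)                 # iterate over its lines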


.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given,
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
   be a function that takes a :class:`TarInfo` object argument and returns the
   changed :class:`TarInfo` object. If it instead returns :const:`None` the :class:`TarInfo`
   object will be excluded from the archive. See :ref:`tar-examples` for an
   example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid problems with the file size.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the file object
   *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify some
   of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.
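
   A sketch of the :meth:`gettarinfo` and :meth:`addfile` pair working
   together. The temporary file, the archive name ``reports/report.txt`` and
   the replacement values for the timestamp and owner are assumptions chosen
   for this example::

      import io
      import os
      import tarfile
      import tempfile

      buf = io.BytesIO()
      with tempfile.TemporaryDirectory() as tmp:
          path = os.path.join(tmp, "report.txt")
          with open(path, "wb") as f:
              f.write(b"quarterly numbers\n")

          with tarfile.open(fileobj=buf, mode="w") as tar:
              info = tar.gettarinfo(path, arcname="reports/report.txt")
              info.mtime = 0                       # normalise the timestamp
              info.uname = info.gname = "nobody"   # hide the real owner
              with open(path, "rb") as f:          # binary mode, see addfile()
                  tar.addfile(info, f)

      buf.seek(0)
      with tarfile.open(fileobj=buf) as tar:
          stored = tar.getmember("reports/report.txt")
          summary = (stored.size, stored.mtime, stored.uname)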


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is_*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the sentence on one line with a space before the type.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at most 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability reasons.
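
The two flavours of pax headers can be observed with a short sketch. The
keywords ``comment`` and ``origin`` and the long member name are invented for
this example, and the archive is kept in memory::

   import io
   import tarfile

   long_name = "dir/" + "n" * 300 + ".txt"   # too long for a ustar header

   buf = io.BytesIO()
   with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                     pax_headers={"comment": "nightly build"}) as tar:
       data = b"x"
       info = tarfile.TarInfo(name=long_name)
       info.size = len(data)
       info.pax_headers = {"origin": "example"}   # extended header keyword
       tar.addfile(info, io.BytesIO(data))

   buf.seek(0)
   with tarfile.open(fileobj=buf) as tar:
       member = tar.getmembers()[0]
       global_headers = dict(tar.pax_headers)
       member_headers = dict(member.pax_headers)

After reading, ``global_headers`` carries the archive-wide ``comment``
keyword, while ``member_headers`` carries the per-member ``origin`` keyword,
and ``member.name`` holds the full long name.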

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :data:`ENCODING` (``'utf-8'`` on Windows,
:func:`sys.getfilesystemencoding` otherwise). Depending on whether the archive
is read or written, the metadata must be either decoded or encoded. If
*encoding* is not set appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`codec-base-classes`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.

In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
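
The interplay of *encoding* and the default ``'surrogateescape'`` error
handler can be sketched with a GNU format archive, which stores names as raw
bytes. ``roundtrip`` is a hypothetical helper for this example, and the
encodings are arbitrary choices::

   import io
   import tarfile

   def roundtrip(name, write_encoding, read_encoding):
       """Write *name* with one encoding and read it back with another."""
       buf = io.BytesIO()
       with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                         encoding=write_encoding) as tar:
           tar.addfile(tarfile.TarInfo(name=name))
       buf.seek(0)
       with tarfile.open(fileobj=buf, encoding=read_encoding) as tar:
           return tar.getnames()[0]

   original = "caf\u00e9.txt"
   matching = roundtrip(original, "iso-8859-1", "iso-8859-1")
   mismatched = roundtrip(original, "iso-8859-1", "utf-8")

With matching encodings the name survives unchanged. With the mismatch, the
byte that is not valid *UTF-8* is not lost but comes back as a lone surrogate
(``"caf\udce9.txt"``), which is what allows it to be written out again
byte for byte.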