:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. versionadded:: 2.3

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

  .. versionadded:: 2.6

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.
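
   As a minimal sketch of the difference between an explicit compression mode
   and the transparent mode ``'r'`` (the file names are hypothetical and are
   created in a temporary directory; it assumes the :mod:`bz2` module is
   available)::

      import os
      import tarfile
      import tempfile

      workdir = tempfile.mkdtemp()
      payload = os.path.join(workdir, "hello.txt")
      with open(payload, "w") as f:
          f.write("hello")

      # Write a gzip compressed archive.
      archive = os.path.join(workdir, "sample.tar.gz")
      tar = tarfile.open(archive, "w:gz")
      tar.add(payload, arcname="hello.txt")
      tar.close()

      # Asking for bzip2 on a gzip archive fails with ReadError ...
      try:
          tarfile.open(archive, "r:bz2")
          wrong_mode_failed = False
      except tarfile.ReadError:
          wrong_mode_failed = True

      # ... while mode 'r' detects the compression automatically.
      tar = tarfile.open(archive, "r")
      names = tar.getnames()
      tar.close()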

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is supposed to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
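
   The stream modes can be tried out without a pipe or socket. The following
   sketch uses an in-memory buffer as a stand-in for such a stream (the member
   name ``payload.bin`` is made up for illustration) and reads the members
   strictly in order, as a stream requires::

      import io
      import tarfile

      data = b"streamed payload"

      # Write a gzip compressed stream into an in-memory buffer.
      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w|gz")
      info = tarfile.TarInfo(name="payload.bin")
      info.size = len(data)
      tar.addfile(info, io.BytesIO(data))
      tar.close()

      # Read it back sequentially; each member is consumed before the next.
      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r|*")
      contents = {}
      for member in tar:
          contents[member.name] = tar.extractfile(member).read()
      tar.close()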


.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


.. class:: TarFileCompat(filename, mode='r', compression=TAR_PLAIN)

   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
   Please consult the documentation of the :mod:`zipfile` module for more details.
   *compression* must be one of the following constants:


   .. data:: TAR_PLAIN

      Constant for an uncompressed tar archive.


   .. data:: TAR_GZIPPED

      Constant for a :mod:`gzip` compressed tar archive.


   .. deprecated:: 2.6
      The :class:`TarFileCompat` class has been deprecated for removal in Python 3.0.


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

   .. versionadded:: 2.6


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


The following variables are available on module level:


.. data:: ENCODING

   The default character encoding, i.e. the value from either
   :func:`sys.getfilesystemencoding` or :func:`sys.getdefaultencoding`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 2.7
   Added support for the context manager protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   .. versionadded:: 2.6

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   .. versionadded:: 2.6

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments control the way strings are converted to
   unicode objects and vice versa. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionadded:: 2.6

   The *pax_headers* argument is an optional dictionary of unicode strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionadded:: 2.6


.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.
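
   A small sketch of this behaviour, using an in-memory archive that stores the
   same name twice (``add_bytes`` and ``notes.txt`` are hypothetical names used
   only for this illustration)::

      import io
      import tarfile

      def add_bytes(tar, name, data):
          # Hypothetical helper: store the bytes *data* under *name*.
          info = tarfile.TarInfo(name=name)
          info.size = len(data)
          tar.addfile(info, io.BytesIO(data))

      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w")
      add_bytes(tar, "notes.txt", b"first version")
      add_bytes(tar, "notes.txt", b"second version")
      tar.close()

      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r")
      count = tar.getnames().count("notes.txt")   # both copies are listed
      latest = tar.extractfile(tar.getmember("notes.txt")).read()
      tar.close()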


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionadded:: 2.5
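
   One possible form of such an inspection is sketched below. ``safe_members``
   is a hypothetical helper, not part of the module. It only checks member
   names and ignores link targets, so it should be read as an illustration of
   the *members* argument rather than a complete defence::

      import io
      import os
      import tarfile
      import tempfile

      def safe_members(tar, path="."):
          # Hypothetical helper: yield only members whose names resolve
          # to a location inside *path*.
          base = os.path.realpath(path)
          for member in tar.getmembers():
              target = os.path.realpath(os.path.join(base, member.name))
              if target.startswith(base + os.sep):
                  yield member

      # Build a small in-memory archive with two suspicious names.
      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w")
      for name in ["ok.txt", "../evil.txt", "/abs.txt"]:
          tar.addfile(tarfile.TarInfo(name=name))
      tar.close()

      buf.seek(0)
      dest = tempfile.mkdtemp()
      tar = tarfile.open(fileobj=buf, mode="r")
      kept = [m.name for m in safe_members(tar, dest)]
      tar.extractall(dest, members=safe_members(tar, dest))
      tar.close()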


.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.
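
   The following sketch reads a member line by line without writing anything
   to disk, and shows that a directory member yields :const:`None`. The member
   names are invented for the example::

      import io
      import tarfile

      text = b"alpha\nbeta\n"

      # Build an in-memory archive with one regular file and one directory.
      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w")
      info = tarfile.TarInfo(name="words.txt")
      info.size = len(text)
      tar.addfile(info, io.BytesIO(text))
      dirinfo = tarfile.TarInfo(name="subdir")
      dirinfo.type = tarfile.DIRTYPE
      tar.addfile(dirinfo)
      tar.close()

      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r")
      member_file = tar.extractfile("words.txt")
      lines = [line for line in member_file]    # iterate over the lines
      not_a_file = tar.extractfile("subdir")    # neither a file nor a link
      tar.close()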


.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
   be a function that takes a :class:`TarInfo` object argument and returns the
   changed :class:`TarInfo` object. If it instead returns :const:`None` the :class:`TarInfo`
   object will be excluded from the archive. See :ref:`tar-examples` for an
   example.

   .. versionchanged:: 2.6
      Added the *exclude* parameter.

   .. versionchanged:: 2.7
      Added the *filter* parameter.

   .. deprecated:: 2.7
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead. For maximum portability, *filter* should be used as a keyword
      argument rather than as a positional argument so that code won't be
      affected when *exclude* is ultimately removed.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid irritation about the file size.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the file object
   *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify some
   of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.
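
   As a sketch of how :meth:`gettarinfo` and :meth:`addfile` fit together, the
   following creates a :class:`TarInfo` from a file on disk, adjusts a few
   attributes and then stores the file. The paths and the ``"nobody"`` owner
   name are arbitrary choices for the example::

      import os
      import tarfile
      import tempfile

      workdir = tempfile.mkdtemp()
      source = os.path.join(workdir, "report.txt")
      with open(source, "wb") as f:
          f.write(b"quarterly numbers")

      archive = os.path.join(workdir, "reports.tar")
      tar = tarfile.open(archive, "w")
      info = tar.gettarinfo(source, arcname="2011/report.txt")
      info.mtime = 0                      # normalise the timestamp
      info.uname = info.gname = "nobody"  # hide the real owner names
      with open(source, "rb") as f:       # binary mode, as addfile() expects
          tar.addfile(info, f)
      tar.close()

      # Read the stored header back to check the modified attributes.
      tar = tarfile.open(archive)
      stored = tar.getmember("2011/report.txt")
      tar.close()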


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.posix

   Setting this to :const:`True` is equivalent to setting the :attr:`format`
   attribute to :const:`USTAR_FORMAT`, :const:`False` is equivalent to
   :const:`GNU_FORMAT`.

   .. versionchanged:: 2.4
      *posix* defaults to :const:`False`.

   .. deprecated:: 2.6
      Use the :attr:`format` attribute instead.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.

   .. versionadded:: 2.6


.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   .. versionadded:: 2.6
      Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.

   .. versionadded:: 2.6


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 2.6
      The arguments were added.

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is_*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.

   .. versionadded:: 2.6

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
       if tarinfo.isreg():
           print "a regular file."
       elif tarinfo.isdir():
           print "a directory."
       else:
           print "something else."
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()
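
A *filter* function can also drop members by returning :const:`None`. The
following is a minimal sketch of that use. ``skip_backups`` and the file names
are hypothetical, and the files are created in a temporary directory so that
the example is self-contained::

   import os
   import tarfile
   import tempfile

   def skip_backups(tarinfo):
       # Returning None excludes the member; returning tarinfo keeps it.
       if tarinfo.name.endswith("~"):
           return None
       return tarinfo

   workdir = tempfile.mkdtemp()
   project = os.path.join(workdir, "project")
   os.mkdir(project)
   for name in ["main.py", "main.py~"]:
       with open(os.path.join(project, name), "w") as f:
           f.write("pass\n")

   archive = os.path.join(workdir, "project.tar")
   tar = tarfile.open(archive, "w")
   tar.add(project, arcname="project", filter=skip_backups)
   tar.close()

   tar = tarfile.open(archive)
   names = sorted(tar.getnames())
   tar.close()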


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.
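
The two kinds of pax headers can be observed with a short sketch. It writes an
in-memory pax archive with one global header and one member whose non-ASCII
name cannot be stored in a plain ustar header. The header key ``comment`` and
the file name are arbitrary choices for this illustration::

   import io
   import tarfile

   buf = io.BytesIO()
   tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      pax_headers={u"comment": u"nightly build"})
   info = tarfile.TarInfo(name=u"r\u00e9sum\u00e9.txt")   # non-ASCII name
   tar.addfile(info)
   tar.close()

   buf.seek(0)
   tar = tarfile.open(fileobj=buf, mode="r")
   global_headers = dict(tar.pax_headers)   # from the global header
   member = tar.getmembers()[0]             # carries its extended header
   tar.close()

Here ``global_headers`` should contain the ``comment`` entry, and
``member.pax_headers`` should additionally contain a ``path`` entry holding the
full member name.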

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (of which all other formats are merely variants)
is that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-ASCII characters. Names (i.e.
filenames, linknames, user/group names) containing these characters will appear
damaged. Unfortunately, there is no way to autodetect the encoding of an
archive.

The pax format was designed to solve this problem. It stores non-ASCII names
using the universal character encoding *UTF-8*. When a pax archive is read,
these *UTF-8* names are converted to the encoding of the local file system.

The details of unicode conversion are controlled by the *encoding* and *errors*
keyword arguments of the :class:`TarFile` class.

The default value for *encoding* is the local character encoding. It is deduced
from :func:`sys.getfilesystemencoding` and :func:`sys.getdefaultencoding`. In
read mode, *encoding* is used exclusively to convert unicode names from a pax
archive to strings in the local character encoding. In write mode, the use of
*encoding* depends on the chosen archive format. In case of :const:`PAX_FORMAT`,
input names that contain non-ASCII characters need to be decoded before being
stored as *UTF-8* strings. The other formats do not make use of *encoding*
unless unicode objects are used as input names. These are converted to 8-bit
character strings before they are added to the archive.

The *errors* argument defines how characters are treated that cannot be
converted to or from *encoding*. Possible values are listed in section
:ref:`codec-base-classes`. In read mode, there is an additional scheme
``'utf-8'`` which means that bad characters are replaced by their *UTF-8*
representation. This is the default scheme. In write mode the default value for
*errors* is ``'strict'`` to ensure that name information is not altered
unnoticed.
