:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>


The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
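
As a rough illustration of the stream modes, the following sketch writes a
gzip compressed archive with ``'w|gz'`` and reads it back with ``'r|*'``. An
:class:`io.BytesIO` buffer stands in for a pipe or socket, the member name
``note.txt`` is invented for the example, and the ``with`` statement assumes a
Python version in which :class:`TarFile` supports the context manager protocol.

```python
import io
import tarfile

# Write an archive as a forward-only stream.
# BytesIO is only a stand-in for a pipe, socket or tape device.
sink = io.BytesIO()
with tarfile.open(fileobj=sink, mode="w|gz") as tar:
    data = b"streamed\n"
    info = tarfile.TarInfo(name="note.txt")  # hypothetical member name
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Read it back sequentially. Each member is handled as it is encountered,
# because a stream cannot seek backwards.
source = io.BytesIO(sink.getvalue())
seen = []
with tarfile.open(fileobj=source, mode="r|*") as tar:
    for member in tar:
        content = tar.extractfile(member).read()
        seen.append((member.name, content))

print(seen)
```

Reading each member inside the loop keeps the access pattern strictly
sequential, which is the only pattern a stream-like :class:`TarFile` permits.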


.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.
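
A minimal sketch of how :func:`is_tarfile` might be used to tell an archive
apart from an ordinary file. The file names are made up and both files are
created in a temporary directory so that the example is self-contained.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "notes.txt")     # hypothetical plain text file
    archive = os.path.join(tmp, "sample.tar")  # hypothetical archive

    with open(plain, "w") as f:
        f.write("not an archive\n")

    # Create a small uncompressed archive that contains the text file.
    with tarfile.open(archive, "w") as tar:
        tar.add(plain, arcname="notes.txt")

    archive_ok = tarfile.is_tarfile(archive)
    plain_ok = tarfile.is_tarfile(plain)

print(archive_ok, plain_ok)
```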


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.
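
The sketch below shows one situation in which :exc:`ReadError` can be expected:
a gzip compressed archive is opened with mode ``'r:'``, which insists on
uncompressed data. The archive is built in memory and the member name
``hello.txt`` is illustrative only.

```python
import io
import tarfile

# Build a small gzip compressed archive in memory (illustrative data only).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    payload = b"hello\n"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Mode 'r' (the same as 'r:*') detects the compression transparently.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()

# Mode 'r:' only accepts uncompressed archives, so the gzip data is rejected.
buf.seek(0)
try:
    tarfile.open(fileobj=buf, mode="r:")
    rejected = False
except tarfile.ReadError:
    rejected = True

print(names, rejected)
```

Because :exc:`ReadError` derives from :exc:`TarError`, catching the base class
would handle this case as well.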


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.



Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


The following variables are available on module level:


.. data:: ENCODING

   The default character encoding, i.e. the value from either
   :func:`sys.getfilesystemencoding` or :func:`sys.getdefaultencoding`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.


.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.


.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.
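
To make the relationship between :meth:`getmember`, :meth:`getmembers` and
:meth:`getnames` more concrete, here is a small sketch that stores the same
name twice. The helper ``add_bytes`` and the member names are hypothetical and
exist only for this example.

```python
import io
import tarfile

def add_bytes(tar, name, data):
    """Hypothetical helper: store *data* in *tar* under *name*."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_bytes(tar, "a.txt", b"first")
    add_bytes(tar, "b.txt", b"other")
    add_bytes(tar, "a.txt", b"second version")  # same name stored again

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()                        # archive order, duplicates kept
    sizes = [m.size for m in tar.getmembers()]    # same order as getnames()
    latest_size = tar.getmember("a.txt").size     # last occurrence wins

print(names, sizes, latest_size)
```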


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.
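
One possible form of the "prior inspection" mentioned in the warning is
sketched below. The generator ``safe_members`` is a hypothetical helper, not
part of :mod:`tarfile`; it keeps only members whose resolved path stays inside
the destination directory. It looks at member names only and does not examine
link targets, so it should be read as a starting point rather than a complete
safeguard.

```python
import io
import os
import tarfile
import tempfile

def safe_members(tar, dest):
    """Hypothetical helper: yield members whose path resolves inside *dest*."""
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        # Keep the member only if its target is the root or lies below it.
        if target == root or target.startswith(root + os.sep):
            yield member

# Build an archive with one harmless and two suspicious (empty) members.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ["ok.txt", "../evil.txt", "/abs.txt"]:
        tar.addfile(tarfile.TarInfo(name=name))

buf.seek(0)
with tempfile.TemporaryDirectory() as dest, tarfile.open(fileobj=buf) as tar:
    kept = [m.name for m in safe_members(tar, dest)]

print(kept)
```

The result of such a filter can be passed as the *members* argument of
:meth:`extractall`, in the same way as the generator shown in
:ref:`tar-examples`.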


.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a :term:`file-like
   object` is returned. If *member* is a link, a file-like object is constructed from
   the link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.
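
The following sketch reads a member directly from an archive without writing
it to disk, and shows that :meth:`extractfile` returns :const:`None` for a
directory. The archive is built in memory and the names ``log.txt`` and
``docs`` are illustrative.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    text = b"line one\nline two\n"
    info = tarfile.TarInfo(name="log.txt")
    info.size = len(text)
    tar.addfile(info, io.BytesIO(text))

    folder = tarfile.TarInfo(name="docs")
    folder.type = tarfile.DIRTYPE      # a directory member carries no data
    tar.addfile(folder)

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    member_file = tar.extractfile("log.txt")   # read-only file-like object
    lines = member_file.readlines()
    dir_result = tar.extractfile("docs")       # neither a file nor a link

print(lines, dir_result)
```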
Georg Brandl116aa622007-08-15 14:28:22 +0000356
357
Benjamin Petersona37cfc62008-05-26 13:48:34 +0000358.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000359
360 Add the file *name* to the archive. *name* may be any type of file (directory,
361 fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
362 for the file in the archive. Directories are added recursively by default. This
Georg Brandl55ac8f02007-09-01 13:51:09 +0000363 can be avoided by setting *recursive* to :const:`False`. If *exclude* is given,
Georg Brandl116aa622007-08-15 14:28:22 +0000364 it must be a function that takes one filename argument and returns a boolean
365 value. Depending on this value the respective file is either excluded
366 (:const:`True`) or added (:const:`False`).
367
Georg Brandl116aa622007-08-15 14:28:22 +0000368
Benjamin Petersona37cfc62008-05-26 13:48:34 +0000369.. method:: TarFile.addfile(tarinfo, fileobj=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000370
371 Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
372 ``tarinfo.size`` bytes are read from it and added to the archive. You can
373 create :class:`TarInfo` objects using :meth:`gettarinfo`.
374
375 .. note::
376
377 On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
378 avoid irritation about the file size.
379
380
Benjamin Petersona37cfc62008-05-26 13:48:34 +0000381.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000382
Antoine Pitrou25d535e2010-09-15 11:25:11 +0000383 Create a :class:`TarInfo` object for either the file *name* or the :term:`file
384 object` *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify
385 some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
Georg Brandl116aa622007-08-15 14:28:22 +0000386 If given, *arcname* specifies an alternative name for the file in the archive.
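
A minimal sketch of the :meth:`gettarinfo` and :meth:`addfile` combination
described above: the metadata of a file is read from disk, a few attributes are
changed, and the file is then stored under a different name. The paths, the
owner names and the zeroed timestamp are assumptions made for the example.

```python
import io
import os
import tarfile
import tempfile

buf = io.BytesIO()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")     # hypothetical source file
    with open(path, "wb") as f:
        f.write(b"quarterly numbers\n")

    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tar.gettarinfo(path, arcname="docs/report.txt")
        # Adjust some attributes before the member is written.
        info.uname = "nobody"
        info.gname = "nogroup"
        info.mtime = 0
        # Binary mode, so that exactly info.size bytes can be read.
        with open(path, "rb") as f:
            tar.addfile(info, f)

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    stored = tar.getmember("docs/report.txt")
    result = (stored.size, stored.uname, stored.mtime)

print(result)
```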


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is_*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.
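
The query methods can be combined into a simple classifier, as in the sketch
below. The function ``describe`` and the member names are hypothetical; the
members are created in memory with explicit type constants so that no real
links or devices are needed.

```python
import io
import tarfile

def describe(member):
    """Hypothetical helper: map a TarInfo object to a short label."""
    if member.isreg():
        return "file"
    if member.isdir():
        return "directory"
    if member.issym():
        return "symlink"
    if member.islnk():
        return "hardlink"
    if member.isdev():      # character device, block device or FIFO
        return "device"
    return "other"

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.addfile(tarfile.TarInfo(name="a.txt"))   # empty regular file

    folder = tarfile.TarInfo(name="d")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)

    link = tarfile.TarInfo(name="s")
    link.type = tarfile.SYMTYPE
    link.linkname = "a.txt"
    tar.addfile(link)

    fifo = tarfile.TarInfo(name="p")
    fifo.type = tarfile.FIFOTYPE
    tar.addfile(fifo)

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    kinds = {m.name: describe(m) for m in tar.getmembers()}

print(kinds)
```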


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: Extended headers only affect the subsequent file header, global
  headers are valid for the complete archive and affect all following files. All
  the data in a pax header is encoded in *UTF-8* for portability reasons.
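
The two flavours of pax headers can be tried out with a sketch like the one
below. It writes a global header through the *pax_headers* argument and an
extended header through :attr:`TarInfo.pax_headers`, then reads both back. The
header keys ``comment`` and ``origin`` and their values are invented for the
example, as is the overlong member name.

```python
import io
import tarfile

# A name that is too long for the plain ustar header fields.
long_name = "dir/" + "x" * 300 + ".txt"

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                  pax_headers={"comment": "nightly build"}) as tar:
    info = tarfile.TarInfo(name=long_name)
    info.pax_headers = {"origin": "build-host"}   # extended header, this member only
    tar.addfile(info)

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    member = tar.getmembers()[0]
    global_comment = tar.pax_headers.get("comment")   # global header
    member_origin = member.pax_headers.get("origin")  # extended header
    name_ok = member.name == long_name

print(global_comment, member_origin, name_ok)
```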

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`codec-base-classes`. In
read mode the default scheme is ``'replace'``. This avoids unexpected
:exc:`UnicodeError` exceptions and guarantees that an archive can always be
read. In write mode the default value for *errors* is ``'strict'``. This
ensures that name information is not altered unnoticed.

In case of writing :const:`PAX_FORMAT` archives, *encoding* is ignored because
non-ASCII metadata is stored using *UTF-8*.
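
The effect described in this section can be reproduced with a short sketch.
A member with a non-ASCII name is written once in GNU format and once in pax
format, and both archives are then read with a mismatching *encoding*. The
helpers ``write_archive`` and ``first_name`` and the member name are
hypothetical and serve only this illustration.

```python
import io
import tarfile

name = "gr\u00fc\u00dfe.txt"   # non-ASCII member name, chosen for illustration

def write_archive(fmt):
    """Hypothetical helper: return an archive with one empty member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding="utf-8") as tar:
        tar.addfile(tarfile.TarInfo(name=name))
    return buf.getvalue()

def first_name(data, encoding):
    """Hypothetical helper: read the first member name using *encoding*."""
    with tarfile.open(fileobj=io.BytesIO(data), encoding=encoding) as tar:
        return tar.getnames()[0]

gnu = write_archive(tarfile.GNU_FORMAT)
pax = write_archive(tarfile.PAX_FORMAT)

gnu_utf8 = first_name(gnu, "utf-8")       # matching encoding: name survives
gnu_latin1 = first_name(gnu, "latin-1")   # mismatching encoding: name is damaged
pax_latin1 = first_name(pax, "latin-1")   # pax stores the name as UTF-8 regardless

print(ascii(gnu_utf8), ascii(gnu_latin1), ascii(pax_latin1))
```

In the GNU archive the damaged name still contains the original bytes, so
re-encoding it as Latin-1 and decoding it as UTF-8 recovers the name; this only
works because Latin-1 maps every byte to a character.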