.. _tarfile-mod:

:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>


The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is supposed to be at position 0.
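
   As a minimal sketch of the mode rules above (the member name ``hello.txt``
   and the in-memory buffer are assumptions made only for illustration), the
   following builds a gzip compressed archive, shows that ``'r:'`` rejects it
   and that the default ``'r'`` detects the compression::

      import io
      import tarfile

      # Build a small gzip compressed archive in memory.
      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w:gz")
      payload = b"hello"
      info = tarfile.TarInfo("hello.txt")
      info.size = len(payload)
      tar.addfile(info, io.BytesIO(payload))
      tar.close()

      # 'r:' insists on an uncompressed archive, so it fails here.
      buf.seek(0)
      try:
          tarfile.open(fileobj=buf, mode="r:")
          rejected = False
      except tarfile.ReadError:
          rejected = True

      # The default mode 'r' detects the compression transparently.
      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r")
      names = tar.getnames()
      tar.close()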

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
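
   The stream variant can be sketched as follows. This is an illustration
   only: an :class:`io.BytesIO` buffer stands in for a pipe or socket, and
   the member names are made up. Members are read strictly in the order in
   which they appear in the stream::

      import io
      import tarfile

      # Write a gzip compressed stream; no seeking is done on the sink.
      sink = io.BytesIO()
      tar = tarfile.open(fileobj=sink, mode="w|gz")
      for name, payload in [("a.txt", b"first"), ("b.txt", b"second")]:
          info = tarfile.TarInfo(name)
          info.size = len(payload)
          tar.addfile(info, io.BytesIO(payload))
      tar.close()

      # Read it back as a stream, consuming each member before moving on.
      source = io.BytesIO(sink.getvalue())
      contents = {}
      tar = tarfile.open(fileobj=source, mode="r|*")
      for member in tar:
          contents[member.name] = tar.extractfile(member).read()
      tar.close()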


.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


The following variables are available on module level:


.. data:: ENCODING

   The default character encoding, i.e. the value from either
   :func:`sys.getfilesystemencoding` or :func:`sys.getdefaultencoding`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.


.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.
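
   The effect on concatenated archives can be sketched like this (the helper
   ``make_archive`` and the member names are hypothetical and exist only for
   this illustration)::

      import io
      import tarfile

      def make_archive(name, payload):
          """Return the bytes of an uncompressed archive with one member."""
          buf = io.BytesIO()
          tar = tarfile.open(fileobj=buf, mode="w")
          info = tarfile.TarInfo(name)
          info.size = len(payload)
          tar.addfile(info, io.BytesIO(payload))
          tar.close()
          return buf.getvalue()

      # Two complete archives glued together, as "cat a.tar b.tar" would do.
      data = make_archive("a.txt", b"A") + make_archive("b.txt", b"B")

      # By default, reading stops at the end-of-archive blocks of the first one.
      tar = tarfile.open(fileobj=io.BytesIO(data), mode="r:")
      default_names = tar.getnames()
      tar.close()

      # With ignore_zeros=True the empty blocks are skipped.
      tar = tarfile.open(fileobj=io.BytesIO(data), mode="r:", ignore_zeros=True)
      all_names = tar.getnames()
      tar.close()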

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.
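
   A short sketch of a pax global header round trip; the key ``comment``, its
   value and the member name are arbitrary choices for the example::

      import io
      import tarfile

      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                         pax_headers={"comment": "nightly backup"})
      info = tarfile.TarInfo("data.txt")
      info.size = 4
      tar.addfile(info, io.BytesIO(b"data"))
      tar.close()

      # The global header is available again after reopening the archive.
      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r:")
      global_headers = dict(tar.pax_headers)
      tar.close()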


.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.
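
   For illustration, the sketch below stores the made-up member ``notes.txt``
   twice and shows which occurrence :meth:`getmember` returns::

      import io
      import tarfile

      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w")
      for payload in [b"old", b"newer"]:
          info = tarfile.TarInfo("notes.txt")
          info.size = len(payload)
          tar.addfile(info, io.BytesIO(payload))
      tar.close()

      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r:")
      names = tar.getnames()                # both occurrences, in archive order
      latest = tar.getmember("notes.txt")   # the last occurrence
      latest_size = latest.size
      try:
          tar.getmember("missing.txt")
          missing = False
      except KeyError:
          missing = True
      tar.close()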


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.
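
   One possible form of such an inspection is sketched below. The helper
   ``suspicious_names`` is hypothetical, not part of :mod:`tarfile`, and it
   only covers the two cases named in the warning; link targets and other
   hazards would need additional checks::

      import io
      import tarfile

      def suspicious_names(tar):
          """Return member names that are absolute or contain '..'."""
          flagged = []
          for member in tar.getmembers():
              parts = member.name.split("/")
              if member.name.startswith("/") or ".." in parts:
                  flagged.append(member.name)
          return flagged

      # Build a test archive with one harmless and two dangerous names.
      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w")
      for name in ["docs/readme.txt", "../escape.txt", "/etc/passwd"]:
          tar.addfile(tarfile.TarInfo(name))
      tar.close()

      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r:")
      flagged = suspicious_names(tar)
      tar.close()

   An archive for which the helper returns a non-empty list should not be
   passed to :meth:`extractall` as it is.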


.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.
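
   A small sketch of these return values, using an in-memory archive with a
   made-up directory and file::

      import io
      import tarfile

      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w")
      folder = tarfile.TarInfo("logs")
      folder.type = tarfile.DIRTYPE
      tar.addfile(folder)
      payload = b"started\nstopped\n"
      entry = tarfile.TarInfo("logs/app.log")
      entry.size = len(payload)
      tar.addfile(entry, io.BytesIO(payload))
      tar.close()

      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r:")
      directory_result = tar.extractfile("logs")   # neither file nor link
      log = tar.extractfile("logs/app.log")        # read-only file-like object
      lines = log.readlines()
      tar.close()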
Georg Brandl116aa622007-08-15 14:28:22 +0000358
359
Lars Gustäbel049d2aa2009-09-12 10:44:00 +0000360.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000361
362 Add the file *name* to the archive. *name* may be any type of file (directory,
363 fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
364 for the file in the archive. Directories are added recursively by default. This
Georg Brandl55ac8f02007-09-01 13:51:09 +0000365 can be avoided by setting *recursive* to :const:`False`. If *exclude* is given,
Georg Brandl116aa622007-08-15 14:28:22 +0000366 it must be a function that takes one filename argument and returns a boolean
367 value. Depending on this value the respective file is either excluded
Lars Gustäbel049d2aa2009-09-12 10:44:00 +0000368 (:const:`True`) or added (:const:`False`). If *filter* is specified it must
369 be a function that takes a :class:`TarInfo` object argument and returns the
370 changed TarInfo object. If it instead returns :const:`None` the TarInfo
371 object will be excluded from the archive. See :ref:`tar-examples` for an
372 example.
373
374 .. versionchanged:: 3.2
375 Added the *filter* parameter.
376
377 .. deprecated:: 3.2
378 The *exclude* parameter is deprecated, please use the *filter* parameter
379 instead.
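
   The following sketch uses *filter* to leave files out of the archive. The
   directory layout, the archive name ``project`` and the ``.log`` rule are
   assumptions chosen for the example::

      import io
      import os
      import shutil
      import tarfile
      import tempfile

      def skip_logs(tarinfo):
          """Exclude members ending in .log, keep everything else unchanged."""
          if tarinfo.name.endswith(".log"):
              return None
          return tarinfo

      # Create a throwaway directory with one file to keep and one to skip.
      workdir = tempfile.mkdtemp()
      for filename in ["keep.txt", "debug.log"]:
          with open(os.path.join(workdir, filename), "w") as f:
              f.write("example\n")

      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w")
      tar.add(workdir, arcname="project", filter=skip_logs)
      tar.close()
      shutil.rmtree(workdir)

      # Reopen the archive to see which members were stored.
      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r:")
      names = tar.getnames()
      tar.close()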


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid irritation about the file size.
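
   When the data does not come from a file on disk, a :class:`TarInfo`
   object can also be filled in by hand. In this sketch the member name,
   payload and modification time are invented for the example::

      import io
      import tarfile
      import time

      payload = b"generated on the fly\n"

      info = tarfile.TarInfo("report.txt")
      info.size = len(payload)        # addfile() reads this many bytes
      info.mtime = int(time.time())

      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w")
      tar.addfile(info, io.BytesIO(payload))
      tar.close()

      # Read the member back to confirm what was stored.
      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r:")
      stored = tar.extractfile("report.txt").read()
      tar.close()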


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the file object
   *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify some
   of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is_*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       # Yield only the members whose name ends in ".py".
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the two parts of the sentence on one line.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       # Store every member as owned by root, whatever its real owner is.
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.
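
The practical difference between the formats can be sketched with an
over-long member name. The 300-character name and the helper ``try_format``
are arbitrary choices for this illustration; the ustar format cannot store
such a name, while the GNU and pax formats can::

   import io
   import tarfile

   long_name = "x" * 300   # contains no "/", so ustar cannot split it

   def try_format(fmt):
       """Return the name read back, or None if the format rejects it."""
       buf = io.BytesIO()
       tar = tarfile.open(fileobj=buf, mode="w", format=fmt)
       try:
           tar.addfile(tarfile.TarInfo(long_name))
       except ValueError:
           tar.close()
           return None
       tar.close()
       buf.seek(0)
       tar = tarfile.open(fileobj=buf, mode="r:")
       name = tar.getnames()[0]
       tar.close()
       return name

   ustar_result = try_format(tarfile.USTAR_FORMAT)
   gnu_result = try_format(tarfile.GNU_FORMAT)
   pax_result = try_format(tarfile.PAX_FORMAT)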

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.
663
Georg Brandl116aa622007-08-15 14:28:22 +0000664.. _tar-unicode:
665
666Unicode issues
667--------------
668
669The tar format was originally conceived to make backups on tape drives with the
670main focus on preserving file system information. Nowadays tar archives are
671commonly used for file distribution and exchanging archives over networks. One
Lars Gustäbel3741eff2007-08-21 12:17:05 +0000672problem of the original format (which is the basis of all other formats) is
673that there is no concept of supporting different character encodings. For
Georg Brandl116aa622007-08-15 14:28:22 +0000674example, an ordinary tar archive created on a *UTF-8* system cannot be read
Lars Gustäbel3741eff2007-08-21 12:17:05 +0000675correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
676metadata (like filenames, linknames, user/group names) will appear damaged.
677Unfortunately, there is no way to autodetect the encoding of an archive. The
678pax format was designed to solve this problem. It stores non-ASCII metadata
679using the universal character encoding *UTF-8*.
Georg Brandl116aa622007-08-15 14:28:22 +0000680
Lars Gustäbel3741eff2007-08-21 12:17:05 +0000681The details of character conversion in :mod:`tarfile` are controlled by the
682*encoding* and *errors* keyword arguments of the :class:`TarFile` class.
Georg Brandl116aa622007-08-15 14:28:22 +0000683
Lars Gustäbel3741eff2007-08-21 12:17:05 +0000684*encoding* defines the character encoding to use for the metadata in the
685archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
686as a fallback. Depending on whether the archive is read or written, the
687metadata must be either decoded or encoded. If *encoding* is not set
688appropriately, this conversion may fail.
Georg Brandl116aa622007-08-15 14:28:22 +0000689
690The *errors* argument defines how characters are treated that cannot be
Lars Gustäbel3741eff2007-08-21 12:17:05 +0000691converted. Possible values are listed in section :ref:`codec-base-classes`. In
692read mode the default scheme is ``'replace'``. This avoids unexpected
693:exc:`UnicodeError` exceptions and guarantees that an archive can always be
694read. In write mode the default value for *errors* is ``'strict'``. This
695ensures that name information is not altered unnoticed.
Georg Brandl116aa622007-08-15 14:28:22 +0000696
Lars Gustäbel3741eff2007-08-21 12:17:05 +0000697In case of writing :const:`PAX_FORMAT` archives, *encoding* is ignored because
698non-ASCII metadata is stored using *UTF-8*.