.. _tarfile-mod:

:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>


The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is supposed to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it
   cannot be accessed randomly; see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+


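As a rough illustration of the two mode families accepted by :func:`tarfile.open`, the sketch below
writes a gzip compressed archive into an in-memory buffer with ``'w:gz'``, reads
it back with plain ``'r'`` (transparent compression), and then reads the same
bytes again as a forward-only stream with ``'r|gz'``. The member name
``hello.txt`` and the use of :class:`io.BytesIO` as *fileobj* are assumptions
made for the example only.

```python
import io
import tarfile

payload = b"hello tar\n"

# Write a gzip compressed archive into an in-memory buffer ('w:gz').
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w:gz")
info = tarfile.TarInfo(name="hello.txt")  # hypothetical member name
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# Plain 'r' detects the compression itself; fileobj must be at position 0.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
names_random_access = tar.getnames()
tar.close()

# 'r|gz' treats the same bytes as a stream: members are visited in order
# and each one is read while it is the current member.
buf.seek(0)
stream_contents = {}
tar = tarfile.open(fileobj=buf, mode="r|gz")
for member in tar:
    stream_contents[member.name] = tar.extractfile(member).read()
tar.close()
```

In stream mode the loop order matters: a member's data should be consumed before
moving on, because the underlying object is never rewound.
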
.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


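A minimal sketch of :func:`is_tarfile` used as a pre-check, assuming two
throwaway files created in a temporary directory (the file names are
illustrative only):

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
tar_path = os.path.join(workdir, "sample.tar")
txt_path = os.path.join(workdir, "notes.txt")

# An ordinary text file that is not an archive.
with open(txt_path, "w") as f:
    f.write("just text, not an archive\n")

# A real (uncompressed) archive containing that file.
tar = tarfile.open(tar_path, "w")
tar.add(txt_path, arcname="notes.txt")
tar.close()

results = {
    "sample.tar": tarfile.is_tarfile(tar_path),
    "notes.txt": tarfile.is_tarfile(txt_path),
}
print(results)
```
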
.. class:: TarFileCompat(filename, mode='r', compression=TAR_PLAIN)

   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
   Please consult the documentation of the :mod:`zipfile` module for more details.
   *compression* must be one of the following constants:


   .. data:: TAR_PLAIN

      Constant for an uncompressed tar archive.


   .. data:: TAR_GZIPPED

      Constant for a :mod:`gzip` compressed tar archive.


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.



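The exception hierarchy above can be exercised with a small sketch. It assumes
an in-memory buffer filled with bytes that are not a valid tar header, so that
opening it for reading is expected to fail with :exc:`ReadError`; since
:exc:`TarError` is the base class, a handler for it would catch the same error.

```python
import io
import tarfile

# Two 512-byte blocks of filler that do not form a valid tar header.
garbage = io.BytesIO(b"x" * 1024)

try:
    tarfile.open(fileobj=garbage, mode="r")
except tarfile.ReadError:
    outcome = "ReadError"
except tarfile.TarError:
    # Any other tarfile-specific failure would end up here.
    outcome = "other TarError"
else:
    outcome = "opened"

print(outcome)
```
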
Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


The following variables are available on module level:


.. data:: ENCODING

   The default character encoding, i.e. the value from either
   :func:`sys.getfilesystemencoding` or :func:`sys.getdefaultencoding`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.


.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.


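The *format* and *pax_headers* arguments can be tried out with a short sketch.
It passes them as keyword arguments through :func:`tarfile.open`, writes a pax
archive into an in-memory buffer, and reads the global header back via the
:attr:`pax_headers` attribute. The ``comment`` key, its value and the member
name are illustrative assumptions, not values prescribed by the module.

```python
import io
import tarfile

buf = io.BytesIO()
tar = tarfile.open(
    fileobj=buf,
    mode="w",
    format=tarfile.PAX_FORMAT,
    pax_headers={"comment": "nightly backup"},  # illustrative global header
)
data = b"payload"
info = tarfile.TarInfo(name="data.bin")  # hypothetical member name
info.size = len(data)
tar.addfile(info, io.BytesIO(data))
tar.close()  # buf itself stays open: fileobj is not closed by TarFile

# Read the archive back and look at the global header.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
global_headers = dict(tar.pax_headers)
member_names = tar.getnames()
tar.close()
```
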
.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


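The lookup methods above can be compared side by side in a small sketch. It
assumes an in-memory archive in which the hypothetical member ``a.txt`` is
stored twice, to show that :meth:`getnames` keeps archive order while
:meth:`getmember` returns the last occurrence and raises :exc:`KeyError` for
unknown names.

```python
import io
import tarfile

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
entries = [("a.txt", b"first"), ("b.txt", b"second"), ("a.txt", b"first, revised")]
for name, data in entries:
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
names = tar.getnames()                        # archive order, duplicates kept
latest_a_size = tar.getmember("a.txt").size   # size of the last "a.txt"
try:
    tar.getmember("missing.txt")
    missing_found = True
except KeyError:
    missing_found = False
tar.close()
```
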
.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.


.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


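One possible form of the "prior inspection" mentioned in the warning is sketched
below. The helper ``is_suspicious`` is hypothetical and deliberately simple: it
only flags names that start with ``"/"`` or contain a ``".."`` path component,
and it does not look at link targets or platform-specific path forms, so it
should be read as a starting point rather than a complete safeguard.

```python
import io
import os
import tarfile
import tempfile


def is_suspicious(name):
    """Flag absolute names and names with a '..' path component (hypothetical helper)."""
    return name.startswith("/") or ".." in name.split("/")


# Build a small in-memory archive with one harmless and two suspicious members.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
for name in ["ok.txt", "/abs.txt", "../escape.txt"]:
    data = b"x"
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
tar.close()

# Inspect first, then extract only the members that passed the check.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
members = tar.getmembers()
rejected = [m.name for m in members if is_suspicious(m.name)]
accepted = [m for m in members if not is_suspicious(m.name)]
dest = tempfile.mkdtemp()
tar.extractall(path=dest, members=accepted)
tar.close()

extracted = sorted(os.listdir(dest))
```

Passing the filtered list as *members* keeps the directory handling of
:meth:`extractall` while leaving out the rejected entries.
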
.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only and provides the following methods:
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`.


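A minimal sketch of :meth:`extractfile`, assuming an in-memory archive that
holds one regular file and one directory (both names are illustrative). The
regular file yields a read-only file-like object, while the directory yields
:const:`None`.

```python
import io
import tarfile

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")

data = b"line one\nline two\n"
info = tarfile.TarInfo(name="log.txt")
info.size = len(data)
tar.addfile(info, io.BytesIO(data))

dir_info = tarfile.TarInfo(name="logs")
dir_info.type = tarfile.DIRTYPE  # a directory has no data to extract
tar.addfile(dir_info)
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
f = tar.extractfile("log.txt")
first_line = f.readline()
rest = f.read()
dir_result = tar.extractfile("logs")
tar.close()
```
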
.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given,
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`).


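The effect of *arcname* and *recursive* can be seen in a short sketch. It
assumes a throwaway directory ``pkg`` containing a single file, added once with
the default recursive behaviour and once with ``recursive=False``; all paths
and archive names are made up for the example.

```python
import os
import tarfile
import tempfile

# Lay out a tiny directory tree to archive.
src = tempfile.mkdtemp()
os.mkdir(os.path.join(src, "pkg"))
with open(os.path.join(src, "pkg", "module.py"), "w") as f:
    f.write("print('hi')\n")

out = os.path.join(tempfile.mkdtemp(), "tree.tar")
tar = tarfile.open(out, "w")
# Recursive by default: the directory and everything below it is added,
# stored under the alternative name "release".
tar.add(os.path.join(src, "pkg"), arcname="release")
# recursive=False adds only the directory entry itself.
tar.add(os.path.join(src, "pkg"), arcname="empty-shell", recursive=False)
tar.close()

tar = tarfile.open(out, "r")
names = tar.getnames()
tar.close()
```
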
.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid problems with the file size.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the file object
   *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify some
   of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


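How :meth:`gettarinfo` and :meth:`addfile` work together can be sketched as
follows. The example assumes one small file on disk and one piece of data that
exists only in memory; the archive names and the owner name ``archiver`` are
illustrative values.

```python
import io
import os
import tarfile
import tempfile

src_path = os.path.join(tempfile.mkdtemp(), "report.txt")
with open(src_path, "wb") as f:
    f.write(b"quarterly numbers\n")

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")

# gettarinfo() fills in size, mode, mtime and so on from the file on disk;
# attributes can be adjusted before the member is written.
info = tar.gettarinfo(src_path, arcname="docs/report.txt")
info.uname = "archiver"  # illustrative owner name
info.gname = "archiver"
f = open(src_path, "rb")  # binary mode, so tarinfo.size bytes are read as-is
tar.addfile(info, f)
f.close()

# A TarInfo can also be built by hand for data that never touched the disk.
generated = b"generated in memory\n"
mem_info = tarfile.TarInfo(name="docs/generated.txt")
mem_info.size = len(generated)
tar.addfile(mem_info, io.BytesIO(generated))
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
summary = [(m.name, m.size, m.uname) for m in tar.getmembers()]
tar.close()
```
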
.. attribute:: TarFile.posix

   Setting this to :const:`True` is equivalent to setting the :attr:`format`
   attribute to :const:`USTAR_FORMAT`, :const:`False` is equivalent to
   :const:`GNU_FORMAT`.

   *posix* defaults to :const:`False`.

   .. deprecated:: 2.6
      Use the :attr:`format` attribute instead.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is_*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


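The query methods can be combined into a small classifier. The sketch below
assumes an in-memory archive with three hand-built members (a regular file, a
directory and a symbolic link); the helper ``describe`` and all member names
are hypothetical.

```python
import io
import tarfile

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")

regular = tarfile.TarInfo(name="file.txt")
regular.size = 3
tar.addfile(regular, io.BytesIO(b"abc"))

directory = tarfile.TarInfo(name="subdir")
directory.type = tarfile.DIRTYPE
tar.addfile(directory)

link = tarfile.TarInfo(name="alias")
link.type = tarfile.SYMTYPE
link.linkname = "file.txt"
tar.addfile(link)
tar.close()


def describe(member):
    """Map a TarInfo object to a short description of its type."""
    if member.isreg():
        return "regular file"
    if member.isdir():
        return "directory"
    if member.issym():
        return "symbolic link to " + member.linkname
    if member.islnk():
        return "hard link to " + member.linkname
    if member.isdev():
        return "device or FIFO"
    return "other"


buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
kinds = {m.name: describe(m) for m in tar.getmembers()}
tar.close()
```
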
.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the sentence on one line with a space before the type.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, whereas
  global headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`codec-base-classes`. In
read mode the default scheme is ``'replace'``. This avoids unexpected
:exc:`UnicodeError` exceptions and guarantees that an archive can always be
read. In write mode the default value for *errors* is ``'strict'``. This
ensures that name information is not altered unnoticed.

In case of writing :const:`PAX_FORMAT` archives, *encoding* is ignored because
non-ASCII metadata is stored using *UTF-8*.
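
The encoding mismatch described in this section can be reproduced with a short
sketch. It assumes a :const:`GNU_FORMAT` archive written with *encoding* set to
``'iso8859-1'`` and a hypothetical member name containing one non-ASCII
character; the archive is then read once with the matching encoding and once
with ``'utf-8'``. How the mismatched name looks exactly depends on the *errors*
scheme in effect, so the example only compares the names.

```python
import io
import tarfile

name = "caf\u00e9.txt"  # hypothetical member name with one non-ASCII character

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w",
                   format=tarfile.GNU_FORMAT, encoding="iso8859-1")
tar.addfile(tarfile.TarInfo(name=name))  # empty member, only metadata matters
tar.close()

# Reading with the encoding that was used for writing restores the name.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r", encoding="iso8859-1")
name_matching = tar.getnames()[0]
tar.close()

# Reading with a different encoding yields a damaged name.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r", encoding="utf-8")
name_mismatched = tar.getnames()[0]
tar.close()
```
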