:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>


The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is supposed to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

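As a minimal sketch of the stream variant of *mode*, the following writes a
gzip compressed archive into an in-memory buffer with ``'w|gz'`` and reads it
back with ``'r|*'``. The member name ``greeting.txt`` and its content are made
up for illustration, and an :class:`io.BytesIO` object stands in for a socket
or pipe.

```python
import io
import tarfile

payload = b"hello from a stream\n"

# Write: 'w|gz' only ever calls write() on the file object, never seek().
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w|gz") as tar:
    info = tarfile.TarInfo(name="greeting.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read: 'r|*' detects the compression itself; members must be
# processed in the order in which they appear in the stream.
buffer.seek(0)
stream_data = {}
with tarfile.open(fileobj=buffer, mode="r|*") as tar:
    for member in tar:
        stream_data[member.name] = tar.extractfile(member).read()
```

Because the stream is consumed front to back, each member's data is read
before moving on to the next member.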

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

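The following sketch shows one way to guard against files that are not tar
archives. It writes a throwaway file with arbitrary content (an assumption
made purely for the demonstration), checks it with :func:`is_tarfile` and
then shows that opening it raises :exc:`ReadError`, which can also be caught
through the base class :exc:`TarError`.

```python
import os
import tarfile
import tempfile

# Create a file that merely has a ".tar" suffix but no tar content.
with tempfile.NamedTemporaryFile(suffix=".tar", delete=False) as tmp:
    tmp.write(b"this is not a tar archive")
    bogus_path = tmp.name

try:
    looks_like_tar = tarfile.is_tarfile(bogus_path)
    try:
        tarfile.open(bogus_path, "r").close()
        outcome = "opened"
    except tarfile.ReadError:
        # ReadError derives from TarError, so "except tarfile.TarError"
        # would catch it as well.
        outcome = "ReadError"
finally:
    os.remove(bogus_path)
```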


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.

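One practical difference between the three formats is how they cope with long
member names. The sketch below tries to store a 300-character name, chosen
arbitrarily and containing no ``/`` to split on, in each format. The helper
``try_format`` is hypothetical and exists only for this illustration.

```python
import io
import tarfile

long_name = "x" * 300   # too long for a ustar header

def try_format(fmt):
    """Return the name read back from the archive, or an error string."""
    buffer = io.BytesIO()
    info = tarfile.TarInfo(name=long_name)   # zero-length member
    try:
        with tarfile.open(fileobj=buffer, mode="w", format=fmt) as tar:
            tar.addfile(info)
    except ValueError as exc:
        return "error: %s" % exc
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r") as tar:
        return tar.getnames()[0]

results = {
    "ustar": try_format(tarfile.USTAR_FORMAT),
    "gnu": try_format(tarfile.GNU_FORMAT),
    "pax": try_format(tarfile.PAX_FORMAT),
}
```

With the ustar format the name is rejected with a :exc:`ValueError`, while the
GNU and pax formats store it through their long-name extensions.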

The following variables are available on module level:


.. data:: ENCODING

   The default character encoding, i.e. the value from either
   :func:`sys.getfilesystemencoding` or :func:`sys.getdefaultencoding`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.


.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

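To illustrate the *format* and *pax_headers* arguments, here is a small
sketch that writes a pax archive with one global header and reads the header
back through the :attr:`pax_headers` attribute. The key ``comment``, its value
and the member name ``README`` are arbitrary choices for the example.

```python
import io
import tarfile

buffer = io.BytesIO()

# Keyword arguments given to tarfile.open() are passed on to TarFile.
tar = tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT,
                   pax_headers={"comment": "nightly build"})
tar.addfile(tarfile.TarInfo(name="README"))   # zero-length member
tar.close()

# Reading the archive exposes the global header as a dictionary.
buffer.seek(0)
tar = tarfile.open(fileobj=buffer, mode="r")
global_headers = dict(tar.pax_headers)
tar.close()
```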

.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.

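The lookup methods can be tried out on a small in-memory archive. In this
sketch the member ``notes.txt`` is deliberately stored twice to show which
occurrence :meth:`getmember` returns. All names and contents are invented for
the example.

```python
import io
import tarfile

entries = [("notes.txt", b"v1"),
           ("data.bin", b"\x00\x01"),
           ("notes.txt", b"version 2")]

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name, data in entries:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()                        # archive order, duplicates kept
    sizes = [member.size for member in tar.getmembers()]
    latest_size = tar.getmember("notes.txt").size  # last occurrence wins
    try:
        tar.getmember("missing.txt")
        missing_found = True
    except KeyError:
        missing_found = False
```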

.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

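One possible form of the "prior inspection" mentioned in the warning is to
hand :meth:`extractall` only those members whose resolved target path stays
inside the destination directory. The generator ``safe_members`` below is a
hypothetical helper, not part of :mod:`tarfile`, and it is deliberately
strict: it passes only regular files and directories and does not attempt to
validate links or device files.

```python
import io
import os
import tarfile
import tempfile

def safe_members(tar, dest):
    """Yield regular files and directories that resolve to a path inside dest."""
    root = os.path.realpath(dest)
    for member in tar:
        target = os.path.realpath(os.path.join(root, member.name))
        inside = os.path.commonpath([root, target]) == root
        if inside and (member.isreg() or member.isdir()):
            yield member

# Build a test archive with one harmless and one path-traversing member.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ("ok/readme.txt", "../escape.txt"):
        info = tarfile.TarInfo(name=name)
        info.size = 1
        tar.addfile(info, io.BytesIO(b"x"))

buffer.seek(0)
with tempfile.TemporaryDirectory() as dest:
    with tarfile.open(fileobj=buffer, mode="r") as tar:
        tar.extractall(path=dest, members=safe_members(tar, dest))
    # Collect what actually landed in the destination directory.
    extracted = sorted(
        os.path.relpath(os.path.join(dirpath, filename), dest).replace(os.sep, "/")
        for dirpath, _, filenames in os.walk(dest)
        for filename in filenames
    )
```

Only ``ok/readme.txt`` is extracted; the member whose name starts with ``..``
is skipped.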

.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.

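A short sketch of :meth:`extractfile`: the archive below is built in memory
with one regular file and one directory (both names are made up), and the
file's content is read without writing anything to disk.

```python
import io
import tarfile

text = b"first line\nsecond line\n"

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    file_info = tarfile.TarInfo(name="log.txt")
    file_info.size = len(text)
    tar.addfile(file_info, io.BytesIO(text))

    dir_info = tarfile.TarInfo(name="logs")
    dir_info.type = tarfile.DIRTYPE
    tar.addfile(dir_info)

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    member_file = tar.extractfile("log.txt")
    lines = member_file.readlines()        # the data comes back as bytes
    member_file.close()
    no_file = tar.extractfile("logs")      # a directory carries no data
```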

.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given,
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
   be a function that takes a :class:`TarInfo` object argument and returns the
   changed TarInfo object. If it instead returns :const:`None` the TarInfo
   object will be excluded from the archive. See :ref:`tar-examples` for an
   example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.

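A *filter* function can both modify and exclude members. The sketch below
adds a temporary directory recursively, drops every ``.pyc`` file by
returning :const:`None` and resets the ownership of everything else. The
function name ``skip_pyc`` and the file names are assumptions for the
example.

```python
import os
import tarfile
import tempfile

def skip_pyc(tarinfo):
    """Exclude compiled files and normalise ownership of the rest."""
    if tarinfo.name.endswith(".pyc"):
        return None                       # None excludes the member
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo

with tempfile.TemporaryDirectory() as workdir:
    # Set up a small source tree to archive.
    src = os.path.join(workdir, "project")
    os.mkdir(src)
    for filename in ("main.py", "main.pyc"):
        with open(os.path.join(src, filename), "wb") as fh:
            fh.write(b"# placeholder\n")

    archive_path = os.path.join(workdir, "project.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src, arcname="project", filter=skip_pyc)

    with tarfile.open(archive_path, "r:gz") as tar:
        archived = sorted(tar.getnames())
```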

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid problems with the file size.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the file object
   *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify some
   of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.

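:meth:`gettarinfo` and :meth:`addfile` are typically used together when the
metadata should be adjusted before a file is stored. The following sketch
creates a temporary file, renames it inside the archive via *arcname* and
overrides the owner names. The paths and the name ``nobody`` are placeholders.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "input.txt")
    with open(source, "wb") as fh:
        fh.write(b"some data\n")

    archive_path = os.path.join(workdir, "out.tar")
    with tarfile.open(archive_path, "w") as tar:
        info = tar.gettarinfo(source, arcname="renamed.txt")
        info.uname = info.gname = "nobody"   # adjust metadata before adding
        with open(source, "rb") as fh:       # binary mode, as the note advises
            tar.addfile(info, fh)

    with tarfile.open(archive_path, "r") as tar:
        stored = tar.getmember("renamed.txt")
        stored_summary = (stored.name, stored.size, stored.uname)
```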

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is_*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       # Yield only the members whose name ends in ".py".
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the sentence on one line with a separating space.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       # Store every member as owned by root.
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following files.
  All the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`codec-base-classes`. In
read mode the default scheme is ``'replace'``. This avoids unexpected
:exc:`UnicodeError` exceptions and guarantees that an archive can always be
read. In write mode the default value for *errors* is ``'strict'``. This
ensures that name information is not altered unnoticed.

In case of writing :const:`PAX_FORMAT` archives, *encoding* is ignored because
non-ASCII metadata is stored using *UTF-8*.
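
As a rough illustration of the last point, the sketch below stores a member
with a non-ASCII name in a pax archive while deliberately passing
``encoding="ascii"``, an encoding that cannot represent the name. The member
name is an arbitrary example. The name still survives the round trip, because
the pax header carries it as *UTF-8*.

```python
import io
import tarfile

name = "grüße.txt"

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT,
                  encoding="ascii") as tar:
    tar.addfile(tarfile.TarInfo(name=name))   # zero-length member

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r", encoding="ascii") as tar:
    restored = tar.getnames()
```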