:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. versionadded:: 2.3

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

  .. versionadded:: 2.6

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, :func:`tarfile.open`
   accepts the keyword argument *compresslevel* to specify the compression level of
   the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
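
   As a rough sketch of the stream variant, the following writes a gzip
   compressed archive to an in-memory buffer with ``'w|gz'`` and reads it back
   with ``'r|*'``. The buffer, the member name ``hello.txt`` and its payload are
   assumptions made for the illustration; the code is written to run on
   Python 2.7 as well as Python 3::

      import io
      import tarfile

      payload = b"hello, stream\n"

      # Write a gzip compressed stream into an in-memory buffer.
      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w|gz")
      info = tarfile.TarInfo("hello.txt")
      info.size = len(payload)
      tar.addfile(info, io.BytesIO(payload))
      tar.close()

      # Read it back as a stream; members have to be processed in order.
      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r|*")
      names = []
      contents = {}
      for member in tar:
          names.append(member.name)
          contents[member.name] = tar.extractfile(member).read()
      tar.close()

   Because no seeking takes place, each member is read inside the loop, before
   the iteration moves on to the next header.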


.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.
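
   A minimal sketch of such a check, using two throwaway files in a temporary
   directory (the file names and contents are invented for the example)::

      import io
      import os
      import tarfile
      import tempfile

      workdir = tempfile.mkdtemp()

      # A small but genuine tar archive with one member.
      archive_path = os.path.join(workdir, "sample.tar")
      tar = tarfile.open(archive_path, "w")
      info = tarfile.TarInfo("hello.txt")
      info.size = 5
      tar.addfile(info, io.BytesIO(b"hello"))
      tar.close()

      # A plain text file that merely sits next to it.
      text_path = os.path.join(workdir, "notes.txt")
      with open(text_path, "w") as f:
          f.write("this is not a tar archive\n")

      archive_ok = tarfile.is_tarfile(archive_path)
      text_ok = tarfile.is_tarfile(text_path)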


.. class:: TarFileCompat(filename, mode='r', compression=TAR_PLAIN)

   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
   Please consult the documentation of the :mod:`zipfile` module for more details.
   *compression* must be one of the following constants:


   .. data:: TAR_PLAIN

      Constant for an uncompressed tar archive.


   .. data:: TAR_GZIPPED

      Constant for a :mod:`gzip` compressed tar archive.


   .. deprecated:: 2.6
      The :class:`TarFileCompat` class has been removed in Python 3.


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.
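
   For illustration, a sketch of how calling code might guard against this
   error. The in-memory buffer holding arbitrary bytes stands in for a damaged
   or mislabelled file and is an assumption of the example::

      import io
      import tarfile

      garbage = io.BytesIO(b"this is not a tar archive")

      try:
          tarfile.open(fileobj=garbage)
          opened = True
      except tarfile.ReadError:
          # None of the supported formats matched the data.
          opened = False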


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

   .. versionadded:: 2.6


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 2.7
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.
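
   The following sketch illustrates the point of the note with an in-memory
   buffer: after the archive has been closed, the buffer is still open and
   remains the caller's responsibility. The member name and payload are made
   up for the example::

      import io
      import tarfile

      payload = b"example data\n"
      buf = io.BytesIO()

      with tarfile.open(fileobj=buf, mode="w") as tar:
          info = tarfile.TarInfo("example.txt")
          info.size = len(payload)
          tar.addfile(info, io.BytesIO(payload))

      # The TarFile is closed at this point, the buffer is not.
      still_open = not buf.closed
      archive_bytes = buf.getvalue()
      buf.close()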

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   .. versionadded:: 2.6

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   .. versionadded:: 2.6

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments control the way strings are converted to
   unicode objects and vice versa. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionadded:: 2.6

   The *pax_headers* argument is an optional dictionary of unicode strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionadded:: 2.6


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.
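
   A short sketch of the three lookup methods working together on a small
   in-memory archive. The member names ``a.txt`` and ``b.txt`` are
   hypothetical::

      import io
      import tarfile

      # Build an archive with two members.
      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w")
      for name, data in [("a.txt", b"alpha"), ("b.txt", b"beta!")]:
          info = tarfile.TarInfo(name)
          info.size = len(data)
          tar.addfile(info, io.BytesIO(data))
      tar.close()

      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r")
      names = tar.getnames()                       # archive order
      sizes = [member.size for member in tar.getmembers()]
      b_size = tar.getmember("b.txt").size
      try:
          tar.getmember("missing.txt")
          found_missing = True
      except KeyError:
          # getmember() signals an unknown name with KeyError.
          found_missing = False
      tar.close()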


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there are no
   more members available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.
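
   One possible way to carry out such an inspection is a generator that yields
   only those members whose resolved target stays below the extraction
   directory. The helper name ``safe_members`` is invented for this sketch,
   which simply skips links and should not be mistaken for a complete
   defence::

      import os
      import tarfile

      def safe_members(tar, dest):
          """Yield the members of *tar* that would land inside *dest*."""
          root = os.path.realpath(dest)
          for member in tar.getmembers():
              target = os.path.realpath(os.path.join(root, member.name))
              inside = target == root or target.startswith(root + os.sep)
              # Links are dropped too, because their targets are not checked.
              if inside and not (member.issym() or member.islnk()):
                  yield member

   The generator could then be handed to the method, for instance as
   ``tar.extractall(dest, members=safe_members(tar, dest))``.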

   .. versionadded:: 2.5


.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.
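
   As an illustration, the sketch below reads a member's lines straight from
   an in-memory archive without writing anything to disk, and shows that a
   directory member yields :const:`None`. The member names are hypothetical::

      import io
      import tarfile

      payload = b"line one\nline two\n"

      # Build an archive holding one regular file and one directory.
      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w")
      info = tarfile.TarInfo("notes.txt")
      info.size = len(payload)
      tar.addfile(info, io.BytesIO(payload))
      dir_info = tarfile.TarInfo("docs")
      dir_info.type = tarfile.DIRTYPE
      tar.addfile(dir_info)
      tar.close()

      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r")
      member_file = tar.extractfile("notes.txt")
      lines = member_file.readlines()
      member_file.close()
      dir_file = tar.extractfile("docs")   # neither a regular file nor a link
      tar.close()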


.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
   be a function that takes a :class:`TarInfo` object argument and returns the
   changed :class:`TarInfo` object. If it instead returns :const:`None` the :class:`TarInfo`
   object will be excluded from the archive. See :ref:`tar-examples` for an
   example.

   .. versionchanged:: 2.6
      Added the *exclude* parameter.

   .. versionchanged:: 2.7
      Added the *filter* parameter.

   .. deprecated:: 2.7
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead. For maximum portability, *filter* should be used as a keyword
      argument rather than as a positional argument so that code won't be
      affected when *exclude* is ultimately removed.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid confusion about the file size.
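
   Besides :meth:`gettarinfo`, a :class:`TarInfo` object can also be filled in
   by hand, which is convenient for data that never existed as a file. The
   sketch below assumes a made-up member name and payload; note that *size*
   has to be set explicitly, since it determines how many bytes are read::

      import io
      import time
      import tarfile

      data = b"generated on the fly\n"

      info = tarfile.TarInfo("generated.txt")
      info.size = len(data)            # addfile() reads exactly this many bytes
      info.mtime = int(time.time())
      info.mode = 0o644

      buf = io.BytesIO()
      with tarfile.open(fileobj=buf, mode="w") as tar:
          tar.addfile(info, io.BytesIO(data))

      # Read the archive back to confirm what was stored.
      buf.seek(0)
      with tarfile.open(fileobj=buf, mode="r") as tar:
          stored = tar.getmember("generated.txt")
          stored_size = stored.size
          stored_mode = stored.mode
          stored_data = tar.extractfile(stored).read()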


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the file object
   *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify some
   of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.posix

   Setting this to :const:`True` is equivalent to setting the :attr:`format`
   attribute to :const:`USTAR_FORMAT`, :const:`False` is equivalent to
   :const:`GNU_FORMAT`.

   .. versionchanged:: 2.4
      *posix* defaults to :const:`False`.

   .. deprecated:: 2.6
      Use the :attr:`format` attribute instead.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.

   .. versionadded:: 2.6


.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   .. versionadded:: 2.6
      Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.

   .. versionadded:: 2.6


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 2.6
      The arguments were added.

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.

   .. versionadded:: 2.6

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is a character device, a block device or a FIFO.


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       if tarinfo.isreg():
           kind = "a regular file."
       elif tarinfo.isdir():
           kind = "a directory."
       else:
           kind = "something else."
       # A single formatted string prints the same line on Python 2 and 3.
       print("%s is %d bytes in size and is %s" % (tarinfo.name, tarinfo.size, kind))
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()
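
A *filter* function may also return :const:`None` to leave a member out of the
archive. The sketch below builds a throwaway directory tree (the ``project``
directory and its file names are invented for the demonstration) and skips
byte-compiled files while adding it::

   import os
   import tarfile
   import tempfile

   def skip_bytecode(tarinfo):
       # Returning None drops the member; returning the object keeps it.
       if tarinfo.name.endswith(".pyc"):
           return None
       return tarinfo

   # Create a small tree to archive.
   workdir = tempfile.mkdtemp()
   project = os.path.join(workdir, "project")
   os.mkdir(project)
   for filename in ["main.py", "main.pyc"]:
       with open(os.path.join(project, filename), "w") as f:
           f.write("# placeholder\n")

   archive_path = os.path.join(workdir, "project.tar.gz")
   tar = tarfile.open(archive_path, "w:gz")
   tar.add(project, arcname="project", filter=skip_bytecode)
   tar.close()

   tar = tarfile.open(archive_path)
   archived = sorted(tar.getnames())
   tar.close()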


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: Extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.
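
To make the filename limits more tangible, the following sketch tries to store
a member with a 304 character name in each of the three formats. The helper
``write_member`` and the name itself are assumptions of the example; the ustar
attempt is expected to fail with :exc:`ValueError`, while the GNU and pax
formats should round-trip the name::

   import io
   import tarfile

   # No "/" to split on, and longer than the ustar name fields allow.
   long_name = "d" * 300 + ".txt"

   def write_member(fmt):
       """Store one empty member named long_name and return the names read back."""
       buf = io.BytesIO()
       tar = tarfile.open(fileobj=buf, mode="w", format=fmt)
       try:
           tar.addfile(tarfile.TarInfo(long_name))
       finally:
           tar.close()
       buf.seek(0)
       tar = tarfile.open(fileobj=buf, mode="r")
       names = tar.getnames()
       tar.close()
       return names

   try:
       write_member(tarfile.USTAR_FORMAT)
       ustar_ok = True
   except ValueError:
       ustar_ok = False

   gnu_names = write_member(tarfile.GNU_FORMAT)
   pax_names = write_member(tarfile.PAX_FORMAT)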

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (that all other formats are merely variants of)
is that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-ASCII characters. Names (i.e.
filenames, linknames, user/group names) containing these characters will appear
damaged. Unfortunately, there is no way to autodetect the encoding of an
archive.

The pax format was designed to solve this problem. It stores non-ASCII names
using the universal character encoding *UTF-8*. When a pax archive is read,
these *UTF-8* names are converted to the encoding of the local file system.

The details of unicode conversion are controlled by the *encoding* and *errors*
keyword arguments of the :class:`TarFile` class.

The default value for *encoding* is the local character encoding. It is deduced
from :func:`sys.getfilesystemencoding` and :func:`sys.getdefaultencoding`. In
read mode, *encoding* is used exclusively to convert unicode names from a pax
archive to strings in the local character encoding. In write mode, the use of
*encoding* depends on the chosen archive format. In case of :const:`PAX_FORMAT`,
input names that contain non-ASCII characters need to be decoded before being
stored as *UTF-8* strings. The other formats do not make use of *encoding*
unless unicode objects are used as input names. These are converted to 8-bit
character strings before they are added to the archive.

The *errors* argument defines how characters are treated that cannot be
converted to or from *encoding*. Possible values are listed in section
:ref:`codec-base-classes`. In read mode, there is an additional scheme
``'utf-8'`` which means that bad characters are replaced by their *UTF-8*
representation. This is the default scheme. In write mode the default value for
*errors* is ``'strict'`` to ensure that name information is not altered
unnoticed.