:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. versionadded:: 2.3

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

  .. versionadded:: 2.6

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, :func:`tarfile.open`
   accepts the keyword argument *compresslevel* (default ``9``) to
   specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

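As a minimal sketch of the two mode forms accepted by :func:`tarfile.open`, the
following writes a gzip compressed archive into an in-memory buffer with
``'w:gz'`` and reads it back as a stream with ``'r|*'``. The member name
``hello.txt`` and the payload are assumptions made only for illustration.

```python
import io
import tarfile

payload = b"hello, tar"

# Write a gzip compressed archive into memory ('filemode:compression' form).
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w:gz")
info = tarfile.TarInfo(name="hello.txt")   # illustrative member name
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# Read it back as a stream ('filemode|compression' form). The compression is
# detected transparently, but members have to be processed in archive order
# because no seeking is done on the underlying object.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r|*")
names = []
for member in tar:
    names.append(member.name)
tar.close()
```

The stream form is the one to pick when the source cannot seek, for example a
pipe or a socket; for ordinary files the plain ``'r'`` mode is usually simpler.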

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

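A short sketch of :func:`is_tarfile` in use: it checks one real archive and one
plain text file, both created in a temporary directory. The file names are
hypothetical.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
tar_path = os.path.join(workdir, "sample.tar")
txt_path = os.path.join(workdir, "notes.txt")

# A plain text file that is not an archive.
with open(txt_path, "w") as f:
    f.write("plain text, not an archive\n")

# A real (uncompressed) archive containing that file.
tar = tarfile.open(tar_path, "w")
tar.add(txt_path, arcname="notes.txt")
tar.close()

archive_detected = tarfile.is_tarfile(tar_path)
text_detected = tarfile.is_tarfile(txt_path)
```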

.. class:: TarFileCompat(filename, mode='r', compression=TAR_PLAIN)

   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
   Please consult the documentation of the :mod:`zipfile` module for more details.
   *compression* must be one of the following constants:


   .. data:: TAR_PLAIN

      Constant for an uncompressed tar archive.


   .. data:: TAR_GZIPPED

      Constant for a :mod:`gzip` compressed tar archive.


   .. deprecated:: 2.6
      The :class:`TarFileCompat` class has been removed in Python 3.


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.

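The exception hierarchy can be exercised with a small sketch that tries to open
data which is not a tar archive. The byte string used as input is an arbitrary
assumption.

```python
import io
import tarfile

not_a_tar = io.BytesIO(b"this is not a tar archive")

try:
    tarfile.open(fileobj=not_a_tar, mode="r")
    outcome = "opened"
except tarfile.ReadError:
    # ReadError derives from TarError, so catching tarfile.TarError
    # would handle this case as well.
    outcome = "read error"
```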

The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

   .. versionadded:: 2.6


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.

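The practical difference between the format constants shows up with long member
names. The sketch below tries to store a single path component of 150
characters in each format. The helper ``try_format`` is hypothetical, and the
use of :exc:`ValueError` for a name that does not fit is the behaviour observed
in the implementation rather than something stated in this section.

```python
import io
import tarfile

long_name = "x" * 150   # one path component longer than 100 characters

def try_format(fmt):
    """Hypothetical helper: return True if long_name can be stored with fmt."""
    tar = tarfile.open(fileobj=io.BytesIO(), mode="w", format=fmt)
    try:
        tar.addfile(tarfile.TarInfo(name=long_name))
        return True
    except ValueError:
        # Raised when the name does not fit into the header fields of the format.
        return False
    finally:
        tar.close()

ustar_ok = try_format(tarfile.USTAR_FORMAT)
gnu_ok = try_format(tarfile.GNU_FORMAT)
pax_ok = try_format(tarfile.PAX_FORMAT)
```

Passing *format* explicitly, as done here, keeps the result independent of the
value of :const:`DEFAULT_FORMAT`.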

.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object; see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 2.7
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   .. versionadded:: 2.6

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   .. versionadded:: 2.6

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments control the way strings are converted to
   unicode objects and vice versa. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionadded:: 2.6

   The *pax_headers* argument is an optional dictionary of unicode strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionadded:: 2.6

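The effect of *ignore_zeros* can be seen with two archives concatenated in
memory. This is a minimal sketch; the helper ``make_archive`` and the member
names are hypothetical and only serve the illustration.

```python
import io
import tarfile

def make_archive(name):
    """Hypothetical helper: return the bytes of an archive with one empty member."""
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w")
    tar.addfile(tarfile.TarInfo(name=name))
    tar.close()
    return buf.getvalue()

# Two complete archives glued together, each with its own end-of-archive blocks.
combined = make_archive("a.txt") + make_archive("b.txt")

# Default: the first empty block is treated as the end of the archive.
tar = tarfile.open(fileobj=io.BytesIO(combined), mode="r")
default_names = tar.getnames()
tar.close()

# ignore_zeros=True: empty blocks are skipped and reading continues.
tar = tarfile.open(fileobj=io.BytesIO(combined), mode="r", ignore_zeros=True)
all_names = tar.getnames()
tar.close()
```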

.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.

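The lookup methods above can be compared on an archive that stores the same name
twice. In this sketch the helper ``add_bytes`` and the member names are
hypothetical.

```python
import io
import tarfile

def add_bytes(tar, name, data):
    """Hypothetical helper: store the bytes *data* in *tar* under *name*."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
add_bytes(tar, "config.ini", b"version=1")
add_bytes(tar, "readme.txt", b"docs")
add_bytes(tar, "config.ini", b"version=22")   # same name stored a second time
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
names = tar.getnames()                  # archive order, duplicates included
latest = tar.getmember("config.ini")    # the last occurrence is returned
latest_size = latest.size
tar.close()
```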

.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if there is no more
   available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionadded:: 2.5

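One possible way to perform the "prior inspection" mentioned in the warning is
to pass a filtered generator as *members*. The function ``safe_members`` below
is a hypothetical helper, not part of :mod:`tarfile`, and it is a sketch rather
than a complete security policy: it keeps only regular files and directories
whose resolved path stays inside the destination.

```python
import io
import os
import tarfile
import tempfile

def safe_members(tar, dest):
    """Yield members that are plain files or directories located inside *dest*."""
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        inside = target == root or target.startswith(root + os.sep)
        # Links and device nodes are skipped as well in this sketch.
        if inside and (member.isfile() or member.isdir()):
            yield member

# Build a small demo archive with one harmless and two suspicious names.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
for name in ("ok.txt", "../escape.txt", "/absolute.txt"):
    tar.addfile(tarfile.TarInfo(name=name))
tar.close()

buf.seek(0)
dest = tempfile.mkdtemp()
tar = tarfile.open(fileobj=buf, mode="r")
kept = [member.name for member in safe_members(tar, dest)]
tar.extractall(path=dest, members=safe_members(tar, dest))
tar.close()
extracted = sorted(os.listdir(dest))
```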

.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.

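A brief sketch of :meth:`TarFile.extractfile`: the content of a member is read
directly from the archive without writing anything to disk. The member name
``log.txt`` and its content are assumptions for the example.

```python
import io
import tarfile

data = b"first line\nsecond line\n"

# Create an in-memory archive holding one regular file.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
info = tarfile.TarInfo(name="log.txt")
info.size = len(data)
tar.addfile(info, io.BytesIO(data))
tar.close()

# Read the member back through the read-only file-like object.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
member_file = tar.extractfile("log.txt")
first_line = member_file.readline()
rest = member_file.read()
member_file.close()
tar.close()
```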

.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
   be a function that takes a :class:`TarInfo` object argument and returns the
   changed :class:`TarInfo` object. If it instead returns :const:`None` the :class:`TarInfo`
   object will be excluded from the archive. See :ref:`tar-examples` for an
   example.

   .. versionchanged:: 2.6
      Added the *exclude* parameter.

   .. versionchanged:: 2.7
      Added the *filter* parameter.

   .. deprecated:: 2.7
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead. For maximum portability, *filter* should be used as a keyword
      argument rather than as a positional argument so that code won't be
      affected when *exclude* is ultimately removed.

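The following sketch combines *arcname* with a *filter* function that returns
:const:`None` to leave members out. The directory layout, the file names and the
function ``skip_bytecode`` are hypothetical.

```python
import os
import tarfile
import tempfile

# Build a small illustrative tree: project/main.py and project/main.pyc
src = tempfile.mkdtemp()
project = os.path.join(src, "project")
os.mkdir(project)
for filename in ("main.py", "main.pyc"):
    with open(os.path.join(project, filename), "w") as f:
        f.write("placeholder\n")

def skip_bytecode(tarinfo):
    """Return None for compiled files so that they are not archived."""
    if tarinfo.name.endswith(".pyc"):
        return None
    return tarinfo

archive_path = os.path.join(src, "project.tar")
tar = tarfile.open(archive_path, "w")
# Store the tree under 'release' instead of its path on disk; filter is
# passed by keyword, as recommended in the deprecation note above.
tar.add(project, arcname="release", filter=skip_bytecode)
tar.close()

tar = tarfile.open(archive_path)
stored = sorted(tar.getnames())
tar.close()
```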

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.

   .. note::
      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid irritation about the file size.

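As an illustration of creating a :class:`TarInfo` object directly, this sketch
stores data that exists only in memory. The member name and the metadata values
are assumptions.

```python
import io
import tarfile
import time

data = b"generated at runtime\n"

info = tarfile.TarInfo(name="generated.txt")
info.size = len(data)           # addfile() reads exactly this many bytes
info.mtime = int(time.time())   # a freshly created TarInfo has mtime 0
info.mode = 0o644

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
tar.addfile(info, io.BytesIO(data))
tar.close()

# Read the archive back to check what was stored.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
stored = tar.getmember("generated.txt")
stored_size = stored.size
stored_mode = stored.mode
tar.close()
```

Setting ``size`` is the step that is easiest to forget: with the default of 0
the member is stored empty even if *fileobj* contains data.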

.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a file object *fileobj* with a file descriptor. If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*'s
   :attr:`~file.name` attribute, or the *name* argument.

   You can modify some
   of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

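A sketch of the :meth:`gettarinfo` and :meth:`addfile` combination described
above: the metadata is taken from a file on disk, adjusted, and then stored
together with the file's content. The paths and the owner name ``nobody`` are
hypothetical.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "report.txt")
with open(source, "wb") as f:
    f.write(b"quarterly numbers\n")

archive_path = os.path.join(workdir, "reports.tar")
tar = tarfile.open(archive_path, "w")
info = tar.gettarinfo(source, arcname="2016/report.txt")
info.uname = info.gname = "nobody"   # adjust metadata before storing
with open(source, "rb") as f:        # binary mode, see the note for addfile()
    tar.addfile(info, f)
tar.close()

tar = tarfile.open(archive_path)
stored = tar.getmember("2016/report.txt")
stored_owner = stored.uname
stored_size = stored.size
tar.close()
```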

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.posix

   Setting this to :const:`True` is equivalent to setting the :attr:`format`
   attribute to :const:`USTAR_FORMAT`, :const:`False` is equivalent to
   :const:`GNU_FORMAT`.

   .. versionchanged:: 2.4
      *posix* defaults to :const:`False`.

   .. deprecated:: 2.6
      Use the :attr:`format` attribute instead.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.

   .. versionadded:: 2.6

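The *pax_headers* argument and the :attr:`TarFile.pax_headers` attribute can be
tried out with a round trip through memory. This is a sketch under the
assumption that ``comment`` is an acceptable keyword for the global header; the
value and the member name are made up.

```python
import io
import tarfile

# Write a pax archive that carries one global header record.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                   pax_headers={u"comment": u"nightly build"})
tar.addfile(tarfile.TarInfo(name="build.txt"))   # empty member, size is 0
tar.close()

# Read it back; the global header becomes visible once the archive is parsed.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
tar.getmembers()
global_headers = dict(tar.pax_headers)
tar.close()
```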

.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   .. versionadded:: 2.6
      Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.

   .. versionadded:: 2.6


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 2.6
      The arguments were added.

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file name, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.

   .. versionadded:: 2.6

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

   import tarfile
   with tarfile.open("sample.tar", "w") as tar:
       for name in ["foo", "bar", "quux"]:
           tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
       if tarinfo.isreg():
           print "a regular file."
       elif tarinfo.isdir():
           print "a directory."
       else:
           print "something else."
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

   import tarfile
   def reset(tarinfo):
       tarinfo.uid = tarinfo.gid = 0
       tarinfo.uname = tarinfo.gname = "root"
       return tarinfo
   tar = tarfile.open("sample.tar.gz", "w:gz")
   tar.add("foo", filter=reset)
   tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (that all other formats are merely variants of)
is that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-ASCII characters. Names (i.e.
filenames, linknames, user/group names) containing these characters will appear
damaged. Unfortunately, there is no way to autodetect the encoding of an
archive.

The pax format was designed to solve this problem. It stores non-ASCII names
using the universal character encoding *UTF-8*. When a pax archive is read,
these *UTF-8* names are converted to the encoding of the local file system.

The details of unicode conversion are controlled by the *encoding* and *errors*
keyword arguments of the :class:`TarFile` class.

The default value for *encoding* is the local character encoding. It is deduced
from :func:`sys.getfilesystemencoding` and :func:`sys.getdefaultencoding`. In
read mode, *encoding* is used exclusively to convert unicode names from a pax
archive to strings in the local character encoding. In write mode, the use of
*encoding* depends on the chosen archive format. In case of :const:`PAX_FORMAT`,
input names that contain non-ASCII characters need to be decoded before being
stored as *UTF-8* strings. The other formats do not make use of *encoding*
unless unicode objects are used as input names. These are converted to 8-bit
character strings before they are added to the archive.

The *errors* argument defines how characters are treated that cannot be
converted to or from *encoding*. Possible values are listed in section
:ref:`codec-base-classes`. In read mode, there is an additional scheme
``'utf-8'`` which means that bad characters are replaced by their *UTF-8*
representation. This is the default scheme. In write mode the default value for
*errors* is ``'strict'`` to ensure that name information is not altered
unnoticed.
795