:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` and ``'a:bz2'`` are not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this.  If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is expected to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``.  :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks.  No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it cannot
   be accessed randomly, see :ref:`tar-examples`.  The currently
   possible modes are:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

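The following sketch is an illustration and is not part of the module. It builds a small gzip-compressed archive in memory and reads it back in three ways: with transparent compression (``'r'``), as a forward-only stream (``'r|*'``), and with an explicit mode that does not match the data (``'r:'``), which raises :exc:`ReadError`. The helper ``make_gz_archive`` and the member name ``hello.txt`` are hypothetical.

```python
import io
import tarfile


def make_gz_archive():
    """Hypothetical helper: return the bytes of a gzip-compressed tar
    archive that holds a single member called hello.txt."""
    data = b"hello, tar\n"
    info = tarfile.TarInfo("hello.txt")
    info.size = len(data)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


raw = make_gz_archive()

# 'r' (same as 'r:*') detects the gzip compression automatically.
with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tf:
    random_access_names = tf.getnames()

# 'r|*' reads the same data as a stream of blocks; each member has to be
# consumed before the loop moves on to the next one.
streamed = {}
with tarfile.open(fileobj=io.BytesIO(raw), mode="r|*") as tf:
    for member in tf:
        streamed[member.name] = tf.extractfile(member).read()

# 'r:' insists on an uncompressed archive, so the gzip data is rejected.
try:
    tarfile.open(fileobj=io.BytesIO(raw), mode="r:")
except tarfile.ReadError:
    wrong_mode_failed = True
else:
    wrong_mode_failed = False
```

In practice ``'r'`` is the safe default for reading. Reserve the stream modes for sources that cannot seek.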

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

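A minimal sketch that uses :func:`is_tarfile` as a pre-check on file paths. The file names inside the temporary directory are hypothetical.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A plain text file, and an archive that contains it.
    text_path = os.path.join(tmp, "notes.txt")
    with open(text_path, "w") as f:
        f.write("not an archive\n")

    archive_path = os.path.join(tmp, "notes.tar")
    with tarfile.open(archive_path, "w") as tf:
        tf.add(text_path, arcname="notes.txt")

    # is_tarfile() only tries to open the file; nothing is extracted.
    archive_ok = tarfile.is_tarfile(archive_path)
    text_ok = tarfile.is_tarfile(text_path)
```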

The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.



Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.

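To choose a format explicitly, pass *format* to :func:`tarfile.open`, which forwards extra keyword arguments to the :class:`TarFile` constructor. The sketch below writes a pax archive with a global header and reads it back. The ``"comment"`` key, its value and the member name are illustrative only.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                  pax_headers={"comment": "nightly build"}) as tf:
    data = b"payload"
    info = tarfile.TarInfo("payload.bin")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tf:
    names = tf.getnames()
    # Global pax headers found while reading are collected in
    # TarFile.pax_headers; they are not reported as members.
    global_headers = dict(tf.pax_headers)
```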

The following variables are available at module level:


.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows,
   :func:`sys.getfilesystemencoding` otherwise.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context manager protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All of the following arguments are optional and can also be accessed as
   instance attributes.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled.  If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

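*ignore_zeros* matters mostly for concatenated archives, such as the output of ``cat a.tar b.tar``. Each archive ends with zero blocks, so by default reading stops after the first archive. Here is a small in-memory sketch. The helper ``tar_bytes`` and the member names are hypothetical.

```python
import io
import tarfile


def tar_bytes(name, data):
    """Hypothetical helper: return an uncompressed tar archive holding a
    single member *name* with content *data*."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Two complete archives glued together, end-of-archive blocks included.
joined = tar_bytes("a.txt", b"A") + tar_bytes("b.txt", b"B")

with tarfile.open(fileobj=io.BytesIO(joined), mode="r:") as tf:
    default_names = tf.getnames()    # stops at the first zero block

with tarfile.open(fileobj=io.BytesIO(joined), mode="r:",
                  ignore_zeros=True) as tf:
    all_names = tf.getnames()        # skips zero blocks and keeps reading
```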

.. method:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.

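The note on duplicate members can be observed with an archive that stores the same name twice. In this sketch, which uses a hypothetical helper ``add_bytes``, :meth:`getnames` lists both entries and :meth:`getmember` resolves the name to the later one. A name that is not in the archive raises :exc:`KeyError`.

```python
import io
import tarfile


def add_bytes(tf, name, data):
    """Hypothetical helper: add *data* to the open TarFile *tf* as a
    regular file called *name*."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    add_bytes(tf, "notes.txt", b"first draft")
    add_bytes(tf, "notes.txt", b"final version")

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tf:
    names = tf.getnames()
    latest = tf.extractfile(tf.getmember("notes.txt")).read()
    missing_raises = False
    try:
        tf.getmember("missing.txt")
    except KeyError:
        missing_raises = True
```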

.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.

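:meth:`next` can be called directly to walk an archive one header at a time until it returns :const:`None`. A sketch using a throwaway in-memory archive with two empty members:

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    for name in ("a.txt", "b.txt"):
        tf.addfile(tarfile.TarInfo(name))   # zero-length members suffice

buf.seek(0)
seen = []
with tarfile.open(fileobj=buf, mode="r") as tf:
    member = tf.next()
    while member is not None:               # None marks the end of the archive
        seen.append(member.name)
        member = tf.next()
```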

.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

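One way to act on the warning above is to filter *members* before calling :meth:`extractall`. The sketch below uses a hypothetical helper ``safe_members`` that skips any member whose resolved path would land outside the destination directory. It is a POSIX-oriented starting point, not a complete defence. For example, it does not check where symbolic or hard link members point. It also relies on :func:`os.path.commonpath`, which requires Python 3.5 or later.

```python
import io
import os
import tarfile
import tempfile


def safe_members(tf, dest):
    """Hypothetical helper: yield only members that would be extracted
    inside *dest*. Link targets are NOT checked."""
    base = os.path.realpath(dest)
    for member in tf.getmembers():
        target = os.path.realpath(os.path.join(base, member.name))
        # An absolute name or a leading "../" moves target out of base.
        if os.path.commonpath([base, target]) == base:
            yield member


# Build an archive with one harmless and one escaping member name.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    for name in ("ok.txt", "../escape.txt"):
        data = name.encode()
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

buf.seek(0)
with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, "out")
    os.mkdir(dest)
    with tarfile.open(fileobj=buf, mode="r") as tf:
        chosen = [m.name for m in safe_members(tf, dest)]
        tf.extractall(dest, members=safe_members(tf, dest))
    extracted = sorted(os.listdir(dest))
    escaped = os.path.exists(os.path.join(tmp, "escape.txt"))
```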

.. method:: TarFile.extract(member, path="", set_attrs=True)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. File attributes (owner, mtime, mode) are set unless
   *set_attrs* is :const:`False`.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a :term:`file-like
   object` is returned. If *member* is a link, a file-like object is constructed from
   the link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only.  It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.


.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. If *exclude* is given, it must be a function that takes one
   filename argument and returns a boolean value. Depending on this value the
   respective file is either excluded (:const:`True`) or added
   (:const:`False`). If *filter* is specified, it must be a keyword argument.  It
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None`, the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. deprecated:: 3.2
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.

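A typical *filter* function normalises ownership and drops unwanted files. The sketch below is illustrative: ``reset_owner`` is a hypothetical name, and excluding ``.pyc`` files is just an example policy.

```python
import io
import os
import tarfile
import tempfile


def reset_owner(info):
    """Hypothetical filter for TarFile.add(): anonymise ownership and
    skip compiled .pyc files."""
    if info.name.endswith(".pyc"):
        return None                 # returning None excludes the member
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info


buf = io.BytesIO()
with tempfile.TemporaryDirectory() as src:
    for fname in ("module.py", "module.pyc"):
        with open(os.path.join(src, fname), "w") as f:
            f.write("# placeholder\n")
    with tarfile.open(fileobj=buf, mode="w") as tf:
        # filter has to be given as a keyword argument.
        tf.add(src, arcname="project", filter=reset_owner)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tf:
    names = sorted(tf.getnames())
    owners = {m.uname for m in tf.getmembers()}
```

The filter is applied to the top-level directory as well as to every file found while recursing.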

.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive.  You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      ensure that the number of bytes read matches the file size.

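:meth:`addfile` copies exactly ``tarinfo.size`` bytes, so it can store data that never existed as a file. A minimal sketch that adds encoded text from memory. The member name is illustrative.

```python
import io
import tarfile
import time

payload = "héllo, wörld\n".encode("utf-8")

info = tarfile.TarInfo(name="greeting.txt")
info.size = len(payload)        # must equal the number of bytes to copy
info.mtime = int(time.time())   # a fresh TarInfo has an mtime of 0
info.mode = 0o644

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    tf.addfile(info, io.BytesIO(payload))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tf:
    member = tf.getmember("greeting.txt")
    roundtrip = tf.extractfile(member).read()
```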

.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object for either the file *name* or the :term:`file
   object` *fileobj* (using :func:`os.fstat` on its file descriptor).  You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.

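A sketch of the :meth:`gettarinfo` then :meth:`addfile` pattern, which lets you adjust metadata before the member is written. The file name, archive name and the ``builder`` user name are hypothetical. The source file is opened in binary mode, which keeps the bytes read consistent with the size recorded from :func:`os.stat`.

```python
import io
import os
import tarfile
import tempfile

buf = io.BytesIO()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.csv")
    with open(path, "w") as f:
        f.write("a,b\n1,2\n")

    with tarfile.open(fileobj=buf, mode="w") as tf:
        # Size, mtime, mode, etc. are taken from the file's stat() result.
        info = tf.gettarinfo(path, arcname="data/report.csv")
        info.uname = info.gname = "builder"    # hypothetical owner name
        with open(path, "rb") as f:
            tf.addfile(info, f)
    expected_size = os.path.getsize(path)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tf:
    member = tf.getmember("data/report.csv")
```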

.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.

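In write mode, the end-of-archive blocks only appear when the archive is closed. The :keyword:`with` statement does this automatically. The sketch below also assumes that CPython pads an uncompressed archive to a whole record of ``20 * 512`` bytes on close. Treat that padding as an implementation detail rather than documented API.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    data = b"x" * 100
    info = tarfile.TarInfo("x.bin")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))
    # One 512-byte header plus the data padded to one 512-byte block.
    size_while_open = len(buf.getvalue())

# Leaving the with block called close(): zero blocks and padding follow.
size_after_close = len(buf.getvalue())
tail = buf.getvalue()[size_while_open:]
```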

.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

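A sketch of a header round trip through :meth:`tobuf` and :meth:`frombuf`. It assumes the CPython 3 signature, in which :meth:`frombuf` takes *encoding* and *errors* as positional arguments after *buf*. It uses :const:`USTAR_FORMAT` so that a short name fits in a single 512-byte header block.

```python
import tarfile

info = tarfile.TarInfo("hello.txt")
info.size = 5
info.mtime = 1_000_000_000

# Serialise only the header; member data is not part of this buffer.
header = info.tobuf(format=tarfile.USTAR_FORMAT, encoding="utf-8",
                    errors="surrogateescape")

parsed = tarfile.TarInfo.frombuf(header, "utf-8", "surrogateescape")

# A buffer of the wrong length is rejected with HeaderError.
try:
    tarfile.TarInfo.frombuf(header[:100], "utf-8", "surrogateescape")
except tarfile.HeaderError:
    short_rejected = True
else:
    short_rejected = False
```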

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification, in seconds since the epoch.


.. attribute:: TarInfo.mode

   Permission bits, as for :func:`os.chmod`.


.. attribute:: TarInfo.type

   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`
   or :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo`
   object more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the link target, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.

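The sketch below is a minimal illustration of these attributes. It assumes an
in-memory archive built on :class:`io.BytesIO` so that no files are touched,
writes one member with hypothetical attribute values (including an illustrative
``comment`` pax keyword) and reads them back::

   import io
   import tarfile

   data = b"hello, world"
   buffer = io.BytesIO()

   # Write one member; PAX_FORMAT is chosen so that pax_headers is stored.
   with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as tar:
       info = tarfile.TarInfo(name="greeting.txt")
       info.size = len(data)
       info.mtime = 1_000_000_000  # seconds since the epoch
       info.mode = 0o644
       info.uname = info.gname = "demo"
       info.pax_headers = {"comment": "illustrative value"}
       tar.addfile(info, io.BytesIO(data))

   # Read the archive back and look up the member by name.
   buffer.seek(0)
   with tarfile.open(fileobj=buffer, mode="r") as tar:
       member = tar.getmember("greeting.txt")

   print(member.name, member.size, oct(member.mode), member.mtime)
   print(member.uname, member.gname, member.pax_headers.get("comment"))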

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.

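As a rough sketch of how the query methods combine, the following builds a
small in-memory archive containing a directory, a regular file and a symbolic
link (all member names, and the helper ``describe()``, are hypothetical) and
classifies each member::

   import io
   import tarfile

   def describe(member):
       """Return a short label for *member* (hypothetical helper)."""
       if member.isfile():
           return "file"
       if member.isdir():
           return "directory"
       if member.issym():
           return "symlink -> " + member.linkname
       if member.islnk():
           return "hard link -> " + member.linkname
       if member.isdev():
           return "device or FIFO"
       return "other"

   buffer = io.BytesIO()
   with tarfile.open(fileobj=buffer, mode="w") as tar:
       directory = tarfile.TarInfo("pkg")
       directory.type = tarfile.DIRTYPE
       directory.mode = 0o755
       tar.addfile(directory)

       payload = b"abc"
       regular = tarfile.TarInfo("pkg/data.txt")
       regular.size = len(payload)
       tar.addfile(regular, io.BytesIO(payload))

       link = tarfile.TarInfo("pkg/latest")
       link.type = tarfile.SYMTYPE
       link.linkname = "data.txt"
       tar.addfile(link)

   buffer.seek(0)
   with tarfile.open(fileobj=buffer, mode="r") as tar:
       kinds = {m.name: describe(m) for m in tar.getmembers()}

   for name, kind in kinds.items():
       print(name, kind)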

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, whereas
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.
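
As a rough illustration of these limits, the sketch below serializes only the
header of a hypothetical member whose name is a single 300-character path
component, once per format. Such a name cannot be split into the ustar *prefix*
and *name* fields, so :const:`USTAR_FORMAT` is expected to raise
:exc:`ValueError`, while the GNU and pax formats store the name in additional
header blocks::

   import tarfile

   long_name = "x" * 300  # one path component, too long for ustar

   header_sizes = {}
   for label, fmt in [("ustar", tarfile.USTAR_FORMAT),
                      ("gnu", tarfile.GNU_FORMAT),
                      ("pax", tarfile.PAX_FORMAT)]:
       info = tarfile.TarInfo(long_name)
       try:
           header = info.tobuf(format=fmt)
       except ValueError as exc:
           header_sizes[label] = None
           print(label, "failed:", exc)
       else:
           # Headers are always a whole number of 512-byte blocks.
           header_sizes[label] = len(header)
           print(label, "header bytes:", len(header))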

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible with it.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.
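
The problem can be reproduced in a single process by simulating both systems
with the *encoding* argument, which :func:`tarfile.open` passes on to the
:class:`TarFile` constructor. In this sketch the member name ``café.txt`` and
the helper ``roundtrip()`` are hypothetical: the name is written as UTF-8 and
read back as Latin-1, once in ustar format and once in pax format::

   import io
   import tarfile

   def roundtrip(fmt, read_encoding):
       """Write a member named 'café.txt' using UTF-8 and return its name as
       seen by a reader that uses *read_encoding* (hypothetical helper)."""
       buffer = io.BytesIO()
       with tarfile.open(fileobj=buffer, mode="w", format=fmt,
                         encoding="utf-8") as tar:
           tar.addfile(tarfile.TarInfo("café.txt"))
       buffer.seek(0)
       with tarfile.open(fileobj=buffer, mode="r",
                         encoding=read_encoding) as tar:
           return tar.getnames()[0]

   # ustar stores the raw UTF-8 bytes of the name in the header block.
   ustar_utf8 = roundtrip(tarfile.USTAR_FORMAT, "utf-8")
   ustar_latin1 = roundtrip(tarfile.USTAR_FORMAT, "latin-1")
   # pax stores the non-ASCII name in a UTF-8 extended header.
   pax_latin1 = roundtrip(tarfile.PAX_FORMAT, "latin-1")

   print(ustar_utf8)    # café.txt
   print(ustar_latin1)  # cafÃ©.txt
   print(pax_latin1)    # café.txt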

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters that cannot be converted are
treated. Possible values are listed in section :ref:`codec-base-classes`.
The default scheme is ``'surrogateescape'``, which Python also uses for its
file system calls; see :ref:`os-filenames`.
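
The sketch below assumes a file name whose on-disk bytes (``b"caf\xe9.txt"``,
Latin-1 encoded) are not valid UTF-8, so that on a UTF-8 system Python would
represent the byte ``0xE9`` as the surrogate ``U+DCE9``. It shows that
``'surrogateescape'`` writes the original byte back unchanged and restores the
same string on reading, whereas ``'strict'`` refuses to encode the surrogate::

   import tarfile

   # Decode undecodable bytes the way Python does for file system names.
   name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")

   info = tarfile.TarInfo(name)
   # With 'surrogateescape' the original byte 0xE9 is written as-is.
   buf = info.tobuf(format=tarfile.USTAR_FORMAT, encoding="utf-8",
                    errors="surrogateescape")
   print(buf[:8])  # b'caf\xe9.txt'

   # Reading with the same scheme gives back an identical str.
   parsed = tarfile.TarInfo.frombuf(buf, "utf-8", "surrogateescape")
   print(parsed.name == name)  # True

   # A strict error handler cannot encode the lone surrogate.
   try:
       info.tobuf(format=tarfile.USTAR_FORMAT, encoding="utf-8",
                  errors="strict")
       strict_failed = False
   except UnicodeEncodeError:
       strict_failed = True
   print("strict handler failed:", strict_failed)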

For :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
