.. _tarfile-mod:

:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>


The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

* can handle tape devices.


.. function:: open(name[, mode[, fileobj[, bufsize]]], **kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is supposed to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+


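   As a minimal sketch of the two mode syntaxes described above, the following
   writes a gzip compressed archive into an in-memory buffer and reads it back,
   first with random access (``'r:*'``) and then as a stream (``'r|*'``). The
   buffer and the member name ``hello.txt`` are assumptions made for the
   illustration::

      import io
      import tarfile

      payload = b"hello tar\n"
      buf = io.BytesIO()

      # Write one member into a gzip compressed archive held in memory.
      tar = tarfile.open(fileobj=buf, mode="w:gz")
      info = tarfile.TarInfo("hello.txt")
      info.size = len(payload)
      tar.addfile(info, io.BytesIO(payload))
      tar.close()

      # Random-access read; 'r:*' detects the compression transparently.
      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r:*")
      names = tar.getnames()
      tar.close()

      # Stream read; members are visited once, in archive order.
      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r|*")
      stream_names = [member.name for member in tar]
      tar.close()
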
.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


.. class:: TarFileCompat(filename[, mode[, compression]])

   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
   Please consult the documentation of the :mod:`zipfile` module for more details.
   *compression* must be one of the following constants:


   .. data:: TAR_PLAIN

      Constant for an uncompressed tar archive.


   .. data:: TAR_GZIPPED

      Constant for a :mod:`gzip` compressed tar archive.


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Raised for *non-fatal* errors when using :meth:`extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Raised by :meth:`frombuf` if the buffer it gets is invalid.


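A short sketch of how these exceptions might be handled: opening a buffer that
is not a tar archive raises :exc:`ReadError`, which can also be caught through
the :exc:`TarError` base class. The buffer contents are made up for the
illustration::

   import io
   import tarfile

   not_a_tar = io.BytesIO(b"this is plainly not a tar archive")

   try:
       tarfile.open(fileobj=not_a_tar, mode="r:")
       outcome = "opened"
   except tarfile.ReadError:
       # ReadError derives from TarError, so "except tarfile.TarError"
       # would catch it as well.
       outcome = "ReadError"
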
Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/tar_134.html#SEC134>`_
      Documentation for tar archive files, including GNU tar extensions.

.. % -----------------
.. % TarFile Objects
.. % -----------------


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.


.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=None, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is ``False``, add symbolic and hard links to the archive. If it
   is ``True``, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is ``False``, treat an empty block as the end of the archive.
   If it is ``True``, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.


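   A brief sketch of two points made above, assuming an in-memory buffer as
   *fileobj*: constructor arguments are available as instance attributes, and
   the file object stays open after the :class:`TarFile` is closed, so the
   caller remains responsible for it::

      import io
      import tarfile

      buf = io.BytesIO()
      tar = tarfile.TarFile(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT)
      chosen_format = tar.format      # arguments double as attributes
      tar.close()

      # The caller still owns the buffer after the archive is closed.
      still_open = not buf.closed
      buf.close()
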
.. method:: TarFile.open(...)

   Alternative constructor. The :func:`open` function on module level is actually a
   shortcut to this classmethod. See section :ref:`tarfile-mod` for details.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return ``None`` if there are no more
   members available.


.. method:: TarFile.extractall([path[, members]])

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.


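   One possible form of the prior inspection mentioned in the warning is to
   filter the member list before passing it as *members*. The helper
   ``safe_members`` below is hypothetical and not part of :mod:`tarfile`; it
   only rejects absolute names and ``".."`` components and does not examine
   link targets, so treat it as a starting point rather than a complete check::

      import io
      import os
      import tarfile

      def safe_members(tar):
          """Return members whose names stay below the extraction directory."""
          kept = []
          for member in tar.getmembers():
              name = member.name
              parts = name.replace("\\", "/").split("/")
              if os.path.isabs(name) or name.startswith("/") or ".." in parts:
                  continue  # skip absolute paths and parent references
              kept.append(member)
          return kept

      # Build a small in-memory archive containing two suspicious names.
      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w")
      for name in ["ok.txt", "../evil.txt", "/abs.txt"]:
          tar.addfile(tarfile.TarInfo(name))
      tar.close()

      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r")
      kept_names = [member.name for member in safe_members(tar)]
      # The filtered list could then be used as:
      #     tar.extractall(path, members=safe_members(tar))
      tar.close()
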
.. method:: TarFile.extract(member[, path])

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      Because the :meth:`extract` method allows random access to a tar archive,
      there are some issues you must take care of yourself. See the description
      for :meth:`extractall` above.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, ``None`` is returned.

   .. note::

      The file-like object is read-only and provides the following methods:
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`.


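   A small sketch of reading a member without writing it to disk. The archive
   is built in memory and the member names ``notes.txt`` and ``docs`` are
   assumptions for the illustration; note that a directory member yields
   ``None``::

      import io
      import tarfile

      payload = b"line one\nline two\n"
      buf = io.BytesIO()
      tar = tarfile.open(fileobj=buf, mode="w")
      info = tarfile.TarInfo("notes.txt")
      info.size = len(payload)
      tar.addfile(info, io.BytesIO(payload))
      dirinfo = tarfile.TarInfo("docs")
      dirinfo.type = tarfile.DIRTYPE      # a directory carries no data
      tar.addfile(dirinfo)
      tar.close()

      buf.seek(0)
      tar = tarfile.open(fileobj=buf, mode="r")
      member_file = tar.extractfile("notes.txt")
      first_line = member_file.readline()
      rest = member_file.read()
      no_file = tar.extractfile("docs")   # neither a regular file nor a link
      tar.close()
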
.. method:: TarFile.add(name[, arcname[, recursive[, exclude]]])

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given,
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`).


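   The effect of *arcname* and *recursive* can be sketched as follows. The
   temporary directory layout and the helper ``archive_names`` are made up for
   the illustration; *exclude* is left out of the sketch::

      import os
      import shutil
      import tarfile
      import tempfile

      workdir = tempfile.mkdtemp()
      project = os.path.join(workdir, "project")
      os.mkdir(project)
      with open(os.path.join(project, "README"), "w") as f:
          f.write("demo\n")

      def archive_names(recursive):
          """Archive *project* as 'project-1.0' and return the stored names."""
          path = os.path.join(workdir, "demo.tar")
          tar = tarfile.open(path, "w")
          tar.add(project, arcname="project-1.0", recursive=recursive)
          tar.close()
          tar = tarfile.open(path)
          names = tar.getnames()
          tar.close()
          return names

      full_names = archive_names(recursive=True)      # directory and contents
      shallow_names = archive_names(recursive=False)  # directory entry only
      shutil.rmtree(workdir)
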
.. method:: TarFile.addfile(tarinfo[, fileobj])

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid confusion about the file size.


.. method:: TarFile.gettarinfo([name[, arcname[, fileobj]]])

   Create a :class:`TarInfo` object for either the file *name* or the file object
   *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify some
   of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.posix

   Setting this to :const:`True` is equivalent to setting the :attr:`format`
   attribute to :const:`USTAR_FORMAT`; :const:`False` is equivalent to
   :const:`GNU_FORMAT`.

   *posix* defaults to :const:`False`.

   .. deprecated:: 2.6
      Use the :attr:`format` attribute instead.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.


.. % -----------------
.. % TarInfo Objects
.. % -----------------


.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo([name])

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf([format[, encoding [, errors]]])

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is_*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.

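The query methods can be combined into a simple classifier, sketched here
under the assumption of a small in-memory archive. The helper ``describe`` and
the member names are hypothetical::

   import io
   import tarfile

   def describe(member):
       """Return a short label for a TarInfo object."""
       if member.isreg():
           return "file"
       if member.isdir():
           return "directory"
       if member.issym():
           return "symlink"
       if member.islnk():
           return "hardlink"
       if member.isdev():
           return "device"
       return "other"

   entries = [("a.txt", tarfile.REGTYPE), ("d", tarfile.DIRTYPE),
              ("s", tarfile.SYMTYPE), ("pipe", tarfile.FIFOTYPE)]

   buf = io.BytesIO()
   tar = tarfile.open(fileobj=buf, mode="w")
   for name, kind in entries:
       info = tarfile.TarInfo(name)
       info.type = kind
       if kind == tarfile.SYMTYPE:
           info.linkname = "a.txt"
       tar.addfile(info)
   tar.close()

   buf.seek(0)
   tar = tarfile.open(fileobj=buf, mode="r")
   labels = dict((m.name, describe(m)) for m in tar.getmembers())
   tar.close()
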
.. % ------------------------
.. % Examples
.. % ------------------------


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create a tar archive with faked information::

   import tarfile
   namelist = ["foo", "bar", "quux"]   # placeholder names of existing files
   tar = tarfile.open("sample.tar.gz", "w:gz")
   for name in namelist:
       tarinfo = tar.gettarinfo(name, "fakeproj-1.0/" + name)
       tarinfo.uid = 123
       tarinfo.gid = 456
       tarinfo.uname = "johndoe"
       tarinfo.gname = "fake"
       # Open in binary mode so that exactly tarinfo.size bytes are read.
       with open(name, "rb") as fileobj:
           tar.addfile(tarinfo, fileobj)
   tar.close()

The *only* way to extract an uncompressed tar stream from ``sys.stdin``::

   import sys
   import tarfile
   # The archive data is binary, so read from the underlying byte stream.
   tar = tarfile.open(mode="r|", fileobj=sys.stdin.buffer)
   for tarinfo in tar:
       tar.extract(tarinfo)
   tar.close()

.. % ------------
.. % Tar format
.. % ------------


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

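The difference between the formats can be sketched with a name that is too
long for ustar. The 300-character name and the ``comment`` global header below
are assumptions chosen for the illustration::

   import io
   import tarfile

   long_name = "dir/" + "x" * 300 + ".txt"   # last component exceeds 100 characters

   # The pax format stores the long name and a global header.
   buf = io.BytesIO()
   tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      pax_headers={"comment": "demo archive"})
   tar.addfile(tarfile.TarInfo(long_name))
   tar.close()

   buf.seek(0)
   tar = tarfile.open(fileobj=buf, mode="r")
   read_name = tar.getnames()[0]
   global_comment = tar.pax_headers.get("comment")
   tar.close()

   # The same name cannot be represented in a ustar header.
   try:
       tarfile.TarInfo(long_name).tobuf(format=tarfile.USTAR_FORMAT)
       ustar_ok = True
   except ValueError:
       ustar_ok = False
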
.. % ----------------
.. % Unicode issues
.. % ----------------


.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`codec-base-classes`. In
read mode the default scheme is ``'replace'``. This avoids unexpected
:exc:`UnicodeError` exceptions and guarantees that an archive can always be
read. In write mode the default value for *errors* is ``'strict'``. This
ensures that name information is not altered unnoticed.

In case of writing :const:`PAX_FORMAT` archives, *encoding* is ignored because
non-ASCII metadata is stored using *UTF-8*.
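
A rough sketch of the effect of *encoding* and *errors*, assuming a GNU format
archive written with Latin-1 names: reading it back with the matching encoding
restores the name, while a mismatched encoding with ``errors='replace'``
yields a damaged name instead of raising an exception. The filename is made up
for the illustration::

   import io
   import tarfile

   name = "caf\u00e9.txt"      # contains one non-ASCII character

   buf = io.BytesIO()
   tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                      encoding="iso8859-1")
   tar.addfile(tarfile.TarInfo(name))
   tar.close()

   # Matching encoding: the name survives the round trip.
   buf.seek(0)
   tar = tarfile.open(fileobj=buf, mode="r", encoding="iso8859-1")
   matching = tar.getnames()[0]
   tar.close()

   # Mismatched encoding: the undecodable byte is replaced.
   buf.seek(0)
   tar = tarfile.open(fileobj=buf, mode="r", encoding="ascii",
                      errors="replace")
   mismatched = tar.getnames()[0]
   tar.close()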