.. _tarfile-mod:

:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>


The :mod:`tarfile` module makes it possible to read and create tar archives.
Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

* can handle tape devices.


.. function:: open(name[, mode[, fileobj[, bufsize]]], **kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it
   defaults to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is supposed to be at position 0.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently possible
   modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

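   The combinations in the first table can be tried out with a small,
   self-contained sketch like the one below. It is only an illustration: the
   temporary directory and the names ``hello.txt`` and ``sample.tar.gz`` are
   made up for the example, and it assumes that the :mod:`gzip` and :mod:`bz2`
   modules are available::

      import os
      import tarfile
      import tempfile

      # Hypothetical working area and file names, used only for this sketch.
      workdir = tempfile.mkdtemp()
      source = os.path.join(workdir, "hello.txt")
      with open(source, "w") as f:
          f.write("hello\n")
      archive = os.path.join(workdir, "sample.tar.gz")

      # 'w:gz' creates a gzip compressed archive.
      tar = tarfile.open(archive, "w:gz")
      tar.add(source, arcname="hello.txt")
      tar.close()

      # 'r' detects the compression transparently.
      tar = tarfile.open(archive, "r")
      names = tar.getnames()
      tar.close()

      # Forcing a compression that does not match raises ReadError.
      try:
          tarfile.open(archive, "r:bz2")
          wrong_mode_failed = False
      except tarfile.ReadError:
          wrong_mode_failed = True

   Because ``'r'`` works out the compression by itself, it is usually the
   safer choice when the type of an archive is not known in advance.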

.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.

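   A brief sketch of how the check might be used, assuming two throwaway
   files (``notes.txt`` and ``notes.tar``, both invented for this example) in
   a temporary directory::

      import os
      import tarfile
      import tempfile

      workdir = tempfile.mkdtemp()

      # An ordinary text file, which is not a tar archive.
      text_path = os.path.join(workdir, "notes.txt")
      with open(text_path, "w") as f:
          f.write("just some text\n")

      # A real (uncompressed) tar archive containing that file.
      tar_path = os.path.join(workdir, "notes.tar")
      tar = tarfile.open(tar_path, "w")
      tar.add(text_path, arcname="notes.txt")
      tar.close()

      tar_detected = tarfile.is_tarfile(tar_path)     # True
      text_detected = tarfile.is_tarfile(text_path)   # False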

.. class:: TarFileCompat(filename[, mode[, compression]])

   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
   Please consult the documentation of the :mod:`zipfile` module for more details.
   *compression* must be one of the following constants:


   .. data:: TAR_PLAIN

      Constant for an uncompressed tar archive.


   .. data:: TAR_GZIPPED

      Constant for a :mod:`gzip` compressed tar archive.


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`frombuf` if the buffer it gets is invalid.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://www.gnu.org/software/tar/manual/html_node/tar_134.html#SEC134>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.


.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=None, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is ``False``, add symbolic and hard links to the archive. If it
   is ``True``, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is ``False``, treat an empty block as the end of the archive.
   If it is ``True``, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

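   The effect of *ignore_zeros* on concatenated archives can be illustrated
   with the following sketch. The helper ``make_archive`` and the member names
   are hypothetical and exist only for this example; the archives are kept in
   memory so that nothing is written to disk::

      import io
      import tarfile

      def make_archive(name, data):
          """Return the bytes of an uncompressed archive holding one member."""
          buffer = io.BytesIO()
          tar = tarfile.open(mode="w", fileobj=buffer)
          info = tarfile.TarInfo(name)
          info.size = len(data)
          tar.addfile(info, io.BytesIO(data))
          tar.close()
          return buffer.getvalue()

      # Two complete archives stored back to back.
      combined = make_archive("a.txt", b"first") + make_archive("b.txt", b"second")

      # By default, reading stops at the empty blocks ending the first archive.
      tar = tarfile.open(mode="r", fileobj=io.BytesIO(combined))
      names_default = tar.getnames()
      tar.close()

      # With ignore_zeros=True the empty blocks are skipped.
      tar = tarfile.open(mode="r", fileobj=io.BytesIO(combined), ignore_zeros=True)
      names_all = tar.getnames()
      tar.close()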

.. method:: TarFile.open(...)

   Alternative constructor. The :func:`open` function on module level is actually a
   shortcut to this classmethod. See section :ref:`tarfile-mod` for details.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.

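   A short sketch of both behaviours, using an in-memory archive in which the
   invented member ``config.txt`` is stored twice::

      import io
      import tarfile

      # Store the same member name twice with different contents.
      buffer = io.BytesIO()
      tar = tarfile.open(mode="w", fileobj=buffer)
      for data in [b"old contents", b"new contents"]:
          info = tarfile.TarInfo("config.txt")
          info.size = len(data)
          tar.addfile(info, io.BytesIO(data))
      tar.close()

      buffer.seek(0)
      tar = tarfile.open(mode="r", fileobj=buffer)
      occurrences = len(tar.getmembers())      # both copies are listed
      latest = tar.extractfile(tar.getmember("config.txt")).read()

      # A name that is not in the archive raises KeyError.
      try:
          tar.getmember("missing.txt")
          missing_raises = False
      except KeyError:
          missing_raises = True
      tar.close()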

.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return ``None`` if there are no more
   members available.


.. method:: TarFile.extractall([path[, members]])

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions is set after all members have been extracted.
   This is done to work around two problems: a directory's modification time is
   reset each time a file is created in it, and if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

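   One possible form of such an inspection is sketched below. The generator
   ``safe_members`` is a hypothetical helper, not part of the module; it only
   checks member names on a POSIX-style file system and does not look at
   link targets, so it should be read as a starting point rather than as
   complete protection::

      import io
      import os
      import tarfile
      import tempfile

      def safe_members(tar, path):
          """Yield only members whose names resolve to a location below path."""
          base = os.path.realpath(path)
          for member in tar.getmembers():
              target = os.path.realpath(os.path.join(base, member.name))
              if os.path.commonpath([base, target]) == base:
                  yield member

      # Build a small archive with one harmless and one suspicious name.
      buffer = io.BytesIO()
      tar = tarfile.open(mode="w", fileobj=buffer)
      for name in ["ok.txt", "../evil.txt"]:
          info = tarfile.TarInfo(name)
          info.size = 1
          tar.addfile(info, io.BytesIO(b"x"))
      tar.close()

      workdir = tempfile.mkdtemp()
      buffer.seek(0)
      tar = tarfile.open(mode="r", fileobj=buffer)
      accepted_members = list(safe_members(tar, workdir))
      accepted = [member.name for member in accepted_members]
      # Only the members that passed the check are handed to extractall().
      tar.extractall(workdir, members=accepted_members)
      tar.close()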

.. method:: TarFile.extract(member[, path])

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      Because the :meth:`extract` method allows random access to a tar archive there
      are some issues you must take care of yourself. See the description for
      :meth:`extractall` above.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, ``None`` is returned.

   .. note::

      The file-like object is read-only and provides the following methods:
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`.

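   As a rough illustration, the following sketch reads a member without
   writing anything to disk. The archive is built in memory, and the member
   names ``notes.txt`` and ``docs`` are invented for the example::

      import io
      import tarfile

      # Build an archive holding one regular file and one directory.
      buffer = io.BytesIO()
      tar = tarfile.open(mode="w", fileobj=buffer)
      data = b"first line\nsecond line\n"
      info = tarfile.TarInfo("notes.txt")
      info.size = len(data)
      tar.addfile(info, io.BytesIO(data))
      dirinfo = tarfile.TarInfo("docs")
      dirinfo.type = tarfile.DIRTYPE
      tar.addfile(dirinfo)
      tar.close()

      buffer.seek(0)
      tar = tarfile.open(mode="r", fileobj=buffer)
      f = tar.extractfile("notes.txt")            # regular file: file-like object
      first_line = f.readline()
      rest = f.read()
      directory_result = tar.extractfile("docs")  # neither file nor link: None
      tar.close()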

.. method:: TarFile.add(name[, arcname[, recursive[, exclude]]])

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given,
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`).

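   The *arcname* and *recursive* arguments might be combined as in this
   sketch. The directory ``project``, the file ``readme.txt`` and the archive
   name are assumptions made for the example; *exclude* is deliberately left
   out::

      import os
      import tarfile
      import tempfile

      # Hypothetical source tree: one directory containing one file.
      workdir = tempfile.mkdtemp()
      project = os.path.join(workdir, "project")
      os.mkdir(project)
      readme = os.path.join(project, "readme.txt")
      with open(readme, "w") as f:
          f.write("read me\n")

      archive = os.path.join(workdir, "project.tar")
      tar = tarfile.open(archive, "w")
      # Add only the directory entry itself, stored under a different name.
      tar.add(project, arcname="project-1.0", recursive=False)
      # Add a single file below the renamed directory.
      tar.add(readme, arcname="project-1.0/readme.txt")
      tar.close()

      tar = tarfile.open(archive)
      names = tar.getnames()
      tar.close()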

.. method:: TarFile.addfile(tarinfo[, fileobj])

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects using :meth:`gettarinfo`.

   .. note::

      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid problems with the file size.


.. method:: TarFile.gettarinfo([name[, arcname[, fileobj]]])

   Create a :class:`TarInfo` object for either the file *name* or the file object
   *fileobj* (using :func:`os.fstat` on its file descriptor). You can modify some
   of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If given, *arcname* specifies an alternative name for the file in the archive.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.posix

   Setting this to :const:`True` is equivalent to setting the :attr:`format`
   attribute to :const:`USTAR_FORMAT`; :const:`False` is equivalent to
   :const:`GNU_FORMAT`.

   *posix* defaults to :const:`False`.

   .. deprecated:: 2.6
      Use the :attr:`format` attribute instead.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.


.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo([name])

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf([format[, encoding[, errors]]])

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is_*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.

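The query methods of :class:`TarInfo` can be combined into a small classifier,
sketched here. The function ``describe`` and all member names are hypothetical
and serve only to illustrate the methods; the members are created directly
from :class:`TarInfo` objects, so no real devices or links are needed::

   import io
   import tarfile

   def describe(member):
       """Return a short description of a TarInfo object's type."""
       if member.isreg():
           return "regular file"
       if member.isdir():
           return "directory"
       if member.issym():
           return "symbolic link"
       if member.islnk():
           return "hard link"
       if member.isdev():
           return "device or FIFO"
       return "something else"

   buffer = io.BytesIO()
   tar = tarfile.open(mode="w", fileobj=buffer)

   regular = tarfile.TarInfo("data.bin")
   regular.size = 3
   tar.addfile(regular, io.BytesIO(b"abc"))

   directory = tarfile.TarInfo("subdir")
   directory.type = tarfile.DIRTYPE
   tar.addfile(directory)

   link = tarfile.TarInfo("latest")
   link.type = tarfile.SYMTYPE
   link.linkname = "data.bin"
   tar.addfile(link)

   fifo = tarfile.TarInfo("pipe")
   fifo.type = tarfile.FIFOTYPE
   tar.addfile(fifo)
   tar.close()

   buffer.seek(0)
   tar = tarfile.open(mode="r", fileobj=buffer)
   kinds = {member.name: describe(member) for member in tar}
   tar.close()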

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # end=" " keeps the description on the same line, separated by a space.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is", end=" ")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create a tar archive with faked information::

   import tarfile
   namelist = ["foo", "bar", "quux"]  # example filenames, assumed to exist
   tar = tarfile.open("sample.tar.gz", "w:gz")
   for name in namelist:
       tarinfo = tar.gettarinfo(name, "fakeproj-1.0/" + name)
       tarinfo.uid = 123
       tarinfo.gid = 456
       tarinfo.uname = "johndoe"
       tarinfo.gname = "fake"
       # Open in binary mode so that exactly tarinfo.size bytes are read.
       with open(name, "rb") as fileobj:
           tar.addfile(tarinfo, fileobj)
   tar.close()

The *only* way to extract an uncompressed tar stream from ``sys.stdin``::

   import sys
   import tarfile
   # Tar data is binary, so read from the underlying binary buffer of stdin.
   tar = tarfile.open(mode="r|", fileobj=sys.stdin.buffer)
   for tarinfo in tar:
       tar.extract(tarinfo)
   tar.close()

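The stream modes can also be tried without a pipe or a tape device. The
following sketch writes an archive to an in-memory buffer with ``'w|'`` and
reads it back sequentially with ``'r|'``; the buffer and the member name
``data.txt`` are assumptions made for the example::

   import io
   import tarfile

   payload = b"streamed data\n"

   # Write the archive as a stream of blocks into an in-memory buffer.
   buffer = io.BytesIO()
   tar = tarfile.open(mode="w|", fileobj=buffer)
   info = tarfile.TarInfo("data.txt")
   info.size = len(payload)
   tar.addfile(info, io.BytesIO(payload))
   tar.close()

   # Read it back strictly in order; each member is consumed as it arrives.
   buffer.seek(0)
   tar = tarfile.open(mode="r|", fileobj=buffer)
   contents = {}
   for member in tar:
       contents[member.name] = tar.extractfile(member).read()
   tar.close()

Each member is read inside the loop because a stream cannot be rewound once
later members have been reached.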

.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: extended headers only affect the subsequent file header, while
  global headers are valid for the complete archive and affect all following
  files. All the data in a pax header is encoded in *UTF-8* for portability
  reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, and there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

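The difference between the ustar and pax limits can be observed with a sketch
like the following. The helper ``write_member``, the 300-character member name
and the ``comment`` header are all invented for this illustration::

   import io
   import tarfile

   # A name without "/" that is far longer than the ustar format allows.
   long_name = "d" * 300 + ".txt"

   def write_member(fmt, **kwargs):
       """Write one empty member named long_name and return the archive bytes."""
       buffer = io.BytesIO()
       tar = tarfile.open(mode="w", fileobj=buffer, format=fmt, **kwargs)
       try:
           tar.addfile(tarfile.TarInfo(long_name))
       finally:
           tar.close()
       return buffer.getvalue()

   # The ustar format cannot store the name and reports an error.
   try:
       write_member(tarfile.USTAR_FORMAT)
       ustar_failed = False
   except ValueError:
       ustar_failed = True

   # The pax format stores it in an extended header; a global header is
   # written as well because pax_headers is given.
   data = write_member(tarfile.PAX_FORMAT, pax_headers={"comment": "demo"})
   tar = tarfile.open(mode="r", fileobj=io.BytesIO(data))
   names = tar.getnames()
   global_headers = dict(tar.pax_headers)
   tar.close()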

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`codec-base-classes`. In
read mode the default scheme is ``'replace'``. This avoids unexpected
:exc:`UnicodeError` exceptions and guarantees that an archive can always be
read. In write mode the default value for *errors* is ``'strict'``. This
ensures that name information is not altered unnoticed.

In case of writing :const:`PAX_FORMAT` archives, *encoding* is ignored because
non-ASCII metadata is stored using *UTF-8*.
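
A minimal sketch of the problem described in this section: a GNU format
archive is written with a Latin-1 encoded member name and then read back with
a matching and with a mismatching *encoding*. The member name is invented, and
both *encoding* and *errors* are passed explicitly so that the sketch does not
depend on any default values::

   import io
   import tarfile

   name = "caf\u00e9.txt"   # contains one non-ASCII character

   # Write the name as Latin-1 bytes into a GNU format archive.
   buffer = io.BytesIO()
   tar = tarfile.open(mode="w", fileobj=buffer, format=tarfile.GNU_FORMAT,
                      encoding="iso-8859-1", errors="strict")
   tar.addfile(tarfile.TarInfo(name))
   tar.close()
   data = buffer.getvalue()

   # Reading with the same encoding restores the name.
   tar = tarfile.open(mode="r", fileobj=io.BytesIO(data),
                      encoding="iso-8859-1", errors="replace")
   matching = tar.getnames()
   tar.close()

   # Reading with a different encoding damages it; 'replace' substitutes
   # U+FFFD for the byte that cannot be decoded.
   tar = tarfile.open(mode="r", fileobj=io.BytesIO(data),
                      encoding="utf-8", errors="replace")
   mismatched = tar.getnames()
   tar.close()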