\section{\module{tarfile} --- Read and write tar archive files}

\declaremodule{standard}{tarfile}
\modulesynopsis{Read and write tar-format archive files.}
\versionadded{2.3}

\moduleauthor{Lars Gust\"abel}{lars@gustaebel.de}
\sectionauthor{Lars Gust\"abel}{lars@gustaebel.de}

The \module{tarfile} module makes it possible to read and create tar archives.
Some facts and figures:

\begin{itemize}
\item reads and writes \module{gzip} and \module{bzip2} compressed archives.
\item read/write support for the \POSIX{}.1-1988 (ustar) format.
\item read/write support for the GNU tar format including \emph{longname} and
      \emph{longlink} extensions, read-only support for the \emph{sparse}
      extension.
\item read/write support for the \POSIX{}.1-2001 (pax) format.
      \versionadded{2.6}
\item handles directories, regular files, hardlinks, symbolic links, fifos,
      character devices and block devices and is able to acquire and
      restore file information like timestamp, access permissions and owner.
\item can handle tape devices.
\end{itemize}

\begin{funcdesc}{open}{name\optional{, mode\optional{,
                       fileobj\optional{, bufsize}}}, **kwargs}
  Return a \class{TarFile} object for the pathname \var{name}.
  For detailed information on \class{TarFile} objects and the keyword
  arguments that are allowed, see \citetitle{TarFile Objects}
  (section \ref{tarfile-objects}).

  \var{mode} has to be a string of the form \code{'filemode[:compression]'};
  it defaults to \code{'r'}. Here is a full list of mode combinations:

  \begin{tableii}{c|l}{code}{Mode}{Action}
  \lineii{'r' or 'r:*'}{Open for reading with transparent compression (recommended).}
  \lineii{'r:'}{Open for reading exclusively without compression.}
  \lineii{'r:gz'}{Open for reading with gzip compression.}
  \lineii{'r:bz2'}{Open for reading with bzip2 compression.}
  \lineii{'a' or 'a:'}{Open for appending with no compression. The file
    is created if it does not exist.}
  \lineii{'w' or 'w:'}{Open for uncompressed writing.}
  \lineii{'w:gz'}{Open for gzip compressed writing.}
  \lineii{'w:bz2'}{Open for bzip2 compressed writing.}
  \end{tableii}

  Note that \code{'a:gz'} or \code{'a:bz2'} is not possible.
  If \var{mode} is not suitable to open a certain (compressed) file for
  reading, \exception{ReadError} is raised. Use \var{mode} \code{'r'} to
  avoid this. If a compression method is not supported,
  \exception{CompressionError} is raised.
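
  As a rough illustration of the mode rules above, the following sketch
  writes a gzip compressed archive, reopens it with the transparent mode
  \code{'r'}, and then shows that the uncompressed-only mode \code{'r:'}
  is rejected with \exception{ReadError}. The file names and the scratch
  directory are assumptions made up for this example.

\begin{verbatim}
import os
import tarfile
import tempfile

# Hypothetical scratch paths, used only for this illustration.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "hello.txt")
archive = os.path.join(workdir, "sample.tar.gz")

f = open(source, "w")
f.write("hello\n")
f.close()

# Write a gzip compressed archive.
tar = tarfile.open(archive, "w:gz")
tar.add(source, arcname="hello.txt")
tar.close()

# Mode 'r' detects the compression transparently.
tar = tarfile.open(archive, "r")
names = tar.getnames()
tar.close()

# Mode 'r:' insists on an uncompressed archive, so it fails here.
try:
    tarfile.open(archive, "r:")
    wrong_mode_failed = False
except tarfile.ReadError:
    wrong_mode_failed = True
\end{verbatim}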

  If \var{fileobj} is specified, it is used as an alternative to a file
  object opened for \var{name}. It is supposed to be at position 0.

  For special purposes, there is a second format for \var{mode}:
  \code{'filemode|[compression]'}. \function{open()} will return a
  \class{TarFile} object that processes its data as a stream of
  blocks. No random seeking will be done on the file. If given,
  \var{fileobj} may be any object that has a \method{read()} or
  \method{write()} method (depending on the \var{mode}).
  \var{bufsize} specifies the blocksize and defaults to \code{20 *
  512} bytes. Use this variant in combination with
  e.g. \code{sys.stdin}, a socket file object or a tape device.
  However, such a \class{TarFile} object is limited in that it does
  not allow random access, see ``Examples''
  (section~\ref{tar-examples}). The currently possible modes:

  \begin{tableii}{c|l}{code}{Mode}{Action}
  \lineii{'r|*'}{Open a \emph{stream} of tar blocks for reading with transparent compression.}
  \lineii{'r|'}{Open a \emph{stream} of uncompressed tar blocks for reading.}
  \lineii{'r|gz'}{Open a gzip compressed \emph{stream} for reading.}
  \lineii{'r|bz2'}{Open a bzip2 compressed \emph{stream} for reading.}
  \lineii{'w|'}{Open an uncompressed \emph{stream} for writing.}
  \lineii{'w|gz'}{Open a gzip compressed \emph{stream} for writing.}
  \lineii{'w|bz2'}{Open a bzip2 compressed \emph{stream} for writing.}
  \end{tableii}
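
  The stream modes can be tried without a pipe or tape device. The sketch
  below is only an illustration: it uses an in-memory buffer as a stand-in
  for a real stream, and the member name and payload are invented. Members
  are read strictly in archive order, which is all a stream allows.

\begin{verbatim}
import io
import tarfile

payload = b"streamed data\n"

# Write an uncompressed stream into an in-memory buffer.
buf = io.BytesIO()
tar = tarfile.open(mode="w|", fileobj=buf)
info = tarfile.TarInfo("data.txt")
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# Read it back as a stream; each member is handled before the next one.
buf.seek(0)
tar = tarfile.open(mode="r|", fileobj=buf)
contents = {}
for member in tar:
    contents[member.name] = tar.extractfile(member).read()
tar.close()
\end{verbatim}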
\end{funcdesc}

\begin{classdesc*}{TarFile}
  Class for reading and writing tar archives. Do not use this
  class directly; use \function{open()} instead.
  See ``TarFile Objects'' (section~\ref{tarfile-objects}).
\end{classdesc*}

\begin{funcdesc}{is_tarfile}{name}
  Return \constant{True} if \var{name} is a tar archive file that
  the \module{tarfile} module can read.
\end{funcdesc}
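A small sketch of \function{is_tarfile()} follows. It checks one real
archive and one ordinary text file; both paths are placeholders created in
a temporary directory just for this example.

\begin{verbatim}
import os
import tarfile
import tempfile

# Hypothetical scratch files for the illustration.
workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "notes.txt")
tar_path = os.path.join(workdir, "notes.tar")

f = open(text_path, "w")
f.write("just some text\n")
f.close()

tar = tarfile.open(tar_path, "w")
tar.add(text_path, arcname="notes.txt")
tar.close()

archive_ok = tarfile.is_tarfile(tar_path)    # a readable tar archive
text_ok = tarfile.is_tarfile(text_path)      # plain text, not an archive
\end{verbatim}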

\begin{classdesc}{TarFileCompat}{filename\optional{, mode\optional{,
                                 compression}}}
  Class for limited access to tar archives with a
  \refmodule{zipfile}-like interface. Please consult the
  documentation of the \refmodule{zipfile} module for more details.
  \var{compression} must be one of the following constants:
  \begin{datadesc}{TAR_PLAIN}
    Constant for an uncompressed tar archive.
  \end{datadesc}
  \begin{datadesc}{TAR_GZIPPED}
    Constant for a \refmodule{gzip} compressed tar archive.
  \end{datadesc}
\end{classdesc}

\begin{excdesc}{TarError}
  Base class for all \module{tarfile} exceptions.
\end{excdesc}

\begin{excdesc}{ReadError}
  Is raised when a tar archive is opened that either cannot be handled by
  the \module{tarfile} module or is somehow invalid.
\end{excdesc}

\begin{excdesc}{CompressionError}
  Is raised when a compression method is not supported or when the data
  cannot be decoded properly.
\end{excdesc}

\begin{excdesc}{StreamError}
  Is raised for the limitations that are typical for stream-like
  \class{TarFile} objects.
\end{excdesc}

\begin{excdesc}{ExtractError}
  Is raised for \emph{non-fatal} errors when using \method{extract()}, but
  only if \member{TarFile.errorlevel}\code{ == 2}.
\end{excdesc}

\begin{excdesc}{HeaderError}
  Is raised by \method{frombuf()} if the buffer it gets is invalid.
  \versionadded{2.6}
\end{excdesc}
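Because \exception{TarError} is the common base class, a single
\code{except} clause can cover the more specific exceptions. The sketch
below feeds invented non-archive bytes to \function{open()} and catches the
resulting \exception{ReadError} through its base class.

\begin{verbatim}
import io
import tarfile

# Made-up data that is clearly not a tar archive.
garbage = io.BytesIO(b"this is not a tar archive")

try:
    tarfile.open(fileobj=garbage, mode="r")
    caught = None
except tarfile.TarError as exc:
    # ReadError is caught here because it derives from TarError.
    caught = exc
\end{verbatim}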

Each of the following constants defines a tar archive format that the
\module{tarfile} module is able to create. See section \ref{tar-formats} for
details.

\begin{datadesc}{USTAR_FORMAT}
  \POSIX{}.1-1988 (ustar) format.
\end{datadesc}

\begin{datadesc}{GNU_FORMAT}
  GNU tar format.
\end{datadesc}

\begin{datadesc}{PAX_FORMAT}
  \POSIX{}.1-2001 (pax) format.
\end{datadesc}

\begin{datadesc}{DEFAULT_FORMAT}
  The default format for creating archives. This is currently
  \constant{GNU_FORMAT}.
\end{datadesc}

\begin{seealso}
  \seemodule{zipfile}{Documentation of the \refmodule{zipfile}
  standard module.}

  \seetitle[http://www.gnu.org/software/tar/manual/html_node/tar_134.html\#SEC134]
  {GNU tar manual, Basic Tar Format}{Documentation for tar archive files,
  including GNU tar extensions.}
\end{seealso}

%-----------------
% TarFile Objects
%-----------------

\subsection{TarFile Objects \label{tarfile-objects}}

The \class{TarFile} object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up
of a header block followed by data blocks. It is possible to store a file in a
tar archive several times. Each archive member is represented by a
\class{TarInfo} object; see \citetitle{TarInfo Objects} (section
\ref{tarinfo-objects}) for details.

\begin{classdesc}{TarFile}{name=None, mode='r', fileobj=None,
        format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False,
        ignore_zeros=False, encoding=None, errors=None, pax_headers=None,
        debug=0, errorlevel=0}

  All following arguments are optional and can be accessed as instance
  attributes as well.

  \var{name} is the pathname of the archive. It can be omitted if
  \var{fileobj} is given. In this case, the file object's \member{name}
  attribute is used if it exists.

  \var{mode} is either \code{'r'} to read from an existing archive,
  \code{'a'} to append data to an existing file or \code{'w'} to create a new
  file overwriting an existing one.

  If \var{fileobj} is given, it is used for reading or writing data.
  If it can be determined, \var{mode} is overridden by \var{fileobj}'s mode.
  \var{fileobj} will be used from position 0.
  \begin{notice}
    \var{fileobj} is not closed when \class{TarFile} is closed.
  \end{notice}

  \var{format} controls the archive format. It must be one of the constants
  \constant{USTAR_FORMAT}, \constant{GNU_FORMAT} or \constant{PAX_FORMAT}
  that are defined at module level.
  \versionadded{2.6}

  The \var{tarinfo} argument can be used to replace the default
  \class{TarInfo} class with a different one.
  \versionadded{2.6}

  If \var{dereference} is \code{False}, add symbolic and hard links to the
  archive. If it is \code{True}, add the content of the target files to the
  archive. This has no effect on systems that do not support symbolic links.

  If \var{ignore_zeros} is \code{False}, treat an empty block as the end of
  the archive. If it is \code{True}, skip empty (and invalid) blocks and try
  to get as many members as possible. This is only useful for reading
  concatenated or damaged archives.

  \var{debug} can be set from \code{0} (no debug messages) up to \code{3}
  (all debug messages). The messages are written to \code{sys.stderr}.

  If \var{errorlevel} is \code{0}, all errors are ignored when using
  \method{extract()}. Nevertheless, they appear as error messages in the
  debug output when debugging is enabled. If \code{1}, all \emph{fatal}
  errors are raised as \exception{OSError} or \exception{IOError} exceptions.
  If \code{2}, all \emph{non-fatal} errors are raised as \exception{TarError}
  exceptions as well.

  The \var{encoding} and \var{errors} arguments control the way strings are
  converted to unicode objects and vice versa. The default settings will work
  for most users. See section \ref{tar-unicode} for in-depth information.
  \versionadded{2.6}

  The \var{pax_headers} argument is an optional dictionary of unicode strings
  which will be added as a pax global header if \var{format} is
  \constant{PAX_FORMAT}.
  \versionadded{2.6}
\end{classdesc}
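The notice about \var{fileobj} can be checked with a short sketch. It
assumes an in-memory buffer as the file object: the buffer stays open after
the \class{TarFile} is closed, so the caller remains responsible for closing
it.

\begin{verbatim}
import io
import tarfile

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
tar.close()                      # writes the end-of-archive blocks

still_open = not buf.closed      # the TarFile did not close our buffer
archive_size = len(buf.getvalue())
buf.close()                      # closing it is left to the caller
\end{verbatim}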

\begin{methoddesc}{open}{...}
  Alternative constructor. The \function{open()} function at module level is
  actually a shortcut to this classmethod. See section~\ref{module-tarfile}
  for details.
\end{methoddesc}

\begin{methoddesc}{getmember}{name}
  Return a \class{TarInfo} object for member \var{name}. If \var{name}
  cannot be found in the archive, \exception{KeyError} is raised.
  \begin{notice}
    If a member occurs more than once in the archive, its last
    occurrence is assumed to be the most up-to-date version.
  \end{notice}
\end{methoddesc}
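The behaviour for repeated members and missing names can be sketched as
follows. The helper \code{add_bytes()} and the member names are hypothetical
and exist only for this example; the archive is kept in memory.

\begin{verbatim}
import io
import tarfile

def add_bytes(tar, name, data):
    # Hypothetical helper: store a byte string as a regular file.
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
add_bytes(tar, "config.txt", b"first version\n")
add_bytes(tar, "config.txt", b"second version\n")   # same name again
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
all_names = tar.getnames()                 # both occurrences are listed
latest = tar.extractfile(tar.getmember("config.txt")).read()
try:
    tar.getmember("missing.txt")
    missing_raised = False
except KeyError:
    missing_raised = True
tar.close()
\end{verbatim}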

\begin{methoddesc}{getmembers}{}
  Return the members of the archive as a list of \class{TarInfo} objects.
  The list has the same order as the members in the archive.
\end{methoddesc}

\begin{methoddesc}{getnames}{}
  Return the members as a list of their names. It has the same order as
  the list returned by \method{getmembers()}.
\end{methoddesc}

\begin{methoddesc}{list}{verbose=True}
  Print a table of contents to \code{sys.stdout}. If \var{verbose} is
  \constant{False}, only the names of the members are printed. If it is
  \constant{True}, output similar to that of \program{ls -l} is produced.
\end{methoddesc}

\begin{methoddesc}{next}{}
  Return the next member of the archive as a \class{TarInfo} object, when
  \class{TarFile} is opened for reading. Return \code{None} if there are no
  more members available.
\end{methoddesc}
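A manual loop over \method{next()} might look like the sketch below, which
stops when \code{None} is returned. The two members and their contents are
invented for the illustration.

\begin{verbatim}
import io
import tarfile

# Build a small in-memory archive with two made-up members.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
for name, data in [("a.txt", b"a"), ("b.txt", b"bb")]:
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
seen = []
member = tar.next()
while member is not None:        # None signals the end of the archive
    seen.append((member.name, member.size))
    member = tar.next()
tar.close()
\end{verbatim}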

\begin{methoddesc}{extractall}{\optional{path\optional{, members}}}
  Extract all members from the archive to the current working directory
  or directory \var{path}. If optional \var{members} is given, it must be
  a subset of the list returned by \method{getmembers()}.
  Directory information like owner, modification time and permissions is
  set after all members have been extracted. This is done to work around two
  problems: a directory's modification time is reset each time a file is
  created in it, and, if a directory's permissions do not allow writing,
  extracting files to it will fail.
  \versionadded{2.5}
\end{methoddesc}
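The \var{members} argument can be used to extract only part of an archive.
The following sketch selects the members whose names end in \code{.txt};
the member names and the temporary target directory are assumptions for the
example.

\begin{verbatim}
import io
import os
import tarfile
import tempfile

# Build a small in-memory archive with made-up members.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
for name in ["readme.txt", "notes.txt", "data.bin"]:
    data = name.encode("ascii")
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
tar.close()

buf.seek(0)
target = tempfile.mkdtemp()      # hypothetical extraction directory
tar = tarfile.open(fileobj=buf, mode="r")
wanted = [m for m in tar.getmembers() if m.name.endswith(".txt")]
tar.extractall(target, wanted)   # members must come from getmembers()
tar.close()

extracted = sorted(os.listdir(target))
\end{verbatim}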

\begin{methoddesc}{extract}{member\optional{, path}}
  Extract a member from the archive to the current working directory,
  using its full name. Its file information is extracted as accurately as
  possible.
  \var{member} may be a filename or a \class{TarInfo} object.
  You can specify a different directory using \var{path}.
  \begin{notice}
    Because the \method{extract()} method allows random access to a tar
    archive, there are some issues you must take care of yourself. See the
    description for \method{extractall()} above.
  \end{notice}
\end{methoddesc}

\begin{methoddesc}{extractfile}{member}
  Extract a member from the archive as a file object.
  \var{member} may be a filename or a \class{TarInfo} object.
  If \var{member} is a regular file, a file-like object is returned.
  If \var{member} is a link, a file-like object is constructed from the
  link's target.
  If \var{member} is none of the above, \code{None} is returned.
  \begin{notice}
    The file-like object is read-only and provides the following methods:
    \method{read()}, \method{readline()}, \method{readlines()},
    \method{seek()}, \method{tell()}.
  \end{notice}
\end{methoddesc}
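To illustrate the return values of \method{extractfile()}, the sketch below
stores one directory and one regular file in an in-memory archive (both
names are invented) and then asks for each of them.

\begin{verbatim}
import io
import tarfile

text_data = b"read me first\n"

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")

directory = tarfile.TarInfo("docs")
directory.type = tarfile.DIRTYPE
tar.addfile(directory)

regular = tarfile.TarInfo("docs/readme.txt")
regular.size = len(text_data)
tar.addfile(regular, io.BytesIO(text_data))
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
text = tar.extractfile("docs/readme.txt").read()   # file-like object
dir_result = tar.extractfile("docs")               # not a file or link
tar.close()
\end{verbatim}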

\begin{methoddesc}{add}{name\optional{, arcname\optional{, recursive}}}
  Add the file \var{name} to the archive. \var{name} may be any type
  of file (directory, fifo, symbolic link, etc.).
  If given, \var{arcname} specifies an alternative name for the file in the
  archive. Directories are added recursively by default.
  This can be avoided by setting \var{recursive} to \constant{False};
  the default is \constant{True}.
\end{methoddesc}

\begin{methoddesc}{addfile}{tarinfo\optional{, fileobj}}
  Add the \class{TarInfo} object \var{tarinfo} to the archive.
  If \var{fileobj} is given, \code{\var{tarinfo}.size} bytes are read
  from it and added to the archive. You can create \class{TarInfo} objects
  using \method{gettarinfo()}.
  \begin{notice}
    On Windows platforms, \var{fileobj} should always be opened with mode
    \code{'rb'} to avoid confusion about the file size.
  \end{notice}
\end{methoddesc}
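\method{addfile()} also works with a hand-built \class{TarInfo}, which is
one way to archive data that never existed on disk. In this sketch the
member name, payload and permission bits are made up; the important step is
setting \member{size}, since exactly that many bytes are read from
\var{fileobj}.

\begin{verbatim}
import io
import tarfile
import time

data = b"generated on the fly\n"

info = tarfile.TarInfo("generated.txt")
info.size = len(data)            # addfile() reads exactly this many bytes
info.mtime = int(time.time())
info.mode = 0o644

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
tar.addfile(info, io.BytesIO(data))
tar.close()

# Read the member back to confirm what was stored.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
stored = tar.getmember("generated.txt")
stored_size = stored.size
stored_data = tar.extractfile(stored).read()
tar.close()
\end{verbatim}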

\begin{methoddesc}{gettarinfo}{\optional{name\optional{,
                               arcname\optional{, fileobj}}}}
  Create a \class{TarInfo} object for either the file \var{name} or
  the file object \var{fileobj} (using \function{os.fstat()} on its
  file descriptor). You can modify some of the \class{TarInfo}'s
  attributes before you add it using \method{addfile()}. If given,
  \var{arcname} specifies an alternative name for the file in the
  archive.
\end{methoddesc}

\begin{methoddesc}{close}{}
  Close the \class{TarFile}. In write mode, two finishing zero
  blocks are appended to the archive.
\end{methoddesc}

\begin{memberdesc}{posix}
  Setting this to \constant{True} is equivalent to setting the
  \member{format} attribute to \constant{USTAR_FORMAT},
  \constant{False} is equivalent to \constant{GNU_FORMAT}.
  \versionchanged[\var{posix} defaults to \constant{False}]{2.4}
  \deprecated{2.6}{Use the \member{format} attribute instead.}
\end{memberdesc}

\begin{memberdesc}{pax_headers}
  A dictionary containing key-value pairs of pax global headers.
  \versionadded{2.6}
\end{memberdesc}
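As a tentative sketch of how global headers travel with an archive, the
code below writes a pax archive with one invented header
(\code{"comment"}) and reads the \member{pax_headers} attribute back after
reopening it. A placeholder member is added so that the archive is not
empty.

\begin{verbatim}
import io
import tarfile

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                   pax_headers={"comment": "nightly build"})
tar.addfile(tarfile.TarInfo("placeholder"))   # empty regular file
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
global_headers = dict(tar.pax_headers)        # copy before closing
tar.close()
\end{verbatim}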

%-----------------
% TarInfo Objects
%-----------------

\subsection{TarInfo Objects \label{tarinfo-objects}}

A \class{TarInfo} object represents one member in a
\class{TarFile}. Aside from storing all required attributes of a file
(like file type, size, time, permissions, owner etc.), it provides
some useful methods to determine its type. It does \emph{not} contain
the file's data itself.

\class{TarInfo} objects are returned by \class{TarFile}'s methods
\method{getmember()}, \method{getmembers()} and \method{gettarinfo()}.

\begin{classdesc}{TarInfo}{\optional{name}}
  Create a \class{TarInfo} object.
\end{classdesc}

\begin{methoddesc}{frombuf}{buf}
  Create and return a \class{TarInfo} object from string buffer \var{buf}.
  \versionchanged[Raises \exception{HeaderError} if the buffer is
  invalid.]{2.6}
\end{methoddesc}

\begin{methoddesc}{fromtarfile}{tarfile}
  Read the next member from the \class{TarFile} object \var{tarfile} and
  return it as a \class{TarInfo} object.
  \versionadded{2.6}
\end{methoddesc}

\begin{methoddesc}{tobuf}{\optional{format\optional{, encoding
                          \optional{, errors}}}}
  Create a string buffer from a \class{TarInfo} object. For information
  on the arguments see the constructor of the \class{TarFile} class.
  \versionchanged[The arguments were added]{2.6}
\end{methoddesc}

A \class{TarInfo} object has the following public data attributes:

\begin{memberdesc}{name}
  Name of the archive member.
\end{memberdesc}

\begin{memberdesc}{size}
  Size in bytes.
\end{memberdesc}

\begin{memberdesc}{mtime}
  Time of last modification.
\end{memberdesc}

\begin{memberdesc}{mode}
  Permission bits.
\end{memberdesc}

\begin{memberdesc}{type}
  File type. \var{type} is usually one of these constants:
  \constant{REGTYPE}, \constant{AREGTYPE}, \constant{LNKTYPE},
  \constant{SYMTYPE}, \constant{DIRTYPE}, \constant{FIFOTYPE},
  \constant{CONTTYPE}, \constant{CHRTYPE}, \constant{BLKTYPE},
  \constant{GNUTYPE_SPARSE}. To determine the type of a
  \class{TarInfo} object more conveniently, use the \code{is_*()}
  methods below.
\end{memberdesc}

\begin{memberdesc}{linkname}
  Name of the target file, which is only present in
  \class{TarInfo} objects of type \constant{LNKTYPE} and
  \constant{SYMTYPE}.
\end{memberdesc}

\begin{memberdesc}{uid}
  User ID of the user who originally stored this member.
\end{memberdesc}

\begin{memberdesc}{gid}
  Group ID of the user who originally stored this member.
\end{memberdesc}

\begin{memberdesc}{uname}
  User name.
\end{memberdesc}

\begin{memberdesc}{gname}
  Group name.
\end{memberdesc}

\begin{memberdesc}{pax_headers}
  A dictionary containing key-value pairs of an associated pax
  extended header.
  \versionadded{2.6}
\end{memberdesc}

A \class{TarInfo} object also provides some convenient query methods:

\begin{methoddesc}{isfile}{}
  Return \constant{True} if the \class{TarInfo} object is a regular
  file.
\end{methoddesc}

\begin{methoddesc}{isreg}{}
  Same as \method{isfile()}.
\end{methoddesc}

\begin{methoddesc}{isdir}{}
  Return \constant{True} if it is a directory.
\end{methoddesc}

\begin{methoddesc}{issym}{}
  Return \constant{True} if it is a symbolic link.
\end{methoddesc}

\begin{methoddesc}{islnk}{}
  Return \constant{True} if it is a hard link.
\end{methoddesc}

\begin{methoddesc}{ischr}{}
  Return \constant{True} if it is a character device.
\end{methoddesc}

\begin{methoddesc}{isblk}{}
  Return \constant{True} if it is a block device.
\end{methoddesc}

\begin{methoddesc}{isfifo}{}
  Return \constant{True} if it is a FIFO.
\end{methoddesc}

\begin{methoddesc}{isdev}{}
  Return \constant{True} if it is one of character device, block
  device or FIFO.
\end{methoddesc}
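The query methods can be combined to classify the members of an archive.
The sketch below builds a tiny in-memory archive with a directory, a
regular file and a symbolic link (all names invented) and records what each
member reports about itself.

\begin{verbatim}
import io
import tarfile

source = b"print('hi')\n"

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")

directory = tarfile.TarInfo("pkg")
directory.type = tarfile.DIRTYPE
tar.addfile(directory)

module = tarfile.TarInfo("pkg/module.py")
module.size = len(source)
tar.addfile(module, io.BytesIO(source))

link = tarfile.TarInfo("pkg/latest")
link.type = tarfile.SYMTYPE
link.linkname = "module.py"
tar.addfile(link)
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
kinds = {}
for member in tar:
    if member.isdir():
        kinds[member.name] = "directory"
    elif member.issym():
        kinds[member.name] = "symbolic link"
    elif member.isfile():
        kinds[member.name] = "regular file"
    else:
        kinds[member.name] = "other"
tar.close()
\end{verbatim}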

%------------------------
% Examples
%------------------------

\subsection{Examples \label{tar-examples}}

How to extract an entire tar archive to the current working directory:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()
\end{verbatim}

How to create an uncompressed tar archive from a list of filenames:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar", "w")
for name in ["foo", "bar", "quux"]:
    tar.add(name)
tar.close()
\end{verbatim}

How to read a gzip compressed tar archive and display some member information:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar.gz", "r:gz")
for tarinfo in tar:
    print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
    if tarinfo.isreg():
        print "a regular file."
    elif tarinfo.isdir():
        print "a directory."
    else:
        print "something else."
tar.close()
\end{verbatim}

How to create a tar archive with faked information:
\begin{verbatim}
import tarfile
namelist = ["foo", "bar", "quux"]    # names of existing files to archive
tar = tarfile.open("sample.tar.gz", "w:gz")
for name in namelist:
    tarinfo = tar.gettarinfo(name, "fakeproj-1.0/" + name)
    tarinfo.uid = 123
    tarinfo.gid = 456
    tarinfo.uname = "johndoe"
    tarinfo.gname = "fake"
    # Binary mode keeps the data read in step with tarinfo.size.
    fileobj = open(name, "rb")
    tar.addfile(tarinfo, fileobj)
    fileobj.close()
tar.close()
\end{verbatim}

The \emph{only} way to extract an uncompressed tar stream from
\code{sys.stdin}:
\begin{verbatim}
import sys
import tarfile
tar = tarfile.open(mode="r|", fileobj=sys.stdin)
for tarinfo in tar:
    tar.extract(tarinfo)
tar.close()
\end{verbatim}

%------------
% Tar format
%------------

\subsection{Supported tar formats \label{tar-formats}}

There are three tar formats that can be created with the \module{tarfile}
module:

\begin{itemize}

\item
The \POSIX{}.1-1988 ustar format (\constant{USTAR_FORMAT}). It supports
filenames up to a length of at best 256 characters and linknames up to 100
characters. The maximum file size is 8 gigabytes. This is an old and limited
but widely supported format.

\item
The GNU tar format (\constant{GNU_FORMAT}). It supports long filenames and
linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
standard on GNU/Linux systems. \module{tarfile} fully supports the GNU tar
extensions for long names; sparse file support is read-only.

\item
The \POSIX{}.1-2001 pax format (\constant{PAX_FORMAT}). It is the most
flexible format with virtually no limits. It supports long filenames and
linknames, large files and stores pathnames in a portable way. However, not
all tar implementations today are able to handle pax archives properly.

The \emph{pax} format is an extension to the existing \emph{ustar} format. It
uses extra headers for information that cannot be stored otherwise. There are
two flavours of pax headers: extended headers only affect the subsequent file
header, whereas global headers are valid for the complete archive and affect
all following files. All the data in a pax header is encoded in \emph{UTF-8}
for portability reasons.

\end{itemize}

There are some more variants of the tar format which can be read, but not
created:

\begin{itemize}

\item
The ancient V7 format. This is the first tar format from \UNIX{} Seventh
Edition, storing only regular files and directories. Names must not be longer
than 100 characters, and there is no user/group name information. Some
archives have miscalculated header checksums in case of fields with
non-\ASCII{} characters.

\item
The SunOS tar extended format. This format is a variant of the \POSIX{}.1-2001
pax format, but is not compatible.

\end{itemize}
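The name length limits of the three writable formats can be probed with a
short experiment. The sketch below is only an illustration: the helper
\code{roundtrip()} and the deliberately long member name (more than 256
characters) are invented. It writes the name with each format into an
in-memory archive and reads it back; the ustar attempt is expected to be
refused with \exception{ValueError}.

\begin{verbatim}
import io
import tarfile

# A made-up path of more than 256 characters.
long_name = "/".join(["directory"] * 30) + "/file.txt"

def roundtrip(fmt):
    # Hypothetical helper: store long_name using the given format and
    # return the member names found when the archive is read back.
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w", format=fmt)
    tar.addfile(tarfile.TarInfo(long_name))
    tar.close()
    buf.seek(0)
    tar = tarfile.open(fileobj=buf, mode="r")
    names = tar.getnames()
    tar.close()
    return names

gnu_names = roundtrip(tarfile.GNU_FORMAT)
pax_names = roundtrip(tarfile.PAX_FORMAT)

try:
    roundtrip(tarfile.USTAR_FORMAT)
    ustar_failed = False
except ValueError:
    # The name does not fit into the ustar header fields.
    ustar_failed = True
\end{verbatim}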

%----------------
% Unicode issues
%----------------

\subsection{Unicode issues \label{tar-unicode}}

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (that all other formats are merely variants of)
is that there is no concept of supporting different character encodings.
For example, an ordinary tar archive created on a \emph{UTF-8} system cannot be
read correctly on a \emph{Latin-1} system if it contains non-\ASCII{}
characters. Names (i.e. filenames, linknames, user/group names) containing
these characters will appear damaged. Unfortunately, there is no way to
autodetect the encoding of an archive.

The pax format was designed to solve this problem. It stores non-\ASCII{} names
using the universal character encoding \emph{UTF-8}. When a pax archive is
read, these \emph{UTF-8} names are converted to the encoding of the local
file system.

The details of unicode conversion are controlled by the \var{encoding} and
\var{errors} keyword arguments of the \class{TarFile} class.

The default value for \var{encoding} is the local character encoding. It is
deduced from \function{sys.getfilesystemencoding()} and
\function{sys.getdefaultencoding()}. In read mode, \var{encoding} is used
exclusively to convert unicode names from a pax archive to strings in the local
character encoding. In write mode, the use of \var{encoding} depends on the
chosen archive format. In case of \constant{PAX_FORMAT}, input names that
contain non-\ASCII{} characters need to be decoded before being stored as
\emph{UTF-8} strings. The other formats do not make use of \var{encoding}
unless unicode objects are used as input names. These are converted to
8-bit character strings before they are added to the archive.

The \var{errors} argument defines how characters are treated that cannot be
converted to or from \var{encoding}. Possible values are listed in section
\ref{codec-base-classes}. In read mode, there is an additional scheme
\code{'utf-8'} which means that bad characters are replaced by their
\emph{UTF-8} representation. This is the default scheme. In write mode the
default value for \var{errors} is \code{'strict'} to ensure that name
information is not altered unnoticed.