\section{\module{tarfile} --- Read and write tar archive files}

\declaremodule{standard}{tarfile}
\modulesynopsis{Read and write tar-format archive files.}
\versionadded{2.3}

\moduleauthor{Lars Gust\"abel}{lars@gustaebel.de}
\sectionauthor{Lars Gust\"abel}{lars@gustaebel.de}

The \module{tarfile} module makes it possible to read and create tar archives.
Some facts and figures:

\begin{itemize}
\item reads and writes \module{gzip} and \module{bz2} compressed archives.
\item read/write support for the \POSIX{}.1-1988 (ustar) format.
\item read/write support for the GNU tar format including \emph{longname} and
      \emph{longlink} extensions, read-only support for the \emph{sparse}
      extension.
\item read/write support for the \POSIX{}.1-2001 (pax) format.
      \versionadded{2.6}
\item handles directories, regular files, hardlinks, symbolic links, fifos,
      character devices and block devices and is able to acquire and
      restore file information like timestamp, access permissions and owner.
\item can handle tape devices.
\end{itemize}

\begin{funcdesc}{open}{name\optional{, mode\optional{,
                 fileobj\optional{, bufsize}}}, **kwargs}
  Return a \class{TarFile} object for the pathname \var{name}.
  For detailed information on \class{TarFile} objects and the keyword
  arguments that are allowed, see \citetitle{TarFile Objects}
  (section \ref{tarfile-objects}).

  \var{mode} has to be a string of the form \code{'filemode[:compression]'};
  it defaults to \code{'r'}. Here is a full list of mode combinations:

  \begin{tableii}{c|l}{code}{Mode}{Action}
  \lineii{'r' or 'r:*'}{Open for reading with transparent compression (recommended).}
  \lineii{'r:'}{Open for reading exclusively without compression.}
  \lineii{'r:gz'}{Open for reading with gzip compression.}
  \lineii{'r:bz2'}{Open for reading with bzip2 compression.}
  \lineii{'a' or 'a:'}{Open for appending with no compression. The file
    is created if it does not exist.}
  \lineii{'w' or 'w:'}{Open for uncompressed writing.}
  \lineii{'w:gz'}{Open for gzip compressed writing.}
  \lineii{'w:bz2'}{Open for bzip2 compressed writing.}
  \end{tableii}

  Note that \code{'a:gz'} and \code{'a:bz2'} are not possible.
  If \var{mode} is not suitable to open a certain (compressed) file for
  reading, \exception{ReadError} is raised. Use \var{mode} \code{'r'} to
  avoid this. If a compression method is not supported,
  \exception{CompressionError} is raised.

  If \var{fileobj} is specified, it is used as an alternative to a file
  object opened for \var{name}. It is expected to be at position 0.

  For special purposes, there is a second format for \var{mode}:
  \code{'filemode|[compression]'}. \function{open()} will return a
  \class{TarFile} object that processes its data as a stream of
  blocks. No random seeking will be done on the file. If given,
  \var{fileobj} may be any object that has a \method{read()} or
  \method{write()} method (depending on the \var{mode}).
  \var{bufsize} specifies the blocksize and defaults to \code{20 *
  512} bytes. Use this variant in combination with
  e.g. \code{sys.stdin}, a socket file object or a tape device.
  However, such a \class{TarFile} object cannot be accessed randomly;
  see ``Examples'' (section~\ref{tar-examples}). The currently possible
  modes:

  \begin{tableii}{c|l}{code}{Mode}{Action}
  \lineii{'r|*'}{Open a \emph{stream} of tar blocks for reading with transparent compression.}
  \lineii{'r|'}{Open a \emph{stream} of uncompressed tar blocks for reading.}
  \lineii{'r|gz'}{Open a gzip compressed \emph{stream} for reading.}
  \lineii{'r|bz2'}{Open a bzip2 compressed \emph{stream} for reading.}
  \lineii{'w|'}{Open an uncompressed \emph{stream} for writing.}
  \lineii{'w|gz'}{Open a gzip compressed \emph{stream} for writing.}
  \lineii{'w|bz2'}{Open a bzip2 compressed \emph{stream} for writing.}
  \end{tableii}
\end{funcdesc}
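The difference between \code{'r'} and \code{'r:'} can be sketched as
follows. This is only an illustration: the archive lives in an in-memory
\class{io.BytesIO} object instead of a file on disk, and the member name
\code{hello.txt} is made up for the example.

```python
import io
import tarfile

# Build a small gzip-compressed archive in memory ('w:gz').
payload = b"hello tar\n"
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w:gz")
info = tarfile.TarInfo("hello.txt")      # hypothetical member name
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# Mode 'r' detects the compression transparently.
buf.seek(0)                              # fileobj is expected at position 0
tar = tarfile.open(fileobj=buf, mode="r")
names = tar.getnames()
tar.close()

# Mode 'r:' insists on an uncompressed archive and rejects gzip data.
buf.seek(0)
try:
    tarfile.open(fileobj=buf, mode="r:")
    strict_failed = False
except tarfile.ReadError:
    strict_failed = True
```

Opening with plain \code{'r'} is therefore the safer choice whenever the
compression of the input is not known in advance.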

\begin{classdesc*}{TarFile}
  Class for reading and writing tar archives. Do not use this
  class directly; use \function{open()} instead.
  See ``TarFile Objects'' (section~\ref{tarfile-objects}).
\end{classdesc*}

\begin{funcdesc}{is_tarfile}{name}
  Return \constant{True} if \var{name} is a tar archive file that
  the \module{tarfile} module can read.
\end{funcdesc}

\begin{classdesc}{TarFileCompat}{filename\optional{, mode\optional{,
                                 compression}}}
  Class for limited access to tar archives with a
  \refmodule{zipfile}-like interface. Please consult the
  documentation of the \refmodule{zipfile} module for more details.
  \var{compression} must be one of the following constants:
  \begin{datadesc}{TAR_PLAIN}
    Constant for an uncompressed tar archive.
  \end{datadesc}
  \begin{datadesc}{TAR_GZIPPED}
    Constant for a \refmodule{gzip} compressed tar archive.
  \end{datadesc}
\end{classdesc}

\begin{excdesc}{TarError}
  Base class for all \module{tarfile} exceptions.
\end{excdesc}

\begin{excdesc}{ReadError}
  Is raised when a tar archive is opened that either cannot be handled by
  the \module{tarfile} module or is somehow invalid.
\end{excdesc}

\begin{excdesc}{CompressionError}
  Is raised when a compression method is not supported or when the data
  cannot be decoded properly.
\end{excdesc}

\begin{excdesc}{StreamError}
  Is raised for the limitations that are typical for stream-like
  \class{TarFile} objects.
\end{excdesc}

\begin{excdesc}{ExtractError}
  Is raised for \emph{non-fatal} errors when using \method{extract()}, but
  only if \member{TarFile.errorlevel}\code{ == 2}.
\end{excdesc}

\begin{excdesc}{HeaderError}
  Is raised by \method{frombuf()} if the buffer it gets is invalid.
  \versionadded{2.6}
\end{excdesc}

Each of the following constants defines a tar archive format that the
\module{tarfile} module is able to create. See section \ref{tar-formats} for
details.

\begin{datadesc}{USTAR_FORMAT}
  \POSIX{}.1-1988 (ustar) format.
\end{datadesc}

\begin{datadesc}{GNU_FORMAT}
  GNU tar format.
\end{datadesc}

\begin{datadesc}{PAX_FORMAT}
  \POSIX{}.1-2001 (pax) format.
\end{datadesc}

\begin{datadesc}{DEFAULT_FORMAT}
  The default format for creating archives. This is currently
  \constant{GNU_FORMAT}.
\end{datadesc}

\begin{seealso}
  \seemodule{zipfile}{Documentation of the \refmodule{zipfile}
  standard module.}

  \seetitle[http://www.gnu.org/software/tar/manual/html_node/tar_134.html\#SEC134]
  {GNU tar manual, Basic Tar Format}{Documentation for tar archive files,
  including GNU tar extensions.}
\end{seealso}

%-----------------
% TarFile Objects
%-----------------

\subsection{TarFile Objects \label{tarfile-objects}}

The \class{TarFile} object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up
of a header block followed by data blocks. It is possible to store a file in a
tar archive several times. Each archive member is represented by a
\class{TarInfo} object; see \citetitle{TarInfo Objects} (section
\ref{tarinfo-objects}) for details.

\begin{classdesc}{TarFile}{name=None, mode='r', fileobj=None,
        format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False,
        ignore_zeros=False, encoding=None, errors=None, pax_headers=None,
        debug=0, errorlevel=0}

  All of the following arguments are optional and can be accessed as
  instance attributes as well.

  \var{name} is the pathname of the archive. It can be omitted if
  \var{fileobj} is given. In this case, the file object's \member{name}
  attribute is used if it exists.

  \var{mode} is either \code{'r'} to read from an existing archive,
  \code{'a'} to append data to an existing file or \code{'w'} to create a new
  file, overwriting an existing one.

  If \var{fileobj} is given, it is used for reading or writing data.
  If it can be determined, \var{mode} is overridden by \var{fileobj}'s mode.
  \var{fileobj} will be used from position 0.
  \begin{notice}
    \var{fileobj} is not closed when the \class{TarFile} is closed.
  \end{notice}

  \var{format} controls the archive format. It must be one of the constants
  \constant{USTAR_FORMAT}, \constant{GNU_FORMAT} or \constant{PAX_FORMAT}
  that are defined at module level.
  \versionadded{2.6}

  The \var{tarinfo} argument can be used to replace the default
  \class{TarInfo} class with a different one.
  \versionadded{2.6}

  If \var{dereference} is \code{False}, add symbolic and hard links to the
  archive. If it is \code{True}, add the content of the target files to the
  archive. This has no effect on systems that do not support symbolic links.

  If \var{ignore_zeros} is \code{False}, treat an empty block as the end of
  the archive. If it is \code{True}, skip empty (and invalid) blocks and try
  to get as many members as possible. This is only useful for reading
  concatenated or damaged archives.

  \var{debug} can be set from \code{0} (no debug messages) up to \code{3}
  (all debug messages). The messages are written to \code{sys.stderr}.

  If \var{errorlevel} is \code{0}, all errors are ignored when using
  \method{extract()}. Nevertheless, they appear as error messages in the
  debug output when debugging is enabled. If \code{1}, all \emph{fatal}
  errors are raised as \exception{OSError} or \exception{IOError} exceptions.
  If \code{2}, all \emph{non-fatal} errors are raised as \exception{TarError}
  exceptions as well.

  The \var{encoding} and \var{errors} arguments control the way strings are
  converted to unicode objects and vice versa. The default settings will work
  for most users. See section \ref{tar-unicode} for in-depth information.
  \versionadded{2.6}

  The \var{pax_headers} argument is an optional dictionary of unicode strings
  which will be added as a pax global header if \var{format} is
  \constant{PAX_FORMAT}.
  \versionadded{2.6}
\end{classdesc}
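The effect of \var{ignore_zeros} on concatenated archives can be sketched
as follows. The helper \code{make_archive()} and the member names are
hypothetical and exist only for this illustration; the archives are kept in
memory rather than on disk.

```python
import io
import tarfile

def make_archive(name, data):
    """Return the bytes of an uncompressed archive holding one member."""
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w")
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
    tar.close()
    return buf.getvalue()

# Two complete archives glued together, as "cat a.tar b.tar" would do.
joined = make_archive("a.txt", b"first") + make_archive("b.txt", b"second")

# By default, the empty blocks that end the first archive stop the reader.
tar = tarfile.open(fileobj=io.BytesIO(joined), mode="r:")
default_names = tar.getnames()
tar.close()

# With ignore_zeros=True the empty blocks are skipped and reading goes on.
tar = tarfile.open(fileobj=io.BytesIO(joined), mode="r:", ignore_zeros=True)
all_names = tar.getnames()
tar.close()
```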

\begin{methoddesc}{open}{...}
  Alternative constructor. The \function{open()} function on module level is
  actually a shortcut to this classmethod. See section~\ref{module-tarfile}
  for details.
\end{methoddesc}

\begin{methoddesc}{getmember}{name}
  Return a \class{TarInfo} object for member \var{name}. If \var{name}
  cannot be found in the archive, \exception{KeyError} is raised.
  \begin{notice}
    If a member occurs more than once in the archive, its last
    occurrence is assumed to be the most up-to-date version.
  \end{notice}
\end{methoddesc}
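A short sketch of how \method{getmember()} treats duplicate and missing
names. The member name \code{notes.txt} and the in-memory archive are
assumptions made for the example.

```python
import io
import tarfile

# Store the same name twice, with different contents.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
for data in (b"old contents", b"new"):
    info = tarfile.TarInfo("notes.txt")   # hypothetical member name
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
member_count = len(tar.getmembers())      # both copies are listed
latest_size = tar.getmember("notes.txt").size   # the last occurrence wins
try:
    tar.getmember("missing.txt")
    missing_raised = False
except KeyError:
    missing_raised = True
tar.close()
```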

\begin{methoddesc}{getmembers}{}
  Return the members of the archive as a list of \class{TarInfo} objects.
  The list has the same order as the members in the archive.
\end{methoddesc}

\begin{methoddesc}{getnames}{}
  Return the members as a list of their names. It has the same order as
  the list returned by \method{getmembers()}.
\end{methoddesc}

\begin{methoddesc}{list}{verbose=True}
  Print a table of contents to \code{sys.stdout}. If \var{verbose} is
  \constant{False}, only the names of the members are printed. If it is
  \constant{True}, output similar to that of \program{ls -l} is produced.
\end{methoddesc}

\begin{methoddesc}{next}{}
  Return the next member of the archive as a \class{TarInfo} object, when
  the \class{TarFile} is opened for reading. Return \code{None} if no more
  members are available.
\end{methoddesc}

\begin{methoddesc}{extractall}{\optional{path\optional{, members}}}
  Extract all members from the archive to the current working directory
  or directory \var{path}. If optional \var{members} is given, it must be
  a subset of the list returned by \method{getmembers()}.
  Directory information like owner, modification time and permissions is
  set after all members have been extracted. This is done to work around two
  problems: a directory's modification time is reset each time a file is
  created in it, and if a directory's permissions do not allow writing,
  extracting files to it will fail.
  \versionadded{2.5}
\end{methoddesc}
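As a minimal sketch, \var{members} can be used to extract only part of an
archive. The archive contents, the rule for skipping \code{.log} files and
the temporary target directory are all assumptions made for this example.

```python
import io
import os
import shutil
import tarfile
import tempfile

# Build a small archive in memory: one directory and two files.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
dirinfo = tarfile.TarInfo("docs")
dirinfo.type = tarfile.DIRTYPE
dirinfo.mode = 0o755
tar.addfile(dirinfo)
for name, data in (("docs/readme.txt", b"read me"),
                   ("docs/skip.log", b"noise")):
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
tar.close()

target = tempfile.mkdtemp()               # scratch directory for the demo
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
# members must be a subset of getmembers(); here everything but *.log.
wanted = [m for m in tar.getmembers() if not m.name.endswith(".log")]
tar.extractall(target, wanted)
tar.close()

extracted = sorted(os.listdir(os.path.join(target, "docs")))
shutil.rmtree(target)                     # clean up the scratch directory
```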

\begin{methoddesc}{extract}{member\optional{, path}}
  Extract a member from the archive to the current working directory,
  using its full name. Its file information is extracted as accurately as
  possible.
  \var{member} may be a filename or a \class{TarInfo} object.
  You can specify a different directory using \var{path}.
  \begin{notice}
    Because the \method{extract()} method allows random access to a tar
    archive, there are some issues you must take care of yourself. See the
    description for \method{extractall()} above.
  \end{notice}
\end{methoddesc}

\begin{methoddesc}{extractfile}{member}
  Extract a member from the archive as a file object.
  \var{member} may be a filename or a \class{TarInfo} object.
  If \var{member} is a regular file, a file-like object is returned.
  If \var{member} is a link, a file-like object is constructed from the
  link's target.
  If \var{member} is none of the above, \code{None} is returned.
  \begin{notice}
    The file-like object is read-only and provides the following methods:
    \method{read()}, \method{readline()}, \method{readlines()},
    \method{seek()}, \method{tell()}.
  \end{notice}
\end{methoddesc}
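The following sketch reads a member's data with \method{extractfile()}
without writing anything to disk, and shows the \code{None} result for a
directory. The member names and contents are made up for the example.

```python
import io
import tarfile

# An in-memory archive with one directory and one regular file.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
dirinfo = tarfile.TarInfo("pkg")
dirinfo.type = tarfile.DIRTYPE
tar.addfile(dirinfo)
data = b"line one\nline two\n"
info = tarfile.TarInfo("pkg/data.txt")
info.size = len(data)
tar.addfile(info, io.BytesIO(data))
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
member_file = tar.extractfile("pkg/data.txt")   # regular file: file-like
first_line = member_file.readline()
rest = member_file.read()
dir_result = tar.extractfile("pkg")             # directory: None
tar.close()
```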

\begin{methoddesc}{add}{name\optional{, arcname\optional{, recursive\optional{, exclude}}}}
  Add the file \var{name} to the archive. \var{name} may be any type
  of file (directory, fifo, symbolic link, etc.).
  If given, \var{arcname} specifies an alternative name for the file in the
  archive. Directories are added recursively by default.
  This can be avoided by setting \var{recursive} to \constant{False}.
  If \var{exclude} is given it must be a function that takes one filename
  argument and returns a boolean value. Depending on this value the
  respective file is either excluded (\constant{True}) or added
  (\constant{False}).
  \versionchanged[Added the \var{exclude} parameter]{2.6}
\end{methoddesc}
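A small sketch of \var{arcname} and \var{recursive}. The scratch directory,
the file \code{data.txt} and the archive names \code{full} and
\code{shallow} are assumptions made for the illustration.

```python
import io
import os
import shutil
import tarfile
import tempfile

# Scratch directory with a single file in it.
src = tempfile.mkdtemp()
with open(os.path.join(src, "data.txt"), "w") as f:
    f.write("payload\n")

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
tar.add(src, "full")                       # the directory and its contents
tar.add(src, "shallow", recursive=False)   # the directory entry only
tar.close()
shutil.rmtree(src)

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
names = tar.getnames()
tar.close()
```

Because \var{arcname} replaces the path on disk, the temporary directory
name never appears in the archive.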

\begin{methoddesc}{addfile}{tarinfo\optional{, fileobj}}
  Add the \class{TarInfo} object \var{tarinfo} to the archive.
  If \var{fileobj} is given, \code{\var{tarinfo}.size} bytes are read
  from it and added to the archive. You can create \class{TarInfo} objects
  using \method{gettarinfo()}.
  \begin{notice}
    On Windows platforms, \var{fileobj} should always be opened with mode
    \code{'rb'} to avoid problems with the file size.
  \end{notice}
\end{methoddesc}
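That exactly \code{\var{tarinfo}.size} bytes are taken from \var{fileobj}
can be sketched as follows. The ten-byte source and the member name
\code{head.bin} are made up for the example.

```python
import io
import tarfile

source = io.BytesIO(b"0123456789")

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
info = tarfile.TarInfo("head.bin")   # hypothetical member name
info.size = 4                        # only this many bytes are read
tar.addfile(info, source)
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
stored = tar.extractfile("head.bin").read()
tar.close()
leftover = source.read()             # the rest is still unread
```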

\begin{methoddesc}{gettarinfo}{\optional{name\optional{,
                               arcname\optional{, fileobj}}}}
  Create a \class{TarInfo} object for either the file \var{name} or
  the file object \var{fileobj} (using \function{os.fstat()} on its
  file descriptor). You can modify some of the \class{TarInfo}'s
  attributes before you add it using \method{addfile()}. If given,
  \var{arcname} specifies an alternative name for the file in the
  archive.
\end{methoddesc}

\begin{methoddesc}{close}{}
  Close the \class{TarFile}. In write mode, two finishing zero
  blocks are appended to the archive.
\end{methoddesc}

\begin{memberdesc}{posix}
  Setting this to \constant{True} is equivalent to setting the
  \member{format} attribute to \constant{USTAR_FORMAT};
  \constant{False} is equivalent to \constant{GNU_FORMAT}.
  \versionchanged[\var{posix} defaults to \constant{False}]{2.4}
  \deprecated{2.6}{Use the \member{format} attribute instead.}
\end{memberdesc}

\begin{memberdesc}{pax_headers}
  A dictionary containing key-value pairs of pax global headers.
  \versionadded{2.6}
\end{memberdesc}
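A minimal sketch of a pax global header making a round trip through an
archive. The \code{comment} entry and its value are chosen only for
illustration, and the example assumes an interpreter in which ordinary
string literals are unicode strings.

```python
import io
import tarfile

# Write a pax archive that carries one global header entry.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                   pax_headers={"comment": "nightly build"})
tar.addfile(tarfile.TarInfo("empty.txt"))
tar.close()

# On reading, the global header shows up in the pax_headers attribute.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
global_comment = tar.pax_headers.get("comment")
tar.close()
```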

%-----------------
% TarInfo Objects
%-----------------

\subsection{TarInfo Objects \label{tarinfo-objects}}

A \class{TarInfo} object represents one member in a
\class{TarFile}. Aside from storing all required attributes of a file
(like file type, size, time, permissions, owner etc.), it provides
some useful methods to determine its type. It does \emph{not} contain
the file's data itself.

\class{TarInfo} objects are returned by \class{TarFile}'s methods
\method{getmember()}, \method{getmembers()} and \method{gettarinfo()}.

\begin{classdesc}{TarInfo}{\optional{name}}
  Create a \class{TarInfo} object.
\end{classdesc}

\begin{methoddesc}{frombuf}{buf}
  Create and return a \class{TarInfo} object from string buffer \var{buf}.
  \versionchanged[Raises \exception{HeaderError} if the buffer is
  invalid.]{2.6}
\end{methoddesc}

\begin{methoddesc}{fromtarfile}{tarfile}
  Read the next member from the \class{TarFile} object \var{tarfile} and
  return it as a \class{TarInfo} object.
  \versionadded{2.6}
\end{methoddesc}

\begin{methoddesc}{tobuf}{\optional{format\optional{, encoding
                          \optional{, errors}}}}
  Create a string buffer from a \class{TarInfo} object. For information
  on the arguments see the constructor of the \class{TarFile} class.
  \versionchanged[The arguments were added]{2.6}
\end{methoddesc}

A \class{TarInfo} object has the following public data attributes:

\begin{memberdesc}{name}
  Name of the archive member.
\end{memberdesc}

\begin{memberdesc}{size}
  Size in bytes.
\end{memberdesc}

\begin{memberdesc}{mtime}
  Time of last modification.
\end{memberdesc}

\begin{memberdesc}{mode}
  Permission bits.
\end{memberdesc}

\begin{memberdesc}{type}
  File type. \var{type} is usually one of these constants:
  \constant{REGTYPE}, \constant{AREGTYPE}, \constant{LNKTYPE},
  \constant{SYMTYPE}, \constant{DIRTYPE}, \constant{FIFOTYPE},
  \constant{CONTTYPE}, \constant{CHRTYPE}, \constant{BLKTYPE},
  \constant{GNUTYPE_SPARSE}. To determine the type of a
  \class{TarInfo} object more conveniently, use the \code{is_*()}
  methods below.
\end{memberdesc}

\begin{memberdesc}{linkname}
  Name of the target file, which is only present in
  \class{TarInfo} objects of type \constant{LNKTYPE} and
  \constant{SYMTYPE}.
\end{memberdesc}

\begin{memberdesc}{uid}
  User ID of the user who originally stored this member.
\end{memberdesc}

\begin{memberdesc}{gid}
  Group ID of the user who originally stored this member.
\end{memberdesc}

\begin{memberdesc}{uname}
  User name.
\end{memberdesc}

\begin{memberdesc}{gname}
  Group name.
\end{memberdesc}

\begin{memberdesc}{pax_headers}
  A dictionary containing key-value pairs of an associated pax
  extended header.
  \versionadded{2.6}
\end{memberdesc}

A \class{TarInfo} object also provides some convenient query methods:

\begin{methoddesc}{isfile}{}
  Return \constant{True} if the \class{TarInfo} object is a regular
  file.
\end{methoddesc}

\begin{methoddesc}{isreg}{}
  Same as \method{isfile()}.
\end{methoddesc}

\begin{methoddesc}{isdir}{}
  Return \constant{True} if it is a directory.
\end{methoddesc}

\begin{methoddesc}{issym}{}
  Return \constant{True} if it is a symbolic link.
\end{methoddesc}

\begin{methoddesc}{islnk}{}
  Return \constant{True} if it is a hard link.
\end{methoddesc}

\begin{methoddesc}{ischr}{}
  Return \constant{True} if it is a character device.
\end{methoddesc}

\begin{methoddesc}{isblk}{}
  Return \constant{True} if it is a block device.
\end{methoddesc}

\begin{methoddesc}{isfifo}{}
  Return \constant{True} if it is a FIFO.
\end{methoddesc}

\begin{methoddesc}{isdev}{}
  Return \constant{True} if it is a character device, block
  device or FIFO.
\end{methoddesc}
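The query methods can be combined into a small classifier, sketched below.
The helper \code{describe()} and the three archive entries are hypothetical
and serve only as an illustration.

```python
import io
import tarfile

# An in-memory archive with a directory, a regular file and a symlink.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
entries = (("bin", tarfile.DIRTYPE, ""),
           ("bin/tool", tarfile.REGTYPE, ""),
           ("bin/latest", tarfile.SYMTYPE, "tool"))
for name, kind, target in entries:
    info = tarfile.TarInfo(name)
    info.type = kind
    info.linkname = target
    tar.addfile(info)
tar.close()

def describe(member):
    """Map a TarInfo object to a short description of its type."""
    if member.isdir():
        return "directory"
    if member.issym():
        return "symbolic link to " + member.linkname
    if member.isfile():
        return "regular file"
    return "something else"

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
kinds = dict((m.name, describe(m)) for m in tar)
tar.close()
```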

%------------------------
% Examples
%------------------------

\subsection{Examples \label{tar-examples}}

How to extract an entire tar archive to the current working directory:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()
\end{verbatim}

How to create an uncompressed tar archive from a list of filenames:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar", "w")
for name in ["foo", "bar", "quux"]:
    tar.add(name)
tar.close()
\end{verbatim}

How to read a gzip compressed tar archive and display some member information:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar.gz", "r:gz")
for tarinfo in tar:
    if tarinfo.isreg():
        kind = "a regular file."
    elif tarinfo.isdir():
        kind = "a directory."
    else:
        kind = "something else."
    # One formatted string, so the line prints the same way everywhere.
    print("%s is %d bytes in size and is %s"
          % (tarinfo.name, tarinfo.size, kind))
tar.close()
\end{verbatim}

How to create a tar archive with faked information:
\begin{verbatim}
import tarfile
namelist = ["foo", "bar", "quux"]   # example names of existing files
tar = tarfile.open("sample.tar.gz", "w:gz")
for name in namelist:
    tarinfo = tar.gettarinfo(name, "fakeproj-1.0/" + name)
    tarinfo.uid = 123
    tarinfo.gid = 456
    tarinfo.uname = "johndoe"
    tarinfo.gname = "fake"
    fileobj = open(name, "rb")      # binary mode, so the size matches
    tar.addfile(tarinfo, fileobj)
    fileobj.close()
tar.close()
\end{verbatim}

The \emph{only} way to extract an uncompressed tar stream from
\code{sys.stdin}:
\begin{verbatim}
import sys
import tarfile
tar = tarfile.open(mode="r|", fileobj=sys.stdin)
for tarinfo in tar:
    tar.extract(tarinfo)
tar.close()
\end{verbatim}
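The limits of stream mode can be tried out without a pipe or a tape device.
The sketch below assumes an in-memory \class{io.BytesIO} object as a
stand-in for the stream; the member names and contents are made up.

```python
import io
import tarfile

# Write an uncompressed stream ('w|'); data is only ever appended.
stream = io.BytesIO()
tar = tarfile.open(fileobj=stream, mode="w|")
for name, data in (("one.txt", b"1111"), ("two.txt", b"22")):
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
tar.close()

# Read it back as a stream ('r|'): handle each member as it arrives.
stream.seek(0)
tar = tarfile.open(fileobj=stream, mode="r|")
contents = {}
first = None
for member in tar:
    if first is None:
        first = member
    contents[member.name] = tar.extractfile(member).read()

# Going back to an earlier member would need a backward seek.
try:
    tar.extractfile(first).read()
    rewind_failed = False
except tarfile.StreamError:
    rewind_failed = True
tar.close()
```

Each member therefore has to be processed inside the loop, at the moment it
is encountered.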

%------------
% Tar format
%------------

\subsection{Supported tar formats \label{tar-formats}}

There are three tar formats that can be created with the \module{tarfile}
module:

\begin{itemize}

\item
The \POSIX{}.1-1988 ustar format (\constant{USTAR_FORMAT}). It supports
filenames of up to 256 characters at best (a prefix of up to 155 characters
and a name of up to 100 characters, split at a slash) and linknames of up to
100 characters. The maximum file size is 8 gigabytes. This is an old and
limited but widely supported format.

\item
The GNU tar format (\constant{GNU_FORMAT}). It supports long filenames and
linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
standard on GNU/Linux systems. \module{tarfile} fully supports the GNU tar
extensions for long names; sparse file support is read-only.

\item
The \POSIX{}.1-2001 pax format (\constant{PAX_FORMAT}). It is the most
flexible format with virtually no limits. It supports long filenames and
linknames, large files and stores pathnames in a portable way. However, not
all tar implementations today are able to handle pax archives properly.

The \emph{pax} format is an extension to the existing \emph{ustar} format. It
uses extra headers for information that cannot be stored otherwise. There are
two flavours of pax headers: extended headers only affect the subsequent file
header, while global headers are valid for the complete archive and affect all
following files. All the data in a pax header is encoded in \emph{UTF-8} for
portability reasons.

\end{itemize}
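The filename limit of the ustar format can be observed with a short sketch.
The helper \code{try_format()} and the 150-character name are hypothetical;
the name has no slash, so it cannot be split into a prefix and a name part.

```python
import io
import tarfile

long_name = "x" * 150   # one path component, longer than 100 characters

def try_format(fmt):
    """Return True if a member named long_name can be written in fmt."""
    tar = tarfile.open(fileobj=io.BytesIO(), mode="w", format=fmt)
    try:
        tar.addfile(tarfile.TarInfo(long_name))
        return True
    except ValueError:
        # Raised when the name does not fit into the header.
        return False
    finally:
        tar.close()

results = {
    "ustar": try_format(tarfile.USTAR_FORMAT),
    "gnu": try_format(tarfile.GNU_FORMAT),
    "pax": try_format(tarfile.PAX_FORMAT),
}
```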

There are some more variants of the tar format which can be read, but not
created:

\begin{itemize}

\item
The ancient V7 format. This is the first tar format from \UNIX{} Seventh
Edition, storing only regular files and directories. Names must not be longer
than 100 characters, and there is no user/group name information. Some
archives have miscalculated header checksums in case of fields with
non-\ASCII{} characters.

\item
The SunOS tar extended format. This format is a variant of the \POSIX{}.1-2001
pax format, but is not compatible.

\end{itemize}

%----------------
% Unicode issues
%----------------

\subsection{Unicode issues \label{tar-unicode}}

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (that all other formats are merely variants of)
is that there is no concept of supporting different character encodings.
For example, an ordinary tar archive created on a \emph{UTF-8} system cannot be
read correctly on a \emph{Latin-1} system if it contains non-\ASCII{}
characters. Names (i.e. filenames, linknames, user/group names) containing
these characters will appear damaged. Unfortunately, there is no way to
autodetect the encoding of an archive.

The pax format was designed to solve this problem. It stores non-\ASCII{} names
using the universal character encoding \emph{UTF-8}. When a pax archive is
read, these \emph{UTF-8} names are converted to the encoding of the local
file system.

The details of Unicode conversion are controlled by the \var{encoding} and
\var{errors} keyword arguments of the \class{TarFile} class.

The default value for \var{encoding} is the local character encoding. It is
deduced from \function{sys.getfilesystemencoding()} and
\function{sys.getdefaultencoding()}. In read mode, \var{encoding} is used
exclusively to convert unicode names from a pax archive to strings in the local
character encoding. In write mode, the use of \var{encoding} depends on the
chosen archive format. In case of \constant{PAX_FORMAT}, input names that
contain non-\ASCII{} characters need to be decoded before being stored as
\emph{UTF-8} strings. The other formats do not make use of \var{encoding}
unless unicode objects are used as input names. These are converted to
8-bit character strings before they are added to the archive.

The \var{errors} argument defines how characters that cannot be converted to
or from \var{encoding} are treated. Possible values are listed in section
\ref{codec-base-classes}. In read mode, there is an additional scheme
\code{'utf-8'} which means that bad characters are replaced by their
\emph{UTF-8} representation. This is the default scheme. In write mode the
default value for \var{errors} is \code{'strict'} to ensure that name
information is not altered unnoticed.
664information is not altered unnoticed.