\section{\module{tarfile} --- Read and write tar archive files}

\declaremodule{standard}{tarfile}
\modulesynopsis{Read and write tar-format archive files.}
\versionadded{2.3}

\moduleauthor{Lars Gust\"abel}{lars@gustaebel.de}
\sectionauthor{Lars Gust\"abel}{lars@gustaebel.de}

The \module{tarfile} module makes it possible to read and create tar archives.
Some facts and figures:

\begin{itemize}
\item reads and writes \module{gzip} and \module{bz2} compressed archives.
\item read/write support for the \POSIX{}.1-1988 (ustar) format.
\item read/write support for the GNU tar format including \emph{longname} and
      \emph{longlink} extensions, read-only support for the \emph{sparse}
      extension.
\item read/write support for the \POSIX{}.1-2001 (pax) format.
      \versionadded{2.6}
\item handles directories, regular files, hardlinks, symbolic links, fifos,
      character devices and block devices and is able to acquire and
      restore file information like timestamp, access permissions and owner.
\item can handle tape devices.
\end{itemize}

\begin{funcdesc}{open}{name\optional{, mode\optional{,
                       fileobj\optional{, bufsize}}}, **kwargs}
    Return a \class{TarFile} object for the pathname \var{name}.
    For detailed information on \class{TarFile} objects and the keyword
    arguments that are allowed, see \citetitle{TarFile Objects}
    (section \ref{tarfile-objects}).

    \var{mode} has to be a string of the form \code{'filemode[:compression]'}
    and defaults to \code{'r'}. Here is a full list of mode combinations:

    \begin{tableii}{c|l}{code}{Mode}{Action}
    \lineii{'r' or 'r:*'}{Open for reading with transparent compression (recommended).}
    \lineii{'r:'}{Open for reading exclusively without compression.}
    \lineii{'r:gz'}{Open for reading with gzip compression.}
    \lineii{'r:bz2'}{Open for reading with bzip2 compression.}
    \lineii{'a' or 'a:'}{Open for appending with no compression. The file
                         is created if it does not exist.}
    \lineii{'w' or 'w:'}{Open for uncompressed writing.}
    \lineii{'w:gz'}{Open for gzip compressed writing.}
    \lineii{'w:bz2'}{Open for bzip2 compressed writing.}
    \end{tableii}

    Note that \code{'a:gz'} and \code{'a:bz2'} are not possible.
    If \var{mode} is not suitable to open a certain (compressed) file for
    reading, \exception{ReadError} is raised. Use \var{mode} \code{'r'} to
    avoid this. If a compression method is not supported,
    \exception{CompressionError} is raised.

    If \var{fileobj} is specified, it is used as an alternative to a file
    object opened for \var{name}. It is expected to be at position 0.

    For special purposes, there is a second format for \var{mode}:
    \code{'filemode|[compression]'}. \function{open()} will return a
    \class{TarFile} object that processes its data as a stream of
    blocks. No random seeking will be done on the file. If given,
    \var{fileobj} may be any object that has a \method{read()} or
    \method{write()} method (depending on the \var{mode}).
    \var{bufsize} specifies the blocksize and defaults to \code{20 *
    512} bytes. Use this variant in combination with
    e.g. \code{sys.stdin}, a socket file object or a tape device.
    However, such a \class{TarFile} object is limited in that it does
    not allow random access, see ``Examples''
    (section~\ref{tar-examples}). The currently possible modes:

    \begin{tableii}{c|l}{code}{Mode}{Action}
    \lineii{'r|*'}{Open a \emph{stream} of tar blocks for reading with transparent compression.}
    \lineii{'r|'}{Open a \emph{stream} of uncompressed tar blocks for reading.}
    \lineii{'r|gz'}{Open a gzip compressed \emph{stream} for reading.}
    \lineii{'r|bz2'}{Open a bzip2 compressed \emph{stream} for reading.}
    \lineii{'w|'}{Open an uncompressed \emph{stream} for writing.}
    \lineii{'w|gz'}{Open a gzip compressed \emph{stream} for writing.}
    \lineii{'w|bz2'}{Open a bzip2 compressed \emph{stream} for writing.}
    \end{tableii}
\end{funcdesc}
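
The two families of mode strings accepted by \function{open()} can be compared with a short sketch. It is illustrative only: the scratch directory, the archive name \file{demo.tar.gz} and the member \file{hello.txt} are assumptions, and the member is built in memory so that no other input files are needed.

```python
import io
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()                     # scratch directory (assumption)
archive = os.path.join(workdir, "demo.tar.gz")   # hypothetical archive name

# Write a gzip compressed archive containing one small in-memory member.
payload = b"hello tar\n"
tar = tarfile.open(archive, "w:gz")
info = tarfile.TarInfo("hello.txt")
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# Mode 'r' (the same as 'r:*') detects the compression transparently.
tar = tarfile.open(archive, "r")
names_random = tar.getnames()
tar.close()

# Mode 'r|*' reads the same data as a forward-only stream of blocks.
fileobj = open(archive, "rb")
tar = tarfile.open(fileobj=fileobj, mode="r|*")
names_stream = [member.name for member in tar]
tar.close()
fileobj.close()   # a file object passed in by the caller is closed by the caller
```

Both variants see the same member; the stream variant simply never seeks backwards, which is what makes it usable with pipes and sockets.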

\begin{classdesc*}{TarFile}
    Class for reading and writing tar archives. Do not use this
    class directly; use \function{open()} instead.
    See ``TarFile Objects'' (section~\ref{tarfile-objects}).
\end{classdesc*}

\begin{funcdesc}{is_tarfile}{name}
    Return \constant{True} if \var{name} is a tar archive file that
    the \module{tarfile} module can read.
\end{funcdesc}
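
A minimal sketch of \function{is_tarfile()} used as a guard before opening a file. The two file names and their contents are assumptions made up for the illustration.

```python
import io
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()                      # scratch directory (assumption)
tar_path = os.path.join(workdir, "sample.tar")    # hypothetical file names
text_path = os.path.join(workdir, "notes.txt")

# A small but valid tar archive ...
payload = b"content\n"
tar = tarfile.open(tar_path, "w")
info = tarfile.TarInfo("member.txt")
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# ... and an ordinary text file that is not an archive.
with open(text_path, "w") as f:
    f.write("just some text\n")

tar_is_archive = tarfile.is_tarfile(tar_path)
text_is_archive = tarfile.is_tarfile(text_path)
```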

\begin{classdesc}{TarFileCompat}{filename\optional{, mode\optional{,
                                 compression}}}
    Class for limited access to tar archives with a
    \refmodule{zipfile}-like interface. Please consult the
    documentation of the \refmodule{zipfile} module for more details.
    \var{compression} must be one of the following constants:
    \begin{datadesc}{TAR_PLAIN}
        Constant for an uncompressed tar archive.
    \end{datadesc}
    \begin{datadesc}{TAR_GZIPPED}
        Constant for a \refmodule{gzip} compressed tar archive.
    \end{datadesc}
\end{classdesc}

\begin{excdesc}{TarError}
    Base class for all \module{tarfile} exceptions.
\end{excdesc}

\begin{excdesc}{ReadError}
    Is raised when a tar archive is opened that either cannot be handled by
    the \module{tarfile} module or is somehow invalid.
\end{excdesc}

\begin{excdesc}{CompressionError}
    Is raised when a compression method is not supported or when the data
    cannot be decoded properly.
\end{excdesc}

\begin{excdesc}{StreamError}
    Is raised for the limitations that are typical for stream-like
    \class{TarFile} objects.
\end{excdesc}

\begin{excdesc}{ExtractError}
    Is raised for \emph{non-fatal} errors when using \method{extract()}, but
    only if \member{TarFile.errorlevel}\code{ == 2}.
\end{excdesc}

\begin{excdesc}{HeaderError}
    Is raised by \method{frombuf()} if the buffer it gets is invalid.
    \versionadded{2.6}
\end{excdesc}
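
The following sketch shows one situation in which \exception{ReadError} is raised: gzip compressed data is opened with mode \code{'r:'}, which insists on an uncompressed archive. The in-memory buffer and the member name are assumptions chosen to keep the example self-contained.

```python
import io
import tarfile

# Build a small gzip compressed archive in memory.
buffer = io.BytesIO()
tar = tarfile.open(fileobj=buffer, mode="w:gz")
payload = b"data\n"
info = tarfile.TarInfo("data.txt")        # hypothetical member name
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# 'r:' accepts uncompressed archives only, so the gzip data is rejected.
buffer.seek(0)
try:
    tarfile.open(fileobj=buffer, mode="r:")
    outcome = "opened"
except tarfile.ReadError:
    outcome = "ReadError"

# Plain 'r' detects the compression and succeeds.
buffer.seek(0)
tar = tarfile.open(fileobj=buffer, mode="r")
names = tar.getnames()
tar.close()
```

Since every exception of the module derives from \exception{TarError}, catching the base class is enough when the precise cause does not matter.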

Each of the following constants defines a tar archive format that the
\module{tarfile} module is able to create. See section \ref{tar-formats} for
details.

\begin{datadesc}{USTAR_FORMAT}
    \POSIX{}.1-1988 (ustar) format.
\end{datadesc}

\begin{datadesc}{GNU_FORMAT}
    GNU tar format.
\end{datadesc}

\begin{datadesc}{PAX_FORMAT}
    \POSIX{}.1-2001 (pax) format.
\end{datadesc}

\begin{datadesc}{DEFAULT_FORMAT}
    The default format for creating archives. This is currently
    \constant{GNU_FORMAT}.
\end{datadesc}

\begin{seealso}
    \seemodule{zipfile}{Documentation of the \refmodule{zipfile}
    standard module.}

    \seetitle[http://www.gnu.org/software/tar/manual/html_node/tar_134.html\#SEC134]
    {GNU tar manual, Basic Tar Format}{Documentation for tar archive files,
    including GNU tar extensions.}
\end{seealso}

%-----------------
% TarFile Objects
%-----------------

\subsection{TarFile Objects \label{tarfile-objects}}

The \class{TarFile} object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up
of a header block followed by data blocks. It is possible to store a file in a
tar archive several times. Each archive member is represented by a
\class{TarInfo} object; see \citetitle{TarInfo Objects} (section
\ref{tarinfo-objects}) for details.

\begin{classdesc}{TarFile}{name=None, mode='r', fileobj=None,
        format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False,
        ignore_zeros=False, encoding=None, errors=None, pax_headers=None,
        debug=0, errorlevel=0}

    All following arguments are optional and can be accessed as instance
    attributes as well.

    \var{name} is the pathname of the archive. It can be omitted if
    \var{fileobj} is given. In this case, the file object's \member{name}
    attribute is used if it exists.

    \var{mode} is either \code{'r'} to read from an existing archive,
    \code{'a'} to append data to an existing file or \code{'w'} to create a new
    file overwriting an existing one.

    If \var{fileobj} is given, it is used for reading or writing data.
    If it can be determined, \var{mode} is overridden by \var{fileobj}'s mode.
    \var{fileobj} will be used from position 0.
    \begin{notice}
        \var{fileobj} is not closed when \class{TarFile} is closed.
    \end{notice}

    \var{format} controls the archive format. It must be one of the constants
    \constant{USTAR_FORMAT}, \constant{GNU_FORMAT} or \constant{PAX_FORMAT}
    that are defined at module level.
    \versionadded{2.6}

    The \var{tarinfo} argument can be used to replace the default
    \class{TarInfo} class with a different one.
    \versionadded{2.6}

    If \var{dereference} is \code{False}, add symbolic and hard links to the
    archive. If it is \code{True}, add the content of the target files to the
    archive. This has no effect on systems that do not support symbolic links.

    If \var{ignore_zeros} is \code{False}, treat an empty block as the end of
    the archive. If it is \code{True}, skip empty (and invalid) blocks and try
    to get as many members as possible. This is only useful for reading
    concatenated or damaged archives.

    \var{debug} can be set from \code{0} (no debug messages) up to \code{3}
    (all debug messages). The messages are written to \code{sys.stderr}.

    If \var{errorlevel} is \code{0}, all errors are ignored when using
    \method{extract()}. Nevertheless, they appear as error messages in the
    debug output when debugging is enabled. If \code{1}, all \emph{fatal}
    errors are raised as \exception{OSError} or \exception{IOError} exceptions.
    If \code{2}, all \emph{non-fatal} errors are raised as \exception{TarError}
    exceptions as well.

    The \var{encoding} and \var{errors} arguments control the way strings are
    converted to unicode objects and vice versa. The default settings will work
    for most users. See section \ref{tar-unicode} for in-depth information.
    \versionadded{2.6}

    The \var{pax_headers} argument is an optional dictionary of unicode strings
    which will be added as a pax global header if \var{format} is
    \constant{PAX_FORMAT}.
    \versionadded{2.6}
\end{classdesc}
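
As a hedged illustration of the keyword arguments, the sketch below passes \var{format} and \var{pax_headers} through \function{open()} and reads the global header back from the \member{pax_headers} attribute. The header key \code{comment}, its value and the member name are assumptions; the archive lives in an in-memory buffer, which also shows that \var{fileobj} stays open after the \class{TarFile} is closed.

```python
import io
import tarfile

buffer = io.BytesIO()

# Keyword arguments given to open() are handed on to the TarFile constructor.
tar = tarfile.open(fileobj=buffer, mode="w",
                   format=tarfile.PAX_FORMAT,
                   pax_headers={u"comment": u"nightly build"})  # hypothetical header
payload = b"data\n"
info = tarfile.TarInfo("data.txt")        # hypothetical member name
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# The buffer was not closed by tar.close(), so it can be rewound and read.
buffer.seek(0)
tar = tarfile.open(fileobj=buffer, mode="r")
member_names = tar.getnames()
global_headers = dict(tar.pax_headers)
tar.close()
```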

\begin{methoddesc}{open}{...}
    Alternative constructor. The \function{open()} function on module level is
    actually a shortcut to this classmethod. See section~\ref{module-tarfile}
    for details.
\end{methoddesc}

\begin{methoddesc}{getmember}{name}
    Return a \class{TarInfo} object for member \var{name}. If \var{name}
    cannot be found in the archive, \exception{KeyError} is raised.
    \begin{notice}
        If a member occurs more than once in the archive, its last
        occurrence is assumed to be the most up-to-date version.
    \end{notice}
\end{methoddesc}

\begin{methoddesc}{getmembers}{}
    Return the members of the archive as a list of \class{TarInfo} objects.
    The list has the same order as the members in the archive.
\end{methoddesc}

\begin{methoddesc}{getnames}{}
    Return the members as a list of their names. It has the same order as
    the list returned by \method{getmembers()}.
\end{methoddesc}

\begin{methoddesc}{list}{verbose=True}
    Print a table of contents to \code{sys.stdout}. If \var{verbose} is
    \constant{False}, only the names of the members are printed. If it is
    \constant{True}, output similar to that of \program{ls -l} is produced.
\end{methoddesc}

\begin{methoddesc}{next}{}
    Return the next member of the archive as a \class{TarInfo} object, when
    \class{TarFile} is opened for reading. Return \code{None} if there are no
    more members available.
\end{methoddesc}
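
A small sketch of the lookup methods, including the rule that \method{getmember()} returns the last occurrence of a name that is stored more than once. The member names and payloads are made up for the illustration.

```python
import io
import tarfile

# Store "a.txt" twice, with different content, in an in-memory archive.
buffer = io.BytesIO()
tar = tarfile.open(fileobj=buffer, mode="w")
for name, payload in [("a.txt", b"old"), ("b.txt", b"bee"), ("a.txt", b"newer")]:
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
tar.close()

buffer.seek(0)
tar = tarfile.open(fileobj=buffer, mode="r")
names = tar.getnames()                       # archive order, duplicates included
latest_size = tar.getmember("a.txt").size    # size of the last "a.txt"
try:
    tar.getmember("missing.txt")
    missing = False
except KeyError:
    missing = True
tar.close()
```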

\begin{methoddesc}{extractall}{\optional{path\optional{, members}}}
    Extract all members from the archive to the current working directory
    or directory \var{path}. If optional \var{members} is given, it must be
    a subset of the list returned by \method{getmembers()}.
    Directory information like owner, modification time and permissions is
    set after all members have been extracted. This is done to work around two
    problems: a directory's modification time is reset each time a file is
    created in it, and if a directory's permissions do not allow writing,
    extracting files to it will fail.
    \versionadded{2.5}
\end{methoddesc}

\begin{methoddesc}{extract}{member\optional{, path}}
    Extract a member from the archive to the current working directory,
    using its full name. Its file information is extracted as accurately as
    possible.
    \var{member} may be a filename or a \class{TarInfo} object.
    You can specify a different directory using \var{path}.
    \begin{notice}
        Because the \method{extract()} method allows random access to a tar
        archive, there are some issues you must take care of yourself. See the
        description for \method{extractall()} above.
    \end{notice}
\end{methoddesc}
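
The \var{members} argument of \method{extractall()} can be used to extract a selection only. The sketch below is an illustration under assumed names: the archive \file{bundle.tar}, its three members and the output directory are all hypothetical.

```python
import io
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()                   # scratch directory (assumption)
archive = os.path.join(workdir, "bundle.tar")  # hypothetical archive name
outdir = os.path.join(workdir, "out")
os.mkdir(outdir)

# Create an archive with three small members.
tar = tarfile.open(archive, "w")
for name in ["a.txt", "b.txt", "image.bin"]:
    payload = name.encode("ascii")
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
tar.close()

# Extract only the members whose names end in ".txt".
tar = tarfile.open(archive)
wanted = [m for m in tar.getmembers() if m.name.endswith(".txt")]
tar.extractall(outdir, members=wanted)
tar.close()

extracted = sorted(os.listdir(outdir))
```

Because \var{wanted} is built from \method{getmembers()}, it is a subset of that list as the description requires.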

\begin{methoddesc}{extractfile}{member}
    Extract a member from the archive as a file object.
    \var{member} may be a filename or a \class{TarInfo} object.
    If \var{member} is a regular file, a file-like object is returned.
    If \var{member} is a link, a file-like object is constructed from the
    link's target.
    If \var{member} is none of the above, \code{None} is returned.
    \begin{notice}
        The file-like object is read-only and provides the following methods:
        \method{read()}, \method{readline()}, \method{readlines()},
        \method{seek()}, \method{tell()}.
    \end{notice}
\end{methoddesc}
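
\method{extractfile()} makes it possible to read a member without writing it to disk. A minimal sketch, assuming a hypothetical archive that holds a directory \file{docs} and a file \file{docs/readme.txt}:

```python
import io
import tarfile

# Build an archive with one directory entry and one regular file.
buffer = io.BytesIO()
tar = tarfile.open(fileobj=buffer, mode="w")
directory = tarfile.TarInfo("docs")
directory.type = tarfile.DIRTYPE
tar.addfile(directory)
payload = b"first line\nsecond line\n"
info = tarfile.TarInfo("docs/readme.txt")
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

buffer.seek(0)
tar = tarfile.open(fileobj=buffer, mode="r")
member_file = tar.extractfile("docs/readme.txt")   # read-only file-like object
first_line = member_file.readline()
rest = member_file.read()
directory_file = tar.extractfile("docs")           # not a file or a link
tar.close()
```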

\begin{methoddesc}{add}{name\optional{, arcname\optional{, recursive\optional{, exclude}}}}
    Add the file \var{name} to the archive. \var{name} may be any type
    of file (directory, fifo, symbolic link, etc.).
    If given, \var{arcname} specifies an alternative name for the file in the
    archive. Directories are added recursively by default.
    This can be avoided by setting \var{recursive} to \constant{False}.
    If \var{exclude} is given, it must be a function that takes one filename
    argument and returns a boolean value. Depending on this value, the
    respective file is either excluded (\constant{True}) or added
    (\constant{False}).
\end{methoddesc}
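
The effect of \var{arcname} and \var{recursive} is easiest to see on a small directory tree. In this sketch the directory \file{src}, the file \file{data.txt} and the archive names \code{full} and \code{shallow} are assumptions.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()                 # scratch directory (assumption)
srcdir = os.path.join(workdir, "src")        # hypothetical source tree
os.mkdir(srcdir)
with open(os.path.join(srcdir, "data.txt"), "w") as f:
    f.write("payload\n")

archive = os.path.join(workdir, "project.tar")
tar = tarfile.open(archive, "w")
tar.add(srcdir, arcname="full")                        # directory and its content
tar.add(srcdir, arcname="shallow", recursive=False)    # directory entry only
tar.close()

tar = tarfile.open(archive)
names = tar.getnames()
tar.close()
```

The same directory is stored twice under different archive names; only the recursive call also stores the file inside it.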

\begin{methoddesc}{addfile}{tarinfo\optional{, fileobj}}
    Add the \class{TarInfo} object \var{tarinfo} to the archive.
    If \var{fileobj} is given, \code{\var{tarinfo}.size} bytes are read
    from it and added to the archive. You can create \class{TarInfo} objects
    using \method{gettarinfo()}.
    \begin{notice}
        On Windows platforms, \var{fileobj} should always be opened with mode
        \code{'rb'} so that the number of bytes read matches the file size.
    \end{notice}
\end{methoddesc}

\begin{methoddesc}{gettarinfo}{\optional{name\optional{,
                               arcname\optional{, fileobj}}}}
    Create a \class{TarInfo} object for either the file \var{name} or
    the file object \var{fileobj} (using \function{os.fstat()} on its
    file descriptor). You can modify some of the \class{TarInfo}'s
    attributes before you add it using \method{addfile()}. If given,
    \var{arcname} specifies an alternative name for the file in the
    archive.
\end{methoddesc}
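
\method{gettarinfo()} and \method{addfile()} are typically used as a pair. The sketch below builds the \class{TarInfo} from an open file object, adjusts two attributes and stores the member under another name. The file \file{report.txt}, the archive name and the attribute values are assumptions.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()                    # scratch directory (assumption)
source = os.path.join(workdir, "report.txt")    # hypothetical input file
payload = b"line one\r\nline two\r\n"
with open(source, "wb") as f:
    f.write(payload)

archive = os.path.join(workdir, "reports.tar")
tar = tarfile.open(archive, "w")
fileobj = open(source, "rb")      # binary mode, so size and data agree
info = tar.gettarinfo(arcname="renamed.txt", fileobj=fileobj)
info.mtime = 0                    # attributes may be changed before adding
info.uname = "nobody"
tar.addfile(info, fileobj)
fileobj.close()
tar.close()

tar = tarfile.open(archive)
member = tar.getmember("renamed.txt")
stored = tar.extractfile(member).read()
tar.close()
```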

\begin{methoddesc}{close}{}
    Close the \class{TarFile}. In write mode, two finishing zero
    blocks are appended to the archive.
\end{methoddesc}

\begin{memberdesc}{posix}
    Setting this to \constant{True} is equivalent to setting the
    \member{format} attribute to \constant{USTAR_FORMAT};
    \constant{False} is equivalent to \constant{GNU_FORMAT}.
    \versionchanged[\var{posix} defaults to \constant{False}]{2.4}
    \deprecated{2.6}{Use the \member{format} attribute instead.}
\end{memberdesc}

\begin{memberdesc}{pax_headers}
    A dictionary containing key-value pairs of pax global headers.
    \versionadded{2.6}
\end{memberdesc}

%-----------------
% TarInfo Objects
%-----------------

\subsection{TarInfo Objects \label{tarinfo-objects}}

A \class{TarInfo} object represents one member in a
\class{TarFile}. Aside from storing all required attributes of a file
(like file type, size, time, permissions, owner etc.), it provides
some useful methods to determine its type. It does \emph{not} contain
the file's data itself.

\class{TarInfo} objects are returned by \class{TarFile}'s methods
\method{getmember()}, \method{getmembers()} and \method{gettarinfo()}.

\begin{classdesc}{TarInfo}{\optional{name}}
    Create a \class{TarInfo} object.
\end{classdesc}

\begin{methoddesc}{frombuf}{buf}
    Create and return a \class{TarInfo} object from string buffer \var{buf}.
    \versionadded[Raises \exception{HeaderError} if the buffer is
    invalid.]{2.6}
\end{methoddesc}

\begin{methoddesc}{fromtarfile}{tarfile}
    Read the next member from the \class{TarFile} object \var{tarfile} and
    return it as a \class{TarInfo} object.
    \versionadded{2.6}
\end{methoddesc}

\begin{methoddesc}{tobuf}{\optional{format\optional{, encoding
                          \optional{, errors}}}}
    Create a string buffer from a \class{TarInfo} object. For information
    on the arguments see the constructor of the \class{TarFile} class.
    \versionchanged[The arguments were added]{2.6}
\end{methoddesc}

A \class{TarInfo} object has the following public data attributes:

\begin{memberdesc}{name}
    Name of the archive member.
\end{memberdesc}

\begin{memberdesc}{size}
    Size in bytes.
\end{memberdesc}

\begin{memberdesc}{mtime}
    Time of last modification.
\end{memberdesc}

\begin{memberdesc}{mode}
    Permission bits.
\end{memberdesc}

\begin{memberdesc}{type}
    File type. \var{type} is usually one of these constants:
    \constant{REGTYPE}, \constant{AREGTYPE}, \constant{LNKTYPE},
    \constant{SYMTYPE}, \constant{DIRTYPE}, \constant{FIFOTYPE},
    \constant{CONTTYPE}, \constant{CHRTYPE}, \constant{BLKTYPE},
    \constant{GNUTYPE_SPARSE}. To determine the type of a
    \class{TarInfo} object more conveniently, use the \code{is*()}
    methods below.
\end{memberdesc}

\begin{memberdesc}{linkname}
    Name of the target file, which is only present in
    \class{TarInfo} objects of type \constant{LNKTYPE} and
    \constant{SYMTYPE}.
\end{memberdesc}

\begin{memberdesc}{uid}
    User ID of the user who originally stored this member.
\end{memberdesc}

\begin{memberdesc}{gid}
    Group ID of the user who originally stored this member.
\end{memberdesc}

\begin{memberdesc}{uname}
    User name.
\end{memberdesc}

\begin{memberdesc}{gname}
    Group name.
\end{memberdesc}

\begin{memberdesc}{pax_headers}
    A dictionary containing key-value pairs of an associated pax
    extended header.
    \versionadded{2.6}
\end{memberdesc}

A \class{TarInfo} object also provides some convenient query methods:

\begin{methoddesc}{isfile}{}
    Return \constant{True} if the \class{TarInfo} object is a regular
    file.
\end{methoddesc}

\begin{methoddesc}{isreg}{}
    Same as \method{isfile()}.
\end{methoddesc}

\begin{methoddesc}{isdir}{}
    Return \constant{True} if it is a directory.
\end{methoddesc}

\begin{methoddesc}{issym}{}
    Return \constant{True} if it is a symbolic link.
\end{methoddesc}

\begin{methoddesc}{islnk}{}
    Return \constant{True} if it is a hard link.
\end{methoddesc}

\begin{methoddesc}{ischr}{}
    Return \constant{True} if it is a character device.
\end{methoddesc}

\begin{methoddesc}{isblk}{}
    Return \constant{True} if it is a block device.
\end{methoddesc}

\begin{methoddesc}{isfifo}{}
    Return \constant{True} if it is a FIFO.
\end{methoddesc}

\begin{methoddesc}{isdev}{}
    Return \constant{True} if it is a character device, a block
    device or a FIFO.
\end{methoddesc}
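
The query methods can be combined into a small classifier. The helper \function{describe()} and the four member names below are hypothetical; the members are created purely from \class{TarInfo} objects, so no special files are needed on disk.

```python
import io
import tarfile

def describe(member):
    """Return a short label for a TarInfo object, using its query methods."""
    if member.isreg():
        return "regular file"
    if member.isdir():
        return "directory"
    if member.issym():
        return "symbolic link"
    if member.islnk():
        return "hard link"
    if member.isdev():
        return "device or FIFO"
    return "other"

# Build an archive whose members differ only in their type field.
buffer = io.BytesIO()
tar = tarfile.open(fileobj=buffer, mode="w")
tar.addfile(tarfile.TarInfo("file.txt"))          # REGTYPE is the default
for name, filetype in [("dir", tarfile.DIRTYPE),
                       ("link", tarfile.SYMTYPE),
                       ("pipe", tarfile.FIFOTYPE)]:
    info = tarfile.TarInfo(name)
    info.type = filetype
    if filetype == tarfile.SYMTYPE:
        info.linkname = "file.txt"
    tar.addfile(info)
tar.close()

buffer.seek(0)
tar = tarfile.open(fileobj=buffer, mode="r")
labels = dict((m.name, describe(m)) for m in tar.getmembers())
tar.close()
```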

%------------------------
% Examples
%------------------------

\subsection{Examples \label{tar-examples}}

How to extract an entire tar archive to the current working directory:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()
\end{verbatim}

How to create an uncompressed tar archive from a list of filenames:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar", "w")
for name in ["foo", "bar", "quux"]:
    tar.add(name)
tar.close()
\end{verbatim}

How to read a gzip compressed tar archive and display some member information:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar.gz", "r:gz")
for tarinfo in tar:
    print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
    if tarinfo.isreg():
        print "a regular file."
    elif tarinfo.isdir():
        print "a directory."
    else:
        print "something else."
tar.close()
\end{verbatim}

How to create a tar archive with faked information:
\begin{verbatim}
import tarfile
namelist = ["foo", "bar", "quux"]   # example filenames, replace with your own
tar = tarfile.open("sample.tar.gz", "w:gz")
for name in namelist:
    tarinfo = tar.gettarinfo(name, "fakeproj-1.0/" + name)
    tarinfo.uid = 123
    tarinfo.gid = 456
    tarinfo.uname = "johndoe"
    tarinfo.gname = "fake"
    # Open in binary mode so the bytes read match tarinfo.size.
    fileobj = open(name, "rb")
    tar.addfile(tarinfo, fileobj)
    fileobj.close()
tar.close()
\end{verbatim}

The \emph{only} way to extract an uncompressed tar stream from
\code{sys.stdin}:
\begin{verbatim}
import sys
import tarfile
tar = tarfile.open(mode="r|", fileobj=sys.stdin)
for tarinfo in tar:
    tar.extract(tarinfo)
tar.close()
\end{verbatim}

%------------
% Tar format
%------------

\subsection{Supported tar formats \label{tar-formats}}

There are three tar formats that can be created with the \module{tarfile}
module:

\begin{itemize}

\item
The \POSIX{}.1-1988 ustar format (\constant{USTAR_FORMAT}). It supports
filenames up to a length of at best 256 characters and linknames up to 100
characters. The maximum file size is 8 gigabytes. This is an old and limited
but widely supported format.

\item
The GNU tar format (\constant{GNU_FORMAT}). It supports long filenames and
linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
standard on GNU/Linux systems. \module{tarfile} fully supports the GNU tar
extensions for long names, but sparse file support is read-only.

\item
The \POSIX{}.1-2001 pax format (\constant{PAX_FORMAT}). It is the most
flexible format with virtually no limits. It supports long filenames and
linknames, large files and stores pathnames in a portable way. However, not
all tar implementations today are able to handle pax archives properly.

The \emph{pax} format is an extension to the existing \emph{ustar} format. It
uses extra headers for information that cannot be stored otherwise. There are
two flavours of pax headers: extended headers only affect the subsequent file
header, whereas global headers are valid for the complete archive and affect
all following files. All the data in a pax header is encoded in \emph{UTF-8}
for portability reasons.

\end{itemize}
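
The filename limits of the three formats can be observed directly. The sketch below tries to store a member with a 150-character name that contains no \code{/}; this exceeds what the ustar name field can hold, so a \exception{ValueError} is expected for \constant{USTAR_FORMAT}, while the other two formats keep the full name. The helper \function{try_format()} and the chosen name are assumptions for the illustration.

```python
import io
import tarfile

long_name = "x" * 150      # 150 characters and no "/" to split the name on

def try_format(fmt):
    """Return the names read back from an archive written in format fmt,
    or None if the long name cannot be stored in that format."""
    buffer = io.BytesIO()
    tar = tarfile.open(fileobj=buffer, mode="w", format=fmt)
    try:
        try:
            tar.addfile(tarfile.TarInfo(long_name))
        except ValueError:
            return None
    finally:
        tar.close()
    buffer.seek(0)
    tar = tarfile.open(fileobj=buffer, mode="r")
    names = tar.getnames()
    tar.close()
    return names

results = {
    "ustar": try_format(tarfile.USTAR_FORMAT),
    "gnu": try_format(tarfile.GNU_FORMAT),
    "pax": try_format(tarfile.PAX_FORMAT),
}
```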

There are some more variants of the tar format which can be read, but not
created:

\begin{itemize}

\item
The ancient V7 format. This is the first tar format from \UNIX{} Seventh
Edition, storing only regular files and directories. Names must not be longer
than 100 characters, and there is no user/group name information. Some
archives have miscalculated header checksums in case of fields with
non-\ASCII{} characters.

\item
The SunOS tar extended format. This format is a variant of the \POSIX{}.1-2001
pax format, but is not compatible with it.

\end{itemize}

%----------------
% Unicode issues
%----------------

\subsection{Unicode issues \label{tar-unicode}}

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (that all other formats are merely variants of)
is that there is no concept of supporting different character encodings.
For example, an ordinary tar archive created on a \emph{UTF-8} system cannot be
read correctly on a \emph{Latin-1} system if it contains non-\ASCII{}
characters. Names (i.e. filenames, linknames, user/group names) containing
these characters will appear damaged. Unfortunately, there is no way to
autodetect the encoding of an archive.

The pax format was designed to solve this problem. It stores non-\ASCII{} names
using the universal character encoding \emph{UTF-8}. When a pax archive is
read, these \emph{UTF-8} names are converted to the encoding of the local
file system.

The details of unicode conversion are controlled by the \var{encoding} and
\var{errors} keyword arguments of the \class{TarFile} class.

The default value for \var{encoding} is the local character encoding. It is
deduced from \function{sys.getfilesystemencoding()} and
\function{sys.getdefaultencoding()}. In read mode, \var{encoding} is used
exclusively to convert unicode names from a pax archive to strings in the local
character encoding. In write mode, the use of \var{encoding} depends on the
chosen archive format. In case of \constant{PAX_FORMAT}, input names that
contain non-\ASCII{} characters need to be decoded before being stored as
\emph{UTF-8} strings. The other formats do not make use of \var{encoding}
unless unicode objects are used as input names. These are converted to
8-bit character strings before they are added to the archive.

The \var{errors} argument defines how characters that cannot be converted to
or from \var{encoding} are treated. Possible values are listed in section
\ref{codec-base-classes}. In read mode, there is an additional scheme
\code{'utf-8'} which means that bad characters are replaced by their
\emph{UTF-8} representation. This is the default scheme. In write mode the
default value for \var{errors} is \code{'strict'} to ensure that name
information is not altered unnoticed.
663information is not altered unnoticed.