\section{\module{tarfile} --- Read and write tar archive files}

\declaremodule{standard}{tarfile}
\modulesynopsis{Read and write tar-format archive files.}
\versionadded{2.3}

\moduleauthor{Lars Gust\"abel}{lars@gustaebel.de}
\sectionauthor{Lars Gust\"abel}{lars@gustaebel.de}

The \module{tarfile} module makes it possible to read and create tar archives.
Some facts and figures:

\begin{itemize}
\item reads and writes \module{gzip} and \module{bz2} compressed archives.
\item read/write support for the \POSIX{}.1-1988 (ustar) format.
\item read/write support for the GNU tar format including \emph{longname} and
      \emph{longlink} extensions, read-only support for the \emph{sparse}
      extension.
\item read/write support for the \POSIX{}.1-2001 (pax) format.
      \versionadded{2.6}
\item handles directories, regular files, hard links, symbolic links, FIFOs,
      character devices and block devices and is able to acquire and
      restore file information like timestamp, access permissions and owner.
\item can handle tape devices.
\end{itemize}

\begin{funcdesc}{open}{name\optional{, mode\optional{,
                 fileobj\optional{, bufsize}}}, **kwargs}
    Return a \class{TarFile} object for the pathname \var{name}.
    For detailed information on \class{TarFile} objects and the keyword
    arguments that are allowed, see \citetitle{TarFile Objects}
    (section \ref{tarfile-objects}).

    \var{mode} has to be a string of the form \code{'filemode[:compression]'};
    it defaults to \code{'r'}. Here is a full list of mode combinations:

    \begin{tableii}{c|l}{code}{mode}{action}
    \lineii{'r' or 'r:*'}{Open for reading with transparent compression (recommended).}
    \lineii{'r:'}{Open for reading exclusively without compression.}
    \lineii{'r:gz'}{Open for reading with gzip compression.}
    \lineii{'r:bz2'}{Open for reading with bzip2 compression.}
    \lineii{'a' or 'a:'}{Open for appending with no compression. The file
        is created if it does not exist.}
    \lineii{'w' or 'w:'}{Open for uncompressed writing.}
    \lineii{'w:gz'}{Open for gzip compressed writing.}
    \lineii{'w:bz2'}{Open for bzip2 compressed writing.}
    \end{tableii}

    Note that \code{'a:gz'} or \code{'a:bz2'} is not possible.
    If \var{mode} is not suitable to open a certain (compressed) file for
    reading, \exception{ReadError} is raised. Use \var{mode} \code{'r'} to
    avoid this. If a compression method is not supported,
    \exception{CompressionError} is raised.

    If \var{fileobj} is specified, it is used as an alternative to a file
    object opened for \var{name}. It is expected to be at position 0.

    For special purposes, there is a second format for \var{mode}:
    \code{'filemode|[compression]'}. \function{open()} will return a
    \class{TarFile} object that processes its data as a stream of
    blocks. No random seeking will be done on the file. If given,
    \var{fileobj} may be any object that has a \method{read()} or
    \method{write()} method (depending on the \var{mode}).
    \var{bufsize} specifies the blocksize and defaults to \code{20 *
    512} bytes. Use this variant in combination with
    e.g. \code{sys.stdin}, a socket file object or a tape device.
    However, such a \class{TarFile} object is limited in that it does
    not allow random access; see ``Examples''
    (section~\ref{tar-examples}). The currently possible modes:

    \begin{tableii}{c|l}{code}{Mode}{Action}
    \lineii{'r|*'}{Open a \emph{stream} of tar blocks for reading with transparent compression.}
    \lineii{'r|'}{Open a \emph{stream} of uncompressed tar blocks for reading.}
    \lineii{'r|gz'}{Open a gzip compressed \emph{stream} for reading.}
    \lineii{'r|bz2'}{Open a bzip2 compressed \emph{stream} for reading.}
    \lineii{'w|'}{Open an uncompressed \emph{stream} for writing.}
    \lineii{'w|gz'}{Open a gzip compressed \emph{stream} for writing.}
    \lineii{'w|bz2'}{Open a bzip2 compressed \emph{stream} for writing.}
    \end{tableii}
\end{funcdesc}
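As a rough illustration of the \code{'filemode[:compression]'} strings
accepted by \function{open()}, the sketch below writes a gzip compressed
archive and reopens it, first with the default mode and then with
\code{'r:'}. The file name \code{demo.tar.gz} and the member name
\code{hello.txt} are made up for the example, and the use of
\code{io.BytesIO} assumes a Python version that provides the \module{io}
module.

```python
import io
import os
import shutil
import tarfile
import tempfile

payload = b"hello, tar\n"

# Work in a throwaway directory; "demo.tar.gz" is a hypothetical name.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "demo.tar.gz")

# 'w:gz' creates a gzip compressed archive.
tar = tarfile.open(archive_path, "w:gz")
info = tarfile.TarInfo("hello.txt")
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()

# Plain 'r' (the default) detects the compression transparently.
tar = tarfile.open(archive_path)
names = tar.getnames()
tar.close()

# 'r:' insists on an uncompressed archive, so it is rejected here.
try:
    tarfile.open(archive_path, "r:").close()
    opened_as_uncompressed = True
except tarfile.ReadError:
    opened_as_uncompressed = False

shutil.rmtree(workdir)
```

When the compression of an input file is not known in advance, the default
mode is usually the simplest choice, since it avoids the
\exception{ReadError} seen in the last step.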

\begin{classdesc*}{TarFile}
    Class for reading and writing tar archives. Do not use this
    class directly; use \function{open()} instead.
    See ``TarFile Objects'' (section~\ref{tarfile-objects}).
\end{classdesc*}

\begin{funcdesc}{is_tarfile}{name}
    Return \constant{True} if \var{name} is a tar archive file that
    the \module{tarfile} module can read.
\end{funcdesc}
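A small sketch of how \function{is_tarfile()} might be used to tell
archives apart from other files before opening them. The file names
\code{good.tar} and \code{notes.txt} are invented for the example.

```python
import os
import shutil
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
tar_path = os.path.join(workdir, "good.tar")
txt_path = os.path.join(workdir, "notes.txt")

# A minimal but valid archive holding one zero-length member.
tar = tarfile.open(tar_path, "w")
tar.addfile(tarfile.TarInfo("empty.txt"))
tar.close()

# A plain text file that is not a tar archive.
f = open(txt_path, "w")
f.write("just some text\n")
f.close()

checks = {
    "good.tar": tarfile.is_tarfile(tar_path),
    "notes.txt": tarfile.is_tarfile(txt_path),
}

shutil.rmtree(workdir)
```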

\begin{classdesc}{TarFileCompat}{filename\optional{, mode\optional{,
                                 compression}}}
    Class for limited access to tar archives with a
    \refmodule{zipfile}-like interface. Please consult the
    documentation of the \refmodule{zipfile} module for more details.
    \var{compression} must be one of the following constants:
    \begin{datadesc}{TAR_PLAIN}
        Constant for an uncompressed tar archive.
    \end{datadesc}
    \begin{datadesc}{TAR_GZIPPED}
        Constant for a \refmodule{gzip} compressed tar archive.
    \end{datadesc}
\end{classdesc}

\begin{excdesc}{TarError}
    Base class for all \module{tarfile} exceptions.
\end{excdesc}

\begin{excdesc}{ReadError}
    Is raised when a tar archive is opened that either cannot be handled by
    the \module{tarfile} module or is somehow invalid.
\end{excdesc}

\begin{excdesc}{CompressionError}
    Is raised when a compression method is not supported or when the data
    cannot be decoded properly.
\end{excdesc}

\begin{excdesc}{StreamError}
    Is raised for the limitations that are typical for stream-like
    \class{TarFile} objects.
\end{excdesc}

\begin{excdesc}{ExtractError}
    Is raised for \emph{non-fatal} errors when using \method{extract()}, but
    only if \member{TarFile.errorlevel}\code{ == 2}.
\end{excdesc}

\begin{excdesc}{HeaderError}
    Is raised by \method{frombuf()} if the buffer it gets is invalid.
    \versionadded{2.6}
\end{excdesc}
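Because all of these exceptions derive from \exception{TarError}, a caller
can catch the specific one it expects and fall back to the base class for
everything else. The helper \function{try_open()} below is a hypothetical
name used only for this sketch; it feeds arbitrary bytes to
\function{open()} through an in-memory file object.

```python
import io
import tarfile

def try_open(data):
    """Return the member names, or the name of the tarfile error raised."""
    try:
        tar = tarfile.open(fileobj=io.BytesIO(data))
    except tarfile.ReadError as exc:
        # The data is not an archive this module can handle.
        return type(exc).__name__
    except tarfile.TarError as exc:
        # Any other tarfile-specific failure.
        return type(exc).__name__
    try:
        return tar.getnames()
    finally:
        tar.close()

result = try_open(b"this is not a tar archive")
```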

Each of the following constants defines a tar archive format that the
\module{tarfile} module is able to create. See section \ref{tar-formats} for
details.

\begin{datadesc}{USTAR_FORMAT}
    \POSIX{}.1-1988 (ustar) format.
\end{datadesc}

\begin{datadesc}{GNU_FORMAT}
    GNU tar format.
\end{datadesc}

\begin{datadesc}{PAX_FORMAT}
    \POSIX{}.1-2001 (pax) format.
\end{datadesc}

\begin{datadesc}{DEFAULT_FORMAT}
    The default format for creating archives. This is currently
    \constant{GNU_FORMAT}.
\end{datadesc}
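One visible difference between the formats is how they cope with very long
member names. The sketch below tries to store a 300-character name in each
format. It assumes that the ustar writer reports a name it cannot represent
with \exception{ValueError}, which is an implementation detail not stated in
this section; \function{can_store()} and \code{long_name} are names made up
for the example.

```python
import io
import tarfile

long_name = "x" * 300  # longer than a plain ustar header can hold

def can_store(fmt):
    """Try to add a member with a very long name using format fmt."""
    tar = tarfile.open(fileobj=io.BytesIO(), mode="w", format=fmt)
    try:
        tar.addfile(tarfile.TarInfo(long_name))
        return True
    except ValueError:
        # Assumed behaviour: the name does not fit into the header.
        return False
    finally:
        tar.close()

results = {
    "ustar": can_store(tarfile.USTAR_FORMAT),
    "gnu": can_store(tarfile.GNU_FORMAT),
    "pax": can_store(tarfile.PAX_FORMAT),
}
```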

\begin{seealso}
    \seemodule{zipfile}{Documentation of the \refmodule{zipfile}
    standard module.}

    \seetitle[http://www.gnu.org/software/tar/manual/html_node/tar_134.html\#SEC134]
    {GNU tar manual, Basic Tar Format}{Documentation for tar archive files,
    including GNU tar extensions.}
\end{seealso}

%-----------------
% TarFile Objects
%-----------------

\subsection{TarFile Objects \label{tarfile-objects}}

The \class{TarFile} object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up
of a header block followed by data blocks. It is possible to store a file in a
tar archive several times. Each archive member is represented by a
\class{TarInfo} object, see \citetitle{TarInfo Objects} (section
\ref{tarinfo-objects}) for details.

\begin{classdesc}{TarFile}{name=None, mode='r', fileobj=None,
        format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False,
        ignore_zeros=False, encoding=None, errors=None, pax_headers=None,
        debug=0, errorlevel=0}

    All following arguments are optional and can be accessed as instance
    attributes as well.

    \var{name} is the pathname of the archive. It can be omitted if
    \var{fileobj} is given. In this case, the file object's \member{name}
    attribute is used if it exists.

    \var{mode} is either \code{'r'} to read from an existing archive,
    \code{'a'} to append data to an existing file, or \code{'w'} to create a
    new file, overwriting an existing one.

    If \var{fileobj} is given, it is used for reading or writing data.
    If it can be determined, \var{mode} is overridden by \var{fileobj}'s mode.
    \var{fileobj} will be used from position 0.
    \begin{notice}
        \var{fileobj} is not closed when \class{TarFile} is closed.
    \end{notice}

    \var{format} controls the archive format. It must be one of the constants
    \constant{USTAR_FORMAT}, \constant{GNU_FORMAT} or \constant{PAX_FORMAT}
    that are defined at module level.
    \versionadded{2.6}

    The \var{tarinfo} argument can be used to replace the default
    \class{TarInfo} class with a different one.
    \versionadded{2.6}

    If \var{dereference} is \code{False}, add symbolic and hard links to the
    archive. If it is \code{True}, add the content of the target files to the
    archive. This has no effect on systems that do not support symbolic links.

    If \var{ignore_zeros} is \code{False}, treat an empty block as the end of
    the archive. If it is \code{True}, skip empty (and invalid) blocks and try
    to get as many members as possible. This is only useful for reading
    concatenated or damaged archives.

    \var{debug} can be set from \code{0} (no debug messages) up to \code{3}
    (all debug messages). The messages are written to \code{sys.stderr}.

    If \var{errorlevel} is \code{0}, all errors are ignored when using
    \method{extract()}. Nevertheless, they appear as error messages in the
    debug output when debugging is enabled. If \code{1}, all \emph{fatal}
    errors are raised as \exception{OSError} or \exception{IOError} exceptions.
    If \code{2}, all \emph{non-fatal} errors are raised as \exception{TarError}
    exceptions as well.

    The \var{encoding} and \var{errors} arguments control the way strings are
    converted to unicode objects and vice versa. The default settings will work
    for most users. See section \ref{tar-unicode} for in-depth information.
    \versionadded{2.6}

    The \var{pax_headers} argument is an optional dictionary of unicode strings
    which will be added as a pax global header if \var{format} is
    \constant{PAX_FORMAT}.
    \versionadded{2.6}
\end{classdesc}
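The following sketch passes some of these keyword arguments through
\function{open()}: it writes a pax archive with a global header into an
in-memory file object, reads it back and checks that the file object was
left open. The header value \code{nightly build} and the member name
\code{placeholder.txt} are invented for the example.

```python
import io
import tarfile

buf = io.BytesIO()

# Write a pax archive carrying one global header entry.
tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                   pax_headers={"comment": "nightly build"})
tar.addfile(tarfile.TarInfo("placeholder.txt"))
tar.close()

# fileobj is used from position 0, so rewind before reading.
buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
global_headers = dict(tar.pax_headers)
names = tar.getnames()
tar.close()

# Closing the TarFile does not close the file object passed in.
fileobj_still_open = not buf.closed
```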

\begin{methoddesc}{open}{...}
    Alternative constructor. The \function{open()} function on module level is
    actually a shortcut to this classmethod. See section~\ref{module-tarfile}
    for details.
\end{methoddesc}

\begin{methoddesc}{getmember}{name}
    Return a \class{TarInfo} object for member \var{name}. If \var{name}
    cannot be found in the archive, \exception{KeyError} is raised.
    \begin{notice}
        If a member occurs more than once in the archive, its last
        occurrence is assumed to be the most up-to-date version.
    \end{notice}
\end{methoddesc}
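To make the note about repeated members concrete, this sketch stores the
same name twice and then looks it up. The helper \function{add_bytes()} and
the member name \code{config.ini} are made up for the example.

```python
import io
import tarfile

def add_bytes(tar, name, data):
    """Add the bytes in data to tar as a regular file called name."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

old = b"version=1\n"
new = b"version=2 (newer)\n"

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
add_bytes(tar, "config.ini", old)
add_bytes(tar, "config.ini", new)   # same name, stored a second time
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf)
latest_size = tar.getmember("config.ini").size  # the last occurrence wins
all_names = tar.getnames()                      # both entries are listed
try:
    tar.getmember("missing.txt")
    missing_raises = False
except KeyError:
    missing_raises = True
tar.close()
```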

\begin{methoddesc}{getmembers}{}
    Return the members of the archive as a list of \class{TarInfo} objects.
    The list has the same order as the members in the archive.
\end{methoddesc}

\begin{methoddesc}{getnames}{}
    Return the members as a list of their names. It has the same order as
    the list returned by \method{getmembers()}.
\end{methoddesc}

\begin{methoddesc}{list}{verbose=True}
    Print a table of contents to \code{sys.stdout}. If \var{verbose} is
    \constant{False}, only the names of the members are printed. If it is
    \constant{True}, output similar to that of \program{ls -l} is produced.
\end{methoddesc}

\begin{methoddesc}{next}{}
    Return the next member of the archive as a \class{TarInfo} object, when
    \class{TarFile} is opened for reading. Return \code{None} if no more
    members are available.
\end{methoddesc}

\begin{methoddesc}{extractall}{\optional{path\optional{, members}}}
    Extract all members from the archive to the current working directory
    or directory \var{path}. If optional \var{members} is given, it must be
    a subset of the list returned by \method{getmembers()}.
    Directory information like owner, modification time and permissions is
    set after all members have been extracted. This is done to work around two
    problems: a directory's modification time is reset each time a file is
    created in it, and, if a directory's permissions do not allow writing,
    extracting files to it will fail.
    \versionadded{2.5}
\end{methoddesc}
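A short sketch of \method{extractall()} with both optional arguments: only
the members below \code{docs/} are extracted, and they go to a temporary
directory instead of the current one. All member names are invented for
the example.

```python
import io
import os
import shutil
import tarfile
import tempfile

# Build a small archive in memory.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
for name, data in [("docs/a.txt", b"alpha\n"),
                   ("docs/b.txt", b"beta\n"),
                   ("skip.log", b"noise\n")]:
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
tar.close()

buf.seek(0)
target = tempfile.mkdtemp()
tar = tarfile.open(fileobj=buf)

# members must be a subset of what getmembers() returns.
wanted = [m for m in tar.getmembers() if m.name.startswith("docs/")]
tar.extractall(target, wanted)
tar.close()

extracted = sorted(os.listdir(os.path.join(target, "docs")))
skipped_present = os.path.exists(os.path.join(target, "skip.log"))
shutil.rmtree(target)
```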

\begin{methoddesc}{extract}{member\optional{, path}}
    Extract a member from the archive to the current working directory,
    using its full name. Its file information is extracted as accurately as
    possible.
    \var{member} may be a filename or a \class{TarInfo} object.
    You can specify a different directory using \var{path}.
    \begin{notice}
        Because the \method{extract()} method allows random access to a tar
        archive there are some issues you must take care of yourself. See the
        description for \method{extractall()} above.
    \end{notice}
\end{methoddesc}

\begin{methoddesc}{extractfile}{member}
    Extract a member from the archive as a file object.
    \var{member} may be a filename or a \class{TarInfo} object.
    If \var{member} is a regular file, a file-like object is returned.
    If \var{member} is a link, a file-like object is constructed from the
    link's target.
    If \var{member} is none of the above, \code{None} is returned.
    \begin{notice}
        The file-like object is read-only and provides the following methods:
        \method{read()}, \method{readline()}, \method{readlines()},
        \method{seek()}, \method{tell()}.
    \end{notice}
\end{methoddesc}
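The sketch below reads a member's data with \method{extractfile()} without
writing anything to disk, and shows the \code{None} result for a directory
member. The names \code{greeting.txt} and \code{somedir} are made up for
the example.

```python
import io
import tarfile

text = b"first line\nsecond line\n"

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
info = tarfile.TarInfo("greeting.txt")
info.size = len(text)
tar.addfile(info, io.BytesIO(text))
dirinfo = tarfile.TarInfo("somedir")
dirinfo.type = tarfile.DIRTYPE      # a directory entry carries no data
tar.addfile(dirinfo)
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf)
f = tar.extractfile("greeting.txt")
first_line = f.readline()
f.seek(0)                           # the object supports seek() and tell()
everything = f.read()
dir_result = tar.extractfile("somedir")
tar.close()
```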

\begin{methoddesc}{add}{name\optional{, arcname\optional{, recursive}}}
    Add the file \var{name} to the archive. \var{name} may be any type
    of file (directory, fifo, symbolic link, etc.).
    If given, \var{arcname} specifies an alternative name for the file in the
    archive. Directories are added recursively by default.
    This can be avoided by setting \var{recursive} to \constant{False};
    the default is \constant{True}.
\end{methoddesc}
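As an illustration of \var{arcname} and \var{recursive}, this sketch adds
the same directory twice: once with its contents under a new name, and once
as a bare directory entry. The directory \code{project} and the archive
names are invented for the example.

```python
import os
import shutil
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "project")
os.mkdir(src)
f = open(os.path.join(src, "main.py"), "w")
f.write("print('hi')\n")
f.close()

archive = os.path.join(workdir, "out.tar")
tar = tarfile.open(archive, "w")
# Stored under a different name, including everything below it.
tar.add(src, arcname="project-1.0")
# Only the directory entry itself, without its contents.
tar.add(src, arcname="empty-shell", recursive=False)
tar.close()

tar = tarfile.open(archive)
names = tar.getnames()
tar.close()
shutil.rmtree(workdir)
```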

\begin{methoddesc}{addfile}{tarinfo\optional{, fileobj}}
    Add the \class{TarInfo} object \var{tarinfo} to the archive.
    If \var{fileobj} is given, \code{\var{tarinfo}.size} bytes are read
    from it and added to the archive. You can create \class{TarInfo} objects
    using \method{gettarinfo()}.
    \begin{notice}
        On Windows platforms, \var{fileobj} should always be opened with mode
        \code{'rb'} to avoid confusion about the file size.
    \end{notice}
\end{methoddesc}
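The role of \code{\var{tarinfo}.size} can be seen in the following sketch,
where the file object holds more data than the size recorded in the
\class{TarInfo} object. The member name \code{digits.txt} is made up for
the example.

```python
import io
import tarfile

data = b"0123456789"

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
info = tarfile.TarInfo("digits.txt")
info.size = 5                        # only this many bytes are read
tar.addfile(info, io.BytesIO(data))
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf)
stored = tar.extractfile("digits.txt").read()
tar.close()
```

In practice the size is normally set to the full length of the data, which
is what \method{gettarinfo()} does for files on disk.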

\begin{methoddesc}{gettarinfo}{\optional{name\optional{,
                   arcname\optional{, fileobj}}}}
    Create a \class{TarInfo} object for either the file \var{name} or
    the file object \var{fileobj} (using \function{os.fstat()} on its
    file descriptor). You can modify some of the \class{TarInfo}'s
    attributes before you add it using \method{addfile()}. If given,
    \var{arcname} specifies an alternative name for the file in the
    archive.
\end{methoddesc}

\begin{methoddesc}{close}{}
    Close the \class{TarFile}. In write mode, two finishing zero
    blocks are appended to the archive.
\end{methoddesc}

\begin{memberdesc}{posix}
    Setting this to \constant{True} is equivalent to setting the
    \member{format} attribute to \constant{USTAR_FORMAT},
    \constant{False} is equivalent to \constant{GNU_FORMAT}.
    \versionchanged[\var{posix} defaults to \constant{False}]{2.4}
    \deprecated{2.6}{Use the \member{format} attribute instead.}
\end{memberdesc}

\begin{memberdesc}{pax_headers}
    A dictionary containing key-value pairs of pax global headers.
    \versionadded{2.6}
\end{memberdesc}

%-----------------
% TarInfo Objects
%-----------------

\subsection{TarInfo Objects \label{tarinfo-objects}}

A \class{TarInfo} object represents one member in a
\class{TarFile}. Aside from storing all required attributes of a file
(like file type, size, time, permissions, owner etc.), it provides
some useful methods to determine its type. It does \emph{not} contain
the file's data itself.

\class{TarInfo} objects are returned by \class{TarFile}'s methods
\method{getmember()}, \method{getmembers()} and \method{gettarinfo()}.

\begin{classdesc}{TarInfo}{\optional{name}}
    Create a \class{TarInfo} object.
\end{classdesc}

\begin{methoddesc}{frombuf}{buf}
    Create and return a \class{TarInfo} object from string buffer \var{buf}.
    \versionadded[Raises \exception{HeaderError} if the buffer is
    invalid.]{2.6}
\end{methoddesc}

\begin{methoddesc}{fromtarfile}{tarfile}
    Read the next member from the \class{TarFile} object \var{tarfile} and
    return it as a \class{TarInfo} object.
    \versionadded{2.6}
\end{methoddesc}

\begin{methoddesc}{tobuf}{\optional{format\optional{, encoding
                   \optional{, errors}}}}
    Create a string buffer from a \class{TarInfo} object. For information
    on the arguments see the constructor of the \class{TarFile} class.
    \versionchanged[The arguments were added]{2.6}
\end{methoddesc}

A \class{TarInfo} object has the following public data attributes:

\begin{memberdesc}{name}
    Name of the archive member.
\end{memberdesc}

\begin{memberdesc}{size}
    Size in bytes.
\end{memberdesc}

\begin{memberdesc}{mtime}
    Time of last modification.
\end{memberdesc}

\begin{memberdesc}{mode}
    Permission bits.
\end{memberdesc}

\begin{memberdesc}{type}
    File type. \var{type} is usually one of these constants:
    \constant{REGTYPE}, \constant{AREGTYPE}, \constant{LNKTYPE},
    \constant{SYMTYPE}, \constant{DIRTYPE}, \constant{FIFOTYPE},
    \constant{CONTTYPE}, \constant{CHRTYPE}, \constant{BLKTYPE},
    \constant{GNUTYPE_SPARSE}. To determine the type of a
    \class{TarInfo} object more conveniently, use the \code{is_*()}
    methods below.
\end{memberdesc}

\begin{memberdesc}{linkname}
    Name of the target file, which is only present in
    \class{TarInfo} objects of type \constant{LNKTYPE} and
    \constant{SYMTYPE}.
\end{memberdesc}

\begin{memberdesc}{uid}
    User ID of the user who originally stored this member.
\end{memberdesc}

\begin{memberdesc}{gid}
    Group ID of the user who originally stored this member.
\end{memberdesc}

\begin{memberdesc}{uname}
    User name.
\end{memberdesc}

\begin{memberdesc}{gname}
    Group name.
\end{memberdesc}

\begin{memberdesc}{pax_headers}
    A dictionary containing key-value pairs of an associated pax
    extended header.
    \versionadded{2.6}
\end{memberdesc}

A \class{TarInfo} object also provides some convenient query methods:

\begin{methoddesc}{isfile}{}
    Return \constant{True} if the \class{TarInfo} object is a regular
    file.
\end{methoddesc}

\begin{methoddesc}{isreg}{}
    Same as \method{isfile()}.
\end{methoddesc}

\begin{methoddesc}{isdir}{}
    Return \constant{True} if it is a directory.
\end{methoddesc}

\begin{methoddesc}{issym}{}
    Return \constant{True} if it is a symbolic link.
\end{methoddesc}

\begin{methoddesc}{islnk}{}
    Return \constant{True} if it is a hard link.
\end{methoddesc}

\begin{methoddesc}{ischr}{}
    Return \constant{True} if it is a character device.
\end{methoddesc}

\begin{methoddesc}{isblk}{}
    Return \constant{True} if it is a block device.
\end{methoddesc}

\begin{methoddesc}{isfifo}{}
    Return \constant{True} if it is a FIFO.
\end{methoddesc}

\begin{methoddesc}{isdev}{}
    Return \constant{True} if it is one of character device, block
    device or FIFO.
\end{methoddesc}

| 499 | %------------------------ |
| 500 | % Examples |
| 501 | %------------------------ |
| 502 | |
| 503 | \subsection{Examples \label{tar-examples}} |
| 504 | |
How to extract an entire tar archive to the current working directory:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()
\end{verbatim}

How to create an uncompressed tar archive from a list of filenames:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar", "w")
for name in ["foo", "bar", "quux"]:
    tar.add(name)
tar.close()
\end{verbatim}

How to read a gzip compressed tar archive and display some member information:
\begin{verbatim}
import tarfile
tar = tarfile.open("sample.tar.gz", "r:gz")
for tarinfo in tar:
    print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
    if tarinfo.isreg():
        print "a regular file."
    elif tarinfo.isdir():
        print "a directory."
    else:
        print "something else."
tar.close()
\end{verbatim}

How to create a tar archive with faked information:
\begin{verbatim}
import tarfile
namelist = ["foo", "bar", "quux"]    # example names; the files must exist
tar = tarfile.open("sample.tar.gz", "w:gz")
for name in namelist:
    # Take the metadata from the file, then override path and ownership.
    tarinfo = tar.gettarinfo(name, "fakeproj-1.0/" + name)
    tarinfo.uid = 123
    tarinfo.gid = 456
    tarinfo.uname = "johndoe"
    tarinfo.gname = "fake"
    fobj = open(name, "rb")          # binary mode, so the data is not altered
    tar.addfile(tarinfo, fobj)
    fobj.close()
tar.close()
\end{verbatim}

The \emph{only} way to extract an uncompressed tar stream from
\code{sys.stdin}:
\begin{verbatim}
import sys
import tarfile
tar = tarfile.open(mode="r|", fileobj=sys.stdin)
for tarinfo in tar:
    tar.extract(tarinfo)
tar.close()
\end{verbatim}
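
The same stream mode can be tried without a pipe. The sketch below is an
illustration under assumptions, not part of the original examples: it uses an
in-memory \class{io.BytesIO} object as a stand-in for the stream, and the
member name and contents are made up. Because \code{"r|"} reads strictly from
front to back, each member's data is read inside the loop, before moving on to
the next member.

```python
import io
import tarfile

# Create a small uncompressed archive in memory to stand in for the stream.
payload = b"hello, tar\n"
source = io.BytesIO()
tar = tarfile.open(fileobj=source, mode="w")
info = tarfile.TarInfo("greeting.txt")   # illustrative member name
info.size = len(payload)
tar.addfile(info, io.BytesIO(payload))
tar.close()
source.seek(0)

# "r|" processes the archive as a stream of blocks, without seeking.
seen = []
tar = tarfile.open(fileobj=source, mode="r|")
for tarinfo in tar:
    member = tar.extractfile(tarinfo)    # must be read before the next member
    seen.append((tarinfo.name, member.read()))
tar.close()
```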

%------------
% Tar format
%------------

\subsection{Supported tar formats \label{tar-formats}}

There are three tar formats that can be created with the \module{tarfile}
module:

\begin{itemize}

\item
The \POSIX{}.1-1988 ustar format (\constant{USTAR_FORMAT}). It supports
filenames of at most 256 characters, and only if the name can be split at a
slash into a prefix of up to 155 characters and a name of up to 100
characters. Linknames are limited to 100 characters. The maximum file size is
8 gigabytes. This is an old and limited but widely supported format.

\item
The GNU tar format (\constant{GNU_FORMAT}). It supports long filenames and
linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
standard on GNU/Linux systems. \module{tarfile} fully supports the GNU tar
extensions for long names; sparse file support is read-only.

\item
The \POSIX{}.1-2001 pax format (\constant{PAX_FORMAT}). It is the most
flexible format with virtually no limits. It supports long filenames and
linknames, large files and stores pathnames in a portable way. However, not
all tar implementations today are able to handle pax archives properly.

The \emph{pax} format is an extension to the existing \emph{ustar} format. It
uses extra headers for information that cannot be stored otherwise. There are
two flavours of pax headers: extended headers affect only the subsequent file
header, whereas global headers are valid for the complete archive and affect
all following files. All the data in a pax header is encoded in \emph{UTF-8}
for portability reasons.

\end{itemize}
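
The practical difference between the three formats can be sketched as follows.
This is a hedged illustration only: \code{try_format()}, the member names and
the \code{comment} header are hypothetical, the archives are written to
in-memory buffers, and the second half assumes an interpreter in which pax
header values are ordinary text strings (Python 3).

```python
import io
import tarfile

def try_format(fmt, name):
    """Write one empty member called name; report whether fmt accepts it.

    try_format() is a hypothetical helper used only for this sketch.
    """
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w", format=fmt)
    try:
        tar.addfile(tarfile.TarInfo(name))
        return True
    except ValueError:
        # Raised when the name does not fit into the header fields.
        return False
    finally:
        tar.close()

# 300 characters and no slash, so ustar cannot split it into prefix and name.
long_name = "x" * 300

results = {
    "ustar": try_format(tarfile.USTAR_FORMAT, long_name),
    "gnu": try_format(tarfile.GNU_FORMAT, long_name),
    "pax": try_format(tarfile.PAX_FORMAT, long_name),
}

# A global pax header applies to the whole archive and can be read back.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                   pax_headers={"comment": "nightly build"})
tar.addfile(tarfile.TarInfo("README"))
tar.close()

buf.seek(0)
tar = tarfile.open(fileobj=buf, mode="r")
global_headers = dict(tar.pax_headers)
tar.close()
```

Only the ustar format rejects the long name; the GNU and pax formats store it
by means of their respective extensions.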

There are some more variants of the tar format which can be read, but not
created:

\begin{itemize}

\item
The ancient V7 format. This is the first tar format from \UNIX{} Seventh
Edition, storing only regular files and directories. Names must not be longer
than 100 characters, and there is no user/group name information. Some
archives have miscalculated header checksums in case of fields with
non-\ASCII{} characters.

\item
The SunOS tar extended format. This format is a variant of the \POSIX{}.1-2001
pax format, but is not compatible with it.

\end{itemize}

%----------------
% Unicode issues
%----------------

\subsection{Unicode issues \label{tar-unicode}}

The tar format was originally conceived for making backups on tape drives,
with the main focus on preserving file system information. Nowadays tar
archives are commonly used for file distribution and for exchanging archives
over networks. One problem of the original format (of which all other formats
are merely variants) is that it has no concept of character encodings.
For example, an ordinary tar archive created on a \emph{UTF-8} system cannot be
read correctly on a \emph{Latin-1} system if it contains non-\ASCII{}
characters: names (i.e. filenames, linknames, user/group names) containing
these characters will appear damaged. Unfortunately, there is no way to
autodetect the encoding of an archive.

The pax format was designed to solve this problem. It stores non-\ASCII{} names
using the universal character encoding \emph{UTF-8}. When a pax archive is
read, these \emph{UTF-8} names are converted to the encoding of the local
file system.
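
The effect described in the two paragraphs above can be reproduced in memory.
The following sketch rests on assumptions: it targets an interpreter in which
member names are Unicode text strings (Python 3), so names are returned as text
rather than converted to a local byte encoding, and the helpers
\code{write_archive()} and \code{read_names()} as well as the member name are
hypothetical.

```python
import io
import tarfile

NAME = "caf\u00e9.txt"   # illustrative name with one non-ASCII character

def write_archive(fmt):
    """Return the bytes of an archive holding one empty member called NAME."""
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w", format=fmt, encoding="utf-8")
    tar.addfile(tarfile.TarInfo(NAME))
    tar.close()
    return buf.getvalue()

def read_names(data, encoding):
    """Return the member names of an archive, read with the given encoding."""
    tar = tarfile.open(fileobj=io.BytesIO(data), mode="r", encoding=encoding)
    try:
        return tar.getnames()
    finally:
        tar.close()

# Written on a "UTF-8 system", read on a "Latin-1 system".
gnu_names = read_names(write_archive(tarfile.GNU_FORMAT), "latin-1")
pax_names = read_names(write_archive(tarfile.PAX_FORMAT), "latin-1")
```

In the GNU archive the two \emph{UTF-8} bytes of the accented letter are
misread as two \emph{Latin-1} characters, so the name appears damaged. In the
pax archive the name is taken from the \emph{UTF-8} encoded extended header and
survives intact.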

The details of Unicode conversion are controlled by the \var{encoding} and
\var{errors} keyword arguments of the \class{TarFile} class.

The default value for \var{encoding} is the local character encoding. It is
deduced from \function{sys.getfilesystemencoding()} and
\function{sys.getdefaultencoding()}. In read mode, \var{encoding} is used
exclusively to convert Unicode names from a pax archive to strings in the local
character encoding. In write mode, the use of \var{encoding} depends on the
chosen archive format. In case of \constant{PAX_FORMAT}, input names that
contain non-\ASCII{} characters need to be decoded before being stored as
\emph{UTF-8} strings. The other formats do not make use of \var{encoding}
unless \class{unicode} objects are used as input names. These are converted to
8-bit character strings before they are added to the archive.

The \var{errors} argument defines how characters that cannot be converted to
or from \var{encoding} are treated. Possible values are listed in section
\ref{codec-base-classes}. In read mode, there is an additional scheme
\code{'utf-8'}, which means that bad characters are replaced by their
\emph{UTF-8} representation. This is the default scheme. In write mode, the
default value for \var{errors} is \code{'strict'} to ensure that name
information is not altered unnoticed.