:mod:`zipfile` --- Work with ZIP archives
=========================================

.. module:: zipfile
   :synopsis: Read and write ZIP-format archive files.
.. moduleauthor:: James C. Ahlstrom <jim@interet.com>
.. sectionauthor:: James C. Ahlstrom <jim@interet.com>

**Source code:** :source:`Lib/zipfile.py`

--------------

The ZIP file format is a common archive and compression standard. This module
provides tools to create, read, write, append to, and list the contents of a
ZIP file. Any advanced use of this module requires an understanding of the
format, as defined in the `PKZIP Application Note
<http://www.pkware.com/documents/casestudies/APPNOTE.TXT>`_.

This module does not currently handle multi-disk ZIP files.
It can handle ZIP files that use the ZIP64 extensions
(that is, ZIP files that are more than 4 GByte in size). It supports
decryption of encrypted files in ZIP archives, but it currently cannot
create an encrypted file. Decryption is extremely slow because it is
implemented in pure Python rather than C.

For other archive formats, see the :mod:`bz2`, :mod:`gzip`, and
:mod:`tarfile` modules.

The module defines the following items:

.. exception:: BadZipFile

   The error raised for bad ZIP files.

   .. versionadded:: 3.2


.. exception:: BadZipfile

   Alias of :exc:`BadZipFile`, for compatibility with older Python versions.

   .. deprecated:: 3.2


.. exception:: LargeZipFile

   The error raised when a ZIP file would require ZIP64 functionality but that
   has not been enabled.


.. class:: ZipFile
   :noindex:

   The class for reading and writing ZIP files. See section
   :ref:`zipfile-objects` for constructor details.


.. class:: PyZipFile
   :noindex:

   Class for creating ZIP archives containing Python libraries.


.. class:: ZipInfo(filename='NoName', date_time=(1980,1,1,0,0,0))

   Class used to represent information about a member of an archive. Instances
   of this class are returned by the :meth:`getinfo` and :meth:`infolist`
   methods of :class:`ZipFile` objects. Most users of the :mod:`zipfile` module
   will not need to create these, but only use those created by this
   module. *filename* should be the full name of the archive member, and
   *date_time* should be a tuple containing six fields which describe the time
   of the last modification to the file; the fields are described in section
   :ref:`zipinfo-objects`.


.. function:: is_zipfile(filename)

   Return ``True`` if *filename* is a valid ZIP file based on its magic number,
   otherwise return ``False``. *filename* may also be a file or file-like
   object.

   .. versionchanged:: 3.1
      Support for file and file-like objects.

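As a minimal sketch of the check described above, the following builds a tiny
archive in memory and passes file-like objects to :func:`is_zipfile`. The
member name ``hello.txt`` and the variable names are illustrative assumptions.

```python
import io
import zipfile

# Build a small archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('hello.txt', b'hello')

# A file-like object holding a real archive is recognized.
buffer.seek(0)
looks_like_zip = zipfile.is_zipfile(buffer)

# Arbitrary bytes that do not contain ZIP structures are rejected.
not_a_zip = zipfile.is_zipfile(io.BytesIO(b'plain text, not an archive'))
```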

.. data:: ZIP_STORED

   The numeric constant for an uncompressed archive member.


.. data:: ZIP_DEFLATED

   The numeric constant for the usual ZIP compression method. This requires the
   zlib module. No other compression methods are currently supported.


.. seealso::

   `PKZIP Application Note <http://www.pkware.com/documents/casestudies/APPNOTE.TXT>`_
      Documentation on the ZIP file format by Phil Katz, the creator of the format and
      algorithms used.

   `Info-ZIP Home Page <http://www.info-zip.org/>`_
      Information about the Info-ZIP project's ZIP archive programs and development
      libraries.


.. _zipfile-objects:

ZipFile Objects
---------------


.. class:: ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=False)

   Open a ZIP file, where *file* can be either a path to a file (a string) or a
   file-like object. The *mode* parameter should be ``'r'`` to read an existing
   file, ``'w'`` to truncate and write a new file, or ``'a'`` to append to an
   existing file. If *mode* is ``'a'`` and *file* refers to an existing ZIP
   file, then additional files are added to it. If *file* does not refer to a
   ZIP file, then a new ZIP archive is appended to the file. This is meant for
   adding a ZIP archive to another file (such as :file:`python.exe`). If
   *mode* is ``'a'`` and the file does not exist at all, it is created.

   *compression* is the ZIP compression method to use when writing the archive,
   and should be :const:`ZIP_STORED` or :const:`ZIP_DEFLATED`; unrecognized
   values will cause :exc:`RuntimeError` to be raised. If :const:`ZIP_DEFLATED`
   is specified but the :mod:`zlib` module is not available, :exc:`RuntimeError`
   is also raised. The default is :const:`ZIP_STORED`.

   If *allowZip64* is ``True``, :mod:`zipfile` will create ZIP files that use
   the ZIP64 extensions when the ZIP file is larger than 2 GB. If it is false
   (the default), :mod:`zipfile` will raise :exc:`LargeZipFile` when the ZIP
   file would require ZIP64 extensions. ZIP64 extensions are disabled by
   default because the default :program:`zip` and :program:`unzip` commands on
   Unix (the InfoZIP utilities) don't support these extensions.

   If the file is created with mode ``'a'`` or ``'w'`` and then
   :meth:`close`\ d without adding any files to the archive, the appropriate
   ZIP structures for an empty archive will be written to the file.

   ZipFile is also a context manager and therefore supports the
   :keyword:`with` statement. In the example, *myzip* is closed after the
   :keyword:`with` statement's suite is finished---even if an exception occurs::

      with ZipFile('spam.zip', 'w') as myzip:
          myzip.write('eggs.txt')

   .. versionadded:: 3.2
      Added the ability to use :class:`ZipFile` as a context manager.

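The three modes can be illustrated with a short sketch. It works in a temporary
directory, and the file and member names are assumptions chosen for the example.
:const:`ZIP_DEFLATED` is used for the appended member, which assumes the
:mod:`zlib` module is available.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    archive_path = os.path.join(workdir, 'spam.zip')

    # 'w' truncates any existing file and starts a new archive.
    with zipfile.ZipFile(archive_path, 'w') as archive:
        archive.writestr('eggs.txt', b'eggs')

    # 'a' adds members to the existing archive instead of replacing it.
    with zipfile.ZipFile(archive_path, 'a',
                         compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr('ham.txt', b'ham' * 100)

    # 'r' opens the archive for reading only.
    with zipfile.ZipFile(archive_path, 'r') as archive:
        member_names = archive.namelist()
```

After the last block, ``member_names`` lists both members in the order they
were added.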

.. method:: ZipFile.close()

   Close the archive file. You must call :meth:`close` before exiting your program
   or essential records will not be written.


.. method:: ZipFile.getinfo(name)

   Return a :class:`ZipInfo` object with information about the archive member
   *name*. Calling :meth:`getinfo` for a name not currently contained in the
   archive will raise a :exc:`KeyError`.


.. method:: ZipFile.infolist()

   Return a list containing a :class:`ZipInfo` object for each member of the
   archive. The objects are in the same order as their entries in the actual ZIP
   file on disk if an existing archive was opened.


.. method:: ZipFile.namelist()

   Return a list of archive members by name.

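The listing methods above can be combined as in the following sketch. The
archive is built in memory, and the member names and contents are assumptions
made for illustration.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('a.txt', b'alpha')
    archive.writestr('b.txt', b'bravo!')

with zipfile.ZipFile(buffer) as archive:
    # namelist() gives names only; infolist() gives full ZipInfo records.
    names = archive.namelist()
    sizes = {info.filename: info.file_size for info in archive.infolist()}

    # getinfo() looks up a single member and raises KeyError if it is absent.
    size_of_a = archive.getinfo('a.txt').file_size
    try:
        archive.getinfo('missing.txt')
        missing_raises_key_error = False
    except KeyError:
        missing_raises_key_error = True
```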

.. method:: ZipFile.open(name, mode='r', pwd=None)

   Extract a member from the archive as a file-like object (ZipExtFile). *name* is
   the name of the file in the archive, or a :class:`ZipInfo` object. The *mode*
   parameter, if included, must be one of the following: ``'r'`` (the default),
   ``'U'``, or ``'rU'``. Choosing ``'U'`` or ``'rU'`` will enable universal newline
   support in the read-only object. *pwd* is the password used for encrypted files.
   Calling :meth:`open` on a closed ZipFile will raise a :exc:`RuntimeError`.

   .. note::

      The file-like object is read-only and provides the following methods:
      :meth:`!read`, :meth:`!readline`, :meth:`!readlines`, :meth:`!__iter__`,
      :meth:`!__next__`.

   .. note::

      If the ZipFile was created by passing in a file-like object as the first
      argument to the constructor, then the object returned by :meth:`.open` shares the
      ZipFile's file pointer. Under these circumstances, the object returned by
      :meth:`.open` should not be used after any additional operations are performed
      on the ZipFile object. If the ZipFile was created by passing in a string (the
      filename) as the first argument to the constructor, then :meth:`.open` will
      create a new file object that will be held by the ZipExtFile, allowing it to
      operate independently of the ZipFile.

   .. note::

      The :meth:`open`, :meth:`read` and :meth:`extract` methods can take a filename
      or a :class:`ZipInfo` object. You will appreciate this when trying to read a
      ZIP file that contains members with duplicate names.

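A brief sketch of reading a member through :meth:`open`, once by name and once
by :class:`ZipInfo` object. The member name ``lines.txt`` and its contents are
assumptions for the example, and only the default ``'r'`` mode is used.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('lines.txt', b'first\nsecond\n')

with zipfile.ZipFile(buffer) as archive:
    # Open by name and iterate over the member line by line.
    with archive.open('lines.txt') as member:
        lines = [line.rstrip(b'\n') for line in member]

    # Open the same member through its ZipInfo object.
    info = archive.getinfo('lines.txt')
    with archive.open(info) as member:
        first_line = member.readline()
```

The returned object yields bytes, so any decoding to text is left to the caller.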

.. method:: ZipFile.extract(member, path=None, pwd=None)

   Extract a member from the archive to the current working directory; *member*
   must be its full name or a :class:`ZipInfo` object. Its file information is
   extracted as accurately as possible. *path* specifies a different directory
   to extract to. *pwd* is the password used for encrypted files.


.. method:: ZipFile.extractall(path=None, members=None, pwd=None)

   Extract all members from the archive to the current working directory. *path*
   specifies a different directory to extract to. *members* is optional and must
   be a subset of the list returned by :meth:`namelist`. *pwd* is the password
   used for encrypted files.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

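One way to perform the inspection recommended in the warning is to scan the
member names before extracting anything. The helper ``unsafe_members`` below is
hypothetical (it is not part of :mod:`zipfile`), and it only covers the two
cases named in the warning, so treat it as a starting point rather than a
complete safeguard.

```python
import io
import os
import zipfile


def unsafe_members(archive):
    """Return member names that are absolute or contain a '..' component."""
    flagged = []
    for name in archive.namelist():
        parts = name.replace('\\', '/').split('/')
        if name.startswith('/') or os.path.isabs(name) or '..' in parts:
            flagged.append(name)
    return flagged


# Build an in-memory archive with one ordinary and one suspicious member.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('docs/readme.txt', b'ok')
    archive.writestr('../escape.txt', b'suspicious')

with zipfile.ZipFile(buffer) as archive:
    flagged = unsafe_members(archive)
    # Only call extractall() when nothing was flagged.
    safe_to_extract = not flagged
```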

.. method:: ZipFile.printdir()

   Print a table of contents for the archive to ``sys.stdout``.


.. method:: ZipFile.setpassword(pwd)

   Set *pwd* as the default password to extract encrypted files.


.. method:: ZipFile.read(name, pwd=None)

   Return the bytes of the file *name* in the archive. *name* is the name of the
   file in the archive, or a :class:`ZipInfo` object. The archive must be open for
   read or append. *pwd* is the password used for encrypted files and, if specified,
   it will override the default password set with :meth:`setpassword`. Calling
   :meth:`read` on a closed ZipFile will raise a :exc:`RuntimeError`.


.. method:: ZipFile.testzip()

   Read all the files in the archive and check their CRCs and file headers.
   Return the name of the first bad file, or else return ``None``. Calling
   :meth:`testzip` on a closed ZipFile will raise a :exc:`RuntimeError`.

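The following sketch reads a member's contents with :meth:`read` and then
verifies the archive with :meth:`testzip`. The archive is unencrypted and built
in memory; the member name ``data.bin`` is an assumption for the example.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('data.bin', b'\x00\x01\x02')

with zipfile.ZipFile(buffer) as archive:
    # read() returns the whole member as a bytes object.
    payload = archive.read('data.bin')

    # testzip() returns None when every member passes its CRC check.
    first_bad_member = archive.testzip()
```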

.. method:: ZipFile.write(filename, arcname=None, compress_type=None)

   Write the file named *filename* to the archive, giving it the archive name
   *arcname* (by default, this will be the same as *filename*, but without a drive
   letter and with leading path separators removed). If given, *compress_type*
   overrides the value given for the *compression* parameter to the constructor for
   the new entry. The archive must be open with mode ``'w'`` or ``'a'`` -- calling
   :meth:`write` on a ZipFile created with mode ``'r'`` will raise a
   :exc:`RuntimeError`. Calling :meth:`write` on a closed ZipFile will raise a
   :exc:`RuntimeError`.

   .. note::

      There is no official file name encoding for ZIP files. If you have Unicode file
      names, you must convert them to byte strings in your desired encoding before
      passing them to :meth:`write`. WinZip interprets all file names as encoded in
      CP437, also known as DOS Latin.

   .. note::

      Archive names should be relative to the archive root, that is, they should not
      start with a path separator.

   .. note::

      If ``arcname`` (or ``filename``, if ``arcname`` is not given) contains a null
      byte, the name of the file in the archive will be truncated at the null byte.

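A short sketch of :meth:`write` with an explicit *arcname*, so that the member
is stored under a relative archive name rather than the path it had on disk.
The temporary directory, file names, and the ``breakfast/`` prefix are
assumptions for illustration.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a small source file to add to the archive.
    source_path = os.path.join(workdir, 'eggs.txt')
    with open(source_path, 'w') as source:
        source.write('eggs')

    archive_path = os.path.join(workdir, 'spam.zip')
    with zipfile.ZipFile(archive_path, 'w') as archive:
        # Store the file under a relative archive name of our choosing.
        archive.write(source_path, arcname='breakfast/eggs.txt')

    with zipfile.ZipFile(archive_path) as archive:
        stored_names = archive.namelist()
```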

.. method:: ZipFile.writestr(zinfo_or_arcname, bytes[, compress_type])

   Write the string *bytes* to the archive; *zinfo_or_arcname* is either the file
   name it will be given in the archive, or a :class:`ZipInfo` instance. If it's
   an instance, at least the filename, date, and time must be given. If it's a
   name, the date and time are set to the current date and time. The archive must be
   opened with mode ``'w'`` or ``'a'`` -- calling :meth:`writestr` on a ZipFile
   created with mode ``'r'`` will raise a :exc:`RuntimeError`. Calling
   :meth:`writestr` on a closed ZipFile will raise a :exc:`RuntimeError`.

   If given, *compress_type* overrides the value given for the *compression*
   parameter to the constructor for the new entry, or in the *zinfo_or_arcname*
   (if that is a :class:`ZipInfo` instance).

   .. note::

      When passing a :class:`ZipInfo` instance as the *zinfo_or_arcname* parameter,
      the compression method used will be that specified in the *compress_type*
      member of the given :class:`ZipInfo` instance. By default, the
      :class:`ZipInfo` constructor sets this member to :const:`ZIP_STORED`.

   .. versionchanged:: 3.2
      The *compress_type* argument.

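The two forms of *zinfo_or_arcname* can be compared in a small sketch. The
member names and the timestamp are assumptions for the example, and
:const:`ZIP_DEFLATED` assumes that :mod:`zlib` is available.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    # A plain name: the current date and time are recorded.
    archive.writestr('notes.txt', b'remember the milk')

    # A ZipInfo instance: the caller chooses the timestamp, and
    # compress_type overrides the ZIP_STORED default of the instance.
    info = zipfile.ZipInfo('old.txt', date_time=(1999, 12, 31, 23, 59, 58))
    archive.writestr(info, b'x' * 1000, compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(buffer) as archive:
    stored = archive.getinfo('old.txt')
    stored_time = stored.date_time
    stored_method = stored.compress_type
```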

The following data attributes are also available:


.. attribute:: ZipFile.debug

   The level of debug output to use. This may be set from ``0`` (the default, no
   output) to ``3`` (the most output). Debugging information is written to
   ``sys.stdout``.

.. attribute:: ZipFile.comment

   The comment text associated with the ZIP file. If assigning a comment to a
   :class:`ZipFile` instance created with mode ``'a'`` or ``'w'``, this should be a
   string no longer than 65535 bytes. Comments longer than this will be
   truncated in the written archive when :meth:`ZipFile.close` is called.

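A minimal sketch of setting and reading back the archive comment. It assumes
Python 3, where the comment is handled as a bytes object; the comment text
itself is just an example value.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('a.txt', b'alpha')
    # The comment is written out when the archive is closed.
    archive.comment = b'nightly build'

with zipfile.ZipFile(buffer) as archive:
    stored_comment = archive.comment
```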

.. _pyzipfile-objects:

PyZipFile Objects
-----------------

The :class:`PyZipFile` constructor takes the same parameters as the
:class:`ZipFile` constructor, and one additional parameter, *optimize*.

.. class:: PyZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=False, \
                     optimize=-1)

   .. versionadded:: 3.2
      The *optimize* parameter.

   Instances have one method in addition to those of :class:`ZipFile` objects:

   .. method:: PyZipFile.writepy(pathname, basename='')

      Search for files :file:`\*.py` and add the corresponding file to the
      archive.

      If the *optimize* parameter to :class:`PyZipFile` was not given or ``-1``,
      the corresponding file is a :file:`\*.pyo` file if available, else a
      :file:`\*.pyc` file, compiling if necessary.

      If the *optimize* parameter to :class:`PyZipFile` was ``0``, ``1`` or
      ``2``, only files with that optimization level (see :func:`compile`) are
      added to the archive, compiling if necessary.

      If the pathname is a file, the filename must end with :file:`.py`, and
      just the (corresponding :file:`\*.py[co]`) file is added at the top level
      (no path information). If the pathname is a file that does not end with
      :file:`.py`, a :exc:`RuntimeError` will be raised. If it is a directory,
      and the directory is not a package directory, then all the files
      :file:`\*.py[co]` are added at the top level. If the directory is a
      package directory, then all :file:`\*.py[co]` are added under the package
      name as a file path, and if any subdirectories are package directories,
      all of these are added recursively. *basename* is intended for internal
      use only. The :meth:`writepy` method makes archives with file names like
      this::

         string.pyc                   # Top level name
         test/__init__.pyc            # Package directory
         test/testall.pyc             # Module test.testall
         test/bogus/__init__.pyc      # Subpackage directory
         test/bogus/myfile.pyc        # Submodule test.bogus.myfile

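As a rough sketch of the single-file case, the following writes a small module
to a temporary directory and adds it with :meth:`writepy`. The module name
``greeting.py`` and its contents are assumptions for the example, and the exact
suffix of the stored file depends on the Python version and optimization level.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # A tiny module to compile and archive.
    module_path = os.path.join(workdir, 'greeting.py')
    with open(module_path, 'w') as module:
        module.write('MESSAGE = "hello"\n')

    archive_path = os.path.join(workdir, 'library.zip')
    with zipfile.PyZipFile(archive_path, 'w') as archive:
        # The compiled file is added at the top level, without path information.
        archive.writepy(module_path)
        compiled_names = archive.namelist()
```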

.. _zipinfo-objects:

ZipInfo Objects
---------------

Instances of the :class:`ZipInfo` class are returned by the :meth:`getinfo` and
:meth:`infolist` methods of :class:`ZipFile` objects. Each object stores
information about a single member of the ZIP archive.

Instances have the following attributes:


.. attribute:: ZipInfo.filename

   Name of the file in the archive.


.. attribute:: ZipInfo.date_time

   The time and date of the last modification to the archive member. This is a
   tuple of six values:

   +-------+--------------------------+
   | Index | Value                    |
   +=======+==========================+
   | ``0`` | Year                     |
   +-------+--------------------------+
   | ``1`` | Month (one-based)        |
   +-------+--------------------------+
   | ``2`` | Day of month (one-based) |
   +-------+--------------------------+
   | ``3`` | Hours (zero-based)       |
   +-------+--------------------------+
   | ``4`` | Minutes (zero-based)     |
   +-------+--------------------------+
   | ``5`` | Seconds (zero-based)     |
   +-------+--------------------------+

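Because the six fields are in the same order as the leading arguments of
:class:`datetime.datetime`, the tuple can be unpacked directly into one. This
is a small sketch; the file name and timestamp are example values.

```python
import datetime
import zipfile

info = zipfile.ZipInfo('report.txt', date_time=(2011, 8, 19, 0, 44, 30))

# Unpack (year, month, day, hours, minutes, seconds) into a datetime.
modified = datetime.datetime(*info.date_time)
```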

.. attribute:: ZipInfo.compress_type

   Type of compression for the archive member.


.. attribute:: ZipInfo.comment

   Comment for the individual archive member.


.. attribute:: ZipInfo.extra

   Expansion field data. The `PKZIP Application Note
   <http://www.pkware.com/documents/casestudies/APPNOTE.TXT>`_ contains
   some comments on the internal structure of the data contained in this string.


.. attribute:: ZipInfo.create_system

   System which created ZIP archive.


.. attribute:: ZipInfo.create_version

   PKZIP version which created ZIP archive.


.. attribute:: ZipInfo.extract_version

   PKZIP version needed to extract archive.


.. attribute:: ZipInfo.reserved

   Must be zero.


.. attribute:: ZipInfo.flag_bits

   ZIP flag bits.


.. attribute:: ZipInfo.volume

   Volume number of file header.


.. attribute:: ZipInfo.internal_attr

   Internal attributes.


.. attribute:: ZipInfo.external_attr

   External file attributes.


.. attribute:: ZipInfo.header_offset

   Byte offset to the file header.


.. attribute:: ZipInfo.CRC

   CRC-32 of the uncompressed file.


.. attribute:: ZipInfo.compress_size

   Size of the compressed data.


.. attribute:: ZipInfo.file_size

   Size of the uncompressed file.

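The size and checksum attributes can be used together, for example to compute a
compression ratio or to cross-check the stored CRC-32. The sketch below assumes
:mod:`zlib` is available; the member name and payload are example values.

```python
import io
import zipfile
import zlib

payload = b'spam and eggs ' * 200
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr('menu.txt', payload)

with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo('menu.txt')

    # Fraction of the original size that the compressed data occupies.
    ratio = info.compress_size / info.file_size

    # The stored CRC is the CRC-32 of the uncompressed bytes.
    crc_matches = info.CRC == (zlib.crc32(payload) & 0xFFFFFFFF)
```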