:mod:`zipfile` --- Work with ZIP archives
=========================================

.. module:: zipfile
   :synopsis: Read and write ZIP-format archive files.
.. moduleauthor:: James C. Ahlstrom <jim@interet.com>
.. sectionauthor:: James C. Ahlstrom <jim@interet.com>

The ZIP file format is a common archive and compression standard. This module
provides tools to create, read, write, append to, and list the contents of a
ZIP file.  Any advanced use of this module will require an understanding of the
format, as defined in `PKZIP Application Note
<http://www.pkware.com/documents/casestudies/APPNOTE.TXT>`_.

This module does not currently handle multi-disk ZIP files.
It can handle ZIP files that use the ZIP64 extensions
(that is, ZIP files that are more than 4 GByte in size).  It supports
decryption of encrypted files in ZIP archives, but it currently cannot
create an encrypted file.  Decryption is extremely slow as it is
implemented in native Python rather than C.

For other archive formats, see the :mod:`bz2`, :mod:`gzip`, and
:mod:`tarfile` modules.

The module defines the following items:

.. exception:: BadZipFile

   The error raised for bad ZIP files (old name: ``zipfile.error``).

   .. versionadded:: 3.2


.. exception:: BadZipfile

   This is an alias for :exc:`BadZipFile` that exists for compatibility with
   Python versions prior to 3.2.  Usage is deprecated.


.. exception:: LargeZipFile

   The error raised when a ZIP file would require ZIP64 functionality but that
   functionality has not been enabled.


.. class:: ZipFile

   The class for reading and writing ZIP files.  See section
   :ref:`zipfile-objects` for constructor details.


.. class:: PyZipFile

   Class for creating ZIP archives containing Python libraries.


.. class:: ZipInfo(filename='NoName', date_time=(1980,1,1,0,0,0))

   Class used to represent information about a member of an archive.  Instances
   of this class are returned by the :meth:`getinfo` and :meth:`infolist`
   methods of :class:`ZipFile` objects.  Most users of the :mod:`zipfile` module
   will not need to create these, but only use those created by this
   module.  *filename* should be the full name of the archive member, and
   *date_time* should be a tuple containing six fields which describe the time
   of the last modification to the file; the fields are described in section
   :ref:`zipinfo-objects`.


.. function:: is_zipfile(filename)

   Returns ``True`` if *filename* is a valid ZIP file based on its magic number,
   otherwise returns ``False``.  *filename* may also be a file or file-like
   object.

   .. versionchanged:: 3.1
      Support for file and file-like objects.


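As a minimal sketch of the check described for :func:`is_zipfile`, the following
builds a tiny archive in memory and tests it alongside a buffer of arbitrary
bytes.  The member name ``hello.txt`` and the variable names are illustrative
assumptions, not part of the module.

```python
import io
import zipfile

# Build a small archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('hello.txt', b'hello')

# A file-like object holding a real archive is recognised.
buffer.seek(0)
looks_like_zip = zipfile.is_zipfile(buffer)

# Arbitrary bytes do not contain the ZIP magic number.
plain = io.BytesIO(b'this is plain text, not a ZIP archive')
not_a_zip = zipfile.is_zipfile(plain)
```

Note that the check only looks for the archive's magic number; it does not
verify that every member can be read.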
.. data:: ZIP_STORED

   The numeric constant for an uncompressed archive member.


.. data:: ZIP_DEFLATED

   The numeric constant for the usual ZIP compression method.  This requires the
   :mod:`zlib` module.  No other compression methods are currently supported.


.. seealso::

   `PKZIP Application Note <http://www.pkware.com/documents/casestudies/APPNOTE.TXT>`_
      Documentation on the ZIP file format by Phil Katz, the creator of the format and
      algorithms used.

   `Info-ZIP Home Page <http://www.info-zip.org/>`_
      Information about the Info-ZIP project's ZIP archive programs and development
      libraries.


.. _zipfile-objects:

ZipFile Objects
---------------


.. class:: ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=False)

   Open a ZIP file, where *file* can be either a path to a file (a string) or a
   file-like object.  The *mode* parameter should be ``'r'`` to read an existing
   file, ``'w'`` to truncate and write a new file, or ``'a'`` to append to an
   existing file.  If *mode* is ``'a'`` and *file* refers to an existing ZIP
   file, then additional files are added to it.  If *file* does not refer to a
   ZIP file, then a new ZIP archive is appended to the file.  This is meant for
   adding a ZIP archive to another file (such as :file:`python.exe`).  If
   *mode* is ``'a'`` and the file does not exist at all, it is created.
   *compression* is the ZIP compression method to use when writing the archive,
   and should be :const:`ZIP_STORED` or :const:`ZIP_DEFLATED`; unrecognized
   values will cause :exc:`RuntimeError` to be raised.  If :const:`ZIP_DEFLATED`
   is specified but the :mod:`zlib` module is not available, :exc:`RuntimeError`
   is also raised.  The default is :const:`ZIP_STORED`.  If *allowZip64* is
   ``True``, :mod:`zipfile` will create ZIP files that use the ZIP64 extensions
   when the ZIP file is larger than 2 GB.  If it is false (the default),
   :mod:`zipfile` will raise an exception when the ZIP file would require ZIP64
   extensions.  ZIP64 extensions are disabled by default because the default
   :program:`zip` and :program:`unzip` commands on Unix (the InfoZIP utilities)
   don't support these extensions.

   If the file is created with mode ``'a'`` or ``'w'`` and then
   :meth:`close`\ d without adding any files to the archive, the appropriate
   ZIP structures for an empty archive will be written to the file.

   ZipFile is also a context manager and therefore supports the
   :keyword:`with` statement.  In the example, *myzip* is closed after the
   :keyword:`with` statement's suite is finished, even if an exception occurs::

       with ZipFile('spam.zip', 'w') as myzip:
           myzip.write('eggs.txt')

   .. versionadded:: 3.2
      Added the ability to use :class:`ZipFile` as a context manager.


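The three modes can be sketched as follows.  This example uses an in-memory
:class:`io.BytesIO` object as *file* so that it leaves nothing on disk; the
member names are illustrative assumptions.

```python
import io
import zipfile

buffer = io.BytesIO()

# 'w' starts a new archive in the (empty) file-like object.
with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_STORED) as archive:
    archive.writestr('first.txt', b'one')

# 'a' adds members to the archive that already exists in the buffer.
with zipfile.ZipFile(buffer, 'a') as archive:
    archive.writestr('second.txt', b'two')

# 'r' (the default) opens the archive for reading only.
with zipfile.ZipFile(buffer, 'r') as archive:
    names = archive.namelist()
```

Each ``with`` block closes the archive on exit, which is what writes the
central directory records.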
.. method:: ZipFile.close()

   Close the archive file.  You must call :meth:`close` before exiting your program
   or essential records will not be written.


.. method:: ZipFile.getinfo(name)

   Return a :class:`ZipInfo` object with information about the archive member
   *name*.  Calling :meth:`getinfo` for a name not currently contained in the
   archive will raise a :exc:`KeyError`.


.. method:: ZipFile.infolist()

   Return a list containing a :class:`ZipInfo` object for each member of the
   archive.  The objects are in the same order as their entries in the actual ZIP
   file on disk if an existing archive was opened.


.. method:: ZipFile.namelist()

   Return a list of archive members by name.


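A short sketch of the three lookup methods above, assuming an in-memory archive
with two hypothetical members:

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('a.txt', b'alpha')
    archive.writestr('docs/b.txt', b'beta')

with zipfile.ZipFile(buffer) as archive:
    # Member names, in archive order.
    names = archive.namelist()

    # One ZipInfo per member; here we collect the uncompressed sizes.
    sizes = {info.filename: info.file_size for info in archive.infolist()}

    # Asking for a member that is not in the archive raises KeyError.
    try:
        archive.getinfo('missing.txt')
        missing_raises = False
    except KeyError:
        missing_raises = True
```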
.. method:: ZipFile.open(name, mode='r', pwd=None)

   Extract a member from the archive as a file-like object (ZipExtFile).  *name* is
   the name of the file in the archive, or a :class:`ZipInfo` object.  The *mode*
   parameter, if included, must be one of the following: ``'r'`` (the default),
   ``'U'``, or ``'rU'``.  Choosing ``'U'`` or ``'rU'`` will enable universal newline
   support in the read-only object.  *pwd* is the password used for encrypted files.
   Calling :meth:`open` on a closed ZipFile will raise a :exc:`RuntimeError`.

   .. note::

      The file-like object is read-only and provides the following methods:
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`__iter__`,
      :meth:`__next__`.

   .. note::

      If the ZipFile was created by passing in a file-like object as the first
      argument to the constructor, then the object returned by :meth:`.open` shares the
      ZipFile's file pointer.  Under these circumstances, the object returned by
      :meth:`.open` should not be used after any additional operations are performed
      on the ZipFile object.  If the ZipFile was created by passing in a string (the
      filename) as the first argument to the constructor, then :meth:`.open` will
      create a new file object that will be held by the ZipExtFile, allowing it to
      operate independently of the ZipFile.

   .. note::

      The :meth:`open`, :meth:`read` and :meth:`extract` methods can take a filename
      or a :class:`ZipInfo` object.  You will appreciate this when trying to read a
      ZIP file that contains members with duplicate names.


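The following sketch reads a member through :meth:`ZipFile.open`, once by name
and once through its :class:`ZipInfo` object.  The member name ``notes.txt`` is
an illustrative assumption, and the example relies on the returned object
supporting the :keyword:`with` statement, as it does in current Python 3
releases.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('notes.txt', b'alpha\nbeta\n')

with zipfile.ZipFile(buffer) as archive:
    # Open by member name and iterate over the lines (as bytes).
    with archive.open('notes.txt') as member:
        lines = list(member)

    # Open the same member through its ZipInfo object instead.
    info = archive.getinfo('notes.txt')
    with archive.open(info) as member:
        first_line = member.readline()
```

Each member object is fully consumed and closed before the next operation on
the archive, which respects the shared file pointer described in the note above.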
.. method:: ZipFile.extract(member, path=None, pwd=None)

   Extract a member from the archive to the current working directory; *member*
   must be its full name or a :class:`ZipInfo` object.  Its file information is
   extracted as accurately as possible.  *path* specifies a different directory
   to extract to.  *pwd* is the password used for encrypted files.


.. method:: ZipFile.extractall(path=None, members=None, pwd=None)

   Extract all members from the archive to the current working directory.  *path*
   specifies a different directory to extract to.  *members* is optional and must
   be a subset of the list returned by :meth:`namelist`.  *pwd* is the password
   used for encrypted files.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.


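One way to perform the inspection recommended in the warning is to scan the
member names before extracting anything.  The helper ``suspicious_members``
below is hypothetical (it is not part of :mod:`zipfile`) and only a sketch: it
flags the two cases named in the warning and is not a complete defence against
every malicious archive.

```python
import io
import zipfile


def suspicious_members(archive):
    """Return member names that are absolute or contain a '..' component."""
    flagged = []
    for name in archive.namelist():
        normalized = name.replace('\\', '/')
        if normalized.startswith('/') or '..' in normalized.split('/'):
            flagged.append(name)
    return flagged


# Build an archive with one ordinary member and one that tries to escape.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('ok.txt', b'fine')
    archive.writestr('../escape.txt', b'outside')

with zipfile.ZipFile(buffer) as archive:
    flagged = suspicious_members(archive)
    # Only the names that passed the check would be handed to extractall().
    safe_names = [n for n in archive.namelist() if n not in flagged]
```

The resulting ``safe_names`` list can be passed as the *members* argument of
:meth:`extractall`.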
.. method:: ZipFile.printdir()

   Print a table of contents for the archive to ``sys.stdout``.


.. method:: ZipFile.setpassword(pwd)

   Set *pwd* as default password to extract encrypted files.


.. method:: ZipFile.read(name, pwd=None)

   Return the bytes of the file *name* in the archive.  *name* is the name of the
   file in the archive, or a :class:`ZipInfo` object.  The archive must be open for
   read or append.  *pwd* is the password used for encrypted files and, if specified,
   it will override the default password set with :meth:`setpassword`.  Calling
   :meth:`read` on a closed ZipFile will raise a :exc:`RuntimeError`.


.. method:: ZipFile.testzip()

   Read all the files in the archive and check their CRCs and file headers.
   Return the name of the first bad file, or else return ``None``.  Calling
   :meth:`testzip` on a closed ZipFile will raise a :exc:`RuntimeError`.


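A minimal sketch combining :meth:`testzip` and :meth:`read`: the archive is
checked first and a member is read only afterwards.  The member name
``data.bin`` and its contents are illustrative assumptions.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('data.bin', b'\x00\x01\x02\x03')

with zipfile.ZipFile(buffer) as archive:
    # None means every member passed the CRC and header checks.
    first_bad = archive.testzip()

    # read() returns the whole member as a bytes object.
    payload = archive.read('data.bin')
```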
.. method:: ZipFile.write(filename, arcname=None, compress_type=None)

   Write the file named *filename* to the archive, giving it the archive name
   *arcname* (by default, this will be the same as *filename*, but without a drive
   letter and with leading path separators removed).  If given, *compress_type*
   overrides the value given for the *compression* parameter to the constructor for
   the new entry.  The archive must be open with mode ``'w'`` or ``'a'`` -- calling
   :meth:`write` on a ZipFile created with mode ``'r'`` will raise a
   :exc:`RuntimeError`.  Calling :meth:`write` on a closed ZipFile will raise a
   :exc:`RuntimeError`.

   .. note::

      There is no official file name encoding for ZIP files.  If you have Unicode file
      names, you must convert them to byte strings in your desired encoding before
      passing them to :meth:`write`.  WinZip interprets all file names as encoded in
      CP437, also known as DOS Latin.

   .. note::

      Archive names should be relative to the archive root, that is, they should not
      start with a path separator.

   .. note::

      If ``arcname`` (or ``filename``, if ``arcname`` is not given) contains a null
      byte, the name of the file in the archive will be truncated at the null byte.


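The effect of *arcname* can be sketched as follows.  The example writes a
temporary file twice, once under its default archive name and once under an
explicit one; the file names ``eggs.txt``, ``spam.zip`` and ``food/eggs.txt``
are illustrative assumptions.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a small file on disk to add to the archive.
    source = os.path.join(workdir, 'eggs.txt')
    with open(source, 'w') as handle:
        handle.write('eggs')

    archive_path = os.path.join(workdir, 'spam.zip')
    with zipfile.ZipFile(archive_path, 'w') as archive:
        # Default archive name: the path minus any drive letter and
        # leading path separators.
        archive.write(source)
        # Explicit archive name, relative to the archive root.
        archive.write(source, arcname='food/eggs.txt')

    with zipfile.ZipFile(archive_path) as archive:
        stored_names = archive.namelist()
```

Passing an explicit *arcname* avoids recording the full directory layout of the
machine that created the archive.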
.. method:: ZipFile.writestr(zinfo_or_arcname, bytes[, compress_type])

   Write the string *bytes* to the archive; *zinfo_or_arcname* is either the file
   name it will be given in the archive, or a :class:`ZipInfo` instance.  If it's
   an instance, at least the filename, date, and time must be given.  If it's a
   name, the date and time are set to the current date and time.  The archive must be
   opened with mode ``'w'`` or ``'a'`` -- calling :meth:`writestr` on a ZipFile
   created with mode ``'r'`` will raise a :exc:`RuntimeError`.  Calling
   :meth:`writestr` on a closed ZipFile will raise a :exc:`RuntimeError`.

   If given, *compress_type* overrides the value given for the *compression*
   parameter to the constructor for the new entry, or in the *zinfo_or_arcname*
   (if that is a :class:`ZipInfo` instance).

   .. note::

      When passing a :class:`ZipInfo` instance as the *zinfo_or_arcname* parameter,
      the compression method used will be that specified in the *compress_type*
      member of the given :class:`ZipInfo` instance.  By default, the
      :class:`ZipInfo` constructor sets this member to :const:`ZIP_STORED`.

   .. versionchanged:: 3.2
      The *compress_type* argument.

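The two forms of *zinfo_or_arcname* and the *compress_type* override can be
sketched like this.  The member names and the fixed timestamp are illustrative
assumptions; the example falls back to :const:`ZIP_STORED` if :mod:`zlib` is
unavailable.

```python
import io
import zipfile

# Use deflate only when zlib can be imported.
try:
    import zlib  # noqa: F401
    method = zipfile.ZIP_DEFLATED
except ImportError:
    method = zipfile.ZIP_STORED

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    # A plain name: the current date and time are recorded.
    archive.writestr('plain.txt', b'a' * 100)

    # A ZipInfo instance: the caller supplies the name and timestamp.
    info = zipfile.ZipInfo('dated.txt', date_time=(2010, 1, 2, 3, 4, 6))
    archive.writestr(info, b'b' * 100)

    # compress_type overrides the archive-wide compression setting.
    archive.writestr('packed.txt', b'c' * 1000, compress_type=method)

with zipfile.ZipFile(buffer) as archive:
    dated = archive.getinfo('dated.txt')
    packed = archive.getinfo('packed.txt')
```

The timestamp uses an even number of seconds because the ZIP format stores
modification times with two-second resolution.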
The following data attributes are also available:


.. attribute:: ZipFile.debug

   The level of debug output to use.  This may be set from ``0`` (the default, no
   output) to ``3`` (the most output).  Debugging information is written to
   ``sys.stdout``.

.. attribute:: ZipFile.comment

   The comment text associated with the ZIP file.  If assigning a comment to a
   :class:`ZipFile` instance created with mode ``'a'`` or ``'w'``, this should be a
   string no longer than 65535 bytes.  Comments longer than this will be
   truncated in the written archive when :meth:`ZipFile.close` is called.

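A minimal sketch of setting and reading back the archive comment.  It assumes
a current Python 3 release, where the comment is a :class:`bytes` object; the
comment text itself is an arbitrary example.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('a.txt', b'a')
    # The comment is written out when the archive is closed.
    archive.comment = b'nightly build'

with zipfile.ZipFile(buffer) as archive:
    comment = archive.comment
```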
.. _pyzipfile-objects:

PyZipFile Objects
-----------------

The :class:`PyZipFile` constructor takes the same parameters as the
:class:`ZipFile` constructor.  Instances have one method in addition to those of
:class:`ZipFile` objects.


.. method:: PyZipFile.writepy(pathname, basename='')

   Search for files :file:`\*.py` and add the corresponding file to the archive.
   The corresponding file is a :file:`\*.pyo` file if available, else a
   :file:`\*.pyc` file, compiling if necessary.  If the pathname is a file, the
   filename must end with :file:`.py`, and just the (corresponding
   :file:`\*.py[co]`) file is added at the top level (no path information).  If the
   pathname is a file that does not end with :file:`.py`, a :exc:`RuntimeError`
   will be raised.  If it is a directory, and the directory is not a package
   directory, then all the files :file:`\*.py[co]` are added at the top level.  If
   the directory is a package directory, then all :file:`\*.py[co]` are added under
   the package name as a file path, and if any subdirectories are package
   directories, all of these are added recursively.  *basename* is intended for
   internal use only.  The :meth:`writepy` method makes archives with file names
   like this::

      string.pyc                    # Top level name
      test/__init__.pyc             # Package directory
      test/testall.pyc              # Module test.testall
      test/bogus/__init__.pyc       # Subpackage directory
      test/bogus/myfile.pyc         # Submodule test.bogus.myfile


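The single-file case of :meth:`writepy` can be sketched as below.  The module
``greeting.py`` is a hypothetical file created only for the illustration, and
the exact compiled file that ends up in the archive may differ between Python
versions (for instance, recent releases no longer produce :file:`\*.pyo`
files).

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical module used only for this illustration.
    module_path = os.path.join(workdir, 'greeting.py')
    with open(module_path, 'w') as handle:
        handle.write('MESSAGE = "hello"\n')

    archive_path = os.path.join(workdir, 'library.zip')
    with zipfile.PyZipFile(archive_path, 'w') as archive:
        # Compiles greeting.py if necessary and adds the result at the
        # top level of the archive, without any path information.
        archive.writepy(module_path)
        archived_names = archive.namelist()
```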
.. _zipinfo-objects:

ZipInfo Objects
---------------

Instances of the :class:`ZipInfo` class are returned by the :meth:`getinfo` and
:meth:`infolist` methods of :class:`ZipFile` objects.  Each object stores
information about a single member of the ZIP archive.

Instances have the following attributes:


.. attribute:: ZipInfo.filename

   Name of the file in the archive.


.. attribute:: ZipInfo.date_time

   The time and date of the last modification to the archive member.  This is a
   tuple of six values:

   +-------+--------------------------+
   | Index | Value                    |
   +=======+==========================+
   | ``0`` | Year                     |
   +-------+--------------------------+
   | ``1`` | Month (one-based)        |
   +-------+--------------------------+
   | ``2`` | Day of month (one-based) |
   +-------+--------------------------+
   | ``3`` | Hours (zero-based)       |
   +-------+--------------------------+
   | ``4`` | Minutes (zero-based)     |
   +-------+--------------------------+
   | ``5`` | Seconds (zero-based)     |
   +-------+--------------------------+


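Because the six fields appear in the same order as the first six arguments of
:class:`datetime.datetime`, the tuple can be unpacked directly, as in this
sketch.  The member name and timestamp are illustrative assumptions.

```python
import datetime
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    info = zipfile.ZipInfo('report.txt', date_time=(2009, 12, 30, 12, 34, 58))
    archive.writestr(info, b'data')

with zipfile.ZipFile(buffer) as archive:
    stored = archive.getinfo('report.txt')

# Year, month, day, hours, minutes, seconds map straight onto datetime.
modified = datetime.datetime(*stored.date_time)
```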
.. attribute:: ZipInfo.compress_type

   Type of compression for the archive member.


.. attribute:: ZipInfo.comment

   Comment for the individual archive member.


.. attribute:: ZipInfo.extra

   Expansion field data.  The `PKZIP Application Note
   <http://www.pkware.com/documents/casestudies/APPNOTE.TXT>`_ contains
   some comments on the internal structure of the data contained in this string.


.. attribute:: ZipInfo.create_system

   System which created ZIP archive.


.. attribute:: ZipInfo.create_version

   PKZIP version which created ZIP archive.


.. attribute:: ZipInfo.extract_version

   PKZIP version needed to extract archive.


.. attribute:: ZipInfo.reserved

   Must be zero.


.. attribute:: ZipInfo.flag_bits

   ZIP flag bits.


.. attribute:: ZipInfo.volume

   Volume number of file header.


.. attribute:: ZipInfo.internal_attr

   Internal attributes.


.. attribute:: ZipInfo.external_attr

   External file attributes.


.. attribute:: ZipInfo.header_offset

   Byte offset to the file header.


.. attribute:: ZipInfo.CRC

   CRC-32 of the uncompressed file.


.. attribute:: ZipInfo.compress_size

   Size of the compressed data.


.. attribute:: ZipInfo.file_size

   Size of the uncompressed file.

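As a closing sketch, several of the attributes above can be read from a member
written with the default :const:`ZIP_STORED` method.  The member name and
payload are illustrative assumptions; the CRC is compared against
:func:`binascii.crc32` computed over the same bytes.

```python
import binascii
import io
import zipfile

payload = b'zip ' * 256  # 1024 bytes of sample data

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('payload.bin', payload)

with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo('payload.bin')

summary = {
    'filename': info.filename,
    'compress_type': info.compress_type,
    'file_size': info.file_size,          # uncompressed size
    'compress_size': info.compress_size,  # size as stored in the archive
    'CRC': info.CRC,                      # CRC-32 of the uncompressed data
}

# Mask to 32 bits so the value is unsigned on every Python version.
expected_crc = binascii.crc32(payload) & 0xffffffff
```

With :const:`ZIP_STORED` the member is not compressed, so both sizes are equal.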