:mod:`zipfile` --- Work with ZIP archives
=========================================

.. module:: zipfile
   :synopsis: Read and write ZIP-format archive files.
.. moduleauthor:: James C. Ahlstrom <jim@interet.com>
.. sectionauthor:: James C. Ahlstrom <jim@interet.com>

The ZIP file format is a common archive and compression standard. This module
provides tools to create, read, write, append, and list a ZIP file. Any
advanced use of this module will require an understanding of the format, as
defined in `PKZIP Application Note
<http://www.pkware.com/documents/casestudies/APPNOTE.TXT>`_.

This module does not currently handle multi-disk ZIP files.
It can handle ZIP files that use the ZIP64 extensions
(that is, ZIP files that are more than 4 GByte in size). It supports
decryption of encrypted files in ZIP archives, but it currently cannot
create an encrypted file. Decryption is extremely slow as it is
implemented in native Python rather than C.

For other archive formats, see the :mod:`bz2`, :mod:`gzip`, and
:mod:`tarfile` modules.

The module defines the following items:

.. exception:: BadZipFile

   The error raised for bad ZIP files (old name: ``zipfile.error``).

   .. versionadded:: 3.2


.. exception:: BadZipfile

   This is an alias for :exc:`BadZipFile` that exists for compatibility with
   Python versions prior to 3.2. Usage is deprecated.


.. exception:: LargeZipFile

   The error raised when a ZIP file would require ZIP64 functionality but that
   functionality has not been enabled.


.. class:: ZipFile
   :noindex:

   The class for reading and writing ZIP files. See section
   :ref:`zipfile-objects` for constructor details.


.. class:: PyZipFile

   Class for creating ZIP archives containing Python libraries.


.. class:: ZipInfo(filename='NoName', date_time=(1980,1,1,0,0,0))

   Class used to represent information about a member of an archive. Instances
   of this class are returned by the :meth:`getinfo` and :meth:`infolist`
   methods of :class:`ZipFile` objects. Most users of the :mod:`zipfile` module
   will not need to create these, but only use those created by this
   module. *filename* should be the full name of the archive member, and
   *date_time* should be a tuple containing six fields which describe the time
   of the last modification to the file; the fields are described in section
   :ref:`zipinfo-objects`.


.. function:: is_zipfile(filename)

   Returns ``True`` if *filename* is a valid ZIP file based on its magic number,
   otherwise returns ``False``. *filename* may be a file or file-like object too.

   .. versionchanged:: 3.1
      Support for file and file-like objects.

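As a minimal sketch of :func:`is_zipfile` applied to file-like objects, the
following builds a small archive in memory and checks it alongside a buffer
that holds plain text. The names ``buffer``, ``looks_like_zip`` and
``not_a_zip`` are illustrative only, and the example assumes a recent Python 3.

```python
import io
import zipfile

# Build a small archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")

# A file-like object holding a real archive is recognized.
looks_like_zip = zipfile.is_zipfile(buffer)

# A buffer without any ZIP structures is rejected.
not_a_zip = zipfile.is_zipfile(
    io.BytesIO(b"this is plain text, not a ZIP archive at all"))
```

Because the check only looks for the ZIP structures, a ``True`` result does not
guarantee that every member can be read without error.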

.. data:: ZIP_STORED

   The numeric constant for an uncompressed archive member.


.. data:: ZIP_DEFLATED

   The numeric constant for the usual ZIP compression method. This requires the
   :mod:`zlib` module. No other compression methods are currently supported.

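To illustrate the difference between the two constants, this sketch writes the
same repetitive payload once with :const:`ZIP_STORED` and once with
:const:`ZIP_DEFLATED` and compares the resulting archive sizes. The helper
``archive_size`` and the payload are made up for the example; the deflated case
assumes :mod:`zlib` is available.

```python
import io
import zipfile

# Highly repetitive data, so it deflates well.
payload = b"spam and eggs " * 1000


def archive_size(method):
    """Return the size in bytes of an in-memory archive written with *method*."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method) as archive:
        archive.writestr("payload.txt", payload)
    return len(buffer.getvalue())


stored_size = archive_size(zipfile.ZIP_STORED)
deflated_size = archive_size(zipfile.ZIP_DEFLATED)  # requires zlib
```

A stored archive is always slightly larger than its payload because of the ZIP
headers, while the deflated one shrinks only as far as the data is compressible.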

.. seealso::

   `PKZIP Application Note <http://www.pkware.com/documents/casestudies/APPNOTE.TXT>`_
      Documentation on the ZIP file format by Phil Katz, the creator of the format and
      algorithms used.

   `Info-ZIP Home Page <http://www.info-zip.org/>`_
      Information about the Info-ZIP project's ZIP archive programs and development
      libraries.


.. _zipfile-objects:

ZipFile Objects
---------------


.. class:: ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=False)

   Open a ZIP file, where *file* can be either a path to a file (a string) or a
   file-like object. The *mode* parameter should be ``'r'`` to read an existing
   file, ``'w'`` to truncate and write a new file, or ``'a'`` to append to an
   existing file. If *mode* is ``'a'`` and *file* refers to an existing ZIP
   file, then additional files are added to it. If *file* does not refer to a
   ZIP file, then a new ZIP archive is appended to the file. This is meant for
   adding a ZIP archive to another file (such as :file:`python.exe`). If
   *mode* is ``'a'`` and the file does not exist at all, it is created.
   *compression* is the ZIP compression method to use when writing the archive,
   and should be :const:`ZIP_STORED` or :const:`ZIP_DEFLATED`; unrecognized
   values will cause :exc:`RuntimeError` to be raised. If :const:`ZIP_DEFLATED`
   is specified but the :mod:`zlib` module is not available, :exc:`RuntimeError`
   is also raised. The default is :const:`ZIP_STORED`. If *allowZip64* is
   ``True``, :mod:`zipfile` will create ZIP files that use the ZIP64 extensions
   when the ZIP file is larger than 2 GB. If it is false (the default),
   :mod:`zipfile` will raise an exception when the ZIP file would require ZIP64
   extensions. ZIP64 extensions are disabled by default because the default
   :program:`zip` and :program:`unzip` commands on Unix (the InfoZIP utilities)
   don't support these extensions.

   If the file is created with mode ``'a'`` or ``'w'`` and then
   :meth:`close`\ d without adding any files to the archive, the appropriate
   ZIP structures for an empty archive will be written to the file.

   ZipFile is also a context manager and therefore supports the
   :keyword:`with` statement. In the example, *myzip* is closed after the
   :keyword:`with` statement's suite is finished, even if an exception occurs::

      with ZipFile('spam.zip', 'w') as myzip:
          myzip.write('eggs.txt')

   .. versionadded:: 3.2
      Added the ability to use :class:`ZipFile` as a context manager.

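The three modes described above can be sketched as follows. The example works
in a temporary directory so that nothing is left behind; the file and member
names are placeholders, and :meth:`writestr` is used only to avoid needing
source files on disk.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "spam.zip")

    # 'w' creates the archive, truncating any existing file.
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("eggs.txt", "first member")

    # 'a' adds to the existing archive instead of replacing it.
    with zipfile.ZipFile(path, "a") as archive:
        archive.writestr("bacon.txt", "second member")

    # 'r' opens the archive read-only.
    with zipfile.ZipFile(path, "r") as archive:
        names = archive.namelist()
```

Each ``with`` block closes the archive on exit, which is what writes the
records a later reader depends on.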

.. method:: ZipFile.close()

   Close the archive file. You must call :meth:`close` before exiting your program
   or essential records will not be written.


.. method:: ZipFile.getinfo(name)

   Return a :class:`ZipInfo` object with information about the archive member
   *name*. Calling :meth:`getinfo` for a name not currently contained in the
   archive will raise a :exc:`KeyError`.


.. method:: ZipFile.infolist()

   Return a list containing a :class:`ZipInfo` object for each member of the
   archive. The objects are in the same order as their entries in the actual ZIP
   file on disk if an existing archive was opened.


.. method:: ZipFile.namelist()

   Return a list of archive members by name.

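A short sketch of the three inspection methods, using an in-memory archive
with two made-up members. It also shows the :exc:`KeyError` raised by
:meth:`getinfo` for a name that is not in the archive.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.txt", "alpha")
    archive.writestr("b.txt", "bravo!")

with zipfile.ZipFile(buffer) as archive:
    # Member names, in archive order.
    names = archive.namelist()

    # One ZipInfo object per member; here we collect the uncompressed sizes.
    sizes = {info.filename: info.file_size for info in archive.infolist()}

    # Look up a single member by name.
    a_info = archive.getinfo("a.txt")

    # A name that is not in the archive raises KeyError.
    try:
        archive.getinfo("missing.txt")
    except KeyError:
        missing_raises_key_error = True
    else:
        missing_raises_key_error = False
```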

.. method:: ZipFile.open(name, mode='r', pwd=None)

   Extract a member from the archive as a file-like object (ZipExtFile). *name* is
   the name of the file in the archive, or a :class:`ZipInfo` object. The *mode*
   parameter, if included, must be one of the following: ``'r'`` (the default),
   ``'U'``, or ``'rU'``. Choosing ``'U'`` or ``'rU'`` will enable universal newline
   support in the read-only object. *pwd* is the password used for encrypted files.
   Calling :meth:`open` on a closed ZipFile will raise a :exc:`RuntimeError`.

   .. note::

      The file-like object is read-only and provides the following methods:
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`__iter__`,
      :meth:`__next__`.

   .. note::

      If the ZipFile was created by passing in a file-like object as the first
      argument to the constructor, then the object returned by :meth:`.open` shares the
      ZipFile's file pointer. Under these circumstances, the object returned by
      :meth:`.open` should not be used after any additional operations are performed
      on the ZipFile object. If the ZipFile was created by passing in a string (the
      filename) as the first argument to the constructor, then :meth:`.open` will
      create a new file object that will be held by the ZipExtFile, allowing it to
      operate independently of the ZipFile.

   .. note::

      The :meth:`open`, :meth:`read` and :meth:`extract` methods can take a filename
      or a :class:`ZipInfo` object. This is useful when reading a ZIP file that
      contains members with duplicate names.

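The following sketch reads a member line by line through the file-like object
returned by :meth:`open`. The member name and contents are invented for the
example; note that the object yields bytes rather than text.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("log.txt", "line one\nline two\nline three\n")

with zipfile.ZipFile(buffer) as archive:
    with archive.open("log.txt") as member:
        # readline() returns the next line as bytes, newline included.
        first_line = member.readline()
        # Iterating continues from the current position.
        remaining = [line for line in member]
```

Reading through :meth:`open` avoids loading a large member into memory at
once, which :meth:`read` would do.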

.. method:: ZipFile.extract(member, path=None, pwd=None)

   Extract a member from the archive to the current working directory; *member*
   must be its full name or a :class:`ZipInfo` object. Its file information is
   extracted as accurately as possible. *path* specifies a different directory
   to extract to. *pwd* is the password used for encrypted files.


.. method:: ZipFile.extractall(path=None, members=None, pwd=None)

   Extract all members from the archive to the current working directory. *path*
   specifies a different directory to extract to. *members* is optional and must
   be a subset of the list returned by :meth:`namelist`. *pwd* is the password
   used for encrypted files.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

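One way to carry out the inspection recommended in the warning is to scan the
member names before extracting anything. The helper ``suspicious_members``
below is hypothetical and not part of :mod:`zipfile`; it is a deliberately
conservative sketch that flags only the two cases named above and does not
cover other platform-specific forms such as Windows drive letters.

```python
import io
import zipfile


def suspicious_members(archive):
    """Return member names that are absolute or contain a '..' component."""
    flagged = []
    for name in archive.namelist():
        normalized = name.replace("\\", "/")
        parts = normalized.split("/")
        if normalized.startswith("/") or ".." in parts:
            flagged.append(name)
    return flagged


# Build a test archive containing one harmless and two risky member names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docs/readme.txt", "fine")
    archive.writestr("../escape.txt", "tries to climb out of the target")
    archive.writestr("/etc/absolute.txt", "absolute path")

with zipfile.ZipFile(buffer) as archive:
    flagged = suspicious_members(archive)
    # Only call archive.extractall(...) when nothing was flagged.
```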

.. method:: ZipFile.printdir()

   Print a table of contents for the archive to ``sys.stdout``.


.. method:: ZipFile.setpassword(pwd)

   Set *pwd* as default password to extract encrypted files.


.. method:: ZipFile.read(name, pwd=None)

   Return the bytes of the file *name* in the archive. *name* is the name of the
   file in the archive, or a :class:`ZipInfo` object. The archive must be open for
   read or append. *pwd* is the password used for encrypted files and, if specified,
   it will override the default password set with :meth:`setpassword`. Calling
   :meth:`read` on a closed ZipFile will raise a :exc:`RuntimeError`.

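A minimal sketch of :meth:`read`, showing that it accepts either a member name
or a :class:`ZipInfo` object and returns bytes that the caller decodes. The
member name and the UTF-8 encoding are assumptions of the example.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("greeting.txt", "hello")

with zipfile.ZipFile(buffer) as archive:
    # Read by name; the result is bytes, not str.
    raw = archive.read("greeting.txt")
    # Read the same member through its ZipInfo object.
    via_info = archive.read(archive.getinfo("greeting.txt"))
    text = raw.decode("utf-8")
```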

.. method:: ZipFile.testzip()

   Read all the files in the archive and check their CRCs and file headers.
   Return the name of the first bad file, or else return ``None``. Calling
   :meth:`testzip` on a closed ZipFile will raise a :exc:`RuntimeError`.

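To show both outcomes of :meth:`testzip`, this sketch checks an intact
in-memory archive and then a copy whose stored member data has been altered
without updating the recorded CRC. The corruption step is contrived for the
example and relies on the member being written uncompressed.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:  # ZIP_STORED by default
    archive.writestr("good.txt", "intact data")

# An intact archive has no bad members.
with zipfile.ZipFile(buffer) as archive:
    first_bad = archive.testzip()

# Overwrite part of the stored data; the recorded CRC no longer matches.
corrupted = buffer.getvalue().replace(b"intact", b"broken")
with zipfile.ZipFile(io.BytesIO(corrupted)) as archive:
    first_bad_after_damage = archive.testzip()
```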

.. method:: ZipFile.write(filename, arcname=None, compress_type=None)

   Write the file named *filename* to the archive, giving it the archive name
   *arcname* (by default, this will be the same as *filename*, but without a drive
   letter and with leading path separators removed). If given, *compress_type*
   overrides the value given for the *compression* parameter to the constructor for
   the new entry. The archive must be open with mode ``'w'`` or ``'a'``; calling
   :meth:`write` on a ZipFile created with mode ``'r'`` will raise a
   :exc:`RuntimeError`. Calling :meth:`write` on a closed ZipFile will raise a
   :exc:`RuntimeError`.

   .. note::

      There is no official file name encoding for ZIP files. If you have Unicode file
      names, you must convert them to byte strings in your desired encoding before
      passing them to :meth:`write`. WinZip interprets all file names as encoded in
      CP437, also known as DOS Latin.

   .. note::

      Archive names should be relative to the archive root, that is, they should not
      start with a path separator.

   .. note::

      If ``arcname`` (or ``filename``, if ``arcname`` is not given) contains a null
      byte, the name of the file in the archive will be truncated at the null byte.

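The *arcname* and *compress_type* parameters of :meth:`write` can be sketched
as follows. The example creates a throwaway source file in a temporary
directory; all paths and names are placeholders.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "eggs.txt")
    with open(source, "w") as handle:
        handle.write("eggs " * 200)

    archive_path = os.path.join(workdir, "spam.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        # Store the file under a relative archive name, not its absolute path.
        archive.write(source, arcname="data/eggs.txt")
        # Override the constructor's ZIP_STORED default for this member only.
        archive.write(source, arcname="data/eggs-deflated.txt",
                      compress_type=zipfile.ZIP_DEFLATED)

    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        methods = [info.compress_type for info in archive.infolist()]
```

Passing an explicit *arcname* keeps the archive layout independent of where the
source files happen to live on the machine that built it.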

.. method:: ZipFile.writestr(zinfo_or_arcname, bytes[, compress_type])

   Write the string *bytes* to the archive; *zinfo_or_arcname* is either the file
   name it will be given in the archive, or a :class:`ZipInfo` instance. If it's
   an instance, at least the filename, date, and time must be given. If it's a
   name, the date and time are set to the current date and time. The archive must be
   opened with mode ``'w'`` or ``'a'``; calling :meth:`writestr` on a ZipFile
   created with mode ``'r'`` will raise a :exc:`RuntimeError`. Calling
   :meth:`writestr` on a closed ZipFile will raise a :exc:`RuntimeError`.

   If given, *compress_type* overrides the value given for the *compression*
   parameter to the constructor for the new entry, or in the *zinfo_or_arcname*
   (if that is a :class:`ZipInfo` instance).

   .. note::

      When passing a :class:`ZipInfo` instance as the *zinfo_or_arcname* parameter,
      the compression method used will be that specified in the *compress_type*
      member of the given :class:`ZipInfo` instance. By default, the
      :class:`ZipInfo` constructor sets this member to :const:`ZIP_STORED`.

   .. versionchanged:: 3.2
      The *compress_type* argument.

The following data attributes are also available:


.. attribute:: ZipFile.debug

   The level of debug output to use. This may be set from ``0`` (the default, no
   output) to ``3`` (the most output). Debugging information is written to
   ``sys.stdout``.

.. attribute:: ZipFile.comment

   The comment text associated with the ZIP file. If assigning a comment to a
   :class:`ZipFile` instance created with mode ``'a'`` or ``'w'``, this should be a
   string no longer than 65535 bytes. Comments longer than this will be
   truncated in the written archive when :meth:`ZipFile.close` is called.

.. _pyzipfile-objects:

PyZipFile Objects
-----------------

The :class:`PyZipFile` constructor takes the same parameters as the
:class:`ZipFile` constructor. Instances have one method in addition to those of
:class:`ZipFile` objects.


.. method:: PyZipFile.writepy(pathname, basename='')

   Search for files :file:`\*.py` and add the corresponding file to the archive.
   The corresponding file is a :file:`\*.pyo` file if available, else a
   :file:`\*.pyc` file, compiling if necessary. If the pathname is a file, the
   filename must end with :file:`.py`, and just the (corresponding
   :file:`\*.py[co]`) file is added at the top level (no path information). If the
   pathname is a file that does not end with :file:`.py`, a :exc:`RuntimeError`
   will be raised. If it is a directory, and the directory is not a package
   directory, then all the files :file:`\*.py[co]` are added at the top level. If
   the directory is a package directory, then all :file:`\*.py[co]` are added under
   the package name as a file path, and if any subdirectories are package
   directories, all of these are added recursively. *basename* is intended for
   internal use only. The :meth:`writepy` method makes archives with file names
   like this::

      string.pyc                                # Top level name
      test/__init__.pyc                         # Package directory
      test/testall.pyc                          # Module test.testall
      test/bogus/__init__.pyc                   # Subpackage directory
      test/bogus/myfile.pyc                     # Submodule test.bogus.myfile

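As a sketch of the single-file case, the following writes a one-line module to
a temporary directory and adds it with :meth:`writepy`. The module name
``hello_mod`` is made up, and the example assumes a recent Python 3, where the
compiled file is always a :file:`.pyc` file.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    module_path = os.path.join(workdir, "hello_mod.py")
    with open(module_path, "w") as handle:
        handle.write("GREETING = 'hello'\n")

    archive_path = os.path.join(workdir, "library.zip")
    with zipfile.PyZipFile(archive_path, "w") as archive:
        # Compiles the module if no bytecode exists yet, then adds the
        # compiled file at the top level of the archive.
        archive.writepy(module_path)
        names = archive.namelist()
```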

.. _zipinfo-objects:

ZipInfo Objects
---------------

Instances of the :class:`ZipInfo` class are returned by the :meth:`getinfo` and
:meth:`infolist` methods of :class:`ZipFile` objects. Each object stores
information about a single member of the ZIP archive.

Instances have the following attributes:


.. attribute:: ZipInfo.filename

   Name of the file in the archive.


.. attribute:: ZipInfo.date_time

   The time and date of the last modification to the archive member. This is a
   tuple of six values:

   +-------+--------------------------+
   | Index | Value                    |
   +=======+==========================+
   | ``0`` | Year                     |
   +-------+--------------------------+
   | ``1`` | Month (one-based)        |
   +-------+--------------------------+
   | ``2`` | Day of month (one-based) |
   +-------+--------------------------+
   | ``3`` | Hours (zero-based)       |
   +-------+--------------------------+
   | ``4`` | Minutes (zero-based)     |
   +-------+--------------------------+
   | ``5`` | Seconds (zero-based)     |
   +-------+--------------------------+

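The six-field tuple maps directly onto the arguments of
:class:`datetime.datetime`. This sketch builds a :class:`ZipInfo` with an
explicit *date_time*, writes it with :meth:`ZipFile.writestr`, and converts the
value read back from the archive. The member name and timestamp are arbitrary;
an even number of seconds is used because the ZIP format stores seconds with
two-second resolution.

```python
import datetime
import io
import zipfile

# Fields: year, month, day, hours, minutes, seconds.
info = zipfile.ZipInfo("notes.txt", date_time=(2010, 2, 7, 20, 24, 2))
# The constructor default is ZIP_STORED; choose deflate for this member.
info.compress_type = zipfile.ZIP_DEFLATED

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(info, "remember the milk")

with zipfile.ZipFile(buffer) as archive:
    stored = archive.getinfo("notes.txt")
    # Unpack the tuple into a datetime for easier comparison and formatting.
    modified = datetime.datetime(*stored.date_time)
```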

.. attribute:: ZipInfo.compress_type

   Type of compression for the archive member.


.. attribute:: ZipInfo.comment

   Comment for the individual archive member.


.. attribute:: ZipInfo.extra

   Expansion field data. The `PKZIP Application Note
   <http://www.pkware.com/documents/casestudies/APPNOTE.TXT>`_ contains
   some comments on the internal structure of the data contained in this string.


.. attribute:: ZipInfo.create_system

   System which created the ZIP archive.


.. attribute:: ZipInfo.create_version

   PKZIP version which created the ZIP archive.


.. attribute:: ZipInfo.extract_version

   PKZIP version needed to extract the archive.


.. attribute:: ZipInfo.reserved

   Must be zero.


.. attribute:: ZipInfo.flag_bits

   ZIP flag bits.


.. attribute:: ZipInfo.volume

   Volume number of file header.


.. attribute:: ZipInfo.internal_attr

   Internal attributes.


.. attribute:: ZipInfo.external_attr

   External file attributes.


.. attribute:: ZipInfo.header_offset

   Byte offset to the file header.


.. attribute:: ZipInfo.CRC

   CRC-32 of the uncompressed file.


.. attribute:: ZipInfo.compress_size

   Size of the compressed data.


.. attribute:: ZipInfo.file_size

   Size of the uncompressed file.

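The size and checksum attributes can be combined to report a compression ratio
and to cross-check the recorded CRC. In this sketch the payload and member name
are invented, and the CRC is recomputed with :func:`zlib.crc32` on the original
data.

```python
import io
import zipfile
import zlib

payload = b"0123456789" * 500

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("digits.txt", payload)

with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("digits.txt")
    # Fraction of the original size that the compressed data occupies.
    ratio = info.compress_size / info.file_size
    # The stored CRC-32 is computed over the uncompressed data.
    crc_matches = info.CRC == (zlib.crc32(payload) & 0xFFFFFFFF)
```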