:mod:`filecmp` --- File and Directory Comparisons
=================================================

.. module:: filecmp
   :synopsis: Compare files efficiently.
.. sectionauthor:: Moshe Zadka <moshez@zadka.site.co.il>

**Source code:** :source:`Lib/filecmp.py`

--------------

The :mod:`filecmp` module defines functions to compare files and directories,
with various optional time/correctness trade-offs. For comparing files,
see also the :mod:`difflib` module.

The :mod:`filecmp` module defines the following functions:


.. function:: cmp(f1, f2, shallow=True)

   Compare the files named *f1* and *f2*, returning ``True`` if they seem equal,
   ``False`` otherwise.

   If *shallow* is true, files with identical :func:`os.stat` signatures (file
   type, size, and modification time) are taken to be equal.  Otherwise, the
   contents of the files are compared.

   This function does not call any external programs, which keeps it portable
   and efficient.

   This function caches the results of past comparisons.  A cache entry is
   invalidated if the :func:`os.stat` information for the file changes.  The
   entire cache may be cleared using :func:`clear_cache`.

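The difference between shallow and content comparison can be seen with two
files that have the same size but different contents. The following is a
minimal sketch, not part of the module: the file names are hypothetical, and
the modification times are forced to be equal with :func:`os.utime` so that the
two :func:`os.stat` signatures match.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")   # hypothetical file names
    b = os.path.join(tmp, "b.txt")

    # Same size, different contents.
    with open(a, "w") as f:
        f.write("spam")
    with open(b, "w") as f:
        f.write("eggs")

    # Force identical (atime, mtime) so the os.stat() signatures are equal.
    os.utime(a, (1_000_000_000, 1_000_000_000))
    os.utime(b, (1_000_000_000, 1_000_000_000))

    shallow_result = filecmp.cmp(a, b)                 # compares signatures only
    deep_result = filecmp.cmp(a, b, shallow=False)     # reads both files

print(shallow_result, deep_result)
```

Under these assumptions the shallow comparison reports the files as equal
while the content comparison does not, which is the time/correctness trade-off
mentioned above.
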

.. function:: cmpfiles(dir1, dir2, common, shallow=True)

   Compare the files in the two directories *dir1* and *dir2* whose names are
   given by *common*.

   Returns three lists of file names: *match*, *mismatch*, and *errors*.
   *match* contains the names of the files that match, *mismatch* contains
   the names of those that don't, and *errors* lists the names of files which
   could not be compared.  Files are listed in *errors* if they don't exist in
   one of the directories, if the user lacks permission to read them, or if the
   comparison could not be done for some other reason.

   The *shallow* parameter has the same meaning and default value as for
   :func:`filecmp.cmp`.

   For example, ``cmpfiles('a', 'b', ['c', 'd/e'])`` will compare ``a/c`` with
   ``b/c`` and ``a/d/e`` with ``b/d/e``.  ``'c'`` and ``'d/e'`` will each be in
   one of the three returned lists.

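As an illustration of the three returned lists, the sketch below builds two
temporary directories and compares three names. The directory and file names,
and the ``write`` helper, are hypothetical and exist only for this example.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Hypothetical helper: create a small text file."""
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    dir1 = os.path.join(tmp, "dir1")
    dir2 = os.path.join(tmp, "dir2")
    os.mkdir(dir1)
    os.mkdir(dir2)

    write(os.path.join(dir1, "same.txt"), "identical")
    write(os.path.join(dir2, "same.txt"), "identical")
    write(os.path.join(dir1, "changed.txt"), "old")
    write(os.path.join(dir2, "changed.txt"), "newer text")
    write(os.path.join(dir1, "missing.txt"), "only in dir1")

    # shallow=False forces a comparison of the file contents.
    match, mismatch, errors = filecmp.cmpfiles(
        dir1, dir2, ["same.txt", "changed.txt", "missing.txt"], shallow=False
    )

print("match:", match)
print("mismatch:", mismatch)
print("errors:", errors)
```

Each name passed in *common* ends up in exactly one list; here
``missing.txt`` lands in *errors* because it does not exist in the second
directory.
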

.. function:: clear_cache()

   Clear the filecmp cache.  This may be useful if a file is compared so soon
   after it is modified that the change falls within the mtime resolution of
   the underlying filesystem.

   .. versionadded:: 3.4

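The situation described for :func:`clear_cache` can be simulated by rewriting
a file without changing its size or modification time. This is only a sketch:
the file names, the ``write`` helper, and the fixed timestamp are assumptions
made for the example, and the timestamp is pinned with :func:`os.utime` to
imitate a modification that the filesystem's mtime resolution cannot detect.

```python
import filecmp
import os
import tempfile

FIXED_TIME = (1_000_000_000, 1_000_000_000)  # arbitrary (atime, mtime) pair


def write(path, text):
    """Hypothetical helper: write text, then pin the file's timestamps."""
    with open(path, "w") as f:
        f.write(text)
    os.utime(path, FIXED_TIME)


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    write(a, "spam")
    write(b, "spam")

    # Contents are read and the result is remembered.
    before = filecmp.cmp(a, b, shallow=False)

    # Same size, same mtime: the os.stat() information does not change,
    # so a cached result for this pair would not be invalidated.
    write(b, "eggs")

    filecmp.clear_cache()
    after = filecmp.cmp(a, b, shallow=False)   # contents are read again

print(before, after)
```

Clearing the cache before the second call ensures that the comparison reflects
the current contents rather than a remembered result.
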

.. _dircmp-objects:

The :class:`dircmp` class
-------------------------

.. class:: dircmp(a, b, ignore=None, hide=None)

   Construct a new directory comparison object, to compare the directories *a*
   and *b*.  *ignore* is a list of names to ignore, and defaults to
   :attr:`filecmp.DEFAULT_IGNORES`.  *hide* is a list of names to hide, and
   defaults to ``[os.curdir, os.pardir]``.

   The :class:`dircmp` class compares files by doing *shallow* comparisons
   as described for :func:`filecmp.cmp`.

   The :class:`dircmp` class provides the following methods:

   .. method:: report()

      Print (to :data:`sys.stdout`) a comparison between *a* and *b*.

   .. method:: report_partial_closure()

      Print a comparison between *a* and *b* and common immediate
      subdirectories.

   .. method:: report_full_closure()

      Print a comparison between *a* and *b* and common subdirectories
      (recursively).

   The :class:`dircmp` class offers a number of attributes that may be
   used to get information about the directory trees being compared.

   Note that via :meth:`__getattr__` hooks, all attributes are computed lazily,
   so there is no speed penalty if only those attributes which are lightweight
   to compute are used.


   .. attribute:: left

      The directory *a*.


   .. attribute:: right

      The directory *b*.


   .. attribute:: left_list

      Files and subdirectories in *a*, filtered by *hide* and *ignore*.


   .. attribute:: right_list

      Files and subdirectories in *b*, filtered by *hide* and *ignore*.


   .. attribute:: common

      Files and subdirectories in both *a* and *b*.


   .. attribute:: left_only

      Files and subdirectories only in *a*.


   .. attribute:: right_only

      Files and subdirectories only in *b*.


   .. attribute:: common_dirs

      Subdirectories in both *a* and *b*.


   .. attribute:: common_files

      Files in both *a* and *b*.


   .. attribute:: common_funny

      Names in both *a* and *b*, such that the type differs between the
      directories, or names for which :func:`os.stat` reports an error.


   .. attribute:: same_files

      Files which are identical in both *a* and *b*, using the class's
      file comparison operator.


   .. attribute:: diff_files

      Files which are in both *a* and *b*, whose contents differ according
      to the class's file comparison operator.


   .. attribute:: funny_files

      Files which are in both *a* and *b*, but could not be compared.


   .. attribute:: subdirs

      A dictionary mapping names in :attr:`common_dirs` to :class:`dircmp`
      objects.

.. attribute:: DEFAULT_IGNORES

   .. versionadded:: 3.4

   List of directories ignored by :class:`dircmp` by default.

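The attributes listed above can be explored on a small pair of directories.
The sketch below is illustrative only: all directory and file names and the
``write`` helper are hypothetical, and ``scratch.tmp`` is added to the *ignore*
list purely to show how it extends :attr:`filecmp.DEFAULT_IGNORES`. Because the
attributes are computed lazily, they are read while the temporary directories
still exist.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Hypothetical helper: create a small text file."""
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    for directory in (left, right):
        os.mkdir(directory)
        os.mkdir(os.path.join(directory, "sub"))     # common subdirectory

    write(os.path.join(left, "same.txt"), "identical")
    write(os.path.join(right, "same.txt"), "identical")
    write(os.path.join(left, "changed.txt"), "old")
    write(os.path.join(right, "changed.txt"), "newer text")
    write(os.path.join(left, "only_left.txt"), "a")
    write(os.path.join(right, "only_right.txt"), "b")
    write(os.path.join(left, "scratch.tmp"), "ignored")

    dcmp = filecmp.dircmp(
        left, right, ignore=filecmp.DEFAULT_IGNORES + ["scratch.tmp"]
    )

    # Read the lazily computed attributes before the directories are removed.
    summary = {
        "left_only": sorted(dcmp.left_only),
        "right_only": sorted(dcmp.right_only),
        "common_dirs": sorted(dcmp.common_dirs),
        "same_files": sorted(dcmp.same_files),
        "diff_files": sorted(dcmp.diff_files),
    }

for name, value in summary.items():
    print(name, value)
```

Note that ``changed.txt`` is reported in ``diff_files`` here because the two
copies differ in size, so their :func:`os.stat` signatures cannot match.
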

Here is a simplified example of using the ``subdirs`` attribute to search
recursively through two directories and show the files that are common to
both but differ::

    >>> from filecmp import dircmp
    >>> def print_diff_files(dcmp):
    ...     for name in dcmp.diff_files:
    ...         print("diff_file %s found in %s and %s" % (name, dcmp.left,
    ...               dcmp.right))
    ...     for sub_dcmp in dcmp.subdirs.values():
    ...         print_diff_files(sub_dcmp)
    ...
    >>> dcmp = dircmp('dir1', 'dir2') # doctest: +SKIP
    >>> print_diff_files(dcmp) # doctest: +SKIP

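Because :meth:`report` prints to :data:`sys.stdout`, its output can be captured
with :func:`contextlib.redirect_stdout` when it is needed as a string. The
following is a minimal sketch under that assumption; the directory and file
names are hypothetical, and the exact layout of the report text is not
specified here.

```python
import contextlib
import filecmp
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")     # hypothetical directory names
    right = os.path.join(tmp, "right")
    os.mkdir(left)
    os.mkdir(right)

    with open(os.path.join(left, "only_left.txt"), "w") as f:
        f.write("present on one side only")

    # Temporarily redirect sys.stdout into an in-memory buffer.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        filecmp.dircmp(left, right).report()
    report_text = buffer.getvalue()

print(report_text)
```

The same approach applies to :meth:`report_partial_closure` and
:meth:`report_full_closure`, which print through the same mechanism.
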