:mod:`filecmp` --- File and Directory Comparisons
=================================================

.. module:: filecmp
   :synopsis: Compare files efficiently.

.. sectionauthor:: Moshe Zadka <moshez@zadka.site.co.il>

**Source code:** :source:`Lib/filecmp.py`

--------------

The :mod:`filecmp` module defines functions to compare files and directories,
with various optional time/correctness trade-offs. To see how the contents of
two files differ, see also the :mod:`difflib` module.

The :mod:`filecmp` module defines the following functions:


.. function:: cmp(f1, f2, shallow=True)

   Compare the files named *f1* and *f2*, returning ``True`` if they seem equal,
   ``False`` otherwise.

   If *shallow* is true and the :func:`os.stat` signatures (file type, size, and
   modification time) of both files are identical, the files are taken to be
   equal. Otherwise, the files are treated as different if their sizes or
   contents differ.

   Note that no external programs are called from this function, giving it
   portability and efficiency.

   This function caches the results of past comparisons; a cache entry is
   invalidated if the :func:`os.stat` information for either file changes.
   The entire cache may be cleared using :func:`clear_cache`.

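The following sketch illustrates the difference between shallow and deep
comparison. It is only an illustration: the file names are hypothetical, and
:func:`os.utime` is used to give two files with different contents the same
size and modification time, which is exactly the situation in which a shallow
comparison can be fooled.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, used only for this illustration.
    same_a = os.path.join(tmp, "same_a.txt")
    same_b = os.path.join(tmp, "same_b.txt")
    odd = os.path.join(tmp, "odd.txt")
    for path, text in [(same_a, "spam\n"), (same_b, "spam\n"), (odd, "eggs\n")]:
        with open(path, "w") as f:
            f.write(text)

    # odd.txt already has the same size as same_a.txt; copying the
    # timestamps makes their os.stat() signatures (type, size, mtime) identical.
    st = os.stat(same_a)
    os.utime(odd, ns=(st.st_atime_ns, st.st_mtime_ns))

    equal_contents = filecmp.cmp(same_a, same_b, shallow=False)  # contents match
    fooled_shallow = filecmp.cmp(same_a, odd)  # signatures match, contents not read
    deep = filecmp.cmp(same_a, odd, shallow=False)  # contents are read and differ
```

A shallow comparison is fast because it trusts that any change in contents is
accompanied by a change in size or modification time; pass ``shallow=False``
when that assumption cannot be relied on.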

.. function:: cmpfiles(dir1, dir2, common, shallow=True)

   Compare the files in the two directories *dir1* and *dir2* whose names are
   given by *common*.

   Return three lists of file names: *match*, *mismatch*, *errors*. *match*
   contains the names of files that match, *mismatch* contains the names of
   those that don't, and *errors* lists the names of files which could not be
   compared. Files are listed in *errors* if they don't exist in one of the
   directories, if the user lacks permission to read them, or if the comparison
   could not be done for some other reason.

   The *shallow* parameter has the same meaning and default value as for
   :func:`filecmp.cmp`.

   For example, ``cmpfiles('a', 'b', ['c', 'd/e'])`` will compare ``a/c`` with
   ``b/c`` and ``a/d/e`` with ``b/d/e``. ``'c'`` and ``'d/e'`` will each be in
   one of the three returned lists.

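A minimal, self-contained sketch of the example above, building the
directories ``a`` and ``b`` in a temporary location. The extra name
``only_left`` and the helper ``write`` are hypothetical; because
``only_left`` exists only in ``a``, it ends up in *errors*.

```python
import filecmp
import os
import tempfile

def write(path, text):
    """Hypothetical helper: create parent directories and write *text*."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "a")
    right = os.path.join(tmp, "b")
    write(os.path.join(left, "c"), "same\n")
    write(os.path.join(right, "c"), "same\n")
    write(os.path.join(left, "d", "e"), "short\n")
    write(os.path.join(right, "d", "e"), "a longer line\n")
    write(os.path.join(left, "only_left"), "x\n")  # missing from "b"

    match, mismatch, errors = filecmp.cmpfiles(
        left, right, ["c", "d/e", "only_left"], shallow=False)
```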

.. function:: clear_cache()

   Clear the filecmp cache. This may be useful if a file is compared so quickly
   after it is modified that the change falls within the mtime resolution of
   the underlying filesystem, in which case a stale cached result may be
   returned.

   .. versionadded:: 3.4

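The sketch below shows how a stale cached result can arise and how
:func:`clear_cache` discards it. As an assumption for illustration,
:func:`os.utime` restores the old timestamps of a rewritten file to simulate a
filesystem whose mtime resolution is too coarse to record the change; the file
names are hypothetical.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    for path in (a, b):
        with open(path, "w") as f:
            f.write("hello\n")
    before = os.stat(b)

    first = filecmp.cmp(a, b, shallow=False)  # contents compared, result cached

    # Rewrite b.txt with different contents of the same size, then put its
    # old timestamps back so its os.stat() signature looks unchanged.
    with open(b, "w") as f:
        f.write("jello\n")
    os.utime(b, ns=(before.st_atime_ns, before.st_mtime_ns))

    stale = filecmp.cmp(a, b, shallow=False)  # cache entry still considered valid
    filecmp.clear_cache()
    fresh = filecmp.cmp(a, b, shallow=False)  # contents re-read after clearing
```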

.. _dircmp-objects:

The :class:`dircmp` class
-------------------------

.. class:: dircmp(a, b, ignore=None, hide=None)

   Construct a new directory comparison object to compare the directories *a*
   and *b*. *ignore* is a list of names to ignore, and defaults to
   :attr:`filecmp.DEFAULT_IGNORES`. *hide* is a list of names to hide, and
   defaults to ``[os.curdir, os.pardir]``.

   The :class:`dircmp` class compares files by doing *shallow* comparisons
   as described for :func:`filecmp.cmp`.

   The :class:`dircmp` class provides the following methods:

   .. method:: report()

      Print (to :data:`sys.stdout`) a comparison between *a* and *b*.

   .. method:: report_partial_closure()

      Print a comparison between *a* and *b* and common immediate
      subdirectories.

   .. method:: report_full_closure()

      Print a comparison between *a* and *b* and common subdirectories
      (recursively).

   The :class:`dircmp` class offers a number of attributes that describe the
   contents of the two directory trees and the differences between them.

   Note that, via :meth:`__getattr__` hooks, all attributes are computed lazily,
   so there is no speed penalty if only the attributes that are cheap to
   compute are used.


   .. attribute:: left

      The directory *a*.


   .. attribute:: right

      The directory *b*.


   .. attribute:: left_list

      Files and subdirectories in *a*, filtered by *hide* and *ignore*.


   .. attribute:: right_list

      Files and subdirectories in *b*, filtered by *hide* and *ignore*.


   .. attribute:: common

      Files and subdirectories in both *a* and *b*.


   .. attribute:: left_only

      Files and subdirectories only in *a*.


   .. attribute:: right_only

      Files and subdirectories only in *b*.


   .. attribute:: common_dirs

      Subdirectories in both *a* and *b*.


   .. attribute:: common_files

      Files in both *a* and *b*.


   .. attribute:: common_funny

      Names in both *a* and *b*, such that the type differs between the
      directories, or names for which :func:`os.stat` reports an error.


   .. attribute:: same_files

      Files which are identical in both *a* and *b*, according to the class's
      file comparison operator.


   .. attribute:: diff_files

      Files which are in both *a* and *b*, whose contents differ according
      to the class's file comparison operator.


   .. attribute:: funny_files

      Files which are in both *a* and *b*, but could not be compared.


   .. attribute:: subdirs

      A dictionary mapping names in :attr:`common_dirs` to :class:`dircmp`
      objects.

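As a sketch of how the methods and attributes above fit together, the
following builds two small trees in a temporary directory (all file names and
the ``write`` helper are hypothetical), reads a few attributes, each of which
is computed on first access, and captures the output of :meth:`dircmp.report`
with :func:`contextlib.redirect_stdout`. The report format is informal, so
code that needs the results should read the attributes instead.

```python
import contextlib
import filecmp
import io
import os
import tempfile

def write(path, text):
    """Hypothetical helper: create parent directories and write *text*."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a")
    b = os.path.join(tmp, "b")
    write(os.path.join(a, "same.txt"), "unchanged\n")
    write(os.path.join(b, "same.txt"), "unchanged\n")
    write(os.path.join(a, "changed.txt"), "v1\n")
    write(os.path.join(b, "changed.txt"), "version 2\n")  # different size
    write(os.path.join(a, "old.txt"), "removed\n")  # only in a
    write(os.path.join(b, "new.txt"), "added\n")  # only in b
    write(os.path.join(a, "pkg", "mod.py"), "x = 1\n")
    write(os.path.join(b, "pkg", "mod.py"), "x = 1\n")

    dcmp = filecmp.dircmp(a, b)
    left_only = dcmp.left_only
    right_only = dcmp.right_only
    same = sorted(dcmp.same_files)
    diff = dcmp.diff_files
    sub_names = sorted(dcmp.subdirs)  # keys of the subdirs mapping

    # report() prints to sys.stdout; capture it instead.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        dcmp.report()
    report_text = buf.getvalue()
```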
.. attribute:: DEFAULT_IGNORES

   .. versionadded:: 3.4

   List of names (such as version-control metadata directories) that
   :class:`dircmp` ignores by default.

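Passing *ignore* to :class:`dircmp` replaces the default list rather than
extending it, so to ignore an extra name while keeping the defaults,
concatenate it with :attr:`DEFAULT_IGNORES`. A minimal sketch, where ``build``
and ``src`` are hypothetical directory names:

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a")
    b = os.path.join(tmp, "b")
    # __pycache__ is in DEFAULT_IGNORES; "build" is not.
    for name in ("__pycache__", "build", "src"):
        os.makedirs(os.path.join(a, name))
    os.makedirs(os.path.join(b, "src"))

    default = filecmp.dircmp(a, b)
    custom = filecmp.dircmp(a, b, ignore=filecmp.DEFAULT_IGNORES + ["build"])

    default_left_only = sorted(default.left_only)  # __pycache__ already hidden
    custom_left_only = sorted(custom.left_only)  # build hidden as well
```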

Here is a simplified example of using the ``subdirs`` attribute to search
recursively through two directories and report the common files whose
contents differ::

   >>> from filecmp import dircmp
   >>> def print_diff_files(dcmp):
   ...     for name in dcmp.diff_files:
   ...         print("diff_file %s found in %s and %s" % (name, dcmp.left,
   ...               dcmp.right))
   ...     for sub_dcmp in dcmp.subdirs.values():
   ...         print_diff_files(sub_dcmp)
   ...
   >>> dcmp = dircmp('dir1', 'dir2') # doctest: +SKIP
   >>> print_diff_files(dcmp) # doctest: +SKIP

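A variant of the same recursion that collects the differing files as paths
relative to the top-level directories, rather than printing them, may be
easier to use from scripts and tests. This is a sketch: ``collect_diff_files``,
``write`` and the directory layout are hypothetical.

```python
import filecmp
import os
import tempfile

def collect_diff_files(dcmp, prefix=""):
    """Return relative paths of differing files, recursing through subdirs."""
    found = [os.path.join(prefix, name) for name in dcmp.diff_files]
    for sub_name, sub_dcmp in dcmp.subdirs.items():
        found.extend(collect_diff_files(sub_dcmp, os.path.join(prefix, sub_name)))
    return found

def write(path, text):
    """Hypothetical helper: create parent directories and write *text*."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

with tempfile.TemporaryDirectory() as tmp:
    dir1 = os.path.join(tmp, "dir1")
    dir2 = os.path.join(tmp, "dir2")
    write(os.path.join(dir1, "top.txt"), "same\n")
    write(os.path.join(dir2, "top.txt"), "same\n")
    write(os.path.join(dir1, "sub", "deep.txt"), "old\n")
    write(os.path.join(dir2, "sub", "deep.txt"), "new value\n")

    diffs = collect_diff_files(filecmp.dircmp(dir1, dir2))
```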