Blame - Doc/library/difflib.rst - platform/external/python/cpython2

Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1	:mod:`difflib` --- Helpers for computing deltas
				2	===============================================
				3
				4	.. module:: difflib
				5	:synopsis: Helpers for computing differences between objects.
				6	.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
				7	.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
Christian Heimes	5b5e81c	2007-12-31 16:14:33 +0000	[diff] [blame]	8	.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	9
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	10	.. testsetup::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	11
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	12	import sys
				13	from difflib import *
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	14
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	15	This module provides classes and functions for comparing sequences. It
				16	can be used, for example, to compare files, and can produce difference
				17	information in various formats, including HTML and context and unified
				18	diffs. For comparing directories and files, see also the :mod:`filecmp` module.
				19
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	20	.. class:: SequenceMatcher
				21
				22	This is a flexible class for comparing pairs of sequences of any type, so long
Guido van Rossum	2cc30da	2007-11-02 23:46:40 +0000	[diff] [blame]	23	as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	24	little fancier than, an algorithm published in the late 1980's by Ratcliff and
				25	Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
				26	find the longest contiguous matching subsequence that contains no "junk"
				27	elements (the Ratcliff and Obershelp algorithm doesn't address junk). The same
				28	idea is then applied recursively to the pieces of the sequences to the left and
				29	to the right of the matching subsequence. This does not yield minimal edit
				30	sequences, but does tend to yield matches that "look right" to people.
				31
				32	Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst
				33	case and quadratic time in the expected case. :class:`SequenceMatcher` is
				34	quadratic time for the worst case and has expected-case behavior dependent in a
				35	complicated way on how many elements the sequences have in common; best case
				36	time is linear.
				37
				38
				39	.. class:: Differ
				40
				41	This is a class for comparing sequences of lines of text, and producing
				42	human-readable differences or deltas. Differ uses :class:`SequenceMatcher`
				43	both to compare sequences of lines, and to compare sequences of characters
				44	within similar (near-matching) lines.
				45
				46	Each line of a :class:`Differ` delta begins with a two-character code:
				47
				48	+----------+-------------------------------------------+
				49	| Code     | Meaning                                   |
				50	+==========+===========================================+
				51	| ``'- '`` | line unique to sequence 1                 |
				52	+----------+-------------------------------------------+
				53	| ``'+ '`` | line unique to sequence 2                 |
				54	+----------+-------------------------------------------+
				55	| ``'  '`` | line common to both sequences             |
				56	+----------+-------------------------------------------+
				57	| ``'? '`` | line not present in either input sequence |
				58	+----------+-------------------------------------------+
				59
				60	Lines beginning with '``?``' attempt to guide the eye to intraline differences,
				61	and were not present in either input sequence. These lines can be confusing if
				62	the sequences contain tab characters.
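
As a rough illustration of the codes in the table above, the following sketch compares two small lists of lines and collects the two-character prefix of every delta line. The fruit names are arbitrary sample data, not taken from this documentation.

```python
from difflib import Differ

# Arbitrary sample input: each element is one newline-terminated line.
before = ["apple\n", "banana\n", "cherry\n"]
after = ["apple\n", "bananas\n", "cherry\n", "date\n"]

# Differ.compare() yields the delta lazily; materialize it into a list.
delta = list(Differ().compare(before, after))

# The first two characters of every delta line are the code.
codes = [line[:2] for line in delta]

for line in delta:
    print(line, end="")
```

Lines whose code is ``'? '`` only mark where the neighbouring ``'- '`` and ``'+ '`` lines differ; they never correspond to a line of either input.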
				63
				64
				65	.. class:: HtmlDiff
				66
				67	This class can be used to create an HTML table (or a complete HTML file
				68	containing the table) showing a side by side, line by line comparison of text
				69	with inter-line and intra-line change highlights. The table can be generated in
				70	either full or contextual difference mode.
				71
				72	The constructor for this class is:
				73
				74
				75	.. function:: __init__([tabsize][, wrapcolumn][, linejunk][, charjunk])
				76
				77	Initializes an instance of :class:`HtmlDiff`.
				78
				79	tabsize is an optional keyword argument to specify tab stop spacing and
				80	defaults to ``8``.
				81
				82	wrapcolumn is an optional keyword argument to specify the column number where
				83	lines are broken and wrapped; it defaults to ``None``, meaning lines are not wrapped.
				84
				85	linejunk and charjunk are optional keyword arguments passed into ``ndiff()``
				86	(used by :class:`HtmlDiff` to generate the side by side HTML differences). See
				87	``ndiff()`` documentation for argument default values and descriptions.
				88
				89	The following methods are public:
				90
				91
				92	.. function:: make_file(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])
				93
				94	Compares fromlines and tolines (lists of strings) and returns a string which
				95	is a complete HTML file containing a table showing line by line differences with
				96	inter-line and intra-line changes highlighted.
				97
				98	fromdesc and todesc are optional keyword arguments to specify from/to file
				99	column header strings (both default to an empty string).
				100
				101	context and numlines are both optional keyword arguments. Set context to
				102	``True`` when contextual differences are to be shown, else the default is
				103	``False`` to show the full files. numlines defaults to ``5``. When context
				104	is ``True`` numlines controls the number of context lines which surround the
				105	difference highlights. When context is ``False`` numlines controls the
				106	number of lines which are shown before a difference highlight when using the
				107	"next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
				108	the next difference highlight at the top of the browser without any leading
				109	context).
				110
				111
				112	.. function:: make_table(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])
				113
				114	Compares fromlines and tolines (lists of strings) and returns a string which
				115	is a complete HTML table showing line by line differences with inter-line and
				116	intra-line changes highlighted.
				117
				118	The arguments for this method are the same as those for the :meth:`make_file`
				119	method.
				120
				121	:file:`Tools/scripts/diff.py` is a command-line front-end to this class and
				122	contains a good example of its use.
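
A minimal sketch of both methods, assuming two short in-memory lists of lines. The names ``old.txt`` and ``new.txt`` are hypothetical column headers, and the ``tabsize`` and ``wrapcolumn`` values are arbitrary choices for illustration.

```python
from difflib import HtmlDiff

# Arbitrary sample input; real callers would typically use readlines().
old = ["one\n", "two\n", "three\n"]
new = ["one\n", "tree\n", "emu\n"]

differ = HtmlDiff(tabsize=4, wrapcolumn=60)

# make_table() returns only the <table> markup, for embedding in a page.
table = differ.make_table(old, new, fromdesc="old.txt", todesc="new.txt")

# make_file() returns a complete HTML document; with context=True only the
# changed lines plus numlines lines of surrounding context are shown.
page = differ.make_file(old, new, fromdesc="old.txt", todesc="new.txt",
                        context=True, numlines=1)
```

The string returned by ``make_file`` can be written straight to a file and opened in a browser.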
				123
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	124
				125	.. function:: context_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])
				126
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	127	Compare a and b (lists of strings); return a delta (a :term:`generator`
				128	generating the delta lines) in context diff format.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	129
				130	Context diffs are a compact way of showing just the lines that have changed plus
				131	a few lines of context. The changes are shown in a before/after style. The
				132	number of context lines is set by n which defaults to three.
				133
				134	By default, the diff control lines (those with ``***`` or ``---``) are created
				135	with a trailing newline. This is helpful so that inputs created from
				136	:func:`file.readlines` result in diffs that are suitable for use with
				137	:func:`file.writelines` since both the inputs and outputs have trailing
				138	newlines.
				139
				140	For inputs that do not have trailing newlines, set the lineterm argument to
				141	``""`` so that the output will be uniformly newline free.
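
To illustrate the lineterm argument, here is a small sketch that uses inputs without trailing newlines. The data mirrors the doctest in this section with the newlines removed; joining with ``"\n"`` at the end is just one way to display the result.

```python
from difflib import context_diff

# Same words as the example in this section, but without trailing newlines.
old = ["bacon", "eggs", "ham", "guido"]
new = ["python", "eggy", "hamster", "guido"]

# lineterm="" keeps the generated control lines free of newlines as well,
# so every line of the delta is uniformly unterminated.
lines = list(context_diff(old, new, fromfile="before.py",
                          tofile="after.py", lineterm=""))

print("\n".join(lines))
```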
				142
				143	The context diff format normally has a header for filenames and modification
				144	times. Any or all of these may be specified using strings for fromfile,
				145	tofile, fromfiledate, and tofiledate. The modification times are normally
				146	expressed in the format returned by :func:`time.ctime`. If not specified, the
				147	strings default to blanks.
				148
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	149	>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
				150	>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
				151	>>> for line in context_diff(s1, s2, fromfile='before.py', tofile='after.py'):
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	152	...     sys.stdout.write(line)  # doctest: +NORMALIZE_WHITESPACE
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	153	*** before.py
				154	--- after.py
				155	***************
				156	*** 1,4 ****
				157	! bacon
				158	! eggs
				159	! ham
				160	  guido
				161	--- 1,4 ----
				162	! python
				163	! eggy
				164	! hamster
				165	  guido
				166
				167	See :ref:`difflib-interface` for a more detailed example.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	168
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	169
				170	.. function:: get_close_matches(word, possibilities[, n][, cutoff])
				171
				172	Return a list of the best "good enough" matches. word is a sequence for which
				173	close matches are desired (typically a string), and possibilities is a list of
				174	sequences against which to match word (typically a list of strings).
				175
				176	Optional argument n (default ``3``) is the maximum number of close matches to
				177	return; n must be greater than ``0``.
				178
				179	Optional argument cutoff (default ``0.6``) is a float in the range [0, 1].
				180	Possibilities that don't score at least that similar to word are ignored.
				181
				182	The best (no more than n) matches among the possibilities are returned in a
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	183	list, sorted by similarity score, most similar first.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	184
				185	>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
				186	['apple', 'ape']
				187	>>> import keyword
				188	>>> get_close_matches('wheel', keyword.kwlist)
				189	['while']
				190	>>> get_close_matches('apple', keyword.kwlist)
				191	[]
				192	>>> get_close_matches('accept', keyword.kwlist)
				193	['except']
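
The optional arguments can be sketched with the same word list as above: n caps the number of results and cutoff raises or lowers the similarity threshold. The specific values ``1`` and ``0.9`` are arbitrary.

```python
from difflib import get_close_matches

words = ["ape", "apple", "peach", "puppy"]

# Keep only the single best match.
best_only = get_close_matches("appel", words, n=1)

# A stricter cutoff rejects candidates that are merely similar.
strict = get_close_matches("appel", words, cutoff=0.9)

print(best_only)
print(strict)
```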
				194
				195
				196	.. function:: ndiff(a, b[, linejunk][, charjunk])
				197
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	198	Compare a and b (lists of strings); return a :class:`Differ`\ -style
				199	delta (a :term:`generator` generating the delta lines).
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	200
				201	Optional keyword parameters linejunk and charjunk are for filter functions
				202	(or ``None``):
				203
				204	linejunk: A function that accepts a single string argument, and returns true
				205	if the string is junk, or false if not. The default is ``None``, starting with
				206	Python 2.3. Before then, the default was the module-level function
				207	:func:`IS_LINE_JUNK`, which filters out lines without visible characters, except
				208	for at most one pound character (``'#'``). As of Python 2.3, the underlying
				209	:class:`SequenceMatcher` class does a dynamic analysis of which lines are so
				210	frequent as to constitute noise, and this usually works better than the pre-2.3
				211	default.
				212
				213	charjunk: A function that accepts a character (a string of length 1), and
				214	returns true if the character is junk, or false if not. The default is the module-level
				215	function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
				216	blank or tab; note that including newline in this is a bad idea!).
				217
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	218	:file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	219
				220	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
				221	... 'ore\ntree\nemu\n'.splitlines(1))
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	222	>>> print(''.join(diff), end="")
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	223	- one
				224	?  ^
				225	+ ore
				226	?  ^
				227	- two
				228	- three
				229	?  -
				230	+ tree
				231	+ emu
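
The filter arguments can also be passed explicitly. The sketch below, using made-up input lines, passes both module-level junk functions and then checks that the delta still round-trips through :func:`restore`. The junk filters influence which lines and characters are used as synchronization points, not which lines appear in the delta.

```python
from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK, ndiff, restore

# Arbitrary sample input containing a blank line and a comment line.
old = ["# header\n", "\n", "value = 1\n", "print(value)\n"]
new = ["# header\n", "value = 2\n", "\n", "print(value)\n"]

delta = list(ndiff(old, new,
                   linejunk=IS_LINE_JUNK,
                   charjunk=IS_CHARACTER_JUNK))

# Both inputs can be recovered from the delta, whatever the filters were.
recovered_old = list(restore(delta, 1))
recovered_new = list(restore(delta, 2))

print("".join(delta), end="")
```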
				232
				233
				234	.. function:: restore(sequence, which)
				235
				236	Return one of the two sequences that generated a delta.
				237
				238	Given a sequence produced by :meth:`Differ.compare` or :func:`ndiff`, extract
				239	lines originating from file 1 or 2 (parameter which), stripping off line
				240	prefixes.
				241
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	242	Example:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	243
				244	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
				245	... 'ore\ntree\nemu\n'.splitlines(1))
				246	>>> diff = list(diff) # materialize the generated delta into a list
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	247	>>> print(''.join(restore(diff, 1)), end="")
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	248	one
				249	two
				250	three
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	251	>>> print(''.join(restore(diff, 2)), end="")
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	252	ore
				253	tree
				254	emu
				255
				256
				257	.. function:: unified_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])
				258
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	259	Compare a and b (lists of strings); return a delta (a :term:`generator`
				260	generating the delta lines) in unified diff format.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	261
				262	Unified diffs are a compact way of showing just the lines that have changed plus
				263	a few lines of context. The changes are shown in an inline style (instead of
				264	separate before/after blocks). The number of context lines is set by n which
				265	defaults to three.
				266
				267	By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
				268	created with a trailing newline. This is helpful so that inputs created from
				269	:func:`file.readlines` result in diffs that are suitable for use with
				270	:func:`file.writelines` since both the inputs and outputs have trailing
				271	newlines.
				272
				273	For inputs that do not have trailing newlines, set the lineterm argument to
				274	``""`` so that the output will be uniformly newline free.
				275
				276	The unified diff format normally has a header for filenames and modification
				277	times. Any or all of these may be specified using strings for fromfile,
				278	tofile, fromfiledate, and tofiledate. The modification times are normally
				279	expressed in the format returned by :func:`time.ctime`. If not specified, the
				280	strings default to blanks.
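
A short sketch of the header arguments, assuming inputs without trailing newlines. The two date strings are arbitrary placeholders rather than real modification times, and ``n=0`` is chosen only to keep the output small.

```python
from difflib import unified_diff

old = ["bacon", "eggs", "ham", "guido"]
new = ["python", "eggy", "hamster", "guido"]

lines = list(unified_diff(
    old, new,
    fromfile="before.py", tofile="after.py",
    fromfiledate="2008-02-23 16:23:06",  # placeholder timestamp
    tofiledate="2008-03-23 21:54:12",    # placeholder timestamp
    n=0,           # no context lines around each change
    lineterm="",   # inputs have no trailing newlines
))

print("\n".join(lines))
```

In the generated header, each date follows its file name, separated by a tab character.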
				281
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	282
				283	>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
				284	>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
				285	>>> for line in unified_diff(s1, s2, fromfile='before.py', tofile='after.py'):
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	286	...     sys.stdout.write(line)  # doctest: +NORMALIZE_WHITESPACE
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	287	--- before.py
				288	+++ after.py
				289	@@ -1,4 +1,4 @@
				290	-bacon
				291	-eggs
				292	-ham
				293	+python
				294	+eggy
				295	+hamster
				296	 guido
				297
				298	See :ref:`difflib-interface` for a more detailed example.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	299
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	300
				301	.. function:: IS_LINE_JUNK(line)
				302
				303	Return true for ignorable lines. A line is ignorable if it is
				304	blank or contains a single ``'#'``, otherwise it is not ignorable. Used as a
				305	default for parameter linejunk in :func:`ndiff` before Python 2.3.
				306
				307
				308	.. function:: IS_CHARACTER_JUNK(ch)
				309
				310	Return true for ignorable characters. The character ch is ignorable if ch
				311	is a space or tab, otherwise it is not ignorable. Used as a default for
				312	parameter charjunk in :func:`ndiff`.
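
The two predicates can be tried directly. This sketch filters a few hand-picked sample strings through each function; the samples are illustrative only.

```python
from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK

# A blank line, a line holding only '#', and an ordinary line.
sample_lines = ["\n", "  #   \n", "hello\n"]
junk_lines = [line for line in sample_lines if IS_LINE_JUNK(line)]

# A space, a tab, a newline, and an ordinary character.
sample_chars = [" ", "\t", "\n", "x"]
junk_chars = [ch for ch in sample_chars if IS_CHARACTER_JUNK(ch)]

print(junk_lines)
print(junk_chars)
```

Note that the newline character is not treated as junk by :func:`IS_CHARACTER_JUNK`.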
				313
				314
				315	.. seealso::
				316
				317	`Pattern Matching: The Gestalt Approach <http://www.ddj.com/184407970?pgno=5>`_
				318	Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
				319	was published in `Dr. Dobb's Journal <http://www.ddj.com/>`_ in July, 1988.
				320
				321
				322	.. _sequence-matcher:
				323
				324	SequenceMatcher Objects
				325	-----------------------
				326
				327	The :class:`SequenceMatcher` class has this constructor:
				328
				329
				330	.. class:: SequenceMatcher([isjunk[, a[, b]]])
				331
				332	Optional argument isjunk must be ``None`` (the default) or a one-argument
				333	function that takes a sequence element and returns true if and only if the
				334	element is "junk" and should be ignored. Passing ``None`` for isjunk is
				335	equivalent to passing ``lambda x: 0``; in other words, no elements are ignored.
				336	For example, pass::
				337
				338	lambda x: x in " \t"
				339
				340	if you're comparing lines as sequences of characters, and don't want to synch up
				341	on blanks or hard tabs.
				342
				343	The optional arguments a and b are sequences to be compared; both default to
Guido van Rossum	2cc30da	2007-11-02 23:46:40 +0000	[diff] [blame]	344	empty strings. The elements of both sequences must be :term:`hashable`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	345
				346	:class:`SequenceMatcher` objects have the following methods:
				347
				348
				349	.. method:: SequenceMatcher.set_seqs(a, b)
				350
				351	Set the two sequences to be compared.
				352
				353	:class:`SequenceMatcher` computes and caches detailed information about the
				354	second sequence, so if you want to compare one sequence against many sequences,
				355	use :meth:`set_seq2` to set the commonly used sequence once and call
				356	:meth:`set_seq1` repeatedly, once for each of the other sequences.
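
The caching advice above might be applied as in the following sketch. The helper name ``rank_candidates`` and the word list are hypothetical; the point is that :meth:`set_seq2` is called once while :meth:`set_seq1` is called per candidate.

```python
from difflib import SequenceMatcher


def rank_candidates(target, candidates):
    """Return (ratio, candidate) pairs, most similar first."""
    matcher = SequenceMatcher()
    # Information about the second sequence is cached, so set it once.
    matcher.set_seq2(target)
    scores = []
    for candidate in candidates:
        # Only the first sequence changes on each iteration.
        matcher.set_seq1(candidate)
        scores.append((matcher.ratio(), candidate))
    return sorted(scores, reverse=True)


ranked = rank_candidates("apple", ["ape", "appel", "peach", "puppy"])
for score, word in ranked:
    print("%.2f %s" % (score, word))
```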
				357
				358
				359	.. method:: SequenceMatcher.set_seq1(a)
				360
				361	Set the first sequence to be compared. The second sequence to be compared is
				362	not changed.
				363
				364
				365	.. method:: SequenceMatcher.set_seq2(b)
				366
				367	Set the second sequence to be compared. The first sequence to be compared is
				368	not changed.
				369
				370
				371	.. method:: SequenceMatcher.find_longest_match(alo, ahi, blo, bhi)
				372
				373	Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
				374
Christian Heimes	25bb783	2008-01-11 16:17:00 +0000	[diff] [blame]	375	If isjunk was omitted or ``None``, :meth:`find_longest_match` returns ``(i, j,
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	376	k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo <= i <= i+k <=
				377	ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j', k')`` meeting those
				378	conditions, the additional conditions ``k >= k'``, ``i <= i'``, and if ``i ==
				379	i'``, ``j <= j'`` are also met. In other words, of all maximal matching blocks,
				380	return one that starts earliest in a, and of all those maximal matching blocks
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	381	that start earliest in a, return the one that starts earliest in b.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	382
				383	>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
				384	>>> s.find_longest_match(0, 5, 0, 9)
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	385	Match(a=0, b=4, size=5)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	386
				387	If isjunk was provided, first the longest matching block is determined as
				388	above, but with the additional restriction that no junk element appears in the
				389	block. Then that block is extended as far as possible by matching (only) junk
				390	elements on both sides. So the resulting block never matches on junk except as
				391	identical junk happens to be adjacent to an interesting match.
				392
				393	Here's the same example as before, but considering blanks to be junk. That
				394	prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the second
				395	sequence directly. Instead only the ``'abcd'`` can match, and matches the
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	396	leftmost ``'abcd'`` in the second sequence:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	397
				398	>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
				399	>>> s.find_longest_match(0, 5, 0, 9)
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	400	Match(a=1, b=0, size=4)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	401
				402	If no blocks match, this returns ``(alo, blo, 0)``.
				403
Christian Heimes	25bb783	2008-01-11 16:17:00 +0000	[diff] [blame]	404	.. versionchanged:: 2.6
				405	This method returns a :term:`named tuple` ``Match(a, b, size)``.
				406
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	407
				408	.. method:: SequenceMatcher.get_matching_blocks()
				409
				410	Return list of triples describing matching subsequences. Each triple is of the
				411	form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``. The triples are
				412	monotonically increasing in i and j.
				413
				414	The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It is
				415	the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')`` are
				416	adjacent triples in the list, and the second is not the last triple in the list,
				417	then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent triples always
				418	describe non-adjacent equal blocks.
				419
Christian Heimes	5b5e81c	2007-12-31 16:14:33 +0000	[diff] [blame]	420	.. XXX Explain why a dummy is used!
				421
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	422	.. doctest::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	423
				424	>>> s = SequenceMatcher(None, "abxcd", "abcd")
				425	>>> s.get_matching_blocks()
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	426	[Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	427
				428
				429	.. method:: SequenceMatcher.get_opcodes()
				430
				431	Return list of 5-tuples describing how to turn a into b. Each tuple is of
				432	the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 == 0``, and
				433	remaining tuples have i1 equal to the i2 from the preceding tuple, and,
				434	likewise, j1 equal to the previous j2.
				435
				436	The tag values are strings, with these meanings:
				437
				438	+---------------+---------------------------------------------+
				439	| Value         | Meaning                                     |
				440	+===============+=============================================+
				441	| ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
				442	|               | ``b[j1:j2]``.                               |
				443	+---------------+---------------------------------------------+
				444	| ``'delete'``  | ``a[i1:i2]`` should be deleted. Note that   |
				445	|               | ``j1 == j2`` in this case.                  |
				446	+---------------+---------------------------------------------+
				447	| ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
				448	|               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
				449	|               | this case.                                  |
				450	+---------------+---------------------------------------------+
				451	| ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
				452	|               | are equal).                                 |
				453	+---------------+---------------------------------------------+
				454
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	455	For example:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	456
				457	>>> a = "qabxcd"
				458	>>> b = "abycdf"
				459	>>> s = SequenceMatcher(None, a, b)
				460	>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	461	...     print(("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
				462	...            (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2])))
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	463	 delete a[0:1] (q) b[0:0] ()
				464	  equal a[1:3] (ab) b[0:2] (ab)
				465	replace a[3:4] (x) b[2:3] (y)
				466	  equal a[4:6] (cd) b[3:5] (cd)
				467	 insert a[6:6] () b[5:6] (f)
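
Because the opcodes describe how to turn a into b, they can be replayed to rebuild b. The helper ``apply_opcodes`` below is a hypothetical illustration for string inputs, not part of the module.

```python
from difflib import SequenceMatcher


def apply_opcodes(a, b):
    """Rebuild b by replaying the opcodes that turn a into b."""
    pieces = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            pieces.append(a[i1:i2])   # unchanged run, copied from a
        elif tag in ("replace", "insert"):
            pieces.append(b[j1:j2])   # new material comes from b
        # 'delete' contributes nothing to the result
    return "".join(pieces)


rebuilt = apply_opcodes("qabxcd", "abycdf")
print(rebuilt)
```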
				468
				469
				470	.. method:: SequenceMatcher.get_grouped_opcodes([n])
				471
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	472	Return a :term:`generator` of groups with up to n lines of context.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	473
				474	Starting with the groups returned by :meth:`get_opcodes`, this method splits out
				475	smaller change clusters and eliminates intervening ranges which have no changes.
				476
				477	The groups are returned in the same format as :meth:`get_opcodes`.
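
As a sketch of the grouping, the following compares the lowercase alphabet with a copy in which two widely separated letters are changed; the inputs and the choice of ``n=1`` are arbitrary. Each change ends up in its own group, padded with one element of context on either side.

```python
from difflib import SequenceMatcher

a = "abcdefghijklmnopqrstuvwxyz"
# Change 'c' (index 2) and 'u' (index 20); everything else is identical.
b = a[:2] + "C" + a[3:20] + "U" + a[21:]

matcher = SequenceMatcher(None, a, b)
groups = list(matcher.get_grouped_opcodes(1))

for group in groups:
    print("group:")
    for tag, i1, i2, j1, j2 in group:
        print("  %-7s a[%d:%d] b[%d:%d]" % (tag, i1, i2, j1, j2))
```

The long unchanged run between the two edits is dropped, which is what lets diff formats emit separate hunks.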
				478
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	479
				480	.. method:: SequenceMatcher.ratio()
				481
				482	Return a measure of the sequences' similarity as a float in the range [0, 1].
				483
				484	Where T is the total number of elements in both sequences, and M is the number
				485	of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the sequences are
				486	identical, and ``0.0`` if they have nothing in common.
				487
				488	This is expensive to compute if :meth:`get_matching_blocks` or
				489	:meth:`get_opcodes` hasn't already been called, in which case you may want to
				490	try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an upper bound.
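
The formula can be checked by hand from the matching blocks. This sketch sums the block sizes to get M and compares the result with :meth:`ratio`; the two input strings are arbitrary.

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"
s = SequenceMatcher(None, a, b)

# M: total number of matched elements across all matching blocks.
matches = sum(size for _, _, size in s.get_matching_blocks())
# T: total number of elements in both sequences.
total = len(a) + len(b)

manual = 2.0 * matches / total
print(manual, s.ratio())
```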
				491
				492
				493	.. method:: SequenceMatcher.quick_ratio()
				494
				495	Return an upper bound on :meth:`ratio` relatively quickly.
				496
				497	This isn't defined beyond that it is an upper bound on :meth:`ratio`, and is
				498	faster to compute.
				499
				500
				501	.. method:: SequenceMatcher.real_quick_ratio()
				502
				503	Return an upper bound on :meth:`ratio` very quickly.
				504
				505	This isn't defined beyond that it is an upper bound on :meth:`ratio`, and is
				506	faster to compute than either :meth:`ratio` or :meth:`quick_ratio`.
				507
				508	The three methods that return the ratio of matching to total characters can give
				509	different results due to differing levels of approximation, although
				510	:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	511	:meth:`ratio`:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	512
				513	>>> s = SequenceMatcher(None, "abcd", "bcde")
				514	>>> s.ratio()
				515	0.75
				516	>>> s.quick_ratio()
				517	0.75
				518	>>> s.real_quick_ratio()
				519	1.0
				520
				521
				522	.. _sequencematcher-examples:
				523
				524	SequenceMatcher Examples
				525	------------------------
				526
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	527	This example compares two strings, considering blanks to be "junk":
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	528
				529	>>> s = SequenceMatcher(lambda x: x == " ",
				530	... "private Thread currentThread;",
				531	... "private volatile Thread currentThread;")
				532
				533	:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
				534	sequences. As a rule of thumb, a :meth:`ratio` value over 0.6 means the
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	535	sequences are close matches:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	536
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	537	>>> print(round(s.ratio(), 3))
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	538	0.866
				539
				540	If you're only interested in where the sequences match,
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	541	:meth:`get_matching_blocks` is handy:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	542
				543	>>> for block in s.get_matching_blocks():
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	544	...     print("a[%d] and b[%d] match for %d elements" % block)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	545	a[0] and b[0] match for 8 elements
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	546	a[8] and b[17] match for 21 elements
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	547	a[29] and b[38] match for 0 elements
				548
				549	Note that the last tuple returned by :meth:`get_matching_blocks` is always a
				550	dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
				551	tuple element (number of elements matched) is ``0``.
				552
				553	If you want to know how to change the first sequence into the second, use
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	554	:meth:`get_opcodes`:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	555
				556	>>> for opcode in s.get_opcodes():
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	557	...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	558	 equal a[0:8] b[0:8]
				559	insert a[8:8] b[8:17]
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame^]	560	 equal a[8:29] b[17:38]
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	561
				562	See also the function :func:`get_close_matches` in this module, which shows how
				563	simple code building on :class:`SequenceMatcher` can be used to do useful work.
				564
				565
				566	.. _differ-objects:
				567
				568	Differ Objects
				569	--------------
				570
				571	Note that :class:`Differ`\ -generated deltas make no claim to be minimal
				572	diffs. To the contrary, minimal diffs are often counter-intuitive, because they
				573	synch up anywhere possible, sometimes at accidental matches 100 pages apart.
				574	Restricting synch points to contiguous matches preserves some notion of
				575	locality, at the occasional cost of producing a longer diff.
				576
				577	The :class:`Differ` class has this constructor:
				578
				579
				580	.. class:: Differ([linejunk[, charjunk]])
				581
				582	Optional keyword parameters linejunk and charjunk are for filter functions
				583	(or ``None``):
				584
				585	linejunk: A function that accepts a single string argument, and returns true
				586	if the string is junk. The default is ``None``, meaning that no line is
				587	considered junk.
				588
				589	charjunk: A function that accepts a single character argument (a string of
				590	length 1), and returns true if the character is junk. The default is ``None``,
				591	meaning that no character is considered junk.

:class:`Differ` objects generate deltas through a single method:


.. method:: Differ.compare(a, b)

   Compare two sequences of lines, and generate the delta (a sequence of lines).

   Each sequence must contain individual single-line strings ending with newlines.
   Such sequences can be obtained from the :meth:`readlines` method of file-like
   objects. The delta generated also consists of newline-terminated strings, ready
   to be printed as-is via the :meth:`writelines` method of a file-like object.
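
The round trip from ``readlines`` to ``writelines`` might look like the
following minimal sketch. The ``io.StringIO`` objects stand in for real files,
and their contents are invented for the example; the ``u`` prefix only keeps
the snippet valid on Python 2 as well.

```python
import io
import sys
from difflib import Differ

# In-memory stand-ins for two text files (hypothetical contents).
old_file = io.StringIO(u"one\ntwo\nthree\n")
new_file = io.StringIO(u"one\ntoo\nthree\nfour\n")

a = old_file.readlines()   # each item keeps its trailing newline
b = new_file.readlines()

# compare() returns a generator; list() materializes it so it can be reused.
delta = list(Differ().compare(a, b))

# Every delta line is newline-terminated, so writelines() prints it as-is.
sys.stdout.writelines(delta)
```

Because ``compare`` yields its lines lazily, wrapping it in ``list`` is only
needed when the delta will be traversed more than once.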


.. _differ-examples:

Differ Example
--------------

This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the :meth:`readlines` method of file-like objects):

   >>> text1 = '''  1. Beautiful is better than ugly.
   ...   2. Explicit is better than implicit.
   ...   3. Simple is better than complex.
   ...   4. Complex is better than complicated.
   ... '''.splitlines(1)
   >>> len(text1)
   4
   >>> text1[0][-1]
   '\n'
   >>> text2 = '''  1. Beautiful is better than ugly.
   ...   3.   Simple is better than complex.
   ...   4. Complicated is better than complex.
   ...   5. Flat is better than nested.
   ... '''.splitlines(1)

Next we instantiate a Differ object:

   >>> d = Differ()

Note that when instantiating a :class:`Differ` object we may pass functions to
filter out line and character "junk." See the :class:`Differ` constructor for
details.

Finally, we compare the two:

   >>> result = list(d.compare(text1, text2))

``result`` is a list of strings, so let's pretty-print it:

   >>> from pprint import pprint
   >>> pprint(result)
   ['    1. Beautiful is better than ugly.\n',
    '-   2. Explicit is better than implicit.\n',
    '-   3. Simple is better than complex.\n',
    '+   3.   Simple is better than complex.\n',
    '?     ++\n',
    '-   4. Complex is better than complicated.\n',
    '?            ^                     ---- ^\n',
    '+   4. Complicated is better than complex.\n',
    '?           ++++ ^                      ^\n',
    '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this:

   >>> import sys
   >>> sys.stdout.writelines(result)
       1. Beautiful is better than ugly.
   -   2. Explicit is better than implicit.
   -   3. Simple is better than complex.
   +   3.   Simple is better than complex.
   ?     ++
   -   4. Complex is better than complicated.
   ?            ^                     ---- ^
   +   4. Complicated is better than complex.
   ?           ++++ ^                      ^
   +   5. Flat is better than nested.
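
To pull only the changed lines out of a delta like this one, one option is to
filter on the two-character code that starts each line. The following is a
minimal sketch, not part of difflib: the helper name ``changed_lines`` and the
sample lists are assumptions made for illustration.

```python
from difflib import Differ

def changed_lines(delta):
    """Return the delta lines that start with '- ' or '+ ' (hypothetical helper)."""
    return [line for line in delta if line.startswith(('- ', '+ '))]

old = ['red\n', 'green\n', 'blue\n']
new = ['red\n', 'blue\n', 'violet\n']

delta = list(Differ().compare(old, new))
changes = changed_lines(delta)
```

Lines beginning with ``'? '`` are guide lines that appear in neither input,
so the helper drops them together with the unchanged lines.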


.. _difflib-interface:

A command-line interface to difflib
-----------------------------------

This example shows how to use difflib to create a ``diff``-like utility.
It is also contained in the Python source distribution, as
:file:`Tools/scripts/diff.py`.

.. testcode::

   """ Command line interface to difflib.py providing diffs in four formats:

   * ndiff: lists every line and highlights interline changes.
   * context: highlights clusters of changes in a before/after format.
   * unified: highlights clusters of changes in an inline format.
   * html: generates side by side comparison with change highlights.

   """

   import sys, os, time, difflib, optparse

   def main():
       # Configure the option parser
       usage = "usage: %prog [options] fromfile tofile"
       parser = optparse.OptionParser(usage)
       parser.add_option("-c", action="store_true", default=False,
                         help='Produce a context format diff (default)')
       parser.add_option("-u", action="store_true", default=False,
                         help='Produce a unified format diff')
       hlp = 'Produce HTML side by side diff (can use -c and -l in conjunction)'
       parser.add_option("-m", action="store_true", default=False, help=hlp)
       parser.add_option("-n", action="store_true", default=False,
                         help='Produce a ndiff format diff')
       parser.add_option("-l", "--lines", type="int", default=3,
                         help='Set number of context lines (default 3)')
       (options, args) = parser.parse_args()

       if len(args) == 0:
           parser.print_help()
           sys.exit(1)
       if len(args) != 2:
           parser.error("need to specify both a fromfile and tofile")

       n = options.lines
       fromfile, tofile = args # as specified in the usage string

       # we're passing these as arguments to the diff function
       fromdate = time.ctime(os.stat(fromfile).st_mtime)
       todate = time.ctime(os.stat(tofile).st_mtime)
       # read both files in universal-newline mode, closing them afterwards
       with open(fromfile, 'U') as f:
           fromlines = f.readlines()
       with open(tofile, 'U') as f:
           tolines = f.readlines()

       if options.u:
           diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
                                       fromdate, todate, n=n)
       elif options.n:
           diff = difflib.ndiff(fromlines, tolines)
       elif options.m:
           diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
                                               tofile, context=options.c,
                                               numlines=n)
       else:
           diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
                                       fromdate, todate, n=n)

       # we're using writelines because diff is a generator
       # (or, for -m, a single string)
       sys.stdout.writelines(diff)

   if __name__ == '__main__':
       main()
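
The diff functions that the script calls can also be tried directly on
in-memory lists. The sketch below mirrors the ``-u`` branch; the sample lines
and the names ``before.py`` and ``after.py`` are placeholders rather than real
files.

```python
import sys
import difflib

fromlines = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
tolines = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']

# n sets the number of context lines around each change, like the -l option.
diff = list(difflib.unified_diff(fromlines, tolines,
                                 'before.py', 'after.py', n=1))

# The first two lines are the '---' and '+++' file headers; '@@' hunks follow.
sys.stdout.writelines(diff)
```

Swapping ``unified_diff`` for ``context_diff`` with the same arguments gives
the before/after format that the script produces by default.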