:mod:`difflib` --- Helpers for computing deltas
===============================================

.. module:: difflib
   :synopsis: Helpers for computing differences between objects.
.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>


This module provides classes and functions for comparing sequences. It
can be used, for example, for comparing files, and can produce difference
information in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also the :mod:`filecmp` module.

.. class:: SequenceMatcher

   This is a flexible class for comparing pairs of sequences of any type, so long
   as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a
   little fancier than, an algorithm published in the late 1980s by Ratcliff and
   Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
   find the longest contiguous matching subsequence that contains no "junk"
   elements (the Ratcliff and Obershelp algorithm doesn't address junk). The same
   idea is then applied recursively to the pieces of the sequences to the left and
   to the right of the matching subsequence. This does not yield minimal edit
   sequences, but does tend to yield matches that "look right" to people.

   Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst
   case and quadratic time in the expected case. :class:`SequenceMatcher` is
   quadratic time for the worst case and has expected-case behavior dependent in a
   complicated way on how many elements the sequences have in common; best case
   time is linear.
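
The recursive idea described above can be sketched on top of :meth:`SequenceMatcher.find_longest_match`. This is a simplified illustration, not the module's own code: the helper ``gestalt_blocks`` is hypothetical, and unlike :meth:`SequenceMatcher.get_matching_blocks` it neither merges adjacent blocks nor appends a final dummy triple.

```python
from difflib import SequenceMatcher


def gestalt_blocks(a, b):
    """Hypothetical helper: collect matching blocks by recursive splitting.

    Find the longest match, then recurse on the pieces to its left and
    right. Blocks come out ordered by position in a (and in b).
    """
    matcher = SequenceMatcher(None, a, b)
    blocks = []

    def recurse(alo, ahi, blo, bhi):
        i, j, k = matcher.find_longest_match(alo, ahi, blo, bhi)
        if k == 0:
            return  # nothing in common in this region
        recurse(alo, i, blo, j)          # pieces to the left of the match
        blocks.append((i, j, k))
        recurse(i + k, ahi, j + k, bhi)  # pieces to the right of the match

    recurse(0, len(a), 0, len(b))
    return blocks


print(gestalt_blocks("abxcd", "abcd"))  # [(0, 0, 2), (3, 2, 2)]
```

For these inputs the result equals the :meth:`get_matching_blocks` example later in this document, minus its trailing dummy triple.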


.. class:: Differ

   This is a class for comparing sequences of lines of text, and producing
   human-readable differences or deltas. Differ uses :class:`SequenceMatcher`
   both to compare sequences of lines, and to compare sequences of characters
   within similar (near-matching) lines.

   Each line of a :class:`Differ` delta begins with a two-letter code:

   +----------+-------------------------------------------+
   | Code     | Meaning                                   |
   +==========+===========================================+
   | ``'- '`` | line unique to sequence 1                 |
   +----------+-------------------------------------------+
   | ``'+ '`` | line unique to sequence 2                 |
   +----------+-------------------------------------------+
   | ``'  '`` | line common to both sequences             |
   +----------+-------------------------------------------+
   | ``'? '`` | line not present in either input sequence |
   +----------+-------------------------------------------+

   Lines beginning with '``?``' attempt to guide the eye to intraline differences,
   and were not present in either input sequence. These lines can be confusing if
   the sequences contain tab characters.
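
A minimal sketch that tallies the two-letter codes of a :class:`Differ` delta, reusing the inputs from the :func:`ndiff` example in this document. The helper ``tally_codes`` is hypothetical. No input line appears unchanged in the other sequence, so every line shows up with ``'- '`` or ``'+ '``; the number of ``'? '`` guide lines depends on which lines are judged similar, so the sketch does not rely on it.

```python
from collections import Counter
from difflib import Differ


def tally_codes(a, b):
    """Hypothetical helper: count delta lines by their two-letter code."""
    return Counter(line[:2] for line in Differ().compare(a, b))


old = ["one\n", "two\n", "three\n"]
new = ["ore\n", "tree\n", "emu\n"]
counts = tally_codes(old, new)
print(counts["- "], counts["+ "], counts["  "])  # 3 3 0
```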


.. class:: HtmlDiff

   This class can be used to create an HTML table (or a complete HTML file
   containing the table) showing a side by side, line by line comparison of text
   with inter-line and intra-line change highlights. The table can be generated in
   either full or contextual difference mode.

   The constructor for this class is:


   .. function:: __init__([tabsize][, wrapcolumn][, linejunk][, charjunk])

      Initializes an instance of :class:`HtmlDiff`.

      *tabsize* is an optional keyword argument to specify tab stop spacing and
      defaults to ``8``.

      *wrapcolumn* is an optional keyword argument to specify the column number where
      lines are broken and wrapped; it defaults to ``None``, in which case lines are
      not wrapped.

      *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
      (used by :class:`HtmlDiff` to generate the side by side HTML differences). See
      the :func:`ndiff` documentation for argument default values and descriptions.

   The following methods are public:


   .. function:: make_file(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML file containing a table showing line by line differences with
      inter-line and intra-line changes highlighted.

      *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
      column header strings (both default to an empty string).

      *context* and *numlines* are both optional keyword arguments. Set *context* to
      ``True`` when contextual differences are to be shown; the default, ``False``,
      shows the full files. *numlines* defaults to ``5``. When *context* is ``True``,
      *numlines* controls the number of context lines which surround the
      difference highlights. When *context* is ``False``, *numlines* controls the
      number of lines which are shown before a difference highlight when using the
      "next" hyperlinks (setting it to zero would cause the "next" hyperlinks to place
      the next difference highlight at the top of the browser without any leading
      context).


   .. function:: make_table(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML table showing line by line differences with inter-line and
      intra-line changes highlighted.

      The arguments for this method are the same as those for the :meth:`make_file`
      method.

   :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
   contains a good example of its use.
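
A short sketch of :meth:`make_table` using the constructor and method keywords documented above. The helper ``html_table`` and the sample texts are hypothetical. The result is a bare ``<table>`` fragment meant for embedding in an existing page; :meth:`make_file` returns a complete HTML file instead.

```python
from difflib import HtmlDiff


def html_table(old_text, new_text):
    """Hypothetical helper: side-by-side HTML table of two multi-line strings."""
    differ = HtmlDiff(tabsize=4, wrapcolumn=60)
    # context=True with numlines=2 shows only changed regions plus two lines around them.
    return differ.make_table(old_text.splitlines(), new_text.splitlines(),
                             fromdesc="old", todesc="new",
                             context=True, numlines=2)


table = html_table("bacon\neggs\nham\nguido\n", "python\neggy\nhamster\nguido\n")
print(len(table) > 0)  # True
```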


.. function:: context_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in context diff format.

   Context diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in a before/after style. The
   number of context lines is set by *n* which defaults to three.

   By default, the diff control lines (those with ``***`` or ``---``) are created
   with a trailing newline. This is helpful so that inputs created from
   :func:`file.readlines` result in diffs that are suitable for use with
   :func:`file.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The context diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
   expressed in the format returned by :func:`time.ctime`. If not specified, the
   strings default to blanks.

   ::

      >>> import sys
      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py',
      ...                                    tofile='after.py'))
      *** before.py
      --- after.py
      ***************
      *** 1,4 ****
      ! bacon
      ! eggs
      ! ham
        guido
      --- 1,4 ----
      ! python
      ! eggy
      ! hamster
        guido

   See :ref:`difflib-interface` for a more detailed example.
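
When the input lines carry no newlines, for example after :meth:`str.splitlines`, the *lineterm* advice above applies. This sketch assumes string input; the helper ``context_diff_text`` and the file labels are hypothetical.

```python
from difflib import context_diff


def context_diff_text(old, new):
    """Hypothetical helper: context-diff two multi-line strings.

    splitlines() drops the newlines, so lineterm="" keeps the control lines
    newline free as well; the delta is then joined back with newlines.
    """
    lines = context_diff(old.splitlines(), new.splitlines(),
                         fromfile="before.txt", tofile="after.txt",
                         lineterm="")
    return "\n".join(lines)


print(context_diff_text("bacon\neggs\nham\nguido",
                        "python\neggy\nhamster\nguido"))
```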


.. function:: get_close_matches(word, possibilities[, n][, cutoff])

   Return a list of the best "good enough" matches. *word* is a sequence for which
   close matches are desired (typically a string), and *possibilities* is a list of
   sequences against which to match *word* (typically a list of strings).

   Optional argument *n* (default ``3``) is the maximum number of close matches to
   return; *n* must be greater than ``0``.

   Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
   Possibilities that don't score at least that similar to *word* are ignored.

   The best (no more than *n*) matches among the possibilities are returned in a
   list, sorted by similarity score, most similar first. ::

      >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
      ['apple', 'ape']
      >>> import keyword
      >>> get_close_matches('wheel', keyword.kwlist)
      ['while']
      >>> get_close_matches('apple', keyword.kwlist)
      []
      >>> get_close_matches('accept', keyword.kwlist)
      ['except']
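
A common use is a "did you mean" suggestion that keeps only the single best match. The helper ``did_you_mean`` is hypothetical; it simply fixes *n* to ``1`` and passes *cutoff* through.

```python
from difflib import get_close_matches


def did_you_mean(word, vocabulary, cutoff=0.6):
    """Hypothetical helper: return the single closest match, or None."""
    matches = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


fruits = ["ape", "apple", "peach", "puppy"]
print(did_you_mean("appel", fruits))              # apple
print(did_you_mean("appel", fruits, cutoff=0.9))  # None
```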


.. function:: ndiff(a, b[, linejunk][, charjunk])

   Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
   delta (a :term:`generator` generating the delta lines).

   Optional keyword parameters *linejunk* and *charjunk* are for filter functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns true
   if the string is junk, or false if not. The default is ``None``, starting with
   Python 2.3. Before then, the default was the module-level function
   :func:`IS_LINE_JUNK`, which filters out lines without visible characters, except
   for at most one pound character (``'#'``). As of Python 2.3, the underlying
   :class:`SequenceMatcher` class does a dynamic analysis of which lines are so
   frequent as to constitute noise, and this usually works better than the pre-2.3
   default.

   *charjunk*: A function that accepts a character (a string of length 1), and
   returns true if the character is junk, or false if not. The default is the
   module-level function :func:`IS_CHARACTER_JUNK`, which filters out whitespace
   characters (a blank or tab; including newline in this set is a bad idea!).

   :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function. ::

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
      ...              'ore\ntree\nemu\n'.splitlines(1))
      >>> print(''.join(diff), end="")
      - one
      ?  ^
      + ore
      ?  ^
      - two
      - three
      ?  -
      + tree
      + emu
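
A sketch that keeps only the removed and added lines of an :func:`ndiff` delta, dropping unchanged lines and ``'? '`` guide lines, and that passes *linejunk* explicitly. The helper ``changed_lines`` and its ``ignore_blank`` flag are hypothetical.

```python
from difflib import IS_LINE_JUNK, ndiff


def changed_lines(a, b, ignore_blank=False):
    """Hypothetical helper: return only the '- ' and '+ ' lines of ndiff(a, b).

    With ignore_blank=True, IS_LINE_JUNK is passed as linejunk, so blank lines
    (and lines holding a lone '#') are treated as junk during matching.
    """
    linejunk = IS_LINE_JUNK if ignore_blank else None
    return [line for line in ndiff(a, b, linejunk=linejunk)
            if line.startswith(("- ", "+ "))]


old = "one\ntwo\nthree\n".splitlines(True)
new = "ore\ntree\nemu\n".splitlines(True)
for line in changed_lines(old, new):
    print(line, end="")
```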


.. function:: restore(sequence, which)

   Return one of the two sequences that generated a delta.

   Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
   lines originating from file 1 or 2 (parameter *which*), stripping off line
   prefixes.

   Example::

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
      ...              'ore\ntree\nemu\n'.splitlines(1))
      >>> diff = list(diff) # materialize the generated delta into a list
      >>> print(''.join(restore(diff, 1)), end="")
      one
      two
      three
      >>> print(''.join(restore(diff, 2)), end="")
      ore
      tree
      emu
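
Since :func:`restore` only strips prefixes, it should recover both inputs exactly. The sketch below checks that round trip; ``round_trips`` is a hypothetical helper. The delta is materialized with :func:`list` first because a generator can only be consumed once.

```python
from difflib import ndiff, restore


def round_trips(a, b):
    """Hypothetical helper: check that restore() recovers both inputs."""
    delta = list(ndiff(a, b))  # a generator would be exhausted after one pass
    return list(restore(delta, 1)) == a and list(restore(delta, 2)) == b


print(round_trips("one\ntwo\nthree\n".splitlines(True),
                  "ore\ntree\nemu\n".splitlines(True)))  # True
```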


.. function:: unified_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in unified diff format.

   Unified diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in an inline style (instead of
   separate before/after blocks). The number of context lines is set by *n* which
   defaults to three.

   By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
   created with a trailing newline. This is helpful so that inputs created from
   :func:`file.readlines` result in diffs that are suitable for use with
   :func:`file.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The unified diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
   expressed in the format returned by :func:`time.ctime`. If not specified, the
   strings default to blanks.

   ::

      >>> import sys
      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py',
      ...                                    tofile='after.py'))
      --- before.py
      +++ after.py
      @@ -1,4 +1,4 @@
      -bacon
      -eggs
      -ham
      +python
      +eggy
      +hamster
       guido

   See :ref:`difflib-interface` for a more detailed example.
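
The *n* argument controls how much unchanged context separates hunks. In this sketch (the sample data and the ``count_hunks`` helper are hypothetical), two changes separated by two unchanged lines form separate hunks with ``n=0`` but share one hunk with ``n=1``. The inputs have no newlines, so *lineterm* is set to ``""``.

```python
from difflib import unified_diff


def count_hunks(a, b, n):
    """Hypothetical helper: number of '@@' hunk headers for context size n."""
    return sum(1 for line in unified_diff(a, b, n=n, lineterm="")
               if line.startswith("@@"))


old = ["a", "b", "c", "d", "e"]
new = ["a", "B", "c", "d", "E"]
print(count_hunks(old, new, n=0), count_hunks(old, new, n=1))  # 2 1
```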


.. function:: IS_LINE_JUNK(line)

   Return true for ignorable lines. The line *line* is ignorable if *line* is
   blank or contains a single ``'#'``, otherwise it is not ignorable. Used as a
   default for parameter *linejunk* in :func:`ndiff` before Python 2.3.


.. function:: IS_CHARACTER_JUNK(ch)

   Return true for ignorable characters. The character *ch* is ignorable if *ch*
   is a space or tab, otherwise it is not ignorable. Used as a default for
   parameter *charjunk* in :func:`ndiff`.
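
A quick check of both predicates; the sample inputs are chosen for illustration only.

```python
from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK

# Blank lines, and lines holding a single '#' plus whitespace, are junk.
line_samples = ["\n", "   \n", "  #  \n", "## heading\n", "x = 1\n"]
print([IS_LINE_JUNK(line) for line in line_samples])
# [True, True, True, False, False]

# Only a blank or a tab counts as a junk character; newline does not.
char_samples = [" ", "\t", "\n", "x"]
print([IS_CHARACTER_JUNK(ch) for ch in char_samples])
# [True, True, False, False]
```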


.. seealso::

   `Pattern Matching: The Gestalt Approach <http://www.ddj.com/184407970?pgno=5>`_
      Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
      was published in `Dr. Dobb's Journal <http://www.ddj.com/>`_ in July, 1988.


.. _sequence-matcher:

SequenceMatcher Objects
-----------------------

The :class:`SequenceMatcher` class has this constructor:


.. class:: SequenceMatcher([isjunk[, a[, b]]])

   Optional argument *isjunk* must be ``None`` (the default) or a one-argument
   function that takes a sequence element and returns true if and only if the
   element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
   equivalent to passing ``lambda x: 0``; in other words, no elements are ignored.
   For example, pass::

      lambda x: x in " \t"

   if you're comparing lines as sequences of characters, and don't want to synch up
   on blanks or hard tabs.

   The optional arguments *a* and *b* are sequences to be compared; both default to
   empty strings. The elements of both sequences must be :term:`hashable`.

   :class:`SequenceMatcher` objects have the following methods:


.. method:: SequenceMatcher.set_seqs(a, b)

   Set the two sequences to be compared.

   :class:`SequenceMatcher` computes and caches detailed information about the
   second sequence, so if you want to compare one sequence against many sequences,
   use :meth:`set_seq2` to set the commonly used sequence once and call
   :meth:`set_seq1` repeatedly, once for each of the other sequences.
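
A sketch of the one-against-many pattern described above. The helper ``score_against`` and the sample words are hypothetical; the point is that :meth:`set_seq2` is called once, outside the loop.

```python
from difflib import SequenceMatcher


def score_against(target, candidates):
    """Hypothetical helper: ratio() of each candidate against one target.

    The target is installed once with set_seq2(), so the information
    SequenceMatcher caches about the second sequence is reused; only
    set_seq1() changes inside the loop.
    """
    matcher = SequenceMatcher()
    matcher.set_seq2(target)
    scores = {}
    for candidate in candidates:
        matcher.set_seq1(candidate)
        scores[candidate] = matcher.ratio()
    return scores


print(score_against("apple", ["apple", "appel", "ape"]))
# {'apple': 1.0, 'appel': 0.8, 'ape': 0.75}
```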


.. method:: SequenceMatcher.set_seq1(a)

   Set the first sequence to be compared. The second sequence to be compared is
   not changed.


.. method:: SequenceMatcher.set_seq2(b)

   Set the second sequence to be compared. The first sequence to be compared is
   not changed.


.. method:: SequenceMatcher.find_longest_match(alo, ahi, blo, bhi)

   Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.

   If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns ``(i, j,
   k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo <= i <= i+k <=
   ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j', k')`` meeting those
   conditions, the additional conditions ``k >= k'``, ``i <= i'``, and if ``i ==
   i'``, ``j <= j'`` are also met. In other words, of all maximal matching blocks,
   return one that starts earliest in *a*, and of all those maximal matching blocks
   that start earliest in *a*, return the one that starts earliest in *b*. ::

      >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
      >>> s.find_longest_match(0, 5, 0, 9)
      Match(a=0, b=4, size=5)

   If *isjunk* was provided, first the longest matching block is determined as
   above, but with the additional restriction that no junk element appears in the
   block. Then that block is extended as far as possible by matching (only) junk
   elements on both sides. So the resulting block never matches on junk except as
   identical junk happens to be adjacent to an interesting match.

   Here's the same example as before, but considering blanks to be junk. That
   prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the second
   sequence directly. Instead only the ``'abcd'`` can match, and matches the
   leftmost ``'abcd'`` in the second sequence::

      >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
      >>> s.find_longest_match(0, 5, 0, 9)
      Match(a=1, b=0, size=4)

   If no blocks match, this returns ``(alo, blo, 0)``.

   .. versionchanged:: 2.6
      This method returns a :term:`named tuple` ``Match(a, b, size)``.
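
Because the result is a named tuple, its fields can slice both sequences directly. A small sketch reusing the first example's inputs:

```python
from difflib import SequenceMatcher

a, b = " abcd", "abcd abcd"
match = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))

# Named fields read more clearly than positional indexes.
print(match.a, match.b, match.size)        # 0 4 5
print(a[match.a:match.a + match.size] ==
      b[match.b:match.b + match.size])     # True
```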


.. method:: SequenceMatcher.get_matching_blocks()

   Return list of triples describing matching subsequences. Each triple is of the
   form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``. The triples are
   monotonically increasing in *i* and *j*.

   The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It is
   the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')`` are
   adjacent triples in the list, and the second is not the last triple in the list,
   then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent triples always
   describe non-adjacent equal blocks.

   .. XXX Explain why a dummy is used!

   ::

      >>> s = SequenceMatcher(None, "abxcd", "abcd")
      >>> s.get_matching_blocks()
      [(0, 0, 2), (3, 2, 2), (5, 4, 0)]
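
The guarantees above can be checked mechanically. This sketch (the ``check_blocks`` helper is hypothetical) asserts each documented property; tuple unpacking works whether the blocks come back as plain tuples or as named tuples.

```python
from difflib import SequenceMatcher


def check_blocks(a, b):
    """Hypothetical helper: verify documented get_matching_blocks() properties."""
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    *real, dummy = blocks
    assert tuple(dummy) == (len(a), len(b), 0)       # trailing dummy triple
    for i, j, n in real:
        assert n > 0 and a[i:i + n] == b[j:j + n]    # each block really matches
    for (i, j, n), (i2, j2, _) in zip(real, real[1:]):
        assert i + n <= i2 and j + n <= j2           # increasing in i and j
        assert i + n != i2 or j + n != j2            # never two adjacent blocks
    return real


print(check_blocks("abxcd", "abcd"))
```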


.. method:: SequenceMatcher.get_opcodes()

   Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is of
   the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 == 0``, and
   remaining tuples have *i1* equal to the *i2* from the preceding tuple, and,
   likewise, *j1* equal to the previous *j2*.

   The *tag* values are strings, with these meanings:

   +---------------+---------------------------------------------+
   | Value         | Meaning                                     |
   +===============+=============================================+
   | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
   |               | ``b[j1:j2]``.                               |
   +---------------+---------------------------------------------+
   | ``'delete'``  | ``a[i1:i2]`` should be deleted. Note that   |
   |               | ``j1 == j2`` in this case.                  |
   +---------------+---------------------------------------------+
   | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
   |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
   |               | this case.                                  |
   +---------------+---------------------------------------------+
   | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
   |               | are equal).                                 |
   +---------------+---------------------------------------------+

   For example::

      >>> a = "qabxcd"
      >>> b = "abycdf"
      >>> s = SequenceMatcher(None, a, b)
      >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
      ...     print("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
      ...           (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
       delete a[0:1] (q) b[0:0] ()
        equal a[1:3] (ab) b[0:2] (ab)
      replace a[3:4] (x) b[2:3] (y)
        equal a[4:6] (cd) b[3:5] (cd)
       insert a[6:6] () b[5:6] (f)
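
Because the tuples cover both sequences end to end, replaying them over *a* should reproduce *b*. This sketch does that for strings, relying only on the tag meanings in the table above; ``apply_opcodes`` is a hypothetical name.

```python
from difflib import SequenceMatcher


def apply_opcodes(a, b):
    """Hypothetical helper: rebuild string b by replaying get_opcodes() over a."""
    pieces = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            pieces.append(a[i1:i2])   # keep the shared run from a
        elif tag in ("replace", "insert"):
            pieces.append(b[j1:j2])   # take the new material from b
        # 'delete': a[i1:i2] is dropped by appending nothing
    return "".join(pieces)


print(apply_opcodes("qabxcd", "abycdf"))  # abycdf
```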


.. method:: SequenceMatcher.get_grouped_opcodes([n])

   Return a :term:`generator` of groups with up to *n* lines of context
   (*n* defaults to three).

   Starting with the opcodes returned by :meth:`get_opcodes`, this method splits out
   smaller change clusters and eliminates intervening ranges which have no changes.

   Each group is a list of tuples in the same format as those returned by
   :meth:`get_opcodes`.
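
The effect of *n* is easiest to see with two isolated changes. In this sketch (the sample strings are hypothetical), six unchanged elements separate the changes: with ``n=1`` they land in separate groups, while with ``n=3`` the two context windows meet and a single group results.

```python
from difflib import SequenceMatcher

old = "abcdefghij"
new = "aXcdefghYj"  # two isolated changes, six unchanged elements between them

matcher = SequenceMatcher(None, old, new)
for n in (1, 3):
    groups = list(matcher.get_grouped_opcodes(n))
    print(n, len(groups))
# 1 2
# 3 1
```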


.. method:: SequenceMatcher.ratio()

   Return a measure of the sequences' similarity as a float in the range [0, 1].

   Where T is the total number of elements in both sequences, and M is the number
   of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the sequences are
   identical, and ``0.0`` if they have nothing in common.

   This is expensive to compute if :meth:`get_matching_blocks` or
   :meth:`get_opcodes` hasn't already been called, in which case you may want to
   try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an upper bound.
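
The formula can be checked by hand: sum the sizes of the matching blocks to get M, and add the two lengths to get T. ``ratio_by_hand`` is a hypothetical helper; treating two empty sequences as identical (``1.0``) is an assumption intended to match :meth:`ratio` in that case.

```python
from difflib import SequenceMatcher


def ratio_by_hand(a, b):
    """Hypothetical helper: recompute ratio() as 2.0*M / T from matching blocks."""
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    matches = sum(size for _, _, size in blocks)  # M
    total = len(a) + len(b)                       # T
    # Assumption: two empty sequences count as identical.
    return 2.0 * matches / total if total else 1.0


print(ratio_by_hand("abcd", "bcde"))                   # 0.75
print(SequenceMatcher(None, "abcd", "bcde").ratio())   # 0.75
```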


.. method:: SequenceMatcher.quick_ratio()

   Return an upper bound on :meth:`ratio` relatively quickly.

   This isn't defined beyond that it is an upper bound on :meth:`ratio`, and is
   faster to compute.


.. method:: SequenceMatcher.real_quick_ratio()

   Return an upper bound on :meth:`ratio` very quickly.

   This isn't defined beyond that it is an upper bound on :meth:`ratio`, and is
   faster to compute than either :meth:`ratio` or :meth:`quick_ratio`.

The three methods that return the ratio of matching to total characters can give
different results due to differing levels of approximation, although
:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
:meth:`ratio`::

   >>> s = SequenceMatcher(None, "abcd", "bcde")
   >>> s.ratio()
   0.75
   >>> s.quick_ratio()
   0.75
   >>> s.real_quick_ratio()
   1.0
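
Because both cheaper methods are upper bounds, they can serve as early-exit filters before the exact :meth:`ratio`. A sketch of that cascade, with a hypothetical ``similar_enough`` helper:

```python
from difflib import SequenceMatcher


def similar_enough(a, b, cutoff=0.6):
    """Hypothetical helper: cheapest test first, exact ratio() last.

    Each cheaper method is an upper bound on ratio(), so if it already falls
    below the cutoff, the more expensive checks are skipped.
    """
    matcher = SequenceMatcher(None, a, b)
    return (matcher.real_quick_ratio() >= cutoff
            and matcher.quick_ratio() >= cutoff
            and matcher.ratio() >= cutoff)


print(similar_enough("abcd", "bcde", cutoff=0.7))  # True
print(similar_enough("abcd", "wxyz", cutoff=0.5))  # False
```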


.. _sequencematcher-examples:

SequenceMatcher Examples
------------------------

This example compares two strings, considering blanks to be "junk"::

   >>> s = SequenceMatcher(lambda x: x == " ",
   ...                     "private Thread currentThread;",
   ...                     "private volatile Thread currentThread;")

:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
sequences. As a rule of thumb, a :meth:`ratio` value over 0.6 means the
sequences are close matches::

   >>> print(round(s.ratio(), 3))
   0.866

If you're only interested in where the sequences match,
:meth:`get_matching_blocks` is handy::

   >>> for block in s.get_matching_blocks():
   ...     print("a[%d] and b[%d] match for %d elements" % block)
   a[0] and b[0] match for 8 elements
   a[8] and b[17] match for 21 elements
   a[29] and b[38] match for 0 elements

Note that the last tuple returned by :meth:`get_matching_blocks` is always a
dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
tuple element (number of elements matched) is ``0``.

If you want to know how to change the first sequence into the second, use
:meth:`get_opcodes`::

   >>> for opcode in s.get_opcodes():
   ...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
    equal a[0:8] b[0:8]
   insert a[8:8] b[8:17]
    equal a[8:29] b[17:38]

See also the function :func:`get_close_matches` in this module, which shows how
simple code building on :class:`SequenceMatcher` can be used to do useful work.


.. _differ-objects:

Differ Objects
--------------

Note that :class:`Differ`\ -generated deltas make no claim to be minimal
diffs. To the contrary, minimal diffs are often counter-intuitive, because they
synch up anywhere possible, sometimes on accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.

The :class:`Differ` class has this constructor:


.. class:: Differ([linejunk[, charjunk]])

   Optional keyword parameters *linejunk* and *charjunk* are for filter functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns true
   if the string is junk. The default is ``None``, meaning that no line is
   considered junk.

   *charjunk*: A function that accepts a single character argument (a string of
   length 1), and returns true if the character is junk. The default is ``None``,
   meaning that no character is considered junk.

   :class:`Differ` objects are used (deltas generated) via a single method:


.. method:: Differ.compare(a, b)

   Compare two sequences of lines, and generate the delta (a sequence of lines).

   Each sequence must contain individual single-line strings ending with newlines.
   Such sequences can be obtained from the :meth:`readlines` method of file-like
   objects. The delta generated also consists of newline-terminated strings, ready
   to be printed as-is via the :meth:`writelines` method of a file-like object.
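
A sketch of the file-like workflow described above, using :class:`io.StringIO` objects as stand-ins for real files (their contents are made up for illustration):

```python
import io
from difflib import Differ

# In-memory stand-ins for real files (hypothetical contents).
before = io.StringIO("bacon\neggs\nham\nguido\n")
after = io.StringIO("python\neggy\nhamster\nguido\n")
out = io.StringIO()

# readlines() gives newline-terminated lines; the delta lines keep their
# newlines, so writelines() can emit them unchanged.
out.writelines(Differ().compare(before.readlines(), after.readlines()))
print(out.getvalue(), end="")
```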


.. _differ-examples:

Differ Example
--------------

This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the :meth:`readlines` method of file-like objects)::

   >>> text1 = '''  1. Beautiful is better than ugly.
   ...   2. Explicit is better than implicit.
   ...   3. Simple is better than complex.
   ...   4. Complex is better than complicated.
   ... '''.splitlines(1)
   >>> len(text1)
   4
   >>> text1[0][-1]
   '\n'
   >>> text2 = '''  1. Beautiful is better than ugly.
   ...   3.   Simple is better than complex.
   ...   4. Complicated is better than complex.
   ...   5. Flat is better than nested.
   ... '''.splitlines(1)

Next we instantiate a Differ object::

   >>> d = Differ()

Note that when instantiating a :class:`Differ` object we may pass functions to
filter out line and character "junk." See the :class:`Differ` constructor for
details.

Finally, we compare the two::

   >>> result = list(d.compare(text1, text2))

``result`` is a list of strings, so let's pretty-print it::

   >>> from pprint import pprint
   >>> pprint(result)
   ['    1. Beautiful is better than ugly.\n',
    '-   2. Explicit is better than implicit.\n',
    '-   3. Simple is better than complex.\n',
    '+   3.   Simple is better than complex.\n',
    '?     ++\n',
    '-   4. Complex is better than complicated.\n',
    '?            ^                     ---- ^\n',
    '+   4. Complicated is better than complex.\n',
    '?           ++++ ^                      ^\n',
    '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this::

   >>> import sys
   >>> sys.stdout.writelines(result)
       1. Beautiful is better than ugly.
   -   2. Explicit is better than implicit.
   -   3. Simple is better than complex.
   +   3.   Simple is better than complex.
   ?     ++
   -   4. Complex is better than complicated.
   ?            ^                     ---- ^
   +   4. Complicated is better than complex.
   ?           ++++ ^                      ^
   +   5. Flat is better than nested.


.. _difflib-interface:

A command-line interface to difflib
-----------------------------------

This example shows how to use difflib to create a ``diff``-like utility.
It is also contained in the Python source distribution, as
:file:`Tools/scripts/diff.py`.

::

   """ Command line interface to difflib.py providing diffs in four formats:

   * ndiff:    lists every line and highlights interline changes.
   * context:  highlights clusters of changes in a before/after format.
   * unified:  highlights clusters of changes in an inline format.
   * html:     generates side by side comparison with change highlights.

   """

   import sys, os, time, difflib, optparse

   def main():
       # Configure the option parser
       usage = "usage: %prog [options] fromfile tofile"
       parser = optparse.OptionParser(usage)
       parser.add_option("-c", action="store_true", default=False,
                         help='Produce a context format diff (default)')
       parser.add_option("-u", action="store_true", default=False,
                         help='Produce a unified format diff')
       hlp = 'Produce HTML side by side diff (can use -c and -l in conjunction)'
       parser.add_option("-m", action="store_true", default=False, help=hlp)
       parser.add_option("-n", action="store_true", default=False,
                         help='Produce an ndiff format diff')
       parser.add_option("-l", "--lines", type="int", default=3,
                         help='Set number of context lines (default 3)')
       (options, args) = parser.parse_args()

       if len(args) == 0:
           parser.print_help()
           sys.exit(1)
       if len(args) != 2:
           parser.error("need to specify both a fromfile and tofile")

       n = options.lines
       fromfile, tofile = args # as specified in the usage string

       # we're passing these as arguments to the diff function
       fromdate = time.ctime(os.stat(fromfile).st_mtime)
       todate = time.ctime(os.stat(tofile).st_mtime)
       # read both files, closing them as soon as their lines are loaded
       with open(fromfile) as ff:
           fromlines = ff.readlines()
       with open(tofile) as tf:
           tolines = tf.readlines()

       if options.u:
           diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
                                       fromdate, todate, n=n)
       elif options.n:
           diff = difflib.ndiff(fromlines, tolines)
       elif options.m:
           diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
                                               tofile, context=options.c,
                                               numlines=n)
       else:
           diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
                                       fromdate, todate, n=n)

       # writelines consumes the generated delta lines (or the HTML string)
       sys.stdout.writelines(diff)

   if __name__ == '__main__':
       main()