:mod:`difflib` --- Helpers for computing deltas
===============================================

.. module:: difflib
   :synopsis: Helpers for computing differences between objects.
.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>


.. versionadded:: 2.1

This module provides classes and functions for comparing sequences. It
can be used, for example, for comparing files, and can produce difference
information in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also the :mod:`filecmp` module.

.. class:: SequenceMatcher

   This is a flexible class for comparing pairs of sequences of any type, so long
   as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a
   little fancier than, an algorithm published in the late 1980s by Ratcliff and
   Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
   find the longest contiguous matching subsequence that contains no "junk"
   elements (the Ratcliff and Obershelp algorithm doesn't address junk). The same
   idea is then applied recursively to the pieces of the sequences to the left and
   to the right of the matching subsequence. This does not yield minimal edit
   sequences, but does tend to yield matches that "look right" to people.

   Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst
   case and quadratic time in the expected case. :class:`SequenceMatcher` is
   quadratic time for the worst case and has expected-case behavior dependent in a
   complicated way on how many elements the sequences have in common; best case
   time is linear.


.. class:: Differ

   This is a class for comparing sequences of lines of text, and producing
   human-readable differences or deltas. Differ uses :class:`SequenceMatcher`
   both to compare sequences of lines, and to compare sequences of characters
   within similar (near-matching) lines.

   Each line of a :class:`Differ` delta begins with a two-letter code:

   +----------+-------------------------------------------+
   | Code     | Meaning                                   |
   +==========+===========================================+
   | ``'- '`` | line unique to sequence 1                 |
   +----------+-------------------------------------------+
   | ``'+ '`` | line unique to sequence 2                 |
   +----------+-------------------------------------------+
   | ``'  '`` | line common to both sequences             |
   +----------+-------------------------------------------+
   | ``'? '`` | line not present in either input sequence |
   +----------+-------------------------------------------+

   Lines beginning with '``?``' attempt to guide the eye to intraline differences,
   and were not present in either input sequence. These lines can be confusing if
   the sequences contain tab characters.
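The four codes are easiest to see on a concrete delta. The following is a minimal sketch with two made-up three-line inputs (the lists ``old`` and ``new`` are illustrative assumptions); it uses only :meth:`Differ.compare` and :func:`restore` as documented in this module.

```python
import difflib

# Two made-up "files", each a list of newline-terminated lines.
old = ["one\n", "two\n", "three\n"]
new = ["one\n", "tree\n", "four\n"]

delta = list(difflib.Differ().compare(old, new))
for line in delta:
    # The first two characters are the code; the rest is the line itself.
    print("%r %s" % (line[:2], line[2:].rstrip("\n")))

# Keeping the '  ' lines plus one side's '- ' or '+ ' lines, and dropping
# every '? ' guide line, recovers the corresponding input.
recovered_old = list(difflib.restore(delta, 1))
recovered_new = list(difflib.restore(delta, 2))
```

Because ``'? '`` lines are guide lines rather than content, :func:`restore` skips them, which is why both inputs come back unchanged.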


.. class:: HtmlDiff

   This class can be used to create an HTML table (or a complete HTML file
   containing the table) showing a side by side, line by line comparison of text
   with inter-line and intra-line change highlights. The table can be generated in
   either full or contextual difference mode.

   The constructor for this class is:


   .. function:: __init__([tabsize][, wrapcolumn][, linejunk][, charjunk])

      Initializes an instance of :class:`HtmlDiff`.

      tabsize is an optional keyword argument to specify tab stop spacing and
      defaults to ``8``.

      wrapcolumn is an optional keyword argument to specify the column number
      where lines are broken and wrapped; it defaults to ``None``, in which case
      lines are not wrapped.

      linejunk and charjunk are optional keyword arguments passed into ``ndiff()``
      (used by :class:`HtmlDiff` to generate the side by side HTML differences).
      See the ``ndiff()`` documentation for argument default values and
      descriptions.

   The following methods are public:


   .. function:: make_file(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])

      Compares fromlines and tolines (lists of strings) and returns a string which
      is a complete HTML file containing a table showing line by line differences
      with inter-line and intra-line changes highlighted.

      fromdesc and todesc are optional keyword arguments to specify from/to file
      column header strings (both default to an empty string).

      context and numlines are both optional keyword arguments. Set context to
      ``True`` when contextual differences are to be shown; the default is
      ``False``, which shows the full files. numlines defaults to ``5``. When
      context is ``True``, numlines controls the number of context lines which
      surround the difference highlights. When context is ``False``, numlines
      controls the number of lines which are shown before a difference highlight
      when using the "next" hyperlinks (setting it to zero would cause the "next"
      hyperlinks to place the next difference highlight at the top of the browser
      without any leading context).


   .. function:: make_table(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])

      Compares fromlines and tolines (lists of strings) and returns a string which
      is a complete HTML table showing line by line differences with inter-line and
      intra-line changes highlighted.

      The arguments for this method are the same as those for the :meth:`make_file`
      method.

   :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
   contains a good example of its use.

   .. versionadded:: 2.4
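As an illustration of the two public methods, the sketch below renders the same made-up pair of line lists once as a contextual table fragment and once as a full-file HTML page. The file descriptions and the tabsize, wrapcolumn and numlines values are arbitrary choices for the example, not defaults.

```python
import difflib

old = ["bacon\n", "eggs\n", "ham\n", "guido\n"]
new = ["python\n", "eggy\n", "hamster\n", "guido\n"]

differ = difflib.HtmlDiff(tabsize=4, wrapcolumn=60)

# A bare <table> fragment: only the changes plus one line of context.
table = differ.make_table(old, new, fromdesc="before.py", todesc="after.py",
                          context=True, numlines=1)

# A complete HTML document showing the full files side by side.
page = differ.make_file(old, new, fromdesc="before.py", todesc="after.py")
```

:meth:`make_table` suits the case where the surrounding HTML is produced elsewhere; :meth:`make_file` returns a string that can be written to disk as-is.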


.. function:: context_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])

   Compare a and b (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in context diff format.

   Context diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in a before/after style. The
   number of context lines is set by n, which defaults to three.

   By default, the diff control lines (those with ``***`` or ``---``) are created
   with a trailing newline. This is helpful so that inputs created from
   :func:`file.readlines` result in diffs that are suitable for use with
   :func:`file.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the lineterm argument to
   ``""`` so that the output will be uniformly newline free.

   The context diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for fromfile,
   tofile, fromfiledate, and tofiledate. The modification times are normally
   expressed in the format returned by :func:`time.ctime`. If not specified, the
   strings default to blanks.

   ::

      >>> import sys
      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> for line in context_diff(s1, s2, fromfile='before.py', tofile='after.py'):
      ...     sys.stdout.write(line)
      *** before.py
      --- after.py
      ***************
      *** 1,4 ****
      ! bacon
      ! eggs
      ! ham
        guido
      --- 1,4 ----
      ! python
      ! eggy
      ! hamster
        guido

   See :ref:`difflib-interface` for a more detailed example.

   .. versionadded:: 2.3
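The lineterm advice above can be sketched as follows, assuming the inputs come from :meth:`str.splitlines` so that no line ends in a newline. The sample data reuses the lines from the example above; ``n=1`` is an arbitrary choice to show a smaller context.

```python
import difflib

# Inputs without trailing newlines, e.g. produced by str.splitlines().
before = "bacon\neggs\nham\nguido".splitlines()
after = "python\neggy\nhamster\nguido".splitlines()

# lineterm="" keeps the control lines newline-free as well, so every line
# of the delta has the same shape and the result can be joined with "\n".
delta = list(difflib.context_diff(before, after,
                                  fromfile="before.py", tofile="after.py",
                                  n=1, lineterm=""))
print("\n".join(delta))
```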


.. function:: get_close_matches(word, possibilities[, n][, cutoff])

   Return a list of the best "good enough" matches. word is a sequence for which
   close matches are desired (typically a string), and possibilities is a list of
   sequences against which to match word (typically a list of strings).

   Optional argument n (default ``3``) is the maximum number of close matches to
   return; n must be greater than ``0``.

   Optional argument cutoff (default ``0.6``) is a float in the range [0, 1].
   Possibilities that don't score at least that similar to word are ignored.

   The best (no more than n) matches among the possibilities are returned in a
   list, sorted by similarity score, most similar first. ::

      >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
      ['apple', 'ape']
      >>> import keyword
      >>> get_close_matches('wheel', keyword.kwlist)
      ['while']
      >>> get_close_matches('apple', keyword.kwlist)
      []
      >>> get_close_matches('accept', keyword.kwlist)
      ['except']
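A short sketch of the n and cutoff arguments, reusing the word list from the first example above. The scores mentioned in the comments (0.8 for ``'apple'``, 0.75 for ``'ape'``) follow from the 2.0\*M / T definition of :meth:`SequenceMatcher.ratio`.

```python
import difflib

words = ["ape", "apple", "peach", "puppy"]

# n caps how many matches come back, best first ('apple' scores 0.8).
best_only = difflib.get_close_matches("appel", words, n=1)
print(best_only)    # ['apple']

# cutoff raises the bar: nothing here scores 0.9 or more.
strict = difflib.get_close_matches("appel", words, cutoff=0.9)
print(strict)       # []

# n must be positive; anything else is rejected with ValueError.
try:
    difflib.get_close_matches("appel", words, n=0)
except ValueError as exc:
    print("rejected: %s" % exc)
```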


.. function:: ndiff(a, b[, linejunk][, charjunk])

   Compare a and b (lists of strings); return a :class:`Differ`\ -style
   delta (a :term:`generator` generating the delta lines).

   Optional keyword parameters linejunk and charjunk are for filter functions
   (or ``None``):

   linejunk: A function that accepts a single string argument, and returns true
   if the string is junk, or false if not. The default is ``None``, starting with
   Python 2.3. Before then, the default was the module-level function
   :func:`IS_LINE_JUNK`, which filters out lines without visible characters, except
   for at most one pound character (``'#'``). As of Python 2.3, the underlying
   :class:`SequenceMatcher` class does a dynamic analysis of which lines are so
   frequent as to constitute noise, and this usually works better than the pre-2.3
   default.

   charjunk: A function that accepts a character (a string of length 1), and
   returns true if the character is junk, or false if not. The default is the
   module-level function :func:`IS_CHARACTER_JUNK`, which filters out whitespace
   characters (a blank or tab; including newline here is a bad idea).

   :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function. ::

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
      ...              'ore\ntree\nemu\n'.splitlines(1))
      >>> print ''.join(diff),
      - one
      ?  ^
      + ore
      ?  ^
      - two
      - three
      ?  -
      + tree
      + emu
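To get the pre-2.3 behaviour back, pass the module-level :func:`IS_LINE_JUNK` explicitly as linejunk. The sketch below does that for two made-up code snippets; printing the delta shows how the blank and ``'#'`` lines are handled once they no longer count as ordinary lines.

```python
import difflib

old = ["def f():\n", "\n", "    return 1\n", "#\n"]
new = ["def f():\n", "    return 2\n", "\n", "#\n"]

# Blank lines and lines holding only a '#' become junk: they cannot anchor
# a line-level match on their own (they can only extend an adjacent one).
delta = list(difflib.ndiff(old, new, linejunk=difflib.IS_LINE_JUNK))
print("".join(delta))
```

Whatever pairing the junk analysis settles on, the delta remains a valid :class:`Differ`\ -style delta, so :func:`restore` still recovers both inputs from it.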


.. function:: restore(sequence, which)

   Return one of the two sequences that generated a delta.

   Given a sequence produced by :meth:`Differ.compare` or :func:`ndiff`, extract
   lines originating from file 1 or 2 (parameter which), stripping off line
   prefixes.

   Example::

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
      ...              'ore\ntree\nemu\n'.splitlines(1))
      >>> diff = list(diff) # materialize the generated delta into a list
      >>> print ''.join(restore(diff, 1)),
      one
      two
      three
      >>> print ''.join(restore(diff, 2)),
      ore
      tree
      emu


.. function:: unified_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])

   Compare a and b (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in unified diff format.

   Unified diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in an inline style (instead of
   separate before/after blocks). The number of context lines is set by n, which
   defaults to three.

   By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
   created with a trailing newline. This is helpful so that inputs created from
   :func:`file.readlines` result in diffs that are suitable for use with
   :func:`file.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the lineterm argument to
   ``""`` so that the output will be uniformly newline free.

   The unified diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for fromfile,
   tofile, fromfiledate, and tofiledate. The modification times are normally
   expressed in the format returned by :func:`time.ctime`. If not specified, the
   strings default to blanks.

   ::

      >>> import sys
      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> for line in unified_diff(s1, s2, fromfile='before.py', tofile='after.py'):
      ...     sys.stdout.write(line)
      --- before.py
      +++ after.py
      @@ -1,4 +1,4 @@
      -bacon
      -eggs
      -ham
      +python
      +eggy
      +hamster
       guido

   See :ref:`difflib-interface` for a more detailed example.

   .. versionadded:: 2.3
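A sketch of the header and context options together: the timestamps below are made-up strings in the :func:`time.ctime` style, and ``n=0`` suppresses context lines so the hunk holds only the changed lines.

```python
import difflib

before = ["bacon\n", "eggs\n", "ham\n", "guido\n"]
after = ["python\n", "eggy\n", "hamster\n", "guido\n"]

# Made-up modification times, written in the time.ctime() style.
delta = list(difflib.unified_diff(before, after,
                                  fromfile="before.py", tofile="after.py",
                                  fromfiledate="Sat Jan 26 23:30:50 1991",
                                  tofiledate="Fri Jun 06 10:20:52 2003",
                                  n=0))   # no context lines at all
for line in delta:
    print(line.rstrip("\n"))
```

With these inputs the single hunk header reads ``@@ -1,3 +1,3 @@``: three lines starting at line 1 on each side, with the unchanged ``guido`` line left out.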


.. function:: IS_LINE_JUNK(line)

   Return true for ignorable lines. A line is ignorable if it is blank or
   contains only a single ``'#'`` (plus optional whitespace), otherwise it is not
   ignorable. Used as a default for parameter linejunk in :func:`ndiff` before
   Python 2.3.


.. function:: IS_CHARACTER_JUNK(ch)

   Return true for ignorable characters. A character is ignorable if it is a
   space or tab, otherwise it is not ignorable. Used as a default for parameter
   charjunk in :func:`ndiff`.
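The two predicates can be checked directly. The sample strings below are arbitrary; the comments restate the rules given above.

```python
import difflib

line_samples = ["\n", "   \n", "  #  \n", "## heading\n", "x = 1  # note\n"]
for line in line_samples:
    # True only for blank lines, or lines holding a single '#' plus whitespace.
    print("%-18r -> %s" % (line, difflib.IS_LINE_JUNK(line)))

for ch in [" ", "\t", "\n", "x"]:
    # True only for a blank or a tab; a newline is not junk.
    print("%-6r -> %s" % (ch, difflib.IS_CHARACTER_JUNK(ch)))
```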


.. seealso::

   `Pattern Matching: The Gestalt Approach <http://www.ddj.com/184407970?pgno=5>`_
      Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
      was published in `Dr. Dobb's Journal <http://www.ddj.com/>`_ in July, 1988.


.. _sequence-matcher:

SequenceMatcher Objects
-----------------------

The :class:`SequenceMatcher` class has this constructor:


.. class:: SequenceMatcher([isjunk[, a[, b]]])

   Optional argument isjunk must be ``None`` (the default) or a one-argument
   function that takes a sequence element and returns true if and only if the
   element is "junk" and should be ignored. Passing ``None`` for isjunk is
   equivalent to passing ``lambda x: 0``; in other words, no elements are ignored.
   For example, pass::

      lambda x: x in " \t"

   if you're comparing lines as sequences of characters, and don't want to synch up
   on blanks or hard tabs.

   The optional arguments a and b are sequences to be compared; both default to
   empty strings. The elements of both sequences must be :term:`hashable`.

   :class:`SequenceMatcher` objects have the following methods:


.. method:: SequenceMatcher.set_seqs(a, b)

   Set the two sequences to be compared.

:class:`SequenceMatcher` computes and caches detailed information about the
second sequence, so if you want to compare one sequence against many sequences,
use :meth:`set_seq2` to set the commonly used sequence once and call
:meth:`set_seq1` repeatedly, once for each of the other sequences.


.. method:: SequenceMatcher.set_seq1(a)

   Set the first sequence to be compared. The second sequence to be compared is
   not changed.


.. method:: SequenceMatcher.set_seq2(b)

   Set the second sequence to be compared. The first sequence to be compared is
   not changed.
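A minimal sketch of the one-against-many pattern described for :meth:`set_seqs`: the shared sequence is installed once with :meth:`set_seq2`, and only :meth:`set_seq1` changes inside the loop. The candidate strings are made up for illustration.

```python
import difflib

target = "private volatile Thread currentThread;"
candidates = ["private Thread currentThread;",
              "public volatile Thread currentThread;",
              "private volatile Thread currentThread;"]

matcher = difflib.SequenceMatcher()
matcher.set_seq2(target)          # analysed once, then reused
scores = []
for candidate in candidates:
    matcher.set_seq1(candidate)   # only the first sequence changes
    scores.append((round(matcher.ratio(), 3), candidate))

for score, candidate in sorted(scores, reverse=True):
    print("%.3f  %s" % (score, candidate))
```

The scores are the same as those from a fresh ``SequenceMatcher(None, candidate, target)`` per pair; reusing the matcher only avoids re-analysing ``target`` each time.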


.. method:: SequenceMatcher.find_longest_match(alo, ahi, blo, bhi)

   Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.

   If isjunk was omitted or ``None``, :meth:`find_longest_match` returns
   ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where
   ``alo <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all
   ``(i', j', k')`` meeting those conditions, the additional conditions
   ``k >= k'``, ``i <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In
   other words, of all maximal matching blocks, return one that starts earliest in
   a, and of all those maximal matching blocks that start earliest in a, return
   the one that starts earliest in b. ::

      >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
      >>> s.find_longest_match(0, 5, 0, 9)
      Match(a=0, b=4, size=5)

   If isjunk was provided, first the longest matching block is determined as
   above, but with the additional restriction that no junk element appears in the
   block. Then that block is extended as far as possible by matching (only) junk
   elements on both sides. So the resulting block never matches on junk except as
   identical junk happens to be adjacent to an interesting match.

   Here's the same example as before, but considering blanks to be junk. That
   prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the second
   sequence directly. Instead only the ``'abcd'`` can match, and matches the
   leftmost ``'abcd'`` in the second sequence::

      >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
      >>> s.find_longest_match(0, 5, 0, 9)
      Match(a=1, b=0, size=4)

   If no blocks match, this returns ``(alo, blo, 0)``.

   .. versionchanged:: 2.6
      This method returns a :term:`named tuple` ``Match(a, b, size)``.
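With the named-tuple result, the fields can be used directly as slice bounds. This sketch reuses the first example above and adds a made-up pair with nothing in common to show the ``(alo, blo, 0)`` case.

```python
import difflib

a, b = " abcd", "abcd abcd"
s = difflib.SequenceMatcher(None, a, b)
match = s.find_longest_match(0, len(a), 0, len(b))

# The named-tuple fields index straight into the two sequences.
print(match)
print(repr(a[match.a:match.a + match.size]))   # ' abcd'
print(repr(b[match.b:match.b + match.size]))   # ' abcd'

# With no element in common, the result degenerates to (alo, blo, 0).
no_match = difflib.SequenceMatcher(None, "xyz", "abc").find_longest_match(0, 3, 0, 3)
print(no_match)
```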


.. method:: SequenceMatcher.get_matching_blocks()

   Return list of triples describing matching subsequences. Each triple is of the
   form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``. The triples are
   monotonically increasing in i and j.

   The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It is
   the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')`` are
   adjacent triples in the list, and the second is not the last triple in the list,
   then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent triples always
   describe non-adjacent equal blocks.

   .. XXX Explain why a dummy is used!

   .. versionchanged:: 2.5
      The guarantee that adjacent triples always describe non-adjacent blocks was
      implemented.

   ::

      >>> s = SequenceMatcher(None, "abxcd", "abcd")
      >>> s.get_matching_blocks()
      [(0, 0, 2), (3, 2, 2), (5, 4, 0)]
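Because every real triple satisfies ``a[i:i+n] == b[j:j+n]``, the blocks can be used to pull out the common pieces. The sketch below reuses the example above and skips the dummy final triple by testing ``n``.

```python
import difflib

a, b = "abxcd", "abcd"
blocks = difflib.SequenceMatcher(None, a, b).get_matching_blocks()

# Skip the dummy final triple (the only one with n == 0).
common = [a[i:i + n] for i, j, n in blocks if n]
matched = sum(n for i, j, n in blocks)

print(common)    # ['ab', 'cd']
print(matched)   # 4
```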


.. method:: SequenceMatcher.get_opcodes()

   Return list of 5-tuples describing how to turn a into b. Each tuple is of
   the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 == 0``, and
   remaining tuples have i1 equal to the i2 from the preceding tuple, and,
   likewise, j1 equal to the previous j2.

   The tag values are strings, with these meanings:

   +---------------+---------------------------------------------+
   | Value         | Meaning                                     |
   +===============+=============================================+
   | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
   |               | ``b[j1:j2]``.                               |
   +---------------+---------------------------------------------+
   | ``'delete'``  | ``a[i1:i2]`` should be deleted. Note that   |
   |               | ``j1 == j2`` in this case.                  |
   +---------------+---------------------------------------------+
   | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
   |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
   |               | this case.                                  |
   +---------------+---------------------------------------------+
   | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
   |               | are equal).                                 |
   +---------------+---------------------------------------------+

   For example::

      >>> a = "qabxcd"
      >>> b = "abycdf"
      >>> s = SequenceMatcher(None, a, b)
      >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
      ...    print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
      ...           (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
       delete a[0:1] (q) b[0:0] ()
        equal a[1:3] (ab) b[0:2] (ab)
      replace a[3:4] (x) b[2:3] (y)
        equal a[4:6] (cd) b[3:5] (cd)
       insert a[6:6] () b[5:6] (f)
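One way to read the opcodes is as a recipe for rebuilding b from a. The helper ``apply_opcodes`` below is hypothetical (it is not part of :mod:`difflib`) and assumes string inputs, since it joins the pieces with ``''.join``.

```python
import difflib

def apply_opcodes(a, b, opcodes):
    """Hypothetical helper: rebuild b by walking get_opcodes() output.

    'equal' spans are copied from a, 'replace' and 'insert' spans are taken
    from b, and 'delete' spans contribute nothing.
    """
    pieces = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            pieces.append(a[i1:i2])
        elif tag in ("replace", "insert"):
            pieces.append(b[j1:j2])
    return "".join(pieces)

a, b = "qabxcd", "abycdf"
ops = difflib.SequenceMatcher(None, a, b).get_opcodes()
print(apply_opcodes(a, b, ops))   # abycdf
```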


.. method:: SequenceMatcher.get_grouped_opcodes([n])

   Return a :term:`generator` of groups with up to n lines of context; n
   defaults to ``3``.

   Starting with the opcodes returned by :meth:`get_opcodes`, this method splits
   out smaller change clusters and eliminates intervening ranges which have no
   changes.

   Each group is a list of opcodes in the same format as :meth:`get_opcodes`.

   .. versionadded:: 2.3
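A sketch of how the grouping behaves, using two made-up lists of 20 elements that differ in two places far apart. With ``n=1`` each change keeps one element of context on either side and the long unchanged run between them is dropped, so two separate groups come back.

```python
import difflib

a = [str(i) for i in range(20)]
b = list(a)
b[2] = "two"        # two isolated changes, far apart
b[15] = "fifteen"

matcher = difflib.SequenceMatcher(None, a, b)
groups = list(matcher.get_grouped_opcodes(1))   # one element of context

for group in groups:
    print(group)
# [('equal', 1, 2, 1, 2), ('replace', 2, 3, 2, 3), ('equal', 3, 4, 3, 4)]
# [('equal', 14, 15, 14, 15), ('replace', 15, 16, 15, 16), ('equal', 16, 17, 16, 17)]
```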


.. method:: SequenceMatcher.ratio()

   Return a measure of the sequences' similarity as a float in the range [0, 1].

   Where T is the total number of elements in both sequences, and M is the number
   of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the sequences are
   identical, and ``0.0`` if they have nothing in common.

   This is expensive to compute if :meth:`get_matching_blocks` or
   :meth:`get_opcodes` hasn't already been called, in which case you may want to
   try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an upper bound.
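The 2.0\*M / T formula can be checked by hand. This sketch takes M from :meth:`get_matching_blocks` (summing the block sizes; the dummy block adds 0) and T from the two lengths, using the ``"abxcd"``/``"abcd"`` pair from the :meth:`get_matching_blocks` example.

```python
import difflib

a, b = "abxcd", "abcd"
s = difflib.SequenceMatcher(None, a, b)

M = sum(block[2] for block in s.get_matching_blocks())   # matched elements
T = len(a) + len(b)                                      # total elements
print("M=%d T=%d 2.0*M/T=%.4f ratio()=%.4f" % (M, T, 2.0 * M / T, s.ratio()))
# M=4 T=9 2.0*M/T=0.8889 ratio()=0.8889

print(difflib.SequenceMatcher(None, "abc", "abc").ratio())   # 1.0
print(difflib.SequenceMatcher(None, "abc", "xyz").ratio())   # 0.0
```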


.. method:: SequenceMatcher.quick_ratio()

   Return an upper bound on :meth:`ratio` relatively quickly.

   This isn't defined beyond that it is an upper bound on :meth:`ratio`, and is
   faster to compute.


.. method:: SequenceMatcher.real_quick_ratio()

   Return an upper bound on :meth:`ratio` very quickly.

   This isn't defined beyond that it is an upper bound on :meth:`ratio`, and is
   faster to compute than either :meth:`ratio` or :meth:`quick_ratio`.

The three methods that return the ratio of matching to total characters can give
different results due to differing levels of approximation, although
:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
:meth:`ratio`::

   >>> s = SequenceMatcher(None, "abcd", "bcde")
   >>> s.ratio()
   0.75
   >>> s.quick_ratio()
   0.75
   >>> s.real_quick_ratio()
   1.0
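The upper-bound property suggests a cheap filter: try the fastest bound first and only fall back to :meth:`ratio` when the bounds cannot rule a pair out. :func:`get_close_matches` applies the same three tests in this order; the helper name ``similar_enough`` below is hypothetical.

```python
import difflib

def similar_enough(a, b, cutoff=0.6):
    """Hypothetical helper: True if ratio() >= cutoff, trying cheap bounds first.

    real_quick_ratio() and quick_ratio() are upper bounds on ratio(), so if
    either falls below the cutoff, ratio() must too and is never computed.
    """
    s = difflib.SequenceMatcher(None, a, b)
    return (s.real_quick_ratio() >= cutoff and
            s.quick_ratio() >= cutoff and
            s.ratio() >= cutoff)

print(similar_enough("abcd", "bcde"))   # True  (ratio is 0.75)
print(similar_enough("abcd", "wxyz"))   # False (quick_ratio is already 0.0)
```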


.. _sequencematcher-examples:

SequenceMatcher Examples
------------------------

This example compares two strings, considering blanks to be "junk"::

   >>> s = SequenceMatcher(lambda x: x == " ",
   ...                     "private Thread currentThread;",
   ...                     "private volatile Thread currentThread;")

:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
sequences. As a rule of thumb, a :meth:`ratio` value over 0.6 means the
sequences are close matches::

   >>> print round(s.ratio(), 3)
   0.866

If you're only interested in where the sequences match,
:meth:`get_matching_blocks` is handy::

   >>> for block in s.get_matching_blocks():
   ...     print "a[%d] and b[%d] match for %d elements" % block
   a[0] and b[0] match for 8 elements
   a[8] and b[17] match for 21 elements
   a[29] and b[38] match for 0 elements

Note that the last tuple returned by :meth:`get_matching_blocks` is always a
dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
tuple element (number of elements matched) is ``0``.

If you want to know how to change the first sequence into the second, use
:meth:`get_opcodes`::

   >>> for opcode in s.get_opcodes():
   ...     print "%6s a[%d:%d] b[%d:%d]" % opcode
    equal a[0:8] b[0:8]
   insert a[8:8] b[8:17]
    equal a[8:29] b[17:38]

See also the function :func:`get_close_matches` in this module, which shows how
simple code building on :class:`SequenceMatcher` can be used to do useful work.


.. _differ-objects:

Differ Objects
--------------

Note that :class:`Differ`\ -generated deltas make no claim to be minimal
diffs. To the contrary, minimal diffs are often counter-intuitive, because they
synch up anywhere possible, sometimes accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.

The :class:`Differ` class has this constructor:


.. class:: Differ([linejunk[, charjunk]])

   Optional keyword parameters linejunk and charjunk are for filter functions
   (or ``None``):

   linejunk: A function that accepts a single string argument, and returns true
   if the string is junk. The default is ``None``, meaning that no line is
   considered junk.

   charjunk: A function that accepts a single character argument (a string of
   length 1), and returns true if the character is junk. The default is ``None``,
   meaning that no character is considered junk.

:class:`Differ` objects are used (deltas generated) via a single method:


.. method:: Differ.compare(a, b)

   Compare two sequences of lines, and generate the delta (a sequence of lines).

   Each sequence must contain individual single-line strings ending with newlines.
   Such sequences can be obtained from the :meth:`readlines` method of file-like
   objects. The delta generated also consists of newline-terminated strings, ready
   to be printed as-is via the :meth:`writelines` method of a file-like object.


.. _differ-examples:

Differ Example
--------------

This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the :meth:`readlines` method of file-like objects)::

   >>> text1 = '''  1. Beautiful is better than ugly.
   ...   2. Explicit is better than implicit.
   ...   3. Simple is better than complex.
   ...   4. Complex is better than complicated.
   ... '''.splitlines(1)
   >>> len(text1)
   4
   >>> text1[0][-1]
   '\n'
   >>> text2 = '''  1. Beautiful is better than ugly.
   ...   3.   Simple is better than complex.
   ...   4. Complicated is better than complex.
   ...   5. Flat is better than nested.
   ... '''.splitlines(1)

Next we instantiate a Differ object::

   >>> d = Differ()

Note that when instantiating a :class:`Differ` object we may pass functions to
filter out line and character "junk." See the :class:`Differ` constructor for
details.

Finally, we compare the two::

   >>> result = list(d.compare(text1, text2))

``result`` is a list of strings, so let's pretty-print it::

   >>> from pprint import pprint
   >>> pprint(result)
   ['    1. Beautiful is better than ugly.\n',
    '-   2. Explicit is better than implicit.\n',
    '-   3. Simple is better than complex.\n',
    '+   3.   Simple is better than complex.\n',
    '?     ++\n',
    '-   4. Complex is better than complicated.\n',
    '?            ^                     ---- ^\n',
    '+   4. Complicated is better than complex.\n',
    '?           ++++ ^                      ^\n',
    '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this::

   >>> import sys
   >>> sys.stdout.writelines(result)
       1. Beautiful is better than ugly.
   -   2. Explicit is better than implicit.
   -   3. Simple is better than complex.
   +   3.   Simple is better than complex.
   ?     ++
   -   4. Complex is better than complicated.
   ?            ^                     ---- ^
   +   4. Complicated is better than complex.
   ?           ++++ ^                      ^
   +   5. Flat is better than nested.


.. _difflib-interface:

A command-line interface to difflib
-----------------------------------

This example shows how to use difflib to create a ``diff``-like utility.
It is also contained in the Python source distribution, as
:file:`Tools/scripts/diff.py`.

::

   """ Command line interface to difflib.py providing diffs in four formats:

   * ndiff:    lists every line and highlights interline changes.
   * context:  highlights clusters of changes in a before/after format.
   * unified:  highlights clusters of changes in an inline format.
   * html:     generates side by side comparison with change highlights.

   """

   import sys, os, time, difflib, optparse

   def main():
       # Configure the option parser
       usage = "usage: %prog [options] fromfile tofile"
       parser = optparse.OptionParser(usage)
       parser.add_option("-c", action="store_true", default=False,
                         help='Produce a context format diff (default)')
       parser.add_option("-u", action="store_true", default=False,
                         help='Produce a unified format diff')
       hlp = 'Produce HTML side by side diff (can use -c and -l in conjunction)'
       parser.add_option("-m", action="store_true", default=False, help=hlp)
       parser.add_option("-n", action="store_true", default=False,
                         help='Produce a ndiff format diff')
       parser.add_option("-l", "--lines", type="int", default=3,
                         help='Set number of context lines (default 3)')
       (options, args) = parser.parse_args()

       if len(args) == 0:
           parser.print_help()
           sys.exit(1)
       if len(args) != 2:
           parser.error("need to specify both a fromfile and tofile")

       n = options.lines
       fromfile, tofile = args # as specified in the usage string

       # we're passing these as arguments to the diff function
       fromdate = time.ctime(os.stat(fromfile).st_mtime)
       todate = time.ctime(os.stat(tofile).st_mtime)
       fromlines = open(fromfile, 'U').readlines()
       tolines = open(tofile, 'U').readlines()

       if options.u:
           diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
                                       fromdate, todate, n=n)
       elif options.n:
           diff = difflib.ndiff(fromlines, tolines)
       elif options.m:
           diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
                                               tofile, context=options.c,
                                               numlines=n)
       else:
           diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
                                       fromdate, todate, n=n)

       # we're using writelines because diff is a generator
       sys.stdout.writelines(diff)

   if __name__ == '__main__':
       main()