Blame - Doc/lib/libdifflib.tex - platform/external/python/cpython2

blob: 593416fb9516adc2bc8b5333da80ca5fd3d99487 [file] [log] [blame]

Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	1	\section{\module{difflib} ---
				2	Helpers for computing deltas}
				3
				4	\declaremodule{standard}{difflib}
				5	\modulesynopsis{Helpers for computing differences between objects.}
				6	\moduleauthor{Tim Peters}{tim.one@home.com}
				7	\sectionauthor{Tim Peters}{tim.one@home.com}
				8	% LaTeXification by Fred L. Drake, Jr. <fdrake@acm.org>.
				9
Fred Drake	da00cda	2001-04-10 19:56:09 +0000	[diff] [blame]	10	\versionadded{2.1}
				11
				12
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	13	\begin{classdesc*}{SequenceMatcher}
				14	This is a flexible class for comparing pairs of sequences of any
				15	type, so long as the sequence elements are hashable. The basic
				16	algorithm predates, and is a little fancier than, an algorithm
				17	published in the late 1980's by Ratcliff and Obershelp under the
				18	hyperbolic name ``gestalt pattern matching.'' The idea is to find
				19	the longest contiguous matching subsequence that contains no
				20	``junk'' elements (the Ratcliff and Obershelp algorithm doesn't
				21	address junk). The same idea is then applied recursively to the
				22	pieces of the sequences to the left and to the right of the matching
				23	subsequence. This does not yield minimal edit sequences, but does
				24	tend to yield matches that ``look right'' to people.
				25
				26	\strong{Timing:} The basic Ratcliff-Obershelp algorithm is cubic
				27	time in the worst case and quadratic time in the expected case.
				28	\class{SequenceMatcher} is quadratic time for the worst case and has
				29	expected-case behavior dependent in a complicated way on how many
				30	elements the sequences have in common; best case time is linear.
				31	\end{classdesc*}
				32
				33	\begin{classdesc*}{Differ}
				34	This is a class for comparing sequences of lines of text, and
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	35	producing human-readable differences or deltas. Differ uses
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	36	\class{SequenceMatcher} both to compare sequences of lines, and to
				37	compare sequences of characters within similar (near-matching)
				38	lines.
				39
				40	Each line of a \class{Differ} delta begins with a two-letter code:
				41
				42	\begin{tableii}{l\|l}{code}{Code}{Meaning}
				43	\lineii{'- '}{line unique to sequence 1}
				44	\lineii{'+ '}{line unique to sequence 2}
				45	\lineii{' '}{line common to both sequences}
				46	\lineii{'? '}{line not present in either input sequence}
				47	\end{tableii}
				48
				49	Lines beginning with `\code{?~}' attempt to guide the eye to
				50	intraline differences, and were not present in either input
				51	sequence. These lines can be confusing if the sequences contain tab
				52	characters.
				53	\end{classdesc*}
				54
Raymond Hettinger	e07b835	2003-06-09 21:44:59 +0000	[diff] [blame]	55	\begin{funcdesc}{context_diff}{a, b\optional{, fromfile\optional{, tofile
				56	\optional{, fromfiledate\optional{, tofiledate\optional{, n
				57	\optional{, lineterm}}}}}}}
				58
				59	Compare \var{a} and \var{b} (lists of strings); return a
				60	delta (a generator generating the delta lines) in context diff
				61	format.
				62
				63	Context diffs are a compact way of showing just the lines that have
				64	changed plus a few lines of context. The changes are shown in a
				65	before/after style. The number of context lines is set by \var{n}
				66	which defaults to three.
				67
				68	By default, the diff control lines (those with \code{***} or \code{---})
				69	are created with a trailing newline. This is helpful so that inputs created
				70	from \function{file.readlines()} result in diffs that are suitable for use
				71	with \function{file.writelines()} since both the inputs and outputs have
				72	trailing newlines.
				73
				74	For inputs that do not have trailing newlines, set the \var{lineterm}
				75	argument to \code{""} so that the output will be uniformly newline free.
				76
				77	The context diff format normally has a header for filenames and
				78	modification times. Any or all of these may be specified using strings for
				79	\var{fromfile}, \var{tofile}, \var{fromfiledate}, and \var{tofiledate}.
				80	The modification times are normally expressed in the format returned by
				81	\function{time.ctime()}. If not specified, the strings default to blanks.
				82
				83	\file{Tools/scripts/diff.py} is a command-line front-end for this
Raymond Hettinger	132fa37	2003-06-11 07:50:44 +0000	[diff] [blame]	84	function.
				85
				86	\versionadded{2.3}
Raymond Hettinger	e07b835	2003-06-09 21:44:59 +0000	[diff] [blame]	87	\end{funcdesc}
				88
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	89	\begin{funcdesc}{get_close_matches}{word, possibilities\optional{,
				90	n\optional{, cutoff}}}
				91	Return a list of the best ``good enough'' matches. \var{word} is a
				92	sequence for which close matches are desired (typically a string),
				93	and \var{possibilities} is a list of sequences against which to
				94	match \var{word} (typically a list of strings).
				95
				96	Optional argument \var{n} (default \code{3}) is the maximum number
				97	of close matches to return; \var{n} must be greater than \code{0}.
				98
				99	Optional argument \var{cutoff} (default \code{0.6}) is a float in
				100	the range [0, 1]. Possibilities that don't score at least that
				101	similar to \var{word} are ignored.
				102
				103	The best (no more than \var{n}) matches among the possibilities are
				104	returned in a list, sorted by similarity score, most similar first.
				105
				106	\begin{verbatim}
				107	>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
				108	['apple', 'ape']
				109	>>> import keyword
				110	>>> get_close_matches('wheel', keyword.kwlist)
				111	['while']
				112	>>> get_close_matches('apple', keyword.kwlist)
				113	[]
				114	>>> get_close_matches('accept', keyword.kwlist)
				115	['except']
				116	\end{verbatim}
				117	\end{funcdesc}
				118
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	119	\begin{funcdesc}{ndiff}{a, b\optional{, linejunk\optional{,
				120	charjunk}}}
				121	Compare \var{a} and \var{b} (lists of strings); return a
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	122	\class{Differ}-style delta (a generator generating the delta lines).
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	123
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	124	Optional keyword parameters \var{linejunk} and \var{charjunk} are
				125	for filter functions (or \code{None}):
				126
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	127	\var{linejunk}: A function that accepts a single string
				128	argument, and returns true if the string is junk, or false if not.
				129	The default is (\code{None}), starting with Python 2.3. Before then,
				130	the default was the module-level function
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	131	\function{IS_LINE_JUNK()}, which filters out lines without visible
				132	characters, except for at most one pound character (\character{\#}).
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	133	As of Python 2.3, the underlying \class{SequenceMatcher} class
				134	does a dynamic analysis of which lines are so frequent as to
				135	constitute noise, and this usually works better than the pre-2.3
				136	default.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	137
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	138	\var{charjunk}: A function that accepts a character (a string of
				139	length 1), and returns if the character is junk, or false if not.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	140	The default is module-level function \function{IS_CHARACTER_JUNK()},
				141	which filters out whitespace characters (a blank or tab; note: bad
				142	idea to include newline in this!).
				143
				144	\file{Tools/scripts/ndiff.py} is a command-line front-end to this
				145	function.
				146
				147	\begin{verbatim}
				148	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
Fred Drake	edb635f	2002-12-06 18:52:28 +0000	[diff] [blame]	149	... 'ore\ntree\nemu\n'.splitlines(1))
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	150	>>> print ''.join(diff),
				151	- one
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	152	? ^
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	153	+ ore
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	154	? ^
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	155	- two
				156	- three
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	157	? -
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	158	+ tree
				159	+ emu
				160	\end{verbatim}
				161	\end{funcdesc}
				162
				163	\begin{funcdesc}{restore}{sequence, which}
				164	Return one of the two sequences that generated a delta.
				165
				166	Given a \var{sequence} produced by \method{Differ.compare()} or
				167	\function{ndiff()}, extract lines originating from file 1 or 2
				168	(parameter \var{which}), stripping off line prefixes.
				169
				170	Example:
				171
				172	\begin{verbatim}
				173	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
				174	... 'ore\ntree\nemu\n'.splitlines(1))
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	175	>>> diff = list(diff) # materialize the generated delta into a list
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	176	>>> print ''.join(restore(diff, 1)),
				177	one
				178	two
				179	three
				180	>>> print ''.join(restore(diff, 2)),
				181	ore
				182	tree
				183	emu
				184	\end{verbatim}
				185
				186	\end{funcdesc}
				187
Raymond Hettinger	e07b835	2003-06-09 21:44:59 +0000	[diff] [blame]	188	\begin{funcdesc}{unified_diff}{a, b\optional{, fromfile\optional{, tofile
				189	\optional{, fromfiledate\optional{, tofiledate\optional{, n
				190	\optional{, lineterm}}}}}}}
				191
				192	Compare \var{a} and \var{b} (lists of strings); return a
				193	delta (a generator generating the delta lines) in unified diff
				194	format.
				195
				196	Unified diffs are a compact way of showing just the lines that have
				197	changed plus a few lines of context. The changes are shown in a
				198	inline style (instead of separate before/after blocks). The number
				199	of context lines is set by \var{n} which defaults to three.
				200
				201	By default, the diff control lines (those with \code{---}, \code{+++},
				202	or \code{@@}) are created with a trailing newline. This is helpful so
				203	that inputs created from \function{file.readlines()} result in diffs
				204	that are suitable for use with \function{file.writelines()} since both
				205	the inputs and outputs have trailing newlines.
				206
				207	For inputs that do not have trailing newlines, set the \var{lineterm}
				208	argument to \code{""} so that the output will be uniformly newline free.
				209
				210	The context diff format normally has a header for filenames and
				211	modification times. Any or all of these may be specified using strings for
				212	\var{fromfile}, \var{tofile}, \var{fromfiledate}, and \var{tofiledate}.
				213	The modification times are normally expressed in the format returned by
				214	\function{time.ctime()}. If not specified, the strings default to blanks.
				215
				216	\file{Tools/scripts/diff.py} is a command-line front-end for this
Raymond Hettinger	132fa37	2003-06-11 07:50:44 +0000	[diff] [blame]	217	function.
				218
				219	\versionadded{2.3}
Raymond Hettinger	e07b835	2003-06-09 21:44:59 +0000	[diff] [blame]	220	\end{funcdesc}
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	221
Fred Drake	7f10cce	2001-10-26 03:04:23 +0000	[diff] [blame]	222	\begin{funcdesc}{IS_LINE_JUNK}{line}
				223	Return true for ignorable lines. The line \var{line} is ignorable
				224	if \var{line} is blank or contains a single \character{\#},
				225	otherwise it is not ignorable. Used as a default for parameter
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	226	\var{linejunk} in \function{ndiff()} before Python 2.3.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	227	\end{funcdesc}
				228
				229
Fred Drake	7f10cce	2001-10-26 03:04:23 +0000	[diff] [blame]	230	\begin{funcdesc}{IS_CHARACTER_JUNK}{ch}
				231	Return true for ignorable characters. The character \var{ch} is
				232	ignorable if \var{ch} is a space or tab, otherwise it is not
				233	ignorable. Used as a default for parameter \var{charjunk} in
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	234	\function{ndiff()}.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	235	\end{funcdesc}
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	236
				237
Fred Drake	6fda3ac	2001-04-10 18:41:16 +0000	[diff] [blame]	238	\begin{seealso}
				239	\seetitle{Pattern Matching: The Gestalt Approach}{Discussion of a
				240	similar algorithm by John W. Ratcliff and D. E. Metzener.
				241	This was published in
				242	\citetitle[http://www.ddj.com/]{Dr. Dobb's Journal} in
				243	July, 1988.}
				244	\end{seealso}
				245
				246
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	247	\subsection{SequenceMatcher Objects \label{sequence-matcher}}
				248
Fred Drake	96d7a70	2001-05-11 01:08:13 +0000	[diff] [blame]	249	The \class{SequenceMatcher} class has this constructor:
				250
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	251	\begin{classdesc}{SequenceMatcher}{\optional{isjunk\optional{,
				252	a\optional{, b}}}}
				253	Optional argument \var{isjunk} must be \code{None} (the default) or
				254	a one-argument function that takes a sequence element and returns
				255	true if and only if the element is ``junk'' and should be ignored.
Fred Drake	7f10cce	2001-10-26 03:04:23 +0000	[diff] [blame]	256	Passing \code{None} for \var{b} is equivalent to passing
				257	\code{lambda x: 0}; in other words, no elements are ignored. For
				258	example, pass:
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	259
				260	\begin{verbatim}
Fred Drake	447f545	2001-02-23 19:13:07 +0000	[diff] [blame]	261	lambda x: x in " \t"
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	262	\end{verbatim}
				263
				264	if you're comparing lines as sequences of characters, and don't want
				265	to synch up on blanks or hard tabs.
				266
				267	The optional arguments \var{a} and \var{b} are sequences to be
				268	compared; both default to empty strings. The elements of both
				269	sequences must be hashable.
				270	\end{classdesc}
				271
				272
				273	\class{SequenceMatcher} objects have the following methods:
				274
				275	\begin{methoddesc}{set_seqs}{a, b}
				276	Set the two sequences to be compared.
				277	\end{methoddesc}
				278
				279	\class{SequenceMatcher} computes and caches detailed information about
				280	the second sequence, so if you want to compare one sequence against
				281	many sequences, use \method{set_seq2()} to set the commonly used
				282	sequence once and call \method{set_seq1()} repeatedly, once for each
				283	of the other sequences.
				284
				285	\begin{methoddesc}{set_seq1}{a}
				286	Set the first sequence to be compared. The second sequence to be
				287	compared is not changed.
				288	\end{methoddesc}
				289
				290	\begin{methoddesc}{set_seq2}{b}
				291	Set the second sequence to be compared. The first sequence to be
				292	compared is not changed.
				293	\end{methoddesc}
				294
				295	\begin{methoddesc}{find_longest_match}{alo, ahi, blo, bhi}
				296	Find longest matching block in \code{\var{a}[\var{alo}:\var{ahi}]}
				297	and \code{\var{b}[\var{blo}:\var{bhi}]}.
				298
				299	If \var{isjunk} was omitted or \code{None},
				300	\method{get_longest_match()} returns \code{(\var{i}, \var{j},
				301	\var{k})} such that \code{\var{a}[\var{i}:\var{i}+\var{k}]} is equal
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	302	to \code{\var{b}[\var{j}:\var{j}+\var{k}]}, where
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	303	\code{\var{alo} <= \var{i} <= \var{i}+\var{k} <= \var{ahi}} and
				304	\code{\var{blo} <= \var{j} <= \var{j}+\var{k} <= \var{bhi}}.
				305	For all \code{(\var{i'}, \var{j'}, \var{k'})} meeting those
				306	conditions, the additional conditions
				307	\code{\var{k} >= \var{k'}},
				308	\code{\var{i} <= \var{i'}},
				309	and if \code{\var{i} == \var{i'}}, \code{\var{j} <= \var{j'}}
				310	are also met.
				311	In other words, of all maximal matching blocks, return one that
				312	starts earliest in \var{a}, and of all those maximal matching blocks
				313	that start earliest in \var{a}, return the one that starts earliest
				314	in \var{b}.
				315
				316	\begin{verbatim}
				317	>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
				318	>>> s.find_longest_match(0, 5, 0, 9)
				319	(0, 4, 5)
				320	\end{verbatim}
				321
				322	If \var{isjunk} was provided, first the longest matching block is
				323	determined as above, but with the additional restriction that no
				324	junk element appears in the block. Then that block is extended as
				325	far as possible by matching (only) junk elements on both sides.
				326	So the resulting block never matches on junk except as identical
				327	junk happens to be adjacent to an interesting match.
				328
				329	Here's the same example as before, but considering blanks to be junk.
Tim Peters	754ba58	2001-02-20 11:24:35 +0000	[diff] [blame]	330	That prevents \code{' abcd'} from matching the \code{' abcd'} at the
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	331	tail end of the second sequence directly. Instead only the
				332	\code{'abcd'} can match, and matches the leftmost \code{'abcd'} in
				333	the second sequence:
				334
				335	\begin{verbatim}
				336	>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
				337	>>> s.find_longest_match(0, 5, 0, 9)
				338	(1, 0, 4)
				339	\end{verbatim}
				340
				341	If no blocks match, this returns \code{(\var{alo}, \var{blo}, 0)}.
				342	\end{methoddesc}
				343
				344	\begin{methoddesc}{get_matching_blocks}{}
				345	Return list of triples describing matching subsequences.
				346	Each triple is of the form \code{(\var{i}, \var{j}, \var{n})}, and
				347	means that \code{\var{a}[\var{i}:\var{i}+\var{n}] ==
				348	\var{b}[\var{j}:\var{j}+\var{n}]}. The triples are monotonically
				349	increasing in \var{i} and \var{j}.
				350
				351	The last triple is a dummy, and has the value \code{(len(\var{a}),
				352	len(\var{b}), 0)}. It is the only triple with \code{\var{n} == 0}.
				353	% Explain why a dummy is used!
				354
				355	\begin{verbatim}
				356	>>> s = SequenceMatcher(None, "abxcd", "abcd")
				357	>>> s.get_matching_blocks()
				358	[(0, 0, 2), (3, 2, 2), (5, 4, 0)]
				359	\end{verbatim}
				360	\end{methoddesc}
				361
				362	\begin{methoddesc}{get_opcodes}{}
				363	Return list of 5-tuples describing how to turn \var{a} into \var{b}.
				364	Each tuple is of the form \code{(\var{tag}, \var{i1}, \var{i2},
				365	\var{j1}, \var{j2})}. The first tuple has \code{\var{i1} ==
				366	\var{j1} == 0}, and remaining tuples have \var{i1} equal to the
				367	\var{i2} from the preceeding tuple, and, likewise, \var{j1} equal to
				368	the previous \var{j2}.
				369
				370	The \var{tag} values are strings, with these meanings:
				371
				372	\begin{tableii}{l\|l}{code}{Value}{Meaning}
				373	\lineii{'replace'}{\code{\var{a}[\var{i1}:\var{i2}]} should be
				374	replaced by \code{\var{b}[\var{j1}:\var{j2}]}.}
				375	\lineii{'delete'}{\code{\var{a}[\var{i1}:\var{i2}]} should be
				376	deleted. Note that \code{\var{j1} == \var{j2}} in
				377	this case.}
				378	\lineii{'insert'}{\code{\var{b}[\var{j1}:\var{j2}]} should be
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	379	inserted at \code{\var{a}[\var{i1}:\var{i1}]}.
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	380	Note that \code{\var{i1} == \var{i2}} in this
				381	case.}
				382	\lineii{'equal'}{\code{\var{a}[\var{i1}:\var{i2}] ==
				383	\var{b}[\var{j1}:\var{j2}]} (the sub-sequences are
				384	equal).}
				385	\end{tableii}
				386
				387	For example:
				388
				389	\begin{verbatim}
				390	>>> a = "qabxcd"
				391	>>> b = "abycdf"
				392	>>> s = SequenceMatcher(None, a, b)
				393	>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
				394	... print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
				395	... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
				396	delete a[0:1] (q) b[0:0] ()
				397	equal a[1:3] (ab) b[0:2] (ab)
				398	replace a[3:4] (x) b[2:3] (y)
				399	equal a[4:6] (cd) b[3:5] (cd)
				400	insert a[6:6] () b[5:6] (f)
				401	\end{verbatim}
				402	\end{methoddesc}
				403
Raymond Hettinger	132fa37	2003-06-11 07:50:44 +0000	[diff] [blame]	404	\begin{methoddesc}{get_grouped_opcodes}{\optional{n}}
				405	Return a generator of groups with up to \var{n} lines of context.
				406
				407	Starting with the groups returned by \method{get_opcodes()},
				408	this method splits out smaller change clusters and eliminates
				409	intervening ranges which have no changes.
				410
				411	The groups are returned in the same format as \method{get_opcodes()}.
				412	\versionadded{2.3}
				413	\end{methoddesc}
				414
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	415	\begin{methoddesc}{ratio}{}
				416	Return a measure of the sequences' similarity as a float in the
				417	range [0, 1].
				418
				419	Where T is the total number of elements in both sequences, and M is
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	420	the number of matches, this is 2.0*M / T. Note that this is
				421	\code{1.0} if the sequences are identical, and \code{0.0} if they
				422	have nothing in common.
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	423
				424	This is expensive to compute if \method{get_matching_blocks()} or
				425	\method{get_opcodes()} hasn't already been called, in which case you
				426	may want to try \method{quick_ratio()} or
				427	\method{real_quick_ratio()} first to get an upper bound.
				428	\end{methoddesc}
				429
				430	\begin{methoddesc}{quick_ratio}{}
				431	Return an upper bound on \method{ratio()} relatively quickly.
				432
				433	This isn't defined beyond that it is an upper bound on
				434	\method{ratio()}, and is faster to compute.
				435	\end{methoddesc}
				436
				437	\begin{methoddesc}{real_quick_ratio}{}
				438	Return an upper bound on \method{ratio()} very quickly.
				439
				440	This isn't defined beyond that it is an upper bound on
				441	\method{ratio()}, and is faster to compute than either
				442	\method{ratio()} or \method{quick_ratio()}.
				443	\end{methoddesc}
				444
Tim Peters	754ba58	2001-02-20 11:24:35 +0000	[diff] [blame]	445	The three methods that return the ratio of matching to total characters
				446	can give different results due to differing levels of approximation,
				447	although \method{quick_ratio()} and \method{real_quick_ratio()} are always
				448	at least as large as \method{ratio()}:
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	449
				450	\begin{verbatim}
				451	>>> s = SequenceMatcher(None, "abcd", "bcde")
				452	>>> s.ratio()
				453	0.75
				454	>>> s.quick_ratio()
				455	0.75
				456	>>> s.real_quick_ratio()
				457	1.0
				458	\end{verbatim}
				459
				460
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	461	\subsection{SequenceMatcher Examples \label{sequencematcher-examples}}
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	462
				463
				464	This example compares two strings, considering blanks to be ``junk:''
				465
				466	\begin{verbatim}
				467	>>> s = SequenceMatcher(lambda x: x == " ",
				468	... "private Thread currentThread;",
				469	... "private volatile Thread currentThread;")
				470	\end{verbatim}
				471
				472	\method{ratio()} returns a float in [0, 1], measuring the similarity
				473	of the sequences. As a rule of thumb, a \method{ratio()} value over
				474	0.6 means the sequences are close matches:
				475
				476	\begin{verbatim}
				477	>>> print round(s.ratio(), 3)
				478	0.866
				479	\end{verbatim}
				480
				481	If you're only interested in where the sequences match,
				482	\method{get_matching_blocks()} is handy:
				483
				484	\begin{verbatim}
				485	>>> for block in s.get_matching_blocks():
				486	... print "a[%d] and b[%d] match for %d elements" % block
				487	a[0] and b[0] match for 8 elements
				488	a[8] and b[17] match for 6 elements
				489	a[14] and b[23] match for 15 elements
				490	a[29] and b[38] match for 0 elements
				491	\end{verbatim}
				492
				493	Note that the last tuple returned by \method{get_matching_blocks()} is
				494	always a dummy, \code{(len(\var{a}), len(\var{b}), 0)}, and this is
				495	the only case in which the last tuple element (number of elements
				496	matched) is \code{0}.
				497
				498	If you want to know how to change the first sequence into the second,
				499	use \method{get_opcodes()}:
				500
				501	\begin{verbatim}
				502	>>> for opcode in s.get_opcodes():
				503	... print "%6s a[%d:%d] b[%d:%d]" % opcode
				504	equal a[0:8] b[0:8]
				505	insert a[8:8] b[8:17]
				506	equal a[8:14] b[17:23]
				507	equal a[14:29] b[23:38]
				508	\end{verbatim}
				509
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	510	See also the function \function{get_close_matches()} in this module,
				511	which shows how simple code building on \class{SequenceMatcher} can be
				512	used to do useful work.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	513
				514
				515	\subsection{Differ Objects \label{differ-objects}}
				516
				517	Note that \class{Differ}-generated deltas make no claim to be
				518	\strong{minimal} diffs. To the contrary, minimal diffs are often
				519	counter-intuitive, because they synch up anywhere possible, sometimes
				520	accidental matches 100 pages apart. Restricting synch points to
				521	contiguous matches preserves some notion of locality, at the
				522	occasional cost of producing a longer diff.
				523
				524	The \class{Differ} class has this constructor:
				525
				526	\begin{classdesc}{Differ}{\optional{linejunk\optional{, charjunk}}}
				527	Optional keyword parameters \var{linejunk} and \var{charjunk} are
				528	for filter functions (or \code{None}):
				529
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	530	\var{linejunk}: A function that accepts a single string
				531	argument, and returns true if the string is junk. The default is
				532	\code{None}, meaning that no line is considered junk.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	533
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	534	\var{charjunk}: A function that accepts a single character argument
				535	(a string of length 1), and returns true if the character is junk.
				536	The default is \code{None}, meaning that no character is
				537	considered junk.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	538	\end{classdesc}
				539
				540	\class{Differ} objects are used (deltas generated) via a single
				541	method:
				542
				543	\begin{methoddesc}{compare}{a, b}
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	544	Compare two sequences of lines, and generate the delta (a sequence
				545	of lines).
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	546
				547	Each sequence must contain individual single-line strings ending
				548	with newlines. Such sequences can be obtained from the
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	549	\method{readlines()} method of file-like objects. The delta generated
				550	also consists of newline-terminated strings, ready to be printed as-is
Fred Drake	389aa17	2001-11-29 19:04:50 +0000	[diff] [blame]	551	via the \method{writelines()} method of a file-like object.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	552	\end{methoddesc}
				553
				554
				555	\subsection{Differ Example \label{differ-examples}}
				556
				557	This example compares two texts. First we set up the texts, sequences
				558	of individual single-line strings ending with newlines (such sequences
				559	can also be obtained from the \method{readlines()} method of file-like
				560	objects):
				561
				562	\begin{verbatim}
				563	>>> text1 = ''' 1. Beautiful is better than ugly.
				564	... 2. Explicit is better than implicit.
				565	... 3. Simple is better than complex.
				566	... 4. Complex is better than complicated.
				567	... '''.splitlines(1)
				568	>>> len(text1)
				569	4
				570	>>> text1[0][-1]
				571	'\n'
				572	>>> text2 = ''' 1. Beautiful is better than ugly.
				573	... 3. Simple is better than complex.
				574	... 4. Complicated is better than complex.
				575	... 5. Flat is better than nested.
				576	... '''.splitlines(1)
				577	\end{verbatim}
				578
				579	Next we instantiate a Differ object:
				580
				581	\begin{verbatim}
				582	>>> d = Differ()
				583	\end{verbatim}
				584
				585	Note that when instantiating a \class{Differ} object we may pass
				586	functions to filter out line and character ``junk.'' See the
				587	\method{Differ()} constructor for details.
				588
				589	Finally, we compare the two:
				590
				591	\begin{verbatim}
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	592	>>> result = list(d.compare(text1, text2))
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	593	\end{verbatim}
				594
				595	\code{result} is a list of strings, so let's pretty-print it:
				596
				597	\begin{verbatim}
				598	>>> from pprint import pprint
				599	>>> pprint(result)
				600	[' 1. Beautiful is better than ugly.\n',
				601	'- 2. Explicit is better than implicit.\n',
				602	'- 3. Simple is better than complex.\n',
				603	'+ 3. Simple is better than complex.\n',
				604	'? ++ \n',
				605	'- 4. Complex is better than complicated.\n',
				606	'? ^ ---- ^ \n',
				607	'+ 4. Complicated is better than complex.\n',
				608	'? ++++ ^ ^ \n',
				609	'+ 5. Flat is better than nested.\n']
				610	\end{verbatim}
				611
				612	As a single multi-line string it looks like this:
				613
				614	\begin{verbatim}
				615	>>> import sys
				616	>>> sys.stdout.writelines(result)
				617	1. Beautiful is better than ugly.
				618	- 2. Explicit is better than implicit.
				619	- 3. Simple is better than complex.
				620	+ 3. Simple is better than complex.
				621	? ++
				622	- 4. Complex is better than complicated.
				623	? ^ ---- ^
				624	+ 4. Complicated is better than complex.
				625	? ++++ ^ ^
				626	+ 5. Flat is better than nested.
				627	\end{verbatim}