Blame - Doc/lib/libdifflib.tex - platform/external/python/cpython3

blob: fc588ffdf8ef67c63dcbc3d45a146fc5212c2f20 [file] [log] [blame]

Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	1	\section{\module{difflib} ---
				2	Helpers for computing deltas}
				3
				4	\declaremodule{standard}{difflib}
				5	\modulesynopsis{Helpers for computing differences between objects.}
Fred Drake	34929f2	2003-12-30 16:12:27 +0000	[diff] [blame]	6	\moduleauthor{Tim Peters}{tim_one@users.sourceforge.net}
				7	\sectionauthor{Tim Peters}{tim_one@users.sourceforge.net}
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	8	% LaTeXification by Fred L. Drake, Jr. <fdrake@acm.org>.
				9
Fred Drake	da00cda	2001-04-10 19:56:09 +0000	[diff] [blame]	10	\versionadded{2.1}
				11
				12
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	13	\begin{classdesc*}{SequenceMatcher}
				14	This is a flexible class for comparing pairs of sequences of any
				15	type, so long as the sequence elements are hashable. The basic
				16	algorithm predates, and is a little fancier than, an algorithm
				17	published in the late 1980's by Ratcliff and Obershelp under the
				18	hyperbolic name ``gestalt pattern matching.'' The idea is to find
				19	the longest contiguous matching subsequence that contains no
				20	``junk'' elements (the Ratcliff and Obershelp algorithm doesn't
				21	address junk). The same idea is then applied recursively to the
				22	pieces of the sequences to the left and to the right of the matching
				23	subsequence. This does not yield minimal edit sequences, but does
				24	tend to yield matches that ``look right'' to people.
				25
				26	\strong{Timing:} The basic Ratcliff-Obershelp algorithm is cubic
				27	time in the worst case and quadratic time in the expected case.
				28	\class{SequenceMatcher} is quadratic time for the worst case and has
				29	expected-case behavior dependent in a complicated way on how many
				30	elements the sequences have in common; best case time is linear.
				31	\end{classdesc*}
				32
				33	\begin{classdesc*}{Differ}
				34	This is a class for comparing sequences of lines of text, and
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	35	producing human-readable differences or deltas. Differ uses
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	36	\class{SequenceMatcher} both to compare sequences of lines, and to
				37	compare sequences of characters within similar (near-matching)
				38	lines.
				39
				40	Each line of a \class{Differ} delta begins with a two-letter code:
				41
				42	\begin{tableii}{l\|l}{code}{Code}{Meaning}
				43	\lineii{'- '}{line unique to sequence 1}
				44	\lineii{'+ '}{line unique to sequence 2}
				45	\lineii{' '}{line common to both sequences}
				46	\lineii{'? '}{line not present in either input sequence}
				47	\end{tableii}
				48
				49	Lines beginning with `\code{?~}' attempt to guide the eye to
				50	intraline differences, and were not present in either input
				51	sequence. These lines can be confusing if the sequences contain tab
				52	characters.
				53	\end{classdesc*}
				54
Raymond Hettinger	e07b835	2003-06-09 21:44:59 +0000	[diff] [blame]	55	\begin{funcdesc}{context_diff}{a, b\optional{, fromfile\optional{, tofile
				56	\optional{, fromfiledate\optional{, tofiledate\optional{, n
				57	\optional{, lineterm}}}}}}}
Raymond Hettinger	e07b835	2003-06-09 21:44:59 +0000	[diff] [blame]	58	Compare \var{a} and \var{b} (lists of strings); return a
				59	delta (a generator generating the delta lines) in context diff
				60	format.
				61
				62	Context diffs are a compact way of showing just the lines that have
				63	changed plus a few lines of context. The changes are shown in a
				64	before/after style. The number of context lines is set by \var{n}
				65	which defaults to three.
				66
				67	By default, the diff control lines (those with \code{***} or \code{---})
				68	are created with a trailing newline. This is helpful so that inputs created
				69	from \function{file.readlines()} result in diffs that are suitable for use
				70	with \function{file.writelines()} since both the inputs and outputs have
				71	trailing newlines.
				72
				73	For inputs that do not have trailing newlines, set the \var{lineterm}
				74	argument to \code{""} so that the output will be uniformly newline free.
				75
				76	The context diff format normally has a header for filenames and
				77	modification times. Any or all of these may be specified using strings for
				78	\var{fromfile}, \var{tofile}, \var{fromfiledate}, and \var{tofiledate}.
				79	The modification times are normally expressed in the format returned by
				80	\function{time.ctime()}. If not specified, the strings default to blanks.
				81
				82	\file{Tools/scripts/diff.py} is a command-line front-end for this
Raymond Hettinger	132fa37	2003-06-11 07:50:44 +0000	[diff] [blame]	83	function.
				84
				85	\versionadded{2.3}
Raymond Hettinger	e07b835	2003-06-09 21:44:59 +0000	[diff] [blame]	86	\end{funcdesc}
				87
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	88	\begin{funcdesc}{get_close_matches}{word, possibilities\optional{,
				89	n\optional{, cutoff}}}
				90	Return a list of the best ``good enough'' matches. \var{word} is a
				91	sequence for which close matches are desired (typically a string),
				92	and \var{possibilities} is a list of sequences against which to
				93	match \var{word} (typically a list of strings).
				94
				95	Optional argument \var{n} (default \code{3}) is the maximum number
				96	of close matches to return; \var{n} must be greater than \code{0}.
				97
				98	Optional argument \var{cutoff} (default \code{0.6}) is a float in
				99	the range [0, 1]. Possibilities that don't score at least that
				100	similar to \var{word} are ignored.
				101
				102	The best (no more than \var{n}) matches among the possibilities are
				103	returned in a list, sorted by similarity score, most similar first.
				104
				105	\begin{verbatim}
				106	>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
				107	['apple', 'ape']
				108	>>> import keyword
				109	>>> get_close_matches('wheel', keyword.kwlist)
				110	['while']
				111	>>> get_close_matches('apple', keyword.kwlist)
				112	[]
				113	>>> get_close_matches('accept', keyword.kwlist)
				114	['except']
				115	\end{verbatim}
				116	\end{funcdesc}
				117
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	118	\begin{funcdesc}{ndiff}{a, b\optional{, linejunk\optional{,
				119	charjunk}}}
				120	Compare \var{a} and \var{b} (lists of strings); return a
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	121	\class{Differ}-style delta (a generator generating the delta lines).
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	122
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	123	Optional keyword parameters \var{linejunk} and \var{charjunk} are
				124	for filter functions (or \code{None}):
				125
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	126	\var{linejunk}: A function that accepts a single string
				127	argument, and returns true if the string is junk, or false if not.
				128	The default is (\code{None}), starting with Python 2.3. Before then,
				129	the default was the module-level function
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	130	\function{IS_LINE_JUNK()}, which filters out lines without visible
				131	characters, except for at most one pound character (\character{\#}).
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	132	As of Python 2.3, the underlying \class{SequenceMatcher} class
				133	does a dynamic analysis of which lines are so frequent as to
				134	constitute noise, and this usually works better than the pre-2.3
				135	default.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	136
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	137	\var{charjunk}: A function that accepts a character (a string of
				138	length 1), and returns if the character is junk, or false if not.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	139	The default is module-level function \function{IS_CHARACTER_JUNK()},
				140	which filters out whitespace characters (a blank or tab; note: bad
				141	idea to include newline in this!).
				142
				143	\file{Tools/scripts/ndiff.py} is a command-line front-end to this
				144	function.
				145
				146	\begin{verbatim}
				147	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
Fred Drake	edb635f	2002-12-06 18:52:28 +0000	[diff] [blame]	148	... 'ore\ntree\nemu\n'.splitlines(1))
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	149	>>> print ''.join(diff),
				150	- one
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	151	? ^
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	152	+ ore
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	153	? ^
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	154	- two
				155	- three
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	156	? -
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	157	+ tree
				158	+ emu
				159	\end{verbatim}
				160	\end{funcdesc}
				161
				162	\begin{funcdesc}{restore}{sequence, which}
				163	Return one of the two sequences that generated a delta.
				164
				165	Given a \var{sequence} produced by \method{Differ.compare()} or
				166	\function{ndiff()}, extract lines originating from file 1 or 2
				167	(parameter \var{which}), stripping off line prefixes.
				168
				169	Example:
				170
				171	\begin{verbatim}
				172	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
				173	... 'ore\ntree\nemu\n'.splitlines(1))
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	174	>>> diff = list(diff) # materialize the generated delta into a list
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	175	>>> print ''.join(restore(diff, 1)),
				176	one
				177	two
				178	three
				179	>>> print ''.join(restore(diff, 2)),
				180	ore
				181	tree
				182	emu
				183	\end{verbatim}
				184
				185	\end{funcdesc}
				186
Raymond Hettinger	e07b835	2003-06-09 21:44:59 +0000	[diff] [blame]	187	\begin{funcdesc}{unified_diff}{a, b\optional{, fromfile\optional{, tofile
				188	\optional{, fromfiledate\optional{, tofiledate\optional{, n
				189	\optional{, lineterm}}}}}}}
Raymond Hettinger	e07b835	2003-06-09 21:44:59 +0000	[diff] [blame]	190	Compare \var{a} and \var{b} (lists of strings); return a
				191	delta (a generator generating the delta lines) in unified diff
				192	format.
				193
				194	Unified diffs are a compact way of showing just the lines that have
				195	changed plus a few lines of context. The changes are shown in a
				196	inline style (instead of separate before/after blocks). The number
				197	of context lines is set by \var{n} which defaults to three.
				198
				199	By default, the diff control lines (those with \code{---}, \code{+++},
				200	or \code{@@}) are created with a trailing newline. This is helpful so
				201	that inputs created from \function{file.readlines()} result in diffs
				202	that are suitable for use with \function{file.writelines()} since both
				203	the inputs and outputs have trailing newlines.
				204
				205	For inputs that do not have trailing newlines, set the \var{lineterm}
				206	argument to \code{""} so that the output will be uniformly newline free.
				207
				208	The context diff format normally has a header for filenames and
				209	modification times. Any or all of these may be specified using strings for
				210	\var{fromfile}, \var{tofile}, \var{fromfiledate}, and \var{tofiledate}.
				211	The modification times are normally expressed in the format returned by
				212	\function{time.ctime()}. If not specified, the strings default to blanks.
				213
				214	\file{Tools/scripts/diff.py} is a command-line front-end for this
Raymond Hettinger	132fa37	2003-06-11 07:50:44 +0000	[diff] [blame]	215	function.
				216
				217	\versionadded{2.3}
Raymond Hettinger	e07b835	2003-06-09 21:44:59 +0000	[diff] [blame]	218	\end{funcdesc}
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	219
Fred Drake	7f10cce	2001-10-26 03:04:23 +0000	[diff] [blame]	220	\begin{funcdesc}{IS_LINE_JUNK}{line}
				221	Return true for ignorable lines. The line \var{line} is ignorable
				222	if \var{line} is blank or contains a single \character{\#},
				223	otherwise it is not ignorable. Used as a default for parameter
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	224	\var{linejunk} in \function{ndiff()} before Python 2.3.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	225	\end{funcdesc}
				226
				227
Fred Drake	7f10cce	2001-10-26 03:04:23 +0000	[diff] [blame]	228	\begin{funcdesc}{IS_CHARACTER_JUNK}{ch}
				229	Return true for ignorable characters. The character \var{ch} is
				230	ignorable if \var{ch} is a space or tab, otherwise it is not
				231	ignorable. Used as a default for parameter \var{charjunk} in
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	232	\function{ndiff()}.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	233	\end{funcdesc}
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	234
				235
Fred Drake	6fda3ac	2001-04-10 18:41:16 +0000	[diff] [blame]	236	\begin{seealso}
Fred Drake	1fe9750	2004-01-21 18:30:28 +0000	[diff] [blame]	237	\seetitle[http://www.ddj.com/documents/s=1103/ddj8807c/]
				238	{Pattern Matching: The Gestalt Approach}{Discussion of a
Fred Drake	6fda3ac	2001-04-10 18:41:16 +0000	[diff] [blame]	239	similar algorithm by John W. Ratcliff and D. E. Metzener.
				240	This was published in
				241	\citetitle[http://www.ddj.com/]{Dr. Dobb's Journal} in
				242	July, 1988.}
				243	\end{seealso}
				244
				245
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	246	\subsection{SequenceMatcher Objects \label{sequence-matcher}}
				247
Fred Drake	96d7a70	2001-05-11 01:08:13 +0000	[diff] [blame]	248	The \class{SequenceMatcher} class has this constructor:
				249
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	250	\begin{classdesc}{SequenceMatcher}{\optional{isjunk\optional{,
				251	a\optional{, b}}}}
				252	Optional argument \var{isjunk} must be \code{None} (the default) or
				253	a one-argument function that takes a sequence element and returns
				254	true if and only if the element is ``junk'' and should be ignored.
Fred Drake	7f10cce	2001-10-26 03:04:23 +0000	[diff] [blame]	255	Passing \code{None} for \var{b} is equivalent to passing
				256	\code{lambda x: 0}; in other words, no elements are ignored. For
				257	example, pass:
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	258
				259	\begin{verbatim}
Fred Drake	447f545	2001-02-23 19:13:07 +0000	[diff] [blame]	260	lambda x: x in " \t"
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	261	\end{verbatim}
				262
				263	if you're comparing lines as sequences of characters, and don't want
				264	to synch up on blanks or hard tabs.
				265
				266	The optional arguments \var{a} and \var{b} are sequences to be
				267	compared; both default to empty strings. The elements of both
				268	sequences must be hashable.
				269	\end{classdesc}
				270
				271
				272	\class{SequenceMatcher} objects have the following methods:
				273
				274	\begin{methoddesc}{set_seqs}{a, b}
				275	Set the two sequences to be compared.
				276	\end{methoddesc}
				277
				278	\class{SequenceMatcher} computes and caches detailed information about
				279	the second sequence, so if you want to compare one sequence against
				280	many sequences, use \method{set_seq2()} to set the commonly used
				281	sequence once and call \method{set_seq1()} repeatedly, once for each
				282	of the other sequences.
				283
				284	\begin{methoddesc}{set_seq1}{a}
				285	Set the first sequence to be compared. The second sequence to be
				286	compared is not changed.
				287	\end{methoddesc}
				288
				289	\begin{methoddesc}{set_seq2}{b}
				290	Set the second sequence to be compared. The first sequence to be
				291	compared is not changed.
				292	\end{methoddesc}
				293
				294	\begin{methoddesc}{find_longest_match}{alo, ahi, blo, bhi}
				295	Find longest matching block in \code{\var{a}[\var{alo}:\var{ahi}]}
				296	and \code{\var{b}[\var{blo}:\var{bhi}]}.
				297
				298	If \var{isjunk} was omitted or \code{None},
				299	\method{get_longest_match()} returns \code{(\var{i}, \var{j},
				300	\var{k})} such that \code{\var{a}[\var{i}:\var{i}+\var{k}]} is equal
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	301	to \code{\var{b}[\var{j}:\var{j}+\var{k}]}, where
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	302	\code{\var{alo} <= \var{i} <= \var{i}+\var{k} <= \var{ahi}} and
				303	\code{\var{blo} <= \var{j} <= \var{j}+\var{k} <= \var{bhi}}.
				304	For all \code{(\var{i'}, \var{j'}, \var{k'})} meeting those
				305	conditions, the additional conditions
				306	\code{\var{k} >= \var{k'}},
				307	\code{\var{i} <= \var{i'}},
				308	and if \code{\var{i} == \var{i'}}, \code{\var{j} <= \var{j'}}
				309	are also met.
				310	In other words, of all maximal matching blocks, return one that
				311	starts earliest in \var{a}, and of all those maximal matching blocks
				312	that start earliest in \var{a}, return the one that starts earliest
				313	in \var{b}.
				314
				315	\begin{verbatim}
				316	>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
				317	>>> s.find_longest_match(0, 5, 0, 9)
				318	(0, 4, 5)
				319	\end{verbatim}
				320
				321	If \var{isjunk} was provided, first the longest matching block is
				322	determined as above, but with the additional restriction that no
				323	junk element appears in the block. Then that block is extended as
				324	far as possible by matching (only) junk elements on both sides.
				325	So the resulting block never matches on junk except as identical
				326	junk happens to be adjacent to an interesting match.
				327
				328	Here's the same example as before, but considering blanks to be junk.
Tim Peters	754ba58	2001-02-20 11:24:35 +0000	[diff] [blame]	329	That prevents \code{' abcd'} from matching the \code{' abcd'} at the
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	330	tail end of the second sequence directly. Instead only the
				331	\code{'abcd'} can match, and matches the leftmost \code{'abcd'} in
				332	the second sequence:
				333
				334	\begin{verbatim}
				335	>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
				336	>>> s.find_longest_match(0, 5, 0, 9)
				337	(1, 0, 4)
				338	\end{verbatim}
				339
				340	If no blocks match, this returns \code{(\var{alo}, \var{blo}, 0)}.
				341	\end{methoddesc}
				342
				343	\begin{methoddesc}{get_matching_blocks}{}
				344	Return list of triples describing matching subsequences.
				345	Each triple is of the form \code{(\var{i}, \var{j}, \var{n})}, and
				346	means that \code{\var{a}[\var{i}:\var{i}+\var{n}] ==
				347	\var{b}[\var{j}:\var{j}+\var{n}]}. The triples are monotonically
				348	increasing in \var{i} and \var{j}.
				349
				350	The last triple is a dummy, and has the value \code{(len(\var{a}),
				351	len(\var{b}), 0)}. It is the only triple with \code{\var{n} == 0}.
				352	% Explain why a dummy is used!
				353
				354	\begin{verbatim}
				355	>>> s = SequenceMatcher(None, "abxcd", "abcd")
				356	>>> s.get_matching_blocks()
				357	[(0, 0, 2), (3, 2, 2), (5, 4, 0)]
				358	\end{verbatim}
				359	\end{methoddesc}
				360
				361	\begin{methoddesc}{get_opcodes}{}
				362	Return list of 5-tuples describing how to turn \var{a} into \var{b}.
				363	Each tuple is of the form \code{(\var{tag}, \var{i1}, \var{i2},
				364	\var{j1}, \var{j2})}. The first tuple has \code{\var{i1} ==
				365	\var{j1} == 0}, and remaining tuples have \var{i1} equal to the
				366	\var{i2} from the preceeding tuple, and, likewise, \var{j1} equal to
				367	the previous \var{j2}.
				368
				369	The \var{tag} values are strings, with these meanings:
				370
				371	\begin{tableii}{l\|l}{code}{Value}{Meaning}
				372	\lineii{'replace'}{\code{\var{a}[\var{i1}:\var{i2}]} should be
				373	replaced by \code{\var{b}[\var{j1}:\var{j2}]}.}
				374	\lineii{'delete'}{\code{\var{a}[\var{i1}:\var{i2}]} should be
				375	deleted. Note that \code{\var{j1} == \var{j2}} in
				376	this case.}
				377	\lineii{'insert'}{\code{\var{b}[\var{j1}:\var{j2}]} should be
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	378	inserted at \code{\var{a}[\var{i1}:\var{i1}]}.
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	379	Note that \code{\var{i1} == \var{i2}} in this
				380	case.}
				381	\lineii{'equal'}{\code{\var{a}[\var{i1}:\var{i2}] ==
				382	\var{b}[\var{j1}:\var{j2}]} (the sub-sequences are
				383	equal).}
				384	\end{tableii}
				385
				386	For example:
				387
				388	\begin{verbatim}
				389	>>> a = "qabxcd"
				390	>>> b = "abycdf"
				391	>>> s = SequenceMatcher(None, a, b)
				392	>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
				393	... print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
				394	... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
				395	delete a[0:1] (q) b[0:0] ()
				396	equal a[1:3] (ab) b[0:2] (ab)
				397	replace a[3:4] (x) b[2:3] (y)
				398	equal a[4:6] (cd) b[3:5] (cd)
				399	insert a[6:6] () b[5:6] (f)
				400	\end{verbatim}
				401	\end{methoddesc}
				402
Raymond Hettinger	132fa37	2003-06-11 07:50:44 +0000	[diff] [blame]	403	\begin{methoddesc}{get_grouped_opcodes}{\optional{n}}
				404	Return a generator of groups with up to \var{n} lines of context.
				405
				406	Starting with the groups returned by \method{get_opcodes()},
				407	this method splits out smaller change clusters and eliminates
				408	intervening ranges which have no changes.
				409
				410	The groups are returned in the same format as \method{get_opcodes()}.
				411	\versionadded{2.3}
				412	\end{methoddesc}
				413
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	414	\begin{methoddesc}{ratio}{}
				415	Return a measure of the sequences' similarity as a float in the
				416	range [0, 1].
				417
				418	Where T is the total number of elements in both sequences, and M is
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	419	the number of matches, this is 2.0*M / T. Note that this is
				420	\code{1.0} if the sequences are identical, and \code{0.0} if they
				421	have nothing in common.
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	422
				423	This is expensive to compute if \method{get_matching_blocks()} or
				424	\method{get_opcodes()} hasn't already been called, in which case you
				425	may want to try \method{quick_ratio()} or
				426	\method{real_quick_ratio()} first to get an upper bound.
				427	\end{methoddesc}
				428
				429	\begin{methoddesc}{quick_ratio}{}
				430	Return an upper bound on \method{ratio()} relatively quickly.
				431
				432	This isn't defined beyond that it is an upper bound on
				433	\method{ratio()}, and is faster to compute.
				434	\end{methoddesc}
				435
				436	\begin{methoddesc}{real_quick_ratio}{}
				437	Return an upper bound on \method{ratio()} very quickly.
				438
				439	This isn't defined beyond that it is an upper bound on
				440	\method{ratio()}, and is faster to compute than either
				441	\method{ratio()} or \method{quick_ratio()}.
				442	\end{methoddesc}
				443
Tim Peters	754ba58	2001-02-20 11:24:35 +0000	[diff] [blame]	444	The three methods that return the ratio of matching to total characters
				445	can give different results due to differing levels of approximation,
				446	although \method{quick_ratio()} and \method{real_quick_ratio()} are always
				447	at least as large as \method{ratio()}:
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	448
				449	\begin{verbatim}
				450	>>> s = SequenceMatcher(None, "abcd", "bcde")
				451	>>> s.ratio()
				452	0.75
				453	>>> s.quick_ratio()
				454	0.75
				455	>>> s.real_quick_ratio()
				456	1.0
				457	\end{verbatim}
				458
				459
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	460	\subsection{SequenceMatcher Examples \label{sequencematcher-examples}}
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	461
				462
				463	This example compares two strings, considering blanks to be ``junk:''
				464
				465	\begin{verbatim}
				466	>>> s = SequenceMatcher(lambda x: x == " ",
				467	... "private Thread currentThread;",
				468	... "private volatile Thread currentThread;")
				469	\end{verbatim}
				470
				471	\method{ratio()} returns a float in [0, 1], measuring the similarity
				472	of the sequences. As a rule of thumb, a \method{ratio()} value over
				473	0.6 means the sequences are close matches:
				474
				475	\begin{verbatim}
				476	>>> print round(s.ratio(), 3)
				477	0.866
				478	\end{verbatim}
				479
				480	If you're only interested in where the sequences match,
				481	\method{get_matching_blocks()} is handy:
				482
				483	\begin{verbatim}
				484	>>> for block in s.get_matching_blocks():
				485	... print "a[%d] and b[%d] match for %d elements" % block
				486	a[0] and b[0] match for 8 elements
				487	a[8] and b[17] match for 6 elements
				488	a[14] and b[23] match for 15 elements
				489	a[29] and b[38] match for 0 elements
				490	\end{verbatim}
				491
				492	Note that the last tuple returned by \method{get_matching_blocks()} is
				493	always a dummy, \code{(len(\var{a}), len(\var{b}), 0)}, and this is
				494	the only case in which the last tuple element (number of elements
				495	matched) is \code{0}.
				496
				497	If you want to know how to change the first sequence into the second,
				498	use \method{get_opcodes()}:
				499
				500	\begin{verbatim}
				501	>>> for opcode in s.get_opcodes():
				502	... print "%6s a[%d:%d] b[%d:%d]" % opcode
				503	equal a[0:8] b[0:8]
				504	insert a[8:8] b[8:17]
				505	equal a[8:14] b[17:23]
				506	equal a[14:29] b[23:38]
				507	\end{verbatim}
				508
Fred Drake	baf7142	2001-02-19 16:31:02 +0000	[diff] [blame]	509	See also the function \function{get_close_matches()} in this module,
				510	which shows how simple code building on \class{SequenceMatcher} can be
				511	used to do useful work.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	512
				513
				514	\subsection{Differ Objects \label{differ-objects}}
				515
				516	Note that \class{Differ}-generated deltas make no claim to be
				517	\strong{minimal} diffs. To the contrary, minimal diffs are often
				518	counter-intuitive, because they synch up anywhere possible, sometimes
				519	accidental matches 100 pages apart. Restricting synch points to
				520	contiguous matches preserves some notion of locality, at the
				521	occasional cost of producing a longer diff.
				522
				523	The \class{Differ} class has this constructor:
				524
				525	\begin{classdesc}{Differ}{\optional{linejunk\optional{, charjunk}}}
				526	Optional keyword parameters \var{linejunk} and \var{charjunk} are
				527	for filter functions (or \code{None}):
				528
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	529	\var{linejunk}: A function that accepts a single string
				530	argument, and returns true if the string is junk. The default is
				531	\code{None}, meaning that no line is considered junk.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	532
Tim Peters	81b9251	2002-04-29 01:37:32 +0000	[diff] [blame]	533	\var{charjunk}: A function that accepts a single character argument
				534	(a string of length 1), and returns true if the character is junk.
				535	The default is \code{None}, meaning that no character is
				536	considered junk.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	537	\end{classdesc}
				538
				539	\class{Differ} objects are used (deltas generated) via a single
				540	method:
				541
				542	\begin{methoddesc}{compare}{a, b}
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	543	Compare two sequences of lines, and generate the delta (a sequence
				544	of lines).
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	545
				546	Each sequence must contain individual single-line strings ending
				547	with newlines. Such sequences can be obtained from the
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	548	\method{readlines()} method of file-like objects. The delta generated
				549	also consists of newline-terminated strings, ready to be printed as-is
Fred Drake	389aa17	2001-11-29 19:04:50 +0000	[diff] [blame]	550	via the \method{writelines()} method of a file-like object.
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	551	\end{methoddesc}
				552
				553
				554	\subsection{Differ Example \label{differ-examples}}
				555
				556	This example compares two texts. First we set up the texts, sequences
				557	of individual single-line strings ending with newlines (such sequences
				558	can also be obtained from the \method{readlines()} method of file-like
				559	objects):
				560
				561	\begin{verbatim}
				562	>>> text1 = ''' 1. Beautiful is better than ugly.
				563	... 2. Explicit is better than implicit.
				564	... 3. Simple is better than complex.
				565	... 4. Complex is better than complicated.
				566	... '''.splitlines(1)
				567	>>> len(text1)
				568	4
				569	>>> text1[0][-1]
				570	'\n'
				571	>>> text2 = ''' 1. Beautiful is better than ugly.
				572	... 3. Simple is better than complex.
				573	... 4. Complicated is better than complex.
				574	... 5. Flat is better than nested.
				575	... '''.splitlines(1)
				576	\end{verbatim}
				577
				578	Next we instantiate a Differ object:
				579
				580	\begin{verbatim}
				581	>>> d = Differ()
				582	\end{verbatim}
				583
				584	Note that when instantiating a \class{Differ} object we may pass
				585	functions to filter out line and character ``junk.'' See the
				586	\method{Differ()} constructor for details.
				587
				588	Finally, we compare the two:
				589
				590	\begin{verbatim}
Tim Peters	8a9c284	2001-09-22 21:30:22 +0000	[diff] [blame]	591	>>> result = list(d.compare(text1, text2))
Fred Drake	6943a29	2001-08-13 19:31:59 +0000	[diff] [blame]	592	\end{verbatim}
				593
				594	\code{result} is a list of strings, so let's pretty-print it:
				595
				596	\begin{verbatim}
				597	>>> from pprint import pprint
				598	>>> pprint(result)
				599	[' 1. Beautiful is better than ugly.\n',
				600	'- 2. Explicit is better than implicit.\n',
				601	'- 3. Simple is better than complex.\n',
				602	'+ 3. Simple is better than complex.\n',
				603	'? ++ \n',
				604	'- 4. Complex is better than complicated.\n',
				605	'? ^ ---- ^ \n',
				606	'+ 4. Complicated is better than complex.\n',
				607	'? ++++ ^ ^ \n',
				608	'+ 5. Flat is better than nested.\n']
				609	\end{verbatim}
				610
				611	As a single multi-line string it looks like this:
				612
				613	\begin{verbatim}
				614	>>> import sys
				615	>>> sys.stdout.writelines(result)
				616	1. Beautiful is better than ugly.
				617	- 2. Explicit is better than implicit.
				618	- 3. Simple is better than complex.
				619	+ 3. Simple is better than complex.
				620	? ++
				621	- 4. Complex is better than complicated.
				622	? ^ ---- ^
				623	+ 4. Complicated is better than complex.
				624	? ++++ ^ ^
				625	+ 5. Flat is better than nested.
				626	\end{verbatim}