:mod:`difflib` --- Helpers for computing deltas
===============================================

.. module:: difflib
   :synopsis: Helpers for computing differences between objects.

.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>

**Source code:** :source:`Lib/difflib.py`

.. testsetup::

   import sys
   from difflib import *

--------------

This module provides classes and functions for comparing sequences. It
can be used, for example, for comparing files, and can produce difference
information in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also the :mod:`filecmp` module.


.. class:: SequenceMatcher

   This is a flexible class for comparing pairs of sequences of any type, so long
   as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a
   little fancier than, an algorithm published in the late 1980s by Ratcliff and
   Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
   find the longest contiguous matching subsequence that contains no "junk"
   elements; these "junk" elements are ones that are uninteresting in some
   sense, such as blank lines or whitespace. (Handling junk is an
   extension to the Ratcliff and Obershelp algorithm.) The same
   idea is then applied recursively to the pieces of the sequences to the left and
   to the right of the matching subsequence. This does not yield minimal edit
   sequences, but does tend to yield matches that "look right" to people.

   **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst
   case and quadratic time in the expected case. :class:`SequenceMatcher` is
   quadratic time for the worst case and has expected-case behavior dependent in a
   complicated way on how many elements the sequences have in common; best case
   time is linear.

   **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
   automatically treats certain sequence items as junk. The heuristic counts how many
   times each individual item appears in the sequence. If an item's duplicates (after
   the first one) account for more than 1% of the sequence and the sequence is at least
   200 items long, this item is marked as "popular" and is treated as junk for
   the purpose of sequence matching. This heuristic can be turned off by setting
   the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.

   .. versionadded:: 3.2
      The *autojunk* parameter.

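A minimal sketch of the heuristic, inspecting the *bpopular* and *b2j* attributes documented under :ref:`sequence-matcher`. The 300-character input is hypothetical, chosen only because it is long enough (at least 200 items) for the heuristic to apply:

```python
from difflib import SequenceMatcher

# Hypothetical 300-item sequence: 'a' and 'b' each occur 150 times.
b = "ab" * 150

# Default autojunk=True: len(b) >= 200 and both items are duplicated far more
# than 1% of the time, so both are flagged as "popular".
with_heuristic = SequenceMatcher(None, "", b)
print(sorted(with_heuristic.bpopular))   # ['a', 'b']
print(with_heuristic.b2j)                # {} (popular items are not indexed)

# autojunk=False switches the heuristic off and keeps every item indexed.
without_heuristic = SequenceMatcher(None, "", b, autojunk=False)
print(sorted(without_heuristic.bpopular))  # []
print(len(without_heuristic.b2j["a"]))     # 150

# Fewer than 200 items: the heuristic never applies.
short = SequenceMatcher(None, "", "ab" * 50)
print(sorted(short.bpopular))              # []
```

Popular items are removed from the *b2j* index that drives matching, which is why very repetitive inputs can match poorly unless ``autojunk=False`` is passed.
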

.. class:: Differ

   This is a class for comparing sequences of lines of text, and producing
   human-readable differences or deltas. :class:`Differ` uses :class:`SequenceMatcher`
   both to compare sequences of lines, and to compare sequences of characters
   within similar (near-matching) lines.

   Each line of a :class:`Differ` delta begins with a two-letter code:

   +----------+-------------------------------------------+
   | Code     | Meaning                                   |
   +==========+===========================================+
   | ``'- '`` | line unique to sequence 1                 |
   +----------+-------------------------------------------+
   | ``'+ '`` | line unique to sequence 2                 |
   +----------+-------------------------------------------+
   | ``'  '`` | line common to both sequences             |
   +----------+-------------------------------------------+
   | ``'? '`` | line not present in either input sequence |
   +----------+-------------------------------------------+

   Lines beginning with '``?``' attempt to guide the eye to intraline differences,
   and were not present in either input sequence. These lines can be confusing if
   the sequences contain tab characters.

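As a sketch, comparing two hypothetical two-line texts with :meth:`Differ.compare` produces one line of each kind from the table above. The inputs keep their trailing newlines, as lines read from a file would:

```python
from difflib import Differ

# Two hypothetical versions of a text, one line per list item.
before = ["one\n", "three\n"]
after = ["one\n", "tree\n"]

# compare() returns a generator; list() materializes the delta.
delta = list(Differ().compare(before, after))
print(delta)
# ['  one\n', '- three\n', '?  -\n', '+ tree\n']
```

The ``'? '`` line marks the ``h`` that was deleted from ``three``; no guide line follows ``+ tree`` because nothing was added to that line.
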

.. class:: HtmlDiff

   This class can be used to create an HTML table (or a complete HTML file
   containing the table) showing a side by side, line by line comparison of text
   with inter-line and intra-line change highlights. The table can be generated in
   either full or contextual difference mode.

   The constructor for this class is:


   .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)

      Initializes an instance of :class:`HtmlDiff`.

      *tabsize* is an optional keyword argument to specify tab stop spacing and
      defaults to ``8``.

      *wrapcolumn* is an optional keyword argument to specify the column number where
      lines are broken and wrapped; it defaults to ``None``, in which case lines are
      not wrapped.

      *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
      (used by :class:`HtmlDiff` to generate the side by side HTML differences). See
      :func:`ndiff` documentation for argument default values and descriptions.

   The following methods are public:

   .. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \
                         numlines=5, *, charset='utf-8')

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML file containing a table showing line by line differences with
      inter-line and intra-line changes highlighted.

      *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
      column header strings (both default to an empty string).

      *context* and *numlines* are both optional keyword arguments. Set *context* to
      ``True`` when contextual differences are to be shown, else the default is
      ``False`` to show the full files. *numlines* defaults to ``5``. When *context*
      is ``True`` *numlines* controls the number of context lines which surround the
      difference highlights. When *context* is ``False`` *numlines* controls the
      number of lines which are shown before a difference highlight when using the
      "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
      the next difference highlight at the top of the browser without any leading
      context).

      .. note::
         *fromdesc* and *todesc* are interpreted as unescaped HTML and should be
         properly escaped when receiving input from untrusted sources.

      .. versionchanged:: 3.5
         *charset* keyword-only argument was added. The default charset of
         HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``.

   .. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML table showing line by line differences with inter-line and
      intra-line changes highlighted.

      The arguments for this method are the same as those for the :meth:`make_file`
      method.

   :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
   contains a good example of its use.

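A minimal sketch of both public methods of :class:`HtmlDiff`. The input lines and descriptions are hypothetical, and using the standard library's ``html.escape`` to sanitize the column headers is one reasonable choice, not something :mod:`difflib` does for you, since *fromdesc* and *todesc* are inserted as raw HTML:

```python
import html
from difflib import HtmlDiff

# Hypothetical inputs; the descriptions stand in for untrusted text.
fromlines = ["alpha\n", "beta\n", "gamma\n"]
tolines = ["alpha\n", "BETA\n", "gamma\n"]
fromdesc = html.escape('<old> & "v1"')
todesc = html.escape('<new> & "v2"')

differ = HtmlDiff(tabsize=4, wrapcolumn=60)

# A complete HTML document; charset is keyword-only (Python 3.5+).
page = differ.make_file(fromlines, tolines, fromdesc, todesc,
                        context=True, numlines=1, charset="utf-8")

# Only the <table> element, e.g. for embedding in an existing page.
table = differ.make_table(fromlines, tolines, fromdesc, todesc)

print(len(page) > len(table))  # True: the file wraps the table in a document
```

Both results are ordinary strings; if *page* is written to disk, opening the file with the same encoding as *charset* keeps the declared and actual encodings consistent.
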

.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in context diff format.

   Context diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in a before/after style. The
   number of context lines is set by *n* which defaults to three.

   By default, the diff control lines (those with ``***`` or ``---``) are created
   with a trailing newline. This is helpful so that inputs created from
   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The context diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.

   >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
   >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
   >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py', tofile='after.py'))
   *** before.py
   --- after.py
   ***************
   *** 1,4 ****
   ! bacon
   ! eggs
   ! ham
     guido
   --- 1,4 ----
   ! python
   ! eggy
   ! hamster
     guido

   See :ref:`difflib-interface` for a more detailed example.

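For instance, lines produced by :meth:`str.splitlines` carry no newline, so *lineterm* is set to ``""`` and each delta line is printed on its own. The file names below are hypothetical:

```python
from difflib import context_diff

# Hypothetical inputs without trailing newlines.
old = "a\nb\nc".splitlines()
new = "a\nx\nc".splitlines()

# lineterm="" keeps the generated control lines newline-free too.
delta = list(context_diff(old, new, fromfile="old.txt", tofile="new.txt",
                          lineterm=""))
for line in delta:
    print(line)
```

Because every element of *delta* is newline-free, the output is consistent whether it is printed line by line or joined with ``"\n"``.
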

.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)

   Return a list of the best "good enough" matches. *word* is a sequence for which
   close matches are desired (typically a string), and *possibilities* is a list of
   sequences against which to match *word* (typically a list of strings).

   Optional argument *n* (default ``3``) is the maximum number of close matches to
   return; *n* must be greater than ``0``.

   Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
   Possibilities that don't score at least that similar to *word* are ignored.

   The best (no more than *n*) matches among the possibilities are returned in a
   list, sorted by similarity score, most similar first.

   >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
   ['apple', 'ape']
   >>> import keyword
   >>> get_close_matches('wheel', keyword.kwlist)
   ['while']
   >>> get_close_matches('pineapple', keyword.kwlist)
   []
   >>> get_close_matches('accept', keyword.kwlist)
   ['except']

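A short sketch of how *n* and *cutoff* narrow the result, reusing the candidates from the first example above. The scores in the comments are the similarity ratios that, in CPython's implementation, :func:`get_close_matches` computes with a :class:`SequenceMatcher` and compares against *cutoff*:

```python
from difflib import get_close_matches

candidates = ["ape", "apple", "peach", "puppy"]

# Defaults (n=3, cutoff=0.6): "apple" scores 0.8 and "ape" scores 0.75.
print(get_close_matches("appel", candidates))              # ['apple', 'ape']

# n caps the number of matches returned.
print(get_close_matches("appel", candidates, n=1))         # ['apple']

# A stricter cutoff drops "ape", whose score is only 0.75.
print(get_close_matches("appel", candidates, cutoff=0.8))  # ['apple']
```
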

.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)

   Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
   delta (a :term:`generator` generating the delta lines).

   Optional keyword parameters *linejunk* and *charjunk* are filtering functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns
   true if the string is junk, or false if not. The default is ``None``. There
   is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
   without visible characters, except for at most one pound character (``'#'``);
   however, the underlying :class:`SequenceMatcher` class does a dynamic
   analysis of which lines are so frequent as to constitute noise, and this
   usually works better than using this function.

   *charjunk*: A function that accepts a character (a string of length 1), and
   returns true if the character is junk, or false if not. The default is the
   module-level function :func:`IS_CHARACTER_JUNK`, which filters out whitespace
   characters (a blank or tab; it's a bad idea to include newline in this!).

   :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.

   >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
   ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
   >>> print(''.join(diff), end="")
   - one
   ?  ^
   + ore
   ?  ^
   - two
   - three
   ?  -
   + tree
   + emu

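In CPython, :func:`ndiff` is a thin wrapper that builds a :class:`Differ` with the given filters and returns the result of its :meth:`Differ.compare` method. The following sketch checks that equivalence on the inputs from the example above:

```python
from difflib import IS_CHARACTER_JUNK, Differ, ndiff

a = "one\ntwo\nthree\n".splitlines(keepends=True)
b = "ore\ntree\nemu\n".splitlines(keepends=True)

# ndiff() with its defaults (linejunk=None, charjunk=IS_CHARACTER_JUNK)...
via_ndiff = list(ndiff(a, b))

# ...matches a Differ configured with the same filters.
via_differ = list(Differ(linejunk=None, charjunk=IS_CHARACTER_JUNK).compare(a, b))

print(via_ndiff == via_differ)  # True
```

The practical difference is only the default: :class:`Differ` uses no character filter unless one is passed, while :func:`ndiff` ignores blanks and tabs when aligning characters within near-matching lines.
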

.. function:: restore(sequence, which)

   Return one of the two sequences that generated a delta.

   Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
   lines originating from file 1 or 2 (parameter *which*), stripping off line
   prefixes.

   Example:

   >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
   ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
   >>> diff = list(diff) # materialize the generated delta into a list
   >>> print(''.join(restore(diff, 1)), end="")
   one
   two
   three
   >>> print(''.join(restore(diff, 2)), end="")
   ore
   tree
   emu


.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in unified diff format.

   Unified diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in an inline style (instead of
   separate before/after blocks). The number of context lines is set by *n* which
   defaults to three.

   By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
   created with a trailing newline. This is helpful so that inputs created from
   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The unified diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.

   >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
   >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
   >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
   --- before.py
   +++ after.py
   @@ -1,4 +1,4 @@
   -bacon
   -eggs
   -ham
   +python
   +eggy
   +hamster
    guido

   See :ref:`difflib-interface` for a more detailed example.

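A sketch that fills in the optional header fields. The file names and the ISO 8601 timestamps are hypothetical; the inputs keep their newlines, so the default *lineterm* is used and the delta can be joined into a single patch string:

```python
from difflib import unified_diff

# Hypothetical file contents, split into lines that keep their newlines.
old = "a\nb\nc\n".splitlines(keepends=True)
new = "a\nx\nc\n".splitlines(keepends=True)

patch = "".join(unified_diff(old, new,
                             fromfile="old.txt", tofile="new.txt",
                             fromfiledate="2020-01-01 10:00:00",
                             tofiledate="2020-01-02 09:30:00"))
print(patch, end="")
```

In the two header lines, each date is separated from its file name by a tab character.
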

.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\\n')

   Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a
   sequence of delta lines (also bytes) in the format returned by *dfunc*.
   *dfunc* must be a callable, typically either :func:`unified_diff` or
   :func:`context_diff`.

   Allows you to compare data with unknown or inconsistent encoding. All
   inputs except *n* must be bytes objects, not str. Works by losslessly
   converting all inputs (except *n*) to str, and calling ``dfunc(a, b,
   fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The output of
   *dfunc* is then converted back to bytes, so the delta lines that you
   receive have the same unknown/inconsistent encodings as *a* and *b*.

   .. versionadded:: 3.5

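A minimal sketch, assuming two hypothetical inputs that spell the same word in different encodings (Latin-1 on one side, UTF-8 on the other). Neither side is decoded by the caller, and the original bytes come back unchanged in the delta:

```python
from difflib import diff_bytes, unified_diff

# Hypothetical lines whose encodings disagree.
old = [b"caf\xe9\n"]        # "cafe" with e-acute, Latin-1 encoded
new = [b"caf\xc3\xa9\n"]    # the same word, UTF-8 encoded

delta = list(diff_bytes(unified_diff, old, new,
                        fromfile=b"old.txt", tofile=b"new.txt"))
for line in delta:
    print(line)
```
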

.. function:: IS_LINE_JUNK(line)

   Return true for ignorable lines. The line *line* is ignorable if *line* is
   blank or contains a single ``'#'``, otherwise it is not ignorable. Used as a
   default for parameter *linejunk* in :func:`ndiff` in older versions.


.. function:: IS_CHARACTER_JUNK(ch)

   Return true for ignorable characters. The character *ch* is ignorable if *ch*
   is a space or tab, otherwise it is not ignorable. Used as a default for
   parameter *charjunk* in :func:`ndiff`.

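Both predicates can be called directly; the sample strings below are illustrative:

```python
from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK

# Blank lines and lines holding a single '#' (plus whitespace) are junk.
print(IS_LINE_JUNK("\n"))         # True
print(IS_LINE_JUNK("   #  \n"))   # True
print(IS_LINE_JUNK("##\n"))       # False: more than one '#'
print(IS_LINE_JUNK("x = 1\n"))    # False

# Only a space or a tab counts as junk; a newline does not.
print(IS_CHARACTER_JUNK(" "))     # True
print(IS_CHARACTER_JUNK("\t"))    # True
print(IS_CHARACTER_JUNK("\n"))    # False
print(IS_CHARACTER_JUNK("x"))     # False
```
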

.. seealso::

   `Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
      Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
      was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988.


.. _sequence-matcher:

SequenceMatcher Objects
-----------------------

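The following sketch exercises the constructor, the *bjunk*, *bpopular* and *b2j* attributes, the :meth:`set_seq2`/:meth:`set_seq1` reuse pattern, and the named tuple returned by :meth:`find_longest_match`, all of which are documented below. The inputs are hypothetical, and :meth:`ratio` is used only as a convenient similarity score:

```python
from difflib import SequenceMatcher

# isjunk marks blanks as junk; junk is excluded from the b2j index.
s = SequenceMatcher(lambda x: x == " ", "", "a b a")
print(s.bjunk)      # {' '}
print(s.bpopular)   # set(): b is far shorter than 200 items
print(s.b2j)        # {'a': [0, 4], 'b': [2]}

# One target against many candidates: the second sequence is analyzed once
# by set_seq2(), and set_seq1() swaps in each candidate.
matcher = SequenceMatcher()
matcher.set_seq2("apple")
scores = {}
for candidate in ["ape", "apple", "peach"]:
    matcher.set_seq1(candidate)
    scores[candidate] = matcher.ratio()
print(max(scores, key=scores.get))  # apple

# find_longest_match() returns a Match named tuple indexing into a and b.
a, b = " abcd", "abcd abcd"
m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
print(m)                          # Match(a=0, b=4, size=5)
print(repr(a[m.a:m.a + m.size]))  # ' abcd'
```
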

The :class:`SequenceMatcher` class has this constructor:


.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

   Optional argument *isjunk* must be ``None`` (the default) or a one-argument
   function that takes a sequence element and returns true if and only if the
   element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
   equivalent to passing ``lambda x: False``; in other words, no elements are ignored.
   For example, pass::

      lambda x: x in " \t"

   if you're comparing lines as sequences of characters, and don't want to sync up
   on blanks or hard tabs.

   The optional arguments *a* and *b* are sequences to be compared; both default to
   empty strings. The elements of both sequences must be :term:`hashable`.

   The optional argument *autojunk* can be used to disable the automatic junk
   heuristic.

   .. versionadded:: 3.2
      The *autojunk* parameter.

   SequenceMatcher objects get three data attributes: *bjunk* is the
   set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
   non-junk elements considered popular by the heuristic (if it is not
   disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
   of positions where they occur. All three are reset whenever *b* is reset
   with :meth:`set_seqs` or :meth:`set_seq2`.

   .. versionadded:: 3.2
      The *bjunk* and *bpopular* attributes.

   :class:`SequenceMatcher` objects have the following methods:

   .. method:: set_seqs(a, b)

      Set the two sequences to be compared.

      :class:`SequenceMatcher` computes and caches detailed information about the
      second sequence, so if you want to compare one sequence against many
      sequences, use :meth:`set_seq2` to set the commonly used sequence once and
      call :meth:`set_seq1` repeatedly, once for each of the other sequences.


   .. method:: set_seq1(a)

      Set the first sequence to be compared. The second sequence to be compared
      is not changed.


   .. method:: set_seq2(b)

      Set the second sequence to be compared. The first sequence to be compared
      is not changed.


   .. method:: find_longest_match(alo, ahi, blo, bhi)

      Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.

      If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where
      ``alo <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all
      ``(i', j', k')`` meeting those conditions, the additional conditions
      ``k >= k'``, ``i <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In
      other words, of all maximal matching blocks, return one that starts earliest
      in *a*, and of all those maximal matching blocks that start earliest in *a*,
      return the one that starts earliest in *b*.

      >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
      >>> s.find_longest_match(0, 5, 0, 9)
      Match(a=0, b=4, size=5)

      If *isjunk* was provided, first the longest matching block is determined
      as above, but with the additional restriction that no junk element appears
      in the block. Then that block is extended as far as possible by matching
      (only) junk elements on both sides. So the resulting block never matches
      on junk except as identical junk happens to be adjacent to an interesting
      match.

      Here's the same example as before, but considering blanks to be junk. That
      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
      second sequence directly. Instead only the ``'abcd'`` can match, and
      matches the leftmost ``'abcd'`` in the second sequence:

      >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
      >>> s.find_longest_match(0, 5, 0, 9)
      Match(a=1, b=0, size=4)

      If no blocks match, this returns ``(alo, blo, 0)``.

      This method returns a :term:`named tuple` ``Match(a, b, size)``.


   .. method:: get_matching_blocks()

      Return a list of triples describing non-overlapping matching subsequences.
      Each triple is of the form ``(i, j, n)``, and means that
      ``a[i:i+n] == b[j:j+n]``. The triples are monotonically increasing in
      *i* and *j*.

      The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It
      is the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')``
      are adjacent triples in the list, and the second is not the last triple in
      the list, then ``i+n < i'`` or ``j+n < j'``; in other words, adjacent
      triples always describe non-adjacent equal blocks.

      .. XXX Explain why a dummy is used!

      .. doctest::

         >>> s = SequenceMatcher(None, "abxcd", "abcd")
         >>> s.get_matching_blocks()
         [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]


   .. method:: get_opcodes()

      Return a list of 5-tuples describing how to turn *a* into *b*. Each tuple is
      of the form ``(tag, i1, i2, j1, j2)``. The first tuple has
      ``i1 == j1 == 0``, and remaining tuples have *i1* equal to the *i2* from the
      preceding tuple, and, likewise, *j1* equal to the previous *j2*.

      The *tag* values are strings, with these meanings:

      +---------------+---------------------------------------------+
      | Value         | Meaning                                     |
      +===============+=============================================+
      | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
      |               | ``b[j1:j2]``.                               |
      +---------------+---------------------------------------------+
      | ``'delete'``  | ``a[i1:i2]`` should be deleted.  Note that  |
      |               | ``j1 == j2`` in this case.                  |
      +---------------+---------------------------------------------+
      | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
      |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
      |               | this case.                                  |
      +---------------+---------------------------------------------+
      | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
      |               | are equal).                                 |
      +---------------+---------------------------------------------+

      For example::

        >>> a = "qabxcd"
        >>> b = "abycdf"
        >>> s = SequenceMatcher(None, a, b)
        >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
Berker Peksag	eb2e02b	2016-03-11 23:19:48 +0200	[diff] [blame]	516	... print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
				517	... tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
Raymond Hettinger	dbb677a	2011-04-09 19:41:00 -0700	[diff] [blame]	518	delete a[0:1] --> b[0:0] 'q' --> ''
				519	equal a[1:3] --> b[0:2] 'ab' --> 'ab'
				520	replace a[3:4] --> b[2:3] 'x' --> 'y'
				521	equal a[4:6] --> b[3:5] 'cd' --> 'cd'
				522	insert a[6:6] --> b[5:6] '' --> 'f'


   .. method:: get_grouped_opcodes(n=3)

      Return a :term:`generator` of groups with up to *n* lines of context.

      Starting with the opcodes returned by :meth:`get_opcodes`, this method
      splits out smaller change clusters and eliminates intervening ranges that
      have no changes.

      Each group is a list of 5-tuples in the same format as returned by
      :meth:`get_opcodes`.


   .. method:: ratio()

      Return a measure of the sequences' similarity as a float in the range
      [0, 1].

      Where T is the total number of elements in both sequences, and M is the
      number of matched elements (the sum of the sizes of the blocks returned
      by :meth:`get_matching_blocks`), this is 2.0\*M / T. Note that this is
      ``1.0`` if the sequences are identical, and ``0.0`` if they have nothing
      in common.

      This is expensive to compute if :meth:`get_matching_blocks` or
      :meth:`get_opcodes` hasn't already been called, in which case you may want
      to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
      upper bound.

      .. note::

         Caution: The result of a :meth:`ratio` call may depend on the order of
         the arguments. For instance::

            >>> SequenceMatcher(None, 'tide', 'diet').ratio()
            0.25
            >>> SequenceMatcher(None, 'diet', 'tide').ratio()
            0.5


   .. method:: quick_ratio()

      Return an upper bound on :meth:`ratio` relatively quickly.


   .. method:: real_quick_ratio()

      Return an upper bound on :meth:`ratio` very quickly.


   The three methods that return the ratio of matching to total elements can
   give different results due to differing levels of approximation, although
   :meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as
   large as :meth:`ratio`:

      >>> s = SequenceMatcher(None, "abcd", "bcde")
      >>> s.ratio()
      0.75
      >>> s.quick_ratio()
      0.75
      >>> s.real_quick_ratio()
      1.0
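
The grouping done by :meth:`~SequenceMatcher.get_grouped_opcodes` is easiest to see on input with two changes far apart. The sketch below builds two made-up 20-line lists that differ only at lines 5 and 16 and prints the groups for ``n=3``. One would expect two groups, each keeping at most three unchanged lines on either side of its change, because the long unchanged run between the two edits gets split.

```python
from difflib import SequenceMatcher

# Two made-up 20-line "files" that differ only at lines 5 and 16.
a = [f"line {k}\n" for k in range(1, 21)]
b = list(a)
b[4] = "line five\n"
b[15] = "line sixteen\n"

sm = SequenceMatcher(None, a, b)
# Each group is a list of (tag, i1, i2, j1, j2) tuples, like get_opcodes().
groups = list(sm.get_grouped_opcodes(n=3))

for group in groups:
    print("--- group ---")
    for tag, i1, i2, j1, j2 in group:
        print(f"{tag:7} a[{i1}:{i2}] b[{j1}:{j2}]")
```

This is the same grouping that :func:`unified_diff` uses to form its hunks, so *n* plays the role of the number of context lines there.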


.. _sequencematcher-examples:

SequenceMatcher Examples
------------------------

This example compares two strings, considering blanks to be "junk":

   >>> s = SequenceMatcher(lambda x: x == " ",
   ...                     "private Thread currentThread;",
   ...                     "private volatile Thread currentThread;")

:meth:`~SequenceMatcher.ratio` returns a float in [0, 1], measuring the
similarity of the sequences. As a rule of thumb, a
:meth:`~SequenceMatcher.ratio` value over 0.6 means the sequences are close
matches:

   >>> print(round(s.ratio(), 3))
   0.866
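
The formula given for :meth:`~SequenceMatcher.ratio`, ``2.0*M / T``, can be checked directly against :meth:`~SequenceMatcher.get_matching_blocks`. The sketch below uses a hypothetical helper, ``ratio_by_hand``, that sums the block sizes (the dummy block adds 0). It also spot-checks that :meth:`~SequenceMatcher.quick_ratio` and :meth:`~SequenceMatcher.real_quick_ratio` never fall below :meth:`~SequenceMatcher.ratio`.

```python
from difflib import SequenceMatcher


def ratio_by_hand(a, b, isjunk=None):
    """Hypothetical helper: recompute SequenceMatcher.ratio() as 2.0*M / T."""
    sm = SequenceMatcher(isjunk, a, b)
    # M: number of matched elements; the trailing dummy block has size 0.
    m = sum(block.size for block in sm.get_matching_blocks())
    # T: total number of elements in both sequences.
    t = len(a) + len(b)
    # ratio() reports 1.0 when both sequences are empty.
    return 2.0 * m / t if t else 1.0


def is_blank(ch):
    """Junk predicate matching the example above: blanks are junk."""
    return ch == " "


for a, b in [("abcd", "bcde"), ("tide", "diet"), ("diet", "tide")]:
    sm = SequenceMatcher(None, a, b)
    print(a, b, ratio_by_hand(a, b), sm.ratio())
    # The cheaper estimates are documented to be at least as large as ratio().
    assert sm.quick_ratio() >= sm.ratio()
    assert sm.real_quick_ratio() >= sm.ratio()

print(round(ratio_by_hand("private Thread currentThread;",
                          "private volatile Thread currentThread;",
                          is_blank), 3))  # 0.866
```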

If you're only interested in where the sequences match,
:meth:`~SequenceMatcher.get_matching_blocks` is handy:

   >>> for block in s.get_matching_blocks():
   ...     print("a[%d] and b[%d] match for %d elements" % block)
   a[0] and b[0] match for 8 elements
   a[8] and b[17] match for 21 elements
   a[29] and b[38] match for 0 elements

Note that the last tuple returned by
:meth:`~SequenceMatcher.get_matching_blocks` is always a dummy,
``(len(a), len(b), 0)``, and this is the only case in which the last tuple
element (number of elements matched) is ``0``.
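
One practical benefit of the dummy triple is that code walking the blocks can treat the stretch after the last real match like every other gap, with no special end-of-sequence case. The sketch below uses a hypothetical helper, ``unmatched_ranges``, to collect the ranges of *a* and *b* that no match covers. The trailing ``'z'`` in the sample input is found only because the dummy block closes the final gap.

```python
from difflib import SequenceMatcher


def unmatched_ranges(a, b):
    """Hypothetical helper: (i1, i2, j1, j2) ranges not covered by any match."""
    gaps = []
    i = j = 0  # where the previous match ended in a and in b
    for ai, bj, size in SequenceMatcher(None, a, b).get_matching_blocks():
        if i < ai or j < bj:
            gaps.append((i, ai, j, bj))
        i, j = ai + size, bj + size
    # No special case is needed here: the dummy (len(a), len(b), 0) block
    # has already closed any gap at the end of either sequence.
    return gaps


print(unmatched_ranges("abxcdz", "abcd"))  # [(2, 3, 2, 2), (5, 6, 4, 4)]
```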

If you want to know how to change the first sequence into the second, use
:meth:`~SequenceMatcher.get_opcodes`:

   >>> for opcode in s.get_opcodes():
   ...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
    equal a[0:8] b[0:8]
   insert a[8:8] b[8:17]
    equal a[8:29] b[17:38]
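
Because the opcodes cover both sequences end to end, the non-``'equal'`` ones, together with the slices of *b* they point at, are enough to turn *a* into *b* later. The ``make_patch`` and ``apply_patch`` helpers below are hypothetical names for a minimal sketch of that idea; they are not part of :mod:`difflib`, and the sketch assumes string inputs since it rebuilds the result with ``''.join``.

```python
from difflib import SequenceMatcher


def make_patch(a, b):
    """Hypothetical helper: keep only the edits, with replacement text from b."""
    return [(tag, i1, i2, b[j1:j2])
            for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes()
            if tag != 'equal']


def apply_patch(a, patch):
    """Hypothetical helper: rebuild the second string from a and the edits."""
    pieces, pos = [], 0
    for tag, i1, i2, new in patch:
        pieces.append(a[pos:i1])  # unchanged stretch before this edit
        pieces.append(new)        # '' for a 'delete'
        pos = i2                  # skip a[i1:i2]; equals i1 for an 'insert'
    pieces.append(a[pos:])        # unchanged tail
    return ''.join(pieces)


old = "private Thread currentThread;"
new = "private volatile Thread currentThread;"
patch = make_patch(old, new)
print(patch)
print(apply_patch(old, patch))
```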

.. seealso::

   * The :func:`get_close_matches` function in this module, which shows how
     simple code building on :class:`SequenceMatcher` can be used to do useful
     work.

   * `Simple version control recipe
     <https://code.activestate.com/recipes/576729/>`_ for a small application
     built with :class:`SequenceMatcher`.


.. _differ-objects:

Differ Objects
--------------

Note that :class:`Differ`\ -generated deltas make no claim to be minimal
diffs. To the contrary, minimal diffs are often counter-intuitive, because they
synch up anywhere possible, sometimes at accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.

The :class:`Differ` class has this constructor:


.. class:: Differ(linejunk=None, charjunk=None)

   Optional keyword parameters *linejunk* and *charjunk* are for filter functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns true
   if the string is junk. The default is ``None``, meaning that no line is
   considered junk.

   *charjunk*: A function that accepts a single character argument (a string of
   length 1), and returns true if the character is junk. The default is ``None``,
   meaning that no character is considered junk.

   These junk-filtering functions speed up matching to find
   differences and do not cause any differing lines or characters to
   be ignored. See the description of the
   :meth:`~SequenceMatcher.find_longest_match` method for an explanation of
   how junk affects matching.

   :class:`Differ` objects are used (that is, deltas are generated) via a
   single method:


   .. method:: Differ.compare(a, b)

      Compare two sequences of lines, and generate the delta (a sequence of lines).

      Each sequence must contain individual single-line strings ending with
      newlines. Such sequences can be obtained from the
      :meth:`~io.IOBase.readlines` method of file-like objects. The delta
      generated also consists of newline-terminated strings, ready to be
      printed as-is via the :meth:`~io.IOBase.writelines` method of a
      file-like object.
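
To see that *charjunk* only guides matching and does not hide differences, the sketch below passes :func:`IS_CHARACTER_JUNK` (the difflib helper that treats blanks and tabs as junk) and compares made-up lines that differ only in blanks. The changed line should still appear as a ``-``/``+`` pair in the delta.

```python
from difflib import Differ, IS_CHARACTER_JUNK

old = ["x = 1\n", "y = 2\n"]
new = ["x = 1\n", "y  =  2\n"]  # second line differs only in blanks

d = Differ(charjunk=IS_CHARACTER_JUNK)
delta = list(d.compare(old, new))
print(''.join(delta), end='')
```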


.. _differ-examples:

Differ Example
--------------

This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the :meth:`~io.IOBase.readlines` method of file-like objects):

   >>> text1 = '''  1. Beautiful is better than ugly.
   ...   2. Explicit is better than implicit.
   ...   3. Simple is better than complex.
   ...   4. Complex is better than complicated.
   ... '''.splitlines(keepends=True)
   >>> len(text1)
   4
   >>> text1[0][-1]
   '\n'
   >>> text2 = '''  1. Beautiful is better than ugly.
   ...   3.   Simple is better than complex.
   ...   4. Complicated is better than complex.
   ...   5. Flat is better than nested.
   ... '''.splitlines(keepends=True)

Next we instantiate a Differ object:

   >>> d = Differ()

Note that when instantiating a :class:`Differ` object we may pass functions to
filter out line and character "junk." See the :class:`Differ` constructor for
details.

Finally, we compare the two:

   >>> result = list(d.compare(text1, text2))

``result`` is a list of strings, so let's pretty-print it:

   >>> from pprint import pprint
   >>> pprint(result)
   ['    1. Beautiful is better than ugly.\n',
    '-   2. Explicit is better than implicit.\n',
    '-   3. Simple is better than complex.\n',
    '+   3.   Simple is better than complex.\n',
    '?     ++\n',
    '-   4. Complex is better than complicated.\n',
    '?            ^                     ---- ^\n',
    '+   4. Complicated is better than complex.\n',
    '?           ++++ ^                      ^\n',
    '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this:

   >>> import sys
   >>> sys.stdout.writelines(result)
       1. Beautiful is better than ugly.
   -   2. Explicit is better than implicit.
   -   3. Simple is better than complex.
   +   3.   Simple is better than complex.
   ?     ++
   -   4. Complex is better than complicated.
   ?            ^                     ---- ^
   +   4. Complicated is better than complex.
   ?           ++++ ^                      ^
   +   5. Flat is better than nested.
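
Every line of the delta starts with a two-character code: ``'  '`` for a line common to both sequences, ``'- '`` and ``'+ '`` for lines unique to the first or second sequence, and ``'? '`` for the intraline guide lines seen above. A quick summary of a delta is therefore just a count of those prefixes. The sketch below does this with :class:`collections.Counter`, repeating the example texts so that it runs on its own.

```python
from collections import Counter
from difflib import Differ

text1 = '''  1. Beautiful is better than ugly.
  2. Explicit is better than implicit.
  3. Simple is better than complex.
  4. Complex is better than complicated.
'''.splitlines(keepends=True)
text2 = '''  1. Beautiful is better than ugly.
  3.   Simple is better than complex.
  4. Complicated is better than complex.
  5. Flat is better than nested.
'''.splitlines(keepends=True)

result = list(Differ().compare(text1, text2))
# The first two characters of each delta line are its code.
codes = Counter(line[:2] for line in result)
print(codes)
```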


.. _difflib-interface:

A command-line interface to difflib
-----------------------------------

This example shows how to use difflib to create a ``diff``-like utility.
It is also contained in the Python source distribution, as
:file:`Tools/scripts/diff.py`.

.. literalinclude:: ../../Tools/scripts/diff.py
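
For a sense of how little code such a utility needs, here is a stripped-down sketch that only produces unified diffs via :func:`unified_diff`. It is not the script included above: ``diff_lines`` and ``main`` are hypothetical names, and the ``-n`` option here simply sets the number of context lines.

```python
import argparse
import sys
from difflib import unified_diff


def diff_lines(a_lines, b_lines, fromfile="a", tofile="b", n=3):
    """Hypothetical helper: unified-diff lines for two lists of lines."""
    return list(unified_diff(a_lines, b_lines,
                             fromfile=fromfile, tofile=tofile, n=n))


def main(argv=None):
    """Hypothetical entry point: diff two text files named on the command line."""
    parser = argparse.ArgumentParser(description="Minimal unified diff (sketch)")
    parser.add_argument("-n", type=int, default=3,
                        help="number of context lines")
    parser.add_argument("fromfile")
    parser.add_argument("tofile")
    args = parser.parse_args(argv)
    with open(args.fromfile) as fa, open(args.tofile) as fb:
        delta = diff_lines(fa.readlines(), fb.readlines(),
                           args.fromfile, args.tofile, args.n)
    sys.stdout.writelines(delta)
```

To run it as a script, call ``main()`` under the usual ``if __name__ == "__main__":`` guard; with no arguments, ``main()`` reads its options from ``sys.argv``.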