Blame - Doc/library/difflib.rst - platform/external/python/cpython3

blob: 6743bdc1d87b9d5d60ce5cc3f7d8da57c81d0beb [file] [log] [blame]

Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1	:mod:`difflib` --- Helpers for computing deltas
				2	===============================================
				3
				4	.. module:: difflib
				5	:synopsis: Helpers for computing differences between objects.
Terry Jan Reedy	fa089b9	2016-06-11 15:02:54 -0400	[diff] [blame]	6
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	7	.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
				8	.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
Christian Heimes	5b5e81c	2007-12-31 16:14:33 +0000	[diff] [blame]	9	.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	10
Andrew Kuchling	2e3743c	2014-03-19 16:23:01 -0400	[diff] [blame]	11	Source code: :source:`Lib/difflib.py`
				12
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	13	.. testsetup::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	14
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	15	import sys
				16	from difflib import *
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	17
Terry Jan Reedy	fa089b9	2016-06-11 15:02:54 -0400	[diff] [blame]	18	--------------
				19
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	20	This module provides classes and functions for comparing sequences. It
				21	can be used for example, for comparing files, and can produce difference
				22	information in various formats, including HTML and context and unified
				23	diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
				24
Terry Reedy	99f9637	2010-11-25 06:12:34 +0000	[diff] [blame]	25
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	26	.. class:: SequenceMatcher
				27
				28	This is a flexible class for comparing pairs of sequences of any type, so long
Guido van Rossum	2cc30da	2007-11-02 23:46:40 +0000	[diff] [blame]	29	as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	30	little fancier than, an algorithm published in the late 1980's by Ratcliff and
				31	Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
				32	find the longest contiguous matching subsequence that contains no "junk"
Andrew Kuchling	c51da2b	2014-03-19 16:43:06 -0400	[diff] [blame]	33	elements; these "junk" elements are ones that are uninteresting in some
				34	sense, such as blank lines or whitespace. (Handling junk is an
				35	extension to the Ratcliff and Obershelp algorithm.) The same
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	36	idea is then applied recursively to the pieces of the sequences to the left and
				37	to the right of the matching subsequence. This does not yield minimal edit
				38	sequences, but does tend to yield matches that "look right" to people.
				39
				40	Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst
				41	case and quadratic time in the expected case. :class:`SequenceMatcher` is
				42	quadratic time for the worst case and has expected-case behavior dependent in a
				43	complicated way on how many elements the sequences have in common; best case
				44	time is linear.
				45
Terry Reedy	99f9637	2010-11-25 06:12:34 +0000	[diff] [blame]	46	Automatic junk heuristic: :class:`SequenceMatcher` supports a heuristic that
				47	automatically treats certain sequence items as junk. The heuristic counts how many
				48	times each individual item appears in the sequence. If an item's duplicates (after
				49	the first one) account for more than 1% of the sequence and the sequence is at least
				50	200 items long, this item is marked as "popular" and is treated as junk for
				51	the purpose of sequence matching. This heuristic can be turned off by setting
				52	the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.
				53
Terry Reedy	dc9b17d	2010-11-27 20:52:14 +0000	[diff] [blame]	54	.. versionadded:: 3.2
				55	The autojunk parameter.
				56
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	57
				58	.. class:: Differ
				59
				60	This is a class for comparing sequences of lines of text, and producing
				61	human-readable differences or deltas. Differ uses :class:`SequenceMatcher`
				62	both to compare sequences of lines, and to compare sequences of characters
				63	within similar (near-matching) lines.
				64
				65	Each line of a :class:`Differ` delta begins with a two-letter code:
				66
				67	+----------+-------------------------------------------+
				68	\| Code \| Meaning \|
				69	+==========+===========================================+
				70	\| ``'- '`` \| line unique to sequence 1 \|
				71	+----------+-------------------------------------------+
				72	\| ``'+ '`` \| line unique to sequence 2 \|
				73	+----------+-------------------------------------------+
				74	\| ``' '`` \| line common to both sequences \|
				75	+----------+-------------------------------------------+
				76	\| ``'? '`` \| line not present in either input sequence \|
				77	+----------+-------------------------------------------+
				78
				79	Lines beginning with '``?``' attempt to guide the eye to intraline differences,
				80	and were not present in either input sequence. These lines can be confusing if
				81	the sequences contain tab characters.
				82
				83
				84	.. class:: HtmlDiff
				85
				86	This class can be used to create an HTML table (or a complete HTML file
				87	containing the table) showing a side by side, line by line comparison of text
				88	with inter-line and intra-line change highlights. The table can be generated in
				89	either full or contextual difference mode.
				90
				91	The constructor for this class is:
				92
				93
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	94	.. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	95
				96	Initializes instance of :class:`HtmlDiff`.
				97
				98	tabsize is an optional keyword argument to specify tab stop spacing and
				99	defaults to ``8``.
				100
				101	wrapcolumn is an optional keyword to specify column number where lines are
				102	broken and wrapped, defaults to ``None`` where lines are not wrapped.
				103
Terry Jan Reedy	3e8a7ad	2015-10-30 19:41:16 -0400	[diff] [blame]	104	linejunk and charjunk are optional keyword arguments passed into :func:`ndiff`
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	105	(used by :class:`HtmlDiff` to generate the side by side HTML differences). See
Terry Jan Reedy	3e8a7ad	2015-10-30 19:41:16 -0400	[diff] [blame]	106	:func:`ndiff` documentation for argument default values and descriptions.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	107
				108	The following methods are public:
				109
Berker Peksag	102029d	2015-03-15 01:18:47 +0200	[diff] [blame]	110	.. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \
				111	numlines=5, *, charset='utf-8')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	112
				113	Compares fromlines and tolines (lists of strings) and returns a string which
				114	is a complete HTML file containing a table showing line by line differences with
				115	inter-line and intra-line changes highlighted.
				116
				117	fromdesc and todesc are optional keyword arguments to specify from/to file
				118	column header strings (both default to an empty string).
				119
				120	context and numlines are both optional keyword arguments. Set context to
				121	``True`` when contextual differences are to be shown, else the default is
				122	``False`` to show the full files. numlines defaults to ``5``. When context
				123	is ``True`` numlines controls the number of context lines which surround the
				124	difference highlights. When context is ``False`` numlines controls the
				125	number of lines which are shown before a difference highlight when using the
				126	"next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
				127	the next difference highlight at the top of the browser without any leading
				128	context).
				129
Berker Peksag	102029d	2015-03-15 01:18:47 +0200	[diff] [blame]	130	.. versionchanged:: 3.5
				131	charset keyword-only argument was added. The default charset of
				132	HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``.
				133
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	134	.. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	135
				136	Compares fromlines and tolines (lists of strings) and returns a string which
				137	is a complete HTML table showing line by line differences with inter-line and
				138	intra-line changes highlighted.
				139
				140	The arguments for this method are the same as those for the :meth:`make_file`
				141	method.
				142
				143	:file:`Tools/scripts/diff.py` is a command-line front-end to this class and
				144	contains a good example of its use.
				145
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	146
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	147	.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	148
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	149	Compare a and b (lists of strings); return a delta (a :term:`generator`
				150	generating the delta lines) in context diff format.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	151
				152	Context diffs are a compact way of showing just the lines that have changed plus
				153	a few lines of context. The changes are shown in a before/after style. The
				154	number of context lines is set by n which defaults to three.
				155
				156	By default, the diff control lines (those with ``***`` or ``---``) are created
				157	with a trailing newline. This is helpful so that inputs created from
Serhiy Storchaka	bfdcd43	2013-10-13 23:09:14 +0300	[diff] [blame]	158	:func:`io.IOBase.readlines` result in diffs that are suitable for use with
				159	:func:`io.IOBase.writelines` since both the inputs and outputs have trailing
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	160	newlines.
				161
				162	For inputs that do not have trailing newlines, set the lineterm argument to
				163	``""`` so that the output will be uniformly newline free.
				164
				165	The context diff format normally has a header for filenames and modification
				166	times. Any or all of these may be specified using strings for fromfile,
R. David Murray	b2416e5	2010-04-12 16:58:02 +0000	[diff] [blame]	167	tofile, fromfiledate, and tofiledate. The modification times are normally
				168	expressed in the ISO 8601 format. If not specified, the
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	169	strings default to blanks.
				170
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	171	>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
				172	>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
Zachary Ware	9f8b3a0	2016-08-10 00:59:59 -0500	[diff] [blame]	173	>>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py', tofile='after.py'))
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	174	*** before.py
				175	--- after.py
				176	***************
				177	* 1,4 **
				178	! bacon
				179	! eggs
				180	! ham
				181	guido
				182	--- 1,4 ----
				183	! python
				184	! eggy
				185	! hamster
				186	guido
				187
				188	See :ref:`difflib-interface` for a more detailed example.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	189
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	190
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	191	.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	192
				193	Return a list of the best "good enough" matches. word is a sequence for which
				194	close matches are desired (typically a string), and possibilities is a list of
				195	sequences against which to match word (typically a list of strings).
				196
				197	Optional argument n (default ``3``) is the maximum number of close matches to
				198	return; n must be greater than ``0``.
				199
				200	Optional argument cutoff (default ``0.6``) is a float in the range [0, 1].
				201	Possibilities that don't score at least that similar to word are ignored.
				202
				203	The best (no more than n) matches among the possibilities are returned in a
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	204	list, sorted by similarity score, most similar first.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	205
				206	>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
				207	['apple', 'ape']
				208	>>> import keyword
				209	>>> get_close_matches('wheel', keyword.kwlist)
				210	['while']
Zachary Ware	9f8b3a0	2016-08-10 00:59:59 -0500	[diff] [blame]	211	>>> get_close_matches('pineapple', keyword.kwlist)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	212	[]
				213	>>> get_close_matches('accept', keyword.kwlist)
				214	['except']
				215
				216
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	217	.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	218
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	219	Compare a and b (lists of strings); return a :class:`Differ`\ -style
				220	delta (a :term:`generator` generating the delta lines).
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	221
Andrew Kuchling	c51da2b	2014-03-19 16:43:06 -0400	[diff] [blame]	222	Optional keyword parameters linejunk and charjunk are filtering functions
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	223	(or ``None``):
				224
Georg Brandl	e6bcc91	2008-05-12 18:05:20 +0000	[diff] [blame]	225	linejunk: A function that accepts a single string argument, and returns
				226	true if the string is junk, or false if not. The default is ``None``. There
				227	is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
				228	without visible characters, except for at most one pound character (``'#'``)
				229	-- however the underlying :class:`SequenceMatcher` class does a dynamic
				230	analysis of which lines are so frequent as to constitute noise, and this
				231	usually works better than using this function.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	232
				233	charjunk: A function that accepts a character (a string of length 1), and
				234	returns if the character is junk, or false if not. The default is module-level
				235	function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
Andrew Kuchling	c51da2b	2014-03-19 16:43:06 -0400	[diff] [blame]	236	blank or tab; it's a bad idea to include newline in this!).
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	237
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	238	:file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	239
Terry Jan Reedy	bddecc3	2014-04-18 17:00:19 -0400	[diff] [blame]	240	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
				241	... 'ore\ntree\nemu\n'.splitlines(keepends=True))
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	242	>>> print(''.join(diff), end="")
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	243	- one
				244	? ^
				245	+ ore
				246	? ^
				247	- two
				248	- three
				249	? -
				250	+ tree
				251	+ emu
				252
				253
				254	.. function:: restore(sequence, which)
				255
				256	Return one of the two sequences that generated a delta.
				257
				258	Given a sequence produced by :meth:`Differ.compare` or :func:`ndiff`, extract
				259	lines originating from file 1 or 2 (parameter which), stripping off line
				260	prefixes.
				261
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	262	Example:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	263
Terry Jan Reedy	bddecc3	2014-04-18 17:00:19 -0400	[diff] [blame]	264	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
				265	... 'ore\ntree\nemu\n'.splitlines(keepends=True))
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	266	>>> diff = list(diff) # materialize the generated delta into a list
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	267	>>> print(''.join(restore(diff, 1)), end="")
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	268	one
				269	two
				270	three
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	271	>>> print(''.join(restore(diff, 2)), end="")
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	272	ore
				273	tree
				274	emu
				275
				276
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	277	.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	278
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	279	Compare a and b (lists of strings); return a delta (a :term:`generator`
				280	generating the delta lines) in unified diff format.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	281
				282	Unified diffs are a compact way of showing just the lines that have changed plus
Martin Panter	7462b649	2015-11-02 03:37:02 +0000	[diff] [blame]	283	a few lines of context. The changes are shown in an inline style (instead of
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	284	separate before/after blocks). The number of context lines is set by n which
				285	defaults to three.
				286
				287	By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
				288	created with a trailing newline. This is helpful so that inputs created from
Serhiy Storchaka	bfdcd43	2013-10-13 23:09:14 +0300	[diff] [blame]	289	:func:`io.IOBase.readlines` result in diffs that are suitable for use with
				290	:func:`io.IOBase.writelines` since both the inputs and outputs have trailing
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	291	newlines.
				292
				293	For inputs that do not have trailing newlines, set the lineterm argument to
				294	``""`` so that the output will be uniformly newline free.
				295
				296	The context diff format normally has a header for filenames and modification
				297	times. Any or all of these may be specified using strings for fromfile,
R. David Murray	b2416e5	2010-04-12 16:58:02 +0000	[diff] [blame]	298	tofile, fromfiledate, and tofiledate. The modification times are normally
				299	expressed in the ISO 8601 format. If not specified, the
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	300	strings default to blanks.
				301
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	302
				303	>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
				304	>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
Zachary Ware	9f8b3a0	2016-08-10 00:59:59 -0500	[diff] [blame]	305	>>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	306	--- before.py
				307	+++ after.py
				308	@@ -1,4 +1,4 @@
				309	-bacon
				310	-eggs
				311	-ham
				312	+python
				313	+eggy
				314	+hamster
				315	guido
				316
				317	See :ref:`difflib-interface` for a more detailed example.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	318
Greg Ward	4d9d256	2015-04-20 20:21:21 -0400	[diff] [blame]	319	.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\\n')
				320
				321	Compare a and b (lists of bytes objects) using dfunc; yield a
				322	sequence of delta lines (also bytes) in the format returned by dfunc.
				323	dfunc must be a callable, typically either :func:`unified_diff` or
				324	:func:`context_diff`.
				325
				326	Allows you to compare data with unknown or inconsistent encoding. All
				327	inputs except n must be bytes objects, not str. Works by losslessly
				328	converting all inputs (except n) to str, and calling ``dfunc(a, b,
				329	fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The output of
				330	dfunc is then converted back to bytes, so the delta lines that you
				331	receive have the same unknown/inconsistent encodings as a and b.
				332
				333	.. versionadded:: 3.5
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	334
				335	.. function:: IS_LINE_JUNK(line)
				336
				337	Return true for ignorable lines. The line line is ignorable if line is
				338	blank or contains a single ``'#'``, otherwise it is not ignorable. Used as a
Georg Brandl	e6bcc91	2008-05-12 18:05:20 +0000	[diff] [blame]	339	default for parameter linejunk in :func:`ndiff` in older versions.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	340
				341
				342	.. function:: IS_CHARACTER_JUNK(ch)
				343
				344	Return true for ignorable characters. The character ch is ignorable if ch
				345	is a space or tab, otherwise it is not ignorable. Used as a default for
				346	parameter charjunk in :func:`ndiff`.
				347
				348
				349	.. seealso::
				350
Georg Brandl	525d355	2014-10-29 10:26:56 +0100	[diff] [blame]	351	`Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	352	Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
Georg Brandl	525d355	2014-10-29 10:26:56 +0100	[diff] [blame]	353	was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	354
				355
				356	.. _sequence-matcher:
				357
				358	SequenceMatcher Objects
				359	-----------------------
				360
				361	The :class:`SequenceMatcher` class has this constructor:
				362
				363
Terry Reedy	99f9637	2010-11-25 06:12:34 +0000	[diff] [blame]	364	.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	365
				366	Optional argument isjunk must be ``None`` (the default) or a one-argument
				367	function that takes a sequence element and returns true if and only if the
				368	element is "junk" and should be ignored. Passing ``None`` for isjunk is
				369	equivalent to passing ``lambda x: 0``; in other words, no elements are ignored.
				370	For example, pass::
				371
				372	lambda x: x in " \t"
				373
				374	if you're comparing lines as sequences of characters, and don't want to synch up
				375	on blanks or hard tabs.
				376
				377	The optional arguments a and b are sequences to be compared; both default to
Guido van Rossum	2cc30da	2007-11-02 23:46:40 +0000	[diff] [blame]	378	empty strings. The elements of both sequences must be :term:`hashable`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	379
Terry Reedy	99f9637	2010-11-25 06:12:34 +0000	[diff] [blame]	380	The optional argument autojunk can be used to disable the automatic junk
				381	heuristic.
				382
Terry Reedy	dc9b17d	2010-11-27 20:52:14 +0000	[diff] [blame]	383	.. versionadded:: 3.2
				384	The autojunk parameter.
				385
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	386	SequenceMatcher objects get three data attributes: bjunk is the
Serhiy Storchaka	fbc1c26	2013-11-29 12:17:13 +0200	[diff] [blame]	387	set of elements of b for which isjunk is ``True``; bpopular is the set of
Terry Reedy	17a5925	2010-12-15 20:18:10 +0000	[diff] [blame]	388	non-junk elements considered popular by the heuristic (if it is not
				389	disabled); b2j is a dict mapping the remaining elements of b to a list
				390	of positions where they occur. All three are reset whenever b is reset
				391	with :meth:`set_seqs` or :meth:`set_seq2`.
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	392
Georg Brandl	500be24	2010-12-03 19:56:42 +0000	[diff] [blame]	393	.. versionadded:: 3.2
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	394	The bjunk and bpopular attributes.
				395
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	396	:class:`SequenceMatcher` objects have the following methods:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	397
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	398	.. method:: set_seqs(a, b)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	399
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	400	Set the two sequences to be compared.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	401
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	402	:class:`SequenceMatcher` computes and caches detailed information about the
				403	second sequence, so if you want to compare one sequence against many
				404	sequences, use :meth:`set_seq2` to set the commonly used sequence once and
				405	call :meth:`set_seq1` repeatedly, once for each of the other sequences.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	406
				407
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	408	.. method:: set_seq1(a)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	409
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	410	Set the first sequence to be compared. The second sequence to be compared
				411	is not changed.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	412
				413
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	414	.. method:: set_seq2(b)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	415
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	416	Set the second sequence to be compared. The first sequence to be compared
				417	is not changed.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	418
				419
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	420	.. method:: find_longest_match(alo, ahi, blo, bhi)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	421
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	422	Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	423
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	424	If isjunk was omitted or ``None``, :meth:`find_longest_match` returns
				425	``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
				426	<= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
				427	k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
				428	<= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
				429	all maximal matching blocks, return one that starts earliest in a, and
				430	of all those maximal matching blocks that start earliest in a, return
				431	the one that starts earliest in b.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	432
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	433	>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
				434	>>> s.find_longest_match(0, 5, 0, 9)
				435	Match(a=0, b=4, size=5)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	436
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	437	If isjunk was provided, first the longest matching block is determined
				438	as above, but with the additional restriction that no junk element appears
				439	in the block. Then that block is extended as far as possible by matching
				440	(only) junk elements on both sides. So the resulting block never matches
				441	on junk except as identical junk happens to be adjacent to an interesting
				442	match.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	443
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	444	Here's the same example as before, but considering blanks to be junk. That
				445	prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
				446	second sequence directly. Instead only the ``'abcd'`` can match, and
				447	matches the leftmost ``'abcd'`` in the second sequence:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	448
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	449	>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
				450	>>> s.find_longest_match(0, 5, 0, 9)
				451	Match(a=1, b=0, size=4)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	452
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	453	If no blocks match, this returns ``(alo, blo, 0)``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	454
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	455	This method returns a :term:`named tuple` ``Match(a, b, size)``.
Christian Heimes	25bb783	2008-01-11 16:17:00 +0000	[diff] [blame]	456
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	457
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	458	.. method:: get_matching_blocks()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	459
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	460	Return list of triples describing matching subsequences. Each triple is of
				461	the form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``. The
				462	triples are monotonically increasing in i and j.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	463
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	464	The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It
				465	is the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')``
				466	are adjacent triples in the list, and the second is not the last triple in
				467	the list, then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent
				468	triples always describe non-adjacent equal blocks.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	469
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	470	.. XXX Explain why a dummy is used!
Christian Heimes	5b5e81c	2007-12-31 16:14:33 +0000	[diff] [blame]	471
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	472	.. doctest::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	473
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	474	>>> s = SequenceMatcher(None, "abxcd", "abcd")
				475	>>> s.get_matching_blocks()
				476	[Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	477
				478
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	479	.. method:: get_opcodes()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	480
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	481	Return list of 5-tuples describing how to turn a into b. Each tuple is
				482	of the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 ==
				483	0``, and remaining tuples have i1 equal to the i2 from the preceding
				484	tuple, and, likewise, j1 equal to the previous j2.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	485
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	486	The tag values are strings, with these meanings:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	487
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	488	+---------------+---------------------------------------------+
				489	\| Value \| Meaning \|
				490	+===============+=============================================+
				491	\| ``'replace'`` \| ``a[i1:i2]`` should be replaced by \|
				492	\| \| ``b[j1:j2]``. \|
				493	+---------------+---------------------------------------------+
				494	\| ``'delete'`` \| ``a[i1:i2]`` should be deleted. Note that \|
				495	\| \| ``j1 == j2`` in this case. \|
				496	+---------------+---------------------------------------------+
				497	\| ``'insert'`` \| ``b[j1:j2]`` should be inserted at \|
				498	\| \| ``a[i1:i1]``. Note that ``i1 == i2`` in \|
				499	\| \| this case. \|
				500	+---------------+---------------------------------------------+
				501	\| ``'equal'`` \| ``a[i1:i2] == b[j1:j2]`` (the sub-sequences \|
				502	\| \| are equal). \|
				503	+---------------+---------------------------------------------+
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	504
Berker Peksag	eb2e02b	2016-03-11 23:19:48 +0200	[diff] [blame]	505	For example::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	506
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	507	>>> a = "qabxcd"
				508	>>> b = "abycdf"
				509	>>> s = SequenceMatcher(None, a, b)
				510	>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
Berker Peksag	eb2e02b	2016-03-11 23:19:48 +0200	[diff] [blame]	511	... print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
				512	... tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
Raymond Hettinger	dbb677a	2011-04-09 19:41:00 -0700	[diff] [blame]	513	delete a[0:1] --> b[0:0] 'q' --> ''
				514	equal a[1:3] --> b[0:2] 'ab' --> 'ab'
				515	replace a[3:4] --> b[2:3] 'x' --> 'y'
				516	equal a[4:6] --> b[3:5] 'cd' --> 'cd'
				517	insert a[6:6] --> b[5:6] '' --> 'f'
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	518
				519
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	520	.. method:: get_grouped_opcodes(n=3)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	521
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	522	Return a :term:`generator` of groups with up to n lines of context.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	523
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	524	Starting with the groups returned by :meth:`get_opcodes`, this method
				525	splits out smaller change clusters and eliminates intervening ranges which
				526	have no changes.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	527
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	528	The groups are returned in the same format as :meth:`get_opcodes`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	529
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	530
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	531	.. method:: ratio()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	532
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	533	Return a measure of the sequences' similarity as a float in the range [0,
				534	1].
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	535
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	536	Where T is the total number of elements in both sequences, and M is the
				537	number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
				538	sequences are identical, and ``0.0`` if they have nothing in common.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	539
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	540	This is expensive to compute if :meth:`get_matching_blocks` or
				541	:meth:`get_opcodes` hasn't already been called, in which case you may want
				542	to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
				543	upper bound.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	544
				545
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	546	.. method:: quick_ratio()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	547
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	548	Return an upper bound on :meth:`ratio` relatively quickly.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	549
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	550
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	551	.. method:: real_quick_ratio()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	552
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	553	Return an upper bound on :meth:`ratio` very quickly.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	554
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	555
				556	The three methods that return the ratio of matching to total characters can give
				557	different results due to differing levels of approximation, although
				558	:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	559	:meth:`ratio`:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	560
				561	>>> s = SequenceMatcher(None, "abcd", "bcde")
				562	>>> s.ratio()
				563	0.75
				564	>>> s.quick_ratio()
				565	0.75
				566	>>> s.real_quick_ratio()
				567	1.0
				568
				569
				570	.. _sequencematcher-examples:
				571
				572	SequenceMatcher Examples
				573	------------------------
				574
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	575	This example compares two strings, considering blanks to be "junk":
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	576
				577	>>> s = SequenceMatcher(lambda x: x == " ",
				578	... "private Thread currentThread;",
				579	... "private volatile Thread currentThread;")
				580
				581	:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
				582	sequences. As a rule of thumb, a :meth:`ratio` value over 0.6 means the
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	583	sequences are close matches:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	584
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	585	>>> print(round(s.ratio(), 3))
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	586	0.866
				587
				588	If you're only interested in where the sequences match,
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	589	:meth:`get_matching_blocks` is handy:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	590
				591	>>> for block in s.get_matching_blocks():
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	592	... print("a[%d] and b[%d] match for %d elements" % block)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	593	a[0] and b[0] match for 8 elements
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	594	a[8] and b[17] match for 21 elements
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	595	a[29] and b[38] match for 0 elements
				596
				597	Note that the last tuple returned by :meth:`get_matching_blocks` is always a
				598	dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
				599	tuple element (number of elements matched) is ``0``.
				600
				601	If you want to know how to change the first sequence into the second, use
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	602	:meth:`get_opcodes`:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	603
				604	>>> for opcode in s.get_opcodes():
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	605	... print("%6s a[%d:%d] b[%d:%d]" % opcode)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	606	equal a[0:8] b[0:8]
				607	insert a[8:8] b[8:17]
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	608	equal a[8:29] b[17:38]
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	609
Raymond Hettinger	58c8c26	2009-04-27 21:01:21 +0000	[diff] [blame]	610	.. seealso::
				611
				612	* The :func:`get_close_matches` function in this module which shows how
				613	simple code building on :class:`SequenceMatcher` can be used to do useful
				614	work.
				615
				616	* `Simple version control recipe
Serhiy Storchaka	6dff020	2016-05-07 10:49:07 +0300	[diff] [blame]	617	<https://code.activestate.com/recipes/576729/>`_ for a small application
Raymond Hettinger	58c8c26	2009-04-27 21:01:21 +0000	[diff] [blame]	618	built with :class:`SequenceMatcher`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	619
				620
				621	.. _differ-objects:
				622
				623	Differ Objects
				624	--------------
				625
				626	Note that :class:`Differ`\ -generated deltas make no claim to be minimal
				627	diffs. To the contrary, minimal diffs are often counter-intuitive, because they
				628	synch up anywhere possible, sometimes accidental matches 100 pages apart.
				629	Restricting synch points to contiguous matches preserves some notion of
				630	locality, at the occasional cost of producing a longer diff.
				631
				632	The :class:`Differ` class has this constructor:
				633
				634
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	635	.. class:: Differ(linejunk=None, charjunk=None)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	636
				637	Optional keyword parameters linejunk and charjunk are for filter functions
				638	(or ``None``):
				639
				640	linejunk: A function that accepts a single string argument, and returns true
				641	if the string is junk. The default is ``None``, meaning that no line is
				642	considered junk.
				643
				644	charjunk: A function that accepts a single character argument (a string of
				645	length 1), and returns true if the character is junk. The default is ``None``,
				646	meaning that no character is considered junk.
				647
Andrew Kuchling	c51da2b	2014-03-19 16:43:06 -0400	[diff] [blame]	648	These junk-filtering functions speed up matching to find
				649	differences and do not cause any differing lines or characters to
				650	be ignored. Read the description of the
				651	:meth:`~SequenceMatcher.find_longest_match` method's isjunk
				652	parameter for an explanation.
				653
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	654	:class:`Differ` objects are used (deltas generated) via a single method:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	655
				656
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	657	.. method:: Differ.compare(a, b)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	658
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	659	Compare two sequences of lines, and generate the delta (a sequence of lines).
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	660
Serhiy Storchaka	bfdcd43	2013-10-13 23:09:14 +0300	[diff] [blame]	661	Each sequence must contain individual single-line strings ending with
				662	newlines. Such sequences can be obtained from the
				663	:meth:`~io.IOBase.readlines` method of file-like objects. The delta
				664	generated also consists of newline-terminated strings, ready to be
				665	printed as-is via the :meth:`~io.IOBase.writelines` method of a
				666	file-like object.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	667
				668
				669	.. _differ-examples:
				670
				671	Differ Example
				672	--------------
				673
				674	This example compares two texts. First we set up the texts, sequences of
				675	individual single-line strings ending with newlines (such sequences can also be
Serhiy Storchaka	bfdcd43	2013-10-13 23:09:14 +0300	[diff] [blame]	676	obtained from the :meth:`~io.BaseIO.readlines` method of file-like objects):
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	677
				678	>>> text1 = ''' 1. Beautiful is better than ugly.
				679	... 2. Explicit is better than implicit.
				680	... 3. Simple is better than complex.
				681	... 4. Complex is better than complicated.
Terry Jan Reedy	bddecc3	2014-04-18 17:00:19 -0400	[diff] [blame]	682	... '''.splitlines(keepends=True)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	683	>>> len(text1)
				684	4
				685	>>> text1[0][-1]
				686	'\n'
				687	>>> text2 = ''' 1. Beautiful is better than ugly.
				688	... 3. Simple is better than complex.
				689	... 4. Complicated is better than complex.
				690	... 5. Flat is better than nested.
Terry Jan Reedy	bddecc3	2014-04-18 17:00:19 -0400	[diff] [blame]	691	... '''.splitlines(keepends=True)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	692
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	693	Next we instantiate a Differ object:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	694
				695	>>> d = Differ()
				696
				697	Note that when instantiating a :class:`Differ` object we may pass functions to
				698	filter out line and character "junk." See the :meth:`Differ` constructor for
				699	details.
				700
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	701	Finally, we compare the two:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	702
				703	>>> result = list(d.compare(text1, text2))
				704
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	705	``result`` is a list of strings, so let's pretty-print it:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	706
				707	>>> from pprint import pprint
				708	>>> pprint(result)
				709	[' 1. Beautiful is better than ugly.\n',
				710	'- 2. Explicit is better than implicit.\n',
				711	'- 3. Simple is better than complex.\n',
				712	'+ 3. Simple is better than complex.\n',
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	713	'? ++\n',
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	714	'- 4. Complex is better than complicated.\n',
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	715	'? ^ ---- ^\n',
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	716	'+ 4. Complicated is better than complex.\n',
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	717	'? ++++ ^ ^\n',
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	718	'+ 5. Flat is better than nested.\n']
				719
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	720	As a single multi-line string it looks like this:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	721
				722	>>> import sys
				723	>>> sys.stdout.writelines(result)
				724	1. Beautiful is better than ugly.
				725	- 2. Explicit is better than implicit.
				726	- 3. Simple is better than complex.
				727	+ 3. Simple is better than complex.
				728	? ++
				729	- 4. Complex is better than complicated.
				730	? ^ ---- ^
				731	+ 4. Complicated is better than complex.
				732	? ++++ ^ ^
				733	+ 5. Flat is better than nested.
				734
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	735
				736	.. _difflib-interface:
				737
				738	A command-line interface to difflib
				739	-----------------------------------
				740
				741	This example shows how to use difflib to create a ``diff``-like utility.
				742	It is also contained in the Python source distribution, as
				743	:file:`Tools/scripts/diff.py`.
				744
Berker Peksag	707deb9	2015-07-30 00:03:48 +0300	[diff] [blame]	745	.. literalinclude:: ../../Tools/scripts/diff.py