Blame - Doc/library/difflib.rst - platform/external/python/cpython3

Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	1	:mod:`difflib` --- Helpers for computing deltas
				2	===============================================
				3
				4	.. module:: difflib
				5	:synopsis: Helpers for computing differences between objects.
Terry Jan Reedy	fa089b9	2016-06-11 15:02:54 -0400	[diff] [blame]	6
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	7	.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
				8	.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
Christian Heimes	5b5e81c	2007-12-31 16:14:33 +0000	[diff] [blame]	9	.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	10
Andrew Kuchling	2e3743c	2014-03-19 16:23:01 -0400	[diff] [blame]	11	Source code: :source:`Lib/difflib.py`
				12
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	13	.. testsetup::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	14
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	15	import sys
				16	from difflib import *
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	17
Terry Jan Reedy	fa089b9	2016-06-11 15:02:54 -0400	[diff] [blame]	18	--------------
				19
This module provides classes and functions for comparing sequences. It
can be used, for example, to compare files, and can produce difference
information in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also the :mod:`filecmp` module.
				24
Terry Reedy	99f9637	2010-11-25 06:12:34 +0000	[diff] [blame]	25
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	26	.. class:: SequenceMatcher
				27
				28	This is a flexible class for comparing pairs of sequences of any type, so long
Guido van Rossum	2cc30da	2007-11-02 23:46:40 +0000	[diff] [blame]	29	as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	30	little fancier than, an algorithm published in the late 1980's by Ratcliff and
				31	Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
				32	find the longest contiguous matching subsequence that contains no "junk"
Andrew Kuchling	c51da2b	2014-03-19 16:43:06 -0400	[diff] [blame]	33	elements; these "junk" elements are ones that are uninteresting in some
				34	sense, such as blank lines or whitespace. (Handling junk is an
				35	extension to the Ratcliff and Obershelp algorithm.) The same
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	36	idea is then applied recursively to the pieces of the sequences to the left and
				37	to the right of the matching subsequence. This does not yield minimal edit
				38	sequences, but does tend to yield matches that "look right" to people.
				39
				40	Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst
				41	case and quadratic time in the expected case. :class:`SequenceMatcher` is
				42	quadratic time for the worst case and has expected-case behavior dependent in a
				43	complicated way on how many elements the sequences have in common; best case
				44	time is linear.
				45
Terry Reedy	99f9637	2010-11-25 06:12:34 +0000	[diff] [blame]	46	Automatic junk heuristic: :class:`SequenceMatcher` supports a heuristic that
				47	automatically treats certain sequence items as junk. The heuristic counts how many
				48	times each individual item appears in the sequence. If an item's duplicates (after
				49	the first one) account for more than 1% of the sequence and the sequence is at least
				50	200 items long, this item is marked as "popular" and is treated as junk for
				51	the purpose of sequence matching. This heuristic can be turned off by setting
				52	the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.
				53
Terry Reedy	dc9b17d	2010-11-27 20:52:14 +0000	[diff] [blame]	54	.. versionadded:: 3.2
				55	The autojunk parameter.
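
As a rough illustration of the heuristic described above, the sketch below builds a made-up 250-item sequence in which one item is heavily repeated, then inspects the ``bpopular`` and ``b2j`` attributes with and without ``autojunk``. The data is hypothetical and chosen only to cross the 200-item and 1% thresholds.

```python
from difflib import SequenceMatcher

# Hypothetical data: 250 items, with "x" occurring 50 times.
b = ["x"] * 50 + [str(i) for i in range(200)]

# autojunk defaults to True, so the heuristic is active here.
with_heuristic = SequenceMatcher(None, [], b)
# Passing autojunk=False switches the heuristic off.
without_heuristic = SequenceMatcher(None, [], b, autojunk=False)

print(with_heuristic.bpopular)       # the repeated item is flagged as popular
print(without_heuristic.bpopular)    # nothing is flagged
print("x" in with_heuristic.b2j)     # popular items are dropped from the index
print("x" in without_heuristic.b2j)  # the item is still indexed
```

If diffs of long inputs with many repeated lines look surprisingly poor, comparing the two settings this way may help decide whether the heuristic is the cause.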
				56
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	57
				58	.. class:: Differ
				59
				60	This is a class for comparing sequences of lines of text, and producing
				61	human-readable differences or deltas. Differ uses :class:`SequenceMatcher`
				62	both to compare sequences of lines, and to compare sequences of characters
				63	within similar (near-matching) lines.
				64
				65	Each line of a :class:`Differ` delta begins with a two-letter code:
				66
+----------+-------------------------------------------+
| Code     | Meaning                                   |
+==========+===========================================+
| ``'- '`` | line unique to sequence 1                 |
+----------+-------------------------------------------+
| ``'+ '`` | line unique to sequence 2                 |
+----------+-------------------------------------------+
| ``'  '`` | line common to both sequences             |
+----------+-------------------------------------------+
| ``'? '`` | line not present in either input sequence |
+----------+-------------------------------------------+
				78
				79	Lines beginning with '``?``' attempt to guide the eye to intraline differences,
				80	and were not present in either input sequence. These lines can be confusing if
				81	the sequences contain tab characters.
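
A minimal sketch of reading the two-character codes from a :class:`Differ` delta, using :meth:`Differ.compare` on two small, made-up lists of lines. The variable names are illustrative only.

```python
from difflib import Differ

before = ["one\n", "two\n", "three\n"]
after = ["ore\n", "tree\n", "emu\n"]

# compare() yields delta lines; each starts with a two-character code.
delta = list(Differ().compare(before, after))

# Collect the distinct codes that occur in this particular delta.
codes = {line[:2] for line in delta}

# Keep only the lines that really come from the first sequence.
from_first = [line[2:] for line in delta if line[:2] in ("- ", "  ")]

print(sorted(codes))
print(from_first)
```

Filtering on the code prefix like this is essentially what :func:`restore` does for you.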
				82
				83
				84	.. class:: HtmlDiff
				85
				86	This class can be used to create an HTML table (or a complete HTML file
				87	containing the table) showing a side by side, line by line comparison of text
				88	with inter-line and intra-line change highlights. The table can be generated in
				89	either full or contextual difference mode.
				90
				91	The constructor for this class is:
				92
				93
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	94	.. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	95
Initializes an instance of :class:`HtmlDiff`.
				97
				98	tabsize is an optional keyword argument to specify tab stop spacing and
				99	defaults to ``8``.
				100
wrapcolumn is an optional keyword argument specifying the column at which
lines are broken and wrapped; it defaults to ``None``, meaning lines are not wrapped.
				103
Terry Jan Reedy	3e8a7ad	2015-10-30 19:41:16 -0400	[diff] [blame]	104	linejunk and charjunk are optional keyword arguments passed into :func:`ndiff`
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	105	(used by :class:`HtmlDiff` to generate the side by side HTML differences). See
Terry Jan Reedy	3e8a7ad	2015-10-30 19:41:16 -0400	[diff] [blame]	106	:func:`ndiff` documentation for argument default values and descriptions.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	107
				108	The following methods are public:
				109
Berker Peksag	102029d	2015-03-15 01:18:47 +0200	[diff] [blame]	110	.. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \
				111	numlines=5, *, charset='utf-8')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	112
				113	Compares fromlines and tolines (lists of strings) and returns a string which
				114	is a complete HTML file containing a table showing line by line differences with
				115	inter-line and intra-line changes highlighted.
				116
				117	fromdesc and todesc are optional keyword arguments to specify from/to file
				118	column header strings (both default to an empty string).
				119
				120	context and numlines are both optional keyword arguments. Set context to
				121	``True`` when contextual differences are to be shown, else the default is
				122	``False`` to show the full files. numlines defaults to ``5``. When context
				123	is ``True`` numlines controls the number of context lines which surround the
				124	difference highlights. When context is ``False`` numlines controls the
				125	number of lines which are shown before a difference highlight when using the
				126	"next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
				127	the next difference highlight at the top of the browser without any leading
				128	context).
				129
Berker Peksag	102029d	2015-03-15 01:18:47 +0200	[diff] [blame]	130	.. versionchanged:: 3.5
				131	charset keyword-only argument was added. The default charset of
				132	HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``.
				133
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	134	.. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	135
				136	Compares fromlines and tolines (lists of strings) and returns a string which
				137	is a complete HTML table showing line by line differences with inter-line and
				138	intra-line changes highlighted.
				139
				140	The arguments for this method are the same as those for the :meth:`make_file`
				141	method.
				142
				143	:file:`Tools/scripts/diff.py` is a command-line front-end to this class and
				144	contains a good example of its use.
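
The following sketch shows one way :meth:`make_table` and :meth:`make_file` might be called. The input lines and the ``before.txt`` / ``after.txt`` descriptions are made up for illustration.

```python
from difflib import HtmlDiff

old = ["bacon\n", "eggs\n", "ham\n"]
new = ["bacon\n", "eggy\n", "ham\n"]

# wrapcolumn=60 wraps long lines; the other constructor arguments keep their defaults.
differ = HtmlDiff(wrapcolumn=60)

# Just the <table> element, suitable for embedding in an existing page.
table_html = differ.make_table(old, new, fromdesc="before.txt", todesc="after.txt")

# A complete HTML document, showing only changed lines plus one line of context.
page_html = differ.make_file(old, new, context=True, numlines=1)

print(len(table_html), len(page_html))
```

The string returned by :meth:`make_file` can be written to a file and opened in a browser as-is.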
				145
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	146
.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	148
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	149	Compare a and b (lists of strings); return a delta (a :term:`generator`
				150	generating the delta lines) in context diff format.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	151
				152	Context diffs are a compact way of showing just the lines that have changed plus
				153	a few lines of context. The changes are shown in a before/after style. The
				154	number of context lines is set by n which defaults to three.
				155
				156	By default, the diff control lines (those with ``***`` or ``---``) are created
				157	with a trailing newline. This is helpful so that inputs created from
Serhiy Storchaka	bfdcd43	2013-10-13 23:09:14 +0300	[diff] [blame]	158	:func:`io.IOBase.readlines` result in diffs that are suitable for use with
				159	:func:`io.IOBase.writelines` since both the inputs and outputs have trailing
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	160	newlines.
				161
				162	For inputs that do not have trailing newlines, set the lineterm argument to
				163	``""`` so that the output will be uniformly newline free.
				164
				165	The context diff format normally has a header for filenames and modification
				166	times. Any or all of these may be specified using strings for fromfile,
R. David Murray	b2416e5	2010-04-12 16:58:02 +0000	[diff] [blame]	167	tofile, fromfiledate, and tofiledate. The modification times are normally
				168	expressed in the ISO 8601 format. If not specified, the
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	169	strings default to blanks.
				170
>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
>>> for line in context_diff(s1, s2, fromfile='before.py', tofile='after.py'):
...     sys.stdout.write(line)  # doctest: +NORMALIZE_WHITESPACE
*** before.py
--- after.py
***************
*** 1,4 ****
! bacon
! eggs
! ham
  guido
--- 1,4 ----
! python
! eggy
! hamster
  guido
				188
				189	See :ref:`difflib-interface` for a more detailed example.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	190
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	191
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	192	.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	193
				194	Return a list of the best "good enough" matches. word is a sequence for which
				195	close matches are desired (typically a string), and possibilities is a list of
				196	sequences against which to match word (typically a list of strings).
				197
				198	Optional argument n (default ``3``) is the maximum number of close matches to
				199	return; n must be greater than ``0``.
				200
				201	Optional argument cutoff (default ``0.6``) is a float in the range [0, 1].
				202	Possibilities that don't score at least that similar to word are ignored.
				203
				204	The best (no more than n) matches among the possibilities are returned in a
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	205	list, sorted by similarity score, most similar first.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	206
				207	>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
				208	['apple', 'ape']
				209	>>> import keyword
				210	>>> get_close_matches('wheel', keyword.kwlist)
				211	['while']
				212	>>> get_close_matches('apple', keyword.kwlist)
				213	[]
				214	>>> get_close_matches('accept', keyword.kwlist)
				215	['except']
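
The optional arguments can be exercised as in the sketch below, which reuses the word list from the first example above. ``n=1`` keeps only the best match, and a stricter, arbitrarily chosen ``cutoff`` of ``0.9`` rejects all candidates for this input.

```python
from difflib import get_close_matches

words = ['ape', 'apple', 'peach', 'puppy']

# Limit the result to the single best match.
best_only = get_close_matches('appel', words, n=1)

# Raise the similarity threshold; no candidate is similar enough here.
strict = get_close_matches('appel', words, cutoff=0.9)

print(best_only)
print(strict)
```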
				216
				217
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	218	.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	219
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	220	Compare a and b (lists of strings); return a :class:`Differ`\ -style
				221	delta (a :term:`generator` generating the delta lines).
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	222
Andrew Kuchling	c51da2b	2014-03-19 16:43:06 -0400	[diff] [blame]	223	Optional keyword parameters linejunk and charjunk are filtering functions
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	224	(or ``None``):
				225
Georg Brandl	e6bcc91	2008-05-12 18:05:20 +0000	[diff] [blame]	226	linejunk: A function that accepts a single string argument, and returns
				227	true if the string is junk, or false if not. The default is ``None``. There
				228	is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
				229	without visible characters, except for at most one pound character (``'#'``)
				230	-- however the underlying :class:`SequenceMatcher` class does a dynamic
				231	analysis of which lines are so frequent as to constitute noise, and this
				232	usually works better than using this function.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	233
charjunk: A function that accepts a character (a string of length 1), and
returns true if the character is junk, or false if not. The default is the module-level
function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
blank or tab; it's a bad idea to include newline in this!).
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	238
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	239	:file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	240
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
...              'ore\ntree\nemu\n'.splitlines(keepends=True))
>>> print(''.join(diff), end="")
- one
?  ^
+ ore
?  ^
- two
- three
?  -
+ tree
+ emu
				253
				254
				255	.. function:: restore(sequence, which)
				256
				257	Return one of the two sequences that generated a delta.
				258
				259	Given a sequence produced by :meth:`Differ.compare` or :func:`ndiff`, extract
				260	lines originating from file 1 or 2 (parameter which), stripping off line
				261	prefixes.
				262
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	263	Example:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	264
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
...              'ore\ntree\nemu\n'.splitlines(keepends=True))
>>> diff = list(diff) # materialize the generated delta into a list
>>> print(''.join(restore(diff, 1)), end="")
one
two
three
>>> print(''.join(restore(diff, 2)), end="")
ore
tree
emu
				276
				277
.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	279
Georg Brandl	9afde1c	2007-11-01 20:32:30 +0000	[diff] [blame]	280	Compare a and b (lists of strings); return a delta (a :term:`generator`
				281	generating the delta lines) in unified diff format.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	282
				283	Unified diffs are a compact way of showing just the lines that have changed plus
Martin Panter	7462b649	2015-11-02 03:37:02 +0000	[diff] [blame]	284	a few lines of context. The changes are shown in an inline style (instead of
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	285	separate before/after blocks). The number of context lines is set by n which
				286	defaults to three.
				287
				288	By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
				289	created with a trailing newline. This is helpful so that inputs created from
Serhiy Storchaka	bfdcd43	2013-10-13 23:09:14 +0300	[diff] [blame]	290	:func:`io.IOBase.readlines` result in diffs that are suitable for use with
				291	:func:`io.IOBase.writelines` since both the inputs and outputs have trailing
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	292	newlines.
				293
				294	For inputs that do not have trailing newlines, set the lineterm argument to
				295	``""`` so that the output will be uniformly newline free.
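
A small sketch of the ``lineterm`` behaviour described above, using made-up inputs whose lines carry no trailing newlines. The file names ``a`` and ``b`` are placeholders.

```python
from difflib import unified_diff

# Lines without trailing newlines, e.g. obtained from str.splitlines().
old = ["bacon", "eggs"]
new = ["bacon", "spam"]

# lineterm="" keeps the control lines newline-free as well.
lines = list(unified_diff(old, new, fromfile="a", tofile="b", lineterm=""))

# Join with explicit newlines only when producing the final text.
print("\n".join(lines))
```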
				296
The unified diff format normally has a header for filenames and modification
times. Any or all of these may be specified using strings for fromfile,
tofile, fromfiledate, and tofiledate. The modification times are normally
expressed in the ISO 8601 format. If not specified, the
strings default to blanks.
				302
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	303
>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
>>> for line in unified_diff(s1, s2, fromfile='before.py', tofile='after.py'):
...     sys.stdout.write(line)  # doctest: +NORMALIZE_WHITESPACE
--- before.py
+++ after.py
@@ -1,4 +1,4 @@
-bacon
-eggs
-ham
+python
+eggy
+hamster
 guido
				318
				319	See :ref:`difflib-interface` for a more detailed example.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	320
.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\n')
				322
				323	Compare a and b (lists of bytes objects) using dfunc; yield a
				324	sequence of delta lines (also bytes) in the format returned by dfunc.
				325	dfunc must be a callable, typically either :func:`unified_diff` or
				326	:func:`context_diff`.
				327
				328	Allows you to compare data with unknown or inconsistent encoding. All
				329	inputs except n must be bytes objects, not str. Works by losslessly
				330	converting all inputs (except n) to str, and calling ``dfunc(a, b,
				331	fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The output of
				332	dfunc is then converted back to bytes, so the delta lines that you
				333	receive have the same unknown/inconsistent encodings as a and b.
				334
				335	.. versionadded:: 3.5
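
As an illustration, the sketch below diffs two made-up byte sequences that encode the same word differently (Latin-1 versus UTF-8), so they could not both be decoded with a single codec. The names ``old`` and ``new`` are placeholders.

```python
from difflib import diff_bytes, unified_diff

# The same word in two different encodings, plus an unchanged line.
latin1_lines = [b"caf\xe9\n", b"tea\n"]
utf8_lines = [b"caf\xc3\xa9\n", b"tea\n"]

# Every argument except n must be bytes, including the file names.
delta = list(diff_bytes(unified_diff, latin1_lines, utf8_lines,
                        fromfile=b"old", tofile=b"new"))

for line in delta:
    print(line)  # each delta line is a bytes object with the original bytes intact
```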
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	336
				337	.. function:: IS_LINE_JUNK(line)
				338
Return true for ignorable lines. A line is ignorable if it is blank or
consists only of whitespace and a single ``'#'``; otherwise it is not ignorable. Used as a
default for parameter linejunk in :func:`ndiff` in older versions.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	342
				343
				344	.. function:: IS_CHARACTER_JUNK(ch)
				345
				346	Return true for ignorable characters. The character ch is ignorable if ch
				347	is a space or tab, otherwise it is not ignorable. Used as a default for
				348	parameter charjunk in :func:`ndiff`.
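
A quick sketch of how the two predicates behave on a few sample inputs (the sample strings are arbitrary):

```python
from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK

# Lines that are blank, or hold only whitespace and a single '#', are junk.
line_results = {
    "\n": IS_LINE_JUNK("\n"),
    "  #  \n": IS_LINE_JUNK("  #  \n"),
    "hello\n": IS_LINE_JUNK("hello\n"),
}

# Only a space or a tab counts as a junk character; a newline does not.
char_results = {ch: IS_CHARACTER_JUNK(ch) for ch in (" ", "\t", "\n", "x")}

print(line_results)
print(char_results)
```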
				349
				350
				351	.. seealso::
				352
Georg Brandl	525d355	2014-10-29 10:26:56 +0100	[diff] [blame]	353	`Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	354	Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
Georg Brandl	525d355	2014-10-29 10:26:56 +0100	[diff] [blame]	355	was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	356
				357
				358	.. _sequence-matcher:
				359
				360	SequenceMatcher Objects
				361	-----------------------
				362
				363	The :class:`SequenceMatcher` class has this constructor:
				364
				365
Terry Reedy	99f9637	2010-11-25 06:12:34 +0000	[diff] [blame]	366	.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	367
				368	Optional argument isjunk must be ``None`` (the default) or a one-argument
				369	function that takes a sequence element and returns true if and only if the
				370	element is "junk" and should be ignored. Passing ``None`` for isjunk is
				371	equivalent to passing ``lambda x: 0``; in other words, no elements are ignored.
				372	For example, pass::
				373
				374	lambda x: x in " \t"
				375
				376	if you're comparing lines as sequences of characters, and don't want to synch up
				377	on blanks or hard tabs.
				378
				379	The optional arguments a and b are sequences to be compared; both default to
Guido van Rossum	2cc30da	2007-11-02 23:46:40 +0000	[diff] [blame]	380	empty strings. The elements of both sequences must be :term:`hashable`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	381
Terry Reedy	99f9637	2010-11-25 06:12:34 +0000	[diff] [blame]	382	The optional argument autojunk can be used to disable the automatic junk
				383	heuristic.
				384
Terry Reedy	dc9b17d	2010-11-27 20:52:14 +0000	[diff] [blame]	385	.. versionadded:: 3.2
				386	The autojunk parameter.
				387
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	388	SequenceMatcher objects get three data attributes: bjunk is the
Serhiy Storchaka	fbc1c26	2013-11-29 12:17:13 +0200	[diff] [blame]	389	set of elements of b for which isjunk is ``True``; bpopular is the set of
Terry Reedy	17a5925	2010-12-15 20:18:10 +0000	[diff] [blame]	390	non-junk elements considered popular by the heuristic (if it is not
				391	disabled); b2j is a dict mapping the remaining elements of b to a list
				392	of positions where they occur. All three are reset whenever b is reset
				393	with :meth:`set_seqs` or :meth:`set_seq2`.
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	394
Georg Brandl	500be24	2010-12-03 19:56:42 +0000	[diff] [blame]	395	.. versionadded:: 3.2
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	396	The bjunk and bpopular attributes.
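
The attributes described above can be inspected directly. The sketch below uses a made-up pair of strings and treats blanks as junk.

```python
from difflib import SequenceMatcher

# Blanks are declared junk through the isjunk callable.
s = SequenceMatcher(lambda x: x == " ", "abcd", "ab cd ef")

print(s.bjunk)     # junk elements found in b
print(s.bpopular)  # empty here: b is far shorter than 200 items
print(s.b2j)       # positions in b of the remaining elements
```

Note that junk elements do not appear as keys of ``b2j``.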
				397
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	398	:class:`SequenceMatcher` objects have the following methods:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	399
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	400	.. method:: set_seqs(a, b)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	401
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	402	Set the two sequences to be compared.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	403
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	404	:class:`SequenceMatcher` computes and caches detailed information about the
				405	second sequence, so if you want to compare one sequence against many
				406	sequences, use :meth:`set_seq2` to set the commonly used sequence once and
				407	call :meth:`set_seq1` repeatedly, once for each of the other sequences.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	408
				409
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	410	.. method:: set_seq1(a)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	411
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	412	Set the first sequence to be compared. The second sequence to be compared
				413	is not changed.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	414
				415
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	416	.. method:: set_seq2(b)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	417
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	418	Set the second sequence to be compared. The first sequence to be compared
				419	is not changed.
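
The caching advice given for :meth:`set_seqs` can be sketched as follows: the shared sequence is set once with :meth:`set_seq2`, and each candidate is swapped in with :meth:`set_seq1`. The word list is made up, and the similarity score is taken from the :meth:`ratio` method of the same class.

```python
from difflib import SequenceMatcher

target = "apple"
candidates = ["appel", "ape", "peach"]

s = SequenceMatcher()
s.set_seq2(target)  # analysed once and cached

scores = {}
for word in candidates:
    s.set_seq1(word)         # cheap: only the first sequence changes
    scores[word] = s.ratio() # similarity in the range [0, 1]

best = max(scores, key=scores.get)
print(best)
```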
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	420
				421
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	422	.. method:: find_longest_match(alo, ahi, blo, bhi)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	423
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	424	Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	425
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	426	If isjunk was omitted or ``None``, :meth:`find_longest_match` returns
				427	``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
				428	<= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
				429	k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
				430	<= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
				431	all maximal matching blocks, return one that starts earliest in a, and
				432	of all those maximal matching blocks that start earliest in a, return
				433	the one that starts earliest in b.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	434
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	435	>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
				436	>>> s.find_longest_match(0, 5, 0, 9)
				437	Match(a=0, b=4, size=5)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	438
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	439	If isjunk was provided, first the longest matching block is determined
				440	as above, but with the additional restriction that no junk element appears
				441	in the block. Then that block is extended as far as possible by matching
				442	(only) junk elements on both sides. So the resulting block never matches
				443	on junk except as identical junk happens to be adjacent to an interesting
				444	match.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	445
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	446	Here's the same example as before, but considering blanks to be junk. That
				447	prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
				448	second sequence directly. Instead only the ``'abcd'`` can match, and
				449	matches the leftmost ``'abcd'`` in the second sequence:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	450
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	451	>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
				452	>>> s.find_longest_match(0, 5, 0, 9)
				453	Match(a=1, b=0, size=4)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	454
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	455	If no blocks match, this returns ``(alo, blo, 0)``.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	456
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	457	This method returns a :term:`named tuple` ``Match(a, b, size)``.
Christian Heimes	25bb783	2008-01-11 16:17:00 +0000	[diff] [blame]	458
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	459
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	460	.. method:: get_matching_blocks()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	461
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	462	Return list of triples describing matching subsequences. Each triple is of
				463	the form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``. The
				464	triples are monotonically increasing in i and j.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	465
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	466	The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It
				467	is the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')``
				468	are adjacent triples in the list, and the second is not the last triple in
				469	the list, then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent
				470	triples always describe non-adjacent equal blocks.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	471
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	472	.. XXX Explain why a dummy is used!
Christian Heimes	5b5e81c	2007-12-31 16:14:33 +0000	[diff] [blame]	473
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	474	.. doctest::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	475
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	476	>>> s = SequenceMatcher(None, "abxcd", "abcd")
				477	>>> s.get_matching_blocks()
				478	[Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
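
One possible way to consume the triples is sketched below: the matched slices of ``a`` are concatenated to recover the text common to both sequences. The trailing dummy triple has size 0, so it contributes nothing and needs no special handling in a loop like this.

```python
from difflib import SequenceMatcher

a, b = "abxcd", "abcd"
blocks = SequenceMatcher(None, a, b).get_matching_blocks()

# Each block is a Match(a, b, size) named tuple.
common = "".join(a[m.a:m.a + m.size] for m in blocks)

print(common)      # text shared by both sequences, in order
print(blocks[-1])  # the dummy terminator
```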
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	479
				480
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	481	.. method:: get_opcodes()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	482
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	483	Return list of 5-tuples describing how to turn a into b. Each tuple is
				484	of the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 ==
				485	0``, and remaining tuples have i1 equal to the i2 from the preceding
				486	tuple, and, likewise, j1 equal to the previous j2.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	487
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	488	The tag values are strings, with these meanings:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	489
+---------------+---------------------------------------------+
| Value         | Meaning                                     |
+===============+=============================================+
| ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
|               | ``b[j1:j2]``.                               |
+---------------+---------------------------------------------+
| ``'delete'``  | ``a[i1:i2]`` should be deleted. Note that   |
|               | ``j1 == j2`` in this case.                  |
+---------------+---------------------------------------------+
| ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
|               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
|               | this case.                                  |
+---------------+---------------------------------------------+
| ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
|               | are equal).                                 |
+---------------+---------------------------------------------+
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	506
Berker Peksag	eb2e02b	2016-03-11 23:19:48 +0200	[diff] [blame]	507	For example::
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	508
   >>> a = "qabxcd"
   >>> b = "abycdf"
   >>> s = SequenceMatcher(None, a, b)
   >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
   ...     print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
   ...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
   delete  a[0:1] --> b[0:0]      'q' --> ''
   equal   a[1:3] --> b[0:2]     'ab' --> 'ab'
   replace a[3:4] --> b[2:3]      'x' --> 'y'
   equal   a[4:6] --> b[3:5]     'cd' --> 'cd'
   insert  a[6:6] --> b[5:6]       '' --> 'f'
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	520
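Because the opcodes cover both sequences from start to end, they can be replayed to rebuild the second sequence from the first. The following is a minimal sketch for strings; ``apply_opcodes`` is a hypothetical helper name, not a :mod:`difflib` function.

```python
from difflib import SequenceMatcher


def apply_opcodes(a, b):
    """Rebuild b by replaying get_opcodes() (hypothetical helper)."""
    s = SequenceMatcher(None, a, b)
    pieces = []
    for tag, i1, i2, j1, j2 in s.get_opcodes():
        if tag == "equal":
            pieces.append(a[i1:i2])      # unchanged run, copied from a
        elif tag in ("replace", "insert"):
            pieces.append(b[j1:j2])      # new material comes from b
        # 'delete' contributes nothing to the result
    return "".join(pieces)


rebuilt = apply_opcodes("qabxcd", "abycdf")
print(rebuilt)
```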
				521
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	522	.. method:: get_grouped_opcodes(n=3)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	523
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	524	Return a :term:`generator` of groups with up to n lines of context.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	525
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	526	Starting with the groups returned by :meth:`get_opcodes`, this method
				527	splits out smaller change clusters and eliminates intervening ranges which
				528	have no changes.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	529
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	530	The groups are returned in the same format as :meth:`get_opcodes`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	531
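As an illustration of the grouping, the sketch below uses made-up input: two lists of 40 short strings that differ in two places far apart. With ``n=2``, each change is expected to land in its own group, padded by at most two equal elements on either side.

```python
from difflib import SequenceMatcher

# Hypothetical input: 40 numbered "lines" with two distant changes.
a = [str(i) for i in range(40)]
b = list(a)
b[5] = "five"
b[30] = "thirty"

s = SequenceMatcher(None, a, b)
groups = list(s.get_grouped_opcodes(n=2))

for group in groups:
    print("--- group ---")
    for tag, i1, i2, j1, j2 in group:
        print(tag, (i1, i2), (j1, j2))
```

The long unchanged run between the two edits is dropped, apart from the context kept at the edge of each group.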
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	532
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	533	.. method:: ratio()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	534
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	535	Return a measure of the sequences' similarity as a float in the range [0,
				536	1].
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	537
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	538	Where T is the total number of elements in both sequences, and M is the
				539	number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
				540	sequences are identical, and ``0.0`` if they have nothing in common.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	541
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	542	This is expensive to compute if :meth:`get_matching_blocks` or
				543	:meth:`get_opcodes` hasn't already been called, in which case you may want
				544	to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
				545	upper bound.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	546
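The formula can be checked by hand. In this sketch, M is taken as the sum of the sizes of the matching blocks and T as the combined length of both sequences; the variable names mirror the text and are otherwise arbitrary.

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"
s = SequenceMatcher(None, a, b)

# M: number of matched elements, summed over all matching blocks
# (the trailing dummy block has size 0 and adds nothing).
M = sum(block.size for block in s.get_matching_blocks())
# T: total number of elements in both sequences.
T = len(a) + len(b)

manual = 2.0 * M / T
print(M, T, manual, s.ratio())
```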
				547
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	548	.. method:: quick_ratio()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	549
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	550	Return an upper bound on :meth:`ratio` relatively quickly.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	551
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	552
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	553	.. method:: real_quick_ratio()
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	554
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	555	Return an upper bound on :meth:`ratio` very quickly.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	556
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	557
				558	The three methods that return the ratio of matching to total characters can give
				559	different results due to differing levels of approximation, although
				560	:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	561	:meth:`ratio`:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	562
				563	>>> s = SequenceMatcher(None, "abcd", "bcde")
				564	>>> s.ratio()
				565	0.75
				566	>>> s.quick_ratio()
				567	0.75
				568	>>> s.real_quick_ratio()
				569	1.0
				570
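Since both quick variants are upper bounds on :meth:`ratio`, they can serve as cheap filters: if an upper bound is already below a cutoff, the expensive computation can be skipped. A minimal sketch follows; ``is_close`` and its default cutoff of ``0.6`` are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def is_close(a, b, cutoff=0.6):
    """Return True if ratio() >= cutoff, testing the cheap bounds first.

    Hypothetical helper; the cutoff default is an arbitrary choice.
    """
    s = SequenceMatcher(None, a, b)
    # 'and' short-circuits, so ratio() only runs when both upper
    # bounds leave room for the cutoff to be reached.
    return (s.real_quick_ratio() >= cutoff
            and s.quick_ratio() >= cutoff
            and s.ratio() >= cutoff)


print(is_close("abcd", "bcde"))
print(is_close("abcd", "wxyz"))
print(is_close("a", "abcdefghij"))
```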
				571
				572	.. _sequencematcher-examples:
				573
				574	SequenceMatcher Examples
				575	------------------------
				576
Terry Reedy	74a7c67	2010-12-03 18:57:42 +0000	[diff] [blame]	577	This example compares two strings, considering blanks to be "junk":
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	578
				579	>>> s = SequenceMatcher(lambda x: x == " ",
				580	... "private Thread currentThread;",
				581	... "private volatile Thread currentThread;")
				582
				583	:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
				584	sequences. As a rule of thumb, a :meth:`ratio` value over 0.6 means the
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	585	sequences are close matches:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	586
Georg Brandl	6911e3c	2007-09-04 07:15:32 +0000	[diff] [blame]	587	>>> print(round(s.ratio(), 3))
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	588	0.866
				589
				590	If you're only interested in where the sequences match,
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	591	:meth:`get_matching_blocks` is handy:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	592
>>> for block in s.get_matching_blocks():
...     print("a[%d] and b[%d] match for %d elements" % block)
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements
				598
				599	Note that the last tuple returned by :meth:`get_matching_blocks` is always a
				600	dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
				601	tuple element (number of elements matched) is ``0``.
				602
				603	If you want to know how to change the first sequence into the second, use
Christian Heimes	fe337bf	2008-03-23 21:54:12 +0000	[diff] [blame]	604	:meth:`get_opcodes`:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	605
>>> for opcode in s.get_opcodes():
...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
 equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
 equal a[8:29] b[17:38]
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	611
Raymond Hettinger	58c8c26	2009-04-27 21:01:21 +0000	[diff] [blame]	612	.. seealso::
				613
				614	* The :func:`get_close_matches` function in this module which shows how
				615	simple code building on :class:`SequenceMatcher` can be used to do useful
				616	work.
				617
				618	* `Simple version control recipe
Serhiy Storchaka	6dff020	2016-05-07 10:49:07 +0300	[diff] [blame]	619	<https://code.activestate.com/recipes/576729/>`_ for a small application
Raymond Hettinger	58c8c26	2009-04-27 21:01:21 +0000	[diff] [blame]	620	built with :class:`SequenceMatcher`.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	621
				622
				623	.. _differ-objects:
				624
				625	Differ Objects
				626	--------------
				627
Note that :class:`Differ`\ -generated deltas make no claim to be minimal
diffs. To the contrary, minimal diffs are often counter-intuitive, because they
sync up wherever possible, sometimes on accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.
				633
				634	The :class:`Differ` class has this constructor:
				635
				636
Georg Brandl	c2a4f4f	2009-04-10 09:03:43 +0000	[diff] [blame]	637	.. class:: Differ(linejunk=None, charjunk=None)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	638
				639	Optional keyword parameters linejunk and charjunk are for filter functions
				640	(or ``None``):
				641
				642	linejunk: A function that accepts a single string argument, and returns true
				643	if the string is junk. The default is ``None``, meaning that no line is
				644	considered junk.
				645
				646	charjunk: A function that accepts a single character argument (a string of
				647	length 1), and returns true if the character is junk. The default is ``None``,
				648	meaning that no character is considered junk.
				649
Andrew Kuchling	c51da2b	2014-03-19 16:43:06 -0400	[diff] [blame]	650	These junk-filtering functions speed up matching to find
				651	differences and do not cause any differing lines or characters to
				652	be ignored. Read the description of the
				653	:meth:`~SequenceMatcher.find_longest_match` method's isjunk
				654	parameter for an explanation.
				655
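The point that junk filters do not hide differences can be illustrated with a small sketch. It passes :func:`IS_CHARACTER_JUNK` (which treats blanks and tabs as junk) as *charjunk* and compares two made-up lines that differ only in spacing; the lines still show up as removed and added.

```python
from difflib import Differ, IS_CHARACTER_JUNK

# Hypothetical input: the two lines differ only in whitespace.
a = ["x = 1\n"]
b = ["x  =  1\n"]

d = Differ(charjunk=IS_CHARACTER_JUNK)
delta = list(d.compare(a, b))

# Whitespace is "junk" for matching purposes, yet the differing
# lines are still reported with '- ' and '+ ' prefixes.
removed = [line for line in delta if line.startswith("- ")]
added = [line for line in delta if line.startswith("+ ")]
print(removed, added)
```
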
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	656	:class:`Differ` objects are used (deltas generated) via a single method:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	657
				658
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	659	.. method:: Differ.compare(a, b)
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	660
Benjamin Peterson	e41251e	2008-04-25 01:59:09 +0000	[diff] [blame]	661	Compare two sequences of lines, and generate the delta (a sequence of lines).
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	662
Serhiy Storchaka	bfdcd43	2013-10-13 23:09:14 +0300	[diff] [blame]	663	Each sequence must contain individual single-line strings ending with
				664	newlines. Such sequences can be obtained from the
				665	:meth:`~io.IOBase.readlines` method of file-like objects. The delta
				666	generated also consists of newline-terminated strings, ready to be
				667	printed as-is via the :meth:`~io.IOBase.writelines` method of a
				668	file-like object.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	669
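A minimal sketch of the round trip described above, using in-memory :class:`io.StringIO` objects in place of real files. The file contents are invented for the example.

```python
import io
from difflib import Differ

# Stand-ins for two open text files (contents are made up).
old = io.StringIO("one\ntwo\nthree\n")
new = io.StringIO("one\n2\nthree\n")

# readlines() keeps the trailing newlines that compare() expects.
delta = Differ().compare(old.readlines(), new.readlines())

# The delta lines are newline-terminated, so writelines() needs no
# extra separators.
out = io.StringIO()
out.writelines(delta)
report = out.getvalue()
print(report, end="")
```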
				670
				671	.. _differ-examples:
				672
				673	Differ Example
				674	--------------
				675
This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the :meth:`~io.IOBase.readlines` method of file-like objects):
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	679
>>> text1 = '''  1. Beautiful is better than ugly.
...   2. Explicit is better than implicit.
...   3. Simple is better than complex.
...   4. Complex is better than complicated.
... '''.splitlines(keepends=True)
>>> len(text1)
4
>>> text1[0][-1]
'\n'
>>> text2 = '''  1. Beautiful is better than ugly.
...   3.   Simple is better than complex.
...   4. Complicated is better than complex.
...   5. Flat is better than nested.
... '''.splitlines(keepends=True)

Next we instantiate a Differ object:

>>> d = Differ()

Note that when instantiating a :class:`Differ` object we may pass functions to
filter out line and character "junk." See the :class:`Differ` constructor for
details.

Finally, we compare the two:

>>> result = list(d.compare(text1, text2))

``result`` is a list of strings, so let's pretty-print it:

>>> from pprint import pprint
>>> pprint(result)
['    1. Beautiful is better than ugly.\n',
 '-   2. Explicit is better than implicit.\n',
 '-   3. Simple is better than complex.\n',
 '+   3.   Simple is better than complex.\n',
 '?     ++\n',
 '-   4. Complex is better than complicated.\n',
 '?            ^                     ---- ^\n',
 '+   4. Complicated is better than complex.\n',
 '?           ++++ ^                      ^\n',
 '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this:

>>> import sys
>>> sys.stdout.writelines(result)
    1. Beautiful is better than ugly.
-   2. Explicit is better than implicit.
-   3. Simple is better than complex.
+   3.   Simple is better than complex.
?     ++
-   4. Complex is better than complicated.
?            ^                     ---- ^
+   4. Complicated is better than complex.
?           ++++ ^                      ^
+   5. Flat is better than nested.
				736
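Each line of a :class:`Differ` delta starts with a two-character code, so the result can be post-processed with ordinary string tests. The sketch below, on made-up input, keeps only the changed lines and then uses :func:`restore` from this module to recover both original sequences from the delta.

```python
from difflib import Differ, restore

# Hypothetical input texts.
text1 = ["one\n", "two\n", "three\n"]
text2 = ["one\n", "tree\n", "emu\n"]

result = list(Differ().compare(text1, text2))

# Keep only removed and added lines; drop common lines and '? ' hints.
changes = [line for line in result if line[:2] in ("- ", "+ ")]

# restore() strips the prefixes again: 1 selects the first sequence,
# 2 selects the second.
original = list(restore(result, 1))
revised = list(restore(result, 2))
print(changes)
```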
Christian Heimes	8640e74	2008-02-23 16:23:06 +0000	[diff] [blame]	737
				738	.. _difflib-interface:
				739
				740	A command-line interface to difflib
				741	-----------------------------------
				742
				743	This example shows how to use difflib to create a ``diff``-like utility.
				744	It is also contained in the Python source distribution, as
				745	:file:`Tools/scripts/diff.py`.
				746
Berker Peksag	707deb9	2015-07-30 00:03:48 +0300	[diff] [blame]	747	.. literalinclude:: ../../Tools/scripts/diff.py