Issue 2986: Add autojunk paramater to SequenceMatcher to turn off heuristic. Patch by Terry Reedy, Eli Bendersky, and Simon Cross
diff --git a/Doc/library/difflib.rst b/Doc/library/difflib.rst
index 58bbe45..c7efa96 100644
--- a/Doc/library/difflib.rst
+++ b/Doc/library/difflib.rst
@@ -17,6 +17,7 @@
information in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
+
.. class:: SequenceMatcher
This is a flexible class for comparing pairs of sequences of any type, so long
@@ -35,6 +36,14 @@
complicated way on how many elements the sequences have in common; best case
time is linear.
+ **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
+ automatically treats certain sequence items as junk. The heuristic counts how many
+ times each individual item appears in the sequence. If an item's duplicates (after
+ the first one) account for more than 1% of the sequence and the sequence is at least
+ 200 items long, this item is marked as "popular" and is treated as junk for
+ the purpose of sequence matching. This heuristic can be turned off by setting
+ the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.
+
.. class:: Differ
@@ -324,7 +333,7 @@
The :class:`SequenceMatcher` class has this constructor:
-.. class:: SequenceMatcher(isjunk=None, a='', b='')
+.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
Optional argument *isjunk* must be ``None`` (the default) or a one-argument
function that takes a sequence element and returns true if and only if the
@@ -340,6 +349,9 @@
The optional arguments *a* and *b* are sequences to be compared; both default to
empty strings. The elements of both sequences must be :term:`hashable`.
+ The optional argument *autojunk* can be used to disable the automatic junk
+ heuristic.
+
:class:`SequenceMatcher` objects have the following methods: