bpo-25054, bpo-1647489: Added support for splitting on zero-width patterns. (#4471)
Also fixed searching for patterns that could match an empty string.
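
A minimal sketch of the behavior this change describes, assuming Python 3.7 or
later. The sample strings and variable names are illustrative only and are not
taken from the patch or its tests.

```python
import re

# Zero-width patterns now split the string instead of being ignored.
words = re.split(r'\b', 'Words, words, words.')
print(words)        # ['', 'Words', ', ', 'words', ', ', 'words', '.']

before_dash = re.split(r'(?=-)', 'a-b-c')
print(before_dash)  # ['a', '-b', '-c']

# A pattern that can match both empty and non-empty text now also
# splits at every empty match, so single characters come out.
mixed = re.split(r'\W*', '...words...')
print(mixed)

# Requiring at least one character restores the older result.
classic = re.split(r'\W+', '...words...')
print(classic)      # ['', 'words', '']

# Searching: a non-empty match may now start right after an empty one.
spans = [m.span() for m in re.finditer(r'(?m)^\s*?$', 'a\n\n')]
print(spans)

# Excluding the newline from the character class matches blank lines only.
blank_only = [m.span() for m in re.finditer(r'(?m)^[^\S\n]*$', 'a\n\n')]
print(blank_only)
```

Patterns that should keep their earlier splitting behavior can usually be
adjusted by replacing ``*`` with ``+`` so that they no longer match an empty
string.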
diff --git a/Doc/whatsnew/3.7.rst b/Doc/whatsnew/3.7.rst
index b6dad4e..3d23aa7 100644
--- a/Doc/whatsnew/3.7.rst
+++ b/Doc/whatsnew/3.7.rst
@@ -364,6 +364,10 @@
can be set within the scope of a group.
(Contributed by Serhiy Storchaka in :issue:`31690`.)
+:func:`re.split` now supports splitting on a pattern like ``r'\b'``,
+``'^$'`` or ``'(?=-)'`` that matches an empty string.
+(Contributed by Serhiy Storchaka in :issue:`25054`.)
+
string
------
@@ -768,6 +772,23 @@
avoid a warning escape them with a backslash.
(Contributed by Serhiy Storchaka in :issue:`30349`.)
+* The result of splitting a string on a :mod:`regular expression <re>`
+ that could match an empty string has been changed. For example,
+ splitting on ``r'\s*'`` will now split not only on whitespace as it
+ did previously, but also between any pair of non-whitespace
+ characters. The previous behavior can be restored by changing the pattern
+ to ``r'\s+'``. A :exc:`FutureWarning` has been emitted for such patterns
+ since Python 3.5.
+
+ For patterns that match both empty and non-empty strings, the result of
+ searching for all matches may also change in other cases. For example,
+ in the string ``'a\n\n'``, the pattern ``r'(?m)^\s*?$'`` will not only
+ match empty strings at positions 2 and 3, but also the string ``'\n'`` at
+ positions 2--3. To match only blank lines, the pattern should be rewritten
+ as ``r'(?m)^[^\S\n]*$'``.
+
+ (Contributed by Serhiy Storchaka in :issue:`25054`.)
+
* :class:`tracemalloc.Traceback` frames are now sorted from oldest to most
recent to be more consistent with :mod:`traceback`.
(Contributed by Jesse Bakker in :issue:`32121`.)