bpo-30681: Support invalid date format or value in email Date header (GH-22090)
I am re-submitting an older PR, #10783 by @timb07, which was abandoned.
The issue being solved () is still relevant. The original PR #10783 was closed because
the final requested changes were never applied and the PR was then abandoned.
In this new PR I have reused the original patch and applied both review comments, from @maxking and @pganssle.
For reference, here is the original PR description:
In email.utils.parsedate_to_datetime(), a failure to parse the date, or invalid date components (such as hour outside 0..23) raises an exception. Document this behaviour, and add tests to test_email/test_utils.py to confirm this behaviour.
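As a rough illustration of the behaviour described above (a sketch, not part of the patch), a caller can wrap the call to guard against bad input. The helper name `try_parse` and the sample date strings are hypothetical; `TypeError` is caught only as a precaution for interpreters that predate this change.

```python
from email.utils import parsedate_to_datetime


def try_parse(date_string):
    """Return a datetime for an RFC 5322 style date, or None if it is invalid."""
    try:
        return parsedate_to_datetime(date_string)
    except (TypeError, ValueError):
        # ValueError covers both unparsable input and out-of-range
        # components (such as an hour of 27) once this change is applied.
        return None


good = try_parse("Tue, 06 Jun 2017 07:39:33 +0600")
bad_hour = try_parse("Tue, 06 Jun 2017 27:39:33 +0600")
not_a_date = try_parse("Not a date")
```

Here `good` is an aware `datetime`, while `bad_hour` and `not_a_date` are both `None`.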
In email.headerregistry.DateHeader.parse(), check when parsedate_to_datetime() raises an exception and add a new defect InvalidDateDefect; preserve the invalid value as the string value of the header, but set the datetime attribute to None.
Add tests to test_email/test_headerregistry.py to confirm this behaviour; also add a test to test_email/test_inversion.py to confirm that emails with such defective date headers round-trip successfully.
This pull request incorporates feedback gratefully received from @bitdancer, @brettcannon, @Mariatta and @warsaw, and replaces the earlier PR #2254.
Automerge-Triggered-By: GH:warsaw
diff --git a/Doc/library/email.errors.rst b/Doc/library/email.errors.rst
index f4b9f52..7a77640 100644
--- a/Doc/library/email.errors.rst
+++ b/Doc/library/email.errors.rst
@@ -112,3 +112,6 @@
* :class:`InvalidBase64LengthDefect` -- When decoding a block of base64 encoded
bytes, the number of non-padding base64 characters was invalid (1 more than
a multiple of 4). The encoded block was kept as-is.
+
+* :class:`InvalidDateDefect` -- When decoding an invalid or unparsable date field.
+ The original value is kept as-is.
\ No newline at end of file
diff --git a/Doc/library/email.utils.rst b/Doc/library/email.utils.rst
index 4d0e920..0e266b6 100644
--- a/Doc/library/email.utils.rst
+++ b/Doc/library/email.utils.rst
@@ -124,8 +124,10 @@
.. function:: parsedate_to_datetime(date)
The inverse of :func:`format_datetime`. Performs the same function as
- :func:`parsedate`, but on success returns a :mod:`~datetime.datetime`. If
- the input date has a timezone of ``-0000``, the ``datetime`` will be a naive
+ :func:`parsedate`, but on success returns a :mod:`~datetime.datetime`;
+ otherwise ``ValueError`` is raised if *date* contains an invalid value such
+ as an hour greater than 23 or a timezone offset not between -24 and 24 hours.
+ If the input date has a timezone of ``-0000``, the ``datetime`` will be a naive
``datetime``, and if the date is conforming to the RFCs it will represent a
time in UTC but with no indication of the actual source timezone of the
message the date comes from. If the input date has any other valid timezone