bpo-30681: Support invalid date format or value in email Date header (GH-22090)



I am re-submitting an older PR which was abandoned but is still relevant, #10783 by @timb07.

The issue being solved () is still relevant. The original PR #10783 was closed as
the final request changes were not applied and since abandoned.

In this new PR I have re-used the original patch plus applied both comments from the review, by @maxking and @pganssle.


For reference, here is the original PR description:
In email.utils.parsedate_to_datetime(), a failure to parse the date, or invalid date components (such as hour outside 0..23) raises an exception. Document this behaviour, and add tests to test_email/test_utils.py to confirm this behaviour.

In email.headerregistry.DateHeader.parse(), check when parsedate_to_datetime() raises an exception and add a new defect InvalidDateDefect; preserve the invalid value as the string value of the header, but set the datetime attribute to None.

Add tests to test_email/test_headerregistry.py to confirm this behaviour; also added test to test_email/test_inversion.py to confirm emails with such defective date headers round trip successfully.

This pull request incorporates feedback gratefully received from @bitdancer, @brettcannon, @Mariatta and @warsaw, and replaces the earlier PR #2254.

Automerge-Triggered-By: GH:warsaw
diff --git a/Doc/library/email.errors.rst b/Doc/library/email.errors.rst
index f4b9f52..7a77640 100644
--- a/Doc/library/email.errors.rst
+++ b/Doc/library/email.errors.rst
@@ -112,3 +112,6 @@
 * :class:`InvalidBase64LengthDefect` -- When decoding a block of base64 encoded
   bytes, the number of non-padding base64 characters was invalid (1 more than
   a multiple of 4).  The encoded block was kept as-is.
+
+* :class:`InvalidDateDefect` -- When decoding an invalid or unparsable date field.
+  The original value is kept as-is.
\ No newline at end of file
diff --git a/Doc/library/email.utils.rst b/Doc/library/email.utils.rst
index 4d0e920..0e266b6 100644
--- a/Doc/library/email.utils.rst
+++ b/Doc/library/email.utils.rst
@@ -124,8 +124,10 @@
 .. function:: parsedate_to_datetime(date)
 
    The inverse of :func:`format_datetime`.  Performs the same function as
-   :func:`parsedate`, but on success returns a :mod:`~datetime.datetime`.  If
-   the input date has a timezone of ``-0000``, the ``datetime`` will be a naive
+   :func:`parsedate`, but on success returns a :mod:`~datetime.datetime`;
+   otherwise ``ValueError`` is raised if *date* contains an invalid value such
+   as an hour greater than 23 or a timezone offset not between -24 and 24 hours.
+   If the input date has a timezone of ``-0000``, the ``datetime`` will be a naive
    ``datetime``, and if the date is conforming to the RFCs it will represent a
    time in UTC but with no indication of the actual source timezone of the
    message the date comes from.  If the input date has any other valid timezone