bpo-29240: Fix locale encodings in UTF-8 Mode (#5170)
Modify locale.localeconv(), time.tzname, os.strerror() and other
functions to ignore the UTF-8 Mode: always use the current locale
encoding.
Changes:
* Add _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx(). On decoding or
encoding error, they return the position of the error and an error
message which are used to raise Unicode errors in
PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale().
* Replace _Py_DecodeCurrentLocale() with _Py_DecodeLocaleEx().
* PyUnicode_DecodeLocale() now uses _Py_DecodeLocaleEx() for all
cases, especially for the strict error handler.
* Add _Py_DecodeUTF8Ex(): it returns more information on decoding
errors and supports the strict error handler.
* Rename _Py_EncodeUTF8_surrogateescape() to _Py_EncodeUTF8Ex().
* Replace _Py_EncodeCurrentLocale() with _Py_EncodeLocaleEx().
* Ignore the UTF-8 mode when encoding/decoding localeconv(),
strerror() and time zone names.
* PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize() and
PyUnicode_EncodeLocale() now ignore the UTF-8 mode: they always use
the "current" locale.
* Remove _PyUnicode_DecodeCurrentLocale(),
_PyUnicode_DecodeCurrentLocaleAndSize() and
_PyUnicode_EncodeCurrentLocale().
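
Illustration (not part of the patch): the encoding priority that the
Doc/c-api/sys.rst change in this commit documents can be sketched as a
small pure function. This is only a model of the documented order, not
CPython's C implementation. The function name, its parameters and the
ASCII_ALIASES set are hypothetical and exist only for this example.

```python
# Illustrative subset of names a libc may report for ASCII (assumption).
ASCII_ALIASES = {"ASCII", "US-ASCII", "ANSI_X3.4-1968", "646"}


def select_locale_encoding(platform, utf8_mode, lc_ctype, codeset,
                           libc_uses_latin1):
    """Pick an encoding name following the documented priority order.

    All arguments are hypothetical inputs: CPython determines the
    equivalent facts internally in C.
    """
    # 1. macOS and Android always use UTF-8.
    if platform in ("macos", "android"):
        return "UTF-8"
    # 2. The Python UTF-8 mode forces UTF-8.
    if utf8_mode:
        return "UTF-8"
    # 3. ASCII only when all three documented conditions hold:
    #    LC_CTYPE is "C", the codeset is ASCII (or an alias), and
    #    mbstowcs()/wcstombs() actually behave like ISO-8859-1.
    if (lc_ctype == "C"
            and codeset.upper() in ASCII_ALIASES
            and libc_uses_latin1):
        return "ASCII"
    # 4. Otherwise, fall back to the current locale encoding.
    return codeset
```

Note that this order applies to the two functions documented in the
diff below. The functions listed at the top of this message
(localeconv(), strerror(), time zone names) skip step 2 after this
change and always use the current locale encoding.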
diff --git a/Doc/c-api/sys.rst b/Doc/c-api/sys.rst
index 20bc7bd..e4da96c 100644
--- a/Doc/c-api/sys.rst
+++ b/Doc/c-api/sys.rst
@@ -106,6 +106,16 @@
surrogate character, escape the bytes using the surrogateescape error
handler instead of decoding them.
+ Encoding, highest priority to lowest priority:
+
+ * ``UTF-8`` on macOS and Android;
+ * ``UTF-8`` if the Python UTF-8 mode is enabled;
+ * ``ASCII`` if the ``LC_CTYPE`` locale is ``"C"``,
+ ``nl_langinfo(CODESET)`` returns the ``ASCII`` encoding (or an alias),
+ and :c:func:`mbstowcs` and :c:func:`wcstombs` functions use the
+ ``ISO-8859-1`` encoding.
+ * the current locale encoding.
+
Return a pointer to a newly allocated wide character string, use
:c:func:`PyMem_RawFree` to free the memory. If size is not ``NULL``, write
the number of wide characters excluding the null character into ``*size``
@@ -137,6 +147,18 @@
:ref:`surrogateescape error handler <surrogateescape>`: surrogate characters
in the range U+DC80..U+DCFF are converted to bytes 0x80..0xFF.
+ Encoding, highest priority to lowest priority:
+
+ * ``UTF-8`` on macOS and Android;
+ * ``UTF-8`` if the Python UTF-8 mode is enabled;
+ * ``ASCII`` if the ``LC_CTYPE`` locale is ``"C"``,
+ ``nl_langinfo(CODESET)`` returns the ``ASCII`` encoding (or an alias),
+ and :c:func:`mbstowcs` and :c:func:`wcstombs` functions use the
+ ``ISO-8859-1`` encoding.
+ * the current locale encoding.
+
+ The function uses the UTF-8 encoding in the Python UTF-8 mode.
+
Return a pointer to a newly allocated byte string, use :c:func:`PyMem_Free`
to free the memory. Return ``NULL`` on encoding error or memory allocation
error