Issue #5915: Implement PEP 383, Non-decodable Bytes in
System Character Interfaces.
diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst
index ab578ea..3f1a5fe 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -322,6 +322,8 @@
| ``'backslashreplace'`` | Replace with backslashed escape sequences |
| | (only for encoding). |
+-------------------------+-----------------------------------------------+
+| ``'utf8b'`` | Replace an undecodable byte with the surrogate |
+| | U+DCxx on decoding; turn it back into the |
+| | original byte on encoding. |
++-------------------------+-----------------------------------------------+
In addition, the following error handlers are specific to a single codec:
@@ -333,7 +335,7 @@
+------------------+---------+--------------------------------------------+
.. versionadded:: 3.1
- The ``'surrogates'`` error handler.
+ The ``'utf8b'`` and ``'surrogates'`` error handlers.
The set of allowed values can be extended via :meth:`register_error`.
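To illustrate the ``'utf8b'`` mapping in the table and how :meth:`register_error` extends the set of handlers, here is a minimal pure-Python sketch of a handler in the same style. The handler name ``'utf8b_sketch'`` is hypothetical. The last comparison assumes the built-in handler is available as ``'surrogateescape'``, the name it was given in the released Python 3.1.

```python
import codecs


def utf8b_sketch(exc):
    """Hypothetical handler mapping undecodable bytes to/from U+DC80..U+DCFF."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        # This sketch refuses to escape a failing range that contains ASCII bytes.
        if any(b < 128 for b in bad):
            raise exc
        # Replace each undecodable byte b with the lone surrogate U+DC00 + b.
        return ''.join(chr(0xDC00 + b) for b in bad), exc.end
    if isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end]
        # Only surrogates produced by the decoding branch can be turned back.
        if not all(0xDC80 <= ord(c) <= 0xDCFF for c in bad):
            raise exc
        # Return the original bytes; an encoding handler may return bytes.
        return bytes(ord(c) - 0xDC00 for c in bad), exc.end
    raise exc


codecs.register_error('utf8b_sketch', utf8b_sketch)

raw = b'caf\xe9'                              # not valid UTF-8
text = raw.decode('utf-8', 'utf8b_sketch')    # 'caf\udce9'
restored = text.encode('utf-8', 'utf8b_sketch')
```

Because the decoding and encoding branches are exact inverses on U+DC80 to U+DCFF, ``restored`` is byte-for-byte equal to ``raw``.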
diff --git a/Doc/library/os.rst b/Doc/library/os.rst
index c686baf..83f5ee9 100644
--- a/Doc/library/os.rst
+++ b/Doc/library/os.rst
@@ -51,6 +51,30 @@
``'ce'``, ``'java'``.
+.. _os-filenames:
+
+File Names, Command Line Arguments, and Environment Variables
+-------------------------------------------------------------
+
+In Python, file names, command line arguments, and environment
+variables are represented using the string type. On some systems,
+these strings must be encoded to bytes before they are passed to the
+operating system, and bytes returned by it must be decoded back into
+strings. Python uses the file system encoding to perform this
+conversion (see :func:`sys.getfilesystemencoding`).
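As a small illustration of this conversion, the sketch below uses :func:`os.fsencode` and :func:`os.fsdecode`. Note that these helpers were only added in Python 3.2, after this change, and that ``'report.txt'`` is just an example name.

```python
import os
import sys

# The encoding Python uses for file names, arguments and environment variables.
fs_encoding = sys.getfilesystemencoding()

name = 'report.txt'
# str -> bytes, as done before handing the name to the operating system.
raw = os.fsencode(name)
# bytes -> str, as done for names coming back from the operating system.
back = os.fsdecode(raw)
```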
+
+.. versionchanged:: 3.1
+ On some systems, conversion using the file system encoding may
+ fail. In this case, Python uses the ``utf8b`` error handler:
+ undecodable bytes are replaced by lone surrogate code points
+ U+DCxx (U+DC80 to U+DCFF) on decoding, and these are translated
+ back to the original bytes on encoding.
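The round trip described above can be sketched with explicit codec calls. This assumes the handler's released name, ``'surrogateescape'`` (the same mechanism as ``utf8b``), and uses UTF-8 as a stand-in for the file system encoding.

```python
raw = b'\xff\xfeabc'  # 0xFF and 0xFE can never appear in valid UTF-8

# Decoding: each undecodable byte becomes the lone surrogate U+DC00 + byte.
text = raw.decode('utf-8', 'surrogateescape')

# Encoding: the surrogates are turned back into the original bytes.
restored = text.encode('utf-8', 'surrogateescape')

# A strict encode refuses the lone surrogates.
try:
    text.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```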
+
+
+The file system encoding must be able to decode every byte below
+128. If it cannot, API functions may raise :exc:`UnicodeError`.
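One way to probe the requirement above is to decode every byte below 128 on its own. The helper ``decodes_low_bytes`` is hypothetical and not part of the :mod:`os` API. The UTF-16 case also shows that ``'surrogateescape'`` (the released name of ``utf8b``) refuses to escape ASCII bytes, so the original error propagates.

```python
def decodes_low_bytes(encoding):
    """Hypothetical check: can each byte 0..127 be decoded on its own?"""
    for b in range(128):
        try:
            bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            return False
    return True


# A lone byte is truncated UTF-16 data, and the handler will not escape an
# ASCII byte, so the UnicodeDecodeError is raised instead of being replaced.
try:
    b'a'.decode('utf-16-le', 'surrogateescape')
    escaped_ascii = True
except UnicodeDecodeError:
    escaped_ascii = False
```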
+
+
.. _os-procinfo:
Process Parameters
@@ -688,12 +712,8 @@
.. function:: getcwd()
- Return a string representing the current working directory. On Unix
- platforms, this function may raise :exc:`UnicodeDecodeError` if the name of
- the current directory is not decodable in the file system encoding. Use
- :func:`getcwdb` if you need the call to never fail. Availability: Unix,
- Windows.
-
+ Return a string representing the current working directory.
+ Availability: Unix, Windows.
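Assuming a POSIX system, the relationship between :func:`getcwd` and :func:`getcwdb` can be sketched as follows. :func:`os.fsencode` is a later (Python 3.2) helper, used here only for the comparison.

```python
import os

cwd_str = os.getcwd()     # str; undecodable bytes appear as U+DCxx
cwd_bytes = os.getcwdb()  # bytes, as reported by the operating system

if os.name == 'posix':
    # Encoding the str form with the file system encoding and its error
    # handler recovers exactly the bytes form.
    same = os.fsencode(cwd_str) == cwd_bytes
else:
    same = None  # the comparison is only claimed here for POSIX
```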
.. function:: getcwdb()
@@ -800,10 +820,8 @@
entries ``'.'`` and ``'..'`` even if they are present in the directory.
Availability: Unix, Windows.
- This function can be called with a bytes or string argument. In the bytes
- case, all filenames will be listed as returned by the underlying API. In the
- string case, filenames will be decoded using the file system encoding, and
- skipped if a decoding error occurs.
+ This function can be called with a bytes or string argument, and returns
+ filenames of the same type. In the string case, undecodable bytes in
+ filenames are handled as described in :ref:`os-filenames`.
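The following sketch shows both calling conventions on a directory containing a file name that is not valid UTF-8. It assumes a POSIX file system that accepts arbitrary bytes in names (typical on Linux). The file name ``b'caf\xe9.txt'`` is only an example, and :func:`os.fsencode` is a Python 3.2+ helper.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a file whose name is not valid UTF-8, using a bytes path.
    bad_name = os.path.join(os.fsencode(tmp), b'caf\xe9.txt')
    with open(bad_name, 'wb'):
        pass
    str_names = os.listdir(tmp)                  # str argument -> str names
    bytes_names = os.listdir(os.fsencode(tmp))   # bytes argument -> bytes names
```

Because undecodable bytes survive as U+DCxx, encoding each string name with :func:`os.fsencode` yields exactly the bytes name returned by the bytes call.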
.. function:: lstat(path)