bpo-41894: Fix UnicodeDecodeError while loading native module (GH-22466)
When running in a non-UTF-8 locale, if an error occurs while importing a
native Python module (for example, because a dependent shared library is
missing), the error message returned by the system may contain non-ASCII
characters, causing a UnicodeDecodeError when it is decoded as UTF-8.
PyUnicode_DecodeFSDefault is used for buffers which may contain
filesystem paths. For consistency with os.strerror(),
PyUnicode_DecodeLocale is used for buffers which contain system error
messages. While the shortname parameter is always encoded in ASCII
according to PEP 489, it is still decoded using PyUnicode_FromString to
minimize the changes, since this does not affect the decoding result
(albeit _potentially_ slower).
In dynload_hpux, since the error buffer contains a message generated
from a static ASCII string and the module filesystem path,
PyUnicode_DecodeFSDefault is used instead of the PyUnicode_DecodeLocale
used for error messages elsewhere.
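As an illustration of the "surrogateescape" error handler passed to
PyUnicode_DecodeLocale in dynload_shlib, here is a simplified sketch
that assumes an ASCII locale encoding. The helper
decode_ascii_surrogateescape is hypothetical and not part of CPython;
it only shows how an undecodable byte 0xNN is mapped to the lone
surrogate U+DCNN instead of raising an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a CPython function: decode `len` bytes as
   ASCII, mapping each undecodable byte 0xNN (0x80..0xFF) to the lone
   surrogate U+DCNN, as the "surrogateescape" error handler does.
   `out` must have room for `len` code points.
   Returns the number of code points written. */
size_t
decode_ascii_surrogateescape(const unsigned char *buf, size_t len,
                             uint32_t *out)
{
    assert(buf != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] < 0x80) {
            out[i] = buf[i];            /* plain ASCII: keep as-is */
        }
        else {
            out[i] = 0xDC00u + buf[i];  /* escape the raw byte */
        }
    }
    return len;
}
```

Because every input byte maps to exactly one code point, decoding in
this sketch cannot fail on message content, and the original bytes can
be recovered from the result.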
* bpo-41894: Fix bugs in dynload error msg handling
For both dynload_aix and dynload_hpux, handle the possibility that
decoding a string returns NULL: when that happens, release the
references to any previously decoded strings and return early.
In addition, in dynload_aix, ensure that we pass the decoded string
*object* pathname_ob to PyErr_SetImportError instead of the original
pathname buffer.
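The cleanup pattern described above can be sketched without the CPython
headers. Everything in this block is a stand-in invented for
illustration: str_obj replaces PyObject, decode() replaces the
PyUnicode_Decode* calls (a NULL input simulates a decoding failure),
and report_import_error() is a hypothetical name for the error path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a reference-counted string object (not PyObject). */
typedef struct {
    int refcnt;
    char *text;
} str_obj;

int live_objects = 0;   /* objects created and not yet released */

/* Hypothetical decoder: returns a new object, or NULL on failure.
   A NULL input simulates a decoding error. */
static str_obj *
decode(const char *buf)
{
    if (buf == NULL)
        return NULL;
    str_obj *ob = malloc(sizeof(*ob));
    if (ob == NULL)
        return NULL;
    size_t n = strlen(buf) + 1;
    ob->text = malloc(n);
    if (ob->text == NULL) {
        free(ob);
        return NULL;
    }
    memcpy(ob->text, buf, n);
    ob->refcnt = 1;
    live_objects++;
    return ob;
}

/* Drop one reference and free the object when none remain. */
static void
decref(str_obj *ob)
{
    assert(ob != NULL && ob->refcnt > 0);
    if (--ob->refcnt == 0) {
        free(ob->text);
        free(ob);
        live_objects--;
    }
}

/* Decode three buffers; on any failure, release what was already
   decoded and return early. Returns 0 on success, -1 on failure. */
int
report_import_error(const char *error, const char *shortname,
                    const char *pathname)
{
    str_obj *error_ob = decode(error);
    if (error_ob == NULL)
        return -1;
    str_obj *name_ob = decode(shortname);
    if (name_ob == NULL) {
        decref(error_ob);
        return -1;
    }
    str_obj *path_ob = decode(pathname);
    if (path_ob == NULL) {
        decref(error_ob);
        decref(name_ob);
        return -1;
    }
    /* Real code would set the import error from the three objects
       here, passing the decoded objects rather than raw buffers. */
    decref(error_ob);
    decref(name_ob);
    decref(path_ob);
    return 0;
}
```

Whichever decode step fails, live_objects is back to zero when the
function returns, which is the property the fix restores.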
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 2d2af320d94afc6561e8f8adf174c9d3fd9065bc)
Co-authored-by: Kevin Adler <kadler@us.ibm.com>
diff --git a/Python/dynload_shlib.c b/Python/dynload_shlib.c
index 082154d..2382889 100644
--- a/Python/dynload_shlib.c
+++ b/Python/dynload_shlib.c
@@ -106,7 +106,7 @@
const char *error = dlerror();
if (error == NULL)
error = "unknown dlopen() error";
- error_ob = PyUnicode_FromString(error);
+ error_ob = PyUnicode_DecodeLocale(error, "surrogateescape");
if (error_ob == NULL)
return NULL;
mod_name = PyUnicode_FromString(shortname);
@@ -114,7 +114,7 @@
Py_DECREF(error_ob);
return NULL;
}
- path = PyUnicode_FromString(pathname);
+ path = PyUnicode_DecodeFSDefault(pathname);
if (path == NULL) {
Py_DECREF(error_ob);
Py_DECREF(mod_name);