The string formatting code has a test that switches to Unicode when %s
sees a Unicode argument. Unfortunately this test was also executed
for %r, because %s and %r share almost all of their code. As a result,
if u is a unicode object and repr(u) is an 8-bit string containing
only ASCII characters, '%r' % u was a *unicode* string containing
only ASCII characters!

Fixed by executing the test only for %s.

Also fixed an error message: "%s argument has non-string str()"
doesn't make sense for %r, so the error message now distinguishes
between %s and %r.
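
As a rough illustration of the control flow after this change, the
stand-alone sketch below models only the dispatch on the format
character. The names result_kind, result_kind_for and non_string_error
are hypothetical and do not exist in stringobject.c; the real code works
on PyObject pointers and jumps to a "unicode" label rather than returning
an enum.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two possible result types. */
typedef enum { RESULT_STR8, RESULT_UNICODE } result_kind;

/* Model of the dispatch after the fix: only 's' checks for a
 * unicode argument; 'r' is reached directly or by falling through. */
static result_kind
result_kind_for(char c, int arg_is_unicode)
{
    switch (c) {
    case 's':
        if (arg_is_unicode)
            return RESULT_UNICODE;  /* models "goto unicode" */
        /* Fall through */
    case 'r':
        return RESULT_STR8;         /* str()/repr() must be 8-bit */
    default:
        return RESULT_STR8;         /* other codes are not modelled */
    }
}

/* Model of the error message selection in the last hunk. */
static const char *
non_string_error(char c)
{
    return c == 's' ? "%s argument has non-string str()"
                    : "%r argument has non-string repr()";
}

/* Small self-check of the model; call it from a test driver. */
static void
check_model(void)
{
    assert(result_kind_for('s', 1) == RESULT_UNICODE);
    assert(result_kind_for('r', 1) == RESULT_STR8);
    assert(non_string_error('r')[1] == 'r');
}
```

In this model a unicode argument switches the result type only for %s,
which is the behaviour the message above describes.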
diff --git a/Objects/stringobject.c b/Objects/stringobject.c
index 932ef51..52f96ff 100644
--- a/Objects/stringobject.c
+++ b/Objects/stringobject.c
@@ -3858,7 +3858,6 @@
len = 1;
break;
case 's':
- case 'r':
#ifdef Py_USING_UNICODE
if (PyUnicode_Check(v)) {
fmt = fmt_start;
@@ -3866,6 +3865,8 @@
goto unicode;
}
#endif
+ /* Fall through */
+ case 'r':
if (c == 's')
temp = PyObject_Str(v);
else
@@ -3874,7 +3875,9 @@
goto error;
if (!PyString_Check(temp)) {
PyErr_SetString(PyExc_TypeError,
- "%s argument has non-string str()");
+ c == 's' ?
+ "%s argument has non-string str()" :
+ "%r argument has non-string repr()");
Py_DECREF(temp);
goto error;
}