Fix for a bug found by Florian Weimer:
The UTF-8 decoder is still buggy (i.e. it doesn't pass Markus Kuhn's
stress test), mainly due to the following construct:
#define UTF8_ERROR(details) do { \
if (utf8_decoding_error(&s, &p, errors, details)) \
goto onError; \
continue; \
} while (0)
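(The "continue" statement is supposed to skip to the next iteration
of the outer loop, but of course, it doesn't: it binds to the macro's
own do/while (0) loop, so control falls through to the code after the
macro. Indeed, this is a marvelous example of the dangers of the C
programming language and especially of the C preprocessor.)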
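
The pitfall can be reproduced outside the decoder. The following is a minimal sketch, not the code from unicodeobject.c: IS_BAD, SKIP_BUGGY, SKIP_FIXED and the two counting functions are hypothetical names invented for illustration, and "a byte with the high bit set" is only a stand-in for a real decoding error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a decoding error: any byte with the high bit set. */
#define IS_BAD(c) (((unsigned char)(c)) & 0x80)

/* Buggy pattern: "continue" binds to the macro's own do/while (0) loop. */
#define SKIP_BUGGY() do { \
        continue;         \
    } while (0)

/* Fixed pattern: jump to a label placed at the end of the outer loop body. */
#define SKIP_FIXED() do { \
        goto next_char;   \
    } while (0)

/* Counts the bytes that reach the "accept" step at the end of the loop body. */
size_t count_accepted_buggy(const char *s, size_t len)
{
    size_t accepted = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; i < len; i++) {
        if (IS_BAD(s[i]))
            SKIP_BUGGY();   /* leaves only the do/while, then falls through */
        accepted++;         /* so bad bytes are counted as well */
    }
    return accepted;
}

size_t count_accepted_fixed(const char *s, size_t len)
{
    size_t accepted = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; i < len; i++) {
        if (IS_BAD(s[i]))
            SKIP_FIXED();   /* skips the rest of this iteration */
        accepted++;
    next_char:
        ;                   /* before C23, a label must precede a statement */
    }
    return accepted;
}
```

For the three bytes 'a', 0x80, 'z', count_accepted_buggy() returns 3 because the bad byte still reaches the accept step, while count_accepted_fixed() returns 2.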
diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
index 12c5be4..7c35f1c 100644
--- a/Objects/unicodeobject.c
+++ b/Objects/unicodeobject.c
@@ -634,7 +634,7 @@
#define UTF8_ERROR(details) do { \
if (utf8_decoding_error(&s, &p, errors, details)) \
goto onError; \
- continue; \
+ goto nextChar; \
} while (0)
PyObject *PyUnicode_DecodeUTF8(const char *s,
@@ -731,6 +731,7 @@
UTF8_ERROR("unsupported Unicode code range");
}
s += n;
+nextChar:
}
/* Adjust length */