Fix to a bug found by Florian Weimer: The UTF-8 decoder is still buggy (i.e. it doesn't pass Markus Kuhn's stress test), mainly due to the following construct:

    #define UTF8_ERROR(details)  do {                       \
        if (utf8_decoding_error(&s, &p, errors, details))   \
            goto onError;                                   \
        continue;                                           \
    } while (0)

(The "continue" statement is supposed to skip to the next iteration of the outer loop, but of course, it doesn't: inside the do { ... } while (0) wrapper it only ends that one-shot loop, so execution falls through to the code following the macro. Indeed, this is a marvelous example of the dangers of the C programming language and especially of the C preprocessor.)
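The failure mode can be reproduced in isolation. The following is a minimal, hypothetical sketch, not code from unicodeobject.c: count_decoded_buggy, BAD_BYTE_BUGGY, strict and on_error are illustrative names, and bytes >= 0x80 simply stand in for invalid input. The macro has the same shape as the original UTF8_ERROR: in non-strict mode the bad byte should be skipped, but the continue only terminates the do { } while (0) wrapper, so the statement after the macro still runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro with the same shape as the original UTF8_ERROR.
   It relies on 'strict' and the label 'on_error' existing in the
   enclosing function. The 'continue' applies to the innermost loop,
   which is the do { } while (0) wrapper, NOT the outer for loop:
   it jumps to the (always false) condition and leaves the wrapper. */
#define BAD_BYTE_BUGGY() do {  \
    if (strict)                \
        goto on_error;         \
    continue;                  \
} while (0)

/* Counts "decoded" bytes; bytes >= 0x80 are treated as invalid.
   Intended behavior: strict mode returns -1 on the first invalid
   byte, non-strict mode skips invalid bytes without counting them. */
static int count_decoded_buggy(const unsigned char *s, size_t len, int strict)
{
    int decoded = 0;

    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 0x80)
            BAD_BYTE_BUGGY();  /* meant to skip to the next byte...   */
        decoded++;             /* ...but this still runs for bad bytes */
    }
    return decoded;

on_error:
    return -1;
}
```

For the three bytes "a\x80" "b" in non-strict mode, count_decoded_buggy returns 3 rather than the intended 2, because the invalid byte is counted as well. (The literal is split so that the b is not parsed as part of the \x80 hex escape.)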

commit: fb625847bfc9fb3ebf548b8c32a9accd21868d18
author: Marc-André Lemburg <mal@egenix.com> Sun Jul 16 13:29:13 2000 +0000
committer: Marc-André Lemburg <mal@egenix.com> Sun Jul 16 13:29:13 2000 +0000
tree: 4cfa75e340ee27724d9cc27670c0a3136b894042
parent: 7e47402264cf87b9bbb61fc9ff610af08add7c7b
diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
index 12c5be4..7c35f1c 100644
--- a/Objects/unicodeobject.c
+++ b/Objects/unicodeobject.c

@@ -634,7 +634,7 @@
 #define UTF8_ERROR(details)  do {                       \
     if (utf8_decoding_error(&s, &p, errors, details))   \
         goto onError;                                   \
-    continue;                                           \
+    goto nextChar;                                      \
 } while (0)
 
 PyObject *PyUnicode_DecodeUTF8(const char *s,
@@ -731,6 +731,7 @@
             UTF8_ERROR("unsupported Unicode code range");
         }
         s += n;
+nextChar:
     }
 
     /* Adjust length */
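The commit replaces continue with a jump to a label at the end of the outer loop body. A minimal sketch of that pattern follows, with hypothetical names (count_decoded_fixed, BAD_BYTE_FIXED, strict, on_error, next_byte) and bytes >= 0x80 standing in for invalid input; it is an illustration, not the decoder itself. One detail: before C23, the C grammar requires a label to be followed by a statement, so the sketch places a null statement after next_byte rather than putting the label directly before the closing brace as the diff does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed macro: instead of 'continue' (which would only
   leave the do { } while (0) wrapper), jump explicitly to a label at
   the end of the enclosing loop body, as the commit does with
   'goto nextChar'. Relies on 'strict', 'on_error' and 'next_byte'
   existing in the enclosing function. */
#define BAD_BYTE_FIXED() do {  \
    if (strict)                \
        goto on_error;         \
    goto next_byte;            \
} while (0)

/* Counts "decoded" bytes; bytes >= 0x80 are treated as invalid.
   Strict mode returns -1 on the first invalid byte; non-strict mode
   skips invalid bytes without counting them. */
static int count_decoded_fixed(const unsigned char *s, size_t len, int strict)
{
    int decoded = 0;

    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 0x80)
            BAD_BYTE_FIXED();  /* really skips the rest of this iteration */
        decoded++;
    next_byte:
        ;  /* null statement: pre-C23, a label must precede a statement */
    }
    return decoded;

on_error:
    return -1;
}
```

With this version the invalid byte in "a\x80" "b" is skipped and the function returns 2 in non-strict mode, or -1 in strict mode.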