bpo-33578: Add getstate/setstate for CJK codec (GH-6984)
This implements getstate and setstate for the cjkcodecs multibyte incremental encoders/decoders, primarily to fix issues with seek/tell.
The encoder getstate/setstate is slightly tricky as the "state" is pending bytes + MultibyteCodec_State but only an integer can be returned. The approach I've taken is to encode this data into a long, similar to how .tell() encodes a "cookie_type" as a long.
https://bugs.python.org/issue33578
diff --git a/Modules/cjkcodecs/multibytecodec.h b/Modules/cjkcodecs/multibytecodec.h
index 5b8c222..6d34534 100644
--- a/Modules/cjkcodecs/multibytecodec.h
+++ b/Modules/cjkcodecs/multibytecodec.h
@@ -16,12 +16,15 @@
typedef unsigned short ucs2_t, DBCHAR;
#endif
-typedef union {
- void *p;
- int i;
+/*
+ * A struct that provides 8 bytes of state for multibyte
+ * codecs. Codecs are free to use this how they want. Note: if you
+ * need to add a new field to this struct, ensure that its byte order
+ * is independent of CPU endianness so that the return value of
+ * getstate doesn't differ between little and big endian CPUs.
+ */
+typedef struct {
unsigned char c[8];
- ucs2_t u2[4];
- Py_UCS4 u4[2];
} MultibyteCodec_State;
typedef int (*mbcodec_init)(const void *config);