bpo-33578: Add getstate/setstate for CJK codec (GH-6984)



This implements getstate and setstate for the cjkcodecs multibyte incremental encoders/decoders, primarily to fix issues with seek/tell.

The encoder getstate/setstate is slightly tricky as the "state" is pending bytes + MultibyteCodec_State but only an integer can be returned. The approach I've taken is to encode this data into a long, similar to how .tell() encodes a "cookie_type" as a long.


https://bugs.python.org/issue33578
diff --git a/Modules/cjkcodecs/multibytecodec.h b/Modules/cjkcodecs/multibytecodec.h
index 5b8c222..6d34534 100644
--- a/Modules/cjkcodecs/multibytecodec.h
+++ b/Modules/cjkcodecs/multibytecodec.h
@@ -16,12 +16,15 @@
 typedef unsigned short ucs2_t, DBCHAR;
 #endif
 
-typedef union {
-    void *p;
-    int i;
+/*
+ * A struct that provides 8 bytes of state for multibyte
+ * codecs. Codecs are free to use this how they want. Note: if you
+ * need to add a new field to this struct, ensure that its byte order
+ * is independent of CPU endianness so that the return value of
+ * getstate doesn't differ between little and big endian CPUs.
+ */
+typedef struct {
     unsigned char c[8];
-    ucs2_t u2[4];
-    Py_UCS4 u4[2];
 } MultibyteCodec_State;
 
 typedef int (*mbcodec_init)(const void *config);