Patrick Monnerat | 17951ea | 2014-03-04 16:42:19 +0100

IBM OS/400 implements iconv in an odd way:
- Type iconv_t is a structure; therefore objects of this type cannot be
  compared to (iconv_t) -1.
- Supported character set names are all of the form IBMCCSIDccsid..., where
  ccsid is a decimal 5-digit integer identifying an IBM coded character set.
  In addition, character set names have to be given in EBCDIC.
  Standard character set names like "UTF-8" are NOT recognized.
- The prototype of iconv_open() does not declare its parameters as const,
  although they are not altered.
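
As an illustration of the native naming scheme, the sketch below builds the
leading part of such a name from a numeric CCSID. The helper name
format_ccsid_prefix is hypothetical and not part of this package. Only the
"IBMCCSID" prefix and the five digits are produced; whatever follows them in a
complete native name (the "..." above) is deliberately left out. The result is
in the compiler's execution character set, so on a non-EBCDIC platform it
would still have to be converted to EBCDIC before being handed to the native
iconv_open().

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Writes "IBMCCSID" followed by the CCSID as five zero-padded decimal
 * digits into buf, which must hold at least 14 bytes (13 + terminator).
 * Returns 0 on success, -1 if the CCSID needs more than five digits or
 * the buffer is too small.
 */
static int format_ccsid_prefix(unsigned int ccsid, char *buf, size_t buflen)
{
    if (ccsid > 99999u || buflen < 14)
        return -1;
    snprintf(buf, buflen, "IBMCCSID%05u", ccsid);
    return 0;
}
```

For example, CCSID 1208 (UTF-8) gives "IBMCCSID01208" and CCSID 37 gives
"IBMCCSID00037".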

Since libiconv does not support EBCDIC, using that package as a replacement
here is not a solution.

For these reasons, the code in this directory implements a wrapper to the
OS/400 iconv implementation. The wrapper performs the following transformations:
- Type iconv_t is a pointer. Although OS/400 pointers are odd, comparing
  with (iconv_t) -1 is OK.
- All IANA character set names are recognized in an encoding- and
  case-insensitive way, provided an equivalent CCSID exists. See
  http://www.iana.org/assignments/character-sets/character-sets.xhtml
- All CCSIDs from the association file can be expressed as IBMCCSIDxxxxx where
  xxxxx is the 5-digit CCSID; no null terminator is required. Alternate codes
  are of the form ibm-xxx (null-terminated), where xxx is the integer CCSID with
  leading zeroes stripped.
- If an IANA MIBenum is defined for a CCSID, the name iana-xxx can be used,
  where xxx is the integer MIBenum without leading zeroes.
- In addition, some aliases are also taken from the association file. Examples
  are: ASCII, EBCDIC, UTF8.
- The prototype of iconv_open() has const parameters.
- Character set names can be given in any encoding.
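
The two CCSID-based name forms listed above can be told apart with a few
lines of C. The following is a minimal sketch, not the wrapper's actual code:
the function names are hypothetical, and matching the "IBMCCSID" and "ibm-"
prefixes without regard to case is an assumption made here for simplicity.
The caller passes the number of bytes available, so the IBMCCSIDxxxxx form
needs no null terminator.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first bytes of s (len bytes available) match prefix,
 * ignoring case; 0 otherwise. */
static int has_prefix_ci(const char *s, size_t len, const char *prefix)
{
    size_t n = strlen(prefix);
    size_t i;

    if (len < n)
        return 0;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char) s[i]) != tolower((unsigned char) prefix[i]))
            return 0;
    return 1;
}

/*
 * Hypothetical parser, for illustration only.
 * Accepts "IBMCCSIDxxxxx" (exactly five digits, bytes after them ignored)
 * and "ibm-xxx" (one to five digits, no leading zero, filling the rest of
 * the len bytes). Returns the CCSID, or -1 if the name has neither form.
 */
static long parse_ccsid_name(const char *name, size_t len)
{
    long ccsid = 0;
    size_t i;

    if (has_prefix_ci(name, len, "IBMCCSID")) {
        if (len < 13)
            return -1;
        for (i = 8; i < 13; i++) {
            if (!isdigit((unsigned char) name[i]))
                return -1;
            ccsid = ccsid * 10 + (name[i] - '0');
        }
        return ccsid;
    }

    if (has_prefix_ci(name, len, "ibm-")) {
        if (len < 5 || len > 9 || name[4] == '0')
            return -1;
        for (i = 4; i < len; i++) {
            if (!isdigit((unsigned char) name[i]))
                return -1;
            ccsid = ccsid * 10 + (name[i] - '0');
        }
        return ccsid;
    }

    return -1;      /* IANA names, iana-xxx and aliases are not handled here */
}
```

With this sketch, both parse_ccsid_name("IBMCCSID01208", 13) and
parse_ccsid_name("ibm-1208", 8) return 1208, while "ibm-037" is rejected
because of its leading zero.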

Character set name to CCSID conversion:
- http://www.iana.org/assignments/character-sets/character-sets.xhtml provides
  all IANA registered character set names and aliases associated with a
  MIBenum, that is, a unique character set identifier.
- A hand-maintained file ccsid_mibenum.xml associates IBM CCSIDs with
  IANA MIBenums.
- An OS/400 C program (in subdirectory bldcsndfa) reads the files mentioned
  above and generates, as a C file, a deterministic finite automaton that
  recognizes all possible character set names and associates each of them
  with its corresponding CCSID. This program can only be run on OS/400 since
  it uses the native iconv support for EBCDIC.
- Since these operations are tedious and the table generation needs
  bootstrapping with libxml2, the generated automaton is stored within the
  sources and need not be rebuilt at each compilation. However, the source is
  provided here to allow new table generation with conversion tables that were
  not available at the time of original generation.
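
To give an idea of what a name-recognizing automaton looks like, here is a
small sketch in portable C. It is not the generated code: it builds its
transition table in memory instead of emitting a C file, it only folds case
(it does not handle EBCDIC input), and every identifier in it is hypothetical.
The sample names and CCSIDs (1208 for UTF-8, 367 for US-ASCII, 819 for
ISO-8859-1) are chosen for illustration and are not taken from
ccsid_mibenum.xml.

```c
#include <ctype.h>

#define MAX_STATES 64
#define NO_STATE   (-1)

static int  next_state[MAX_STATES][128];  /* transition table */
static long accept_ccsid[MAX_STATES];     /* CCSID if accepting, else -1 */
static int  state_count;

/* Empties the automaton; state 0 is the start state. */
static void dfa_reset(void)
{
    int s, c;

    for (s = 0; s < MAX_STATES; s++) {
        accept_ccsid[s] = -1;
        for (c = 0; c < 128; c++)
            next_state[s][c] = NO_STATE;
    }
    state_count = 1;
}

/* Adds one name (case-folded) leading to the given CCSID.
 * Returns 0 on success, -1 if the table is full or a byte is not 7-bit. */
static int dfa_add(const char *name, long ccsid)
{
    int s = 0;

    for (; *name; name++) {
        int c = tolower((unsigned char) *name);

        if (c >= 128)
            return -1;
        if (next_state[s][c] == NO_STATE) {
            if (state_count == MAX_STATES)
                return -1;
            next_state[s][c] = state_count++;
        }
        s = next_state[s][c];
    }
    accept_ccsid[s] = ccsid;
    return 0;
}

/* Runs the automaton over name; returns the CCSID or -1 if not accepted. */
static long dfa_lookup(const char *name)
{
    int s = 0;

    for (; *name; name++) {
        int c = tolower((unsigned char) *name);

        if (c >= 128)
            return -1;
        s = next_state[s][c];
        if (s == NO_STATE)
            return -1;
    }
    return accept_ccsid[s];
}

/* Loads a few illustrative names. */
static void dfa_build_sample(void)
{
    dfa_reset();
    dfa_add("UTF-8", 1208);
    dfa_add("UTF8", 1208);
    dfa_add("US-ASCII", 367);
    dfa_add("ISO-8859-1", 819);
}
```

After dfa_build_sample(), looking up "utf-8" or "Utf8" yields 1208, while a
proper prefix such as "UTF" ends in a non-accepting state and yields -1.
Lookup cost is one table access per input byte, which is the main attraction
of precomputing the automaton rather than comparing against each name in turn.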