Unicode support
===============

Last update: 2005-01-17, version 1.4

This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
of the Linux Assigned Names And Numbers Authority (LANANA) project.
The current version can be found at:

    http://www.lanana.org/docs/unicode/admin-guide/unicode.rst

Introduction
------------

The Linux kernel code has been rewritten to use Unicode to map
characters to fonts.  By downloading a single Unicode-to-font table,
both the eight-bit character sets and UTF-8 mode are changed to use
the font as indicated.

This changes the semantics of the eight-bit character tables subtly.
The four character tables are now:

=============== =============================== ================
Map symbol      Map name                        Escape code (G0)
=============== =============================== ================
LAT1_MAP        Latin-1 (ISO 8859-1)            ESC ( B
GRAF_MAP        DEC VT100 pseudographics        ESC ( 0
IBMPC_MAP       IBM code page 437               ESC ( U
USER_MAP        User defined                    ESC ( K
=============== =============================== ================

In particular, ESC ( U is no longer "straight to font", since the font
might be completely different from the IBM character set.  This
permits, for example, the use of block graphics even with a Latin-1
font loaded.

Note that although these codes are similar to ISO 2022, neither the
codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
G1), whereas ISO 2022 has four 7-bit codes (G0-G3).

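As an illustration of the table above, the following Python sketch
builds the escape sequence that selects one of the four maps as G0.
The helper name ``select_g0`` and the dictionary ``G0_SELECT`` are
hypothetical and exist only for this example; the map symbols and the
final bytes are taken from the table.

```python
# Minimal sketch: build the "ESC ( <final>" sequence that designates a
# character map as G0.  Names here are illustrative, not kernel APIs.
ESC = "\x1b"

# Final byte for each map symbol, as listed in the table above.
G0_SELECT = {
    "LAT1_MAP": "B",
    "GRAF_MAP": "0",
    "IBMPC_MAP": "U",
    "USER_MAP": "K",
}


def select_g0(map_symbol: str) -> str:
    """Return the escape sequence selecting map_symbol as G0."""
    return ESC + "(" + G0_SELECT[map_symbol]


# Example use on a Linux virtual console (not executed here):
#   sys.stdout.write(select_g0("GRAF_MAP") + "lqk" + select_g0("LAT1_MAP"))
# would switch to the DEC VT100 pseudographics map, emit three bytes,
# and switch back to Latin-1.
```
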
In accordance with the Unicode standard/ISO 10646, the range U+F000 to
U+F8FF has been reserved for OS-wide allocation.  The Unicode Standard
refers to this as a "Corporate Zone"; since this is inaccurate for
Linux, we call it the "Linux Zone".  U+F000 was picked as the starting
point since it lets the direct-mapping area start on a multiple of a
large power of two (in case 1024- or 2048-character fonts ever become
necessary).  This leaves U+E000 to U+EFFF as the End User Zone.

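The split between the two zones can be written down as a small
classifier.  This is only a sketch of the ranges stated above; the
function name ``zone_of`` and the constant names are assumptions made
for the example.

```python
# Ranges as stated in the text; names are illustrative only.
END_USER_ZONE = range(0xE000, 0xF000)  # U+E000..U+EFFF
LINUX_ZONE = range(0xF000, 0xF900)     # U+F000..U+F8FF


def zone_of(code_point: int) -> str:
    """Classify a code point according to the zones described above."""
    if code_point in END_USER_ZONE:
        return "End User Zone"
    if code_point in LINUX_ZONE:
        return "Linux Zone"
    return "neither zone"


# U+F000 is a multiple of 2048, so even a 2048-glyph direct-mapped
# font would start on an aligned boundary.
ALIGNED_FOR_2048 = (0xF000 % 2048 == 0)
```
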
[v1.2]: The Unicode range from U+F000 up to U+F7FF has been
hard-coded to map directly to the loaded font, bypassing the
translation table.  The user-defined map now defaults to U+F000 to
U+F0FF, emulating the previous behaviour.  In practice, this range
might be shorter; for example, vgacon can only handle 256-character
(U+F000..U+F0FF) or 512-character (U+F000..U+F1FF) fonts.


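The direct-mapping rule amounts to simple offset arithmetic.  The
sketch below assumes the glyph index is the code point minus U+F000,
which is how the ranges above read; the function names and the
``font_size`` parameter are hypothetical and not part of any kernel
interface.

```python
# Direct-to-font area as stated in the text.
DIRECT_BASE = 0xF000
DIRECT_LAST = 0xF7FF


def code_point_to_glyph(code_point: int, font_size: int = 256):
    """Return the glyph index for a direct-mapped code point.

    Returns None if the code point is outside U+F000..U+F7FF or
    beyond the number of glyphs in the loaded font.
    """
    if DIRECT_BASE <= code_point <= DIRECT_LAST:
        index = code_point - DIRECT_BASE
        if index < font_size:
            return index
    return None


def default_user_map():
    """Default USER_MAP: byte value i maps to U+F000 + i (256 entries)."""
    return [DIRECT_BASE + i for i in range(256)]
```

With a 512-character font, ``code_point_to_glyph(0xF1FF, 512)`` gives
the last glyph, while the same code point yields ``None`` for a
256-character font.
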
Actual characters assigned in the Linux Zone
--------------------------------------------

In addition, the following characters not present in Unicode 1.1.4
have been defined; these are used by the DEC VT graphics map.  [v1.2]
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.

====== ======================================
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
====== ======================================

The DEC VT220 uses a 6x10 character matrix, and these characters form
a smooth progression in the DEC VT graphics character set.  I have
omitted the scan 5 line, since it is also used as a block-graphics
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.

[v1.3]: These characters have been officially added to Unicode 3.2.0;
they are located at U+23BA, U+23BB, U+23BC, U+23BD.  Linux now uses
the new values.

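Text that still uses the obsolete Linux Zone values can be converted
with a translation table.  The sketch below assumes the old and new
code points pair up in listing order (scan 1, 3, 7, 9), which matches
the Unicode character names for U+23BA..U+23BD; the names
``DEC_SCAN_LINE_REMAP`` and ``modernize`` are hypothetical.

```python
# Obsolete Linux Zone code point -> Unicode 3.2 code point.
# The pairing is assumed from the order in which both lists are given.
DEC_SCAN_LINE_REMAP = {
    0xF800: 0x23BA,  # scan 1
    0xF801: 0x23BB,  # scan 3
    0xF803: 0x23BC,  # scan 7
    0xF804: 0x23BD,  # scan 9
}


def modernize(text: str) -> str:
    """Replace obsolete DEC VT scan-line code points with the new ones."""
    return text.translate(DEC_SCAN_LINE_REMAP)
```
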
[v1.2]: The following characters have been added to represent common
keyboard symbols that are unlikely to ever be added to Unicode proper
since they are horribly vendor-specific.  This, of course, is an
excellent example of horrible design.

====== ======================================
U+F810 KEYBOARD SYMBOL FLYING FLAG
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
U+F812 KEYBOARD SYMBOL OPEN APPLE
U+F813 KEYBOARD SYMBOL SOLID APPLE
====== ======================================

Klingon language support
------------------------

In 1996, Linux was the first operating system in the world to add
support for the artificial language Klingon, created by Marc Okrand
for the "Star Trek" television series.  This encoding was later
adopted by the ConScript Unicode Registry and proposed (but ultimately
rejected) for inclusion in Unicode Plane 1.  Thus, it remains as a
Linux/CSUR private assignment in the Linux Zone.

This encoding has been endorsed by the Klingon Language Institute.
For more information, contact them at:

    http://www.kli.org/

Since the characters in the beginning of the Linux CZ have been more
of the dingbats/symbols/forms type and this is a language, I have
located it at the end, on a 16-cell boundary in keeping with standard
Unicode practice.

.. note::

  This range is now officially managed by the ConScript Unicode
  Registry.  The normative reference is at:

      http://www.evertype.com/standards/csur/klingon.html

Klingon has an alphabet of 26 characters, a positional numeric writing
system with 10 digits, and is written left-to-right, top-to-bottom.

Several glyph forms for the Klingon alphabet have been proposed.
However, since the set of symbols appears to be consistent throughout,
with only the actual shapes being different, in keeping with standard
Unicode practice these differences are considered font variants.

====== =======================================================
U+F8D0 KLINGON LETTER A
U+F8D1 KLINGON LETTER B
U+F8D2 KLINGON LETTER CH
U+F8D3 KLINGON LETTER D
U+F8D4 KLINGON LETTER E
U+F8D5 KLINGON LETTER GH
U+F8D6 KLINGON LETTER H
U+F8D7 KLINGON LETTER I
U+F8D8 KLINGON LETTER J
U+F8D9 KLINGON LETTER L
U+F8DA KLINGON LETTER M
U+F8DB KLINGON LETTER N
U+F8DC KLINGON LETTER NG
U+F8DD KLINGON LETTER O
U+F8DE KLINGON LETTER P
U+F8DF KLINGON LETTER Q
       - Written <q> in standard Okrand Latin transliteration
U+F8E0 KLINGON LETTER QH
       - Written <Q> in standard Okrand Latin transliteration
U+F8E1 KLINGON LETTER R
U+F8E2 KLINGON LETTER S
U+F8E3 KLINGON LETTER T
U+F8E4 KLINGON LETTER TLH
U+F8E5 KLINGON LETTER U
U+F8E6 KLINGON LETTER V
U+F8E7 KLINGON LETTER W
U+F8E8 KLINGON LETTER Y
U+F8E9 KLINGON LETTER GLOTTAL STOP

U+F8F0 KLINGON DIGIT ZERO
U+F8F1 KLINGON DIGIT ONE
U+F8F2 KLINGON DIGIT TWO
U+F8F3 KLINGON DIGIT THREE
U+F8F4 KLINGON DIGIT FOUR
U+F8F5 KLINGON DIGIT FIVE
U+F8F6 KLINGON DIGIT SIX
U+F8F7 KLINGON DIGIT SEVEN
U+F8F8 KLINGON DIGIT EIGHT
U+F8F9 KLINGON DIGIT NINE

U+F8FD KLINGON COMMA
U+F8FE KLINGON FULL STOP
U+F8FF KLINGON SYMBOL FOR EMPIRE
====== =======================================================

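Because the letters and digits occupy contiguous runs, a converter
from Latin transliteration to these code points is short.  The sketch
below is illustrative only: the function names are hypothetical, and
the letter casing used for the transliteration (for example ``D``,
``H``, ``I``, ``S``) is an assumption based on the usual Okrand
romanization; only the ``q``/``Q`` distinction is stated in the table.
Matching is greedy (longest token first), so it does not resolve
ambiguous sequences such as ``n`` followed by ``gh``.

```python
KLINGON_LETTER_A = 0xF8D0
KLINGON_DIGIT_ZERO = 0xF8F0

# Transliteration tokens in code point order, U+F8D0..U+F8E9.
# Casing is an assumption (see the lead-in above).
OKRAND_TOKENS = [
    "a", "b", "ch", "D", "e", "gh", "H", "I", "j", "l", "m", "n", "ng",
    "o", "p", "q", "Q", "r", "S", "t", "tlh", "u", "v", "w", "y", "'",
]
LETTERS = {tok: chr(KLINGON_LETTER_A + i)
           for i, tok in enumerate(OKRAND_TOKENS)}


def to_klingon_digits(n: int) -> str:
    """Write a non-negative integer with the ten positional digits."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    return "".join(chr(KLINGON_DIGIT_ZERO + int(d)) for d in str(n))


def transliterate(text: str) -> str:
    """Map Latin transliteration to Linux Zone code points, greedily."""
    out = []
    i = 0
    while i < len(text):
        for length in (3, 2, 1):          # try the longest token first
            chunk = text[i:i + length]
            if chunk in LETTERS:
                out.append(LETTERS[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])           # pass unknown characters through
            i += 1
    return "".join(out)
```
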
Other Fictional and Artificial Scripts
--------------------------------------

Since the assignment of the Klingon Linux Unicode block, a registry of
fictional and artificial scripts has been established by John Cowan
<jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
The ConScript Unicode Registry is accessible at:

    http://www.evertype.com/standards/csur/

The ranges used fall at the low end of the End User Zone and can hence
not be normatively assigned, but it is recommended that people who
wish to encode fictional scripts use these codes, in the interest of
interoperability.  For Klingon, CSUR has adopted the Linux encoding.
The CSUR people are driving the addition of Tengwar and Cirth to
Unicode Plane 1; the addition of Klingon to Unicode Plane 1 has been
rejected and so the above encoding remains official.