:mod:`stringprep` --- Internet String Preparation
=================================================

.. module:: stringprep
   :synopsis: String preparation, as per RFC 3454
   :deprecated:
.. moduleauthor:: Martin v. Löwis <martin@v.loewis.de>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


When identifying things (such as host names) on the Internet, it is often
necessary to compare such identifications for "equality". Exactly how this
comparison is executed may depend on the application domain, e.g. whether it
should be case-insensitive or not. It may also be necessary to restrict the
possible identifications, to allow only identifications consisting of
"printable" characters.

:rfc:`3454` defines a procedure for "preparing" Unicode strings in internet
protocols. Before passing strings onto the wire, they are processed with the
preparation procedure, after which they have a certain normalized form. The RFC
defines a set of tables, which can be combined into profiles. Each profile must
define which tables it uses, and what other optional parts of the ``stringprep``
procedure are part of the profile. One example of a ``stringprep`` profile is
``nameprep``, which is used for internationalized domain names.

The module :mod:`stringprep` only exposes the tables from :rfc:`3454`. Because
these tables would be very large if represented as dictionaries or lists, the
module uses the Unicode character database internally. The module source code
itself was generated using the ``mkstringprep.py`` utility.

As a result, these tables are exposed as functions, not as data structures.
There are two kinds of tables in the RFC: sets and mappings. For a set,
:mod:`stringprep` provides the "characteristic function", i.e. a function that
returns ``True`` if the parameter is part of the set. For mappings, it provides
the mapping function: given the key, it returns the associated value. Below is a
list of all functions available in the module.
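
The difference between the two kinds of functions can be seen in a minimal
sketch. It assumes that each function is called with a single-character string;
the variable names are illustrative only.

```python
import stringprep

# Set table: the characteristic function answers "is this character a member?"
is_space = stringprep.in_table_c11(" ")

# Mapping table: the function returns the value associated with the key.
folded = stringprep.map_table_b3("A")

print(is_space)  # True
print(folded)    # a
```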


.. function:: in_table_a1(code)

   Determine whether *code* is in table A.1 (Unassigned code points in Unicode 3.2).
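
As a small illustration, the following sketch checks a string for code points
that were unassigned in Unicode 3.2. The helper name ``has_unassigned`` is
hypothetical, and the sample character U+0221 is used on the assumption that it
was only assigned in a later Unicode version.

```python
import stringprep


def has_unassigned(text):
    """Return True if any character of text is listed in table A.1."""
    return any(stringprep.in_table_a1(ch) for ch in text)


print(has_unassigned("abc"))
print(has_unassigned("ab\u0221"))
```

Whether unassigned code points are acceptable is up to the profile and to the
kind of string being prepared, so a check like this is only one building block.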


.. function:: in_table_b1(code)

   Determine whether *code* is in table B.1 (Commonly mapped to nothing).
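
A profile that uses table B.1 removes the listed characters during the mapping
step. A minimal sketch of that removal, using the hypothetical helper name
``strip_mapped_to_nothing`` and the soft hyphen (U+00AD) as sample input:

```python
import stringprep


def strip_mapped_to_nothing(text):
    """Drop every character that table B.1 maps to nothing."""
    return "".join(ch for ch in text if not stringprep.in_table_b1(ch))


print(strip_mapped_to_nothing("soft\u00adhyphen"))  # softhyphen
```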


.. function:: map_table_b2(code)

   Return the mapped value for *code* according to table B.2 (Mapping for
   case-folding used with NFKC).


.. function:: map_table_b3(code)

   Return the mapped value for *code* according to table B.3 (Mapping for
   case-folding used with no normalization).


.. function:: in_table_c11(code)

   Determine whether *code* is in table C.1.1 (ASCII space characters).


.. function:: in_table_c12(code)

   Determine whether *code* is in table C.1.2 (Non-ASCII space characters).


.. function:: in_table_c11_c12(code)

   Determine whether *code* is in table C.1 (Space characters, union of C.1.1 and
   C.1.2).
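
Some profiles map non-ASCII space characters to the ASCII space instead of
prohibiting them. A minimal sketch of such a mapping, assuming that behaviour is
wanted; the helper name ``normalize_spaces`` is hypothetical:

```python
import stringprep


def normalize_spaces(text):
    """Replace every character from table C.1.2 with an ASCII space."""
    return "".join(
        " " if stringprep.in_table_c12(ch) else ch
        for ch in text
    )


# U+00A0 NO-BREAK SPACE is a non-ASCII space character.
print(normalize_spaces("a\u00a0b") == "a b")  # True
```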


.. function:: in_table_c21(code)

   Determine whether *code* is in table C.2.1 (ASCII control characters).


.. function:: in_table_c22(code)

   Determine whether *code* is in table C.2.2 (Non-ASCII control characters).


.. function:: in_table_c21_c22(code)

   Determine whether *code* is in table C.2 (Control characters, union of C.2.1 and
   C.2.2).


.. function:: in_table_c3(code)

   Determine whether *code* is in table C.3 (Private use).


.. function:: in_table_c4(code)

   Determine whether *code* is in table C.4 (Non-character code points).


.. function:: in_table_c5(code)

   Determine whether *code* is in table C.5 (Surrogate codes).


.. function:: in_table_c6(code)

   Determine whether *code* is in table C.6 (Inappropriate for plain text).


.. function:: in_table_c7(code)

   Determine whether *code* is in table C.7 (Inappropriate for canonical
   representation).


.. function:: in_table_c8(code)

   Determine whether *code* is in table C.8 (Change display properties or are
   deprecated).


.. function:: in_table_c9(code)

   Determine whether *code* is in table C.9 (Tagging characters).
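
The C tables are typically used together to reject prohibited characters. The
sketch below reports which table matched each offending character. The helper
name ``find_prohibited`` and the choice of tables are assumptions made for
illustration; each profile specifies its own set of prohibited tables.

```python
import stringprep

# Table label -> characteristic function (an illustrative selection).
PROHIBITED_TABLES = {
    "C.1.2": stringprep.in_table_c12,
    "C.2": stringprep.in_table_c21_c22,
    "C.3": stringprep.in_table_c3,
    "C.4": stringprep.in_table_c4,
    "C.5": stringprep.in_table_c5,
    "C.6": stringprep.in_table_c6,
    "C.7": stringprep.in_table_c7,
    "C.8": stringprep.in_table_c8,
    "C.9": stringprep.in_table_c9,
}


def find_prohibited(text):
    """Return (character, table label) pairs for every prohibited character."""
    hits = []
    for ch in text:
        for label, in_table in PROHIBITED_TABLES.items():
            if in_table(ch):
                hits.append((ch, label))
                break  # report only the first matching table
    return hits


print(find_prohibited("ok"))                          # []
print(find_prohibited("a\ue000b") == [("\ue000", "C.3")])  # True
```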


.. function:: in_table_d1(code)

   Determine whether *code* is in table D.1 (Characters with bidirectional property
   "R" or "AL").


.. function:: in_table_d2(code)

   Determine whether *code* is in table D.2 (Characters with bidirectional property
   "L").
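
Tables D.1 and D.2 support the bidirectional check that a profile may require.
The following sketch implements a simplified form of that check: a string that
contains any D.1 character must contain no D.2 character, and must begin and end
with a D.1 character. The helper name ``passes_bidi_check`` is hypothetical, and
the separate prohibition of table C.8 characters is left out here.

```python
import stringprep


def passes_bidi_check(text):
    """Simplified bidirectional check based on tables D.1 and D.2."""
    is_rtl = [stringprep.in_table_d1(ch) for ch in text]
    if not any(is_rtl):
        # No right-to-left characters: nothing further to check.
        return True
    if any(stringprep.in_table_d2(ch) for ch in text):
        # Right-to-left and left-to-right characters must not be mixed.
        return False
    # The first and the last character must both be right-to-left.
    return is_rtl[0] and is_rtl[-1]


print(passes_bidi_check("abc"))            # True
print(passes_bidi_check("\u05d0\u05d1"))   # True
print(passes_bidi_check("\u05d0a"))        # False
```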