:mod:`stringprep` --- Internet String Preparation
=================================================

.. module:: stringprep
   :synopsis: String preparation, as per RFC 3454
   :deprecated:
.. moduleauthor:: Martin v. Löwis <martin@v.loewis.de>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


When identifying things (such as host names) on the internet, it is often
necessary to compare such identifications for "equality". Exactly how this
comparison is performed may depend on the application domain, e.g. whether it
should be case-insensitive or not. It may also be necessary to restrict the
possible identifications, for example to allow only identifications consisting
of "printable" characters.

:rfc:`3454` defines a procedure for "preparing" Unicode strings in internet
protocols. Before strings are passed onto the wire, they are processed with the
preparation procedure, after which they have a certain normalized form. The RFC
defines a set of tables, which can be combined into profiles. Each profile must
define which tables it uses, and what other optional parts of the ``stringprep``
procedure are part of the profile. One example of a ``stringprep`` profile is
``nameprep``, which is used for internationalized domain names.

The :mod:`stringprep` module only exposes the tables from :rfc:`3454`. As these
tables would be very large if represented as dictionaries or lists, the
module uses the Unicode character database internally. The module source code
itself was generated using the ``mkstringprep.py`` utility.

As a result, these tables are exposed as functions, not as data structures.
There are two kinds of tables in the RFC: sets and mappings. For a set,
:mod:`stringprep` provides the "characteristic function", i.e. a function that
returns ``True`` if the parameter is part of the set. For mappings, it provides
the mapping function: given the key, it returns the associated value. Below is a
list of all functions available in the module.


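As a rough sketch of the two kinds of functions described above, a set-style
table answers a yes/no question while a mapping-style table returns the
replacement text. The example assumes that each function is called with a
one-character string; the variable names are purely illustrative.

.. code-block:: python

   import stringprep

   # Set-style table: the characteristic function reports membership.
   space_is_ascii_space = stringprep.in_table_c11(" ")
   letter_is_ascii_space = stringprep.in_table_c11("a")

   # Mapping-style table: the function returns the mapped value for the key.
   folded = stringprep.map_table_b2("A")

   print(space_is_ascii_space, letter_is_ascii_space, folded)  # True False a

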
.. function:: in_table_a1(code)

   Determine whether *code* is in table A.1 (Unassigned code points in Unicode
   3.2).


.. function:: in_table_b1(code)

   Determine whether *code* is in table B.1 (Commonly mapped to nothing).


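Tables A.1 and B.1 might be applied to whole strings along the following
lines. This is only an illustrative sketch: ``has_unassigned`` and ``strip_b1``
are hypothetical helper names that are not part of the module, and each table
function is assumed to take a one-character string.

.. code-block:: python

   import stringprep

   def has_unassigned(text):
       """Return True if any character of *text* is listed in table A.1."""
       return any(stringprep.in_table_a1(ch) for ch in text)

   def strip_b1(text):
       """Drop every character that table B.1 maps to nothing."""
       return "".join(ch for ch in text if not stringprep.in_table_b1(ch))

   # U+00AD (soft hyphen) is listed in table B.1, so it is removed.
   print(strip_b1("co\u00adop"))  # coop

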
.. function:: map_table_b2(code)

   Return the mapped value for *code* according to table B.2 (Mapping for
   case-folding used with NFKC).


.. function:: map_table_b3(code)

   Return the mapped value for *code* according to table B.3 (Mapping for
   case-folding used with no normalization).


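The two mapping functions work on one character at a time, and the mapped
value may be longer than the input. A minimal sketch of folding a whole string
could look like this; ``case_fold`` and its ``with_nfkc`` parameter are
hypothetical names introduced only for this example.

.. code-block:: python

   import stringprep

   def case_fold(text, with_nfkc=True):
       """Apply table B.2 (or B.3 when *with_nfkc* is false) to each character."""
       table = stringprep.map_table_b2 if with_nfkc else stringprep.map_table_b3
       # A single character can map to several, e.g. U+00DF maps to "ss".
       return "".join(table(ch) for ch in text)

   print(case_fold("Stra\u00dfe"))  # strasse

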
.. function:: in_table_c11(code)

   Determine whether *code* is in table C.1.1 (ASCII space characters).


.. function:: in_table_c12(code)

   Determine whether *code* is in table C.1.2 (Non-ASCII space characters).


.. function:: in_table_c11_c12(code)

   Determine whether *code* is in table C.1 (Space characters, union of C.1.1 and
   C.1.2).


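The space tables can be told apart as sketched below. The helper
``normalize_spaces`` is a hypothetical example of what a profile might choose
to do with non-ASCII spaces (here, replacing them with U+0020); it is not
something the module provides.

.. code-block:: python

   import stringprep

   def normalize_spaces(text):
       """Replace each table C.1.2 character with an ASCII space."""
       return "".join(" " if stringprep.in_table_c12(ch) else ch for ch in text)

   # U+00A0 (no-break space) and U+3000 (ideographic space) are in C.1.2.
   print(normalize_spaces("a\u00a0b\u3000c"))  # a b c

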
.. function:: in_table_c21(code)

   Determine whether *code* is in table C.2.1 (ASCII control characters).


.. function:: in_table_c22(code)

   Determine whether *code* is in table C.2.2 (Non-ASCII control characters).


.. function:: in_table_c21_c22(code)

   Determine whether *code* is in table C.2 (Control characters, union of C.2.1 and
   C.2.2).


.. function:: in_table_c3(code)

   Determine whether *code* is in table C.3 (Private use).


.. function:: in_table_c4(code)

   Determine whether *code* is in table C.4 (Non-character code points).


.. function:: in_table_c5(code)

   Determine whether *code* is in table C.5 (Surrogate codes).


.. function:: in_table_c6(code)

   Determine whether *code* is in table C.6 (Inappropriate for plain text).


.. function:: in_table_c7(code)

   Determine whether *code* is in table C.7 (Inappropriate for canonical
   representation).


.. function:: in_table_c8(code)

   Determine whether *code* is in table C.8 (Change display properties or are
   deprecated).


.. function:: in_table_c9(code)

   Determine whether *code* is in table C.9 (Tagging characters).


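Because each profile chooses which of the C tables it treats as prohibited,
the characteristic functions can be combined freely. The following sketch uses
one possible selection, loosely modelled on the kind of choice a profile such
as ``nameprep`` makes; the tuple ``PROHIBITED_TABLES`` and the helper
``find_prohibited`` are hypothetical names, and the selection itself is an
assumption rather than a definition of any particular profile.

.. code-block:: python

   import stringprep

   # One possible selection of prohibition tables (an assumption for this example).
   PROHIBITED_TABLES = (
       stringprep.in_table_c12,
       stringprep.in_table_c22,
       stringprep.in_table_c3,
       stringprep.in_table_c4,
       stringprep.in_table_c5,
       stringprep.in_table_c6,
       stringprep.in_table_c7,
       stringprep.in_table_c8,
       stringprep.in_table_c9,
   )

   def find_prohibited(text):
       """Return the characters of *text* that fall in any selected table."""
       return [ch for ch in text if any(check(ch) for check in PROHIBITED_TABLES)]

   # U+200E (left-to-right mark) is listed in table C.8.
   print(find_prohibited("a\u200eb") == ["\u200e"])  # True

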
.. function:: in_table_d1(code)

   Determine whether *code* is in table D.1 (Characters with bidirectional property
   "R" or "AL").


.. function:: in_table_d2(code)

   Determine whether *code* is in table D.2 (Characters with bidirectional property
   "L").

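

Tables D.1 and D.2 are intended for the bidirectional check described in
:rfc:`3454`: a string containing any D.1 character must not contain a D.2
character, and must both start and end with a D.1 character. A minimal sketch
of that check follows; ``check_bidi`` is a hypothetical helper name, and the
separate requirement that table C.8 characters be prohibited is not covered
here.

.. code-block:: python

   import stringprep

   def check_bidi(text):
       """Return True if *text* satisfies the two D.1/D.2 rules sketched above."""
       if not any(stringprep.in_table_d1(ch) for ch in text):
           # No right-to-left characters, so the rules do not apply.
           return True
       if any(stringprep.in_table_d2(ch) for ch in text):
           # Right-to-left and left-to-right characters must not be mixed.
           return False
       # The first and the last character must both come from table D.1.
       return stringprep.in_table_d1(text[0]) and stringprep.in_table_d1(text[-1])

   print(check_bidi("abc"), check_bidi("\u05d0a"))  # True False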
141