:mod:`stringprep` --- Internet String Preparation
=================================================

.. module:: stringprep
   :synopsis: String preparation, as per RFC 3454
.. moduleauthor:: Martin v. Löwis <martin@v.loewis.de>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


.. versionadded:: 2.3

When identifying things (such as host names) on the internet, it is often
necessary to compare such identifications for "equality". Exactly how this
comparison is executed may depend on the application domain, e.g. whether it
should be case-insensitive or not. It may also be necessary to restrict the
possible identifications, to allow only identifications consisting of
"printable" characters.

:rfc:`3454` defines a procedure for "preparing" Unicode strings in internet
protocols. Before passing strings onto the wire, they are processed with the
preparation procedure, after which they have a certain normalized form. The RFC
defines a set of tables, which can be combined into profiles. Each profile must
define which tables it uses, and what other optional parts of the ``stringprep``
procedure are part of the profile. One example of a ``stringprep`` profile is
``nameprep``, which is used for internationalized domain names.

The module :mod:`stringprep` only exposes the tables from RFC 3454. Because
these tables would be very large if represented as dictionaries or lists, the
module uses the Unicode character database internally. The module source code
itself was generated using the ``mkstringprep.py`` utility.

As a result, these tables are exposed as functions, not as data structures.
There are two kinds of tables in the RFC: sets and mappings. For a set,
:mod:`stringprep` provides the "characteristic function", i.e. a function that
returns true if the parameter is part of the set. For mappings, it provides the
mapping function: given the key, it returns the associated value. Below is a
list of all functions available in the module.

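A minimal sketch of the two kinds of functions, assuming Python 3 (where string
literals are Unicode) and that each function is called with a single character.
The sample characters and variable names are chosen only for illustration:

```python
import stringprep

# Set table: the characteristic function returns a boolean.
soft_hyphen = "\u00ad"  # SOFT HYPHEN
in_b1 = stringprep.in_table_b1(soft_hyphen)
not_in_b1 = stringprep.in_table_b1("a")
print(in_b1, not_in_b1)  # True False

# Mapping table: the function returns the mapped string, which may be
# longer than the single input character.
folded_a = stringprep.map_table_b3("A")
folded_sharp_s = stringprep.map_table_b3("\u00df")  # LATIN SMALL LETTER SHARP S
print(folded_a, folded_sharp_s)  # a ss
```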

.. function:: in_table_a1(code)

   Determine whether *code* is in table A.1 (Unassigned code points in Unicode 3.2).


.. function:: in_table_b1(code)

   Determine whether *code* is in table B.1 (Commonly mapped to nothing).


.. function:: map_table_b2(code)

   Return the mapped value for *code* according to table B.2 (Mapping for
   case-folding used with NFKC).


.. function:: map_table_b3(code)

   Return the mapped value for *code* according to table B.3 (Mapping for
   case-folding used with no normalization).

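The B tables are typically applied together as the mapping step of a profile.
The following is a rough sketch modelled on the ``nameprep`` profile, not a
complete implementation. ``map_and_normalize`` is a hypothetical helper name,
and the combination of table B.1, table B.2 and NFKC normalization is an
assumption that other profiles may not share:

```python
import stringprep
import unicodedata


def map_and_normalize(text):
    """Hypothetical helper: apply tables B.1 and B.2, then NFKC."""
    mapped = []
    for ch in text:
        if stringprep.in_table_b1(ch):
            # Table B.1: characters commonly mapped to nothing are dropped.
            continue
        # Table B.2: case folding intended for use with NFKC.
        mapped.append(stringprep.map_table_b2(ch))
    return unicodedata.normalize("NFKC", "".join(mapped))


# The soft hyphen (U+00AD) is removed and the text is case-folded.
print(map_and_normalize("Exa\u00admple"))  # example
```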

.. function:: in_table_c11(code)

   Determine whether *code* is in table C.1.1 (ASCII space characters).


.. function:: in_table_c12(code)

   Determine whether *code* is in table C.1.2 (Non-ASCII space characters).


.. function:: in_table_c11_c12(code)

   Determine whether *code* is in table C.1 (Space characters, union of C.1.1 and
   C.1.2).


.. function:: in_table_c21(code)

   Determine whether *code* is in table C.2.1 (ASCII control characters).


.. function:: in_table_c22(code)

   Determine whether *code* is in table C.2.2 (Non-ASCII control characters).


.. function:: in_table_c21_c22(code)

   Determine whether *code* is in table C.2 (Control characters, union of C.2.1 and
   C.2.2).


.. function:: in_table_c3(code)

   Determine whether *code* is in table C.3 (Private use).


.. function:: in_table_c4(code)

   Determine whether *code* is in table C.4 (Non-character code points).


.. function:: in_table_c5(code)

   Determine whether *code* is in table C.5 (Surrogate codes).


.. function:: in_table_c6(code)

   Determine whether *code* is in table C.6 (Inappropriate for plain text).


.. function:: in_table_c7(code)

   Determine whether *code* is in table C.7 (Inappropriate for canonical
   representation).


.. function:: in_table_c8(code)

   Determine whether *code* is in table C.8 (Change display properties or are
   deprecated).


.. function:: in_table_c9(code)

   Determine whether *code* is in table C.9 (Tagging characters).

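A profile usually treats some of the C tables as prohibited output. The sketch
below collects the characters of a string that fall into any of a chosen group
of tables. ``find_prohibited`` and ``PROHIBITED_TABLES`` are hypothetical names,
and the selection of tables (C.1.2, C.2.2 and C.3 through C.9, as in
``nameprep``) is an assumption; each profile defines its own list:

```python
import stringprep

# Assumed selection of prohibited tables; a real profile defines its own.
PROHIBITED_TABLES = (
    stringprep.in_table_c12,
    stringprep.in_table_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def find_prohibited(text):
    """Hypothetical helper: list the characters found in any selected table."""
    return [ch for ch in text if any(test(ch) for test in PROHIBITED_TABLES)]


print(find_prohibited("abc"))         # []
print(find_prohibited("ab\u00a0c"))   # ['\xa0'] (NO-BREAK SPACE, table C.1.2)
```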

.. function:: in_table_d1(code)

   Determine whether *code* is in table D.1 (Characters with bidirectional property
   "R" or "AL").


.. function:: in_table_d2(code)

   Determine whether *code* is in table D.2 (Characters with bidirectional property
   "L").
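
The D tables support the bidirectional check described in section 6 of
:rfc:`3454`. As a hedged sketch, the hypothetical helper ``satisfies_bidi_rules``
below covers only two of that section's requirements: a string containing any
table D.1 character must contain no table D.2 character, and must both start and
end with a table D.1 character. It does not check the table C.8 prohibition
that the same section also requires:

```python
import stringprep


def satisfies_bidi_rules(text):
    """Hypothetical helper: partial bidi check using tables D.1 and D.2."""
    if not any(stringprep.in_table_d1(ch) for ch in text):
        # No "R" or "AL" characters, so the two rules do not apply.
        return True
    if any(stringprep.in_table_d2(ch) for ch in text):
        # Mixing "R"/"AL" characters with "L" characters is not allowed.
        return False
    # The first and last characters must both be "R" or "AL".
    return stringprep.in_table_d1(text[0]) and stringprep.in_table_d1(text[-1])


print(satisfies_bidi_rules("abc"))            # True
print(satisfies_bidi_rules("\u05d0\u05d1"))   # True (two Hebrew letters)
print(satisfies_bidi_rules("\u05d0a"))        # False
```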
142