:mod:`stringprep` --- Internet String Preparation
=================================================

.. module:: stringprep
   :synopsis: String preparation, as per RFC 3454.
.. moduleauthor:: Martin v. Löwis <martin@v.loewis.de>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


When identifying things (such as host names) on the internet, it is often
necessary to compare such identifications for "equality". Exactly how this
comparison is performed may depend on the application domain, e.g. whether it
should be case-insensitive or not. It may also be necessary to restrict the
possible identifications, to allow only identifications consisting of
"printable" characters.

:rfc:`3454` defines a procedure for "preparing" Unicode strings in internet
protocols. Before passing strings onto the wire, they are processed with the
preparation procedure, after which they have a certain normalized form. The RFC
defines a set of tables, which can be combined into profiles. Each profile must
define which tables it uses, and what other optional parts of the ``stringprep``
procedure are part of the profile. One example of a ``stringprep`` profile is
``nameprep``, which is used for internationalized domain names.

The module :mod:`stringprep` only exposes the tables from RFC 3454. As these
tables would be very large if represented as dictionaries or lists, the
module uses the Unicode character database internally. The module source code
itself was generated using the ``mkstringprep.py`` utility.

As a result, these tables are exposed as functions, not as data structures.
There are two kinds of tables in the RFC: sets and mappings. For a set,
:mod:`stringprep` provides the "characteristic function", i.e. a function that
returns true if the parameter is part of the set. For mappings, it provides the
mapping function: given the key, it returns the associated value. Below is a
list of all functions available in the module.

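As a minimal sketch of the two kinds of functions, the snippet below calls one
characteristic function and one mapping function. It assumes each function is
given a single-character string; the sample characters are chosen purely for
illustration.

```python
import stringprep

# Characteristic function for a set: the result is a boolean.
is_space = stringprep.in_table_c11(" ")
not_space = stringprep.in_table_c11("x")

# Mapping function: the result is the mapped value, returned as a string.
folded = stringprep.map_table_b2("A")

print(is_space, not_space, folded)  # True False a
```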

.. function:: in_table_a1(code)

   Determine whether *code* is in table A.1 (Unassigned code points in Unicode 3.2).


.. function:: in_table_b1(code)

   Determine whether *code* is in table B.1 (Commonly mapped to nothing).

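The following sketch shows one way the first two tables might be used. The
helper names ``strip_mapped_to_nothing`` and ``has_unassigned`` are hypothetical
and not part of the module. Whether unassigned code points are rejected or
tolerated is a decision for the profile and the kind of string being prepared,
so the second helper only reports their presence.

```python
import stringprep

def strip_mapped_to_nothing(text):
    """Drop every character listed in table B.1 (hypothetical helper)."""
    return "".join(ch for ch in text if not stringprep.in_table_b1(ch))

def has_unassigned(text):
    """Report whether any character is unassigned in Unicode 3.2 (table A.1)."""
    return any(stringprep.in_table_a1(ch) for ch in text)

# U+00AD SOFT HYPHEN is listed in table B.1, so it is removed.
cleaned = strip_mapped_to_nothing("soft\u00adhyphen")

# U+0221 had no assignment in Unicode 3.2, so it is flagged.
flagged = has_unassigned("abc\u0221")

print(cleaned, flagged)  # softhyphen True
```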

.. function:: map_table_b2(code)

   Return the mapped value for *code* according to table B.2 (Mapping for
   case-folding used with NFKC).


.. function:: map_table_b3(code)

   Return the mapped value for *code* according to table B.3 (Mapping for
   case-folding used with no normalization).


.. function:: in_table_c11(code)

   Determine whether *code* is in table C.1.1 (ASCII space characters).


.. function:: in_table_c12(code)

   Determine whether *code* is in table C.1.2 (Non-ASCII space characters).


.. function:: in_table_c11_c12(code)

   Determine whether *code* is in table C.1 (Space characters, union of C.1.1 and
   C.1.2).

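A profile may either prohibit the non-ASCII space characters or map them to an
ASCII space. The sketch below illustrates the second option with a hypothetical
helper, ``normalize_spaces``; the choice of mapping is an assumption made for
this example, not something the module prescribes.

```python
import stringprep

def normalize_spaces(text):
    """Replace each character in table C.1.2 with U+0020 (hypothetical helper)."""
    return "".join(" " if stringprep.in_table_c12(ch) else ch for ch in text)

# U+00A0 NO-BREAK SPACE and U+3000 IDEOGRAPHIC SPACE are both in table C.1.2.
result = normalize_spaces("a\u00a0b\u3000c")
print(repr(result))  # 'a b c'
```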

.. function:: in_table_c21(code)

   Determine whether *code* is in table C.2.1 (ASCII control characters).


.. function:: in_table_c22(code)

   Determine whether *code* is in table C.2.2 (Non-ASCII control characters).


.. function:: in_table_c21_c22(code)

   Determine whether *code* is in table C.2 (Control characters, union of C.2.1 and
   C.2.2).


.. function:: in_table_c3(code)

   Determine whether *code* is in table C.3 (Private use).


.. function:: in_table_c4(code)

   Determine whether *code* is in table C.4 (Non-character code points).


.. function:: in_table_c5(code)

   Determine whether *code* is in table C.5 (Surrogate codes).


.. function:: in_table_c6(code)

   Determine whether *code* is in table C.6 (Inappropriate for plain text).


.. function:: in_table_c7(code)

   Determine whether *code* is in table C.7 (Inappropriate for canonical
   representation).


.. function:: in_table_c8(code)

   Determine whether *code* is in table C.8 (Change display properties or are
   deprecated).


.. function:: in_table_c9(code)

   Determine whether *code* is in table C.9 (Tagging characters).

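The C tables are typically combined into a "prohibited output" check. The
sketch below is one possible arrangement: the selection of tables in
``PROHIBITED_CHECKS`` and the helper ``first_prohibited`` are assumptions made
for illustration, and a real profile defines its own selection.

```python
import stringprep

# Tables treated as prohibited in this sketch; a real profile picks its own set.
PROHIBITED_CHECKS = (
    stringprep.in_table_c12,
    stringprep.in_table_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)

def first_prohibited(text):
    """Return the first character found in any selected table, or None."""
    for ch in text:
        if any(check(ch) for check in PROHIBITED_CHECKS):
            return ch
    return None

print(first_prohibited("hello"))            # None
print(ascii(first_prohibited("a\ue000b")))  # '\ue000' (private use, table C.3)
```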

.. function:: in_table_d1(code)

   Determine whether *code* is in table D.1 (Characters with bidirectional property
   "R" or "AL").


.. function:: in_table_d2(code)

   Determine whether *code* is in table D.2 (Characters with bidirectional property
   "L").
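
The two D tables support the bidirectional checks described in section 6 of
:rfc:`3454`. The sketch below covers only the two rules that depend on these
tables; the function name ``check_bidi`` and the use of :exc:`ValueError` are
assumptions made for this example.

```python
import stringprep

def check_bidi(text):
    """Raise ValueError if text breaks the bidirectional rules sketched here."""
    rand_al = [stringprep.in_table_d1(ch) for ch in text]
    if not any(rand_al):
        return  # no "R" or "AL" characters, so nothing more to check
    # A string with any R/AL character must not contain an L character.
    if any(stringprep.in_table_d2(ch) for ch in text):
        raise ValueError("string mixes R/AL characters with L characters")
    # An R/AL character must be both the first and the last character.
    if not (rand_al[0] and rand_al[-1]):
        raise ValueError("R/AL character must be first and last")

check_bidi("abc")           # passes: left-to-right only
check_bidi("\u05d0\u05d1")  # passes: Hebrew letters only
```

A string such as ``"\u05d0a"``, which mixes a Hebrew letter with a Latin
letter, raises :exc:`ValueError` in this sketch.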
139