\section{\module{codecs} ---
         Python codec registry and base classes}

\declaremodule{standard}{codecs}
\modulesynopsis{Encode and decode data and streams.}
\moduleauthor{Marc-Andre Lemburg}{mal@lemburg.com}
\sectionauthor{Marc-Andre Lemburg}{mal@lemburg.com}


\index{Unicode}
\index{Codecs}
\indexii{Codecs}{encode}
\indexii{Codecs}{decode}
\index{streams}
\indexii{stackable}{streams}


This module defines base classes for standard Python codecs (encoders
and decoders) and provides access to the internal Python codec
registry, which manages the codec lookup process.

It defines the following functions:

\begin{funcdesc}{register}{search_function}
Register a codec search function. Search functions are expected to
take one argument, the encoding name in all lower case letters, and
return a tuple of functions \code{(\var{encoder}, \var{decoder},
\var{stream_reader}, \var{stream_writer})} with the following meaning:

  \var{encoder} and \var{decoder}: These must be functions or methods
  which have the same interface as the \method{encode()} and
  \method{decode()} methods of Codec instances (see Codec Interface).
  The functions/methods are expected to work in a stateless mode.

  \var{stream_reader} and \var{stream_writer}: These have to be
  factory functions providing the following interface:

	\code{factory(\var{stream}, \var{errors}='strict')}

  The factory functions must return objects providing the interfaces
  defined by the base classes \class{StreamReader} and
  \class{StreamWriter}, respectively. Stream codecs can
  maintain state.

  Possible values for \var{errors} are \code{'strict'} (raise an
  exception in case of an encoding error), \code{'replace'} (replace
  malformed data with a suitable replacement marker, e.g. \code{'?'})
  and \code{'ignore'} (ignore malformed data and continue without
  further notice).

If a search function cannot find a given encoding, it should
return \code{None}.
\end{funcdesc}
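
As an illustration of the search function protocol described above, the
following minimal sketch registers a search function for a made-up alias.
The names \code{demo_search} and \code{demo_latin1} are hypothetical and
exist only for this example; the sketch simply hands back the four entries
of the existing Latin-1 codec. It assumes a Python version in which
\function{lookup()} returns an object that unpacks as the 4-tuple described
above.

```python
import codecs


def demo_search(name):
    """Hypothetical search function: knows exactly one alias."""
    # The registry passes the encoding name in lower case.
    if name == "demo_latin1":
        # Reuse the entries of an existing codec instead of writing new ones.
        encoder, decoder, stream_reader, stream_writer = codecs.lookup("latin-1")
        return (encoder, decoder, stream_reader, stream_writer)
    # Unknown encoding: tell the registry to try the next search function.
    return None


codecs.register(demo_search)

# The alias now resolves through the registry.
encoder, decoder, stream_reader, stream_writer = codecs.lookup("demo_latin1")
encoded, consumed = encoder("caf\xe9")
```

The stateless \var{encoder} returns both the encoded data and the number of
input items it consumed.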

\begin{funcdesc}{lookup}{encoding}
Look up a codec tuple in the Python codec registry and return the
function tuple as defined above.

Encodings are first looked up in the registry's cache. If not found,
the list of registered search functions is scanned. If no codec tuple
is found, a \exception{LookupError} is raised. Otherwise, the codec
tuple is stored in the cache and returned to the caller.
\end{funcdesc}
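
The lookup behaviour can be sketched as follows. The encoding name
\code{no_such_encoding_demo} is a made-up name that is assumed not to be
registered anywhere; the variable names are illustrative only.

```python
import codecs

# A successful lookup yields the four codec entries.
encoder, decoder, stream_reader, stream_writer = codecs.lookup("utf-8")

data, length = encoder("abc")       # stateless encode
text, consumed = decoder(b"abc")    # stateless decode

# An unknown encoding raises LookupError once every search function
# has returned None.
try:
    codecs.lookup("no_such_encoding_demo")
except LookupError:
    found = False
else:
    found = True
```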

To simplify working with encoded files or streams, the module
also defines these utility functions:

\begin{funcdesc}{open}{filename, mode\optional{, encoding=None, errors='strict', buffering=1}}
Open an encoded file using the given \var{mode} and return
a wrapped version providing transparent encoding/decoding.

Note: The wrapped version will only accept the object format defined
by the codecs, i.e.\ Unicode objects for most built-in codecs. Output is
also codec-dependent and will usually be Unicode as well.

\var{encoding} specifies the encoding which is to be used for
the file.

\var{errors} may be given to define the error handling. It defaults
to \code{'strict'}, which causes a \exception{ValueError} to be raised
in case an encoding error occurs.

\var{buffering} has the same meaning as for the built-in
\function{open()} function. It defaults to line buffered.
\end{funcdesc}
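
A small round trip may make the transparent encoding/decoding concrete.
This is only a sketch: the temporary file name \code{demo.txt} and the
sample text are assumptions made for the example, and it is written for a
Python version in which string literals are Unicode objects and the
wrapped file can be used in a \code{with} statement.

```python
import codecs
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Unicode text goes in; UTF-8 encoded bytes land on disk.
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write("gr\xfc\xdf")

# Inspect the raw bytes without any decoding.
with open(path, "rb") as f:
    raw = f.read()

# Reading through the wrapper decodes back to Unicode.
with codecs.open(path, "r", encoding="utf-8") as f:
    text = f.read()

os.remove(path)
```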

\begin{funcdesc}{EncodedFile}{file, input\optional{, output=None, errors='strict'}}
Return a wrapped version of \var{file} which provides transparent
encoding translation.

Strings written to the wrapped file are interpreted according to the
given \var{input} encoding and then written to the original file as
strings using the \var{output} encoding. The intermediate encoding will
usually be Unicode but depends on the specified codecs.

If \var{output} is not given, it defaults to \var{input}.

\var{errors} may be given to define the error handling. It defaults to
\code{'strict'}, which causes a \exception{ValueError} to be raised in
case an encoding error occurs.
\end{funcdesc}
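
The translation performed by \function{EncodedFile()} can be sketched
with an in-memory file. The use of \code{io.BytesIO} as the underlying
file and the choice of Latin-1 and UTF-8 are assumptions made for the
example; the two encodings are passed positionally, in the order
\var{input}, \var{output}.

```python
import codecs
import io

# In-memory stand-in for the original file.
raw = io.BytesIO()

# Data written is taken to be Latin-1 and stored as UTF-8.
wrapped = codecs.EncodedFile(raw, "latin-1", "utf-8")
wrapped.write(b"caf\xe9")
stored = raw.getvalue()

# Reading through a fresh wrapper translates in the other direction:
# UTF-8 in the file comes back as Latin-1.
reader = codecs.EncodedFile(io.BytesIO(stored), "latin-1", "utf-8")
recovered = reader.read()
```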



...XXX document codec base classes...



The module also provides the following constants, which are useful
for reading and writing platform-dependent files:

\begin{datadesc}{BOM}
\dataline{BOM_BE}
\dataline{BOM_LE}
\dataline{BOM32_BE}
\dataline{BOM32_LE}
\dataline{BOM64_BE}
\dataline{BOM64_LE}
These constants define the byte order marks (BOM) used in data
streams to indicate the byte order used in the stream or file.
\constant{BOM} is either \constant{BOM_BE} or \constant{BOM_LE},
depending on the platform's native byte order, while the others
represent big-endian (\samp{_BE} suffix) and little-endian
(\samp{_LE} suffix) byte order using 32-bit and 64-bit encodings.
\end{datadesc}
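
One possible use of these constants is to guess the byte order of a data
stream from its first bytes. The helper \code{detect_byte_order} below is
hypothetical and deliberately simple: it checks only \constant{BOM_BE} and
\constant{BOM_LE} and makes no attempt to distinguish the wider marks.

```python
import codecs
import sys

# BOM mirrors the byte order of the machine running the interpreter.
native = codecs.BOM_LE if sys.byteorder == "little" else codecs.BOM_BE


def detect_byte_order(data):
    """Hypothetical helper: return 'big', 'little' or None."""
    if data.startswith(codecs.BOM_BE):
        return "big"
    if data.startswith(codecs.BOM_LE):
        return "little"
    # No recognised mark at the start of the data.
    return None


sample = codecs.BOM_LE + b"a\x00"
order = detect_byte_order(sample)
```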