% Doc/lib/libcodecs.tex (platform/external/python/cpython3)

% Blame: Fred Drake, b7979c7, 2000-04-06 14:21:58 +0000
\section{\module{codecs} ---
         Python codec registry and base classes}

\declaremodule{standard}{codecs}
\modulesynopsis{Encode and decode data and streams.}
\moduleauthor{Marc-Andre Lemburg}{mal@lemburg.com}
\sectionauthor{Marc-Andre Lemburg}{mal@lemburg.com}


\index{Unicode}
\index{Codecs}
\indexii{Codecs}{encode}
\indexii{Codecs}{decode}
\index{streams}
\indexii{stackable}{streams}


This module defines base classes for standard Python codecs (encoders
and decoders) and provides access to the internal Python codec
registry, which manages the codec lookup process.

It defines the following functions:

\begin{funcdesc}{register}{search_function}
Register a codec search function. Search functions are expected to
take one argument, the encoding name in all lower case letters, and
return a tuple of functions \code{(\var{encoder}, \var{decoder},
\var{stream_reader}, \var{stream_writer})} taking the following
arguments:

\var{encoder} and \var{decoder}: These must be functions or methods
which have the same interface as the \method{encode()} and
\method{decode()} methods of \class{Codec} instances (see Codec
Interface). The functions/methods are expected to work in a
stateless mode.

\var{stream_reader} and \var{stream_writer}: These have to be
factory functions providing the following interface:

\code{factory(\var{stream}, \var{errors}='strict')}

The factory functions must return objects providing the interfaces
defined by the base classes \class{StreamReader} and
\class{StreamWriter}, respectively. Stream codecs can maintain state.
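
As an illustration of the factory interface, the following sketch
obtains the stream factories of the built-in UTF-8 codec through
\function{lookup()} and calls them as \code{factory(stream, errors)}.
It assumes a Python 3 interpreter, where the underlying streams carry
bytes; the in-memory \class{io.BytesIO} objects merely stand in for
real files.

```python
import codecs
import io

# Positions 2 and 3 of the codec tuple are the stream factories.
_, _, stream_reader, stream_writer = codecs.lookup("utf-8")

# A writer encodes text on its way into the underlying byte stream.
buffer = io.BytesIO()
writer = stream_writer(buffer, "strict")
writer.write("gr\u00fc\u00dfe")
encoded = buffer.getvalue()

# A reader decodes bytes coming out of the underlying byte stream.
reader = stream_reader(io.BytesIO(encoded), "strict")
decoded = reader.read()
```

Unlike the stateless \var{encoder} and \var{decoder} functions, the
reader and writer objects are bound to one stream and may buffer
partial data between calls.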

Possible values for \var{errors} are \code{'strict'} (raise an
exception in case of an encoding error), \code{'replace'} (replace
malformed data with a suitable replacement marker, e.g. \code{'?'})
and \code{'ignore'} (ignore malformed data and continue without
further notice).
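
The three error modes can be compared with a small sketch. It assumes
a Python 3 interpreter and uses the stateless encoder of the built-in
ASCII codec; the sample text is arbitrary.

```python
import codecs

encoder = codecs.lookup("ascii")[0]   # stateless encoder of the ASCII codec
text = "abc\u20ac"                    # the euro sign cannot be encoded as ASCII

# 'replace' substitutes a marker, 'ignore' silently drops the character.
replaced, _ = encoder(text, "replace")
ignored, _ = encoder(text, "ignore")

# 'strict' raises; UnicodeEncodeError is a subclass of ValueError.
try:
    encoder(text, "strict")
    strict_failed = False
except ValueError:
    strict_failed = True
```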

If a search function cannot find a given encoding, it should
return \code{None}.
\end{funcdesc}
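
A minimal sketch of a search function, assuming a Python 3
interpreter. The function name \code{demo_search} and the encoding
name \code{demoalias} are hypothetical and exist only for this
example; the tuple it returns simply reuses the functions of the
built-in Latin-1 codec.

```python
import codecs

def demo_search(name):
    """Hypothetical search function: serve 'demoalias' with the Latin-1 codec."""
    if name != "demoalias":
        return None          # unknown encoding: let other search functions try
    encoder, decoder, stream_reader, stream_writer = codecs.lookup("latin-1")
    return (encoder, decoder, stream_reader, stream_writer)

codecs.register(demo_search)

# The registry lower-cases the requested name before calling demo_search.
encoder, decoder, _, _ = codecs.lookup("DemoAlias")
data, consumed = encoder("caf\u00e9")
text, _ = decoder(data)
```

The name used here deliberately avoids hyphens, spaces and
underscores, since how those characters are normalized before the
search function is called may differ between Python versions.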

\begin{funcdesc}{lookup}{encoding}
Look up a codec tuple in the Python codec registry and return the
function tuple as defined above.

Encodings are first looked up in the registry's cache. If not found,
the list of registered search functions is scanned. If no codec tuple
is found, a \exception{LookupError} is raised. Otherwise, the codec
tuple is stored in the cache and returned to the caller.
\end{funcdesc}
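
The lookup behaviour described above can be sketched as follows,
assuming a Python 3 interpreter. The encoding name
\code{no-such-encoding-demo} is made up so that the lookup fails.

```python
import codecs

# The result unpacks into the four functions described for register().
encoder, decoder, stream_reader, stream_writer = codecs.lookup("utf-8")

# Stateless functions return a (result, length consumed) pair.
data, chars_consumed = encoder("h\u00e9llo")
text, bytes_consumed = decoder(data)

# A second lookup is served from the registry's cache.
same_object = codecs.lookup("utf-8") is codecs.lookup("utf-8")

# Unknown encodings raise LookupError.
try:
    codecs.lookup("no-such-encoding-demo")
    lookup_failed = False
except LookupError:
    lookup_failed = True
```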

To simplify working with encoded files or streams, the module
also defines these utility functions:

\begin{funcdesc}{open}{filename, mode\optional{, encoding=None, errors='strict', buffering=1}}
Open an encoded file using the given \var{mode} and return
a wrapped version providing transparent encoding/decoding.

Note: The wrapped version will only accept the object format defined
by the codecs, i.e. Unicode objects for most built-in codecs. Output is
also codec-dependent and will usually be Unicode as well.

\var{encoding} specifies the encoding which is to be used for the
file.

\var{errors} may be given to define the error handling. It defaults
to \code{'strict'}, which causes a \exception{ValueError} to be raised
in case an encoding error occurs.

\var{buffering} has the same meaning as for the built-in
\function{open()} function. It defaults to line buffered.
\end{funcdesc}
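
A short sketch of a round trip through \function{codecs.open()},
assuming a Python 3 interpreter in which the function is still
available. The temporary directory and the file name \code{demo.txt}
are placeholders chosen for the example.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text written to the wrapper is encoded before it reaches the file.
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write("na\u00efve caf\u00e9\n")

# Reading through the wrapper decodes the stored bytes again.
with codecs.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Opening the file without the wrapper shows the encoded bytes on disk.
with open(path, "rb") as raw:
    raw_bytes = raw.read()
```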

\begin{funcdesc}{EncodedFile}{file, input\optional{, output=None, errors='strict'}}
Return a wrapped version of \var{file} which provides transparent
encoding translation.

Strings written to the wrapped file are interpreted according to the
given \var{input} encoding and then written to the original file as
strings using the \var{output} encoding. The intermediate encoding will
usually be Unicode but depends on the specified codecs.

If \var{output} is not given, it defaults to \var{input}.

\var{errors} may be given to define the error handling. It defaults to
\code{'strict'}, which causes a \exception{ValueError} to be raised in
case an encoding error occurs.
\end{funcdesc}
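
The translation performed by \function{EncodedFile()} can be sketched
as follows, assuming a Python 3 interpreter, where the data on both
sides of the wrapper are bytes. The two encodings are passed
positionally because the parameter names may differ from the
\var{input} and \var{output} names used above; \class{io.BytesIO}
stands in for a real file.

```python
import codecs
import io

raw = io.BytesIO()
# Data handed to the wrapper is Latin-1; the underlying file stores UTF-8.
wrapped = codecs.EncodedFile(raw, "latin-1", "utf-8")

wrapped.write("caf\u00e9".encode("latin-1"))
stored = raw.getvalue()          # what actually reached the file

# Reading translates in the opposite direction, back to Latin-1.
raw.seek(0)
read_back = wrapped.read()
```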



...XXX document codec base classes...



The module also provides the following constants, which are useful
for reading and writing platform-dependent files:

\begin{datadesc}{BOM}
\dataline{BOM_BE}
\dataline{BOM_LE}
\dataline{BOM32_BE}
\dataline{BOM32_LE}
\dataline{BOM64_BE}
\dataline{BOM64_LE}
These constants define the byte order marks (BOMs) used in data
streams to indicate the byte order used in the stream or file.
\constant{BOM} is either \constant{BOM_BE} or \constant{BOM_LE},
depending on the platform's native byte order, while the others
represent big-endian (\samp{_BE} suffix) and little-endian
(\samp{_LE} suffix) byte order using 32-bit and 64-bit encodings.
\end{datadesc}
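
A minimal sketch of how the constants might be used, assuming a
Python 3 interpreter. The helper \code{detect_byte_order} is
hypothetical and checks only the two-byte marks \constant{BOM_BE} and
\constant{BOM_LE}; it is not a complete encoding detector.

```python
import codecs
import sys

# BOM matches whichever mark corresponds to the platform's byte order.
native = codecs.BOM_LE if sys.byteorder == "little" else codecs.BOM_BE
bom_is_native = codecs.BOM == native

def detect_byte_order(data):
    """Hypothetical helper: report the byte order announced by a leading BOM."""
    if data.startswith(codecs.BOM_BE):
        return "big"
    if data.startswith(codecs.BOM_LE):
        return "little"
    return None              # no byte order mark present

big = detect_byte_order(codecs.BOM_BE + b"\x00A")
little = detect_byte_order(codecs.BOM_LE + b"A\x00")
unmarked = detect_byte_order(b"plain bytes")
```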