Blame - Doc/lib/libcsv.tex - platform/external/python/cpython3

blob: 283e40152138c65d5608a990468ccebf95f7894b [file] [log] [blame]

Skip Montanaro	b4a0417	2003-03-20 23:29:12 +0000	[diff] [blame]	1	\section{\module{csv} --- CSV File Reading and Writing}
				2
				3	\declaremodule{standard}{csv}
				4	\modulesynopsis{Write and read tabular data to and from delimited files.}
				5
				6	\versionadded{2.3}
				7	\index{csv}
				8	\indexii{data}{tabular}
				9
				10	The so-called CSV (Comma Separated Values) format is the most common import
				11	and export format for spreadsheets and databases. There is no ``CSV
				12	standard'', so the format is operationally defined by the many applications
				13	which read and write it. The lack of a standard means that subtle
				14	differences often exist in the data produced and consumed by different
				15	applications. These differences can make it annoying to process CSV files
				16	from multiple sources. Still, while the delimiters and quoting characters
				17	vary, the overall format is similar enough that it is possible to write a
				18	single module which can efficiently manipulate such data, hiding the details
				19	of reading and writing the data from the programmer.
				20
				21	The \module{csv} module implements classes to read and write tabular data in
				22	CSV format. It allows programmers to say, ``write this data in the format
				23	preferred by Excel,'' or ``read data from this file which was generated by
				24	Excel,'' without knowing the precise details of the CSV format used by
				25	Excel. Programmers can also describe the CSV formats understood by other
				26	applications or define their own special-purpose CSV formats.
				27
				28	The \module{csv} module's \class{reader} and \class{writer} objects read and
				29	write sequences. Programmers can also read and write data in dictionary
				30	form using the \class{DictReader} and \class{DictWriter} classes.
				31
				32	\note{The first version of the \module{csv} module doesn't support Unicode
				33	input. Also, there are currently some issues regarding \ASCII{} NUL
				34	characters. Accordingly, all input should generally be plain \ASCII{} to be
				35	safe. These restrictions will be removed in the future.}
				36
				37	\begin{seealso}
				38	% \seemodule{array}{Arrays of uniformly types numeric values.}
				39	\seepep{305}{CSV File API}
				40	{The Python Enhancement Proposal which proposed this addition
				41	to Python.}
				42	\end{seealso}
				43
				44
				45	\subsection{Module Contents}
				46
				47
				48	The \module{csv} module defines the following functions:
				49
				50	\begin{funcdesc}{reader}{csvfile\optional{,
				51	dialect=\code{'excel'}\optional{, fmtparam}}}
				52	Return a reader object which will iterate over lines in the given
				53	{}\var{csvfile}. \var{csvfile} can be any object which supports the
				54	iterator protocol and returns a string each time its \method{next}
				55	method is called. An optional \var{dialect} parameter can be given
				56	which is used to define a set of parameters specific to a particular CSV
				57	dialect. It may be an instance of a subclass of the \class{Dialect}
				58	class or one of the strings returned by the \function{list_dialects}
				59	function. The other optional {}\var{fmtparam} keyword arguments can be
				60	given to override individual formatting parameters in the current
				61	dialect. For more information about the dialect and formatting
				62	parameters, see section~\ref{fmt-params}, ``Dialects and Formatting
				63	Parameters'' for details of these parameters.
				64
				65	All data read are returned as strings. No automatic data type
				66	conversion is performed.
				67	\end{funcdesc}
				68
				69	\begin{funcdesc}{writer}{csvfile\optional{,
				70	dialect=\code{'excel'}\optional{, fmtparam}}}
				71	Return a writer object responsible for converting the user's data into
				72	delimited strings on the given file-like object. An optional
				73	{}\var{dialect} parameter can be given which is used to define a set of
				74	parameters specific to a particular CSV dialect. It may be an instance
				75	of a subclass of the \class{Dialect} class or one of the strings
				76	returned by the \function{list_dialects} function. The other optional
				77	{}\var{fmtparam} keyword arguments can be given to override individual
				78	formatting parameters in the current dialect. For more information
				79	about the dialect and formatting parameters, see
				80	section~\ref{fmt-params}, ``Dialects and Formatting Parameters'' for
				81	details of these parameters. To make it as easy as possible to
				82	interface with modules which implement the DB API, the value
				83	\constant{None} is written as the empty string. While this isn't a
				84	reversible transformation, it makes it easier to dump SQL NULL data values
				85	to CSV files without preprocessing the data returned from a
				86	\code{cursor.fetch*()} call. All other non-string data are stringified
				87	with \function{str()} before being written.
				88	\end{funcdesc}
				89
				90	\begin{funcdesc}{register_dialect}{name, dialect}
				91	Associate \var{dialect} with \var{name}. \var{dialect} must be a subclass
				92	of \class{csv.Dialect}. \var{name} must be a string or Unicode object.
				93	\end{funcdesc}
				94
				95	\begin{funcdesc}{unregister_dialect}{name}
				96	Delete the dialect associated with \var{name} from the dialect registry. An
				97	\exception{Error} is raised if \var{name} is not a registered dialect
				98	name.
				99	\end{funcdesc}
				100
				101	\begin{funcdesc}{get_dialect}{name}
				102	Return the dialect associated with \var{name}. An \exception{Error} is
				103	raised if \var{name} is not a registered dialect name.
				104	\end{funcdesc}
				105
				106	\begin{funcdesc}{list_dialects}{}
				107	Return the names of all registered dialects.
				108	\end{funcdesc}
				109
				110
				111	The \module{csv} module defines the following classes:
				112
				113	\begin{classdesc}{DictReader}{csvfile, fieldnames\optional{,
				114	restkey=\code{None}\optional{,
				115	restval=\code{None}\optional{,
				116	dialect=\code{'excel'}\optional{,
				117	fmtparam}}}}}
				118	Create an object which operates like a regular reader but maps the
				119	information read into a dict whose keys are given by the \var{fieldnames}
				120	parameter. If the row read has fewer fields than the fieldnames sequence,
				121	the value of \var{restval} will be used as the default value. If the row
				122	read has more fields than the fieldnames sequence, the remaining data is
				123	added as a sequence keyed by the value of \var{restkey}. If the row read
				124	has fewer fields than the fieldnames sequence, the remaining keys take the
				125	value of the optiona \var{restval} parameter. All other parameters are
				126	interpreted as for regular readers.
				127	\end{classdesc}
				128
				129
				130	\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
				131	restval=""\optional{,
				132	extrasaction=\code{'raise'}\optional{,
				133	dialect=\code{'excel'}\optional{, fmtparam}}}}}
				134	Create an object which operates like a regular writer but maps dictionaries
				135	onto output rows. The \var{fieldnames} parameter identifies the order in
				136	which values in the dictionary passed to the \method{writerow()} method are
				137	written to the \var{csvfile}. The optional \var{restval} parameter
				138	specifies the value to be written if the dictionary is missing a key in
				139	\var{fieldnames}. If the dictionary passed to the \method{writerow()}
				140	method contains a key not found in \var{fieldnames}, the optional
				141	\var{extrasaction} parameter indicates what action to take. If it is set
				142	to \code{'raise'} a \exception{ValueError} is raised. If it is set to
				143	\code{'ignore'}, extra values in the dictionary are ignored. All other
				144	parameters are interpreted as for regular writers.
				145	\end{classdesc}
				146
				147
				148	\begin{classdesc*}{Dialect}{}
				149	The \class{Dialect} class is a container class relied on primarily for its
				150	attributes, which are used to define the parameters for a specific
				151	\class{reader} or \class{writer} instance. Dialect objects support the
				152	following data attributes:
				153
				154	\begin{memberdesc}[string]{delimiter}
				155	A one-character string used to separate fields. It defaults to \code{","}.
				156	\end{memberdesc}
				157
				158	\begin{memberdesc}[boolean]{doublequote}
				159	Controls how instances of \var{quotechar} appearing inside a field should be
				160	themselves be quoted. When \constant{True}, the character is doubledd.
				161	When \constant{False}, the \var{escapechar} must be a one-character string
				162	which is used as a prefix to the \var{quotechar}. It defaults to
				163	\constant{True}.
				164	\end{memberdesc}
				165
				166	\begin{memberdesc}{escapechar}
				167	A one-character string used to escape the \var{delimiter} if \var{quoting}
				168	is set to \constant{QUOTE_NONE}. It defaults to \constant{None}.
				169	\end{memberdesc}
				170
				171	\begin{memberdesc}[string]{lineterminator}
				172	The string used to terminate lines in the CSV file. It defaults to
				173	\code{"\e r\e n"}.
				174	\end{memberdesc}
				175
				176	\begin{memberdesc}[string]{quotechar}
				177	A one-character string used to quote elements containing the \var{delimiter}
				178	or which start with the \var{quotechar}. It defaults to \code{'"'}.
				179	\end{memberdesc}
				180
				181	\begin{memberdesc}[integer]{quoting}
				182	Controls when quotes should be generated by the writer. It can take on any
				183	of the \code{QUOTE_*} constants defined below and defaults to
				184	\constant{QUOTE_MINIMAL}.
				185	\end{memberdesc}
				186
				187	\begin{memberdesc}[boolean]{skipinitialspace}
				188	When \constant{True}, whitespace immediately following the \var{delimiter}
				189	is ignored. The default is \constant{False}.
				190	\end{memberdesc}
				191
				192	\end{classdesc*}
				193
				194	The \module{csv} module defines the following constants:
				195
				196	\begin{datadesc}{QUOTE_ALWAYS}
				197	Instructs \class{writer} objects to quote all fields.
				198	\end{datadesc}
				199
				200	\begin{datadesc}{QUOTE_MINIMAL}
				201	Instructs \class{writer} objects to only quote those fields which contain
				202	the current \var{delimiter} or begin with the current \var{quotechar}.
				203	\end{datadesc}
				204
				205	\begin{datadesc}{QUOTE_NONNUMERIC}
				206	Instructs \class{writer} objects to quote all non-numeric fields.
				207	\end{datadesc}
				208
				209	\begin{datadesc}{QUOTE_NONE}
				210	Instructs \class{writer} objects to never quote fields. When the current
				211	\var{delimiter} occurs in output data it is preceded by the current
				212	\var{escapechar} character. When \constant{QUOTE_NONE} is in effect, it
				213	is an error not to have a single-character \var{escapechar} defined, even if
				214	no data to be written contains the \var{delimiter} character.
				215	\end{datadesc}
				216
				217
				218	The \module{csv} module defines the following exception:
				219
				220	\begin{excdesc}{Error}
				221	Raised by any of the functions when an error is detected.
				222	\end{excdesc}
				223
				224
				225	\subsection{Dialects and Formatting Parameters\label{fmt-params}}
				226
				227	To make it easier to specify the format of input and output records,
				228	specific formatting parameters are grouped together into dialects. A
				229	dialect is a subclass of the \class{Dialect} class having a set of specific
				230	methods and a single \method{validate()} method. When creating \class{reader}
				231	or \class{writer} objects, the programmer can specify a string or a subclass
				232	of the \class{Dialect} class as the dialect parameter. In addition to, or
				233	instead of, the \var{dialect} parameter, the programmer can also specify
				234	individual formatting parameters, which have the same names as the
				235	attributes defined above for the \class{Dialect} class.
				236
				237
				238	\subsection{Reader Objects}
				239
				240	\class{DictReader} and \var{reader} objects have the following public
				241	methods:
				242
				243	\begin{methoddesc}{next}{}
				244	Return the next row of the reader's iterable object as a list, parsed
				245	according to the current dialect.
				246	\end{methoddesc}
				247
				248
				249	\subsection{Writer Objects}
				250
				251	\class{DictWriter} and \var{writer} objects have the following public
				252	methods:
				253
				254	\begin{methoddesc}{writerow}{row}
				255	Write the \var{row} parameter to the writer's file object, formatted
				256	according to the current dialect.
				257	\end{methoddesc}
				258
				259	\begin{methoddesc}{writerows}{rows}
				260	Write all the \var{rows} parameters to the writer's file object, formatted
				261	according to the current dialect.
				262	\end{methoddesc}
				263
				264
				265	\subsection{Examples}
				266
				267	The ``Hello, world'' of csv reading is
				268
				269	\begin{verbatim}
				270	reader = csv.reader(file("some.csv"))
				271	for row in reader:
				272	print row
				273	\end{verbatim}
				274
				275	The corresponding simplest possible writing example is
				276
				277	\begin{verbatim}
				278	writer = csv.writer(file("some.csv", "w"))
				279	for row in someiterable:
				280	writer.writerow(row)
				281	\end{verbatim}