\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}
\sectionauthor{Skip Montanaro}{skip@pobox.com}

\versionadded{2.3}
\index{csv}
\indexii{data}{tabular}

The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases. There is no ``CSV
standard'', so the format is operationally defined by the many applications
which read and write it. The lack of a standard means that subtle
differences often exist in the data produced and consumed by different
applications. These differences can make it annoying to process CSV files
from multiple sources. Still, while the delimiters and quoting characters
vary, the overall format is similar enough that it is possible to write a
single module which can efficiently manipulate such data, hiding the details
of reading and writing the data from the programmer.

The \module{csv} module implements classes to read and write tabular data in
CSV format. It allows programmers to say, ``write this data in the format
preferred by Excel,'' or ``read data from this file which was generated by
Excel,'' without knowing the precise details of the CSV format used by
Excel. Programmers can also describe the CSV formats understood by other
applications or define their own special-purpose CSV formats.

The \module{csv} module's \class{reader} and \class{writer} objects read and
write sequences. Programmers can also read and write data in dictionary
form using the \class{DictReader} and \class{DictWriter} classes.

\begin{notice}
This version of the \module{csv} module doesn't support Unicode
input. Also, there are currently some issues regarding \ASCII{} NUL
characters. Accordingly, all input should be UTF-8 or printable
\ASCII{} to be safe; see the examples in section~\ref{csv-examples}.
These restrictions will be removed in the future.
\end{notice}

\begin{seealso}
% \seemodule{array}{Arrays of uniformly typed numeric values.}
\seepep{305}{CSV File API}
{The Python Enhancement Proposal which proposed this addition
to Python.}
\end{seealso}


\subsection{Module Contents \label{csv-contents}}

The \module{csv} module defines the following functions:

\begin{funcdesc}{reader}{csvfile\optional{,
dialect=\code{'excel'}}\optional{, fmtparam}}
Return a reader object which will iterate over lines in the given
\var{csvfile}. \var{csvfile} can be any object which supports the
iterator protocol and returns a string each time its \method{next()}
method is called; file objects and list objects are both suitable.
If \var{csvfile} is a file object, it must be opened with
the \code{'b'} flag on platforms where that makes a difference. An optional
\var{dialect} parameter can be given which is used to define a set of
parameters specific to a particular CSV dialect. It may be an instance
of a subclass of the \class{Dialect} class or one of the strings returned
by the \function{list_dialects} function. The other optional
\var{fmtparam} keyword arguments can be given to override individual
formatting parameters in the current dialect. For full details about the
dialect and formatting parameters, see section~\ref{csv-fmt-params},
``Dialects and Formatting Parameters''.

All data read are returned as strings. No automatic data type
conversion is performed.

\versionchanged[
The parser is now stricter with respect to multi-line quoted
fields. Previously, if a line ended within a quoted field without a
terminating newline character, a newline would be inserted into the
returned field. This behavior caused problems when reading files
which contained carriage return characters within fields. The
behavior was changed to return the field without inserting newlines. As
a consequence, if newlines embedded within fields are important, the
input should be split into lines in a manner which preserves the newline
characters.]{2.5}

\end{funcdesc}
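Because \var{csvfile} only has to yield strings, a list makes a convenient
stand-in for a file when experimenting. The following sketch (the sample
lines are invented for illustration) shows that every field, including the
number, comes back as a string, and that a quoted field spanning two lines
keeps its embedded newline as long as each input line still ends with its
newline character:

\begin{verbatim}
import csv

# A list of strings is an acceptable csvfile; each item is one input line.
lines = ['name,qty,note\r\n', 'apple,3,"red, round"\r\n']
rows = list(csv.reader(lines))
# rows == [['name', 'qty', 'note'], ['apple', '3', 'red, round']]
# '3' is still a string: no type conversion is performed.

# A quoted field spanning two input lines: the newline survives because
# it is still present at the end of the first line.
multi = ['a,"first\n', 'second",b\n']
joined = list(csv.reader(multi))
# joined == [['a', 'first\nsecond', 'b']]
\end{verbatim}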

\begin{funcdesc}{writer}{csvfile\optional{,
dialect=\code{'excel'}}\optional{, fmtparam}}
Return a writer object responsible for converting the user's data into
delimited strings on the given file-like object. \var{csvfile} can be any
object with a \method{write()} method. If \var{csvfile} is a file object,
it must be opened with the \code{'b'} flag on platforms where that makes a
difference. An optional \var{dialect} parameter can be given which is used
to define a set of parameters specific to a particular CSV dialect. It may
be an instance of a subclass of the \class{Dialect} class or one of the
strings returned by the \function{list_dialects} function. The other
optional \var{fmtparam} keyword arguments can be given to override
individual formatting parameters in the current dialect. For full details
about the dialect and formatting parameters, see
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''.
To make it as easy as possible to interface with modules which implement
the DB API, the value \constant{None} is written as the empty string.
While this isn't a reversible transformation, it makes it easier to dump
SQL NULL data values to CSV files without preprocessing the data returned
from a \code{cursor.fetch*()} call. All other non-string data are
stringified with \function{str()} before being written.
\end{funcdesc}
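Since \var{csvfile} only needs a \method{write()} method, a small
collecting object is enough to see exactly what the writer emits. In this
sketch \class{Collector} is a hypothetical helper, not part of the
\module{csv} module; the output shows \constant{None} written as an empty
field, a number converted to a string, a field containing the delimiter
being quoted, and the \code{'excel'} dialect's line terminator:

\begin{verbatim}
import csv

class Collector(object):
    """Hypothetical file-like object: the writer only needs write()."""
    def __init__(self):
        self.parts = []

    def write(self, s):
        self.parts.append(s)

    def getvalue(self):
        return ''.join(self.parts)

out = Collector()
writer = csv.writer(out)
writer.writerow(['id', None, 3.5, 'a,b'])
# None -> empty field, 3.5 -> '3.5', 'a,b' is quoted because it holds ','
# out.getvalue() == 'id,,3.5,"a,b"\r\n'
\end{verbatim}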

\begin{funcdesc}{register_dialect}{name\optional{, dialect}\optional{, fmtparam}}
Associate \var{dialect} with \var{name}. \var{name} must be a string
or Unicode object. The dialect can be specified either by passing a
subclass of \class{Dialect}, or by \var{fmtparam} keyword arguments,
or both, with keyword arguments overriding parameters of the dialect.
For full details about the dialect and formatting parameters, see
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''.
\end{funcdesc}

\begin{funcdesc}{unregister_dialect}{name}
Delete the dialect associated with \var{name} from the dialect registry. An
\exception{Error} is raised if \var{name} is not a registered dialect
name.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}. An \exception{Error} is
raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}
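The registry functions can be exercised together. The sketch below
registers a throwaway dialect under the hypothetical name \code{'pipes'}
using keyword formatting parameters only, uses it by name, and then
unregisters it, after which \function{get_dialect} raises
\exception{Error}:

\begin{verbatim}
import csv

# 'pipes' is a hypothetical dialect name used only for this illustration.
csv.register_dialect('pipes', delimiter='|', quoting=csv.QUOTE_NONE)
registered = 'pipes' in csv.list_dialects()
pipe_delim = csv.get_dialect('pipes').delimiter    # '|'
rows = list(csv.reader(['a|b|c'], 'pipes'))         # [['a', 'b', 'c']]

# Once unregistered, looking the name up again raises csv.Error.
csv.unregister_dialect('pipes')
try:
    csv.get_dialect('pipes')
    still_there = True
except csv.Error:
    still_there = False
\end{verbatim}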

\begin{funcdesc}{field_size_limit}{\optional{new_limit}}
Return the current maximum field size allowed by the parser. If
\var{new_limit} is given, this becomes the new limit and the previous
limit is returned.
\versionadded{2.5}
\end{funcdesc}
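A sketch of lowering the limit temporarily: calling
\function{field_size_limit} with an argument returns the previous limit,
which makes restoring it straightforward. The limit of 10 is an arbitrary
value chosen for illustration:

\begin{verbatim}
import csv

original = csv.field_size_limit()      # query only; nothing changes
previous = csv.field_size_limit(10)    # set a tiny limit, get the old one
try:
    list(csv.reader(['x' * 20]))       # a single 20-character field
    hit_limit = False
except csv.Error:                      # the field exceeds the limit
    hit_limit = True
finally:
    csv.field_size_limit(original)     # always restore the original limit
\end{verbatim}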


The \module{csv} module defines the following classes:

\begin{classdesc}{DictReader}{csvfile\optional{,
fieldnames=\constant{None}\optional{,
restkey=\constant{None}\optional{,
restval=\constant{None}\optional{,
dialect=\code{'excel'}\optional{,
*args, **kwds}}}}}}
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
\var{fieldnames} parameter. If the \var{fieldnames} parameter is omitted,
the values in the first row of the \var{csvfile} will be used as the
fieldnames. If the row read has more fields than the fieldnames sequence,
the remaining data is added as a list keyed by the value of
\var{restkey}. If the row read has fewer fields than the fieldnames
sequence, the remaining keys take the value of the optional \var{restval}
parameter. Any other optional or keyword arguments are passed to the
underlying \class{reader} instance.
\end{classdesc}
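The following sketch, using invented data, shows the first row supplying
the keys when \var{fieldnames} is omitted, surplus fields collected into a
list under \var{restkey}, and a missing field filled from \var{restval}.
The key \code{'overflow'} and the value \code{'n/a'} are arbitrary choices:

\begin{verbatim}
import csv

lines = ['name,qty', 'apple,3,extra1,extra2', 'pear']
reader = csv.DictReader(lines, restkey='overflow', restval='n/a')
# dict() gives a plain dictionary whatever mapping type the reader yields.
rows = [dict(r) for r in reader]
# rows[0] == {'name': 'apple', 'qty': '3', 'overflow': ['extra1', 'extra2']}
# rows[1] == {'name': 'pear', 'qty': 'n/a'}
\end{verbatim}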


\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
restval=\code{""}\optional{,
extrasaction=\code{'raise'}\optional{,
dialect=\code{'excel'}\optional{,
*args, **kwds}}}}}
Create an object which operates like a regular writer but maps dictionaries
onto output rows. The \var{fieldnames} parameter identifies the order in
which values in the dictionary passed to the \method{writerow()} method are
written to the \var{csvfile}. The optional \var{restval} parameter
specifies the value to be written if the dictionary is missing a key in
\var{fieldnames}. If the dictionary passed to the \method{writerow()}
method contains a key not found in \var{fieldnames}, the optional
\var{extrasaction} parameter indicates what action to take. If it is set
to \code{'raise'}, a \exception{ValueError} is raised. If it is set to
\code{'ignore'}, extra values in the dictionary are ignored. Any other
optional or keyword arguments are passed to the underlying \class{writer}
instance.

Note that unlike the \class{DictReader} class, the \var{fieldnames}
parameter of the \class{DictWriter} class is not optional. Since Python's
\class{dict} objects are not ordered, there is not enough information
available to deduce the order in which the row should be written to the
\var{csvfile}.

\end{classdesc}
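In the sketch below the write target is a hypothetical \class{Collector}
class that simply accumulates whatever is written. Values come out in
\var{fieldnames} order regardless of how each dictionary was built, a
missing key is filled from \var{restval}, an unexpected key raises
\exception{ValueError} under the default \var{extrasaction}, and
\code{'ignore'} silently drops it instead. No header row is written, since
only \method{writerow()} is called:

\begin{verbatim}
import csv

class Collector(object):
    """Hypothetical write target that accumulates the written strings."""
    def __init__(self):
        self.parts = []

    def write(self, s):
        self.parts.append(s)

out = Collector()
writer = csv.DictWriter(out, fieldnames=['name', 'qty'], restval='0')
writer.writerow({'qty': 3, 'name': 'apple'})   # written in fieldnames order
writer.writerow({'name': 'pear'})              # missing 'qty' -> restval

try:
    writer.writerow({'name': 'fig', 'colour': 'purple'})
    rejected = False
except ValueError:                             # default extrasaction='raise'
    rejected = True                            # nothing was written

lenient = csv.DictWriter(out, fieldnames=['name', 'qty'],
                         extrasaction='ignore')
lenient.writerow({'name': 'fig', 'qty': 1, 'colour': 'purple'})

text = ''.join(out.parts)
# text == 'apple,3\r\npear,0\r\nfig,1\r\n'
\end{verbatim}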

\begin{classdesc*}{Dialect}{}
The \class{Dialect} class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
\class{reader} or \class{writer} instance.
\end{classdesc*}

\begin{classdesc}{excel}{}
The \class{excel} class defines the usual properties of an Excel-generated
CSV file. It is registered with the dialect name \code{'excel'}.
\end{classdesc}

\begin{classdesc}{excel_tab}{}
The \class{excel_tab} class defines the usual properties of an
Excel-generated TAB-delimited file. It is registered with the dialect name
\code{'excel-tab'}.
\end{classdesc}
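The predefined dialects can be passed either as classes or by their
registered names. A short sketch with invented input:

\begin{verbatim}
import csv

# Passing the class and passing the registered name are interchangeable.
by_class = list(csv.reader(['a\tb'], csv.excel_tab))
by_name = list(csv.reader(['a\tb'], 'excel-tab'))
# both == [['a', 'b']]

# The plain excel dialect splits on commas, so the TAB stays in the field.
as_excel = list(csv.reader(['a\tb'], 'excel'))
# as_excel == [['a\tb']]
\end{verbatim}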

\begin{classdesc}{Sniffer}{}
The \class{Sniffer} class is used to deduce the format of a CSV file.
\end{classdesc}

The \class{Sniffer} class provides two methods:

\begin{methoddesc}{sniff}{sample\optional{, delimiters=None}}
Analyze the given \var{sample} and return a \class{Dialect} subclass
reflecting the parameters found. If the optional \var{delimiters} parameter
is given, it is interpreted as a string containing possible valid delimiter
characters.
\end{methoddesc}

\begin{methoddesc}{has_header}{sample}
Analyze the sample text (presumed to be in CSV format) and return
\constant{True} if the first row appears to be a series of column
headers.
\end{methoddesc}
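Because \class{Sniffer} works heuristically, its conclusions depend on the
sample it is given. The sketch below uses a small invented sample and
limits \var{delimiters} to comma and semicolon; since the second column
holds numbers under a non-numeric heading, \method{has_header()} should
report a header row. The sniffed dialect can then be passed directly to
\function{reader()}:

\begin{verbatim}
import csv

sample = 'name,qty\nApple,3\nPear,12\n'
sniffer = csv.Sniffer()

# Only ',' and ';' are considered as candidate delimiters.
dialect = sniffer.sniff(sample, delimiters=',;')
# dialect.delimiter == ','

# 'qty' does not parse as a number, unlike '3' and '12' below it.
header = sniffer.has_header(sample)

# The returned Dialect subclass works like any other dialect.
rows = list(csv.reader(sample.splitlines(), dialect))
\end{verbatim}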


The \module{csv} module defines the following constants:

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to only quote those fields which contain
special characters such as \var{delimiter}, \var{quotechar} or any of the
characters in \var{lineterminator}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric
fields.

Instructs the reader to convert all non-quoted fields to type \class{float}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields. When the current
\var{delimiter} occurs in output data it is preceded by the current
\var{escapechar} character. If \var{escapechar} is not set, the writer
will raise \exception{Error} if any characters that require escaping
are encountered.

Instructs \class{reader} objects to perform no special processing of quote
characters.
\end{datadesc}
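To compare the quoting modes, the sketch below formats the same data under
several of these constants. \class{Collector} and \function{render} are
hypothetical helpers written for this illustration. The last part shows
\constant{QUOTE_NONNUMERIC} on the reading side, where unquoted fields
become \class{float} values:

\begin{verbatim}
import csv

class Collector(object):
    """Hypothetical write target that accumulates the written strings."""
    def __init__(self):
        self.parts = []

    def write(self, s):
        self.parts.append(s)

def render(row, **fmtparams):
    """Hypothetical helper: format one row and return it as a string."""
    out = Collector()
    csv.writer(out, **fmtparams).writerow(row)
    return ''.join(out.parts)

all_quoted = render(['a', 1, 2.5], quoting=csv.QUOTE_ALL)
# all_quoted == '"a","1","2.5"\r\n'
non_numeric = render(['a', 1, 2.5], quoting=csv.QUOTE_NONNUMERIC)
# non_numeric == '"a",1,2.5\r\n'
escaped = render(['a,b', 'c'], quoting=csv.QUOTE_NONE, escapechar='\\')
# escaped == 'a\\,b,c\r\n' (the delimiter is preceded by the escapechar)

# Without an escapechar, QUOTE_NONE cannot represent 'a,b' and raises Error.
try:
    render(['a,b'], quoting=csv.QUOTE_NONE)
    raised = False
except csv.Error:
    raised = True

# On input, unquoted fields become floats under QUOTE_NONNUMERIC.
parsed = list(csv.reader(['"a",1,2.5'], quoting=csv.QUOTE_NONNUMERIC))
# parsed == [['a', 1.0, 2.5]]
\end{verbatim}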


The \module{csv} module defines the following exception:

\begin{excdesc}{Error}
Raised by any of the functions when an error is detected.
\end{excdesc}


\subsection{Dialects and Formatting Parameters\label{csv-fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects. A
dialect is a subclass of the \class{Dialect} class having a set of specific
attributes and a single \method{validate()} method. When creating
\class{reader} or \class{writer} objects, the programmer can specify a
string or a subclass of the \class{Dialect} class as the dialect
parameter. In addition to, or instead of, the \var{dialect} parameter, the
programmer can also specify individual formatting parameters, which have
the same names as the attributes defined below for the \class{Dialect}
class.

Fred Drake	9635268	2003-04-25 18:02:34 +0000	[diff] [blame]	277	Dialects support the following attributes:
				278
				279	\begin{memberdesc}[Dialect]{delimiter}
				280	A one-character string used to separate fields. It defaults to \code{','}.
				281	\end{memberdesc}
				282
				283	\begin{memberdesc}[Dialect]{doublequote}
Andrew McNamara	8231de0	2005-01-12 11:47:57 +0000	[diff] [blame]	284	Controls how instances of \var{quotechar} appearing inside a field should
				285	be themselves be quoted. When \constant{True}, the character is doubled.
				286	When \constant{False}, the \var{escapechar} is used as a prefix to the
				287	\var{quotechar}. It defaults to \constant{True}.
				288
				289	On output, if \var{doublequote} is \constant{False} and no
				290	\var{escapechar} is set, \exception{Error} is raised if a \var{quotechar}
				291	is found in a field.
Fred Drake	9635268	2003-04-25 18:02:34 +0000	[diff] [blame]	292	\end{memberdesc}
				293
				294	\begin{memberdesc}[Dialect]{escapechar}
Andrew McNamara	8231de0	2005-01-12 11:47:57 +0000	[diff] [blame]	295	A one-character string used by the writer to escape the \var{delimiter} if
				296	\var{quoting} is set to \constant{QUOTE_NONE} and the \var{quotechar}
				297	if \var{doublequote} is \constant{False}. On reading, the \var{escapechar}
				298	removes any special meaning from the following character. It defaults
				299	to \constant{None}, which disables escaping.
Fred Drake	9635268	2003-04-25 18:02:34 +0000	[diff] [blame]	300	\end{memberdesc}
				301
				302	\begin{memberdesc}[Dialect]{lineterminator}
Andrew McNamara	8231de0	2005-01-12 11:47:57 +0000	[diff] [blame]	303	The string used to terminate lines produced by the \class{writer}.
				304	It defaults to \code{'\e r\e n'}.
				305
				306	\note{The \class{reader} is hard-coded to recognise either \code{'\e r'}
				307	or \code{'\e n'} as end-of-line, and ignores \var{lineterminator}. This
				308	behavior may change in the future.}
Fred Drake	9635268	2003-04-25 18:02:34 +0000	[diff] [blame]	309	\end{memberdesc}
				310
				311	\begin{memberdesc}[Dialect]{quotechar}
Andrew McNamara	8231de0	2005-01-12 11:47:57 +0000	[diff] [blame]	312	A one-character string used to quote fields containing special characters,
				313	such as the \var{delimiter} or \var{quotechar}, or which contain new-line
				314	characters. It defaults to \code{'"'}.
Fred Drake	9635268	2003-04-25 18:02:34 +0000	[diff] [blame]	315	\end{memberdesc}
				316
				317	\begin{memberdesc}[Dialect]{quoting}
Andrew McNamara	8231de0	2005-01-12 11:47:57 +0000	[diff] [blame]	318	Controls when quotes should be generated by the writer and recognised
				319	by the reader. It can take on any of the \constant{QUOTE_*} constants
				320	(see section~\ref{csv-contents}) and defaults to \constant{QUOTE_MINIMAL}.
Fred Drake	9635268	2003-04-25 18:02:34 +0000	[diff] [blame]	321	\end{memberdesc}
				322
				323	\begin{memberdesc}[Dialect]{skipinitialspace}
				324	When \constant{True}, whitespace immediately following the \var{delimiter}
				325	is ignored. The default is \constant{False}.
				326	\end{memberdesc}
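A dialect can be defined by subclassing \class{Dialect} and setting the
attributes described above. The sketch below defines a hypothetical
\class{SemicolonDialect}, writes a row with it (note the doubled
\var{quotechar} and the quoting of a field that contains the delimiter),
reads a line back with \var{skipinitialspace} discarding the blank after a
delimiter, and finally overrides \var{escapechar} alone with a keyword
argument. \class{Collector} is likewise a hypothetical write target:

\begin{verbatim}
import csv

class SemicolonDialect(csv.Dialect):
    """Hypothetical dialect spelling out every formatting attribute."""
    delimiter = ';'
    quotechar = '"'
    doublequote = True
    escapechar = None
    lineterminator = '\n'
    quoting = csv.QUOTE_MINIMAL
    skipinitialspace = True

class Collector(object):
    """Hypothetical write target that accumulates the written strings."""
    def __init__(self):
        self.parts = []

    def write(self, s):
        self.parts.append(s)

out = Collector()
csv.writer(out, SemicolonDialect).writerow(['say "hi"', 'x;y', 'plain'])
written = ''.join(out.parts)
# written == '"say ""hi""";"x;y";plain\n'

# skipinitialspace drops the blank that follows the last delimiter.
back = list(csv.reader(['"say ""hi""";"x;y"; plain'], SemicolonDialect))
# back == [['say "hi"', 'x;y', 'plain']]

# A keyword formatting parameter overrides the dialect's escapechar.
escaped = list(csv.reader(['c\\;d;e'], SemicolonDialect, escapechar='\\'))
# escaped == [['c;d', 'e']]
\end{verbatim}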


\subsection{Reader Objects}

Reader objects (\class{DictReader} instances and objects returned by
the \function{reader()} function) have the following public methods:

\begin{methoddesc}[csv reader]{next}{}
Return the next row of the reader's iterable object as a list (a
dictionary for \class{DictReader} instances), parsed according to the
current dialect.
\end{methoddesc}

Reader objects have the following public attributes:

\begin{memberdesc}[csv reader]{dialect}
A read-only description of the dialect in use by the parser.
\end{memberdesc}

\begin{memberdesc}[csv reader]{line_num}
The number of lines read from the source iterator. This is not the same
as the number of records returned, as records can span multiple lines.
\versionadded{2.5}
\end{memberdesc}
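The difference between \var{line_num} and the number of records appears as
soon as a quoted field spans lines. In this sketch, using invented data,
the second record consumes two input lines:

\begin{verbatim}
import csv

lines = ['id,comment\n', '1,"spans\n', 'two lines"\n', '2,short\n']
reader = csv.reader(lines)
seen = []
for row in reader:
    # line_num counts source lines consumed so far, not records.
    seen.append((reader.line_num, row))
# seen == [(1, ['id', 'comment']),
#          (3, ['1', 'spans\ntwo lines']),
#          (4, ['2', 'short'])]

delimiter_in_use = reader.dialect.delimiter   # ','
\end{verbatim}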


\subsection{Writer Objects}

Writer objects (\class{DictWriter} instances and objects returned by
the \function{writer()} function) have the following public methods. A
\var{row} must be a sequence of strings or numbers for writer objects,
and a dictionary mapping fieldnames to strings or numbers for
\class{DictWriter} objects; numbers are converted by passing them
through \function{str()} first. Note that complex numbers are written
out surrounded by parentheses. This may cause some problems for other
programs which read CSV files (assuming they support complex numbers
at all).

\begin{methoddesc}[csv writer]{writerow}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}[csv writer]{writerows}{rows}
Write all elements of \var{rows} (an iterable of \var{row} objects as
described above) to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

Writer objects have the following public attribute:

\begin{memberdesc}[csv writer]{dialect}
A read-only description of the dialect in use by the writer.
\end{memberdesc}
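A sketch showing that \method{writerows()} accepts any iterable of rows,
here a generator expression, and how a complex number is rendered.
\class{Collector} is a hypothetical write target that merely accumulates
the output:

\begin{verbatim}
import csv

class Collector(object):
    """Hypothetical write target that accumulates the written strings."""
    def __init__(self):
        self.parts = []

    def write(self, s):
        self.parts.append(s)

out = Collector()
writer = csv.writer(out)
writer.writerows([i, i * i] for i in range(3))   # any iterable of rows
writer.writerow(['z', 1 + 2j])                   # complex -> '(1+2j)'
text = ''.join(out.parts)
# text == '0,0\r\n1,1\r\n2,4\r\nz,(1+2j)\r\n'
\end{verbatim}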



\subsection{Examples\label{csv-examples}}

The simplest example of reading a CSV file:

\begin{verbatim}
import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row
\end{verbatim}

Reading a file with an alternate format:

\begin{verbatim}
import csv
reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE)
for row in reader:
    print row
\end{verbatim}

The corresponding simplest possible writing example is:

\begin{verbatim}
import csv
writer = csv.writer(open("some.csv", "wb"))
# someiterable: any iterable of rows (sequences of strings or numbers)
writer.writerows(someiterable)
\end{verbatim}

Registering a new dialect:

\begin{verbatim}
import csv

csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

reader = csv.reader(open("passwd", "rb"), 'unixpwd')
\end{verbatim}

A slightly more advanced use of the reader, catching and reporting errors:

\begin{verbatim}
import csv, sys
filename = "some.csv"
reader = csv.reader(open(filename, "rb"))
try:
    for row in reader:
        print row
except csv.Error as e:
    sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
\end{verbatim}

And while the module doesn't directly support parsing strings, it can
easily be done:

\begin{verbatim}
import csv
for row in csv.reader(['one,two,three']):
    print row
\end{verbatim}

The \module{csv} module doesn't directly support reading and writing
Unicode, but it is 8-bit-clean save for some problems with \ASCII{} NUL
characters. So you can write functions or classes that handle the
encoding and decoding for you as long as you avoid encodings like
UTF-16 that use NULs. UTF-8 is recommended.

\function{unicode_csv_reader} below is a generator that wraps
\class{csv.reader} to handle Unicode CSV data (a list of Unicode
strings). \function{utf_8_encoder} is a generator that encodes the
Unicode strings as UTF-8, one string (or row) at a time. The encoded
strings are parsed by the CSV reader, and
\function{unicode_csv_reader} decodes the UTF-8-encoded cells back
into Unicode:

\begin{verbatim}
import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
\end{verbatim}

For all other encodings the following \class{UnicodeReader} and
\class{UnicodeWriter} classes can be used. They take an additional
\var{encoding} parameter in their constructor and make sure that the data
passes through the real reader or writer encoded as UTF-8:

\begin{verbatim}
import csv, codecs, cStringIO

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
\end{verbatim}