\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}
\sectionauthor{Skip Montanaro}{skip@pobox.com}

\versionadded{2.3}
\index{csv}
\indexii{data}{tabular}

The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases. There is no ``CSV
standard'', so the format is operationally defined by the many applications
which read and write it. The lack of a standard means that subtle
differences often exist in the data produced and consumed by different
applications. These differences can make it annoying to process CSV files
from multiple sources. Still, while the delimiters and quoting characters
vary, the overall format is similar enough that it is possible to write a
single module which can efficiently manipulate such data, hiding the details
of reading and writing the data from the programmer.

The \module{csv} module implements classes to read and write tabular data in
CSV format. It allows programmers to say, ``write this data in the format
preferred by Excel,'' or ``read data from this file which was generated by
Excel,'' without knowing the precise details of the CSV format used by
Excel. Programmers can also describe the CSV formats understood by other
applications or define their own special-purpose CSV formats.

The \module{csv} module's \class{reader} and \class{writer} objects read and
write sequences. Programmers can also read and write data in dictionary
form using the \class{DictReader} and \class{DictWriter} classes.

\note{The first version of the \module{csv} module doesn't support Unicode
input. Also, there are currently some issues regarding \ASCII{} NUL
characters. Accordingly, all input should generally be printable \ASCII{}
to be safe. These restrictions will be removed in the future.}

\begin{seealso}
% \seemodule{array}{Arrays of uniformly types numeric values.}
\seepep{305}{CSV File API}
{The Python Enhancement Proposal which proposed this addition
to Python.}
\end{seealso}


\subsection{Module Contents}


The \module{csv} module defines the following functions:

\begin{funcdesc}{reader}{csvfile\optional{,
dialect=\code{'excel'}\optional{, fmtparam}}}
Return a reader object which will iterate over lines in the given
{}\var{csvfile}. \var{csvfile} can be any object which supports the
iterator protocol and returns a string each time its \method{next}
method is called. An optional \var{dialect} parameter can be given
which is used to define a set of parameters specific to a particular CSV
dialect. It may be an instance of a subclass of the \class{Dialect}
class or one of the strings returned by the \function{list_dialects}
function. The other optional {}\var{fmtparam} keyword arguments can be
given to override individual formatting parameters in the current
dialect. For more information about the dialect and formatting
parameters, see section~\ref{fmt-params}, ``Dialects and Formatting
Parameters''.

All data read are returned as strings. No automatic data type
conversion is performed.
\end{funcdesc}

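The following is a minimal sketch of the \function{reader} behaviour
described above. It assumes a current Python~3 interpreter rather than the
2.3 release documented here, uses a plain list of strings as a stand-in for
\var{csvfile}, and the sample data is invented for illustration.

```python
import csv

# Any iterable that yields one string per line can serve as csvfile;
# a list of strings is used here instead of an open file.
lines = ['name,qty', '"Smith, J",3']

rows = list(csv.reader(lines))        # dialect defaults to 'excel'

# Every field is returned as a string, so numeric columns must be
# converted explicitly by the caller.
quantities = [int(row[1]) for row in rows[1:]]

print(rows)
print(quantities)
```

The quoted field keeps its embedded comma, and the quantity only becomes an
integer because the caller converts it.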
\begin{funcdesc}{writer}{csvfile\optional{,
dialect=\code{'excel'}\optional{, fmtparam}}}
Return a writer object responsible for converting the user's data into
delimited strings on the given file-like object. An optional
{}\var{dialect} parameter can be given which is used to define a set of
parameters specific to a particular CSV dialect. It may be an instance
of a subclass of the \class{Dialect} class or one of the strings
returned by the \function{list_dialects} function. The other optional
{}\var{fmtparam} keyword arguments can be given to override individual
formatting parameters in the current dialect. For more information
about the dialect and formatting parameters, see
section~\ref{fmt-params}, ``Dialects and Formatting Parameters''.
To make it as easy as possible to
interface with modules which implement the DB API, the value
\constant{None} is written as the empty string. While this isn't a
reversible transformation, it makes it easier to dump SQL NULL data values
to CSV files without preprocessing the data returned from a
\code{cursor.fetch*()} call. All other non-string data are stringified
with \function{str()} before being written.
\end{funcdesc}

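The treatment of \constant{None} and other non-string values can be
sketched as follows. This assumes a current Python~3 interpreter, and an
\class{io.StringIO} buffer stands in for the file-like object. The row is an
invented example of what a DB API cursor might return.

```python
import csv
import io

out = io.StringIO()           # stands in for a file opened for writing
writer = csv.writer(out)      # default 'excel' dialect

# None (for example an SQL NULL) becomes an empty field; the float and
# the integer are converted to text before being written.
writer.writerow(["widget", None, 2.5, 7])

result = out.getvalue()
print(repr(result))
```

Reading the line back yields an empty string in the second column, which is
why the transformation is not reversible.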
\begin{funcdesc}{register_dialect}{name, dialect}
Associate \var{dialect} with \var{name}. \var{dialect} must be a subclass
of \class{csv.Dialect}. \var{name} must be a string or Unicode object.
\end{funcdesc}

\begin{funcdesc}{unregister_dialect}{name}
Delete the dialect associated with \var{name} from the dialect registry. An
\exception{Error} is raised if \var{name} is not a registered dialect
name.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}. An \exception{Error} is
raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}


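The four registry functions can be exercised together as in the sketch
below, which assumes a current Python~3 interpreter. \class{PipeDialect} and
the name \code{'pipes'} are hypothetical, and the sketch derives from the
module's \class{csv.excel} class so that only the delimiter has to be
overridden.

```python
import csv

class PipeDialect(csv.excel):
    """Hypothetical dialect: Excel conventions with a '|' delimiter."""
    delimiter = "|"

csv.register_dialect("pipes", PipeDialect)
registered = "pipes" in csv.list_dialects()

found = csv.get_dialect("pipes")      # look the dialect up by name
delimiter = found.delimiter

csv.unregister_dialect("pipes")
try:
    csv.get_dialect("pipes")          # the name is no longer registered
    lookup_failed = False
except csv.Error:
    lookup_failed = True

print(registered, delimiter, lookup_failed)
```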
The \module{csv} module defines the following classes:

\begin{classdesc}{DictReader}{csvfile, fieldnames\optional{,
restkey=\code{None}\optional{,
restval=\code{None}\optional{,
dialect=\code{'excel'}\optional{,
fmtparam}}}}}
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the \var{fieldnames}
parameter. If the row read has more fields than the \var{fieldnames}
sequence, the remaining data is added as a sequence keyed by the value of
\var{restkey}. If the row read has fewer fields than the \var{fieldnames}
sequence, the remaining keys take the value of the optional \var{restval}
parameter. All other parameters are interpreted as for regular readers.
\end{classdesc}


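A short sketch of how \var{restkey} and \var{restval} handle rows that are
longer or shorter than \var{fieldnames}, assuming a current Python~3
interpreter. The field names, the key \code{'overflow'} and the data are
invented for illustration.

```python
import csv

# One row with too many fields and one with too few.
lines = ["alice,30,extra1,extra2", "bob"]

reader = csv.DictReader(lines, fieldnames=["name", "age"],
                        restkey="overflow", restval="n/a")
records = [dict(row) for row in reader]

for record in records:
    print(record)
```

The surplus fields of the first row are collected in a list under
\code{'overflow'}, while the missing \code{'age'} of the second row is filled
in with \code{'n/a'}.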
\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
restval=\code{""}\optional{,
extrasaction=\code{'raise'}\optional{,
dialect=\code{'excel'}\optional{, fmtparam}}}}}
Create an object which operates like a regular writer but maps dictionaries
onto output rows. The \var{fieldnames} parameter identifies the order in
which values in the dictionary passed to the \method{writerow()} method are
written to the \var{csvfile}. The optional \var{restval} parameter
specifies the value to be written if the dictionary is missing a key in
\var{fieldnames}. If the dictionary passed to the \method{writerow()}
method contains a key not found in \var{fieldnames}, the optional
\var{extrasaction} parameter indicates what action to take. If it is set
to \code{'raise'}, a \exception{ValueError} is raised. If it is set to
\code{'ignore'}, extra values in the dictionary are ignored. All other
parameters are interpreted as for regular writers.
\end{classdesc}


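The effect of \var{restval} and \var{extrasaction} can be sketched as
follows, assuming a current Python~3 interpreter with \class{io.StringIO}
buffers standing in for \var{csvfile}. The dictionaries are invented sample
data.

```python
import csv
import io

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "age"],
                        restval="unknown", extrasaction="ignore")
writer.writerow({"name": "alice", "age": 30, "city": "Oslo"})  # 'city' is dropped
writer.writerow({"name": "bob"})                               # 'age' uses restval
text = out.getvalue()

# With the default extrasaction='raise', an unexpected key is an error.
strict = csv.DictWriter(io.StringIO(), fieldnames=["name"])
try:
    strict.writerow({"name": "carol", "city": "Rome"})
    raised = False
except ValueError:
    raised = True

print(repr(text), raised)
```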
\begin{classdesc*}{Dialect}{}
The \class{Dialect} class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
\class{reader} or \class{writer} instance. Dialect objects support the
following data attributes:

\begin{memberdesc}[string]{delimiter}
A one-character string used to separate fields. It defaults to \code{","}.
\end{memberdesc}

\begin{memberdesc}[boolean]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should
themselves be quoted. When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} must be a one-character string
which is used as a prefix to the \var{quotechar}. It defaults to
\constant{True}.
\end{memberdesc}

\begin{memberdesc}{escapechar}
A one-character string used to escape the \var{delimiter} if \var{quoting}
is set to \constant{QUOTE_NONE}. It defaults to \constant{None}.
\end{memberdesc}

\begin{memberdesc}[string]{lineterminator}
The string used to terminate lines in the CSV file. It defaults to
\code{"\e r\e n"}.
\end{memberdesc}

\begin{memberdesc}[string]{quotechar}
A one-character string used to quote elements containing the \var{delimiter}
or which start with the \var{quotechar}. It defaults to \code{'"'}.
\end{memberdesc}

\begin{memberdesc}[integer]{quoting}
Controls when quotes should be generated by the writer. It can take on any
of the \code{QUOTE_*} constants defined below and defaults to
\constant{QUOTE_MINIMAL}.
\end{memberdesc}

\begin{memberdesc}[boolean]{skipinitialspace}
When \constant{True}, whitespace immediately following the \var{delimiter}
is ignored. The default is \constant{False}.
\end{memberdesc}

\end{classdesc*}

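As an illustration of the attributes listed above, the sketch below defines
a dialect by subclassing \class{Dialect} and setting every attribute
explicitly. It assumes a current Python~3 interpreter;
\class{SemicolonDialect} and the sample rows are hypothetical.

```python
import csv
import io

class SemicolonDialect(csv.Dialect):
    """Hypothetical dialect: semicolon-separated fields, newline-terminated."""
    delimiter = ";"
    quotechar = '"'
    doublequote = True            # embedded quotes are written twice
    escapechar = None
    lineterminator = "\n"
    quoting = csv.QUOTE_MINIMAL
    skipinitialspace = True       # ignore blanks right after a delimiter

out = io.StringIO()
csv.writer(out, dialect=SemicolonDialect).writerow(
    ["a;b", 'say "hi"', "plain"])
written = out.getvalue()

# skipinitialspace drops the blank that follows each delimiter.
parsed = next(csv.reader(['x; y; "p;q"'], dialect=SemicolonDialect))

print(repr(written))
print(parsed)
```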
The \module{csv} module defines the following constants:

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to only quote those fields which contain
the current \var{delimiter} or begin with the current \var{quotechar}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric fields.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields. When the current
\var{delimiter} occurs in output data it is preceded by the current
\var{escapechar} character. When \constant{QUOTE_NONE} is in effect, it
is an error not to have a single-character \var{escapechar} defined, even if
no data to be written contains the \var{delimiter} character.
\end{datadesc}


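The four quoting modes can be compared on a single row, as in the sketch
below. It assumes a current Python~3 interpreter, in which the
quote-everything constant is spelled \constant{QUOTE_ALL}. The helper
\function{render} and the sample row are invented for this example.

```python
import csv
import io

def render(row, **fmtparams):
    """Hypothetical helper: write one row and return the resulting text."""
    out = io.StringIO()
    csv.writer(out, **fmtparams).writerow(row)
    return out.getvalue()

row = ["a,b", "plain", 42]

minimal = render(row, quoting=csv.QUOTE_MINIMAL)
everything = render(row, quoting=csv.QUOTE_ALL)
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)
# QUOTE_NONE needs an escapechar to protect the embedded delimiter.
unquoted = render(row, quoting=csv.QUOTE_NONE, escapechar="\\")

for text in (minimal, everything, nonnumeric, unquoted):
    print(repr(text))
```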
The \module{csv} module defines the following exception:

\begin{excdesc}{Error}
Raised by any of the functions when an error is detected.
\end{excdesc}


\subsection{Dialects and Formatting Parameters\label{fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects. A
dialect is a subclass of the \class{Dialect} class having a set of specific
methods and a single \method{validate()} method. When creating \class{reader}
or \class{writer} objects, the programmer can specify a string or a subclass
of the \class{Dialect} class as the dialect parameter. In addition to, or
instead of, the \var{dialect} parameter, the programmer can also specify
individual formatting parameters, which have the same names as the
attributes defined above for the \class{Dialect} class.


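Combining a named dialect with individual formatting parameters might look
like the sketch below, which assumes a current Python~3 interpreter. The
tab-separated sample lines are invented. The \code{'excel'} dialect supplies
the defaults, while \var{delimiter} and \var{quotechar} are overridden for
this one reader.

```python
import csv

lines = ["id\tname\tnote", "1\tAda\t'tab\there'"]

# Start from the 'excel' dialect by name, then override two parameters.
rows = list(csv.reader(lines, dialect="excel",
                       delimiter="\t", quotechar="'"))

print(rows)
```

The tab inside the single-quoted field is kept as data because the overridden
\var{quotechar} protects it.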
\subsection{Reader Objects}

\class{DictReader} and \var{reader} objects have the following public
methods:

\begin{methoddesc}{next}{}
Return the next row of the reader's iterable object as a list, parsed
according to the current dialect.
\end{methoddesc}


\subsection{Writer Objects}

\class{DictWriter} and \var{writer} objects have the following public
methods:

\begin{methoddesc}{writerow}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}{writerows}{rows}
Write each row in the \var{rows} parameter to the writer's file object,
formatted according to the current dialect.
\end{methoddesc}


\begin{classdesc}{Sniffer}{}

The \class{Sniffer} class is used to deduce the format of a CSV file.

\begin{methoddesc}{sniff}{sample}
Analyze the sample text (presumed to be in CSV format) and return a
{}\class{Dialect} class reflecting the parameters found.
\end{methoddesc}

\begin{methoddesc}{has_header}{sample}
Analyze the sample text (presumed to be in CSV format) and return
{}\code{True} if the first row appears to be a series of column headers.
\end{methoddesc}
\end{classdesc}

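A sketch of typical \class{Sniffer} use, assuming a current Python~3
interpreter. Both methods are heuristic, so results depend on the sample;
the semicolon-separated text below is invented and was chosen so that the
delimiter and the header row are easy to detect.

```python
import csv

sample = ("name;age;city\n"
          "alice;30;Oslo\n"
          "bob;25;Madrid\n"
          "carol;41;Lima\n")

sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample)          # a Dialect subclass
header_found = sniffer.has_header(sample)

# The deduced dialect can be passed straight to a reader.
rows = list(csv.reader(sample.splitlines(), dialect))

print(repr(dialect.delimiter), header_found)
print(rows[1])
```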
\subsection{Examples}

The ``Hello, world'' of csv reading is

\begin{verbatim}
import csv
# Open in binary mode so the module sees line endings unchanged.
reader = csv.reader(file("some.csv", "rb"))
for row in reader:
    print row
\end{verbatim}

The corresponding simplest possible writing example is

\begin{verbatim}
import csv
writer = csv.writer(file("some.csv", "wb"))
# someiterable stands for any iterable of row sequences.
for row in someiterable:
    writer.writerow(row)
\end{verbatim}