blob: 283e40152138c65d5608a990468ccebf95f7894b [file] [log] [blame]
Skip Montanarob4a04172003-03-20 23:29:12 +00001\section{\module{csv} --- CSV File Reading and Writing}
2
3\declaremodule{standard}{csv}
4\modulesynopsis{Write and read tabular data to and from delimited files.}
5
6\versionadded{2.3}
7\index{csv}
8\indexii{data}{tabular}
9
10The so-called CSV (Comma Separated Values) format is the most common import
11and export format for spreadsheets and databases. There is no ``CSV
12standard'', so the format is operationally defined by the many applications
13which read and write it. The lack of a standard means that subtle
14differences often exist in the data produced and consumed by different
15applications. These differences can make it annoying to process CSV files
16from multiple sources. Still, while the delimiters and quoting characters
17vary, the overall format is similar enough that it is possible to write a
18single module which can efficiently manipulate such data, hiding the details
19of reading and writing the data from the programmer.
20
21The \module{csv} module implements classes to read and write tabular data in
22CSV format. It allows programmers to say, ``write this data in the format
23preferred by Excel,'' or ``read data from this file which was generated by
24Excel,'' without knowing the precise details of the CSV format used by
25Excel. Programmers can also describe the CSV formats understood by other
26applications or define their own special-purpose CSV formats.
27
28The \module{csv} module's \class{reader} and \class{writer} objects read and
29write sequences. Programmers can also read and write data in dictionary
30form using the \class{DictReader} and \class{DictWriter} classes.
31
32\note{The first version of the \module{csv} module doesn't support Unicode
33input. Also, there are currently some issues regarding \ASCII{} NUL
34characters. Accordingly, all input should generally be plain \ASCII{} to be
35safe. These restrictions will be removed in the future.}
36
37\begin{seealso}
38% \seemodule{array}{Arrays of uniformly types numeric values.}
39 \seepep{305}{CSV File API}
40 {The Python Enhancement Proposal which proposed this addition
41 to Python.}
42\end{seealso}
43
44
45\subsection{Module Contents}
46
47
48The \module{csv} module defines the following functions:
49
50\begin{funcdesc}{reader}{csvfile\optional{,
51 dialect=\code{'excel'}\optional{, fmtparam}}}
52Return a reader object which will iterate over lines in the given
53{}\var{csvfile}. \var{csvfile} can be any object which supports the
54iterator protocol and returns a string each time its \method{next}
55method is called. An optional \var{dialect} parameter can be given
56which is used to define a set of parameters specific to a particular CSV
57dialect. It may be an instance of a subclass of the \class{Dialect}
58class or one of the strings returned by the \function{list_dialects}
59function. The other optional {}\var{fmtparam} keyword arguments can be
60given to override individual formatting parameters in the current
61dialect. For more information about the dialect and formatting
62parameters, see section~\ref{fmt-params}, ``Dialects and Formatting
63Parameters'' for details of these parameters.
64
65All data read are returned as strings. No automatic data type
66conversion is performed.
67\end{funcdesc}
68
69\begin{funcdesc}{writer}{csvfile\optional{,
70 dialect=\code{'excel'}\optional{, fmtparam}}}
71Return a writer object responsible for converting the user's data into
72delimited strings on the given file-like object. An optional
73{}\var{dialect} parameter can be given which is used to define a set of
74parameters specific to a particular CSV dialect. It may be an instance
75of a subclass of the \class{Dialect} class or one of the strings
76returned by the \function{list_dialects} function. The other optional
77{}\var{fmtparam} keyword arguments can be given to override individual
78formatting parameters in the current dialect. For more information
79about the dialect and formatting parameters, see
80section~\ref{fmt-params}, ``Dialects and Formatting Parameters'' for
81details of these parameters. To make it as easy as possible to
82interface with modules which implement the DB API, the value
83\constant{None} is written as the empty string. While this isn't a
84reversible transformation, it makes it easier to dump SQL NULL data values
85to CSV files without preprocessing the data returned from a
86\code{cursor.fetch*()} call. All other non-string data are stringified
87with \function{str()} before being written.
88\end{funcdesc}
89
90\begin{funcdesc}{register_dialect}{name, dialect}
91Associate \var{dialect} with \var{name}. \var{dialect} must be a subclass
92of \class{csv.Dialect}. \var{name} must be a string or Unicode object.
93\end{funcdesc}
94
95\begin{funcdesc}{unregister_dialect}{name}
96Delete the dialect associated with \var{name} from the dialect registry. An
97\exception{Error} is raised if \var{name} is not a registered dialect
98name.
99\end{funcdesc}
100
101\begin{funcdesc}{get_dialect}{name}
102Return the dialect associated with \var{name}. An \exception{Error} is
103raised if \var{name} is not a registered dialect name.
104\end{funcdesc}
105
106\begin{funcdesc}{list_dialects}{}
107Return the names of all registered dialects.
108\end{funcdesc}
109
110
111The \module{csv} module defines the following classes:
112
113\begin{classdesc}{DictReader}{csvfile, fieldnames\optional{,
114 restkey=\code{None}\optional{,
115 restval=\code{None}\optional{,
116 dialect=\code{'excel'}\optional{,
117 fmtparam}}}}}
118Create an object which operates like a regular reader but maps the
119information read into a dict whose keys are given by the \var{fieldnames}
120parameter. If the row read has fewer fields than the fieldnames sequence,
121the value of \var{restval} will be used as the default value. If the row
122read has more fields than the fieldnames sequence, the remaining data is
123added as a sequence keyed by the value of \var{restkey}. If the row read
124has fewer fields than the fieldnames sequence, the remaining keys take the
125value of the optiona \var{restval} parameter. All other parameters are
126interpreted as for regular readers.
127\end{classdesc}
128
129
130\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
131 restval=""\optional{,
132 extrasaction=\code{'raise'}\optional{,
133 dialect=\code{'excel'}\optional{, fmtparam}}}}}
134Create an object which operates like a regular writer but maps dictionaries
135onto output rows. The \var{fieldnames} parameter identifies the order in
136which values in the dictionary passed to the \method{writerow()} method are
137written to the \var{csvfile}. The optional \var{restval} parameter
138specifies the value to be written if the dictionary is missing a key in
139\var{fieldnames}. If the dictionary passed to the \method{writerow()}
140method contains a key not found in \var{fieldnames}, the optional
141\var{extrasaction} parameter indicates what action to take. If it is set
142to \code{'raise'} a \exception{ValueError} is raised. If it is set to
143\code{'ignore'}, extra values in the dictionary are ignored. All other
144parameters are interpreted as for regular writers.
145\end{classdesc}
146
147
148\begin{classdesc*}{Dialect}{}
149The \class{Dialect} class is a container class relied on primarily for its
150attributes, which are used to define the parameters for a specific
151\class{reader} or \class{writer} instance. Dialect objects support the
152following data attributes:
153
154\begin{memberdesc}[string]{delimiter}
155A one-character string used to separate fields. It defaults to \code{","}.
156\end{memberdesc}
157
158\begin{memberdesc}[boolean]{doublequote}
159Controls how instances of \var{quotechar} appearing inside a field should be
160themselves be quoted. When \constant{True}, the character is doubledd.
161When \constant{False}, the \var{escapechar} must be a one-character string
162which is used as a prefix to the \var{quotechar}. It defaults to
163\constant{True}.
164\end{memberdesc}
165
166\begin{memberdesc}{escapechar}
167A one-character string used to escape the \var{delimiter} if \var{quoting}
168is set to \constant{QUOTE_NONE}. It defaults to \constant{None}.
169\end{memberdesc}
170
171\begin{memberdesc}[string]{lineterminator}
172The string used to terminate lines in the CSV file. It defaults to
173\code{"\e r\e n"}.
174\end{memberdesc}
175
176\begin{memberdesc}[string]{quotechar}
177A one-character string used to quote elements containing the \var{delimiter}
178or which start with the \var{quotechar}. It defaults to \code{'"'}.
179\end{memberdesc}
180
181\begin{memberdesc}[integer]{quoting}
182Controls when quotes should be generated by the writer. It can take on any
183of the \code{QUOTE_*} constants defined below and defaults to
184\constant{QUOTE_MINIMAL}.
185\end{memberdesc}
186
187\begin{memberdesc}[boolean]{skipinitialspace}
188When \constant{True}, whitespace immediately following the \var{delimiter}
189is ignored. The default is \constant{False}.
190\end{memberdesc}
191
192\end{classdesc*}
193
194The \module{csv} module defines the following constants:
195
196\begin{datadesc}{QUOTE_ALWAYS}
197Instructs \class{writer} objects to quote all fields.
198\end{datadesc}
199
200\begin{datadesc}{QUOTE_MINIMAL}
201Instructs \class{writer} objects to only quote those fields which contain
202the current \var{delimiter} or begin with the current \var{quotechar}.
203\end{datadesc}
204
205\begin{datadesc}{QUOTE_NONNUMERIC}
206Instructs \class{writer} objects to quote all non-numeric fields.
207\end{datadesc}
208
209\begin{datadesc}{QUOTE_NONE}
210Instructs \class{writer} objects to never quote fields. When the current
211\var{delimiter} occurs in output data it is preceded by the current
212\var{escapechar} character. When \constant{QUOTE_NONE} is in effect, it
213is an error not to have a single-character \var{escapechar} defined, even if
214no data to be written contains the \var{delimiter} character.
215\end{datadesc}
216
217
218The \module{csv} module defines the following exception:
219
220\begin{excdesc}{Error}
221Raised by any of the functions when an error is detected.
222\end{excdesc}
223
224
225\subsection{Dialects and Formatting Parameters\label{fmt-params}}
226
227To make it easier to specify the format of input and output records,
228specific formatting parameters are grouped together into dialects. A
229dialect is a subclass of the \class{Dialect} class having a set of specific
230methods and a single \method{validate()} method. When creating \class{reader}
231or \class{writer} objects, the programmer can specify a string or a subclass
232of the \class{Dialect} class as the dialect parameter. In addition to, or
233instead of, the \var{dialect} parameter, the programmer can also specify
234individual formatting parameters, which have the same names as the
235attributes defined above for the \class{Dialect} class.
236
237
238\subsection{Reader Objects}
239
240\class{DictReader} and \var{reader} objects have the following public
241methods:
242
243\begin{methoddesc}{next}{}
244Return the next row of the reader's iterable object as a list, parsed
245according to the current dialect.
246\end{methoddesc}
247
248
249\subsection{Writer Objects}
250
251\class{DictWriter} and \var{writer} objects have the following public
252methods:
253
254\begin{methoddesc}{writerow}{row}
255Write the \var{row} parameter to the writer's file object, formatted
256according to the current dialect.
257\end{methoddesc}
258
259\begin{methoddesc}{writerows}{rows}
260Write all the \var{rows} parameters to the writer's file object, formatted
261according to the current dialect.
262\end{methoddesc}
263
264
265\subsection{Examples}
266
267The ``Hello, world'' of csv reading is
268
269\begin{verbatim}
270 reader = csv.reader(file("some.csv"))
271 for row in reader:
272 print row
273\end{verbatim}
274
275The corresponding simplest possible writing example is
276
277\begin{verbatim}
278 writer = csv.writer(file("some.csv", "w"))
279 for row in someiterable:
280 writer.writerow(row)
281\end{verbatim}