blob: bf8b0bf789c34307e1cffcdaac76e0252b8bbef4 [file] [log] [blame]
Skip Montanarob4a04172003-03-20 23:29:12 +00001\section{\module{csv} --- CSV File Reading and Writing}
2
3\declaremodule{standard}{csv}
4\modulesynopsis{Write and read tabular data to and from delimited files.}
Skip Montanaro3bd3c842003-04-24 18:47:31 +00005\sectionauthor{Skip Montanaro}{skip@pobox.com}
Skip Montanarob4a04172003-03-20 23:29:12 +00006
7\versionadded{2.3}
8\index{csv}
9\indexii{data}{tabular}
10
11The so-called CSV (Comma Separated Values) format is the most common import
12and export format for spreadsheets and databases. There is no ``CSV
13standard'', so the format is operationally defined by the many applications
14which read and write it. The lack of a standard means that subtle
15differences often exist in the data produced and consumed by different
16applications. These differences can make it annoying to process CSV files
17from multiple sources. Still, while the delimiters and quoting characters
18vary, the overall format is similar enough that it is possible to write a
19single module which can efficiently manipulate such data, hiding the details
20of reading and writing the data from the programmer.
21
Skip Montanaro5d0136e2003-04-25 15:14:49 +000022The \module{csv} module implements classes to read and write tabular data in
Skip Montanarob4a04172003-03-20 23:29:12 +000023CSV format. It allows programmers to say, ``write this data in the format
24preferred by Excel,'' or ``read data from this file which was generated by
25Excel,'' without knowing the precise details of the CSV format used by
26Excel. Programmers can also describe the CSV formats understood by other
27applications or define their own special-purpose CSV formats.
28
Skip Montanaro5d0136e2003-04-25 15:14:49 +000029The \module{csv} module's \class{reader} and \class{writer} objects read and
Skip Montanarob4a04172003-03-20 23:29:12 +000030write sequences. Programmers can also read and write data in dictionary
31form using the \class{DictReader} and \class{DictWriter} classes.
32
Fred Drake96352682003-04-25 18:02:34 +000033\begin{notice}
34 This version of the \module{csv} module doesn't support Unicode
35 input. Also, there are currently some issues regarding \ASCII{} NUL
36 characters. Accordingly, all input should generally be printable
37 \ASCII{} to be safe. These restrictions will be removed in the future.
38\end{notice}
Skip Montanarob4a04172003-03-20 23:29:12 +000039
40\begin{seealso}
41% \seemodule{array}{Arrays of uniformly types numeric values.}
42 \seepep{305}{CSV File API}
43 {The Python Enhancement Proposal which proposed this addition
44 to Python.}
45\end{seealso}
46
47
Raymond Hettinger6f6d7b932003-08-31 05:44:54 +000048\subsection{Module Contents \label{csv-contents}}
Skip Montanarob4a04172003-03-20 23:29:12 +000049
Skip Montanaro5d0136e2003-04-25 15:14:49 +000050The \module{csv} module defines the following functions:
Skip Montanarob4a04172003-03-20 23:29:12 +000051
52\begin{funcdesc}{reader}{csvfile\optional{,
53 dialect=\code{'excel'}\optional{, fmtparam}}}
54Return a reader object which will iterate over lines in the given
55{}\var{csvfile}. \var{csvfile} can be any object which supports the
56iterator protocol and returns a string each time its \method{next}
Skip Montanaro5e4e39f2003-07-02 15:32:48 +000057method is called. If \var{csvfile} is a file object, it must be opened with
58the 'b' flag on platforms where that makes a difference. An optional
59{}\var{dialect} parameter can be given
Skip Montanarob4a04172003-03-20 23:29:12 +000060which is used to define a set of parameters specific to a particular CSV
61dialect. It may be an instance of a subclass of the \class{Dialect}
62class or one of the strings returned by the \function{list_dialects}
63function. The other optional {}\var{fmtparam} keyword arguments can be
64given to override individual formatting parameters in the current
65dialect. For more information about the dialect and formatting
Raymond Hettinger6e380cd2003-09-10 18:54:49 +000066parameters, see section~\ref{csv-fmt-params}, ``Dialects and Formatting
Skip Montanarob4a04172003-03-20 23:29:12 +000067Parameters'' for details of these parameters.
68
69All data read are returned as strings. No automatic data type
70conversion is performed.
71\end{funcdesc}
72
73\begin{funcdesc}{writer}{csvfile\optional{,
74 dialect=\code{'excel'}\optional{, fmtparam}}}
75Return a writer object responsible for converting the user's data into
Skip Montanaro5e4e39f2003-07-02 15:32:48 +000076delimited strings on the given file-like object. \var{csvfile} can be any
77object with a \function{write} method. If \var{csvfile} is a file object,
78it must be opened with the 'b' flag on platforms where that makes a
79difference. An optional
Skip Montanarob4a04172003-03-20 23:29:12 +000080{}\var{dialect} parameter can be given which is used to define a set of
81parameters specific to a particular CSV dialect. It may be an instance
82of a subclass of the \class{Dialect} class or one of the strings
83returned by the \function{list_dialects} function. The other optional
84{}\var{fmtparam} keyword arguments can be given to override individual
85formatting parameters in the current dialect. For more information
86about the dialect and formatting parameters, see
Raymond Hettinger6e380cd2003-09-10 18:54:49 +000087section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters'' for
Skip Montanarob4a04172003-03-20 23:29:12 +000088details of these parameters. To make it as easy as possible to
89interface with modules which implement the DB API, the value
90\constant{None} is written as the empty string. While this isn't a
91reversible transformation, it makes it easier to dump SQL NULL data values
92to CSV files without preprocessing the data returned from a
93\code{cursor.fetch*()} call. All other non-string data are stringified
94with \function{str()} before being written.
95\end{funcdesc}
96
97\begin{funcdesc}{register_dialect}{name, dialect}
98Associate \var{dialect} with \var{name}. \var{dialect} must be a subclass
99of \class{csv.Dialect}. \var{name} must be a string or Unicode object.
100\end{funcdesc}
101
102\begin{funcdesc}{unregister_dialect}{name}
103Delete the dialect associated with \var{name} from the dialect registry. An
104\exception{Error} is raised if \var{name} is not a registered dialect
105name.
106\end{funcdesc}
107
108\begin{funcdesc}{get_dialect}{name}
109Return the dialect associated with \var{name}. An \exception{Error} is
110raised if \var{name} is not a registered dialect name.
111\end{funcdesc}
112
113\begin{funcdesc}{list_dialects}{}
114Return the names of all registered dialects.
115\end{funcdesc}
116
117
Skip Montanaro5d0136e2003-04-25 15:14:49 +0000118The \module{csv} module defines the following classes:
Skip Montanarob4a04172003-03-20 23:29:12 +0000119
Skip Montanarodffeed32003-10-03 14:03:01 +0000120\begin{classdesc}{DictReader}{csvfile\optional{,
121 fieldnames=\constant{None},\optional{,
Fred Drake96352682003-04-25 18:02:34 +0000122 restkey=\constant{None}\optional{,
123 restval=\constant{None}\optional{,
Skip Montanarob4a04172003-03-20 23:29:12 +0000124 dialect=\code{'excel'}\optional{,
Skip Montanarodffeed32003-10-03 14:03:01 +0000125 fmtparam}}}}}}
Skip Montanarob4a04172003-03-20 23:29:12 +0000126Create an object which operates like a regular reader but maps the
Skip Montanarodffeed32003-10-03 14:03:01 +0000127information read into a dict whose keys are given by the optional
128{} \var{fieldnames}
129parameter. If the \var{fieldnames} parameter is omitted, the values in
130the first row of the \var{csvfile} will be used as the fieldnames.
131If the row read has fewer fields than the fieldnames sequence,
Skip Montanarob4a04172003-03-20 23:29:12 +0000132the value of \var{restval} will be used as the default value. If the row
133read has more fields than the fieldnames sequence, the remaining data is
134added as a sequence keyed by the value of \var{restkey}. If the row read
135has fewer fields than the fieldnames sequence, the remaining keys take the
Raymond Hettinger6f6d7b932003-08-31 05:44:54 +0000136value of the optional \var{restval} parameter. All other parameters are
Skip Montanaro5e4e39f2003-07-02 15:32:48 +0000137interpreted as for \class{reader} objects.
Skip Montanarob4a04172003-03-20 23:29:12 +0000138\end{classdesc}
139
140
141\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
142 restval=""\optional{,
143 extrasaction=\code{'raise'}\optional{,
144 dialect=\code{'excel'}\optional{, fmtparam}}}}}
145Create an object which operates like a regular writer but maps dictionaries
146onto output rows. The \var{fieldnames} parameter identifies the order in
147which values in the dictionary passed to the \method{writerow()} method are
148written to the \var{csvfile}. The optional \var{restval} parameter
149specifies the value to be written if the dictionary is missing a key in
150\var{fieldnames}. If the dictionary passed to the \method{writerow()}
151method contains a key not found in \var{fieldnames}, the optional
152\var{extrasaction} parameter indicates what action to take. If it is set
153to \code{'raise'} a \exception{ValueError} is raised. If it is set to
154\code{'ignore'}, extra values in the dictionary are ignored. All other
Skip Montanaro5e4e39f2003-07-02 15:32:48 +0000155parameters are interpreted as for \class{writer} objects.
Skip Montanarodffeed32003-10-03 14:03:01 +0000156
157Note that unlike the \class{DictReader} class, the \var{fieldnames}
158parameter of the \class{DictWriter} is not optional. Since Python's
159\class{dict} objects are not ordered, there is not enough information
160available to deduce the order in which the row should be written to the
161\var{csvfile}.
162
Skip Montanarob4a04172003-03-20 23:29:12 +0000163\end{classdesc}
164
Skip Montanarob4a04172003-03-20 23:29:12 +0000165\begin{classdesc*}{Dialect}{}
166The \class{Dialect} class is a container class relied on primarily for its
167attributes, which are used to define the parameters for a specific
Fred Drake96352682003-04-25 18:02:34 +0000168\class{reader} or \class{writer} instance.
Skip Montanarob4a04172003-03-20 23:29:12 +0000169\end{classdesc*}
170
Skip Montanaro77892372003-05-19 15:33:36 +0000171\begin{classdesc}{Sniffer}{}
172The \class{Sniffer} class is used to deduce the format of a CSV file.
Fred Drake96352682003-04-25 18:02:34 +0000173\end{classdesc}
174
175The \class{Sniffer} class provides a single method:
176
Skip Montanaro77892372003-05-19 15:33:36 +0000177\begin{methoddesc}{sniff}{sample\optional{,delimiters=None}}
178Analyze the given \var{sample} and return a \class{Dialect} subclass
179reflecting the parameters found. If the optional \var{delimiters} parameter
180is given, it is interpreted as a string containing possible valid delimiter
181characters.
Fred Drake96352682003-04-25 18:02:34 +0000182\end{methoddesc}
183
184\begin{methoddesc}{has_header}{sample}
185Analyze the sample text (presumed to be in CSV format) and return
186\constant{True} if the first row appears to be a series of column
187headers.
188\end{methoddesc}
189
190
Skip Montanarob4a04172003-03-20 23:29:12 +0000191The \module{csv} module defines the following constants:
192
Skip Montanaroa1045562003-06-04 15:30:13 +0000193\begin{datadesc}{QUOTE_ALL}
Skip Montanarob4a04172003-03-20 23:29:12 +0000194Instructs \class{writer} objects to quote all fields.
195\end{datadesc}
196
197\begin{datadesc}{QUOTE_MINIMAL}
198Instructs \class{writer} objects to only quote those fields which contain
199the current \var{delimiter} or begin with the current \var{quotechar}.
200\end{datadesc}
201
202\begin{datadesc}{QUOTE_NONNUMERIC}
203Instructs \class{writer} objects to quote all non-numeric fields.
204\end{datadesc}
205
206\begin{datadesc}{QUOTE_NONE}
207Instructs \class{writer} objects to never quote fields. When the current
208\var{delimiter} occurs in output data it is preceded by the current
209\var{escapechar} character. When \constant{QUOTE_NONE} is in effect, it
210is an error not to have a single-character \var{escapechar} defined, even if
211no data to be written contains the \var{delimiter} character.
212\end{datadesc}
213
214
215The \module{csv} module defines the following exception:
216
217\begin{excdesc}{Error}
218Raised by any of the functions when an error is detected.
219\end{excdesc}
220
221
Fred Drake96352682003-04-25 18:02:34 +0000222\subsection{Dialects and Formatting Parameters\label{csv-fmt-params}}
Skip Montanarob4a04172003-03-20 23:29:12 +0000223
224To make it easier to specify the format of input and output records,
225specific formatting parameters are grouped together into dialects. A
226dialect is a subclass of the \class{Dialect} class having a set of specific
227methods and a single \method{validate()} method. When creating \class{reader}
228or \class{writer} objects, the programmer can specify a string or a subclass
229of the \class{Dialect} class as the dialect parameter. In addition to, or
230instead of, the \var{dialect} parameter, the programmer can also specify
231individual formatting parameters, which have the same names as the
Raymond Hettinger6f6d7b932003-08-31 05:44:54 +0000232attributes defined below for the \class{Dialect} class.
Skip Montanarob4a04172003-03-20 23:29:12 +0000233
Fred Drake96352682003-04-25 18:02:34 +0000234Dialects support the following attributes:
235
236\begin{memberdesc}[Dialect]{delimiter}
237A one-character string used to separate fields. It defaults to \code{','}.
238\end{memberdesc}
239
240\begin{memberdesc}[Dialect]{doublequote}
241Controls how instances of \var{quotechar} appearing inside a field should be
242themselves be quoted. When \constant{True}, the character is doubledd.
243When \constant{False}, the \var{escapechar} must be a one-character string
244which is used as a prefix to the \var{quotechar}. It defaults to
245\constant{True}.
246\end{memberdesc}
247
248\begin{memberdesc}[Dialect]{escapechar}
249A one-character string used to escape the \var{delimiter} if \var{quoting}
250is set to \constant{QUOTE_NONE}. It defaults to \constant{None}.
251\end{memberdesc}
252
253\begin{memberdesc}[Dialect]{lineterminator}
254The string used to terminate lines in the CSV file. It defaults to
255\code{'\e r\e n'}.
256\end{memberdesc}
257
258\begin{memberdesc}[Dialect]{quotechar}
259A one-character string used to quote elements containing the \var{delimiter}
260or which start with the \var{quotechar}. It defaults to \code{'"'}.
261\end{memberdesc}
262
263\begin{memberdesc}[Dialect]{quoting}
264Controls when quotes should be generated by the writer. It can take on any
Raymond Hettinger6f6d7b932003-08-31 05:44:54 +0000265of the \constant{QUOTE_*} constants (see section~\ref{csv-contents})
266and defaults to \constant{QUOTE_MINIMAL}.
Fred Drake96352682003-04-25 18:02:34 +0000267\end{memberdesc}
268
269\begin{memberdesc}[Dialect]{skipinitialspace}
270When \constant{True}, whitespace immediately following the \var{delimiter}
271is ignored. The default is \constant{False}.
272\end{memberdesc}
273
Skip Montanarob4a04172003-03-20 23:29:12 +0000274
275\subsection{Reader Objects}
276
Fred Drake96352682003-04-25 18:02:34 +0000277Reader objects (\class{DictReader} instances and objects returned by
Raymond Hettinger6f6d7b932003-08-31 05:44:54 +0000278the \function{reader()} function) have the following public methods:
Skip Montanarob4a04172003-03-20 23:29:12 +0000279
Fred Drake96352682003-04-25 18:02:34 +0000280\begin{methoddesc}[csv reader]{next}{}
Skip Montanarob4a04172003-03-20 23:29:12 +0000281Return the next row of the reader's iterable object as a list, parsed
282according to the current dialect.
283\end{methoddesc}
284
285
286\subsection{Writer Objects}
287
Fred Drake96352682003-04-25 18:02:34 +0000288Writer objects (\class{DictWriter} instances and objects returned by
289the \function{writer()} function) have the following public methods:
Skip Montanarob4a04172003-03-20 23:29:12 +0000290
Fred Drake96352682003-04-25 18:02:34 +0000291\begin{methoddesc}[csv writer]{writerow}{row}
Skip Montanarob4a04172003-03-20 23:29:12 +0000292Write the \var{row} parameter to the writer's file object, formatted
293according to the current dialect.
294\end{methoddesc}
295
Fred Drake96352682003-04-25 18:02:34 +0000296\begin{methoddesc}[csv writer]{writerows}{rows}
Skip Montanarob4a04172003-03-20 23:29:12 +0000297Write all the \var{rows} parameters to the writer's file object, formatted
298according to the current dialect.
299\end{methoddesc}
300
301
302\subsection{Examples}
303
304The ``Hello, world'' of csv reading is
305
306\begin{verbatim}
Fred Drake96352682003-04-25 18:02:34 +0000307import csv
308reader = csv.reader(file("some.csv"))
309for row in reader:
310 print row
Skip Montanarob4a04172003-03-20 23:29:12 +0000311\end{verbatim}
312
313The corresponding simplest possible writing example is
314
315\begin{verbatim}
Fred Drake96352682003-04-25 18:02:34 +0000316import csv
317writer = csv.writer(file("some.csv", "w"))
318for row in someiterable:
319 writer.writerow(row)
Skip Montanarob4a04172003-03-20 23:29:12 +0000320\end{verbatim}