\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}
\sectionauthor{Skip Montanaro}{skip@pobox.com}

\versionadded{2.3}
\index{csv}
\indexii{data}{tabular}

The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases.  There is no ``CSV
standard'', so the format is operationally defined by the many applications
which read and write it.  The lack of a standard means that subtle
differences often exist in the data produced and consumed by different
applications.  These differences can make it annoying to process CSV files
from multiple sources.  Still, while the delimiters and quoting characters
vary, the overall format is similar enough that it is possible to write a
single module which can efficiently manipulate such data, hiding the details
of reading and writing the data from the programmer.

The \module{csv} module implements classes to read and write tabular data in
CSV format.  It allows programmers to say, ``write this data in the format
preferred by Excel,'' or ``read data from this file which was generated by
Excel,'' without knowing the precise details of the CSV format used by
Excel.  Programmers can also describe the CSV formats understood by other
applications or define their own special-purpose CSV formats.

The \module{csv} module's \class{reader} and \class{writer} objects read and
write sequences.  Programmers can also read and write data in dictionary
form using the \class{DictReader} and \class{DictWriter} classes.

\begin{notice}
  This version of the \module{csv} module doesn't support Unicode
  input.  Also, there are currently some issues regarding \ASCII{} NUL
  characters.  Accordingly, all input should generally be printable
  \ASCII{} to be safe.  These restrictions will be removed in the future.
\end{notice}

\begin{seealso}
%  \seemodule{array}{Arrays of uniformly typed numeric values.}
  \seepep{305}{CSV File API}
         {The Python Enhancement Proposal which proposed this addition
          to Python.}
\end{seealso}


\subsection{Module Contents \label{csv-contents}}

The \module{csv} module defines the following functions:

\begin{funcdesc}{reader}{csvfile\optional{,
                         dialect=\code{'excel'}\optional{, fmtparam}}}
Return a reader object which will iterate over lines in the given
{}\var{csvfile}.  \var{csvfile} can be any object which supports the
iterator protocol and returns a string each time its \method{next}
method is called.  If \var{csvfile} is a file object, it must be opened with
the \code{'b'} flag on platforms where that makes a difference.  An optional
{}\var{dialect} parameter can be given
which is used to define a set of parameters specific to a particular CSV
dialect.  It may be an instance of a subclass of the \class{Dialect}
class or one of the strings returned by the \function{list_dialects}
function.  The other optional {}\var{fmtparam} keyword arguments can be
given to override individual formatting parameters in the current
dialect.  For more information about the dialect and formatting
parameters, see section~\ref{csv-fmt-params}, ``Dialects and Formatting
Parameters''.

All data read are returned as strings.  No automatic data type
conversion is performed.
\end{funcdesc}
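As a minimal sketch of the interface described above, the following reads
rows from a list of strings (any iterable that yields one line at a time
will do) and overrides a single formatting parameter by keyword.  It
assumes a current Python~3 interpreter rather than the 2.3 release
documented here; the sample data and variable names are purely
illustrative.

```python
import csv

# Any iterable that yields one string per line can stand in for a file.
lines = ["name;qty", "widget;3", "gadget;12"]

# Start from the default 'excel' dialect, overriding only the delimiter.
rows = list(csv.reader(lines, dialect="excel", delimiter=";"))

# Every field comes back as a string; convert types explicitly if needed.
quantities = [int(row[1]) for row in rows[1:]]
```

Because no type conversion is performed, the explicit \function{int()}
call is what turns the second column into numbers.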

\begin{funcdesc}{writer}{csvfile\optional{,
                         dialect=\code{'excel'}\optional{, fmtparam}}}
Return a writer object responsible for converting the user's data into
delimited strings on the given file-like object.  \var{csvfile} can be any
object with a \function{write} method.  If \var{csvfile} is a file object,
it must be opened with the \code{'b'} flag on platforms where that makes a
difference.  An optional
{}\var{dialect} parameter can be given which is used to define a set of
parameters specific to a particular CSV dialect.  It may be an instance
of a subclass of the \class{Dialect} class or one of the strings
returned by the \function{list_dialects} function.  The other optional
{}\var{fmtparam} keyword arguments can be given to override individual
formatting parameters in the current dialect.  For more information
about the dialect and formatting parameters, see
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''.
To make it as easy as possible to
interface with modules which implement the DB API, the value
\constant{None} is written as the empty string.  While this isn't a
reversible transformation, it makes it easier to dump SQL NULL data values
to CSV files without preprocessing the data returned from a
\code{cursor.fetch*()} call.  All other non-string data are stringified
with \function{str()} before being written.
\end{funcdesc}
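The handling of \constant{None} and other non-string values can be
sketched as follows.  The example assumes Python~3 and uses an in-memory
\class{io.StringIO} buffer in place of a real file; the row is made-up
data of the kind a DB API cursor might return.

```python
import csv
import io

# An in-memory text buffer stands in for a file object with a write() method.
buffer = io.StringIO()
writer = csv.writer(buffer)

# None plays the role of SQL NULL; the int and float are passed through str().
writer.writerow([1, "widget", None, 2.5])

output = buffer.getvalue()
```

Reading the result back would give an empty string in the third column,
not \constant{None}, which is why the transformation is not reversible.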

\begin{funcdesc}{register_dialect}{name, dialect}
Associate \var{dialect} with \var{name}.  \var{dialect} must be a subclass
of \class{csv.Dialect}.  \var{name} must be a string or Unicode object.
\end{funcdesc}

\begin{funcdesc}{unregister_dialect}{name}
Delete the dialect associated with \var{name} from the dialect registry.  An
\exception{Error} is raised if \var{name} is not a registered dialect
name.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}.  An \exception{Error} is
raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}
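The four registry functions fit together roughly as shown below.  This is
a sketch assuming Python~3; the \class{PipeDialect} class and the name
\code{'pipes'} are hypothetical and exist only for this illustration.

```python
import csv

# A hypothetical dialect for pipe-separated data; every attribute is set explicitly.
class PipeDialect(csv.Dialect):
    delimiter = "|"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\r\n"
    quoting = csv.QUOTE_MINIMAL

csv.register_dialect("pipes", PipeDialect)
registered = "pipes" in csv.list_dialects()

# A registered name can be passed wherever a dialect is expected.
rows = list(csv.reader(["a|b|c"], dialect="pipes"))
delimiter = csv.get_dialect("pipes").delimiter

# After unregistering, looking the name up again raises csv.Error.
csv.unregister_dialect("pipes")
try:
    csv.get_dialect("pipes")
    lookup_failed = False
except csv.Error:
    lookup_failed = True
```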


The \module{csv} module defines the following classes:

\begin{classdesc}{DictReader}{csvfile, fieldnames\optional{,
                              restkey=\constant{None}\optional{,
                              restval=\constant{None}\optional{,
                              dialect=\code{'excel'}\optional{,
                              fmtparam}}}}}
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the \var{fieldnames}
parameter.  If the row read has more fields than the \var{fieldnames}
sequence, the remaining data is added as a sequence keyed by the value of
\var{restkey}.  If the row read has fewer fields than the \var{fieldnames}
sequence, the remaining keys take the value of the optional \var{restval}
parameter.  All other parameters are interpreted as for \class{reader}
objects.
\end{classdesc}
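The effect of \var{restkey} and \var{restval} on rows that are too long
or too short can be sketched like this, assuming Python~3.  The field
names and data are invented for the example.

```python
import csv

# One well-formed row, one row with extra fields, one row with a missing field.
lines = ["alice,30", "bob,25,extra1,extra2", "carol"]

reader = csv.DictReader(lines, fieldnames=["name", "age"],
                        restkey="overflow", restval="unknown")

# Convert each row to a plain dict so the results are easy to compare.
records = [dict(row) for row in reader]
```

The surplus fields of the second row are collected in a list under the
\code{'overflow'} key, while the missing \code{'age'} of the third row is
filled in with \code{'unknown'}.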


\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
                              restval=\code{""}\optional{,
                              extrasaction=\code{'raise'}\optional{,
                              dialect=\code{'excel'}\optional{, fmtparam}}}}}
Create an object which operates like a regular writer but maps dictionaries
onto output rows.  The \var{fieldnames} parameter identifies the order in
which values in the dictionary passed to the \method{writerow()} method are
written to the \var{csvfile}.  The optional \var{restval} parameter
specifies the value to be written if the dictionary is missing a key in
\var{fieldnames}.  If the dictionary passed to the \method{writerow()}
method contains a key not found in \var{fieldnames}, the optional
\var{extrasaction} parameter indicates what action to take.  If it is set
to \code{'raise'}, a \exception{ValueError} is raised.  If it is set to
\code{'ignore'}, extra values in the dictionary are ignored.  All other
parameters are interpreted as for \class{writer} objects.
\end{classdesc}
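A short sketch of \var{restval} and both \var{extrasaction} settings
follows.  It assumes Python~3 and writes to in-memory buffers; the
dictionaries are illustrative data only.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"],
                        restval="n/a", extrasaction="ignore")
writer.writerow({"name": "alice", "age": 30})
writer.writerow({"name": "bob"})  # missing key: restval is written
writer.writerow({"name": "carol", "age": 41, "city": "Oslo"})  # extra key dropped
output = buffer.getvalue()

# With the default extrasaction='raise', an unknown key is an error.
strict = csv.DictWriter(io.StringIO(), fieldnames=["name", "age"])
try:
    strict.writerow({"name": "dave", "city": "Rome"})
    raised = False
except ValueError:
    raised = True
```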

\begin{classdesc*}{Dialect}{}
The \class{Dialect} class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
\class{reader} or \class{writer} instance.
\end{classdesc*}

\begin{classdesc}{Sniffer}{}
The \class{Sniffer} class is used to deduce the format of a CSV file.
\end{classdesc}

The \class{Sniffer} class provides two methods:

\begin{methoddesc}{sniff}{sample\optional{, delimiters=None}}
Analyze the given \var{sample} and return a \class{Dialect} subclass
reflecting the parameters found.  If the optional \var{delimiters} parameter
is given, it is interpreted as a string containing possible valid delimiter
characters.
\end{methoddesc}

\begin{methoddesc}{has_header}{sample}
Analyze the sample text (presumed to be in CSV format) and return
\constant{True} if the first row appears to be a series of column
headers.
\end{methoddesc}
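One way to combine the two methods is sketched below, assuming Python~3.
The sample text is invented, and since both methods rely on heuristics,
results on real-world data may differ.

```python
import csv

# A small, regular sample; real input would usually be the first few KB of a file.
sample = "name;age;city\nalice;30;paris\nbob;25;rome\n"

sniffer = csv.Sniffer()

# Restrict the candidate delimiters to semicolon and comma.
dialect = sniffer.sniff(sample, delimiters=";,")

# The returned Dialect subclass can be handed straight to a reader.
rows = list(csv.reader(sample.splitlines(), dialect))

# has_header() is a heuristic, so treat its answer as a hint, not a guarantee.
looks_like_header = sniffer.has_header(sample)
```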


The \module{csv} module defines the following constants:

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to quote only those fields which contain
the current \var{delimiter} or begin with the current \var{quotechar}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric fields.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields.  When the current
\var{delimiter} occurs in output data it is preceded by the current
\var{escapechar} character.  When \constant{QUOTE_NONE} is in effect, it
is an error not to have a single-character \var{escapechar} defined, even if
no data to be written contains the \var{delimiter} character.
\end{datadesc}
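The difference between the four constants is easiest to see by writing
the same row under each of them.  The sketch below assumes Python~3;
\function{render} is a hypothetical helper defined only for this example.

```python
import csv
import io

def render(row, **fmtparams):
    """Write one row to an in-memory buffer and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, **fmtparams).writerow(row)
    return buffer.getvalue()

# A string, a number, and a string containing the delimiter.
row = ["a", 1, "b,c"]

quote_all = render(row, quoting=csv.QUOTE_ALL)
quote_minimal = render(row, quoting=csv.QUOTE_MINIMAL)
quote_nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)
# QUOTE_NONE needs an escapechar to protect the embedded delimiter.
quote_none = render(row, quoting=csv.QUOTE_NONE, escapechar="\\")
```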


The \module{csv} module defines the following exception:

\begin{excdesc}{Error}
Raised by any of the functions when an error is detected.
\end{excdesc}


\subsection{Dialects and Formatting Parameters\label{csv-fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects.  A
dialect is a subclass of the \class{Dialect} class having a set of specific
attributes and a single \method{validate()} method.  When creating
\class{reader} or \class{writer} objects, the programmer can specify a
string or a subclass of the \class{Dialect} class as the \var{dialect}
parameter.  In addition to, or
instead of, the \var{dialect} parameter, the programmer can also specify
individual formatting parameters, which have the same names as the
attributes defined below for the \class{Dialect} class.

Dialects support the following attributes:

\begin{memberdesc}[Dialect]{delimiter}
A one-character string used to separate fields.  It defaults to \code{','}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should
themselves be quoted.  When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} must be a one-character string
which is used as a prefix to the \var{quotechar}.  It defaults to
\constant{True}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{escapechar}
A one-character string used to escape the \var{delimiter} if \var{quoting}
is set to \constant{QUOTE_NONE}.  It defaults to \constant{None}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{lineterminator}
The string used to terminate lines in the CSV file.  It defaults to
\code{'\e r\e n'}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{quotechar}
A one-character string used to quote elements containing the \var{delimiter}
or which start with the \var{quotechar}.  It defaults to \code{'"'}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{quoting}
Controls when quotes should be generated by the writer.  It can take on any
of the \constant{QUOTE_*} constants (see section~\ref{csv-contents})
and defaults to \constant{QUOTE_MINIMAL}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{skipinitialspace}
When \constant{True}, whitespace immediately following the \var{delimiter}
is ignored.  The default is \constant{False}.
\end{memberdesc}
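Several of these attributes can be exercised by passing them as keyword
formatting parameters instead of defining a \class{Dialect} subclass.  The
following is a sketch assuming Python~3, with made-up input and output
data.

```python
import csv
import io

# Reading: formatting parameters given by keyword, without a Dialect subclass.
line = "alice| 'x|y'| 3"
fields = next(csv.reader([line], delimiter="|", quotechar="'",
                         skipinitialspace=True))

# Writing: with doublequote left at True, an embedded quotechar is doubled.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(['say "hi"', "plain"])
written = buffer.getvalue()
```

In the reading half, \var{skipinitialspace} discards the blank after each
delimiter, so the quoted field is still recognised as quoted and keeps its
embedded \code{'|'}.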


\subsection{Reader Objects}

Reader objects (\class{DictReader} instances and objects returned by
the \function{reader()} function) have the following public method:

\begin{methoddesc}[csv reader]{next}{}
Return the next row of the reader's iterable object as a list, parsed
according to the current dialect.
\end{methoddesc}
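Fetching a row by hand is mostly useful for treating the first row
separately from the rest.  The sketch below assumes Python~3, where the
built-in \function{next()} function is used instead of calling the
\method{next} method directly as in the release documented here; the data
is illustrative.

```python
import csv

lines = ["name,age", "alice,30", "bob,25"]
reader = csv.reader(lines)

# Pull the first row by hand, e.g. to treat it as a header...
header = next(reader)

# ...then let ordinary iteration consume the remaining rows.
body = [row for row in reader]
```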


\subsection{Writer Objects}

Writer objects (\class{DictWriter} instances and objects returned by
the \function{writer()} function) have the following public methods:

\begin{methoddesc}[csv writer]{writerow}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}[csv writer]{writerows}{rows}
Write each row in the \var{rows} parameter to the writer's file object,
formatted according to the current dialect.
\end{methoddesc}
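The two methods can be mixed freely on the same writer, as in this sketch.
It assumes Python~3 and an in-memory buffer; the rows are example data.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)

# writerow() takes a single row...
writer.writerow(["name", "age"])

# ...while writerows() takes an iterable of rows and writes each in turn.
writer.writerows([["alice", 30], ["bob", 25]])

output = buffer.getvalue()
```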


\subsection{Examples}

The ``Hello, world'' of csv reading is

\begin{verbatim}
import csv
# Open in binary mode, as required for file objects passed to csv.reader().
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row
\end{verbatim}

The corresponding simplest possible writing example is

\begin{verbatim}
import csv
# someiterable stands for any iterable of row sequences.
writer = csv.writer(open("some.csv", "wb"))
for row in someiterable:
    writer.writerow(row)
\end{verbatim}