\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}
\sectionauthor{Skip Montanaro}{skip@pobox.com}

\versionadded{2.3}
\index{csv}
\indexii{data}{tabular}

The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases. There is no ``CSV
standard'', so the format is operationally defined by the many applications
which read and write it. The lack of a standard means that subtle
differences often exist in the data produced and consumed by different
applications. These differences can make it annoying to process CSV files
from multiple sources. Still, while the delimiters and quoting characters
vary, the overall format is similar enough that it is possible to write a
single module which can efficiently manipulate such data, hiding the details
of reading and writing the data from the programmer.

The \module{csv} module implements classes to read and write tabular data in
CSV format. It allows programmers to say, ``write this data in the format
preferred by Excel,'' or ``read data from this file which was generated by
Excel,'' without knowing the precise details of the CSV format used by
Excel. Programmers can also describe the CSV formats understood by other
applications or define their own special-purpose CSV formats.

The \module{csv} module's \class{reader} and \class{writer} objects read and
write sequences. Programmers can also read and write data in dictionary
form using the \class{DictReader} and \class{DictWriter} classes.

\begin{notice}
  This version of the \module{csv} module doesn't support Unicode
  input. Also, there are currently some issues regarding \ASCII{} NUL
  characters. Accordingly, all input should generally be printable
  \ASCII{} to be safe. These restrictions will be removed in the future.
\end{notice}

\begin{seealso}
% \seemodule{array}{Arrays of uniformly typed numeric values.}
  \seepep{305}{CSV File API}
         {The Python Enhancement Proposal which proposed this addition
          to Python.}
\end{seealso}


\subsection{Module Contents \label{csv-contents}}

The \module{csv} module defines the following functions:

\begin{funcdesc}{reader}{csvfile\optional{,
                         dialect=\code{'excel'}\optional{, fmtparam}}}
Return a reader object which will iterate over lines in the given
{}\var{csvfile}. \var{csvfile} can be any object which supports the
iterator protocol and returns a string each time its \method{next}
method is called. If \var{csvfile} is a file object, it must be opened with
the 'b' flag on platforms where that makes a difference. An optional
{}\var{dialect} parameter can be given
which is used to define a set of parameters specific to a particular CSV
dialect. It may be an instance of a subclass of the \class{Dialect}
class or one of the strings returned by the \function{list_dialects}
function. The other optional {}\var{fmtparam} keyword arguments can be
given to override individual formatting parameters in the current
dialect. For details of the dialect and formatting parameters, see
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''.

All data read are returned as strings. No automatic data type
conversion is performed.
\end{funcdesc}
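As a rough illustration of the description above, the following sketch
passes a list of strings to \function{reader()} in place of an open file
and overrides the \var{delimiter} formatting parameter of the
\code{'excel'} dialect. The sample lines are made up for this example.

\begin{verbatim}
import csv

# Any iterable that yields one string per line will do; a list of
# made-up lines is used here instead of an open file.
lines = ['id;comment', '1;"eggs; with a semicolon"']

# delimiter is a formatting parameter that overrides the dialect's value.
rows = list(csv.reader(lines, dialect='excel', delimiter=';'))

# rows == [['id', 'comment'], ['1', 'eggs; with a semicolon']]
# The field '1' stays a string; no type conversion takes place.
\end{verbatim}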

\begin{funcdesc}{writer}{csvfile\optional{,
                         dialect=\code{'excel'}\optional{, fmtparam}}}
Return a writer object responsible for converting the user's data into
delimited strings on the given file-like object. \var{csvfile} can be any
object with a \function{write} method. If \var{csvfile} is a file object,
it must be opened with the 'b' flag on platforms where that makes a
difference. An optional
{}\var{dialect} parameter can be given which is used to define a set of
parameters specific to a particular CSV dialect. It may be an instance
of a subclass of the \class{Dialect} class or one of the strings
returned by the \function{list_dialects} function. The other optional
{}\var{fmtparam} keyword arguments can be given to override individual
formatting parameters in the current dialect. For details of the
dialect and formatting parameters, see
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''.
To make it as easy as possible to
interface with modules which implement the DB API, the value
\constant{None} is written as the empty string. While this isn't a
reversible transformation, it makes it easier to dump SQL NULL data values
to CSV files without preprocessing the data returned from a
\code{cursor.fetch*()} call. All other non-string data are stringified
with \function{str()} before being written.
\end{funcdesc}
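The behaviour described above can be sketched as follows. The
\class{LineCollector} class is a hypothetical helper written for this
example only; it stands in for a file by providing nothing more than a
\method{write} method.

\begin{verbatim}
import csv

class LineCollector:
    """Hypothetical stand-in for a file: it only has a write method."""
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)

out = LineCollector()
writer = csv.writer(out, dialect='excel')

# None becomes the empty string, 42 is passed through str(), and the
# field containing the delimiter is quoted.
writer.writerow(['spam', None, 42, 'a,b'])

# ''.join(out.chunks) == 'spam,,42,"a,b"\r\n'
\end{verbatim}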

\begin{funcdesc}{register_dialect}{name, dialect}
Associate \var{dialect} with \var{name}. \var{dialect} must be a subclass
of \class{csv.Dialect}. \var{name} must be a string or Unicode object.
\end{funcdesc}

\begin{funcdesc}{unregister_dialect}{name}
Delete the dialect associated with \var{name} from the dialect registry. An
\exception{Error} is raised if \var{name} is not a registered dialect
name.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}. An \exception{Error} is
raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}
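The four registry functions might be used together as in the sketch
below. The dialect name \code{'pipes'} and the \class{Pipes} class are
invented for the example, and the sketch assumes that the module's
predefined \class{excel} dialect class can serve as a base class.

\begin{verbatim}
import csv

class Pipes(csv.excel):
    """Hypothetical dialect: like 'excel', but pipe-delimited."""
    delimiter = '|'

csv.register_dialect('pipes', Pipes)
registered = 'pipes' in csv.list_dialects()    # True

# A registered name can be used wherever a dialect is expected.
rows = list(csv.reader(['a|b|c'], dialect='pipes'))
# rows == [['a', 'b', 'c']]

csv.unregister_dialect('pipes')
try:
    csv.get_dialect('pipes')
    lookup_failed = False
except csv.Error:
    # The name is no longer in the registry.
    lookup_failed = True
\end{verbatim}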


The \module{csv} module defines the following classes:

\begin{classdesc}{DictReader}{csvfile\optional{,
                              fieldnames=\constant{None}\optional{,
                              restkey=\constant{None}\optional{,
                              restval=\constant{None}\optional{,
                              dialect=\code{'excel'}\optional{,
                              *args, **kwds}}}}}}
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
{}\var{fieldnames}
parameter. If the \var{fieldnames} parameter is omitted, the values in
the first row of the \var{csvfile} will be used as the fieldnames.
If the row read has more fields than the fieldnames sequence, the
remaining data is added as a sequence keyed by the value of
\var{restkey}. If the row read has fewer fields than the fieldnames
sequence, the remaining keys take the value of the optional
\var{restval} parameter. Any other optional or
keyword arguments are passed to the underlying \class{reader} instance.
\end{classdesc}
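A minimal sketch of how \var{restkey} and \var{restval} behave, using
made-up input lines and field names. A list of strings again stands in
for a file.

\begin{verbatim}
import csv

# The first row has one field too many, the second one too few.
lines = ['1,2,3,4', '5,6']

reader = csv.DictReader(lines, fieldnames=['a', 'b', 'c'],
                        restkey='extra', restval='n/a')
records = [dict(row) for row in reader]

# records[0] == {'a': '1', 'b': '2', 'c': '3', 'extra': ['4']}
# records[1] == {'a': '5', 'b': '6', 'c': 'n/a'}
\end{verbatim}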


\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
                              restval=""\optional{,
                              extrasaction=\code{'raise'}\optional{,
                              dialect=\code{'excel'}\optional{,
                              *args, **kwds}}}}}
Create an object which operates like a regular writer but maps dictionaries
onto output rows. The \var{fieldnames} parameter identifies the order in
which values in the dictionary passed to the \method{writerow()} method are
written to the \var{csvfile}. The optional \var{restval} parameter
specifies the value to be written if the dictionary is missing a key in
\var{fieldnames}. If the dictionary passed to the \method{writerow()}
method contains a key not found in \var{fieldnames}, the optional
\var{extrasaction} parameter indicates what action to take. If it is set
to \code{'raise'}, a \exception{ValueError} is raised. If it is set to
\code{'ignore'}, extra values in the dictionary are ignored. Any other
optional or keyword arguments are passed to the underlying \class{writer}
instance.

Note that unlike the \class{DictReader} class, the \var{fieldnames}
parameter of the \class{DictWriter} is not optional. Since Python's
\class{dict} objects are not ordered, there is not enough information
available to deduce the order in which the row should be written to the
\var{csvfile}.

\end{classdesc}
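The effect of \var{restval} and \var{extrasaction} can be sketched as
follows. \class{LineCollector} is a hypothetical helper standing in for
a file, and the field names and data are invented for the example.

\begin{verbatim}
import csv

class LineCollector:
    """Hypothetical stand-in for a file: it only has a write method."""
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)

out = LineCollector()
writer = csv.DictWriter(out, fieldnames=['id', 'name'],
                        restval='?', extrasaction='ignore')
writer.writerow({'id': 1, 'name': 'spam', 'colour': 'pink'})  # extra key dropped
writer.writerow({'id': 2})                                    # 'name' -> restval

# ''.join(out.chunks) == '1,spam\r\n2,?\r\n'

# With the default extrasaction of 'raise', an unknown key is an error.
strict = csv.DictWriter(out, fieldnames=['id', 'name'])
try:
    strict.writerow({'id': 3, 'colour': 'pink'})
    raised = False
except ValueError:
    raised = True
\end{verbatim}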

\begin{classdesc*}{Dialect}{}
The \class{Dialect} class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
\class{reader} or \class{writer} instance.
\end{classdesc*}

\begin{classdesc}{Sniffer}{}
The \class{Sniffer} class is used to deduce the format of a CSV file.
\end{classdesc}

The \class{Sniffer} class provides two methods:

\begin{methoddesc}{sniff}{sample\optional{,delimiters=None}}
Analyze the given \var{sample} and return a \class{Dialect} subclass
reflecting the parameters found. If the optional \var{delimiters} parameter
is given, it is interpreted as a string containing possible valid delimiter
characters.
\end{methoddesc}

\begin{methoddesc}{has_header}{sample}
Analyze the sample text (presumed to be in CSV format) and return
\constant{True} if the first row appears to be a series of column
headers.
\end{methoddesc}
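The two methods might be combined as in the sketch below. The sample
text is made up, and since both methods rely on heuristics, the results
for other data may differ from what one expects.

\begin{verbatim}
import csv

# Made-up sample; in practice this would be the first part of a file.
sample = 'product;qty\nspam;1\neggs;2\n'

sniffer = csv.Sniffer()
# Restrict the candidate delimiters to ';' and ','.
dialect = sniffer.sniff(sample, delimiters=';,')   # a Dialect subclass
looks_like_header = sniffer.has_header(sample)     # heuristic guess

# The sniffed dialect can be handed straight to reader().
rows = list(csv.reader(sample.splitlines(), dialect))
\end{verbatim}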


The \module{csv} module defines the following constants:

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to only quote those fields which contain
the current \var{delimiter} or begin with the current \var{quotechar}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric fields.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields. When the current
\var{delimiter} occurs in output data it is preceded by the current
\var{escapechar} character. When \constant{QUOTE_NONE} is in effect, it
is an error not to have a single-character \var{escapechar} defined, even if
no data to be written contains the \var{delimiter} character.
\end{datadesc}
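To compare the four constants, the sketch below writes the same made-up
row once under each setting. \class{LineCollector} and
\function{render()} are hypothetical helpers defined only for this
example.

\begin{verbatim}
import csv

class LineCollector:
    """Hypothetical stand-in for a file: it only has a write method."""
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)

def render(row, **fmtparams):
    """Write one row with the given formatting parameters; return the text."""
    out = LineCollector()
    csv.writer(out, **fmtparams).writerow(row)
    return ''.join(out.chunks)

row = ['spam', 7, 'a,b']
minimal = render(row, quoting=csv.QUOTE_MINIMAL)
quote_all = render(row, quoting=csv.QUOTE_ALL)
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)
# QUOTE_NONE needs an escapechar to protect the embedded delimiter.
unquoted = render(row, quoting=csv.QUOTE_NONE, escapechar='\\')

# minimal    == 'spam,7,"a,b"\r\n'
# quote_all  == '"spam","7","a,b"\r\n'
# nonnumeric == '"spam",7,"a,b"\r\n'
# unquoted   == 'spam,7,a\\,b\r\n'
\end{verbatim}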


The \module{csv} module defines the following exception:

\begin{excdesc}{Error}
Raised by any of the functions when an error is detected.
\end{excdesc}


\subsection{Dialects and Formatting Parameters\label{csv-fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects. A
dialect is a subclass of the \class{Dialect} class having a set of specific
attributes and a single \method{validate()} method. When creating \class{reader}
or \class{writer} objects, the programmer can specify a string or a subclass
of the \class{Dialect} class as the dialect parameter. In addition to, or
instead of, the \var{dialect} parameter, the programmer can also specify
individual formatting parameters, which have the same names as the
attributes defined below for the \class{Dialect} class.

Dialects support the following attributes:

\begin{memberdesc}[Dialect]{delimiter}
A one-character string used to separate fields. It defaults to \code{','}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should
themselves be quoted. When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} must be a one-character string
which is used as a prefix to the \var{quotechar}. It defaults to
\constant{True}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{escapechar}
A one-character string used to escape the \var{delimiter} if \var{quoting}
is set to \constant{QUOTE_NONE}. It defaults to \constant{None}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{lineterminator}
The string used to terminate lines in the CSV file. It defaults to
\code{'\e r\e n'}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{quotechar}
A one-character string used to quote elements containing the \var{delimiter}
or which start with the \var{quotechar}. It defaults to \code{'"'}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{quoting}
Controls when quotes should be generated by the writer. It can take on any
of the \constant{QUOTE_*} constants (see section~\ref{csv-contents})
and defaults to \constant{QUOTE_MINIMAL}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{skipinitialspace}
When \constant{True}, whitespace immediately following the \var{delimiter}
is ignored. The default is \constant{False}.
\end{memberdesc}
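As an illustration of how these attributes fit together, the sketch
below defines a dialect by subclassing \class{Dialect} and reads one
made-up line with it. The class name \class{Colons} and every attribute
value are chosen for the example only.

\begin{verbatim}
import csv

class Colons(csv.Dialect):
    """Hypothetical dialect used only for this illustration."""
    delimiter = ':'
    quotechar = "'"
    doublequote = False        # embedded quotes are escaped, not doubled
    escapechar = '\\'
    lineterminator = '\n'
    quoting = csv.QUOTE_MINIMAL
    skipinitialspace = True    # ignore the space after each delimiter

# The input line is:  root: 0: 'it\'s quoted'
rows = list(csv.reader(["root: 0: 'it\\'s quoted'"], dialect=Colons))

# rows == [['root', '0', "it's quoted"]]
\end{verbatim}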
275
Skip Montanarob4a04172003-03-20 23:29:12 +0000276
277\subsection{Reader Objects}
278
Fred Drake96352682003-04-25 18:02:34 +0000279Reader objects (\class{DictReader} instances and objects returned by
Raymond Hettinger6f6d7b932003-08-31 05:44:54 +0000280the \function{reader()} function) have the following public methods:
Skip Montanarob4a04172003-03-20 23:29:12 +0000281
Fred Drake96352682003-04-25 18:02:34 +0000282\begin{methoddesc}[csv reader]{next}{}
Skip Montanarob4a04172003-03-20 23:29:12 +0000283Return the next row of the reader's iterable object as a list, parsed
284according to the current dialect.
285\end{methoddesc}


\subsection{Writer Objects}

\class{Writer} objects (\class{DictWriter} instances and objects returned by
the \function{writer()} function) have the following public methods. A
{}\var{row} must be a sequence of strings or numbers for \class{Writer}
objects and a dictionary mapping fieldnames to strings or numbers for
{}\class{DictWriter} objects; numbers are passed through
\function{str()} before being written. Note that complex numbers are
written out surrounded by parentheses.
This may cause some problems for other programs which read CSV files
(assuming they support complex numbers at all).

\begin{methoddesc}[csv writer]{writerow}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}[csv writer]{writerows}{rows}
Write all the rows in the \var{rows} parameter (a list of \var{row}
objects as described above) to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}
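The two methods can be sketched together as follows, including the
treatment of complex numbers mentioned above. \class{LineCollector} is
a hypothetical helper standing in for a file, and the data is made up.

\begin{verbatim}
import csv

class LineCollector:
    """Hypothetical stand-in for a file: it only has a write method."""
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)

out = LineCollector()
writer = csv.writer(out)

writer.writerow(['n', 'square'])                       # a single row
writer.writerows([[n, n * n] for n in range(1, 4)])    # several rows at once
writer.writerow([1+2j])                                # str() adds parentheses

# ''.join(out.chunks) ==
#     'n,square\r\n1,1\r\n2,4\r\n3,9\r\n(1+2j)\r\n'
\end{verbatim}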


\subsection{Examples}

The ``Hello, world'' of csv reading is

\begin{verbatim}
import csv
# Open in binary mode, as required for file objects.
reader = csv.reader(file("some.csv", "rb"))
for row in reader:
    print row
\end{verbatim}

The corresponding simplest possible writing example is

\begin{verbatim}
import csv
writer = csv.writer(file("some.csv", "wb"))
# someiterable stands for any iterable of rows.
for row in someiterable:
    writer.writerow(row)
\end{verbatim}