blob: 9f9449b43c5d5dfdd9ef113402e094e3849c67dc [file] [log] [blame]
Skip Montanarob4a04172003-03-20 23:29:12 +00001\section{\module{csv} --- CSV File Reading and Writing}
2
3\declaremodule{standard}{csv}
4\modulesynopsis{Write and read tabular data to and from delimited files.}
Skip Montanaro3bd3c842003-04-24 18:47:31 +00005\sectionauthor{Skip Montanaro}{skip@pobox.com}
Skip Montanarob4a04172003-03-20 23:29:12 +00006
7\versionadded{2.3}
8\index{csv}
9\indexii{data}{tabular}
10
11The so-called CSV (Comma Separated Values) format is the most common import
12and export format for spreadsheets and databases. There is no ``CSV
13standard'', so the format is operationally defined by the many applications
14which read and write it. The lack of a standard means that subtle
15differences often exist in the data produced and consumed by different
16applications. These differences can make it annoying to process CSV files
17from multiple sources. Still, while the delimiters and quoting characters
18vary, the overall format is similar enough that it is possible to write a
19single module which can efficiently manipulate such data, hiding the details
20of reading and writing the data from the programmer.
21
Skip Montanaro5d0136e2003-04-25 15:14:49 +000022The \module{csv} module implements classes to read and write tabular data in
Skip Montanarob4a04172003-03-20 23:29:12 +000023CSV format. It allows programmers to say, ``write this data in the format
24preferred by Excel,'' or ``read data from this file which was generated by
25Excel,'' without knowing the precise details of the CSV format used by
26Excel. Programmers can also describe the CSV formats understood by other
27applications or define their own special-purpose CSV formats.
28
Skip Montanaro5d0136e2003-04-25 15:14:49 +000029The \module{csv} module's \class{reader} and \class{writer} objects read and
Skip Montanarob4a04172003-03-20 23:29:12 +000030write sequences. Programmers can also read and write data in dictionary
31form using the \class{DictReader} and \class{DictWriter} classes.
32
Fred Drake96352682003-04-25 18:02:34 +000033\begin{notice}
34 This version of the \module{csv} module doesn't support Unicode
35 input. Also, there are currently some issues regarding \ASCII{} NUL
36 characters. Accordingly, all input should generally be printable
37 \ASCII{} to be safe. These restrictions will be removed in the future.
38\end{notice}
Skip Montanarob4a04172003-03-20 23:29:12 +000039
40\begin{seealso}
41% \seemodule{array}{Arrays of uniformly types numeric values.}
42 \seepep{305}{CSV File API}
43 {The Python Enhancement Proposal which proposed this addition
44 to Python.}
45\end{seealso}
46
47
Raymond Hettinger6f6d7b932003-08-31 05:44:54 +000048\subsection{Module Contents \label{csv-contents}}
Skip Montanarob4a04172003-03-20 23:29:12 +000049
Skip Montanaro5d0136e2003-04-25 15:14:49 +000050The \module{csv} module defines the following functions:
Skip Montanarob4a04172003-03-20 23:29:12 +000051
52\begin{funcdesc}{reader}{csvfile\optional{,
53 dialect=\code{'excel'}\optional{, fmtparam}}}
54Return a reader object which will iterate over lines in the given
55{}\var{csvfile}. \var{csvfile} can be any object which supports the
56iterator protocol and returns a string each time its \method{next}
Skip Montanaro5e4e39f2003-07-02 15:32:48 +000057method is called. If \var{csvfile} is a file object, it must be opened with
58the 'b' flag on platforms where that makes a difference. An optional
59{}\var{dialect} parameter can be given
Skip Montanarob4a04172003-03-20 23:29:12 +000060which is used to define a set of parameters specific to a particular CSV
61dialect. It may be an instance of a subclass of the \class{Dialect}
62class or one of the strings returned by the \function{list_dialects}
63function. The other optional {}\var{fmtparam} keyword arguments can be
64given to override individual formatting parameters in the current
65dialect. For more information about the dialect and formatting
Raymond Hettinger6e380cd2003-09-10 18:54:49 +000066parameters, see section~\ref{csv-fmt-params}, ``Dialects and Formatting
Skip Montanarob4a04172003-03-20 23:29:12 +000067Parameters'' for details of these parameters.
68
69All data read are returned as strings. No automatic data type
70conversion is performed.
71\end{funcdesc}
72
73\begin{funcdesc}{writer}{csvfile\optional{,
74 dialect=\code{'excel'}\optional{, fmtparam}}}
75Return a writer object responsible for converting the user's data into
Skip Montanaro5e4e39f2003-07-02 15:32:48 +000076delimited strings on the given file-like object. \var{csvfile} can be any
77object with a \function{write} method. If \var{csvfile} is a file object,
78it must be opened with the 'b' flag on platforms where that makes a
79difference. An optional
Skip Montanarob4a04172003-03-20 23:29:12 +000080{}\var{dialect} parameter can be given which is used to define a set of
81parameters specific to a particular CSV dialect. It may be an instance
82of a subclass of the \class{Dialect} class or one of the strings
83returned by the \function{list_dialects} function. The other optional
84{}\var{fmtparam} keyword arguments can be given to override individual
85formatting parameters in the current dialect. For more information
86about the dialect and formatting parameters, see
Raymond Hettinger6e380cd2003-09-10 18:54:49 +000087section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters'' for
Skip Montanarob4a04172003-03-20 23:29:12 +000088details of these parameters. To make it as easy as possible to
89interface with modules which implement the DB API, the value
90\constant{None} is written as the empty string. While this isn't a
91reversible transformation, it makes it easier to dump SQL NULL data values
92to CSV files without preprocessing the data returned from a
93\code{cursor.fetch*()} call. All other non-string data are stringified
94with \function{str()} before being written.
95\end{funcdesc}
96
97\begin{funcdesc}{register_dialect}{name, dialect}
98Associate \var{dialect} with \var{name}. \var{dialect} must be a subclass
99of \class{csv.Dialect}. \var{name} must be a string or Unicode object.
100\end{funcdesc}
101
102\begin{funcdesc}{unregister_dialect}{name}
103Delete the dialect associated with \var{name} from the dialect registry. An
104\exception{Error} is raised if \var{name} is not a registered dialect
105name.
106\end{funcdesc}
107
108\begin{funcdesc}{get_dialect}{name}
109Return the dialect associated with \var{name}. An \exception{Error} is
110raised if \var{name} is not a registered dialect name.
111\end{funcdesc}
112
113\begin{funcdesc}{list_dialects}{}
114Return the names of all registered dialects.
115\end{funcdesc}
116
117
Skip Montanaro5d0136e2003-04-25 15:14:49 +0000118The \module{csv} module defines the following classes:
Skip Montanarob4a04172003-03-20 23:29:12 +0000119
Skip Montanarodffeed32003-10-03 14:03:01 +0000120\begin{classdesc}{DictReader}{csvfile\optional{,
121 fieldnames=\constant{None},\optional{,
Fred Drake96352682003-04-25 18:02:34 +0000122 restkey=\constant{None}\optional{,
123 restval=\constant{None}\optional{,
Skip Montanarob4a04172003-03-20 23:29:12 +0000124 dialect=\code{'excel'}\optional{,
Skip Montanaro10659f22004-04-16 03:21:01 +0000125 *args, **kwds}}}}}}
Skip Montanarob4a04172003-03-20 23:29:12 +0000126Create an object which operates like a regular reader but maps the
Skip Montanarodffeed32003-10-03 14:03:01 +0000127information read into a dict whose keys are given by the optional
128{} \var{fieldnames}
129parameter. If the \var{fieldnames} parameter is omitted, the values in
130the first row of the \var{csvfile} will be used as the fieldnames.
131If the row read has fewer fields than the fieldnames sequence,
Skip Montanarob4a04172003-03-20 23:29:12 +0000132the value of \var{restval} will be used as the default value. If the row
133read has more fields than the fieldnames sequence, the remaining data is
134added as a sequence keyed by the value of \var{restkey}. If the row read
135has fewer fields than the fieldnames sequence, the remaining keys take the
Skip Montanaro10659f22004-04-16 03:21:01 +0000136value of the optional \var{restval} parameter. Any other optional or
137keyword arguments are passed to the underlying \class{reader} instance.
Skip Montanarob4a04172003-03-20 23:29:12 +0000138\end{classdesc}
139
140
141\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
142 restval=""\optional{,
143 extrasaction=\code{'raise'}\optional{,
Skip Montanaro10659f22004-04-16 03:21:01 +0000144 dialect=\code{'excel'}\optional{,
145 *args, **kwds}}}}}
Skip Montanarob4a04172003-03-20 23:29:12 +0000146Create an object which operates like a regular writer but maps dictionaries
147onto output rows. The \var{fieldnames} parameter identifies the order in
148which values in the dictionary passed to the \method{writerow()} method are
149written to the \var{csvfile}. The optional \var{restval} parameter
150specifies the value to be written if the dictionary is missing a key in
151\var{fieldnames}. If the dictionary passed to the \method{writerow()}
152method contains a key not found in \var{fieldnames}, the optional
153\var{extrasaction} parameter indicates what action to take. If it is set
154to \code{'raise'} a \exception{ValueError} is raised. If it is set to
Skip Montanaro10659f22004-04-16 03:21:01 +0000155\code{'ignore'}, extra values in the dictionary are ignored. Any other
156optional or keyword arguments are passed to the underlying \class{writer}
157instance.
Skip Montanarodffeed32003-10-03 14:03:01 +0000158
159Note that unlike the \class{DictReader} class, the \var{fieldnames}
160parameter of the \class{DictWriter} is not optional. Since Python's
161\class{dict} objects are not ordered, there is not enough information
162available to deduce the order in which the row should be written to the
163\var{csvfile}.
164
Skip Montanarob4a04172003-03-20 23:29:12 +0000165\end{classdesc}
166
Skip Montanarob4a04172003-03-20 23:29:12 +0000167\begin{classdesc*}{Dialect}{}
168The \class{Dialect} class is a container class relied on primarily for its
169attributes, which are used to define the parameters for a specific
Fred Drake96352682003-04-25 18:02:34 +0000170\class{reader} or \class{writer} instance.
Skip Montanarob4a04172003-03-20 23:29:12 +0000171\end{classdesc*}
172
Skip Montanarobb0c9dc2005-01-05 06:58:15 +0000173\begin{classdesc}{excel}{}
174The \class{excel} class defines the usual properties of an Excel-generated
175CSV file.
176\end{classdesc}
177
178\begin{classdesc}{excel_tab}{}
179The \class{excel_tab} class defines the usual properties of an
180Excel-generated TAB-delimited file.
181\end{classdesc}
182
Skip Montanaro77892372003-05-19 15:33:36 +0000183\begin{classdesc}{Sniffer}{}
184The \class{Sniffer} class is used to deduce the format of a CSV file.
Fred Drake96352682003-04-25 18:02:34 +0000185\end{classdesc}
186
187The \class{Sniffer} class provides a single method:
188
Skip Montanaro77892372003-05-19 15:33:36 +0000189\begin{methoddesc}{sniff}{sample\optional{,delimiters=None}}
190Analyze the given \var{sample} and return a \class{Dialect} subclass
191reflecting the parameters found. If the optional \var{delimiters} parameter
192is given, it is interpreted as a string containing possible valid delimiter
193characters.
Fred Drake96352682003-04-25 18:02:34 +0000194\end{methoddesc}
195
196\begin{methoddesc}{has_header}{sample}
197Analyze the sample text (presumed to be in CSV format) and return
198\constant{True} if the first row appears to be a series of column
199headers.
200\end{methoddesc}
201
202
Skip Montanarob4a04172003-03-20 23:29:12 +0000203The \module{csv} module defines the following constants:
204
Skip Montanaroa1045562003-06-04 15:30:13 +0000205\begin{datadesc}{QUOTE_ALL}
Skip Montanarob4a04172003-03-20 23:29:12 +0000206Instructs \class{writer} objects to quote all fields.
207\end{datadesc}
208
209\begin{datadesc}{QUOTE_MINIMAL}
210Instructs \class{writer} objects to only quote those fields which contain
211the current \var{delimiter} or begin with the current \var{quotechar}.
212\end{datadesc}
213
214\begin{datadesc}{QUOTE_NONNUMERIC}
215Instructs \class{writer} objects to quote all non-numeric fields.
216\end{datadesc}
217
218\begin{datadesc}{QUOTE_NONE}
219Instructs \class{writer} objects to never quote fields. When the current
220\var{delimiter} occurs in output data it is preceded by the current
221\var{escapechar} character. When \constant{QUOTE_NONE} is in effect, it
222is an error not to have a single-character \var{escapechar} defined, even if
223no data to be written contains the \var{delimiter} character.
224\end{datadesc}
225
226
227The \module{csv} module defines the following exception:
228
229\begin{excdesc}{Error}
230Raised by any of the functions when an error is detected.
231\end{excdesc}
232
233
Fred Drake96352682003-04-25 18:02:34 +0000234\subsection{Dialects and Formatting Parameters\label{csv-fmt-params}}
Skip Montanarob4a04172003-03-20 23:29:12 +0000235
236To make it easier to specify the format of input and output records,
237specific formatting parameters are grouped together into dialects. A
238dialect is a subclass of the \class{Dialect} class having a set of specific
239methods and a single \method{validate()} method. When creating \class{reader}
240or \class{writer} objects, the programmer can specify a string or a subclass
241of the \class{Dialect} class as the dialect parameter. In addition to, or
242instead of, the \var{dialect} parameter, the programmer can also specify
243individual formatting parameters, which have the same names as the
Raymond Hettinger6f6d7b932003-08-31 05:44:54 +0000244attributes defined below for the \class{Dialect} class.
Skip Montanarob4a04172003-03-20 23:29:12 +0000245
Fred Drake96352682003-04-25 18:02:34 +0000246Dialects support the following attributes:
247
248\begin{memberdesc}[Dialect]{delimiter}
249A one-character string used to separate fields. It defaults to \code{','}.
250\end{memberdesc}
251
252\begin{memberdesc}[Dialect]{doublequote}
253Controls how instances of \var{quotechar} appearing inside a field should be
Skip Montanaro78951462004-01-21 13:34:35 +0000254themselves be quoted. When \constant{True}, the character is doubled.
Fred Drake96352682003-04-25 18:02:34 +0000255When \constant{False}, the \var{escapechar} must be a one-character string
256which is used as a prefix to the \var{quotechar}. It defaults to
257\constant{True}.
258\end{memberdesc}
259
260\begin{memberdesc}[Dialect]{escapechar}
261A one-character string used to escape the \var{delimiter} if \var{quoting}
262is set to \constant{QUOTE_NONE}. It defaults to \constant{None}.
263\end{memberdesc}
264
265\begin{memberdesc}[Dialect]{lineterminator}
266The string used to terminate lines in the CSV file. It defaults to
267\code{'\e r\e n'}.
268\end{memberdesc}
269
270\begin{memberdesc}[Dialect]{quotechar}
271A one-character string used to quote elements containing the \var{delimiter}
272or which start with the \var{quotechar}. It defaults to \code{'"'}.
273\end{memberdesc}
274
275\begin{memberdesc}[Dialect]{quoting}
276Controls when quotes should be generated by the writer. It can take on any
Raymond Hettinger6f6d7b932003-08-31 05:44:54 +0000277of the \constant{QUOTE_*} constants (see section~\ref{csv-contents})
278and defaults to \constant{QUOTE_MINIMAL}.
Fred Drake96352682003-04-25 18:02:34 +0000279\end{memberdesc}
280
281\begin{memberdesc}[Dialect]{skipinitialspace}
282When \constant{True}, whitespace immediately following the \var{delimiter}
283is ignored. The default is \constant{False}.
284\end{memberdesc}
285
Skip Montanarob4a04172003-03-20 23:29:12 +0000286
287\subsection{Reader Objects}
288
Fred Drake96352682003-04-25 18:02:34 +0000289Reader objects (\class{DictReader} instances and objects returned by
Raymond Hettinger6f6d7b932003-08-31 05:44:54 +0000290the \function{reader()} function) have the following public methods:
Skip Montanarob4a04172003-03-20 23:29:12 +0000291
Fred Drake96352682003-04-25 18:02:34 +0000292\begin{methoddesc}[csv reader]{next}{}
Skip Montanarob4a04172003-03-20 23:29:12 +0000293Return the next row of the reader's iterable object as a list, parsed
294according to the current dialect.
295\end{methoddesc}
296
297
298\subsection{Writer Objects}
299
Skip Montanaroba0485a2004-01-21 13:47:04 +0000300\class{Writer} objects (\class{DictWriter} instances and objects returned by
301the \function{writer()} function) have the following public methods. A
302{}\var{row} must be a sequence of strings or numbers for \class{Writer}
303objects and a dictionary mapping fieldnames to strings or numbers (by
304passing them through \function{str()} first) for {}\class{DictWriter}
305objects. Note that complex numbers are written out surrounded by parens.
306This may cause some problems for other programs which read CSV files
307(assuming they support complex numbers at all).
Skip Montanarob4a04172003-03-20 23:29:12 +0000308
Fred Drake96352682003-04-25 18:02:34 +0000309\begin{methoddesc}[csv writer]{writerow}{row}
Skip Montanarob4a04172003-03-20 23:29:12 +0000310Write the \var{row} parameter to the writer's file object, formatted
311according to the current dialect.
312\end{methoddesc}
313
Fred Drake96352682003-04-25 18:02:34 +0000314\begin{methoddesc}[csv writer]{writerows}{rows}
Skip Montanaroba0485a2004-01-21 13:47:04 +0000315Write all the \var{rows} parameters (a list of \var{row} objects as
316described above) to the writer's file object, formatted
Skip Montanarob4a04172003-03-20 23:29:12 +0000317according to the current dialect.
318\end{methoddesc}
319
320
321\subsection{Examples}
322
323The ``Hello, world'' of csv reading is
324
325\begin{verbatim}
Fred Drake96352682003-04-25 18:02:34 +0000326import csv
Andrew M. Kuchling6f937b12004-08-07 15:11:24 +0000327reader = csv.reader(open("some.csv", "rb"))
Fred Drake96352682003-04-25 18:02:34 +0000328for row in reader:
329 print row
Skip Montanarob4a04172003-03-20 23:29:12 +0000330\end{verbatim}
331
Skip Montanaro2b2795a2004-07-08 19:49:10 +0000332To print just the first and last columns of each row try
333
334\begin{verbatim}
335import csv
Andrew M. Kuchling6f937b12004-08-07 15:11:24 +0000336reader = csv.reader(open("some.csv", "rb"))
Skip Montanaro2b2795a2004-07-08 19:49:10 +0000337for row in reader:
338 print row[0], row[-1]
339\end{verbatim}
340
Skip Montanarob4a04172003-03-20 23:29:12 +0000341The corresponding simplest possible writing example is
342
343\begin{verbatim}
Fred Drake96352682003-04-25 18:02:34 +0000344import csv
Andrew M. Kuchling6f937b12004-08-07 15:11:24 +0000345writer = csv.writer(open("some.csv", "wb"))
Fred Drake96352682003-04-25 18:02:34 +0000346for row in someiterable:
347 writer.writerow(row)
Skip Montanarob4a04172003-03-20 23:29:12 +0000348\end{verbatim}