\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}
\sectionauthor{Skip Montanaro}{skip@pobox.com}

\versionadded{2.3}
\index{csv}
\indexii{data}{tabular}

The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases. There is no ``CSV
standard'', so the format is operationally defined by the many applications
which read and write it. The lack of a standard means that subtle
differences often exist in the data produced and consumed by different
applications. These differences can make it annoying to process CSV files
from multiple sources. Still, while the delimiters and quoting characters
vary, the overall format is similar enough that it is possible to write a
single module which can efficiently manipulate such data, hiding the details
of reading and writing the data from the programmer.

The \module{csv} module implements classes to read and write tabular data in
CSV format. It allows programmers to say, ``write this data in the format
preferred by Excel,'' or ``read data from this file which was generated by
Excel,'' without knowing the precise details of the CSV format used by
Excel. Programmers can also describe the CSV formats understood by other
applications or define their own special-purpose CSV formats.

The \module{csv} module's \class{reader} and \class{writer} objects read and
write sequences. Programmers can also read and write data in dictionary
form using the \class{DictReader} and \class{DictWriter} classes.

\begin{notice}
  This version of the \module{csv} module doesn't support Unicode
  input. Also, there are currently some issues regarding \ASCII{} NUL
  characters. Accordingly, all input should be UTF-8 or printable
  \ASCII{} to be safe; see the examples in section~\ref{csv-examples}.
  These restrictions will be removed in the future.
\end{notice}

\begin{seealso}
%  \seemodule{array}{Arrays of uniformly typed numeric values.}
  \seepep{305}{CSV File API}
         {The Python Enhancement Proposal which proposed this addition
          to Python.}
\end{seealso}


\subsection{Module Contents \label{csv-contents}}

The \module{csv} module defines the following functions:

\begin{funcdesc}{reader}{csvfile\optional{,
                         dialect=\code{'excel'}}\optional{, fmtparam}}
Return a reader object which will iterate over lines in the given
{}\var{csvfile}. \var{csvfile} can be any object which supports the
iterator protocol and returns a string each time its \method{next}
method is called --- file objects and list objects are both suitable.
If \var{csvfile} is a file object, it must be opened with
the 'b' flag on platforms where that makes a difference. An optional
{}\var{dialect} parameter can be given
which is used to define a set of parameters specific to a particular CSV
dialect. It may be an instance of a subclass of the \class{Dialect}
class or one of the strings returned by the \function{list_dialects}
function. The other optional {}\var{fmtparam} keyword arguments can be
given to override individual formatting parameters in the current
dialect. For full details about the dialect and formatting
parameters, see section~\ref{csv-fmt-params}, ``Dialects and Formatting
Parameters''.

All data read are returned as strings. No automatic data type
conversion is performed.

\versionchanged[
The parser is now stricter with respect to multi-line quoted
fields. Previously, if a line ended within a quoted field without a
terminating newline character, a newline would be inserted into the
returned field. This behavior caused problems when reading files
which contained carriage return characters within fields. The
behavior was changed to return the field without inserting newlines. As
a consequence, if newlines embedded within fields are important, the
input should be split into lines in a manner which preserves the newline
characters]{2.5}

\end{funcdesc}
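As a minimal sketch of the behavior described above, the following feeds an
in-memory list of lines to \function{reader} (any iterable that yields
strings will do). The sample data is made up for illustration; the point is
only that every field comes back as a string and must be converted
explicitly.

\begin{verbatim}
import csv

# A list of strings stands in for a file object here.
lines = ['id,price,label', '1,2.50,widget']

rows = list(csv.reader(lines))

# No type conversion is performed, so convert explicitly where needed.
header, record = rows
price = float(record[1])
\end{verbatim}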

\begin{funcdesc}{writer}{csvfile\optional{,
                         dialect=\code{'excel'}}\optional{, fmtparam}}
Return a writer object responsible for converting the user's data into
delimited strings on the given file-like object. \var{csvfile} can be any
object with a \function{write} method. If \var{csvfile} is a file object,
it must be opened with the 'b' flag on platforms where that makes a
difference. An optional
{}\var{dialect} parameter can be given which is used to define a set of
parameters specific to a particular CSV dialect. It may be an instance
of a subclass of the \class{Dialect} class or one of the strings
returned by the \function{list_dialects} function. The other optional
{}\var{fmtparam} keyword arguments can be given to override individual
formatting parameters in the current dialect. For full details
about the dialect and formatting parameters, see
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''.
To make it as easy as possible to
interface with modules which implement the DB API, the value
\constant{None} is written as the empty string. While this isn't a
reversible transformation, it makes it easier to dump SQL NULL data values
to CSV files without preprocessing the data returned from a
\code{cursor.fetch*()} call. All other non-string data are stringified
with \function{str()} before being written.
\end{funcdesc}
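The treatment of \constant{None} and of non-string values can be sketched
as follows. \class{ChunkCollector} is a hypothetical helper written for
this illustration, not part of the module; it relies only on the statement
above that any object with a \function{write} method is acceptable.

\begin{verbatim}
import csv

class ChunkCollector(object):
    """Hypothetical stand-in for a file: it only provides write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

sink = ChunkCollector()
writer = csv.writer(sink)

# None becomes an empty field; the integer is stringified.
writer.writerow(['spam', None, 42])

output = ''.join(sink.chunks)
\end{verbatim}

With the default \code{'excel'} dialect, \code{output} holds a single line
ending in the dialect's line terminator, with nothing between the two
commas where \constant{None} was written.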

\begin{funcdesc}{register_dialect}{name\optional{, dialect}\optional{, fmtparam}}
Associate \var{dialect} with \var{name}. \var{name} must be a string
or Unicode object. The dialect can be specified either by passing a
subclass of \class{Dialect}, or by \var{fmtparam} keyword arguments,
or both, with keyword arguments overriding parameters of the dialect.
For full details about the dialect and formatting parameters, see
section~\ref{csv-fmt-params}, ``Dialects and Formatting Parameters''.
\end{funcdesc}

\begin{funcdesc}{unregister_dialect}{name}
Delete the dialect associated with \var{name} from the dialect registry. An
\exception{Error} is raised if \var{name} is not a registered dialect
name.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}. An \exception{Error} is
raised if \var{name} is not a registered dialect name.

\versionchanged[
This function now returns an immutable \class{Dialect}. Previously an
instance of the requested dialect was returned. Users could modify the
underlying class, changing the behavior of active readers and writers.]{2.5}
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}
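The four registry functions can be exercised together as in the sketch
below. The dialect name \code{'pipes'} and the sample line are invented
for the example.

\begin{verbatim}
import csv

# 'pipes' is a made-up dialect name used only for this sketch.
csv.register_dialect('pipes', delimiter='|')

registered = 'pipes' in csv.list_dialects()
delimiter = csv.get_dialect('pipes').delimiter
rows = list(csv.reader(['a|b|c'], 'pipes'))

csv.unregister_dialect('pipes')

# Once the name is removed, looking it up again raises csv.Error.
try:
    csv.get_dialect('pipes')
    lookup_failed = False
except csv.Error:
    lookup_failed = True
\end{verbatim}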

\begin{funcdesc}{field_size_limit}{\optional{new_limit}}
  Returns the current maximum field size allowed by the parser. If
  \var{new_limit} is given, this becomes the new limit.
  \versionadded{2.5}
\end{funcdesc}
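A short sketch of how the limit might be lowered temporarily and then
restored. The limit of 10 characters and the sample line are arbitrary
choices for illustration; the example assumes that the parser reports an
oversized field by raising \exception{Error}.

\begin{verbatim}
import csv

# Called without arguments, the function only reports the limit.
original_limit = csv.field_size_limit()

csv.field_size_limit(10)      # arbitrary small limit for the demo
try:
    list(csv.reader(['ok,' + 'x' * 50]))
    rejected = False
except csv.Error:
    rejected = True
finally:
    # Put the previous limit back so other readers are unaffected.
    csv.field_size_limit(original_limit)
\end{verbatim}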


The \module{csv} module defines the following classes:

\begin{classdesc}{DictReader}{csvfile\optional{,
                              fieldnames=\constant{None}\optional{,
                              restkey=\constant{None}\optional{,
                              restval=\constant{None}\optional{,
                              dialect=\code{'excel'}\optional{,
                              *args, **kwds}}}}}}
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
{}\var{fieldnames}
parameter. If the \var{fieldnames} parameter is omitted, the values in
the first row of the \var{csvfile} will be used as the fieldnames.
If the row read has more fields than the fieldnames sequence, the
remaining data is added as a sequence keyed by the value of
\var{restkey}. If the row read has fewer fields than the fieldnames
sequence, the remaining keys take the value of the optional
\var{restval} parameter. Any other optional or
keyword arguments are passed to the underlying \class{reader} instance.
\end{classdesc}
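The handling of rows that are longer or shorter than the field name list
can be sketched like this. The data, the key \code{'overflow'} and the
filler \code{'n/a'} are made up for the example.

\begin{verbatim}
import csv

lines = [
    'name,dept',                  # first row supplies the fieldnames
    'alice,eng,extra1,extra2',    # two fields too many
    'bob',                        # one field too few
]

reader = csv.DictReader(lines, restkey='overflow', restval='n/a')
records = [dict(record) for record in reader]

long_row, short_row = records
\end{verbatim}

Here \code{long_row['overflow']} collects the two surplus fields as a
list, and \code{short_row['dept']} is filled in with \code{'n/a'}.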


\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
                              restval=""\optional{,
                              extrasaction=\code{'raise'}\optional{,
                              dialect=\code{'excel'}\optional{,
                              *args, **kwds}}}}}
Create an object which operates like a regular writer but maps dictionaries
onto output rows. The \var{fieldnames} parameter identifies the order in
which values in the dictionary passed to the \method{writerow()} method are
written to the \var{csvfile}. The optional \var{restval} parameter
specifies the value to be written if the dictionary is missing a key in
\var{fieldnames}. If the dictionary passed to the \method{writerow()}
method contains a key not found in \var{fieldnames}, the optional
\var{extrasaction} parameter indicates what action to take. If it is set
to \code{'raise'} a \exception{ValueError} is raised. If it is set to
\code{'ignore'}, extra values in the dictionary are ignored. Any other
optional or keyword arguments are passed to the underlying \class{writer}
instance.

Note that unlike the \class{DictReader} class, the \var{fieldnames}
parameter of the \class{DictWriter} is not optional. Since Python's
\class{dict} objects are not ordered, there is not enough information
available to deduce the order in which the row should be written to the
\var{csvfile}.

\end{classdesc}
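The effect of \var{restval} and \var{extrasaction} can be sketched as
follows. \class{ChunkCollector} is a hypothetical stand-in for a file
object, and the field names and values are invented.

\begin{verbatim}
import csv

class ChunkCollector(object):
    """Hypothetical stand-in for a file: it only provides write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

sink = ChunkCollector()
lenient = csv.DictWriter(sink, ['id', 'name'],
                         restval='?', extrasaction='ignore')
lenient.writerow({'id': 1, 'name': 'alice', 'unused': 'x'})  # extra key dropped
lenient.writerow({'id': 2})                                  # missing key filled
output = ''.join(sink.chunks)

# With the default extrasaction='raise', an unknown key is an error.
strict = csv.DictWriter(ChunkCollector(), ['id'])
try:
    strict.writerow({'id': 3, 'unused': 'x'})
    raised = False
except ValueError:
    raised = True
\end{verbatim}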

\begin{classdesc*}{Dialect}{}
The \class{Dialect} class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
\class{reader} or \class{writer} instance.
\end{classdesc*}

\begin{classdesc}{excel}{}
The \class{excel} class defines the usual properties of an Excel-generated
CSV file. It is registered with the dialect name \code{'excel'}.
\end{classdesc}

\begin{classdesc}{excel_tab}{}
The \class{excel_tab} class defines the usual properties of an
Excel-generated TAB-delimited file. It is registered with the dialect name
\code{'excel-tab'}.
\end{classdesc}

\begin{classdesc}{Sniffer}{}
The \class{Sniffer} class is used to deduce the format of a CSV file.
\end{classdesc}

The \class{Sniffer} class provides two methods:

\begin{methoddesc}{sniff}{sample\optional{,delimiters=None}}
Analyze the given \var{sample} and return a \class{Dialect} subclass
reflecting the parameters found. If the optional \var{delimiters} parameter
is given, it is interpreted as a string containing possible valid delimiter
characters.
\end{methoddesc}

\begin{methoddesc}{has_header}{sample}
Analyze the sample text (presumed to be in CSV format) and return
\constant{True} if the first row appears to be a series of column
headers.
\end{methoddesc}
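A minimal sketch of using the \class{Sniffer} on a small, made-up sample.
Both methods are heuristic, so the sample is deliberately regular (the same
number of semicolons on every line); less tidy data may be guessed
differently.

\begin{verbatim}
import csv

sample = 'name;age\nalice;30\nbob;25\n'

sniffer = csv.Sniffer()

# Restrict the guess to the two delimiters considered plausible here.
dialect = sniffer.sniff(sample, delimiters=';,')
looks_like_header = sniffer.has_header(sample)

# The sniffed dialect can be passed straight to a reader.
rows = list(csv.reader(sample.splitlines(), dialect))
\end{verbatim}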


The \module{csv} module defines the following constants:

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to only quote those fields which contain
special characters such as \var{delimiter}, \var{quotechar} or any of the
characters in \var{lineterminator}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric
fields.

Instructs the reader to convert all non-quoted fields to type \var{float}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields. When the current
\var{delimiter} occurs in output data it is preceded by the current
\var{escapechar} character. If \var{escapechar} is not set, the writer
will raise \exception{Error} if any characters that require escaping
are encountered.

Instructs \class{reader} to perform no special processing of quote characters.
\end{datadesc}
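To compare the four quoting constants, the sketch below formats the same
row once per constant. \class{ChunkCollector} and \function{render} are
hypothetical helpers written for this illustration, and the backslash
chosen as \var{escapechar} is an arbitrary choice.

\begin{verbatim}
import csv

class ChunkCollector(object):
    """Hypothetical stand-in for a file: it only provides write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

def render(row, **fmtparams):
    """Format one row with the given formatting parameters."""
    sink = ChunkCollector()
    csv.writer(sink, **fmtparams).writerow(row)
    return ''.join(sink.chunks)

row = ['a,b', 1, 'plain']

minimal = render(row, quoting=csv.QUOTE_MINIMAL)        # "a,b",1,plain
everything = render(row, quoting=csv.QUOTE_ALL)         # "a,b","1","plain"
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)  # "a,b",1,"plain"
unquoted = render(row, quoting=csv.QUOTE_NONE,
                  escapechar='\\')                      # a\,b,1,plain

# On input, QUOTE_NONNUMERIC converts unquoted fields to float.
parsed = list(csv.reader(['"x",1.5'], quoting=csv.QUOTE_NONNUMERIC))
\end{verbatim}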


The \module{csv} module defines the following exception:

\begin{excdesc}{Error}
Raised by any of the functions when an error is detected.
\end{excdesc}


\subsection{Dialects and Formatting Parameters\label{csv-fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects. A
dialect is a subclass of the \class{Dialect} class having a set of specific
attributes and a single \method{validate()} method. When creating \class{reader}
or \class{writer} objects, the programmer can specify a string or a subclass
of the \class{Dialect} class as the dialect parameter. In addition to, or
instead of, the \var{dialect} parameter, the programmer can also specify
individual formatting parameters, which have the same names as the
attributes defined below for the \class{Dialect} class.
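As a sketch of the two ways of supplying parameters, the following defines
a dialect as a \class{Dialect} subclass and then overrides one of its
attributes with a keyword argument. The class name \class{ColonSeparated}
and the sample lines are invented for the example.

\begin{verbatim}
import csv

class ColonSeparated(csv.Dialect):
    """Hypothetical dialect for colon-delimited records."""
    delimiter = ':'
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = '\n'
    quoting = csv.QUOTE_MINIMAL

# The dialect class supplies every formatting parameter.
by_class = list(csv.reader(['root:x:0'], ColonSeparated))

# A keyword argument overrides the attribute of the same name.
overridden = list(csv.reader(['root;x;0'], ColonSeparated, delimiter=';'))
\end{verbatim}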

Dialects support the following attributes:

\begin{memberdesc}[Dialect]{delimiter}
A one-character string used to separate fields. It defaults to \code{','}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should
themselves be quoted. When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} is used as a prefix to the
\var{quotechar}. It defaults to \constant{True}.

On output, if \var{doublequote} is \constant{False} and no
\var{escapechar} is set, \exception{Error} is raised if a \var{quotechar}
is found in a field.
\end{memberdesc}

\begin{memberdesc}[Dialect]{escapechar}
A one-character string used by the writer to escape the \var{delimiter} if
\var{quoting} is set to \constant{QUOTE_NONE} and the \var{quotechar}
if \var{doublequote} is \constant{False}. On reading, the \var{escapechar}
removes any special meaning from the following character. It defaults
to \constant{None}, which disables escaping.
\end{memberdesc}

\begin{memberdesc}[Dialect]{lineterminator}
The string used to terminate lines produced by the \class{writer}.
It defaults to \code{'\e r\e n'}.

\note{The \class{reader} is hard-coded to recognise either \code{'\e r'}
or \code{'\e n'} as end-of-line, and ignores \var{lineterminator}. This
behavior may change in the future.}
\end{memberdesc}

\begin{memberdesc}[Dialect]{quotechar}
A one-character string used to quote fields containing special characters,
such as the \var{delimiter} or \var{quotechar}, or which contain new-line
characters. It defaults to \code{'"'}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{quoting}
Controls when quotes should be generated by the writer and recognised
by the reader. It can take on any of the \constant{QUOTE_*} constants
(see section~\ref{csv-contents}) and defaults to \constant{QUOTE_MINIMAL}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{skipinitialspace}
When \constant{True}, whitespace immediately following the \var{delimiter}
is ignored. The default is \constant{False}.
\end{memberdesc}
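The interplay of \var{doublequote}, \var{escapechar} and
\var{skipinitialspace} can be sketched as follows.
\class{ChunkCollector} is a hypothetical stand-in for a file object, and
the backslash used as \var{escapechar} is an arbitrary choice. Rather than
relying on the exact escaped text, the example checks that reading with the
same parameters recovers the original field.

\begin{verbatim}
import csv

class ChunkCollector(object):
    """Hypothetical stand-in for a file: it only provides write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

field = 'say "hi"'

# Default: the embedded quote characters are doubled.
sink = ChunkCollector()
csv.writer(sink).writerow([field])
doubled = ''.join(sink.chunks)

# doublequote=False: the embedded quotes are prefixed with escapechar.
sink = ChunkCollector()
csv.writer(sink, doublequote=False, escapechar='\\').writerow([field])
escaped = ''.join(sink.chunks)

# Reading with the same parameters recovers the original field.
restored = list(csv.reader([escaped], doublequote=False, escapechar='\\'))

# skipinitialspace drops the blank that follows each delimiter.
loose = list(csv.reader(['a, b, c']))
tight = list(csv.reader(['a, b, c'], skipinitialspace=True))
\end{verbatim}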


\subsection{Reader Objects}

Reader objects (\class{DictReader} instances and objects returned by
the \function{reader()} function) have the following public methods:

\begin{methoddesc}[csv reader]{next}{}
Return the next row of the reader's iterable object as a list, parsed
according to the current dialect.
\end{methoddesc}

Reader objects have the following public attributes:

\begin{memberdesc}[csv reader]{dialect}
A read-only description of the dialect in use by the parser.
\end{memberdesc}

\begin{memberdesc}[csv reader]{line_num}
  The number of lines read from the source iterator. This is not the same
  as the number of records returned, as records can span multiple lines.
  \versionadded{2.5}
\end{memberdesc}
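The difference between lines read and records returned can be seen in the
sketch below, where the first record contains a quoted field that spans two
input lines. The sample data is made up; note that each input line keeps
its trailing newline so the embedded newline is preserved.

\begin{verbatim}
import csv

lines = [
    'a,"spans\n',       # the quoted field starts here ...
    'two lines",c\n',   # ... and ends on the next line
    'x,y,z\n',
]

reader = csv.reader(lines)
progress = []
for row in reader:
    # Record how many source lines had been consumed after each row.
    progress.append((reader.line_num, row))
\end{verbatim}

After the first record \member{line_num} is already 2, because two source
lines were consumed to produce one row.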


\subsection{Writer Objects}

\class{Writer} objects (\class{DictWriter} instances and objects returned by
the \function{writer()} function) have the following public methods. A
{}\var{row} must be a sequence of strings or numbers for \class{Writer}
objects and a dictionary mapping fieldnames to strings or numbers for
{}\class{DictWriter} objects; numbers are passed through \function{str()}
before being written. Note that complex numbers are written out surrounded
by parentheses.
This may cause some problems for other programs which read CSV files
(assuming they support complex numbers at all).

\begin{methoddesc}[csv writer]{writerow}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}[csv writer]{writerows}{rows}
Write every row in the \var{rows} parameter (a list of \var{row} objects as
described above) to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

Writer objects have the following public attribute:

\begin{memberdesc}[csv writer]{dialect}
A read-only description of the dialect in use by the writer.
\end{memberdesc}
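A brief sketch of \method{writerows()} and of the remark about complex
numbers. \class{ChunkCollector} is a hypothetical stand-in for a file
object, and the rows are invented.

\begin{verbatim}
import csv

class ChunkCollector(object):
    """Hypothetical stand-in for a file: it only provides write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

sink = ChunkCollector()
writer = csv.writer(sink)

# writerows() takes a sequence of rows and writes them one by one.
writer.writerows([[1, 'one'], [2, 'two']])

# A complex number is written in its parenthesized string form.
writer.writerow([1 + 2j])

output = ''.join(sink.chunks)
\end{verbatim}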


\subsection{Examples\label{csv-examples}}

The simplest example of reading a CSV file:

\begin{verbatim}
import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row
\end{verbatim}

Reading a file with an alternate format:

\begin{verbatim}
import csv
reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE)
for row in reader:
    print row
\end{verbatim}

The corresponding simplest possible writing example is:

\begin{verbatim}
import csv
writer = csv.writer(open("some.csv", "wb"))
writer.writerows(someiterable)
\end{verbatim}

Registering a new dialect:

\begin{verbatim}
import csv

csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

reader = csv.reader(open("passwd", "rb"), 'unixpwd')
\end{verbatim}

A slightly more advanced use of the reader --- catching and reporting errors:

\begin{verbatim}
import csv, sys
filename = "some.csv"
reader = csv.reader(open(filename, "rb"))
try:
    for row in reader:
        print row
except csv.Error, e:
    sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
\end{verbatim}

And while the module doesn't directly support parsing strings, it can
easily be done:

\begin{verbatim}
import csv
for row in csv.reader(['one,two,three']):
    print row
\end{verbatim}
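When the string to be parsed contains several lines, and quoted fields may
themselves contain newlines, the way the string is split matters. The
sketch below uses made-up data and contrasts
\code{splitlines(True)}, which keeps the line endings, with plain
\code{splitlines()}, which discards them.

\begin{verbatim}
import csv

text = 'id,comment\r\n1,"first line\r\nsecond line"\r\n'

# Keeping the line endings preserves the newline inside the field.
kept = list(csv.reader(text.splitlines(True)))

# Dropping them joins the two halves of the field with nothing between.
dropped = list(csv.reader(text.splitlines()))

comment_kept = kept[1][1]
comment_dropped = dropped[1][1]
\end{verbatim}

Splitting in a way that preserves the line endings is therefore the safer
choice whenever embedded newlines are significant.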

The \module{csv} module doesn't directly support reading and writing
Unicode, but it is 8-bit-clean save for some problems with \ASCII{} NUL
characters. So you can write functions or classes that handle the
encoding and decoding for you as long as you avoid encodings like
UTF-16 that use NULs. UTF-8 is recommended.

\function{unicode_csv_reader} below is a generator that wraps
\class{csv.reader} to handle Unicode CSV data (a list of Unicode
strings). \function{utf_8_encoder} is a generator that encodes the
Unicode strings as UTF-8, one string (or row) at a time. The encoded
strings are parsed by the CSV reader, and
\function{unicode_csv_reader} decodes the UTF-8-encoded cells back
into Unicode:

\begin{verbatim}
import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
\end{verbatim}

For all other encodings the following \class{UnicodeReader} and
\class{UnicodeWriter} classes can be used. They take an additional
\var{encoding} parameter in their constructor and make sure that the data
passes through the real reader or writer encoded as UTF-8:

\begin{verbatim}
import csv, codecs, cStringIO

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

    def __iter__(self):
        return self

    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)

    def next(self):
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
\end{verbatim}