\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}
\sectionauthor{Skip Montanaro}{skip@pobox.com}

\versionadded{2.3}
\index{csv}
\indexii{data}{tabular}

The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases. There is no ``CSV
standard'', so the format is operationally defined by the many applications
which read and write it. The lack of a standard means that subtle
differences often exist in the data produced and consumed by different
applications. These differences can make it annoying to process CSV files
from multiple sources. Still, while the delimiters and quoting characters
vary, the overall format is similar enough that it is possible to write a
single module which can efficiently manipulate such data, hiding the details
of reading and writing the data from the programmer.

The \module{csv} module implements classes to read and write tabular data in
CSV format. It allows programmers to say, ``write this data in the format
preferred by Excel,'' or ``read data from this file which was generated by
Excel,'' without knowing the precise details of the CSV format used by
Excel. Programmers can also describe the CSV formats understood by other
applications or define their own special-purpose CSV formats.

The \module{csv} module's \class{reader} and \class{writer} objects read and
write sequences. Programmers can also read and write data in dictionary
form using the \class{DictReader} and \class{DictWriter} classes.

\note{The first version of the \module{csv} module doesn't support Unicode
input. Also, there are currently some issues regarding \ASCII{} NUL
characters. Accordingly, all input should generally be printable \ASCII{}
to be safe. These restrictions will be removed in the future.}

\begin{seealso}
%  \seemodule{array}{Arrays of uniformly types numeric values.}
  \seepep{305}{CSV File API}
         {The Python Enhancement Proposal which proposed this addition
          to Python.}
\end{seealso}


\subsection{Module Contents}


The \module{csv} module defines the following functions:

\begin{funcdesc}{reader}{csvfile\optional{,
                         dialect=\code{'excel'}\optional{, fmtparam}}}
Return a reader object which will iterate over lines in the given
{}\var{csvfile}. \var{csvfile} can be any object which supports the
iterator protocol and returns a string each time its \method{next}
method is called. An optional \var{dialect} parameter can be given
which is used to define a set of parameters specific to a particular CSV
dialect. It may be an instance of a subclass of the \class{Dialect}
class or one of the strings returned by the \function{list_dialects}
function. The other optional {}\var{fmtparam} keyword arguments can be
given to override individual formatting parameters in the current
dialect. For more information about the dialect and formatting
parameters, see section~\ref{fmt-params}, ``Dialects and Formatting
Parameters''.

All data read are returned as strings. No automatic data type
conversion is performed.
\end{funcdesc}
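
As a minimal sketch of the call described above, the following reads a small
semicolon-delimited sample. It assumes a current Python~3 interpreter rather
than the 2.3 release documented here, and uses \class{io.StringIO} as a
stand-in for an open file; the sample data is invented for illustration.

```python
import csv
import io

# In-memory stand-in for a file; any iterable that yields strings would do.
sample = io.StringIO('name;qty\r\n"Smith; J.";3\r\n')

# Start from the 'excel' dialect but override its delimiter via a
# formatting parameter.
rows = list(csv.reader(sample, dialect="excel", delimiter=";"))

print(rows)  # [['name', 'qty'], ['Smith; J.', '3']]
```

The quantity comes back as the string \code{'3'}; converting it to a number
is left to the caller.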

\begin{funcdesc}{writer}{csvfile\optional{,
                         dialect=\code{'excel'}\optional{, fmtparam}}}
Return a writer object responsible for converting the user's data into
delimited strings on the given file-like object. An optional
{}\var{dialect} parameter can be given which is used to define a set of
parameters specific to a particular CSV dialect. It may be an instance
of a subclass of the \class{Dialect} class or one of the strings
returned by the \function{list_dialects} function. The other optional
{}\var{fmtparam} keyword arguments can be given to override individual
formatting parameters in the current dialect. For more information
about the dialect and formatting parameters, see
section~\ref{fmt-params}, ``Dialects and Formatting Parameters''.
To make it as easy as possible to
interface with modules which implement the DB API, the value
\constant{None} is written as the empty string. While this isn't a
reversible transformation, it makes it easier to dump SQL NULL data values
to CSV files without preprocessing the data returned from a
\code{cursor.fetch*()} call. All other non-string data are stringified
with \function{str()} before being written.
\end{funcdesc}
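
The treatment of \constant{None} and of non-string values can be sketched as
follows. This assumes a current Python~3 interpreter and writes to an
\class{io.StringIO} buffer instead of a real file; the row contents are
invented for illustration.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)  # default dialect is 'excel'

# None becomes an empty field; the int and the float are stringified.
writer.writerow(["widget", None, 3, 2.5])

text = buf.getvalue()
print(repr(text))  # 'widget,,3,2.5\r\n'
```

Reading this line back yields an empty string in the second field, not
\constant{None}, which is why the transformation is not reversible.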

\begin{funcdesc}{register_dialect}{name, dialect}
Associate \var{dialect} with \var{name}. \var{dialect} must be a subclass
of \class{csv.Dialect}. \var{name} must be a string or Unicode object.
\end{funcdesc}

\begin{funcdesc}{unregister_dialect}{name}
Delete the dialect associated with \var{name} from the dialect registry. An
\exception{Error} is raised if \var{name} is not a registered dialect
name.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}. An \exception{Error} is
raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}
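
The four registry functions can be exercised together as in the sketch
below. The dialect name \code{'pipes'} and the class \class{PipeDialect} are
hypothetical, and the example assumes a current Python~3 interpreter in
which \class{csv.excel} is the class behind the \code{'excel'} name.

```python
import csv

class PipeDialect(csv.excel):
    """Hypothetical dialect: like 'excel' but pipe-delimited."""
    delimiter = "|"

csv.register_dialect("pipes", PipeDialect)
registered = "pipes" in csv.list_dialects()
delimiter = csv.get_dialect("pipes").delimiter

csv.unregister_dialect("pipes")
try:
    csv.get_dialect("pipes")
    lookup_failed = False
except csv.Error:
    # The name is no longer in the registry.
    lookup_failed = True

print(registered, delimiter, lookup_failed)  # True | True
```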


The \module{csv} module defines the following classes:

\begin{classdesc}{DictReader}{csvfile, fieldnames\optional{,
                              restkey=\code{None}\optional{,
                              restval=\code{None}\optional{,
                              dialect=\code{'excel'}\optional{,
                              fmtparam}}}}}
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the \var{fieldnames}
parameter. If the row read has more fields than the fieldnames sequence,
the remaining data is added as a sequence keyed by the value of
\var{restkey}. If the row read has fewer fields than the fieldnames
sequence, the remaining keys take the value of the optional \var{restval}
parameter. All other parameters are interpreted as for regular readers.
\end{classdesc}
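
The effect of \var{restkey} and \var{restval} is easiest to see with one row
that is too long and one that is too short. The sketch below assumes a
current Python~3 interpreter; the field names, the key \code{'extra'} and
the filler \code{'n/a'} are invented for illustration.

```python
import csv
import io

# First row has two surplus fields, second row is one field short.
sample = io.StringIO("a,b,c,d\r\nx\r\n")

reader = csv.DictReader(sample, fieldnames=["first", "second"],
                        restkey="extra", restval="n/a")
rows = [dict(row) for row in reader]

print(rows[0])  # {'first': 'a', 'second': 'b', 'extra': ['c', 'd']}
print(rows[1])  # {'first': 'x', 'second': 'n/a'}
```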


\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
                              restval=""\optional{,
                              extrasaction=\code{'raise'}\optional{,
                              dialect=\code{'excel'}\optional{, fmtparam}}}}}
Create an object which operates like a regular writer but maps dictionaries
onto output rows. The \var{fieldnames} parameter identifies the order in
which values in the dictionary passed to the \method{writerow()} method are
written to the \var{csvfile}. The optional \var{restval} parameter
specifies the value to be written if the dictionary is missing a key in
\var{fieldnames}. If the dictionary passed to the \method{writerow()}
method contains a key not found in \var{fieldnames}, the optional
\var{extrasaction} parameter indicates what action to take. If it is set
to \code{'raise'}, a \exception{ValueError} is raised. If it is set to
\code{'ignore'}, extra values in the dictionary are ignored. All other
parameters are interpreted as for regular writers.
\end{classdesc}
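
A short sketch of \var{restval} and both \var{extrasaction} settings
follows. It assumes a current Python~3 interpreter and an in-memory buffer;
the field names and the filler \code{'?'} are invented for illustration.

```python
import csv
import io

buf = io.StringIO()
lenient = csv.DictWriter(buf, fieldnames=["id", "name"], restval="?",
                         extrasaction="ignore")
lenient.writerow({"id": 1, "name": "ann", "unused": "dropped"})
lenient.writerow({"id": 2})  # 'name' is missing, so restval is written
output = buf.getvalue()

# With the default extrasaction='raise', an unknown key is an error.
strict = csv.DictWriter(io.StringIO(), fieldnames=["id"])
try:
    strict.writerow({"id": 3, "oops": 4})
    raised = False
except ValueError:
    raised = True

print(repr(output))  # '1,ann\r\n2,?\r\n'
print(raised)        # True
```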


\begin{classdesc*}{Dialect}{}
The \class{Dialect} class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
\class{reader} or \class{writer} instance. Dialect objects support the
following data attributes:

\begin{memberdesc}[string]{delimiter}
A one-character string used to separate fields. It defaults to \code{","}.
\end{memberdesc}

\begin{memberdesc}[boolean]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should
themselves be quoted. When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} must be a one-character string
which is used as a prefix to the \var{quotechar}. It defaults to
\constant{True}.
\end{memberdesc}

\begin{memberdesc}{escapechar}
A one-character string used to escape the \var{delimiter} if \var{quoting}
is set to \constant{QUOTE_NONE}. It defaults to \constant{None}.
\end{memberdesc}

\begin{memberdesc}[string]{lineterminator}
The string used to terminate lines in the CSV file. It defaults to
\code{"\e r\e n"}.
\end{memberdesc}

\begin{memberdesc}[string]{quotechar}
A one-character string used to quote elements containing the \var{delimiter}
or which start with the \var{quotechar}. It defaults to \code{'"'}.
\end{memberdesc}

\begin{memberdesc}[integer]{quoting}
Controls when quotes should be generated by the writer. It can take on any
of the \code{QUOTE_*} constants defined below and defaults to
\constant{QUOTE_MINIMAL}.
\end{memberdesc}

\begin{memberdesc}[boolean]{skipinitialspace}
When \constant{True}, whitespace immediately following the \var{delimiter}
is ignored. The default is \constant{False}.
\end{memberdesc}

\end{classdesc*}
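
The interaction between \var{doublequote} and \var{escapechar} can be
sketched by writing the same row twice. The helper \function{to_line} is
hypothetical, the example assumes a current Python~3 interpreter, and the
exact escaped form may differ between releases, so only the round trip is
checked for it.

```python
import csv
import io

def to_line(row, **fmtparams):
    """Hypothetical helper: format one row and return the text written."""
    buf = io.StringIO()
    csv.writer(buf, **fmtparams).writerow(row)
    return buf.getvalue()

row = ['say "hi"', "a,b"]

# Default: embedded quote characters are doubled inside a quoted field.
doubled = to_line(row)
print(repr(doubled))  # '"say ""hi""","a,b"\r\n'

# doublequote=False: embedded quotes are prefixed with the escapechar.
escaped = to_line(row, doublequote=False, escapechar="\\")

# Reading with the same parameters recovers the original row.
restored = next(csv.reader(io.StringIO(escaped),
                           doublequote=False, escapechar="\\"))
```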

The \module{csv} module defines the following constants:

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to only quote those fields which contain
the current \var{delimiter} or begin with the current \var{quotechar}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric fields.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields. When the current
\var{delimiter} occurs in output data it is preceded by the current
\var{escapechar} character. When \constant{QUOTE_NONE} is in effect, it
is an error not to have a single-character \var{escapechar} defined, even if
no data to be written contains the \var{delimiter} character.
\end{datadesc}
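
To compare quoting modes side by side, the sketch below formats one row
under three of the constants. It assumes a current Python~3 interpreter; the
row and the backslash \var{escapechar} are chosen for illustration only.

```python
import csv
import io

row = ["abc", 12, "x,y"]
modes = {
    "minimal": dict(quoting=csv.QUOTE_MINIMAL),
    "nonnumeric": dict(quoting=csv.QUOTE_NONNUMERIC),
    # QUOTE_NONE needs an escapechar because the last field has a comma.
    "none": dict(quoting=csv.QUOTE_NONE, escapechar="\\"),
}

results = {}
for label, fmtparams in modes.items():
    buf = io.StringIO()
    csv.writer(buf, **fmtparams).writerow(row)
    results[label] = buf.getvalue()

print(repr(results["minimal"]))     # 'abc,12,"x,y"\r\n'
print(repr(results["nonnumeric"]))  # '"abc",12,"x,y"\r\n'
print(repr(results["none"]))        # 'abc,12,x\\,y\r\n'
```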


The \module{csv} module defines the following exception:

\begin{excdesc}{Error}
Raised by any of the functions when an error is detected.
\end{excdesc}


\subsection{Dialects and Formatting Parameters\label{fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects. A
dialect is a subclass of the \class{Dialect} class having a set of specific
attributes and a single \method{validate()} method. When creating \class{reader}
or \class{writer} objects, the programmer can specify a string or a subclass
of the \class{Dialect} class as the dialect parameter. In addition to, or
instead of, the \var{dialect} parameter, the programmer can also specify
individual formatting parameters, which have the same names as the
attributes defined above for the \class{Dialect} class.
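
As a sketch of combining the two mechanisms, the following passes a
\class{Dialect} subclass directly and overrides one of its attributes with
a keyword argument. The class \class{Tabbed} is hypothetical, and the
example assumes a current Python~3 interpreter.

```python
import csv
import io

class Tabbed(csv.Dialect):
    """Hypothetical tab-separated dialect with every attribute spelled out."""
    delimiter = "\t"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\r\n"
    quoting = csv.QUOTE_MINIMAL

buf = io.StringIO()
# Use the dialect as given, except for the line terminator.
writer = csv.writer(buf, dialect=Tabbed, lineterminator="\n")
writer.writerow(["a", "b c", "d\te"])

text = buf.getvalue()
print(repr(text))  # 'a\tb c\t"d\te"\n'
```

Only the last field is quoted, because it is the only one containing the
dialect's delimiter.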


\subsection{Reader Objects}

\class{DictReader} and \class{reader} objects have the following public
methods:

\begin{methoddesc}{next}{}
Return the next row of the reader's iterable object as a list, parsed
according to the current dialect.
\end{methoddesc}
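
One common use of fetching a single row is to peel off a header before
looping over the rest. The sketch below assumes a current Python~3
interpreter, where the built-in \function{next()} is the usual way to call
this method; the sample data is invented for illustration.

```python
import csv
import io

sample = io.StringIO("id,name\r\n1,ann\r\n2,bob\r\n")
reader = csv.reader(sample)

header = next(reader)           # first row, fetched explicitly
body = [row for row in reader]  # iteration resumes after the header

print(header)  # ['id', 'name']
print(body)    # [['1', 'ann'], ['2', 'bob']]
```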


\subsection{Writer Objects}

\class{DictWriter} and \class{writer} objects have the following public
methods:

\begin{methoddesc}{writerow}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}{writerows}{rows}
Write every row in the \var{rows} parameter to the writer's file object,
each formatted according to the current dialect.
\end{methoddesc}
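
The two methods differ only in how many rows they take, as the following
sketch shows. It assumes a current Python~3 interpreter and an in-memory
buffer; the data is invented for illustration.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

writer.writerow(["id", "name"])             # one row
writer.writerows([[1, "ann"], [2, "bob"]])  # a sequence of rows

text = buf.getvalue()
print(repr(text))  # 'id,name\r\n1,ann\r\n2,bob\r\n'
```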


\begin{classdesc}{Sniffer}{}

The \class{Sniffer} class is used to deduce the format of a CSV file.

\begin{methoddesc}{sniff}{sample}
Analyze the sample text (presumed to be in CSV format) and return a
{}\class{Dialect} class reflecting the parameters found.
\end{methoddesc}

\begin{methoddesc}{has_header}{sample}
Analyze the sample text (presumed to be in CSV format) and return
{}\code{True} if the first row appears to be a series of column headers.
\end{methoddesc}
\end{classdesc}
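
A sniffed dialect can be handed straight to \function{reader}, as in the
sketch below. Both methods are heuristics, so results on other data may
differ; the example assumes a current Python~3 interpreter and an invented
semicolon-delimited sample.

```python
import csv
import io

sample = "name;age\nann;31\nbob;27\ncyd;45\n"

sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample)           # guess the format from the text
looks_like_header = sniffer.has_header(sample)

# Parse the same text using the guessed dialect.
rows = list(csv.reader(io.StringIO(sample), dialect))

print(repr(dialect.delimiter))  # ';'
print(rows[1])                  # ['ann', '31']
```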

\subsection{Examples}

The ``Hello, world'' of csv reading is

\begin{verbatim}
    import csv
    reader = csv.reader(file("some.csv"))
    for row in reader:
        print row
\end{verbatim}

The corresponding simplest possible writing example is

\begin{verbatim}
    import csv
    writer = csv.writer(file("some.csv", "w"))
    for row in someiterable:
        writer.writerow(row)
\end{verbatim}