blob: fd56e46fdb69bed19dc4b8fb00ceddf2aa84c760 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001
2:mod:`csv` --- CSV File Reading and Writing
3===========================================
4
5.. module:: csv
6 :synopsis: Write and read tabular data to and from delimited files.
7.. sectionauthor:: Skip Montanaro <skip@pobox.com>
8
9
Georg Brandl116aa622007-08-15 14:28:22 +000010.. index::
11 single: csv
12 pair: data; tabular
13
14The so-called CSV (Comma Separated Values) format is the most common import and
15export format for spreadsheets and databases. There is no "CSV standard", so
16the format is operationally defined by the many applications which read and
17write it. The lack of a standard means that subtle differences often exist in
18the data produced and consumed by different applications. These differences can
19make it annoying to process CSV files from multiple sources. Still, while the
20delimiters and quoting characters vary, the overall format is similar enough
21that it is possible to write a single module which can efficiently manipulate
22such data, hiding the details of reading and writing the data from the
23programmer.
24
25The :mod:`csv` module implements classes to read and write tabular data in CSV
26format. It allows programmers to say, "write this data in the format preferred
27by Excel," or "read data from this file which was generated by Excel," without
28knowing the precise details of the CSV format used by Excel. Programmers can
29also describe the CSV formats understood by other applications or define their
30own special-purpose CSV formats.
31
32The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
33write sequences. Programmers can also read and write data in dictionary form
34using the :class:`DictReader` and :class:`DictWriter` classes.
35
Georg Brandl116aa622007-08-15 14:28:22 +000036.. seealso::
37
Georg Brandl116aa622007-08-15 14:28:22 +000038 :pep:`305` - CSV File API
39 The Python Enhancement Proposal which proposed this addition to Python.
40
41
42.. _csv-contents:
43
44Module Contents
45---------------
46
47The :mod:`csv` module defines the following functions:
48
49
50.. function:: reader(csvfile[, dialect='excel'][, fmtparam])
51
52 Return a reader object which will iterate over lines in the given *csvfile*.
Georg Brandl9afde1c2007-11-01 20:32:30 +000053 *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
Georg Brandl116aa622007-08-15 14:28:22 +000054 string each time its :meth:`next` method is called --- file objects and list
R. David Murray8b7d4aa2009-04-04 01:38:38 +000055 objects are both suitable. If *csvfile* is a file object, it should be opened
56 with ``newline=''``. [#]_ An optional
Georg Brandl116aa622007-08-15 14:28:22 +000057 *dialect* parameter can be given which is used to define a set of parameters
58 specific to a particular CSV dialect. It may be an instance of a subclass of
59 the :class:`Dialect` class or one of the strings returned by the
60 :func:`list_dialects` function. The other optional *fmtparam* keyword arguments
61 can be given to override individual formatting parameters in the current
62 dialect. For full details about the dialect and formatting parameters, see
63 section :ref:`csv-fmt-params`.
64
Skip Montanaro0468df32009-03-25 00:52:18 +000065 Each row read from the csv file is returned as a list of strings. No
R. David Murray8b7d4aa2009-04-04 01:38:38 +000066 automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
67 option is specified (in which case unquoted fields are transformed into floats).
Georg Brandl116aa622007-08-15 14:28:22 +000068
Christian Heimesb9eccbf2007-12-05 20:18:38 +000069 A short usage example::
Georg Brandl48310cd2009-01-03 21:18:54 +000070
Christian Heimesb9eccbf2007-12-05 20:18:38 +000071 >>> import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +000072 >>> spamReader = csv.reader(open('eggs.csv', newline=''), delimiter=' ', quotechar='|')
Christian Heimesb9eccbf2007-12-05 20:18:38 +000073 >>> for row in spamReader:
Georg Brandlf6945182008-02-01 11:56:49 +000074 ... print(', '.join(row))
Christian Heimesb9eccbf2007-12-05 20:18:38 +000075 Spam, Spam, Spam, Spam, Spam, Baked Beans
76 Spam, Lovely Spam, Wonderful Spam
77
Georg Brandl116aa622007-08-15 14:28:22 +000078
79.. function:: writer(csvfile[, dialect='excel'][, fmtparam])
80
81 Return a writer object responsible for converting the user's data into delimited
82 strings on the given file-like object. *csvfile* can be any object with a
R. David Murray8b7d4aa2009-04-04 01:38:38 +000083 :func:`write` method. An optional *dialect*
Georg Brandl116aa622007-08-15 14:28:22 +000084 parameter can be given which is used to define a set of parameters specific to a
85 particular CSV dialect. It may be an instance of a subclass of the
86 :class:`Dialect` class or one of the strings returned by the
87 :func:`list_dialects` function. The other optional *fmtparam* keyword arguments
88 can be given to override individual formatting parameters in the current
89 dialect. For full details about the dialect and formatting parameters, see
90 section :ref:`csv-fmt-params`. To make it
91 as easy as possible to interface with modules which implement the DB API, the
92 value :const:`None` is written as the empty string. While this isn't a
93 reversible transformation, it makes it easier to dump SQL NULL data values to
94 CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
95 All other non-string data are stringified with :func:`str` before being written.
96
Christian Heimesb9eccbf2007-12-05 20:18:38 +000097 A short usage example::
98
99 >>> import csv
100 >>> spamWriter = csv.writer(open('eggs.csv', 'w'), delimiter=' ',
101 ... quotechar='|', quoting=QUOTE_MINIMAL)
102 >>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
103 >>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
104
Georg Brandl116aa622007-08-15 14:28:22 +0000105
106.. function:: register_dialect(name[, dialect][, fmtparam])
107
Georg Brandlf6945182008-02-01 11:56:49 +0000108 Associate *dialect* with *name*. *name* must be a string. The
Georg Brandl116aa622007-08-15 14:28:22 +0000109 dialect can be specified either by passing a sub-class of :class:`Dialect`, or
110 by *fmtparam* keyword arguments, or both, with keyword arguments overriding
111 parameters of the dialect. For full details about the dialect and formatting
112 parameters, see section :ref:`csv-fmt-params`.
113
114
115.. function:: unregister_dialect(name)
116
117 Delete the dialect associated with *name* from the dialect registry. An
118 :exc:`Error` is raised if *name* is not a registered dialect name.
119
120
121.. function:: get_dialect(name)
122
Georg Brandl6554cb92007-12-02 23:15:43 +0000123 Return the dialect associated with *name*. An :exc:`Error` is raised if
124 *name* is not a registered dialect name. This function returns an immutable
125 :class:`Dialect`.
Georg Brandl116aa622007-08-15 14:28:22 +0000126
127.. function:: list_dialects()
128
129 Return the names of all registered dialects.
130
131
132.. function:: field_size_limit([new_limit])
133
134 Returns the current maximum field size allowed by the parser. If *new_limit* is
135 given, this becomes the new limit.
136
Georg Brandl116aa622007-08-15 14:28:22 +0000137
138The :mod:`csv` module defines the following classes:
139
Georg Brandl9afde1c2007-11-01 20:32:30 +0000140.. class:: DictReader(csvfile[, fieldnames=None[, restkey=None[, restval=None[, dialect='excel'[, *args, **kwds]]]]])
Georg Brandl116aa622007-08-15 14:28:22 +0000141
142 Create an object which operates like a regular reader but maps the information
143 read into a dict whose keys are given by the optional *fieldnames* parameter.
144 If the *fieldnames* parameter is omitted, the values in the first row of the
145 *csvfile* will be used as the fieldnames. If the row read has fewer fields than
146 the fieldnames sequence, the value of *restval* will be used as the default
147 value. If the row read has more fields than the fieldnames sequence, the
148 remaining data is added as a sequence keyed by the value of *restkey*. If the
149 row read has fewer fields than the fieldnames sequence, the remaining keys take
150 the value of the optional *restval* parameter. Any other optional or keyword
151 arguments are passed to the underlying :class:`reader` instance.
152
153
154.. class:: DictWriter(csvfile, fieldnames[, restval=''[, extrasaction='raise'[, dialect='excel'[, *args, **kwds]]]])
155
156 Create an object which operates like a regular writer but maps dictionaries onto
157 output rows. The *fieldnames* parameter identifies the order in which values in
158 the dictionary passed to the :meth:`writerow` method are written to the
159 *csvfile*. The optional *restval* parameter specifies the value to be written
160 if the dictionary is missing a key in *fieldnames*. If the dictionary passed to
161 the :meth:`writerow` method contains a key not found in *fieldnames*, the
162 optional *extrasaction* parameter indicates what action to take. If it is set
163 to ``'raise'`` a :exc:`ValueError` is raised. If it is set to ``'ignore'``,
164 extra values in the dictionary are ignored. Any other optional or keyword
165 arguments are passed to the underlying :class:`writer` instance.
166
167 Note that unlike the :class:`DictReader` class, the *fieldnames* parameter of
168 the :class:`DictWriter` is not optional. Since Python's :class:`dict` objects
169 are not ordered, there is not enough information available to deduce the order
170 in which the row should be written to the *csvfile*.
171
172
173.. class:: Dialect
174
175 The :class:`Dialect` class is a container class relied on primarily for its
176 attributes, which are used to define the parameters for a specific
177 :class:`reader` or :class:`writer` instance.
178
179
180.. class:: excel()
181
182 The :class:`excel` class defines the usual properties of an Excel-generated CSV
183 file. It is registered with the dialect name ``'excel'``.
184
185
186.. class:: excel_tab()
187
188 The :class:`excel_tab` class defines the usual properties of an Excel-generated
189 TAB-delimited file. It is registered with the dialect name ``'excel-tab'``.
190
191
192.. class:: Sniffer()
193
194 The :class:`Sniffer` class is used to deduce the format of a CSV file.
195
Benjamin Petersone41251e2008-04-25 01:59:09 +0000196 The :class:`Sniffer` class provides two methods:
Georg Brandl116aa622007-08-15 14:28:22 +0000197
Benjamin Petersone41251e2008-04-25 01:59:09 +0000198 .. method:: sniff(sample[, delimiters=None])
Georg Brandl116aa622007-08-15 14:28:22 +0000199
Benjamin Petersone41251e2008-04-25 01:59:09 +0000200 Analyze the given *sample* and return a :class:`Dialect` subclass
201 reflecting the parameters found. If the optional *delimiters* parameter
202 is given, it is interpreted as a string containing possible valid
203 delimiter characters.
Georg Brandl116aa622007-08-15 14:28:22 +0000204
205
Benjamin Petersone41251e2008-04-25 01:59:09 +0000206 .. method:: has_header(sample)
Georg Brandl116aa622007-08-15 14:28:22 +0000207
Benjamin Petersone41251e2008-04-25 01:59:09 +0000208 Analyze the sample text (presumed to be in CSV format) and return
209 :const:`True` if the first row appears to be a series of column headers.
Georg Brandl116aa622007-08-15 14:28:22 +0000210
Christian Heimes7f044312008-01-06 17:05:40 +0000211An example for :class:`Sniffer` use::
Georg Brandl116aa622007-08-15 14:28:22 +0000212
Christian Heimes7f044312008-01-06 17:05:40 +0000213 csvfile = open("example.csv")
214 dialect = csv.Sniffer().sniff(csvfile.read(1024))
215 csvfile.seek(0)
216 reader = csv.reader(csvfile, dialect)
217 # ... process CSV file contents here ...
218
219
220The :mod:`csv` module defines the following constants:
Georg Brandl116aa622007-08-15 14:28:22 +0000221
222.. data:: QUOTE_ALL
223
224 Instructs :class:`writer` objects to quote all fields.
225
226
227.. data:: QUOTE_MINIMAL
228
229 Instructs :class:`writer` objects to only quote those fields which contain
230 special characters such as *delimiter*, *quotechar* or any of the characters in
231 *lineterminator*.
232
233
234.. data:: QUOTE_NONNUMERIC
235
236 Instructs :class:`writer` objects to quote all non-numeric fields.
237
238 Instructs the reader to convert all non-quoted fields to type *float*.
239
240
241.. data:: QUOTE_NONE
242
243 Instructs :class:`writer` objects to never quote fields. When the current
244 *delimiter* occurs in output data it is preceded by the current *escapechar*
245 character. If *escapechar* is not set, the writer will raise :exc:`Error` if
246 any characters that require escaping are encountered.
247
248 Instructs :class:`reader` to perform no special processing of quote characters.
249
250The :mod:`csv` module defines the following exception:
251
252
253.. exception:: Error
254
255 Raised by any of the functions when an error is detected.
256
Georg Brandl116aa622007-08-15 14:28:22 +0000257.. _csv-fmt-params:
258
259Dialects and Formatting Parameters
260----------------------------------
261
262To make it easier to specify the format of input and output records, specific
263formatting parameters are grouped together into dialects. A dialect is a
264subclass of the :class:`Dialect` class having a set of specific methods and a
265single :meth:`validate` method. When creating :class:`reader` or
266:class:`writer` objects, the programmer can specify a string or a subclass of
267the :class:`Dialect` class as the dialect parameter. In addition to, or instead
268of, the *dialect* parameter, the programmer can also specify individual
269formatting parameters, which have the same names as the attributes defined below
270for the :class:`Dialect` class.
271
272Dialects support the following attributes:
273
274
275.. attribute:: Dialect.delimiter
276
277 A one-character string used to separate fields. It defaults to ``','``.
278
279
280.. attribute:: Dialect.doublequote
281
282 Controls how instances of *quotechar* appearing inside a field should be
283 themselves be quoted. When :const:`True`, the character is doubled. When
284 :const:`False`, the *escapechar* is used as a prefix to the *quotechar*. It
285 defaults to :const:`True`.
286
287 On output, if *doublequote* is :const:`False` and no *escapechar* is set,
288 :exc:`Error` is raised if a *quotechar* is found in a field.
289
290
291.. attribute:: Dialect.escapechar
292
293 A one-character string used by the writer to escape the *delimiter* if *quoting*
294 is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
295 :const:`False`. On reading, the *escapechar* removes any special meaning from
296 the following character. It defaults to :const:`None`, which disables escaping.
297
298
299.. attribute:: Dialect.lineterminator
300
301 The string used to terminate lines produced by the :class:`writer`. It defaults
302 to ``'\r\n'``.
303
304 .. note::
305
306 The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
307 end-of-line, and ignores *lineterminator*. This behavior may change in the
308 future.
309
310
311.. attribute:: Dialect.quotechar
312
313 A one-character string used to quote fields containing special characters, such
314 as the *delimiter* or *quotechar*, or which contain new-line characters. It
315 defaults to ``'"'``.
316
317
318.. attribute:: Dialect.quoting
319
320 Controls when quotes should be generated by the writer and recognised by the
321 reader. It can take on any of the :const:`QUOTE_\*` constants (see section
322 :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
323
324
325.. attribute:: Dialect.skipinitialspace
326
327 When :const:`True`, whitespace immediately following the *delimiter* is ignored.
328 The default is :const:`False`.
329
330
331Reader Objects
332--------------
333
334Reader objects (:class:`DictReader` instances and objects returned by the
335:func:`reader` function) have the following public methods:
336
Georg Brandlc7485062009-04-01 15:53:15 +0000337.. method:: csvreader.__next__()
Georg Brandl116aa622007-08-15 14:28:22 +0000338
339 Return the next row of the reader's iterable object as a list, parsed according
Georg Brandlc7485062009-04-01 15:53:15 +0000340 to the current dialect. Usually you should call this as ``next(reader)``.
341
Georg Brandl116aa622007-08-15 14:28:22 +0000342
343Reader objects have the following public attributes:
344
Georg Brandl116aa622007-08-15 14:28:22 +0000345.. attribute:: csvreader.dialect
346
347 A read-only description of the dialect in use by the parser.
348
349
350.. attribute:: csvreader.line_num
351
352 The number of lines read from the source iterator. This is not the same as the
353 number of records returned, as records can span multiple lines.
354
Georg Brandl116aa622007-08-15 14:28:22 +0000355
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000356DictReader objects have the following public attribute:
357
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000358.. attribute:: csvreader.fieldnames
359
360 If not passed as a parameter when creating the object, this attribute is
361 initialized upon first access or when the first record is read from the
362 file.
363
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000364
365
Georg Brandl116aa622007-08-15 14:28:22 +0000366Writer Objects
367--------------
368
369:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
370the :func:`writer` function) have the following public methods. A *row* must be
371a sequence of strings or numbers for :class:`Writer` objects and a dictionary
372mapping fieldnames to strings or numbers (by passing them through :func:`str`
373first) for :class:`DictWriter` objects. Note that complex numbers are written
374out surrounded by parens. This may cause some problems for other programs which
375read CSV files (assuming they support complex numbers at all).
376
377
378.. method:: csvwriter.writerow(row)
379
380 Write the *row* parameter to the writer's file object, formatted according to
381 the current dialect.
382
383
384.. method:: csvwriter.writerows(rows)
385
386 Write all the *rows* parameters (a list of *row* objects as described above) to
387 the writer's file object, formatted according to the current dialect.
388
389Writer objects have the following public attribute:
390
391
392.. attribute:: csvwriter.dialect
393
394 A read-only description of the dialect in use by the writer.
395
396
397.. _csv-examples:
398
399Examples
400--------
401
402The simplest example of reading a CSV file::
403
404 import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000405 reader = csv.reader(open("some.csv", newline=''))
Georg Brandl116aa622007-08-15 14:28:22 +0000406 for row in reader:
Georg Brandl6911e3c2007-09-04 07:15:32 +0000407 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000408
409Reading a file with an alternate format::
410
411 import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000412 reader = csv.reader(open("passwd"), delimiter=':', quoting=csv.QUOTE_NONE)
Georg Brandl116aa622007-08-15 14:28:22 +0000413 for row in reader:
Georg Brandl6911e3c2007-09-04 07:15:32 +0000414 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000415
416The corresponding simplest possible writing example is::
417
418 import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000419 writer = csv.writer(open("some.csv", "w"))
Georg Brandl116aa622007-08-15 14:28:22 +0000420 writer.writerows(someiterable)
421
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000422Since :func:`open` is used to open a CSV file for reading, the file
423will by default be decoded into unicode using the system default
424encoding (see :func:`locale.getpreferredencoding`). To decode a file
425using a different encoding, use the ``encoding`` argument of open::
426
427 import csv
428 reader = csv.reader(open("some.csv", newline='', encoding='utf-8'))
429 for row in reader:
430 print(row)
431
432The same applies to writing in something other than the system default
433encoding: specify the encoding argument when opening the output file.
434
Georg Brandl116aa622007-08-15 14:28:22 +0000435Registering a new dialect::
436
437 import csv
Georg Brandl116aa622007-08-15 14:28:22 +0000438 csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000439 reader = csv.reader(open("passwd"), 'unixpwd')
Georg Brandl116aa622007-08-15 14:28:22 +0000440
441A slightly more advanced use of the reader --- catching and reporting errors::
442
443 import csv, sys
444 filename = "some.csv"
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000445 reader = csv.reader(open(filename, newline=''))
Georg Brandl116aa622007-08-15 14:28:22 +0000446 try:
447 for row in reader:
Georg Brandl6911e3c2007-09-04 07:15:32 +0000448 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000449 except csv.Error as e:
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000450 sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
Georg Brandl116aa622007-08-15 14:28:22 +0000451
452And while the module doesn't directly support parsing strings, it can easily be
453done::
454
455 import csv
456 for row in csv.reader(['one,two,three']):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000457 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000458
Georg Brandl116aa622007-08-15 14:28:22 +0000459
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000460.. rubric:: Footnotes
Georg Brandl116aa622007-08-15 14:28:22 +0000461
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000462.. [#] If ``newline=''`` is not specified, newlines embedded inside quoted fields
463 will not be interpreted correctly. It should always be safe to specify
464 ``newline=''``, since the csv module does its own universal newline handling
465 on input.