blob: e84e218a37ea2099155dcc02168fe0ba0e58c440 [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001:mod:`csv` --- CSV File Reading and Writing
2===========================================
3
4.. module:: csv
5 :synopsis: Write and read tabular data to and from delimited files.
6.. sectionauthor:: Skip Montanaro <skip@pobox.com>
7
8
Georg Brandl116aa622007-08-15 14:28:22 +00009.. index::
10 single: csv
11 pair: data; tabular
12
13The so-called CSV (Comma Separated Values) format is the most common import and
14export format for spreadsheets and databases. There is no "CSV standard", so
15the format is operationally defined by the many applications which read and
16write it. The lack of a standard means that subtle differences often exist in
17the data produced and consumed by different applications. These differences can
18make it annoying to process CSV files from multiple sources. Still, while the
19delimiters and quoting characters vary, the overall format is similar enough
20that it is possible to write a single module which can efficiently manipulate
21such data, hiding the details of reading and writing the data from the
22programmer.
23
24The :mod:`csv` module implements classes to read and write tabular data in CSV
25format. It allows programmers to say, "write this data in the format preferred
26by Excel," or "read data from this file which was generated by Excel," without
27knowing the precise details of the CSV format used by Excel. Programmers can
28also describe the CSV formats understood by other applications or define their
29own special-purpose CSV formats.
30
31The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
32write sequences. Programmers can also read and write data in dictionary form
33using the :class:`DictReader` and :class:`DictWriter` classes.
34
Georg Brandl116aa622007-08-15 14:28:22 +000035.. seealso::
36
Georg Brandl116aa622007-08-15 14:28:22 +000037 :pep:`305` - CSV File API
38 The Python Enhancement Proposal which proposed this addition to Python.
39
40
41.. _csv-contents:
42
43Module Contents
44---------------
45
46The :mod:`csv` module defines the following functions:
47
48
R David Murray1b00f252012-08-15 10:43:58 -040049.. index::
50 single: universal newlines; csv.reader function
51
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000052.. function:: reader(csvfile, dialect='excel', **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +000053
54 Return a reader object which will iterate over lines in the given *csvfile*.
Georg Brandl9afde1c2007-11-01 20:32:30 +000055 *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
Georg Brandlb30f3302011-01-06 09:23:56 +000056 string each time its :meth:`!__next__` method is called --- :term:`file objects
Antoine Pitrou11cb9612010-09-15 11:11:28 +000057 <file object>` and list objects are both suitable. If *csvfile* is a file object,
R David Murrayf453deb2011-03-20 10:23:22 -040058 it should be opened with ``newline=''``. [1]_ An optional
Georg Brandl116aa622007-08-15 14:28:22 +000059 *dialect* parameter can be given which is used to define a set of parameters
60 specific to a particular CSV dialect. It may be an instance of a subclass of
61 the :class:`Dialect` class or one of the strings returned by the
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000062 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
Georg Brandl116aa622007-08-15 14:28:22 +000063 can be given to override individual formatting parameters in the current
64 dialect. For full details about the dialect and formatting parameters, see
65 section :ref:`csv-fmt-params`.
66
Skip Montanaro0468df32009-03-25 00:52:18 +000067 Each row read from the csv file is returned as a list of strings. No
R. David Murray8b7d4aa2009-04-04 01:38:38 +000068 automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
69 option is specified (in which case unquoted fields are transformed into floats).
Georg Brandl116aa622007-08-15 14:28:22 +000070
Christian Heimesb9eccbf2007-12-05 20:18:38 +000071 A short usage example::
Georg Brandl48310cd2009-01-03 21:18:54 +000072
Christian Heimesb9eccbf2007-12-05 20:18:38 +000073 >>> import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +000074 >>> spamReader = csv.reader(open('eggs.csv', newline=''), delimiter=' ', quotechar='|')
Christian Heimesb9eccbf2007-12-05 20:18:38 +000075 >>> for row in spamReader:
Georg Brandlf6945182008-02-01 11:56:49 +000076 ... print(', '.join(row))
Christian Heimesb9eccbf2007-12-05 20:18:38 +000077 Spam, Spam, Spam, Spam, Spam, Baked Beans
78 Spam, Lovely Spam, Wonderful Spam
79
Georg Brandl116aa622007-08-15 14:28:22 +000080
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000081.. function:: writer(csvfile, dialect='excel', **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +000082
83 Return a writer object responsible for converting the user's data into delimited
84 strings on the given file-like object. *csvfile* can be any object with a
R David Murray9c0d5ea2011-03-20 11:18:21 -040085 :func:`write` method. If *csvfile* is a file object, it should be opened with
86 ``newline=''`` [1]_. An optional *dialect*
Georg Brandl116aa622007-08-15 14:28:22 +000087 parameter can be given which is used to define a set of parameters specific to a
88 particular CSV dialect. It may be an instance of a subclass of the
89 :class:`Dialect` class or one of the strings returned by the
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000090 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
Georg Brandl116aa622007-08-15 14:28:22 +000091 can be given to override individual formatting parameters in the current
92 dialect. For full details about the dialect and formatting parameters, see
93 section :ref:`csv-fmt-params`. To make it
94 as easy as possible to interface with modules which implement the DB API, the
95 value :const:`None` is written as the empty string. While this isn't a
96 reversible transformation, it makes it easier to dump SQL NULL data values to
97 CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
98 All other non-string data are stringified with :func:`str` before being written.
99
Christian Heimesb9eccbf2007-12-05 20:18:38 +0000100 A short usage example::
101
102 >>> import csv
R David Murrayf453deb2011-03-20 10:23:22 -0400103 >>> spamWriter = csv.writer(open('eggs.csv', 'w', newline=''), delimiter=' ',
Skip Montanarof290cb52009-08-18 14:37:42 +0000104 ... quotechar='|', quoting=csv.QUOTE_MINIMAL)
Christian Heimesb9eccbf2007-12-05 20:18:38 +0000105 >>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
106 >>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
107
Georg Brandl116aa622007-08-15 14:28:22 +0000108
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000109.. function:: register_dialect(name[, dialect], **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +0000110
Georg Brandlf6945182008-02-01 11:56:49 +0000111 Associate *dialect* with *name*. *name* must be a string. The
Georg Brandl116aa622007-08-15 14:28:22 +0000112 dialect can be specified either by passing a sub-class of :class:`Dialect`, or
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000113 by *fmtparams* keyword arguments, or both, with keyword arguments overriding
Georg Brandl116aa622007-08-15 14:28:22 +0000114 parameters of the dialect. For full details about the dialect and formatting
115 parameters, see section :ref:`csv-fmt-params`.
116
117
118.. function:: unregister_dialect(name)
119
120 Delete the dialect associated with *name* from the dialect registry. An
121 :exc:`Error` is raised if *name* is not a registered dialect name.
122
123
124.. function:: get_dialect(name)
125
Georg Brandl6554cb92007-12-02 23:15:43 +0000126 Return the dialect associated with *name*. An :exc:`Error` is raised if
127 *name* is not a registered dialect name. This function returns an immutable
128 :class:`Dialect`.
Georg Brandl116aa622007-08-15 14:28:22 +0000129
130.. function:: list_dialects()
131
132 Return the names of all registered dialects.
133
134
135.. function:: field_size_limit([new_limit])
136
137 Returns the current maximum field size allowed by the parser. If *new_limit* is
138 given, this becomes the new limit.
139
Georg Brandl116aa622007-08-15 14:28:22 +0000140
141The :mod:`csv` module defines the following classes:
142
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000143.. class:: DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
Georg Brandl116aa622007-08-15 14:28:22 +0000144
145 Create an object which operates like a regular reader but maps the information
146 read into a dict whose keys are given by the optional *fieldnames* parameter.
147 If the *fieldnames* parameter is omitted, the values in the first row of the
R. David Murray578ec902009-11-09 14:21:38 +0000148 *csvfile* will be used as the fieldnames. If the row read has more fields
149 than the fieldnames sequence, the remaining data is added as a sequence
150 keyed by the value of *restkey*. If the row read has fewer fields than the
151 fieldnames sequence, the remaining keys take the value of the optional
152 *restval* parameter. Any other optional or keyword arguments are passed to
153 the underlying :class:`reader` instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000154
155
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000156.. class:: DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
Georg Brandl116aa622007-08-15 14:28:22 +0000157
158 Create an object which operates like a regular writer but maps dictionaries onto
159 output rows. The *fieldnames* parameter identifies the order in which values in
160 the dictionary passed to the :meth:`writerow` method are written to the
161 *csvfile*. The optional *restval* parameter specifies the value to be written
162 if the dictionary is missing a key in *fieldnames*. If the dictionary passed to
163 the :meth:`writerow` method contains a key not found in *fieldnames*, the
164 optional *extrasaction* parameter indicates what action to take. If it is set
165 to ``'raise'`` a :exc:`ValueError` is raised. If it is set to ``'ignore'``,
166 extra values in the dictionary are ignored. Any other optional or keyword
167 arguments are passed to the underlying :class:`writer` instance.
168
169 Note that unlike the :class:`DictReader` class, the *fieldnames* parameter of
170 the :class:`DictWriter` is not optional. Since Python's :class:`dict` objects
171 are not ordered, there is not enough information available to deduce the order
172 in which the row should be written to the *csvfile*.
173
174
175.. class:: Dialect
176
177 The :class:`Dialect` class is a container class relied on primarily for its
178 attributes, which are used to define the parameters for a specific
179 :class:`reader` or :class:`writer` instance.
180
181
182.. class:: excel()
183
184 The :class:`excel` class defines the usual properties of an Excel-generated CSV
185 file. It is registered with the dialect name ``'excel'``.
186
187
188.. class:: excel_tab()
189
190 The :class:`excel_tab` class defines the usual properties of an Excel-generated
191 TAB-delimited file. It is registered with the dialect name ``'excel-tab'``.
192
193
Georg Brandl7424dd32010-10-27 07:27:06 +0000194.. class:: unix_dialect()
195
196 The :class:`unix_dialect` class defines the usual properties of a CSV file
197 generated on UNIX systems, i.e. using ``'\n'`` as line terminator and quoting
198 all fields. It is registered with the dialect name ``'unix'``.
199
200 .. versionadded:: 3.2
201
202
Georg Brandl116aa622007-08-15 14:28:22 +0000203.. class:: Sniffer()
204
205 The :class:`Sniffer` class is used to deduce the format of a CSV file.
206
Benjamin Petersone41251e2008-04-25 01:59:09 +0000207 The :class:`Sniffer` class provides two methods:
Georg Brandl116aa622007-08-15 14:28:22 +0000208
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000209 .. method:: sniff(sample, delimiters=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000210
Benjamin Petersone41251e2008-04-25 01:59:09 +0000211 Analyze the given *sample* and return a :class:`Dialect` subclass
212 reflecting the parameters found. If the optional *delimiters* parameter
213 is given, it is interpreted as a string containing possible valid
214 delimiter characters.
Georg Brandl116aa622007-08-15 14:28:22 +0000215
216
Benjamin Petersone41251e2008-04-25 01:59:09 +0000217 .. method:: has_header(sample)
Georg Brandl116aa622007-08-15 14:28:22 +0000218
Benjamin Petersone41251e2008-04-25 01:59:09 +0000219 Analyze the sample text (presumed to be in CSV format) and return
220 :const:`True` if the first row appears to be a series of column headers.
Georg Brandl116aa622007-08-15 14:28:22 +0000221
Christian Heimes7f044312008-01-06 17:05:40 +0000222An example for :class:`Sniffer` use::
Georg Brandl116aa622007-08-15 14:28:22 +0000223
Christian Heimes7f044312008-01-06 17:05:40 +0000224 csvfile = open("example.csv")
225 dialect = csv.Sniffer().sniff(csvfile.read(1024))
226 csvfile.seek(0)
227 reader = csv.reader(csvfile, dialect)
228 # ... process CSV file contents here ...
229
230
231The :mod:`csv` module defines the following constants:
Georg Brandl116aa622007-08-15 14:28:22 +0000232
233.. data:: QUOTE_ALL
234
235 Instructs :class:`writer` objects to quote all fields.
236
237
238.. data:: QUOTE_MINIMAL
239
240 Instructs :class:`writer` objects to only quote those fields which contain
241 special characters such as *delimiter*, *quotechar* or any of the characters in
242 *lineterminator*.
243
244
245.. data:: QUOTE_NONNUMERIC
246
247 Instructs :class:`writer` objects to quote all non-numeric fields.
248
249 Instructs the reader to convert all non-quoted fields to type *float*.
250
251
252.. data:: QUOTE_NONE
253
254 Instructs :class:`writer` objects to never quote fields. When the current
255 *delimiter* occurs in output data it is preceded by the current *escapechar*
256 character. If *escapechar* is not set, the writer will raise :exc:`Error` if
257 any characters that require escaping are encountered.
258
259 Instructs :class:`reader` to perform no special processing of quote characters.
260
261The :mod:`csv` module defines the following exception:
262
263
264.. exception:: Error
265
266 Raised by any of the functions when an error is detected.
267
Georg Brandl116aa622007-08-15 14:28:22 +0000268.. _csv-fmt-params:
269
270Dialects and Formatting Parameters
271----------------------------------
272
273To make it easier to specify the format of input and output records, specific
274formatting parameters are grouped together into dialects. A dialect is a
275subclass of the :class:`Dialect` class having a set of specific methods and a
276single :meth:`validate` method. When creating :class:`reader` or
277:class:`writer` objects, the programmer can specify a string or a subclass of
278the :class:`Dialect` class as the dialect parameter. In addition to, or instead
279of, the *dialect* parameter, the programmer can also specify individual
280formatting parameters, which have the same names as the attributes defined below
281for the :class:`Dialect` class.
282
283Dialects support the following attributes:
284
285
286.. attribute:: Dialect.delimiter
287
288 A one-character string used to separate fields. It defaults to ``','``.
289
290
291.. attribute:: Dialect.doublequote
292
293 Controls how instances of *quotechar* appearing inside a field should be
294 themselves be quoted. When :const:`True`, the character is doubled. When
295 :const:`False`, the *escapechar* is used as a prefix to the *quotechar*. It
296 defaults to :const:`True`.
297
298 On output, if *doublequote* is :const:`False` and no *escapechar* is set,
299 :exc:`Error` is raised if a *quotechar* is found in a field.
300
301
302.. attribute:: Dialect.escapechar
303
304 A one-character string used by the writer to escape the *delimiter* if *quoting*
305 is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
306 :const:`False`. On reading, the *escapechar* removes any special meaning from
307 the following character. It defaults to :const:`None`, which disables escaping.
308
309
310.. attribute:: Dialect.lineterminator
311
312 The string used to terminate lines produced by the :class:`writer`. It defaults
313 to ``'\r\n'``.
314
315 .. note::
316
317 The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
318 end-of-line, and ignores *lineterminator*. This behavior may change in the
319 future.
320
321
322.. attribute:: Dialect.quotechar
323
324 A one-character string used to quote fields containing special characters, such
325 as the *delimiter* or *quotechar*, or which contain new-line characters. It
326 defaults to ``'"'``.
327
328
329.. attribute:: Dialect.quoting
330
331 Controls when quotes should be generated by the writer and recognised by the
332 reader. It can take on any of the :const:`QUOTE_\*` constants (see section
333 :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
334
335
336.. attribute:: Dialect.skipinitialspace
337
338 When :const:`True`, whitespace immediately following the *delimiter* is ignored.
339 The default is :const:`False`.
340
341
342Reader Objects
343--------------
344
345Reader objects (:class:`DictReader` instances and objects returned by the
346:func:`reader` function) have the following public methods:
347
Georg Brandlc7485062009-04-01 15:53:15 +0000348.. method:: csvreader.__next__()
Georg Brandl116aa622007-08-15 14:28:22 +0000349
350 Return the next row of the reader's iterable object as a list, parsed according
Georg Brandlc7485062009-04-01 15:53:15 +0000351 to the current dialect. Usually you should call this as ``next(reader)``.
352
Georg Brandl116aa622007-08-15 14:28:22 +0000353
354Reader objects have the following public attributes:
355
Georg Brandl116aa622007-08-15 14:28:22 +0000356.. attribute:: csvreader.dialect
357
358 A read-only description of the dialect in use by the parser.
359
360
361.. attribute:: csvreader.line_num
362
363 The number of lines read from the source iterator. This is not the same as the
364 number of records returned, as records can span multiple lines.
365
Georg Brandl116aa622007-08-15 14:28:22 +0000366
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000367DictReader objects have the following public attribute:
368
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000369.. attribute:: csvreader.fieldnames
370
371 If not passed as a parameter when creating the object, this attribute is
372 initialized upon first access or when the first record is read from the
373 file.
374
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000375
376
Georg Brandl116aa622007-08-15 14:28:22 +0000377Writer Objects
378--------------
379
380:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
381the :func:`writer` function) have the following public methods. A *row* must be
382a sequence of strings or numbers for :class:`Writer` objects and a dictionary
383mapping fieldnames to strings or numbers (by passing them through :func:`str`
384first) for :class:`DictWriter` objects. Note that complex numbers are written
385out surrounded by parens. This may cause some problems for other programs which
386read CSV files (assuming they support complex numbers at all).
387
388
389.. method:: csvwriter.writerow(row)
390
391 Write the *row* parameter to the writer's file object, formatted according to
392 the current dialect.
393
394
395.. method:: csvwriter.writerows(rows)
396
397 Write all the *rows* parameters (a list of *row* objects as described above) to
398 the writer's file object, formatted according to the current dialect.
399
400Writer objects have the following public attribute:
401
402
403.. attribute:: csvwriter.dialect
404
405 A read-only description of the dialect in use by the writer.
406
407
R. David Murraybe0698b2010-02-23 22:57:58 +0000408DictWriter objects have the following public method:
409
410
411.. method:: DictWriter.writeheader()
412
413 Write a row with the field names (as specified in the constructor).
414
R. David Murray19e45482010-02-23 23:00:34 +0000415 .. versionadded:: 3.2
R. David Murraybe0698b2010-02-23 22:57:58 +0000416
417
Georg Brandl116aa622007-08-15 14:28:22 +0000418.. _csv-examples:
419
420Examples
421--------
422
423The simplest example of reading a CSV file::
424
425 import csv
Eli Bendersky9cc62492011-03-11 16:33:36 +0200426 with open('some.csv', newline='') as f:
427 reader = csv.reader(f)
428 for row in reader:
429 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000430
431Reading a file with an alternate format::
432
433 import csv
R David Murrayf453deb2011-03-20 10:23:22 -0400434 with open('passwd', newline='') as f:
Eli Bendersky9cc62492011-03-11 16:33:36 +0200435 reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
436 for row in reader:
437 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000438
439The corresponding simplest possible writing example is::
440
441 import csv
R David Murrayf453deb2011-03-20 10:23:22 -0400442 with open('some.csv', 'w', newline='') as f:
Eli Bendersky9cc62492011-03-11 16:33:36 +0200443 writer = csv.writer(f)
444 writer.writerows(someiterable)
Georg Brandl116aa622007-08-15 14:28:22 +0000445
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000446Since :func:`open` is used to open a CSV file for reading, the file
447will by default be decoded into unicode using the system default
448encoding (see :func:`locale.getpreferredencoding`). To decode a file
449using a different encoding, use the ``encoding`` argument of open::
450
Eli Bendersky9cc62492011-03-11 16:33:36 +0200451 import csv
452 with open('some.csv', newline='', encoding='utf-8') as f:
453 reader = csv.reader(f)
454 for row in reader:
455 print(row)
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000456
457The same applies to writing in something other than the system default
458encoding: specify the encoding argument when opening the output file.
459
Georg Brandl116aa622007-08-15 14:28:22 +0000460Registering a new dialect::
461
462 import csv
Georg Brandl116aa622007-08-15 14:28:22 +0000463 csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
R David Murrayf453deb2011-03-20 10:23:22 -0400464 with open('passwd', newline='') as f:
Eli Bendersky9cc62492011-03-11 16:33:36 +0200465 reader = csv.reader(f, 'unixpwd')
Georg Brandl116aa622007-08-15 14:28:22 +0000466
467A slightly more advanced use of the reader --- catching and reporting errors::
468
469 import csv, sys
Eli Bendersky9cc62492011-03-11 16:33:36 +0200470 filename = 'some.csv'
471 with open(filename, newline='') as f:
472 reader = csv.reader(f)
473 try:
474 for row in reader:
475 print(row)
476 except csv.Error as e:
477 sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
Georg Brandl116aa622007-08-15 14:28:22 +0000478
479And while the module doesn't directly support parsing strings, it can easily be
480done::
481
482 import csv
483 for row in csv.reader(['one,two,three']):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000484 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000485
Georg Brandl116aa622007-08-15 14:28:22 +0000486
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000487.. rubric:: Footnotes
Georg Brandl116aa622007-08-15 14:28:22 +0000488
R David Murrayf453deb2011-03-20 10:23:22 -0400489.. [1] If ``newline=''`` is not specified, newlines embedded inside quoted fields
490 will not be interpreted correctly, and on platforms that use ``\r\n`` linendings
R David Murray9c0d5ea2011-03-20 11:18:21 -0400491 on write an extra ``\r`` will be added. It should always be safe to specify
R David Murray1b00f252012-08-15 10:43:58 -0400492 ``newline=''``, since the csv module does its own
493 (:term:`universal <universal newlines>`) newline handling.