blob: 71523b5c5289d038817b1fa8b6bdc26bdb2330fe [file] [log] [blame]
Georg Brandl116aa622007-08-15 14:28:22 +00001:mod:`csv` --- CSV File Reading and Writing
2===========================================
3
4.. module:: csv
5 :synopsis: Write and read tabular data to and from delimited files.
6.. sectionauthor:: Skip Montanaro <skip@pobox.com>
7
8
Georg Brandl116aa622007-08-15 14:28:22 +00009.. index::
10 single: csv
11 pair: data; tabular
12
13The so-called CSV (Comma Separated Values) format is the most common import and
14export format for spreadsheets and databases. There is no "CSV standard", so
15the format is operationally defined by the many applications which read and
16write it. The lack of a standard means that subtle differences often exist in
17the data produced and consumed by different applications. These differences can
18make it annoying to process CSV files from multiple sources. Still, while the
19delimiters and quoting characters vary, the overall format is similar enough
20that it is possible to write a single module which can efficiently manipulate
21such data, hiding the details of reading and writing the data from the
22programmer.
23
24The :mod:`csv` module implements classes to read and write tabular data in CSV
25format. It allows programmers to say, "write this data in the format preferred
26by Excel," or "read data from this file which was generated by Excel," without
27knowing the precise details of the CSV format used by Excel. Programmers can
28also describe the CSV formats understood by other applications or define their
29own special-purpose CSV formats.
30
31The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
32write sequences. Programmers can also read and write data in dictionary form
33using the :class:`DictReader` and :class:`DictWriter` classes.
34
Georg Brandl116aa622007-08-15 14:28:22 +000035.. seealso::
36
Georg Brandl116aa622007-08-15 14:28:22 +000037 :pep:`305` - CSV File API
38 The Python Enhancement Proposal which proposed this addition to Python.
39
40
41.. _csv-contents:
42
43Module Contents
44---------------
45
46The :mod:`csv` module defines the following functions:
47
48
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000049.. function:: reader(csvfile, dialect='excel', **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +000050
51 Return a reader object which will iterate over lines in the given *csvfile*.
Georg Brandl9afde1c2007-11-01 20:32:30 +000052 *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
Georg Brandl502d9a52009-07-26 15:02:41 +000053 string each time its :meth:`!next` method is called --- file objects and list
R. David Murray8b7d4aa2009-04-04 01:38:38 +000054 objects are both suitable. If *csvfile* is a file object, it should be opened
55 with ``newline=''``. [#]_ An optional
Georg Brandl116aa622007-08-15 14:28:22 +000056 *dialect* parameter can be given which is used to define a set of parameters
57 specific to a particular CSV dialect. It may be an instance of a subclass of
58 the :class:`Dialect` class or one of the strings returned by the
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000059 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
Georg Brandl116aa622007-08-15 14:28:22 +000060 can be given to override individual formatting parameters in the current
61 dialect. For full details about the dialect and formatting parameters, see
62 section :ref:`csv-fmt-params`.
63
Skip Montanaro0468df32009-03-25 00:52:18 +000064 Each row read from the csv file is returned as a list of strings. No
R. David Murray8b7d4aa2009-04-04 01:38:38 +000065 automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format
66 option is specified (in which case unquoted fields are transformed into floats).
Georg Brandl116aa622007-08-15 14:28:22 +000067
Christian Heimesb9eccbf2007-12-05 20:18:38 +000068 A short usage example::
Georg Brandl48310cd2009-01-03 21:18:54 +000069
Christian Heimesb9eccbf2007-12-05 20:18:38 +000070 >>> import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +000071 >>> spamReader = csv.reader(open('eggs.csv', newline=''), delimiter=' ', quotechar='|')
Christian Heimesb9eccbf2007-12-05 20:18:38 +000072 >>> for row in spamReader:
Georg Brandlf6945182008-02-01 11:56:49 +000073 ... print(', '.join(row))
Christian Heimesb9eccbf2007-12-05 20:18:38 +000074 Spam, Spam, Spam, Spam, Spam, Baked Beans
75 Spam, Lovely Spam, Wonderful Spam
76
Georg Brandl116aa622007-08-15 14:28:22 +000077
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000078.. function:: writer(csvfile, dialect='excel', **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +000079
80 Return a writer object responsible for converting the user's data into delimited
81 strings on the given file-like object. *csvfile* can be any object with a
R. David Murray8b7d4aa2009-04-04 01:38:38 +000082 :func:`write` method. An optional *dialect*
Georg Brandl116aa622007-08-15 14:28:22 +000083 parameter can be given which is used to define a set of parameters specific to a
84 particular CSV dialect. It may be an instance of a subclass of the
85 :class:`Dialect` class or one of the strings returned by the
Georg Brandlc2a4f4f2009-04-10 09:03:43 +000086 :func:`list_dialects` function. The other optional *fmtparams* keyword arguments
Georg Brandl116aa622007-08-15 14:28:22 +000087 can be given to override individual formatting parameters in the current
88 dialect. For full details about the dialect and formatting parameters, see
89 section :ref:`csv-fmt-params`. To make it
90 as easy as possible to interface with modules which implement the DB API, the
91 value :const:`None` is written as the empty string. While this isn't a
92 reversible transformation, it makes it easier to dump SQL NULL data values to
93 CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
94 All other non-string data are stringified with :func:`str` before being written.
95
Christian Heimesb9eccbf2007-12-05 20:18:38 +000096 A short usage example::
97
98 >>> import csv
99 >>> spamWriter = csv.writer(open('eggs.csv', 'w'), delimiter=' ',
Skip Montanarof290cb52009-08-18 14:37:42 +0000100 ... quotechar='|', quoting=csv.QUOTE_MINIMAL)
Christian Heimesb9eccbf2007-12-05 20:18:38 +0000101 >>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
102 >>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
103
Georg Brandl116aa622007-08-15 14:28:22 +0000104
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000105.. function:: register_dialect(name[, dialect], **fmtparams)
Georg Brandl116aa622007-08-15 14:28:22 +0000106
Georg Brandlf6945182008-02-01 11:56:49 +0000107 Associate *dialect* with *name*. *name* must be a string. The
Georg Brandl116aa622007-08-15 14:28:22 +0000108 dialect can be specified either by passing a sub-class of :class:`Dialect`, or
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000109 by *fmtparams* keyword arguments, or both, with keyword arguments overriding
Georg Brandl116aa622007-08-15 14:28:22 +0000110 parameters of the dialect. For full details about the dialect and formatting
111 parameters, see section :ref:`csv-fmt-params`.
112
113
114.. function:: unregister_dialect(name)
115
116 Delete the dialect associated with *name* from the dialect registry. An
117 :exc:`Error` is raised if *name* is not a registered dialect name.
118
119
120.. function:: get_dialect(name)
121
Georg Brandl6554cb92007-12-02 23:15:43 +0000122 Return the dialect associated with *name*. An :exc:`Error` is raised if
123 *name* is not a registered dialect name. This function returns an immutable
124 :class:`Dialect`.
Georg Brandl116aa622007-08-15 14:28:22 +0000125
126.. function:: list_dialects()
127
128 Return the names of all registered dialects.
129
130
131.. function:: field_size_limit([new_limit])
132
133 Returns the current maximum field size allowed by the parser. If *new_limit* is
134 given, this becomes the new limit.
135
Georg Brandl116aa622007-08-15 14:28:22 +0000136
137The :mod:`csv` module defines the following classes:
138
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000139.. class:: DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
Georg Brandl116aa622007-08-15 14:28:22 +0000140
141 Create an object which operates like a regular reader but maps the information
142 read into a dict whose keys are given by the optional *fieldnames* parameter.
143 If the *fieldnames* parameter is omitted, the values in the first row of the
R. David Murray578ec902009-11-09 14:21:38 +0000144 *csvfile* will be used as the fieldnames. If the row read has more fields
145 than the fieldnames sequence, the remaining data is added as a sequence
146 keyed by the value of *restkey*. If the row read has fewer fields than the
147 fieldnames sequence, the remaining keys take the value of the optional
148 *restval* parameter. Any other optional or keyword arguments are passed to
149 the underlying :class:`reader` instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000150
151
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000152.. class:: DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
Georg Brandl116aa622007-08-15 14:28:22 +0000153
154 Create an object which operates like a regular writer but maps dictionaries onto
155 output rows. The *fieldnames* parameter identifies the order in which values in
156 the dictionary passed to the :meth:`writerow` method are written to the
157 *csvfile*. The optional *restval* parameter specifies the value to be written
158 if the dictionary is missing a key in *fieldnames*. If the dictionary passed to
159 the :meth:`writerow` method contains a key not found in *fieldnames*, the
160 optional *extrasaction* parameter indicates what action to take. If it is set
161 to ``'raise'`` a :exc:`ValueError` is raised. If it is set to ``'ignore'``,
162 extra values in the dictionary are ignored. Any other optional or keyword
163 arguments are passed to the underlying :class:`writer` instance.
164
165 Note that unlike the :class:`DictReader` class, the *fieldnames* parameter of
166 the :class:`DictWriter` is not optional. Since Python's :class:`dict` objects
167 are not ordered, there is not enough information available to deduce the order
168 in which the row should be written to the *csvfile*.
169
170
171.. class:: Dialect
172
173 The :class:`Dialect` class is a container class relied on primarily for its
174 attributes, which are used to define the parameters for a specific
175 :class:`reader` or :class:`writer` instance.
176
177
178.. class:: excel()
179
180 The :class:`excel` class defines the usual properties of an Excel-generated CSV
181 file. It is registered with the dialect name ``'excel'``.
182
183
184.. class:: excel_tab()
185
186 The :class:`excel_tab` class defines the usual properties of an Excel-generated
187 TAB-delimited file. It is registered with the dialect name ``'excel-tab'``.
188
189
190.. class:: Sniffer()
191
192 The :class:`Sniffer` class is used to deduce the format of a CSV file.
193
Benjamin Petersone41251e2008-04-25 01:59:09 +0000194 The :class:`Sniffer` class provides two methods:
Georg Brandl116aa622007-08-15 14:28:22 +0000195
Georg Brandlc2a4f4f2009-04-10 09:03:43 +0000196 .. method:: sniff(sample, delimiters=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000197
Benjamin Petersone41251e2008-04-25 01:59:09 +0000198 Analyze the given *sample* and return a :class:`Dialect` subclass
199 reflecting the parameters found. If the optional *delimiters* parameter
200 is given, it is interpreted as a string containing possible valid
201 delimiter characters.
Georg Brandl116aa622007-08-15 14:28:22 +0000202
203
Benjamin Petersone41251e2008-04-25 01:59:09 +0000204 .. method:: has_header(sample)
Georg Brandl116aa622007-08-15 14:28:22 +0000205
Benjamin Petersone41251e2008-04-25 01:59:09 +0000206 Analyze the sample text (presumed to be in CSV format) and return
207 :const:`True` if the first row appears to be a series of column headers.
Georg Brandl116aa622007-08-15 14:28:22 +0000208
Christian Heimes7f044312008-01-06 17:05:40 +0000209An example for :class:`Sniffer` use::
Georg Brandl116aa622007-08-15 14:28:22 +0000210
Christian Heimes7f044312008-01-06 17:05:40 +0000211 csvfile = open("example.csv")
212 dialect = csv.Sniffer().sniff(csvfile.read(1024))
213 csvfile.seek(0)
214 reader = csv.reader(csvfile, dialect)
215 # ... process CSV file contents here ...
216
217
218The :mod:`csv` module defines the following constants:
Georg Brandl116aa622007-08-15 14:28:22 +0000219
220.. data:: QUOTE_ALL
221
222 Instructs :class:`writer` objects to quote all fields.
223
224
225.. data:: QUOTE_MINIMAL
226
227 Instructs :class:`writer` objects to only quote those fields which contain
228 special characters such as *delimiter*, *quotechar* or any of the characters in
229 *lineterminator*.
230
231
232.. data:: QUOTE_NONNUMERIC
233
234 Instructs :class:`writer` objects to quote all non-numeric fields.
235
236 Instructs the reader to convert all non-quoted fields to type *float*.
237
238
239.. data:: QUOTE_NONE
240
241 Instructs :class:`writer` objects to never quote fields. When the current
242 *delimiter* occurs in output data it is preceded by the current *escapechar*
243 character. If *escapechar* is not set, the writer will raise :exc:`Error` if
244 any characters that require escaping are encountered.
245
246 Instructs :class:`reader` to perform no special processing of quote characters.
247
248The :mod:`csv` module defines the following exception:
249
250
251.. exception:: Error
252
253 Raised by any of the functions when an error is detected.
254
Georg Brandl116aa622007-08-15 14:28:22 +0000255.. _csv-fmt-params:
256
257Dialects and Formatting Parameters
258----------------------------------
259
260To make it easier to specify the format of input and output records, specific
261formatting parameters are grouped together into dialects. A dialect is a
262subclass of the :class:`Dialect` class having a set of specific methods and a
263single :meth:`validate` method. When creating :class:`reader` or
264:class:`writer` objects, the programmer can specify a string or a subclass of
265the :class:`Dialect` class as the dialect parameter. In addition to, or instead
266of, the *dialect* parameter, the programmer can also specify individual
267formatting parameters, which have the same names as the attributes defined below
268for the :class:`Dialect` class.
269
270Dialects support the following attributes:
271
272
273.. attribute:: Dialect.delimiter
274
275 A one-character string used to separate fields. It defaults to ``','``.
276
277
278.. attribute:: Dialect.doublequote
279
280 Controls how instances of *quotechar* appearing inside a field should be
281 themselves be quoted. When :const:`True`, the character is doubled. When
282 :const:`False`, the *escapechar* is used as a prefix to the *quotechar*. It
283 defaults to :const:`True`.
284
285 On output, if *doublequote* is :const:`False` and no *escapechar* is set,
286 :exc:`Error` is raised if a *quotechar* is found in a field.
287
288
289.. attribute:: Dialect.escapechar
290
291 A one-character string used by the writer to escape the *delimiter* if *quoting*
292 is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
293 :const:`False`. On reading, the *escapechar* removes any special meaning from
294 the following character. It defaults to :const:`None`, which disables escaping.
295
296
297.. attribute:: Dialect.lineterminator
298
299 The string used to terminate lines produced by the :class:`writer`. It defaults
300 to ``'\r\n'``.
301
302 .. note::
303
304 The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
305 end-of-line, and ignores *lineterminator*. This behavior may change in the
306 future.
307
308
309.. attribute:: Dialect.quotechar
310
311 A one-character string used to quote fields containing special characters, such
312 as the *delimiter* or *quotechar*, or which contain new-line characters. It
313 defaults to ``'"'``.
314
315
316.. attribute:: Dialect.quoting
317
318 Controls when quotes should be generated by the writer and recognised by the
319 reader. It can take on any of the :const:`QUOTE_\*` constants (see section
320 :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.
321
322
323.. attribute:: Dialect.skipinitialspace
324
325 When :const:`True`, whitespace immediately following the *delimiter* is ignored.
326 The default is :const:`False`.
327
328
329Reader Objects
330--------------
331
332Reader objects (:class:`DictReader` instances and objects returned by the
333:func:`reader` function) have the following public methods:
334
Georg Brandlc7485062009-04-01 15:53:15 +0000335.. method:: csvreader.__next__()
Georg Brandl116aa622007-08-15 14:28:22 +0000336
337 Return the next row of the reader's iterable object as a list, parsed according
Georg Brandlc7485062009-04-01 15:53:15 +0000338 to the current dialect. Usually you should call this as ``next(reader)``.
339
Georg Brandl116aa622007-08-15 14:28:22 +0000340
341Reader objects have the following public attributes:
342
Georg Brandl116aa622007-08-15 14:28:22 +0000343.. attribute:: csvreader.dialect
344
345 A read-only description of the dialect in use by the parser.
346
347
348.. attribute:: csvreader.line_num
349
350 The number of lines read from the source iterator. This is not the same as the
351 number of records returned, as records can span multiple lines.
352
Georg Brandl116aa622007-08-15 14:28:22 +0000353
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000354DictReader objects have the following public attribute:
355
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000356.. attribute:: csvreader.fieldnames
357
358 If not passed as a parameter when creating the object, this attribute is
359 initialized upon first access or when the first record is read from the
360 file.
361
Skip Montanaroaf8fcfa2008-08-09 19:44:22 +0000362
363
Georg Brandl116aa622007-08-15 14:28:22 +0000364Writer Objects
365--------------
366
367:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
368the :func:`writer` function) have the following public methods. A *row* must be
369a sequence of strings or numbers for :class:`Writer` objects and a dictionary
370mapping fieldnames to strings or numbers (by passing them through :func:`str`
371first) for :class:`DictWriter` objects. Note that complex numbers are written
372out surrounded by parens. This may cause some problems for other programs which
373read CSV files (assuming they support complex numbers at all).
374
375
376.. method:: csvwriter.writerow(row)
377
378 Write the *row* parameter to the writer's file object, formatted according to
379 the current dialect.
380
381
382.. method:: csvwriter.writerows(rows)
383
384 Write all the *rows* parameters (a list of *row* objects as described above) to
385 the writer's file object, formatted according to the current dialect.
386
387Writer objects have the following public attribute:
388
389
390.. attribute:: csvwriter.dialect
391
392 A read-only description of the dialect in use by the writer.
393
394
R. David Murraybe0698b2010-02-23 22:57:58 +0000395DictWriter objects have the following public method:
396
397
398.. method:: DictWriter.writeheader()
399
400 Write a row with the field names (as specified in the constructor).
401
R. David Murray19e45482010-02-23 23:00:34 +0000402 .. versionadded:: 3.2
R. David Murraybe0698b2010-02-23 22:57:58 +0000403
404
Georg Brandl116aa622007-08-15 14:28:22 +0000405.. _csv-examples:
406
407Examples
408--------
409
410The simplest example of reading a CSV file::
411
412 import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000413 reader = csv.reader(open("some.csv", newline=''))
Georg Brandl116aa622007-08-15 14:28:22 +0000414 for row in reader:
Georg Brandl6911e3c2007-09-04 07:15:32 +0000415 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000416
417Reading a file with an alternate format::
418
419 import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000420 reader = csv.reader(open("passwd"), delimiter=':', quoting=csv.QUOTE_NONE)
Georg Brandl116aa622007-08-15 14:28:22 +0000421 for row in reader:
Georg Brandl6911e3c2007-09-04 07:15:32 +0000422 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000423
424The corresponding simplest possible writing example is::
425
426 import csv
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000427 writer = csv.writer(open("some.csv", "w"))
Georg Brandl116aa622007-08-15 14:28:22 +0000428 writer.writerows(someiterable)
429
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000430Since :func:`open` is used to open a CSV file for reading, the file
431will by default be decoded into unicode using the system default
432encoding (see :func:`locale.getpreferredencoding`). To decode a file
433using a different encoding, use the ``encoding`` argument of open::
434
435 import csv
436 reader = csv.reader(open("some.csv", newline='', encoding='utf-8'))
437 for row in reader:
438 print(row)
439
440The same applies to writing in something other than the system default
441encoding: specify the encoding argument when opening the output file.
442
Georg Brandl116aa622007-08-15 14:28:22 +0000443Registering a new dialect::
444
445 import csv
Georg Brandl116aa622007-08-15 14:28:22 +0000446 csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000447 reader = csv.reader(open("passwd"), 'unixpwd')
Georg Brandl116aa622007-08-15 14:28:22 +0000448
449A slightly more advanced use of the reader --- catching and reporting errors::
450
451 import csv, sys
452 filename = "some.csv"
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000453 reader = csv.reader(open(filename, newline=''))
Georg Brandl116aa622007-08-15 14:28:22 +0000454 try:
455 for row in reader:
Georg Brandl6911e3c2007-09-04 07:15:32 +0000456 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000457 except csv.Error as e:
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000458 sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
Georg Brandl116aa622007-08-15 14:28:22 +0000459
460And while the module doesn't directly support parsing strings, it can easily be
461done::
462
463 import csv
464 for row in csv.reader(['one,two,three']):
Georg Brandl6911e3c2007-09-04 07:15:32 +0000465 print(row)
Georg Brandl116aa622007-08-15 14:28:22 +0000466
Georg Brandl116aa622007-08-15 14:28:22 +0000467
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000468.. rubric:: Footnotes
Georg Brandl116aa622007-08-15 14:28:22 +0000469
R. David Murray8b7d4aa2009-04-04 01:38:38 +0000470.. [#] If ``newline=''`` is not specified, newlines embedded inside quoted fields
471 will not be interpreted correctly. It should always be safe to specify
472 ``newline=''``, since the csv module does its own universal newline handling
473 on input.