:mod:`csv` --- CSV File Reading and Writing
===========================================

.. module:: csv
   :synopsis: Write and read tabular data to and from delimited files.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>


.. versionadded:: 2.3

.. index::
   single: csv
   pair: data; tabular

The so-called CSV (Comma Separated Values) format is the most common import and
export format for spreadsheets and databases. There is no "CSV standard", so
the format is operationally defined by the many applications which read and
write it. The lack of a standard means that subtle differences often exist in
the data produced and consumed by different applications. These differences can
make it annoying to process CSV files from multiple sources. Still, while the
delimiters and quoting characters vary, the overall format is similar enough
that it is possible to write a single module which can efficiently manipulate
such data, hiding the details of reading and writing the data from the
programmer.

The :mod:`csv` module implements classes to read and write tabular data in CSV
format. It allows programmers to say, "write this data in the format preferred
by Excel," or "read data from this file which was generated by Excel," without
knowing the precise details of the CSV format used by Excel. Programmers can
also describe the CSV formats understood by other applications or define their
own special-purpose CSV formats.

The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
write sequences. Programmers can also read and write data in dictionary form
using the :class:`DictReader` and :class:`DictWriter` classes.

.. note::

   This version of the :mod:`csv` module doesn't support Unicode input. Also,
   there are currently some issues regarding ASCII NUL characters. Accordingly,
   all input should be UTF-8 or printable ASCII to be safe; see the examples in
   section :ref:`csv-examples`. These restrictions will be removed in the future.


.. seealso::

   :pep:`305` - CSV File API
      The Python Enhancement Proposal which proposed this addition to Python.


.. _csv-contents:

Module Contents
---------------

The :mod:`csv` module defines the following functions:


.. function:: reader(csvfile[, dialect='excel'][, fmtparam])

   Return a reader object which will iterate over lines in the given *csvfile*.
   *csvfile* can be any object which supports the :term:`iterator` protocol and returns a
   string each time its :meth:`!next` method is called --- file objects and list
   objects are both suitable. If *csvfile* is a file object, it must be opened
   with the 'b' flag on platforms where that makes a difference. An optional
   *dialect* parameter can be given which is used to define a set of parameters
   specific to a particular CSV dialect. It may be an instance of a subclass of
   the :class:`Dialect` class or one of the strings returned by the
   :func:`list_dialects` function. The other optional *fmtparam* keyword arguments
   can be given to override individual formatting parameters in the current
   dialect. For full details about the dialect and formatting parameters, see
   section :ref:`csv-fmt-params`.

   Each row read from the csv file is returned as a list of strings. No
   automatic data type conversion is performed.

   A short usage example::

      >>> import csv
      >>> spamReader = csv.reader(open('eggs.csv', 'rb'), delimiter=' ', quotechar='|')
      >>> for row in spamReader:
      ...     print ', '.join(row)
      Spam, Spam, Spam, Spam, Spam, Baked Beans
      Spam, Lovely Spam, Wonderful Spam

   .. versionchanged:: 2.5
      The parser is now stricter with respect to multi-line quoted fields. Previously,
      if a line ended within a quoted field without a terminating newline character, a
      newline would be inserted into the returned field. This behavior caused problems
      when reading files which contained carriage return characters within fields.
      The behavior was changed to return the field without inserting newlines. As a
      consequence, if newlines embedded within fields are important, the input should
      be split into lines in a manner which preserves the newline characters.


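Because *csvfile* only has to be an iterable of strings, :func:`reader` can be tried without a file on disk. The following is a minimal sketch; the sample lines are made up for illustration, and the values shown in comments are what the parser is expected to return for this particular input.

```python
import csv

# Hypothetical sample data: space-delimited lines using '|' as the quote character.
lines = [
    'Spam Spam |Baked Beans|\r\n',
    'Spam |Lovely Spam| 42\r\n',
]

rows = list(csv.reader(lines, delimiter=' ', quotechar='|'))

# rows == [['Spam', 'Spam', 'Baked Beans'], ['Spam', 'Lovely Spam', '42']]
# The last field is the string '42': no type conversion is performed.
```

Passing a list keeps the example short; a file object opened in binary mode is used the same way.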
.. function:: writer(csvfile[, dialect='excel'][, fmtparam])

   Return a writer object responsible for converting the user's data into delimited
   strings on the given file-like object. *csvfile* can be any object with a
   :meth:`write` method. If *csvfile* is a file object, it must be opened with the
   'b' flag on platforms where that makes a difference. An optional *dialect*
   parameter can be given which is used to define a set of parameters specific to a
   particular CSV dialect. It may be an instance of a subclass of the
   :class:`Dialect` class or one of the strings returned by the
   :func:`list_dialects` function. The other optional *fmtparam* keyword arguments
   can be given to override individual formatting parameters in the current
   dialect. For full details about the dialect and formatting parameters, see
   section :ref:`csv-fmt-params`. To make it
   as easy as possible to interface with modules which implement the DB API, the
   value :const:`None` is written as the empty string. While this isn't a
   reversible transformation, it makes it easier to dump SQL NULL data values to
   CSV files without preprocessing the data returned from a ``cursor.fetch*`` call.
   All other non-string data are stringified with :func:`str` before being written.

   A short usage example::

      >>> import csv
      >>> spamWriter = csv.writer(open('eggs.csv', 'wb'), delimiter=' ',
      ...                         quotechar='|', quoting=csv.QUOTE_MINIMAL)
      >>> spamWriter.writerow(['Spam'] * 5 + ['Baked Beans'])
      >>> spamWriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])


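The handling of :const:`None` and of non-string values described above can be observed with a small sketch. ``ListSink`` is a hypothetical helper, not part of the module; it exists only because :func:`writer` needs nothing more than an object with a ``write`` method. The row values are invented.

```python
import csv

class ListSink(object):
    """Hypothetical stand-in for a file: csv.writer only needs write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

sink = ListSink()
writer = csv.writer(sink)

# A row as it might come back from a DB API cursor (values are made up).
writer.writerow(['alice', None, 30, 2.5])

output = ''.join(sink.chunks)
# output == 'alice,,30,2.5\r\n'
# None became an empty field; the numbers were converted to text.
```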
.. function:: register_dialect(name[, dialect][, fmtparam])

   Associate *dialect* with *name*. *name* must be a string or Unicode object. The
   dialect can be specified either by passing a sub-class of :class:`Dialect`, or
   by *fmtparam* keyword arguments, or both, with keyword arguments overriding
   parameters of the dialect. For full details about the dialect and formatting
   parameters, see section :ref:`csv-fmt-params`.


.. function:: unregister_dialect(name)

   Delete the dialect associated with *name* from the dialect registry. An
   :exc:`Error` is raised if *name* is not a registered dialect name.


.. function:: get_dialect(name)

   Return the dialect associated with *name*. An :exc:`Error` is raised if *name*
   is not a registered dialect name.

   .. versionchanged:: 2.5
      This function now returns an immutable :class:`Dialect`. Previously an
      instance of the requested dialect was returned. Users could modify the
      underlying class, changing the behavior of active readers and writers.

.. function:: list_dialects()

   Return the names of all registered dialects.


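The four registry functions fit together as in the following sketch. The dialect name ``'pipes'`` and its parameters are invented for the illustration.

```python
import csv

# 'pipes' is a made-up dialect name used only for this illustration.
csv.register_dialect('pipes', delimiter='|', quoting=csv.QUOTE_NONE)

registered = 'pipes' in csv.list_dialects()       # True
delimiter = csv.get_dialect('pipes').delimiter    # '|'
rows = list(csv.reader(['a|b|c'], 'pipes'))       # [['a', 'b', 'c']]

csv.unregister_dialect('pipes')

try:
    csv.get_dialect('pipes')
    lookup_failed = False
except csv.Error:
    lookup_failed = True   # the name is no longer registered
```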
.. function:: field_size_limit([new_limit])

   Returns the current maximum field size allowed by the parser. If *new_limit* is
   given, this becomes the new limit.

   .. versionadded:: 2.5

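A minimal sketch of :func:`field_size_limit` follows. It assumes the CPython behavior that a call with *new_limit* returns the limit that was in effect before the call, which makes it possible to restore the old value afterwards. The limit of 10 and the 50-character field are arbitrary.

```python
import csv

previous = csv.field_size_limit(10)   # set a tiny limit; the earlier limit is returned
try:
    current = csv.field_size_limit()  # 10
    try:
        list(csv.reader(['x' * 50]))
        too_long = False
    except csv.Error:
        too_long = True               # the 50-character field exceeds the limit
finally:
    csv.field_size_limit(previous)    # restore the original limit
```

The limit is global to the module, so restoring it in a ``finally`` clause avoids surprising other readers in the same process.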
The :mod:`csv` module defines the following classes:


.. class:: DictReader(csvfile[, fieldnames=None[, restkey=None[, restval=None[, dialect='excel'[, *args, **kwds]]]]])

   Create an object which operates like a regular reader but maps the information
   read into a dict whose keys are given by the optional *fieldnames* parameter.
   If the *fieldnames* parameter is omitted, the values in the first row of the
   *csvfile* will be used as the fieldnames. If the row read has more fields
   than the fieldnames sequence, the remaining data is added as a sequence
   keyed by the value of *restkey*. If the row read has fewer fields than the
   fieldnames sequence, the remaining keys take the value of the optional
   *restval* parameter. Any other optional or keyword arguments are passed to
   the underlying :class:`reader` instance.


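The roles of *restkey* and *restval* are easiest to see on rows that are too long or too short. The sketch below uses invented data and the arbitrary values ``'extra'`` and ``'n/a'``.

```python
import csv

lines = [
    'name,age',          # header row, used as the fieldnames
    'alice,30,x,y',      # two more fields than fieldnames
    'bob',               # one field fewer than fieldnames
]

reader = csv.DictReader(lines, restkey='extra', restval='n/a')
records = [dict(row) for row in reader]

# records[0] == {'name': 'alice', 'age': '30', 'extra': ['x', 'y']}
# records[1] == {'name': 'bob', 'age': 'n/a'}
```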
.. class:: DictWriter(csvfile, fieldnames[, restval=''[, extrasaction='raise'[, dialect='excel'[, *args, **kwds]]]])

   Create an object which operates like a regular writer but maps dictionaries onto
   output rows. The *fieldnames* parameter identifies the order in which values in
   the dictionary passed to the :meth:`writerow` method are written to the
   *csvfile*. The optional *restval* parameter specifies the value to be written
   if the dictionary is missing a key in *fieldnames*. If the dictionary passed to
   the :meth:`writerow` method contains a key not found in *fieldnames*, the
   optional *extrasaction* parameter indicates what action to take. If it is set
   to ``'raise'`` a :exc:`ValueError` is raised. If it is set to ``'ignore'``,
   extra values in the dictionary are ignored. Any other optional or keyword
   arguments are passed to the underlying :class:`writer` instance.

   Note that unlike the :class:`DictReader` class, the *fieldnames* parameter of
   the :class:`DictWriter` is not optional. Since Python's :class:`dict` objects
   are not ordered, there is not enough information available to deduce the order
   in which the row should be written to the *csvfile*.


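The two *extrasaction* settings and the *restval* fallback can be compared side by side. In this sketch ``ListSink`` is a hypothetical helper that collects whatever the writer emits, and the dictionaries are invented.

```python
import csv

class ListSink(object):
    """Hypothetical stand-in for a file: csv.writer only needs write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

sink = ListSink()
writer = csv.DictWriter(sink, fieldnames=['name', 'age'],
                        restval='unknown', extrasaction='ignore')
writer.writerow({'name': 'alice', 'age': 30, 'email': 'dropped'})  # extra key ignored
writer.writerow({'name': 'bob'})                                   # 'age' uses restval
output = ''.join(sink.chunks)
# output == 'alice,30\r\nbob,unknown\r\n'

# With the default extrasaction='raise', an unknown key is an error.
strict = csv.DictWriter(ListSink(), fieldnames=['name', 'age'])
try:
    strict.writerow({'name': 'carol', 'nickname': 'caz'})
    rejected = False
except ValueError:
    rejected = True
```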
.. class:: Dialect

   The :class:`Dialect` class is a container class relied on primarily for its
   attributes, which are used to define the parameters for a specific
   :class:`reader` or :class:`writer` instance.


.. class:: excel()

   The :class:`excel` class defines the usual properties of an Excel-generated CSV
   file. It is registered with the dialect name ``'excel'``.


.. class:: excel_tab()

   The :class:`excel_tab` class defines the usual properties of an Excel-generated
   TAB-delimited file. It is registered with the dialect name ``'excel-tab'``.


.. class:: Sniffer()

   The :class:`Sniffer` class is used to deduce the format of a CSV file.

   The :class:`Sniffer` class provides two methods:

   .. method:: sniff(sample[, delimiters=None])

      Analyze the given *sample* and return a :class:`Dialect` subclass
      reflecting the parameters found. If the optional *delimiters* parameter
      is given, it is interpreted as a string containing possible valid
      delimiter characters.


   .. method:: has_header(sample)

      Analyze the sample text (presumed to be in CSV format) and return
      :const:`True` if the first row appears to be a series of column headers.

An example for :class:`Sniffer` use::

   import csv

   csvfile = open("example.csv", "rb")
   dialect = csv.Sniffer().sniff(csvfile.read(1024))
   csvfile.seek(0)
   reader = csv.reader(csvfile, dialect)
   # ... process CSV file contents here ...


The :mod:`csv` module defines the following constants:

.. data:: QUOTE_ALL

   Instructs :class:`writer` objects to quote all fields.


.. data:: QUOTE_MINIMAL

   Instructs :class:`writer` objects to only quote those fields which contain
   special characters such as *delimiter*, *quotechar* or any of the characters in
   *lineterminator*.


.. data:: QUOTE_NONNUMERIC

   Instructs :class:`writer` objects to quote all non-numeric fields.

   Instructs the reader to convert all non-quoted fields to type *float*.


.. data:: QUOTE_NONE

   Instructs :class:`writer` objects to never quote fields. When the current
   *delimiter* occurs in output data it is preceded by the current *escapechar*
   character. If *escapechar* is not set, the writer will raise :exc:`Error` if
   any characters that require escaping are encountered.

   Instructs :class:`reader` to perform no special processing of quote characters.

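The effect of the four constants can be compared by writing the same row under each of them. In the sketch below, ``ListSink`` and ``render`` are hypothetical helpers introduced only for the illustration, and the row is invented.

```python
import csv

class ListSink(object):
    """Hypothetical stand-in for a file: csv.writer only needs write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

def render(row, **fmtparams):
    """Format a single row with csv.writer and return the resulting text."""
    sink = ListSink()
    csv.writer(sink, **fmtparams).writerow(row)
    return ''.join(sink.chunks)

row = ['spam', 'a,b', 42]

minimal = render(row, quoting=csv.QUOTE_MINIMAL)        # 'spam,"a,b",42\r\n'
everything = render(row, quoting=csv.QUOTE_ALL)         # '"spam","a,b","42"\r\n'
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)  # '"spam","a,b",42\r\n'
# QUOTE_NONE needs an escapechar here because 'a,b' contains the delimiter.
unquoted = render(row, quoting=csv.QUOTE_NONE, escapechar='\\')  # 'spam,a\\,b,42\r\n'

# On input, QUOTE_NONNUMERIC converts unquoted fields to float.
parsed = list(csv.reader(['"x",1.5'], quoting=csv.QUOTE_NONNUMERIC))  # [['x', 1.5]]
```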
The :mod:`csv` module defines the following exception:


.. exception:: Error

   Raised by any of the functions when an error is detected.


.. _csv-fmt-params:

Dialects and Formatting Parameters
----------------------------------

To make it easier to specify the format of input and output records, specific
formatting parameters are grouped together into dialects. A dialect is a
subclass of the :class:`Dialect` class having a set of specific methods and a
single :meth:`validate` method. When creating :class:`reader` or
:class:`writer` objects, the programmer can specify a string or a subclass of
the :class:`Dialect` class as the dialect parameter. In addition to, or instead
of, the *dialect* parameter, the programmer can also specify individual
formatting parameters, which have the same names as the attributes defined below
for the :class:`Dialect` class.

Dialects support the following attributes:


.. attribute:: Dialect.delimiter

   A one-character string used to separate fields. It defaults to ``','``.


.. attribute:: Dialect.doublequote

   Controls how instances of *quotechar* appearing inside a field should
   themselves be quoted. When :const:`True`, the character is doubled. When
   :const:`False`, the *escapechar* is used as a prefix to the *quotechar*. It
   defaults to :const:`True`.

   On output, if *doublequote* is :const:`False` and no *escapechar* is set,
   :exc:`Error` is raised if a *quotechar* is found in a field.


.. attribute:: Dialect.escapechar

   A one-character string used by the writer to escape the *delimiter* if *quoting*
   is set to :const:`QUOTE_NONE` and the *quotechar* if *doublequote* is
   :const:`False`. On reading, the *escapechar* removes any special meaning from
   the following character. It defaults to :const:`None`, which disables escaping.


.. attribute:: Dialect.lineterminator

   The string used to terminate lines produced by the :class:`writer`. It defaults
   to ``'\r\n'``.

   .. note::

      The :class:`reader` is hard-coded to recognise either ``'\r'`` or ``'\n'`` as
      end-of-line, and ignores *lineterminator*. This behavior may change in the
      future.


.. attribute:: Dialect.quotechar

   A one-character string used to quote fields containing special characters, such
   as the *delimiter* or *quotechar*, or which contain new-line characters. It
   defaults to ``'"'``.


.. attribute:: Dialect.quoting

   Controls when quotes should be generated by the writer and recognised by the
   reader. It can take on any of the :const:`QUOTE_\*` constants (see section
   :ref:`csv-contents`) and defaults to :const:`QUOTE_MINIMAL`.


.. attribute:: Dialect.skipinitialspace

   When :const:`True`, whitespace immediately following the *delimiter* is ignored.
   The default is :const:`False`.


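Several of these attributes can be tried out by passing them as keyword arguments. The following sketch uses the hypothetical helpers ``ListSink`` and ``render`` and an invented field; it checks the *escapechar* case by reading the text back rather than by relying on its exact layout.

```python
import csv

class ListSink(object):
    """Hypothetical stand-in for a file: csv.writer only needs write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

def render(row, **fmtparams):
    """Format a single row with csv.writer and return the resulting text."""
    sink = ListSink()
    csv.writer(sink, **fmtparams).writerow(row)
    return ''.join(sink.chunks)

field = 'say "hi"'

# Default doublequote=True: the embedded quotechar is doubled.
doubled = render([field])   # '"say ""hi"""\r\n'

# doublequote=False: the quotechar is prefixed with the escapechar instead.
escaped = render([field], doublequote=False, escapechar='\\')
# Reading with the same parameters recovers the original field.
roundtrip = list(csv.reader([escaped], doublequote=False, escapechar='\\'))

# Without an escapechar the writer cannot represent the field.
try:
    render([field], doublequote=False)
    write_failed = False
except csv.Error:
    write_failed = True

# skipinitialspace drops the blank that follows each delimiter.
trimmed = list(csv.reader(['a, b, c'], skipinitialspace=True))  # [['a', 'b', 'c']]

# lineterminator changes only what the writer emits.
unix_line = render(['a', 'b'], lineterminator='\n')   # 'a,b\n'
```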
Reader Objects
--------------

Reader objects (:class:`DictReader` instances and objects returned by the
:func:`reader` function) have the following public methods:


.. method:: csvreader.next()

   Return the next row of the reader's iterable object as a list, parsed according
   to the current dialect.

Reader objects have the following public attributes:


.. attribute:: csvreader.dialect

   A read-only description of the dialect in use by the parser.


.. attribute:: csvreader.line_num

   The number of lines read from the source iterator. This is not the same as the
   number of records returned, as records can span multiple lines.

   .. versionadded:: 2.5


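The difference between lines read and records returned shows up as soon as a quoted field contains a newline. The sketch below uses invented data and the built-in ``next()`` function, which is assumed to be available (Python 2.6 or later) and calls the reader's own method.

```python
import csv

# One record is spread over two physical lines; the newline inside the
# quoted field survives because each input line keeps its line ending.
lines = [
    'id,comment\n',
    '1,"first line\n',
    'second line"\n',
]

reader = csv.reader(lines)
header = next(reader)            # ['id', 'comment']
record = next(reader)            # ['1', 'first line\nsecond line']
lines_read = reader.line_num     # 3 lines consumed for 2 records
```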
DictReader objects have the following public attribute:


.. attribute:: csvreader.fieldnames

   If not passed as a parameter when creating the object, this attribute is
   initialized upon first access or when the first record is read from the
   file.

   .. versionchanged:: 2.6


Writer Objects
--------------

Writer objects (:class:`DictWriter` instances and objects returned by
the :func:`writer` function) have the following public methods. A *row* must be
a sequence of strings or numbers for writer objects, and a dictionary
mapping fieldnames to strings or numbers for :class:`DictWriter` objects.
Numbers are passed through :func:`str` before being written. Note that complex
numbers are written out surrounded by parens. This may cause some problems for
other programs which read CSV files (assuming they support complex numbers at all).


.. method:: csvwriter.writerow(row)

   Write the *row* parameter to the writer's file object, formatted according to
   the current dialect.


.. method:: csvwriter.writerows(rows)

   Write all the *rows* parameters (a list of *row* objects as described above) to
   the writer's file object, formatted according to the current dialect.

Writer objects have the following public attribute:


.. attribute:: csvwriter.dialect

   A read-only description of the dialect in use by the writer.


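A short sketch of :meth:`writerows`, including the parenthesised form of a complex number mentioned above. ``ListSink`` is a hypothetical helper that collects the writer's output, and the rows are invented.

```python
import csv

class ListSink(object):
    """Hypothetical stand-in for a file: csv.writer only needs write()."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

sink = ListSink()
writer = csv.writer(sink)
writer.writerows([
    ['label', 'value'],
    ['real', 1.5],
    ['complex', 1 + 2j],   # written as '(1+2j)', parentheses included
])

output = ''.join(sink.chunks)
# output == 'label,value\r\nreal,1.5\r\ncomplex,(1+2j)\r\n'
```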
.. _csv-examples:

Examples
--------

The simplest example of reading a CSV file::

   import csv
   reader = csv.reader(open("some.csv", "rb"))
   for row in reader:
       print row

Reading a file with an alternate format::

   import csv
   reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE)
   for row in reader:
       print row

The corresponding simplest possible writing example is::

   import csv
   writer = csv.writer(open("some.csv", "wb"))
   writer.writerows(someiterable)

Registering a new dialect::

   import csv

   csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

   reader = csv.reader(open("passwd", "rb"), 'unixpwd')

A slightly more advanced use of the reader --- catching and reporting errors::

   import csv, sys
   filename = "some.csv"
   reader = csv.reader(open(filename, "rb"))
   try:
       for row in reader:
           print row
   except csv.Error, e:
       sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))

And while the module doesn't directly support parsing strings, it can easily be
done::

   import csv
   for row in csv.reader(['one,two,three']):
       print row

The :mod:`csv` module doesn't directly support reading and writing Unicode, but
it is 8-bit-clean save for some problems with ASCII NUL characters. So you can
write functions or classes that handle the encoding and decoding for you as long
as you avoid encodings like UTF-16 that use NULs. UTF-8 is recommended.

:func:`unicode_csv_reader` below is a :term:`generator` that wraps :class:`csv.reader`
to handle Unicode CSV data (a list of Unicode strings). :func:`utf_8_encoder`
is a :term:`generator` that encodes the Unicode strings as UTF-8, one string (or row) at
a time. The encoded strings are parsed by the CSV reader, and
:func:`unicode_csv_reader` decodes the UTF-8-encoded cells back into Unicode::

   import csv

   def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
       # csv.py doesn't do Unicode; encode temporarily as UTF-8:
       csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                               dialect=dialect, **kwargs)
       for row in csv_reader:
           # decode UTF-8 back to Unicode, cell by cell:
           yield [unicode(cell, 'utf-8') for cell in row]

   def utf_8_encoder(unicode_csv_data):
       for line in unicode_csv_data:
           yield line.encode('utf-8')

For all other encodings the following :class:`UnicodeReader` and
:class:`UnicodeWriter` classes can be used. They take an additional *encoding*
parameter in their constructor and make sure that the data passes the real
reader or writer encoded as UTF-8::

   import csv, codecs, cStringIO

   class UTF8Recoder:
       """
       Iterator that reads an encoded stream and reencodes the input to UTF-8
       """
       def __init__(self, f, encoding):
           self.reader = codecs.getreader(encoding)(f)

       def __iter__(self):
           return self

       def next(self):
           return self.reader.next().encode("utf-8")

   class UnicodeReader:
       """
       A CSV reader which will iterate over lines in the CSV file "f",
       which is encoded in the given encoding.
       """

       def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
           f = UTF8Recoder(f, encoding)
           self.reader = csv.reader(f, dialect=dialect, **kwds)

       def next(self):
           row = self.reader.next()
           return [unicode(s, "utf-8") for s in row]

       def __iter__(self):
           return self

   class UnicodeWriter:
       """
       A CSV writer which will write rows to CSV file "f",
       which is encoded in the given encoding.
       """

       def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
           # Redirect output to a queue
           self.queue = cStringIO.StringIO()
           self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
           self.stream = f
           self.encoder = codecs.getincrementalencoder(encoding)()

       def writerow(self, row):
           self.writer.writerow([s.encode("utf-8") for s in row])
           # Fetch UTF-8 output from the queue ...
           data = self.queue.getvalue()
           data = data.decode("utf-8")
           # ... and reencode it into the target encoding
           data = self.encoder.encode(data)
           # write to the target stream
           self.stream.write(data)
           # empty queue
           self.queue.truncate(0)

       def writerows(self, rows):
           for row in rows:
               self.writerow(row)
