/*!
A tutorial for handling CSV data in Rust.

This tutorial will cover basic CSV reading and writing, automatic
(de)serialization with Serde, CSV transformations and performance.

This tutorial is targeted at beginner Rust programmers. Experienced Rust
programmers may find this tutorial to be too verbose, but skimming may be
useful. There is also a
[cookbook](../cookbook/index.html)
of examples for those who prefer more information density.

For an introduction to Rust, please see the
[official book](https://doc.rust-lang.org/book/second-edition/).
If you haven't written any Rust code yet but have written code in another
language, then this tutorial might be accessible to you without needing to read
the book first.

# Table of contents

1. [Setup](#setup)
1. [Basic error handling](#basic-error-handling)
    * [Switch to recoverable errors](#switch-to-recoverable-errors)
1. [Reading CSV](#reading-csv)
    * [Reading headers](#reading-headers)
    * [Delimiters, quotes and variable length records](#delimiters-quotes-and-variable-length-records)
    * [Reading with Serde](#reading-with-serde)
    * [Handling invalid data with Serde](#handling-invalid-data-with-serde)
1. [Writing CSV](#writing-csv)
    * [Writing tab separated values](#writing-tab-separated-values)
    * [Writing with Serde](#writing-with-serde)
1. [Pipelining](#pipelining)
    * [Filter by search](#filter-by-search)
    * [Filter by population count](#filter-by-population-count)
1. [Performance](#performance)
    * [Amortizing allocations](#amortizing-allocations)
    * [Serde and zero allocation](#serde-and-zero-allocation)
    * [CSV parsing without the standard library](#csv-parsing-without-the-standard-library)
1. [Closing thoughts](#closing-thoughts)

# Setup

In this section, we'll get you set up with a simple program that reads CSV
data and prints a "debug" version of each record. This assumes that you have
the
[Rust toolchain installed](https://www.rust-lang.org/install.html),
which includes both Rust and Cargo.

We'll start by creating a new Cargo project:

```text
$ cargo new --bin csvtutor
$ cd csvtutor
```

Once inside `csvtutor`, open `Cargo.toml` in your favorite text editor and add
`csv = "1.1"` to your `[dependencies]` section. At this point, your
`Cargo.toml` should look something like this:

```text
[package]
name = "csvtutor"
version = "0.1.0"
authors = ["Your Name"]

[dependencies]
csv = "1.1"
```

Next, let's build your project. Since you added the `csv` crate as a
dependency, Cargo will automatically download it and compile it for you. To
build your project, use Cargo:

```text
$ cargo build
```

This will produce a new binary, `csvtutor`, in your `target/debug` directory.
It won't do much at this point, but you can run it:

```text
$ ./target/debug/csvtutor
Hello, world!
```

Let's make our program do something useful. Our program will read CSV data on
stdin and print debug output for each record on stdout. To write this program,
open `src/main.rs` in your favorite text editor and replace its contents with
this:

```no_run
//tutorial-setup-01.rs
// Import the standard library's I/O module so we can read from stdin.
use std::io;

// The `main` function is where your program starts executing.
fn main() {
    // Create a CSV parser that reads data from stdin.
    let mut rdr = csv::Reader::from_reader(io::stdin());
    // Loop over each record.
    for result in rdr.records() {
        // An error may occur, so abort the program in an unfriendly way.
        // We will make this more friendly later!
        let record = result.expect("a CSV record");
        // Print a debug version of the record.
        println!("{:?}", record);
    }
}
```

Don't worry too much about what this code means; we'll dissect it in the next
section. For now, try rebuilding your project:

```text
$ cargo build
```

Assuming that succeeds, let's try running our program. But first, we will need
some CSV data to play with! For that, we will use a random selection of 100
US cities, along with their population size and geographical coordinates. (We
will use this same CSV data throughout the entire tutorial.) To get the data,
download it from GitHub:

```text
$ curl -LO 'https://raw.githubusercontent.com/BurntSushi/rust-csv/master/examples/data/uspop.csv'
```

And now finally, run your program on `uspop.csv`:

```text
$ ./target/debug/csvtutor < uspop.csv
StringRecord(["Davidsons Landing", "AK", "", "65.2419444", "-165.2716667"])
StringRecord(["Kenai", "AK", "7610", "60.5544444", "-151.2583333"])
StringRecord(["Oakman", "AL", "", "33.7133333", "-87.3886111"])
# ... and much more
```

# Basic error handling

Since reading CSV data can result in errors, error handling is pervasive
throughout the examples in this tutorial. Therefore, we're going to spend a
little bit of time going over basic error handling, and in particular, fix
our previous example to show errors in a more friendly way. **If you're already
comfortable with things like `Result` and `try!`/`?` in Rust, then you can
safely skip this section.**

Note that
[The Rust Programming Language Book](https://doc.rust-lang.org/book/second-edition/)
contains an
[introduction to general error handling](https://doc.rust-lang.org/book/second-edition/ch09-00-error-handling.html).
For a deeper dive, see
[my blog post on error handling in Rust](http://blog.burntsushi.net/rust-error-handling/).
The blog post is especially important if you plan on building Rust libraries.

With that out of the way, error handling in Rust comes in two different forms:
unrecoverable errors and recoverable errors.

Unrecoverable errors generally correspond to things like bugs in your program,
which might occur when an invariant or contract is broken. At that point, the
state of your program is unpredictable, and there's typically little recourse
other than *panicking*. In Rust, a panic is similar to simply aborting your
program, but it will unwind the stack and clean up resources before your
program exits.

On the other hand, recoverable errors generally correspond to predictable
errors. A non-existent file or invalid CSV data are examples of recoverable
errors. In Rust, recoverable errors are handled via `Result`. A `Result`
represents the state of a computation that has either succeeded or failed.
It is defined like so:

```
enum Result<T, E> {
    Ok(T),
    Err(E),
}
```

That is, a `Result` either contains a value of type `T` when the computation
succeeds, or it contains a value of type `E` when the computation fails.

The relationship between unrecoverable errors and recoverable errors is
important. In particular, it is **strongly discouraged** to treat recoverable
errors as if they were unrecoverable. For example, panicking when a file could
not be found, or if some CSV data is invalid, is considered bad practice.
Instead, predictable errors should be handled using Rust's `Result` type.

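To make that advice concrete before we return to CSV, here's a tiny sketch of
handling a fallible parse with `match` instead of panicking. (The
`parse_population` helper and its inputs are ours, invented for illustration;
they are not part of the tutorial's program.)

```rust
// A hypothetical helper: a recoverable parse error becomes `None`
// instead of a panic.
fn parse_population(field: &str) -> Option<u64> {
    match field.parse() {
        Ok(n) => Some(n),
        Err(_) => None, // malformed or empty input is treated as missing
    }
}

fn main() {
    assert_eq!(parse_population("7610"), Some(7610));
    assert_eq!(parse_population(""), None);
}
```

We'll meet this exact pattern again, via `Result::ok`, when we parse missing
population counts later in the tutorial.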
With our newfound knowledge, let's re-examine our previous example and dissect
its error handling.

```no_run
//tutorial-error-01.rs
use std::io;

fn main() {
    let mut rdr = csv::Reader::from_reader(io::stdin());
    for result in rdr.records() {
        let record = result.expect("a CSV record");
        println!("{:?}", record);
    }
}
```

There are two places where an error can occur in this program. The first is
if there was a problem reading a record from stdin. The second is if there is
a problem writing to stdout. In general, we will ignore the latter problem in
this tutorial, although robust command line applications should probably try
to handle it (e.g., when a broken pipe occurs). The former, however, is worth
looking into in more detail. For example, if a user of this program provides
invalid CSV data, then the program will panic:

```text
$ cat invalid
header1,header2
foo,bar
quux,baz,foobar
$ ./target/debug/csvtutor < invalid
StringRecord { position: Some(Position { byte: 16, line: 2, record: 1 }), fields: ["foo", "bar"] }
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: UnequalLengths { pos: Some(Position { byte: 24, line: 3, record: 2 }), expected_len: 2, len: 3 }', /checkout/src/libcore/result.rs:859
note: Run with `RUST_BACKTRACE=1` for a backtrace.
```

What happened here? First and foremost, we should talk about why the CSV data
is invalid. The CSV data consists of three records: a header and two data
records. The header and first data record have two fields, but the second
data record has three fields. By default, the csv crate will treat inconsistent
record lengths as an error.
(This behavior can be toggled using the
[`ReaderBuilder::flexible`](../struct.ReaderBuilder.html#method.flexible)
config knob.) This explains why the first data record is printed in this
example, since it has the same number of fields as the header record. That is,
we don't actually hit an error until we parse the second data record.

(Note that the CSV reader automatically interprets the first record as a
header. This can be toggled with the
[`ReaderBuilder::has_headers`](../struct.ReaderBuilder.html#method.has_headers)
config knob.)

So what actually causes the panic to happen in our program? That would be the
first line in our loop:

```ignore
for result in rdr.records() {
    let record = result.expect("a CSV record"); // this panics
    println!("{:?}", record);
}
```

The key thing to understand here is that `rdr.records()` returns an iterator
that yields `Result` values. That is, instead of yielding records, it yields
a `Result` that contains either a record or an error. The `expect` method,
which is defined on `Result`, *unwraps* the success value inside the `Result`.
Since the `Result` might contain an error instead, `expect` will *panic* when
it does contain an error.

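This `Result`-per-item shape isn't unique to `rdr.records()`: any fallible
iterator works the same way, and one common alternative to calling `expect` on
each item is collecting into a single `Result`, which stops at the first
error. A sketch with plain string parsing standing in for CSV records (the
`parse_all` helper is ours, for illustration only):

```rust
use std::num::ParseIntError;

// Parse every field, short-circuiting on the first error.
fn parse_all(fields: &[&str]) -> Result<Vec<u64>, ParseIntError> {
    fields.iter().map(|field| field.parse::<u64>()).collect()
}

fn main() {
    // All items succeed, so we get the whole Vec back.
    assert_eq!(parse_all(&["1", "2", "3"]).unwrap(), vec![1, 2, 3]);
    // The first failure becomes the overall result.
    assert!(parse_all(&["1", "two", "3"]).is_err());
}
```
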
It might help to look at the implementation of `expect`:

```ignore
use std::fmt;

// This says, "for all types T and E, where E can be turned into a human
// readable debug message, define the `expect` method."
impl<T, E: fmt::Debug> Result<T, E> {
    fn expect(self, msg: &str) -> T {
        match self {
            Ok(t) => t,
            Err(e) => panic!("{}: {:?}", msg, e),
        }
    }
}
```

Since this causes a panic if the CSV data is invalid, and invalid CSV data is
a perfectly predictable error, we've turned what should be a *recoverable*
error into an *unrecoverable* error. We did this because using unrecoverable
errors is expedient. Since this is bad practice, we will endeavor to avoid
unrecoverable errors throughout the rest of the tutorial.

## Switch to recoverable errors

We'll convert our unrecoverable error to a recoverable error in three steps.
First, let's get rid of the panic and print an error message manually:

```no_run
//tutorial-error-02.rs
use std::io;
use std::process;

fn main() {
    let mut rdr = csv::Reader::from_reader(io::stdin());
    for result in rdr.records() {
        // Examine our Result.
        // If there was no problem, print the record.
        // Otherwise, print the error message and quit the program.
        match result {
            Ok(record) => println!("{:?}", record),
            Err(err) => {
                println!("error reading CSV from <stdin>: {}", err);
                process::exit(1);
            }
        }
    }
}
```

If we run our program again, we'll still see an error message, but it is no
longer a panic message:

```text
$ cat invalid
header1,header2
foo,bar
quux,baz,foobar
$ ./target/debug/csvtutor < invalid
StringRecord { position: Some(Position { byte: 16, line: 2, record: 1 }), fields: ["foo", "bar"] }
error reading CSV from <stdin>: CSV error: record 2 (line: 3, byte: 24): found record with 3 fields, but the previous record has 2 fields
```

The second step for moving to recoverable errors is to put our CSV record loop
into a separate function. This function then has the option of *returning* an
error, which our `main` function can then inspect and decide what to do with.

```no_run
//tutorial-error-03.rs
use std::error::Error;
use std::io;
use std::process;

fn main() {
    if let Err(err) = run() {
        println!("{}", err);
        process::exit(1);
    }
}

fn run() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::Reader::from_reader(io::stdin());
    for result in rdr.records() {
        // Examine our Result.
        // If there was no problem, print the record.
        // Otherwise, convert our error to a Box<dyn Error> and return it.
        match result {
            Err(err) => return Err(From::from(err)),
            Ok(record) => {
                println!("{:?}", record);
            }
        }
    }
    Ok(())
}
```

Our new function, `run`, has a return type of `Result<(), Box<dyn Error>>`. In
simple terms, this says that `run` either returns nothing when successful, or
if an error occurred, it returns a `Box<dyn Error>`, which stands for "any kind
of error." A `Box<dyn Error>` is hard to inspect if we care about the specific
error that occurred. But for our purposes, all we need to do is gracefully
print an error message and exit the program.

The third and final step is to replace our explicit `match` expression with a
special Rust language feature: the question mark.

```no_run
//tutorial-error-04.rs
use std::error::Error;
use std::io;
use std::process;

fn main() {
    if let Err(err) = run() {
        println!("{}", err);
        process::exit(1);
    }
}

fn run() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::Reader::from_reader(io::stdin());
    for result in rdr.records() {
        // This is effectively the same code as our `match` in the
        // previous example. In other words, `?` is syntactic sugar.
        let record = result?;
        println!("{:?}", record);
    }
    Ok(())
}
```

This last step shows how we can use `?` to automatically forward errors
to our caller without having to do explicit case analysis with `match`
ourselves. We will use `?` heavily throughout this tutorial, and it's
important to note that it can **only be used in functions that return
`Result`.**

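To see that restriction in action, here's a small sketch (a hypothetical
`parse_port` helper, not part of our CSV program) where `?` works precisely
because the enclosing function returns a `Result`; moving either `?` into a
`main` that returns `()` would be a compile error:

```rust
use std::error::Error;

// Extract and parse the port from a "host:port" string. Both uses of `?`
// forward their errors to the caller as a Box<dyn Error>.
fn parse_port(addr: &str) -> Result<u16, Box<dyn Error>> {
    // `rsplit` always yields at least one piece, so `ok_or` is defensive.
    let port = addr.rsplit(':').next().ok_or("missing port")?;
    // `?` converts the ParseIntError into a Box<dyn Error> automatically.
    Ok(port.parse()?)
}

fn main() {
    assert_eq!(parse_port("localhost:8080").unwrap(), 8080);
    assert!(parse_port("no-port-here").is_err());
}
```
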
We'll end this section with a word of caution: using `Box<dyn Error>` as our
error type is the minimally acceptable thing we can do here. Namely, while it
allows our program to gracefully handle errors, it makes it hard for callers to
inspect the specific error condition that occurred. However, since this is a
tutorial on writing command line programs that do CSV parsing, we will consider
ourselves satisfied. If you'd like to know more, or are interested in writing
a library that handles CSV data, then you should check out my
[blog post on error handling](http://blog.burntsushi.net/rust-error-handling/).

With all that said, if all you're doing is writing a one-off program to do
CSV transformations, then using methods like `expect` and panicking when an
error occurs is a perfectly reasonable thing to do. Nevertheless, this tutorial
will endeavor to show idiomatic code.

# Reading CSV

Now that we've got you set up and covered basic error handling, it's time to
do what we came here to do: handle CSV data. We've already seen how to read
CSV data from `stdin`, but this section will cover how to read CSV data from
files and how to configure our CSV reader to read data formatted with
different delimiters and quoting strategies.

First up, let's adapt the example we've been working with to accept a file
path argument instead of stdin.

```no_run
//tutorial-read-01.rs
use std::env;
use std::error::Error;
use std::ffi::OsString;
use std::fs::File;
use std::process;

fn run() -> Result<(), Box<dyn Error>> {
    let file_path = get_first_arg()?;
    let file = File::open(file_path)?;
    let mut rdr = csv::Reader::from_reader(file);
    for result in rdr.records() {
        let record = result?;
        println!("{:?}", record);
    }
    Ok(())
}

/// Returns the first positional argument sent to this process. If there are no
/// positional arguments, then this returns an error.
fn get_first_arg() -> Result<OsString, Box<dyn Error>> {
    match env::args_os().nth(1) {
        None => Err(From::from("expected 1 argument, but got none")),
        Some(file_path) => Ok(file_path),
    }
}

fn main() {
    if let Err(err) = run() {
        println!("{}", err);
        process::exit(1);
    }
}
```

If you replace the contents of your `src/main.rs` file with the above code,
then you should be able to rebuild your project and try it out:

```text
$ cargo build
$ ./target/debug/csvtutor uspop.csv
StringRecord(["Davidsons Landing", "AK", "", "65.2419444", "-165.2716667"])
StringRecord(["Kenai", "AK", "7610", "60.5544444", "-151.2583333"])
StringRecord(["Oakman", "AL", "", "33.7133333", "-87.3886111"])
# ... and much more
```

This example contains two new pieces of code:

1. Code for querying the positional arguments of your program. We put this code
   into its own function called `get_first_arg`. Our program expects a file
   path in the first position (which is indexed at `1`; the argument at index
   `0` is the executable name), so if one doesn't exist, then `get_first_arg`
   returns an error.
2. Code for opening a file. In `run`, we open a file using `File::open`. If
   there was a problem opening the file, we forward the error to the caller of
   `run` (which is `main` in this program). Note that we do *not* wrap the
   `File` in a buffer. The CSV reader does buffering internally, so there's
   no need for the caller to do it.

Now is a good time to introduce an alternate CSV reader constructor, which
makes it slightly more convenient to open CSV data from a file. That is,
instead of:

```ignore
let file_path = get_first_arg()?;
let file = File::open(file_path)?;
let mut rdr = csv::Reader::from_reader(file);
```

you can use:

```ignore
let file_path = get_first_arg()?;
let mut rdr = csv::Reader::from_path(file_path)?;
```

`csv::Reader::from_path` will open the file for you and return an error if
the file could not be opened.

## Reading headers

If you had a chance to look at the data inside `uspop.csv`, you would notice
that there is a header record that looks like this:

```text
City,State,Population,Latitude,Longitude
```

Now, if you look back at the output of the commands you've run so far, you'll
notice that the header record is never printed. Why is that? By default, the
CSV reader will interpret the first record in CSV data as a header, which
is typically distinct from the actual data in the records that follow.
Therefore, the header record is always skipped whenever you try to read or
iterate over the records in CSV data.

The CSV reader does not try to be smart about the header record and does
**not** employ any heuristics for automatically detecting whether the first
record is a header or not. Instead, if you don't want to treat the first record
as a header, you'll need to tell the CSV reader that there are no headers.

To configure a CSV reader to do this, we'll need to use a
[`ReaderBuilder`](../struct.ReaderBuilder.html)
to build a CSV reader with our desired configuration. Here's an example that
does just that. (Note that we've moved back to reading from `stdin`, since it
produces terser examples.)

```no_run
//tutorial-read-headers-01.rs
# use std::error::Error;
# use std::io;
# use std::process;
#
fn run() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .from_reader(io::stdin());
    for result in rdr.records() {
        let record = result?;
        println!("{:?}", record);
    }
    Ok(())
}
#
# fn main() {
#     if let Err(err) = run() {
#         println!("{}", err);
#         process::exit(1);
#     }
# }
```

If you compile and run this program with our `uspop.csv` data, then you'll see
that the header record is now printed:

```text
$ cargo build
$ ./target/debug/csvtutor < uspop.csv
StringRecord(["City", "State", "Population", "Latitude", "Longitude"])
StringRecord(["Davidsons Landing", "AK", "", "65.2419444", "-165.2716667"])
StringRecord(["Kenai", "AK", "7610", "60.5544444", "-151.2583333"])
StringRecord(["Oakman", "AL", "", "33.7133333", "-87.3886111"])
```

If you ever need to access the header record directly, then you can use the
[`Reader::headers`](../struct.Reader.html#method.headers)
method like so:

```no_run
//tutorial-read-headers-02.rs
# use std::error::Error;
# use std::io;
# use std::process;
#
fn run() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::Reader::from_reader(io::stdin());
    {
        // We nest this call in its own scope because of lifetimes.
        let headers = rdr.headers()?;
        println!("{:?}", headers);
    }
    for result in rdr.records() {
        let record = result?;
        println!("{:?}", record);
    }
    // We can ask for the headers at any time. There's no need to nest this
    // call in its own scope because we never try to borrow the reader again.
    let headers = rdr.headers()?;
    println!("{:?}", headers);
    Ok(())
}
#
# fn main() {
#     if let Err(err) = run() {
#         println!("{}", err);
#         process::exit(1);
#     }
# }
```

One interesting thing to note in this example is that we put the call to
`rdr.headers()` in its own scope. We do this because `rdr.headers()` returns
a *borrow* of the reader's internal header state. The nested scope in this
code allows the borrow to end before we try to iterate over the records. If
we didn't nest the call to `rdr.headers()` in its own scope, then the code
wouldn't compile, because we cannot borrow the reader's headers at the same
time that we try to borrow the reader to iterate over its records.

Another way of solving this problem is to *clone* the header record:

```ignore
let headers = rdr.headers()?.clone();
```

This converts it from a borrow of the CSV reader to a new owned value. This
makes the code a bit easier to read, but at the cost of copying the header
record into a new allocation.

## Delimiters, quotes and variable length records

In this section we'll temporarily depart from our `uspop.csv` data set and
show how to read some CSV data that is a little less clean. This CSV data
uses `;` as a delimiter, escapes quotes with `\"` (instead of `""`) and has
records of varying length. Here's the data, which contains a list of WWE
wrestlers and the year they started, if it's known:

```text
$ cat strange.csv
"\"Hacksaw\" Jim Duggan";1987
"Bret \"Hit Man\" Hart";1984
# We're not sure when Rafael started, so omit the year.
Rafael Halperin
"\"Big Cat\" Ernie Ladd";1964
"\"Macho Man\" Randy Savage";1985
"Jake \"The Snake\" Roberts";1986
```

To read this CSV data, we'll want to do the following:

1. Disable headers, since this data has none.
2. Change the delimiter from `,` to `;`.
3. Change the quote strategy from doubled (e.g., `""`) to escaped (e.g., `\"`).
4. Permit flexible length records, since some omit the year.
5. Ignore lines beginning with a `#`.

All of this (and more!) can be configured with a
[`ReaderBuilder`](../struct.ReaderBuilder.html),
as seen in the following example:

```no_run
//tutorial-read-delimiter-01.rs
# use std::error::Error;
# use std::io;
# use std::process;
#
fn run() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .delimiter(b';')
        .double_quote(false)
        .escape(Some(b'\\'))
        .flexible(true)
        .comment(Some(b'#'))
        .from_reader(io::stdin());
    for result in rdr.records() {
        let record = result?;
        println!("{:?}", record);
    }
    Ok(())
}
#
# fn main() {
#     if let Err(err) = run() {
#         println!("{}", err);
#         process::exit(1);
#     }
# }
```

Now re-compile your project and try running the program on `strange.csv`:

```text
$ cargo build
$ ./target/debug/csvtutor < strange.csv
StringRecord(["\"Hacksaw\" Jim Duggan", "1987"])
StringRecord(["Bret \"Hit Man\" Hart", "1984"])
StringRecord(["Rafael Halperin"])
StringRecord(["\"Big Cat\" Ernie Ladd", "1964"])
StringRecord(["\"Macho Man\" Randy Savage", "1985"])
StringRecord(["Jake \"The Snake\" Roberts", "1986"])
```

You should feel encouraged to play around with the settings. Some interesting
things you might try:

1. If you remove the `escape` setting, notice that no CSV errors are reported.
   Instead, records are still parsed. This is a feature of the CSV parser. Even
   though it gets the data slightly wrong, it still provides a parse that you
   might be able to work with. This is a useful property given the messiness
   of real world CSV data.
2. If you remove the `delimiter` setting, parsing still succeeds, although
   every record has exactly one field.
3. If you remove the `flexible` setting, the reader will print the first two
   records (since they both have the same number of fields), but will return a
   parse error on the third record, since it has only one field.

This covers most of the things you might want to configure on your CSV reader,
although there are a few other knobs. For example, you can change the record
terminator from a new line to any other character. (By default, the terminator
is `CRLF`, which treats each of `\r\n`, `\r` and `\n` as single record
terminators.) For more details, see the documentation and examples for each of
the methods on
[`ReaderBuilder`](../struct.ReaderBuilder.html).

| 712 | ## Reading with Serde |
| 713 | |
| 714 | One of the most convenient features of this crate is its support for |
| 715 | [Serde](https://serde.rs/). |
| 716 | Serde is a framework for automatically serializing and deserializing data into |
| 717 | Rust types. In simpler terms, that means instead of iterating over records |
| 718 | as an array of string fields, we can iterate over records of a specific type |
| 719 | of our choosing. |
| 720 | |
| 721 | For example, let's take a look at some data from our `uspop.csv` file: |
| 722 | |
| 723 | ```text |
| 724 | City,State,Population,Latitude,Longitude |
| 725 | Davidsons Landing,AK,,65.2419444,-165.2716667 |
| 726 | Kenai,AK,7610,60.5544444,-151.2583333 |
| 727 | ``` |
| 728 | |
| 729 | While some of these fields make sense as strings (`City`, `State`), other |
| 730 | fields look more like numbers. For example, `Population` looks like it contains |
| 731 | integers while `Latitude` and `Longitude` appear to contain decimals. If we |
| 732 | wanted to convert these fields to their "proper" types, then we'd need to do
| 733 | a lot of manual work. This next example shows how.
| 734 | |
| 735 | ```no_run |
| 736 | //tutorial-read-serde-01.rs |
| 737 | # use std::error::Error; |
| 738 | # use std::io; |
| 739 | # use std::process; |
| 740 | # |
| 741 | fn run() -> Result<(), Box<dyn Error>> { |
| 742 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 743 | for result in rdr.records() { |
| 744 | let record = result?; |
| 745 | |
| 746 | let city = &record[0]; |
| 747 | let state = &record[1]; |
| 748 | // Some records are missing population counts, so if we can't |
| 749 | // parse a number, treat the population count as missing instead |
| 750 | // of returning an error. |
| 751 | let pop: Option<u64> = record[2].parse().ok(); |
| 752 | // Lucky us! Latitudes and longitudes are available for every record. |
| 753 | // Therefore, if one couldn't be parsed, return an error. |
| 754 | let latitude: f64 = record[3].parse()?; |
| 755 | let longitude: f64 = record[4].parse()?; |
| 756 | |
| 757 | println!( |
| 758 | "city: {:?}, state: {:?}, \ |
| 759 | pop: {:?}, latitude: {:?}, longitude: {:?}", |
| 760 | city, state, pop, latitude, longitude); |
| 761 | } |
| 762 | Ok(()) |
| 763 | } |
| 764 | # |
| 765 | # fn main() { |
| 766 | # if let Err(err) = run() { |
| 767 | # println!("{}", err); |
| 768 | # process::exit(1); |
| 769 | # } |
| 770 | # } |
| 771 | ``` |
| 772 | |
| 773 | The problem here is that we need to parse each individual field manually, which |
| 774 | can be labor intensive and repetitive. Serde, however, makes this process |
| 775 | automatic. For example, we can ask to deserialize every record into a tuple |
| 776 | type: `(String, String, Option<u64>, f64, f64)`. |
| 777 | |
| 778 | ```no_run |
| 779 | //tutorial-read-serde-02.rs |
| 780 | # use std::error::Error; |
| 781 | # use std::io; |
| 782 | # use std::process; |
| 783 | # |
| 784 | // This introduces a type alias so that we can conveniently reference our |
| 785 | // record type. |
| 786 | type Record = (String, String, Option<u64>, f64, f64); |
| 787 | |
| 788 | fn run() -> Result<(), Box<dyn Error>> { |
| 789 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 790 | // Instead of creating an iterator with the `records` method, we create |
| 791 | // an iterator with the `deserialize` method. |
| 792 | for result in rdr.deserialize() { |
| 793 | // We must tell Serde what type we want to deserialize into. |
| 794 | let record: Record = result?; |
| 795 | println!("{:?}", record); |
| 796 | } |
| 797 | Ok(()) |
| 798 | } |
| 799 | # |
| 800 | # fn main() { |
| 801 | # if let Err(err) = run() { |
| 802 | # println!("{}", err); |
| 803 | # process::exit(1); |
| 804 | # } |
| 805 | # } |
| 806 | ``` |
| 807 | |
| 808 | Running this code should show output similar to that of previous examples:
| 809 | |
| 810 | ```text |
| 811 | $ cargo build |
| 812 | $ ./target/debug/csvtutor < uspop.csv |
| 813 | ("Davidsons Landing", "AK", None, 65.2419444, -165.2716667) |
| 814 | ("Kenai", "AK", Some(7610), 60.5544444, -151.2583333) |
| 815 | ("Oakman", "AL", None, 33.7133333, -87.3886111) |
| 816 | # ... and much more |
| 817 | ``` |
| 818 | |
| 819 | One of the downsides of using Serde this way is that the type you use must |
| 820 | match the order of fields as they appear in each record. This can be a pain |
| 821 | if your CSV data has a header record, since you might tend to think about each |
| 822 | field as a value of a particular named column rather than as a numbered field.
| 823 | One way to address this is to deserialize our record into a map type like
| 824 | [`HashMap`](https://doc.rust-lang.org/std/collections/struct.HashMap.html) |
| 825 | or |
| 826 | [`BTreeMap`](https://doc.rust-lang.org/std/collections/struct.BTreeMap.html). |
| 827 | The next example shows how, and in particular, notice that the only thing that |
| 828 | changed from the last example is the definition of the `Record` type alias and |
| 829 | a new `use` statement that imports `HashMap` from the standard library: |
| 830 | |
| 831 | ```no_run |
| 832 | //tutorial-read-serde-03.rs |
| 833 | use std::collections::HashMap; |
| 834 | # use std::error::Error; |
| 835 | # use std::io; |
| 836 | # use std::process; |
| 837 | |
| 838 | // This introduces a type alias so that we can conveniently reference our |
| 839 | // record type. |
| 840 | type Record = HashMap<String, String>; |
| 841 | |
| 842 | fn run() -> Result<(), Box<dyn Error>> { |
| 843 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 844 | for result in rdr.deserialize() { |
| 845 | let record: Record = result?; |
| 846 | println!("{:?}", record); |
| 847 | } |
| 848 | Ok(()) |
| 849 | } |
| 850 | # |
| 851 | # fn main() { |
| 852 | # if let Err(err) = run() { |
| 853 | # println!("{}", err); |
| 854 | # process::exit(1); |
| 855 | # } |
| 856 | # } |
| 857 | ``` |
| 858 | |
| 859 | Running this program shows results similar to before, but each record is
| 860 | printed as a map: |
| 861 | |
| 862 | ```text |
| 863 | $ cargo build |
| 864 | $ ./target/debug/csvtutor < uspop.csv |
| 865 | {"City": "Davidsons Landing", "Latitude": "65.2419444", "State": "AK", "Population": "", "Longitude": "-165.2716667"} |
| 866 | {"City": "Kenai", "Population": "7610", "State": "AK", "Longitude": "-151.2583333", "Latitude": "60.5544444"} |
| 867 | {"State": "AL", "City": "Oakman", "Longitude": "-87.3886111", "Population": "", "Latitude": "33.7133333"} |
| 868 | ``` |
| 869 | |
| 870 | This approach works especially well when you need to read CSV data that has a
| 871 | header record but whose exact structure isn't known until your program runs.
| 872 | However, in our case, we know the structure of the data in `uspop.csv`. In |
| 873 | particular, with the `HashMap` approach, we've lost the specific types we had |
| 874 | for each field in the previous example when we deserialized each record into a |
| 875 | `(String, String, Option<u64>, f64, f64)`. Is there a way to identify fields |
| 876 | by their corresponding header name *and* assign each field its own unique |
| 877 | type? The answer is yes, but we'll need to bring in Serde's `derive` feature |
| 878 | first. You can do that by adding this to the `[dependencies]` section of your |
| 879 | `Cargo.toml` file: |
| 880 | |
| 881 | ```text |
| 882 | serde = { version = "1", features = ["derive"] } |
| 883 | ``` |
| 884 | |
| 885 | With this dependency added to our project, we can now define our own custom
| 886 | struct that represents our record. We then ask Serde to automatically write the
| 887 | glue code required to populate our struct from a CSV record. The next example
| 888 | shows how. Don't miss the new Serde import!
| 889 | |
| 890 | ```no_run |
| 891 | //tutorial-read-serde-04.rs |
| 892 | use std::error::Error; |
| 893 | use std::io; |
| 894 | use std::process; |
| 895 | |
| 896 | // This lets us write `#[derive(Deserialize)]`. |
| 897 | use serde::Deserialize; |
| 898 | |
| 899 | // We don't need to derive `Debug` (which doesn't require Serde), but it's a |
| 900 | // good habit to do it for all your types. |
| 901 | // |
| 902 | // Notice that the field names in this struct are NOT in the same order as |
| 903 | // the fields in the CSV data! |
| 904 | #[derive(Debug, Deserialize)] |
| 905 | #[serde(rename_all = "PascalCase")] |
| 906 | struct Record { |
| 907 | latitude: f64, |
| 908 | longitude: f64, |
| 909 | population: Option<u64>, |
| 910 | city: String, |
| 911 | state: String, |
| 912 | } |
| 913 | |
| 914 | fn run() -> Result<(), Box<dyn Error>> { |
| 915 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 916 | for result in rdr.deserialize() { |
| 917 | let record: Record = result?; |
| 918 | println!("{:?}", record); |
| 919 | // Try this if you don't like each record smushed on one line: |
| 920 | // println!("{:#?}", record); |
| 921 | } |
| 922 | Ok(()) |
| 923 | } |
| 924 | |
| 925 | fn main() { |
| 926 | if let Err(err) = run() { |
| 927 | println!("{}", err); |
| 928 | process::exit(1); |
| 929 | } |
| 930 | } |
| 931 | ``` |
| 932 | |
| 933 | Compile and run this program to see output similar to before:
| 934 | |
| 935 | ```text |
| 936 | $ cargo build |
| 937 | $ ./target/debug/csvtutor < uspop.csv |
| 938 | Record { latitude: 65.2419444, longitude: -165.2716667, population: None, city: "Davidsons Landing", state: "AK" } |
| 939 | Record { latitude: 60.5544444, longitude: -151.2583333, population: Some(7610), city: "Kenai", state: "AK" } |
| 940 | Record { latitude: 33.7133333, longitude: -87.3886111, population: None, city: "Oakman", state: "AL" } |
| 941 | ``` |
| 942 | |
| 943 | Once again, we didn't need to change our `run` function at all: we're still |
| 944 | iterating over records using the `deserialize` iterator that we started with |
| 945 | in the beginning of this section. The only thing that changed in this example |
| 946 | was the definition of the `Record` type and a new `use` statement. Our `Record` |
| 947 | type is now a custom struct that we defined instead of a type alias, and as a |
| 948 | result, Serde doesn't know how to deserialize it by default. However, Serde
| 949 | provides a procedural (derive) macro that reads your struct definition at
| 950 | compile time and generates code that will deserialize a CSV record into a
| 951 | `Record` value. To see what happens if you leave out the automatic
| 952 | derive, change `#[derive(Debug, Deserialize)]` to `#[derive(Debug)]`. |
| 953 | |
| 954 | One other thing worth mentioning in this example is the use of |
| 955 | `#[serde(rename_all = "PascalCase")]`. This directive helps Serde map your |
| 956 | struct's field names to the header names in the CSV data. If you recall, our |
| 957 | header record is: |
| 958 | |
| 959 | ```text |
| 960 | City,State,Population,Latitude,Longitude |
| 961 | ``` |
| 962 | |
| 963 | Notice that each name is capitalized, but the fields in our struct are not. The |
| 964 | `#[serde(rename_all = "PascalCase")]` directive fixes that by interpreting each |
| 965 | field name in `PascalCase`, where the first letter of each word is capitalized.
| 966 | If we didn't tell Serde about the name remapping, then the program would quit
| 967 | with an error:
| 968 | |
| 969 | ```text |
| 970 | $ ./target/debug/csvtutor < uspop.csv |
| 971 | CSV deserialize error: record 1 (line: 2, byte: 41): missing field `latitude` |
| 972 | ``` |
| 973 | |
| 974 | We could have fixed this through other means. For example, we could have used |
| 975 | capital letters in our field names: |
| 976 | |
| 977 | ```ignore |
| 978 | #[derive(Debug, Deserialize)] |
| 979 | struct Record { |
| 980 | Latitude: f64, |
| 981 | Longitude: f64, |
| 982 | Population: Option<u64>, |
| 983 | City: String, |
| 984 | State: String, |
| 985 | } |
| 986 | ``` |
| 987 | |
| 988 | However, this violates Rust naming style. (In fact, the Rust compiler |
| 989 | will even warn you that the names do not follow convention!) |
| 990 | |
| 991 | Another way to fix this is to ask Serde to rename each field individually. This |
| 992 | is useful when there is no consistent name mapping from fields to header names: |
| 993 | |
| 994 | ```ignore |
| 995 | #[derive(Debug, Deserialize)] |
| 996 | struct Record { |
| 997 | #[serde(rename = "Latitude")] |
| 998 | latitude: f64, |
| 999 | #[serde(rename = "Longitude")] |
| 1000 | longitude: f64, |
| 1001 | #[serde(rename = "Population")] |
| 1002 | population: Option<u64>, |
| 1003 | #[serde(rename = "City")] |
| 1004 | city: String, |
| 1005 | #[serde(rename = "State")] |
| 1006 | state: String, |
| 1007 | } |
| 1008 | ``` |
| 1009 | |
| 1010 | To read more about renaming fields and about other Serde directives, please |
| 1011 | consult the |
| 1012 | [Serde documentation on attributes](https://serde.rs/attributes.html). |
| 1013 | |
| 1014 | ## Handling invalid data with Serde |
| 1015 | |
| 1016 | In this section we will see a brief example of how to deal with data that isn't |
| 1017 | clean. To do this exercise, we'll work with a slightly tweaked version of the |
| 1018 | US population data we've been using throughout this tutorial. This version of |
| 1019 | the data is slightly messier than what we've been using. You can get it like |
| 1020 | so: |
| 1021 | |
| 1022 | ```text |
| 1023 | $ curl -LO 'https://raw.githubusercontent.com/BurntSushi/rust-csv/master/examples/data/uspop-null.csv' |
| 1024 | ``` |
| 1025 | |
| 1026 | Let's start by running our program from the previous section: |
| 1027 | |
| 1028 | ```no_run |
| 1029 | //tutorial-read-serde-invalid-01.rs |
| 1030 | # use std::error::Error; |
| 1031 | # use std::io; |
| 1032 | # use std::process; |
| 1033 | # |
| 1034 | # use serde::Deserialize; |
| 1035 | # |
| 1036 | #[derive(Debug, Deserialize)] |
| 1037 | #[serde(rename_all = "PascalCase")] |
| 1038 | struct Record { |
| 1039 | latitude: f64, |
| 1040 | longitude: f64, |
| 1041 | population: Option<u64>, |
| 1042 | city: String, |
| 1043 | state: String, |
| 1044 | } |
| 1045 | |
| 1046 | fn run() -> Result<(), Box<dyn Error>> { |
| 1047 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 1048 | for result in rdr.deserialize() { |
| 1049 | let record: Record = result?; |
| 1050 | println!("{:?}", record); |
| 1051 | } |
| 1052 | Ok(()) |
| 1053 | } |
| 1054 | # |
| 1055 | # fn main() { |
| 1056 | # if let Err(err) = run() { |
| 1057 | # println!("{}", err); |
| 1058 | # process::exit(1); |
| 1059 | # } |
| 1060 | # } |
| 1061 | ``` |
| 1062 | |
| 1063 | Compile and run it on our messier data: |
| 1064 | |
| 1065 | ```text |
| 1066 | $ cargo build |
| 1067 | $ ./target/debug/csvtutor < uspop-null.csv |
| 1068 | Record { latitude: 65.2419444, longitude: -165.2716667, population: None, city: "Davidsons Landing", state: "AK" } |
| 1069 | Record { latitude: 60.5544444, longitude: -151.2583333, population: Some(7610), city: "Kenai", state: "AK" } |
| 1070 | Record { latitude: 33.7133333, longitude: -87.3886111, population: None, city: "Oakman", state: "AL" } |
| 1071 | # ... more records |
| 1072 | CSV deserialize error: record 42 (line: 43, byte: 1710): field 2: invalid digit found in string |
| 1073 | ``` |
| 1074 | |
| 1075 | Oops! What happened? The program printed several records, but stopped when it |
| 1076 | tripped over a deserialization problem. The error message says that it found |
| 1077 | an invalid digit in the field at index `2` (which is the `Population` field) |
| 1078 | on line 43. What does line 43 look like? |
| 1079 | |
| 1080 | ```text |
| 1081 | $ head -n 43 uspop-null.csv | tail -n1 |
| 1082 | Flint Springs,KY,NULL,37.3433333,-86.7136111 |
| 1083 | ``` |
| 1084 | |
| 1085 | Ah! The third field (index `2`) is supposed to either be empty or contain a |
| 1086 | population count. However, in this data, it seems that `NULL` sometimes appears |
| 1087 | as a value, presumably to indicate that there is no count available. |
| 1088 | |
| 1089 | The problem with our current program is that it fails to read this record |
| 1090 | because it doesn't know how to deserialize a `NULL` string into an |
| 1091 | `Option<u64>`. That is, an `Option<u64>` corresponds either to an empty field
| 1092 | or to an integer.
| 1093 | |
| 1094 | To fix this, we tell Serde to convert any deserialization errors on this field |
| 1095 | to a `None` value, as shown in this next example: |
| 1096 | |
| 1097 | ```no_run |
| 1098 | //tutorial-read-serde-invalid-02.rs |
| 1099 | # use std::error::Error; |
| 1100 | # use std::io; |
| 1101 | # use std::process; |
| 1102 | # |
| 1103 | # use serde::Deserialize; |
| 1104 | #[derive(Debug, Deserialize)] |
| 1105 | #[serde(rename_all = "PascalCase")] |
| 1106 | struct Record { |
| 1107 | latitude: f64, |
| 1108 | longitude: f64, |
| 1109 | #[serde(deserialize_with = "csv::invalid_option")] |
| 1110 | population: Option<u64>, |
| 1111 | city: String, |
| 1112 | state: String, |
| 1113 | } |
| 1114 | |
| 1115 | fn run() -> Result<(), Box<dyn Error>> { |
| 1116 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 1117 | for result in rdr.deserialize() { |
| 1118 | let record: Record = result?; |
| 1119 | println!("{:?}", record); |
| 1120 | } |
| 1121 | Ok(()) |
| 1122 | } |
| 1123 | # |
| 1124 | # fn main() { |
| 1125 | # if let Err(err) = run() { |
| 1126 | # println!("{}", err); |
| 1127 | # process::exit(1); |
| 1128 | # } |
| 1129 | # } |
| 1130 | ``` |
| 1131 | |
| 1132 | If you compile and run this example, then it should run to completion just |
| 1133 | like the other examples: |
| 1134 | |
| 1135 | ```text |
| 1136 | $ cargo build |
| 1137 | $ ./target/debug/csvtutor < uspop-null.csv |
| 1138 | Record { latitude: 65.2419444, longitude: -165.2716667, population: None, city: "Davidsons Landing", state: "AK" } |
| 1139 | Record { latitude: 60.5544444, longitude: -151.2583333, population: Some(7610), city: "Kenai", state: "AK" } |
| 1140 | Record { latitude: 33.7133333, longitude: -87.3886111, population: None, city: "Oakman", state: "AL" } |
| 1141 | # ... and more |
| 1142 | ``` |
| 1143 | |
| 1144 | The only change in this example was adding this attribute to the `population` |
| 1145 | field in our `Record` type: |
| 1146 | |
| 1147 | ```ignore |
| 1148 | #[serde(deserialize_with = "csv::invalid_option")] |
| 1149 | ``` |
| 1150 | |
| 1151 | The |
| 1152 | [`invalid_option`](../fn.invalid_option.html) |
| 1153 | function is a generic helper function that does one very simple thing: when |
| 1154 | applied to `Option` fields, it will convert any deserialization error into a |
| 1155 | `None` value. This is useful when you need to work with messy CSV data. |
| 1156 | |
| 1157 | # Writing CSV |
| 1158 | |
| 1159 | In this section we'll show a few examples that write CSV data. Writing CSV data |
| 1160 | tends to be a bit more straightforward than reading CSV data, since you get to
| 1161 | control the output format. |
| 1162 | |
| 1163 | Let's start with the most basic example: writing a few CSV records to `stdout`. |
| 1164 | |
| 1165 | ```no_run |
| 1166 | //tutorial-write-01.rs |
| 1167 | use std::error::Error; |
| 1168 | use std::io; |
| 1169 | use std::process; |
| 1170 | |
| 1171 | fn run() -> Result<(), Box<dyn Error>> { |
| 1172 | let mut wtr = csv::Writer::from_writer(io::stdout()); |
| 1173 | // Since we're writing records manually, we must explicitly write our |
| 1174 | // header record. A header record is written the same way that other |
| 1175 | // records are written. |
| 1176 | wtr.write_record(&["City", "State", "Population", "Latitude", "Longitude"])?; |
| 1177 | wtr.write_record(&["Davidsons Landing", "AK", "", "65.2419444", "-165.2716667"])?; |
| 1178 | wtr.write_record(&["Kenai", "AK", "7610", "60.5544444", "-151.2583333"])?; |
| 1179 | wtr.write_record(&["Oakman", "AL", "", "33.7133333", "-87.3886111"])?; |
| 1180 | |
| 1181 | // A CSV writer maintains an internal buffer, so it's important |
| 1182 | // to flush the buffer when you're done. |
| 1183 | wtr.flush()?; |
| 1184 | Ok(()) |
| 1185 | } |
| 1186 | |
| 1187 | fn main() { |
| 1188 | if let Err(err) = run() { |
| 1189 | println!("{}", err); |
| 1190 | process::exit(1); |
| 1191 | } |
| 1192 | } |
| 1193 | ``` |
| 1194 | |
| 1195 | Compiling and running this example results in CSV data being printed: |
| 1196 | |
| 1197 | ```text |
| 1198 | $ cargo build |
| 1199 | $ ./target/debug/csvtutor |
| 1200 | City,State,Population,Latitude,Longitude |
| 1201 | Davidsons Landing,AK,,65.2419444,-165.2716667 |
| 1202 | Kenai,AK,7610,60.5544444,-151.2583333 |
| 1203 | Oakman,AL,,33.7133333,-87.3886111 |
| 1204 | ``` |
| 1205 | |
| 1206 | Before moving on, it's worth taking a closer look at the `write_record` |
| 1207 | method. In this example, it looks rather simple, but if you're new to Rust then |
| 1208 | its type signature might look a little daunting: |
| 1209 | |
| 1210 | ```ignore |
| 1211 | pub fn write_record<I, T>(&mut self, record: I) -> csv::Result<()> |
| 1212 | where I: IntoIterator<Item=T>, T: AsRef<[u8]> |
| 1213 | { |
| 1214 | // implementation elided |
| 1215 | } |
| 1216 | ``` |
| 1217 | |
| 1218 | To understand the type signature, we can break it down piece by piece. |
| 1219 | |
| 1220 | 1. The method takes two parameters: `self` and `record`. |
| 1221 | 2. `self` is a special parameter that corresponds to the `Writer` itself. |
| 1222 | 3. `record` is the CSV record we'd like to write. Its type is `I`, which is |
| 1223 | a generic type. |
| 1224 | 4. In the method's `where` clause, the `I` type is constrained by the |
| 1225 | `IntoIterator<Item=T>` bound. What that means is that `I` must satisfy the |
| 1226 | `IntoIterator` trait. If you look at the documentation of the |
| 1227 | [`IntoIterator` trait](https://doc.rust-lang.org/std/iter/trait.IntoIterator.html), |
| 1228 | then we can see that it describes types that can build iterators. In this |
| 1229 | case, we want an iterator that yields *another* generic type `T`, where |
| 1230 | `T` is the type of each field we want to write. |
| 1231 | 5. `T` also appears in the method's `where` clause, but its constraint is the |
| 1232 | `AsRef<[u8]>` bound. The `AsRef` trait is a way to describe zero cost |
| 1233 | conversions between types in Rust. In this case, the `[u8]` in `AsRef<[u8]>` |
| 1234 | means that we want to be able to *borrow* a slice of bytes from `T`. |
| 1235 | The CSV writer will take these bytes and write them as a single field. |
| 1236 | The `AsRef<[u8]>` bound is useful because types like `String`, `&str`, |
| 1237 | `Vec<u8>` and `&[u8]` all satisfy it. |
| 1238 | 6. Finally, the method returns a `csv::Result<()>`, which is short-hand for |
| 1239 | `Result<(), csv::Error>`. That means `write_record` either returns nothing |
| 1240 | on success or returns a `csv::Error` on failure. |
| 1241 | |
| 1242 | Now, let's apply our newfound understanding of the type signature of
| 1243 | `write_record`. If you recall, in our previous example, we used it like so: |
| 1244 | |
| 1245 | ```ignore |
| 1246 | wtr.write_record(&["field 1", "field 2", "etc"])?; |
| 1247 | ``` |
| 1248 | |
| 1249 | So how do the types match up? Well, the type of each of our fields in this |
| 1250 | code is `&'static str` (which is the type of a string literal in Rust). Since |
| 1251 | we put them in an array literal and borrow it, the type of our parameter is
| 1252 | `&[&'static str; 3]`, which can be used wherever a slice `&[&str]` is
| 1253 | expected. Since slices satisfy the `IntoIterator` bound and strings satisfy
| 1254 | the `AsRef<[u8]>` bound, this ends up being a legal call.
| 1255 | |
| 1256 | Here are a few more examples of ways you can call `write_record`: |
| 1257 | |
| 1258 | ```no_run |
| 1259 | # use csv; |
| 1260 | # let mut wtr = csv::Writer::from_writer(vec![]); |
| 1261 | // A slice of byte strings. |
| 1262 | wtr.write_record(&[b"a", b"b", b"c"]); |
| 1263 | // A vector. |
| 1264 | wtr.write_record(vec!["a", "b", "c"]); |
| 1265 | // A string record. |
| 1266 | wtr.write_record(&csv::StringRecord::from(vec!["a", "b", "c"])); |
| 1267 | // A byte record. |
| 1268 | wtr.write_record(&csv::ByteRecord::from(vec!["a", "b", "c"])); |
| 1269 | ``` |
| 1270 | |
| 1271 | Finally, the example above can be easily adapted to write to a file instead |
| 1272 | of `stdout`: |
| 1273 | |
| 1274 | ```no_run |
| 1275 | //tutorial-write-02.rs |
| 1276 | use std::env; |
| 1277 | use std::error::Error; |
| 1278 | use std::ffi::OsString; |
| 1279 | use std::process; |
| 1280 | |
| 1281 | fn run() -> Result<(), Box<dyn Error>> { |
| 1282 | let file_path = get_first_arg()?; |
| 1283 | let mut wtr = csv::Writer::from_path(file_path)?; |
| 1284 | |
| 1285 | wtr.write_record(&["City", "State", "Population", "Latitude", "Longitude"])?; |
| 1286 | wtr.write_record(&["Davidsons Landing", "AK", "", "65.2419444", "-165.2716667"])?; |
| 1287 | wtr.write_record(&["Kenai", "AK", "7610", "60.5544444", "-151.2583333"])?; |
| 1288 | wtr.write_record(&["Oakman", "AL", "", "33.7133333", "-87.3886111"])?; |
| 1289 | |
| 1290 | wtr.flush()?; |
| 1291 | Ok(()) |
| 1292 | } |
| 1293 | |
| 1294 | /// Returns the first positional argument sent to this process. If there are no |
| 1295 | /// positional arguments, then this returns an error. |
| 1296 | fn get_first_arg() -> Result<OsString, Box<dyn Error>> { |
| 1297 | match env::args_os().nth(1) { |
| 1298 | None => Err(From::from("expected 1 argument, but got none")), |
| 1299 | Some(file_path) => Ok(file_path), |
| 1300 | } |
| 1301 | } |
| 1302 | |
| 1303 | fn main() { |
| 1304 | if let Err(err) = run() { |
| 1305 | println!("{}", err); |
| 1306 | process::exit(1); |
| 1307 | } |
| 1308 | } |
| 1309 | ``` |
| 1310 | |
| 1311 | ## Writing tab separated values |
| 1312 | |
| 1313 | In the previous section, we saw how to write some simple CSV data to `stdout` |
| 1314 | that looked like this: |
| 1315 | |
| 1316 | ```text |
| 1317 | City,State,Population,Latitude,Longitude |
| 1318 | Davidsons Landing,AK,,65.2419444,-165.2716667 |
| 1319 | Kenai,AK,7610,60.5544444,-151.2583333 |
| 1320 | Oakman,AL,,33.7133333,-87.3886111 |
| 1321 | ``` |
| 1322 | |
| 1323 | You might wonder to yourself: what's the point of using a CSV writer if the |
| 1324 | data is so simple? Well, the benefit of a CSV writer is that it can handle all |
| 1325 | types of data without sacrificing the integrity of your data. That is, it knows |
| 1326 | when to quote fields that contain special CSV characters (like commas or new |
| 1327 | lines) or escape literal quotes that appear in your data. The CSV writer can |
| 1328 | also be easily configured to use different delimiters or quoting strategies. |
| 1329 | |
| 1330 | In this section, we'll take a look at how to tweak some of the settings
| 1331 | on a CSV writer. In particular, we'll write TSV ("tab separated values") |
| 1332 | instead of CSV, and we'll ask the CSV writer to quote all non-numeric fields. |
| 1333 | Here's an example: |
| 1334 | |
| 1335 | ```no_run |
| 1336 | //tutorial-write-delimiter-01.rs |
| 1337 | # use std::error::Error; |
| 1338 | # use std::io; |
| 1339 | # use std::process; |
| 1340 | # |
| 1341 | fn run() -> Result<(), Box<dyn Error>> { |
| 1342 | let mut wtr = csv::WriterBuilder::new() |
| 1343 | .delimiter(b'\t') |
| 1344 | .quote_style(csv::QuoteStyle::NonNumeric) |
| 1345 | .from_writer(io::stdout()); |
| 1346 | |
| 1347 | wtr.write_record(&["City", "State", "Population", "Latitude", "Longitude"])?; |
| 1348 | wtr.write_record(&["Davidsons Landing", "AK", "", "65.2419444", "-165.2716667"])?; |
| 1349 | wtr.write_record(&["Kenai", "AK", "7610", "60.5544444", "-151.2583333"])?; |
| 1350 | wtr.write_record(&["Oakman", "AL", "", "33.7133333", "-87.3886111"])?; |
| 1351 | |
| 1352 | wtr.flush()?; |
| 1353 | Ok(()) |
| 1354 | } |
| 1355 | # |
| 1356 | # fn main() { |
| 1357 | # if let Err(err) = run() { |
| 1358 | # println!("{}", err); |
| 1359 | # process::exit(1); |
| 1360 | # } |
| 1361 | # } |
| 1362 | ``` |
| 1363 | |
| 1364 | Compiling and running this example gives: |
| 1365 | |
| 1366 | ```text |
| 1367 | $ cargo build |
| 1368 | $ ./target/debug/csvtutor |
| 1369 | "City" "State" "Population" "Latitude" "Longitude" |
| 1370 | "Davidsons Landing" "AK" "" 65.2419444 -165.2716667 |
| 1371 | "Kenai" "AK" 7610 60.5544444 -151.2583333 |
| 1372 | "Oakman" "AL" "" 33.7133333 -87.3886111 |
| 1373 | ``` |
| 1374 | |
| 1375 | In this example, we used a new type |
| 1376 | [`QuoteStyle`](../enum.QuoteStyle.html). |
| 1377 | The `QuoteStyle` type represents the different quoting strategies available |
| 1378 | to you. The default is to add quotes to fields only when necessary. This |
| 1379 | probably works for most use cases, but you can also ask for quotes to always |
| 1380 | be put around fields, to never be put around fields or to always be put around |
| 1381 | non-numeric fields. |
| 1382 | |
| 1383 | ## Writing with Serde |
| 1384 | |
| 1385 | Just like the CSV reader supports automatic deserialization into Rust types |
| 1386 | with Serde, the CSV writer supports automatic serialization from Rust types |
| 1387 | into CSV records using Serde. In this section, we'll learn how to use it. |
| 1388 | |
| 1389 | As with reading, let's start by seeing how we can serialize a Rust tuple. |
| 1390 | |
| 1391 | ```no_run |
| 1392 | //tutorial-write-serde-01.rs |
| 1393 | # use std::error::Error; |
| 1394 | # use std::io; |
| 1395 | # use std::process; |
| 1396 | # |
| 1397 | fn run() -> Result<(), Box<dyn Error>> { |
| 1398 | let mut wtr = csv::Writer::from_writer(io::stdout()); |
| 1399 | |
| 1400 | // We still need to write headers manually. |
| 1401 | wtr.write_record(&["City", "State", "Population", "Latitude", "Longitude"])?; |
| 1402 | |
| 1403 | // But now we can write records by providing a normal Rust value. |
| 1404 | // |
| 1405 | // Note that the odd `None::<u64>` syntax is required because `None` on |
| 1406 | // its own doesn't have a concrete type, but Serde needs a concrete type |
| 1407 | // in order to serialize it. That is, `None` has type `Option<T>` but |
| 1408 | // `None::<u64>` has type `Option<u64>`. |
| 1409 | wtr.serialize(("Davidsons Landing", "AK", None::<u64>, 65.2419444, -165.2716667))?; |
| 1410 | wtr.serialize(("Kenai", "AK", Some(7610), 60.5544444, -151.2583333))?; |
| 1411 | wtr.serialize(("Oakman", "AL", None::<u64>, 33.7133333, -87.3886111))?; |
| 1412 | |
| 1413 | wtr.flush()?; |
| 1414 | Ok(()) |
| 1415 | } |
| 1416 | # |
| 1417 | # fn main() { |
| 1418 | # if let Err(err) = run() { |
| 1419 | # println!("{}", err); |
| 1420 | # process::exit(1); |
| 1421 | # } |
| 1422 | # } |
| 1423 | ``` |
| 1424 | |
| 1425 | Compiling and running this program gives the expected output: |
| 1426 | |
| 1427 | ```text |
| 1428 | $ cargo build |
| 1429 | $ ./target/debug/csvtutor |
| 1430 | City,State,Population,Latitude,Longitude |
| 1431 | Davidsons Landing,AK,,65.2419444,-165.2716667 |
| 1432 | Kenai,AK,7610,60.5544444,-151.2583333 |
| 1433 | Oakman,AL,,33.7133333,-87.3886111 |
| 1434 | ``` |
| 1435 | |
| 1436 | The key thing to note in the above example is the use of `serialize` instead |
| 1437 | of `write_record` to write our data. In particular, `write_record` is used |
| 1438 | when writing a simple record that contains string-like data only. On the other |
| 1439 | hand, `serialize` is used when your data consists of more complex values like |
| 1440 | numbers, floats or optional values. Of course, you could always convert the |
| 1441 | complex values to strings and then use `write_record`, but Serde can do it for |
| 1442 | you automatically. |
| 1443 | |
| 1444 | As with reading, we can also serialize custom structs as CSV records. As a |
| 1445 | bonus, the fields in a struct will automatically be written as a header |
| 1446 | record! |
| 1447 | |
| 1448 | To write custom structs as CSV records, we'll need to make use of Serde's |
| 1449 | automatic `derive` feature again. As in the |
| 1450 | [previous section on reading with Serde](#reading-with-serde), |
| 1451 | we'll need to add a couple crates to our `[dependencies]` section in our |
| 1452 | `Cargo.toml` (if they aren't already there): |
| 1453 | |
| 1454 | ```text |
| 1455 | serde = { version = "1", features = ["derive"] } |
| 1456 | ``` |
| 1457 | |
| 1458 | And we'll also need to add a new `use` statement to our code, for Serde, as |
| 1459 | shown in the example: |
| 1460 | |
| 1461 | ```no_run |
| 1462 | //tutorial-write-serde-02.rs |
| 1463 | use std::error::Error; |
| 1464 | use std::io; |
| 1465 | use std::process; |
| 1466 | |
| 1467 | use serde::Serialize; |
| 1468 | |
| 1469 | // Note that structs can derive both Serialize and Deserialize! |
| 1470 | #[derive(Debug, Serialize)] |
| 1471 | #[serde(rename_all = "PascalCase")] |
| 1472 | struct Record<'a> { |
| 1473 | city: &'a str, |
| 1474 | state: &'a str, |
| 1475 | population: Option<u64>, |
| 1476 | latitude: f64, |
| 1477 | longitude: f64, |
| 1478 | } |
| 1479 | |
| 1480 | fn run() -> Result<(), Box<dyn Error>> { |
| 1481 | let mut wtr = csv::Writer::from_writer(io::stdout()); |
| 1482 | |
| 1483 | wtr.serialize(Record { |
| 1484 | city: "Davidsons Landing", |
| 1485 | state: "AK", |
| 1486 | population: None, |
| 1487 | latitude: 65.2419444, |
| 1488 | longitude: -165.2716667, |
| 1489 | })?; |
| 1490 | wtr.serialize(Record { |
| 1491 | city: "Kenai", |
| 1492 | state: "AK", |
| 1493 | population: Some(7610), |
| 1494 | latitude: 60.5544444, |
| 1495 | longitude: -151.2583333, |
| 1496 | })?; |
| 1497 | wtr.serialize(Record { |
| 1498 | city: "Oakman", |
| 1499 | state: "AL", |
| 1500 | population: None, |
| 1501 | latitude: 33.7133333, |
| 1502 | longitude: -87.3886111, |
| 1503 | })?; |
| 1504 | |
| 1505 | wtr.flush()?; |
| 1506 | Ok(()) |
| 1507 | } |
| 1508 | |
| 1509 | fn main() { |
| 1510 | if let Err(err) = run() { |
| 1511 | println!("{}", err); |
| 1512 | process::exit(1); |
| 1513 | } |
| 1514 | } |
| 1515 | ``` |
| 1516 | |
| 1517 | Compiling and running this example has the same output as last time, even |
| 1518 | though we didn't explicitly write a header record: |
| 1519 | |
| 1520 | ```text |
| 1521 | $ cargo build |
| 1522 | $ ./target/debug/csvtutor |
| 1523 | City,State,Population,Latitude,Longitude |
| 1524 | Davidsons Landing,AK,,65.2419444,-165.2716667 |
| 1525 | Kenai,AK,7610,60.5544444,-151.2583333 |
| 1526 | Oakman,AL,,33.7133333,-87.3886111 |
| 1527 | ``` |
| 1528 | |
| 1529 | In this case, the `serialize` method noticed that we were writing a struct |
| 1530 | with field names. When this happens, `serialize` will automatically write a |
| 1531 | header record (only if no other records have been written) that consists of |
| 1532 | the fields in the struct in the order in which they are defined. Note that |
| 1533 | this behavior can be disabled with the |
| 1534 | [`WriterBuilder::has_headers`](../struct.WriterBuilder.html#method.has_headers) |
| 1535 | method. |
| 1536 | |
| 1537 | It's also worth pointing out the use of a *lifetime parameter* in our `Record` |
| 1538 | struct: |
| 1539 | |
| 1540 | ```ignore |
| 1541 | struct Record<'a> { |
| 1542 | city: &'a str, |
| 1543 | state: &'a str, |
| 1544 | population: Option<u64>, |
| 1545 | latitude: f64, |
| 1546 | longitude: f64, |
| 1547 | } |
| 1548 | ``` |
| 1549 | |
| 1550 | The `'a` lifetime parameter corresponds to the lifetime of the `city` and |
| 1551 | `state` string slices. This says that the `Record` struct contains *borrowed* |
| 1552 | data. We could have written our struct without borrowing any data, and |
| 1553 | therefore, without any lifetime parameters: |
| 1554 | |
| 1555 | ```ignore |
| 1556 | struct Record { |
| 1557 | city: String, |
| 1558 | state: String, |
| 1559 | population: Option<u64>, |
| 1560 | latitude: f64, |
| 1561 | longitude: f64, |
| 1562 | } |
| 1563 | ``` |
| 1564 | |
| 1565 | However, since we had to replace our borrowed `&str` types with owned `String` |
| 1566 | types, we're now forced to allocate a new `String` value for both of `city` |
| 1567 | and `state` for every record that we write. There's no intrinsic problem with |
| 1568 | doing that, but it might be a bit wasteful. |
| 1569 | |
| 1570 | For more examples and more details on the rules for serialization, please see |
| 1571 | the |
| 1572 | [`Writer::serialize`](../struct.Writer.html#method.serialize) |
| 1573 | method. |
| 1574 | |
| 1575 | # Pipelining |
| 1576 | |
| 1577 | In this section, we're going to cover a few examples that demonstrate programs |
| 1578 | that take CSV data as input, and produce possibly transformed or filtered CSV |
| 1579 | data as output. This shows how to write a complete program that efficiently |
| 1580 | reads and writes CSV data. Rust is well positioned to perform this task, since |
| 1581 | you'll get great performance with the convenience of a high level CSV library. |
| 1582 | |
| 1583 | ## Filter by search |
| 1584 | |
| 1585 | The first example of CSV pipelining we'll look at is a simple filter. It takes |
| 1586 | as input some CSV data on stdin and a single string query as its only |
| 1587 | positional argument, and it will produce as output CSV data that only contains |
| 1588 | rows with a field that matches the query. |
| 1589 | |
| 1590 | ```no_run |
| 1591 | //tutorial-pipeline-search-01.rs |
| 1592 | use std::env; |
| 1593 | use std::error::Error; |
| 1594 | use std::io; |
| 1595 | use std::process; |
| 1596 | |
| 1597 | fn run() -> Result<(), Box<dyn Error>> { |
| 1598 | // Get the query from the positional arguments. |
| 1599 | // If one doesn't exist, return an error. |
| 1600 | let query = match env::args().nth(1) { |
| 1601 | None => return Err(From::from("expected 1 argument, but got none")), |
| 1602 | Some(query) => query, |
| 1603 | }; |
| 1604 | |
| 1605 | // Build CSV readers and writers to stdin and stdout, respectively. |
| 1606 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 1607 | let mut wtr = csv::Writer::from_writer(io::stdout()); |
| 1608 | |
| 1609 | // Before reading our data records, we should write the header record. |
| 1610 | wtr.write_record(rdr.headers()?)?; |
| 1611 | |
| 1612 | // Iterate over all the records in `rdr`, and write only records containing |
| 1613 | // `query` to `wtr`. |
| 1614 | for result in rdr.records() { |
| 1615 | let record = result?; |
| 1616 | if record.iter().any(|field| field == &query) { |
| 1617 | wtr.write_record(&record)?; |
| 1618 | } |
| 1619 | } |
| 1620 | |
| 1621 | // CSV writers use an internal buffer, so we should always flush when done. |
| 1622 | wtr.flush()?; |
| 1623 | Ok(()) |
| 1624 | } |
| 1625 | |
| 1626 | fn main() { |
| 1627 | if let Err(err) = run() { |
| 1628 | println!("{}", err); |
| 1629 | process::exit(1); |
| 1630 | } |
| 1631 | } |
| 1632 | ``` |
| 1633 | |
| 1634 | If we compile and run this program with a query of `MA` on `uspop.csv`, we'll |
| 1635 | see that only one record matches: |
| 1636 | |
| 1637 | ```text |
| 1638 | $ cargo build |
| 1639 | $ ./csvtutor MA < uspop.csv |
| 1640 | City,State,Population,Latitude,Longitude |
| 1641 | Reading,MA,23441,42.5255556,-71.0958333 |
| 1642 | ``` |
| 1643 | |
| 1644 | This example doesn't actually introduce anything new. It merely combines what |
| 1645 | you've already learned about CSV readers and writers from previous sections. |
| 1646 | |
| 1647 | Let's add a twist to this example. In the real world, you're often faced with |
| 1648 | messy CSV data that might not be encoded correctly. One example you might come |
| 1649 | across is CSV data encoded in |
| 1650 | [Latin-1](https://en.wikipedia.org/wiki/ISO/IEC_8859-1). |
| 1651 | Unfortunately, for the examples we've seen so far, our CSV reader assumes that |
| 1652 | all of the data is UTF-8. Since all of the data we've worked on has been |
| 1653 | ASCII---which is a subset of both Latin-1 and UTF-8---we haven't had any |
| 1654 | problems. But let's introduce a slightly tweaked version of our `uspop.csv` |
| 1655 | file that contains an encoding of a Latin-1 character that is invalid UTF-8. |
| 1656 | You can get the data like so: |
| 1657 | |
| 1658 | ```text |
| 1659 | $ curl -LO 'https://raw.githubusercontent.com/BurntSushi/rust-csv/master/examples/data/uspop-latin1.csv' |
| 1660 | ``` |
| 1661 | |
Even though I've already given away the problem, let's see what happens when
| 1663 | we try to run our previous example on this new data: |
| 1664 | |
| 1665 | ```text |
| 1666 | $ ./csvtutor MA < uspop-latin1.csv |
| 1667 | City,State,Population,Latitude,Longitude |
| 1668 | CSV parse error: record 3 (line 4, field: 0, byte: 125): invalid utf-8: invalid UTF-8 in field 0 near byte index 0 |
| 1669 | ``` |
| 1670 | |
| 1671 | The error message tells us exactly what's wrong. Let's take a look at line 4 |
| 1672 | to see what we're dealing with: |
| 1673 | |
| 1674 | ```text |
| 1675 | $ head -n4 uspop-latin1.csv | tail -n1 |
| 1676 | Õakman,AL,,33.7133333,-87.3886111 |
| 1677 | ``` |
| 1678 | |
| 1679 | In this case, the very first character is the Latin-1 `Õ`, which is encoded as |
| 1680 | the byte `0xD5`, which is in turn invalid UTF-8. So what do we do now that our |
CSV parser has choked on our data? There are two choices. The first is to go in
| 1682 | and fix up your CSV data so that it's valid UTF-8. This is probably a good |
| 1683 | idea anyway, and tools like `iconv` can help with the task of transcoding. |
| 1684 | But if you can't or don't want to do that, then you can instead read CSV data |
| 1685 | in a way that is mostly encoding agnostic (so long as ASCII is still a valid |
| 1686 | subset). The trick is to use *byte records* instead of *string records*. |
| 1687 | |
Thus far, we haven't actually talked much about the types of records in this
library, but now is a good time to introduce them. There are two of them,
| 1690 | [`StringRecord`](../struct.StringRecord.html) |
| 1691 | and |
| 1692 | [`ByteRecord`](../struct.ByteRecord.html). |
Each of them represents a single record in CSV data, where a record is a
sequence of an arbitrary number of fields. The only difference between
`StringRecord` and `ByteRecord` is that `StringRecord` is guaranteed to be
valid UTF-8, whereas `ByteRecord` contains arbitrary bytes.
| 1697 | |
| 1698 | Armed with that knowledge, we can now begin to understand why we saw an error |
| 1699 | when we ran the last example on data that wasn't UTF-8. Namely, when we call |
| 1700 | `records`, we get back an iterator of `StringRecord`. Since `StringRecord` is |
| 1701 | guaranteed to be valid UTF-8, trying to build a `StringRecord` with invalid |
| 1702 | UTF-8 will result in the error that we see. |
| 1703 | |
| 1704 | All we need to do to make our example work is to switch from a `StringRecord` |
| 1705 | to a `ByteRecord`. This means using `byte_records` to create our iterator |
| 1706 | instead of `records`, and similarly using `byte_headers` instead of `headers` |
| 1707 | if we think our header data might contain invalid UTF-8 as well. Here's the |
| 1708 | change: |
| 1709 | |
| 1710 | ```no_run |
| 1711 | //tutorial-pipeline-search-02.rs |
| 1712 | # use std::env; |
| 1713 | # use std::error::Error; |
| 1714 | # use std::io; |
| 1715 | # use std::process; |
| 1716 | # |
| 1717 | fn run() -> Result<(), Box<dyn Error>> { |
| 1718 | let query = match env::args().nth(1) { |
| 1719 | None => return Err(From::from("expected 1 argument, but got none")), |
| 1720 | Some(query) => query, |
| 1721 | }; |
| 1722 | |
| 1723 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 1724 | let mut wtr = csv::Writer::from_writer(io::stdout()); |
| 1725 | |
| 1726 | wtr.write_record(rdr.byte_headers()?)?; |
| 1727 | |
| 1728 | for result in rdr.byte_records() { |
| 1729 | let record = result?; |
| 1730 | // `query` is a `String` while `field` is now a `&[u8]`, so we'll |
| 1731 | // need to convert `query` to `&[u8]` before doing a comparison. |
| 1732 | if record.iter().any(|field| field == query.as_bytes()) { |
| 1733 | wtr.write_record(&record)?; |
| 1734 | } |
| 1735 | } |
| 1736 | |
| 1737 | wtr.flush()?; |
| 1738 | Ok(()) |
| 1739 | } |
| 1740 | # |
| 1741 | # fn main() { |
| 1742 | # if let Err(err) = run() { |
| 1743 | # println!("{}", err); |
| 1744 | # process::exit(1); |
| 1745 | # } |
| 1746 | # } |
| 1747 | ``` |
| 1748 | |
| 1749 | Compiling and running this now yields the same results as our first example, |
| 1750 | but this time it works on data that isn't valid UTF-8. |
| 1751 | |
| 1752 | ```text |
| 1753 | $ cargo build |
| 1754 | $ ./csvtutor MA < uspop-latin1.csv |
| 1755 | City,State,Population,Latitude,Longitude |
| 1756 | Reading,MA,23441,42.5255556,-71.0958333 |
| 1757 | ``` |
| 1758 | |
| 1759 | ## Filter by population count |
| 1760 | |
| 1761 | In this section, we will show another example program that both reads and |
| 1762 | writes CSV data, but instead of dealing with arbitrary records, we will use |
| 1763 | Serde to deserialize and serialize records with specific types. |
| 1764 | |
| 1765 | For this program, we'd like to be able to filter records in our population data |
| 1766 | by population count. Specifically, we'd like to see which records meet a |
| 1767 | certain population threshold. In addition to using a simple inequality, we must |
| 1768 | also account for records that have a missing population count. This is where |
| 1769 | types like `Option<T>` come in handy, because the compiler will force us to |
| 1770 | consider the case when the population count is missing. |
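That threshold check can be sketched in isolation. The `meets_threshold` helper below is just for illustration (the program that follows inlines the same logic with `map_or`):

```no_run
fn meets_threshold(population: Option<u64>, minimum: u64) -> bool {
    // `map_or` supplies a default (`false`) for the `None` case, so a
    // record with no population count is always filtered out.
    population.map_or(false, |pop| pop >= minimum)
}

fn main() {
    assert!(meets_threshold(Some(169160), 100000));
    assert!(!meets_threshold(Some(7610), 100000));
    // A missing count never passes the threshold.
    assert!(!meets_threshold(None, 100000));
}
```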
| 1771 | |
| 1772 | Since we're using Serde in this example, don't forget to add the Serde |
dependency to the `[dependencies]` section of your `Cargo.toml` if it isn't
already there:
| 1775 | |
| 1776 | ```text |
| 1777 | serde = { version = "1", features = ["derive"] } |
| 1778 | ``` |
| 1779 | |
| 1780 | Now here's the code: |
| 1781 | |
| 1782 | ```no_run |
| 1783 | //tutorial-pipeline-pop-01.rs |
| 1784 | use std::env; |
| 1785 | use std::error::Error; |
| 1786 | use std::io; |
| 1787 | use std::process; |
| 1788 | |
| 1789 | use serde::{Deserialize, Serialize}; |
| 1790 | |
| 1791 | // Unlike previous examples, we derive both Deserialize and Serialize. This |
| 1792 | // means we'll be able to automatically deserialize and serialize this type. |
| 1793 | #[derive(Debug, Deserialize, Serialize)] |
| 1794 | #[serde(rename_all = "PascalCase")] |
| 1795 | struct Record { |
| 1796 | city: String, |
| 1797 | state: String, |
| 1798 | population: Option<u64>, |
| 1799 | latitude: f64, |
| 1800 | longitude: f64, |
| 1801 | } |
| 1802 | |
| 1803 | fn run() -> Result<(), Box<dyn Error>> { |
| 1804 | // Get the query from the positional arguments. |
| 1805 | // If one doesn't exist or isn't an integer, return an error. |
| 1806 | let minimum_pop: u64 = match env::args().nth(1) { |
| 1807 | None => return Err(From::from("expected 1 argument, but got none")), |
| 1808 | Some(arg) => arg.parse()?, |
| 1809 | }; |
| 1810 | |
| 1811 | // Build CSV readers and writers to stdin and stdout, respectively. |
| 1812 | // Note that we don't need to write headers explicitly. Since we're |
| 1813 | // serializing a custom struct, that's done for us automatically. |
| 1814 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 1815 | let mut wtr = csv::Writer::from_writer(io::stdout()); |
| 1816 | |
| 1817 | // Iterate over all the records in `rdr`, and write only records containing |
| 1818 | // a population that is greater than or equal to `minimum_pop`. |
| 1819 | for result in rdr.deserialize() { |
| 1820 | // Remember that when deserializing, we must use a type hint to |
| 1821 | // indicate which type we want to deserialize our record into. |
| 1822 | let record: Record = result?; |
| 1823 | |
        // `map_or` is a combinator on `Option`. It takes two parameters:
        // a value to use when the `Option` is `None` (i.e., the record has
        // no population count) and a closure that returns another value of
        // the same type when the `Option` is `Some`. In this case, we test
        // the population against the minimum count that we got from the
        // command line.
| 1830 | if record.population.map_or(false, |pop| pop >= minimum_pop) { |
| 1831 | wtr.serialize(record)?; |
| 1832 | } |
| 1833 | } |
| 1834 | |
| 1835 | // CSV writers use an internal buffer, so we should always flush when done. |
| 1836 | wtr.flush()?; |
| 1837 | Ok(()) |
| 1838 | } |
| 1839 | |
| 1840 | fn main() { |
| 1841 | if let Err(err) = run() { |
| 1842 | println!("{}", err); |
| 1843 | process::exit(1); |
| 1844 | } |
| 1845 | } |
| 1846 | ``` |
| 1847 | |
| 1848 | If we compile and run our program with a minimum threshold of `100000`, we |
| 1849 | should see three matching records. Notice that the headers were added even |
| 1850 | though we never explicitly wrote them! |
| 1851 | |
| 1852 | ```text |
| 1853 | $ cargo build |
| 1854 | $ ./target/debug/csvtutor 100000 < uspop.csv |
| 1855 | City,State,Population,Latitude,Longitude |
| 1856 | Fontana,CA,169160,34.0922222,-117.4341667 |
| 1857 | Bridgeport,CT,139090,41.1669444,-73.2052778 |
| 1858 | Indianapolis,IN,773283,39.7683333,-86.1580556 |
| 1859 | ``` |
| 1860 | |
| 1861 | # Performance |
| 1862 | |
| 1863 | In this section, we'll go over how to squeeze the most juice out of our CSV |
| 1864 | reader. As it happens, most of the APIs we've seen so far were designed with |
| 1865 | high level convenience in mind, and that often comes with some costs. For the |
| 1866 | most part, those costs revolve around unnecessary allocations. Therefore, most |
| 1867 | of the section will show how to do CSV parsing with as little allocation as |
| 1868 | possible. |
| 1869 | |
| 1870 | There are two critical preliminaries we must cover. |
| 1871 | |
| 1872 | Firstly, when you care about performance, you should compile your code |
| 1873 | with `cargo build --release` instead of `cargo build`. The `--release` |
| 1874 | flag instructs the compiler to spend more time optimizing your code. When |
| 1875 | compiling with the `--release` flag, you'll find your compiled program at |
| 1876 | `target/release/csvtutor` instead of `target/debug/csvtutor`. Throughout this |
| 1877 | tutorial, we've used `cargo build` because our dataset was small and we weren't |
| 1878 | focused on speed. The downside of `cargo build --release` is that it will take |
| 1879 | longer than `cargo build`. |
| 1880 | |
| 1881 | Secondly, the dataset we've used throughout this tutorial only has 100 records. |
| 1882 | We'd have to try really hard to cause our program to run slowly on 100 records, |
| 1883 | even when we compile without the `--release` flag. Therefore, in order to |
| 1884 | actually witness a performance difference, we need a bigger dataset. To get |
| 1885 | such a dataset, we'll use the original source of `uspop.csv`. **Warning: the |
| 1886 | download is 41MB compressed and decompresses to 145MB.** |
| 1887 | |
| 1888 | ```text |
| 1889 | $ curl -LO http://burntsushi.net/stuff/worldcitiespop.csv.gz |
| 1890 | $ gunzip worldcitiespop.csv.gz |
| 1891 | $ wc worldcitiespop.csv |
| 1892 | 3173959 5681543 151492068 worldcitiespop.csv |
| 1893 | $ md5sum worldcitiespop.csv |
| 1894 | 6198bd180b6d6586626ecbf044c1cca5 worldcitiespop.csv |
| 1895 | ``` |
| 1896 | |
| 1897 | Finally, it's worth pointing out that this section is not attempting to |
| 1898 | present a rigorous set of benchmarks. We will stay away from rigorous analysis |
| 1899 | and instead rely a bit more on wall clock times and intuition. |
| 1900 | |
| 1901 | ## Amortizing allocations |
| 1902 | |
| 1903 | In order to measure performance, we must be careful about what it is we're |
| 1904 | measuring. We must also be careful to not change the thing we're measuring as |
| 1905 | we make improvements to the code. For this reason, we will focus on measuring |
| 1906 | how long it takes to count the number of records corresponding to city |
| 1907 | population counts in Massachusetts. This represents a very small amount of work |
| 1908 | that requires us to visit every record, and therefore represents a decent way |
| 1909 | to measure how long it takes to do CSV parsing. |
| 1910 | |
| 1911 | Before diving into our first optimization, let's start with a baseline by |
| 1912 | adapting a previous example to count the number of records in |
| 1913 | `worldcitiespop.csv`: |
| 1914 | |
| 1915 | ```no_run |
| 1916 | //tutorial-perf-alloc-01.rs |
| 1917 | use std::error::Error; |
| 1918 | use std::io; |
| 1919 | use std::process; |
| 1920 | |
| 1921 | fn run() -> Result<u64, Box<dyn Error>> { |
| 1922 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 1923 | |
| 1924 | let mut count = 0; |
| 1925 | for result in rdr.records() { |
| 1926 | let record = result?; |
| 1927 | if &record[0] == "us" && &record[3] == "MA" { |
| 1928 | count += 1; |
| 1929 | } |
| 1930 | } |
| 1931 | Ok(count) |
| 1932 | } |
| 1933 | |
| 1934 | fn main() { |
| 1935 | match run() { |
| 1936 | Ok(count) => { |
| 1937 | println!("{}", count); |
| 1938 | } |
| 1939 | Err(err) => { |
| 1940 | println!("{}", err); |
| 1941 | process::exit(1); |
| 1942 | } |
| 1943 | } |
| 1944 | } |
| 1945 | ``` |
| 1946 | |
| 1947 | Now let's compile and run it and see what kind of timing we get. Don't forget |
| 1948 | to compile with the `--release` flag. (For grins, try compiling without the |
| 1949 | `--release` flag and see how long it takes to run the program!) |
| 1950 | |
| 1951 | ```text |
| 1952 | $ cargo build --release |
| 1953 | $ time ./target/release/csvtutor < worldcitiespop.csv |
| 1954 | 2176 |
| 1955 | |
| 1956 | real 0m0.645s |
| 1957 | user 0m0.627s |
| 1958 | sys 0m0.017s |
| 1959 | ``` |
| 1960 | |
| 1961 | All right, so what's the first thing we can do to make this faster? This |
| 1962 | section promised to speed things up by amortizing allocation, but we can do |
| 1963 | something even simpler first: iterate over |
| 1964 | [`ByteRecord`](../struct.ByteRecord.html)s |
| 1965 | instead of |
| 1966 | [`StringRecord`](../struct.StringRecord.html)s. |
| 1967 | If you recall from a previous section, a `StringRecord` is guaranteed to be |
valid UTF-8, and therefore must validate that its contents are actually UTF-8.
| 1969 | (If validation fails, then the CSV reader will return an error.) If we remove |
| 1970 | that validation from our program, then we can realize a nice speed boost as |
| 1971 | shown in the next example: |
| 1972 | |
| 1973 | ```no_run |
| 1974 | //tutorial-perf-alloc-02.rs |
| 1975 | # use std::error::Error; |
| 1976 | # use std::io; |
| 1977 | # use std::process; |
| 1978 | # |
| 1979 | fn run() -> Result<u64, Box<dyn Error>> { |
| 1980 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 1981 | |
| 1982 | let mut count = 0; |
| 1983 | for result in rdr.byte_records() { |
| 1984 | let record = result?; |
| 1985 | if &record[0] == b"us" && &record[3] == b"MA" { |
| 1986 | count += 1; |
| 1987 | } |
| 1988 | } |
| 1989 | Ok(count) |
| 1990 | } |
| 1991 | # |
| 1992 | # fn main() { |
| 1993 | # match run() { |
| 1994 | # Ok(count) => { |
| 1995 | # println!("{}", count); |
| 1996 | # } |
| 1997 | # Err(err) => { |
| 1998 | # println!("{}", err); |
| 1999 | # process::exit(1); |
| 2000 | # } |
| 2001 | # } |
| 2002 | # } |
| 2003 | ``` |
| 2004 | |
| 2005 | And now compile and run: |
| 2006 | |
| 2007 | ```text |
| 2008 | $ cargo build --release |
| 2009 | $ time ./target/release/csvtutor < worldcitiespop.csv |
| 2010 | 2176 |
| 2011 | |
| 2012 | real 0m0.429s |
| 2013 | user 0m0.403s |
| 2014 | sys 0m0.023s |
| 2015 | ``` |
| 2016 | |
| 2017 | Our program is now approximately 30% faster, all because we removed UTF-8 |
| 2018 | validation. But was it actually okay to remove UTF-8 validation? What have we |
| 2019 | lost? In this case, it is perfectly acceptable to drop UTF-8 validation and use |
| 2020 | `ByteRecord` instead because all we're doing with the data in the record is |
| 2021 | comparing two of its fields to raw bytes: |
| 2022 | |
| 2023 | ```ignore |
| 2024 | if &record[0] == b"us" && &record[3] == b"MA" { |
| 2025 | count += 1; |
| 2026 | } |
| 2027 | ``` |
| 2028 | |
| 2029 | In particular, it doesn't matter whether `record` is valid UTF-8 or not, since |
| 2030 | we're checking for equality on the raw bytes themselves. |
| 2031 | |
| 2032 | UTF-8 validation via `StringRecord` is useful because it provides access to |
fields as `&str` types, whereas `ByteRecord` provides fields as `&[u8]` types.
| 2034 | `&str` is the type of a borrowed string in Rust, which provides convenient |
| 2035 | access to string APIs like substring search. Strings are also frequently used |
| 2036 | in other areas, so they tend to be a useful thing to have. Therefore, sticking |
| 2037 | with `StringRecord` is a good default, but if you need the extra speed and can |
| 2038 | deal with arbitrary bytes, then switching to `ByteRecord` might be a good idea. |
| 2039 | |
| 2040 | Moving on, let's try to get another speed boost by amortizing allocation. |
| 2041 | Amortizing allocation is the technique that creates an allocation once (or |
| 2042 | very rarely), and then attempts to reuse it instead of creating additional |
| 2043 | allocations. In the case of the previous examples, we used iterators created |
| 2044 | by the `records` and `byte_records` methods on a CSV reader. These iterators |
allocate a new record for every item they yield, which in turn corresponds
to a new allocation. They do this because iterators cannot yield items that
borrow from the iterator itself, and because creating new allocations tends to
be a lot more convenient.
| 2049 | |
| 2050 | If we're willing to forgo use of iterators, then we can amortize allocations |
| 2051 | by creating a *single* `ByteRecord` and asking the CSV reader to read into it. |
| 2052 | We do this by using the |
| 2053 | [`Reader::read_byte_record`](../struct.Reader.html#method.read_byte_record) |
| 2054 | method. |
| 2055 | |
| 2056 | ```no_run |
| 2057 | //tutorial-perf-alloc-03.rs |
| 2058 | # use std::error::Error; |
| 2059 | # use std::io; |
| 2060 | # use std::process; |
| 2061 | # |
| 2062 | fn run() -> Result<u64, Box<dyn Error>> { |
| 2063 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 2064 | let mut record = csv::ByteRecord::new(); |
| 2065 | |
| 2066 | let mut count = 0; |
| 2067 | while rdr.read_byte_record(&mut record)? { |
| 2068 | if &record[0] == b"us" && &record[3] == b"MA" { |
| 2069 | count += 1; |
| 2070 | } |
| 2071 | } |
| 2072 | Ok(count) |
| 2073 | } |
| 2074 | # |
| 2075 | # fn main() { |
| 2076 | # match run() { |
| 2077 | # Ok(count) => { |
| 2078 | # println!("{}", count); |
| 2079 | # } |
| 2080 | # Err(err) => { |
| 2081 | # println!("{}", err); |
| 2082 | # process::exit(1); |
| 2083 | # } |
| 2084 | # } |
| 2085 | # } |
| 2086 | ``` |
| 2087 | |
| 2088 | Compile and run: |
| 2089 | |
| 2090 | ```text |
| 2091 | $ cargo build --release |
| 2092 | $ time ./target/release/csvtutor < worldcitiespop.csv |
| 2093 | 2176 |
| 2094 | |
| 2095 | real 0m0.308s |
| 2096 | user 0m0.283s |
| 2097 | sys 0m0.023s |
| 2098 | ``` |
| 2099 | |
| 2100 | Woohoo! This represents *another* 30% boost over the previous example, which is |
| 2101 | a 50% boost over the first example. |
| 2102 | |
| 2103 | Let's dissect this code by taking a look at the type signature of the |
| 2104 | `read_byte_record` method: |
| 2105 | |
| 2106 | ```ignore |
| 2107 | fn read_byte_record(&mut self, record: &mut ByteRecord) -> csv::Result<bool>; |
| 2108 | ``` |
| 2109 | |
| 2110 | This method takes as input a CSV reader (the `self` parameter) and a *mutable |
| 2111 | borrow* of a `ByteRecord`, and returns a `csv::Result<bool>`. (The |
| 2112 | `csv::Result<bool>` is equivalent to `Result<bool, csv::Error>`.) The return |
| 2113 | value is `true` if and only if a record was read. When it's `false`, that means |
| 2114 | the reader has exhausted its input. This method works by copying the contents |
| 2115 | of the next record into the provided `ByteRecord`. Since the same `ByteRecord` |
| 2116 | is used to read every record, it will already have space allocated for data. |
| 2117 | When `read_byte_record` runs, it will overwrite the contents that were there |
| 2118 | with the new record, which means that it can reuse the space that was |
| 2119 | allocated. Thus, we have *amortized allocation*. |
| 2120 | |
| 2121 | An exercise you might consider doing is to use a `StringRecord` instead of a |
| 2122 | `ByteRecord`, and therefore |
| 2123 | [`Reader::read_record`](../struct.Reader.html#method.read_record) |
| 2124 | instead of `read_byte_record`. This will give you easy access to Rust strings |
| 2125 | at the cost of UTF-8 validation but *without* the cost of allocating a new |
| 2126 | `StringRecord` for every record. |
| 2127 | |
| 2128 | ## Serde and zero allocation |
| 2129 | |
| 2130 | In this section, we are going to briefly examine how we use Serde and what we |
| 2131 | can do to speed it up. The key optimization we'll want to make is to---you |
| 2132 | guessed it---amortize allocation. |
| 2133 | |
| 2134 | As with the previous section, let's start with a simple baseline based off an |
| 2135 | example using Serde in a previous section: |
| 2136 | |
| 2137 | ```no_run |
| 2138 | //tutorial-perf-serde-01.rs |
| 2139 | use std::error::Error; |
| 2140 | use std::io; |
| 2141 | use std::process; |
| 2142 | |
| 2143 | use serde::Deserialize; |
| 2144 | |
| 2145 | #[derive(Debug, Deserialize)] |
| 2146 | #[serde(rename_all = "PascalCase")] |
| 2147 | struct Record { |
| 2148 | country: String, |
| 2149 | city: String, |
| 2150 | accent_city: String, |
| 2151 | region: String, |
| 2152 | population: Option<u64>, |
| 2153 | latitude: f64, |
| 2154 | longitude: f64, |
| 2155 | } |
| 2156 | |
| 2157 | fn run() -> Result<u64, Box<dyn Error>> { |
| 2158 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 2159 | |
| 2160 | let mut count = 0; |
| 2161 | for result in rdr.deserialize() { |
| 2162 | let record: Record = result?; |
| 2163 | if record.country == "us" && record.region == "MA" { |
| 2164 | count += 1; |
| 2165 | } |
| 2166 | } |
| 2167 | Ok(count) |
| 2168 | } |
| 2169 | |
| 2170 | fn main() { |
| 2171 | match run() { |
| 2172 | Ok(count) => { |
| 2173 | println!("{}", count); |
| 2174 | } |
| 2175 | Err(err) => { |
| 2176 | println!("{}", err); |
| 2177 | process::exit(1); |
| 2178 | } |
| 2179 | } |
| 2180 | } |
| 2181 | ``` |
| 2182 | |
| 2183 | Now compile and run this program: |
| 2184 | |
| 2185 | ```text |
| 2186 | $ cargo build --release |
$ time ./target/release/csvtutor < worldcitiespop.csv
| 2188 | 2176 |
| 2189 | |
| 2190 | real 0m1.381s |
| 2191 | user 0m1.367s |
| 2192 | sys 0m0.013s |
| 2193 | ``` |
| 2194 | |
| 2195 | The first thing you might notice is that this is quite a bit slower than our |
| 2196 | programs in the previous section. This is because deserializing each record |
| 2197 | has a certain amount of overhead to it. In particular, some of the fields need |
| 2198 | to be parsed as integers or floating point numbers, which isn't free. However, |
| 2199 | there is hope yet, because we can speed up this program! |
| 2200 | |
| 2201 | Our first attempt to speed up the program will be to amortize allocation. Doing |
| 2202 | this with Serde is a bit trickier than before, because we need to change our |
| 2203 | `Record` type and use the manual deserialization API. Let's see what that looks |
| 2204 | like: |
| 2205 | |
| 2206 | ```no_run |
| 2207 | //tutorial-perf-serde-02.rs |
| 2208 | # use std::error::Error; |
| 2209 | # use std::io; |
| 2210 | # use std::process; |
| 2211 | # |
| 2212 | # use serde::Deserialize; |
| 2213 | # |
| 2214 | #[derive(Debug, Deserialize)] |
| 2215 | #[serde(rename_all = "PascalCase")] |
| 2216 | struct Record<'a> { |
| 2217 | country: &'a str, |
| 2218 | city: &'a str, |
| 2219 | accent_city: &'a str, |
| 2220 | region: &'a str, |
| 2221 | population: Option<u64>, |
| 2222 | latitude: f64, |
| 2223 | longitude: f64, |
| 2224 | } |
| 2225 | |
| 2226 | fn run() -> Result<u64, Box<dyn Error>> { |
| 2227 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 2228 | let mut raw_record = csv::StringRecord::new(); |
| 2229 | let headers = rdr.headers()?.clone(); |
| 2230 | |
| 2231 | let mut count = 0; |
| 2232 | while rdr.read_record(&mut raw_record)? { |
| 2233 | let record: Record = raw_record.deserialize(Some(&headers))?; |
| 2234 | if record.country == "us" && record.region == "MA" { |
| 2235 | count += 1; |
| 2236 | } |
| 2237 | } |
| 2238 | Ok(count) |
| 2239 | } |
| 2240 | # |
| 2241 | # fn main() { |
| 2242 | # match run() { |
| 2243 | # Ok(count) => { |
| 2244 | # println!("{}", count); |
| 2245 | # } |
| 2246 | # Err(err) => { |
| 2247 | # println!("{}", err); |
| 2248 | # process::exit(1); |
| 2249 | # } |
| 2250 | # } |
| 2251 | # } |
| 2252 | ``` |
| 2253 | |
| 2254 | Compile and run: |
| 2255 | |
| 2256 | ```text |
| 2257 | $ cargo build --release |
| 2258 | $ time ./target/release/csvtutor < worldcitiespop.csv |
| 2259 | 2176 |
| 2260 | |
| 2261 | real 0m1.055s |
| 2262 | user 0m1.040s |
| 2263 | sys 0m0.013s |
| 2264 | ``` |
| 2265 | |
| 2266 | This corresponds to a roughly 24% reduction in runtime. To achieve this, we |
| 2267 | had to make two important changes. |
| 2268 | |
| 2269 | The first was to make our `Record` type contain `&str` fields instead of |
| 2270 | `String` fields. If you recall from a previous section, `&str` is a *borrowed* |
| 2271 | string while a `String` is an *owned* string. A borrowed string points to |
| 2272 | an already existing allocation, whereas a `String` always implies a new |
| 2273 | allocation. In this case, our `&str` is borrowing from the CSV record itself. |
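| 2273 |  |
| 2273 | The distinction can be seen in miniature with plain standard-library strings |
| 2273 | (nothing here is specific to the csv crate): |
| 2273 |  |
| 2273 | ```rust |
| 2273 | fn main() { |
| 2273 |     let owned: String = String::from("Boston"); // owned: allocates on the heap |
| 2273 |     let borrowed: &str = &owned[..3];           // borrowed: points into `owned` |
| 2273 |     assert_eq!(borrowed, "Bos"); |
| 2273 |     // Turning a borrowed string back into an owned one allocates again: |
| 2273 |     let copied: String = borrowed.to_string(); |
| 2273 |     assert_eq!(copied, "Bos"); |
| 2273 |     println!("{} {}", owned, copied); |
| 2273 | } |
| 2273 | ``` |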
| 2274 | |
| 2275 | The second change we had to make was to stop using the |
| 2276 | [`Reader::deserialize`](../struct.Reader.html#method.deserialize) |
| 2277 | iterator, and instead read each record into a `StringRecord` explicitly |
| 2278 | and then use the |
| 2279 | [`StringRecord::deserialize`](../struct.StringRecord.html#method.deserialize) |
| 2280 | method to deserialize a single record. |
| 2281 | |
| 2282 | The second change is a bit tricky, because in order for it to work, our |
| 2283 | `Record` type needs to borrow from the data inside the `StringRecord`. That |
| 2284 | means that our `Record` value cannot outlive the `StringRecord` that it was |
| 2285 | created from. Since we overwrite the same `StringRecord` on each iteration |
| 2286 | (in order to amortize allocation), that means our `Record` value must evaporate |
| 2287 | before the next iteration of the loop. Indeed, the compiler will enforce this! |
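| 2287 |  |
| 2287 | The shape of that constraint can be sketched with only the standard library: |
| 2287 | a value borrowed from a reused buffer must be dropped before the buffer is |
| 2287 | modified again, just as each `Record` must be dropped before the next call to |
| 2287 | `read_record` (the field values below are invented for illustration): |
| 2287 |  |
| 2287 | ```rust |
| 2287 | fn main() { |
| 2287 |     let mut buf = String::new(); // reused each iteration, like `raw_record` |
| 2287 |     let mut count = 0; |
| 2287 |     for line in ["us,MA", "fr,A8", "us,MA"] { |
| 2287 |         buf.clear(); |
| 2287 |         buf.push_str(line); |
| 2287 |         // `fields` borrows from `buf`, just as `Record<'a>` borrows from |
| 2287 |         // the `StringRecord`. It must go away before the next `buf.clear()`. |
| 2287 |         let fields: Vec<&str> = buf.split(',').collect(); |
| 2287 |         if fields == ["us", "MA"] { |
| 2287 |             count += 1; |
| 2287 |         } |
| 2287 |         // Storing `fields` in a Vec that outlives the loop would be a |
| 2287 |         // compile error: the borrow cannot outlive the buffer it points into. |
| 2287 |     } |
| 2287 |     println!("{}", count); |
| 2287 | } |
| 2287 | ``` |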
| 2288 | |
| 2289 | There is one more optimization we can make: remove UTF-8 validation. In |
| 2290 | general, this means using `&[u8]` instead of `&str` and `ByteRecord` instead |
| 2291 | of `StringRecord`: |
| 2292 | |
| 2293 | ```no_run |
| 2294 | //tutorial-perf-serde-03.rs |
| 2295 | # use std::error::Error; |
| 2296 | # use std::io; |
| 2297 | # use std::process; |
| 2298 | # |
| 2299 | # use serde::Deserialize; |
| 2300 | # |
| 2301 | #[derive(Debug, Deserialize)] |
| 2302 | #[serde(rename_all = "PascalCase")] |
| 2303 | struct Record<'a> { |
| 2304 | country: &'a [u8], |
| 2305 | city: &'a [u8], |
| 2306 | accent_city: &'a [u8], |
| 2307 | region: &'a [u8], |
| 2308 | population: Option<u64>, |
| 2309 | latitude: f64, |
| 2310 | longitude: f64, |
| 2311 | } |
| 2312 | |
| 2313 | fn run() -> Result<u64, Box<dyn Error>> { |
| 2314 | let mut rdr = csv::Reader::from_reader(io::stdin()); |
| 2315 | let mut raw_record = csv::ByteRecord::new(); |
| 2316 | let headers = rdr.byte_headers()?.clone(); |
| 2317 | |
| 2318 | let mut count = 0; |
| 2319 | while rdr.read_byte_record(&mut raw_record)? { |
| 2320 | let record: Record = raw_record.deserialize(Some(&headers))?; |
| 2321 | if record.country == b"us" && record.region == b"MA" { |
| 2322 | count += 1; |
| 2323 | } |
| 2324 | } |
| 2325 | Ok(count) |
| 2326 | } |
| 2327 | # |
| 2328 | # fn main() { |
| 2329 | # match run() { |
| 2330 | # Ok(count) => { |
| 2331 | # println!("{}", count); |
| 2332 | # } |
| 2333 | # Err(err) => { |
| 2334 | # println!("{}", err); |
| 2335 | # process::exit(1); |
| 2336 | # } |
| 2337 | # } |
| 2338 | # } |
| 2339 | ``` |
| 2340 | |
| 2341 | Compile and run: |
| 2342 | |
| 2343 | ```text |
| 2344 | $ cargo build --release |
| 2345 | $ time ./target/release/csvtutor < worldcitiespop.csv |
| 2346 | 2176 |
| 2347 | |
| 2348 | real 0m0.873s |
| 2349 | user 0m0.850s |
| 2350 | sys 0m0.023s |
| 2351 | ``` |
| 2352 | |
| 2353 | This corresponds to a further 17% reduction in runtime over the previous |
| 2354 | example, and a 37% reduction over the first example. |
| 2355 | |
| 2356 | In sum, Serde parsing is still quite fast, but will generally not be the |
| 2357 | fastest way to parse CSV since it necessarily needs to do more work. |
| 2358 | |
| 2359 | ## CSV parsing without the standard library |
| 2360 | |
| 2361 | In this section, we will explore a niche use case: parsing CSV without the |
| 2362 | standard library. While the `csv` crate itself requires the standard library, |
| 2363 | the underlying parser is actually part of the |
| 2364 | [`csv-core`](https://docs.rs/csv-core) |
| 2365 | crate, which does not depend on the standard library. The downside of not |
| 2366 | depending on the standard library is that CSV parsing becomes a lot more |
| 2367 | inconvenient. |
| 2368 | |
| 2369 | The `csv-core` crate is structured similarly to the `csv` crate. There is a |
| 2370 | [`Reader`](../../csv_core/struct.Reader.html) |
| 2371 | and a |
| 2372 | [`Writer`](../../csv_core/struct.Writer.html), |
| 2373 | as well as corresponding builders |
| 2374 | [`ReaderBuilder`](../../csv_core/struct.ReaderBuilder.html) |
| 2375 | and |
| 2376 | [`WriterBuilder`](../../csv_core/struct.WriterBuilder.html). |
| 2377 | The `csv-core` crate has no record types or iterators. Instead, CSV data |
| 2378 | can either be read one field at a time or one record at a time. In this |
| 2379 | section, we'll focus on reading a field at a time since it is simpler, but it |
| 2380 | is generally faster to read a record at a time since it does more work per |
| 2381 | function call. |
| 2382 | |
| 2383 | In keeping with this section on performance, let's write a program using only |
| 2384 | `csv-core` that counts the number of records in the state of Massachusetts. |
| 2385 | |
| 2386 | (Note that we unfortunately use the standard library in this example even |
| 2387 | though `csv-core` doesn't technically require it. We do this for convenient |
| 2388 | access to I/O, which would be harder without the standard library.) |
| 2389 | |
| 2390 | ```no_run |
| 2391 | //tutorial-perf-core-01.rs |
| 2392 | use std::io::{self, Read}; |
| 2393 | use std::process; |
| 2394 | |
| 2395 | use csv_core::{Reader, ReadFieldResult}; |
| 2396 | |
| 2397 | fn run(mut data: &[u8]) -> Option<u64> { |
| 2398 | let mut rdr = Reader::new(); |
| 2399 | |
| 2400 | // Count the number of records in Massachusetts. |
| 2401 | let mut count = 0; |
| 2402 | // Indicates the current field index. Reset to 0 at start of each record. |
| 2403 | let mut fieldidx = 0; |
| 2404 | // True when the current record is in the United States. |
| 2405 | let mut inus = false; |
| 2406 | // Buffer for field data. Must be big enough to hold the largest field. |
| 2407 | let mut field = [0; 1024]; |
| 2408 | loop { |
| 2409 | // Attempt to incrementally read the next CSV field. |
| 2410 | let (result, nread, nwrite) = rdr.read_field(data, &mut field); |
| 2411 | // nread is the number of bytes read from our input. We should never |
| 2412 | // pass those bytes to read_field again. |
| 2413 | data = &data[nread..]; |
| 2414 | // nwrite is the number of bytes written to the output buffer `field`. |
| 2415 |         // The contents of the buffer after this point are unspecified. |
| 2416 | let field = &field[..nwrite]; |
| 2417 | |
| 2418 | match result { |
| 2419 | // We don't need to handle this case because we read all of the |
| 2420 | // data up front. If we were reading data incrementally, then this |
| 2421 | // would be a signal to read more. |
| 2422 | ReadFieldResult::InputEmpty => {} |
| 2423 | // If we get this case, then we found a field that contains more |
| 2424 | // than 1024 bytes. We keep this example simple and just fail. |
| 2425 | ReadFieldResult::OutputFull => { |
| 2426 | return None; |
| 2427 | } |
| 2428 | // This case happens when we've successfully read a field. If the |
| 2429 | // field is the last field in a record, then `record_end` is true. |
| 2430 | ReadFieldResult::Field { record_end } => { |
| 2431 | if fieldidx == 0 && field == b"us" { |
| 2432 | inus = true; |
| 2433 | } else if inus && fieldidx == 3 && field == b"MA" { |
| 2434 | count += 1; |
| 2435 | } |
| 2436 | if record_end { |
| 2437 | fieldidx = 0; |
| 2438 | inus = false; |
| 2439 | } else { |
| 2440 | fieldidx += 1; |
| 2441 | } |
| 2442 | } |
| 2443 | // This case happens when the CSV reader has successfully exhausted |
| 2444 | // all input. |
| 2445 | ReadFieldResult::End => { |
| 2446 | break; |
| 2447 | } |
| 2448 | } |
| 2449 | } |
| 2450 | Some(count) |
| 2451 | } |
| 2452 | |
| 2453 | fn main() { |
| 2454 | // Read the entire contents of stdin up front. |
| 2455 | let mut data = vec![]; |
| 2456 | if let Err(err) = io::stdin().read_to_end(&mut data) { |
| 2457 | println!("{}", err); |
| 2458 | process::exit(1); |
| 2459 | } |
| 2460 | match run(&data) { |
| 2461 | None => { |
| 2462 | println!("error: could not count records, buffer too small"); |
| 2463 | process::exit(1); |
| 2464 | } |
| 2465 | Some(count) => { |
| 2466 | println!("{}", count); |
| 2467 | } |
| 2468 | } |
| 2469 | } |
| 2470 | ``` |
| 2471 | |
| 2472 | And compile and run it: |
| 2473 | |
| 2474 | ```text |
| 2475 | $ cargo build --release |
| 2476 | $ time ./target/release/csvtutor < worldcitiespop.csv |
| 2477 | 2176 |
| 2478 | |
| 2479 | real 0m0.572s |
| 2480 | user 0m0.513s |
| 2481 | sys 0m0.057s |
| 2482 | ``` |
| 2483 | |
| 2484 | This isn't as fast as some of our previous examples where we used the `csv` |
| 2485 | crate to read into a `StringRecord` or a `ByteRecord`. This is mostly because |
| 2486 | this example reads a field at a time, which incurs more overhead than reading a |
| 2487 | record at a time. To fix this, you would want to use the |
| 2488 | [`Reader::read_record`](../../csv_core/struct.Reader.html#method.read_record) |
| 2489 | method instead, which is defined on `csv_core::Reader`. |
| 2490 | |
| 2491 | The other thing to notice here is that the example is considerably longer than |
| 2492 | the other examples. This is because we need to do more bookkeeping to keep |
| 2493 | track of which field we're reading and how much data we've already fed to the |
| 2494 | reader. There are basically two reasons to use the `csv_core` crate: |
| 2495 | |
| 2496 | 1. If you're in an environment where the standard library is not usable. |
| 2497 | 2. If you want to build your own CSV-like library, you can build it on top |
| 2498 |    of `csv-core`. |
| 2499 | |
| 2500 | # Closing thoughts |
| 2501 | |
| 2502 | Congratulations on making it to the end! It seems incredible that one could |
| 2503 | write so many words on something as basic as CSV parsing. I wanted this |
| 2504 | guide to be accessible not only to Rust beginners, but to inexperienced |
| 2505 | programmers as well. My hope is that the large number of examples will help |
| 2506 | push you in the right direction. |
| 2507 | |
| 2508 | With that said, here are a few more things you might want to look at: |
| 2509 | |
| 2510 | * The [API documentation for the `csv` crate](../index.html) documents all |
| 2511 | facets of the library, and is itself littered with even more examples. |
| 2512 | * The [`csv-index` crate](https://docs.rs/csv-index) provides data structures |
| 2513 |   for indexing CSV data that can be written to disk. (This library is still a |
| 2514 |   work in progress.) |
| 2515 | * The [`xsv` command line tool](https://github.com/BurntSushi/xsv) is a high |
| 2516 | performance CSV swiss army knife. It can slice, select, search, sort, join, |
| 2517 | concatenate, index, format and compute statistics on arbitrary CSV data. Give |
| 2518 | it a try! |
| 2519 | |
| 2520 | */ |