Say you've got a big slow RAID 6, and an SSD or three. Wouldn't it be
nice if you could use them as cache... Hence bcache.

Wiki and git repositories are at:
  http://bcache.evilpiepirate.org
  http://evilpiepirate.org/git/linux-bcache.git
  http://evilpiepirate.org/git/bcache-tools.git

It's designed around the performance characteristics of SSDs - it only allocates
in erase block sized buckets, and it uses a hybrid btree/log to track cached
extents (which can be anywhere from a single sector to the bucket size). It's
designed to avoid random writes at all costs; it fills up an erase block
sequentially, then issues a discard before reusing it.

Both writethrough and writeback caching are supported. Writeback defaults to
off, but can be switched on and off arbitrarily at runtime. Bcache goes to
great lengths to protect your data - it reliably handles unclean shutdown. (It
doesn't even have a notion of a clean shutdown; bcache simply doesn't return
writes as completed until they're on stable storage).

Writeback caching can use most of the cache for buffering writes - writing
dirty data to the backing device is always done sequentially, scanning from the
start to the end of the index.

Since random IO is what SSDs excel at, there generally won't be much benefit
to caching large sequential IO. Bcache detects sequential IO and skips it;
it also keeps a rolling average of the IO sizes per task, and as long as the
average is above the cutoff it will skip all IO from that task - instead of
caching the first 512k after every seek. Backups and large file copies should
thus entirely bypass the cache.

In the event of a data IO error on the flash it will try to recover by reading
from disk or invalidating cache entries. For unrecoverable errors (metadata
or dirty data), caching is automatically disabled; if dirty data was present
in the cache it first disables writeback caching and waits for all dirty data
to be flushed.

Getting started:
You'll need make-bcache from the bcache-tools repository. Both the cache device
and backing device must be formatted before use.
  make-bcache -B /dev/sdb
  make-bcache -C /dev/sdc

make-bcache has the ability to format multiple devices at the same time - if
you format your backing devices and cache device at the same time, you won't
have to manually attach:
  make-bcache -B /dev/sda /dev/sdb -C /dev/sdc

bcache-tools now ships udev rules, and bcache devices are known to the kernel
immediately. Without udev, you can manually register devices like this:

  echo /dev/sdb > /sys/fs/bcache/register
  echo /dev/sdc > /sys/fs/bcache/register

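If you have several devices to register by hand, the two echo commands above
can be wrapped in a small helper. This is only a sketch: the function name and
the BCACHE_SYSFS override (handy for dry runs against a scratch directory) are
made up for illustration and are not part of bcache-tools.

```shell
# Register each listed device by writing its path to the "register" file.
# BCACHE_SYSFS is a hypothetical override; it defaults to /sys/fs/bcache.
bcache_register() {
    reg="${BCACHE_SYSFS:-/sys/fs/bcache}/register"
    for dev in "$@"; do
        if ! echo "$dev" > "$reg"; then
            echo "failed to register $dev" >&2
            return 1
        fi
    done
}
```

Usage (as root): bcache_register /dev/sdb /dev/sdc
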
Registering the backing device makes the bcache device show up in /dev; you can
now format it and use it as normal. But the first time using a new bcache
device, it'll be running in passthrough mode until you attach it to a cache.
If you are thinking about using bcache later, it is recommended to set up all
your slow devices as bcache backing devices without a cache, and you can choose
to add a caching device later.
See the ATTACHING section below.

The devices show up as:

  /dev/bcache<N>

As well as (with udev):

  /dev/bcache/by-uuid/<uuid>
  /dev/bcache/by-label/<label>

To get started:

  mkfs.ext4 /dev/bcache0
  mount /dev/bcache0 /mnt

You can control bcache devices through sysfs at /sys/block/bcache<N>/bcache .
You can also control them through /sys/fs/bcache/<cset-uuid>/ .

Cache devices are managed as sets; multiple caches per set isn't supported yet
but will allow for mirroring of metadata and dirty data in the future. Your new
cache set shows up as /sys/fs/bcache/<UUID>

ATTACHING
---------

After your cache device and backing device are registered, the backing device
must be attached to your cache set to enable caching. Attaching a backing
device to a cache set is done thusly, with the UUID of the cache set in
/sys/fs/bcache:

  echo <CSET-UUID> > /sys/block/bcache0/bcache/attach

This only has to be done once. The next time you reboot, just reregister all
your bcache devices. If a backing device has data in a cache somewhere, the
/dev/bcache<N> device won't be created until the cache shows up - particularly
important if you have writeback caching turned on.

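Finding the cache set UUID and attaching can be scripted. The sketch below
assumes that every UUID-shaped directory under /sys/fs/bcache is a cache set;
both function names are hypothetical, and the attach helper deliberately
refuses to guess when it does not find exactly one set.

```shell
# List the UUIDs of registered cache sets: every directory under the bcache
# sysfs root whose name looks like a UUID. The root can be overridden.
bcache_list_sets() {
    root="${1:-/sys/fs/bcache}"
    for d in "$root"/*-*-*-*-*/; do
        [ -d "$d" ] && basename "$d"
    done
    return 0
}

# Attach a bcache device (e.g. bcache0) to the only registered cache set.
bcache_attach_single() {
    bdev="$1"
    sets=$(bcache_list_sets)
    count=$(printf '%s\n' "$sets" | grep -c .)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one cache set, found $count" >&2
        return 1
    fi
    echo "$sets" > "/sys/block/$bdev/bcache/attach"
}
```

Usage (as root): bcache_attach_single bcache0
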
If you're booting up and your cache device is gone and never coming back, you
can force run the backing device:

  echo 1 > /sys/block/sdb/bcache/running

(You need to use /sys/block/sdb (or whatever your backing device is called), not
/sys/block/bcache0, because bcache0 doesn't exist yet. If you're using a
partition, the bcache directory would be at /sys/block/sdb/sdb2/bcache)

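The path rule in the note above (whole disk versus partition) can be sketched
as a lookup helper. The function name and the optional second argument, which
overrides the /sys/block root for dry runs, are assumptions made for this
example.

```shell
# Print the sysfs bcache directory for a block device name such as "sdb"
# (whole disk) or "sdb2" (partition).
bcache_dir_for() {
    name="$1"
    root="${2:-/sys/block}"
    # Whole disk: /sys/block/<disk>/bcache
    if [ -d "$root/$name/bcache" ]; then
        echo "$root/$name/bcache"
        return 0
    fi
    # Partition: /sys/block/<disk>/<partition>/bcache
    for d in "$root"/*/"$name"/bcache; do
        if [ -d "$d" ]; then
            echo "$d"
            return 0
        fi
    done
    echo "no bcache directory found for $name" >&2
    return 1
}
```

Usage (as root): echo 1 > "$(bcache_dir_for sdb2)/running"
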
The backing device will still use that cache set if it shows up in the future,
but all the cached data will be invalidated. If there was dirty data in the
cache, don't expect the filesystem to be recoverable - you will have massive
filesystem corruption, though ext4's fsck does work miracles.

ERROR HANDLING
--------------

Bcache tries to transparently handle IO errors to/from the cache device without
affecting normal operation; if it sees too many errors (the threshold is
configurable, and defaults to 0) it shuts down the cache device and switches all
the backing devices to passthrough mode.

 - For reads from the cache, if they error we just retry the read from the
   backing device.

 - For writethrough writes, if the write to the cache errors we just switch to
   invalidating the data at that lba in the cache (i.e. the same thing we do for
   a write that bypasses the cache)

 - For writeback writes, we currently pass that error back up to the
   filesystem/userspace. This could be improved - we could retry it as a write
   that skips the cache so we don't have to error the write.

 - When we detach, we first try to flush any dirty data (if we were running in
   writeback mode). It currently doesn't do anything intelligent if it fails to
   read some of the dirty data, though.

HOWTO/COOKBOOK
--------------

A) Starting a bcache with a missing caching device

If registering the backing device doesn't help because it's already there, you
just need to force it to run without the cache:
  host:~# echo /dev/sdb1 > /sys/fs/bcache/register
  [ 119.844831] bcache: register_bcache() error opening /dev/sdb1: device already registered

Next, you try to register your caching device if it's present. However
if it's absent, or registration fails for some reason, you can still
start your bcache without its cache, like so:
  host:/sys/block/sdb/sdb1/bcache# echo 1 > running

Note that this may cause data loss if you were running in writeback mode.


B) Bcache does not find its cache

  host:/sys/block/md5/bcache# echo 0226553a-37cf-41d5-b3ce-8b1e944543a8 > attach
  [ 1933.455082] bcache: bch_cached_dev_attach() Couldn't find uuid for md5 in set
  [ 1933.478179] bcache: __cached_dev_store() Can't attach 0226553a-37cf-41d5-b3ce-8b1e944543a8
  [ 1933.478179] : cache set not found

In this case, the caching device was simply not registered at boot
or disappeared and came back, and needs to be (re-)registered:
  host:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register


C) Corrupt bcache crashes the kernel at device registration time:

This should never happen. If it does happen, then you have found a bug!
Please report it to the bcache development list: linux-bcache@vger.kernel.org

Be sure to provide as much information as you can, including kernel dmesg
output if available, so that we may assist.


D) Recovering data without bcache:

If bcache is not available in the kernel, a filesystem on the backing
device is still available at an 8KiB offset. You can reach it via a loopdev
of the backing device created with --offset 8K, or with whatever value was
given to --data-offset when you originally formatted bcache with `make-bcache`.

For example:
  losetup -o 8192 /dev/loop0 /dev/your_bcache_backing_dev

This should present your unmodified backing device data in /dev/loop0

If your cache is in writethrough mode, then you can safely discard the
cache device without losing data.


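If the device was formatted with a non-default data offset, the value passed
to losetup -o has to be converted to bytes. The sketch below assumes the
offset is known in 512-byte sectors (so the default 8KiB is 16 sectors); check
how your make-bcache version interprets --data-offset before relying on it.
The function name is made up for this example.

```shell
# Convert a data offset in 512-byte sectors to bytes for `losetup -o`.
# Assumption: 16 sectors corresponds to the default 8KiB offset.
data_offset_bytes() {
    sectors="${1:-16}"
    echo $((sectors * 512))
}

# Example (not run here): map the backing device read-only at that offset.
#   losetup --read-only -o "$(data_offset_bytes 16)" /dev/loop0 /dev/your_bcache_backing_dev
```

Mapping read-only is a cautious choice while you are still checking that the
offset is right.
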
E) Wiping a cache device

  host:~# wipefs -a /dev/sdh2
  16 bytes were erased at offset 0x1018 (bcache)
  they were: c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81

After you boot back with bcache enabled, you recreate the cache and attach it:
  host:~# make-bcache -C /dev/sdh2
  UUID: 7be7e175-8f4c-4f99-94b2-9c904d227045
  Set UUID: 5bc072a8-ab17-446d-9744-e247949913c1
  version: 0
  nbuckets: 106874
  block_size: 1
  bucket_size: 1024
  nr_in_set: 1
  nr_this_dev: 0
  first_bucket: 1
  [ 650.511912] bcache: run_cache_set() invalidating existing data
  [ 650.549228] bcache: register_cache() registered cache device sdh2

Start the backing device with the missing cache:
  host:/sys/block/md5/bcache# echo 1 > running

Attach the new cache:
  host:/sys/block/md5/bcache# echo 5bc072a8-ab17-446d-9744-e247949913c1 > attach
  [ 865.276616] bcache: bch_cached_dev_attach() Caching md5 as bcache0 on set 5bc072a8-ab17-446d-9744-e247949913c1


F) Remove or replace a caching device

  host:/sys/block/sda/sda7/bcache# echo 1 > detach
  [ 695.872542] bcache: cached_dev_detach_finish() Caching disabled for sda7

  host:~# wipefs -a /dev/nvme0n1p4
  wipefs: error: /dev/nvme0n1p4: probing initialization failed: Device or resource busy

Oops, it's disabled, but not unregistered, so it's still protected.

We need to go and unregister it:
  host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# ls -l cache0
  lrwxrwxrwx 1 root root 0 Feb 25 18:33 cache0 -> ../../../devices/pci0000:00/0000:00:1d.0/0000:70:00.0/nvme/nvme0/nvme0n1/nvme0n1p4/bcache/
  host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# echo 1 > stop
  kernel: [ 917.041908] bcache: cache_set_free() Cache set b7ba27a1-2398-4649-8ae3-0959f57ba128 unregistered

Now we can wipe it:
  host:~# wipefs -a /dev/nvme0n1p4
  /dev/nvme0n1p4: 16 bytes were erased at offset 0x00001018 (bcache): c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81


G) dm-crypt and bcache

First set up bcache unencrypted and then install dmcrypt on top of
/dev/bcache<N>. This will work faster than if you dmcrypt both the backing
and caching devices and then install bcache on top. [benchmarks?]


H) Stop/free a registered bcache to wipe and/or recreate it

Suppose that you need to free up all bcache references so that you can
run fdisk and re-register a changed partition table, which won't work
if there are any active backing or caching devices left on it:

1) Is it present in /dev/bcache* ? (there are times where it won't be)

If so, it's easy:
  host:/sys/block/bcache0/bcache# echo 1 > stop

2) But if your backing device is gone, this won't work:
  host:/sys/block/bcache0# cd bcache
  bash: cd: bcache: No such file or directory

In this case, you may have to unregister the dmcrypt block device that
references this bcache to free it up:
  host:~# dmsetup remove oldds1
  bcache: bcache_device_free() bcache0 stopped
  bcache: cache_set_free() Cache set 5bc072a8-ab17-446d-9744-e247949913c1 unregistered

This causes the backing bcache to be removed from /sys/fs/bcache and
then it can be reused. This would be true of any block device stacking
where bcache is a lower device.

3) In other cases, you can also look in /sys/fs/bcache/:

  host:/sys/fs/bcache# ls -l */{cache?,bdev?}
  lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/bdev1 -> ../../../devices/virtual/block/dm-1/bcache/
  lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/cache0 -> ../../../devices/virtual/block/dm-4/bcache/
  lrwxrwxrwx 1 root root 0 Mar 5 09:39 5bc072a8-ab17-446d-9744-e247949913c1/cache0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/ata10/host9/target9:0:0/9:0:0:0/block/sdl/sdl2/bcache/

The device names will show which UUID is relevant; cd into that directory
and stop the cache:
  host:/sys/fs/bcache/5bc072a8-ab17-446d-9744-e247949913c1# echo 1 > stop

This will free up bcache references and let you reuse the partition for
other purposes.


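Reading the ls -l output in step 3 by eye can be automated. The sketch below
follows the cache<N> and bdev<N> symlinks and prints the UUID of every cache
set whose link target ends in <device>/bcache. The function name and the
optional root argument are made up for this example, and it assumes the
symlink layout shown in the listing above.

```shell
# Print the UUID of each cache set that references the given block device
# name (e.g. sdl2 or dm-1) through a cache<N> or bdev<N> symlink.
bcache_sets_using() {
    name="$1"
    root="${2:-/sys/fs/bcache}"
    for link in "$root"/*/cache[0-9]* "$root"/*/bdev[0-9]*; do
        [ -L "$link" ] || continue
        case "$(readlink "$link")" in
            */"$name"/bcache|*/"$name"/bcache/)
                basename "$(dirname "$link")" ;;
        esac
    done
}
```

Usage: bcache_sets_using sdl2
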
TROUBLESHOOTING PERFORMANCE
---------------------------

Bcache has a bunch of config options and tunables. The defaults are intended to
be reasonable for typical desktop and server workloads, but they're not what you
want for getting the best possible numbers when benchmarking.

 - Backing device alignment

   The default metadata size in bcache is 8k. If your backing device is
   RAID based, then be sure to align this by a multiple of your stride
   width using `make-bcache --data-offset`. If you intend to expand your
   disk array in the future, then multiply a series of primes by your
   raid stripe size to get the disk multiples that you would like.

   For example: If you have a 64k stripe size, then the following offset
   would provide alignment for many common RAID5 data spindle counts:
	64k * 2*2*2*3*3*5*7 bytes = 161280k

   That space is wasted, but for only 157.5MB you can grow your RAID 5
   volume to the following data-spindle counts without re-aligning:
	3,4,5,6,7,8,9,10,12,14,15,18,20,21 ...

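The arithmetic behind that example can be checked with a few lines of shell.
This is a sketch with made-up function names: the first multiplies the stripe
size by the chosen prime factors, the second lists the data-spindle counts
that divide the factor product evenly (2*2*2*3*3*5*7 = 2520).

```shell
# Data offset in KiB for a stripe size (KiB) times a list of prime factors.
aligned_offset_kib() {
    stripe_kib="$1"; shift
    product=1
    for p in "$@"; do
        product=$((product * p))
    done
    echo $((stripe_kib * product))
}

# Data-spindle counts in [lo, hi] that divide the factor product evenly,
# i.e. the array widths for which the offset stays stripe aligned.
aligned_spindle_counts() {
    product="$1"; lo="$2"; hi="$3"
    n="$lo"; out=""
    while [ "$n" -le "$hi" ]; do
        if [ $((product % n)) -eq 0 ]; then
            out="$out${out:+ }$n"
        fi
        n=$((n + 1))
    done
    echo "$out"
}

aligned_offset_kib 64 2 2 2 3 3 5 7   # prints 161280
aligned_spindle_counts 2520 3 21      # prints 3 4 5 6 7 8 9 10 12 14 15 18 20 21
```
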
 - Bad write performance

   If write performance is not what you expected, you probably wanted to be
   running in writeback mode, which isn't the default (not due to a lack of
   maturity, but simply because in writeback mode you'll lose data if something
   happens to your SSD)

   # echo writeback > /sys/block/bcache0/bcache/cache_mode

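   To confirm the change took effect you can read cache_mode back. The sketch
   below assumes the file lists all modes on one line with the active one in
   square brackets; that output format is an assumption here, so compare it
   with what your kernel actually prints. The function name is hypothetical.

```shell
# Extract the bracketed (active) entry from a cache_mode style line, e.g.
# "writethrough [writeback] writearound none" -> "writeback".
active_cache_mode() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}
```

   Usage: active_cache_mode < /sys/block/bcache0/bcache/cache_mode
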
 - Bad performance, or traffic not going to the SSD that you'd expect

   By default, bcache doesn't cache everything. It tries to skip sequential IO -
   because you really want to be caching the random IO, and if you copy a 10
   gigabyte file you probably don't want that pushing 10 gigabytes of randomly
   accessed data out of your cache.

   But if you want to benchmark reads from cache, and you start out with fio
   writing an 8 gigabyte test file, that sequential write gets skipped as
   well - so you want to disable the cutoff:

   # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff

   To set it back to the default (4 MB), do

   # echo 4M > /sys/block/bcache0/bcache/sequential_cutoff

 - Traffic's still going to the spindle/still getting cache misses

   In the real world, SSDs don't always keep up with disks - particularly with
   slower SSDs, many disks being cached by one SSD, or mostly sequential IO. So
   you want to avoid being bottlenecked by the SSD and having it slow everything
   down.

   To avoid that bcache tracks latency to the cache device, and gradually
   throttles traffic if the latency exceeds a threshold (it does this by
   cranking down the sequential bypass).

   You can disable this if you need to by setting the thresholds to 0:

   # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us
   # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us

   The default is 2000 us (2 milliseconds) for reads, and 20000 for writes.

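   With more than one cache set, both thresholds can be written in a loop. A
   minimal sketch, assuming every UUID-shaped directory under /sys/fs/bcache
   is a cache set; the function name and the optional root argument are made
   up for this example.

```shell
# Write read/write congestion thresholds (in microseconds) to every cache
# set. Passing 0 for both disables the throttling.
bcache_set_congestion() {
    read_us="$1"; write_us="$2"
    root="${3:-/sys/fs/bcache}"
    for set_dir in "$root"/*-*-*-*-*/; do
        [ -d "$set_dir" ] || continue
        echo "$read_us"  > "${set_dir}congested_read_threshold_us"
        echo "$write_us" > "${set_dir}congested_write_threshold_us"
    done
}
```

   Usage (as root): bcache_set_congestion 0 0 to disable for a benchmark,
   then bcache_set_congestion 2000 20000 to return to the defaults.
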
 - Still getting cache misses, of the same data

   One last issue that sometimes trips people up is actually an old bug, due to
   the way cache coherency is handled for cache misses. If a btree node is full,
   a cache miss won't be able to insert a key for the new data and the data
   won't be written to the cache.

   In practice this isn't an issue because as soon as a write comes along it'll
   cause the btree node to be split, and you need almost no write traffic for
   this to not show up enough to be noticeable (especially since bcache's btree
   nodes are huge and index large regions of the device). But when you're
   benchmarking, if you're trying to warm the cache by reading a bunch of data
   and there's no other traffic - that can be a problem.

   Solution: warm the cache by doing writes, or use the testing branch (there's
   a fix for the issue there).


SYSFS - BACKING DEVICE
----------------------

Available at /sys/block/<bdev>/bcache, /sys/block/bcache*/bcache and
(if attached) /sys/fs/bcache/<cset-uuid>/bdev*

attach
  Echo the UUID of a cache set to this file to enable caching.

cache_mode
  Can be one of writethrough, writeback, writearound or none.

clear_stats
  Writing to this file resets the running total stats (not the day/hour/5 minute
  decaying versions).

detach
  Write to this file to detach from a cache set. If there is dirty data in the
  cache, it will be flushed first.

dirty_data
  Amount of dirty data for this backing device in the cache. Continuously
  updated unlike the cache set's version, but may be slightly off.

label
  Name of underlying device.

readahead
  Size of readahead that should be performed. Defaults to 0. If set to e.g.
  1M, it will round cache miss reads up to that size, but without overlapping
  existing cache entries.

running
  1 if bcache is running (i.e. whether the /dev/bcache device exists, whether
  it's in passthrough mode or caching).

sequential_cutoff
  A sequential IO will bypass the cache once it passes this threshold; the
  most recent 128 IOs are tracked so sequential IO can be detected even when
  it isn't all done at once.

sequential_merge
  If non zero, bcache keeps a list of the last 128 requests submitted to compare
  against all new requests to determine which new requests are sequential
  continuations of previous requests for the purpose of determining sequential
  cutoff. This is necessary if the sequential cutoff value is greater than the
  maximum acceptable sequential size for any single request.

state
  The backing device can be in one of four different states:

  no cache: Has never been attached to a cache set.

  clean: Part of a cache set, and there is no cached dirty data.

  dirty: Part of a cache set, and there is cached dirty data.

  inconsistent: The backing device was forcibly run by the user when there was
  dirty data cached but the cache set was unavailable; whatever data was on the
  backing device has likely been corrupted.

stop
  Write to this file to shut down the bcache device and close the backing
  device.

writeback_delay
  When dirty data is written to the cache and it previously did not contain
  any, waits some number of seconds before initiating writeback. Defaults to
  30.

writeback_percent
  If nonzero, bcache tries to keep around this percentage of the cache dirty by
  throttling background writeback and using a PD controller to smoothly adjust
  the rate.

writeback_rate
  Rate in sectors per second - if writeback_percent is nonzero, background
  writeback is throttled to this rate. Continuously adjusted by bcache but may
  also be set by the user.

writeback_running
  If off, writeback of dirty data will not take place at all. Dirty data will
  still be added to the cache until it is mostly full; only meant for
  benchmarking. Defaults to on.

SYSFS - BACKING DEVICE STATS:

There are directories with these numbers for a running total, as well as
versions that decay over the past day, hour and 5 minutes; they're also
aggregated in the cache set directory.

bypassed
  Amount of IO (both reads and writes) that has bypassed the cache

cache_hits
cache_misses
cache_hit_ratio
  Hits and misses are counted per individual IO as bcache sees them; a
  partial hit is counted as a miss.

cache_bypass_hits
cache_bypass_misses
  Hits and misses for IO that is intended to skip the cache are still counted,
  but broken out here.

cache_miss_collisions
  Counts instances where data was going to be inserted into the cache from a
  cache miss, but raced with a write and data was already present (usually 0
  since the synchronization for cache misses was rewritten)

cache_readaheads
  Count of times readahead occurred.

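If you collect cache_hits and cache_misses yourself (for example before and
after a benchmark run), a hit percentage can be recomputed from the two
counters. This is a sketch with a made-up function name; it uses integer
division and is not claimed to match how the kernel derives cache_hit_ratio.

```shell
# Integer hit percentage from hit and miss counts; 0 when there was no IO.
cache_hit_percent() {
    hits="$1"; misses="$2"
    total=$((hits + misses))
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $((hits * 100 / total))
    fi
}

cache_hit_percent 75 25   # prints 75
```
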
SYSFS - CACHE SET:

Available at /sys/fs/bcache/<cset-uuid>

average_key_size
  Average data per key in the btree.

bdev<0..n>
  Symlink to each of the attached backing devices.

block_size
  Block size of the cache devices.

btree_cache_size
  Amount of memory currently used by the btree cache

bucket_size
  Size of buckets

cache<0..n>
  Symlink to each of the cache devices comprising this cache set.

cache_available_percent
  Percentage of cache device which doesn't contain dirty data, and could
  potentially be used for writeback. This doesn't mean this space isn't used
  for clean cached data; the unused statistic (in priority_stats) is typically
  much lower.

clear_stats
  Clears the statistics associated with this cache

dirty_data
  Amount of dirty data in the cache (updated when garbage collection runs).

flash_vol_create
  Echoing a size to this file (in human readable units, k/M/G) creates a thinly
  provisioned volume backed by the cache set.

io_error_halflife
io_error_limit
  These determine how many errors we accept before disabling the cache.
  Each error is decayed by the half life (in # ios). If the decaying count
  reaches io_error_limit dirty data is written out and the cache is disabled.

journal_delay_ms
  Journal writes will delay for up to this many milliseconds, unless a cache
  flush happens sooner. Defaults to 100.

root_usage_percent
  Percentage of the root btree node in use. If this gets too high the node
  will split, increasing the tree depth.

stop
  Write to this file to shut down the cache set - waits until all attached
  backing devices have been shut down.

tree_depth
  Depth of the btree (A single node btree has depth 0).

unregister
  Detaches all backing devices and closes the cache devices; if dirty data is
  present it will disable writeback caching and wait for it to be flushed.

SYSFS - CACHE SET INTERNAL:

This directory also exposes timings for a number of internal operations, with
separate files for average duration, average frequency, last occurrence and max
duration: garbage collection, btree read, btree node sorts and btree splits.

active_journal_entries
  Number of journal entries that are newer than the index.

btree_nodes
  Total nodes in the btree.

btree_used_percent
  Average fraction of btree in use.

bset_tree_stats
  Statistics about the auxiliary search trees

btree_cache_max_chain
  Longest chain in the btree node cache's hash table

cache_read_races
  Counts instances where while data was being read from the cache, the bucket
  was reused and invalidated - i.e. where the pointer was stale after the read
  completed. When this occurs the data is reread from the backing device.

trigger_gc
  Writing to this file forces garbage collection to run.

SYSFS - CACHE DEVICE:

Available at /sys/block/<cdev>/bcache

block_size
  Minimum granularity of writes - should match hardware sector size.

btree_written
  Sum of all btree writes, in (kilo/mega/giga) bytes

bucket_size
  Size of buckets

cache_replacement_policy
  One of either lru, fifo or random.

discard
  Boolean; if on a discard/TRIM will be issued to each bucket before it is
  reused. Defaults to off, since SATA TRIM is an unqueued command (and thus
  slow).

freelist_percent
  Size of the freelist as a percentage of nbuckets. Can be written to to
  increase the number of buckets kept on the freelist, which lets you
  artificially reduce the size of the cache at runtime. Mostly for testing
  purposes (i.e. testing how different size caches affect your hit rate), but
  since buckets are discarded when they move on to the freelist will also make
  the SSD's garbage collection easier by effectively giving it more reserved
  space.

io_errors
  Number of errors that have occurred, decayed by io_error_halflife.

metadata_written
  Sum of all non data writes (btree writes and all other metadata).

nbuckets
  Total buckets in this cache

priority_stats
  Statistics about how recently data in the cache has been accessed.
  This can reveal your working set size. Unused is the percentage of
  the cache that doesn't contain any data. Metadata is bcache's
  metadata overhead. Average is the average priority of cache buckets.
  Next is a list of quantiles with the priority threshold of each.

written
  Sum of all data that has been written to the cache; comparison with
  btree_written gives the amount of write inflation in bcache.