.. _module-pw_kvs:

------
pw_kvs
------

.. note::
  The documentation for this module is currently under construction.

``pw_kvs`` is Pigweed's Key Value Store (KVS) library. KVS is a flash-backed
persistent storage system with integrated wear-leveling that serves as a
relatively lightweight alternative to a file system.

KeyValueStore
=============

The KVS system stores key and value data pairs. The key-value pairs are
stored in `flash memory`_ as a `key-value entry`_ (KV entry) that consists of
a header/metadata, the key data, and the value data. KV entries are accessed
through Put, Get, and Delete operations.

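As a minimal usage sketch, assuming an already-initialized ``KeyValueStore``
named ``kvs`` (the key names and the ``UpdateBootCount`` function here are
illustrative only):

.. code-block:: cpp

   #include <cstdint>

   #include "pw_kvs/key_value_store.h"

   void UpdateBootCount(pw::kvs::KeyValueStore& kvs) {
     // Get: read the current value for a key, if present.
     uint32_t boot_count = 0;
     kvs.Get("boot_count", &boot_count).IgnoreError();

     // Put: write (or rewrite) the value for a key as a new KV entry.
     boot_count += 1;
     kvs.Put("boot_count", boot_count).IgnoreError();

     // Delete: mark the key's entry as deleted; its space is reclaimed
     // later by garbage collection.
     kvs.Delete("old_key").IgnoreError();
   }
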
Each flash sector is written sequentially in an append-only manner, with each
subsequent entry written at a higher address than all previous entry writes
to that sector since erase. Once information (header, metadata, data, etc.)
is written to flash, it is not modified or cleared until a full sector erase
occurs as part of garbage collection.

Individual KV entries are contained within a single flash sector (they do not
cross sector boundaries). Flash sectors can contain as many KV entries as fit
in the sector.

KVS does not store any data/metadata/state in flash beyond the KV entries.
All KVS system state can be derived from the stored KV entries. Current KVS
system state is determined at boot from the flash-stored KV entries and then
maintained in RAM by the KVS. The KVS is at all times in a valid state
on-flash, so there are no windows of vulnerability to unexpected power loss
or crash. The old entry for a key is maintained until the new entry for that
key is written and verified.

Each `key-value entry`_ has a unique transaction ID that is incremented for
each KVS update transaction. When determining system state from flash-stored
KV entries, the valid entry with the highest transaction ID is considered to
be the “current” entry for the key. All stored entries of the same key with
lower transaction IDs are considered old or “stale”.

An update or rewrite of a previously stored key is done by writing a new KV
entry with an incremented transaction ID and the new value for the key. The
KVS internal state is updated to reflect the new entry. The previously stored
KV entries for that key are not modified or removed from flash storage until
garbage collection reclaims the “stale” entries.

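The “highest transaction ID wins” rule can be sketched in isolation; the
types below are illustrative and are not the KVS internals:

.. code-block:: cpp

   #include <cstdint>
   #include <map>
   #include <string>
   #include <vector>

   // Illustrative only: a simplified view of entries recovered from flash.
   struct ScannedEntry {
     std::string key;
     uint32_t transaction_id;
   };

   // For each key, the valid entry with the highest transaction ID is the
   // "current" entry; all other entries for that key are "stale".
   std::map<std::string, uint32_t> CurrentEntries(
       const std::vector<ScannedEntry>& scanned) {
     std::map<std::string, uint32_t> current;
     for (const ScannedEntry& entry : scanned) {
       auto [it, inserted] = current.emplace(entry.key, entry.transaction_id);
       if (!inserted && entry.transaction_id > it->second) {
         it->second = entry.transaction_id;  // Newer write supersedes stale.
       }
     }
     return current;
   }
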
`Garbage collection`_ is done by copying any currently valid KV entries in
the sector to be garbage collected to a different sector, and then erasing
the sector.

Flash Memory
------------

The flash storage used by KVS consists of two layers, FlashMemory and
FlashPartition.

FlashMemory is the lower layer that manages raw read/write/erase operations
on the flash memory device.

FlashPartition is a portion of a FlashMemory. A FlashMemory may have multiple
FlashPartitions that represent different parts of the FlashMemory, such as
partitions for KVS, OTA, snapshots/crashlogs, etc. Each FlashPartition has
its own separate logical address space that runs from zero to the size of the
partition. FlashPartition logical addresses do not always map directly to
FlashMemory addresses due to partition encryption, sector headers, etc.

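As a sketch of this layering, a ``FlashPartition`` can be created on top of
the RAM-backed ``FakeFlashMemory`` that ``pw_kvs`` provides for testing (the
sector size and count here are arbitrary):

.. code-block:: cpp

   #include "pw_kvs/fake_flash_memory.h"
   #include "pw_kvs/flash_memory.h"

   // A fake FlashMemory with 8 sectors of 4 KiB each, backed by RAM.
   pw::kvs::FakeFlashMemoryBuffer<4096, 8> flash_memory;

   // A FlashPartition spanning the entire FlashMemory. Partition logical
   // addresses run from 0 to the size of the partition.
   pw::kvs::FlashPartition flash_partition(&flash_memory);
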
Writes to flash must have a start address that is a multiple of the flash
write alignment. The write size must also be a multiple of the flash write
alignment. Write alignment varies by flash device and partition type.
FlashPartitions may have a different alignment than the FlashMemory they are
part of, so long as the partition's alignment is a multiple of the alignment
of the FlashMemory. Reads from flash have no address or size alignment
requirement; reads always have a minimum alignment of 1.

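A small illustrative helper (not the ``pw_kvs`` API) shows the padding these
alignment rules imply:

.. code-block:: cpp

   #include <cstddef>

   // Round size_bytes up to the next multiple of the write alignment. For
   // example, a 10-byte write with an alignment of 4 must be padded out to
   // 12 bytes.
   constexpr size_t AlignUp(size_t size_bytes, size_t alignment_bytes) {
     return ((size_bytes + alignment_bytes - 1) / alignment_bytes) *
            alignment_bytes;
   }

   static_assert(AlignUp(10, 4) == 12);
   static_assert(AlignUp(16, 16) == 16);
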
Flash sectors are the minimum erase size for both FlashMemory and
FlashPartition. FlashPartitions may have a different logical sector size than
the FlashMemory they are part of. Partition logical sectors may be smaller
due to partition overhead (encryption, wear tracking, etc.) or larger due to
combining raw sectors into larger logical sectors.

Storage Allocation
------------------

KVS requires more storage space than the size of the key-value data stored.
This is due to the always-free sector required for garbage collection and the
"write and garbage collect later" approach KVS uses.

KVS performs poorly when stored data exceeds 75% of the available storage. It
works best when stored data is less than 50% of the available storage.
Applications that prefer or need to do garbage collection at scheduled times,
or that write very heavily, can benefit from additional flash storage space.

The flash storage used by KVS is multiplied by the `redundancy`_ level used.
A redundancy of 2 uses twice the storage.

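As a rough sizing sketch (the numbers here are arbitrary; the 50% figure is
the guidance above):

.. code-block:: cpp

   #include <cstddef>

   // Rough sizing sketch: ~16 KiB of key-value data with a redundancy of 2,
   // kept below the recommended 50% utilization, plus one always-free
   // sector for garbage collection.
   constexpr size_t kDataBytes = 16 * 1024;
   constexpr size_t kRedundancy = 2;
   constexpr size_t kSectorBytes = 4096;

   // Twice the redundant data size keeps utilization at 50%.
   constexpr size_t kMinPartitionBytes =
       kDataBytes * kRedundancy * 2 + kSectorBytes;

   static_assert(kMinPartitionBytes == 68 * 1024);
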
Key-Value Entry
---------------

Each key-value (KV) entry consists of a header/metadata, the key data, and
the value data. Individual KV entries are contained within a single flash
sector (they do not cross sector boundaries). Because of this, the maximum KV
entry size is the partition sector size.

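A simplified sketch of the kinds of fields an entry header carries is shown
below; see the ``pw_kvs`` source for the authoritative on-flash format:

.. code-block:: cpp

   #include <cstdint>

   // Illustrative sketch only, not the actual pw_kvs entry format. The
   // header is followed by the key bytes and then the value bytes, padded
   // out to the write alignment.
   struct EntryHeaderSketch {
     uint32_t magic;           // Identifies the entry format.
     uint32_t checksum;        // Covers the header, key, and value data.
     uint8_t alignment_units;  // Write alignment of this entry.
     uint8_t key_length;       // Length of the key, in bytes.
     uint16_t value_size;      // Size of the value, in bytes.
     uint32_t transaction_id;  // Incremented with each KVS update.
   };
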
KV entries are appended as needed to sectors, with append operations spread
over time. Each individual KV entry is written completely as a single
high-level operation. KV entries are appended to a sector as long as space is
available for a given KV entry. Multiple sectors can be active for writing at
any time.

When a key is rewritten (a new KV entry is written for an existing key), the
new entry is stored at a new location that may or may not be in the same
sector as the previous entry for that key. The new entry uses a transaction
ID greater than the previous transaction ID. The previous KV entry for that
key remains unaltered “on-disk” but is considered “stale”. It is garbage
collected at some future time.

Redundancy
----------

KVS supports storing redundant copies of KV entries. For a given redundancy
level (N), N total copies of each KV entry are stored. Redundant copies are
always stored in different sectors. This protects against corruption or even
full sector loss in up to N-1 sectors without data loss.

Redundancy increases flash usage in proportion to the redundancy level. The
RAM usage for KVS internal state increases slightly with redundancy.

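Redundancy is selected when the KVS object is declared. A sketch, assuming
the redundancy template parameter of ``KeyValueStoreBuffer`` and the
``flash_partition`` from the earlier FlashPartition sketch (the magic value
is arbitrary):

.. code-block:: cpp

   #include "pw_kvs/crc16_checksum.h"
   #include "pw_kvs/key_value_store.h"

   pw::kvs::ChecksumCrc16Ccitt checksum;
   constexpr pw::kvs::EntryFormat kFormat{.magic = 0xd253a8a9,
                                          .checksum = &checksum};

   // Up to 32 distinct keys in up to 8 sectors, with 2 copies of each KV
   // entry stored in different sectors. Call kvs.Init() once at boot.
   pw::kvs::KeyValueStoreBuffer<32, 8, /*kRedundancy=*/2> kvs(
       &flash_partition, kFormat);
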
Garbage Collection
------------------

Storage space occupied by stale KV entries is reclaimed and made available
for reuse through a garbage collection process. The base garbage collection
operation reclaims one sector at a time.

KVS keeps at least one sector free at all times to ensure the ability to
garbage collect. This free sector is used to copy valid entries from the
sector being garbage collected before that sector is erased. The always-free
sector is rotated as part of the KVS wear leveling.

Full Maintenance does garbage collection of all sectors except those that
have current valid KV entries.

Heavy Maintenance does garbage collection of all sectors. Use strong caution
when doing Heavy Maintenance: compared to Full Maintenance, it can result in
a significant amount of valid entries being moved.

Garbage collection can be performed at the request of higher-level software
or automatically as needed to make space available for writing new entries.

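A sketch of triggering maintenance from higher-level software, assuming an
initialized ``kvs``:

.. code-block:: cpp

   #include "pw_kvs/key_value_store.h"

   void ScheduledMaintenance(pw::kvs::KeyValueStore& kvs) {
     // Full Maintenance: garbage collects all sectors except those with
     // current valid KV entries.
     kvs.FullMaintenance().IgnoreError();

     // Heavy Maintenance: garbage collects every sector, relocating current
     // valid entries as needed. Use with caution.
     kvs.HeavyMaintenance().IgnoreError();
   }
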
Flash wear management
---------------------

Wear leveling is accomplished by cycling the selection of the next sector to
write to. This cycling spreads flash wear across all free sectors so that no
one sector is prematurely worn out.

The cycling selection of the next sector to write works as follows:

* New writes and rewrites of key-values prefer sectors that are already in
  use (partially filled), with new (blank) sectors used when no in-use sector
  has enough available space for the new write.
* New (blank) sectors are selected by cycling sequentially through the
  available free sectors.
* The search for the next free sector starts from the current write sector
  + 1 and wraps around to the beginning when the end of the partition is
  reached (see the sketch after this list).
* This spreads the erase/write cycles for heavily written/rewritten
  key-values across all free sectors, reducing wear on any single sector.
* Erase count is not considered as part of the wear leveling decision-making
  process.
* Key-values that are not modified remain in their original sector and do not
  participate in wear leveling, so long as the key-values in that sector
  remain unchanged.

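A minimal sketch of the wrap-around sector search described above
(illustrative only, not the KVS internals):

.. code-block:: cpp

   #include <cstddef>
   #include <optional>

   // Starting at current_sector + 1, find the first free sector, wrapping
   // around to sector 0 when the end of the partition is reached.
   std::optional<size_t> FindNextFreeSector(const bool* sector_is_free,
                                            size_t sector_count,
                                            size_t current_sector) {
     for (size_t i = 1; i <= sector_count; ++i) {
       const size_t candidate = (current_sector + i) % sector_count;
       if (sector_is_free[candidate]) {
         return candidate;
       }
     }
     return std::nullopt;  // No free sector available.
   }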