.. _module-pw_kvs:

------
pw_kvs
------

.. note::
  The documentation for this module is currently under construction.

``pw_kvs`` is Pigweed's Key Value Store (KVS) library. KVS is a flash-backed
persistent storage system with integrated wear-leveling that serves as a
relatively lightweight alternative to a file system.

KeyValueStore
=============

The KVS system stores key and value data pairs. The key-value pairs are
stored in `flash memory`_ as a `key-value entry`_ (KV entry) that consists of
a header/metadata, the key data, and the value data. KV entries are accessed
through Put, Get, and Delete operations.

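As a minimal usage sketch, assuming an already-initialized ``KeyValueStore``
named ``kvs`` (the key names and the ``UpdateBootCount`` function here are
illustrative only):

.. code-block:: cpp

   #include <cstdint>

   #include "pw_kvs/key_value_store.h"

   void UpdateBootCount(pw::kvs::KeyValueStore& kvs) {
     // Get: read the current value for a key, if present.
     uint32_t boot_count = 0;
     kvs.Get("boot_count", &boot_count).IgnoreError();

     // Put: write (or rewrite) the value for a key as a new KV entry.
     boot_count += 1;
     kvs.Put("boot_count", boot_count).IgnoreError();

     // Delete: mark the key's entry as deleted; its space is reclaimed
     // later by garbage collection.
     kvs.Delete("old_key").IgnoreError();
   }
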
Each flash sector is written sequentially in an append-only manner, with each
subsequent entry written at a higher address than all previous entry writes
to that sector since erase. Once information (header, metadata, data, etc.)
is written to flash, it is not modified or cleared until a full sector erase
occurs as part of garbage collection.

Individual KV entries are contained within a single flash sector (they do not
cross sector boundaries). Flash sectors can contain as many KV entries as fit
in the sector.

KVS does not store any data/metadata/state in flash beyond the KV entries.
All KVS system state can be derived from the stored KV entries. Current KVS
system state is determined at boot from the flash-stored KV entries and then
maintained in RAM by the KVS. The KVS is at all times in a valid state
on-flash, so there are no windows of vulnerability to unexpected power loss
or crash. The old entry for a key is maintained until the new entry for that
key is written and verified.

Each `key-value entry`_ has a unique transaction ID that is incremented for
each KVS update transaction. When determining system state from flash-stored
KV entries, the valid entry with the highest transaction ID is considered to
be the “current” entry for the key. All stored entries of the same key with
lower transaction IDs are considered old or “stale”.

An update or rewrite of a previously stored key is done by writing a new KV
entry with an incremented transaction ID and the new value for the key. The
KVS internal state is updated to reflect the new entry. The previously stored
KV entries for that key are not modified or removed from flash storage until
garbage collection reclaims the “stale” entries.

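The “highest transaction ID wins” rule can be sketched in isolation; the
types below are illustrative and are not the KVS internals:

.. code-block:: cpp

   #include <cstdint>
   #include <map>
   #include <string>
   #include <vector>

   // Illustrative only: a simplified view of entries recovered from flash.
   struct ScannedEntry {
     std::string key;
     uint32_t transaction_id;
   };

   // For each key, the valid entry with the highest transaction ID is the
   // "current" entry; all other entries for that key are "stale".
   std::map<std::string, uint32_t> CurrentEntries(
       const std::vector<ScannedEntry>& scanned) {
     std::map<std::string, uint32_t> current;
     for (const ScannedEntry& entry : scanned) {
       auto [it, inserted] = current.emplace(entry.key, entry.transaction_id);
       if (!inserted && entry.transaction_id > it->second) {
         it->second = entry.transaction_id;  // Newer write supersedes stale.
       }
     }
     return current;
   }
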
`Garbage collection`_ is done by copying any currently valid KV entries in
the sector to be garbage collected to a different sector, and then erasing
the sector.

Flash Memory
------------

The flash storage used by KVS consists of two layers, FlashMemory and
FlashPartition.

FlashMemory is the lower layer that manages raw read/write/erase operations
on the flash memory device.

FlashPartition is a portion of a FlashMemory. A FlashMemory may have multiple
FlashPartitions that represent different parts of the FlashMemory, such as
partitions for KVS, OTA, snapshots/crashlogs, etc. Each FlashPartition has
its own separate logical address space that runs from zero to the size of the
partition. FlashPartition logical addresses do not always map directly to
FlashMemory addresses due to partition encryption, sector headers, etc.

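As a sketch of this layering, a ``FlashPartition`` can be created on top of
the RAM-backed ``FakeFlashMemory`` that ``pw_kvs`` provides for testing (the
sector size and count here are arbitrary):

.. code-block:: cpp

   #include "pw_kvs/fake_flash_memory.h"
   #include "pw_kvs/flash_memory.h"

   // A fake FlashMemory with 8 sectors of 4 KiB each, backed by RAM.
   pw::kvs::FakeFlashMemoryBuffer<4096, 8> flash_memory;

   // A FlashPartition spanning the entire FlashMemory. Partition logical
   // addresses run from 0 to the size of the partition.
   pw::kvs::FlashPartition flash_partition(&flash_memory);
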
Writes to flash must have a start address that is a multiple of the flash
write alignment. The write size must also be a multiple of the flash write
alignment. Write alignment varies by flash device and partition type.
FlashPartitions may have a different alignment than the FlashMemory they are
part of, so long as the partition's alignment is a multiple of the alignment
of the FlashMemory. Reads from flash have no address or size alignment
requirement; reads always have a minimum alignment of 1.

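A small illustrative helper (not the ``pw_kvs`` API) shows the padding these
alignment rules imply:

.. code-block:: cpp

   #include <cstddef>

   // Round size_bytes up to the next multiple of the write alignment. For
   // example, a 10-byte write with an alignment of 4 must be padded out to
   // 12 bytes.
   constexpr size_t AlignUp(size_t size_bytes, size_t alignment_bytes) {
     return ((size_bytes + alignment_bytes - 1) / alignment_bytes) *
            alignment_bytes;
   }

   static_assert(AlignUp(10, 4) == 12);
   static_assert(AlignUp(16, 16) == 16);
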
Flash sectors are the minimum erase size for both FlashMemory and
FlashPartition. FlashPartitions may have a different logical sector size than
the FlashMemory they are part of. Partition logical sectors may be smaller
due to partition overhead (encryption, wear tracking, etc.) or larger due to
combining raw sectors into larger logical sectors.

Storage Allocation
------------------

KVS requires more storage space than the size of the key-value data stored.
This is due to the always-free sector required for garbage collection and the
"write and garbage collect later" approach KVS uses.

KVS performs poorly when stored data exceeds 75% of the available storage. It
works best when stored data is less than 50% of the available storage.
Applications that prefer or need to do garbage collection at scheduled times,
or that write very heavily, can benefit from additional flash storage space.

The flash storage used by KVS is multiplied by the `redundancy`_ level used.
A redundancy of 2 uses twice the storage.

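As a rough sizing sketch (the numbers here are arbitrary; the 50% figure is
the guidance above):

.. code-block:: cpp

   #include <cstddef>

   // Rough sizing sketch: ~16 KiB of key-value data with a redundancy of 2,
   // kept below the recommended 50% utilization, plus one always-free
   // sector for garbage collection.
   constexpr size_t kDataBytes = 16 * 1024;
   constexpr size_t kRedundancy = 2;
   constexpr size_t kSectorBytes = 4096;

   // Twice the redundant data size keeps utilization at 50%.
   constexpr size_t kMinPartitionBytes =
       kDataBytes * kRedundancy * 2 + kSectorBytes;

   static_assert(kMinPartitionBytes == 68 * 1024);
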
Key-Value Entry
---------------

Each key-value (KV) entry consists of a header/metadata, the key data, and
the value data. Individual KV entries are contained within a single flash
sector (they do not cross sector boundaries). Because of this, the maximum KV
entry size is the partition sector size.

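A simplified sketch of the kinds of fields an entry header carries is shown
below; see the ``pw_kvs`` source for the authoritative on-flash format:

.. code-block:: cpp

   #include <cstdint>

   // Illustrative sketch only, not the actual pw_kvs entry format. The
   // header is followed by the key bytes and then the value bytes, padded
   // out to the write alignment.
   struct EntryHeaderSketch {
     uint32_t magic;           // Identifies the entry format.
     uint32_t checksum;        // Covers the header, key, and value data.
     uint8_t alignment_units;  // Write alignment of this entry.
     uint8_t key_length;       // Length of the key, in bytes.
     uint16_t value_size;      // Size of the value, in bytes.
     uint32_t transaction_id;  // Incremented with each KVS update.
   };
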
KV entries are appended as needed to sectors, with append operations spread
over time. Each individual KV entry is written completely as a single
high-level operation. KV entries are appended to a sector as long as space is
available for a given KV entry. Multiple sectors can be active for writing at
any time.

When a key is rewritten (a new KV entry is written for an existing key), the
new entry is stored at a new location that may or may not be in the same
sector as the previous entry for that key. The new entry uses a transaction
ID greater than the previous transaction ID. The previous KV entry for that
key remains unaltered “on-disk” but is considered “stale”. It is garbage
collected at some future time.

Redundancy
----------

KVS supports storing redundant copies of KV entries. For a given redundancy
level (N), N total copies of each KV entry are stored. Redundant copies are
always stored in different sectors. This protects against corruption or even
full sector loss in up to N-1 sectors without data loss.

Redundancy increases flash usage in proportion to the redundancy level. The
RAM usage for KVS internal state increases slightly with redundancy.

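Redundancy is selected when the KVS object is declared. A sketch, assuming
the redundancy template parameter of ``KeyValueStoreBuffer`` and the
``flash_partition`` from the earlier FlashPartition sketch (the magic value
is arbitrary):

.. code-block:: cpp

   #include "pw_kvs/crc16_checksum.h"
   #include "pw_kvs/key_value_store.h"

   pw::kvs::ChecksumCrc16Ccitt checksum;
   constexpr pw::kvs::EntryFormat kFormat{.magic = 0xd253a8a9,
                                          .checksum = &checksum};

   // Up to 32 distinct keys in up to 8 sectors, with 2 copies of each KV
   // entry stored in different sectors. Call kvs.Init() once at boot.
   pw::kvs::KeyValueStoreBuffer<32, 8, /*kRedundancy=*/2> kvs(
       &flash_partition, kFormat);
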
Garbage Collection
------------------

Storage space occupied by stale KV entries is reclaimed and made available
for reuse through a garbage collection process. The base garbage collection
operation reclaims one sector at a time.

KVS keeps at least one sector free at all times to ensure the ability to
garbage collect. This free sector is used to copy valid entries from the
sector being garbage collected before that sector is erased. The always-free
sector is rotated as part of the KVS wear leveling.

Full Maintenance does garbage collection of all sectors except those that
have current valid KV entries.

Heavy Maintenance does garbage collection of all sectors. Use strong caution
when doing Heavy Maintenance: compared to Full Maintenance, it can result in
a significant amount of valid entries being moved.

Garbage collection can be performed at the request of higher-level software
or automatically as needed to make space available for writing new entries.

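A sketch of triggering maintenance from higher-level software, assuming an
initialized ``kvs``:

.. code-block:: cpp

   #include "pw_kvs/key_value_store.h"

   void ScheduledMaintenance(pw::kvs::KeyValueStore& kvs) {
     // Full Maintenance: garbage collects all sectors except those with
     // current valid KV entries.
     kvs.FullMaintenance().IgnoreError();

     // Heavy Maintenance: garbage collects every sector, relocating current
     // valid entries as needed. Use with caution.
     kvs.HeavyMaintenance().IgnoreError();
   }
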
Flash wear management
---------------------

Wear leveling is accomplished by cycling the selection of the next sector to
write to. This cycling spreads flash wear across all free sectors so that no
one sector is prematurely worn out.

The cycling selection of the next sector to write works as follows:

* New writes and rewrites of key-values prefer sectors that are already in
  use (partially filled), with new (blank) sectors used when no in-use sector
  has enough available space for the new write.
* New (blank) sectors are selected by cycling sequentially through the
  available free sectors.
* The search for the next free sector starts from the current write sector
  + 1 and wraps around to the beginning when the end of the partition is
  reached (see the sketch after this list).
* This spreads the erase/write cycles for heavily written/rewritten
  key-values across all free sectors, reducing wear on any single sector.
* Erase count is not considered as part of the wear leveling decision-making
  process.
* Key-values that are not modified remain in their original sector and do not
  participate in wear leveling, so long as the key-values in that sector
  remain unchanged.

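A minimal sketch of the wrap-around sector search described above
(illustrative only, not the KVS internals):

.. code-block:: cpp

   #include <cstddef>
   #include <optional>

   // Starting at current_sector + 1, find the first free sector, wrapping
   // around to sector 0 when the end of the partition is reached.
   std::optional<size_t> FindNextFreeSector(const bool* sector_is_free,
                                            size_t sector_count,
                                            size_t current_sector) {
     for (size_t i = 1; i <= sector_count; ++i) {
       const size_t candidate = (current_sector + i) % sector_count;
       if (sector_is_free[candidate]) {
         return candidate;
       }
     }
     return std::nullopt;  // No free sector available.
   }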