.. _module-pw_kvs:

------
pw_kvs
------

.. note::
   The documentation for this module is currently under construction.

``pw_kvs`` is Pigweed's Key Value Store (KVS) library. KVS is a flash-backed
persistent storage system with integrated wear-leveling that serves as a
relatively lightweight alternative to a file system.

KeyValueStore
=============

The KVS system stores key-value pairs. Each pair is stored in `flash memory`_
as a `key-value entry`_ (KV entry) that consists of header/metadata, the key
data, and the value data. KV entries are accessed through Put, Get, and Delete
operations.

Each flash sector is written sequentially in an append-only manner, with each
new entry written at a higher address than all previous entry writes to that
sector since erase. Once information (header, metadata, data, etc.) is written
to flash, it is not modified or cleared until a full sector erase occurs as
part of garbage collection.

Individual KV entries are contained within a single flash sector (do not cross
sector boundaries). Flash sectors can contain as many KV entries as fit in the
sector.

KVS does not store any data/metadata/state in flash beyond the KV entries. All
KVS system state can be derived from the stored KV entries. Current KVS system
state is determined at boot from the flash-stored KV entries and then
maintained in RAM by the KVS. The KVS is at all times in a valid state
on-flash, so there are no windows of vulnerability to unexpected power loss or
crash. The old entry for a key is maintained until the new entry for that key
is written and verified.

Each `key-value entry`_ has a unique transaction ID that is incremented for
each KVS update transaction. When determining system state from flash-stored
KV entries, the valid entry with the highest transaction ID is considered to
be the “current” entry of the key. All stored entries of the same key with
lower transaction IDs are considered old or “stale”.

Updates/rewrites of a previously stored key are done as a new KV entry with an
updated transaction ID and the new value for the key. The KVS internal state
is updated to reflect the new entry. The previously stored KV entries for that
key are not modified or removed from flash storage until garbage collection
reclaims the “stale” entries.

`Garbage collection`_ is done by copying any currently valid KV entries in the
sector to be garbage collected to a different sector and then erasing the
sector.

Flash Memory
------------

The flash storage used by KVS consists of two layers, FlashMemory and
FlashPartition.

FlashMemory is the lower level that manages the raw read/write/erase of the
flash memory device.

FlashPartition is a portion of a FlashMemory. A FlashMemory may have multiple
FlashPartitions that represent different parts of the FlashMemory, such as
partitions for KVS, OTA, snapshots/crashlogs, etc. Each FlashPartition has its
own separate logical address space that starts at zero and extends to the size
of the partition. FlashPartition logical addresses do not always map directly
to FlashMemory addresses due to partition encryption, sector headers, etc.

Writes to flash must have a start address that is a multiple of the flash
write alignment. Write size must also be a multiple of the flash write
alignment. Write alignment varies by flash device and partition type. Reads
from flash do not have any address or size alignment requirement; reads always
have a minimum alignment of 1.

FlashPartitions may have a different alignment than the FlashMemory they are
part of, so long as the partition's alignment is a multiple of the alignment
of the FlashMemory.

Sectors are the minimum erase size for both FlashMemory and FlashPartition.
FlashPartitions may have a different logical sector size than the FlashMemory
they are part of. Partition logical sectors may be smaller due to partition
overhead (encryption, wear tracking, etc.) or larger due to combining raw
sectors into larger logical sectors.

FlashPartition supports access via NonSeekableWriter and SeekableReader.

Size report
-----------
The following size report showcases the memory usage of the KVS and
FlashPartition.

.. include:: kvs_size

Storage Allocation
------------------

KVS requires more storage space than the size of the key-value data stored.
This is due to the always free sector required for garbage collection and the
"write and garbage collect later" approach KVS uses.

KVS works poorly when the stored data exceeds 75% of the available storage. It
works best when the stored data is less than 50% of the available storage.
Applications that prefer/need to do garbage collection at scheduled times, or
that write very heavily, can benefit from additional flash store space.

The flash storage used by KVS is multiplied by the `redundancy`_ level used. A
redundancy of 2 will use twice the storage.

Key-Value Entry
---------------

Each key-value (KV) entry consists of header/metadata, the key data, and the
value data. Individual KV entries are contained within a single flash sector
(they do not cross sector boundaries). Because of this, the maximum KV entry
size is the partition sector size.

KV entries are appended as needed to sectors, with append operations spread
over time. Each individual KV entry is written completely as a single
high-level operation. KV entries are appended to a sector as long as space is
available for a given KV entry. Multiple sectors can be active for writing at
any time.

When a key is rewritten (writing a new KV entry of an existing key), the KV
entry is stored at a new location that may or may not be located in the same
sector as the previous entry for that key. The new entry uses a transaction
ID greater than the previous transaction ID. The previous KV entry for that
key remains unaltered “on-disk” but is considered “stale”. It is garbage
collected at some future time.

Redundancy
----------

KVS supports storing redundant copies of KV entries. For a given redundancy
level (N), N total copies of each KV entry are stored. Redundant copies are
always stored in different sectors. This protects against data loss from
corruption, or even full sector loss, in up to N-1 sectors.

Redundancy increases flash usage proportionally to the redundancy level. The
RAM usage for KVS internal state increases slightly with redundancy.

Garbage Collection
------------------

Storage space occupied by stale KV entries is reclaimed and made available
for reuse through a garbage collection process. The base garbage collection
operation is done to reclaim one sector at a time.

KVS keeps at least one sector free at all times to ensure the ability to
garbage collect. This free sector is used to copy valid entries from the
sector to be garbage collected before erasing that sector. The always free
sector is rotated as part of the KVS wear leveling.

Full Maintenance does garbage collection of all sectors except those that have
current valid KV entries.

Heavy Maintenance does garbage collection of all sectors. Use strong caution
when doing Heavy Maintenance, as it can result in significantly more movement
of valid entries than Full Maintenance.

Garbage collection can be performed by request of higher-level software or
automatically as needed to make space available to write new entries.

Flash wear management
---------------------

Wear leveling is accomplished by cycling the selection of the next sector to
write to. This cycling spreads flash wear across all free sectors so that no
one sector is prematurely worn out.

Wear leveling through cycling selection of the next sector to write:

* Location of new writes/rewrites of key-values will prefer sectors already
  in-use (partially filled), with new (blank) sectors used when no in-use
  sectors have large enough available space for the new write.
* New (blank) sectors selected cycle sequentially between available free
  sectors.
* The search for the first available sector starts from the current write
  sector + 1 and wraps around to the start of the partition after reaching
  the end.
* This spreads the erase/write cycles for heavily written/rewritten key-values
  across all free sectors, reducing wear on any single sector.
* Erase count is not considered as part of the wear leveling decision making
  process.
* Sectors with already written key-values that are not modified will remain in
  the original sector and not participate in wear-leveling, so long as the
  key-values in the sector remain unchanged.