.. Blame: Armando Montanez, 179aa8e, 2021-03-10 11:46:35 -0800

.. _module-pw_snapshot-design_discussion:

=================
Design Discussion
=================
There were a handful of key requirements going into the design of pw_snapshot:

* **Pre-established file format** - Building and maintaining tooling to parse
  a custom binary snapshot format is a high maintenance burden, which is what
  makes a pre-existing, widely known and supported format appealing.
* **Incremental writing** - Needing to build an entire snapshot before
  committing it as a finished file is a big limitation on embedded devices where
  RAM is often very constrained. It is important that a snapshot can be built in
  smaller in-memory segments that can be committed incrementally to a larger
  sink (e.g. UART, off-chip flash).
* **Extensible** - Pigweed doesn't know everything users might want to capture
  in a snapshot. It's important that users have ways to include their own
  information in snapshots with minimal friction.
* **Relatively compact** - It's important that snapshots can contain useful
  information even when they are limited to a few hundred bytes in size.

Why Proto?
==========
Protobufs are widely used and supported across many languages and platforms.
This greatly reduces the encode/decode tooling maintenance introduced by using
custom or unstructured formats. While using a format like JSON provides
similarly wide tooling support, encoding the same information as a proto
significantly reduces the final file size.
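
As a rough illustration of the size difference, the sketch below hand-encodes
two integer fields in the protobuf wire format and compares the result with
compact JSON. The field names (``uptime_ms``, ``reset_count``), their field
numbers, and the values are hypothetical and are not part of the Snapshot proto.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # More bytes follow.
        else:
            out.append(byte)
            return bytes(out)


def varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: tag (wire type 0) followed by the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


# Hypothetical record; these names and numbers are assumptions for the example.
record = {"uptime_ms": 86400000, "reset_count": 3}

as_json = json.dumps(record, separators=(",", ":")).encode()
as_proto = varint_field(1, record["uptime_ms"]) + varint_field(
    2, record["reset_count"]
)

print(len(as_json), len(as_proto))
```

With these values the proto encoding is 7 bytes and the JSON encoding is 38
bytes. The exact ratio depends on the data being captured.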

While protobuf messages aren't truly streamable (i.e. they can't be written
without any intermediate buffers) due to how message nesting works, a large
message can be incrementally written as long as there's enough buffer space for
encoding the largest single sub-message in the proto.
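
The sketch below illustrates why this works: a top-level proto message is just
the concatenation of its encoded fields, so each sub-message can be framed as a
length-delimited field and written to the sink on its own. The sink, field
numbers, and payload bytes are placeholders chosen for the example, and this is
not the pw_snapshot or pw_protobuf API.

```python
import io


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # More bytes follow.
        else:
            out.append(byte)
            return bytes(out)


def encode_submessage(field_number: int, payload: bytes) -> bytes:
    """Frame an already-encoded sub-message as a length-delimited field."""
    tag = encode_varint((field_number << 3) | 2)  # Wire type 2.
    return tag + encode_varint(len(payload)) + payload


def write_incrementally(sink, submessages) -> int:
    """Write each sub-message as soon as it is framed.

    Returns the size of the largest chunk, which is the only buffer space
    this approach needs at any one time.
    """
    largest = 0
    for field_number, payload in submessages:
        chunk = encode_submessage(field_number, payload)
        largest = max(largest, len(chunk))
        sink.write(chunk)  # The chunk can be discarded after this write.
    return largest


# io.BytesIO stands in for a UART or flash sink (assumption for the example).
sink = io.BytesIO()
parts = [(1, b"\x08\x2a"), (2, b"\x0a\x03abc"), (2, b"\x0a\x02hi")]
peak = write_incrementally(sink, parts)

# Encoding everything in one buffer yields byte-for-byte the same message.
whole = b"".join(encode_submessage(n, p) for n, p in parts)
assert sink.getvalue() == whole
```

The length prefix is the reason a sub-message must be fully encoded before it
is written: its size has to be known before its first byte goes out.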

Why overlay multiple protos?
============================
Proto 2 supported a feature called "extensions" that explicitly allowed fields
defined in separate protos to be serialized into the same message. While proto 3
removed this feature, it doesn't disallow the old behavior of serializing two
"overlaid" protos to the same data stream. Proto 3 recommends using an "Any"
proto instead of extensions, as it is more explicit and eliminates the issue of
field number collisions between protos. Unfortunately, "Any" messages introduce
unacceptable overhead. For a single integer that would encode to a few bytes
using extensions, an Any submessage quickly expands to tens of bytes.
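
To make the overhead concrete, the sketch below hand-encodes the same integer
twice: once as a plain varint field, and once wrapped in an ``Any`` whose
payload is the well-known ``google.protobuf.Int32Value`` type. Field number 10
is a hypothetical choice for the example. A project-specific message type would
carry its own, possibly longer, type URL.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # More bytes follow.
        else:
            out.append(byte)
            return bytes(out)


def varint_field(field_number: int, value: int) -> bytes:
    """Encode a varint field (wire type 0)."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def bytes_field(field_number: int, payload: bytes) -> bytes:
    """Encode a length-delimited field (wire type 2)."""
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload


# Overlay style: the integer is a direct field of the message.
direct = varint_field(10, 42)

# Any style: google.protobuf.Any has `type_url = 1` and `value = 2`, and the
# payload here is an Int32Value message, whose only field is `value = 1`.
type_url = b"type.googleapis.com/google.protobuf.Int32Value"
any_body = bytes_field(1, type_url) + bytes_field(2, varint_field(1, 42))
wrapped = bytes_field(10, any_body)

print(len(direct), len(wrapped))
```

Here the direct field takes 2 bytes while the ``Any`` wrapper takes 54, almost
all of which is the type URL string.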

pw_snapshot's proto format takes advantage of "extensions" from proto 2 without
explicitly relying on the feature. To reduce the risk of collisions and maximize
encoding efficiency, certain field number ranges are reserved to allow Pigweed
to grow while ensuring downstream customers have equivalent flexibility when
using the Snapshot proto format.
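
A minimal sketch of the overlay idea follows. Two independent "schemas" with
disjoint field numbers are written to one stream, and each decoder keeps the
fields it knows and skips the rest. The field names and the numbers ``1`` and
``1000`` are hypothetical and do not reflect the ranges actually reserved by
the Snapshot proto.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # More bytes follow.
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def varint_field(field_number: int, value: int) -> bytes:
    """Encode a varint field (wire type 0)."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def parse_fields(data: bytes):
    """Yield (field_number, value) for varint and length-delimited fields."""
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        yield field_number, value


def decode_with(schema: dict, data: bytes) -> dict:
    """Keep only the fields this schema knows about; ignore the others."""
    return {schema[n]: v for n, v in parse_fields(data) if n in schema}


# Hypothetical schemas with non-overlapping field numbers.
CORE_FIELDS = {1: "uptime_ms"}
PROJECT_FIELDS = {1000: "battery_mv"}

# Both protos are serialized back to back into the same stream.
stream = varint_field(1, 5000) + varint_field(1000, 3700)

core = decode_with(CORE_FIELDS, stream)
project = decode_with(PROJECT_FIELDS, stream)
print(core, project)
```

If both schemas used the same field number, each decoder would misread the
other's data, which is the collision that reserved ranges are meant to prevent.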

Why no file header?
===================
Right now it's assumed that anything that is storing or transferring a
serialized snapshot implicitly tracks its size (and a checksum, if desired).
While a container format might be introduced independently, pw_snapshot focuses
on treating an encoded snapshot as raw serialized proto data.