.. _module-pw_protobuf:

===========
pw_protobuf
===========
The protobuf module provides a lightweight interface for encoding and decoding
the Protocol Buffer wire format.

.. note::

  The protobuf module is a work in progress. Wire format encoding and decoding
  is supported, though the APIs are not final. C++ code generation exists for
  encoding, but not decoding.

Design
======
Unlike other protobuf libraries, which typically provide in-memory data
structures to represent protobuf messages, ``pw_protobuf`` operates directly on
the wire format and leaves data storage to the user. This has a few benefits.
The primary one is that it allows the library to be incredibly small, with the
encoder and decoder each having a code size of around 1.5K and negligible RAM
usage. Users can choose the tradeoffs most suitable for their product on top of
this core implementation.

``pw_protobuf`` also provides zero-overhead C++ code generation which wraps its
low-level wire format operations with a user-friendly API for processing
specific protobuf messages. The code generation integrates with Pigweed's GN
build system.

Configuration
=============
``pw_protobuf`` supports the following configuration options.

* ``PW_PROTOBUF_CFG_MAX_VARINT_SIZE``:
  When encoding nested messages, the number of bytes to reserve for the varint
  submessage length. Nested messages are limited in size to the maximum value
  that can be varint-encoded into this reserved space.

  The values that can be set, and their corresponding maximum submessage
  lengths, are outlined below.

  +-------------------+----------------------------------------+
  | MAX_VARINT_SIZE   | Maximum submessage length              |
  +===================+========================================+
  | 1 byte            | 127                                    |
  +-------------------+----------------------------------------+
  | 2 bytes           | 16,383 or < 16KiB                      |
  +-------------------+----------------------------------------+
  | 3 bytes           | 2,097,151 or < 2048KiB                 |
  +-------------------+----------------------------------------+
  | 4 bytes (default) | 268,435,455 or < 256MiB                |
  +-------------------+----------------------------------------+
  | 5 bytes           | 4,294,967,295 or < 4GiB (max uint32_t) |
  +-------------------+----------------------------------------+
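
The limits in the table follow from the varint encoding, in which each byte
carries 7 payload bits, so ``N`` reserved bytes can hold lengths up to
``2^(7N) - 1``. The following is a minimal sketch of that arithmetic.
``MaxSubmessageLength`` is a hypothetical helper written for this illustration
and is not part of the ``pw_protobuf`` API; the clamp at 5 bytes reflects the
``uint32_t`` limit noted in the table.

.. Code:: cpp

  #include <cstddef>
  #include <cstdint>
  #include <limits>

  // Hypothetical helper (not a pw_protobuf function): the largest length that
  // fits in `varint_bytes` bytes of varint encoding, at 7 payload bits per
  // byte, clamped to the range of uint32_t.
  constexpr std::uint32_t MaxSubmessageLength(std::size_t varint_bytes) {
    const std::uint64_t max_value =
        (std::uint64_t{1} << (7 * varint_bytes)) - 1;
    const std::uint64_t limit = std::numeric_limits<std::uint32_t>::max();
    return static_cast<std::uint32_t>(max_value < limit ? max_value : limit);
  }

  // These match the rows of the table above.
  static_assert(MaxSubmessageLength(1) == 127);
  static_assert(MaxSubmessageLength(2) == 16'383);
  static_assert(MaxSubmessageLength(3) == 2'097'151);
  static_assert(MaxSubmessageLength(4) == 268'435'455);
  static_assert(MaxSubmessageLength(5) == 4'294'967'295);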

========
Encoding
========

Usage
=====
Pigweed's protobuf encoders encode directly to the wire format of a proto rather
than staging information to a mutable data structure. This means any writes of
a value are final, and can't be referenced or modified as a later step in the
encode process.

MemoryEncoder
=============
A MemoryEncoder directly encodes a proto to an in-memory buffer.

.. Code:: cpp

  // Writes a proto response to the provided buffer, returning the encode
  // status and number of bytes written.
  StatusWithSize WriteProtoResponse(ByteSpan response) {
    // All proto writes are directly written to the `response` buffer.
    MemoryEncoder encoder(response);
    encoder.WriteUint32(kMagicNumberField, 0x1a1a2b2b);
    encoder.WriteString(kFavoriteFood, "cookies");
    return StatusWithSize(encoder.status(), encoder.size());
  }

StreamEncoder
=============
pw_protobuf's StreamEncoder class operates on pw::stream::Writer objects to
serialize proto data. This means you can directly encode a proto to something
like pw::sys_io without needing to build the complete message in memory first.

.. Code:: cpp

  #include "pw_bytes/span.h"
  #include "pw_log/log.h"
  #include "pw_protobuf/encoder.h"
  #include "pw_stream/sys_io_stream.h"

  pw::stream::SysIoWriter sys_io_writer;
  pw::protobuf::StreamEncoder my_proto_encoder(sys_io_writer,
                                               pw::ByteSpan());

  // Once this line returns, the field has been written to the Writer.
  my_proto_encoder.WriteInt64(kTimestampFieldNumber, system::GetUnixEpoch());

  // There's no intermediate buffering when writing a string directly to a
  // StreamEncoder.
  my_proto_encoder.WriteString(kWelcomeMessageFieldNumber,
                               "Welcome to Pigweed!");
  if (!my_proto_encoder.status().ok()) {
    PW_LOG_INFO("Failed to encode proto; %s", my_proto_encoder.status().str());
  }

Nested submessages
==================
Writing proto messages with nested submessages requires buffering due to
limitations of the proto format. Every proto submessage must know the size of
the submessage before its final serialization can begin. A streaming encoder can
be passed a scratch buffer to use when constructing nested messages. All
submessage data is buffered to this scratch buffer until the submessage is
finalized. Note that the contents of this scratch buffer are not necessarily
valid proto data, so don't try to use them directly.

MemoryEncoder objects use the final destination buffer rather than relying on a
scratch buffer. Note that this means your destination buffer might need
additional space for overhead incurred by nesting submessages. The
``MaxScratchBufferSize()`` helper function can be useful in estimating how much
space to allocate to account for nested submessage encoding overhead.

.. Code:: cpp

  #include "pw_bytes/span.h"
  #include "pw_log/log.h"
  #include "pw_protobuf/encoder.h"
  #include "pw_stream/sys_io_stream.h"

  pw::stream::SysIoWriter sys_io_writer;
  // The scratch buffer should be at least as big as the largest nested
  // submessage. It's a good idea to be a little generous.
  std::byte submessage_scratch_buffer[64];

  // Provide the scratch buffer to the proto encoder. The buffer's lifetime must
  // match the lifetime of the encoder.
  pw::protobuf::StreamEncoder my_proto_encoder(sys_io_writer,
                                               submessage_scratch_buffer);

  {
    // Note that the parent encoder, my_proto_encoder, cannot be used until the
    // nested encoder, nested_encoder, has been destroyed.
    StreamEncoder nested_encoder =
        my_proto_encoder.GetNestedEncoder(kPetsFieldNumber);

    // There's intermediate buffering when writing to a nested encoder.
    nested_encoder.WriteString(kNameFieldNumber, "Spot");
    nested_encoder.WriteString(kPetTypeFieldNumber, "dog");

    // When this scope ends, the nested encoder is serialized to the Writer.
    // In addition, the parent encoder, my_proto_encoder, can be used again.
  }

  // If an encode error occurs when encoding the nested messages, it will be
  // reflected at the root encoder.
  if (!my_proto_encoder.status().ok()) {
    PW_LOG_INFO("Failed to encode proto; %s", my_proto_encoder.status().str());
  }

.. warning::
  When a nested submessage is created, any use of the parent encoder that
  created the nested encoder will trigger a crash. To resume using the parent
  encoder, destroy the submessage encoder first.
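
The buffering requirement comes from the wire format itself: a submessage is
written as a tag, then a varint length, then the payload, so the payload size
must be known before the length can be emitted. The sketch below illustrates
this with plain standard C++ rather than ``pw_protobuf``. ``AppendVarint``,
``AppendDelimited``, ``AppendString``, and ``EncodeClientWithOnePet`` are
hypothetical helpers written for this illustration, and the field numbers are
taken from the ``Pet`` and ``Client`` messages in the codegen example
(``name = 1``, ``pet_type = 2``, ``pets = 1``).

.. Code:: cpp

  #include <cstddef>
  #include <cstdint>
  #include <string_view>
  #include <vector>

  using Bytes = std::vector<std::uint8_t>;

  // Appends `value` as a base-128 varint, least significant group first.
  void AppendVarint(Bytes& out, std::uint64_t value) {
    while (value >= 0x80) {
      out.push_back(static_cast<std::uint8_t>(value | 0x80));
      value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
  }

  // Appends a length-delimited field (wire type 2): tag, length, payload.
  void AppendDelimited(Bytes& out, std::uint32_t field_number,
                       const std::uint8_t* data, std::size_t size) {
    AppendVarint(out, (std::uint64_t{field_number} << 3) | 2);
    AppendVarint(out, size);
    out.insert(out.end(), data, data + size);
  }

  void AppendString(Bytes& out, std::uint32_t field_number,
                    std::string_view text) {
    AppendDelimited(out, field_number,
                    reinterpret_cast<const std::uint8_t*>(text.data()),
                    text.size());
  }

  // Encodes Client{pets: [Pet{name: "Spot", pet_type: "dog"}]}.
  Bytes EncodeClientWithOnePet() {
    // The submessage is staged in a scratch buffer first, because its size is
    // needed before the parent can write the length prefix.
    Bytes scratch;
    AppendString(scratch, 1, "Spot");  // Pet.name
    AppendString(scratch, 2, "dog");   // Pet.pet_type

    Bytes out;
    AppendDelimited(out, 1, scratch.data(), scratch.size());  // Client.pets
    return out;
  }

Here the staged ``Pet`` payload is 11 bytes, so the parent writes the tag
``0x0A``, the length ``0x0B``, and then the payload, for 13 bytes in total.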

Error Handling
==============
While individual write calls on a proto encoder return pw::Status objects, the
encoder tracks all status returns and "latches" onto the first error
encountered. This status can be accessed via ``StreamEncoder::status()``.
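
The latching behavior can be pictured with a small standalone sketch.
``LatchingEncoder`` and its ``Status`` enum are hypothetical stand-ins written
for this illustration, not ``pw_protobuf`` types. Each write still returns its
own result, while the first failure is remembered. Rejecting every write after
a failure is an assumption of this sketch rather than a documented guarantee.

.. Code:: cpp

  #include <cstddef>

  // Hypothetical status type for this sketch (pw_protobuf uses pw::Status).
  enum class Status { kOk, kResourceExhausted };

  class LatchingEncoder {
   public:
    explicit LatchingEncoder(std::size_t capacity) : capacity_(capacity) {}

    // Pretends to write `size` bytes. Returns the result of this call and
    // latches the first failure.
    Status Write(std::size_t size) {
      if (status_ != Status::kOk) {
        return status_;  // Assumption: no further writes after a failure.
      }
      if (size > capacity_ - used_) {
        status_ = Status::kResourceExhausted;
        return status_;
      }
      used_ += size;
      return Status::kOk;
    }

    // The first error encountered, or kOk if every write succeeded.
    Status status() const { return status_; }
    std::size_t size() const { return used_; }

   private:
    std::size_t capacity_;
    std::size_t used_ = 0;
    Status status_ = Status::kOk;
  };

A latched status is what lets the examples in this document check ``status()``
once after a series of writes instead of checking every call.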

Codegen
=======
pw_protobuf encoder codegen integration is supported in GN, Bazel, and CMake.
The codegen is just a light wrapper around the ``StreamEncoder`` and
``MemoryEncoder`` objects, providing named helper functions to write proto
fields rather than requiring that field numbers are directly passed to an
encoder. Namespaced proto enums are also generated, and used as the arguments
when writing enum fields of a proto message.

All generated messages provide a ``Fields`` enum that can be used directly for
out-of-band encoding, or with the ``pw::protobuf::Decoder``.

This module's codegen is available through the ``*.pwpb`` sub-target of a
``pw_proto_library`` in GN, CMake, and Bazel. See :ref:`pw_protobuf_compiler's
documentation <module-pw_protobuf_compiler>` for more information on build
system integration for pw_protobuf codegen.

Example ``BUILD.gn``:

.. Code:: none

  import("//build_overrides/pigweed.gni")

  import("$dir_pw_build/target_types.gni")
  import("$dir_pw_protobuf_compiler/proto.gni")

  # This target controls where the *.pwpb.h headers end up on the include path.
  # In this example, it's at "pet_daycare_protos/client.pwpb.h".
  pw_proto_library("pet_daycare_protos") {
    sources = [
      "pet_daycare_protos/client.proto",
    ]
  }

  pw_source_set("example_client") {
    sources = [ "example_client.cc" ]
    deps = [
      ":pet_daycare_protos.pwpb",
      dir_pw_bytes,
      dir_pw_stream,
    ]
  }

Example ``pet_daycare_protos/client.proto``:

.. Code:: none

  syntax = "proto3";
  // The proto package controls the namespacing of the codegen. If this package
  // were fuzzy.friends, the namespace for codegen would be fuzzy::friends::*.
  package fuzzy_friends;

  message Pet {
    string name = 1;
    string pet_type = 2;
  }

  message Client {
    repeated Pet pets = 1;
  }

Example ``example_client.cc``:

.. Code:: cpp

  #include "pet_daycare_protos/client.pwpb.h"
  #include "pw_bytes/span.h"
  #include "pw_log/log.h"
  #include "pw_protobuf/encoder.h"
  #include "pw_stream/sys_io_stream.h"

  pw::stream::SysIoWriter sys_io_writer;
  std::byte submessage_scratch_buffer[64];
  // The constructor is the same as a pw::protobuf::StreamEncoder.
  fuzzy_friends::Client::StreamEncoder client(sys_io_writer,
                                              submessage_scratch_buffer);
  {
    // Each nested encoder must be destroyed before the parent is used again.
    fuzzy_friends::Pet::StreamEncoder pet1 = client.GetPetsEncoder();
    pet1.WriteName("Spot");
    pet1.WritePetType("dog");
  }

  {
    fuzzy_friends::Pet::StreamEncoder pet2 = client.GetPetsEncoder();
    pet2.WriteName("Slippers");
    pet2.WritePetType("rabbit");
  }

  if (!client.status().ok()) {
    PW_LOG_INFO("Failed to encode proto; %s", client.status().str());
  }

========
Decoding
========

Size report
===========

Full size report
----------------

This report demonstrates the size of using the entire decoder with all of its
decode methods and a decode callback for a proto message containing each of the
protobuf field types.

.. include:: size_report/decoder_full


Incremental size report
-----------------------

This report is generated using the full report as a base and adding some int32
fields to the decode callback to demonstrate the incremental cost of decoding
fields in a message.

.. include:: size_report/decoder_incremental

========================================
Comparison with other protobuf libraries
========================================

protobuf-lite
=============
protobuf-lite is the official reduced-size C++ implementation of protobuf. It
uses a restricted subset of the protobuf library's features to minimize code
size. However, it is still around 150K in size and requires dynamic memory
allocation, making it unsuitable for many embedded systems.

nanopb
======
`nanopb <https://github.com/nanopb/nanopb>`_ is a commonly used embedded
protobuf library with very small code size and full code generation. It provides
both encoding/decoding functionality and in-memory C structs representing
protobuf messages.

nanopb works well for many embedded products. However, its generated code can
run into RAM usage issues when processing nontrivial protobuf messages, because
it must define a struct capable of storing every configuration of the message,
and such a struct can grow very large. In one project, Pigweed developers
encountered an 11K struct statically allocated for a single message, over twice
the size of the final encoded output! (This was what prompted the development of
``pw_protobuf``.)

To avoid this issue, it is possible to use nanopb's low-level encode/decode
functions to process individual message fields directly, but this loses all of
the useful semantics of code generation. ``pw_protobuf`` is designed to optimize
for this use case; it allows for efficient operations on the wire format with an
intuitive user interface.

Depending on the requirements of a project, either of these libraries could be
suitable.