.. _module-pw_metric:

=========
pw_metric
=========

.. attention::

  This module is **not yet production ready**; ask us if you are interested in
  trying it out or have ideas about how to improve it.

--------
Overview
--------
Pigweed's metric module is a **lightweight manual instrumentation system** for
tracking system health metrics like counts or set values. For example,
``pw_metric`` could help with tracking the number of I2C bus writes, or the
number of times a buffer was filled before it could drain in time, or safely
incrementing counters from ISRs.

Key features of ``pw_metric``:

- **Tokenized names** - Names are tokenized using ``pw_tokenizer``, enabling
  long metric names that don't bloat your binary.

- **Tree structure** - Metrics can form a tree, enabling grouping of related
  metrics for clearer organization.

- **Per object collection** - Metrics and groups can live on object instances
  and be flexibly combined with metrics from other instances.

- **Global registration** - For legacy code bases or just because it's easier,
  ``pw_metric`` supports automatic aggregation of metrics. This is optional but
  convenient in many cases.

- **Simple design** - There are only two core data structures: ``Metric`` and
  ``Group``, which are both simple to understand and use. The only metric
  types supported are ``uint32_t`` and ``float``. This module does not support
  complicated aggregations like running average or min/max.

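To make the tokenized-name idea concrete, here is a minimal sketch of hashing
a metric name down to a 31-bit token at compile time. ``Token31`` and
``kAttemptsToken`` are hypothetical names, and the FNV-1a hash is only a
stand-in: ``pw_tokenizer`` has its own hashing scheme and token database, so
treat this as an illustration of the concept rather than the module's
implementation.

.. code:: cpp

  #include <cstdint>
  #include <string_view>

  // Hypothetical helper: hashes a name to a 31-bit token with FNV-1a.
  // This is NOT the pw_tokenizer algorithm; it only illustrates the idea.
  constexpr uint32_t Token31(std::string_view name) {
    uint32_t hash = 2166136261u;  // FNV-1a 32-bit offset basis.
    for (char c : name) {
      hash ^= static_cast<uint8_t>(c);
      hash *= 16777619u;  // FNV-1a 32-bit prime.
    }
    return hash & 0x7fffffffu;  // Keep 31 bits; one bit is left for a flag.
  }

  // Computed during compilation, so code can refer to the metric by a
  // fixed-size integer regardless of how long the name is.
  constexpr uint32_t kAttemptsToken = Token31("attempts");

Because the token has a fixed size, a longer and more descriptive name costs
nothing extra per metric object in this sketch.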
Example: Instrumenting a single object
--------------------------------------
The below example illustrates what instrumenting a class with a metric group
and metrics might look like. In this case, the object's
``MySubsystem::metrics()`` member is not globally registered; the user is on
their own for combining this subsystem's metrics with others.

.. code::

  #include "pw_metric/metric.h"

  class MySubsystem {
   public:
    void DoSomething() {
      attempts_.Increment();
      if (ActionSucceeds()) {
        successes_.Increment();
      }
    }
    Group& metrics() { return metrics_; }

   private:
    PW_METRIC_GROUP(metrics_, "my_subsystem");
    PW_METRIC(metrics_, attempts_, "attempts", 0u);
    PW_METRIC(metrics_, successes_, "successes", 0u);
  };

The metrics subsystem has no canonical output format at this time, but a JSON
dump might look something like this:

.. code:: none

  {
    "my_subsystem" : {
      "successes" : 1000,
      "attempts" : 1200
    }
  }

In this case, every instance of ``MySubsystem`` will have unique counters.

Example: Instrumenting a legacy codebase
----------------------------------------
A common situation in embedded development is **debugging legacy code** or code
that is hard to change, where it may be impossible to plumb metrics objects
around with dependency injection. The alternative to plumbing metrics is to
register the metrics through a global mechanism. ``pw_metric`` supports this
use case. For example:

**Before instrumenting:**

.. code::

  // This code was passed down from generations of developers before; no one
  // knows what it does or how it works. But it needs to be fixed!
  void OldCodeThatDoesntWorkButWeDontKnowWhy() {
    if (some_variable) {
      DoSomething();
    } else {
      DoSomethingElse();
    }
  }

**After instrumenting:**

.. code::

  #include "pw_metric/global.h"
  #include "pw_metric/metric.h"

  // PW_METRIC_GLOBAL takes an identifier, a name, and an initial value.
  PW_METRIC_GLOBAL(legacy_do_something, "legacy_do_something", 0u);
  PW_METRIC_GLOBAL(legacy_do_something_else, "legacy_do_something_else", 0u);

  // This code was passed down from generations of developers before; no one
  // knows what it does or how it works. But it needs to be fixed!
  void OldCodeThatDoesntWorkButWeDontKnowWhy() {
    if (some_variable) {
      legacy_do_something.Increment();
      DoSomething();
    } else {
      legacy_do_something_else.Increment();
      DoSomethingElse();
    }
  }

In this case, the developer merely had to add the metrics header, define some
metrics, and then start incrementing them. These metrics will be available
globally through the ``pw::metric::global_metrics`` object defined in
``pw_metric/global.h``.

Why not just use simple counter variables?
------------------------------------------
One might wonder what the point of leveraging a metric library is when it is
trivial to make some global variables and print them out. There are a few
reasons:

- **Metrics offload** - To make it easy to get metrics off-device by sharing
  the infrastructure for offloading.

- **Consistent format** - To get the metrics in a consistent format (e.g.
  protobuf or JSON) for analysis.

- **Uncoordinated collection** - To provide a simple and reliable way for
  developers on a team to all collect metrics for their subsystems, without
  having to coordinate to offload. This could extend to code in libraries
  written by other teams.

- **Pre-boot or interrupt visibility** - Some of the most challenging bugs come
  from early system boot when not all system facilities are up (e.g. logging or
  UART). In those cases, metrics provide a low-overhead approach to understand
  what is happening. During early boot, metrics can be incremented; dumping the
  metrics after boot then provides insight into what happened. While basic
  counter variables can work in these contexts too, one still has to deal with
  the offloading problem, which the library handles.

---------------------
Metrics API reference
---------------------

The metrics API consists of just a few components:

- The core data structures ``pw::metric::Metric`` and ``pw::metric::Group``
- The macros for scoped metrics and groups ``PW_METRIC`` and
  ``PW_METRIC_GROUP``
- The macros for globally registered metrics and groups
  ``PW_METRIC_GLOBAL`` and ``PW_METRIC_GROUP_GLOBAL``
- The global groups and metrics list: ``pw::metric::global_groups`` and
  ``pw::metric::global_metrics``.

Metric
------
The ``pw::metric::Metric`` provides:

- A 31-bit tokenized name
- A 1-bit discriminator for int or float
- A 32-bit payload (int or float)
- A 32-bit next pointer (intrusive list)

The metric object is 12 bytes on 32-bit platforms.

.. cpp:class:: pw::metric::Metric

  .. cpp:function:: Increment(uint32_t amount = 1)

    Increment the metric by the given amount. Results in undefined behaviour if
    the metric is not of type int.

  .. cpp:function:: Set(uint32_t value)

    Set the metric to the given value. Results in undefined behaviour if the
    metric is not of type int.

  .. cpp:function:: Set(float value)

    Set the metric to the given value. Results in undefined behaviour if the
    metric is not of type float.

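The 12-byte figure follows from the fields listed for ``Metric``: a 31-bit
token and a 1-bit type flag share one 32-bit word, followed by a 32-bit payload
and a pointer. The sketch below shows one way such a layout could be expressed.
``SketchMetric`` and its helper functions are hypothetical stand-ins and are
not the actual ``pw::metric::Metric`` definition.

.. code:: cpp

  #include <cstdint>
  #include <cstring>

  // Hypothetical stand-in for illustration; not the real pw::metric::Metric.
  struct SketchMetric {
    uint32_t name_token : 31;  // Tokenized name.
    uint32_t is_float : 1;     // Discriminator: 0 = unsigned int, 1 = float.
    uint32_t payload;          // Raw bits of the uint32_t or float value.
    SketchMetric* next;        // Intrusive singly linked list pointer.
  };

  // Two 32-bit words plus one pointer: 12 bytes when pointers are 32 bits.
  static_assert(sizeof(SketchMetric) == 2 * sizeof(uint32_t) + sizeof(void*));

  // Only meaningful for metrics whose is_float flag is 0.
  inline void Increment(SketchMetric& metric, uint32_t amount = 1) {
    metric.payload += amount;
  }

  // Stores the float's bit pattern in the shared 32-bit payload.
  inline void SetFloat(SketchMetric& metric, float value) {
    std::memcpy(&metric.payload, &value, sizeof(value));
  }

  // Reads the payload back as a float.
  inline float AsFloat(const SketchMetric& metric) {
    float value;
    std::memcpy(&value, &metric.payload, sizeof(value));
    return value;
  }

Sharing one payload word between the two types is what makes calling the
wrong accessor an error: the bits would be reinterpreted rather than
converted.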
Group
-----
The ``pw::metric::Group`` object is simply:

- A name for the group
- A list of children groups
- A list of leaf metrics
- A 32-bit next pointer (intrusive list)

The group object is 16 bytes on 32-bit platforms.

.. cpp:class:: pw::metric::Group

  .. cpp:function:: Dump(int indent_level = 0)

    Recursively dump a metrics group to ``pw_log``. Produces output like:

    .. code:: none

      "$6doqFw==": {
        "$05OCZw==": {
          "$VpPfzg==": 1,
          "$LGPMBQ==": 1.000000,
          "$+iJvUg==": 5,
        }
        "$9hPNxw==": 65,
        "$oK7HmA==": 13,
        "$FCM4qQ==": 0,
      }

    Note that the metric names are tokenized and Base64-encoded. Decoding
    requires using the Pigweed detokenizer. With a detokenizing-enabled
    logger, you could get something like:

    .. code:: none

      "i2c_1": {
        "gyro": {
          "num_samples": 1,
          "init_time_us": 1.000000,
          "initialized": 5,
        }
        "bus_errors": 65,
        "transactions": 13,
        "bytes_sent": 0,
      }

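The recursive shape of such a dump can be sketched with ordinary containers.
In the sketch below, ``SketchGroup`` and ``DumpTo`` are hypothetical names, the
tree uses ``std::vector`` instead of intrusive lists, and the output goes to a
string rather than ``pw_log``; it only mirrors the nesting and indentation of
the output shown for ``Dump()``.

.. code:: cpp

  #include <cstddef>
  #include <cstdint>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical tree type for illustration; not the real pw::metric::Group.
  struct SketchGroup {
    std::string name;
    std::vector<std::pair<std::string, uint32_t>> metrics;
    std::vector<SketchGroup> children;
  };

  // Appends one line per group and metric, indenting two spaces per level.
  // Child groups are written before the group's own metrics.
  void DumpTo(const SketchGroup& group, std::string& out,
              int indent_level = 0) {
    const std::string pad(static_cast<std::size_t>(indent_level) * 2, ' ');
    out += pad + "\"" + group.name + "\": {\n";
    for (const SketchGroup& child : group.children) {
      DumpTo(child, out, indent_level + 1);
    }
    for (const auto& [name, value] : group.metrics) {
      out += pad + "  \"" + name + "\": " + std::to_string(value) + ",\n";
    }
    out += pad + "}\n";
  }

Passing ``indent_level + 1`` to each child is all that is needed to produce
the nested layout.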
Macros
------
The **macros are the primary mechanism for creating metrics**, and should be
used instead of directly constructing metrics or groups. The macros handle
tokenizing the metric and group names.

.. cpp:function:: PW_METRIC(identifier, name, value)
.. cpp:function:: PW_METRIC(group, identifier, name, value)
.. cpp:function:: PW_METRIC_STATIC(identifier, name, value)
.. cpp:function:: PW_METRIC_STATIC(group, identifier, name, value)

  Declare a metric, optionally adding it to a group.

  - **identifier** - An identifier name for the created variable or member.
    For example: ``i2c_transactions`` might be used as a local or global
    metric; inside a class, it could be named according to members
    (``i2c_transactions_`` for Google's C++ style).
  - **name** - The string name for the metric. This will be tokenized. There
    are no restrictions on the contents of the name; however, consider
    restricting these to be valid C++ identifiers to ease integration with
    other systems.
  - **value** - The initial value for the metric. Must be either a floating
    point value (e.g. ``3.2f``) or unsigned int (e.g. ``21u``).
  - **group** - A ``pw::metric::Group`` instance. If provided, the metric is
    added to the given group.

  The macro declares a variable or member named ``identifier`` with type
  ``pw::metric::Metric``, and works in three contexts: global, local, and
  member.

  If the ``_STATIC`` variant is used, the macro declares a variable with static
  storage. These can be used in function scopes, but not in classes.

  1. At global scope:

    .. code::

      // Unsigned initial value, so Increment() is valid on this metric.
      PW_METRIC(foo, "foo", 0u);

      void MyFunc() {
        foo.Increment();
      }

  2. At local function or member function scope:

    .. code::

      void MyFunc() {
        PW_METRIC(foo, "foo", 0u);
        foo.Increment();
        // foo goes out of scope here; be careful!
      }

  3. At member level inside a class or struct:

    .. code::

      struct MyStructy {
        void DoSomething() {
          somethings.Increment();
        }
        // Every instance of MyStructy will have a separate somethings counter.
        PW_METRIC(somethings, "somethings", 0u);
      };

  You can also put a metric into a group with the macro. A metric can belong to
  at most one group; otherwise an assertion will fail. Example:

  .. code::

    PW_METRIC_GROUP(my_group, "my_group");
    PW_METRIC(my_group, foo, "foo", 0.2f);
    PW_METRIC(my_group, bar, "bar", 44000u);
    PW_METRIC(my_group, zap, "zap", 3.14f);

  .. tip::

    If you want a globally registered metric, see ``pw_metric/global.h``; in
    that context, metrics are globally registered without the need to
    centrally register in a single place.

.. cpp:function:: PW_METRIC_GROUP(identifier, name)
.. cpp:function:: PW_METRIC_GROUP(parent_group, identifier, name)
.. cpp:function:: PW_METRIC_GROUP_STATIC(identifier, name)
.. cpp:function:: PW_METRIC_GROUP_STATIC(parent_group, identifier, name)

  Declares a ``pw::metric::Group`` with the given name; the name is tokenized.
  Works similarly to ``PW_METRIC`` and can be used in the same contexts
  (global, local, and member). Optionally, the group can be added to a parent
  group.

  If the ``_STATIC`` variant is used, the macro declares a variable with static
  storage. These can be used in function scopes, but not in classes.

  Example:

  .. code::

    PW_METRIC_GROUP(my_group, "my_group");
    PW_METRIC(my_group, foo, "foo", 0.2f);
    PW_METRIC(my_group, bar, "bar", 44000u);
    PW_METRIC(my_group, zap, "zap", 3.14f);

.. cpp:function:: PW_METRIC_GLOBAL(identifier, name, value)

  Declare a ``pw::metric::Metric`` with the given name, and register it in the
  global metrics list ``pw::metric::global_metrics``.

  Example:

  .. code::

    #include "pw_metric/metric.h"
    #include "pw_metric/global.h"

    // No need to coordinate collection of foo and bar; they're autoregistered.
    PW_METRIC_GLOBAL(foo, "foo", 0.2f);
    PW_METRIC_GLOBAL(bar, "bar", 44000u);

  Note that metrics defined with ``PW_METRIC_GLOBAL`` should never be added to
  groups defined with ``PW_METRIC_GROUP_GLOBAL``. Each metric can only belong
  to one group, and metrics defined with ``PW_METRIC_GLOBAL`` are
  pre-registered with the global metrics list.

  .. attention::

    Do not create ``PW_METRIC_GLOBAL`` instances anywhere other than global
    scope. Putting these on an instance (member context) would lead to dangling
    pointers and misery. Metrics are never deleted or unregistered!

.. cpp:function:: PW_METRIC_GROUP_GLOBAL(identifier, name)

  Declare a ``pw::metric::Group`` with the given name, and register it in the
  global metric groups list ``pw::metric::global_groups``.

  Note that metrics created with ``PW_METRIC_GLOBAL`` should never be added to
  groups! Instead, just create a freestanding metric and register it into the
  global group (like in the example below).

  Example:

  .. code::

    #include "pw_metric/metric.h"
    #include "pw_metric/global.h"

    // No need to coordinate collection of this group; it's globally registered.
    PW_METRIC_GROUP_GLOBAL(legacy_system, "legacy_system");
    PW_METRIC(legacy_system, foo, "foo", 0.2f);
    PW_METRIC(legacy_system, bar, "bar", 44000u);

  .. attention::

    Do not create ``PW_METRIC_GROUP_GLOBAL`` instances anywhere other than
    global scope. Putting these on an instance (member context) would lead to
    dangling pointers and misery. Metrics are never deleted or unregistered!

----------------------
Usage & Best Practices
----------------------
This library makes several tradeoffs to enable low memory use per-metric, and
one of those tradeoffs results in requiring care in constructing the metric
trees.

Use the Init() pattern for static objects with metrics
------------------------------------------------------
A common pattern in embedded systems is to allocate many objects globally, and
reduce reliance on dynamic allocation (or eschew malloc entirely). This leads
to a pattern where rich/large objects are statically constructed at global
scope, then interacted with via tasks or threads. For example, consider a
hypothetical global ``Uart`` object:

.. code::

  class Uart {
   public:
    Uart(std::span<std::byte> rx_buffer, std::span<std::byte> tx_buffer)
        : rx_buffer_(rx_buffer), tx_buffer_(tx_buffer) {}

    // Send/receive here...

   private:
    std::span<std::byte> rx_buffer_;
    std::span<std::byte> tx_buffer_;
  };

  std::array<std::byte, 512> uart_rx_buffer;
  std::array<std::byte, 512> uart_tx_buffer;
  Uart uart1(uart_rx_buffer, uart_tx_buffer);

Through the course of building a product, the team may want to add metrics to
the UART, for example to gain insight into which operations are triggering lots
of data transfer. When adding metrics to the above imaginary UART object, one
might consider the following approach:

.. code::

  class Uart {
   public:
    Uart(std::span<std::byte> rx_buffer,
         std::span<std::byte> tx_buffer,
         Group& parent_metrics)
        : rx_buffer_(rx_buffer),
          tx_buffer_(tx_buffer) {
      // PROBLEM! parent_metrics may not be constructed if it's a reference
      // to a static global.
      parent_metrics.Add(tx_bytes_);
      parent_metrics.Add(rx_bytes_);
    }

    // Send/receive here which increment tx/rx_bytes.

   private:
    std::span<std::byte> rx_buffer_;
    std::span<std::byte> tx_buffer_;

    // Initial values must be unsigned (0u) or float.
    PW_METRIC(tx_bytes_, "tx_bytes", 0u);
    PW_METRIC(rx_bytes_, "rx_bytes", 0u);
  };

  PW_METRIC_GROUP(root_metrics, "/");
  PW_METRIC_GROUP(root_metrics, uart1_metrics, "uart1");

  std::array<std::byte, 512> uart_rx_buffer;
  std::array<std::byte, 512> uart_tx_buffer;
  Uart uart1(uart_rx_buffer,
             uart_tx_buffer,
             uart1_metrics);

However, this **is incorrect**, since ``parent_metrics`` (a reference to
``uart1_metrics`` in this case) may not be constructed at the point where
``uart1`` is constructed. Thankfully, in the case of ``pw_metric`` this will
result in an assertion failure (or it will work correctly if the constructors
happen to be called in a favorable order), so the problem will not go
unnoticed.

Instead, consider using the ``Init()`` pattern for static objects. With this
pattern, a constructor may store references to its dependencies but must not
call any methods on them. Global object construction is separated into two
phases: the constructor, where references are stored, and an ``Init()``
function, which is called after all static constructors have run. This approach
works correctly, even when the objects are allocated globally:

.. code::

  class Uart {
   public:
    // Note that metrics is not passed in here at all.
    Uart(std::span<std::byte> rx_buffer,
         std::span<std::byte> tx_buffer)
        : rx_buffer_(rx_buffer),
          tx_buffer_(tx_buffer) {}

    // Precondition: parent_metrics is already constructed.
    void Init(Group& parent_metrics) {
      parent_metrics.Add(tx_bytes_);
      parent_metrics.Add(rx_bytes_);
    }

    // Send/receive here which increment tx/rx_bytes.

   private:
    std::span<std::byte> rx_buffer_;
    std::span<std::byte> tx_buffer_;

    PW_METRIC(tx_bytes_, "tx_bytes", 0u);
    PW_METRIC(rx_bytes_, "rx_bytes", 0u);
  };

  PW_METRIC_GROUP(root_metrics, "/");
  PW_METRIC_GROUP(root_metrics, uart1_metrics, "uart1");

  std::array<std::byte, 512> uart_rx_buffer;
  std::array<std::byte, 512> uart_tx_buffer;
  Uart uart1(uart_rx_buffer,
             uart_tx_buffer);

  int main() {
    // uart1_metrics is guaranteed to be initialized by this point, so it is
    // safe to pass it to Init().
    uart1.Init(uart1_metrics);
  }

.. attention::

  Be extra careful about **static global metric registration**. Consider using
  the ``Init()`` pattern.

Metric member order matters in objects
--------------------------------------
The order of declaring in-class groups and metrics matters if the metrics are
within a group declared inside the class. For example, the following class will
work fine:

.. code::

  #include "pw_metric/metric.h"

  class PowerSubsystem {
   public:
    Group& metrics() { return metrics_; }
    const Group& metrics() const { return metrics_; }

   private:
    PW_METRIC_GROUP(metrics_, "power");  // Note metrics_ declared first.
    PW_METRIC(metrics_, foo, "foo", 0.2f);
    PW_METRIC(metrics_, bar, "bar", 44000u);
  };

but the following one will not since the group is constructed after the metrics
(and will result in a compile error):

.. code::

  #include "pw_metric/metric.h"

  class PowerSubsystem {
   public:
    Group& metrics() { return metrics_; }
    const Group& metrics() const { return metrics_; }

   private:
    PW_METRIC(metrics_, foo, "foo", 0.2f);
    PW_METRIC(metrics_, bar, "bar", 44000u);
    PW_METRIC_GROUP(metrics_, "power");  // Error: metrics_ must be first.
  };

.. attention::

  Put **groups before metrics** when declaring metrics members inside classes.

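The ordering requirement follows from a general C++ rule: non-static data
members are initialized in the order they are declared. The sketch below
illustrates that rule with hypothetical stand-in types (``SketchGroup``,
``SketchMetric``, and ``PowerSketch`` are not part of ``pw_metric``, and the
real macros expand to different code).

.. code:: cpp

  #include <cstddef>

  // Hypothetical stand-in for a group that counts registered metrics.
  struct SketchGroup {
    std::size_t registered = 0;
  };

  // Hypothetical stand-in for a metric that registers itself on construction,
  // so the group it is given must already be constructed.
  struct SketchMetric {
    explicit SketchMetric(SketchGroup& group) { ++group.registered; }
  };

  class PowerSketch {
   public:
    std::size_t registered() const { return metrics_.registered; }

   private:
    // Members are initialized in declaration order: the group first, then
    // the metrics that register with it.
    SketchGroup metrics_;
    SketchMetric foo_{metrics_};
    SketchMetric bar_{metrics_};
  };

If ``metrics_`` were declared after ``foo_`` and ``bar_`` in this sketch, the
metric constructors would touch a group that had not been initialized yet.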
Thread safety
-------------
``pw_metric`` has **no built-in synchronization for manipulating the tree**
structure. Users are expected to either rely on a shared global mutex when
constructing the metric tree, or do the metric construction in a single thread
(e.g. a boot/init thread). The same applies for destruction, though we do not
advise destructing metrics or groups.

Individual metrics have atomic ``Increment()``, ``Set()``, and the value
accessors ``as_float()`` and ``as_int()`` which don't require separate
synchronization, and can be used from ISRs.

.. attention::

  **You must synchronize construction of the metric tree**. ``pw_metric`` does
  not internally synchronize access during construction. Metric
  ``Set()``/``Increment()`` calls are safe.

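As a rough illustration of why a 32-bit value can be updated without a lock,
the sketch below wraps a ``std::atomic<uint32_t>``. ``SketchCounter`` is a
hypothetical class, not the ``pw_metric`` implementation, and whether the
atomic is actually lock-free depends on the target, which is an assumption
to verify for your platform.

.. code:: cpp

  #include <atomic>
  #include <cstdint>

  // Hypothetical counter for illustration; not pw_metric's implementation.
  class SketchCounter {
   public:
    void Increment(uint32_t amount = 1) {
      // Relaxed ordering: only the counter itself must stay consistent.
      value_.fetch_add(amount, std::memory_order_relaxed);
    }

    uint32_t value() const { return value_.load(std::memory_order_relaxed); }

   private:
    std::atomic<uint32_t> value_{0};
  };

Note that this covers only updates to a single value; it does nothing to
protect a linked tree of metrics while nodes are being added.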
Lifecycle
---------
Metric objects are not designed to be destructed, and are expected to live for
the lifetime of the program or application. If you need dynamic
creation/destruction of metrics, ``pw_metric`` does not attempt to cover that
use case. Instead, ``pw_metric`` covers the case of products with two execution
phases:

1. A boot phase where the metric tree is created.
2. A run phase where metrics are collected. The tree structure is fixed.

Technically, it is possible to destruct metrics provided care is taken to
remove the given metric (or group) from the list it's contained in. However,
there are no helper functions for this, so be careful.

Below is an example that **is incorrect**. Don't do what follows!

.. code::

  #include "pw_metric/metric.h"

  int main() {
    PW_METRIC_GROUP(root, "/");
    {
      // BAD! The metrics have a different lifetime than the group.
      PW_METRIC(root, temperature, "temperature_f", 72.3f);
      PW_METRIC(root, humidity, "humidity_relative_percent", 33.2f);
    }
    // OOPS! root now has a linked list that points to the destructed
    // "humidity" object.
  }

.. attention::

  **Don't destruct metrics**. Metrics are designed to be registered /
  structured upfront, then manipulated during a device's active phase. They do
  not support destruction.

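To show what "removing a metric from the list it's contained in" would
involve, here is a minimal sketch of unlinking a node from a singly linked
intrusive list. ``SketchNode`` and ``Unlink`` are hypothetical; ``pw_metric``
provides no such helper, and this is not a recommendation to destruct metrics.

.. code:: cpp

  // Hypothetical intrusive list node; pw_metric has no removal helper.
  struct SketchNode {
    int id;
    SketchNode* next = nullptr;
  };

  // Unlinks `target` from the list starting at `head`.
  // Returns true if the node was found and removed.
  bool Unlink(SketchNode*& head, SketchNode& target) {
    // Walk the chain of `next` pointers, starting with the head pointer.
    for (SketchNode** link = &head; *link != nullptr;
         link = &(*link)->next) {
      if (*link == &target) {
        *link = target.next;    // Bypass the node being removed.
        target.next = nullptr;  // Leave no stale pointer behind.
        return true;
      }
    }
    return false;
  }

The walk is linear in the list length, and nothing in this sketch guards
against another thread traversing the list at the same time.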
-----------------
Exporting metrics
-----------------
Collecting metrics on a device is not useful without a mechanism to export
those metrics for analysis and debugging. ``pw_metric`` offers an optional RPC
service library (``:metric_service_nanopb``) that enables exporting a
user-supplied set of on-device metrics via RPC. This facility is intended to
function from the early stages of device bringup through production in the
field.

The metrics are fetched by calling the ``MetricService.Get`` RPC method, which
streams all registered metrics to the caller in batches (server streaming RPC).
Batching the returned metrics avoids requiring a large buffer or large RPC MTU.

The returned metric objects have flattened paths to the root. For example, the
returned metrics (post detokenization and jsonified) might look something like:

.. code:: none

  {
    "/i2c1/failed_txns": 17,
    "/i2c1/total_txns": 2013,
    "/i2c1/gyro/resets": 24,
    "/i2c1/gyro/hangs": 1,
    "/spi1/thermocouple/reads": 242,
    "/spi1/thermocouple/temp_celsius": 34.52,
  }

Note that there is no nesting of the groups; the nesting is implied from the
path.

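The flattening described above amounts to a depth-first walk that carries the
path so far. The following sketch shows the idea with hypothetical types
(``SketchGroup``, ``FlatMetric``, and ``Flatten`` are illustrative names); the
real service works on tokens and protobuf messages rather than strings.

.. code:: cpp

  #include <cstdint>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical tree type for illustration; not the real pw::metric::Group.
  struct SketchGroup {
    std::string name;
    std::vector<std::pair<std::string, uint32_t>> metrics;
    std::vector<SketchGroup> children;
  };

  using FlatMetric = std::pair<std::string, uint32_t>;

  // Walks the tree depth-first and records each metric under its full path.
  void Flatten(const SketchGroup& group, const std::string& prefix,
               std::vector<FlatMetric>& out) {
    const std::string path = prefix + "/" + group.name;
    for (const auto& [name, value] : group.metrics) {
      out.emplace_back(path + "/" + name, value);
    }
    for (const SketchGroup& child : group.children) {
      Flatten(child, path, out);
    }
  }

Each output entry is self-describing, so a receiver can process entries one
at a time without tracking which group it is currently inside.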
| 662 | RPC service setup |
| 663 | ----------------- |
| 664 | To expose a ``MetricService`` in your application, do the following: |
| 665 | |
| 666 | 1. Define metrics around the system, and put them in a group or list of |
| 667 | metrics. Easy choices include for example the ``global_groups`` and |
| 668 | ``global_metrics`` variables; or creat your own. |
| 669 | 2. Create an instance of ``pw::metric::MetricService``. |
| 670 | 3. Register the service with your RPC server. |
| 671 | |
| 672 | For example: |
| 673 | |
| 674 | .. code:: |
| 675 | |
| 676 | #include "pw_rpc/server.h" |
| 677 | #include "pw_metric/metric.h" |
| 678 | #include "pw_metric/global.h" |
| 679 | #include "pw_metric/metric_service_nanopb.h" |
| 680 | |
| 681 | // Note: You must customize the RPC server setup; see pw_rpc. |
| 682 | Channel channels[] = { |
| 683 | Channel::Create<1>(&uart_output), |
| 684 | }; |
| 685 | Server server(channels); |
| 686 | |
| 687 | // Metric service instance, pointing to the global metric objects. |
| 688 | // This could also point to custom per-product or application objects. |
| 689 | pw::metric::MetricService metric_service( |
| 690 | pw::metric::global_metrics, |
| 691 | pw::metric::global_groups); |
| 692 | |
| 693 | void RegisterServices() { |
| 694 | server.RegisterService(metric_service); |
| 695 | // Register other services here. |
| 696 | } |
| 697 | |
| 698 | void main() { |
| 699 | // ... system initialization ... |
| 700 | |
| 701 | RegisterServices(); |
| 702 | |
| 703 | // ... start your applcation ... |
| 704 | } |
| 705 | |
| 706 | .. attention:: |
| 707 | |
| 708 | Take care when exporting metrics. Ensure **appropriate access control** is in |
| 709 | place. In some cases it may make sense to entirely disable metrics export for |
| 710 | production builds. Although reading metrics via RPC won't influence the |
| 711 | device, in some cases the metrics could expose sensitive information if |
| 712 | product owners are not careful. |
| 713 | |
| 714 | .. attention:: |
| 715 | |
| 716 | **MetricService::Get is a synchronous RPC method** |
| 717 | |
| 718 | Calls to ``MetricService::Get`` are blocking and will send all metrics |
| 719 | immediately, even though it is a server-streaming RPC. This will work fine if |
| 720 | the device doesn't have too many metrics, or doesn't have concurrent RPCs like |
| 721 | logging, but could be a problem in some cases. |
| 722 | |
| 723 | We plan to offer an async version where the application is responsible for |
| 724 | pumping the metrics into the streaming response. This gives flow control to |
| 725 | the application. |
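
A rough sketch of what application-driven pumping could look like: metrics are sent in bounded batches, so other RPC traffic can run between calls. ``MetricSample``, ``MetricPump``, and ``PumpOnce`` are hypothetical names used for illustration only; they are not part of ``pw_metric`` or ``pw_rpc``. ``std::vector`` and ``std::function`` are used purely for brevity and may not suit a constrained target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one exported metric; not the pw_metric wire format.
struct MetricSample {
  uint32_t token;
  uint32_t value;
};

// Hypothetical helper: walks a snapshot of metrics in bounded batches so the
// caller decides when each batch is sent.
class MetricPump {
 public:
  using Sink = std::function<void(const MetricSample*, size_t)>;

  MetricPump(std::vector<MetricSample> samples, size_t max_batch)
      : samples_(std::move(samples)), max_batch_(max_batch) {
    assert(max_batch_ > 0);
  }

  // Sends at most max_batch samples to the sink. Returns true while samples
  // remain, so the caller can interleave other work between calls.
  bool PumpOnce(const Sink& sink) {
    const size_t remaining = samples_.size() - next_;
    const size_t count = remaining < max_batch_ ? remaining : max_batch_;
    if (count > 0) {
      sink(samples_.data() + next_, count);
      next_ += count;
    }
    return next_ < samples_.size();
  }

 private:
  std::vector<MetricSample> samples_;
  size_t max_batch_;
  size_t next_ = 0;
};
```

In this sketch the application would call ``PumpOnce`` from its own loop or work queue until it returns ``false``, which is one way to give the application the flow control described above.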
| 726 | |
| 727 | ----------- |
| 728 | Size report |
| 729 | ----------- |
| 730 | The size report below shows the cost in code and memory for a few examples of |
| 731 | metrics. This does not include the RPC service. |
| 732 | |
| 733 | .. include:: metric_size_report |
| 734 | |
| 735 | .. attention:: |
| 736 | |
| 737 | At time of writing, **the above sizes show an unexpectedly large flash |
| 738 | impact**. We are investigating why GCC is inserting large global static |
| 739 | constructors per group, when all the logic should be reused across objects. |
| 740 | |
| 741 | ---------------- |
| 742 | Design tradeoffs |
| 743 | ---------------- |
| 744 | There are many possible approaches to metrics collection and aggregation. We've |
| 745 | chosen some points on the tradeoff curve: |
| 746 | |
| 747 | - **Atomic-sized metrics** - Using simple metric objects with just uint32/float |
| 748 | enables atomic operations. While it might be nice to support larger types, it |
| 749 | is more useful to safely increment metrics from interrupt service routines. |
| 750 | |
| 751 | - **No aggregate metrics (yet)** - Aggregate metrics (e.g. average, max, min, |
| 752 | histograms) are not supported, and must be built on top of the simple base |
| 753 | metrics. By taking this route, we can considerably simplify the core metrics |
| 754 | system and have aggregation logic in separate modules. Those modules can then |
| 755 | feed into the metrics system, for example by creating multiple metrics for a |
| 756 | single underlying metric, such as "foo", "foo_max", "foo_min" and so on. |
| 757 | |
| 758 | The other problem with automatic aggregation is that the period over which |
| 759 | the aggregation happens is often important, and it can be hard to design |
| 760 | this cleanly into the API. Instead, this responsibility is pushed to the user, |
| 761 | who must take more care. |
| 762 | |
| 763 | Note that we will add helpers for aggregated metrics. |
| 764 | |
| 765 | - **No virtual metrics** - An alternate approach to the concrete Metric class |
| 766 | in the current module is to have a virtual interface for metrics, and then |
| 767 | allow those metrics to have their own storage. This is attractive but can |
| 768 | lead to many vtables and excess memory use in simple one-metric use cases. |
| 769 | |
| 770 | - **Linked list registration** - Using linked lists for registration is a |
| 771 | tradeoff, accepting some memory overhead in exchange for flexibility. Other |
| 772 | alternatives include a global table of metrics, which has the disadvantage of |
| 773 | requiring centralizing the metrics -- an impossibility for middleware like |
| 774 | Pigweed. |
| 775 | |
| 776 | - **Synchronization** - The only synchronization guarantee provided by |
| 777 | ``pw_metric`` is that increment and set are atomic. Other than that, users are |
| 778 | on their own to synchronize metric collection and updating. |
| 779 | |
| 780 | - **No fast metric lookup** - The current design does not make it fast to |
| 781 | look up a metric at runtime; instead, one must run a linear search of the tree |
| 782 | to find the matching metric. In most non-dynamic use cases, this is fine in |
| 783 | practice, and avoids the need for a more involved hash table. Metric updates |
| 784 | happen through direct member or variable accesses. |
| 785 | |
| 786 | - **Relying on C++ static initialization** - In short, the convenience |
| 787 | outweighs the cost and risk. Without static initializers, it would be |
| 788 | impossible to automatically collect the metrics without post-processing the |
| 789 | C++ code to find the metrics, a huge undertaking of debatable value. We |
| 790 | have carefully analyzed the static initializer behaviour of Pigweed's |
| 791 | IntrusiveList and are confident it is correct. |
| 792 | |
| 793 | - **Both local & global support** - Potentially just one approach (the local or |
| 794 | global one) could be offered, making the module less complex. However, we |
| 795 | feel the additional complexity is worthwhile since there are legitimate use |
| 796 | cases for both, e.g. ``PW_METRIC`` and ``PW_METRIC_GLOBAL``. We'd prefer to |
| 797 | have a well-tested upstream solution for these use cases rather than have |
| 798 | customers re-implement one of these. |
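
The following sketch illustrates several of the tradeoffs above in miniature: a 32-bit atomic value, registration through an intrusive linked list, lookup by linear search, and min/max aggregation built on top of plain metrics. All names (``SimpleMetric``, ``FindMetric``, ``MinMaxTracker``) are hypothetical and are not the ``pw_metric`` API. Whether ``std::atomic<uint32_t>`` is lock-free depends on the target and should be checked per platform.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical metric type: one token, one 32-bit atomic value, one link.
class SimpleMetric {
 public:
  // Registers the metric by pushing it onto the front of an intrusive singly
  // linked list. Registration itself is not synchronized in this sketch.
  SimpleMetric(uint32_t token, SimpleMetric*& list_head)
      : token_(token), next_(list_head) {
    list_head = this;
  }

  // Increment and set are the only operations that are atomic.
  void Increment(uint32_t amount = 1) {
    value_.fetch_add(amount, std::memory_order_relaxed);
  }
  void Set(uint32_t value) { value_.store(value, std::memory_order_relaxed); }
  uint32_t value() const { return value_.load(std::memory_order_relaxed); }

  uint32_t token() const { return token_; }
  SimpleMetric* next() const { return next_; }

 private:
  uint32_t token_;
  SimpleMetric* next_;
  std::atomic<uint32_t> value_{0};
};

// Lookup is a linear walk of the list; there is no hash table.
inline SimpleMetric* FindMetric(SimpleMetric* head, uint32_t token) {
  for (SimpleMetric* metric = head; metric != nullptr; metric = metric->next()) {
    if (metric->token() == token) {
      return metric;
    }
  }
  return nullptr;
}

// Aggregation layered on top of two plain metrics ("foo_min", "foo_max").
class MinMaxTracker {
 public:
  MinMaxTracker(uint32_t min_token, uint32_t max_token, SimpleMetric*& head)
      : min_(min_token, head), max_(max_token, head) {
    assert(min_token != max_token);
  }

  // Not atomic as a whole: callers must serialize calls to Record().
  void Record(uint32_t sample) {
    if (!seen_ || sample < min_.value()) min_.Set(sample);
    if (!seen_ || sample > max_.value()) max_.Set(sample);
    seen_ = true;
  }

 private:
  SimpleMetric min_;
  SimpleMetric max_;
  bool seen_ = false;
};
```

Hot paths update a metric through the object they already hold (for example ``writes.Increment()``), so the linear search only matters for export or debugging.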
| 799 | |
| 800 | ---------------- |
| 801 | Roadmap & Status |
| 802 | ---------------- |
| 803 | - **String metric names** - ``pw_metric`` stores metric names as tokens. On one |
| 804 | hand, this is great for production where having a compact binary is often a |
| 805 | requirement to fit the application in the given part. However, in early |
| 806 | development before flash is a constraint, string names are more convenient to |
| 807 | work with since there is no need for host-side detokenization. We plan to add |
| 808 | optional support for string names. |
| 809 | |
| 810 | - **Aggregate metrics** - We plan to add support for aggregate metrics on top |
| 811 | of the simple metric mechanism, either as another module or as additional |
| 812 | functionality inside this one. Likely examples include min/max. |
| 813 | |
| 814 | - **Selectively enable or disable metrics** - Currently the metrics are always |
| 815 | enabled once included. In practice this is not ideal since often only a |
| 816 | few metrics are wanted in production, but having to strip all the metrics |
| 817 | code is error prone. Instead, we will add support for controlling what |
| 818 | metrics are enabled or disabled at compile time. This may rely on C++20's |
| 819 | support for zero-sized members to fully remove the cost. |
| 820 | |
| 821 | - **Async RPC** - The current RPC service exports the metrics by streaming |
| 822 | them to the client in batches. However, the current solution streams all the |
| 823 | metrics to completion; this may block the RPC thread. In the future we will |
| 824 | have an async solution where the user is in control of flow priority. |
| 825 | |
| 826 | - **Timer integration** - We would like to add a stopwatch type mechanism to |
| 827 | time multiple in-flight events. |
| 828 | |
| 829 | - **C support** - In practice it's often useful or necessary to instrument |
| 830 | C-only code. While it will be impossible to support the global registration |
| 831 | system that the C++ version supports, we will figure out a solution to make |
| 832 | instrumenting C code relatively smooth. |
| 833 | |
| 834 | - **Global counter** - We may add a global metric counter to help detect cases |
| 835 | where post-initialization metrics manipulations are done. |
| 836 | |
| 837 | - **Proto structure** - It may be possible to directly map metrics to a custom |
| 838 | proto structure, where instead of a name or token field, a tag field is |
| 839 | provided. This could result in elegant export to an easily machine parsable |
| 840 | and compact representation on the host. We may investigate this in the |
| 841 | future. |
| 842 | |
| 843 | - **Safer data structures** - At a cost of 4B per metric and 4B per group, it |
| 844 | may be possible to make metric structure instantiation safe even in static |
| 845 | constructors, and also make it safe to remove metrics dynamically. We will |
| 846 | consider whether this tradeoff is the right one, since a 4B cost per metric |
| 847 | is substantial on projects with many metrics. |
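
As an illustration of the compile-time enable/disable idea above, here is a minimal sketch that selects between a real counter and an empty no-op counter with a template parameter. ``OptionalCounter``, ``I2cBus``, and ``SKETCH_ENABLE_DEBUG_METRICS`` are hypothetical names, not part of ``pw_metric``, and the counter is deliberately non-atomic to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical counter that is either real or compiled down to a no-op.
template <bool kEnabled>
class OptionalCounter;

template <>
class OptionalCounter<true> {
 public:
  void Increment() { ++value_; }
  uint32_t value() const { return value_; }

 private:
  uint32_t value_ = 0;
};

template <>
class OptionalCounter<false> {
 public:
  void Increment() {}
  uint32_t value() const { return 0; }
};

// Hypothetical build-time switch; a real project would likely set this from
// its build system.
#ifndef SKETCH_ENABLE_DEBUG_METRICS
#define SKETCH_ENABLE_DEBUG_METRICS 0
#endif

struct I2cBus {
  OptionalCounter<true> writes;                               // Always on.
  OptionalCounter<(SKETCH_ENABLE_DEBUG_METRICS != 0)> retries;  // Debug only.

  void Write(bool retried) {
    writes.Increment();
    if (retried) {
      retries.Increment();
    }
  }
};

// The disabled specialization carries no state of its own.
static_assert(std::is_empty<OptionalCounter<false>>::value,
              "disabled counter should be an empty class");

// Example use: two writes, one of which was retried.
inline uint32_t ExampleWriteCount() {
  I2cBus bus;
  bus.Write(false);
  bus.Write(true);
  assert(bus.writes.value() == 2);
  return bus.writes.value();
}
```

Even though the disabled class is empty, an ordinary data member of that type still occupies at least one byte of the enclosing object. Removing that last byte is where C++20's ``[[no_unique_address]]`` attribute (or the empty base optimization) would come in.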