.. _module-pw_metric:

=========
pw_metric
=========

.. attention::

  This module is **not yet production ready**; ask us if you are interested in
  trying it out or have ideas about how to improve it.

--------
Overview
--------
Pigweed's metric module is a **lightweight manual instrumentation system** for
tracking system health metrics like counts or set values. For example,
``pw_metric`` could help with tracking the number of I2C bus writes, or the
number of times a buffer was filled before it could drain in time, or safely
incrementing counters from ISRs.

Key features of ``pw_metric``:

- **Tokenized names** - Names are tokenized using ``pw_tokenizer``, enabling
  long metric names that don't bloat your binary.

- **Tree structure** - Metrics can form a tree, enabling grouping of related
  metrics for clearer organization.

- **Per object collection** - Metrics and groups can live on object instances
  and be flexibly combined with metrics from other instances.

- **Global registration** - For legacy code bases or just because it's easier,
  ``pw_metric`` supports automatic aggregation of metrics. This is optional but
  convenient in many cases.

- **Simple design** - There are only two core data structures: ``Metric`` and
  ``Group``, which are both simple to understand and use. The only metric
  types supported are ``uint32_t`` and ``float``. This module does not support
  complicated aggregations like running average or min/max.

Example: Instrumenting a single object
--------------------------------------
The example below illustrates what instrumenting a class with a metric group
and metrics might look like. In this case, the object's
``MySubsystem::metrics()`` member is not globally registered; the user is on
their own for combining this subsystem's metrics with others.

.. code::

  #include "pw_metric/metric.h"

  class MySubsystem {
   public:
    void DoSomething() {
      attempts_.Increment();
      if (ActionSucceeds()) {
        successes_.Increment();
      }
    }
    Group& metrics() { return metrics_; }

   private:
    PW_METRIC_GROUP(metrics_, "my_subsystem");
    PW_METRIC(metrics_, attempts_, "attempts", 0u);
    PW_METRIC(metrics_, successes_, "successes", 0u);
  };

The metrics subsystem has no canonical output format at this time, but a JSON
dump might look something like this:

.. code:: none

  {
    "my_subsystem" : {
      "successes" : 1000,
      "attempts" : 1200
    }
  }

In this case, every instance of ``MySubsystem`` will have unique counters.

Example: Instrumenting a legacy codebase
----------------------------------------
A common situation in embedded development is **debugging legacy code** or code
which is hard to change, where it is perhaps impossible to plumb metrics
objects around with dependency injection. The alternative to plumbing metrics
is to register the metrics through a global mechanism. ``pw_metric`` supports
this use case. For example:

**Before instrumenting:**

.. code::

  // This code was passed down from generations of developers before; no one
  // knows what it does or how it works. But it needs to be fixed!
  void OldCodeThatDoesntWorkButWeDontKnowWhy() {
    if (some_variable) {
      DoSomething();
    } else {
      DoSomethingElse();
    }
  }

**After instrumenting:**

.. code::

  #include "pw_metric/global.h"
  #include "pw_metric/metric.h"

  // PW_METRIC_GLOBAL takes an identifier, a name, and an initial value.
  PW_METRIC_GLOBAL(legacy_do_something, "legacy_do_something", 0u);
  PW_METRIC_GLOBAL(legacy_do_something_else, "legacy_do_something_else", 0u);

  // This code was passed down from generations of developers before; no one
  // knows what it does or how it works. But it needs to be fixed!
  void OldCodeThatDoesntWorkButWeDontKnowWhy() {
    if (some_variable) {
      legacy_do_something.Increment();
      DoSomething();
    } else {
      legacy_do_something_else.Increment();
      DoSomethingElse();
    }
  }

In this case, the developer merely had to add the metrics headers, define some
metrics, and then start incrementing them. These metrics will be available
globally through the ``pw::metric::global_metrics`` object defined in
``pw_metric/global.h``.

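One way to picture global registration is an intrusive linked list that each
metric joins from its own constructor. The sketch below is a standalone
illustration under that assumption, not the ``pw_metric`` implementation;
``RegisteredMetric``, ``g_registered_metrics``, and ``CountRegistered`` are
hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a metric; not the real pw::metric::Metric.
struct RegisteredMetric;

// Head of the global intrusive list. A plain pointer set to nullptr is
// constant-initialized before any constructor runs, so pushing onto it from
// static constructors is safe.
RegisteredMetric* g_registered_metrics = nullptr;

struct RegisteredMetric {
  RegisteredMetric(const char* name_in, uint32_t initial)
      : name(name_in), value(initial), next(g_registered_metrics) {
    g_registered_metrics = this;  // Push this metric onto the list front.
  }

  void Increment(uint32_t amount = 1) { value += amount; }

  const char* name;
  uint32_t value;
  RegisteredMetric* next;  // Intrusive link; no separate node allocation.
};

// Two metrics defined at namespace scope; no central table lists them.
RegisteredMetric legacy_do_something("legacy_do_something", 0);
RegisteredMetric legacy_do_something_else("legacy_do_something_else", 0);

// Walks the list, as an exporter might, and counts the registered metrics.
int CountRegistered() {
  int count = 0;
  for (const RegisteredMetric* m = g_registered_metrics; m != nullptr;
       m = m->next) {
    ++count;
  }
  return count;
}
```

Because each object links itself in, code in separate files can define metrics
without editing a shared table. The cost is that an object in such a list must
never be destroyed while the list is still in use.
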
Why not just use simple counter variables?
------------------------------------------
One might wonder what the point of leveraging a metric library is when it is
trivial to make some global variables and print them out. There are a few
reasons:

- **Metrics offload** - To make it easy to get metrics off-device by sharing
  the infrastructure for offloading.

- **Consistent format** - To get the metrics in a consistent format (e.g.
  protobuf or JSON) for analysis.

- **Uncoordinated collection** - To provide a simple and reliable way for
  developers on a team to all collect metrics for their subsystems, without
  having to coordinate to offload. This could extend to code in libraries
  written by other teams.

- **Pre-boot or interrupt visibility** - Some of the most challenging bugs come
  from early system boot when not all system facilities are up (e.g. logging or
  UART). In those cases, metrics provide a low-overhead approach to understand
  what is happening. During early boot, metrics can be incremented; after boot,
  dumping the metrics provides insight into what happened. While basic counter
  variables can work in these contexts too, one still has to deal with the
  offloading problem, which the library handles.

---------------------
Metrics API reference
---------------------

The metrics API consists of just a few components:

- The core data structures ``pw::metric::Metric`` and ``pw::metric::Group``
- The macros for scoped metrics and groups ``PW_METRIC`` and
  ``PW_METRIC_GROUP``
- The macros for globally registered metrics and groups
  ``PW_METRIC_GLOBAL`` and ``PW_METRIC_GROUP_GLOBAL``
- The global groups and metrics list: ``pw::metric::global_groups`` and
  ``pw::metric::global_metrics``.

Metric
------
The ``pw::metric::Metric`` provides:

- A 31-bit tokenized name
- A 1-bit discriminator for int or float
- A 32-bit payload (int or float)
- A 32-bit next pointer (intrusive list)

The metric object is 12 bytes on 32-bit platforms.

.. cpp:class:: pw::metric::Metric

  .. cpp:function:: Increment(uint32_t amount = 1)

    Increment the metric by the given amount. Results in undefined behaviour if
    the metric is not of type int.

  .. cpp:function:: Set(uint32_t value)

    Set the metric to the given value. Results in undefined behaviour if the
    metric is not of type int.

  .. cpp:function:: Set(float value)

    Set the metric to the given value. Results in undefined behaviour if the
    metric is not of type float.

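As a rough illustration of how a 31-bit token and a 1-bit type flag can share
one 32-bit word, consider the standalone sketch below. It is not the actual
``pw::metric::Metric`` layout: ``PackedMetric`` and its members are
hypothetical, and the choice of the top bit for the flag is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical illustration only; the real pw::metric::Metric may differ.
class PackedMetric {
 public:
  PackedMetric(uint32_t token, uint32_t value)
      : name_and_type_(token & kTokenMask), payload_(value) {}

  PackedMetric(uint32_t token, float value)
      : name_and_type_((token & kTokenMask) | kFloatFlag), payload_(0) {
    // Store the float's bit pattern in the shared 32-bit payload.
    std::memcpy(&payload_, &value, sizeof(payload_));
  }

  uint32_t token() const { return name_and_type_ & kTokenMask; }
  bool is_float() const { return (name_and_type_ & kFloatFlag) != 0; }

  // Callers are expected to check is_float() before picking an accessor.
  uint32_t as_int() const { return payload_; }
  float as_float() const {
    float value;
    std::memcpy(&value, &payload_, sizeof(value));
    return value;
  }

 private:
  static constexpr uint32_t kFloatFlag = 0x80000000u;  // Assumed: top bit.
  static constexpr uint32_t kTokenMask = 0x7fffffffu;  // Low 31 bits.

  uint32_t name_and_type_;  // 31-bit token plus 1-bit int/float flag.
  uint32_t payload_;        // Either a uint32_t or the bits of a float.
};

static_assert(sizeof(float) == sizeof(uint32_t), "float must fit the payload");
static_assert(sizeof(PackedMetric) == 8, "token word plus payload word");
```

The sketch leaves out the intrusive next pointer. Adding a 4-byte pointer on a
32-bit target would bring the total to the 12 bytes quoted above.
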
Group
-----
The ``pw::metric::Group`` object is simply:

- A name for the group
- A list of child groups
- A list of leaf metrics
- A 32-bit next pointer (intrusive list)

The group object is 16 bytes on 32-bit platforms.

.. cpp:class:: pw::metric::Group

  .. cpp:function:: Dump(int indent_level = 0)

    Recursively dump a metrics group to ``pw_log``. Produces output like:

    .. code:: none

      "$6doqFw==": {
        "$05OCZw==": {
          "$VpPfzg==": 1,
          "$LGPMBQ==": 1.000000,
          "$+iJvUg==": 5,
        }
        "$9hPNxw==": 65,
        "$oK7HmA==": 13,
        "$FCM4qQ==": 0,
      }

    Note the metric names are tokenized with base64. Decoding requires using
    the Pigweed detokenizer. With a detokenizing-enabled logger, you could get
    something like:

    .. code:: none

      "i2c_1": {
        "gyro": {
          "num_samples": 1,
          "init_time_us": 1.000000,
          "initialized": 5,
        }
        "bus_errors": 65,
        "transactions": 13,
        "bytes_sent": 0,
      }

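The recursive shape of such a dump can be sketched in a few lines. The code
below is a standalone illustration that writes to a string instead of
``pw_log`` and uses plain string names instead of tokens; ``DumpLeaf``,
``DumpGroup``, and ``DumpTo`` are hypothetical and not part of ``pw_metric``.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical tree types for illustration; not the pw_metric classes.
struct DumpLeaf {
  std::string name;
  uint32_t value;
};

struct DumpGroup {
  std::string name;
  std::vector<DumpGroup> children;
  std::vector<DumpLeaf> metrics;
};

// Appends an indented rendering of `group` to `out`, visiting child groups
// first and then the group's own leaf metrics.
void DumpTo(const DumpGroup& group, std::string& out, int indent_level = 0) {
  const std::string pad(static_cast<std::size_t>(indent_level) * 2, ' ');
  out += pad + "\"" + group.name + "\": {\n";
  for (const DumpGroup& child : group.children) {
    DumpTo(child, out, indent_level + 1);
  }
  for (const DumpLeaf& leaf : group.metrics) {
    out += pad + "  \"" + leaf.name + "\": " + std::to_string(leaf.value) +
           ",\n";
  }
  out += pad + "}\n";
}
```

Each level of recursion adds two spaces of padding, which is what the
``indent_level`` parameter in the signature above suggests.
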
Macros
------
The **macros are the primary mechanism for creating metrics**, and should be
used instead of directly constructing metrics or groups. The macros handle
tokenizing the metric and group names.

.. cpp:function:: PW_METRIC(identifier, name, value)
.. cpp:function:: PW_METRIC(group, identifier, name, value)
.. cpp:function:: PW_METRIC_STATIC(identifier, name, value)
.. cpp:function:: PW_METRIC_STATIC(group, identifier, name, value)

  Declare a metric, optionally adding it to a group.

  - **identifier** - An identifier name for the created variable or member.
    For example: ``i2c_transactions`` might be used as a local or global
    metric; inside a class, it could be named according to member conventions
    (``i2c_transactions_`` for Google's C++ style).
  - **name** - The string name for the metric. This will be tokenized. There
    are no restrictions on the contents of the name; however, consider
    restricting these to be valid C++ identifiers to ease integration with
    other systems.
  - **value** - The initial value for the metric. Must be either a floating
    point value (e.g. ``3.2f``) or unsigned int (e.g. ``21u``).
  - **group** - A ``pw::metric::Group`` instance. If provided, the metric is
    added to the given group.

  The macro declares a variable or member named ``identifier`` with type
  ``pw::metric::Metric``, and works in three contexts: global, local, and
  member.

  If the ``_STATIC`` variant is used, the macro declares a variable with static
  storage. These can be used in function scopes, but not in classes.

  1. At global scope:

     .. code::

       // Unsigned initial value, since Increment() requires an int metric.
       PW_METRIC(foo, "foo", 15u);

       void MyFunc() {
         foo.Increment();
       }

  2. At local function or member function scope:

     .. code::

       void MyFunc() {
         PW_METRIC(foo, "foo", 15u);
         foo.Increment();
         // foo goes out of scope here; be careful!
       }

  3. At member level inside a class or struct:

     .. code::

       struct MyStructy {
         void DoSomething() {
           somethings.Increment();
         }
         // Every instance of MyStructy will have a separate somethings counter.
         PW_METRIC(somethings, "somethings", 0u);
       };

  You can also put a metric into a group with the macro. A metric can belong
  to at most one group; otherwise an assertion will fail. Example:

  .. code::

    PW_METRIC_GROUP(my_group, "my_group");
    PW_METRIC(my_group, foo, "foo", 0.2f);
    PW_METRIC(my_group, bar, "bar", 44000u);
    PW_METRIC(my_group, zap, "zap", 3.14f);

  .. tip::

    If you want a globally registered metric, see ``pw_metric/global.h``; in
    that context, metrics are globally registered without the need to
    centrally register in a single place.

.. cpp:function:: PW_METRIC_GROUP(identifier, name)
.. cpp:function:: PW_METRIC_GROUP(parent_group, identifier, name)
.. cpp:function:: PW_METRIC_GROUP_STATIC(identifier, name)
.. cpp:function:: PW_METRIC_GROUP_STATIC(parent_group, identifier, name)

  Declares a ``pw::metric::Group`` with name ``name``; the name is tokenized.
  Works similarly to ``PW_METRIC`` and can be used in the same contexts
  (global, local, and member). Optionally, the group can be added to a parent
  group.

  If the ``_STATIC`` variant is used, the macro declares a variable with static
  storage. These can be used in function scopes, but not in classes.

  Example:

  .. code::

    PW_METRIC_GROUP(my_group, "my_group");
    PW_METRIC(my_group, foo, "foo", 0.2f);
    PW_METRIC(my_group, bar, "bar", 44000u);
    PW_METRIC(my_group, zap, "zap", 3.14f);

.. cpp:function:: PW_METRIC_GLOBAL(identifier, name, value)

  Declare a ``pw::metric::Metric`` with name ``name``, and register it in the
  global metrics list ``pw::metric::global_metrics``.

  Example:

  .. code::

    #include "pw_metric/metric.h"
    #include "pw_metric/global.h"

    // No need to coordinate collection of foo and bar; they're autoregistered.
    PW_METRIC_GLOBAL(foo, "foo", 0.2f);
    PW_METRIC_GLOBAL(bar, "bar", 44000u);

  Note that metrics defined with ``PW_METRIC_GLOBAL`` should never be added to
  groups defined with ``PW_METRIC_GROUP_GLOBAL``. Each metric can only belong
  to one group, and metrics defined with ``PW_METRIC_GLOBAL`` are
  pre-registered with the global metrics list.

  .. attention::

    Do not create ``PW_METRIC_GLOBAL`` instances anywhere other than global
    scope. Putting these on an instance (member context) would lead to dangling
    pointers and misery. Metrics are never deleted or unregistered!

.. cpp:function:: PW_METRIC_GROUP_GLOBAL(identifier, name)

  Declare a ``pw::metric::Group`` with name ``name``, and register it in the
  global metric groups list ``pw::metric::global_groups``.

  Note that metrics created with ``PW_METRIC_GLOBAL`` should never be added to
  groups! Instead, just create a freestanding metric and register it into the
  global group (like in the example below).

  Example:

  .. code::

    #include "pw_metric/metric.h"
    #include "pw_metric/global.h"

    // No need to coordinate collection of this group; it's globally registered.
    PW_METRIC_GROUP_GLOBAL(legacy_system, "legacy_system");
    PW_METRIC(legacy_system, foo, "foo", 0.2f);
    PW_METRIC(legacy_system, bar, "bar", 44000u);

  .. attention::

    Do not create ``PW_METRIC_GROUP_GLOBAL`` instances anywhere other than
    global scope. Putting these on an instance (member context) would lead to
    dangling pointers and misery. Metrics are never deleted or unregistered!

----------------------
Usage & Best Practices
----------------------
This library makes several tradeoffs to enable low memory use per-metric, and
one of those tradeoffs results in requiring care in constructing the metric
trees.

Use the Init() pattern for static objects with metrics
------------------------------------------------------
A common pattern in embedded systems is to allocate many objects globally, and
reduce reliance on dynamic allocation (or eschew malloc entirely). This leads
to a pattern where rich/large objects are statically constructed at global
scope, then interacted with via tasks or threads. For example, consider a
hypothetical global ``Uart`` object:

.. code::

  class Uart {
   public:
    Uart(std::span<std::byte> rx_buffer, std::span<std::byte> tx_buffer)
        : rx_buffer_(rx_buffer), tx_buffer_(tx_buffer) {}

    // Send/receive here...

   private:
    std::span<std::byte> rx_buffer_;
    std::span<std::byte> tx_buffer_;
  };

  std::array<std::byte, 512> uart_rx_buffer;
  std::array<std::byte, 512> uart_tx_buffer;
  Uart uart1(uart_rx_buffer, uart_tx_buffer);

Through the course of building a product, the team may want to add metrics to
the UART, for example to gain insight into which operations are triggering lots
of data transfer. When adding metrics to the above imaginary UART object, one
might consider the following approach:

.. code::

  class Uart {
   public:
    Uart(std::span<std::byte> rx_buffer,
         std::span<std::byte> tx_buffer,
         Group& parent_metrics)
        : rx_buffer_(rx_buffer),
          tx_buffer_(tx_buffer) {
      // PROBLEM! parent_metrics may not be constructed if it's a reference
      // to a static global.
      parent_metrics.Add(tx_bytes_);
      parent_metrics.Add(rx_bytes_);
    }

    // Send/receive here which increment tx/rx_bytes.

   private:
    std::span<std::byte> rx_buffer_;
    std::span<std::byte> tx_buffer_;

    PW_METRIC(tx_bytes_, "tx_bytes", 0u);
    PW_METRIC(rx_bytes_, "rx_bytes", 0u);
  };

  PW_METRIC_GROUP(root_metrics, "/");
  PW_METRIC_GROUP(root_metrics, uart1_metrics, "uart1");

  std::array<std::byte, 512> uart_rx_buffer;
  std::array<std::byte, 512> uart_tx_buffer;
  Uart uart1(uart_rx_buffer,
             uart_tx_buffer,
             uart1_metrics);

However, this **is incorrect**, since the ``parent_metrics`` (pointing to
``uart1_metrics`` in this case) may not be constructed at the point of
``uart1`` getting constructed. Thankfully in the case of ``pw_metric`` this
will result in an assertion failure (or it will work correctly if the
constructors are called in a favorable order), so the problem will not go
unnoticed. A safer choice is the ``Init()`` pattern for static objects, where
references to dependencies may only be stored during construction, but no
methods on the dependencies are called.

The ``Init()`` approach separates global object construction into two phases:
the constructor, where references are stored, and an ``Init()`` function,
which is called after all static constructors have run. This approach works
correctly, even when the objects are allocated globally:

.. code::

  class Uart {
   public:
    // Note that metrics is not passed in here at all.
    Uart(std::span<std::byte> rx_buffer,
         std::span<std::byte> tx_buffer)
        : rx_buffer_(rx_buffer),
          tx_buffer_(tx_buffer) {}

    // Precondition: parent_metrics is already constructed.
    void Init(Group& parent_metrics) {
      parent_metrics.Add(tx_bytes_);
      parent_metrics.Add(rx_bytes_);
    }

    // Send/receive here which increment tx/rx_bytes.

   private:
    std::span<std::byte> rx_buffer_;
    std::span<std::byte> tx_buffer_;

    PW_METRIC(tx_bytes_, "tx_bytes", 0u);
    PW_METRIC(rx_bytes_, "rx_bytes", 0u);
  };

  PW_METRIC_GROUP(root_metrics, "/");
  PW_METRIC_GROUP(root_metrics, uart1_metrics, "uart1");

  std::array<std::byte, 512> uart_rx_buffer;
  std::array<std::byte, 512> uart_tx_buffer;
  Uart uart1(uart_rx_buffer,
             uart_tx_buffer);

  int main() {
    // uart1_metrics is guaranteed to be initialized by this point, so it is
    // safe to pass it to Init().
    uart1.Init(uart1_metrics);
  }

.. attention::

  Be extra careful about **static global metric registration**. Consider using
  the ``Init()`` pattern.

Metric member order matters in objects
--------------------------------------
The order of declaring in-class groups and metrics matters if the metrics are
within a group declared inside the class. For example, the following class will
work fine:

.. code::

  #include "pw_metric/metric.h"

  class PowerSubsystem {
   public:
    Group& metrics() { return metrics_; }
    const Group& metrics() const { return metrics_; }

   private:
    PW_METRIC_GROUP(metrics_, "power");  // Note metrics_ declared first.
    PW_METRIC(metrics_, foo, "foo", 0.2f);
    PW_METRIC(metrics_, bar, "bar", 44000u);
  };

but the following one will not since the group is constructed after the metrics
(and will result in a compile error):

.. code::

  #include "pw_metric/metric.h"

  class PowerSubsystem {
   public:
    Group& metrics() { return metrics_; }
    const Group& metrics() const { return metrics_; }

   private:
    PW_METRIC(metrics_, foo, "foo", 0.2f);
    PW_METRIC(metrics_, bar, "bar", 44000u);
    PW_METRIC_GROUP(metrics_, "power");  // Error: metrics_ must be first.
  };

.. attention::

  Put **groups before metrics** when declaring metrics members inside classes.

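The underlying language rule is that C++ constructs non-static data members in
declaration order. The standalone sketch below demonstrates that rule with
hypothetical stand-ins (``OrderGroup``, ``OrderMetric``, ``ConstructionLog``);
it does not use the ``pw_metric`` macros and makes no claim about how they
expand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which the stand-in members are constructed.
inline std::vector<std::string>& ConstructionLog() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical stand-in for a group that metrics register with.
struct OrderGroup {
  OrderGroup() { ConstructionLog().push_back("group"); }
  int registered = 0;
};

// Hypothetical stand-in for a metric that registers during construction.
struct OrderMetric {
  explicit OrderMetric(OrderGroup& group) {
    ConstructionLog().push_back("metric");
    ++group.registered;  // Only valid if `group` was constructed first.
  }
};

class PowerSubsystemSketch {
 public:
  int registered() const { return metrics_.registered; }

 private:
  // Declaration order is construction order, so the group comes first.
  OrderGroup metrics_;
  OrderMetric foo_{metrics_};
  OrderMetric bar_{metrics_};
};
```

If ``metrics_`` were declared after ``foo_`` and ``bar_`` in this sketch, the
two metrics would touch a group whose constructor had not yet run.
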
Thread safety
-------------
``pw_metric`` has **no built-in synchronization for manipulating the tree**
structure. Users are expected to either rely on a shared global mutex when
constructing the metric tree, or do the metric construction in a single thread
(e.g. a boot/init thread). The same applies for destruction, though we do not
advise destructing metrics or groups.

Individual metrics have atomic ``Increment()``, ``Set()``, and the value
accessors ``as_float()`` and ``as_int()`` which don't require separate
synchronization, and can be used from ISRs.

.. attention::

  **You must synchronize construction of the metric tree**. ``pw_metric`` does
  not internally synchronize access during construction. Metric ``Set()`` and
  ``Increment()`` calls are safe without extra synchronization.

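To make the atomicity claim concrete, here is a minimal standalone sketch of a
counter built on ``std::atomic``. It is an assumption about how such a
guarantee can be provided, not a description of ``pw_metric`` internals;
``AtomicCounterSketch`` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counter; pw_metric's internals may differ.
class AtomicCounterSketch {
 public:
  // A single atomic read-modify-write, so a preempting interrupt or thread
  // cannot cause a lost update the way a plain `value += amount` could.
  void Increment(uint32_t amount = 1) {
    value_.fetch_add(amount, std::memory_order_relaxed);
  }

  void Set(uint32_t value) { value_.store(value, std::memory_order_relaxed); }

  uint32_t as_int() const { return value_.load(std::memory_order_relaxed); }

 private:
  // Whether a 32-bit atomic is lock-free depends on the target; check before
  // relying on this from interrupt context.
  std::atomic<uint32_t> value_{0};
};
```

Relaxed ordering is used here because a counter only needs each update to be
indivisible; it does not need to order other memory accesses around it.
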
Lifecycle
---------
Metric objects are not designed to be destructed, and are expected to live for
the lifetime of the program or application. If you need dynamic
creation/destruction of metrics, ``pw_metric`` does not attempt to cover that
use case. Instead, ``pw_metric`` covers the case of products with two execution
phases:

1. A boot phase where the metric tree is created.
2. A run phase where metrics are collected. The tree structure is fixed.

Technically, it is possible to destruct metrics provided care is taken to
remove the given metric (or group) from the list it's contained in. However,
there are no helper functions for this, so be careful.

Below is an example that **is incorrect**. Don't do what follows!

.. code::

  #include "pw_metric/metric.h"

  int main() {
    PW_METRIC_GROUP(root, "/");
    {
      // BAD! The metrics have a different lifetime than the group.
      PW_METRIC(root, temperature, "temperature_f", 72.3f);
      PW_METRIC(root, humidity, "humidity_relative_percent", 33.2f);
    }
    // OOPS! root now has a linked list that points to the destructed
    // "humidity" object.
  }

.. attention::

  **Don't destruct metrics**. Metrics are designed to be registered /
  structured upfront, then manipulated during a device's active phase. They do
  not support destruction.

-----------------
Exporting metrics
-----------------
Collecting metrics on a device is not useful without a mechanism to export
those metrics for analysis and debugging. ``pw_metric`` offers an optional RPC
service library (``:metric_service_nanopb``) that enables exporting a
user-supplied set of on-device metrics via RPC. This facility is intended to
function from the early stages of device bringup through production in the
field.

The metrics are fetched by calling the ``MetricService.Get`` RPC method, which
streams all registered metrics to the caller in batches (server streaming RPC).
Batching the returned metrics avoids requiring a large buffer or large RPC MTU.

The returned metric objects have flattened paths to the root. For example, the
returned metrics (post detokenization and jsonified) might look something like:

.. code:: none

  {
    "/i2c1/failed_txns": 17,
    "/i2c1/total_txns": 2013,
    "/i2c1/gyro/resets": 24,
    "/i2c1/gyro/hangs": 1,
    "/spi1/thermocouple/reads": 242,
    "/spi1/thermocouple/temp_celsius": 34.52
  }

Note that there is no nesting of the groups; the nesting is implied from the
path.

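Flattening a tree into root-relative paths can be sketched as a short recursive
walk. The code below is a standalone illustration using plain strings and
integer values only; ``PathLeaf``, ``PathGroup``, ``FlatMetric``, and
``Flatten`` are hypothetical names, and the real service works with tokens and
protobuf messages rather than strings.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree types for illustration; not the pw_metric classes.
struct PathLeaf {
  std::string name;
  uint32_t value;
};

struct PathGroup {
  std::string name;
  std::vector<PathGroup> children;
  std::vector<PathLeaf> metrics;
};

using FlatMetric = std::pair<std::string, uint32_t>;

// Appends one (path, value) entry per leaf metric under `group`. The path is
// built by joining every enclosing group name with '/'.
void Flatten(const PathGroup& group,
             const std::string& prefix,
             std::vector<FlatMetric>& out) {
  const std::string path = prefix + "/" + group.name;
  for (const PathLeaf& leaf : group.metrics) {
    out.emplace_back(path + "/" + leaf.name, leaf.value);
  }
  for (const PathGroup& child : group.children) {
    Flatten(child, path, out);
  }
}
```

Calling ``Flatten(group, "", out)`` on a top-level group named ``i2c1`` yields
entries such as ``/i2c1/total_txns``, so the receiver can rebuild the nesting
from the path alone.
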
RPC service setup
-----------------
To expose a ``MetricService`` in your application, do the following:

1. Define metrics around the system, and put them in a group or list of
   metrics. Easy choices include the ``global_groups`` and ``global_metrics``
   variables, or you can create your own.
2. Create an instance of ``pw::metric::MetricService``.
3. Register the service with your RPC server.

For example:

.. code::

  #include "pw_rpc/server.h"
  #include "pw_metric/metric.h"
  #include "pw_metric/global.h"
  #include "pw_metric/metric_service_nanopb.h"

  // Note: You must customize the RPC server setup; see pw_rpc.
  Channel channels[] = {
    Channel::Create<1>(&uart_output),
  };
  Server server(channels);

  // Metric service instance, pointing to the global metric objects.
  // This could also point to custom per-product or application objects.
  pw::metric::MetricService metric_service(
      pw::metric::global_metrics,
      pw::metric::global_groups);

  void RegisterServices() {
    server.RegisterService(metric_service);
    // Register other services here.
  }

  int main() {
    // ... system initialization ...

    RegisterServices();

    // ... start your application ...
  }

.. attention::

  Take care when exporting metrics. Ensure **appropriate access control** is in
  place. In some cases it may make sense to entirely disable metrics export for
  production builds. Although reading metrics via RPC won't influence the
  device, in some cases the metrics could expose sensitive information if
  product owners are not careful.

.. attention::

  **MetricService::Get is a synchronous RPC method**

  Calls to ``MetricService::Get`` are blocking and will send all metrics
  immediately, even though it is a server-streaming RPC. This will work fine if
  the device doesn't have too many metrics, or doesn't have concurrent RPCs
  like logging, but could be a problem in some cases.

  We plan to offer an async version where the application is responsible for
  pumping the metrics into the streaming response. This gives flow control to
  the application.

-----------
Size report
-----------
The size report below shows the cost in code and memory for a few examples of
metrics. This does not include the RPC service.

.. include:: metric_size_report

.. attention::

  At time of writing, **the above sizes show an unexpectedly large flash
  impact**. We are investigating why GCC is inserting large global static
  constructors per group, when all the logic should be reused across objects.

----------------
Design tradeoffs
----------------
There are many possible approaches to metrics collection and aggregation. We've
chosen some points on the tradeoff curve:

- **Atomic-sized metrics** - Using simple metric objects with just uint32/float
  enables atomic operations. While it might be nice to support larger types, it
  is more useful to have safe metrics increment from interrupt service
  routines.

- **No aggregate metrics (yet)** - Aggregate metrics (e.g. average, max, min,
  histograms) are not supported, and must be built on top of the simple base
  metrics. By taking this route, we can considerably simplify the core metrics
  system and have aggregation logic in separate modules. Those modules can then
  feed into the metrics system - for example by creating multiple metrics for a
  single underlying metric. For example: "foo", "foo_max", "foo_min" and so on.

  The other problem with automatic aggregation is that what period the
  aggregation happens over is often important, and it can be hard to design
  this cleanly into the API. Instead, this responsibility is pushed to the user
  who must take more care.

  Note that we will add helpers for aggregated metrics.

- **No virtual metrics** - An alternate approach to the concrete Metric class
  in the current module is to have a virtual interface for metrics, and then
  allow those metrics to have their own storage. This is attractive but can
  lead to many vtables and excess memory use in simple one-metric use cases.

- **Linked list registration** - Using linked lists for registration is a
  tradeoff, accepting some memory overhead in exchange for flexibility. Other
  alternatives include a global table of metrics, which has the disadvantage of
  requiring centralizing the metrics -- an impossibility for middleware like
  Pigweed.

- **Synchronization** - The only synchronization guarantee provided by
  pw_metric is that increment and set are atomic. Other than that, users are on
  their own to synchronize metric collection and updating.

- **No fast metric lookup** - The current design does not make it fast to
  look up a metric at runtime; instead, one must run a linear search of the
  tree to find the matching metric. In most non-dynamic use cases, this is fine
  in practice, and saves having a more involved hash table. Metric updates will
  be through direct member or variable accesses.

- **Relying on C++ static initialization** - In short, the convenience
  outweighs the cost and risk. Without static initializers, it would be
  impossible to automatically collect the metrics without post-processing the
  C++ code to find the metrics; a huge and debatably worthwhile approach. We
  have carefully analyzed the static initializer behaviour of Pigweed's
  IntrusiveList and are confident it is correct.

- **Both local & global support** - Potentially just one approach (the local or
  global one) could be offered, making the module less complex. However, we
  feel the additional complexity is worthwhile since there are legitimate use
  cases for both e.g. ``PW_METRIC`` and ``PW_METRIC_GLOBAL``. We'd prefer to
  have a well-tested upstream solution for these use cases rather than have
  customers re-implement one of these.

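The "foo", "foo_max", "foo_min" idea from the aggregate metrics point above can
be sketched as a small helper that fans one sample stream out into three plain
values. This is a standalone illustration, not an existing ``pw_metric``
helper; ``MinMaxSketch`` and its methods are hypothetical, and in a real
integration each value would presumably be backed by its own simple metric.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper built on top of plain 32-bit values.
class MinMaxSketch {
 public:
  // Records a sample, updating the last, minimum, and maximum values.
  void Record(uint32_t sample) {
    last_ = sample;
    if (!has_sample_) {
      // First sample of the period seeds both extremes.
      min_ = sample;
      max_ = sample;
      has_sample_ = true;
      return;
    }
    min_ = std::min(min_, sample);
    max_ = std::max(max_, sample);
  }

  // The caller decides when an aggregation period ends.
  void ResetPeriod() { has_sample_ = false; }

  uint32_t last() const { return last_; }       // Would back "foo".
  uint32_t min_value() const { return min_; }   // Would back "foo_min".
  uint32_t max_value() const { return max_; }   // Would back "foo_max".

 private:
  bool has_sample_ = false;
  uint32_t last_ = 0;
  uint32_t min_ = 0;
  uint32_t max_ = 0;
};
```

Keeping ``ResetPeriod()`` explicit reflects the point made above: the
aggregation period is a policy decision that stays with the user.
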
----------------
Roadmap & Status
----------------
- **String metric names** - ``pw_metric`` stores metric names as tokens. On one
  hand, this is great for production where having a compact binary is often a
  requirement to fit the application in the given part. However, in early
  development before flash is a constraint, string names are more convenient to
  work with since there is no need for host-side detokenization. We plan to add
  optional support for string names.

- **Aggregate metrics** - We plan to add support for aggregate metrics on top
  of the simple metric mechanism, either as another module or as additional
  functionality inside this one. Likely examples include min/max.

- **Selectively enable or disable metrics** - Currently the metrics are always
  enabled once included. In practice this is not ideal since many times only a
  few metrics are wanted in production, but having to strip all the metrics
  code is error prone. Instead, we will add support for controlling what
  metrics are enabled or disabled at compile time. This may rely on C++20's
  support for zero-sized members to fully remove the cost.

- **Async RPC** - The current RPC service exports the metrics by streaming
  them to the client in batches. However, the current solution streams all the
  metrics to completion; this may block the RPC thread. In the future we will
  have an async solution where the user is in control of flow priority.

- **Timer integration** - We would like to add a stopwatch type mechanism to
  time multiple in-flight events.

- **C support** - In practice it's often useful or necessary to instrument
  C-only code. While it will be impossible to support the global registration
  system that the C++ version supports, we will figure out a solution to make
  instrumenting C code relatively smooth.

- **Global counter** - We may add a global metric counter to help detect cases
  where post-initialization metrics manipulations are done.

- **Proto structure** - It may be possible to directly map metrics to a custom
  proto structure, where instead of a name or token field, a tag field is
  provided. This could result in elegant export to an easily machine parsable
  and compact representation on the host. We may investigate this in the
  future.

- **Safer data structures** - At a cost of 4B per metric and 4B per group, it
  may be possible to make metric structure instantiation safe even in static
  constructors, and also make it safe to remove metrics dynamically. We will
  consider whether this tradeoff is the right one, since a 4B cost per metric
  is substantial on projects with many metrics.
847 is substantial on projects with many metrics.