Demonstrations of llcstat.


llcstat traces cache reference and cache miss events system-wide, and summarizes
them by PID and CPU.

These events (PERF_COUNT_HW_CACHE_REFERENCES and PERF_COUNT_HW_CACHE_MISSES,
defined in uapi/linux/perf_event.h) have different meanings on different
architectures. For x86-64, they mean references to and misses in the last-level
cache (LLC).

Example output:

# ./llcstat.py 20 -c 5000
Running for 20 seconds or hit Ctrl-C to end.
PID      NAME             CPU     REFERENCE         MISS    HIT%
0        swapper/15       15        3515000       640000  81.79%
238      migration/38     38           5000            0 100.00%
4512     ntpd             11           5000            0 100.00%
150867   ipmitool         3           25000         5000  80.00%
150895   lscpu            17         280000        25000  91.07%
151807   ipmitool         15          15000         5000  66.67%
150757   awk              2           15000         5000  66.67%
151213   chef-client      5         1770000       240000  86.44%
151822   scribe-dispatch  12          15000            0 100.00%
123386   mysqld           5            5000            0 100.00%
[...]
Total References: 518920000 Total Misses: 90265000 Hit Rate: 82.61%

This shows each PID's cache hit rate during the 20-second run period.

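The HIT% column is consistent with (REFERENCE - MISS) / REFERENCE. A minimal
sketch of that arithmetic follows; hit_rate is a hypothetical helper written for
this illustration, not a function taken from llcstat.py.

```python
def hit_rate(references, misses):
    """Return the cache hit rate as a percentage of references."""
    if references == 0:
        return 0.0  # nothing sampled, so no meaningful rate
    return 100.0 * (references - misses) / references


# Values copied from the example output: swapper/15 and the totals line.
print("%.2f%%" % hit_rate(3515000, 640000))      # 81.79%
print("%.2f%%" % hit_rate(518920000, 90265000))  # 82.61%
```
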
A count of 5000 was used in this example, which means that one in every 5,000
events will trigger an in-kernel counter to be incremented. The output is
scaled by this factor, which is why every value is a multiple of 5,000.

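As a rough illustration of that scaling, assuming the reported value is simply
the number of samples multiplied by the sample period (scale and SAMPLE_PERIOD
are hypothetical names, not identifiers from the tool):

```python
SAMPLE_PERIOD = 5000  # the value passed with -c in the example


def scale(sample_count, sample_period=SAMPLE_PERIOD):
    """Estimate the event count represented by a number of samples."""
    return sample_count * sample_period


# 703 samples would account for the 3515000 references shown for swapper/15,
# and 128 samples for its 640000 misses.
print(scale(703))  # 3515000
print(scale(128))  # 640000
```
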
We don't instrument every single event since the overhead would be prohibitive,
nor do we need to: this is a type of sampling profiler. Because of this, which
process happens to trigger the 5,000th cache reference or miss is, to some
degree, a matter of chance. Overall the numbers should make sense. But for low
counts, you might find a case where, by chance, a process has been tallied with
more misses than references, which would seem impossible.

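The effect can be reproduced with a toy user-space model. This is only a sketch
under simplifying assumptions: the event list, the process names "A" and "B",
and the sample() helper are all invented for illustration, and the real
counting happens in hardware and in the kernel. Every period-th event of each
kind is credited, at full weight, to whichever process triggered it.

```python
from collections import Counter


def sample(events, period):
    """Credit `period` events to the process that triggers every period-th one."""
    seen = Counter()  # events seen so far, per kind
    tallies = {"ref": Counter(), "miss": Counter()}
    for name, kind in events:
        seen[kind] += 1
        if seen[kind] % period == 0:
            tallies[kind][name] += period
    return tallies


# Process B really performs 2 references and 1 miss; A does everything else.
events = [
    ("A", "ref"), ("A", "miss"), ("A", "ref"), ("A", "miss"),
    ("B", "ref"), ("A", "ref"), ("A", "miss"), ("A", "ref"),
    ("B", "ref"), ("B", "miss"), ("A", "ref"), ("A", "ref"),
]

tallies = sample(events, period=4)
print(dict(tallies["ref"]))   # {'A': 8}
print(dict(tallies["miss"]))  # {'B': 4}
```

Here B happens to trigger the 4th miss but none of the sampled references, so
it is tallied with 4 misses and 0 references. With larger counts these
accidents average out.
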
USAGE message:

# ./llcstat.py --help
usage: llcstat.py [-h] [-c SAMPLE_PERIOD] [duration]

Summarize cache references and misses by PID

positional arguments:
  duration              Duration, in seconds, to run

optional arguments:
  -h, --help            show this help message and exit
  -c SAMPLE_PERIOD, --sample_period SAMPLE_PERIOD
                        Sample one in this many number of cache reference
                        and miss events