Demonstrations of syscount, the Linux/eBPF version.


syscount summarizes syscall counts across the system or a specific process,
with optional latency information. It is very useful for general workload
characterization, for example:

# syscount
Tracing syscalls, printing top 10... Ctrl+C to quit.
[09:39:04]
SYSCALL                   COUNT
write                     10739
read                      10584
wait4                      1460
nanosleep                  1457
select                      795
rt_sigprocmask              689
clock_gettime               653
rt_sigaction                128
futex                        86
ioctl                        83
^C

These are the top 10 entries; you can get more by using the -T switch. Here,
the output indicates that the write and read syscalls were by far the most
common, followed at a distance by wait4, nanosleep, and so on. By default,
syscount counts across the entire system, but we can point it at a specific
process of interest:

# syscount -p $(pidof dd)
Tracing syscalls, printing top 10... Ctrl+C to quit.
[09:40:21]
SYSCALL                   COUNT
read                    7878397
write                   7878397
^C

Indeed, dd's workload is a bit easier to characterize. Sometimes the syscall
count alone is not enough, and you also want the aggregate latency:

# syscount -L
Tracing syscalls, printing top 10... Ctrl+C to quit.
[09:41:32]
SYSCALL                   COUNT        TIME (us)
select                       16      3415860.022
nanosleep                   291        12038.707
ftruncate                     1          122.939
write                         4           63.389
stat                          1           23.431
fstat                         1            5.088
[unknown: 321]               32            4.965
timerfd_settime               1            4.830
ioctl                         3            4.802
kill                          1            4.342
^C

The select and nanosleep calls account for most of the time, but remember that
these are blocking calls, so that time is mostly spent waiting. This output was
taken from a mostly idle system. Note the "unknown" entry -- syscall 321 is the
bpf() syscall, which is not in the table used by this tool (borrowed from
strace sources).

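Because TIME is an aggregate over all calls, dividing it by COUNT gives the
mean latency per call, which is often easier to compare across syscalls. The
following is a minimal Python sketch, assuming the -L output was saved as text.
The helper names parse_latency_table and mean_latency_us are hypothetical and
are not part of syscount.

```python
# A few rows copied from the `syscount -L` output above.
SAMPLE = """\
SYSCALL                   COUNT        TIME (us)
select                       16      3415860.022
nanosleep                   291        12038.707
[unknown: 321]               32            4.965
"""


def parse_latency_table(text):
    """Return (syscall, count, total_us) tuples from -L style output."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("SYSCALL"):
            continue  # skip blank lines and the header row
        # Split from the right so names like "[unknown: 321]" stay intact.
        name, count, total_us = line.rsplit(None, 2)
        rows.append((name, int(count), float(total_us)))
    return rows


def mean_latency_us(count, total_us):
    """Average time per call, in microseconds."""
    return total_us / count


for name, count, total_us in parse_latency_table(SAMPLE):
    print(f"{name:<22} {mean_latency_us(count, total_us):14.3f} us/call")
```

For the sample above, each select call took roughly 213 ms on average, while
each nanosleep call took about 41 us.
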
Another angle is to find out which processes are making the most syscalls and
are therefore responsible for most of the activity. This is what the -P switch
does:

# syscount -P
Tracing syscalls, printing top 10... Ctrl+C to quit.
[09:58:13]
PID    COMM               COUNT
13820  vim                  548
30216  sshd                 149
29633  bash                  72
25188  screen                70
25776  mysqld                30
31285  python                10
529    systemd-udevd          9
1      systemd                8
494    systemd-journal        5
^C

This is again from a mostly idle system over an interval of a few seconds.

Sometimes you only care about failed syscalls -- these are the ones that
might be worth investigating with follow-up tools like opensnoop, execsnoop,
or trace. Use the -x switch for this; the following example also demonstrates
the -i switch, which prints a summary at a fixed interval (in seconds):

# syscount -x -i 5
Tracing failed syscalls, printing top 10... Ctrl+C to quit.
[09:44:16]
SYSCALL                   COUNT
futex                        13
getxattr                     10
stat                          8
open                          6
wait4                         3
access                        2
[unknown: 321]                1

[09:44:21]
SYSCALL                   COUNT
futex                        12
getxattr                     10
[unknown: 321]                2
wait4                         1
access                        1
pause                         1
^C

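With -i, each summary starts with a [HH:MM:SS] timestamp line, so saved output
can be split into per-interval tables and compared. The following is a minimal
Python sketch under that assumption; parse_intervals is a hypothetical helper
name and is not part of syscount.

```python
import re

# The two intervals copied from the `syscount -x -i 5` output above.
SAMPLE = """\
[09:44:16]
SYSCALL                   COUNT
futex                        13
getxattr                     10
stat                          8
open                          6
wait4                         3
access                        2
[unknown: 321]                1

[09:44:21]
SYSCALL                   COUNT
futex                        12
getxattr                     10
[unknown: 321]                2
wait4                         1
access                        1
pause                         1
"""

TIMESTAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]$")


def parse_intervals(text):
    """Return {timestamp: {syscall: count}} in the order the intervals appear."""
    intervals = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = TIMESTAMP.match(line)
        if match:
            current = intervals.setdefault(match.group(1), {})
            continue
        if current is None or not line:
            continue
        # Split from the right so "[unknown: 321]" stays one name.
        parts = line.rsplit(None, 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # header row, "^C", or anything else that is not data
        current[parts[0]] = int(parts[1])
    return intervals


intervals = parse_intervals(SAMPLE)
first, second = intervals["09:44:16"], intervals["09:44:21"]
# Syscalls that failed in both intervals are likely a steady pattern.
persistent = sorted(set(first) & set(second))
print(persistent)
```

Syscalls that show up in every interval (futex and getxattr here, among others)
are more likely to reflect steady behavior than a one-off failure.
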
USAGE:
# syscount -h
usage: syscount.py [-h] [-p PID] [-i INTERVAL] [-T TOP] [-x] [-L] [-m] [-P]
                   [-l]

Summarize syscall counts and latencies.

optional arguments:
  -h, --help            show this help message and exit
  -p PID, --pid PID     trace only this pid
  -i INTERVAL, --interval INTERVAL
                        print summary at this interval (seconds)
  -T TOP, --top TOP     print only the top syscalls by count or latency
  -x, --failures        trace only failed syscalls (return < 0)
  -L, --latency         collect syscall latency
  -m, --milliseconds    display latency in milliseconds (default:
                        microseconds)
  -P, --process         count by process and not by syscall
  -l, --list            print list of recognized syscalls and exit
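
When syscount is driven from a script, the switches listed above can be
assembled into an argument list. The following is a minimal Python sketch that
uses only the flags shown in the usage message. The helper name
build_syscount_argv and the executable name "syscount" are assumptions; the
tool may be installed under a different name or path on your system.

```python
def build_syscount_argv(pid=None, interval=None, top=None, failures=False,
                        latency=False, milliseconds=False, by_process=False,
                        executable="syscount"):
    """Build an argv list from the switches in the usage message above.

    `executable` is an assumption: adjust it to wherever syscount is installed.
    """
    argv = [executable]
    if pid is not None:
        argv += ["-p", str(pid)]        # trace only this pid
    if interval is not None:
        argv += ["-i", str(interval)]   # print summary every N seconds
    if top is not None:
        argv += ["-T", str(top)]        # number of top entries to print
    if failures:
        argv.append("-x")               # trace only failed syscalls
    if latency:
        argv.append("-L")               # collect syscall latency
    if milliseconds:
        argv.append("-m")               # display latency in milliseconds
    if by_process:
        argv.append("-P")               # count by process, not by syscall
    return argv


# Equivalent to the "syscount -x -i 5" example shown earlier.
print(build_syscount_argv(failures=True, interval=5))
```

The resulting list can be passed to subprocess.run() or subprocess.Popen();
the tool generally needs root privileges to load its eBPF program.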