| Demonstrations of syscount, the Linux/eBPF version. |
| |
| |
| syscount summarizes syscall counts across the system or a specific process, |
| with optional latency information. It is very useful for general workload |
| characterization, for example: |
| |
| # syscount |
| Tracing syscalls, printing top 10... Ctrl+C to quit. |
| [09:39:04] |
| SYSCALL COUNT |
| write 10739 |
| read 10584 |
| wait4 1460 |
| nanosleep 1457 |
| select 795 |
| rt_sigprocmask 689 |
| clock_gettime 653 |
| rt_sigaction 128 |
| futex 86 |
| ioctl 83 |
| ^C |
| |
| These are the top 10 entries; you can get more by using the -T switch. Here, |
| the output indicates that the write and read syscalls were very common, followed |
| immediately by wait4, nanosleep, and so on. By default, syscount counts across |
| the entire system, but we can point it to a specific process of interest: |
| |
| # syscount -p $(pidof dd) |
| Tracing syscalls, printing top 10... Ctrl+C to quit. |
| [09:40:21] |
| SYSCALL COUNT |
| read 7878397 |
| write 7878397 |
| ^C |
| |
| Indeed, dd's workload is a bit easier to characterize. Occasionally, the count |
| of syscalls is not enough, and you'd also want an aggregate latency: |
| |
| # syscount -L |
| Tracing syscalls, printing top 10... Ctrl+C to quit. |
| [09:41:32] |
| SYSCALL COUNT TIME (us) |
| select 16 3415860.022 |
| nanosleep 291 12038.707 |
| ftruncate 1 122.939 |
| write 4 63.389 |
| stat 1 23.431 |
| fstat 1 5.088 |
| [unknown: 321] 32 4.965 |
| timerfd_settime 1 4.830 |
| ioctl 3 4.802 |
| kill 1 4.342 |
| ^C |
| |
| The select and nanosleep calls are responsible for a lot of time, but remember |
| these are blocking calls. This output was taken from a mostly idle system. Note |
| the "unknown" entry -- syscall 321 is the bpf() syscall, which is not in the |
| table used by this tool (borrowed from strace sources). |
| |
| Another direction would be to understand which processes are making a lot of |
| syscalls, thus responsible for a lot of activity. This is what the -P switch |
| does: |
| |
| # syscount -P |
| Tracing syscalls, printing top 10... Ctrl+C to quit. |
| [09:58:13] |
| PID COMM COUNT |
| 13820 vim 548 |
| 30216 sshd 149 |
| 29633 bash 72 |
| 25188 screen 70 |
| 25776 mysqld 30 |
| 31285 python 10 |
| 529 systemd-udevd 9 |
| 1 systemd 8 |
| 494 systemd-journal 5 |
| ^C |
| |
| This is again from a mostly idle system over an interval of a few seconds. |
| |
| Sometimes, you'd only care about failed syscalls -- these are the ones that |
| might be worth investigating with follow-up tools like opensnoop, execsnoop, |
| or trace. Use the -x switch for this; the following example also demonstrates |
| the -i switch, for printing at predefined intervals: |
| |
| # syscount -x -i 5 |
| Tracing failed syscalls, printing top 10... Ctrl+C to quit. |
| [09:44:16] |
| SYSCALL COUNT |
| futex 13 |
| getxattr 10 |
| stat 8 |
| open 6 |
| wait4 3 |
| access 2 |
| [unknown: 321] 1 |
| |
| [09:44:21] |
| SYSCALL COUNT |
| futex 12 |
| getxattr 10 |
| [unknown: 321] 2 |
| wait4 1 |
| access 1 |
| pause 1 |
| ^C |
| |
| Similar to -x/--failures, sometimes you only care about certain syscall |
| errors like EPERM or ENONET -- these are the ones that might be worth |
| investigating with follow-up tools like opensnoop, execsnoop, or |
| trace. Use the -e/--errno switch for this; the following example also |
| demonstrates the -e switch, for printing ENOENT failures at predefined intervals: |
| |
| # syscount -e ENOENT -i 5 |
| Tracing syscalls, printing top 10... Ctrl+C to quit. |
| [13:15:57] |
| SYSCALL COUNT |
| stat 4669 |
| open 1951 |
| access 561 |
| lstat 62 |
| openat 42 |
| readlink 8 |
| execve 4 |
| newfstatat 1 |
| |
| [13:16:02] |
| SYSCALL COUNT |
| lstat 18506 |
| stat 13087 |
| open 2907 |
| access 412 |
| openat 19 |
| readlink 12 |
| execve 7 |
| connect 6 |
| unlink 1 |
| rmdir 1 |
| ^C |
| |
| USAGE: |
| # syscount -h |
| usage: syscount.py [-h] [-p PID] [-i INTERVAL] [-T TOP] [-x] [-e ERRNO] [-L] |
| [-m] [-P] [-l] |
| |
| Summarize syscall counts and latencies. |
| |
| optional arguments: |
| -h, --help show this help message and exit |
| -p PID, --pid PID trace only this pid |
| -i INTERVAL, --interval INTERVAL |
| print summary at this interval (seconds) |
| -d DURATION, --duration DURATION |
| total duration of trace, in seconds |
| -T TOP, --top TOP print only the top syscalls by count or latency |
| -x, --failures trace only failed syscalls (return < 0) |
| -e ERRNO, --errno ERRNO |
| trace only syscalls that return this error (numeric or |
| EPERM, etc.) |
| -L, --latency collect syscall latency |
| -m, --milliseconds display latency in milliseconds (default: |
| microseconds) |
| -P, --process count by process and not by syscall |
| -l, --list print list of recognized syscalls and exit |