blob: aad51c409f161252ddc9d02007c0d2521b542646 [file] [log] [blame]
Sasha Goldshtein8e583cc2017-02-09 10:11:50 -05001Demonstrations of syscount, the Linux/eBPF version.
2
3
4syscount summarizes syscall counts across the system or a specific process,
5with optional latency information. It is very useful for general workload
6characterization, for example:
7
8# syscount
9Tracing syscalls, printing top 10... Ctrl+C to quit.
10[09:39:04]
11SYSCALL COUNT
12write 10739
13read 10584
14wait4 1460
15nanosleep 1457
16select 795
17rt_sigprocmask 689
18clock_gettime 653
19rt_sigaction 128
20futex 86
21ioctl 83
22^C
23
24These are the top 10 entries; you can get more by using the -T switch. Here,
25the output indicates that the write and read syscalls were very common, followed
26immediately by wait4, nanosleep, and so on. By default, syscount counts across
27the entire system, but we can point it to a specific process of interest:
28
29# syscount -p $(pidof dd)
30Tracing syscalls, printing top 10... Ctrl+C to quit.
31[09:40:21]
32SYSCALL COUNT
33read 7878397
34write 7878397
35^C
36
37Indeed, dd's workload is a bit easier to characterize. Occasionally, the count
38of syscalls is not enough, and you'd also want an aggregate latency:
39
40# syscount -L
41Tracing syscalls, printing top 10... Ctrl+C to quit.
42[09:41:32]
43SYSCALL COUNT TIME (us)
44select 16 3415860.022
45nanosleep 291 12038.707
46ftruncate 1 122.939
47write 4 63.389
48stat 1 23.431
49fstat 1 5.088
50[unknown: 321] 32 4.965
51timerfd_settime 1 4.830
52ioctl 3 4.802
53kill 1 4.342
54^C
55
56The select and nanosleep calls are responsible for a lot of time, but remember
57these are blocking calls. This output was taken from a mostly idle system. Note
58the "unknown" entry -- syscall 321 is the bpf() syscall, which is not in the
59table used by this tool (borrowed from strace sources).
60
61Another direction would be to understand which processes are making a lot of
62syscalls, thus responsible for a lot of activity. This is what the -P switch
63does:
64
65# syscount -P
66Tracing syscalls, printing top 10... Ctrl+C to quit.
67[09:58:13]
68PID COMM COUNT
6913820 vim 548
7030216 sshd 149
7129633 bash 72
7225188 screen 70
7325776 mysqld 30
7431285 python 10
75529 systemd-udevd 9
761 systemd 8
77494 systemd-journal 5
78^C
79
80This is again from a mostly idle system over an interval of a few seconds.
81
82Sometimes, you'd only care about failed syscalls -- these are the ones that
83might be worth investigating with follow-up tools like opensnoop, execsnoop,
84or trace. Use the -x switch for this; the following example also demonstrates
85the -i switch, for printing at predefined intervals:
86
87# syscount -x -i 5
88Tracing failed syscalls, printing top 10... Ctrl+C to quit.
89[09:44:16]
90SYSCALL COUNT
91futex 13
92getxattr 10
93stat 8
94open 6
95wait4 3
96access 2
97[unknown: 321] 1
98
99[09:44:21]
100SYSCALL COUNT
101futex 12
102getxattr 10
103[unknown: 321] 2
104wait4 1
105access 1
106pause 1
107^C
108
Lucian Adrian Grijincu5426ef22018-01-11 12:07:42 -0800109Similar to -x/--failures, sometimes you only care about certain syscall
110errors like EPERM or ENONET -- these are the ones that might be worth
111investigating with follow-up tools like opensnoop, execsnoop, or
112trace. Use the -e/--errno switch for this; the following example also
113demonstrates the -e switch, for printing ENOENT failures at predefined intervals:
114
115# syscount -e ENOENT -i 5
116Tracing syscalls, printing top 10... Ctrl+C to quit.
117[13:15:57]
118SYSCALL COUNT
119stat 4669
120open 1951
121access 561
122lstat 62
123openat 42
124readlink 8
125execve 4
126newfstatat 1
127
128[13:16:02]
129SYSCALL COUNT
130lstat 18506
131stat 13087
132open 2907
133access 412
134openat 19
135readlink 12
136execve 7
137connect 6
138unlink 1
139rmdir 1
140^C
141
Sasha Goldshtein8e583cc2017-02-09 10:11:50 -0500142USAGE:
143# syscount -h
Lucian Adrian Grijincu5426ef22018-01-11 12:07:42 -0800144usage: syscount.py [-h] [-p PID] [-i INTERVAL] [-T TOP] [-x] [-e ERRNO] [-L]
145 [-m] [-P] [-l]
Sasha Goldshtein8e583cc2017-02-09 10:11:50 -0500146
147Summarize syscall counts and latencies.
148
149optional arguments:
150 -h, --help show this help message and exit
151 -p PID, --pid PID trace only this pid
152 -i INTERVAL, --interval INTERVAL
153 print summary at this interval (seconds)
Akilesh Kailash89967192018-05-18 13:36:54 -0700154 -d DURATION, --duration DURATION
155 total duration of trace, in seconds
Sasha Goldshtein8e583cc2017-02-09 10:11:50 -0500156 -T TOP, --top TOP print only the top syscalls by count or latency
157 -x, --failures trace only failed syscalls (return < 0)
Lucian Adrian Grijincu5426ef22018-01-11 12:07:42 -0800158 -e ERRNO, --errno ERRNO
159 trace only syscalls that return this error (numeric or
160 EPERM, etc.)
Sasha Goldshtein8e583cc2017-02-09 10:11:50 -0500161 -L, --latency collect syscall latency
162 -m, --milliseconds display latency in milliseconds (default:
163 microseconds)
164 -P, --process count by process and not by syscall
165 -l, --list print list of recognized syscalls and exit