blob: b3fea3e9e4b8c8192a30bc659a5d05e9cf9caf56 [file] [log] [blame]
Edward Wu2188d232020-05-28 09:36:21 +08001Demonstrations of funcinterval, the Linux eBPF/bcc version.
2
3eBPF/bcc is very suitable for platform performance tuning.
4By funclatency, we can profile specific functions to know how latency
5this function costs. However, sometimes performance drop is not about the
6latency of function but the interval between function calls.
7funcinterval is born for this purpose.
8
9Another story, hardware performance tuning on the platform we will use
10protocol analyzer to analyze performance, but most protocol analyzers lack
11the distribution feature. Using a protocol analyzer you need a lot of time
12to check every detail latency. By funcinterval, we can save a lot of time
13by distribution feature.
14
15For example:
16
17# ./funcinterval xhci_ring_ep_doorbell -d 2 -u
18Tracing 1 functions for "xhci_ring_ep_doorbell"... Hit Ctrl-C to end.
19
20 usecs : count distribution
21 0 -> 1 : 0 | |
22 2 -> 3 : 0 | |
23 4 -> 7 : 0 | |
24 8 -> 15 : 0 | |
25 16 -> 31 : 0 | |
26 32 -> 63 : 134 | |
27 64 -> 127 : 2862 |******************** |
28 128 -> 255 : 5552 |****************************************|
29 256 -> 511 : 216 |* |
30 512 -> 1023 : 2 | |
31Detaching...
32
33This example output shows that the interval latency of xhci_ring_ep_doorbell
34took between 64 and 255 microseconds. USB MAC will start its job after USB
35doorbell register ringing, above information that can help hardware engineer to
36analyze, the performance drop is because software rings the doorbell too
37late or just slowly hardware DMA.
38
39# ./funcinterval blk_start_request -i 2 -u
40Tracing 1 functions for "blk_start_request"... Hit Ctrl-C to end.
41
42 usecs : count distribution
43 0 -> 1 : 0 | |
44 2 -> 3 : 0 | |
45 4 -> 7 : 0 | |
46 8 -> 15 : 5 |* |
47 16 -> 31 : 0 | |
48 32 -> 63 : 1 | |
49 64 -> 127 : 2 | |
50 128 -> 255 : 1 | |
51 256 -> 511 : 0 | |
52 512 -> 1023 : 1 | |
53 1024 -> 2047 : 1 | |
54 2048 -> 4095 : 117 |****************************************|
55 4096 -> 8191 : 13 |**** |
56 8192 -> 16383 : 1 | |
57
58If using biolatency tool that has no difference between two platforms.
59Maybe the problem is related to the interval time instead of block
60device I/O latency.
61
62# ./funcinterval ion_ioctl -i 2 -m
63Tracing 1 functions for "ion_ioctl"... Hit Ctrl-C to end.
64
65 msecs : count distribution
66 0 -> 1 : 215 |****************************************|
67 2 -> 3 : 0 | |
68 4 -> 7 : 4 | |
69 8 -> 15 : 5 | |
70 16 -> 31 : 29 |***** |
71
72You can also check the ion_ioctl. By the above output, we know the activity
73frequency of ion_ioctl() is high mostly(less than 1 ms), but has 29 times low
74frequency.
75
76# ./funcinterval t:block:block_bio_queue -d 30 -u
77Tracing tracepoint for "block:block_bio_queue"... Hit Ctrl-C to end.
78
79 usecs : count distribution
80 0 -> 1 : 0 | |
81 2 -> 3 : 0 | |
82 4 -> 7 : 0 | |
83 8 -> 15 : 2 | |
84 16 -> 31 : 262 | |
85 32 -> 63 : 9075 |******************* |
86 64 -> 127 : 18668 |****************************************|
87 128 -> 255 : 1492 |*** |
88 256 -> 511 : 2616 |***** |
89 512 -> 1023 : 7226 |*************** |
90 1024 -> 2047 : 8982 |******************* |
91 2048 -> 4095 : 2394 |***** |
92 4096 -> 8191 : 163 | |
93 8192 -> 16383 : 42 | |
94 16384 -> 32767 : 2 | |
95 32768 -> 65535 : 0 | |
96 65536 -> 131071 : 0 | |
97 131072 -> 262143 : 0 | |
98 262144 -> 524287 : 0 | |
99 524288 -> 1048575 : 1 | |
100Detaching...
101
102# ./funcinterval t:block:block_rq_issue -d 30 -u
103Tracing tracepoint for "block:block_rq_issue"... Hit Ctrl-C to end.
104
105 usecs : count distribution
106 0 -> 1 : 0 | |
107 2 -> 3 : 0 | |
108 4 -> 7 : 0 | |
109 8 -> 15 : 0 | |
110 16 -> 31 : 5 | |
111 32 -> 63 : 18 | |
112 64 -> 127 : 32 | |
113 128 -> 255 : 95 | |
114 256 -> 511 : 2194 |****** |
115 512 -> 1023 : 13830 |****************************************|
116 1024 -> 2047 : 9001 |************************** |
117 2048 -> 4095 : 1569 |**** |
118 4096 -> 8191 : 96 | |
119 8192 -> 16383 : 17 | |
120Detaching...
121
122funcinterval also supports tracepoint filter. The above two cases are under EMMC
123throughput testing, by those results you know which layer has a slower interval
124time. In our case, mmc-cmdqd is slower than block layer.
125
126
zhaoyao73f47b81a2021-06-30 13:15:26 -0400127# ./funcinterval -p `pidof -s top` c:malloc -i 3
128Tracing uprobe function for "malloc"... Hit Ctrl-C to end.
129
130 nsecs : count distribution
131 0 -> 1 : 0 | |
132 2 -> 3 : 0 | |
133 4 -> 7 : 0 | |
134 8 -> 15 : 0 | |
135 16 -> 31 : 0 | |
136 32 -> 63 : 0 | |
137 64 -> 127 : 0 | |
138 128 -> 255 : 0 | |
139 256 -> 511 : 0 | |
140 512 -> 1023 : 0 | |
141 1024 -> 2047 : 0 | |
142 2048 -> 4095 : 0 | |
143 4096 -> 8191 : 7 |************************* |
144 8192 -> 16383 : 11 |****************************************|
145 16384 -> 32767 : 4 |************** |
146 32768 -> 65535 : 1 |*** |
147 65536 -> 131071 : 1 |*** |
148 131072 -> 262143 : 1 |*** |
149 262144 -> 524287 : 0 | |
150 524288 -> 1048575 : 0 | |
151 1048576 -> 2097151 : 0 | |
152 2097152 -> 4194303 : 0 | |
153 4194304 -> 8388607 : 1 |*** |
154
155
156 nsecs : count distribution
157 0 -> 1 : 0 | |
158 2 -> 3 : 0 | |
159 4 -> 7 : 0 | |
160 8 -> 15 : 0 | |
161 16 -> 31 : 0 | |
162 32 -> 63 : 0 | |
163 64 -> 127 : 0 | |
164 128 -> 255 : 0 | |
165 256 -> 511 : 0 | |
166 512 -> 1023 : 0 | |
167 1024 -> 2047 : 0 | |
168 2048 -> 4095 : 0 | |
169 4096 -> 8191 : 8 |******************************** |
170 8192 -> 16383 : 10 |****************************************|
171 16384 -> 32767 : 4 |**************** |
172 32768 -> 65535 : 1 |**** |
173 65536 -> 131071 : 1 |**** |
174 131072 -> 262143 : 1 |**** |
175 262144 -> 524287 : 0 | |
176 524288 -> 1048575 : 0 | |
177 1048576 -> 2097151 : 0 | |
178 2097152 -> 4194303 : 0 | |
179 4194304 -> 8388607 : 1 |**** |
180
181Time the interval of libc's malloc for top utility every 3 seconds.
182
183# ./funcinterval /usr/local/bin/python:main
184Tracing uprobe function for "main"... Hit Ctrl-C to end.
185^C
186 nsecs : count distribution
187 0 -> 1 : 0 | |
188 2 -> 3 : 0 | |
189 4 -> 7 : 0 | |
190 8 -> 15 : 0 | |
191 16 -> 31 : 0 | |
192 32 -> 63 : 0 | |
193 64 -> 127 : 0 | |
194 128 -> 255 : 0 | |
195 256 -> 511 : 0 | |
196 512 -> 1023 : 0 | |
197 1024 -> 2047 : 0 | |
198 2048 -> 4095 : 0 | |
199 4096 -> 8191 : 0 | |
200 8192 -> 16383 : 0 | |
201 16384 -> 32767 : 0 | |
202 32768 -> 65535 : 0 | |
203 65536 -> 131071 : 0 | |
204 131072 -> 262143 : 0 | |
205 262144 -> 524287 : 0 | |
206 524288 -> 1048575 : 0 | |
207 1048576 -> 2097151 : 0 | |
208 2097152 -> 4194303 : 0 | |
209 4194304 -> 8388607 : 0 | |
210 8388608 -> 16777215 : 0 | |
211 16777216 -> 33554431 : 0 | |
212 33554432 -> 67108863 : 0 | |
213 67108864 -> 134217727 : 0 | |
214 134217728 -> 268435455 : 0 | |
215 268435456 -> 536870911 : 1 |****************************************|
216 536870912 -> 1073741823 : 1 |****************************************|
2171073741824 -> 2147483647 : 1 |****************************************|
2182147483648 -> 4294967295 : 1 |****************************************|
219Detaching...
220
221Time the interal of python's main function.
222
Edward Wu2188d232020-05-28 09:36:21 +0800223USAGE message:
224
225# ./funcinterval -h
226usage: funcinterval [-h] [-p PID] [-i INTERVAL] [-d DURATION] [-T] [-u] [-m]
227 [-v]
228 pattern
229
230Time interval and print latency as a histogram
231
232positional arguments:
233 pattern Function name for tracing
234
235optional arguments:
236 -h, --help show this help message and exit
237 -p PID, --pid PID trace this PID only
238 -i INTERVAL, --interval INTERVAL
239 summary interval, in seconds
240 -d DURATION, --duration DURATION
241 total duration of trace, in seconds
242 -T, --timestamp include timestamp on output
243 -u, --microseconds microsecond histogram
244 -m, --milliseconds millisecond histogram
245 -v, --verbose print the BPF program (for debugging purposes)
246
247examples:
248 # time the interval of do_sys_open()
249 ./funcinterval do_sys_open
250 # time the interval of xhci_ring_ep_doorbell(), in microseconds
251 ./funcinterval -u xhci_ring_ep_doorbell
252 # time the interval of do_nanosleep(), in milliseconds
253 ./funcinterval -m do_nanosleep
254 # output every 5 seconds, with timestamps
255 ./funcinterval -mTi 5 vfs_read
256 # time process 181 only
257 ./funcinterval -p 181 vfs_read
258 # time the interval of mm_vmscan_direct_reclaim_begin tracepoint
259 ./funcinterval t:vmscan:mm_vmscan_direct_reclaim_begin
zhaoyao73f47b81a2021-06-30 13:15:26 -0400260 # time the interval of c:malloc used by top every 3 seconds
261 ./funcinterval -p `pidof -s top` -i 3 c:malloc
262 # time /usr/local/bin/python main function
263 ./funcinterval /usr/local/bin/python:main