Edward Wu | 2188d23 | 2020-05-28 09:36:21 +0800 | [diff] [blame] | 1 | Demonstrations of funcinterval, the Linux eBPF/bcc version. |
| 2 | |
| 3 | eBPF/bcc is very suitable for platform performance tuning. |
| 4 | By funclatency, we can profile specific functions to know how latency |
| 5 | this function costs. However, sometimes performance drop is not about the |
| 6 | latency of function but the interval between function calls. |
| 7 | funcinterval is born for this purpose. |
| 8 | |
| 9 | Another story, hardware performance tuning on the platform we will use |
| 10 | protocol analyzer to analyze performance, but most protocol analyzers lack |
| 11 | the distribution feature. Using a protocol analyzer you need a lot of time |
| 12 | to check every detail latency. By funcinterval, we can save a lot of time |
| 13 | by distribution feature. |
| 14 | |
| 15 | For example: |
| 16 | |
| 17 | # ./funcinterval xhci_ring_ep_doorbell -d 2 -u |
| 18 | Tracing 1 functions for "xhci_ring_ep_doorbell"... Hit Ctrl-C to end. |
| 19 | |
| 20 | usecs : count distribution |
| 21 | 0 -> 1 : 0 | | |
| 22 | 2 -> 3 : 0 | | |
| 23 | 4 -> 7 : 0 | | |
| 24 | 8 -> 15 : 0 | | |
| 25 | 16 -> 31 : 0 | | |
| 26 | 32 -> 63 : 134 | | |
| 27 | 64 -> 127 : 2862 |******************** | |
| 28 | 128 -> 255 : 5552 |****************************************| |
| 29 | 256 -> 511 : 216 |* | |
| 30 | 512 -> 1023 : 2 | | |
| 31 | Detaching... |
| 32 | |
| 33 | This example output shows that the interval latency of xhci_ring_ep_doorbell |
| 34 | took between 64 and 255 microseconds. USB MAC will start its job after USB |
| 35 | doorbell register ringing, above information that can help hardware engineer to |
| 36 | analyze, the performance drop is because software rings the doorbell too |
| 37 | late or just slowly hardware DMA. |
| 38 | |
| 39 | # ./funcinterval blk_start_request -i 2 -u |
| 40 | Tracing 1 functions for "blk_start_request"... Hit Ctrl-C to end. |
| 41 | |
| 42 | usecs : count distribution |
| 43 | 0 -> 1 : 0 | | |
| 44 | 2 -> 3 : 0 | | |
| 45 | 4 -> 7 : 0 | | |
| 46 | 8 -> 15 : 5 |* | |
| 47 | 16 -> 31 : 0 | | |
| 48 | 32 -> 63 : 1 | | |
| 49 | 64 -> 127 : 2 | | |
| 50 | 128 -> 255 : 1 | | |
| 51 | 256 -> 511 : 0 | | |
| 52 | 512 -> 1023 : 1 | | |
| 53 | 1024 -> 2047 : 1 | | |
| 54 | 2048 -> 4095 : 117 |****************************************| |
| 55 | 4096 -> 8191 : 13 |**** | |
| 56 | 8192 -> 16383 : 1 | | |
| 57 | |
| 58 | If using biolatency tool that has no difference between two platforms. |
| 59 | Maybe the problem is related to the interval time instead of block |
| 60 | device I/O latency. |
| 61 | |
| 62 | # ./funcinterval ion_ioctl -i 2 -m |
| 63 | Tracing 1 functions for "ion_ioctl"... Hit Ctrl-C to end. |
| 64 | |
| 65 | msecs : count distribution |
| 66 | 0 -> 1 : 215 |****************************************| |
| 67 | 2 -> 3 : 0 | | |
| 68 | 4 -> 7 : 4 | | |
| 69 | 8 -> 15 : 5 | | |
| 70 | 16 -> 31 : 29 |***** | |
| 71 | |
| 72 | You can also check the ion_ioctl. By the above output, we know the activity |
| 73 | frequency of ion_ioctl() is high mostly(less than 1 ms), but has 29 times low |
| 74 | frequency. |
| 75 | |
| 76 | # ./funcinterval t:block:block_bio_queue -d 30 -u |
| 77 | Tracing tracepoint for "block:block_bio_queue"... Hit Ctrl-C to end. |
| 78 | |
| 79 | usecs : count distribution |
| 80 | 0 -> 1 : 0 | | |
| 81 | 2 -> 3 : 0 | | |
| 82 | 4 -> 7 : 0 | | |
| 83 | 8 -> 15 : 2 | | |
| 84 | 16 -> 31 : 262 | | |
| 85 | 32 -> 63 : 9075 |******************* | |
| 86 | 64 -> 127 : 18668 |****************************************| |
| 87 | 128 -> 255 : 1492 |*** | |
| 88 | 256 -> 511 : 2616 |***** | |
| 89 | 512 -> 1023 : 7226 |*************** | |
| 90 | 1024 -> 2047 : 8982 |******************* | |
| 91 | 2048 -> 4095 : 2394 |***** | |
| 92 | 4096 -> 8191 : 163 | | |
| 93 | 8192 -> 16383 : 42 | | |
| 94 | 16384 -> 32767 : 2 | | |
| 95 | 32768 -> 65535 : 0 | | |
| 96 | 65536 -> 131071 : 0 | | |
| 97 | 131072 -> 262143 : 0 | | |
| 98 | 262144 -> 524287 : 0 | | |
| 99 | 524288 -> 1048575 : 1 | | |
| 100 | Detaching... |
| 101 | |
| 102 | # ./funcinterval t:block:block_rq_issue -d 30 -u |
| 103 | Tracing tracepoint for "block:block_rq_issue"... Hit Ctrl-C to end. |
| 104 | |
| 105 | usecs : count distribution |
| 106 | 0 -> 1 : 0 | | |
| 107 | 2 -> 3 : 0 | | |
| 108 | 4 -> 7 : 0 | | |
| 109 | 8 -> 15 : 0 | | |
| 110 | 16 -> 31 : 5 | | |
| 111 | 32 -> 63 : 18 | | |
| 112 | 64 -> 127 : 32 | | |
| 113 | 128 -> 255 : 95 | | |
| 114 | 256 -> 511 : 2194 |****** | |
| 115 | 512 -> 1023 : 13830 |****************************************| |
| 116 | 1024 -> 2047 : 9001 |************************** | |
| 117 | 2048 -> 4095 : 1569 |**** | |
| 118 | 4096 -> 8191 : 96 | | |
| 119 | 8192 -> 16383 : 17 | | |
| 120 | Detaching... |
| 121 | |
| 122 | funcinterval also supports tracepoint filter. The above two cases are under EMMC |
| 123 | throughput testing, by those results you know which layer has a slower interval |
| 124 | time. In our case, mmc-cmdqd is slower than block layer. |
| 125 | |
| 126 | |
zhaoyao73 | f47b81a | 2021-06-30 13:15:26 -0400 | [diff] [blame] | 127 | # ./funcinterval -p `pidof -s top` c:malloc -i 3 |
| 128 | Tracing uprobe function for "malloc"... Hit Ctrl-C to end. |
| 129 | |
| 130 | nsecs : count distribution |
| 131 | 0 -> 1 : 0 | | |
| 132 | 2 -> 3 : 0 | | |
| 133 | 4 -> 7 : 0 | | |
| 134 | 8 -> 15 : 0 | | |
| 135 | 16 -> 31 : 0 | | |
| 136 | 32 -> 63 : 0 | | |
| 137 | 64 -> 127 : 0 | | |
| 138 | 128 -> 255 : 0 | | |
| 139 | 256 -> 511 : 0 | | |
| 140 | 512 -> 1023 : 0 | | |
| 141 | 1024 -> 2047 : 0 | | |
| 142 | 2048 -> 4095 : 0 | | |
| 143 | 4096 -> 8191 : 7 |************************* | |
| 144 | 8192 -> 16383 : 11 |****************************************| |
| 145 | 16384 -> 32767 : 4 |************** | |
| 146 | 32768 -> 65535 : 1 |*** | |
| 147 | 65536 -> 131071 : 1 |*** | |
| 148 | 131072 -> 262143 : 1 |*** | |
| 149 | 262144 -> 524287 : 0 | | |
| 150 | 524288 -> 1048575 : 0 | | |
| 151 | 1048576 -> 2097151 : 0 | | |
| 152 | 2097152 -> 4194303 : 0 | | |
| 153 | 4194304 -> 8388607 : 1 |*** | |
| 154 | |
| 155 | |
| 156 | nsecs : count distribution |
| 157 | 0 -> 1 : 0 | | |
| 158 | 2 -> 3 : 0 | | |
| 159 | 4 -> 7 : 0 | | |
| 160 | 8 -> 15 : 0 | | |
| 161 | 16 -> 31 : 0 | | |
| 162 | 32 -> 63 : 0 | | |
| 163 | 64 -> 127 : 0 | | |
| 164 | 128 -> 255 : 0 | | |
| 165 | 256 -> 511 : 0 | | |
| 166 | 512 -> 1023 : 0 | | |
| 167 | 1024 -> 2047 : 0 | | |
| 168 | 2048 -> 4095 : 0 | | |
| 169 | 4096 -> 8191 : 8 |******************************** | |
| 170 | 8192 -> 16383 : 10 |****************************************| |
| 171 | 16384 -> 32767 : 4 |**************** | |
| 172 | 32768 -> 65535 : 1 |**** | |
| 173 | 65536 -> 131071 : 1 |**** | |
| 174 | 131072 -> 262143 : 1 |**** | |
| 175 | 262144 -> 524287 : 0 | | |
| 176 | 524288 -> 1048575 : 0 | | |
| 177 | 1048576 -> 2097151 : 0 | | |
| 178 | 2097152 -> 4194303 : 0 | | |
| 179 | 4194304 -> 8388607 : 1 |**** | |
| 180 | |
| 181 | Time the interval of libc's malloc for top utility every 3 seconds. |
| 182 | |
| 183 | # ./funcinterval /usr/local/bin/python:main |
| 184 | Tracing uprobe function for "main"... Hit Ctrl-C to end. |
| 185 | ^C |
| 186 | nsecs : count distribution |
| 187 | 0 -> 1 : 0 | | |
| 188 | 2 -> 3 : 0 | | |
| 189 | 4 -> 7 : 0 | | |
| 190 | 8 -> 15 : 0 | | |
| 191 | 16 -> 31 : 0 | | |
| 192 | 32 -> 63 : 0 | | |
| 193 | 64 -> 127 : 0 | | |
| 194 | 128 -> 255 : 0 | | |
| 195 | 256 -> 511 : 0 | | |
| 196 | 512 -> 1023 : 0 | | |
| 197 | 1024 -> 2047 : 0 | | |
| 198 | 2048 -> 4095 : 0 | | |
| 199 | 4096 -> 8191 : 0 | | |
| 200 | 8192 -> 16383 : 0 | | |
| 201 | 16384 -> 32767 : 0 | | |
| 202 | 32768 -> 65535 : 0 | | |
| 203 | 65536 -> 131071 : 0 | | |
| 204 | 131072 -> 262143 : 0 | | |
| 205 | 262144 -> 524287 : 0 | | |
| 206 | 524288 -> 1048575 : 0 | | |
| 207 | 1048576 -> 2097151 : 0 | | |
| 208 | 2097152 -> 4194303 : 0 | | |
| 209 | 4194304 -> 8388607 : 0 | | |
| 210 | 8388608 -> 16777215 : 0 | | |
| 211 | 16777216 -> 33554431 : 0 | | |
| 212 | 33554432 -> 67108863 : 0 | | |
| 213 | 67108864 -> 134217727 : 0 | | |
| 214 | 134217728 -> 268435455 : 0 | | |
| 215 | 268435456 -> 536870911 : 1 |****************************************| |
| 216 | 536870912 -> 1073741823 : 1 |****************************************| |
| 217 | 1073741824 -> 2147483647 : 1 |****************************************| |
| 218 | 2147483648 -> 4294967295 : 1 |****************************************| |
| 219 | Detaching... |
| 220 | |
| 221 | Time the interal of python's main function. |
| 222 | |
Edward Wu | 2188d23 | 2020-05-28 09:36:21 +0800 | [diff] [blame] | 223 | USAGE message: |
| 224 | |
| 225 | # ./funcinterval -h |
| 226 | usage: funcinterval [-h] [-p PID] [-i INTERVAL] [-d DURATION] [-T] [-u] [-m] |
| 227 | [-v] |
| 228 | pattern |
| 229 | |
| 230 | Time interval and print latency as a histogram |
| 231 | |
| 232 | positional arguments: |
| 233 | pattern Function name for tracing |
| 234 | |
| 235 | optional arguments: |
| 236 | -h, --help show this help message and exit |
| 237 | -p PID, --pid PID trace this PID only |
| 238 | -i INTERVAL, --interval INTERVAL |
| 239 | summary interval, in seconds |
| 240 | -d DURATION, --duration DURATION |
| 241 | total duration of trace, in seconds |
| 242 | -T, --timestamp include timestamp on output |
| 243 | -u, --microseconds microsecond histogram |
| 244 | -m, --milliseconds millisecond histogram |
| 245 | -v, --verbose print the BPF program (for debugging purposes) |
| 246 | |
| 247 | examples: |
| 248 | # time the interval of do_sys_open() |
| 249 | ./funcinterval do_sys_open |
| 250 | # time the interval of xhci_ring_ep_doorbell(), in microseconds |
| 251 | ./funcinterval -u xhci_ring_ep_doorbell |
| 252 | # time the interval of do_nanosleep(), in milliseconds |
| 253 | ./funcinterval -m do_nanosleep |
| 254 | # output every 5 seconds, with timestamps |
| 255 | ./funcinterval -mTi 5 vfs_read |
| 256 | # time process 181 only |
| 257 | ./funcinterval -p 181 vfs_read |
| 258 | # time the interval of mm_vmscan_direct_reclaim_begin tracepoint |
| 259 | ./funcinterval t:vmscan:mm_vmscan_direct_reclaim_begin |
zhaoyao73 | f47b81a | 2021-06-30 13:15:26 -0400 | [diff] [blame] | 260 | # time the interval of c:malloc used by top every 3 seconds |
| 261 | ./funcinterval -p `pidof -s top` -i 3 c:malloc |
| 262 | # time /usr/local/bin/python main function |
| 263 | ./funcinterval /usr/local/bin/python:main |