Demonstrations of tcptop, the Linux eBPF/bcc version.
Brendan Gregg, 2016-10-04 (commit 60393ea).


tcptop summarizes TCP throughput by host and port. For example:

# tcptop
Tracing... Output every 1 secs. Hit Ctrl-C to end
<screen clears>
19:46:24 loadavg: 1.86 2.67 2.91 3/362 16681

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
16648  16648        100.66.3.172:22       100.127.69.165:6684        1      0
16647  sshd         100.66.3.172:22       100.127.69.165:6684        0   2149
14374  sshd         100.66.3.172:22       100.127.69.165:25219       0      0
14458  sshd         100.66.3.172:22       100.127.69.165:7165        0      0

PID    COMM         LADDR6                           RADDR6                            RX_KB  TX_KB
16681  sshd         fe80::8a3:9dff:fed5:6b19:22      fe80::8a3:9dff:fed5:6b19:16606        1      1
16679  ssh          fe80::8a3:9dff:fed5:6b19:16606   fe80::8a3:9dff:fed5:6b19:22           1      1
16680  sshd         fe80::8a3:9dff:fed5:6b19:22      fe80::8a3:9dff:fed5:6b19:16606        0      0

This example output shows two listings of TCP connections, one for IPv4 and
one for IPv6. If there is only traffic for one of these, then only one group
is shown.

The output in each listing is sorted by total throughput (send then receive),
and when printed it is rounded down (floor) to a whole Kbyte. The example
output shows PID 16647, sshd, transmitted 2149 Kbytes during the tracing
interval. The other IPv4 sshd sessions (PIDs 14374 and 14458) had such low
throughput that they rounded to zero.
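The rounding described above can be sketched in a few lines of Python. This is
an illustration only, not the tool's code: the byte counts are made up, the
helper names are hypothetical, and the sort key (sent plus received bytes,
descending) is an assumption about what "total throughput" means.

```python
# Hypothetical per-session byte counters for one interval:
# (pid, comm) -> (sent_bytes, received_bytes). The values are made up.
sessions = {
    (16647, "sshd"): (2_200_900, 300),
    (14374, "sshd"): (512, 0),
    (14458, "sshd"): (0, 900),
}


def to_kbytes(nbytes):
    """Round down to whole Kbytes, so anything under 1024 bytes prints as 0."""
    return nbytes // 1024


def summarize(sessions):
    """Return (pid, comm, rx_kb, tx_kb) rows, busiest session first."""
    # Assumption: "total throughput" is sent + received bytes, descending.
    ordered = sorted(sessions.items(), key=lambda kv: sum(kv[1]), reverse=True)
    return [(pid, comm, to_kbytes(rx), to_kbytes(tx))
            for (pid, comm), (tx, rx) in ordered]


rows = summarize(sessions)
for row in rows:
    print("%-6d %-12s %6d %6d" % row)
```

Sorting happens on the raw byte counts, before rounding, which is why sessions
that all print as 0 can still appear in a definite order.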

All TCP sessions, including those over loopback, are included.

The session with the process name (COMM) of 16648 is really a short-lived
process with PID 16648, where we didn't catch the process name when printing
the output. If this behavior is a serious issue for you, you can modify the
tool's code to include bpf_get_current_comm() in the key structs, so that the
name is fetched during the event and will always be seen. I did it this way to
start with, but it measurably increased the overhead of this tool, so I
switched to the asynchronous model.
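As a rough illustration of the asynchronous model, the sketch below looks up
the name at print time and falls back to the PID when the process is gone.
It assumes the name is read from the Linux procfs file /proc/PID/comm; the
helper name is hypothetical and this is not presented as the tool's exact code.

```python
def pid_to_comm(pid):
    """Look up a process name at print time; fall back to the PID as a string."""
    try:
        with open("/proc/%d/comm" % pid) as f:
            return f.read().rstrip()
    except OSError:
        # The process has already exited (or /proc is unavailable), so its
        # name can no longer be read and the PID is shown in its place.
        return str(pid)


# A PID above the Linux maximum (2**22) never exists, so this prints the PID.
print(pid_to_comm(2**22 + 1))
```

The trade-off is the one described above: no per-event cost for fetching the
name, at the price of occasionally printing a PID for short-lived processes.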

The overhead is relative to the TCP event rate (the rate of tcp_sendmsg() and
tcp_recvmsg() or tcp_cleanup_rbuf() calls). Due to buffering, this should be
lower than the packet rate. You can measure the rate of these using funccount.
Tests on some sample production servers found total rates of 4k to 15k events
per second. The CPU overhead at these rates ranged from 0.5% to 2.0% of one
CPU. Your workloads may have higher rates, and therefore higher overhead, or
lower rates.
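For a back-of-the-envelope estimate, the figures above can be turned into a
cost per event. This sketch assumes the lowest rate corresponds to the lowest
overhead figure and that overhead scales linearly with event rate; neither is
stated in the measurements, and the function names are hypothetical.

```python
def per_event_cost_us(cpu_fraction, events_per_sec):
    """CPU time per traced event, in microseconds, for overhead on one CPU."""
    return cpu_fraction / events_per_sec * 1e6


# Assumption: the lowest measured rate pairs with the lowest overhead figure.
low = per_event_cost_us(0.005, 4000)    # 0.5% of one CPU at 4k events/s
high = per_event_cost_us(0.020, 15000)  # 2.0% of one CPU at 15k events/s


def estimated_overhead_pct(events_per_sec, cost_us=high):
    """Rough overhead (% of one CPU) for an event rate measured with funccount."""
    return events_per_sec * cost_us / 1e6 * 100


print("per-event cost: %.2f to %.2f us" % (low, high))
print("estimate at 30k events/s: %.1f%% of one CPU" % estimated_overhead_pct(30000))
```

Treat the result as an order-of-magnitude guide only; measure on your own
systems before relying on it.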


I much prefer not clearing the screen, so that historic output is in the
scroll-back buffer, and patterns or intermittent issues can be better seen.
You can do this with -C:

# tcptop -C
Tracing... Output every 1 secs. Hit Ctrl-C to end

20:27:12 loadavg: 0.08 0.02 0.17 2/367 17342

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
17287  17287        100.66.3.172:22       100.127.69.165:57585       3      1
17286  sshd         100.66.3.172:22       100.127.69.165:57585       0      1
14374  sshd         100.66.3.172:22       100.127.69.165:25219       0      0

20:27:13 loadavg: 0.08 0.02 0.17 1/367 17342

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
17286  sshd         100.66.3.172:22       100.127.69.165:57585       1   7761
14374  sshd         100.66.3.172:22       100.127.69.165:25219       0      0

20:27:14 loadavg: 0.08 0.02 0.17 2/365 17347

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
17286  17286        100.66.3.172:22       100.127.69.165:57585       1   2501
14374  sshd         100.66.3.172:22       100.127.69.165:25219       0      0

20:27:15 loadavg: 0.07 0.02 0.17 2/367 17403

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
17349  17349        100.66.3.172:22       100.127.69.165:10161       3      1
17348  sshd         100.66.3.172:22       100.127.69.165:10161       0      1
14374  sshd         100.66.3.172:22       100.127.69.165:25219       0      0

20:27:16 loadavg: 0.07 0.02 0.17 1/367 17403

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
17348  sshd         100.66.3.172:22       100.127.69.165:10161    3333      0
14374  sshd         100.66.3.172:22       100.127.69.165:25219       0      0

20:27:17 loadavg: 0.07 0.02 0.17 2/366 17409

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
17348  17348        100.66.3.172:22       100.127.69.165:10161    6909      2

You can disable the loadavg summary line with -S if needed.
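Since -C output accumulates rather than being overwritten, it can also be
saved and post-processed. The sketch below parses data rows in the format
shown above into Python dicts. The function name and dict keys are
hypothetical, and it assumes that COMM contains no whitespace.

```python
def parse_row(line):
    """Split one data row into a dict; return None for any other line."""
    fields = line.split()
    if len(fields) != 6:
        return None  # timestamp/loadavg lines, banners, and blank lines
    pid, comm, laddr, raddr, rx_kb, tx_kb = fields
    if not (pid.isdigit() and rx_kb.isdigit() and tx_kb.isdigit()):
        return None  # the PID/COMM/... header line
    if ":" not in laddr or ":" not in raddr:
        return None
    # The port follows the last colon, which also works for IPv6 addresses.
    lhost, lport = laddr.rsplit(":", 1)
    rhost, rport = raddr.rsplit(":", 1)
    return {
        "pid": int(pid), "comm": comm,
        "lhost": lhost, "lport": int(lport),
        "rhost": rhost, "rport": int(rport),
        "rx_kb": int(rx_kb), "tx_kb": int(tx_kb),
    }


sample = "17286  sshd  100.66.3.172:22  100.127.69.165:57585  1  7761"
print(parse_row(sample))
```

Lines that are not data rows (the banner, the loadavg line, and the column
headers) return None, so a caller can feed it every line of saved output and
keep only the results that are not None.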


USAGE:

# tcptop -h
usage: tcptop.py [-h] [-C] [-S] [-p PID] [interval] [count]

Summarize TCP send/recv throughput by host

positional arguments:
  interval           output interval, in seconds (default 1)
  count              number of outputs

optional arguments:
  -h, --help         show this help message and exit
  -C, --noclear      don't clear the screen
  -S, --nosummary    skip system summary line
  -p PID, --pid PID  trace this PID only

examples:
    ./tcptop           # trace TCP send/recv by host
    ./tcptop -C        # don't clear the screen
    ./tcptop -p 181    # only trace PID 181