Demonstrations of tcptop, the Linux eBPF/bcc version.


tcptop summarizes throughput by host and port. Eg:

# tcptop
Tracing... Output every 1 secs. Hit Ctrl-C to end
<screen clears>
19:46:24 loadavg: 1.86 2.67 2.91 3/362 16681

PID    COMM         LADDR                 RADDR                 RX_KB  TX_KB
16648  16648        100.66.3.172:22       100.127.69.165:6684       1      0
16647  sshd         100.66.3.172:22       100.127.69.165:6684       0   2149
14374  sshd         100.66.3.172:22       100.127.69.165:25219      0      0
14458  sshd         100.66.3.172:22       100.127.69.165:7165       0      0

PID    COMM         LADDR6                          RADDR6                          RX_KB  TX_KB
16681  sshd         fe80::8a3:9dff:fed5:6b19:22     fe80::8a3:9dff:fed5:6b19:16606      1      1
16679  ssh          fe80::8a3:9dff:fed5:6b19:16606  fe80::8a3:9dff:fed5:6b19:22         1      1
16680  sshd         fe80::8a3:9dff:fed5:6b19:22     fe80::8a3:9dff:fed5:6b19:16606      0      0

This example output shows two listings of TCP connections, one for IPv4 and
one for IPv6. If there is traffic for only one of these, then only one group is
shown.
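To illustrate the grouping, here is a minimal Python sketch. It is not tcptop's actual code. It assumes hypothetical per-session byte counts keyed by (pid, laddr, lport, raddr, rport), infers the address family with Python's ipaddress module, and drops any family with no sessions so that only one group is printed when only one family has traffic.

```python
import ipaddress

# Hypothetical sample data: (pid, laddr, lport, raddr, rport) -> (rx_bytes, tx_bytes).
# In tcptop the counts come from BPF maps; here they are plain Python values.
sessions = {
    (16647, "100.66.3.172", 22, "100.127.69.165", 6684): (0, 2200576),
    (16681, "fe80::8a3:9dff:fed5:6b19", 22,
     "fe80::8a3:9dff:fed5:6b19", 16606): (1024, 1024),
}


def listings(sessions):
    """Split sessions into IPv4 and IPv6 groups, omitting any empty group."""
    groups = {4: {}, 6: {}}
    for key, counts in sessions.items():
        family = ipaddress.ip_address(key[1]).version  # 4 or 6
        groups[family][key] = counts
    # Only families that saw traffic get a listing.
    return [(family, groups[family]) for family in (4, 6) if groups[family]]


for family, group in listings(sessions):
    print("IPv%d: %d session(s)" % (family, len(group)))
```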

The output in each listing is sorted by total throughput (sent plus received
bytes), and each value is rounded down (floor) to whole Kbytes when printed. The
example output shows that PID 16647, sshd, transmitted 2149 Kbytes during the
tracing interval. The other IPv4 sessions had such low throughput that they
rounded down to zero.
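Here is a minimal Python sketch of the sort-and-floor step. It is not the tool's code. The rows are hypothetical (pid, comm, laddr, raddr, rx_bytes, tx_bytes) tuples, sorted by rx_bytes + tx_bytes in descending order. Integer division by 1024 does the flooring. The printf-style column widths are illustrative only.

```python
# Hypothetical rows: (pid, comm, laddr, raddr, rx_bytes, tx_bytes).
rows = [
    (14374, "sshd", "100.66.3.172:22", "100.127.69.165:25219", 120, 300),
    (16647, "sshd", "100.66.3.172:22", "100.127.69.165:6684", 0, 2201000),
    (16648, "16648", "100.66.3.172:22", "100.127.69.165:6684", 1500, 0),
]


def format_listing(rows):
    """Sort by total bytes (rx + tx), largest first, and floor counts to Kbytes."""
    out = []
    for pid, comm, laddr, raddr, rx, tx in sorted(
            rows, key=lambda r: r[4] + r[5], reverse=True):
        # Integer division floors, so anything under 1024 bytes prints as 0.
        out.append((pid, comm, laddr, raddr, rx // 1024, tx // 1024))
    return out


for row in format_listing(rows):
    print("%-6d %-12.12s %-21s %-21s %6d %6d" % row)
```

With these sample numbers, 2201000 bytes floors to 2149 Kbytes. The 120-byte and 300-byte counts both print as 0, like the low-throughput sessions above.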

All TCP sessions are included, even those over loopback.

The session with the process name (COMM) of 16648 is really a short-lived
process with PID 16648 that had already exited by the time the output was
printed, so its name could not be fetched. If this behavior is a serious issue
for you, you can modify the tool's code to include bpf_get_current_comm() in
the key structs, so that the name is fetched during the event and will always
be seen. I did it this way to start with, but it measurably increased the
overhead of this tool, so I switched to the asynchronous model.
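Here is a minimal Python sketch of the asynchronous model. It assumes the name is looked up from /proc/PID/comm only when the output is printed. If the process has already exited, the lookup fails and the PID string is used as the name, which produces entries like the 16648 one above. The function name pid_to_comm is illustrative.

```python
import os


def pid_to_comm(pid):
    """Look up a process name at print time, falling back to the PID string.

    If the process has exited (or /proc is unavailable), opening
    /proc/<pid>/comm raises OSError and the PID is returned instead.
    """
    try:
        with open("/proc/%d/comm" % pid) as f:
            return f.read().rstrip("\n")
    except OSError:
        return str(pid)


print(pid_to_comm(os.getpid()))
```

Calling bpf_get_current_comm() in the BPF program instead records the name at event time. That avoids the fallback, but it adds work to every traced event.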

The overhead is relative to the TCP event rate (the rate of tcp_sendmsg() and
tcp_recvmsg() or tcp_cleanup_rbuf()). Due to buffering, this should be lower
than the packet rate. You can measure the rate of these functions using
funccount. Testing on some production servers found total rates of 4k to 15k
events per second, and the CPU overhead at these rates ranged from 0.5% to 2.0%
of one CPU. Your workloads may have higher rates, and therefore higher overhead,
or lower rates.
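The following is a rough worked estimate in Python. It relies on two assumptions that the text above does not state. First, that the 4k/s server is the one that measured 0.5% and the 15k/s server is the one that measured 2.0%. Second, that overhead scales linearly with the event rate. Under those assumptions, both data points imply about 1.25 to 1.33 microseconds of CPU time per traced event. You could then scale that figure by a rate you measured with funccount.

```python
def per_event_cost_us(events_per_sec, cpu_fraction):
    """CPU time per traced event, in microseconds, given a fraction of one CPU."""
    return cpu_fraction / events_per_sec * 1e6


def overhead_pct(events_per_sec, cost_us):
    """Percent of one CPU used at a given event rate and per-event cost."""
    return events_per_sec * cost_us / 1e6 * 100


# Assumed pairing of the measurements quoted above (not stated in the text).
low = per_event_cost_us(4000, 0.005)     # 0.5% of one CPU at 4k events/s
high = per_event_cost_us(15000, 0.020)   # 2.0% of one CPU at 15k events/s
print("per-event cost: %.2f to %.2f us" % (low, high))

# Hypothetical busier server, e.g. a rate you measured with funccount.
print("estimated overhead at 100k/s: %.1f%% of one CPU"
      % overhead_pct(100000, high))
```

At a hypothetical 100k events per second, this extrapolates to roughly 13% of one CPU. Real overhead may not scale linearly.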


I much prefer not clearing the screen, so that historic output remains in the
scroll-back buffer and patterns or intermittent issues are easier to see. You
can do this with -C:

# tcptop -C
Tracing... Output every 1 secs. Hit Ctrl-C to end

20:27:12 loadavg: 0.08 0.02 0.17 2/367 17342

PID    COMM         LADDR                 RADDR                 RX_KB  TX_KB
17287  17287        100.66.3.172:22       100.127.69.165:57585      3      1
17286  sshd         100.66.3.172:22       100.127.69.165:57585      0      1
14374  sshd         100.66.3.172:22       100.127.69.165:25219      0      0

20:27:13 loadavg: 0.08 0.02 0.17 1/367 17342

PID    COMM         LADDR                 RADDR                 RX_KB  TX_KB
17286  sshd         100.66.3.172:22       100.127.69.165:57585      1   7761
14374  sshd         100.66.3.172:22       100.127.69.165:25219      0      0

20:27:14 loadavg: 0.08 0.02 0.17 2/365 17347

PID    COMM         LADDR                 RADDR                 RX_KB  TX_KB
17286  17286        100.66.3.172:22       100.127.69.165:57585      1   2501
14374  sshd         100.66.3.172:22       100.127.69.165:25219      0      0

20:27:15 loadavg: 0.07 0.02 0.17 2/367 17403

PID    COMM         LADDR                 RADDR                 RX_KB  TX_KB
17349  17349        100.66.3.172:22       100.127.69.165:10161      3      1
17348  sshd         100.66.3.172:22       100.127.69.165:10161      0      1
14374  sshd         100.66.3.172:22       100.127.69.165:25219      0      0

20:27:16 loadavg: 0.07 0.02 0.17 1/367 17403

PID    COMM         LADDR                 RADDR                 RX_KB  TX_KB
17348  sshd         100.66.3.172:22       100.127.69.165:10161   3333      0
14374  sshd         100.66.3.172:22       100.127.69.165:25219      0      0

20:27:17 loadavg: 0.07 0.02 0.17 2/366 17409

PID    COMM         LADDR                 RADDR                 RX_KB  TX_KB
17348  17348        100.66.3.172:22       100.127.69.165:10161   6909      2

You can disable the loadavg summary line with -S if needed.

The --cgroupmap option filters based on a cgroup set. It is meant to be used
with an externally created map.

# tcptop --cgroupmap /sys/fs/bpf/test01

For more details, see docs/special_filtering.md.
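Here is a hedged Python sketch of one step in preparing such an externally created map: computing a cgroup ID and formatting it as the raw bytes of a map key. Several assumptions here are not taken from this document:

- the map is keyed by a 64-bit cgroup ID;
- on a 64-bit kernel, the cgroup v2 ID equals the inode number of the cgroup directory;
- the host is little-endian, such as x86-64.

The cgroup path below is hypothetical. Check docs/special_filtering.md for the actual map layout and tooling.

```python
import os
import struct


def cgroup_id_from_path(cgroup_dir):
    """Return the inode number of a cgroup v2 directory.

    Assumption: on 64-bit kernels this matches the cgroup ID that BPF
    programs see via bpf_get_current_cgroup_id().
    """
    return os.stat(cgroup_dir).st_ino


def cgroup_key_hex(cgroup_id):
    """Encode a cgroup ID as 8 little-endian bytes, printed as space-separated hex."""
    return " ".join("%02x" % b for b in struct.pack("<Q", cgroup_id))


# Hypothetical cgroup path; adjust for your system's cgroup v2 mount point.
path = "/sys/fs/cgroup/system.slice"
if os.path.isdir(path):
    print(cgroup_key_hex(cgroup_id_from_path(path)))
```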


USAGE:

# tcptop -h
usage: tcptop.py [-h] [-C] [-S] [-p PID] [--cgroupmap CGROUPMAP]
                 [--mntnsmap MNTNSMAP]
                 [interval] [count] [-4 | -6]

Summarize TCP send/recv throughput by host

positional arguments:
  interval              output interval, in seconds (default 1)
  count                 number of outputs

optional arguments:
  -h, --help            show this help message and exit
  -C, --noclear         don't clear the screen
  -S, --nosummary       skip system summary line
  -p PID, --pid PID     trace this PID only
  --cgroupmap CGROUPMAP
                        trace cgroups in this BPF map only
  -4, --ipv4            trace IPv4 family only
  -6, --ipv6            trace IPv6 family only

examples:
    ./tcptop                       # trace TCP send/recv by host
    ./tcptop -C                    # don't clear the screen
    ./tcptop -p 181                # only trace PID 181
    ./tcptop --cgroupmap ./mappath # only trace cgroups in this BPF map
    ./tcptop --mntnsmap mappath    # only trace mount namespaces in the map
    ./tcptop -4                    # trace IPv4 family only
    ./tcptop -6                    # trace IPv6 family only
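The usage message above has the shape of Python argparse output, so the interface can be sketched with argparse. This is a reconstruction from the help text, not tcptop's source. The type=int conversions and the count default of None (meaning "run until Ctrl-C") are assumptions. The -4 and -6 flags go in a mutually exclusive group to match [-4 | -6].

```python
import argparse


def build_parser():
    """Rebuild the command-line interface described by the usage text (a sketch)."""
    parser = argparse.ArgumentParser(
        description="Summarize TCP send/recv throughput by host")
    parser.add_argument("-C", "--noclear", action="store_true",
                        help="don't clear the screen")
    parser.add_argument("-S", "--nosummary", action="store_true",
                        help="skip system summary line")
    parser.add_argument("-p", "--pid", type=int, help="trace this PID only")
    parser.add_argument("--cgroupmap",
                        help="trace cgroups in this BPF map only")
    parser.add_argument("--mntnsmap",
                        help="trace mount namespaces in this BPF map only")
    # -4 and -6 cannot be combined, matching "[-4 | -6]" in the usage line.
    family = parser.add_mutually_exclusive_group()
    family.add_argument("-4", "--ipv4", action="store_true",
                        help="trace IPv4 family only")
    family.add_argument("-6", "--ipv6", action="store_true",
                        help="trace IPv6 family only")
    parser.add_argument("interval", nargs="?", type=int, default=1,
                        help="output interval, in seconds (default 1)")
    parser.add_argument("count", nargs="?", type=int, default=None,
                        help="number of outputs")
    return parser


print(build_parser().parse_args(["-C", "-4", "5", "3"]))
```

Because -4 and -6 look like negative numbers, argparse treats them as options once they are defined. As a side effect, a negative positional value would be rejected as an unrecognized option.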