Demonstrations of tcpretrans, the Linux eBPF/bcc version.


This tool traces the kernel TCP retransmit function to show details of these
retransmits. For example:

# ./tcpretrans
TIME     PID    IP LADDR:LPORT          T> RADDR:RPORT          STATE
01:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED
01:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED
01:55:17 0      4  10.153.223.157:22    R> 69.53.245.40:22957   ESTABLISHED
[...]

This output shows three TCP retransmits. The first two were for an IPv4
connection from 10.153.223.157 port 22 to 69.53.245.40 port 34619, and the
third was for a connection to port 22957. The TCP state was "ESTABLISHED" at
the time of each retransmit. The PID column shows the process that was
on-CPU at the time of the retransmit, in this case 0 (the kernel, which will
be the case most of the time).
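
If you save this output for later analysis, each data row can be split into
the fields described above. The following is a minimal sketch, assuming the
default whitespace-separated layout shown here and that no field contains
spaces; RetransEvent, split_addr_port and parse_event are hypothetical
helpers, not part of bcc.

```python
from typing import NamedTuple, Optional, Tuple


class RetransEvent(NamedTuple):
    """One row of tcpretrans default output (hypothetical helper type)."""
    time: str
    pid: int
    ip: int      # IP version column: 4 or 6
    laddr: str
    lport: int
    kind: str    # "R" = retransmit, "L" = tail loss probe attempt (-l only)
    raddr: str
    rport: int
    state: str


def split_addr_port(field: str) -> Tuple[str, int]:
    # Split on the last ":" so an address that itself contains colons stays intact.
    addr, port = field.rsplit(":", 1)
    return addr, int(port)


def parse_event(line: str) -> Optional[RetransEvent]:
    """Parse one data row; return None for the header and "[...]" lines."""
    fields = line.split()
    # Data rows have exactly 7 columns and a numeric PID.
    if len(fields) != 7 or not fields[1].isdigit():
        return None
    time, pid, ip, laddr, kind, raddr, state = fields
    laddr_ip, lport = split_addr_port(laddr)
    raddr_ip, rport = split_addr_port(raddr)
    return RetransEvent(time, int(pid), int(ip), laddr_ip, lport,
                        kind.rstrip(">"), raddr_ip, rport, state)


row = "01:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED"
print(parse_event(row))
```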

Retransmits are usually a sign of poor network health, and this tool is
useful for investigating them. Unlike tcpdump, this tool has very low
overhead, as it traces only the retransmit function. It also prints
additional kernel details: the state of the TCP session at the time of the
retransmit.


The -l option includes TCP tail loss probe (TLP) attempts:

# ./tcpretrans -l
TIME     PID    IP LADDR:LPORT          T> RADDR:RPORT          STATE
01:55:45 0      4  10.153.223.157:22    R> 69.53.245.40:51601   ESTABLISHED
01:55:46 0      4  10.153.223.157:22    R> 69.53.245.40:51601   ESTABLISHED
01:55:46 0      4  10.153.223.157:22    R> 69.53.245.40:51601   ESTABLISHED
01:55:53 0      4  10.153.223.157:22    L> 69.53.245.40:46444   ESTABLISHED
01:56:06 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
01:56:06 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
01:56:08 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
01:56:08 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
01:56:08 1938   4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
01:56:08 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
01:56:08 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
[...]

See the "L>" in the "T>" column. These are attempts: the kernel probably
sent a TLP, but in some cases it may not ultimately have been sent.
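
When post-processing saved -l output, the "T>" column is what separates the
two kinds of event. A minimal sketch that tallies rows by type, assuming rows
shaped like the ones above; count_by_type is a hypothetical helper, not part
of bcc. Keep in mind that the "L" tally counts probe attempts, not probes
known to have been sent.

```python
from collections import Counter
from typing import Iterable


def count_by_type(lines: Iterable[str]) -> Counter:
    """Tally tcpretrans rows by their T> column: "R" or "L"."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Data rows have 7 columns and a numeric PID; skip the header and "[...]".
        if len(fields) == 7 and fields[1].isdigit():
            counts[fields[4].rstrip(">")] += 1
    return counts


sample = """\
TIME     PID    IP LADDR:LPORT          T> RADDR:RPORT          STATE
01:55:46 0      4  10.153.223.157:22    R> 69.53.245.40:51601   ESTABLISHED
01:55:53 0      4  10.153.223.157:22    L> 69.53.245.40:46444   ESTABLISHED
01:56:06 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
[...]
""".splitlines()

print(count_by_type(sample))
```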

To quickly spot heavily retransmitting flows, use the -c flag. It counts
retransmits per flow and prints the totals when tracing ends with Ctrl-C:

# ./tcpretrans.py -c
Tracing retransmits ... Hit Ctrl-C to end
^C
LADDR:LPORT              RADDR:RPORT             RETRANSMITS
192.168.10.50:60366  <-> 172.217.21.194:443         700
192.168.10.50:666    <-> 172.213.11.195:443         345
192.168.10.50:366    <-> 172.212.22.194:443         211
[...]

This makes it easy to quickly isolate congested or otherwise faulty network
paths that are limiting TCP performance.
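
The -c mode does its counting inside tcpretrans while it traces. If you only
have a saved capture of the default per-event output, a similar per-flow
summary can be rebuilt offline. A minimal sketch, assuming the default column
layout; count_per_flow and its include_tlp parameter are hypothetical, not
part of bcc, and the printed table only loosely imitates the -c format.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Flow = Tuple[str, str]  # (LADDR:LPORT, RADDR:RPORT) exactly as printed


def count_per_flow(lines: Iterable[str],
                   include_tlp: bool = False) -> List[Tuple[Flow, int]]:
    """Count retransmit rows per flow, busiest flow first."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Keep only data rows: 7 columns with a numeric PID.
        if len(fields) != 7 or not fields[1].isdigit():
            continue
        if fields[4] == "L>" and not include_tlp:
            continue  # skip tail loss probe attempts unless requested
        counts[(fields[3], fields[5])] += 1
    return counts.most_common()


capture = [
    "01:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED",
    "01:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED",
    "01:55:17 0      4  10.153.223.157:22    R> 69.53.245.40:22957   ESTABLISHED",
]
print("%-25s %-25s %s" % ("LADDR:LPORT", "RADDR:RPORT", "RETRANSMITS"))
for (laddr, raddr), n in count_per_flow(capture):
    print("%-21s <-> %-25s %d" % (laddr, raddr, n))
```

Because the table is built from per-event lines, it only reflects events
that were actually captured to the file; a long-running -c session avoids
printing every event in the first place.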

USAGE message:

# ./tcpretrans -h
usage: tcpretrans [-h] [-l] [-c]

Trace TCP retransmits

optional arguments:
  -h, --help       show this help message and exit
  -l, --lossprobe  include tail loss probe attempts
  -c, --count      count occurred retransmits per flow

examples:
    ./tcpretrans           # trace TCP retransmits
    ./tcpretrans -l        # include TLP attempts