Demonstrations of tcpretrans, the Linux eBPF/bcc version.


This tool traces the kernel TCP retransmit function to show details of these
retransmits. For example:

# ./tcpretrans
TIME     PID    IP LADDR:LPORT          T> RADDR:RPORT          STATE
01:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED
01:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED
01:55:17 0      4  10.153.223.157:22    R> 69.53.245.40:22957   ESTABLISHED
[...]

This output shows three TCP retransmits; the first two were for an IPv4
connection from 10.153.223.157 port 22 to 69.53.245.40 port 34619. The TCP
state was "ESTABLISHED" at the time of the retransmit. The on-CPU PID at the
time of the retransmit is printed, in this case 0 (the kernel, which will
be the case most of the time).
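
If you want to post-process this output, the Python sketch below splits one
default-mode line into its fields. It assumes whitespace-separated columns
exactly as shown above (no -s column), and the names RetransmitEvent,
parse_line and split_endpoint are hypothetical, not part of bcc. Endpoints are
split on the last ":", assuming IPv6 endpoints are printed in the same
ADDR:PORT form, so that their own colons are preserved.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RetransmitEvent:
    """One row of default tcpretrans output (hypothetical helper type)."""
    time: str
    pid: int    # on-CPU PID at the time of the retransmit (often 0)
    ip: int     # IP version: 4 or 6
    laddr: str
    lport: int
    kind: str   # "R>" for a retransmit, "L>" for a TLP attempt (-l)
    raddr: str
    rport: int
    state: str  # TCP state, e.g. "ESTABLISHED"


def split_endpoint(endpoint: str) -> Tuple[str, int]:
    # Split on the last ':' so IPv6 addresses keep their own colons.
    addr, _, port = endpoint.rpartition(":")
    return addr, int(port)


def parse_line(line: str) -> RetransmitEvent:
    # Assumes exactly 7 whitespace-separated columns (default mode, no -s).
    time, pid, ip, local, kind, remote, state = line.split()
    laddr, lport = split_endpoint(local)
    raddr, rport = split_endpoint(remote)
    return RetransmitEvent(time, int(pid), int(ip), laddr, lport,
                           kind, raddr, rport, state)


event = parse_line("01:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED")
print(event.raddr, event.rport, event.state)  # 69.53.245.40 34619 ESTABLISHED
```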

Retransmits are usually a sign of poor network health, and this tool is
useful for investigating them. Unlike tcpdump, this tool has very low
overhead, as it traces only the retransmit function. It also prints
additional kernel details: the state of the TCP session at the time of the
retransmit.


The -l option includes TCP tail loss probe attempts:

# ./tcpretrans -l
TIME     PID    IP LADDR:LPORT          T> RADDR:RPORT          STATE
01:55:45 0      4  10.153.223.157:22    R> 69.53.245.40:51601   ESTABLISHED
01:55:46 0      4  10.153.223.157:22    R> 69.53.245.40:51601   ESTABLISHED
01:55:46 0      4  10.153.223.157:22    R> 69.53.245.40:51601   ESTABLISHED
01:55:53 0      4  10.153.223.157:22    L> 69.53.245.40:46444   ESTABLISHED
01:56:06 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
01:56:06 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
01:56:08 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
01:56:08 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
01:56:08 1938   4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
01:56:08 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
01:56:08 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
[...]

Note the "L>" entries in the "T>" column. These are tail loss probe (TLP)
attempts: the kernel probably sent a TLP, but in some cases it might not
ultimately have been sent.
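
When -l output is saved to a file, a quick tally of the "T>" column separates
real retransmits from TLP attempts. This is a minimal Python sketch; the
function name count_types is hypothetical, and it assumes the 7-column layout
shown above, skipping the header row, "[...]" markers and blank lines.

```python
from collections import Counter
from typing import Iterable


def count_types(lines: Iterable[str]) -> Counter:
    """Tally the T> column of saved `tcpretrans -l` output."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header row, "[...]" markers and blank lines.
        if len(fields) != 7 or fields[0] == "TIME":
            continue
        counts[fields[4]] += 1  # "R>" or "L>"
    return counts


sample = """\
TIME     PID    IP LADDR:LPORT          T> RADDR:RPORT          STATE
01:55:46 0      4  10.153.223.157:22    R> 69.53.245.40:51601   ESTABLISHED
01:55:53 0      4  10.153.223.157:22    L> 69.53.245.40:46444   ESTABLISHED
01:56:06 0      4  10.153.223.157:22    R> 69.53.245.40:46444   ESTABLISHED
[...]
"""
print(count_types(sample.splitlines()))  # Counter({'R>': 2, 'L>': 1})
```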

To spot heavily retransmitting flows quickly, use the -c flag. It counts
retransmits per flow:

# ./tcpretrans.py -c
Tracing retransmits ... Hit Ctrl-C to end
^C
LADDR:LPORT              RADDR:RPORT          RETRANSMITS
192.168.10.50:60366  <-> 172.217.21.194:443   700
192.168.10.50:666    <-> 172.213.11.195:443   345
192.168.10.50:366    <-> 172.212.22.194:443   211
[...]

This makes it easy to isolate congested or otherwise faulty network paths
that are limiting TCP performance.
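
The -c mode builds its table while tracing. If you only have saved
default-mode output, a similar per-flow table can be approximated offline.
The Python sketch below is an illustration, not how tcpretrans itself
implements -c: count_per_flow is a hypothetical name, and it assumes that
"L>" rows should be left out and that flows are keyed by the exact
LADDR:LPORT and RADDR:RPORT strings (so each direction counts separately).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_per_flow(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Approximate the -c table from saved default-mode output."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 7 or fields[0] == "TIME":
            continue  # header, "[...]" or blank line
        if fields[4] != "R>":
            continue  # assumption: leave out TLP attempts (L>)
        counts[(fields[3], fields[5])] += 1  # key: (LADDR:LPORT, RADDR:RPORT)
    # most_common() lists the busiest flows first.
    return [(local, remote, n) for (local, remote), n in counts.most_common()]


saved = """\
01:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED
01:55:05 0      4  10.153.223.157:22    R> 69.53.245.40:34619   ESTABLISHED
01:55:17 0      4  10.153.223.157:22    R> 69.53.245.40:22957   ESTABLISHED
"""
for local, remote, n in count_per_flow(saved.splitlines()):
    print(f"{local:<20} <-> {remote:<20} {n}")
```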

TCP sequence numbers can be included via -s, except in count mode. These
numbers are useful for identifying specific retransmissions in large packet
captures. Note that with -l, tail loss probe ("L>") events display 0 as the
sequence number.
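
Because each -s row carries a sequence number, saved output can be scanned
for segments that were retransmitted more than once, which are good
candidates to look up in a capture. This Python sketch assumes the -s layout
(eight columns, SEQ last); repeated_segments is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, int]  # (LADDR:LPORT, RADDR:RPORT, SEQ)


def repeated_segments(lines: Iterable[str]) -> Dict[Key, int]:
    """Return {(local, remote, seq): count} for segments retransmitted 2+ times."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # -s rows have 8 columns; skip the header, "[...]" and blank lines.
        if len(fields) != 8 or fields[0] == "TIME":
            continue
        local, kind, remote, seq = fields[3], fields[4], fields[5], int(fields[7])
        if kind == "L>":
            continue  # TLP attempts report SEQ as 0, so they identify no segment
        counts[(local, remote, seq)] += 1
    return {key: n for key, n in counts.items() if n > 1}


saved = """\
TIME     PID    IP LADDR:LPORT          T> RADDR:RPORT          STATE       SEQ
18:03:46 0      4  192.168.10.50:41976  R> 172.217.21.194:443   SYN_SENT    2879306108
18:03:49 0      4  192.168.10.50:41976  R> 172.217.21.194:443   SYN_SENT    2879306108
"""
print(repeated_segments(saved.splitlines()))
# {('192.168.10.50:41976', '172.217.21.194:443', 2879306108): 2}
```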

# ./tcpretrans.py -s
TIME     PID    IP LADDR:LPORT          T> RADDR:RPORT          STATE       SEQ
18:03:46 0      4  192.168.10.50:41976  R> 172.217.21.194:443   SYN_SENT    2879306108
18:03:49 0      4  192.168.10.50:41976  R> 172.217.21.194:443   SYN_SENT    2879306108

USAGE message:

# ./tcpretrans -h
usage: tcpretrans.py [-h] [-s] [-l] [-c] [-4 | -6]

Trace TCP retransmits

optional arguments:
  -h, --help       show this help message and exit
  -s, --sequence   display TCP sequence numbers
  -l, --lossprobe  include tail loss probe attempts
  -c, --count      count occurred retransmits per flow
  -4, --ipv4       trace IPv4 family only
  -6, --ipv6       trace IPv6 family only

examples:
    ./tcpretrans           # trace TCP retransmits
    ./tcpretrans -l        # include TLP attempts
    ./tcpretrans -4        # trace IPv4 family only
    ./tcpretrans -6        # trace IPv6 family only