blob: 823b64acc7ea839554e2e7909e833938bbb0c4ee [file] [log] [blame]
samuelnair166bf0f2017-09-05 00:57:49 +08001Demonstrations of nfsslower, the Linux eBPF/bcc version.
2
3nfsslower show NFS reads, writes, opens and getattrs, slower than a
4threshold. For example:
5
6./nfsslower.py
7Tracing NFS operations that are slower than 10 ms
8TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
911:25:16 dd 21295 W 1048576 15360 14.84 1.test
1011:25:16 dd 21295 W 1048576 16384 12.73 1.test
1111:25:16 dd 21295 W 1048576 17408 24.27 1.test
1211:25:16 dd 21295 W 1048576 18432 22.93 1.test
1311:25:16 dd 21295 W 1048576 19456 14.65 1.test
1411:25:16 dd 21295 W 1048576 20480 12.58 1.test
1511:25:16 dd 21297 W 1048576 6144 10.50 1.test.w
1611:25:16 dd 21297 W 1048576 7168 16.65 1.test.w
1711:25:16 dd 21297 W 1048576 8192 13.01 1.test.w
1811:25:16 dd 21297 W 1048576 9216 14.06 1.test.w
19
20This shows NFS writes from dd each 1MB in size to 2 different files. The
21writes all had latency higher than 10ms.
22
23This "latency" is measured from when the operation was issued from the VFS
24interface to the file system, to when it completed. This spans everything:
25RPC latency, network latency, file system CPU cycles, file system locks, run
26queue latency, etc. This is a better measure of the latency suffered by
27applications reading from a NFS share and can better expose problems
28experienced by NFS clients.
29
30Note that this only traces the common NFS operations (read,write,open and
31getattr). I chose to include getattr as a significant percentage of NFS
32traffic end up being getattr calls and are a good indicator of problems
33with an NFS server.
34
35The threshold can be provided as an argument. E.g. I/O slower than 1 ms:
36
37./nfsslower.py 1
38Tracing NFS operations that are slower than 1 ms
39TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
4011:40:16 cp 21583 R 131072 0 4.35 1.test
4111:40:16 cp 21583 R 131072 256 1.87 1.test
4211:40:16 cp 21583 R 131072 384 2.99 1.test
4311:40:16 cp 21583 R 131072 512 4.19 1.test
4411:40:16 cp 21583 R 131072 640 4.25 1.test
4511:40:16 cp 21583 R 131072 768 4.65 1.test
4611:40:16 cp 21583 R 131072 1280 1.08 1.test
4711:40:16 cp 21583 R 131072 1408 3.29 1.test
4811:40:16 cp 21583 R 131072 1792 3.12 1.test
4911:40:16 cp 21583 R 131072 3712 3.55 1.test
5011:40:16 cp 21583 R 131072 3840 1.12 1.test
5111:40:16 cp 21583 R 131072 4096 3.23 1.test
5211:40:16 cp 21583 R 131072 4224 2.73 1.test
5311:40:16 cp 21583 R 131072 4352 2.73 1.test
5411:40:16 cp 21583 R 131072 4480 6.09 1.test
5511:40:16 cp 21583 R 131072 5120 4.40 1.test
56[...]
57
58This shows all NFS_READS that were more than 1ms. Depending on your
59latency to your fileserver, you might need to tweak this value to
60remove
61
62A threshold of 0 will trace all operations. Warning: the output will be
63verbose, as it will include all file system cache hits.
64
65./nfsslower.py 0
66Tracing NFS operations
6711:56:50 dd 21852 W 1048576 0 0.42 1.test
6811:56:50 dd 21852 W 1048576 1024 0.46 1.test
6911:56:50 dd 21852 W 1048576 2048 0.36 1.test
7011:56:50 cp 21854 G 0 0 0.35 1.test
7111:56:50 cp 21854 O 0 0 0.33 1.test
7211:56:50 cp 21854 G 0 0 0.00 1.test
7311:56:50 cp 21854 R 131072 0 0.07 1.test
7411:56:50 cp 21854 R 131072 128 0.02 1.test
7511:56:50 cp 21854 R 131072 256 0.02 1.test
7611:56:50 cp 21854 R 131072 384 0.02 1.test
7711:56:50 cp 21854 R 131072 512 0.02 1.test
7811:56:50 cp 21854 R 131072 640 0.02 1.test
7911:56:50 cp 21854 R 131072 768 0.02 1.test
8011:56:50 cp 21854 R 131072 896 0.02 1.test
8111:56:50 cp 21854 R 131072 1024 0.02 1.test
8211:56:50 cp 21854 R 131072 1152 0.02 1.test
8311:56:50 cp 21854 R 131072 1280 0.02 1.test
8411:56:50 cp 21854 R 131072 1408 0.02 1.test
8511:56:50 cp 21854 R 131072 1536 0.02 1.test
8611:56:50 cp 21854 R 131072 1664 0.02 1.test
8711:56:50 cp 21854 R 131072 1792 0.02 1.test
8811:56:50 cp 21854 R 131072 1920 0.02 1.test
8911:56:50 cp 21854 R 131072 2048 0.02 1.test
9011:56:50 cp 21854 R 131072 2176 0.04 1.test
9111:56:50 cp 21854 R 131072 2304 0.02 1.test
9211:56:50 cp 21854 R 131072 2432 0.03 1.test
9311:56:50 cp 21854 R 131072 2560 0.03 1.test
9411:56:50 cp 21854 R 131072 2688 0.02 1.test
9511:56:50 cp 21854 R 131072 2816 0.03 1.test
9611:56:50 cp 21854 R 131072 2944 0.02 1.test
9711:56:50 cp 21854 R 0 3072 0.00 1.test
9811:56:50 ls 21855 G 0 0 0.00 1.test
9911:56:50 ls 21856 G 0 0 0.36 music
10011:56:50 ls 21856 G 0 0 0.00 music
10111:56:50 ls 21856 G 0 0 0.00 test
10211:56:50 ls 21856 G 0 0 0.00 ff
10311:56:50 ls 21856 G 0 0 0.00 34.log
10411:56:50 ls 21856 G 0 0 0.00 vmlinuz-linux
10511:56:50 ls 21856 G 0 0 0.00 2.test
10611:56:50 ls 21856 G 0 0 0.00 rt.log
10711:56:50 ls 21856 G 0 0 0.00 1.lod
10811:56:50 ls 21856 G 0 0 0.00 COPYRIGHT.txt
10911:56:50 ls 21856 G 0 0 0.00 gg
11011:56:50 ls 21856 G 0 0 0.00 qw.log
11111:56:50 ls 21856 G 0 0 0.00 README.md
11211:56:50 ls 21856 G 0 0 0.00 1.log
113
114The output now includes open operations ("O"), and reads ("R") wand getattrs ("G").
115A cp operation
116
117
118A -j option will print just the fields (parsable output, csv):
119
120./nfsslower.py -j 0
121ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE
12287054476520,dd,22754,W,1048576,0,425,1.test
12387054482916,dd,22754,W,1048576,1048576,320,1.test
12487054488179,dd,22754,W,1048576,2097152,389,1.test
12587054511340,cp,22756,G,0,0,371,1.test
12687054511685,cp,22756,O,0,0,306,1.test
12787054511700,cp,22756,G,0,0,2,1.test
12887054512325,cp,22756,R,131072,0,56,1.test
12987054512432,cp,22756,R,131072,131072,22,1.test
13087054512520,cp,22756,R,131072,262144,32,1.test
13187054512600,cp,22756,R,131072,393216,21,1.test
13287054512678,cp,22756,R,131072,524288,21,1.test
13387054512803,cp,22756,R,131072,655360,56,1.test
134
135This may be useful for visualizing with another tool, for example, for
136producing a scatter plot of ENDTIME vs LATENCY, to look for time-based
137patterns.
138
139USAGE message:
140
141usage: nfsslower.py [-h] [-j] [-p PID] [min_ms]
142
143Trace READ, WRITE, OPEN and GETATTR NFS calls slower than a threshold,supports NFSv{3,4}
144
145positional arguments:
146 min_ms Minimum IO duration to trace in ms (default=10ms)
147
148optional arguments:
149 -h, --help show this help message and exit
150 -j, --csv just print fields: comma-separated values
151 -p PID, --pid PID Trace this pid only
152
153 ./nfsslower # trace operations slower than 10ms
154 ./nfsslower 1 # trace operations slower than 1ms
155 ./nfsslower -j 1 # ... 1 ms, parsable output (csv)
156 ./nfsslower 0 # trace all nfs operations
157 ./nfsslower -p 121 # trace pid 121 only
158