blob: 7c2e12be0d7137f4ca10fc87467c57a54fea8fe9 [file] [log] [blame]
Brendan Gregg75d3e9d2016-02-07 18:48:20 -08001Demonstrations of fileslower, the Linux eBPF/bcc version.
Brendan Gregga32fbaf2016-02-07 12:14:37 -08002
3
Brendan Gregg75d3e9d2016-02-07 18:48:20 -08004fileslower shows file-based synchronous reads and writes slower than a
5threshold. For example:
Brendan Gregga32fbaf2016-02-07 12:14:37 -08006
Brendan Gregg75d3e9d2016-02-07 18:48:20 -08007# ./fileslower
Brendan Gregga32fbaf2016-02-07 12:14:37 -08008Tracing sync read/writes slower than 10 ms
9TIME(s) COMM PID D BYTES LAT(ms) FILENAME
100.000 randread.pl 4762 R 8192 12.70 data1
118.850 randread.pl 4762 R 8192 11.26 data1
1212.852 randread.pl 4762 R 8192 10.43 data1
13
14This showed a few reads from a "randread.pl" program, each 8 Kbytes in size,
15and from a "data1" file. These all had over 10 ms latency.
16
17This "latency" is measured from when the read or write was issued at the VFS
18interface, to when it completed. This spans everything: block device I/O (disk
19I/O), file system CPU cycles, file system locks, run queue latency, etc. This
20is a better measure of the latency suffered by applications reading from the
21file system than measuring this down at the block device interface.
22
Brendan Gregg75d3e9d2016-02-07 18:48:20 -080023Note that this only traces file reads and writes: other file system operations
24(eg, directory operations, open(), fflush()) are not traced.
Brendan Gregga32fbaf2016-02-07 12:14:37 -080025
26
27The threshold can be provided as an argument. Eg, I/O slower than 1 ms:
28
Brendan Gregg75d3e9d2016-02-07 18:48:20 -080029# ./fileslower 1
Brendan Gregga32fbaf2016-02-07 12:14:37 -080030Tracing sync read/writes slower than 1 ms
31TIME(s) COMM PID D BYTES LAT(ms) FILENAME
320.000 randread.pl 6925 R 8192 1.06 data1
330.082 randread.pl 6925 R 8192 2.42 data1
340.116 randread.pl 6925 R 8192 1.78 data1
350.153 randread.pl 6925 R 8192 2.31 data1
360.330 randread.pl 6925 R 8192 1.14 data1
370.345 randread.pl 6925 R 8192 1.52 data1
380.359 randread.pl 6925 R 8192 1.04 data1
390.532 randread.pl 6925 R 8192 2.56 data1
400.609 supervise 1892 W 18 3.65 status.new
410.610 randread.pl 6925 R 8192 1.37 data1
420.614 randread.pl 6925 R 8192 3.04 data1
430.729 randread.pl 6925 R 8192 2.90 data1
440.755 randread.pl 6925 R 8192 1.12 data1
450.762 randread.pl 6925 R 8192 2.62 data1
460.771 randread.pl 6925 R 8192 1.07 data1
470.816 randread.pl 6925 R 8192 10.50 data1
480.983 randread.pl 6925 R 8192 1.73 data1
490.989 randread.pl 6925 R 8192 2.12 data1
500.992 randread.pl 6925 R 8192 2.17 data1
511.001 randread.pl 6925 R 8192 1.93 data1
521.007 randread.pl 6925 R 8192 2.03 data1
531.210 randread.pl 6925 R 8192 1.82 data1
541.213 randread.pl 6925 R 8192 2.58 data1
551.219 randread.pl 6925 R 8192 2.20 data1
561.430 randread.pl 6925 R 8192 1.01 data1
571.448 randread.pl 6925 R 8192 2.22 data1
58[...]
59
60There's now much more output (this spans only 1.4 seconds, the previous output
61spanned 12 seconds), and the lower threshold is catching more I/O.
62
63
64In the following example, the file system caches were dropped before running
Brendan Gregg75d3e9d2016-02-07 18:48:20 -080065fileslower, and then in another session a "man ls" was executed. The command
Brendan Gregga32fbaf2016-02-07 12:14:37 -080066and files read from disk can be seen:
67
Brendan Gregg75d3e9d2016-02-07 18:48:20 -080068# echo 3 > /proc/sys/vm/drop_caches; ./fileslower 1
Brendan Gregga32fbaf2016-02-07 12:14:37 -080069Tracing sync read/writes slower than 1 ms
70TIME(s) COMM PID D BYTES LAT(ms) FILENAME
710.000 bash 9647 R 128 5.83 man
720.050 man 9647 R 832 19.52 libmandb-2.6.7.1.so
730.066 man 9647 R 832 15.79 libman-2.6.7.1.so
740.123 man 9647 R 832 56.36 libpipeline.so.1.3.0
750.135 man 9647 R 832 9.79 libgdbm.so.3.0.0
760.323 man 9647 R 4096 59.52 locale.alias
770.540 man 9648 R 8192 11.11 ls.1.gz
780.558 man 9647 R 72 6.97 index.db
790.563 man 9647 R 4096 5.12 index.db
800.723 man 9658 R 128 12.06 less
810.725 man 9656 R 128 14.52 nroff
820.779 man 9655 R 128 68.86 tbl
830.814 nroff 9660 R 128 14.55 locale
840.830 pager 9658 R 4096 28.27 .lesshst
850.866 man 9654 R 128 163.12 preconv
860.980 nroff 9684 R 128 13.80 groff
870.999 groff 9684 R 4096 14.29 DESC
881.036 groff 9685 R 128 5.94 troff
891.038 groff 9686 R 128 7.76 grotty
901.065 troff 9685 R 4096 6.33 R
911.082 troff 9685 R 4096 10.52 BI
921.096 troff 9685 R 4096 8.70 troffrc
931.176 troff 9685 R 4096 80.12 composite.tmac
941.195 troff 9685 R 4096 19.20 fallbacks.tmac
951.202 troff 9685 R 4096 6.79 tty.tmac
961.221 troff 9685 R 4096 7.87 man.local
972.977 supervise 1876 W 18 4.23 status.new
98
99This caught an individual I/O reaching 163.12 ms, for the "preconv" file. While
Brendan Greggf266f4f2016-02-07 12:22:50 -0800100the file system cache was flushed, causing these to need to be read from disk,
Brendan Gregga32fbaf2016-02-07 12:14:37 -0800101the duration here may not be entirely disk I/O: it can include file system
102locks, run queue latency, etc. These can be explored using other commands.
103
104
105USAGE message:
106
Brendan Gregg75d3e9d2016-02-07 18:48:20 -0800107# ./fileslower -h
108usage: fileslower [-h] [-p PID] [min_ms]
Brendan Gregga32fbaf2016-02-07 12:14:37 -0800109
Brendan Gregg75d3e9d2016-02-07 18:48:20 -0800110Trace slow synchronous file reads and writes
Brendan Gregga32fbaf2016-02-07 12:14:37 -0800111
112positional arguments:
113 min_ms minimum I/O duration to trace, in ms (default 10)
114
115optional arguments:
116 -h, --help show this help message and exit
117 -p PID, --pid PID trace this PID only
118
119examples:
Brendan Gregg75d3e9d2016-02-07 18:48:20 -0800120 ./fileslower # trace sync file I/O slower than 10 ms (default)
121 ./fileslower 1 # trace sync file I/O slower than 1 ms
122 ./fileslower -p 185 # trace PID 185 only