Brendan Gregg | 75d3e9d | 2016-02-07 18:48:20 -0800 | [diff] [blame] | 1 | Demonstrations of fileslower, the Linux eBPF/bcc version. |
Brendan Gregg | a32fbaf | 2016-02-07 12:14:37 -0800 | [diff] [blame] | 2 | |
| 3 | |
Brendan Gregg | 75d3e9d | 2016-02-07 18:48:20 -0800 | [diff] [blame] | 4 | fileslower shows file-based synchronous reads and writes slower than a |
| 5 | threshold. For example: |
Brendan Gregg | a32fbaf | 2016-02-07 12:14:37 -0800 | [diff] [blame] | 6 | |
Brendan Gregg | 75d3e9d | 2016-02-07 18:48:20 -0800 | [diff] [blame] | 7 | # ./fileslower |
Brendan Gregg | a32fbaf | 2016-02-07 12:14:37 -0800 | [diff] [blame] | 8 | Tracing sync read/writes slower than 10 ms |
| 9 | TIME(s) COMM PID D BYTES LAT(ms) FILENAME |
| 10 | 0.000 randread.pl 4762 R 8192 12.70 data1 |
| 11 | 8.850 randread.pl 4762 R 8192 11.26 data1 |
| 12 | 12.852 randread.pl 4762 R 8192 10.43 data1 |
| 13 | |
| 14 | This showed a few reads from a "randread.pl" program, each 8 Kbytes in size, |
| 15 | and from a "data1" file. These all had over 10 ms latency. |
| 16 | |
| 17 | This "latency" is measured from when the read or write was issued at the VFS |
| 18 | interface, to when it completed. This spans everything: block device I/O (disk |
| 19 | I/O), file system CPU cycles, file system locks, run queue latency, etc. This |
| 20 | is a better measure of the latency suffered by applications reading from the |
| 21 | file system than measuring this down at the block device interface. |
| 22 | |
Brendan Gregg | 75d3e9d | 2016-02-07 18:48:20 -0800 | [diff] [blame] | 23 | Note that this only traces file reads and writes: other file system operations |
| 24 | (eg, directory operations, open(), fflush()) are not traced. |
Brendan Gregg | a32fbaf | 2016-02-07 12:14:37 -0800 | [diff] [blame] | 25 | |
| 26 | |
| 27 | The threshold can be provided as an argument. Eg, I/O slower than 1 ms: |
| 28 | |
Brendan Gregg | 75d3e9d | 2016-02-07 18:48:20 -0800 | [diff] [blame] | 29 | # ./fileslower 1 |
Brendan Gregg | a32fbaf | 2016-02-07 12:14:37 -0800 | [diff] [blame] | 30 | Tracing sync read/writes slower than 1 ms |
| 31 | TIME(s) COMM PID D BYTES LAT(ms) FILENAME |
| 32 | 0.000 randread.pl 6925 R 8192 1.06 data1 |
| 33 | 0.082 randread.pl 6925 R 8192 2.42 data1 |
| 34 | 0.116 randread.pl 6925 R 8192 1.78 data1 |
| 35 | 0.153 randread.pl 6925 R 8192 2.31 data1 |
| 36 | 0.330 randread.pl 6925 R 8192 1.14 data1 |
| 37 | 0.345 randread.pl 6925 R 8192 1.52 data1 |
| 38 | 0.359 randread.pl 6925 R 8192 1.04 data1 |
| 39 | 0.532 randread.pl 6925 R 8192 2.56 data1 |
| 40 | 0.609 supervise 1892 W 18 3.65 status.new |
| 41 | 0.610 randread.pl 6925 R 8192 1.37 data1 |
| 42 | 0.614 randread.pl 6925 R 8192 3.04 data1 |
| 43 | 0.729 randread.pl 6925 R 8192 2.90 data1 |
| 44 | 0.755 randread.pl 6925 R 8192 1.12 data1 |
| 45 | 0.762 randread.pl 6925 R 8192 2.62 data1 |
| 46 | 0.771 randread.pl 6925 R 8192 1.07 data1 |
| 47 | 0.816 randread.pl 6925 R 8192 10.50 data1 |
| 48 | 0.983 randread.pl 6925 R 8192 1.73 data1 |
| 49 | 0.989 randread.pl 6925 R 8192 2.12 data1 |
| 50 | 0.992 randread.pl 6925 R 8192 2.17 data1 |
| 51 | 1.001 randread.pl 6925 R 8192 1.93 data1 |
| 52 | 1.007 randread.pl 6925 R 8192 2.03 data1 |
| 53 | 1.210 randread.pl 6925 R 8192 1.82 data1 |
| 54 | 1.213 randread.pl 6925 R 8192 2.58 data1 |
| 55 | 1.219 randread.pl 6925 R 8192 2.20 data1 |
| 56 | 1.430 randread.pl 6925 R 8192 1.01 data1 |
| 57 | 1.448 randread.pl 6925 R 8192 2.22 data1 |
| 58 | [...] |
| 59 | |
| 60 | There's now much more output (this spans only 1.4 seconds, the previous output |
| 61 | spanned 12 seconds), and the lower threshold is catching more I/O. |
| 62 | |
| 63 | |
| 64 | In the following example, the file system caches were dropped before running |
Brendan Gregg | 75d3e9d | 2016-02-07 18:48:20 -0800 | [diff] [blame] | 65 | fileslower, and then in another session a "man ls" was executed. The command |
Brendan Gregg | a32fbaf | 2016-02-07 12:14:37 -0800 | [diff] [blame] | 66 | and files read from disk can be seen: |
| 67 | |
Brendan Gregg | 75d3e9d | 2016-02-07 18:48:20 -0800 | [diff] [blame] | 68 | # echo 3 > /proc/sys/vm/drop_caches; ./fileslower 1 |
Brendan Gregg | a32fbaf | 2016-02-07 12:14:37 -0800 | [diff] [blame] | 69 | Tracing sync read/writes slower than 1 ms |
| 70 | TIME(s) COMM PID D BYTES LAT(ms) FILENAME |
| 71 | 0.000 bash 9647 R 128 5.83 man |
| 72 | 0.050 man 9647 R 832 19.52 libmandb-2.6.7.1.so |
| 73 | 0.066 man 9647 R 832 15.79 libman-2.6.7.1.so |
| 74 | 0.123 man 9647 R 832 56.36 libpipeline.so.1.3.0 |
| 75 | 0.135 man 9647 R 832 9.79 libgdbm.so.3.0.0 |
| 76 | 0.323 man 9647 R 4096 59.52 locale.alias |
| 77 | 0.540 man 9648 R 8192 11.11 ls.1.gz |
| 78 | 0.558 man 9647 R 72 6.97 index.db |
| 79 | 0.563 man 9647 R 4096 5.12 index.db |
| 80 | 0.723 man 9658 R 128 12.06 less |
| 81 | 0.725 man 9656 R 128 14.52 nroff |
| 82 | 0.779 man 9655 R 128 68.86 tbl |
| 83 | 0.814 nroff 9660 R 128 14.55 locale |
| 84 | 0.830 pager 9658 R 4096 28.27 .lesshst |
| 85 | 0.866 man 9654 R 128 163.12 preconv |
| 86 | 0.980 nroff 9684 R 128 13.80 groff |
| 87 | 0.999 groff 9684 R 4096 14.29 DESC |
| 88 | 1.036 groff 9685 R 128 5.94 troff |
| 89 | 1.038 groff 9686 R 128 7.76 grotty |
| 90 | 1.065 troff 9685 R 4096 6.33 R |
| 91 | 1.082 troff 9685 R 4096 10.52 BI |
| 92 | 1.096 troff 9685 R 4096 8.70 troffrc |
| 93 | 1.176 troff 9685 R 4096 80.12 composite.tmac |
| 94 | 1.195 troff 9685 R 4096 19.20 fallbacks.tmac |
| 95 | 1.202 troff 9685 R 4096 6.79 tty.tmac |
| 96 | 1.221 troff 9685 R 4096 7.87 man.local |
| 97 | 2.977 supervise 1876 W 18 4.23 status.new |
| 98 | |
| 99 | This caught an individual I/O reaching 163.12 ms, for the "preconv" file. While |
Brendan Gregg | f266f4f | 2016-02-07 12:22:50 -0800 | [diff] [blame] | 100 | the file system cache was flushed, causing these to need to be read from disk, |
Brendan Gregg | a32fbaf | 2016-02-07 12:14:37 -0800 | [diff] [blame] | 101 | the duration here may not be entirely disk I/O: it can include file system |
| 102 | locks, run queue latency, etc. These can be explored using other commands. |
| 103 | |
| 104 | |
| 105 | USAGE message: |
| 106 | |
Brendan Gregg | 75d3e9d | 2016-02-07 18:48:20 -0800 | [diff] [blame] | 107 | # ./fileslower -h |
Mark Drayton | 7434731 | 2016-08-25 20:46:35 +0100 | [diff] [blame] | 108 | usage: fileslower.py [-h] [-p PID] [-a] [min_ms] |
Brendan Gregg | a32fbaf | 2016-02-07 12:14:37 -0800 | [diff] [blame] | 109 | |
Brendan Gregg | 75d3e9d | 2016-02-07 18:48:20 -0800 | [diff] [blame] | 110 | Trace slow synchronous file reads and writes |
Brendan Gregg | a32fbaf | 2016-02-07 12:14:37 -0800 | [diff] [blame] | 111 | |
| 112 | positional arguments: |
| 113 | min_ms minimum I/O duration to trace, in ms (default 10) |
| 114 | |
| 115 | optional arguments: |
| 116 | -h, --help show this help message and exit |
| 117 | -p PID, --pid PID trace this PID only |
Mark Drayton | 7434731 | 2016-08-25 20:46:35 +0100 | [diff] [blame] | 118 | -a, --all-files include non-regular file types |
Brendan Gregg | a32fbaf | 2016-02-07 12:14:37 -0800 | [diff] [blame] | 119 | |
| 120 | examples: |
Brendan Gregg | 75d3e9d | 2016-02-07 18:48:20 -0800 | [diff] [blame] | 121 | ./fileslower # trace sync file I/O slower than 10 ms (default) |
| 122 | ./fileslower 1 # trace sync file I/O slower than 1 ms |
| 123 | ./fileslower -p 185 # trace PID 185 only |