| Demonstrations of ext4slower, the Linux eBPF/bcc version. |
| |
| |
| ext4slower shows ext4 reads, writes, opens, and fsyncs, slower than a threshold. |
| For example: |
| |
| # ./ext4slower |
| Tracing ext4 operations slower than 10 ms |
| TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME |
| 06:35:01 cron 16464 R 1249 0 16.05 common-auth |
| 06:35:01 cron 16463 R 1249 0 16.04 common-auth |
| 06:35:01 cron 16465 R 1249 0 16.03 common-auth |
| 06:35:01 cron 16465 R 4096 0 10.62 login.defs |
| 06:35:01 cron 16464 R 4096 0 10.61 login.defs |
| 06:35:01 cron 16463 R 4096 0 10.63 login.defs |
| 06:35:01 cron 16465 R 2972 0 18.52 pam_env.conf |
| 06:35:01 cron 16464 R 2972 0 18.51 pam_env.conf |
| 06:35:01 cron 16463 R 2972 0 18.49 pam_env.conf |
| 06:35:01 dumpsystemstat 16473 R 128 0 12.58 date |
| 06:35:01 debian-sa1 16474 R 283 0 12.66 sysstat |
| 06:35:01 debian-sa1 16474 R 128 0 10.39 sa1 |
| 06:35:01 dumpsystemstat 16491 R 128 0 13.22 ifconfig |
| 06:35:01 DumpThreads 16534 R 128 0 12.78 cut |
| 06:35:01 cron 16545 R 128 0 14.76 sendmail |
| 06:35:01 sendmail 16545 R 274 0 10.88 dynamicmaps.cf |
| 06:35:02 postdrop 16546 R 118 0 32.94 Universal |
| 06:35:02 pickup 9574 R 118 0 21.02 localtime |
| [...] |
| |
| This shows various system tasks reading from ext4. The high latency here is |
| due to disk I/O, as I had just evicted the file system cache for this example. |
| |
| This "latency" is measured from when the operation was issued from the VFS |
| interface to the file system, to when it completed. This spans everything: |
| block device I/O (disk I/O), file system CPU cycles, file system locks, run |
| queue latency, etc. This is a better measure of the latency suffered by |
| applications reading from the file system than measuring this down at the |
| block device interface. |
| |
| Note that this only traces the common file system operations previously |
| listed: other file system operations (eg, inode operations including |
| getattr()) are not traced. |
| |
| |
| The threshold can be provided as an argument. Eg, I/O slower than 1 ms: |
| |
| # ./ext4slower 1 |
| Tracing ext4 operations slower than 1 ms |
| TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME |
| 06:49:17 bash 3616 R 128 0 7.75 cksum |
| 06:49:17 cksum 3616 R 39552 0 1.34 [ |
| 06:49:17 cksum 3616 R 96 0 5.36 2to3-2.7 |
| 06:49:17 cksum 3616 R 96 0 14.94 2to3-3.4 |
| 06:49:17 cksum 3616 R 10320 0 6.82 411toppm |
| 06:49:17 cksum 3616 R 65536 0 4.01 a2p |
| 06:49:17 cksum 3616 R 55400 0 8.77 ab |
| 06:49:17 cksum 3616 R 36792 0 16.34 aclocal-1.14 |
| 06:49:17 cksum 3616 R 15008 0 19.31 acpi_listen |
| 06:49:17 cksum 3616 R 6123 0 17.23 add-apt-repository |
| 06:49:17 cksum 3616 R 6280 0 18.40 addpart |
| 06:49:17 cksum 3616 R 27696 0 2.16 addr2line |
| 06:49:17 cksum 3616 R 58080 0 10.11 ag |
| 06:49:17 cksum 3616 R 906 0 6.30 ec2-meta-data |
| 06:49:17 cksum 3616 R 6320 0 10.00 animate.im6 |
| 06:49:17 cksum 3616 R 5680 0 18.69 anytopnm |
| 06:49:17 cksum 3616 R 2671 0 20.27 apport-bug |
| 06:49:17 cksum 3616 R 12566 0 16.72 apport-cli |
| 06:49:17 cksum 3616 R 1622 0 7.95 apport-unpack |
| 06:49:17 cksum 3616 R 10440 0 2.37 appres |
| 06:49:17 cksum 3616 R 48112 0 5.42 whatis |
| 06:49:17 cksum 3616 R 14832 0 6.24 apt |
| 06:49:17 cksum 3616 R 65536 0 24.74 apt-cache |
| 06:49:17 cksum 3616 R 27264 0 1.68 apt-cdrom |
| 06:49:17 cksum 3616 R 23224 0 5.31 apt-extracttemplates |
| 06:49:17 cksum 3616 R 65536 0 8.08 apt-ftparchive |
| 06:49:17 cksum 3616 R 65536 128 2.92 apt-ftparchive |
| 06:49:17 cksum 3616 R 65536 0 9.58 aptitude-curses |
| 06:49:17 cksum 3616 R 65536 128 44.25 aptitude-curses |
| 06:49:17 cksum 3616 R 65536 384 1.69 aptitude-curses |
| [...] |
| |
| This time a cksum(1) command can be seen reading various files (from /usr/bin). |
| |
| |
| A threshold of 0 will trace all operations. Warning: the output will be |
| verbose, as it will include all file system cache hits. |
| |
| # ./ext4slower 0 |
| Tracing ext4 operations |
| TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME |
| 06:58:05 supervise 1884 O 0 0 0.00 status.new |
| 06:58:05 supervise 1884 W 18 0 0.02 status.new |
| 06:58:05 supervise 1884 O 0 0 0.00 status.new |
| 06:58:05 supervise 1884 W 18 0 0.01 status.new |
| 06:58:05 supervise 15817 O 0 0 0.00 run |
| 06:58:05 supervise 15817 R 92 0 0.00 run |
| 06:58:05 supervise 15817 O 0 0 0.00 bash |
| 06:58:05 supervise 15817 R 128 0 0.00 bash |
| 06:58:05 supervise 15817 R 504 0 0.00 bash |
| 06:58:05 supervise 15817 R 28 0 0.00 bash |
| 06:58:05 supervise 15817 O 0 0 0.00 ld-2.19.so |
| 06:58:05 supervise 15817 R 64 0 0.00 ld-2.19.so |
| 06:58:05 supervise 15817 R 392 0 0.00 ld-2.19.so |
| 06:58:05 run 15817 O 0 0 0.00 ld.so.cache |
| 06:58:05 run 15817 O 0 0 0.00 libtinfo.so.5.9 |
| 06:58:05 run 15817 R 832 0 0.00 libtinfo.so.5.9 |
| 06:58:05 run 15817 O 0 0 0.00 libdl-2.19.so |
| 06:58:05 run 15817 R 832 0 0.00 libdl-2.19.so |
| 06:58:05 run 15817 O 0 0 0.00 libc-2.19.so |
| 06:58:05 run 15817 R 832 0 0.00 libc-2.19.so |
| 06:58:05 supervise 1876 O 0 0 0.00 status.new |
| 06:58:05 supervise 1876 W 18 0 0.01 status.new |
| 06:58:05 supervise 1895 O 0 0 0.00 status.new |
| 06:58:05 supervise 1895 W 18 0 0.02 status.new |
| 06:58:05 supervise 1876 O 0 0 0.00 status.new |
| 06:58:05 supervise 1876 W 18 0 0.01 status.new |
| 06:58:05 supervise 1872 O 0 0 0.00 status.new |
| 06:58:05 supervise 1872 W 18 0 0.02 status.new |
| 06:58:05 supervise 1895 O 0 0 0.00 status.new |
| 06:58:05 supervise 1895 W 18 0 0.01 status.new |
| 06:58:05 supervise 15818 R 92 0 0.00 run |
| 06:58:05 supervise 15818 O 0 0 0.00 bash |
| 06:58:05 supervise 15818 R 128 0 0.00 bash |
| 06:58:05 supervise 15818 R 504 0 0.00 bash |
| 06:58:05 supervise 15818 R 28 0 0.00 bash |
| 06:58:05 supervise 15818 O 0 0 0.00 ld-2.19.so |
| 06:58:05 supervise 15818 R 64 0 0.00 ld-2.19.so |
| 06:58:05 supervise 15818 R 392 0 0.00 ld-2.19.so |
| 06:58:05 supervise 15818 O 0 0 0.00 run |
| 06:58:05 supervise 1888 O 0 0 0.00 status.new |
| 06:58:05 supervise 1888 W 18 0 0.01 status.new |
| 06:58:05 supervise 1888 O 0 0 0.00 status.new |
| 06:58:05 supervise 1888 W 18 0 0.02 status.new |
| 06:58:05 supervise 15822 R 119 0 0.00 run |
| 06:58:05 supervise 15822 O 0 0 0.00 bash |
| 06:58:05 supervise 15822 R 128 0 0.00 bash |
| 06:58:05 supervise 15822 R 504 0 0.00 bash |
| 06:58:05 supervise 15822 R 28 0 0.00 bash |
| 06:58:05 supervise 15822 O 0 0 0.00 ld-2.19.so |
| 06:58:05 supervise 15822 R 64 0 0.00 ld-2.19.so |
| 06:58:05 supervise 15822 R 392 0 0.00 ld-2.19.so |
| 06:58:05 supervise 1892 O 0 0 0.00 status.new |
| 06:58:05 supervise 1892 W 18 0 0.02 status.new |
| 06:58:05 supervise 1892 O 0 0 0.00 status.new |
| 06:58:05 supervise 1892 W 18 0 0.02 status.new |
| 06:58:05 supervise 15820 O 0 0 0.00 run |
| [...] |
| |
| The output now includes open operations ("O"), and writes ("W"). |
| |
| |
| A -j option will print just the fields (parsable output, csv): |
| |
| # ./ext4slower -j 1 |
| ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE |
| 127200712278,bash,17225,R,128,0,14329,cksum |
| 127200722986,cksum,17225,R,3274,0,8368,command-not-found |
| 127200735581,cksum,17225,R,65536,0,10903,libbfd-2.24-system.so |
| 127200738482,cksum,17225,R,65536,131072,2419,libbfd-2.24-system.so |
| 127200749226,cksum,17225,R,65536,655360,8995,libbfd-2.24-system.so |
| 127200776273,cksum,17225,R,55080,0,25297,libbind9.so.90.0.9 |
| 127200784688,cksum,17225,R,65536,0,7873,libblas.so.3.0 |
| 127200787551,cksum,17225,R,65536,131072,2386,libblas.so.3.0 |
| 127200795524,cksum,17225,R,18624,0,4947,libcpupower.so.3.13.0-49 |
| 127200802073,cksum,17225,R,65536,0,6410,libcwidget.so.3.0.0 |
| 127200808718,cksum,17225,R,65536,131072,6181,libcwidget.so.3.0.0 |
| 127200829518,cksum,17225,R,65536,0,14213,libdns.so.100.2.2 |
| 127200832916,cksum,17225,R,65536,131072,2911,libdns.so.100.2.2 |
| 127200841044,cksum,17225,R,65536,655360,6376,libdns.so.100.2.2 |
| 127200853646,cksum,17225,R,956,0,1022,libdumbnet.la |
| 127200857814,cksum,17225,R,61096,0,4111,libdumbnet.so.1.0.1 |
| 127200869655,cksum,17225,R,65536,0,11252,libgettextlib-0.18.3.so |
| 127200872985,cksum,17225,R,65536,131072,2882,libgettextlib-0.18.3.so |
| 127200883063,cksum,17225,R,65536,0,9661,libgettextsrc-0.18.3.so |
| 127200884767,cksum,17225,R,65536,131072,1251,libgettextsrc-0.18.3.so |
| 127200904830,cksum,17225,R,65536,0,19571,libgirepository-1.0.so.1.0.0 |
| 127200906354,cksum,17225,R,65536,131072,1080,libgirepository-1.0.so.1.0.0 |
| 127200936047,cksum,17225,R,65536,0,28674,libGraphicsMagick.a |
| 127200939091,cksum,17225,R,65536,131072,2576,libGraphicsMagick.a |
| 127200947295,cksum,17225,R,65536,655360,6463,libGraphicsMagick.a |
| 127200958793,cksum,17225,R,65536,1966080,7034,libGraphicsMagick.a |
| [...] |
| |
| This may be useful for visualizing with another tool, for example, for |
| producing a scatter plot of ENDTIME vs LATENCY, to look for time-based |
| patterns. |
| |
| |
| USAGE message: |
| |
| # ./ext4slower -h |
| usage: ext4slower [-h] [-j] [-p PID] [min_ms] |
| |
| Trace common ext4 file operations slower than a threshold |
| |
| positional arguments: |
| min_ms minimum I/O duration to trace, in ms (default 10) |
| |
| optional arguments: |
| -h, --help show this help message and exit |
| -j, --csv just print fields: comma-separated values |
| -p PID, --pid PID trace this PID only |
| |
| examples: |
| ./ext4slower # trace operations slower than 10 ms (default) |
| ./ext4slower 1 # trace operations slower than 1 ms |
| ./ext4slower -j 1 # ... 1 ms, parsable output (csv) |
| ./ext4slower 0 # trace all operations (warning: verbose) |
| ./ext4slower -p 185 # trace PID 185 only |