Brendan Gregg | cd1cad1 | 2016-02-12 02:27:19 -0800 | [diff] [blame] | 1 | Demonstrations of ext4slower, the Linux eBPF/bcc version. |
| 2 | |
| 3 | |
| 4 | ext4slower shows ext4 reads, writes, opens, and fsyncs, slower than a threshold. |
| 5 | For example: |
| 6 | |
| 7 | # ./ext4slower |
| 8 | Tracing ext4 operations slower than 10 ms |
| 9 | TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME |
| 10 | 06:35:01 cron 16464 R 1249 0 16.05 common-auth |
| 11 | 06:35:01 cron 16463 R 1249 0 16.04 common-auth |
| 12 | 06:35:01 cron 16465 R 1249 0 16.03 common-auth |
| 13 | 06:35:01 cron 16465 R 4096 0 10.62 login.defs |
| 14 | 06:35:01 cron 16464 R 4096 0 10.61 login.defs |
| 15 | 06:35:01 cron 16463 R 4096 0 10.63 login.defs |
| 16 | 06:35:01 cron 16465 R 2972 0 18.52 pam_env.conf |
| 17 | 06:35:01 cron 16464 R 2972 0 18.51 pam_env.conf |
| 18 | 06:35:01 cron 16463 R 2972 0 18.49 pam_env.conf |
| 19 | 06:35:01 dumpsystemstat 16473 R 128 0 12.58 date |
| 20 | 06:35:01 debian-sa1 16474 R 283 0 12.66 sysstat |
| 21 | 06:35:01 debian-sa1 16474 R 128 0 10.39 sa1 |
| 22 | 06:35:01 dumpsystemstat 16491 R 128 0 13.22 ifconfig |
| 23 | 06:35:01 DumpThreads 16534 R 128 0 12.78 cut |
| 24 | 06:35:01 cron 16545 R 128 0 14.76 sendmail |
| 25 | 06:35:01 sendmail 16545 R 274 0 10.88 dynamicmaps.cf |
| 26 | 06:35:02 postdrop 16546 R 118 0 32.94 Universal |
| 27 | 06:35:02 pickup 9574 R 118 0 21.02 localtime |
| 28 | [...] |
| 29 | |
| 30 | This shows various system tasks reading from ext4. The high latency here is |
| 31 | due to disk I/O, as I had just evicted the file system cache for this example. |
| 32 | |
| 33 | This "latency" is measured from when the operation was issued from the VFS |
| 34 | interface to the file system, to when it completed. This spans everything: |
| 35 | block device I/O (disk I/O), file system CPU cycles, file system locks, run |
| 36 | queue latency, etc. This is a better measure of the latency suffered by |
| 37 | applications reading from the file system than measuring this down at the |
| 38 | block device interface. |
| 39 | |
| 40 | Note that this only traces the common file system operations previously |
| 41 | listed: other file system operations (eg, inode operations including |
| 42 | getattr()) are not traced. |
| 43 | |
| 44 | |
| 45 | The threshold can be provided as an argument. Eg, I/O slower than 1 ms: |
| 46 | |
| 47 | # ./ext4slower 1 |
| 48 | Tracing ext4 operations slower than 1 ms |
| 49 | TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME |
| 50 | 06:49:17 bash 3616 R 128 0 7.75 cksum |
| 51 | 06:49:17 cksum 3616 R 39552 0 1.34 [ |
| 52 | 06:49:17 cksum 3616 R 96 0 5.36 2to3-2.7 |
| 53 | 06:49:17 cksum 3616 R 96 0 14.94 2to3-3.4 |
| 54 | 06:49:17 cksum 3616 R 10320 0 6.82 411toppm |
| 55 | 06:49:17 cksum 3616 R 65536 0 4.01 a2p |
| 56 | 06:49:17 cksum 3616 R 55400 0 8.77 ab |
| 57 | 06:49:17 cksum 3616 R 36792 0 16.34 aclocal-1.14 |
| 58 | 06:49:17 cksum 3616 R 15008 0 19.31 acpi_listen |
| 59 | 06:49:17 cksum 3616 R 6123 0 17.23 add-apt-repository |
| 60 | 06:49:17 cksum 3616 R 6280 0 18.40 addpart |
| 61 | 06:49:17 cksum 3616 R 27696 0 2.16 addr2line |
| 62 | 06:49:17 cksum 3616 R 58080 0 10.11 ag |
| 63 | 06:49:17 cksum 3616 R 906 0 6.30 ec2-meta-data |
| 64 | 06:49:17 cksum 3616 R 6320 0 10.00 animate.im6 |
| 65 | 06:49:17 cksum 3616 R 5680 0 18.69 anytopnm |
| 66 | 06:49:17 cksum 3616 R 2671 0 20.27 apport-bug |
| 67 | 06:49:17 cksum 3616 R 12566 0 16.72 apport-cli |
| 68 | 06:49:17 cksum 3616 R 1622 0 7.95 apport-unpack |
| 69 | 06:49:17 cksum 3616 R 10440 0 2.37 appres |
| 70 | 06:49:17 cksum 3616 R 48112 0 5.42 whatis |
| 71 | 06:49:17 cksum 3616 R 14832 0 6.24 apt |
| 72 | 06:49:17 cksum 3616 R 65536 0 24.74 apt-cache |
| 73 | 06:49:17 cksum 3616 R 27264 0 1.68 apt-cdrom |
| 74 | 06:49:17 cksum 3616 R 23224 0 5.31 apt-extracttemplates |
| 75 | 06:49:17 cksum 3616 R 65536 0 8.08 apt-ftparchive |
| 76 | 06:49:17 cksum 3616 R 65536 128 2.92 apt-ftparchive |
| 77 | 06:49:17 cksum 3616 R 65536 0 9.58 aptitude-curses |
| 78 | 06:49:17 cksum 3616 R 65536 128 44.25 aptitude-curses |
| 79 | 06:49:17 cksum 3616 R 65536 384 1.69 aptitude-curses |
| 80 | [...] |
| 81 | |
| 82 | This time a cksum(1) command can be seen reading various files (from /usr/bin). |
| 83 | |
| 84 | |
| 85 | A threshold of 0 will trace all operations. Warning: the output will be |
| 86 | verbose, as it will include all file system cache hits. |
| 87 | |
| 88 | # ./ext4slower 0 |
| 89 | Tracing ext4 operations |
| 90 | TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME |
| 91 | 06:58:05 supervise 1884 O 0 0 0.00 status.new |
| 92 | 06:58:05 supervise 1884 W 18 0 0.02 status.new |
| 93 | 06:58:05 supervise 1884 O 0 0 0.00 status.new |
| 94 | 06:58:05 supervise 1884 W 18 0 0.01 status.new |
| 95 | 06:58:05 supervise 15817 O 0 0 0.00 run |
| 96 | 06:58:05 supervise 15817 R 92 0 0.00 run |
| 97 | 06:58:05 supervise 15817 O 0 0 0.00 bash |
| 98 | 06:58:05 supervise 15817 R 128 0 0.00 bash |
| 99 | 06:58:05 supervise 15817 R 504 0 0.00 bash |
| 100 | 06:58:05 supervise 15817 R 28 0 0.00 bash |
| 101 | 06:58:05 supervise 15817 O 0 0 0.00 ld-2.19.so |
| 102 | 06:58:05 supervise 15817 R 64 0 0.00 ld-2.19.so |
| 103 | 06:58:05 supervise 15817 R 392 0 0.00 ld-2.19.so |
| 104 | 06:58:05 run 15817 O 0 0 0.00 ld.so.cache |
| 105 | 06:58:05 run 15817 O 0 0 0.00 libtinfo.so.5.9 |
| 106 | 06:58:05 run 15817 R 832 0 0.00 libtinfo.so.5.9 |
| 107 | 06:58:05 run 15817 O 0 0 0.00 libdl-2.19.so |
| 108 | 06:58:05 run 15817 R 832 0 0.00 libdl-2.19.so |
| 109 | 06:58:05 run 15817 O 0 0 0.00 libc-2.19.so |
| 110 | 06:58:05 run 15817 R 832 0 0.00 libc-2.19.so |
| 111 | 06:58:05 supervise 1876 O 0 0 0.00 status.new |
| 112 | 06:58:05 supervise 1876 W 18 0 0.01 status.new |
| 113 | 06:58:05 supervise 1895 O 0 0 0.00 status.new |
| 114 | 06:58:05 supervise 1895 W 18 0 0.02 status.new |
| 115 | 06:58:05 supervise 1876 O 0 0 0.00 status.new |
| 116 | 06:58:05 supervise 1876 W 18 0 0.01 status.new |
| 117 | 06:58:05 supervise 1872 O 0 0 0.00 status.new |
| 118 | 06:58:05 supervise 1872 W 18 0 0.02 status.new |
| 119 | 06:58:05 supervise 1895 O 0 0 0.00 status.new |
| 120 | 06:58:05 supervise 1895 W 18 0 0.01 status.new |
| 121 | 06:58:05 supervise 15818 R 92 0 0.00 run |
| 122 | 06:58:05 supervise 15818 O 0 0 0.00 bash |
| 123 | 06:58:05 supervise 15818 R 128 0 0.00 bash |
| 124 | 06:58:05 supervise 15818 R 504 0 0.00 bash |
| 125 | 06:58:05 supervise 15818 R 28 0 0.00 bash |
| 126 | 06:58:05 supervise 15818 O 0 0 0.00 ld-2.19.so |
| 127 | 06:58:05 supervise 15818 R 64 0 0.00 ld-2.19.so |
| 128 | 06:58:05 supervise 15818 R 392 0 0.00 ld-2.19.so |
| 129 | 06:58:05 supervise 15818 O 0 0 0.00 run |
| 130 | 06:58:05 supervise 1888 O 0 0 0.00 status.new |
| 131 | 06:58:05 supervise 1888 W 18 0 0.01 status.new |
| 132 | 06:58:05 supervise 1888 O 0 0 0.00 status.new |
| 133 | 06:58:05 supervise 1888 W 18 0 0.02 status.new |
| 134 | 06:58:05 supervise 15822 R 119 0 0.00 run |
| 135 | 06:58:05 supervise 15822 O 0 0 0.00 bash |
| 136 | 06:58:05 supervise 15822 R 128 0 0.00 bash |
| 137 | 06:58:05 supervise 15822 R 504 0 0.00 bash |
| 138 | 06:58:05 supervise 15822 R 28 0 0.00 bash |
| 139 | 06:58:05 supervise 15822 O 0 0 0.00 ld-2.19.so |
| 140 | 06:58:05 supervise 15822 R 64 0 0.00 ld-2.19.so |
| 141 | 06:58:05 supervise 15822 R 392 0 0.00 ld-2.19.so |
| 142 | 06:58:05 supervise 1892 O 0 0 0.00 status.new |
| 143 | 06:58:05 supervise 1892 W 18 0 0.02 status.new |
| 144 | 06:58:05 supervise 1892 O 0 0 0.00 status.new |
| 145 | 06:58:05 supervise 1892 W 18 0 0.02 status.new |
| 146 | 06:58:05 supervise 15820 O 0 0 0.00 run |
| 147 | [...] |
| 148 | |
| 149 | The output now includes open operations ("O"), and writes ("W"). |
| 150 | |
| 151 | |
| 152 | A -j option will print just the fields (parsable output, csv): |
| 153 | |
| 154 | # ./ext4slower -j 1 |
| 155 | ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE |
| 156 | 127200712278,bash,17225,R,128,0,14329,cksum |
| 157 | 127200722986,cksum,17225,R,3274,0,8368,command-not-found |
| 158 | 127200735581,cksum,17225,R,65536,0,10903,libbfd-2.24-system.so |
| 159 | 127200738482,cksum,17225,R,65536,131072,2419,libbfd-2.24-system.so |
| 160 | 127200749226,cksum,17225,R,65536,655360,8995,libbfd-2.24-system.so |
| 161 | 127200776273,cksum,17225,R,55080,0,25297,libbind9.so.90.0.9 |
| 162 | 127200784688,cksum,17225,R,65536,0,7873,libblas.so.3.0 |
| 163 | 127200787551,cksum,17225,R,65536,131072,2386,libblas.so.3.0 |
| 164 | 127200795524,cksum,17225,R,18624,0,4947,libcpupower.so.3.13.0-49 |
| 165 | 127200802073,cksum,17225,R,65536,0,6410,libcwidget.so.3.0.0 |
| 166 | 127200808718,cksum,17225,R,65536,131072,6181,libcwidget.so.3.0.0 |
| 167 | 127200829518,cksum,17225,R,65536,0,14213,libdns.so.100.2.2 |
| 168 | 127200832916,cksum,17225,R,65536,131072,2911,libdns.so.100.2.2 |
| 169 | 127200841044,cksum,17225,R,65536,655360,6376,libdns.so.100.2.2 |
| 170 | 127200853646,cksum,17225,R,956,0,1022,libdumbnet.la |
| 171 | 127200857814,cksum,17225,R,61096,0,4111,libdumbnet.so.1.0.1 |
| 172 | 127200869655,cksum,17225,R,65536,0,11252,libgettextlib-0.18.3.so |
| 173 | 127200872985,cksum,17225,R,65536,131072,2882,libgettextlib-0.18.3.so |
| 174 | 127200883063,cksum,17225,R,65536,0,9661,libgettextsrc-0.18.3.so |
| 175 | 127200884767,cksum,17225,R,65536,131072,1251,libgettextsrc-0.18.3.so |
| 176 | 127200904830,cksum,17225,R,65536,0,19571,libgirepository-1.0.so.1.0.0 |
| 177 | 127200906354,cksum,17225,R,65536,131072,1080,libgirepository-1.0.so.1.0.0 |
| 178 | 127200936047,cksum,17225,R,65536,0,28674,libGraphicsMagick.a |
| 179 | 127200939091,cksum,17225,R,65536,131072,2576,libGraphicsMagick.a |
| 180 | 127200947295,cksum,17225,R,65536,655360,6463,libGraphicsMagick.a |
| 181 | 127200958793,cksum,17225,R,65536,1966080,7034,libGraphicsMagick.a |
| 182 | [...] |
| 183 | |
| 184 | This may be useful for visualizing with another tool, for example, for |
| 185 | producing a scatter plot of ENDTIME vs LATENCY, to look for time-based |
| 186 | patterns. |
| 187 | |
| 188 | |
| 189 | USAGE message: |
| 190 | |
| 191 | # ./ext4slower -h |
| 192 | usage: ext4slower [-h] [-j] [-p PID] [min_ms] |
| 193 | |
| 194 | Trace common ext4 file operations slower than a threshold |
| 195 | |
| 196 | positional arguments: |
| 197 | min_ms minimum I/O duration to trace, in ms (default 10) |
| 198 | |
| 199 | optional arguments: |
| 200 | -h, --help show this help message and exit |
| 201 | -j, --csv just print fields: comma-separated values |
| 202 | -p PID, --pid PID trace this PID only |
| 203 | |
| 204 | examples: |
| 205 | ./ext4slower # trace operations slower than 10 ms (default) |
| 206 | ./ext4slower 1 # trace operations slower than 1 ms |
| 207 | ./ext4slower -j 1 # ... 1 ms, parsable output (csv) |
| 208 | ./ext4slower 0 # trace all operations (warning: verbose) |
| 209 | ./ext4slower -p 185 # trace PID 185 only |