| Demonstrations of zfsslower, the Linux eBPF/bcc version. |
| |
| |
| zfsslower shows ZFS reads, writes, opens, and fsyncs, slower than a threshold. |
| It has been written to work on ZFS on Linux (http://zfsonlinux.org). For |
| example: |
| |
| # ./zfsslower |
| Tracing ZFS operations slower than 10 ms |
| TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME |
| 06:31:28 dd 25570 W 131072 38784 303.92 data1 |
| 06:31:34 dd 25686 W 131072 38784 388.28 data1 |
| 06:31:35 dd 25686 W 131072 78720 519.66 data1 |
| 06:31:35 dd 25686 W 131072 116992 405.94 data1 |
| 06:31:35 dd 25686 W 131072 153600 433.52 data1 |
| 06:31:36 dd 25686 W 131072 188672 314.37 data1 |
| 06:31:36 dd 25686 W 131072 222336 372.33 data1 |
| 06:31:36 dd 25686 W 131072 254592 309.59 data1 |
| 06:31:37 dd 25686 W 131072 285440 304.52 data1 |
| 06:31:37 dd 25686 W 131072 315008 236.45 data1 |
| 06:31:37 dd 25686 W 131072 343424 193.54 data1 |
| 06:31:38 dd 25686 W 131072 370560 286.07 data1 |
| 06:31:38 dd 25686 W 131072 396672 251.92 data1 |
| [...] |
| |
| This shows writes to a "data1" file, each taking well over the 10 ms threshold. |
| the slowest, on the 3rd line of output, reached 519.66 ms for a 128 Kbyte |
| write by the "dd" command. |
| |
| This "latency" is measured from when the operation was issued from the VFS |
| interface to the file system (via the ZFS POSIX layer), to when it completed. |
| This spans everything: block device I/O (disk I/O), file system CPU cycles, |
| file system locks, run queue latency, etc. This is a better measure of the |
| latency suffered by applications reading from the file system than measuring |
| this down at the block device interface. |
| |
| Note that this only traces the common file system operations previously |
| listed: other file system operations (eg, inode operations including |
| getattr()) are not traced. |
| |
| |
| A threshold of 0 will trace all operations. Warning: the output will be |
| verbose, as it will include all file system cache hits. |
| |
| # ./zfsslower 0 |
| Tracing ZFS operations |
| TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME |
| 06:36:07 dd 32242 O 0 0 0.01 data1 |
| 06:36:07 dd 32242 W 131072 0 0.25 data1 |
| 06:36:07 dd 32242 W 131072 128 0.03 data1 |
| 06:36:07 dd 32242 W 131072 256 0.04 data1 |
| 06:36:07 dd 32242 W 131072 384 0.04 data1 |
| 06:36:07 dd 32242 W 131072 512 0.04 data1 |
| 06:36:07 dd 32242 W 131072 640 0.03 data1 |
| 06:36:07 dd 32242 W 131072 768 0.03 data1 |
| 06:36:07 dd 32242 W 131072 896 0.04 data1 |
| 06:36:07 dd 32242 W 131072 1024 0.28 data1 |
| 06:36:07 dd 32242 W 131072 1152 0.04 data1 |
| 06:36:07 dd 32242 W 131072 1280 0.03 data1 |
| [...] |
| 06:36:07 dd 32242 W 131072 13824 0.04 data1 |
| 06:36:07 dd 32242 W 131072 13952 0.04 data1 |
| 06:36:07 dd 32242 W 131072 14080 0.04 data1 |
| 06:36:07 dd 32242 W 131072 14208 398.92 data1 |
| 06:36:07 dd 32242 W 131072 14336 0.04 data1 |
| 06:36:07 dd 32242 W 131072 14464 0.04 data1 |
| 06:36:07 dd 32242 W 131072 15104 0.03 data1 |
| [...] |
| |
| The output now includes the open operation for this file ("O"), and then the |
| writes. Most of the writes are very fast, with only an occasional outlier that |
| is in the hundreds of milliseconds. |
| |
| Fortunately this is not a real world environment: I setup a zpool on top of a |
| XFS file system for testing purposes. More debugging using other tools will |
| explain these outliers: possibly XFS flushing. |
| |
| |
| Here's a random read workload, and showing operations slower than 1 ms: |
| |
| # ./zfsslower 1 |
| Tracing ZFS operations slower than 1 ms |
| TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME |
| 06:47:30 randread.pl 15431 R 8192 97840 1.03 data1 |
| 06:47:30 randread.pl 15431 R 8192 416744 1.12 data1 |
| 06:47:31 randread.pl 15431 R 8192 228856 1.96 data1 |
| 06:47:31 randread.pl 15431 R 8192 452248 1.02 data1 |
| 06:47:31 randread.pl 15431 R 8192 315288 5.90 data1 |
| 06:47:31 randread.pl 15431 R 8192 752696 1.20 data1 |
| 06:47:31 randread.pl 15431 R 8192 481832 1.39 data1 |
| 06:47:31 randread.pl 15431 R 8192 673752 1.39 data1 |
| 06:47:31 randread.pl 15431 R 8192 691736 1.01 data1 |
| 06:47:31 randread.pl 15431 R 8192 694776 1.78 data1 |
| 06:47:31 randread.pl 15431 R 8192 403328 3.75 data1 |
| 06:47:31 randread.pl 15431 R 8192 567688 1.08 data1 |
| 06:47:31 randread.pl 15431 R 8192 694280 1.31 data1 |
| 06:47:31 randread.pl 15431 R 8192 669280 1.06 data1 |
| 06:47:31 randread.pl 15431 R 8192 426608 1.56 data1 |
| 06:47:31 randread.pl 15431 R 8192 42512 1.01 data1 |
| 06:47:31 randread.pl 15431 R 8192 22944 1.33 data1 |
| 06:47:31 randread.pl 15431 R 8192 427432 1.48 data1 |
| 06:47:31 randread.pl 15431 R 8192 261320 1.28 data1 |
| 06:47:31 randread.pl 15431 R 8192 132248 1.23 data1 |
| 06:47:31 randread.pl 15431 R 8192 96936 1.04 data1 |
| 06:47:31 randread.pl 15431 R 8192 482800 2.63 data1 |
| [...] |
| |
| |
| A -j option will print just the fields (parsable output, csv): |
| |
| # ./zfsslower -j 1 |
| ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE |
| 252305490911,randread.pl,17922,R,8192,163446784,1156,data1 |
| 252305493852,randread.pl,17922,R,8192,321437696,1129,data1 |
| 252305498839,randread.pl,17922,R,8192,475152384,1154,data1 |
| 252305505515,randread.pl,17922,R,8192,49094656,1082,data1 |
| 252305506774,randread.pl,17922,R,8192,470401024,1245,data1 |
| 252305509265,randread.pl,17922,R,8192,553246720,2412,data1 |
| 252305512365,randread.pl,17922,R,8192,20963328,1093,data1 |
| 252305513755,randread.pl,17922,R,8192,304111616,1350,data1 |
| 252305583330,randread.pl,17922,R,8192,166174720,1154,data1 |
| 252305593913,randread.pl,17922,R,8192,175079424,1241,data1 |
| 252305602833,randread.pl,17922,R,8192,305340416,3307,data1 |
| 252305608663,randread.pl,17922,R,8192,655958016,2704,data1 |
| 252305611212,randread.pl,17922,R,8192,40951808,1033,data1 |
| 252305614609,randread.pl,17922,R,8192,318922752,2687,data1 |
| 252305623800,randread.pl,17922,R,8192,246734848,2983,data1 |
| 252305711125,randread.pl,17922,R,8192,581795840,1091,data1 |
| 252305728694,randread.pl,17922,R,8192,710483968,1034,data1 |
| 252305762046,randread.pl,17922,R,8192,329367552,1405,data1 |
| 252305798215,randread.pl,17922,R,8192,44482560,1030,data1 |
| 252305806748,randread.pl,17922,R,8192,660602880,1069,data1 |
| 252305826360,randread.pl,17922,R,8192,616144896,2327,data1 |
| [...] |
| |
| |
| USAGE message: |
| |
| # ./zfsslower -h |
| usage: zfsslower [-h] [-j] [-p PID] [min_ms] |
| |
| Trace common ZFS file operations slower than a threshold |
| |
| positional arguments: |
| min_ms minimum I/O duration to trace, in ms (default 10) |
| |
| optional arguments: |
| -h, --help show this help message and exit |
| -j, --csv just print fields: comma-separated values |
| -p PID, --pid PID trace this PID only |
| |
| examples: |
| ./zfsslower # trace operations slower than 10 ms (default) |
| ./zfsslower 1 # trace operations slower than 1 ms |
| ./zfsslower -j 1 # ... 1 ms, parsable output (csv) |
| ./zfsslower 0 # trace all operations (warning: verbose) |
| ./zfsslower -p 185 # trace PID 185 only |