Brendan Gregg | bc54bb6 | 2016-02-14 23:13:13 -0800 | [diff] [blame] | 1 | Demonstrations of zfsslower, the Linux eBPF/bcc version. |
| 2 | |
| 3 | |
| 4 | zfsslower shows ZFS reads, writes, opens, and fsyncs, slower than a threshold. |
| 5 | It has been written to work on ZFS on Linux (http://zfsonlinux.org). For |
| 6 | example: |
| 7 | |
| 8 | # ./zfsslower |
| 9 | Tracing ZFS operations slower than 10 ms |
| 10 | TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME |
| 11 | 06:31:28 dd 25570 W 131072 38784 303.92 data1 |
| 12 | 06:31:34 dd 25686 W 131072 38784 388.28 data1 |
| 13 | 06:31:35 dd 25686 W 131072 78720 519.66 data1 |
| 14 | 06:31:35 dd 25686 W 131072 116992 405.94 data1 |
| 15 | 06:31:35 dd 25686 W 131072 153600 433.52 data1 |
| 16 | 06:31:36 dd 25686 W 131072 188672 314.37 data1 |
| 17 | 06:31:36 dd 25686 W 131072 222336 372.33 data1 |
| 18 | 06:31:36 dd 25686 W 131072 254592 309.59 data1 |
| 19 | 06:31:37 dd 25686 W 131072 285440 304.52 data1 |
| 20 | 06:31:37 dd 25686 W 131072 315008 236.45 data1 |
| 21 | 06:31:37 dd 25686 W 131072 343424 193.54 data1 |
| 22 | 06:31:38 dd 25686 W 131072 370560 286.07 data1 |
| 23 | 06:31:38 dd 25686 W 131072 396672 251.92 data1 |
| 24 | [...] |
| 25 | |
| 26 | This shows writes to a "data1" file, each taking well over the 10 ms threshold. |
| 27 | the slowest, on the 3rd line of output, reached 519.66 ms for a 128 Kbyte |
| 28 | write by the "dd" command. |
| 29 | |
| 30 | This "latency" is measured from when the operation was issued from the VFS |
| 31 | interface to the file system (via the ZFS POSIX layer), to when it completed. |
| 32 | This spans everything: block device I/O (disk I/O), file system CPU cycles, |
| 33 | file system locks, run queue latency, etc. This is a better measure of the |
| 34 | latency suffered by applications reading from the file system than measuring |
| 35 | this down at the block device interface. |
| 36 | |
| 37 | Note that this only traces the common file system operations previously |
| 38 | listed: other file system operations (eg, inode operations including |
| 39 | getattr()) are not traced. |
| 40 | |
| 41 | |
| 42 | A threshold of 0 will trace all operations. Warning: the output will be |
| 43 | verbose, as it will include all file system cache hits. |
| 44 | |
| 45 | # ./zfsslower 0 |
| 46 | Tracing ZFS operations |
| 47 | TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME |
| 48 | 06:36:07 dd 32242 O 0 0 0.01 data1 |
| 49 | 06:36:07 dd 32242 W 131072 0 0.25 data1 |
| 50 | 06:36:07 dd 32242 W 131072 128 0.03 data1 |
| 51 | 06:36:07 dd 32242 W 131072 256 0.04 data1 |
| 52 | 06:36:07 dd 32242 W 131072 384 0.04 data1 |
| 53 | 06:36:07 dd 32242 W 131072 512 0.04 data1 |
| 54 | 06:36:07 dd 32242 W 131072 640 0.03 data1 |
| 55 | 06:36:07 dd 32242 W 131072 768 0.03 data1 |
| 56 | 06:36:07 dd 32242 W 131072 896 0.04 data1 |
| 57 | 06:36:07 dd 32242 W 131072 1024 0.28 data1 |
| 58 | 06:36:07 dd 32242 W 131072 1152 0.04 data1 |
| 59 | 06:36:07 dd 32242 W 131072 1280 0.03 data1 |
| 60 | [...] |
| 61 | 06:36:07 dd 32242 W 131072 13824 0.04 data1 |
| 62 | 06:36:07 dd 32242 W 131072 13952 0.04 data1 |
| 63 | 06:36:07 dd 32242 W 131072 14080 0.04 data1 |
| 64 | 06:36:07 dd 32242 W 131072 14208 398.92 data1 |
| 65 | 06:36:07 dd 32242 W 131072 14336 0.04 data1 |
| 66 | 06:36:07 dd 32242 W 131072 14464 0.04 data1 |
| 67 | 06:36:07 dd 32242 W 131072 15104 0.03 data1 |
| 68 | [...] |
| 69 | |
| 70 | The output now includes the open operation for this file ("O"), and then the |
| 71 | writes. Most of the writes are very fast, with only an occasional outlier that |
| 72 | is in the hundreds of milliseconds. |
| 73 | |
| 74 | Fortunately this is not a real world environment: I setup a zpool on top of a |
| 75 | XFS file system for testing purposes. More debugging using other tools will |
| 76 | explain these outliers: possibly XFS flushing. |
| 77 | |
| 78 | |
| 79 | Here's a random read workload, and showing operations slower than 1 ms: |
| 80 | |
| 81 | # ./zfsslower 1 |
| 82 | Tracing ZFS operations slower than 1 ms |
| 83 | TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME |
| 84 | 06:47:30 randread.pl 15431 R 8192 97840 1.03 data1 |
| 85 | 06:47:30 randread.pl 15431 R 8192 416744 1.12 data1 |
| 86 | 06:47:31 randread.pl 15431 R 8192 228856 1.96 data1 |
| 87 | 06:47:31 randread.pl 15431 R 8192 452248 1.02 data1 |
| 88 | 06:47:31 randread.pl 15431 R 8192 315288 5.90 data1 |
| 89 | 06:47:31 randread.pl 15431 R 8192 752696 1.20 data1 |
| 90 | 06:47:31 randread.pl 15431 R 8192 481832 1.39 data1 |
| 91 | 06:47:31 randread.pl 15431 R 8192 673752 1.39 data1 |
| 92 | 06:47:31 randread.pl 15431 R 8192 691736 1.01 data1 |
| 93 | 06:47:31 randread.pl 15431 R 8192 694776 1.78 data1 |
| 94 | 06:47:31 randread.pl 15431 R 8192 403328 3.75 data1 |
| 95 | 06:47:31 randread.pl 15431 R 8192 567688 1.08 data1 |
| 96 | 06:47:31 randread.pl 15431 R 8192 694280 1.31 data1 |
| 97 | 06:47:31 randread.pl 15431 R 8192 669280 1.06 data1 |
| 98 | 06:47:31 randread.pl 15431 R 8192 426608 1.56 data1 |
| 99 | 06:47:31 randread.pl 15431 R 8192 42512 1.01 data1 |
| 100 | 06:47:31 randread.pl 15431 R 8192 22944 1.33 data1 |
| 101 | 06:47:31 randread.pl 15431 R 8192 427432 1.48 data1 |
| 102 | 06:47:31 randread.pl 15431 R 8192 261320 1.28 data1 |
| 103 | 06:47:31 randread.pl 15431 R 8192 132248 1.23 data1 |
| 104 | 06:47:31 randread.pl 15431 R 8192 96936 1.04 data1 |
| 105 | 06:47:31 randread.pl 15431 R 8192 482800 2.63 data1 |
| 106 | [...] |
| 107 | |
| 108 | |
| 109 | A -j option will print just the fields (parsable output, csv): |
| 110 | |
| 111 | # ./zfsslower -j 1 |
| 112 | ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE |
| 113 | 252305490911,randread.pl,17922,R,8192,163446784,1156,data1 |
| 114 | 252305493852,randread.pl,17922,R,8192,321437696,1129,data1 |
| 115 | 252305498839,randread.pl,17922,R,8192,475152384,1154,data1 |
| 116 | 252305505515,randread.pl,17922,R,8192,49094656,1082,data1 |
| 117 | 252305506774,randread.pl,17922,R,8192,470401024,1245,data1 |
| 118 | 252305509265,randread.pl,17922,R,8192,553246720,2412,data1 |
| 119 | 252305512365,randread.pl,17922,R,8192,20963328,1093,data1 |
| 120 | 252305513755,randread.pl,17922,R,8192,304111616,1350,data1 |
| 121 | 252305583330,randread.pl,17922,R,8192,166174720,1154,data1 |
| 122 | 252305593913,randread.pl,17922,R,8192,175079424,1241,data1 |
| 123 | 252305602833,randread.pl,17922,R,8192,305340416,3307,data1 |
| 124 | 252305608663,randread.pl,17922,R,8192,655958016,2704,data1 |
| 125 | 252305611212,randread.pl,17922,R,8192,40951808,1033,data1 |
| 126 | 252305614609,randread.pl,17922,R,8192,318922752,2687,data1 |
| 127 | 252305623800,randread.pl,17922,R,8192,246734848,2983,data1 |
| 128 | 252305711125,randread.pl,17922,R,8192,581795840,1091,data1 |
| 129 | 252305728694,randread.pl,17922,R,8192,710483968,1034,data1 |
| 130 | 252305762046,randread.pl,17922,R,8192,329367552,1405,data1 |
| 131 | 252305798215,randread.pl,17922,R,8192,44482560,1030,data1 |
| 132 | 252305806748,randread.pl,17922,R,8192,660602880,1069,data1 |
| 133 | 252305826360,randread.pl,17922,R,8192,616144896,2327,data1 |
| 134 | [...] |
| 135 | |
| 136 | |
| 137 | USAGE message: |
| 138 | |
| 139 | # ./zfsslower -h |
| 140 | usage: zfsslower [-h] [-j] [-p PID] [min_ms] |
| 141 | |
| 142 | Trace common ZFS file operations slower than a threshold |
| 143 | |
| 144 | positional arguments: |
| 145 | min_ms minimum I/O duration to trace, in ms (default 10) |
| 146 | |
| 147 | optional arguments: |
| 148 | -h, --help show this help message and exit |
| 149 | -j, --csv just print fields: comma-separated values |
| 150 | -p PID, --pid PID trace this PID only |
| 151 | |
| 152 | examples: |
| 153 | ./zfsslower # trace operations slower than 10 ms (default) |
| 154 | ./zfsslower 1 # trace operations slower than 1 ms |
| 155 | ./zfsslower -j 1 # ... 1 ms, parsable output (csv) |
| 156 | ./zfsslower 0 # trace all operations (warning: verbose) |
| 157 | ./zfsslower -p 185 # trace PID 185 only |