fio
---

fio is a tool that will spawn a number of threads or processes doing a
particular type of io action as specified by the user. fio takes a
number of global parameters, each inherited by the threads unless
parameters overriding those settings are given to a specific job.
The typical use of fio is to write a job file matching the io load
one wants to simulate.


Source
------

fio resides in a git repo; the canonical place is:

git://brick.kernel.dk/data/git/fio.git

Snapshots are frequently generated and they include the git metadata as
well. You can download them here:

http://brick.kernel.dk/snaps/

Pascal Bleser <guru@unixtech.be> has fio RPMs in his repository; you
can find them here:

http://linux01.gwdg.de/~pbleser/rpm-navigation.php?cat=System/fio


Building
--------

Just type 'make' and 'make install'. If on FreeBSD, for now you have to
specify the FreeBSD Makefile with -f, e.g.:

$ make -f Makefile.FreeBSD && make -f Makefile.FreeBSD install

Likewise with OpenSolaris, use Makefile.solaris to compile there.
This might change in the future if I opt for an autoconf type setup.


Command line
------------

$ fio
	-s IO is sequential
	-b Block size in KiB for each io
	-t <sec> Runtime in seconds
	-r For random io, sequence must be repeatable
	-R <on> If one thread fails to meet rate, quit all
	-o <on> Use direct IO if 1, buffered if 0
	-l Generate per-job latency logs
	-w Generate per-job bandwidth logs
	-f <file> Read <file> for job descriptions
	-O <file> Log output to file
	-h Print help info
	-v Print version information and exit

Any parameters following the options will be assumed to be job files.
You can add as many as you want; each job file will be regarded as a
separate group, and fio will stonewall its execution.

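
For instance, the following sketch runs two job files back to back as two
stonewalled groups, logging output to a file with -O (the job file names
here are hypothetical):

```
# fio waits for the first group (read-test) to finish before
# starting the second (write-test) - i.e. it stonewalls them.
$ fio -O results.log read-test write-test
```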

Job file
--------

Only a few options can be controlled with command line parameters;
generally it's a lot easier to just write a simple job file to describe
the workload. The job file is in the ini style format, as it's
easy for the user to read and write.

The job file parameters are:

name=x		Use 'x' as the identifier for this job.
directory=x	Use 'x' as the top level directory for storing files
rw=x		'x' may be: read, randread, write, randwrite,
		rw (read-write mix), randrw (read-write random mix)
rwmixcycle=x	Base cycle for switching between read and write
		in msecs.
rwmixread=x	'x' percentage of rw mix ios will be reads. If
		rwmixwrite is also given, the last of the two will
		be used if they don't add up to 100%.
rwmixwrite=x	'x' percentage of rw mix ios will be writes. See
		rwmixread.
size=x		Set file size to x bytes (x string can include k/m/g)
ioengine=x	'x' may be: aio/libaio/linuxaio for Linux aio,
		posixaio for POSIX aio, sync for regular read/write io,
		mmap for mmap'ed io, splice for using splice/vmsplice,
		or sgio for direct SG_IO io. The latter only works on
		Linux on SCSI (or SCSI-like devices, such as
		usb-storage or sata/libata driven) devices.
iodepth=x	For async io, allow 'x' ios in flight
overwrite=x	If 'x', lay out a write file first.
prio=x		Run io at prio X, 0-7 is the kernel allowed range
prioclass=x	Run io at prio class X
bs=x		Use 'x' for thread blocksize. May include k/m postfix.
bsrange=x-y	Mix thread block sizes randomly between x and y. May
		also include k/m postfix.
direct=x	1 for direct IO, 0 for buffered IO
thinktime=x	"Think" x usec after each io
rate=x		Throttle rate to x KiB/sec
ratemin=x	Quit if rate of x KiB/sec can't be met
ratecycle=x	ratemin averaged over x msecs
cpumask=x	Only allow job to run on CPUs defined by mask.
fsync=x		If writing, fsync after every x blocks have been written
startdelay=x	Start this thread x seconds after startup
timeout=x	Terminate x seconds after startup
offset=x	Start io at offset x (x string can include k/m/g)
invalidate=x	Invalidate page cache for file prior to doing io
sync=x		Use sync writes if x and writing
mem=x		If x == malloc, use malloc for buffers. If x == shm,
		use shm for buffers. If x == mmap, use anon mmap.
exitall		When one thread quits, terminate the others
bwavgtime=x	Average bandwidth stats over an x msec window.
create_serialize=x	If 'x', serialize file creation.
create_fsync=x	If 'x', run fsync() after file creation.
end_fsync=x	If 'x', run fsync() after end-of-job.
loops=x		Run the job 'x' number of times.
verify=x	If 'x' == md5, use md5 for verifies. If 'x' == crc32,
		use crc32 for verifies. md5 is 'safer', but crc32 is
		a lot faster. Only makes sense for writing to a file.
stonewall	Wait for preceding jobs to end before running.
numjobs=x	Create 'x' similar entries for this job
thread		Use pthreads instead of forked jobs
zonesize=x
zoneskip=y	Zone options must be paired. If given, the job
		will skip y bytes for every x read/written. This
		can be used to gauge hard drive speed over the entire
		platter, without reading everything. Both x/y can
		include k/m/g suffix.
iolog=x		Open and read io pattern from file 'x'. The file must
		contain one io action per line in the following format:
		rw, offset, length
		where rw=0/1 for read/write, and the offset
		and length entries are in bytes.
write_iolog=x	Write an iolog to file 'x' in the same format as iolog.
		The iolog options are mutually exclusive; if both are
		given, the read iolog will be performed.
lockmem=x	Lock down x amount of memory on the machine, to
		simulate a machine with less memory available. x can
		include k/m/g suffix.
nice=x		Run job at given nice value.
exec_prerun=x	Run 'x' before job io is begun.
exec_postrun=x	Run 'x' after job io has finished.
ioscheduler=x	Use ioscheduler 'x' for this job.
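
As an illustration of several of these parameters together, a throttled
sequential writer with end-of-job fsync and crc32 verification might be
described like this (a hypothetical sketch, not one of the shipped
examples; the job name and sizes are made up):

```ini
; Hypothetical job file sketch combining several of the above options
[throttled-writer]
rw=write        ; sequential writes
bs=8k           ; 8 KiB blocks
size=32m        ; write a 32 MiB file
rate=500        ; throttle to 500 KiB/sec
end_fsync=1     ; fsync() the file at end-of-job
verify=crc32    ; verify the written data with crc32
loops=2         ; run the job twice
```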


Examples using a job file
-------------------------

Example 1) Two random readers

Let's say we want to simulate two threads, each reading randomly from a
file. They will be doing IO in 4KiB chunks, using raw (O_DIRECT) IO.
Since they share most parameters, we'll put those in the [global]
section. Job 1 will use a 128MiB file, job 2 will use a 256MiB file.

; ---snip---

[global]
ioengine=sync	; regular read/write(2), the default
rw=randread
bs=4k
direct=1

[file1]
size=128m

[file2]
size=256m

; ---snip---

Generally the [] bracketed name specifies a file name, but the "global"
keyword is reserved for setting options that are inherited by each
subsequent job description. It's possible to have several [global]
sections in the job file; each one adds options that are inherited by
jobs defined below it. The name can also point to a block device, such
as /dev/sda. To run the above job file, simply do:

$ fio jobfile

Example 2) Many random writers

Say we want to exercise the IO subsystem some more. We'll define 64
threads doing random buffered writes. We'll let each thread use async io
with a depth of 4 ios in flight. A job file would then look like this:

; ---snip---

[global]
ioengine=libaio
iodepth=4
rw=randwrite
bs=32k
direct=0
size=64m

[files]
numjobs=64

; ---snip---

This will create files.[0-63] and perform the random writes to them.

There are endless ways to define jobs; the examples/ directory contains
a few more.


Interpreting the output
-----------------------

fio spits out a lot of output. While running, fio will display the
status of the jobs created. An example of that would be:

Threads running: 1: [_r] [24.79% done] [eta 00h:01m:31s]

The characters inside the square brackets denote the current status of
each thread. The possible values (in typical life cycle order) are:

Idle	Run
----	---
P		Thread setup, but not started.
C		Thread created.
I		Thread initialized, waiting.
	R	Running, doing sequential reads.
	r	Running, doing random reads.
	W	Running, doing sequential writes.
	w	Running, doing random writes.
	M	Running, doing mixed sequential reads/writes.
	m	Running, doing mixed random reads/writes.
	F	Running, currently waiting for fsync()
	V	Running, doing verification of written data.
E		Thread exited, not reaped by main thread yet.
_		Thread reaped.

The other values are fairly self-explanatory - the number of threads
currently running and doing io, and the estimated completion percentage
and time for the running group. It's impossible to estimate the runtime
of the following groups (if any).


When fio is done (or interrupted by ctrl-c), it will show the data for
each thread, group of threads, and disks, in that order. For each data
direction, the output looks like:

Client1 (g=0): err= 0:
  write: io= 32MiB, bw= 666KiB/s, runt= 50320msec
    slat (msec): min= 0, max= 136, avg= 0.03, dev= 1.92
    clat (msec): min= 0, max= 631, avg=48.50, dev=86.82
    bw (KiB/s) : min= 0, max= 1196, per=51.00%, avg=664.02, dev=681.68
  cpu : usr=1.49%, sys=0.25%, ctx=7969

The client number is printed, along with the group id and error of that
thread. Below are the io statistics, here for writes. In the order listed,
they denote:

io=	Number of megabytes of io performed
bw=	Average bandwidth rate
runt=	The runtime of that thread
slat=	Submission latency (avg being the average, dev being the
	standard deviation). This is the time it took to submit
	the io. For sync io, the slat is really the completion
	latency, since queue/complete is one operation there.
clat=	Completion latency. Same names as slat; this denotes the
	time from submission to completion of the io pieces. For
	sync io, clat will usually be equal (or very close) to 0,
	as the time from submit to complete is basically just
	CPU time (the io has already been done, see the slat explanation).
bw=	Bandwidth. Same names as the xlat stats, but also includes
	an approximate percentage of the total aggregate bandwidth
	this thread received in this group. This last value is
	only really useful if the threads in this group are on the
	same disk, since they are then competing for disk access.
cpu=	CPU usage. User and system time, along with the number
	of context switches this thread went through.

After each client has been listed, the group statistics are printed. They
will look like this:

Run status group 0 (all jobs):
   READ: io=64MiB, aggrb=22178, minb=11355, maxb=11814, mint=2840msec, maxt=2955msec
  WRITE: io=64MiB, aggrb=1302, minb=666, maxb=669, mint=50093msec, maxt=50320msec

For each data direction, it prints:

io=	Number of megabytes of io performed.
aggrb=	Aggregate bandwidth of the threads in this group.
minb=	The minimum average bandwidth a thread saw.
maxb=	The maximum average bandwidth a thread saw.
mint=	The shortest runtime of the threads in that group.
maxt=	The longest runtime of the threads in that group.

And finally, the disk statistics are printed. They will look like this:

Disk stats (read/write):
  sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%

Each value is printed for both reads and writes, with reads first. The
numbers denote:

ios=	Number of ios performed by all groups.
merge=	Number of merges performed by the io scheduler.
ticks=	Number of ticks we kept the disk busy.
in_queue=	Total time spent in the disk queue.
util=	The disk utilization. A value of 100% means we kept the disk
	busy constantly; 50% would be a disk idling half of the time.


Author
------

Fio was written by Jens Axboe <axboe@suse.de> to enable flexible testing
of the Linux IO subsystem and schedulers. He got tired of writing
specific test applications to simulate a given workload, and found that
the existing io benchmark/test tools out there weren't flexible enough
to do what he wanted.

Jens Axboe <axboe@suse.de> 20060609
