fio
---

fio is a tool that will spawn a number of threads or processes doing a
particular type of io action as specified by the user. fio takes a
number of global parameters, each inherited by the individual threads
unless a parameter overriding that setting is given to them. The
typical use of fio is to write a job file matching the io load one
wants to simulate.


Source
------

fio resides in a git repo; the canonical place is:

git://brick.kernel.dk/data/git/fio.git
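
To grab a local copy with git:

$ git clone git://brick.kernel.dk/data/git/fio.git
$ cd fio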

Snapshots are frequently generated and they include the git meta data as
well. You can download them here:

http://brick.kernel.dk/snaps/

Pascal Bleser <guru@unixtech.be> has fio RPMs in his repository; you
can find them here:

http://linux01.gwdg.de/~pbleser/rpm-navigation.php?cat=System/fio


Building
--------

Just type 'make' and 'make install'. If on FreeBSD, for now you have to
specify the FreeBSD Makefile with -f, e.g.:

$ make -f Makefile.FreeBSD && make -f Makefile.FreeBSD install

Likewise with OpenSolaris, use Makefile.solaris to compile there.
This might change in the future if I opt for an autoconf type setup.
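
By analogy with the FreeBSD case, an OpenSolaris build would look
something like the following (assuming Makefile.solaris provides the
same install target):

$ make -f Makefile.solaris && make -f Makefile.solaris install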


Command line
------------

$ fio
  -t <sec>   Runtime in seconds
  -l         Generate per-job latency logs
  -w         Generate per-job bandwidth logs
  -o <file>  Log output to file
  -m         Minimal (terse) output
  -h         Print help info
  -v         Print version information and exit

Any parameters following the options will be assumed to be job files.
You can add as many as you want; each job file will be regarded as a
separate group and fio will stonewall its execution.
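
For example, to run two (hypothetical) job files as separate groups,
logging terse output to a file:

$ fio -m -o results.log jobfile1 jobfile2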


Job file
--------

Only a few options can be controlled with command line parameters;
generally it's a lot easier to just write a simple job file to describe
the workload. The job file is in the classic ini format, as that is
easy for the user to read and write.

The job file parameters are:

name=x              Use 'x' as the identifier for this job.
directory=x         Use 'x' as the top level directory for storing files.
rw=x                'x' may be: read, randread, write, randwrite,
                    rw (read-write mix), randrw (read-write random mix).
rwmixcycle=x        Base cycle for switching between read and write
                    in msecs.
rwmixread=x         'x' percentage of rw mix ios will be reads. If
                    rwmixwrite is also given, the last of the two will
                    be used if they don't add up to 100%.
rwmixwrite=x        'x' percentage of rw mix ios will be writes. See
                    rwmixread.
rand_repeatable=x   The sequence of random io blocks can be repeatable
                    across runs, if 'x' is 1.
size=x              Set file size to x bytes (x string can include k/m/g).
ioengine=x          'x' may be: aio/libaio/linuxaio for Linux aio,
                    posixaio for POSIX aio, sync for regular read/write io,
                    mmap for mmap'ed io, splice for using splice/vmsplice,
                    or sgio for direct SG_IO io. The latter only works on
                    Linux on SCSI (or SCSI-like devices, such as
                    usb-storage or sata/libata driven) devices.
iodepth=x           For async io, allow 'x' ios in flight.
overwrite=x         If 'x', lay out a write file first.
prio=x              Run io at prio X, 0-7 is the kernel allowed range.
prioclass=x         Run io at prio class X.
bs=x                Use 'x' for thread blocksize. May include k/m postfix.
bsrange=x-y         Mix thread block sizes randomly between x and y. May
                    also include k/m postfix.
direct=x            1 for direct IO, 0 for buffered IO.
thinktime=x         "Think" x usec after each io.
rate=x              Throttle rate to x KiB/sec.
ratemin=x           Quit if rate of x KiB/sec can't be met.
ratecycle=x         ratemin averaged over x msecs.
cpumask=x           Only allow job to run on CPUs defined by mask.
fsync=x             If writing, fsync after every x blocks have been
                    written.
startdelay=x        Start this thread x seconds after startup.
timeout=x           Terminate x seconds after startup. Can include a
                    normal time suffix if not given in seconds, such as
                    'm' for minutes, 'h' for hours, and 'd' for days.
offset=x            Start io at offset x (x string can include k/m/g).
invalidate=x        Invalidate page cache for file prior to doing io.
sync=x              Use sync writes if x and writing.
mem=x               If x == malloc, use malloc for buffers. If x == shm,
                    use shm for buffers. If x == mmap, use anon mmap.
exitall             When one thread quits, terminate the others.
bwavgtime=x         Average bandwidth stats over an x msec window.
create_serialize=x  If 'x', serialize file creation.
create_fsync=x      If 'x', run fsync() after file creation.
end_fsync=x         If 'x', run fsync() after end-of-job.
loops=x             Run the job 'x' number of times.
verify=x            If 'x' == md5, use md5 for verifies. If 'x' == crc32,
                    use crc32 for verifies. md5 is 'safer', but crc32 is
                    a lot faster. Only makes sense for writing to a file.
stonewall           Wait for preceding jobs to end before running.
numjobs=x           Create 'x' similar entries for this job.
thread              Use pthreads instead of forked jobs.
zonesize=x
zoneskip=y          Zone options must be paired. If given, the job
                    will skip y bytes for every x read/written. This
                    can be used to gauge hard drive speed over the entire
                    platter, without reading everything. Both x/y can
                    include k/m/g suffix.
iolog=x             Open and read io pattern from file 'x'. The file must
                    contain one io action per line in the following
                    format:
                    rw, offset, length
                    where rw=0/1 for read/write, and the offset and
                    length entries are in bytes. See the sample iolog
                    after this list.
write_iolog=x       Write an iolog to file 'x' in the same format as
                    iolog. The two iolog options are mutually exclusive;
                    if both are given, the read iolog will be used.
lockmem=x           Lock down x amount of memory on the machine, to
                    simulate a machine with less memory available. x can
                    include k/m/g suffix.
nice=x              Run job at given nice value.
exec_prerun=x       Run 'x' before job io is begun.
exec_postrun=x      Run 'x' after job io has finished.
ioscheduler=x       Use ioscheduler 'x' for this job.
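
To illustrate a few of the above parameters, here is a small sketch of a
job file describing a single throttled, verified writer (the job name
and all values are made up for the example, not recommendations):

; ---snip---

[global]
rw=write
bs=64k
size=32m
rate=2000     ; throttle to 2000 KiB/sec
fsync=32      ; fsync() after every 32 blocks written
verify=crc32  ; verify the written data with crc32

[writer1]
nice=10       ; run this particular job at nice 10

; ---snip---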
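
And, as referenced in the iolog description, a minimal iolog file could
look like the following (the offsets and lengths are hypothetical values
in bytes; the first column is 0 for a read and 1 for a write):

0,0,4096
0,4096,4096
1,0,4096

Replayed with iolog=x, this would issue two 4KiB reads followed by one
4KiB write at the given offsets.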


Examples using a job file
-------------------------

Example 1) Two random readers

Let's say we want to simulate two threads, each reading randomly from a
file of its own. They will be doing IO in 4KiB chunks, using raw
(O_DIRECT) IO. Since they share most parameters, we'll put those in the
[global] section. Job 1 will use a 128MiB file, job 2 will use a 256MiB
file.

; ---snip---

[global]
ioengine=sync  ; regular read/write(2), the default
rw=randread
bs=4k
direct=1

[file1]
size=128m

[file2]
size=256m

; ---snip---

Generally the [] bracketed name specifies a file name, but the "global"
keyword is reserved for setting options that are inherited by each
subsequent job description. It's possible to have several [global]
sections in the job file; each one adds options that are inherited by
jobs defined below it. The name can also point to a block device, such
as /dev/sda. To run the above job file, simply do:

$ fio jobfile

Example 2) Many random writers

Say we want to exercise the IO subsystem some more. We'll define 64
threads doing random buffered writes. We'll let each thread use async io
with a depth of 4 ios in flight. A job file would then look like this:

; ---snip---

[global]
ioengine=libaio
iodepth=4
rw=randwrite
bs=32k
direct=0
size=64m

[files]
numjobs=64

; ---snip---

This will create files.[0-63] and perform the random writes to them.

There are endless ways to define jobs; the examples/ directory contains
a few more.


Interpreting the output
-----------------------

fio spits out a lot of output. While running, fio will display the
status of the jobs created. An example of that would be:

Threads running: 1: [_r] [24.79% done] [eta 00h:01m:31s]

The characters inside the square brackets denote the current status of
each thread. The possible values (in typical life cycle order) are:

Idle    Run
----    ---
P               Thread setup, but not started.
C               Thread created.
I               Thread initialized, waiting.
        R       Running, doing sequential reads.
        r       Running, doing random reads.
        W       Running, doing sequential writes.
        w       Running, doing random writes.
        M       Running, doing mixed sequential reads/writes.
        m       Running, doing mixed random reads/writes.
        F       Running, currently waiting for fsync()
        V       Running, doing verification of written data.
E               Thread exited, not reaped by main thread yet.
_               Thread reaped.

The other values are fairly self explanatory - number of threads
currently running and doing io, and the estimated completion percentage
and time for the running group. It's impossible to estimate runtime
of the following groups (if any).

When fio is done (or interrupted by ctrl-c), it will show the data for
each thread, group of threads, and disks in that order. For each data
direction, the output looks like:

Client1 (g=0): err= 0:
  write: io=    32MiB, bw=   666KiB/s, runt= 50320msec
    slat (msec): min=    0, max=  136, avg= 0.03, dev= 1.92
    clat (msec): min=    0, max=  631, avg=48.50, dev=86.82
    bw (KiB/s) : min=    0, max= 1196, per=51.00%, avg=664.02, dev=681.68
  cpu        : usr=1.49%, sys=0.25%, ctx=7969
The client number is printed, along with the group id and error of that
thread. Below are the io statistics, here for writes. In the order
listed, they denote:

io=     Number of megabytes of io performed
bw=     Average bandwidth rate
runt=   The runtime of that thread
slat=   Submission latency (avg being the average, dev being the
        standard deviation). This is the time it took to submit
        the io. For sync io, the slat is really the completion
        latency, since queue/complete is one operation there.
clat=   Completion latency. Same names as slat, this denotes the
        time from submission to completion of the io pieces. For
        sync io, clat will usually be equal (or very close) to 0,
        as the time from submit to complete is basically just
        CPU time (io has already been done, see slat explanation).
bw=     Bandwidth. Same names as the xlat stats, but also includes
        an approximate percentage of total aggregate bandwidth
        this thread received in this group. This last value is
        only really useful if the threads in this group are on the
        same disk, since they are then competing for disk access.
cpu=    CPU usage. User and system time, along with the number
        of context switches this thread went through.

After each client has been listed, the group statistics are printed. They
will look like this:

Run status group 0 (all jobs):
  READ: io=64MiB, aggrb=22178, minb=11355, maxb=11814, mint=2840msec, maxt=2955msec
  WRITE: io=64MiB, aggrb=1302, minb=666, maxb=669, mint=50093msec, maxt=50320msec

For each data direction, it prints:

io=     Number of megabytes of io performed.
aggrb=  Aggregate bandwidth of threads in this group.
minb=   The minimum average bandwidth a thread saw.
maxb=   The maximum average bandwidth a thread saw.
mint=   The smallest runtime of the threads in that group.
maxt=   The longest runtime of the threads in that group.

And finally, the disk statistics are printed. They will look like this:

Disk stats (read/write):
  sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%

Each value is printed for both reads and writes, with reads first. The
numbers denote:

ios=        Number of ios performed by all groups.
merge=      Number of merges performed by the io scheduler.
ticks=      Number of ticks we kept the disk busy.
in_queue=   Total time spent in the disk queue.
util=       The disk utilization. A value of 100% means we kept the disk
            busy constantly, 50% would be a disk idling half of the time.


Terse output
------------

For scripted usage, where you typically want to generate tables or graphs
of the results, fio can output the results in a comma separated format;
a small parsing sketch follows the field list below. The format is one
long line of values, such as:

client1,0,0,936,331,2894,0,0,0.000000,0.000000,1,170,22.115385,34.290410,16,714,84.252874%,366.500000,566.417819,3496,1237,2894,0,0,0.000000,0.000000,0,246,6.671625,21.436952,0,2534,55.465300%,1406.600000,2008.044216,0.000000%,0.431928%,1109

Split up, the format is as follows:

jobname, groupid, error
READ status:
    KiB IO, bandwidth (KiB/sec), runtime (msec)
    Submission latency: min, max, mean, deviation
    Completion latency: min, max, mean, deviation
    Bw: min, max, aggregate percentage of total, mean, deviation
WRITE status:
    KiB IO, bandwidth (KiB/sec), runtime (msec)
    Submission latency: min, max, mean, deviation
    Completion latency: min, max, mean, deviation
    Bw: min, max, aggregate percentage of total, mean, deviation
CPU usage: user, system, context switches
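
As a small parsing sketch (the job file name is hypothetical, and the
field positions are taken from the list above), the job name and read
bandwidth of each job could be pulled out with awk:

$ fio -m jobfile | awk -F',' '{ print $1, $5 }'

Here $1 is the jobname and $5 the read bandwidth in KiB/sec.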


Author
------

Fio was written by Jens Axboe <axboe@suse.de> to enable flexible testing
of the Linux IO subsystem and schedulers. He got tired of writing
specific test applications to simulate a given workload, and found that
the existing io benchmark/test tools out there weren't flexible enough
to do what he wanted.

Jens Axboe <axboe@suse.de> 20060609