Table of contents
-----------------

1. Overview
2. How fio works
3. Running fio
4. Job file format
5. Detailed list of parameters
6. Normal output
7. Terse output


1.0 Overview and history
------------------------
fio was originally written to save me the hassle of writing special test
case programs when I wanted to test a specific workload, either for
performance reasons or to find/reproduce a bug. The process of writing
such a test app can be tiresome, especially if you have to do it often.
Hence I needed a tool that would be able to simulate a given io workload
without resorting to writing a tailored test case again and again.

A test workload is difficult to define, though. There can be any number
of processes or threads involved, and they can each be using their own
way of generating io. You could have someone dirtying large amounts of
memory in a memory mapped file, or maybe several threads issuing
reads using asynchronous io. fio needed to be flexible enough to
simulate both of these cases, and many more.

2.0 How fio works
-----------------
The first step in getting fio to simulate a desired io workload is
writing a job file describing that specific setup. A job file may contain
any number of threads and/or files - the typical contents of the job file
are a global section defining shared parameters, and one or more job
sections describing the jobs involved. When run, fio parses this file
and sets everything up as described. If we break down a job from top to
bottom, it contains the following basic parameters:

        IO type         Defines the io pattern issued to the file(s).
                        We may only be reading sequentially from the
                        file(s), or we may be writing randomly. Or even
                        mixing reads and writes, sequentially or randomly.

        Block size      How large are the chunks we issue io in? This may
                        be a single value, or it may describe a range of
                        block sizes.

        IO size         How much data are we going to be reading/writing?

        IO engine       How do we issue io? We could be memory mapping the
                        file, we could be using regular read/write, we
                        could be using splice, async io, or even
                        SG (SCSI generic sg).

        IO depth        If the io engine is async, how large a queuing
                        depth do we want to maintain?

        IO type         Should we be doing buffered io, or direct/raw io?

        Num files       How many files are we spreading the workload over?

        Num threads     How many threads or processes should we spread
                        this workload over?

The above are the basic parameters defined for a workload. In addition,
there's a multitude of parameters that modify other aspects of how this
job behaves.
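
As a rough sketch, the basic parameters above map onto job file options
that are described later in this document. The job name and all values
below are made up for illustration only:

```ini
; hypothetical job touching each basic parameter once
[basic-example]
; IO type: random reads
rw=randread
; Block size: 8KiB io units
bs=8k
; IO size: 256MiB in total
size=256m
; IO engine and IO depth: Linux native async io, 8 units in flight
ioengine=libaio
iodepth=8
; buffered or direct io: non-buffered
direct=1
; Num files and Num threads: 2 files, 4 clones of the job
nrfiles=2
numjobs=4
```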


3.0 Running fio
---------------
See the README file for command line parameters; there are only a few
of them.

Running fio is normally the easiest part - you just give it the job file
(or job files) as parameters:

$ fio job_file

and it will start doing what the job_file tells it to do. You can give
more than one job file on the command line; fio will serialize the running
of those files. Internally that is the same as using the 'stonewall'
parameter described in the parameter section.

If the job file contains only one job, you may as well just give the
parameters on the command line. The command line parameters are identical
to the job parameters, with a few extra that control global parameters
(see README). For example, for the job file parameter iodepth=2, the
mirror command line option would be --iodepth 2 or --iodepth=2. You can
also use the command line for giving more than one job entry. For each
--name option that fio sees, it will start a new job with that name.
Command line entries following a --name entry will apply to that job,
until there are no more entries or a new --name entry is seen. This is
similar to the job file options, where each option applies to the current
job until a new [] job entry is seen.

fio does not need to run as root, except if the files or devices specified
in the job section require that. Some other options may also be restricted,
such as memory locking, io scheduler switching, and decreasing the nice value.


4.0 Job file format
-------------------
As previously described, fio accepts one or more job files describing
what it is supposed to do. The job file format is the classic ini file,
where the names enclosed in [] brackets define the job name. You are free
to use any ASCII name you want, except 'global' which has special meaning.
A global section sets defaults for the jobs described in that file. A job
may override a global section parameter, and a job file may even have
several global sections if so desired. A job is only affected by a global
section residing above it. If the first character in a line is a ';', the
entire line is discarded as a comment.

So let's look at a really simple job file that defines two threads, each
randomly reading from a 128MiB file.

; -- start job file --
[global]
rw=randread
size=128m

[job1]

[job2]

; -- end job file --

As you can see, the job file sections themselves are empty as all the
described parameters are shared. As no filename= option is given, fio
makes up a filename for each of the jobs as it sees fit. On the command
line, this job would look as follows:

$ fio --name=global --rw=randread --size=128m --name=job1 --name=job2


Let's look at an example that has a number of processes writing randomly
to files.

; -- start job file --
[random-writers]
ioengine=libaio
iodepth=4
rw=randwrite
bs=32k
direct=0
size=64m
numjobs=4

; -- end job file --

Here we have no global section, as we only have one job defined anyway.
We want to use async io here, with a depth of 4 for each file. We also
increased the block size used to 32KiB and define numjobs to 4 to
fork 4 identical jobs. The result is 4 processes each randomly writing
to their own 64MiB file. Instead of using the above job file, you could
have given the parameters on the command line. For this case, you would
specify:

$ fio --name=random-writers --ioengine=libaio --iodepth=4 --rw=randwrite --bs=32k --direct=0 --size=64m --numjobs=4

fio ships with a few example job files; you can also look there for
inspiration.
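
The global section rules described at the start of this section can be
sketched with a job file like the one below. The job names and sizes are
invented for the example; only the placement of the sections matters:

```ini
; hypothetical job file with two global sections
[global]
rw=randread
size=128m

; inherits rw=randread and size=128m from the global section above
[reader]

; overrides the global size for this job only
[small-reader]
size=16m

[global]
rw=randwrite

; picks up rw=randwrite from the global section directly above it
[writer]
```

The reader and small-reader jobs are not affected by the second global
section, since it resides below them.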


5.0 Detailed list of parameters
-------------------------------

This section describes in detail each parameter associated with a job.
Some parameters take an option of a given type, such as an integer or
a string. The following types are used:

str     String. This is a sequence of alpha characters.
int     Integer. A whole number value, may be negative.
siint   SI integer. A whole number value, which may contain a postfix
        describing the base of the number. Accepted postfixes are k/m/g,
        meaning kilo, mega, and giga. So if you want to specify 4096,
        you could either write out '4096' or just give 4k. The postfixes
        signify base 2 values, so 1024 is 1k and 1024k is 1m and so on.
bool    Boolean. Usually parsed as an integer, however only defined for
        true and false (1 and 0).
irange  Integer range with postfix. Allows a value range to be given, such
        as 1024-4096. A colon may also be used as the separator, e.g.
        1k:4k. If the option allows two sets of ranges, they can be
        specified with a ',' or '/' delimiter: 1k-4k/8k-32k. Also see
        siint.
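
To make the siint and irange types more concrete, here is a small sketch
of how such values could be written in a job file. The job names are
hypothetical, and the options used (size, bs, bsrange) are described in
the parameter list of this section:

```ini
; hypothetical jobs showing how siint and irange values can be written
[global]
rw=randread
; siint: 128m is 128 * 1024 * 1024 bytes
size=128m

[block-4k]
; siint: 4k is the same as writing 4096
bs=4k

[range-dash]
; irange: block sizes from 1024 to 4096 bytes
bsrange=1k-4k

[range-colon]
; irange: the same range with a colon as the separator
bsrange=1k:4k

[range-split]
; irange with two sets of ranges, using the '/' delimiter
bsrange=1k-4k/8k-32k
```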

With the above in mind, here follows the complete list of fio job
parameters.

name=str        ASCII name of the job. This may be used to override the
                name printed by fio for this job. Otherwise the job
                name is used. On the command line this parameter has the
                special purpose of also signaling the start of a new
                job.

directory=str   Prefix filenames with this directory. Used to place files
                in a different location than "./".

filename=str    Fio normally makes up a filename based on the job name,
                thread number, and file number. If you want to share
                files between threads in a job or several jobs, specify
                a filename for each of them to override the default.
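
A minimal sketch of sharing one file between two jobs follows. The job
names and the file name shared.dat are made up for the example:

```ini
; hypothetical job file where two jobs share one file
[global]
size=64m

[shared-reader]
rw=randread
filename=shared.dat

[shared-writer]
rw=randwrite
filename=shared.dat
```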

rw=str          Type of io pattern. Accepted values are:

                        read            Sequential reads
                        write           Sequential writes
                        randwrite       Random writes
                        randread        Random reads
                        rw              Sequential mixed reads and writes
                        randrw          Random mixed reads and writes

                For the mixed io types, the default is to split them 50/50.
                For certain types of io the result may still be skewed a bit,
                since the speed may be different.

randrepeat=bool For random IO workloads, seed the generator in a predictable
                way so that results are repeatable across repetitions.

size=siint      The total size of file io for this job. This may describe
                the size of the single file the job uses, or it may be
                divided between the number of files in the job. If the
                file already exists, the file size will be adjusted to this
                size if larger than the current file size. If this parameter
                is not given and the file exists, the file size will be used.

bs=siint        The block size used for the io units. Defaults to 4k. Values
                can be given for both reads and writes. If a single siint is
                given, it will apply to both. If a second siint is specified
                after a comma, it will apply to writes only. In other words,
                the format is either bs=read_and_write or bs=read,write.
                bs=4k,8k will thus use 4k blocks for reads, and 8k blocks
                for writes. If you only wish to set the write size, you
                can do so by passing an empty read size - bs=,8k will set
                8k for writes and leave the read size at its default value.
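
The two forms of bs= described above could look like this in a job file.
This is only a sketch, with invented job names and sizes:

```ini
; hypothetical jobs with different block sizes per data direction
[split-bs]
rw=randrw
size=64m
; 4k blocks for reads, 8k blocks for writes
bs=4k,8k

[write-bs-only]
rw=randrw
size=64m
; reads keep the 4k default, writes use 8k
bs=,8k
```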

bsrange=irange  Instead of giving a single block size, specify a range
                and fio will mix the issued io block sizes. The issued
                io unit will always be a multiple of the minimum value
                given (also see bs_unaligned). Applies to both reads and
                writes, however a second range can be given after a comma.
                See bs=.

bs_unaligned    If this option is given, any byte size value within bsrange
                may be used as a block size. This typically won't work with
                direct IO, as that normally requires sector alignment.

nrfiles=int     Number of files to use for this job. Defaults to 1.

ioengine=str    Defines how the job issues io to the file. The following
                types are defined:

                        sync    Basic read(2) or write(2) io. lseek(2) is
                                used to position the io location.

                        libaio  Linux native asynchronous io.

                        posixaio glibc posix asynchronous io.

                        mmap    File is memory mapped and data copied
                                to/from using memcpy(3).

                        splice  splice(2) is used to transfer the data and
                                vmsplice(2) to transfer data from user
                                space to the kernel.

                        sg      SCSI generic sg v3 io. May either be
                                synchronous using the SG_IO ioctl, or if
                                the target is an sg character device
                                we use read(2) and write(2) for asynchronous
                                io.

                        null    Doesn't transfer any data, just pretends
                                to. This is mainly used to exercise fio
                                itself and for debugging/testing purposes.

iodepth=int     This defines how many io units to keep in flight against
                the file. The default is 1 for each file defined in this
                job; it can be overridden with a larger value for higher
                concurrency.

direct=bool     If value is true, use non-buffered io. This is usually
                O_DIRECT.

buffered=bool   If value is true, use buffered io. This is the opposite
                of the 'direct' option. Defaults to true.

offset=siint    Start io at the given offset in the file. The data before
                the given offset will not be touched. This effectively
                caps the file size at real_size - offset.

fsync=int       If writing to a file, issue a sync of the dirty data
                for every number of blocks given. For example, if you give
                32 as a parameter, fio will sync the file for every 32
                writes issued. If fio is using non-buffered io, we may
                not sync the file. The exception is the sg io engine, which
                synchronizes the disk cache anyway.

overwrite=bool  If writing to a file, set up the file first and do overwrites.

end_fsync=bool  If true, fsync file contents when the job exits.

rwmixcycle=int  Value in milliseconds describing how often to switch between
                reads and writes for a mixed workload. The default is
                500 msecs.

rwmixread=int   How large a percentage of the mix should be reads.

rwmixwrite=int  How large a percentage of the mix should be writes. If both
                rwmixread and rwmixwrite are given and the values do not add
                up to 100%, the latter of the two will be used to override
                the first.
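
As an illustration of the mix options, a job asking for a 70/30 split
between reads and writes might look as follows. The job name and values
are assumptions chosen for the example:

```ini
; hypothetical mixed workload with 70% reads
[mixed]
rw=randrw
size=64m
; reads make up 70% of the mix, leaving 30% for writes
rwmixread=70
; switch between reads and writes every 500 msecs (the default)
rwmixcycle=500
```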

norandommap     Normally fio will cover every block of the file when doing
                random IO. If this option is given, fio will just get a
                new random offset without looking at past io history. This
                means that some blocks may not be read or written, and that
                some blocks may be read/written more than once. This option
                is mutually exclusive with verify= for that reason.

nice=int        Run the job with the given nice value. See man nice(2).

prio=int        Set the io priority value of this job. Linux limits us to
                a value between 0 and 7, with 0 being the highest.
                See man ionice(1).

prioclass=int   Set the io priority class. See man ionice(1).

thinktime=int   Stall the job x microseconds after an io has completed before
                issuing the next. May be used to simulate processing being
                done by an application. See thinktime_blocks.

thinktime_blocks
                Only valid if thinktime is set - control how many blocks
                to issue, before waiting 'thinktime' usecs. If not set,
                defaults to 1 which will make fio wait 'thinktime' usecs
                after every block.
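
The interaction between thinktime and thinktime_blocks can be sketched
like this. The job name and numbers are hypothetical:

```ini
; hypothetical job simulating an application that processes data
[thinker]
rw=read
size=64m
; stall for 1000 usecs (1 msec)...
thinktime=1000
; ...after every 8 blocks issued, rather than after every block
thinktime_blocks=8
```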

rate=int        Cap the bandwidth used by this job to this number of KiB/sec.

ratemin=int     Tell fio to do whatever it can to maintain at least this
                bandwidth.

ratecycle=int   Average bandwidth for 'rate' and 'ratemin' over this number
                of milliseconds.
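
A sketch of the three rate options used together is shown below. The job
name and values are invented. The text above does not state the unit of
ratemin, so the example assumes it is KiB/sec like rate:

```ini
; hypothetical rate-limited writer
[capped-writer]
rw=write
size=256m
; cap bandwidth at 1024 KiB/sec
rate=1024
; try to keep at least this bandwidth (unit assumed to match rate)
ratemin=512
; average both over 1000 msecs
ratecycle=1000
```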

cpumask=int     Set the CPU affinity of this job. The parameter given is a
                bitmask of allowed CPUs the job may run on. See man
                sched_setaffinity(2).

startdelay=int  Start this job the specified number of seconds after fio
                has started. Only useful if the job file contains several
                jobs, and you want to delay starting some jobs to a certain
                time.

runtime=int     Tell fio to terminate processing after the specified number
                of seconds. It can be quite hard to determine for how long
                a specified job will run, so this parameter is handy to
                cap the total runtime to a given time.
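
As a small sketch, startdelay and runtime could be combined as follows.
The job names and values are made up for the example:

```ini
; hypothetical job file with a delayed second job and a runtime cap
[global]
size=1g
; terminate processing after 60 seconds
runtime=60

[early]
rw=randread

[late]
rw=randwrite
; start 10 seconds after fio has started
startdelay=10
```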

invalidate=bool Invalidate the buffer/page cache parts for this file prior
                to starting io. Defaults to true.

sync=bool       Use sync io for buffered writes. For the majority of the
                io engines, this means using O_SYNC.

mem=str         Fio can use various types of memory as the io unit buffer.
                The allowed values are:

                        malloc  Use memory from malloc(3) as the buffers.

                        shm     Use shared memory as the buffers. Allocated
                                through shmget(2).

                        shmhuge Same as shm, but use huge pages as backing.

                        mmap    Use mmap to allocate buffers. May either be
                                anonymous memory, or can be file backed if
                                a filename is given after the option. The
                                format is mem=mmap:/path/to/file.

                        mmaphuge Use a memory mapped huge file as the buffer
                                backing. Append filename after mmaphuge, as in
                                mem=mmaphuge:/hugetlbfs/file

                The area allocated is a function of the maximum allowed
                bs size for the job, multiplied by the io depth given. Note
                that for shmhuge and mmaphuge to work, the system must have
                free huge pages allocated. This can normally be checked
                and set by reading/writing /proc/sys/vm/nr_hugepages on a
                Linux system. Fio assumes a huge page is 4MiB in size. So
                to calculate the number of huge pages you need for a given
                job file, add up the io depth of all jobs (normally one unless
                iodepth= is used) and multiply by the maximum bs set. Then
                divide that number by the huge page size. You can see the
                size of the huge pages in /proc/meminfo. If no huge pages
                are allocated by having a non-zero number in nr_hugepages,
                using mmaphuge or shmhuge will fail. Also see hugepage-size.

                mmaphuge also needs to have hugetlbfs mounted and the file
                location should point there. So if it's mounted in /huge,
                you would use mem=mmaphuge:/huge/somefile.

hugepage-size=siint
                Defines the size of a huge page. Must at least be equal
                to the system setting, see /proc/meminfo. Defaults to 4MiB.
                Should probably always be a multiple of megabytes, so using
                hugepage-size=Xm is the preferred way to set this to avoid
                setting a bad value that is not a power of 2.
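
The huge page calculation described under mem= can be worked through for
a single job. This is a sketch with invented values; the arithmetic in
the comments follows the rule given above (io depth times maximum bs,
divided by the huge page size):

```ini
; hypothetical job using huge page backed io buffers
[huge-buffers]
rw=randread
size=256m
ioengine=libaio
iodepth=16
bs=1m
mem=shmhuge
hugepage-size=4m
; buffer area: io depth 16 * maximum bs 1MiB = 16MiB
; huge pages needed: 16MiB / 4MiB per huge page = 4
; so at least 4 free huge pages must be allocated before running this job
```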

exitall         When one job finishes, terminate the rest. The default is
                to wait for each job to finish, but sometimes that is not
                the desired action.

bwavgtime=int   Average the calculated bandwidth over the given time. Value
                is specified in milliseconds.

create_serialize=bool   If true, serialize the file creation for the jobs.
                        This may be handy to avoid interleaving of data
                        files, which may greatly depend on the filesystem
                        used and even the number of processors in the system.

create_fsync=bool       fsync the data file after creation. This is the
                        default.

unlink=bool     Unlink the job files when done. fio defaults to doing this,
                if it created the file itself.

loops=int       Run the specified number of iterations of this job. Used
                to repeat the same workload a given number of times. Defaults
                to 1.

verify=str      If writing to a file, fio can verify the file contents
                after each iteration of the job. The allowed values are:

                        md5     Use an md5 sum of the data area and store
                                it in the header of each block.

                        crc32   Use a crc32 sum of the data area and store
                                it in the header of each block.

                This option can be used for repeated burn-in tests of a
                system to make sure that the written data is also
                correctly read back.
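
A burn-in style job along the lines described for verify= might be
sketched like this. The job name, size and loop count are assumptions:

```ini
; hypothetical burn-in job that verifies what it wrote
[burn-in]
rw=write
size=128m
; store an md5 sum in the header of each block and check it on read back
verify=md5
; run 4 iterations of the job
loops=4
```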

stonewall       Wait for preceding jobs in the job file to exit, before
                starting this one. Can be used to insert serialization
                points in the job file.
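
A serialization point could be sketched as follows. The job names are
invented; the last job only starts once the jobs above it have exited:

```ini
; hypothetical job file with a serialization point
[global]
size=64m

; these two jobs run at the same time
[write-a]
rw=write

[write-b]
rw=write

; this job waits for write-a and write-b to exit before it starts
[read-back]
stonewall
rw=read
```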

numjobs=int     Create the specified number of clones of this job. May be
                used to set up a larger number of threads/processes doing
                the same thing.

thread          fio defaults to forking jobs, however if this option is
                given, fio will use pthread_create(3) to create threads
                instead.

zonesize=siint  Divide a file into zones of the specified size. See zoneskip.

zoneskip=siint  Skip the specified number of bytes when zonesize data has
                been read. The two zone options can be used to only do
                io on zones of a file.
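
The two zone options can be combined as in the sketch below, which uses
made-up sizes. Going by the descriptions above, the job would read a
zone and then skip ahead before reading the next one:

```ini
; hypothetical job that only touches part of a file
[zoned-reader]
rw=read
size=1g
; read a zone of 16MiB...
zonesize=16m
; ...then skip the next 48MiB before reading again
zoneskip=48m
```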

write_iolog=str Write the issued io patterns to the specified file. See
                read_iolog.

read_iolog=str  Open an iolog with the specified file name and replay the
                io patterns it contains. This can be used to store a
                workload and replay it sometime later.

write_bw_log    If given, write a bandwidth log of the jobs in this job
                file. Can be used to store data of the bandwidth of the
                jobs in their lifetime. The included fio_generate_plots
                script uses gnuplot to turn these text files into nice
                graphs.

write_lat_log   Same as write_bw_log, except that this option stores io
                completion latencies instead.

lockmem=siint   Pin down the specified amount of memory with mlock(2). Can
                potentially be used instead of removing memory or booting
                with less memory to simulate a smaller amount of memory.

exec_prerun=str Before running this job, issue the command specified
                through system(3).

exec_postrun=str After the job completes, issue the command specified
                through system(3).

ioscheduler=str Attempt to switch the device hosting the file to the specified
                io scheduler before running.

cpuload=int     If the job is a CPU cycle eater, attempt to use the specified
                percentage of CPU cycles.

cpuchunks=int   If the job is a CPU cycle eater, split the load into
                cycles of the given time. In milliseconds.

|  | 499 |  | 
|  | 500 | 6.0 Interpreting the output | 
|  | 501 | --------------------------- | 
|  | 502 |  | 
|  | 503 | fio spits out a lot of output. While running, fio will display the | 
|  | 504 | status of the jobs created. An example of that would be: | 
|  | 505 |  | 
| Jens Axboe | 6043c57 | 2006-11-03 11:37:47 +0100 | [diff] [blame] | 506 | Threads running: 1: [_r] [24.79% done] [ 13509/  8334 kb/s] [eta 00h:01m:31s] | 
| Jens Axboe | 71bfa16 | 2006-10-25 11:08:19 +0200 | [diff] [blame] | 507 |  | 
|  | 508 | The characters inside the square brackets denote the current status of | 
|  | 509 | each thread. The possible values (in typical life cycle order) are: | 
|  | 510 |  | 
|  | 511 | Idle	Run | 
|  | 512 | ----    --- | 
|  | 513 | P		Thread setup, but not started. | 
|  | 514 | C		Thread created. | 
|  | 515 | I		Thread initialized, waiting. | 
|  | 516 | R	Running, doing sequential reads. | 
|  | 517 | r	Running, doing random reads. | 
|  | 518 | W	Running, doing sequential writes. | 
|  | 519 | w	Running, doing random writes. | 
|  | 520 | M	Running, doing mixed sequential reads/writes. | 
|  | 521 | m	Running, doing mixed random reads/writes. | 
|  | 522 | F	Running, currently waiting for fsync() | 
|  | 523 | V		Running, doing verification of written data. | 
|  | 524 | E		Thread exited, not reaped by main thread yet. | 
|  | 525 | _		Thread reaped. | 
|  | 526 |  | 
|  | 527 | The other values are fairly self-explanatory: the number of threads | 
|  | 528 | currently running and doing io, the rate of io since the last check, and the | 
|  | 529 | estimated completion percentage and time for the running group. It's | 
|  | 530 | impossible to estimate the runtime of the following groups (if any). | 
|  | 531 |  | 
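As a rough sketch of how a script could pick the running status line apart, the Python below matches the example line shown earlier. The regular expression is written against that one example only, parse_status() is a hypothetical helper name, and the text does not say which of the two rates is which, so they are returned as an unlabeled pair.

```python
import re

# Status characters as listed in the table above.
THREAD_STATES = {
    "P": "setup, not started",
    "C": "created",
    "I": "initialized, waiting",
    "R": "sequential reads",
    "r": "random reads",
    "W": "sequential writes",
    "w": "random writes",
    "M": "mixed sequential reads/writes",
    "m": "mixed random reads/writes",
    "F": "waiting for fsync()",
    "V": "verifying written data",
    "E": "exited, not reaped",
    "_": "reaped",
}

# Assumption: the layout matches the single example line in this section.
STATUS_RE = re.compile(
    r"Threads running: (?P<running>\d+): \[(?P<states>[^\]]*)\] "
    r"\[(?P<done>[\d.]+)% done\] "
    r"\[\s*(?P<rate_a>\d+)/\s*(?P<rate_b>\d+) kb/s\] "
    r"\[eta (?P<h>\d+)h:(?P<m>\d+)m:(?P<s>\d+)s\]"
)


def parse_status(line):
    """Hypothetical helper: split one status line into a dict."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("unrecognized status line")
    eta = int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
    return {
        "running": int(match["running"]),
        # One entry per thread, in the order printed.
        "states": [THREAD_STATES.get(c, "unknown") for c in match["states"]],
        "percent_done": float(match["done"]),
        # The two io rates, in the order printed.
        "rates_kb_s": (int(match["rate_a"]), int(match["rate_b"])),
        "eta_seconds": eta,
    }


line = "Threads running: 1: [_r] [24.79% done] [ 13509/  8334 kb/s] [eta 00h:01m:31s]"
status = parse_status(line)
print(status)
```

For the example line this reports one running thread, a reaped thread followed by one doing random reads, and an eta of 91 seconds.
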
|  | 532 | When fio is done (or interrupted by ctrl-c), it will show the data for | 
|  | 533 | each thread, group of threads, and disks in that order. For each data | 
|  | 534 | direction, the output looks like: | 
|  | 535 |  | 
|  | 536 | Client1 (g=0): err= 0: | 
|  | 537 | write: io=    32MiB, bw=   666KiB/s, runt= 50320msec | 
|  | 538 | slat (msec): min=    0, max=  136, avg= 0.03, stdev= 1.92 | 
|  | 539 | clat (msec): min=    0, max=  631, avg=48.50, stdev=86.82 | 
|  | 540 | bw (KiB/s) : min=    0, max= 1196, per=51.00%, avg=664.02, stdev=681.68 | 
|  | 541 | cpu        : usr=1.49%, sys=0.25%, ctx=7969 | 
|  | 542 |  | 
|  | 543 | The client name is printed, along with the group id and error of that | 
|  | 544 | thread. Below are the io statistics, here for writes. In the order listed, | 
|  | 545 | they denote: | 
|  | 546 |  | 
|  | 547 | io=		Number of megabytes of io performed | 
|  | 548 | bw=		Average bandwidth rate | 
|  | 549 | runt=		The runtime of that thread | 
|  | 550 | slat=	Submission latency (avg being the average, stdev being the | 
|  | 551 | standard deviation). This is the time it took to submit | 
|  | 552 | the io. For sync io, the slat is really the completion | 
|  | 553 | latency, since queue/complete is one operation there. | 
|  | 554 | clat=	Completion latency. Same names as slat, this denotes the | 
|  | 555 | time from submission to completion of the io pieces. For | 
|  | 556 | sync io, clat will usually be equal (or very close) to 0, | 
|  | 557 | as the time from submit to complete is basically just | 
|  | 558 | CPU time (io has already been done, see slat explanation). | 
|  | 559 | bw=	Bandwidth. Same names as the slat/clat stats, but also includes | 
|  | 560 | an approximate percentage of total aggregate bandwidth | 
|  | 561 | this thread received in this group. This last value is | 
|  | 562 | only really useful if the threads in this group are on the | 
|  | 563 | same disk, since they are then competing for disk access. | 
|  | 564 | cpu=		CPU usage. User and system time, along with the number | 
|  | 565 | of context switches this thread went through. | 
|  | 566 |  | 
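The slat, clat, bw and cpu lines all share a "name (unit): key=value, ..." shape, so a script can read them with one small routine. The sketch below is an illustration only: parse_stat_line() is a hypothetical helper, it assumes the layout of the sample above, and it does not handle the first "write: io=..." line, whose values carry unit suffixes.

```python
import re


def parse_stat_line(line):
    """Hypothetical helper: 'name (unit): k=v, ...' -> (name, unit, values)."""
    head, _, body = line.partition(":")
    # The unit in parentheses is optional (the cpu line has none).
    match = re.match(r"\s*(\w+)\s*(?:\((.*?)\))?\s*$", head)
    if match is None:
        raise ValueError("unrecognized statistics line")
    name, unit = match.group(1), match.group(2)
    values = {}
    for field in body.split(","):
        key, _, value = field.partition("=")
        # Percent signs are dropped; every value is kept as a float.
        values[key.strip()] = float(value.strip().rstrip("%"))
    return name, unit, values


clat = parse_stat_line("clat (msec): min=    0, max=  631, avg=48.50, stdev=86.82")
bw = parse_stat_line("bw (KiB/s) : min=    0, max= 1196, per=51.00%, avg=664.02, stdev=681.68")
cpu = parse_stat_line("cpu        : usr=1.49%, sys=0.25%, ctx=7969")
print(clat)
print(bw)
print(cpu)
```
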
|  | 567 | After each client has been listed, the group statistics are printed. They | 
|  | 568 | will look like this: | 
|  | 569 |  | 
|  | 570 | Run status group 0 (all jobs): | 
|  | 571 | READ: io=64MiB, aggrb=22178, minb=11355, maxb=11814, mint=2840msec, maxt=2955msec | 
|  | 572 | WRITE: io=64MiB, aggrb=1302, minb=666, maxb=669, mint=50093msec, maxt=50320msec | 
|  | 573 |  | 
|  | 574 | For each data direction, it prints: | 
|  | 575 |  | 
|  | 576 | io=		Number of megabytes of io performed. | 
|  | 577 | aggrb=		Aggregate bandwidth of threads in this group. | 
|  | 578 | minb=		The minimum average bandwidth a thread saw. | 
|  | 579 | maxb=		The maximum average bandwidth a thread saw. | 
|  | 580 | mint=		The smallest runtime of the threads in that group. | 
|  | 581 | maxt=		The longest runtime of the threads in that group. | 
|  | 582 |  | 
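The sample rows above are consistent with aggrb being the total io of the group divided by the longest runtime (maxt), expressed in KiB/s. That reading is inferred from the example figures rather than stated here, so treat both the formula and the truncating division in this sketch as assumptions. aggregate_bandwidth() is a hypothetical helper name.

```python
def aggregate_bandwidth(io_mib, maxt_msec):
    """Assumed relation: total io in KiB over the longest runtime in seconds."""
    # MiB -> KiB is * 1024; msec -> sec is folded in as * 1000 before dividing.
    return io_mib * 1024 * 1000 // maxt_msec


read_aggrb = aggregate_bandwidth(64, 2955)    # READ row: io=64MiB, maxt=2955msec
write_aggrb = aggregate_bandwidth(64, 50320)  # WRITE row: io=64MiB, maxt=50320msec
print(read_aggrb, write_aggrb)
```

With the sample inputs this reproduces the printed aggrb values of 22178 and 1302.
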
|  | 583 | And finally, the disk statistics are printed. They will look like this: | 
|  | 584 |  | 
|  | 585 | Disk stats (read/write): | 
|  | 586 | sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00% | 
|  | 587 |  | 
|  | 588 | Values of the form x/y are printed for reads and writes, with reads | 
|  | 589 | first. The numbers denote: | 
|  | 590 |  | 
|  | 591 | ios=		Number of ios performed by all groups. | 
|  | 592 | merge=		Number of merges in the io scheduler. | 
|  | 593 | ticks=		Number of ticks we kept the disk busy. | 
|  | 594 | in_queue=	Total time spent in the disk queue. | 
|  | 595 | util=		The disk utilization. A value of 100% means we kept the disk | 
|  | 596 | busy constantly, 50% would be a disk idling half of the time. | 
|  | 597 |  | 
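A disk statistics row can be turned into a dictionary with plain string splitting. This is a minimal sketch that assumes the row looks like the sample above; parse_disk_stats() is a hypothetical helper, not part of fio.

```python
def parse_disk_stats(line):
    """Hypothetical helper: parse one 'Disk stats' row into a dict."""
    device, _, rest = line.partition(":")
    stats = {"device": device.strip()}
    for field in rest.split(","):
        key, _, value = field.strip().partition("=")
        if "/" in value:
            # A read/write pair, reads first.
            read, write = value.split("/")
            stats[key] = {"read": int(read), "write": int(write)}
        elif value.endswith("%"):
            stats[key] = float(value.rstrip("%"))
        else:
            stats[key] = int(value)
    return stats


sample = "sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%"
disk = parse_disk_stats(sample)
print(disk)
```

In this particular sample, in_queue happens to equal the sum of the read and write ticks; the text does not say whether that holds in general.
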
|  | 598 |  | 
|  | 599 | 7.0 Terse output | 
|  | 600 | ---------------- | 
|  | 601 |  | 
|  | 602 | For scripted usage where you typically want to generate tables or graphs | 
|  | 603 | of the results, fio can output the results in a comma separated format. | 
|  | 604 | The format is one long line of values, such as: | 
|  | 605 |  | 
|  | 606 | client1,0,0,936,331,2894,0,0,0.000000,0.000000,1,170,22.115385,34.290410,16,714,84.252874%,366.500000,566.417819,3496,1237,2894,0,0,0.000000,0.000000,0,246,6.671625,21.436952,0,2534,55.465300%,1406.600000,2008.044216,0.000000%,0.431928%,1109 | 
|  | 607 |  | 
|  | 608 | Split up, the format is as follows: | 
|  | 609 |  | 
|  | 610 | jobname, groupid, error | 
|  | 611 | READ status: | 
|  | 612 | 	KiB IO, bandwidth (KiB/sec), runtime (msec) | 
|  | 613 | 	Submission latency: min, max, mean, deviation | 
|  | 614 | 	Completion latency: min, max, mean, deviation | 
|  | 615 | 	Bw: min, max, aggregate percentage of total, mean, deviation | 
|  | 616 | WRITE status: | 
|  | 617 | 	KiB IO, bandwidth (KiB/sec), runtime (msec) | 
|  | 618 | 	Submission latency: min, max, mean, deviation | 
|  | 619 | 	Completion latency: min, max, mean, deviation | 
|  | 620 | 	Bw: min, max, aggregate percentage of total, mean, deviation | 
|  | 621 | CPU usage: user, system, context switches | 
|  | 622 |  |
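Following the field order listed above, a terse line can be split into named values in a few lines of Python. The sketch below is one possible way to do it. The field names and the parse_terse() helper are made up for illustration, and it assumes exactly the 38 fields of the layout above and a job name that contains no comma.

```python
# Per-direction fields, in the order given in the format description.
DIRECTION_FIELDS = [
    "io_kib", "bw_kib_s", "runtime_msec",
    "slat_min", "slat_max", "slat_mean", "slat_dev",
    "clat_min", "clat_max", "clat_mean", "clat_dev",
    "bw_min", "bw_max", "bw_percent", "bw_mean", "bw_dev",
]


def _number(text):
    """Convert a field to int or float, dropping a trailing percent sign."""
    text = text.rstrip("%")
    return float(text) if "." in text else int(text)


def parse_terse(line):
    """Hypothetical helper: map one terse output line to a nested dict."""
    parts = line.strip().split(",")
    expected = 3 + 2 * len(DIRECTION_FIELDS) + 3
    if len(parts) != expected:
        raise ValueError(f"expected {expected} fields, got {len(parts)}")
    result = {"jobname": parts[0], "groupid": int(parts[1]), "error": int(parts[2])}
    pos = 3
    for direction in ("read", "write"):
        values = parts[pos:pos + len(DIRECTION_FIELDS)]
        result[direction] = {k: _number(v) for k, v in zip(DIRECTION_FIELDS, values)}
        pos += len(DIRECTION_FIELDS)
    user, system, ctx = parts[pos:]
    result["cpu"] = {"user": _number(user), "system": _number(system), "ctx": int(ctx)}
    return result


sample = (
    "client1,0,0,936,331,2894,0,0,0.000000,0.000000,1,170,22.115385,34.290410,"
    "16,714,84.252874%,366.500000,566.417819,3496,1237,2894,0,0,0.000000,0.000000,"
    "0,246,6.671625,21.436952,0,2534,55.465300%,1406.600000,2008.044216,"
    "0.000000%,0.431928%,1109"
)
terse = parse_terse(sample)
print(terse["jobname"], terse["read"]["bw_kib_s"], terse["write"]["bw_kib_s"])
```

Checking the field count up front makes a layout mismatch fail loudly instead of silently shifting every value by one column.
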