| Queue sysfs files |
| ================= |
| |
| This text file details the queue files that are located in the sysfs tree |
| for each block device. Note that stacked devices typically do not export |
| any settings, since their queue merely functions as a remapping target. |
| These files are the ones found in the /sys/block/xxx/queue/ directory. |
| |
| Files denoted with an RO suffix are read-only, and the RW suffix means |
| read-write. |
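| |
| As a minimal sketch of how these files can be accessed from user space, the |
| Python helpers below read and write a queue attribute as plain text. The |
| function names and the sysfs_root parameter are hypothetical and exist only |
| for this illustration; writing to an RW file normally requires root. |

```python
from pathlib import Path


def read_queue_attr(device, name, sysfs_root="/sys/block"):
    """Return the stripped contents of <sysfs_root>/<device>/queue/<name>."""
    path = Path(sysfs_root) / device / "queue" / name
    return path.read_text().strip()


def write_queue_attr(device, name, value, sysfs_root="/sys/block"):
    """Write a value to an RW queue file as a single text line."""
    path = Path(sysfs_root) / device / "queue" / name
    path.write_text(f"{value}\n")
```

| For example, read_queue_attr("sda", "logical_block_size") returns the file |
| contents as a string, where "sda" stands in for a real device name. |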
| |
| add_random (RW) |
| --------------- |
| This file allows the disk entropy contribution to be turned off. The |
| default value of this file is '1' (on). |
| |
| dax (RO) |
| -------- |
| This file indicates whether the device supports Direct Access (DAX), |
| used by CPU-addressable storage to bypass the pagecache. It shows '1' |
| if true, '0' if not. |
| |
| discard_granularity (RO) |
| ------------------------ |
| This shows the size of the device's internal allocation unit in bytes, if |
| reported by the device. A value of '0' means the device does not support |
| the discard functionality. |
| |
| discard_max_hw_bytes (RO) |
| ------------------------- |
| Devices that support discard functionality may have internal limits on |
| the number of bytes that can be trimmed or unmapped in a single operation. |
| The discard_max_hw_bytes parameter is set by the device driver to the |
| maximum number of bytes that can be discarded in a single operation. Discard |
| requests issued to the device must not exceed this limit. A |
| discard_max_hw_bytes value of 0 means that the device does not support |
| discard functionality. |
| |
| discard_max_bytes (RW) |
| ---------------------- |
| While discard_max_hw_bytes is the hardware limit for the device, this |
| setting is the software limit. Some devices exhibit large latencies when |
| large discards are issued; setting this value lower will make Linux issue |
| smaller discards and potentially help reduce latencies induced by large |
| discard operations. |
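| |
| To illustrate the effect of a lower limit, the following Python sketch splits |
| one large discard range into pieces no larger than a given maximum. It is |
| only a model of the idea, not the kernel's actual splitting code, and the |
| function name split_discard is hypothetical. |

```python
def split_discard(start, length, max_bytes):
    """Split the byte range [start, start + length) into (offset, size)
    chunks of at most max_bytes each."""
    if max_bytes <= 0:
        # A limit of 0 is documented as "discard not supported".
        raise ValueError("discard is not supported (limit is 0)")
    chunks = []
    offset = start
    end = start + length
    while offset < end:
        size = min(max_bytes, end - offset)
        chunks.append((offset, size))
        offset += size
    return chunks
```

| With a 4 MiB limit, a 10 MiB discard becomes three operations of 4 MiB, |
| 4 MiB and 2 MiB. |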
| |
| discard_zeroes_data (RO) |
| ------------------------ |
| When read, this file shows whether discarded blocks are zeroed by the |
| device. If its value is '1', the blocks are zeroed; otherwise they are not. |
| |
| hw_sector_size (RO) |
| ------------------- |
| This is the hardware sector size of the device, in bytes. |
| |
| io_poll (RW) |
| ------------ |
| When read, this file shows the total number of block IO polls and how |
| many returned success. Writing '0' to this file will disable polling |
| for this device. Writing any non-zero value will enable this feature. |
| |
| io_poll_delay (RW) |
| ------------------ |
| If polling is enabled, this controls what kind of polling will be |
| performed. It defaults to -1, which is classic polling. In this mode, |
| the CPU will repeatedly ask for completions without giving up any time. |
| If set to 0, a hybrid polling mode is used, where the kernel will attempt |
| to make an educated guess at when the IO will complete. Based on this |
| guess, the kernel will put the process issuing IO to sleep for an amount |
| of time, before entering a classic poll loop. This mode might be a |
| little slower than pure classic polling, but it will use less CPU time. |
| If set to a value larger than 0, the kernel will put the process issuing |
| IO to sleep for this many microseconds before entering classic |
| polling. |
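| |
| The three cases above can be summarized in a small Python sketch. The mode |
| labels and the function name are hypothetical, chosen only for this example; |
| rejecting values below -1 is an assumption, since this text does not define |
| them. |

```python
def describe_poll_delay(value):
    """Map an io_poll_delay value to a short description of the mode."""
    if value == -1:
        return "classic polling (no sleep)"
    if value == 0:
        return "hybrid polling (kernel-estimated sleep)"
    if value > 0:
        return f"hybrid polling (fixed {value} usec sleep)"
    # Assumption: anything below -1 is treated as invalid here.
    raise ValueError(f"unsupported io_poll_delay value: {value}")
```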
| |
| iostats (RW) |
| ------------ |
| This file is used to control (on/off) the iostats accounting of the |
| disk. |
| |
| logical_block_size (RO) |
| ----------------------- |
| This is the logical block size of the device, in bytes. |
| |
| max_hw_sectors_kb (RO) |
| ---------------------- |
| This is the maximum number of kilobytes supported in a single data transfer. |
| |
| max_integrity_segments (RO) |
| --------------------------- |
| When read, this file shows the maximum number of integrity segments, as |
| set by the block layer, that the hardware controller can handle. |
| |
| max_sectors_kb (RW) |
| ------------------- |
| This is the maximum number of kilobytes that the block layer will allow |
| for a filesystem request. Must be smaller than or equal to the maximum |
| size allowed by the hardware. |
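| |
| A script that tunes this value may want to check the constraint before |
| writing. The Python sketch below does so; the function name is hypothetical, |
| and the requirement that the value be positive is an assumption of this |
| example rather than something stated here. |

```python
def is_valid_max_sectors_kb(requested_kb, max_hw_sectors_kb):
    """Return True if requested_kb does not exceed the hardware limit."""
    if requested_kb <= 0:
        # Assumption: zero or negative sizes are never meaningful.
        return False
    return requested_kb <= max_hw_sectors_kb
```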
| |
| max_segments (RO) |
| ----------------- |
| Maximum number of segments of the device. |
| |
| max_segment_size (RO) |
| --------------------- |
| Maximum segment size of the device, in bytes. |
| |
| minimum_io_size (RO) |
| -------------------- |
| This is the smallest preferred IO size reported by the device. |
| |
| nomerges (RW) |
| ------------- |
| This enables the user to disable the lookup logic involved with merging |
| IO requests in the block layer. By default (0) all merges are |
| enabled. When set to 1 only simple one-hit merges will be tried. When |
| set to 2 no merge algorithms will be tried (including one-hit or more |
| complex tree/hash lookups). |
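| |
| The three settings can be modelled as an enumeration, as in the Python sketch |
| below. The class, member and function names are hypothetical labels for the |
| values described above. |

```python
from enum import IntEnum


class NoMerges(IntEnum):
    ALL_MERGES = 0      # default: all merges are enabled
    ONE_HIT_ONLY = 1    # only simple one-hit merges are tried
    NO_MERGES = 2       # no merge algorithms are tried


def parse_nomerges(text):
    """Convert the text read from the nomerges file to a NoMerges member.

    Raises ValueError for text that is not 0, 1 or 2.
    """
    return NoMerges(int(text.strip()))
```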
| |
| nr_requests (RW) |
| ---------------- |
| This controls how many requests may be allocated in the block layer for |
| read or write requests. Note that the total allocated number may be twice |
| this amount, since the limit applies to reads and writes separately (not to |
| their accumulated sum). |
| |
| To avoid priority inversion through request starvation, a request |
| queue maintains a separate request pool for each cgroup when |
| CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such |
| per-block-cgroup request pool. In other words, if there are N block cgroups, |
| each request queue may have up to N request pools, each independently |
| regulated by nr_requests. |
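| |
| Taken together, the two paragraphs above give a simple upper bound on the |
| number of requests a queue may allocate. The Python sketch below works it |
| out; the function and parameter names are hypothetical. |

```python
def max_allocated_requests(nr_requests, n_pools=1):
    """Upper bound on allocated requests for one request queue.

    Each pool may hold up to nr_requests reads plus nr_requests writes,
    and there may be one pool per block cgroup.
    """
    return 2 * nr_requests * n_pools
```

| For instance, with nr_requests set to 128 and four block cgroups, the bound |
| is 2 * 128 * 4 = 1024 requests. |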
| |
| optimal_io_size (RO) |
| -------------------- |
| This is the optimal IO size reported by the device. |
| |
| physical_block_size (RO) |
| ------------------------ |
| This is the physical block size of the device, in bytes. |
| |
| read_ahead_kb (RW) |
| ------------------ |
| Maximum number of kilobytes to read-ahead for filesystems on this block |
| device. |
| |
| rotational (RW) |
| --------------- |
| This file shows whether the device is of rotational type or |
| non-rotational type. |
| |
| rq_affinity (RW) |
| ---------------- |
| If this option is '1', the block layer will migrate request completions to the |
| CPU "group" that originally submitted the request. For some workloads this |
| provides a significant reduction in CPU cycles due to caching effects. |
| |
| For storage configurations that need to maximize distribution of completion |
| processing, setting this option to '2' forces the completion to run on the |
| requesting CPU (bypassing the "group" aggregation logic). |
| |
| scheduler (RW) |
| -------------- |
| When read, this file will display the current and available IO schedulers |
| for this block device. The currently active IO scheduler will be enclosed |
| in [] brackets. Writing an IO scheduler name to this file will switch |
| control of this block device to that new IO scheduler. Note that writing |
| an IO scheduler name to this file will attempt to load that IO scheduler |
| module, if it isn't already present in the system. |
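| |
| The bracketed format described above is easy to parse. The Python sketch |
| below extracts the active scheduler and the full list of available ones; the |
| function name is hypothetical and the scheduler names in the usage note are |
| examples only. |

```python
def parse_scheduler(text):
    """Return (active, available) from the contents of the scheduler file.

    active is None if no name is enclosed in [] brackets.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available
```

| Given the text "noop deadline [cfq]", this returns "cfq" as the active |
| scheduler and all three names as available. |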
| |
| write_cache (RW) |
| ---------------- |
| When read, this file will display whether the device has write back |
| caching enabled or not. It will return "write back" for the former |
| case, and "write through" for the latter. Writing to this file can |
| change the kernel's view of the device, but it doesn't alter the |
| device state. This means that it might not be safe to toggle the |
| setting from "write back" to "write through", since that will also |
| eliminate cache flushes issued by the kernel. |
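| |
| A monitoring script might turn the two strings above into a boolean, as in |
| this Python sketch. The function name is hypothetical, and raising an error |
| for any other string is a choice made for this example. |

```python
def has_write_back_cache(text):
    """Return True for "write back" and False for "write through"."""
    mode = text.strip()
    if mode == "write back":
        return True
    if mode == "write through":
        return False
    raise ValueError(f"unexpected write_cache value: {mode!r}")
```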
| |
| write_same_max_bytes (RO) |
| ------------------------- |
| This is the number of bytes the device can write in a single write-same |
| command. A value of '0' means write-same is not supported by this |
| device. |
| |
| wb_lat_usec (RW) |
| ---------------- |
| If the device is registered for writeback throttling, then this file shows |
| the target minimum read latency. If this latency is exceeded in a given |
| window of time (see wb_window_usec), then the writeback throttling will start |
| scaling back writes. Writing a value of '0' to this file disables the |
| feature. Writing a value of '-1' to this file resets the value to the |
| default setting. |
| |
| |
| Jens Axboe <jens.axboe@oracle.com>, February 2009 |