DM statistics
=============

Device Mapper supports the collection of I/O statistics on user-defined
regions of a DM device. If no regions are defined, no statistics are
collected, so there is no performance impact. Only bio-based DM
devices are currently supported.

Each user-defined region specifies a starting sector, length and step.
Individual statistics will be collected for each step-sized area within
the range specified.
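
As a rough illustration of how a range is divided into step-sized areas,
the sketch below lists the area boundaries for a given starting sector,
length and area size. The helper name list_areas is hypothetical, and the
sketch assumes that a final partial area is simply shorter than the rest.

```shell
# list_areas START LENGTH AREA_SIZE
# Print one "<start_sector>+<length>" line per area.
# All values are in 512-byte sectors.
list_areas() {
    awk -v start="$1" -v len="$2" -v step="$3" 'BEGIN {
        end = start + len
        for (s = start; s < end; s += step) {
            # Assumption: the last area is truncated at the end of the range.
            n = (s + step > end) ? end - s : step
            printf "%d+%d\n", s, n
        }
    }'
}

# A 1000-sector range in 300-sector areas gives three full areas and one
# 100-sector remainder: 0+300, 300+300, 600+300, 900+100.
list_areas 0 1000 300
```

Each printed line corresponds to one area for which a separate set of
counters would be kept.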

The I/O statistics counters for each step-sized area of a region are
in the same format as /sys/block/*/stat or /proc/diskstats (see:
Documentation/iostats.txt). Two extra counters (12 and 13) are also
provided: total time spent reading and writing. When the histogram
argument is used, a 14th counter is reported that holds the
histogram of latencies. All these counters may be accessed by sending
the @stats_print message to the appropriate DM device via dmsetup.

The reported times are in milliseconds and the granularity depends on
the kernel ticks. When the option precise_timestamps is used, the
reported times are in nanoseconds.

Each region has a corresponding unique identifier, which we call a
region_id, that is assigned when the region is created. The region_id
must be supplied when querying statistics about the region, deleting the
region, etc. Unique region_ids enable multiple userspace programs to
request and process statistics for the same DM device without stepping
on each other's data.

The creation of DM statistics will allocate memory via kmalloc or
fall back to using vmalloc space. At most, 1/4 of the overall system
memory may be allocated by DM statistics. The admin can see how much
memory is used by reading
/sys/module/dm_mod/parameters/stats_current_allocated_bytes
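
To put the allocated byte count in context, it can be compared against
the 1/4-of-memory ceiling described above. The sketch below is only an
approximation: the helper name stats_limit_used_percent is hypothetical,
and taking MemTotal from /proc/meminfo as "overall system memory" is an
assumption that may not match the kernel's own accounting exactly.

```shell
STATS_BYTES_PATH=/sys/module/dm_mod/parameters/stats_current_allocated_bytes

# stats_limit_used_percent ALLOCATED_BYTES TOTAL_MEMORY_BYTES
# Print how much of the 1/4-of-memory ceiling is in use, as a whole
# percentage (rounded down).
stats_limit_used_percent() {
    awk -v used="$1" -v total="$2" \
        'BEGIN { printf "%d\n", used * 100 / (total / 4) }'
}

# Typical use on a system with dm_mod loaded (not run here):
#   stats_limit_used_percent "$(cat "$STATS_BYTES_PATH")" \
#       "$(awk '/^MemTotal:/ { printf "%.0f\n", $2 * 1024 }' /proc/meminfo)"
```

For example, 1 MiB allocated on a machine with 16 MiB of memory would be
reported as 25, since the ceiling in that case is 4 MiB.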

Messages
========

    @stats_create <range> <step>
                [<number_of_optional_arguments> <optional_arguments>...]
                [<program_id> [<aux_data>]]

        Create a new region and return the region_id.

        <range>
          "-" - whole device
          "<start_sector>+<length>" - a range of <length> 512-byte sectors
                                      starting with <start_sector>.

        <step>
          "<area_size>" - the range is subdivided into areas each containing
                          <area_size> sectors.
          "/<number_of_areas>" - the range is subdivided into the specified
                                 number of areas.

        <number_of_optional_arguments>
          The number of optional arguments that follow.

        <optional_arguments>
          The following optional arguments are supported:
          precise_timestamps - use a precise timer with nanosecond resolution
                instead of the "jiffies" variable. When this argument is
                used, the resulting times are in nanoseconds instead of
                milliseconds. Precise timestamps are slightly slower
                to obtain than jiffies-based timestamps.
          histogram:n1,n2,n3,n4,... - collect a histogram of latencies. The
                numbers n1, n2, etc. are times that represent the boundaries
                of the histogram. If precise_timestamps is not used, the
                times are in milliseconds, otherwise they are in
                nanoseconds. For each interval between boundaries, the
                kernel will report the number of requests that completed
                within that interval. For example, if we use
                "histogram:10,20,30", the kernel will report four numbers
                a:b:c:d. a is the number of requests that took 0-10 ms to
                complete, b is the number of requests that took 10-20 ms
                to complete, c is the number of requests that took 20-30 ms
                to complete and d is the number of requests that took more
                than 30 ms to complete.

        <program_id>
          An optional parameter. A name that uniquely identifies
          the userspace owner of the range. This groups ranges together
          so that userspace programs can identify the ranges they
          created and ignore those created by others.
          The kernel returns this string back in the output of the
          @stats_list message, but it doesn't use it for anything else.
          If the number of optional arguments is omitted, program_id must
          not be a number, otherwise it would be interpreted as the number
          of optional arguments.

        <aux_data>
          An optional parameter. A word that provides auxiliary data
          that is useful to the client program that created the range.
          The kernel returns this string back in the output of the
          @stats_list message, but it doesn't use this value for anything.
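
The argument order described above can be sketched as follows. The
device name "vol", the program_id "myprog" and the aux_data "nightly"
are made up for illustration, as is the helper label_histogram. By
default the commands are only printed, not sent; setting DMSETUP=dmsetup
would send them and requires root and an existing DM device.

```shell
# Dry run by default: print the command instead of executing it.
DMSETUP=${DMSETUP:-echo dmsetup}

# Whole device, 100 areas, one optional argument: a histogram with
# boundaries at 10, 20 and 30 ms.
$DMSETUP message vol 0 @stats_create - /100 1 histogram:10,20,30

# First 2048 sectors in 512-sector areas, two optional arguments
# (nanosecond timing plus the same boundaries expressed in nanoseconds),
# followed by program_id and aux_data.
$DMSETUP message vol 0 @stats_create 0+2048 512 2 precise_timestamps \
    histogram:10000000,20000000,30000000 myprog nightly

# label_histogram BOUNDARIES COUNTS
# Pair each count of an "a:b:c:d" histogram with its interval, given the
# "n1,n2,n3" boundaries the region was created with.
label_histogram() {
    awk -v bounds="$1" -v counts="$2" 'BEGIN {
        nb = split(bounds, b, ",")
        nc = split(counts, c, ":")
        lo = 0
        for (i = 1; i <= nc; i++) {
            if (i <= nb) { printf "%s-%s: %s\n", lo, b[i], c[i]; lo = b[i] }
            else         { printf ">%s: %s\n", lo, c[i] }
        }
    }'
}

# Prints "0-10: 7", "10-20: 2", "20-30: 1" and ">30: 0", one per line.
label_histogram 10,20,30 7:2:1:0
```

The counts passed to label_histogram here are invented; in practice they
would come from the histogram field of the statistics output.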

    @stats_delete <region_id>

        Delete the region with the specified id.

        <region_id>
          region_id returned from @stats_create

    @stats_clear <region_id>

        Clear all the counters except the in-flight I/O counters.

        <region_id>
          region_id returned from @stats_create

    @stats_list [<program_id>]

        List all regions registered with @stats_create.

        <program_id>
          An optional parameter.
          If this parameter is specified, only matching regions
          are returned.
          If it is not specified, all regions are returned.

        Output format:
          <region_id>: <start_sector>+<length> <step> <program_id> <aux_data>
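
A client that has captured a full listing can pick out its own regions
from the output format above. The kernel can already filter by
program_id, so this sketch is only useful when an unfiltered listing is
at hand. The helper name region_ids_for and the sample listing are made
up for illustration.

```shell
# region_ids_for PROGRAM_ID
# Read @stats_list output on stdin and print the region_id of every
# line whose <program_id> field matches.
region_ids_for() {
    awk -v prog="$1" '$4 == prog { sub(/:$/, "", $1); print $1 }'
}

# Invented listing in the documented format; prints "0".
printf '%s\n' \
    '0: 0+2048 512 myprog nightly' \
    '1: 0+409600 4096 other -' |
    region_ids_for myprog
```

The resulting ids can then be passed to @stats_print or @stats_delete.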

    @stats_print <region_id> [<starting_line> <number_of_lines>]

        Print counters for each step-sized area of a region.

        <region_id>
          region_id returned from @stats_create

        <starting_line>
          The index of the starting line in the output.
          If omitted, all lines are returned.

        <number_of_lines>
          The number of lines to include in the output.
          If omitted, all lines are returned.

        Output format for each step-sized area of a region:

          <start_sector>+<length> counters

          The first 11 counters have the same meaning as
          /sys/block/*/stat or /proc/diskstats.

          Please refer to Documentation/iostats.txt for details.

          1. the number of reads completed
          2. the number of reads merged
          3. the number of sectors read
          4. the number of milliseconds spent reading
          5. the number of writes completed
          6. the number of writes merged
          7. the number of sectors written
          8. the number of milliseconds spent writing
          9. the number of I/Os currently in progress
          10. the number of milliseconds spent doing I/Os
          11. the weighted number of milliseconds spent doing I/Os

          Additional counters:
          12. the total time spent reading in milliseconds
          13. the total time spent writing in milliseconds
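
Because every output line starts with the area followed by the counters
in the order listed above, counter N is whitespace-separated field N+1.
The sketch below uses that to find the area with the most completed
I/Os. The helper name busiest_area and the sample counter values are
invented for illustration.

```shell
# busiest_area
# Read @stats_print output on stdin and print the area with the most
# completed I/Os (counter 1 plus counter 5), followed by that total.
busiest_area() {
    awk '{
        total = $2 + $6     # field 1 is the area; counter N is field N+1
        if (NR == 1 || total > best) { best = total; area = $1 }
    } END { if (NR) print area, best }'
}

# Two invented lines with 13 counters each; prints "512+512 50".
printf '%s\n' \
    '0+512 10 0 80 4 5 0 40 2 0 6 6 4 2' \
    '512+512 30 0 240 9 20 0 160 8 0 17 17 9 8' |
    busiest_area
```

The same field arithmetic applies to any other counter, for example
field 13 for the total time spent reading.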

    @stats_print_clear <region_id> [<starting_line> <number_of_lines>]

        Atomically print and then clear all the counters except the
        in-flight I/O counters. Useful when the client consuming the
        statistics does not want to lose any statistics (those updated
        between printing and clearing).

        <region_id>
          region_id returned from @stats_create

        <starting_line>
          The index of the starting line in the output.
          If omitted, all lines are printed and then cleared.

        <number_of_lines>
          The number of lines to process.
          If omitted, all lines are printed and then cleared.
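
One way a client might use the atomic print-and-clear behaviour is to
sample a region at a fixed interval, so that each sample covers only the
I/O since the previous one. This is a minimal sketch, not a recommended
tool: the helper name sample_region is hypothetical, and by default the
dmsetup command is only printed rather than executed.

```shell
# Dry run by default: print the command instead of executing it.
# Set DMSETUP=dmsetup to really query a device (requires root).
DMSETUP=${DMSETUP:-echo dmsetup}

# sample_region DEVICE REGION_ID INTERVAL_SECONDS COUNT
# Take COUNT samples, INTERVAL_SECONDS apart, fetching and resetting the
# counters each time. Every output line is prefixed with a Unix timestamp.
# The body runs in a subshell so its variables do not leak to the caller.
sample_region() (
    i=0
    while [ "$i" -lt "$4" ]; do
        now=$(date +%s)
        $DMSETUP message "$1" 0 @stats_print_clear "$2" | sed "s/^/$now /"
        i=$((i + 1))
        if [ "$i" -lt "$4" ]; then
            sleep "$3"
        fi
    done
)
```

For instance, "sample_region vol 0 10 6" would take six samples of
region 0 on the device 'vol', ten seconds apart.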

    @stats_set_aux <region_id> <aux_data>

        Store auxiliary data aux_data for the specified region.

        <region_id>
          region_id returned from @stats_create

        <aux_data>
          The string that identifies data which is useful to the client
          program that created the range. The kernel returns this
          string back in the output of the @stats_list message, but it
          doesn't use this value for anything.

Examples
========

Subdivide the DM device 'vol' into 100 pieces and start collecting
statistics on them:

  dmsetup message vol 0 @stats_create - /100

Set the auxiliary data string to "foo bar baz" (the escape for each
space must also be escaped, otherwise the shell will consume them):

  dmsetup message vol 0 @stats_set_aux 0 foo\\ bar\\ baz
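
What the shell does with the doubled backslashes can be checked without
touching a device. The sketch below only inspects the words the shell
would hand to dmsetup; it assumes that dmsetup joins the message words
with single spaces before passing them on.

```shell
# The shell turns each "\\" into a single backslash and then splits on
# the now-unprotected spaces, so three words are produced.
set -- foo\\ bar\\ baz
printf '%s\n' "$#"      # prints: 3
printf '%s\n' "$*"      # prints: foo\ bar\ baz

# A single-quoted string keeps the backslashes without doubling them
# and yields the same text once the words are joined.
set -- 'foo\ bar\ baz'
printf '%s\n' "$*"      # prints: foo\ bar\ baz
```

Under that assumption, both spellings deliver the same message text,
with a backslash in front of each space.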

List the statistics:

  dmsetup message vol 0 @stats_list

Print the statistics:

  dmsetup message vol 0 @stats_print 0

Delete the statistics:

  dmsetup message vol 0 @stats_delete 0