dm-raid
-------

The device-mapper RAID (dm-raid) target provides a bridge from DM to MD.
It allows the MD RAID drivers to be accessed using a device-mapper
interface.

The target is named "raid" and it accepts the following parameters:

  <raid_type> <#raid_params> <raid_params> \
    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]

<raid_type>:
  raid1     RAID1 mirroring
  raid4     RAID4 dedicated parity disk
  raid5_la  RAID5 left asymmetric
            - rotating parity 0 with data continuation
  raid5_ra  RAID5 right asymmetric
            - rotating parity N with data continuation
  raid5_ls  RAID5 left symmetric
            - rotating parity 0 with data restart
  raid5_rs  RAID5 right symmetric
            - rotating parity N with data restart
  raid6_zr  RAID6 zero restart
            - rotating parity zero (left-to-right) with data restart
  raid6_nr  RAID6 N restart
            - rotating parity N (right-to-left) with data restart
  raid6_nc  RAID6 N continue
            - rotating parity N (right-to-left) with data continuation
  raid10    Various RAID10 inspired algorithms chosen by additional params
            - RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
            - RAID1E: Integrated Adjacent Stripe Mirroring
            - RAID1E: Integrated Offset Stripe Mirroring
            - and other similar RAID10 variants

  Reference: Chapter 4 of
  http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf

<#raid_params>: The number of parameters that follow.

<raid_params> consists of
    Mandatory parameters:
        <chunk_size>: Chunk size in sectors. This parameter is often known as
                      "stripe size". It is the only mandatory parameter and
                      is placed first.

    followed by optional parameters (in any order):
        [sync|nosync]   Force or prevent RAID initialization.

        [rebuild <idx>] Rebuild drive number idx (first drive is 0).

        [daemon_sleep <ms>]
                Interval between runs of the bitmap daemon that
                clears bits. A longer interval means less bitmap I/O but
                resyncing after a failure is likely to take longer.

        [min_recovery_rate <kB/sec/disk>]  Lower bound on the RAID initialization rate
        [max_recovery_rate <kB/sec/disk>]  Upper bound on the RAID initialization rate
        [write_mostly <idx>]               Drive index is write-mostly
        [max_write_behind <sectors>]       See '--write-behind=' (man mdadm)
        [stripe_cache <sectors>]           Stripe cache size (higher RAIDs only)
        [region_size <sectors>]
                The region_size multiplied by the number of regions is the
                logical size of the array. The bitmap records the device
                synchronisation state for each region.

        [raid10_copies   <# copies>]
        [raid10_format   <near|far|offset>]
                These two options are used to alter the default layout of
                a RAID10 configuration. The number of copies can be
                specified, but the default is 2. There are also three
                variations to how the copies are laid down - the default
                is "near". Near copies are what most people think of with
                respect to mirroring. If these options are left unspecified,
                or 'raid10_copies 2' and/or 'raid10_format near' are given,
                then the layouts for 2, 3 and 4 devices are:
                2 drives         3 drives          4 drives
                --------         ----------        --------------
                A1  A1           A1  A1  A2        A1  A1  A2  A2
                A2  A2           A2  A3  A3        A3  A3  A4  A4
                A3  A3           A4  A4  A5        A5  A5  A6  A6
                A4  A4           A5  A6  A6        A7  A7  A8  A8
                ..  ..           ..  ..  ..        ..  ..  ..  ..
                The 2-device layout is equivalent to 2-way RAID1. The 4-device
                layout is what a traditional RAID10 would look like. The
                3-device layout is what might be called a 'RAID1E - Integrated
                Adjacent Stripe Mirroring'.

                If 'raid10_copies 2' and 'raid10_format far', then the layouts
                for 2, 3 and 4 devices are:
                2 drives         3 drives          4 drives
                --------         ------------      ------------------
                A1   A2          A1   A2   A3      A1   A2   A3   A4
                A3   A4          A4   A5   A6      A5   A6   A7   A8
                A5   A6          A7   A8   A9      A9   A10  A11  A12
                ..   ..          ..   ..   ..      ..   ..   ..   ..
                A2   A1          A3   A1   A2      A2   A1   A4   A3
                A4   A3          A6   A4   A5      A6   A5   A8   A7
                A6   A5          A9   A7   A8      A10  A9   A12  A11
                ..   ..          ..   ..   ..      ..   ..   ..   ..

                If 'raid10_copies 2' and 'raid10_format offset', then the
                layouts for 2, 3 and 4 devices are:
                2 drives         3 drives          4 drives
                --------         ------------      ------------------
                A1   A2          A1   A2   A3      A1   A2   A3   A4
                A2   A1          A3   A1   A2      A2   A1   A4   A3
                A3   A4          A4   A5   A6      A5   A6   A7   A8
                A4   A3          A6   A4   A5      A6   A5   A8   A7
                A5   A6          A7   A8   A9      A9   A10  A11  A12
                A6   A5          A9   A7   A8      A10  A9   A12  A11
                ..   ..          ..   ..   ..      ..   ..   ..   ..
                Here we see layouts closely akin to 'RAID1E - Integrated
                Offset Stripe Mirroring'.
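
The three RAID10 diagrams above can be reproduced with a short script, which
may help when checking where a given chunk lands. This is only a sketch for
'raid10_copies 2'. The placement rule for the second copy (rotate by one
device within pairs of devices, with the last group absorbing an odd device)
is an assumption inferred from the 2, 3 and 4 drive diagrams, not taken from
the driver source. All function names are hypothetical.

```python
def near_layout(n_devs, rows, copies=2):
    """Each chunk is written `copies` times in consecutive slots, row by row."""
    return [[f"A{(r * n_devs + d) // copies + 1}" for d in range(n_devs)]
            for r in range(rows)]


def stripe_rows(n_devs, rows):
    """Plain striping: one copy of each chunk, filled left to right."""
    return [[f"A{r * n_devs + d + 1}" for d in range(n_devs)]
            for r in range(rows)]


def device_groups(n_devs):
    """Assumed grouping: pairs of devices, the last group takes any remainder."""
    n_groups = max(n_devs // 2, 1)
    groups = [list(range(g * 2, g * 2 + 2)) for g in range(n_groups)]
    groups[-1] = list(range((n_groups - 1) * 2, n_devs))
    return groups


def rotate_row(row):
    """Shift every chunk one device to the right inside its device group."""
    out = [None] * len(row)
    for group in device_groups(len(row)):
        for i, dev in enumerate(group):
            out[group[(i + 1) % len(group)]] = row[dev]
    return out


def far_layout(n_devs, rows):
    """First copies striped in one section, rotated copies in a second section."""
    first = stripe_rows(n_devs, rows)
    return first + [rotate_row(row) for row in first]


def offset_layout(n_devs, rows):
    """Each striped row is followed immediately by its rotated copy."""
    out = []
    for row in stripe_rows(n_devs, rows):
        out.append(row)
        out.append(rotate_row(row))
    return out
```

For example, offset_layout(3, 1) yields the rows A1 A2 A3 and A3 A1 A2, the
first two rows of the 3-drive "offset" diagram.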

<#raid_devs>: The number of devices composing the array.
        Each device consists of two entries. The first is the device
        containing the metadata (if any); the second is the one containing the
        data.

        If a drive has failed or is missing at creation time, a '-' can be
        given for both the metadata and data drives for a given position.


Example tables
--------------
# RAID4 - 4 data drives, 1 parity (no metadata devices)
# No metadata devices specified to hold superblock/bitmap info
# Chunk size of 1MiB
# (Lines separated for easy reading)

0 1960893648 raid \
        raid4 1 2048 \
        5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

# RAID4 - 4 data drives, 1 parity (with metadata devices)
# Chunk size of 1MiB, force RAID initialization,
#       min recovery rate at 20 kB/sec/disk

0 1960893648 raid \
        raid4 4 2048 sync min_recovery_rate 20 \
        5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82

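The two example tables can also be assembled programmatically, which makes the
counting rules explicit: <#raid_params> counts every token in <raid_params>
(so "min_recovery_rate 20" counts as two), and <#raid_devs> counts
metadata/data pairs. The following is a minimal sketch; the helper names
raid_table and chunk_sectors are hypothetical, and it assumes the usual
device-mapper sector size of 512 bytes.

```python
SECTOR_SIZE = 512  # bytes per device-mapper sector (assumed throughout)


def chunk_sectors(chunk_bytes):
    """Convert a chunk size in bytes to the sector count used in the table."""
    if chunk_bytes % SECTOR_SIZE:
        raise ValueError("chunk size must be a whole number of sectors")
    return chunk_bytes // SECTOR_SIZE


def raid_table(start, length, raid_type, raid_params, devices):
    """Build a one-line table for the "raid" target.

    raid_params: tokens with the chunk size first,
                 e.g. [2048, "sync", "min_recovery_rate", 20]
    devices:     (metadata_dev, data_dev) pairs; None is written as '-'
    """
    params = [str(p) for p in raid_params]
    dev_tokens = []
    for meta, data in devices:
        dev_tokens += [meta or "-", data or "-"]
    fields = [start, length, "raid", raid_type,
              len(params), *params,
              len(devices), *dev_tokens]
    return " ".join(str(f) for f in fields)


# The first example: five data devices, no metadata devices, 1MiB chunks.
data_devs = ["8:17", "8:33", "8:49", "8:65", "8:81"]
table_no_meta = raid_table(0, 1960893648, "raid4",
                           [chunk_sectors(1024 * 1024)],
                           [(None, dev) for dev in data_devs])
```

Here table_no_meta is the first example collapsed onto a single line. Passing
[2048, "sync", "min_recovery_rate", 20] as raid_params gives a parameter count
of 4, as in the second example.
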
'dmsetup table' displays the table used to construct the mapping.
The optional parameters are always printed in the order listed
above with "sync" or "nosync" always output ahead of the other
arguments, regardless of the order used when originally loading the table.
Arguments that can be repeated are ordered by value.

'dmsetup status' yields information on the state and health of the
array.
The output is as follows:
1: <s> <l> raid \
2:      <raid_type> <#devices> <1 health char for each dev> <resync_ratio>

Line 1 is the standard output produced by device-mapper (start sector,
length and target name).
Line 2 is produced by the raid target, and best explained by example:
        0 1960893648 raid raid4 5 AAAAA 2/490221568
Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 complete with recovery.
Faulty or missing devices are marked 'D'. Devices that are out-of-sync
are marked 'a'.
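
A monitoring script might split such a status line into its fields as
sketched below. The names RaidStatus, parse_raid_status and devices_in_state
are hypothetical, and the parser assumes the line has exactly the form shown
above (no leading device name, and only the 'A', 'a' and 'D' health
characters described here).

```python
from typing import NamedTuple


class RaidStatus(NamedTuple):
    start: int       # <s>: start sector of the mapping
    length: int      # <l>: length of the mapping in sectors
    raid_type: str
    health: str      # one character per device
    synced: int      # numerator of <resync_ratio>
    total: int       # denominator of <resync_ratio>


def parse_raid_status(line):
    """Parse one 'dmsetup status' line for a "raid" target."""
    fields = line.split()
    if len(fields) < 7 or fields[2] != "raid":
        raise ValueError("not a raid target status line")
    start, length, _, raid_type, n_devs, health, ratio = fields[:7]
    if len(health) != int(n_devs):
        raise ValueError("expected one health character per device")
    synced, total = (int(x) for x in ratio.split("/"))
    return RaidStatus(int(start), int(length), raid_type, health, synced, total)


def devices_in_state(status, char):
    """Return the indices of devices whose health character equals `char`."""
    return [i for i, c in enumerate(status.health) if c == char]


example = parse_raid_status("0 1960893648 raid raid4 5 AAAAA 2/490221568")
```

With the example line, devices_in_state(example, 'D') returns an empty list
because no device is faulty or missing.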


Version History
---------------
1.0.0   Initial version. Support for RAID 4/5/6
1.1.0   Added support for RAID 1
1.2.0   Handle creation of arrays that contain failed devices.
1.3.0   Added support for RAID 10
1.3.1   Allow device replacement/rebuild for RAID 10
1.3.2   Fix/improve redundancy checking for RAID10
1.4.0   Non-functional change. Removes arg from mapping function.
1.4.1   Add RAID10 "far" and "offset" algorithm support.