  Linux Driver for Mylex DAC960/AcceleRAID/eXtremeRAID PCI RAID Controllers

                       Version 2.2.11 for Linux 2.2.19
                       Version 2.4.11 for Linux 2.4.12

                             PRODUCTION RELEASE

                               11 October 2001

                             Leonard N. Zubkoff
                              Dandelion Digital
                              lnz@dandelion.com

        Copyright 1998-2001 by Leonard N. Zubkoff <lnz@dandelion.com>


INTRODUCTION

Mylex, Inc. designs and manufactures a variety of high performance PCI RAID
controllers. Mylex Corporation is located at 34551 Ardenwood Blvd., Fremont,
California 94555, USA and can be reached at 510.796.6100 or on the World Wide
Web at http://www.mylex.com. Mylex Technical Support can be reached by
electronic mail at mylexsup@us.ibm.com, by voice at 510.608.2400, or by FAX at
510.745.7715. Contact information for offices in Europe and Japan is available
on their Web site.

The latest information on Linux support for DAC960 PCI RAID Controllers, as
well as the most recent release of this driver, will always be available from
my Linux Home Page at URL "http://www.dandelion.com/Linux/". The Linux DAC960
driver supports all current Mylex PCI RAID controllers, including the new
eXtremeRAID 2000/3000 and AcceleRAID 352/170/160 models, whose firmware
interface is entirely different from that of the older eXtremeRAID 1100,
AcceleRAID 150/200/250, and DAC960PJ/PG/PU/PD/PL. See below for a complete
controller list as well as minimum firmware version requirements. For
simplicity, in most places this documentation refers to DAC960 generically
rather than explicitly listing all the supported models.

Driver bug reports should be sent via electronic mail to "lnz@dandelion.com".
Please include with the bug report the complete configuration messages reported
by the driver at startup, along with any subsequent system messages relevant to
the controller's operation, and a detailed description of your system's
hardware configuration. Driver bugs are actually quite rare; if you encounter
problems with disks being marked offline, for example, please contact Mylex
Technical Support, as the problem is related to the hardware configuration
rather than the Linux driver.

Please consult the RAID controller documentation for detailed information
regarding installation and configuration of the controllers. This document
primarily provides information specific to the Linux support.


DRIVER FEATURES

The DAC960 RAID controllers are supported solely as high performance RAID
controllers, not as interfaces to arbitrary SCSI devices. The Linux DAC960
driver operates at the block device level, the same level as the SCSI and IDE
drivers. Unlike other RAID controllers currently supported on Linux, the
DAC960 driver is not dependent on the SCSI subsystem, and hence avoids all the
complexity and unnecessary code that would be associated with an implementation
as a SCSI driver. The DAC960 driver is designed for the highest possible
performance, with no compromises or extra code for compatibility with lower
performance devices. The DAC960 driver includes extensive error logging and
online configuration management capabilities. Except for initial configuration
of the controller and adding new disk drives, almost everything can be handled
from Linux while the system is operational.

The DAC960 driver is architected to support up to 8 controllers per system.
Each DAC960 parallel SCSI controller can support up to 15 disk drives per
channel, for a maximum of 60 drives on a four channel controller; the fibre
channel eXtremeRAID 3000 controller supports up to 125 disk drives per loop for
a total of 250 drives. The drives installed on a controller are divided into
one or more "Drive Groups", and then each Drive Group is subdivided further
into 1 to 32 "Logical Drives". Each Logical Drive has a specific RAID Level
and caching policy associated with it, and it appears to Linux as a single
block device. Logical Drives are further subdivided into up to 7 partitions
through the normal Linux and PC disk partitioning schemes. Logical Drives are
also known as "System Drives", and Drive Groups are also called "Packs". Both
terms are in use in the Mylex documentation; I have chosen to standardize on
the more generic "Logical Drive" and "Drive Group".

DAC960 RAID disk devices are named in the style of the Device File System
(DEVFS). The device corresponding to Logical Drive D on Controller C is
referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1
through /dev/rd/cCdDp7. For example, partition 3 of Logical Drive 5 on
Controller 2 is referred to as /dev/rd/c2d5p3. Note that unlike with SCSI
disks the device names will not change in the event of a disk drive failure.
The DAC960 driver is assigned major numbers 48 - 55 with one major number per
controller. The 8 bits of minor number are divided into 5 bits for the Logical
Drive and 3 bits for the partition.
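
As a purely illustrative note, the minor number for a given Logical Drive and
partition can be computed by shifting the Logical Drive number left by 3 bits
and adding the partition number. For example, from a shell:

  # Partition 3 of Logical Drive 5 uses minor number (5 << 3) | 3 = 43, so
  # /dev/rd/c2d5p3 on Controller 2 is block major 50 (48 + 2), minor 43.
  echo $(( (5 << 3) | 3 ))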


SUPPORTED DAC960/AcceleRAID/eXtremeRAID PCI RAID CONTROLLERS

The following list comprises the supported DAC960, AcceleRAID, and eXtremeRAID
PCI RAID Controllers as of the date of this document. It is recommended that
anyone purchasing a Mylex PCI RAID Controller not in the following table
contact the author beforehand to verify that it is or will be supported.

eXtremeRAID 3000
            1 Wide Ultra-2/LVD SCSI channel
            2 External Fibre FC-AL channels
            233MHz StrongARM SA 110 Processor
            64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
            32MB/64MB ECC SDRAM Memory

eXtremeRAID 2000
            4 Wide Ultra-160 LVD SCSI channels
            233MHz StrongARM SA 110 Processor
            64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
            32MB/64MB ECC SDRAM Memory

AcceleRAID 352
            2 Wide Ultra-160 LVD SCSI channels
            100MHz Intel i960RN RISC Processor
            64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
            32MB/64MB ECC SDRAM Memory

AcceleRAID 170
            1 Wide Ultra-160 LVD SCSI channel
            100MHz Intel i960RM RISC Processor
            16MB/32MB/64MB ECC SDRAM Memory

AcceleRAID 160 (AcceleRAID 170LP)
            1 Wide Ultra-160 LVD SCSI channel
            100MHz Intel i960RS RISC Processor
            Built-in 16MB ECC SDRAM Memory
            PCI Low Profile Form Factor - fits 2U height chassis

eXtremeRAID 1100 (DAC1164P)
            3 Wide Ultra-2/LVD SCSI channels
            233MHz StrongARM SA 110 Processor
            64 Bit 33MHz PCI (backward compatible with 32 Bit PCI slots)
            16MB/32MB/64MB Parity SDRAM Memory with Battery Backup

AcceleRAID 250 (DAC960PTL1)
            Uses onboard Symbios SCSI chips on certain motherboards
            Also includes one onboard Wide Ultra-2/LVD SCSI Channel
            66MHz Intel i960RD RISC Processor
            4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory

AcceleRAID 200 (DAC960PTL0)
            Uses onboard Symbios SCSI chips on certain motherboards
            Includes no onboard SCSI Channels
            66MHz Intel i960RD RISC Processor
            4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory

AcceleRAID 150 (DAC960PRL)
            Uses onboard Symbios SCSI chips on certain motherboards
            Also includes one onboard Wide Ultra-2/LVD SCSI Channel
            33MHz Intel i960RP RISC Processor
            4MB Parity EDO Memory

DAC960PJ    1/2/3 Wide Ultra SCSI-3 Channels
            66MHz Intel i960RD RISC Processor
            4MB/8MB/16MB/32MB/64MB/128MB ECC EDO Memory

DAC960PG    1/2/3 Wide Ultra SCSI-3 Channels
            33MHz Intel i960RP RISC Processor
            4MB/8MB ECC EDO Memory

DAC960PU    1/2/3 Wide Ultra SCSI-3 Channels
            Intel i960CF RISC Processor
            4MB/8MB EDRAM or 2MB/4MB/8MB/16MB/32MB DRAM Memory

DAC960PD    1/2/3 Wide Fast SCSI-2 Channels
            Intel i960CF RISC Processor
            4MB/8MB EDRAM or 2MB/4MB/8MB/16MB/32MB DRAM Memory

DAC960PL    1/2/3 Wide Fast SCSI-2 Channels
            Intel i960 RISC Processor
            2MB/4MB/8MB/16MB/32MB DRAM Memory

DAC960P     1/2/3 Wide Fast SCSI-2 Channels
            Intel i960 RISC Processor
            2MB/4MB/8MB/16MB/32MB DRAM Memory

For the eXtremeRAID 2000/3000 and AcceleRAID 352/170/160, firmware version
6.00-01 or above is required.

For the eXtremeRAID 1100, firmware version 5.06-0-52 or above is required.

For the AcceleRAID 250, 200, and 150, firmware version 4.06-0-57 or above is
required.

For the DAC960PJ and DAC960PG, firmware version 4.06-0-00 or above is required.

For the DAC960PU, DAC960PD, DAC960PL, and DAC960P, either firmware version
3.51-0-04 or above (for dual Flash ROM controllers) or firmware version
2.73-0-00 or above (for single Flash ROM controllers) is required.

Please note that not all SCSI disk drives are suitable for use with DAC960
controllers, and only particular firmware versions of any given model may
actually function correctly. Similarly, not all motherboards have a BIOS that
properly initializes the AcceleRAID 250, AcceleRAID 200, AcceleRAID 150,
DAC960PJ, and DAC960PG because the Intel i960RD/RP is a multi-function device.
If in doubt, contact Mylex RAID Technical Support (mylexsup@us.ibm.com) to
verify compatibility. Mylex makes available a hard disk compatibility list at
http://www.mylex.com/support/hdcomp/hd-lists.html.


DRIVER INSTALLATION

This distribution was prepared for Linux kernel version 2.2.19 or 2.4.12.

To install the DAC960 RAID driver, you may use the following commands,
replacing "/usr/src" with wherever you keep your Linux kernel source tree:

  cd /usr/src
  tar -xvzf DAC960-2.2.11.tar.gz (or DAC960-2.4.11.tar.gz)
  mv README.DAC960 linux/Documentation
  mv DAC960.[ch] linux/drivers/block
  patch -p0 < DAC960.patch (if DAC960.patch is included)
  cd linux
  make config
  make bzImage (or zImage)

Then install "arch/i386/boot/bzImage" or "arch/i386/boot/zImage" as your
standard kernel, run lilo if appropriate, and reboot.
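
If the root file system is to reside on a DAC960 logical drive, a LILO
configuration along the following lines may be used; this is a hypothetical
fragment for illustration only, and it requires the DAC960-aware LILO
described below:

  boot=/dev/rd/c0d0
  image=/boot/bzImage
      label=linux
      root=/dev/rd/c0d0p1
      read-only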

To create the necessary devices in /dev, the "make_rd" script included in
"DAC960-Utilities.tar.gz" from http://www.dandelion.com/Linux/ may be used.
LILO 21 and FDISK v2.9 include DAC960 support; also included in this archive
are patches to LILO 20 and FDISK v2.8 that add DAC960 support, along with
statically linked executables of LILO and FDISK. This modified version of LILO
will allow booting from a DAC960 controller and/or mounting the root file
system from a DAC960.
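
For reference, the device nodes that "make_rd" creates can also be made by
hand. A minimal sketch for Logical Drive 0 on Controller 0 (block major 48,
minor 0 for the whole drive and minors 1-7 for the partitions) might look
like:

  mkdir -p /dev/rd
  mknod /dev/rd/c0d0 b 48 0       # whole Logical Drive 0
  mknod /dev/rd/c0d0p1 b 48 1     # partition 1
  mknod /dev/rd/c0d0p2 b 48 2     # partition 2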

Red Hat Linux 6.0 and SuSE Linux 6.1 include support for Mylex PCI RAID
controllers. Installing directly onto a DAC960 may be problematic from other
Linux distributions until their installation utilities are updated.


INSTALLATION NOTES

Before installing Linux or adding DAC960 logical drives to an existing Linux
system, the controller must first be configured to provide one or more logical
drives using the BIOS Configuration Utility or DACCF. Please note that since
there are at most 6 usable partitions on each logical drive, systems requiring
more partitions should subdivide a drive group into multiple logical drives,
each of which can have up to 6 usable partitions. Also, note that with large
disk arrays it is advisable to enable the 8GB BIOS Geometry (255/63) rather
than accepting the default 2GB BIOS Geometry (128/32); failing to do so will
cause the logical drive geometry to have more than 65535 cylinders, which will
make it impossible for FDISK to be used properly. The 8GB BIOS Geometry can be
enabled by configuring the DAC960 BIOS, which is accessible via Alt-M during
the BIOS initialization sequence.
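
For reference, and assuming the usual 512 byte sectors, the 65535 cylinder
limit mentioned above is reached at 65535 x 128 x 32 sectors (roughly 128GB)
with the 2GB geometry, whereas the 8GB geometry of 255 heads and 63 sectors
per track raises that limit to 65535 x 255 x 63 sectors (roughly 500GB).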

For maximum performance and the most efficient E2FSCK operation, it is
recommended that EXT2 file systems be built with a 4KB block size and 16 block
stride to match the DAC960 controller's 64KB default stripe size. The command
"mke2fs -b 4096 -R stride=16 <device>" is appropriate. Unless there will be a
large number of small files on the file systems, it is also beneficial to add
the "-i 16384" option to increase the bytes per inode parameter, thereby
reducing the file system metadata. Finally, on systems that will only be run
with Linux 2.2 or later kernels, it is beneficial to enable sparse superblocks
with the "-s 1" option.
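
Putting these recommendations together, a typical file system creation command
for a DAC960 partition might therefore be (substitute the appropriate device
name):

  mke2fs -b 4096 -R stride=16 -i 16384 -s 1 /dev/rd/c0d0p1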


DAC960 ANNOUNCEMENTS MAILING LIST

The DAC960 Announcements Mailing List provides a forum for informing Linux
users of new driver releases and other announcements regarding Linux support
for DAC960 PCI RAID Controllers. To join the mailing list, send a message to
"dac960-announce-request@dandelion.com" with the line "subscribe" in the
message body.
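
For example, on a system with a working mail transport, a command such as the
following can be used to send the subscription request:

  echo "subscribe" | mail dac960-announce-request@dandelion.com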


CONTROLLER CONFIGURATION AND STATUS MONITORING

The DAC960 RAID controllers running firmware 4.06 or above include a Background
Initialization facility so that system downtime is minimized both for initial
installation and subsequent configuration of additional storage. The BIOS
Configuration Utility (accessible via Alt-R during the BIOS initialization
sequence) is used to quickly configure the controller, and then the logical
drives that have been created are available for immediate use even while they
are still being initialized by the controller. The primary need for online
configuration and status monitoring is then to avoid system downtime when disk
drives fail and must be replaced. Mylex's online monitoring and configuration
utilities are being ported to Linux and will become available at some point in
the future. Note that with a SAF-TE (SCSI Accessed Fault-Tolerant Enclosure)
enclosure, the controller is able to rebuild failed drives automatically as
soon as a drive replacement is made available.

The primary interfaces for controller configuration and status monitoring are
special files created in the /proc/rd/... hierarchy along with the normal
system console logging mechanism. Whenever the system is operating, the DAC960
driver queries each controller for status information every 10 seconds, and
checks for additional conditions every 60 seconds. The initial status of each
controller is always available for controller N in /proc/rd/cN/initial_status,
and the current status as of the last status monitoring query is available in
/proc/rd/cN/current_status. In addition, status changes are also logged by the
driver to the system console and will appear in the log files maintained by
syslog. The progress of asynchronous rebuild or consistency check operations
is also available in /proc/rd/cN/current_status, and progress messages are
logged to the system console at most every 60 seconds.

Starting with the 2.2.3/2.0.3 versions of the driver, the status information
available in /proc/rd/cN/initial_status and /proc/rd/cN/current_status has been
augmented to include the vendor, model, revision, and serial number (if
available) for each physical device found connected to the controller:

***** DAC960 RAID Driver Version 2.2.3 of 19 August 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PRL PCI RAID Controller
  Firmware Version: 4.07-0-07, Channels: 1, Memory Size: 16MB
  PCI Bus: 1, Device: 4, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFE300000 mapped at 0xA0800000, IRQ Channel: 21
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  SAF-TE Enclosure Management Enabled
  Physical Devices:
    0:0  Vendor: IBM       Model: DRVS09D      Revision: 0270
         Serial Number: 68016775HA
         Disk Status: Online, 17928192 blocks
    0:1  Vendor: IBM       Model: DRVS09D      Revision: 0270
         Serial Number: 68004E53HA
         Disk Status: Online, 17928192 blocks
    0:2  Vendor: IBM       Model: DRVS09D      Revision: 0270
         Serial Number: 13013935HA
         Disk Status: Online, 17928192 blocks
    0:3  Vendor: IBM       Model: DRVS09D      Revision: 0270
         Serial Number: 13016897HA
         Disk Status: Online, 17928192 blocks
    0:4  Vendor: IBM       Model: DRVS09D      Revision: 0270
         Serial Number: 68019905HA
         Disk Status: Online, 17928192 blocks
    0:5  Vendor: IBM       Model: DRVS09D      Revision: 0270
         Serial Number: 68012753HA
         Disk Status: Online, 17928192 blocks
    0:6  Vendor: ESG-SHV   Model: SCA HSBP M6  Revision: 0.61
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 89640960 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

To simplify the monitoring process for custom software, the special file
/proc/rd/status returns "OK" when all DAC960 controllers in the system are
operating normally and no failures have occurred, or "ALERT" if any logical
drives are offline or critical or any non-standby physical drives are dead.
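
Since /proc/rd/status reduces the overall state to a single word, custom
monitoring can be as simple as polling it periodically from a shell script.
The following is a minimal sketch; the alerting action shown is purely
illustrative:

  #!/bin/sh
  # Poll /proc/rd/status once a minute and raise a syslog alert whenever
  # any DAC960 controller reports a problem.
  while true
  do
    if [ "`cat /proc/rd/status`" != "OK" ]
    then
      logger -p daemon.alert "DAC960: /proc/rd/status reports ALERT"
    fi
    sleep 60
  done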

Configuration commands for controller N are available via the special file
/proc/rd/cN/user_command. A human readable command can be written to this
special file to initiate a configuration operation, and the results of the
operation can then be read back from the special file in addition to being
logged to the system console. The shell command sequence

  echo "<configuration-command>" > /proc/rd/c0/user_command
  cat /proc/rd/c0/user_command

is typically used to execute configuration commands. The configuration
commands are:

flush-cache

  The "flush-cache" command flushes the controller's cache. The system
  automatically flushes the cache at shutdown or if the driver module is
  unloaded, so this command is only needed to be certain a write-back cache is
  flushed to disk before the system is powered off by a command to a UPS. Note
  that the flush-cache command also stops an asynchronous rebuild or
  consistency check, so it should not be used except when the system is being
  halted.

kill <channel>:<target-id>

  The "kill" command marks the physical drive <channel>:<target-id> as DEAD.
  This command is provided primarily for testing, and should not be used
  during normal system operation.

make-online <channel>:<target-id>

  The "make-online" command changes the physical drive <channel>:<target-id>
  from status DEAD to status ONLINE. In cases where multiple physical drives
  have been killed simultaneously, this command may be used to bring all but
  one of them back online, after which a rebuild to the final drive is
  necessary.

  Warning: make-online should only be used on a dead physical drive that is
  an active part of a drive group, never on a standby drive. The command
  should never be used on a dead drive that is part of a critical logical
  drive; rebuild should be used if only a single drive is dead.

make-standby <channel>:<target-id>

  The "make-standby" command changes physical drive <channel>:<target-id>
  from status DEAD to status STANDBY. It should only be used in cases where
  a dead drive was replaced after an automatic rebuild was performed onto a
  standby drive. It cannot be used to add a standby drive to the controller
  configuration if one was not created initially; the BIOS Configuration
  Utility must be used for that currently.

rebuild <channel>:<target-id>

  The "rebuild" command initiates an asynchronous rebuild onto physical drive
  <channel>:<target-id>. It should only be used when a dead drive has been
  replaced.

check-consistency <logical-drive-number>

  The "check-consistency" command initiates an asynchronous consistency check
  of <logical-drive-number> with automatic restoration. It can be used
  whenever it is desired to verify the consistency of the redundancy
  information.

cancel-rebuild
cancel-consistency-check

  The "cancel-rebuild" and "cancel-consistency-check" commands cancel any
  rebuild or consistency check operations previously initiated.
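
As an example of this interface, a consistency check of Logical Drive 0 on
Controller 0 (the drive and controller numbers are illustrative) could be
started and its result read back as follows:

  echo "check-consistency 0" > /proc/rd/c0/user_command
  cat /proc/rd/c0/user_command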


EXAMPLE I - DRIVE FAILURE WITHOUT A STANDBY DRIVE

The following annotated logs demonstrate the controller configuration and
online status monitoring capabilities of the Linux DAC960 Driver. The test
configuration comprises 6 1GB Quantum Atlas I disk drives on two channels of a
DAC960PJ controller. The physical drives are configured into a single drive
group without a standby drive, and the drive group has been configured into two
logical drives, one RAID-5 and one RAID-6. Note that these logs are from an
earlier version of the driver and the messages have changed somewhat with newer
releases, but the functionality remains similar. First, here is the current
status of the RAID configuration:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
  Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
  PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

gwynedd:/u/lnz# cat /proc/rd/status
OK

The above messages indicate that everything is healthy, and /proc/rd/status
returns "OK" indicating that there are no problems with any DAC960 controller
in the system. For demonstration purposes, while I/O is active Physical Drive
1:1 is now disconnected, simulating a drive failure. The failure is noted by
the driver within 10 seconds of the controller's having detected it, and the
driver logs the following console status messages indicating that Logical
Drives 0 and 1 are now CRITICAL as a result of Physical Drive 1:1 being DEAD:

DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:1 killed because of timeout on SCSI command
DAC960#0: Physical Drive 1:1 is now DEAD
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL

The Sense Keys logged here are just Check Condition / Unit Attention conditions
arising from a SCSI bus reset that is forced by the controller during its error
recovery procedures. Concurrently with the above, the driver status available
from /proc/rd also reflects the drive failure. The status message in
/proc/rd/status has changed from "OK" to "ALERT":

gwynedd:/u/lnz# cat /proc/rd/status
ALERT

and /proc/rd/c0/current_status has been updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Dead, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

Since there are no standby drives configured, the system can continue to access
the logical drives in a performance degraded mode until the failed drive is
replaced and a rebuild operation completed to restore the redundancy of the
logical drives. Once Physical Drive 1:1 is replaced with a properly
functioning drive, or if the physical drive was killed without having failed
(e.g., due to electrical problems on the SCSI bus), the user can instruct the
controller to initiate a rebuild operation onto the newly replaced drive:

gwynedd:/u/lnz# echo "rebuild 1:1" > /proc/rd/c0/user_command
gwynedd:/u/lnz# cat /proc/rd/c0/user_command
Rebuild of Physical Drive 1:1 Initiated

The echo command instructs the controller to initiate an asynchronous rebuild
operation onto Physical Drive 1:1, and the status message that results from the
operation is then available for reading from /proc/rd/c0/user_command, as well
as being logged to the console by the driver.

Within 10 seconds of this command the driver logs the initiation of the
asynchronous rebuild operation:

DAC960#0: Rebuild of Physical Drive 1:1 Initiated
DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01
DAC960#0: Physical Drive 1:1 is now WRITE-ONLY
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 1% completed

and /proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Write-Only, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 6% completed

As the rebuild progresses, the current status in /proc/rd/c0/current_status is
updated every 10 seconds:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Write-Only, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 3305472 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 15% completed

and every minute a progress message is logged to the console by the driver:

DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 32% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 63% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 94% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 94% completed

Finally, the rebuild completes successfully. The driver logs the status of the
logical and physical drives and the rebuild completion:

DAC960#0: Rebuild Completed Successfully
DAC960#0: Physical Drive 1:1 is now ONLINE
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE

/proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 5498880 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 3305472 blocks, Write Thru
  Rebuild Completed Successfully

and /proc/rd/status indicates that everything is healthy once again:

gwynedd:/u/lnz# cat /proc/rd/status
OK


EXAMPLE II - DRIVE FAILURE WITH A STANDBY DRIVE

The following annotated logs demonstrate the controller configuration and
online status monitoring capabilities of the Linux DAC960 Driver. The test
configuration comprises 6 1GB Quantum Atlas I disk drives on two channels of a
DAC960PJ controller. The physical drives are configured into a single drive
group with a standby drive, and the drive group has been configured into two
logical drives, one RAID-5 and one RAID-6. Note that these logs are from an
earlier version of the driver and the messages have changed somewhat with newer
releases, but the functionality remains similar. First, here is the current
status of the RAID configuration:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
  Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
  PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Online, 2201600 blocks
    1:3 - Disk: Standby, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
  No Rebuild or Consistency Check in Progress

gwynedd:/u/lnz# cat /proc/rd/status
OK

The above messages indicate that everything is healthy, and /proc/rd/status
returns "OK" indicating that there are no problems with any DAC960 controller
in the system. For demonstration purposes, while I/O is active Physical Drive
1:2 is now disconnected, simulating a drive failure. The failure is noted by
the driver within 10 seconds of the controller's having detected it, and the
driver logs the following console status messages:

DAC960#0: Physical Drive 1:1 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:3 Error Log: Sense Key = 6, ASC = 29, ASCQ = 02
DAC960#0: Physical Drive 1:2 killed because of timeout on SCSI command
DAC960#0: Physical Drive 1:2 is now DEAD
DAC960#0: Physical Drive 1:2 killed because it was removed
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now CRITICAL
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now CRITICAL

Since a standby drive is configured, the controller automatically begins
rebuilding onto the standby drive:

DAC960#0: Physical Drive 1:3 is now WRITE-ONLY
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed

Concurrently with the above, the driver status available from /proc/rd also
reflects the drive failure and automatic rebuild. The status message in
/proc/rd/status has changed from "OK" to "ALERT":

gwynedd:/u/lnz# cat /proc/rd/status
ALERT

and /proc/rd/c0/current_status has been updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Dead, 2201600 blocks
    1:3 - Disk: Write-Only, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 4% completed

As the rebuild progresses, the current status in /proc/rd/c0/current_status is
updated every 10 seconds:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Dead, 2201600 blocks
    1:3 - Disk: Write-Only, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Critical, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Critical, 2754560 blocks, Write Thru
  Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed

and every minute a progress message is logged to the console by the driver:

DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 40% completed
DAC960#0: Rebuild in Progress: Logical Drive 0 (/dev/rd/c0d0) 76% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 66% completed
DAC960#0: Rebuild in Progress: Logical Drive 1 (/dev/rd/c0d1) 84% completed

Finally, the rebuild completes successfully. The driver logs the status of the
logical and physical drives and the rebuild completion:

DAC960#0: Rebuild Completed Successfully
DAC960#0: Physical Drive 1:3 is now ONLINE
DAC960#0: Logical Drive 0 (/dev/rd/c0d0) is now ONLINE
DAC960#0: Logical Drive 1 (/dev/rd/c0d1) is now ONLINE

/proc/rd/c0/current_status is updated:

***** DAC960 RAID Driver Version 2.0.0 of 23 March 1999 *****
Copyright 1998-1999 by Leonard N. Zubkoff <lnz@dandelion.com>
Configuring Mylex DAC960PJ PCI RAID Controller
  Firmware Version: 4.06-0-08, Channels: 3, Memory Size: 8MB
  PCI Bus: 0, Device: 19, Function: 1, I/O Address: Unassigned
  PCI Address: 0xFD4FC000 mapped at 0x8807000, IRQ Channel: 9
  Controller Queue Depth: 128, Maximum Blocks per Command: 128
  Driver Queue Depth: 127, Maximum Scatter/Gather Segments: 33
  Stripe Size: 64KB, Segment Size: 8KB, BIOS Geometry: 255/63
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Dead, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
  Rebuild Completed Successfully

and /proc/rd/status indicates that everything is healthy once again:

gwynedd:/u/lnz# cat /proc/rd/status
OK

Note that the absence of a viable standby drive does not create an "ALERT"
status. Once dead Physical Drive 1:2 has been replaced, the controller must be
told that this has occurred and that the newly replaced drive should become the
new standby drive:

gwynedd:/u/lnz# echo "make-standby 1:2" > /proc/rd/c0/user_command
gwynedd:/u/lnz# cat /proc/rd/c0/user_command
Make Standby of Physical Drive 1:2 Succeeded

The echo command instructs the controller to make Physical Drive 1:2 into a
standby drive, and the status message that results from the operation is then
available for reading from /proc/rd/c0/user_command, as well as being logged to
the console by the driver. Within 60 seconds of this command the driver logs:

DAC960#0: Physical Drive 1:2 Error Log: Sense Key = 6, ASC = 29, ASCQ = 01
DAC960#0: Physical Drive 1:2 is now STANDBY
DAC960#0: Make Standby of Physical Drive 1:2 Succeeded

and /proc/rd/c0/current_status is updated:

gwynedd:/u/lnz# cat /proc/rd/c0/current_status
...
  Physical Devices:
    0:1 - Disk: Online, 2201600 blocks
    0:2 - Disk: Online, 2201600 blocks
    0:3 - Disk: Online, 2201600 blocks
    1:1 - Disk: Online, 2201600 blocks
    1:2 - Disk: Standby, 2201600 blocks
    1:3 - Disk: Online, 2201600 blocks
  Logical Drives:
    /dev/rd/c0d0: RAID-5, Online, 4399104 blocks, Write Thru
    /dev/rd/c0d1: RAID-6, Online, 2754560 blocks, Write Thru
  Rebuild Completed Successfully