Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1 | <?xml version="1.0" encoding="UTF-8"?> |
| 2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" |
| 3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> |
| 4 | |
| 5 | <book id="libataDevGuide"> |
| 6 | <bookinfo> |
| 7 | <title>libATA Developer's Guide</title> |
| 8 | |
| 9 | <authorgroup> |
| 10 | <author> |
| 11 | <firstname>Jeff</firstname> |
| 12 | <surname>Garzik</surname> |
| 13 | </author> |
| 14 | </authorgroup> |
| 15 | |
| 16 | <copyright> |
Jeff Garzik | 780a87f | 2005-05-30 15:41:05 -0400 | [diff] [blame] | 17 | <year>2003-2005</year> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 18 | <holder>Jeff Garzik</holder> |
| 19 | </copyright> |
| 20 | |
| 21 | <legalnotice> |
| 22 | <para> |
| 23 | The contents of this file are subject to the Open |
| 24 | Software License version 1.1 that can be found at |
| 25 | <ulink url="http://www.opensource.org/licenses/osl-1.1.txt">http://www.opensource.org/licenses/osl-1.1.txt</ulink> and is included herein |
| 26 | by reference. |
| 27 | </para> |
| 28 | |
| 29 | <para> |
| 30 | Alternatively, the contents of this file may be used under the terms |
| 31 | of the GNU General Public License version 2 (the "GPL") as distributed |
| 32 | in the kernel source COPYING file, in which case the provisions of |
| 33 | the GPL are applicable instead of the above. If you wish to allow |
| 34 | the use of your version of this file only under the terms of the |
| 35 | GPL and not to allow others to use your version of this file under |
| 36 | the OSL, indicate your decision by deleting the provisions above and |
| 37 | replace them with the notice and other provisions required by the GPL. |
| 38 | If you do not delete the provisions above, a recipient may use your |
| 39 | version of this file under either the OSL or the GPL. |
| 40 | </para> |
| 41 | |
| 42 | </legalnotice> |
| 43 | </bookinfo> |
| 44 | |
| 45 | <toc></toc> |
| 46 | |
Jeff Garzik | 07dd39b | 2005-05-30 13:15:52 -0400 | [diff] [blame] | 47 | <chapter id="libataIntroduction"> |
| 48 | <title>Introduction</title> |
| 49 | <para> |
| 50 | libATA is a library used inside the Linux kernel to support ATA host |
| 51 | controllers and devices. libATA provides an ATA driver API, class |
| 52 | transports for ATA and ATAPI devices, and SCSI<->ATA translation |
| 53 | for ATA devices according to the T10 SAT specification. |
| 54 | </para> |
| 55 | <para> |
| 56 | This Guide documents the libATA driver API, library functions, library |
| 57 | internals, and a couple of sample ATA low-level drivers. |
| 58 | </para> |
| 59 | </chapter> |
| 60 | |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 61 | <chapter id="libataDriverApi"> |
| 62 | <title>libata Driver API</title> |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 63 | <para> |
| 64 | struct ata_port_operations is defined for every low-level libata |
| 65 | hardware driver, and it controls how the low-level driver |
| 66 | interfaces with the ATA and SCSI layers. |
| 67 | </para> |
| 68 | <para> |
| 69 | FIS-based drivers will hook into the system with ->qc_prep() and |
| 70 | ->qc_issue() high-level hooks. Hardware which behaves in a manner |
| 71 | similar to PCI IDE hardware may utilize several generic helpers, |
| 72 | defining at a bare minimum the bus I/O addresses of the ATA shadow |
| 73 | register blocks. |
| 74 | </para> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 75 | <sect1> |
| 76 | <title>struct ata_port_operations</title> |
| 77 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 78 | <sect2><title>Disable ATA port</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 79 | <programlisting> |
| 80 | void (*port_disable) (struct ata_port *); |
| 81 | </programlisting> |
| 82 | |
| 83 | <para> |
| 84 | Called from ata_bus_probe() and ata_bus_reset() error paths, |
| 85 | as well as when unregistering from the SCSI module (rmmod, hot |
| 86 | unplug). |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 87 | This function should do whatever needs to be done to take the |
| 88 | port out of use. In most cases, ata_port_disable() can be used |
| 89 | as this hook. |
| 90 | </para> |
| 91 | <para> |
| 92 | Called from ata_bus_probe() on a failed probe. |
| 93 | Called from ata_bus_reset() on a failed bus reset. |
| 94 | Called from ata_scsi_release(). |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 95 | </para> |
| 96 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 97 | </sect2> |
| 98 | |
| 99 | <sect2><title>Post-IDENTIFY device configuration</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 100 | <programlisting> |
| 101 | void (*dev_config) (struct ata_port *, struct ata_device *); |
| 102 | </programlisting> |
| 103 | |
| 104 | <para> |
| 105 | Called after IDENTIFY [PACKET] DEVICE is issued to each device |
| 106 | found. Typically used to apply device-specific fixups prior to |
| 107 | issue of SET FEATURES - XFER MODE, and prior to operation. |
| 108 | </para> |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 109 | <para> |
| 110 | Called by ata_device_add() after ata_dev_identify() determines |
| 111 | a device is present. |
| 112 | </para> |
| 113 | <para> |
| 114 | This entry may be specified as NULL in ata_port_operations. |
| 115 | </para> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 116 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 117 | </sect2> |
| 118 | |
| 119 | <sect2><title>Set PIO/DMA mode</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 120 | <programlisting> |
| 121 | void (*set_piomode) (struct ata_port *, struct ata_device *); |
| 122 | void (*set_dmamode) (struct ata_port *, struct ata_device *); |
| 123 | void (*post_set_mode) (struct ata_port *ap); |
| 124 | </programlisting> |
| 125 | |
| 126 | <para> |
| 127 | Hooks called prior to the issue of SET FEATURES - XFER MODE |
| 128 | command. dev->pio_mode is guaranteed to be valid when |
| 129 | ->set_piomode() is called, and dev->dma_mode is guaranteed to be |
| 130 | valid when ->set_dmamode() is called. ->post_set_mode() is |
| 131 | called unconditionally, after the SET FEATURES - XFER MODE |
| 132 | command completes successfully. |
| 133 | </para> |
| 134 | |
| 135 | <para> |
| 136 | ->set_piomode() is always called (if present), but |
| 137 | ->set_dmamode() is only called if DMA is possible. |
| 138 | </para> |
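<para>
The calling rules above can be sketched in plain user-space C. This is only
an illustration under stated assumptions: the mock_* types, the demo_* hooks,
the trace string, and mock_set_xfermode() are hypothetical stand-ins, not
libata code. The NULL check on post_set_mode is also an assumption of the
sketch.
</para>
<programlisting>
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct ata_port and struct ata_device. */
struct mock_port { char trace[16]; };
struct mock_dev  { int pio_mode; int dma_mode; int dma_possible; };

struct mock_ops {
    void (*set_piomode)(struct mock_port *, struct mock_dev *);
    void (*set_dmamode)(struct mock_port *, struct mock_dev *);
    void (*post_set_mode)(struct mock_port *);
};

/* Append one character to the port's call trace. */
static void trace(struct mock_port *ap, char c)
{
    size_t n = strlen(ap->trace);

    if (n + 1 < sizeof(ap->trace)) {
        ap->trace[n] = c;
        ap->trace[n + 1] = '\0';
    }
}

/* Demo hooks: each one only records that it was called. */
static void demo_set_piomode(struct mock_port *ap, struct mock_dev *dev)
{
    (void)dev;
    trace(ap, 'P');
}

static void demo_set_dmamode(struct mock_port *ap, struct mock_dev *dev)
{
    (void)dev;
    trace(ap, 'D');
}

static void demo_post_set_mode(struct mock_port *ap)
{
    trace(ap, 'S');
}

/* Stub for SET FEATURES - XFER MODE; returns 0 on success. */
static int mock_set_xfermode(struct mock_port *ap, struct mock_dev *dev)
{
    (void)dev;
    trace(ap, 'X');
    return 0;
}

/* Mode-setting sequence as described in the text. */
static int mock_set_mode(struct mock_port *ap, struct mock_dev *dev,
                         const struct mock_ops *ops)
{
    assert(ap && dev && ops);

    if (ops->set_piomode)                      /* always, if present */
        ops->set_piomode(ap, dev);
    if (dev->dma_possible && ops->set_dmamode) /* only if DMA is possible */
        ops->set_dmamode(ap, dev);
    if (mock_set_xfermode(ap, dev) != 0)
        return -1;
    if (ops->post_set_mode)                    /* after the command succeeds */
        ops->post_set_mode(ap);
    return 0;
}
```
</programlisting>
<para>
With all three demo hooks installed, a DMA-capable device produces the trace
P, D, X, S, while a PIO-only device skips the D step.
</para>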
| 139 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 140 | </sect2> |
| 141 | |
| 142 | <sect2><title>Taskfile read/write</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 143 | <programlisting> |
| 144 | void (*tf_load) (struct ata_port *ap, struct ata_taskfile *tf); |
| 145 | void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf); |
| 146 | </programlisting> |
| 147 | |
| 148 | <para> |
| 149 | ->tf_load() is called to load the given taskfile into hardware |
| 150 | registers / DMA buffers. ->tf_read() is called to read the |
| 151 | hardware registers / DMA buffers, to obtain the current set of |
| 152 | taskfile register values. |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 153 | Most drivers for taskfile-based hardware (PIO or MMIO) use |
| 154 | ata_tf_load() and ata_tf_read() for these hooks. |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 155 | </para> |
| 156 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 157 | </sect2> |
| 158 | |
| 159 | <sect2><title>ATA command execute</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 160 | <programlisting> |
| 161 | void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf); |
| 162 | </programlisting> |
| 163 | |
| 164 | <para> |
| 165 | Causes an ATA command, previously loaded with |
| 166 | ->tf_load(), to be initiated in hardware. |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 167 | Most drivers for taskfile-based hardware use ata_exec_command() |
| 168 | for this hook. |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 169 | </para> |
| 170 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 171 | </sect2> |
| 172 | |
| 173 | <sect2><title>Per-cmd ATAPI DMA capabilities filter</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 174 | <programlisting> |
Jeff Garzik | 780a87f | 2005-05-30 15:41:05 -0400 | [diff] [blame] | 175 | int (*check_atapi_dma) (struct ata_queued_cmd *qc); |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 176 | </programlisting> |
| 177 | |
| 178 | <para> |
Jeff Garzik | 780a87f | 2005-05-30 15:41:05 -0400 | [diff] [blame] | 179 | Allow low-level driver to filter ATA PACKET commands, returning a status |
| 180 | indicating whether or not it is OK to use DMA for the supplied PACKET |
| 181 | command. |
| 182 | </para> |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 183 | <para> |
| 184 | This hook may be specified as NULL, in which case libata will |
| 185 | assume that ATAPI DMA can be supported. |
| 186 | </para> |
Jeff Garzik | 780a87f | 2005-05-30 15:41:05 -0400 | [diff] [blame] | 187 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 188 | </sect2> |
| 189 | |
| 190 | <sect2><title>Read specific ATA shadow registers</title> |
Jeff Garzik | 780a87f | 2005-05-30 15:41:05 -0400 | [diff] [blame] | 191 | <programlisting> |
| 192 | u8 (*check_status)(struct ata_port *ap); |
| 193 | u8 (*check_altstatus)(struct ata_port *ap); |
| 194 | u8 (*check_err)(struct ata_port *ap); |
| 195 | </programlisting> |
| 196 | |
| 197 | <para> |
| 198 | Reads the Status/AltStatus/Error ATA shadow register from |
| 199 | hardware. On some hardware, reading the Status register has |
| 200 | the side effect of clearing the interrupt condition. |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 201 | Most drivers for taskfile-based hardware use |
| 202 | ata_check_status() for this hook. |
| 203 | </para> |
| 204 | <para> |
| 205 | Note that because this is called from ata_device_add(), at |
| 206 | least a dummy function that clears device interrupts must be |
| 207 | provided for all drivers, even if the controller doesn't |
| 208 | actually have a taskfile status register. |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 209 | </para> |
| 210 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 211 | </sect2> |
| 212 | |
| 213 | <sect2><title>Select ATA device on bus</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 214 | <programlisting> |
| 215 | void (*dev_select)(struct ata_port *ap, unsigned int device); |
| 216 | </programlisting> |
| 217 | |
| 218 | <para> |
| 219 | Issues the low-level hardware command(s) that causes one of N |
| 220 | hardware devices to be considered 'selected' (active and |
Jeff Garzik | 780a87f | 2005-05-30 15:41:05 -0400 | [diff] [blame] | 221 | available for use) on the ATA bus. This generally has no |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 222 | meaning on FIS-based devices. |
| 223 | </para> |
| 224 | <para> |
| 225 | Most drivers for taskfile-based hardware use |
| 226 | ata_std_dev_select() for this hook. Controllers which do not |
| 227 | support second drives on a port (such as SATA controllers) will |
| 228 | use ata_noop_dev_select(). |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 229 | </para> |
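<para>
As a rough illustration of what a taskfile-style device select does, the
sketch below models the ATA Device register in user space. The mock_* names
are hypothetical. The bit values follow the ATA Device register layout as
recalled here (DEV in bit 4, obsolete bits 7 and 5 conventionally set), so
treat them as assumptions and check them against ata.h.
</para>
<programlisting>
```c
#include <assert.h>
#include <stdint.h>

/* Assumed Device register bits; verify against the real headers. */
#define MOCK_DEVICE_OBS ((1u << 7) | (1u << 5)) /* obsolete, kept set */
#define MOCK_DEV1       (1u << 4)               /* DEV bit: device 1 */

/* Hypothetical port: one shadow Device register plus a write counter. */
struct mock_port {
    uint8_t device_reg;
    unsigned int writes;
};

/* Taskfile-style select: write the Device register for device 0 or 1. */
static void mock_std_dev_select(struct mock_port *ap, unsigned int device)
{
    uint8_t tmp = (uint8_t)MOCK_DEVICE_OBS;

    assert(ap);
    if (device != 0)
        tmp = (uint8_t)(tmp | MOCK_DEV1);
    ap->device_reg = tmp;
    ap->writes++;
}

/* One device per port: nothing to select, so touch nothing. */
static void mock_noop_dev_select(struct mock_port *ap, unsigned int device)
{
    (void)ap;
    (void)device;
}
```
</programlisting>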
| 230 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 231 | </sect2> |
| 232 | |
| 233 | <sect2><title>Reset ATA bus</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 234 | <programlisting> |
| 235 | void (*phy_reset) (struct ata_port *ap); |
| 236 | </programlisting> |
| 237 | |
| 238 | <para> |
| 239 | The very first step in the probe phase. Actions typically vary |
| 240 | depending on the bus type. After waking up the device and probing |
| 241 | for device presence (PATA and SATA), typically a soft reset |
| 242 | (SRST) will be performed. Drivers typically use the helper |
| 243 | functions ata_bus_reset() or sata_phy_reset() for this hook. |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 244 | Many SATA drivers use sata_phy_reset() or call it from within |
| 245 | their own phy_reset() functions. |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 246 | </para> |
| 247 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 248 | </sect2> |
| 249 | |
| 250 | <sect2><title>Control PCI IDE BMDMA engine</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 251 | <programlisting> |
| 252 | void (*bmdma_setup) (struct ata_queued_cmd *qc); |
| 253 | void (*bmdma_start) (struct ata_queued_cmd *qc); |
Jeff Garzik | 780a87f | 2005-05-30 15:41:05 -0400 | [diff] [blame] | 254 | void (*bmdma_stop) (struct ata_port *ap); |
| 255 | u8 (*bmdma_status) (struct ata_port *ap); |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 256 | </programlisting> |
| 257 | |
| 258 | <para> |
Jeff Garzik | 780a87f | 2005-05-30 15:41:05 -0400 | [diff] [blame] | 259 | When setting up an IDE BMDMA transaction, these hooks arm |
| 260 | (->bmdma_setup), fire (->bmdma_start), and halt (->bmdma_stop) |
| 261 | the hardware's DMA engine. ->bmdma_status is used to read the standard |
| 262 | PCI IDE DMA Status register. |
| 263 | </para> |
| 264 | |
| 265 | <para> |
| 266 | These hooks are typically either no-ops, or simply not implemented, in |
| 267 | FIS-based drivers. |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 268 | </para> |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 269 | <para> |
| 270 | Most legacy IDE drivers use ata_bmdma_setup() for the bmdma_setup() |
| 271 | hook. ata_bmdma_setup() will write the pointer to the PRD table to |
| 272 | the IDE PRD Table Address register, enable DMA in the DMA Command |
| 273 | register, and call exec_command() to begin the transfer. |
| 274 | </para> |
| 275 | <para> |
| 276 | Most legacy IDE drivers use ata_bmdma_start() for the bmdma_start() |
| 277 | hook. ata_bmdma_start() will write the ATA_DMA_START flag to the DMA |
| 278 | Command register. |
| 279 | </para> |
| 280 | <para> |
| 281 | Many legacy IDE drivers use ata_bmdma_stop() for the bmdma_stop() |
| 282 | hook. ata_bmdma_stop() clears the ATA_DMA_START flag in the DMA |
| 283 | command register. |
| 284 | </para> |
| 285 | <para> |
| 286 | Many legacy IDE drivers use ata_bmdma_status() as the bmdma_status() hook. |
| 287 | </para> |
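<para>
The arm, fire, and halt sequence can be pictured with a small register model.
This is a user-space sketch, not driver code: struct mock_bmdma and the
mock_* functions are hypothetical, MOCK_DMA_START merely stands in for
ATA_DMA_START, and the direction and enable programming done by the real
setup helper is left out.
</para>
<programlisting>
```c
#include <assert.h>
#include <stdint.h>

#define MOCK_DMA_START (1u << 0) /* stand-in for ATA_DMA_START */

/* Hypothetical model of the BMDMA registers touched by the hooks. */
struct mock_bmdma {
    uint8_t cmd;             /* DMA Command register */
    uint32_t prd_addr;       /* PRD Table Address register */
    unsigned int exec_calls; /* how often exec_command() would have run */
};

/* Arm: point the engine at the PRD table, then issue the ATA command. */
static void mock_bmdma_setup(struct mock_bmdma *bm, uint32_t prd_table)
{
    assert(bm);
    bm->prd_addr = prd_table;
    bm->exec_calls++; /* stands in for the exec_command() call */
}

/* Fire: set the start bit, leaving the other command bits alone. */
static void mock_bmdma_start(struct mock_bmdma *bm)
{
    bm->cmd = (uint8_t)(bm->cmd | MOCK_DMA_START);
}

/* Halt: clear the start bit, leaving the other command bits alone. */
static void mock_bmdma_stop(struct mock_bmdma *bm)
{
    bm->cmd = (uint8_t)(bm->cmd & ~MOCK_DMA_START);
}
```
</programlisting>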
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 288 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 289 | </sect2> |
| 290 | |
| 291 | <sect2><title>High-level taskfile hooks</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 292 | <programlisting> |
| 293 | void (*qc_prep) (struct ata_queued_cmd *qc); |
| 294 | int (*qc_issue) (struct ata_queued_cmd *qc); |
| 295 | </programlisting> |
| 296 | |
| 297 | <para> |
| 298 | Higher-level hooks, these two hooks can potentially supersede |
| 299 | several of the above taskfile/DMA engine hooks. ->qc_prep is |
| 300 | called after the buffers have been DMA-mapped, and is typically |
| 301 | used to populate the hardware's DMA scatter-gather table. |
| 302 | Most drivers use the standard ata_qc_prep() helper function, but |
| 303 | more advanced drivers roll their own. |
| 304 | </para> |
| 305 | <para> |
| 306 | ->qc_issue is used to make a command active, once the hardware |
| 307 | and S/G tables have been prepared. IDE BMDMA drivers use the |
| 308 | helper function ata_qc_issue_prot() for taskfile protocol-based |
Jeff Garzik | 780a87f | 2005-05-30 15:41:05 -0400 | [diff] [blame] | 309 | dispatch. More advanced drivers implement their own ->qc_issue. |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 310 | </para> |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 311 | <para> |
| 312 | ata_qc_issue_prot() calls ->tf_load(), ->bmdma_setup(), and |
| 313 | ->bmdma_start() as necessary to initiate a transfer. |
| 314 | </para> |
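<para>
A protocol-based dispatch of this kind might look roughly like the following
user-space sketch. The hook signatures are simplified, and the mock_* and
demo_* names are hypothetical. Only two protocols are modelled, and which
hooks each protocol needs is an assumption drawn from the hook descriptions
in this section rather than from the ata_qc_issue_prot() source.
</para>
<programlisting>
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum mock_prot { MOCK_PROT_NODATA, MOCK_PROT_DMA };

struct mock_qc;

/* Simplified hook table; the real hooks take a port and a taskfile. */
struct mock_ops {
    void (*tf_load)(struct mock_qc *qc);
    void (*exec_command)(struct mock_qc *qc);
    void (*bmdma_setup)(struct mock_qc *qc);
    void (*bmdma_start)(struct mock_qc *qc);
};

struct mock_qc {
    enum mock_prot protocol;
    const struct mock_ops *ops;
    char trace[8]; /* records the order in which hooks ran */
};

static void trace(struct mock_qc *qc, char c)
{
    size_t n = strlen(qc->trace);

    if (n + 1 < sizeof(qc->trace)) {
        qc->trace[n] = c;
        qc->trace[n + 1] = '\0';
    }
}

static void demo_tf_load(struct mock_qc *qc)      { trace(qc, 'L'); }
static void demo_exec_command(struct mock_qc *qc) { trace(qc, 'X'); }
static void demo_bmdma_start(struct mock_qc *qc)  { trace(qc, 'G'); }

/* As described for ata_bmdma_setup(): arm the engine, then exec_command(). */
static void demo_bmdma_setup(struct mock_qc *qc)
{
    trace(qc, 'S');
    qc->ops->exec_command(qc);
}

static const struct mock_ops demo_ops = {
    demo_tf_load, demo_exec_command, demo_bmdma_setup, demo_bmdma_start
};

/* Dispatch on the taskfile protocol; returns 0 on success, -1 otherwise. */
static int mock_qc_issue_prot(struct mock_qc *qc)
{
    assert(qc && qc->ops);

    switch (qc->protocol) {
    case MOCK_PROT_NODATA:
        qc->ops->tf_load(qc);
        qc->ops->exec_command(qc);
        return 0;
    case MOCK_PROT_DMA:
        qc->ops->tf_load(qc);
        qc->ops->bmdma_setup(qc);
        qc->ops->bmdma_start(qc);
        return 0;
    }
    return -1; /* protocol not modelled in this sketch */
}
```
</programlisting>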
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 315 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 316 | </sect2> |
| 317 | |
| 318 | <sect2><title>Timeout (error) handling</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 319 | <programlisting> |
| 320 | void (*eng_timeout) (struct ata_port *ap); |
| 321 | </programlisting> |
| 322 | |
| 323 | <para> |
Jeff Garzik | 780a87f | 2005-05-30 15:41:05 -0400 | [diff] [blame] | 324 | This is a high level error handling function, called from the |
| 325 | error handling thread, when a command times out. Most newer |
| 326 | hardware will implement its own error handling code here. IDE BMDMA |
| 327 | drivers may use the helper function ata_eng_timeout(). |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 328 | </para> |
| 329 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 330 | </sect2> |
| 331 | |
| 332 | <sect2><title>Hardware interrupt handling</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 333 | <programlisting> |
| 334 | irqreturn_t (*irq_handler)(int, void *, struct pt_regs *); |
| 335 | void (*irq_clear) (struct ata_port *); |
| 336 | </programlisting> |
| 337 | |
| 338 | <para> |
| 339 | ->irq_handler is the interrupt handling routine registered with |
| 340 | the system, by libata. ->irq_clear is called during probe just |
| 341 | before the interrupt handler is registered, to be sure hardware |
| 342 | is quiet. |
| 343 | </para> |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 344 | <para> |
| 345 | The second argument, dev_instance, should be cast to a pointer |
| 346 | to struct ata_host_set. |
| 347 | </para> |
| 348 | <para> |
| 349 | Most legacy IDE drivers use ata_interrupt() for the |
| 350 | irq_handler hook, which scans all ports in the host_set, |
| 351 | determines which queued command was active (if any), and calls |
| 352 | ata_host_intr(ap,qc). |
| 353 | </para> |
| 354 | <para> |
| 355 | Most legacy IDE drivers use ata_bmdma_irq_clear() for the |
| 356 | irq_clear() hook, which simply clears the interrupt and error |
| 357 | flags in the DMA status register. |
| 358 | </para> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 359 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 360 | </sect2> |
| 361 | |
| 362 | <sect2><title>SATA phy read/write</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 363 | <programlisting> |
| 364 | u32 (*scr_read) (struct ata_port *ap, unsigned int sc_reg); |
| 365 | void (*scr_write) (struct ata_port *ap, unsigned int sc_reg, |
| 366 | u32 val); |
| 367 | </programlisting> |
| 368 | |
| 369 | <para> |
| 370 | Read and write standard SATA phy registers. Currently only used |
| 371 | if ->phy_reset hook called the sata_phy_reset() helper function. |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 372 | sc_reg is one of SCR_STATUS, SCR_CONTROL, SCR_ERROR, or SCR_ACTIVE. |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 373 | </para> |
| 374 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 375 | </sect2> |
| 376 | |
| 377 | <sect2><title>Init and shutdown</title> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 378 | <programlisting> |
| 379 | int (*port_start) (struct ata_port *ap); |
| 380 | void (*port_stop) (struct ata_port *ap); |
| 381 | void (*host_stop) (struct ata_host_set *host_set); |
| 382 | </programlisting> |
| 383 | |
| 384 | <para> |
| 385 | ->port_start() is called just after the data structures for each |
| 386 | port are initialized. Typically this is used to alloc per-port |
| 387 | DMA buffers / tables / rings, enable DMA engines, and similar |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 388 | tasks. Some drivers also use this entry point as a chance to |
| 389 | allocate driver-private memory for ap->private_data. |
| 390 | </para> |
| 391 | <para> |
| 392 | Many drivers use ata_port_start() as this hook or call |
| 393 | it from their own port_start() hooks. ata_port_start() |
| 394 | allocates space for a legacy IDE PRD table and returns. |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 395 | </para> |
| 396 | <para> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 397 | ->port_stop() is called after ->host_stop(). Its sole function |
| 398 | is to release DMA/memory resources, now that they are no longer |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 399 | actively being used. Many drivers also free driver-private |
| 400 | data from port at this time. |
| 401 | </para> |
| 402 | <para> |
| 403 | Many drivers use ata_port_stop() as this hook, which frees the |
| 404 | PRD table. |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 405 | </para> |
Jeff Garzik | 780a87f | 2005-05-30 15:41:05 -0400 | [diff] [blame] | 406 | <para> |
| 407 | ->host_stop() is called after all ->port_stop() calls |
| 408 | have completed. The hook must finalize hardware shutdown, release DMA |
| 409 | and other resources, etc. |
Edward Falk | 8b2af8f | 2005-06-15 14:26:39 -0700 | [diff] [blame] | 410 | This hook may be specified as NULL, in which case it is not called. |
Jeff Garzik | 780a87f | 2005-05-30 15:41:05 -0400 | [diff] [blame] | 411 | </para> |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 412 | |
Jeff Garzik | 92bab26 | 2005-05-31 20:43:57 -0400 | [diff] [blame] | 413 | </sect2> |
| 414 | |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 415 | </sect1> |
Tejun Heo | bfd0072 | 2005-09-26 11:28:47 +0900 | [diff] [blame^] | 416 | <sect1> |
| 417 | <title>Error handling</title> |
| 418 | |
| 419 | <para> |
| 420 | This section describes how errors are handled under libata. |
| 421 | Readers are advised to read SCSI EH |
| 422 | (Documentation/scsi/scsi_eh.txt) and ATA exceptions doc first. |
| 423 | </para> |
| 424 | |
| 425 | <sect2><title>Origins of commands</title> |
| 426 | <para> |
| 427 | In libata, a command is represented with struct ata_queued_cmd |
| 428 | or qc. qc's are preallocated during port initialization and |
| 429 | reused for each command execution. Currently only one |
| 430 | qc is allocated per port, but the yet-to-be-merged NCQ branch |
| 431 | allocates one for each tag and maps each qc to an NCQ tag 1-to-1. |
| 432 | </para> |
| 433 | <para> |
| 434 | libata commands can originate from two sources: libata itself |
| 435 | and the SCSI midlayer. libata internal commands are used for |
| 436 | initialization and error handling. All normal blk requests |
| 437 | and commands for SCSI emulation are passed as SCSI commands |
| 438 | through the queuecommand callback of the SCSI host template. |
| 439 | </para> |
| 440 | </sect2> |
| 441 | |
| 442 | <sect2><title>How commands are issued</title> |
| 443 | |
| 444 | <variablelist> |
| 445 | |
| 446 | <varlistentry><term>Internal commands</term> |
| 447 | <listitem> |
| 448 | <para> |
| 449 | First, qc is allocated and initialized using |
| 450 | ata_qc_new_init(). Although ata_qc_new_init() doesn't |
| 451 | implement any wait or retry mechanism when qc is not |
| 452 | available, internal commands are currently issued only during |
| 453 | initialization and error recovery, so no other command is |
| 454 | active and allocation is guaranteed to succeed. |
| 455 | </para> |
| 456 | <para> |
| 457 | Once allocated, the qc's taskfile is initialized for the command to |
| 458 | be executed. qc currently has two mechanisms to notify |
| 459 | completion. One is via the qc->complete_fn() callback and the |
| 460 | other is the completion qc->waiting. The qc->complete_fn() callback |
| 461 | is the asynchronous path used by normal SCSI translated |
| 462 | commands and qc->waiting is the synchronous (issuer sleeps in |
| 463 | process context) path used by internal commands. |
| 464 | </para> |
| 465 | <para> |
| 466 | Once initialization is complete, host_set lock is acquired |
| 467 | and the qc is issued. |
| 468 | </para> |
| 469 | </listitem> |
| 470 | </varlistentry> |
| 471 | |
| 472 | <varlistentry><term>SCSI commands</term> |
| 473 | <listitem> |
| 474 | <para> |
| 475 | All libata drivers use ata_scsi_queuecmd() as |
| 476 | hostt->queuecommand callback. scmds can either be simulated |
| 477 | or translated. No qc is involved in processing a simulated |
| 478 | scmd. The result is computed right away and the scmd is |
| 479 | completed. |
| 480 | </para> |
| 481 | <para> |
| 482 | For a translated scmd, ata_qc_new_init() is invoked to |
| 483 | allocate a qc and the scmd is translated into the qc. SCSI |
| 484 | midlayer's completion notification function pointer is stored |
| 485 | into qc->scsidone. |
| 486 | </para> |
| 487 | <para> |
| 488 | qc->complete_fn() callback is used for completion |
| 489 | notification. ATA commands use ata_scsi_qc_complete() while |
| 490 | ATAPI commands use atapi_qc_complete(). Both functions end up |
| 491 | calling qc->scsidone to notify upper layer when the qc is |
| 492 | finished. After translation is completed, the qc is issued |
| 493 | with ata_qc_issue(). |
| 494 | </para> |
| 495 | <para> |
| 496 | Note that the SCSI midlayer invokes hostt->queuecommand while |
| 497 | holding the host_set lock, so all of the above occurs while holding |
| 498 | the host_set lock. |
| 499 | </para> |
| 500 | </listitem> |
| 501 | </varlistentry> |
| 502 | |
| 503 | </variablelist> |
| 504 | </sect2> |
| 505 | |
| 506 | <sect2><title>How commands are processed</title> |
| 507 | <para> |
| 508 | Depending on which protocol and which controller are used, |
| 509 | commands are processed differently. For the purpose of |
| 510 | discussion, a controller which uses taskfile interface and all |
| 511 | standard callbacks is assumed. |
| 512 | </para> |
| 513 | <para> |
| 514 | Currently 6 ATA command protocols are used. They can be |
| 515 | sorted into the following four categories according to how |
| 516 | they are processed. |
| 517 | </para> |
| 518 | |
| 519 | <variablelist> |
| 520 | <varlistentry><term>ATA NO DATA or DMA</term> |
| 521 | <listitem> |
| 522 | <para> |
| 523 | ATA_PROT_NODATA and ATA_PROT_DMA fall into this category. |
| 524 | These types of commands don't require any software |
| 525 | intervention once issued. Device will raise interrupt on |
| 526 | completion. |
| 527 | </para> |
| 528 | </listitem> |
| 529 | </varlistentry> |
| 530 | |
| 531 | <varlistentry><term>ATA PIO</term> |
| 532 | <listitem> |
| 533 | <para> |
| 534 | ATA_PROT_PIO is in this category. libata currently |
| 535 | implements PIO with polling. ATA_NIEN bit is set to turn |
| 536 | off interrupt and pio_task on ata_wq performs polling and |
| 537 | IO. |
| 538 | </para> |
| 539 | </listitem> |
| 540 | </varlistentry> |
| 541 | |
| 542 | <varlistentry><term>ATAPI NODATA or DMA</term> |
| 543 | <listitem> |
| 544 | <para> |
| 545 | ATA_PROT_ATAPI_NODATA and ATA_PROT_ATAPI_DMA are in this |
| 546 | category. packet_task is used to poll BSY bit after |
| 547 | issuing PACKET command. Once BSY is turned off by the |
| 548 | device, packet_task transfers CDB and hands off processing |
| 549 | to interrupt handler. |
| 550 | </para> |
| 551 | </listitem> |
| 552 | </varlistentry> |
| 553 | |
| 554 | <varlistentry><term>ATAPI PIO</term> |
| 555 | <listitem> |
| 556 | <para> |
| 557 | ATA_PROT_ATAPI is in this category. ATA_NIEN bit is set |
| 558 | and, as in ATAPI NODATA or DMA, packet_task submits cdb. |
| 559 | However, after submitting cdb, further processing (data |
| 560 | transfer) is handed off to pio_task. |
| 561 | </para> |
| 562 | </listitem> |
| 563 | </varlistentry> |
| 564 | </variablelist> |
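<para>
The four categories above can be summarized as a lookup from protocol to
handling strategy. The following user-space sketch is illustrative only: the
MOCK_PROT_* enumerators mirror the protocol names in the text but are not the
kernel's constants, and struct mock_handling is a hypothetical summary type.
</para>
<programlisting>
```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the six protocol names from the text; values are arbitrary. */
enum mock_prot {
    MOCK_PROT_NODATA,
    MOCK_PROT_DMA,
    MOCK_PROT_PIO,
    MOCK_PROT_ATAPI_NODATA,
    MOCK_PROT_ATAPI_DMA,
    MOCK_PROT_ATAPI
};

/* Hypothetical summary of how a protocol is processed. */
struct mock_handling {
    bool nien_set;         /* ATA_NIEN set, device interrupt turned off */
    bool uses_packet_task; /* packet_task polls BSY and sends the CDB */
    bool uses_pio_task;    /* pio_task polls and moves the data */
    bool irq_completes;    /* interrupt handler finishes the command */
};

static struct mock_handling mock_classify(enum mock_prot prot)
{
    struct mock_handling h = { false, false, false, false };

    switch (prot) {
    case MOCK_PROT_NODATA:
    case MOCK_PROT_DMA:
        /* No software intervention; device interrupts on completion. */
        h.irq_completes = true;
        break;
    case MOCK_PROT_PIO:
        h.nien_set = true;
        h.uses_pio_task = true;
        break;
    case MOCK_PROT_ATAPI_NODATA:
    case MOCK_PROT_ATAPI_DMA:
        /* The text does not mention ATA_NIEN for this category. */
        h.uses_packet_task = true;
        h.irq_completes = true;
        break;
    case MOCK_PROT_ATAPI:
        h.nien_set = true;
        h.uses_packet_task = true;
        h.uses_pio_task = true;
        break;
    }
    return h;
}
```
</programlisting>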
| 565 | </sect2> |
| 566 | |
| 567 | <sect2><title>How commands are completed</title> |
| 568 | <para> |
| 569 | Once issued, all qc's are either completed with |
| 570 | ata_qc_complete() or time out. For commands which are handled |
| 571 | by interrupts, ata_host_intr() invokes ata_qc_complete(), and, |
| 572 | for PIO tasks, pio_task invokes ata_qc_complete(). In error |
| 573 | cases, packet_task may also complete commands. |
| 574 | </para> |
| 575 | <para> |
| 576 | ata_qc_complete() does the following. |
| 577 | </para> |
| 578 | |
| 579 | <orderedlist> |
| 580 | |
| 581 | <listitem> |
| 582 | <para> |
| 583 | DMA memory is unmapped. |
| 584 | </para> |
| 585 | </listitem> |
| 586 | |
| 587 | <listitem> |
| 588 | <para> |
| 589 | ATA_QCFLAG_ACTIVE is cleared from qc->flags. |
| 590 | </para> |
| 591 | </listitem> |
| 592 | |
| 593 | <listitem> |
| 594 | <para> |
| 595 | qc->complete_fn() callback is invoked. If the return value of |
| 596 | the callback is not zero, completion is short-circuited and |
| 597 | ata_qc_complete() returns. |
| 598 | </para> |
| 599 | </listitem> |
| 600 | |
| 601 | <listitem> |
| 602 | <para> |
| 603 | __ata_qc_complete() is called, which does |
| 604 | <orderedlist> |
| 605 | |
| 606 | <listitem> |
| 607 | <para> |
| 608 | qc->flags is cleared to zero. |
| 609 | </para> |
| 610 | </listitem> |
| 611 | |
| 612 | <listitem> |
| 613 | <para> |
| 614 | ap->active_tag and qc->tag are poisoned. |
| 615 | </para> |
| 616 | </listitem> |
| 617 | |
| 618 | <listitem> |
| 619 | <para> |
| 620 | qc->waiting is cleared and completed (in that order). |
| 621 | </para> |
| 622 | </listitem> |
| 623 | |
| 624 | <listitem> |
| 625 | <para> |
| 626 | qc is deallocated by clearing appropriate bit in ap->qactive. |
| 627 | </para> |
| 628 | </listitem> |
| 629 | |
| 630 | </orderedlist> |
| 631 | </para> |
| 632 | </listitem> |
| 633 | |
| 634 | </orderedlist> |
| 635 | |
| 636 | <para> |
| 637 | So, it basically notifies upper layer and deallocates qc. One |
| 638 | exception is short-circuit path in #3 which is used by |
| 639 | atapi_qc_complete(). |
| 640 | </para> |
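<para>
The completion steps listed above, including the short-circuit path, can be
modelled in user space as follows. This is a sketch under stated assumptions,
not the libata implementation: every mock_* and demo_* name is hypothetical,
an int flag stands in for the struct completion, and MOCK_TAG_POISON is just
a recognizable marker value.
</para>
<programlisting>
```c
#include <assert.h>
#include <stddef.h>

#define MOCK_QCFLAG_ACTIVE (1u << 0)
#define MOCK_TAG_POISON    0xfafbfcfdu /* marker value for this sketch */

struct mock_port {
    unsigned long qactive;   /* one bit per allocated qc */
    unsigned int active_tag;
};

struct mock_qc {
    struct mock_port *ap;
    unsigned int tag;
    unsigned int flags;
    int dma_mapped;
    int *waiting;            /* stand-in for a struct completion pointer */
    int (*complete_fn)(struct mock_qc *qc);
};

/* Second half of completion: poison, wake the waiter, free the qc. */
static void mock_qc_release(struct mock_qc *qc)
{
    struct mock_port *ap = qc->ap;
    unsigned int tag = qc->tag;

    assert(tag < 8 * sizeof(ap->qactive));

    qc->flags = 0;
    ap->active_tag = MOCK_TAG_POISON;
    qc->tag = MOCK_TAG_POISON;
    if (qc->waiting) {
        int *w = qc->waiting;

        qc->waiting = NULL;  /* cleared first ... */
        *w = 1;              /* ... then completed */
    }
    ap->qactive &= ~(1UL << tag);
}

/* First half: unmap, mark inactive, notify; may short-circuit. */
static void mock_qc_complete(struct mock_qc *qc)
{
    assert(qc && qc->ap && qc->complete_fn);

    qc->dma_mapped = 0;                /* step 1: DMA memory unmapped */
    qc->flags &= ~MOCK_QCFLAG_ACTIVE;  /* step 2 */
    if (qc->complete_fn(qc) != 0)      /* step 3: nonzero keeps the qc */
        return;
    mock_qc_release(qc);               /* step 4 */
}

/* Demo callbacks: normal completion and the short-circuit case. */
static int demo_complete_ok(struct mock_qc *qc)   { (void)qc; return 0; }
static int demo_complete_hold(struct mock_qc *qc) { (void)qc; return 1; }
```
</programlisting>
<para>
With demo_complete_ok() the qc's bit in qactive is cleared and the waiter is
signalled. With demo_complete_hold() the qc is marked inactive but stays
allocated, which is the state the short-circuit path leaves behind.
</para>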
| 641 | <para> |
| 642 | For all non-ATAPI commands, whether they fail or not, almost |
| 643 | the same code path is taken and very little error handling |
| 644 | takes place. A qc is completed with success status if it |
| 645 | succeeded, with failed status otherwise. |
| 646 | </para> |
| 647 | <para> |
| 648 | However, failed ATAPI commands require more handling as |
| 649 | REQUEST SENSE is needed to acquire sense data. If an ATAPI |
| 650 | command fails, ata_qc_complete() is invoked with error status, |
| 651 | which in turn invokes atapi_qc_complete() via |
| 652 | qc->complete_fn() callback. |
| 653 | </para> |
| 654 | <para> |
| 655 | This makes atapi_qc_complete() set scmd->result to |
| 656 | SAM_STAT_CHECK_CONDITION, complete the scmd and return 1. As |
| 657 | the sense data is empty but scmd->result is CHECK CONDITION, |
| 658 | SCSI midlayer will invoke EH for the scmd, and returning 1 |
| 659 | makes ata_qc_complete() return without deallocating the qc. |
| 660 | This leads us to ata_scsi_error() with a partially completed qc. |
| 661 | </para> |
| 662 | |
| 663 | </sect2> |
| 664 | |
| 665 | <sect2><title>ata_scsi_error()</title> |
| 666 | <para> |
| 667 | ata_scsi_error() is the current hostt->eh_strategy_handler() |
| 668 | for libata. As discussed above, this will be entered in two |
| 669 | cases - timeout and ATAPI error completion. This function |
| 670 | calls the low level libata driver's eng_timeout() callback, the |
| 671 | standard callback for which is ata_eng_timeout(). It checks |
| 672 | if a qc is active and calls ata_qc_timeout() on the qc if so. |
| 673 | Actual error handling occurs in ata_qc_timeout(). |
| 674 | </para> |
| 675 | <para> |
| 676 | If EH is invoked for timeout, ata_qc_timeout() stops BMDMA and |
| 677 | completes the qc. Note that as we're currently in EH, we |
| 678 | cannot call scsi_done. As described in the SCSI EH doc, a |
| 679 | recovered scmd should be either retried with |
| 680 | scsi_queue_insert() or finished with scsi_finish_command(). |
| 681 | Here, we override qc->scsidone with scsi_finish_command() and |
| 682 | call ata_qc_complete(). |
| 683 | </para> |
| 684 | <para> |
| 685 | If EH is invoked due to a failed ATAPI qc, the qc here is |
| 686 | completed but not deallocated. The purpose of this |
| 687 | half-completion is to use the qc as a placeholder to make EH |
| 688 | code reach this place. This is a bit hackish, but it works. |
| 689 | </para> |
| 690 | <para> |
| 691 | Once control reaches here, the qc is deallocated by invoking |
| 692 | __ata_qc_complete() explicitly. Then, an internal qc for REQUEST |
| 693 | SENSE is issued. Once sense data is acquired, scmd is |
| 694 | finished by directly invoking scsi_finish_command() on the |
| 695 | scmd. Note that as we have already completed and deallocated |
| 696 | the qc which was associated with the scmd, we don't need |
| 697 | to/cannot call ata_qc_complete() again. |
| 698 | </para> |
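<para>
The two entry cases can be contrasted with a rough user-space model
that records the order of the steps. It is a sketch under assumptions,
not the libata implementation: all toy_ names are hypothetical, the
sense bytes are dummy values, and the only things taken from the text
are the ordering (timeout: stop BMDMA, finish the scmd, free the qc;
failed ATAPI: free the qc, REQUEST SENSE, finish the scmd).
</para>
<programlisting>
```c
#include <assert.h>
#include <string.h>

/* Hypothetical model; names prefixed toy_ are not libata or SCSI symbols. */
enum toy_eh_reason { TOY_EH_TIMEOUT, TOY_EH_ATAPI_ERROR };

struct toy_cmd {
    int qc_allocated;         /* 1 while the qc still owns its tag */
    int finished;             /* 1 once the scmd is handed back from EH */
    int bmdma_running;        /* 1 while the DMA engine is active */
    unsigned char sense[18];  /* fixed-format sense buffer */
    char trace[64];           /* order of the steps taken, for inspection */
};

/* Append a step name to the trace, never overflowing the buffer. */
static void toy_step(struct toy_cmd *c, const char *name)
{
    strncat(c->trace, name, sizeof(c->trace) - strlen(c->trace) - 1);
}

/* Stands in for finishing a recovered scmd from inside EH. */
static void toy_finish_command(struct toy_cmd *c)
{
    c->finished = 1;
    toy_step(c, "finish;");
}

/* Stands in for releasing the qc. */
static void toy_free_qc(struct toy_cmd *c)
{
    c->qc_allocated = 0;
    toy_step(c, "free;");
}

/* Stands in for the internal REQUEST SENSE command; fills dummy sense data. */
static void toy_request_sense(struct toy_cmd *c)
{
    c->sense[0] = 0x70;  /* response code: current error, fixed format */
    c->sense[2] = 0x05;  /* made-up sense key (ILLEGAL REQUEST) */
    toy_step(c, "sense;");
}

static void toy_eh(struct toy_cmd *c, enum toy_eh_reason why)
{
    assert(c->qc_allocated);  /* both paths enter EH with a live qc */

    if (why == TOY_EH_TIMEOUT) {
        c->bmdma_running = 0;  /* stop the DMA engine */
        toy_step(c, "stop_dma;");
        /* Completing the qc notifies via the EH-safe finish, then frees it. */
        toy_finish_command(c);
        toy_free_qc(c);
    } else {
        /* The qc was only a placeholder: free it, fetch sense, then finish. */
        toy_free_qc(c);
        toy_request_sense(c);
        toy_finish_command(c);
    }
}
```
</programlisting>
<para>
In this model the qc is freed exactly once on either path, which
mirrors the remark that ata_qc_complete() must not be called again
after the explicit deallocation on the ATAPI path.
</para>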
| 699 | |
| 700 | </sect2> |
| 701 | |
| 702 | <sect2><title>Problems with the current EH</title> |
| 703 | |
| 704 | <itemizedlist> |
| 705 | |
| 706 | <listitem> |
| 707 | <para> |
| 708 | Error representation is too crude. Currently any and all |
| 709 | error conditions are represented with ATA STATUS and ERROR |
| 710 | registers. Errors which aren't ATA device errors are treated |
| 711 | as ATA device errors by setting the ATA_ERR bit. A better error |
| 712 | descriptor which can properly represent ATA and other |
| 713 | errors/exceptions is needed. |
| 714 | </para> |
| 715 | </listitem> |
| 716 | |
| 717 | <listitem> |
| 718 | <para> |
| 719 | When handling timeouts, no action is taken to make the device |
| 720 | forget about the timed out command and become ready for new commands. |
| 721 | </para> |
| 722 | </listitem> |
| 723 | |
| 724 | <listitem> |
| 725 | <para> |
| 726 | EH handling via ata_scsi_error() is not properly protected |
| 727 | from usual command processing. On EH entrance, the device is |
| 728 | not in a quiescent state. Timed out commands may succeed or |
| 729 | fail at any time. pio_task and atapi_task may still be running. |
| 730 | </para> |
| 731 | </listitem> |
| 732 | |
| 733 | <listitem> |
| 734 | <para> |
| 735 | Error recovery is too weak. Devices / controllers causing HSM |
| 736 | mismatch errors and other errors quite often require a reset to |
| 737 | return to a known state. Also, advanced error handling is |
| 738 | necessary to support features like NCQ and hotplug. |
| 739 | </para> |
| 740 | </listitem> |
| 741 | |
| 742 | <listitem> |
| 743 | <para> |
| 744 | ATA errors are directly handled in the interrupt handler and |
| 745 | PIO errors in pio_task. This is problematic for advanced |
| 746 | error handling for the following reasons. |
| 747 | </para> |
| 748 | <para> |
| 749 | First, advanced error handling often requires context and |
| 750 | internal qc execution. |
| 751 | </para> |
| 752 | <para> |
| 753 | Second, even a simple failure (say, CRC error) needs |
| 754 | information gathering and could trigger complex error handling |
| 755 | (say, resetting & reconfiguring). Having multiple code |
| 756 | paths to gather information, enter EH and trigger actions |
| 757 | makes life painful. |
| 758 | </para> |
| 759 | <para> |
| 760 | Third, scattered EH code makes implementing low level drivers |
| 761 | difficult. Low level drivers override libata callbacks. If |
| 762 | EH is scattered over several places, each affected callback |
| 763 | should perform its part of error handling. This can be error |
| 764 | prone and painful. |
| 765 | </para> |
| 766 | </listitem> |
| 767 | |
| 768 | </itemizedlist> |
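<para>
To make the first point more concrete, one possible shape for a richer
error descriptor is sketched below. This is purely hypothetical and is
not an existing libata interface: the toy_ names, the set of error
classes and the reset policy are all assumptions chosen for
illustration. The idea is only that non-device errors get their own
class bits instead of being folded into the ATA STATUS and ERROR
registers.
</para>
<programlisting>
```c
#include <assert.h>

/* Hypothetical error classes; these names are illustrative, not libata's. */
enum toy_err_class {
    TOY_ERR_DEV     = 1u << 0,  /* device reported an error via STATUS/ERROR */
    TOY_ERR_TIMEOUT = 1u << 1,  /* command did not complete in time */
    TOY_ERR_HSM     = 1u << 2,  /* host state machine mismatch */
    TOY_ERR_BUS     = 1u << 3   /* transport problem such as a CRC error */
};

struct toy_err_desc {
    unsigned int classes;  /* bitwise OR of toy_err_class values */
    unsigned char status;  /* ATA STATUS, meaningful only with TOY_ERR_DEV */
    unsigned char error;   /* ATA ERROR, meaningful only with TOY_ERR_DEV */
};

/*
 * Assumed policy: timeouts and HSM mismatches call for a reset before
 * the device is used again. Returns 1 if a reset is suggested.
 */
static int toy_needs_reset(const struct toy_err_desc *d)
{
    return (d->classes & (TOY_ERR_TIMEOUT | TOY_ERR_HSM)) != 0;
}
```
</programlisting>
<para>
With a descriptor of this kind, a single EH entry point could decide on
recovery actions from the class bits rather than from a synthesized
ATA_ERR bit.
</para>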
| 769 | </sect2> |
| 770 | |
| 771 | </sect1> |
| 772 | </chapter> |
| 773 | |
| 774 | <chapter id="libataExt"> |
| 775 | <title>libata Library</title> |
| 776 | !Edrivers/scsi/libata-core.c |
| 777 | </chapter> |
| 778 | |
| 779 | <chapter id="libataInt"> |
| 780 | <title>libata Core Internals</title> |
| 781 | !Idrivers/scsi/libata-core.c |
| 782 | </chapter> |
| 783 | |
| 784 | <chapter id="libataScsiInt"> |
| 785 | <title>libata SCSI translation/emulation</title> |
| 786 | !Edrivers/scsi/libata-scsi.c |
| 787 | !Idrivers/scsi/libata-scsi.c |
| 788 | </chapter> |
| 789 | |
| 790 | <chapter id="PiixInt"> |
| 791 | <title>ata_piix Internals</title> |
| 792 | !Idrivers/scsi/ata_piix.c |
| 793 | </chapter> |
| 794 | |
| 795 | <chapter id="SILInt"> |
| 796 | <title>sata_sil Internals</title> |
| 797 | !Idrivers/scsi/sata_sil.c |
| 798 | </chapter> |
| 799 | |
| 800 | <chapter id="libataThanks"> |
| 801 | <title>Thanks</title> |
| 802 | <para> |
| 803 | The bulk of the ATA knowledge comes thanks to long conversations with |
| 804 | Andre Hedrick (www.linux-ide.org), and long hours pondering the ATA |
| 805 | and SCSI specifications. |
| 806 | </para> |
| 807 | <para> |
| 808 | Thanks to Alan Cox for pointing out similarities |
| 809 | between SATA and SCSI, and in general for motivation to hack on |
| 810 | libata. |
| 811 | </para> |
| 812 | <para> |
| 813 | libata's device detection |
| 814 | method, ata_pio_devchk, and in general all the early probing was |
| 815 | based on extensive study of Hale Landis's probe/reset code in his |
| 816 | ATADRVR driver (www.ata-atapi.com). |
| 817 | </para> |
| 818 | </chapter> |
| 819 | |
| 820 | </book> |