James Bottomley | 2908d77 | 2006-08-29 09:22:51 -0500 | 1 | SAS Layer |
| 2 | --------- |
| 3 | |
| 4 | The SAS Layer is a management infrastructure which manages |
| 5 | SAS LLDDs. It sits between SCSI Core and SAS LLDDs. The |
| 6 | layout is as follows: while SCSI Core is concerned with |
| 7 | SAM/SPC issues, and a SAS LLDD+sequencer is concerned with |
| 8 | phy/OOB/link management, the SAS layer is concerned with: |
| 9 | |
| 10 | * SAS Phy/Port/HA event management (LLDD generates, |
| 11 | SAS Layer processes), |
| 12 | * SAS Port management (creation/destruction), |
| 13 | * SAS Domain discovery and revalidation, |
| 14 | * SAS Domain device management, |
| 15 | * SCSI Host registration/unregistration, |
| 16 | * Device registration with SCSI Core (SAS) or libata |
| 17 | (SATA), and |
| 18 | * Expander management and exporting expander control |
| 19 | to user space. |
| 20 | |
| 21 | A SAS LLDD is a PCI device driver. It is concerned with |
| 22 | phy/OOB management and vendor specific tasks, and generates |
| 23 | events to the SAS layer. |
| 24 | |
| 25 | The SAS Layer does most SAS tasks as outlined in the SAS 1.1 |
| 26 | spec. |
| 27 | |
| 28 | The sas_ha_struct describes the SAS LLDD to the SAS layer. |
| 29 | Most of it is used by the SAS Layer but a few fields need to |
| 30 | be initialized by the LLDDs. |
| 31 | |
| 32 | After initializing your hardware, from the probe() function |
| 33 | you call sas_register_ha(). It will register your LLDD with |
| 34 | the SCSI subsystem, creating a SCSI host and it will |
| 35 | register your SAS driver with the sysfs SAS tree it creates. |
| 36 | It will then return. Then you enable your phys to actually |
| 37 | start OOB (at which point your driver will start calling the |
| 38 | notify_* event callbacks). |
| 39 | |
| 40 | Structure descriptions: |
| 41 | |
| 42 | struct sas_phy -------------------- |
| 43 | Normally this is statically embedded to your driver's |
| 44 | phy structure: |
| 45 | struct my_phy { |
| 46 | blah; |
| 47 | struct sas_phy sas_phy; |
| 48 | bleh; |
| 49 | }; |
| 50 | And then all the phys are an array of my_phy in your HA |
| 51 | struct (shown below). |
| 52 | |
| 53 | Then as you go along and initialize your phys you also |
| 54 | initialize the sas_phy struct, along with your own |
| 55 | phy structure. |
| 56 | |
| 57 | In general, the phys are managed by the LLDD and the ports |
| 58 | are managed by the SAS layer. So the phys are initialized |
| 59 | and updated by the LLDD and the ports are initialized and |
| 60 | updated by the SAS layer. |
| 61 | |
| 62 | There is a scheme where the LLDD can RW certain fields, |
| 63 | and the SAS layer can only read those, and vice versa. |
| 64 | The idea is to avoid unnecessary locking. |
| 65 | |
| 66 | enabled -- must be set (0/1) |
| 67 | id -- must be set [0,MAX_PHYS) |
| 68 | class, proto, type, role, oob_mode, linkrate -- must be set |
| 69 | oob_mode -- you set this when OOB has finished and then notify |
| 70 | the SAS Layer. |
| 71 | |
| 72 | sas_addr -- this normally points to an array holding the sas |
| 73 | address of the phy, possibly somewhere in your my_phy |
| 74 | struct. |
| 75 | |
| 76 | attached_sas_addr -- set this when you (LLDD) receive an |
| 77 | IDENTIFY frame or a FIS frame, _before_ notifying the SAS |
| 78 | layer. The idea is that sometimes the LLDD may want to fake |
| 79 | or provide a different SAS address on that phy/port and this |
| 80 | allows it to do this. At best you should copy the sas |
| 81 | address from the IDENTIFY frame or maybe generate a SAS |
| 82 | address for SATA directly attached devices. The Discover |
| 83 | process may later change this. |
| 84 | |
| 85 | frame_rcvd -- this is where you copy the IDENTIFY/FIS frame |
| 86 | when you get it; you lock, copy, set frame_rcvd_size and |
| 87 | unlock the lock, and then call the event. It is a pointer |
| 88 | since there's no way to know your hw frame size _exactly_, |
| 89 | so you define the actual array in your phy struct and let |
| 90 | this pointer point to it. You copy the frame from your |
| 91 | DMAable memory to that area holding the lock. |
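The frame_rcvd sequence above (lock, copy, set frame_rcvd_size, unlock, notify) can be sketched in user space as follows. This is only an illustration under stated assumptions: the lock field name frame_rcvd_lock, the buffer size MY_FRAME_MAX, the events_notified counter and the mock_notify_bytes_dmaed() helper are invented here, and a C11 atomic_flag stands in for a kernel spinlock.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MY_FRAME_MAX 32	/* assumed size of the hardware frame buffer */

/* Minimal stand-in for the sas_phy fields used here (not the kernel struct). */
struct sas_phy {
	atomic_flag frame_rcvd_lock;	/* assumed name; stands in for a spinlock */
	unsigned char *frame_rcvd;	/* points into the LLDD's own phy struct */
	size_t frame_rcvd_size;
	int events_notified;		/* counts mock notifications */
};

/* Hypothetical LLDD phy: the actual frame array lives here. */
struct my_phy {
	struct sas_phy sas_phy;
	unsigned char frame_buf[MY_FRAME_MAX];
};

static void my_phy_init(struct my_phy *phy)
{
	atomic_flag_clear(&phy->sas_phy.frame_rcvd_lock);
	phy->sas_phy.frame_rcvd = phy->frame_buf;
	phy->sas_phy.frame_rcvd_size = 0;
	phy->sas_phy.events_notified = 0;
}

/* Mock of notify_port_event(phy, PORTE_BYTES_DMAED). */
static void mock_notify_bytes_dmaed(struct sas_phy *phy)
{
	phy->events_notified++;
}

/* Copy a received IDENTIFY/FIS frame while holding the lock, then notify. */
static int my_frame_received(struct my_phy *phy, const void *dma_buf, size_t len)
{
	struct sas_phy *sp = &phy->sas_phy;

	if (len > sizeof(phy->frame_buf))
		return -1;

	while (atomic_flag_test_and_set(&sp->frame_rcvd_lock))
		;				/* spin until the lock is free */
	memcpy(sp->frame_rcvd, dma_buf, len);
	sp->frame_rcvd_size = len;
	atomic_flag_clear(&sp->frame_rcvd_lock);

	mock_notify_bytes_dmaed(sp);		/* notify only after unlocking */
	return 0;
}
```

The event is raised after the lock is dropped, so the SAS layer can take the same lock when it reads the frame.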
| 92 | |
| 93 | sas_prim -- this is where primitives go when they're |
| 94 | received. See sas.h. Grab the lock, set the primitive, |
| 95 | release the lock, notify. |
| 96 | |
| 97 | port -- this points to the sas_port if the phy belongs |
| 98 | to a port -- the LLDD only reads this. It points to the |
| 99 | sas_port this phy is part of. Set by the SAS Layer. |
| 100 | |
| 101 | ha -- may be set; the SAS layer sets it anyway. |
| 102 | |
| 103 | lldd_phy -- you should set this to point to your phy so you |
| 104 | can find your way around faster when the SAS layer calls one |
| 105 | of your callbacks and passes you a phy. If the sas_phy is |
| 106 | embedded you can also use container_of -- whatever you |
| 107 | prefer. |
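As an illustration of the container_of alternative mentioned above, here is a small user-space sketch. The struct members other than sas_phy (hw_state, irq_count) and the helper to_my_phy() are made up for the example, and the macro is a plain-C equivalent of the kernel's container_of(), not the kernel definition itself.

```c
#include <stddef.h>

/* Stand-in for the SAS layer's phy structure; only an id is modelled. */
struct sas_phy {
	int id;
};

/* Hypothetical LLDD phy structure with the sas_phy embedded in it. */
struct my_phy {
	int hw_state;
	struct sas_phy sas_phy;
	int irq_count;
};

/* User-space equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing my_phy from the sas_phy a callback was handed. */
static struct my_phy *to_my_phy(struct sas_phy *phy)
{
	return container_of(phy, struct my_phy, sas_phy);
}
```

With this approach no extra pointer has to be kept in sync; setting lldd_phy instead trades that for one pointer dereference.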
| 108 | |
| 109 | |
| 110 | struct sas_port -------------------- |
| 111 | The LLDD doesn't set any fields of this struct -- it only |
| 112 | reads them. They should be self explanatory. |
| 113 | |
| 114 | phy_mask is 32 bit, this should be enough for now, as I |
| 115 | haven't heard of a HA having more than 8 phys. |
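A 32 bit phy_mask can be read with ordinary bit operations. The sketch below assumes that bit i of the mask corresponds to the phy with id i; that mapping and the helper names are assumptions made for illustration, not something the text above states.

```c
#include <stdint.h>

/* Mark the phy with the given id (0..31) as a member of the port. */
static uint32_t phy_mask_add(uint32_t mask, unsigned int id)
{
	return mask | (UINT32_C(1) << id);
}

/* Return 1 if the phy with the given id is part of the port, else 0. */
static int phy_mask_has(uint32_t mask, unsigned int id)
{
	return (int)((mask >> id) & 1u);
}

/* Count how many phys make up the port (its width). */
static unsigned int phy_mask_count(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit each pass */
		n++;
	return n;
}
```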
| 116 | |
| 117 | lldd_port -- I haven't found use for that -- maybe other |
| 118 | LLDDs that wish to have an internal port representation can |
| 119 | make use of this. |
| 120 | |
| 121 | |
| 122 | struct sas_ha_struct -------------------- |
| 123 | It normally is statically declared in your own LLDD |
| 124 | structure describing your adapter: |
| 125 | struct my_sas_ha { |
| 126 | blah; |
| 127 | struct sas_ha_struct sas_ha; |
| 128 | struct my_phy phys[MAX_PHYS]; |
| 129 | struct sas_port sas_ports[MAX_PHYS]; /* (1) */ |
| 130 | bleh; |
| 131 | }; |
| 132 | |
| 133 | (1) If your LLDD doesn't have its own port representation. |
| 134 | |
| 135 | What needs to be initialized (sample function given below). |
| 136 | |
| 137 | pcidev |
| 138 | sas_addr -- since the SAS layer doesn't want to mess with |
| 139 | memory allocation, etc, this points to statically |
| 140 | allocated array somewhere (say in your host adapter |
| 141 | structure) and holds the SAS address of the host |
| 142 | adapter as given by you or the manufacturer, etc. |
| 143 | sas_port |
| 144 | sas_phy -- an array of pointers to structures. (see |
| 145 | note above on sas_addr). |
| 146 | These must be set. See more notes below. |
| 147 | num_phys -- the number of phys present in the sas_phy array, |
| 148 | and the number of ports present in the sas_port |
| 149 | array. There can be a maximum of num_phys ports (one |
| 150 | per phy) so we drop the num_ports, and only use |
| 151 | num_phys. |
| 152 | |
| 153 | The event interface: |
| 154 | |
| 155 | /* LLDD calls these to notify the class of an event. */ |
| 156 | void (*notify_ha_event)(struct sas_ha_struct *, enum ha_event); |
| 157 | void (*notify_port_event)(struct sas_phy *, enum port_event); |
| 158 | void (*notify_phy_event)(struct sas_phy *, enum phy_event); |
| 159 | |
| 160 | When sas_register_ha() returns, those are set and can be |
| 161 | called by the LLDD to notify the SAS layer of such |
| 162 | events. |
| 163 | |
| 164 | The port notification: |
| 165 | |
| 166 | /* The class calls these to notify the LLDD of an event. */ |
| 167 | void (*lldd_port_formed)(struct sas_phy *); |
| 168 | void (*lldd_port_deformed)(struct sas_phy *); |
| 169 | |
| 170 | If the LLDD wants notification when a port has been formed |
| 171 | or deformed it sets those to a function satisfying the type. |
| 172 | |
| 173 | A SAS LLDD should also implement at least one of the Task |
| 174 | Management Functions (TMFs) described in SAM: |
| 175 | |
| 176 | /* Task Management Functions. Must be called from process context. */ |
| 177 | int (*lldd_abort_task)(struct sas_task *); |
| 178 | int (*lldd_abort_task_set)(struct domain_device *, u8 *lun); |
| 179 | int (*lldd_clear_aca)(struct domain_device *, u8 *lun); |
| 180 | int (*lldd_clear_task_set)(struct domain_device *, u8 *lun); |
| 181 | int (*lldd_I_T_nexus_reset)(struct domain_device *); |
| 182 | int (*lldd_lu_reset)(struct domain_device *, u8 *lun); |
| 183 | int (*lldd_query_task)(struct sas_task *); |
| 184 | |
| 185 | For more information please read SAM from T10.org. |
| 186 | |
| 187 | Port and Adapter management: |
| 188 | |
| 189 | /* Port and Adapter management */ |
| 190 | int (*lldd_clear_nexus_port)(struct sas_port *); |
| 191 | int (*lldd_clear_nexus_ha)(struct sas_ha_struct *); |
| 192 | |
| 193 | A SAS LLDD should implement at least one of those. |
| 194 | |
| 195 | Phy management: |
| 196 | |
| 197 | /* Phy management */ |
| 198 | int (*lldd_control_phy)(struct sas_phy *, enum phy_func); |
| 199 | |
| 200 | lldd_ha -- set this to point to your HA struct. You can also |
| 201 | use container_of if you embedded it as shown above. |
| 202 | |
| 203 | A sample initialization and registration function |
| 204 | can look like this (called last thing from probe()) |
| 205 | *but* before you enable the phys to do OOB: |
| 206 | |
| 207 | static int register_sas_ha(struct my_sas_ha *my_ha) |
| 208 | { |
| 209 | int i; |
| 210 | static struct sas_phy *sas_phys[MAX_PHYS]; |
| 211 | static struct sas_port *sas_ports[MAX_PHYS]; |
| 212 | |
| 213 | my_ha->sas_ha.sas_addr = &my_ha->sas_addr[0]; |
| 214 | |
| 215 | for (i = 0; i < MAX_PHYS; i++) { |
| 216 | sas_phys[i] = &my_ha->phys[i].sas_phy; |
| 217 | sas_ports[i] = &my_ha->sas_ports[i]; |
| 218 | } |
| 219 | |
| 220 | my_ha->sas_ha.sas_phy = sas_phys; |
| 221 | my_ha->sas_ha.sas_port = sas_ports; |
| 222 | my_ha->sas_ha.num_phys = MAX_PHYS; |
| 223 | |
| 224 | my_ha->sas_ha.lldd_port_formed = my_port_formed; |
| 225 | |
| 226 | my_ha->sas_ha.lldd_dev_found = my_dev_found; |
| 227 | my_ha->sas_ha.lldd_dev_gone = my_dev_gone; |
| 228 | |
| 229 | my_ha->sas_ha.lldd_max_execute_num = lldd_max_execute_num; /* (1) */ |
| 230 | |
| 231 | my_ha->sas_ha.lldd_queue_size = ha_can_queue; |
| 232 | my_ha->sas_ha.lldd_execute_task = my_execute_task; |
| 233 | |
| 234 | my_ha->sas_ha.lldd_abort_task = my_abort_task; |
| 235 | my_ha->sas_ha.lldd_abort_task_set = my_abort_task_set; |
| 236 | my_ha->sas_ha.lldd_clear_aca = my_clear_aca; |
| 237 | my_ha->sas_ha.lldd_clear_task_set = my_clear_task_set; |
| 238 | my_ha->sas_ha.lldd_I_T_nexus_reset = NULL; /* (2) */ |
| 239 | my_ha->sas_ha.lldd_lu_reset = my_lu_reset; |
| 240 | my_ha->sas_ha.lldd_query_task = my_query_task; |
| 241 | |
| 242 | my_ha->sas_ha.lldd_clear_nexus_port = my_clear_nexus_port; |
| 243 | my_ha->sas_ha.lldd_clear_nexus_ha = my_clear_nexus_ha; |
| 244 | |
| 245 | my_ha->sas_ha.lldd_control_phy = my_control_phy; |
| 246 | |
| 247 | return sas_register_ha(&my_ha->sas_ha); |
| 248 | } |
| 249 | |
| 250 | (1) This is normally a LLDD parameter, something along the |
| 251 | lines of a task collector. What it tells the SAS Layer is |
| 252 | whether the SAS layer should run in Direct Mode (default: |
| 253 | value 0 or 1) or Task Collector Mode (value greater than 1). |
| 254 | |
| 255 | In Direct Mode, the SAS Layer calls Execute Task as soon as |
| 256 | it has a command to send to the SDS, _and_ this is a single |
| 257 | command, i.e. not linked. |
| 258 | |
| 259 | Some hardware (e.g. aic94xx) has the capability to DMA more |
| 260 | than one task at a time (interrupt) from host memory. Task |
| 261 | Collector Mode is an optional feature for HAs which support |
| 262 | this in their hardware. (Again, it is completely optional |
| 263 | even if your hardware supports it.) |
| 264 | |
| 265 | In Task Collector Mode, the SAS Layer would do _natural_ |
| 266 | coalescing of tasks and at the appropriate moment it would |
| 267 | call your driver to DMA more than one task in a single HA |
| 268 | interrupt. DBMS may want to use this by insmod/modprobe |
| 269 | setting the lldd_max_execute_num to something greater than |
| 270 | 1. |
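The rule in note (1) boils down to a single comparison. The enum and function names below are hypothetical and only restate the text: 0 or 1 selects Direct Mode, anything greater than 1 selects Task Collector Mode.

```c
/* Hypothetical names for the two modes described in note (1). */
enum sas_exec_mode {
	SAS_DIRECT_MODE,
	SAS_TASK_COLLECTOR_MODE,
};

/* Map lldd_max_execute_num to the mode the SAS Layer would run in. */
static enum sas_exec_mode sas_mode_for(int lldd_max_execute_num)
{
	return lldd_max_execute_num > 1 ? SAS_TASK_COLLECTOR_MODE
					: SAS_DIRECT_MODE;
}
```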
| 271 | |
| 272 | (2) SAS 1.1 does not define I_T Nexus Reset TMF. |
| 273 | |
| 274 | Events |
| 275 | ------ |
| 276 | |
| 277 | Events are _the only way_ a SAS LLDD notifies the SAS layer |
| 278 | of anything. There is no other way for a LLDD to tell |
| 279 | the SAS layer of anything happening internally or in the SAS |
| 280 | domain. |
| 281 | |
| 282 | Phy events: |
| 283 | PHYE_LOSS_OF_SIGNAL, (C) |
| 284 | PHYE_OOB_DONE, |
| 285 | PHYE_OOB_ERROR, (C) |
| 286 | PHYE_SPINUP_HOLD. |
| 287 | |
| 288 | Port events, passed on a _phy_: |
| 289 | PORTE_BYTES_DMAED, (M) |
| 290 | PORTE_BROADCAST_RCVD, (E) |
| 291 | PORTE_LINK_RESET_ERR, (C) |
| 292 | PORTE_TIMER_EVENT, (C) |
| 293 | PORTE_HARD_RESET. |
| 294 | |
| 295 | Host Adapter event: |
| 296 | HAE_RESET |
| 297 | |
| 298 | A SAS LLDD should be able to generate |
| 299 | - at least one event from group C (choice), |
| 300 | - events marked M (mandatory) are mandatory (only one), |
| 301 | - events marked E (expander) if it wants the SAS layer |
| 302 | to handle domain revalidation (only one such). |
| 303 | - Unmarked events are optional. |
| 304 | |
| 305 | Meaning: |
| 306 | |
| 307 | HAE_RESET -- when your HA got internal error and was reset. |
| 308 | |
| 309 | PORTE_BYTES_DMAED -- on receiving an IDENTIFY/FIS frame |
| 310 | PORTE_BROADCAST_RCVD -- on receiving a primitive |
| 311 | PORTE_LINK_RESET_ERR -- timer expired, loss of signal, loss |
| 312 | of DWS, etc. (*) |
| 313 | PORTE_TIMER_EVENT -- DWS reset timeout timer expired (*) |
| 314 | PORTE_HARD_RESET -- Hard Reset primitive received. |
| 315 | |
| 316 | PHYE_LOSS_OF_SIGNAL -- the device is gone (*) |
| 317 | PHYE_OOB_DONE -- OOB went fine and oob_mode is valid |
| 318 | PHYE_OOB_ERROR -- Error while doing OOB, the device probably |
| 319 | got disconnected. (*) |
| 320 | PHYE_SPINUP_HOLD -- SATA is present, COMWAKE not sent. |
| 321 | |
| 322 | (*) should set/clear the appropriate fields in the phy, |
| 323 | or alternatively call the inlined sas_phy_disconnected() |
| 324 | which is just a helper, from their tasklet. |
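A rough sketch of what handling a (*) event might look like follows. It assumes that the "appropriate fields" are oob_mode and linkrate and that zero means "not connected" or "unknown"; both are guesses made for the example, and my_phy_disconnected() and mock_notify_phy_event() are stand-ins written here rather than the real sas_phy_disconnected() helper and notify_phy_event callback.

```c
/* Phy event names as listed in the text above. */
enum phy_event {
	PHYE_LOSS_OF_SIGNAL,
	PHYE_OOB_DONE,
	PHYE_OOB_ERROR,
	PHYE_SPINUP_HOLD,
};

/* Hypothetical "not connected" markers; the real values live in sas.h. */
#define MY_OOB_NOT_CONNECTED	0
#define MY_LINK_RATE_UNKNOWN	0

/* Minimal stand-in for the sas_phy fields touched here. */
struct sas_phy {
	int oob_mode;
	int linkrate;
	enum phy_event last_event;	/* recorded by the mock notifier */
	int events_notified;
};

/* Stand-in for the helper: reset the fields that described the link. */
static void my_phy_disconnected(struct sas_phy *phy)
{
	phy->oob_mode = MY_OOB_NOT_CONNECTED;
	phy->linkrate = MY_LINK_RATE_UNKNOWN;
}

/* Mock of the notify_phy_event() callback set by the SAS layer. */
static void mock_notify_phy_event(struct sas_phy *phy, enum phy_event ev)
{
	phy->last_event = ev;
	phy->events_notified++;
}

/* What an LLDD tasklet might do when the signal is lost. */
static void my_loss_of_signal(struct sas_phy *phy)
{
	my_phy_disconnected(phy);	/* clear the fields first ... */
	mock_notify_phy_event(phy, PHYE_LOSS_OF_SIGNAL);	/* ... then notify */
}
```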
| 325 | |
| 326 | The Execute Command SCSI RPC: |
| 327 | |
| 328 | int (*lldd_execute_task)(struct sas_task *, int num, |
| 329 | unsigned long gfp_flags); |
| 330 | |
| 331 | Used to queue a task to the SAS LLDD. @task is the task(s) to |
| 332 | be executed. @num should be the number of tasks being |
| 333 | queued at this function call (they are linked together via |
| 334 | task::list), @gfp_flags should be the gfp mask defining the |
| 335 | context of the caller. |
| 336 | |
| 337 | This function should implement the Execute Command SCSI RPC, |
| 338 | or if you're sending a SCSI Task as linked commands, you |
| 339 | should also use this function. |
| 340 | |
| 341 | That is, when lldd_execute_task() is called, the command(s) |
| 342 | go out on the transport *immediately*. There is *no* |
| 343 | queuing of any sort and at any level in a SAS LLDD. |
| 344 | |
| 345 | The use of task::list is two-fold, one for linked commands, |
| 346 | the other discussed below. |
| 347 | |
| 348 | It is possible to queue up more than one task at a time, by |
| 349 | initializing the list element of struct sas_task, and |
| 350 | passing the number of tasks enlisted in this manner in num. |
| 351 | |
| 352 | Returns: -SAS_QUEUE_FULL, -ENOMEM, nothing was queued; |
| 353 | 0, the task(s) were queued. |
| 354 | |
| 355 | If you want to pass num > 1, then either |
| 356 | A) you're the only caller of this function and keep track |
| 357 | of what you've queued to the LLDD, or |
| 358 | B) you know what you're doing and have a strategy of |
| 359 | retrying. |
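The all-or-nothing return convention for num > 1 can be modelled in user space as shown below. This is a simplified sketch, not the LLDD interface: the extra struct my_ha argument, the free_slots counter, the singly linked next pointer (in place of task::list) and the numeric value of SAS_QUEUE_FULL are all assumptions made so that the example is self-contained.

```c
#include <stddef.h>

/* Placeholder value; the real constant comes from the SAS layer headers. */
#define SAS_QUEUE_FULL 1

/* Simplified task: "next" stands in for the task::list linkage. */
struct sas_task {
	struct sas_task *next;
	int sent;			/* set once the task went out */
};

/* Hypothetical adapter state: how many tasks the hardware can still take. */
struct my_ha {
	int free_slots;
};

/*
 * Send "num" linked tasks at once. Either every task goes out and 0 is
 * returned, or nothing is sent and -SAS_QUEUE_FULL is returned.
 */
static int my_execute_task(struct my_ha *ha, struct sas_task *task, int num)
{
	struct sas_task *t;
	int i;

	if (num > ha->free_slots)
		return -SAS_QUEUE_FULL;	/* nothing was queued */

	for (i = 0, t = task; i < num && t; i++, t = t->next)
		t->sent = 1;		/* hand the task to the hardware */
	ha->free_slots -= i;
	return 0;
}
```

Because a failed call leaves every task untouched, the caller can retry the whole batch later without tracking partial progress.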
| 360 | |
| 361 | As opposed to queuing one task at a time (function call), |
| 362 | batch queuing of tasks, by having num > 1, greatly |
| 363 | simplifies LLDD code, sequencer code, and _hardware design_, |
| 364 | and has some performance advantages in certain situations |
| 365 | (DBMS). |
| 366 | |
| 367 | The LLDD advertises if it can take more than one command at |
| 368 | a time at lldd_execute_task(), by setting the |
| 369 | lldd_max_execute_num parameter (controlled by "collector" |
| 370 | module parameter in aic94xx SAS LLDD). |
| 371 | |
| 372 | You should leave this to the default 1, unless you know what |
| 373 | you're doing. |
| 374 | |
| 375 | This is a function of the LLDD, to which the SAS layer can |
| 376 | cater. |
| 377 | |
| 378 | int lldd_queue_size |
| 379 | The host adapter's queue size. This is the maximum |
| 380 | number of commands the LLDD can have pending to domain |
| 381 | devices on behalf of all upper layers submitting through |
| 382 | lldd_execute_task(). |
| 383 | |
| 384 | You really want to set this to something (much) larger than |
| 385 | 1. |
| 386 | |
| 387 | This _really_ has absolutely nothing to do with queuing. |
| 388 | There is no queuing in SAS LLDDs. |
| 389 | |
| 390 | struct sas_task { |
| 391 | dev -- the device this task is destined to |
| 392 | list -- must be initialized (INIT_LIST_HEAD) |
| 393 | task_proto -- _one_ of enum sas_proto |
| 394 | scatter -- pointer to scatter gather list array |
| 395 | num_scatter -- number of elements in scatter |
| 396 | total_xfer_len -- total number of bytes expected to be transferred |
| 397 | data_dir -- PCI_DMA_... |
| 398 | task_done -- callback when the task has finished execution |
| 399 | }; |
| 400 | |
| 401 | When an external entity, i.e. an entity other than the LLDD or |
| 402 | the SAS Layer, wants to work with a struct domain_device, it |
| 403 | _must_ call kobject_get() when getting a handle on the |
| 404 | device and kobject_put() when it is done with the device. |
| 405 | |
| 406 | This does two things: |
| 407 | A) implements proper kfree() for the device; |
| 408 | B) increments/decrements the kref for all players: |
| 409 | domain_device |
| 410 | all domain_device's ... (if past an expander) |
| 411 | port |
| 412 | host adapter |
| 413 | pci device |
| 414 | and up the ladder, etc. |
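The effect of holding a reference can be pictured with a toy model in which every object pins its parent until it is released itself. This is not the kobject API; struct node, node_get() and node_put() are invented here purely to illustrate why a held device keeps its port and host adapter alive.

```c
#include <stddef.h>

/* Toy refcounted object; each child holds one reference on its parent. */
struct node {
	int refs;
	int released;		/* set when the last reference is dropped */
	struct node *parent;
};

/* Take a reference; returns the node for convenience. */
static struct node *node_get(struct node *n)
{
	if (n)
		n->refs++;
	return n;
}

/* Drop a reference; releasing a node also drops its hold on the parent. */
static void node_put(struct node *n)
{
	while (n && --n->refs == 0) {
		struct node *parent = n->parent;

		n->released = 1;	/* a real release() would free here */
		n = parent;		/* give back the parent reference */
	}
}

/* Create a node with one reference owned by its creator. */
static void node_init(struct node *n, struct node *parent)
{
	n->refs = 1;
	n->released = 0;
	n->parent = node_get(parent);
}
```

In this model an external user that still holds the device prevents the device and its port from being released even after their creators have dropped their own references.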
| 415 | |
| 416 | DISCOVERY |
| 417 | --------- |
| 418 | |
| 419 | The sysfs tree has the following purposes: |
| 420 | a) It shows you the physical layout of the SAS domain at |
| 421 | the current time, i.e. how the domain looks in the |
| 422 | physical world right now. |
| 423 | b) Shows some device parameters _at_discovery_time_. |
| 424 | |
| 425 | This is a link to the tree(1) program, very useful in |
| 426 | viewing the SAS domain: |
| 427 | ftp://mama.indstate.edu/linux/tree/ |
| 428 | I expect user space applications to actually create a |
| 429 | graphical interface of this. |
| 430 | |
| 431 | That is, the sysfs domain tree doesn't show or keep state if |
| 432 | you, e.g., change the READY LED MEANING setting, but it does |
| 433 | show you the current connection status of the domain |
| 434 | device. |
| 435 | |
| 436 | Keeping internal device state changes is the responsibility |
| 437 | of upper layers (Command set drivers) and user space. |
| 438 | |
| 439 | When a device or devices are unplugged from the domain, this |
| 440 | is reflected in the sysfs tree immediately, and the device(s) |
| 441 | are removed from the system. |
| 442 | |
| 443 | The structure domain_device describes any device in the SAS |
| 444 | domain. It is completely managed by the SAS layer. A task |
| 445 | points to a domain device, this is how the SAS LLDD knows |
| 446 | where to send the task(s) to. A SAS LLDD only reads the |
| 447 | contents of the domain_device structure, but it never creates |
| 448 | or destroys one. |
| 449 | |
| 450 | Expander management from User Space |
| 451 | ----------------------------------- |
| 452 | |
| 453 | In each expander directory in sysfs, there is a file called |
| 454 | "smp_portal". It is a binary sysfs attribute file, which |
| 455 | implements an SMP portal (Note: this is *NOT* an SMP port), |
| 456 | to which user space applications can send SMP requests and |
| 457 | receive SMP responses. |
| 458 | |
| 459 | Functionality is deceptively simple: |
| 460 | |
| 461 | 1. Build the SMP frame you want to send. The format and layout |
| 462 | is described in the SAS spec. Leave the CRC field equal to 0. |
| 463 | open(2) |
| 464 | 2. Open the expander's SMP portal sysfs file in RW mode. |
| 465 | write(2) |
| 466 | 3. Write the frame you built in 1. |
| 467 | read(2) |
| 468 | 4. Read the amount of data you expect to receive for the frame you built. |
| 469 | If you receive a different amount of data than you expected, |
| 470 | then there was some kind of error. |
| 471 | close(2) |
| 472 | This whole process is shown in detail in the function do_smp_func() |
| 473 | and its callers, in the file "expander_conf.c". |
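The four steps above can be sketched in user space roughly as follows. This is not the code from expander_conf.c: the function names are invented, and the 8-byte REPORT GENERAL request (frame type 0x40, function 0x00, two reserved bytes, four CRC bytes left at 0) is written from memory of the SAS 1.1 layout, so check it against the spec before relying on it.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define SMP_REQUEST_FRAME	0x40	/* SMP frame type: request */
#define SMP_REPORT_GENERAL	0x00	/* SMP function: REPORT GENERAL */

/* Step 1: build the request; the trailing CRC bytes are left as 0. */
static size_t smp_build_report_general(unsigned char *req, size_t size)
{
	if (size < 8)
		return 0;
	memset(req, 0, 8);
	req[0] = SMP_REQUEST_FRAME;
	req[1] = SMP_REPORT_GENERAL;
	return 8;
}

/*
 * Steps 2-4: open the portal read/write, write the request, read the
 * expected number of response bytes, close. Returns 0 on success and -1
 * if the file cannot be opened or a transfer is short.
 */
static int smp_portal_xfer(const char *portal, const void *req, size_t req_len,
			   void *resp, size_t resp_len)
{
	int fd = open(portal, O_RDWR);
	int ret = -1;

	if (fd < 0)
		return -1;
	if (write(fd, req, req_len) == (ssize_t)req_len &&
	    read(fd, resp, resp_len) == (ssize_t)resp_len)
		ret = 0;
	close(fd);
	return ret;
}
```

A caller would pass the path of the expander's smp_portal file and a response buffer sized for the function it sent; a short read is reported as an error, matching step 4.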
| 474 | |
| 475 | The kernel functionality is implemented in the file |
| 476 | "sas_expander.c". |
| 477 | |
| 478 | The program "expander_conf.c" implements this. It takes one |
| 479 | argument, the sysfs file name of the SMP portal to the |
| 480 | expander, and gives expander information, including routing |
| 481 | tables. |
| 482 | |
| 483 | The SMP portal gives you complete control of the expander, |
| 484 | so please be careful. |