PCI Power Management
~~~~~~~~~~~~~~~~~~~~

An overview of the concepts and the related functions in the Linux kernel

Patrick Mochel <mochel@transmeta.com>
(and others)

---------------------------------------------------------------------------

1. Overview
2. How The PCI Subsystem Handles Power Management
3. PCI Utility Functions
4. PCI Device Drivers
5. Resources

1. Overview
~~~~~~~~~~~

The PCI Power Management Specification was introduced between the PCI 2.1 and
PCI 2.2 Specifications. It is a standard interface for controlling various
power management operations.

Implementation of the PCI PM Spec is optional, as are several sub-components of
it. If a device supports the PCI PM Spec, the device will have an 8 byte
capability field in its PCI configuration space. This field is used to describe
and control the standard PCI power management features.

The PCI PM spec defines 4 operating states for devices (D0 - D3) and for buses
(B0 - B3). The higher the number, the less power the device consumes. However,
the higher the number, the longer the latency is for the device to return to
an operational state (D0).

There are actually two D3 states. When someone talks about D3, they usually
mean D3hot, which corresponds to an ACPI D2 state (power is reduced, the
device may lose some context). But they may also mean D3cold, which is an
ACPI D3 state (power is fully off, all state is discarded); or both.

Bus power management is not covered in this version of this document.

Note that all PCI devices support D0 and D3cold by default, regardless of
whether or not they implement any of the PCI PM spec.

The possible state transitions that a device can undergo are:

+---------------------------+
| Current State | New State |
+---------------------------+
| D0            | D1, D2, D3|
+---------------------------+
| D1            | D2, D3    |
+---------------------------+
| D2            | D3        |
+---------------------------+
| D1, D2, D3    | D0        |
+---------------------------+

Note that when the system is entering a global suspend state, all devices will
be placed into D3 and when resuming, all devices will be placed into D0.
However, when the system is running, other state transitions are possible.
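
The transition table can be restated as a small predicate. The following is
only an illustrative user-space sketch, not kernel code: the enum and the
function name transition_allowed() are hypothetical, and D3hot and D3cold are
folded into a single D3 value, as in the table.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device states in the table. */
enum pm_state { PM_D0, PM_D1, PM_D2, PM_D3 };

/*
 * Returns true if the table lists a transition from 'from' to 'to'.
 * Any low power state may return to D0; otherwise a device may only
 * move to a deeper (higher numbered) state.
 */
static bool transition_allowed(enum pm_state from, enum pm_state to)
{
	if (to == PM_D0)
		return from != PM_D0;	/* D1, D2, D3 -> D0 */
	return to > from;		/* D0 -> D1..D3, D1 -> D2..D3, D2 -> D3 */
}
```

For example, transition_allowed(PM_D2, PM_D1) is false under this model: a
device in D2 has to return to D0 before it can be placed in D1.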

2. How The PCI Subsystem Handles Power Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PCI suspend/resume functionality is accessed indirectly via the Power
Management subsystem. At boot, the PCI driver registers a power management
callback with that layer. Upon entering a suspend state, the PM layer iterates
through all of its registered callbacks. This currently takes place only during
APM state transitions.

Upon going to sleep, the PCI subsystem walks its device tree twice. Both times,
it does a depth first walk of the device tree. The first walk saves each
device's state and checks for devices that will prevent the system from entering
a global power state. The next walk then places the devices in a low power
state.

The first walk allows a graceful recovery in the event of a failure, since none
of the devices have actually been powered down.

In both walks, in particular the second, all children of a bridge are touched
before the actual bridge itself. This allows the bridge to retain power while
its children are being accessed.

Upon resuming from sleep, just the opposite must be true: all bridges must be
powered on and restored before their children are powered on. This is easily
accomplished with a breadth-first walk of the PCI device tree.
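
The two orderings can be sketched on a toy tree. This is a user-space
illustration only: struct node, suspend_order() and resume_order() are
hypothetical names, not kernel structures, and the sketch assumes a tree of at
most MAX_NODES nodes with at most MAX_CHILDREN children per bridge.

```c
#include <stddef.h>

#define MAX_CHILDREN	4
#define MAX_NODES	16

/* Hypothetical model of a bridge or device in the PCI tree. */
struct node {
	int id;
	size_t nchildren;
	struct node *children[MAX_CHILDREN];
};

/*
 * Suspend order: depth first, recording a node only after all of its
 * children, so a bridge keeps power while its children are accessed.
 * Returns the next free position in 'out'.
 */
static size_t suspend_order(const struct node *n, int *out, size_t pos)
{
	for (size_t i = 0; i < n->nchildren; i++)
		pos = suspend_order(n->children[i], out, pos);
	out[pos++] = n->id;
	return pos;
}

/*
 * Resume order: breadth first, so every bridge is recorded before any
 * of its children. Returns the number of nodes visited.
 */
static size_t resume_order(const struct node *root, int *out)
{
	const struct node *queue[MAX_NODES];
	size_t head = 0, tail = 0, pos = 0;

	queue[tail++] = root;
	while (head < tail) {
		const struct node *n = queue[head++];

		out[pos++] = n->id;
		for (size_t i = 0; i < n->nchildren; i++)
			queue[tail++] = n->children[i];
	}
	return pos;
}
```

With a root bridge 0 whose children are bridge 1 and device 2, and with
devices 3 and 4 behind bridge 1, the suspend order is 3, 4, 1, 2, 0 and the
resume order is 0, 1, 2, 3, 4.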


3. PCI Utility Functions
~~~~~~~~~~~~~~~~~~~~~~~~

These are helper functions designed to be called by individual device drivers.
Assuming that a device behaves as advertised, these should be applicable in most
cases. However, results may vary.

Note that these functions are never implicitly called for the driver. The driver
is always responsible for deciding when and if to call these.


pci_save_state
--------------

Usage:
        pci_save_state(struct pci_dev *dev);

Description:
        Save first 64 bytes of PCI config space, along with any additional
        PCI-Express or PCI-X information.


pci_restore_state
-----------------

Usage:
        pci_restore_state(struct pci_dev *dev);

Description:
        Restore previously saved config space.


pci_set_power_state
-------------------

Usage:
        pci_set_power_state(struct pci_dev *dev, pci_power_t state);

Description:
        Transition device to low power state using PCI PM Capabilities
        registers.

        Will fail under one of the following conditions:
        - If state is less than current state, but not D0 (illegal transition)
        - Device doesn't support PM Capabilities
        - Device does not support requested state


pci_enable_wake
---------------

Usage:
        pci_enable_wake(struct pci_dev *dev, pci_power_t state, int enable);

Description:
        Enable device to generate PME# during low power state using PCI PM
        Capabilities.

        Checks whether the device supports generating PME# from the requested
        state and fails if it does not, unless enable == 0 (the request is to
        disable wake events, which is implicit if the device doesn't even
        support them in the first place).

        Note that the PMC Register in the device's PM Capabilities has a bitmask
        of the states it supports generating PME# from. D3hot is bit 3 and
        D3cold is bit 4. So, while a value of 4 as the state may not seem
        semantically correct, it is.
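
The bit numbering can be made concrete with a small user-space sketch. It
assumes the PCI PM layout in which the PME support bitmask occupies bits 15:11
of the 16-bit PMC register; the macro, enum and function names below are
hypothetical and chosen for illustration, not taken from kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the PME support bitmask inside the PMC register. */
#define PMC_PME_MASK	0xF800u
#define PMC_PME_SHIFT	11

/* State numbers double as bit positions within the bitmask. */
enum pm_state { PM_D0, PM_D1, PM_D2, PM_D3HOT, PM_D3COLD };

/* Returns true if the PMC value advertises PME# generation from 'state'. */
static bool pme_supported_from(uint16_t pmc, enum pm_state state)
{
	unsigned int pme_bits = (pmc & PMC_PME_MASK) >> PMC_PME_SHIFT;

	return (pme_bits >> state) & 1u;
}
```

With this model, a PMC value of 0xC800 advertises PME# from D0, D3hot and
D3cold, so passing 4 (D3cold) as the state selects bit 4 of the bitmask.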


4. PCI Device Drivers
~~~~~~~~~~~~~~~~~~~~~

These functions are intended for use by individual drivers, and are defined in
struct pci_driver:

        int  (*suspend) (struct pci_dev *dev, pm_message_t state);
        int  (*resume) (struct pci_dev *dev);


suspend
-------

Usage:

if (dev->driver && dev->driver->suspend)
        dev->driver->suspend(dev, state);

A driver uses this function to actually transition the device into a low power
state. This should include disabling I/O, IRQs, and bus-mastering, as well as
physically transitioning the device to a lower power state; it may also include
calls to pci_enable_wake().

Bus mastering may be disabled by doing:

        pci_disable_device(dev);

For devices that support the PCI PM Spec, this may be used to set the device's
power state to one that matches the suspend() parameter:

        pci_set_power_state(dev, pci_choose_state(dev, state));

The driver is also responsible for disabling any other device-specific features
(e.g., blanking the screen, turning off on-card memory, etc.).

The driver should be sure to track the current state of the device, as it may
obviate the need for some operations.

The driver should update the current_state field in its pci_dev structure in
this function, except for PM-capable devices when pci_set_power_state is used.

resume
------

Usage:

if (dev->driver && dev->driver->resume)
        dev->driver->resume(dev);

The resume callback may be called from any power state, and is always meant to
transition the device to the D0 state.

The driver is responsible for reenabling any features of the device that had
been disabled during previous suspend calls, such as IRQs and bus mastering,
as well as calling pci_restore_state().

If the device is currently in D3, it may need to be reinitialized in resume().

  * Some types of devices, like bus controllers, will preserve context in D3hot
    (using Vcc power). Their drivers will often want to avoid re-initializing
    them after re-entering D0 (perhaps to avoid resetting downstream devices).

  * Other kinds of devices in D3hot will discard device context as part of a
    soft reset when re-entering the D0 state.

  * Devices resuming from D3cold always go through a power-on reset. Some
    device context can also be preserved using Vaux power.

  * Some systems hide D3cold resume paths from drivers. For example, on PCs
    the resume path for suspend-to-disk often runs BIOS powerup code, which
    will sometimes re-initialize the device.

To handle resets during D3 to D0 transitions, it may be convenient to share
device initialization code between probe() and resume(). Device parameters
can also be saved before the driver suspends into D3, avoiding re-probe.
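
One possible shape for that sharing is sketched below as a user-space mock,
not as driver code. Everything in it is hypothetical: struct mock_dev, its
fields, the mock_*() functions and the ring_size value of 256 merely stand in
for a real driver's private data and callbacks.

```c
/* Hypothetical power states and driver-private data. */
enum pm_state { PM_D0, PM_D1, PM_D2, PM_D3 };

struct mock_dev {
	enum pm_state state;	/* last known power state */
	int ring_size;		/* parameter discovered once, at probe time */
	int hw_inits;		/* how often the hardware was programmed */
};

/* Initialization shared by probe and resume. */
static void mock_hw_init(struct mock_dev *d)
{
	d->hw_inits++;
}

static void mock_probe(struct mock_dev *d)
{
	d->ring_size = 256;	/* made-up value, kept across suspend */
	mock_hw_init(d);
	d->state = PM_D0;
}

static void mock_suspend(struct mock_dev *d)
{
	d->state = PM_D3;
}

static void mock_resume(struct mock_dev *d)
{
	/* Context may have been lost in D3, so reprogram the hardware. */
	if (d->state == PM_D3)
		mock_hw_init(d);
	d->state = PM_D0;
}
```

In this model a probe, suspend, resume sequence programs the hardware twice
while ring_size is discovered only once; a resume from a state other than D3
skips the reinitialization.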

If the device supports the PCI PM Spec, it can use this to physically transition
the device to D0:

        pci_set_power_state(dev, PCI_D0);

Note that if the entire system is transitioning out of a global sleep state, all
devices will be placed in the D0 state, so this is not necessary. However, in
the event that the device is placed in the D3 state during normal operation,
this call is necessary. The driver cannot determine which of the two events is
taking place, so it is always a good idea to make that call.

The driver should take note of the state that it is resuming from in order to
ensure correct (and speedy) operation.

The driver should update the current_state field in its pci_dev structure in
this function, except for PM-capable devices when pci_set_power_state is used.


A reference implementation
--------------------------
.suspend()
{
        /* driver specific operations */

        /* Disable IRQ */
        free_irq();
        /* If using MSI */
        pci_disable_msi();

        pci_save_state();
        pci_enable_wake();
        /* Disable IO/bus master/irq router */
        pci_disable_device();
        pci_set_power_state(pci_choose_state());
}

.resume()
{
        pci_set_power_state(PCI_D0);
        pci_restore_state();
        /* device's irq possibly is changed, driver should take care */
        pci_enable_device();
        pci_set_master();

        /* if using MSI, device's vector possibly is changed */
        pci_enable_msi();

        request_irq();
        /* driver specific operations; */
}

This is a typical implementation. Drivers can slightly change the order
of the operations, omit some operations, or add more driver-specific
operations, but on the whole drivers should do something like this.

5. Resources
~~~~~~~~~~~~

PCI Local Bus Specification
PCI Bus Power Management Interface Specification

  http://www.pcisig.com