Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 1 | |
| 2 | Device Power Management |
| 3 | |
| 4 | |
| 5 | Device power management encompasses two areas - the ability to save |
| 6 | state and transition a device to a low-power state when the system is |
| 7 | entering a low-power state; and the ability to transition a device to |
| 8 | a low-power state while the system is running (and independently of |
| 9 | any other power management activity). |
| 10 | |
| 11 | |
| 12 | Methods |
| 13 | |
| 14 | The methods to suspend and resume devices reside in struct bus_type: |
| 15 | |
| 16 | struct bus_type { |
| 17 | ... |
| 18 | int (*suspend)(struct device * dev, pm_message_t state); |
| 19 | int (*resume)(struct device * dev); |
| 20 | }; |
| 21 | |
| 22 | Each bus driver is responsible for implementing these methods, translating |
| 23 | the call into a bus-specific request and forwarding the call to the |
| 24 | bus-specific drivers. For example, PCI drivers implement suspend() and |
| 25 | resume() methods in struct pci_driver. The PCI core is simply |
| 26 | responsible for translating the pointers to PCI-specific ones and |
| 27 | calling the low-level driver. |
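The forwarding pattern described above can be sketched in plain user-space C. This is a simplified model under stated assumptions, not kernel code: every toy_* name, the integer message type, and the structure layouts are hypothetical and exist only to show how a bus-level method translates the generic device pointer and calls the bus-specific driver.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; all toy_* names are hypothetical. */
typedef int toy_pm_message_t;

struct toy_device {
	const char *name;
};

struct toy_pci_driver; /* defined below */

/* Bus-specific device that embeds the generic one. */
struct toy_pci_dev {
	struct toy_device dev;               /* generic part */
	const struct toy_pci_driver *driver; /* bound low-level driver, may be NULL */
	int suspended;
};

/* Bus-specific driver with its own suspend/resume signatures. */
struct toy_pci_driver {
	int (*suspend)(struct toy_pci_dev *pdev, toy_pm_message_t state);
	int (*resume)(struct toy_pci_dev *pdev);
};

/* Recover the bus-specific structure from the generic device pointer. */
static struct toy_pci_dev *to_toy_pci_dev(struct toy_device *dev)
{
	return (struct toy_pci_dev *)((char *)dev - offsetof(struct toy_pci_dev, dev));
}

/* Bus-level suspend: translate the pointer, then call the low-level driver. */
static int toy_pci_bus_suspend(struct toy_device *dev, toy_pm_message_t state)
{
	struct toy_pci_dev *pdev = to_toy_pci_dev(dev);

	if (pdev->driver && pdev->driver->suspend)
		return pdev->driver->suspend(pdev, state);
	return 0; /* no driver method: nothing to forward */
}

/* Bus-level resume: same translation in the other direction. */
static int toy_pci_bus_resume(struct toy_device *dev)
{
	struct toy_pci_dev *pdev = to_toy_pci_dev(dev);

	if (pdev->driver && pdev->driver->resume)
		return pdev->driver->resume(pdev);
	return 0;
}

/* Example low-level driver that only records the state change. */
static int toy_nic_suspend(struct toy_pci_dev *pdev, toy_pm_message_t state)
{
	(void)state;
	pdev->suspended = 1;
	return 0;
}

static int toy_nic_resume(struct toy_pci_dev *pdev)
{
	pdev->suspended = 0;
	return 0;
}

static const struct toy_pci_driver toy_nic_driver = {
	.suspend = toy_nic_suspend,
	.resume  = toy_nic_resume,
};
```

In this sketch a caller only ever holds a struct toy_device pointer; the bus methods are the single place that knows about the bus-specific wrapper.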
| 28 | |
| 29 | This is done to a) ease transition to the new power management methods |
| 30 | and leverage the existing PM code in various bus drivers; b) allow |
| 31 | buses to implement generic and default PM routines for devices; and c) |
| 32 | make the flow of execution obvious to the reader. |
| 33 | |
| 34 | |
| 35 | System Power Management |
| 36 | |
| 37 | When the system enters a low-power state, the device tree is walked in |
| 38 | a depth-first fashion to transition each device into a low-power |
| 39 | state. The ordering of the device tree is guaranteed by the order in |
| 40 | which devices get registered - children are never registered before |
| 41 | their ancestors, and devices are placed at the back of the list when |
| 42 | registered. By walking the list in reverse order, we are guaranteed to |
| 43 | suspend devices in the proper order. |
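The ordering argument above can be illustrated with a small user-space model. It is only a sketch: the fixed-size array, the toy_* names, and the 'order' output parameter are assumptions made for the example and do not reflect the actual list implementation in the driver core.

```c
#include <stddef.h>

#define TOY_MAX_DEVICES 16

struct toy_device {
	const char *name;
	int suspended;
};

/* Registration list: ancestors are always registered before their children. */
static struct toy_device *toy_devices[TOY_MAX_DEVICES];
static size_t toy_device_count;

/* Append a device at the back of the list; returns 0, or -1 if the list is full. */
static int toy_device_register(struct toy_device *dev)
{
	if (toy_device_count >= TOY_MAX_DEVICES)
		return -1;
	toy_devices[toy_device_count++] = dev;
	return 0;
}

/*
 * Walk the list back to front so that every child is suspended before
 * its ancestors. The visiting order is recorded in 'order' when it is
 * non-NULL. Returns the number of devices suspended.
 */
static size_t toy_suspend_all(const char **order)
{
	size_t visited = 0;

	for (size_t i = toy_device_count; i-- > 0; ) {
		toy_devices[i]->suspended = 1;
		if (order)
			order[visited] = toy_devices[i]->name;
		visited++;
	}
	return visited;
}
```

Registering a bridge, then a NIC behind it, then a disk would suspend the disk first and the bridge last in this model.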
| 44 | |
| 45 | Devices are suspended once with interrupts enabled. Drivers are |
| 46 | expected to stop I/O transactions, save device state, and place the |
| 47 | device into a low-power state. Drivers may sleep, allocate memory, |
| 48 | etc. at will. |
| 49 | |
| 50 | Some devices are broken and will inevitably have problems powering |
| 51 | down or disabling themselves with interrupts enabled. For these |
| 52 | special cases, they may return -EAGAIN. This will put the device on a |
| 53 | list to be taken care of later. When interrupts are disabled, before |
| 54 | we enter the low-power state, their drivers are called again to put |
| 55 | their device to sleep. |
| 56 | |
| 57 | On resume, the devices that returned -EAGAIN will be called to power |
| 58 | themselves back on with interrupts disabled. Once interrupts have been |
| 59 | re-enabled, the rest of the drivers will be called to resume their |
| 60 | devices. On resume, a driver is responsible for powering back on each |
| 61 | device, restoring state, and re-enabling I/O transactions for that |
| 62 | device. |
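The two-pass suspend with -EAGAIN can be sketched as follows. This is a user-space model under stated assumptions: the toy_* names are hypothetical, and the irqs_enabled argument is an illustration device only (the suspend() method shown earlier has no such parameter). Only the suspend side is modelled; resume would walk the deferred list first, with interrupts still disabled.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_DEVICES 16

struct toy_device {
	const char *name;
	/* Hypothetical callback; irqs_enabled tells it which pass is running. */
	int (*suspend)(struct toy_device *dev, int irqs_enabled);
	int suspended;
};

/* Devices that asked to be suspended later, with interrupts disabled. */
static struct toy_device *toy_deferred[TOY_MAX_DEVICES];
static size_t toy_deferred_count;

/* First pass, interrupts enabled: devices returning -EAGAIN are deferred. */
static int toy_suspend_pass_irqs_on(struct toy_device **devs, size_t n)
{
	toy_deferred_count = 0;
	for (size_t i = n; i-- > 0; ) {
		int err = devs[i]->suspend(devs[i], 1);

		if (err == -EAGAIN) {
			if (toy_deferred_count >= TOY_MAX_DEVICES)
				return -ENOMEM;
			toy_deferred[toy_deferred_count++] = devs[i];
		} else if (err) {
			return err; /* any other error aborts the transition */
		}
	}
	return 0;
}

/* Second pass, interrupts disabled: deferred drivers are called again. */
static int toy_suspend_pass_irqs_off(void)
{
	for (size_t i = 0; i < toy_deferred_count; i++) {
		int err = toy_deferred[i]->suspend(toy_deferred[i], 0);

		if (err)
			return err;
	}
	return 0;
}

/* Well-behaved driver: suspends in the first pass. */
static int toy_normal_suspend(struct toy_device *dev, int irqs_enabled)
{
	(void)irqs_enabled;
	dev->suspended = 1;
	return 0;
}

/* Broken hardware: can only power down once interrupts are off. */
static int toy_broken_suspend(struct toy_device *dev, int irqs_enabled)
{
	if (irqs_enabled)
		return -EAGAIN;
	dev->suspended = 1;
	return 0;
}
```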
| 63 | |
| 64 | System devices follow a slightly different API, which can be found in: |
| 65 | |
| 66 | include/linux/sysdev.h |
| 67 | drivers/base/sys.c |
| 68 | |
| 69 | System devices will only be suspended with interrupts disabled, and |
| 70 | after all other devices have been suspended. On resume, they will be |
| 71 | resumed before any other devices, and also with interrupts disabled. |
| 72 | |
| 73 | |
| 74 | Runtime Power Management |
| 75 | |
| 76 | Many devices are able to dynamically power down while the system is |
| 77 | still running. This feature is useful for devices that are not being |
| 78 | used, and can offer significant power savings on a running system. |
| 79 | |
| 80 | In each device's directory, there is a 'power' directory, which |
| 81 | contains at least a 'state' file. Reading from this file displays what |
| 82 | power state the device is currently in. Writing to this file initiates |
| 83 | a transition to the specified power state, which must be a decimal in |
| 84 | the range 1-3, inclusive; or 0 for 'On'. |
| 85 | |
| 86 | The PM core will call the ->suspend() method in the bus_type object |
| 87 | that the device belongs to if the specified state is not 0, or |
| 88 | ->resume() if it is. |
| 89 | |
| 90 | Nothing will happen if the specified state is the same state the |
| 91 | device is currently in. |
| 92 | |
| 93 | If the device is already in a low-power state, and the specified state |
| 94 | is another, but different, low-power state, the ->resume() method will |
| 95 | first be called to power the device back on, then ->suspend() will be |
| 96 | called again with the new state. |
| 97 | |
| 98 | The driver is responsible for saving the working state of the device |
| 99 | and putting it into the low-power state specified. If this was |
| 100 | successful, it returns 0, and the device's power_state field is |
| 101 | updated. |
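The runtime transition rules above can be summarised in a short user-space sketch. It is a model only: the toy_* names, the call counters, and the choice of -EINVAL for an out-of-range value are assumptions for illustration, not the PM core's actual code.

```c
#include <errno.h>

struct toy_device {
	int power_state;   /* 0 = on, 1-3 = low-power states */
	int suspend_calls; /* counters so the call sequence can be observed */
	int resume_calls;
};

/* Stand-ins for the bus_type ->suspend() and ->resume() methods. */
static int toy_bus_suspend(struct toy_device *dev, int state)
{
	(void)state;
	dev->suspend_calls++;
	return 0;
}

static int toy_bus_resume(struct toy_device *dev)
{
	dev->resume_calls++;
	return 0;
}

/* Handle a value written to the 'state' file. */
static int toy_set_power_state(struct toy_device *dev, int state)
{
	int err;

	if (state < 0 || state > 3)
		return -EINVAL;
	if (state == dev->power_state)
		return 0; /* same state: nothing happens */

	/* Leaving a low-power state always goes through "on" first. */
	if (dev->power_state != 0) {
		err = toy_bus_resume(dev);
		if (err)
			return err;
		dev->power_state = 0;
	}

	/* Entering a low-power state; the field is updated only on success. */
	if (state != 0) {
		err = toy_bus_suspend(dev, state);
		if (err)
			return err;
		dev->power_state = state;
	}
	return 0;
}
```

Moving a device from state 2 to state 3 in this model produces one resume call followed by one suspend call, matching the description above.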
| 102 | |
| 103 | The driver must take care to know whether or not it is able to |
| 104 | properly resume the device, including all steps of reinitialization |
| 105 | necessary. (This is the hardest part, and the one most protected by |
| 106 | NDA'd documents). |
| 107 | |
| 108 | The driver must also take care not to suspend a device that is |
| 109 | currently in use. It is the driver's responsibility to provide its own |
| 110 | exclusion mechanisms. |
| 111 | |
| 112 | The runtime power transition happens with interrupts enabled. If a |
| 113 | device cannot support being powered down with interrupts enabled, it may |
| 114 | return -EAGAIN (as it would during a system power management |
| 115 | transition), but it will _not_ be called again, and the transition |
| 116 | will fail. |
| 117 | |
| 118 | There is currently no way to know what states a device or driver |
| 119 | supports a priori. This will change in the future. |
| 120 | |
| 121 | pm_message_t meaning |
| 122 | |
| 123 | pm_message_t has two fields: event ("major") and flags. If a driver |
| 124 | does not know an event code, it aborts the request, returning an error. Some |
| 125 | drivers may need to deal with special cases based on the actual type |
| 126 | of suspend operation being done at the system level. This is why |
| 127 | there are flags. |
| 128 | |
| 129 | Event codes are: |
| 130 | |
| 131 | ON -- no need to do anything except special cases like broken |
| 132 | HW. |
| 133 | |
| 134 | # NOTIFICATION -- pretty much same as ON? |
| 135 | |
| 136 | FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from |
| 137 | scratch. That probably means stop accepting upstream requests, the |
| 138 | actual policy of what to do with them being specific to a given |
| 139 | driver. It's acceptable for a network driver to just drop packets |
| 140 | while a block driver is expected to block the queue so no request is |
| 141 | lost. (Use IDE as an example on how to do that). FREEZE requires no |
| 142 | power state change, and it's expected for drivers to be able to |
| 143 | quickly transition back to operating state. |
| 144 | |
| 145 | SUSPEND -- like FREEZE, but also put hardware into low-power state. If |
| 146 | there's a need to distinguish several levels of sleep, an additional flag |
| 147 | is probably the best way to do that. |
| 148 | |
| 149 | Transitions are only from a resumed state to a suspended state, never |
| 150 | between two suspended states. (ON -> FREEZE or ON -> SUSPEND can happen, |
| 151 | FREEZE -> SUSPEND or SUSPEND -> FREEZE cannot). |
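A driver's handling of the event codes above might look like the following user-space sketch. The enum values, the toy_pm_message_t layout, and the toy_device fields are all hypothetical; the real definitions may differ. The point is only that the driver switches on the event, ignores the flags, and returns an error for an event code it does not know.

```c
#include <errno.h>

/* Hypothetical encoding of the event codes; the real constants may differ. */
enum toy_pm_event {
	TOY_EVENT_ON = 0,
	TOY_EVENT_FREEZE,
	TOY_EVENT_SUSPEND
};

typedef struct {
	int event;          /* the "major" code */
	unsigned int flags; /* system-level detail; ignored by this driver */
} toy_pm_message_t;

struct toy_device {
	int io_stopped; /* DMA and interrupts quiesced */
	int low_power;  /* hardware placed in a low-power state */
};

static int toy_driver_suspend(struct toy_device *dev, toy_pm_message_t msg)
{
	switch (msg.event) {
	case TOY_EVENT_ON:
		return 0; /* nothing to do */
	case TOY_EVENT_FREEZE:
		dev->io_stopped = 1; /* quiesce only, no power state change */
		return 0;
	case TOY_EVENT_SUSPEND:
		dev->io_stopped = 1; /* like FREEZE ... */
		dev->low_power = 1;  /* ... plus a low-power state */
		return 0;
	default:
		return -EINVAL; /* unknown event code: abort the request */
	}
}
```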
| 152 | |
| 153 | All events are: |
| 154 | |
| 155 | [NOTE NOTE NOTE: If you are a driver author, you should not care; you |
| 156 | should only look at event, and ignore flags.] |
| 157 | |
| 158 | #Prepare for suspend -- userland is still running but we are going to |
| 159 | #enter suspend state. This gives drivers a chance to load firmware from |
| 160 | #disk and store it in memory, or do other activities that require |
| 161 | #operating userland, ability to kmalloc GFP_KERNEL, etc... All of these |
| 162 | #are forbidden once the suspend dance is started. event = ON, flags = |
| 163 | #PREPARE_TO_SUSPEND |
| 164 | |
| 165 | Apm standby -- prepare for APM event. Quiesce devices to make life |
| 166 | easier for APM BIOS. event = FREEZE, flags = APM_STANDBY |
| 167 | |
| 168 | Apm suspend -- same as APM_STANDBY, but we should probably avoid |
| 169 | spinning down disks. event = FREEZE, flags = APM_SUSPEND |
| 170 | |
| 171 | System halt, reboot -- quiesce devices to make life easier for BIOS. event |
| 172 | = FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT |
| 173 | |
| 174 | System shutdown -- at least disks need to be spun down, or data may be |
| 175 | lost. Quiesce devices, just to make life easier for BIOS. event = |
| 176 | FREEZE, flags = SYSTEM_SHUTDOWN |
| 177 | |
| 178 | Kexec -- turn off DMAs and put hardware into some state where new |
| 179 | kernel can take over. event = FREEZE, flags = KEXEC |
| 180 | |
| 181 | Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake |
| 182 | may need to be enabled on some devices. This actually has at least 3 |
| 183 | subtypes: the system can reboot, enter S4, or enter S5 at the end of |
| 184 | swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT, |
| 185 | SYSTEM_SHUTDOWN, SYSTEM_S4 |
| 186 | |
| 187 | Suspend to ram -- put devices into low power state. event = SUSPEND, |
| 188 | flags = SUSPEND_TO_RAM |
| 189 | |
| 190 | Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put |
| 191 | devices into low power mode, but you must be able to reinitialize |
| 192 | device from scratch in resume method. This has two flavors: it is done |
| 193 | once on suspending kernel, once on resuming kernel. event = FREEZE, |
| 194 | flags = DURING_SUSPEND or DURING_RESUME |
| 195 | |
| 196 | Device detach requested from /sys -- deinitialize device; probably same as |
| 197 | SYSTEM_SHUTDOWN, I do not understand this one too much. Probably event |
| 198 | = FREEZE, flags = DEV_DETACH. |
| 199 | |
| 200 | #These are not really events sent: |
| 201 | # |
| 202 | #System fully on -- device is working normally; this is probably never |
| 203 | #passed to suspend() method... event = ON, flags = 0 |
| 204 | # |
| 205 | #Ready after resume -- userland is now running, again. Time to free any |
| 206 | #memory you ate during prepare to suspend... event = ON, flags = |
| 207 | #READY_AFTER_RESUME |
| 208 | # |