| |
| Device Power Management |
| |
| |
| Device power management encompasses two areas - the ability to save |
| state and transition a device to a low-power state when the system is |
| entering a low-power state; and the ability to transition a device to |
| a low-power state while the system is running (and independently of |
| any other power management activity). |
| |
| |
| Methods |
| |
| The methods to suspend and resume devices reside in struct bus_type: |
| |
| struct bus_type { |
|         ... |
|         int (*suspend)(struct device * dev, pm_message_t state); |
|         int (*resume)(struct device * dev); |
| }; |
| |
| Each bus driver is responsible for implementing these methods, translating |
| the call into a bus-specific request and forwarding the call to the |
| bus-specific drivers. For example, PCI drivers implement suspend() and |
| resume() methods in struct pci_driver. The PCI core is simply |
| responsible for translating the pointers to PCI-specific ones and |
| calling the low-level driver. |
| |
| This is done to a) ease transition to the new power management methods |
| and leverage the existing PM code in various bus drivers; b) allow |
| buses to implement generic and default PM routines for devices; and c) |
| make the flow of execution obvious to the reader. |
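| |
| The forwarding described above can be sketched in user-space C. This |
| is only an illustrative model, not kernel code: every toy_* name is |
| hypothetical, and the pointer translation is done with a plain cast |
| that assumes the generic device is the first member of the |
| bus-specific one. |
| |

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of the toy_* names are kernel API. */
typedef struct { int event; } toy_pm_message_t;

struct toy_device;

/* Generic methods, as the core sees them. */
struct toy_bus_type {
	int (*suspend)(struct toy_device *dev, toy_pm_message_t state);
	int (*resume)(struct toy_device *dev);
};

struct toy_device {
	const struct toy_bus_type *bus;
	const void *driver;	/* bus-specific driver, opaque to the core */
};

/* Bus-specific device: embeds the generic device as its first member. */
struct toy_pci_dev {
	struct toy_device dev;
	int powered;
};

/* Bus-specific driver methods take the bus-specific device type. */
struct toy_pci_driver {
	int (*suspend)(struct toy_pci_dev *pdev, toy_pm_message_t state);
	int (*resume)(struct toy_pci_dev *pdev);
};

/* The bus only translates the pointers and calls the low-level driver. */
int toy_pci_bus_suspend(struct toy_device *dev, toy_pm_message_t state)
{
	const struct toy_pci_driver *drv;

	assert(dev != NULL);
	drv = dev->driver;
	if (drv == NULL || drv->suspend == NULL)
		return 0;	/* assumption: no driver method, nothing to do */
	return drv->suspend((struct toy_pci_dev *)dev, state);
}

int toy_pci_bus_resume(struct toy_device *dev)
{
	const struct toy_pci_driver *drv;

	assert(dev != NULL);
	drv = dev->driver;
	if (drv == NULL || drv->resume == NULL)
		return 0;
	return drv->resume((struct toy_pci_dev *)dev);
}

const struct toy_bus_type toy_pci_bus = {
	toy_pci_bus_suspend,
	toy_pci_bus_resume,
};

/* A trivial low-level driver for the model. */
int toy_drv_suspend(struct toy_pci_dev *pdev, toy_pm_message_t state)
{
	(void)state;
	pdev->powered = 0;
	return 0;
}

int toy_drv_resume(struct toy_pci_dev *pdev)
{
	pdev->powered = 1;
	return 0;
}

const struct toy_pci_driver toy_example_driver = {
	toy_drv_suspend,
	toy_drv_resume,
};
```

| |
| A caller in this model only ever goes through dev->bus->suspend() and |
| dev->bus->resume(); it never needs to know the bus-specific types. |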
| |
| |
| System Power Management |
| |
| When the system enters a low-power state, the device tree is walked in |
| a depth-first fashion to transition each device into a low-power |
| state. The ordering of the device tree is guaranteed by the order in |
| which devices get registered - children are never registered before |
| their ancestors, and devices are placed at the back of the list when |
| registered. By walking the list in reverse order, we are guaranteed to |
| suspend devices in the proper order. |
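| |
| As a rough illustration of that ordering argument, the user-space |
| sketch below appends devices to a list at registration and suspends |
| them by walking the list backwards. The toy_* names, the fixed-size |
| array and the suspend_seq field are assumptions made for the example. |
| |

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEVICES 16

/* Hypothetical model of a registered device. */
struct toy_device {
	const char *name;
	int suspended;
	int suspend_seq;	/* position in the suspend order, 0 = first */
};

static struct toy_device *toy_list[TOY_MAX_DEVICES];
static size_t toy_count;

/* Registration appends to the back, so parents precede their children. */
int toy_register(struct toy_device *dev)
{
	assert(dev != NULL);
	if (toy_count == TOY_MAX_DEVICES)
		return -1;
	toy_list[toy_count++] = dev;
	return 0;
}

/* Walking the list in reverse suspends children before their parents. */
void toy_suspend_all(void)
{
	size_t i;
	int seq = 0;

	for (i = toy_count; i-- > 0; ) {
		toy_list[i]->suspended = 1;
		toy_list[i]->suspend_seq = seq++;
	}
}
```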
| |
| Devices are suspended once with interrupts enabled. Drivers are |
| expected to stop I/O transactions, save device state, and place the |
| device into a low-power state. Drivers may sleep, allocate memory, |
| etc. at will. |
| |
| Some devices are broken and will inevitably have problems powering |
| down or disabling themselves with interrupts enabled. In these |
| special cases, the driver may return -EAGAIN. This will put the device |
| on a list to be taken care of later. Once interrupts are disabled, and |
| before the system enters the low-power state, the drivers on that list |
| are called again to put their devices to sleep. |
| |
| On resume, the devices that returned -EAGAIN will be called to power |
| themselves back on with interrupts disabled. Once interrupts have been |
| re-enabled, the rest of the drivers will be called to resume their |
| devices. On resume, a driver is responsible for powering back on each |
| device, restoring state, and re-enabling I/O transactions for that |
| device. |
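| |
| The two-pass behaviour around -EAGAIN can be modelled as follows. |
| This is a user-space sketch under stated assumptions: interrupts are |
| represented by a plain flag, and the toy_* names and the |
| irqs_on_at_* bookkeeping fields exist only for the example. |
| |

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device model; "interrupts" are just a flag here. */
struct toy_device {
	int needs_irqs_off;	/* broken hardware: cannot suspend with IRQs on */
	int suspended;
	int deferred;		/* set when the first pass returned -EAGAIN */
	int irqs_on_at_suspend;
	int irqs_on_at_resume;
};

static int toy_irqs_enabled = 1;

int toy_dev_suspend(struct toy_device *dev)
{
	if (dev->needs_irqs_off && toy_irqs_enabled)
		return -EAGAIN;
	dev->suspended = 1;
	dev->irqs_on_at_suspend = toy_irqs_enabled;
	return 0;
}

int toy_suspend_all(struct toy_device **devs, size_t n)
{
	size_t i;
	int err;

	assert(devs != NULL);
	/* Pass 1: interrupts enabled, reverse registration order. */
	for (i = n; i-- > 0; ) {
		err = toy_dev_suspend(devs[i]);
		if (err == -EAGAIN)
			devs[i]->deferred = 1;
		else if (err)
			return err;
	}
	/* Pass 2: interrupts disabled, only the deferred devices. */
	toy_irqs_enabled = 0;
	for (i = n; i-- > 0; ) {
		if (!devs[i]->deferred)
			continue;
		err = toy_dev_suspend(devs[i]);
		if (err)
			return err;
	}
	return 0;
}

void toy_resume_all(struct toy_device **devs, size_t n)
{
	size_t i;

	assert(devs != NULL);
	/* Deferred devices come back first, with interrupts still disabled. */
	for (i = 0; i < n; i++) {
		if (devs[i]->deferred) {
			devs[i]->suspended = 0;
			devs[i]->irqs_on_at_resume = toy_irqs_enabled;
		}
	}
	/* Everything else resumes once interrupts are enabled again. */
	toy_irqs_enabled = 1;
	for (i = 0; i < n; i++) {
		if (!devs[i]->deferred) {
			devs[i]->suspended = 0;
			devs[i]->irqs_on_at_resume = toy_irqs_enabled;
		}
		devs[i]->deferred = 0;
	}
}
```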
| |
| System devices follow a slightly different API, which can be found in |
| |
| include/linux/sysdev.h |
| drivers/base/sys.c |
| |
| System devices will only be suspended with interrupts disabled, and |
| after all other devices have been suspended. On resume, they will be |
| resumed before any other devices, and also with interrupts disabled. |
| |
| |
| Runtime Power Management |
| |
| Many devices are able to dynamically power down while the system is |
| still running. This feature is useful for devices that are not being |
| used, and can offer significant power savings on a running system. |
| |
| In each device's directory, there is a 'power' directory, which |
| contains at least a 'state' file. Reading from this file displays what |
| power state the device is currently in. Writing to this file initiates |
| a transition to the specified power state, which must be a decimal in |
| the range 1-3, inclusive; or 0 for 'On'. |
| |
| The PM core will call the ->suspend() method in the bus_type object |
| that the device belongs to if the specified state is not 0, or |
| ->resume() if it is. |
| |
| Nothing will happen if the specified state is the same state the |
| device is currently in. |
| |
| If the device is already in a low-power state, and the specified state |
| is another, but different, low-power state, the ->resume() method will |
| first be called to power the device back on, then ->suspend() will be |
| called again with the new state. |
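| |
| The rules above for a write to the 'state' file can be summarised in |
| a small user-space sketch. It is not the PM core's code: the toy_* |
| names are hypothetical, the callbacks hang directly off the device |
| rather than off a bus_type for brevity, and returning -EINVAL for an |
| out-of-range value is an assumption. |
| |

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device model; 0 means 'On', 1-3 are low-power states. */
struct toy_device {
	unsigned int power_state;
	int suspend_calls;
	int resume_calls;
	int (*suspend)(struct toy_device *dev, unsigned int state);
	int (*resume)(struct toy_device *dev);
};

/* Example callbacks that only count how often they are invoked. */
int toy_count_suspend(struct toy_device *dev, unsigned int state)
{
	(void)state;
	dev->suspend_calls++;
	return 0;
}

int toy_count_resume(struct toy_device *dev)
{
	dev->resume_calls++;
	return 0;
}

int toy_set_power_state(struct toy_device *dev, unsigned int new_state)
{
	int err;

	assert(dev != NULL);
	if (new_state > 3)
		return -EINVAL;		/* assumption: reject values outside 0-3 */
	if (new_state == dev->power_state)
		return 0;		/* same state: nothing happens */

	/* Leaving any low-power state always goes through resume first. */
	if (dev->power_state != 0) {
		err = dev->resume(dev);
		if (err)
			return err;
		dev->power_state = 0;
	}
	/* A non-zero target is then entered with a fresh suspend call. */
	if (new_state != 0) {
		err = dev->suspend(dev, new_state);
		if (err)
			return err;
		dev->power_state = new_state;	/* updated only on success */
	}
	return 0;
}
```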
| |
| The driver is responsible for saving the working state of the device |
| and putting it into the low-power state specified. If this was |
| successful, it returns 0, and the device's power_state field is |
| updated. |
| |
| The driver must take care to know whether or not it is able to |
| properly resume the device, including all steps of reinitialization |
| necessary. (This is the hardest part, and the one most protected by |
| NDA'd documents.) |
| |
| The driver must also take care not to suspend a device that is |
| currently in use. Drivers are responsible for providing their own |
| exclusion mechanisms. |
| |
| The runtime power transition happens with interrupts enabled. If a |
| device cannot support being powered down with interrupts enabled, its |
| driver may return -EAGAIN (as it would during a system power |
| management transition), but it will _not_ be called again, and the |
| transition will fail. |
| |
| There is currently no way to know what states a device or driver |
| supports a priori. This will change in the future. |
| |
| pm_message_t meaning |
| |
| pm_message_t has two fields: event ("major") and flags. If a driver |
| does not recognize the event code, it aborts the request and returns an |
| error. Some drivers may need to deal with special cases based on the |
| actual type of suspend operation being done at the system level. This |
| is why there are flags. |
| |
| Event codes are: |
| |
| ON -- no need to do anything except special cases like broken |
| HW. |
| |
| # NOTIFICATION -- pretty much same as ON? |
| |
| FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from |
| scratch. That probably means stop accepting upstream requests, the |
| actual policy of what to do with them being specific to a given |
| driver. It's acceptable for a network driver to just drop packets |
| while a block driver is expected to block the queue so no request is |
| lost. (Use IDE as an example of how to do that.) FREEZE requires no |
| power state change, and drivers are expected to be able to |
| quickly transition back to the operating state. |
| |
| SUSPEND -- like FREEZE, but also put hardware into low-power state. If |
| there is a need to distinguish several levels of sleep, an additional |
| flag is probably the best way to do that. |
| |
| Transitions are only from a resumed state to a suspended state, never |
| between two suspended states. (ON -> FREEZE or ON -> SUSPEND can |
| happen; FREEZE -> SUSPEND or SUSPEND -> FREEZE cannot.) |
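| |
| A driver's handling of the event codes above might look roughly like |
| the following user-space sketch. The toy_* names, the numeric event |
| values and the choice of -EINVAL for an unknown event are all |
| assumptions; the text only requires that an error be returned. The |
| flags field is present but deliberately not inspected. |
| |

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical event codes and message layout for this model only. */
enum toy_pm_event { TOY_EVENT_ON, TOY_EVENT_FREEZE, TOY_EVENT_SUSPEND };

typedef struct {
	int event;
	unsigned int flags;	/* not inspected by the driver below */
} toy_pm_message_t;

struct toy_device {
	int dma_stopped;
	int low_power;
};

int toy_driver_suspend(struct toy_device *dev, toy_pm_message_t state)
{
	assert(dev != NULL);

	switch (state.event) {
	case TOY_EVENT_ON:
		/* Nothing to do for well-behaved hardware. */
		return 0;
	case TOY_EVENT_FREEZE:
		/* Quiesce only; no power state change. */
		dev->dma_stopped = 1;
		return 0;
	case TOY_EVENT_SUSPEND:
		/* Like FREEZE, then enter the low-power state. */
		dev->dma_stopped = 1;
		dev->low_power = 1;
		return 0;
	default:
		/* Unknown event: abort the request with an error. */
		return -EINVAL;
	}
}
```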
| |
| All events are: |
| |
| [NOTE NOTE NOTE: If you are a driver author, you should not care; you |
| should only look at event, and ignore flags.] |
| |
| #Prepare for suspend -- userland is still running but we are going to |
| #enter suspend state. This gives drivers a chance to load firmware from |
| #disk and store it in memory, or do other activities that require |
| #operating userland, ability to kmalloc GFP_KERNEL, etc... All of these |
| #are forbidden once the suspend dance is started. event = ON, flags = |
| #PREPARE_TO_SUSPEND |
| |
| APM standby -- prepare for APM event. Quiesce devices to make life |
| easier for APM BIOS. event = FREEZE, flags = APM_STANDBY |
| |
| APM suspend -- same as APM_STANDBY, but we should probably avoid |
| spinning down disks. event = FREEZE, flags = APM_SUSPEND |
| |
| System halt, reboot -- quiesce devices to make life easier for BIOS. event |
| = FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT |
| |
| System shutdown -- at least disks need to be spun down, or data may be |
| lost. Quiesce devices, just to make life easier for BIOS. event = |
| FREEZE, flags = SYSTEM_SHUTDOWN |
| |
| Kexec -- turn off DMAs and put hardware into some state where new |
| kernel can take over. event = FREEZE, flags = KEXEC |
| |
| Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake |
| may need to be enabled on some devices. This actually has at least three |
| subtypes: the system can reboot, enter S4, or enter S5 at the end of |
| swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT, |
| SYSTEM_SHUTDOWN, SYSTEM_S4 |
| |
| Suspend to RAM -- put devices into low power state. event = SUSPEND, |
| flags = SUSPEND_TO_RAM |
| |
| Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put |
| devices into low power mode, but you must be able to reinitialize the |
| device from scratch in the resume method. This has two flavors: it is |
| done once on the suspending kernel and once on the resuming kernel. |
| event = FREEZE, flags = DURING_SUSPEND or DURING_RESUME |
| |
| Device detach requested from /sys -- deinitialize device; probably same as |
| SYSTEM_SHUTDOWN, I do not understand this one too much. Probably event |
| = FREEZE, flags = DEV_DETACH. |
| |
| #These are not really events sent: |
| # |
| #System fully on -- device is working normally; this is probably never |
| #passed to suspend() method... event = ON, flags = 0 |
| # |
| #Ready after resume -- userland is now running, again. Time to free any |
| #memory you ate during prepare to suspend... event = ON, flags = |
| #READY_AFTER_RESUME |
| # |