Device Power Management

(C) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.

Most of the code in Linux is device drivers, so most of the Linux power
management code is also driver-specific.  Most drivers will do very little;
others, especially for platforms with small batteries (like cell phones),
will do a lot.

This writeup gives an overview of how drivers interact with system-wide
power management goals, emphasizing the models and interfaces that are
shared by everything that hooks up to the driver model core.  Read it as
background for the domain-specific work you'd do with any specific driver.


Two Models for Device Power Management
======================================
Drivers will use one or both of these models to put devices into low-power
states:

    System Sleep model:
        Drivers can enter low power states as part of entering system-wide
        low-power states like "suspend-to-ram", or (mostly for systems with
        disks) "hibernate" (suspend-to-disk).

        This is something that device, bus, and class drivers collaborate on
        by implementing various role-specific suspend and resume methods to
        cleanly power down hardware and software subsystems, then reactivate
        them without loss of data.

        Some drivers can manage hardware wakeup events, which make the system
        leave that low-power state.  This feature may be enabled or disabled
        using the relevant /sys/devices/.../power/wakeup file (for Ethernet
        drivers the ioctl interface used by ethtool may also be used for this
        purpose); enabling it may cost some power usage, but lets the whole
        system enter low power states more often.

    Runtime Power Management model:
        Devices may also be put into low power states while the system is
        running, independently of other power management activity in principle.
        However, devices are not generally independent of each other (for
        example, a parent device cannot be suspended unless all of its child
        devices have been suspended).  Moreover, depending on the bus type the
        device is on, it may be necessary to carry out some bus-specific
        operations on the device for this purpose.  Also, devices put into low
        power states at run time may require special handling during system-wide
        power transitions, like suspend to RAM.

        For these reasons not only the device driver itself, but also the
        appropriate subsystem (bus type, device type or device class) driver
        and the PM core are involved in the runtime power management of devices.
        As in the system sleep power management case, they need to collaborate
        by implementing various role-specific suspend and resume methods, so
        that the hardware is cleanly powered down and reactivated without data
        or service loss.

There's not a lot to be said about those low power states except that they
are very system-specific, and often device-specific.  Also, if enough devices
have been put into low power states (at "run time"), the effect may be very
similar to entering some system-wide low-power state (system sleep) ... and
synergies exist, so that several drivers using runtime PM might put the
system into a state where even deeper power saving options are available.

Most suspended devices will have quiesced all I/O:  no more DMA or IRQs, no
more data read or written, and requests from upstream drivers are no longer
accepted.  A given bus or platform may have different requirements though.

Examples of hardware wakeup events include an alarm from a real time clock,
network wake-on-LAN packets, keyboard or mouse activity, and media insertion
or removal (for PCMCIA, MMC/SD, USB, and so on).


Interfaces for Entering System Sleep States
===========================================
There are programming interfaces provided for subsystem (bus type, device type,
device class) and device drivers in order to allow them to participate in the
power management of devices they are concerned with.  They cover the system
sleep power management as well as the runtime power management of devices.


Device Power Management Operations
----------------------------------
Device power management operations, at the subsystem level as well as at the
device driver level, are implemented by defining and populating objects of type
struct dev_pm_ops:

struct dev_pm_ops {
	int (*prepare)(struct device *dev);
	void (*complete)(struct device *dev);
	int (*suspend)(struct device *dev);
	int (*resume)(struct device *dev);
	int (*freeze)(struct device *dev);
	int (*thaw)(struct device *dev);
	int (*poweroff)(struct device *dev);
	int (*restore)(struct device *dev);
	int (*suspend_noirq)(struct device *dev);
	int (*resume_noirq)(struct device *dev);
	int (*freeze_noirq)(struct device *dev);
	int (*thaw_noirq)(struct device *dev);
	int (*poweroff_noirq)(struct device *dev);
	int (*restore_noirq)(struct device *dev);
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
	int (*runtime_idle)(struct device *dev);
};

This structure is defined in include/linux/pm.h and the methods included in it
are also described in that file.  Their roles will be explained in what follows.
For now, it should be sufficient to remember that the last three of them are
specific to runtime power management, while the remaining ones are used during
system-wide power transitions.
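
As a rough illustration (not taken from any in-tree driver), a driver that only
needs the basic system sleep and runtime PM callbacks might populate such an
object as sketched below.  The struct device and struct dev_pm_ops definitions
in the sketch are cut-down stand-ins so that the fragment builds outside the
kernel; real code would rely on include/linux/pm.h instead.  The foo_* names
and struct foo_state are hypothetical.

```c
#include <stddef.h>

/* Stand-ins for the kernel types (assumption: reduced to what is used here). */
struct device {
	void *driver_data;
};

struct dev_pm_ops {
	int (*suspend)(struct device *dev);
	int (*resume)(struct device *dev);
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
	/* the remaining callbacks are left out of this stand-in */
};

/* Hypothetical per-device state kept by the "foo" driver. */
struct foo_state {
	int powered;	/* 1 while the device is in the full power state */
};

static int foo_power_down(struct device *dev)
{
	struct foo_state *st = dev->driver_data;

	st->powered = 0;	/* a real driver would quiesce I/O here */
	return 0;
}

static int foo_power_up(struct device *dev)
{
	struct foo_state *st = dev->driver_data;

	st->powered = 1;	/* a real driver would restore registers here */
	return 0;
}

/* Callbacks that are not listed stay NULL, i.e. are not implemented. */
static const struct dev_pm_ops foo_pm_ops = {
	.suspend = foo_power_down,
	.resume = foo_power_up,
	.runtime_suspend = foo_power_down,
	.runtime_resume = foo_power_up,
};
```

Sharing one pair of helpers between the system sleep and runtime PM callbacks
is only a simplification made for this sketch; drivers often need different
handling in the two cases.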

There is also an "old" or "legacy", deprecated way of implementing power
management operations available at least for some subsystems.  This approach
does not use struct dev_pm_ops objects and is only suitable for implementing
system sleep power management methods.  Therefore it is not described in this
document; please refer directly to the source code for more information about
it.


Subsystem-Level Methods
-----------------------
The core methods to suspend and resume devices reside in struct dev_pm_ops
pointed to by the pm member of struct bus_type, struct device_type and
struct class.  They are mostly of interest to the people writing infrastructure
for buses, like PCI or USB, or device type and device class drivers.

Bus drivers implement these methods as appropriate for the hardware and
the drivers using it; PCI works differently from USB, and so on.  Not many
people write subsystem-level drivers; most driver code is a "device driver" that
builds on top of bus-specific framework code.

For more information on these driver calls, see the description later;
they are called in phases for every device, respecting the parent-child
sequencing in the driver model tree.


/sys/devices/.../power/wakeup files
-----------------------------------
All devices in the driver model have two flags to control handling of
wakeup events, which are hardware signals that can force the device and/or
system out of a low power state.  These are initialized by bus or device
driver code using device_init_wakeup().

The "can_wakeup" flag just records whether the device (and its driver) can
physically support wakeup events.  When that flag is clear, the sysfs
"wakeup" file is empty, and device_may_wakeup() returns false.

For devices that can issue wakeup events, a separate flag controls whether
that device should try to use its wakeup mechanism.  The initial value of
device_may_wakeup() will be false for the majority of devices, except for
power buttons, keyboards, and Ethernet adapters whose WoL (wake-on-LAN) feature
has been set up with ethtool.  Thus in the majority of cases the device's
"wakeup" file will initially hold the value "disabled".  Userspace can change
that to "enabled", so that device_may_wakeup() returns true, or change it back
to "disabled", so that it returns false again.
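
From user space, the three cases described above (empty file, "disabled",
"enabled") can be told apart with ordinary file I/O.  The helpers below are a
minimal sketch under that assumption; their names and the enum are made up for
this example, and the caller has to supply the full path of the device's
"wakeup" file, which is deliberately not spelled out here.

```c
#include <stdio.h>
#include <string.h>

enum wakeup_setting {
	WAKEUP_UNSUPPORTED,	/* file is empty: can_wakeup is clear */
	WAKEUP_DISABLED,
	WAKEUP_ENABLED,
	WAKEUP_ERROR,		/* unreadable file or unexpected content */
};

/* Read a "wakeup" attribute; path is supplied by the caller. */
static enum wakeup_setting read_wakeup_setting(const char *path)
{
	char buf[16] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return WAKEUP_ERROR;
	if (!fgets(buf, sizeof(buf), f)) {
		if (ferror(f)) {
			fclose(f);
			return WAKEUP_ERROR;
		}
		buf[0] = '\0';		/* end of file at once: empty file */
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';	/* drop the trailing newline, if any */
	if (buf[0] == '\0')
		return WAKEUP_UNSUPPORTED;
	if (strcmp(buf, "enabled") == 0)
		return WAKEUP_ENABLED;
	if (strcmp(buf, "disabled") == 0)
		return WAKEUP_DISABLED;
	return WAKEUP_ERROR;
}

/* Write "enabled" or "disabled" to the attribute; returns 0 on success. */
static int write_wakeup_setting(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int failed;

	if (!f)
		return -1;
	failed = fputs(enable ? "enabled" : "disabled", f) < 0;
	if (fclose(f) != 0)
		failed = 1;
	return failed ? -1 : 0;
}
```

Writing to the real attribute normally requires sufficient privileges, so a
failed write_wakeup_setting() call does not by itself say anything about the
device's wakeup capability.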


/sys/devices/.../power/control files
------------------------------------
Each device in the driver model has a flag to control the desired behavior of
its driver with respect to runtime power management.  This flag, called
runtime_auto, is initialized by the bus type (or generally subsystem) code using
pm_runtime_allow() or pm_runtime_forbid(), depending on whether or not the
driver is supposed to power manage the device at run time by default,
respectively.

This setting may be adjusted by user space by writing either "on" or "auto" to
the device's "control" file.  If "auto" is written, the device's runtime_auto
flag will be set and the driver will be allowed to power manage the device if
capable of doing that.  If "on" is written, the driver is not allowed to power
manage the device, which in turn is supposed to remain in the full power state
at run time.  User space can check the current value of the runtime_auto flag by
reading from the device's "control" file.
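
The mapping between the two strings and the flag can be modelled in a few
lines.  The following is only a user-space model of the semantics described
above, not the kernel's sysfs code; struct pm_model and both function names are
invented for the example, and trailing newlines in the written string are
ignored for brevity.

```c
#include <string.h>

/* Model of the runtime PM part of a device (assumption: one flag only). */
struct pm_model {
	int runtime_auto;	/* 1: the driver may power manage the device */
};

/* Models a write to "control"; returns 0, or -1 for an unknown string. */
static int control_store(struct pm_model *pm, const char *buf)
{
	if (strcmp(buf, "auto") == 0)
		pm->runtime_auto = 1;
	else if (strcmp(buf, "on") == 0)
		pm->runtime_auto = 0;
	else
		return -1;
	return 0;
}

/* Models a read from "control". */
static const char *control_show(const struct pm_model *pm)
{
	return pm->runtime_auto ? "auto" : "on";
}
```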

The device's runtime_auto flag has no effect on the handling of system-wide
power transitions by its driver.  In particular, the device can (and in the
majority of cases should and will) be put into a low power state during a
system-wide transition to a sleep state (like "suspend-to-RAM") even though its
runtime_auto flag is unset (in which case its "control" file contains "on").

For more information about the runtime power management framework for devices
refer to Documentation/power/runtime_pm.txt.


Calling Drivers to Enter System Sleep States
============================================
When the system goes into a sleep state, each device's driver is asked
to suspend the device by putting it into a state compatible with the target
system state.  That's usually some version of "off", but the details are
system-specific.  Also, wakeup-enabled devices will usually stay partly
functional in order to wake the system.

When the system leaves that low power state, the device's driver is asked
to resume it.  The suspend and resume operations always go together, and
both are multi-phase operations.

For simple drivers, suspend might quiesce the device using the class code
and then turn its hardware as "off" as possible with late_suspend.  The
matching resume calls would then completely reinitialize the hardware
before reactivating its class I/O queues.

More power-aware drivers might prepare the devices for triggering system wakeup
events.


Call Sequence Guarantees
------------------------
To ensure that bridges and similar links needing to talk to a device are
available when the device is suspended or resumed, the device tree is
walked in a bottom-up order to suspend devices.  A top-down order is
used to resume those devices.

The ordering of the device tree is defined by the order in which devices
get registered:  a child can never be registered, probed or resumed before
its parent; and can't be removed or suspended after that parent.

The policy is that the device tree should match hardware bus topology.
(Or at least the control bus, for devices which use multiple busses.)
In particular, this means that a device registration may fail if the parent of
the device is suspending (i.e. has been chosen by the PM core as the next
device to suspend) or has already suspended, as well as after all of the other
devices have been suspended.  Device drivers must be prepared to cope with such
situations.
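
The ordering rule can be pictured with a small user-space model: devices are
kept in registration order, suspend walks that list backwards and resume walks
it forwards.  This is a sketch of the rule only, under the assumption of a flat
registration list; it is not how the PM core is implemented, and every name in
it is made up for the example.

```c
#include <stddef.h>

#define MAX_DEVS 8

struct model_dev {
	const char *name;
	const struct model_dev *parent;	/* NULL for a root device */
};

/* Devices in registration order; a parent always precedes its children. */
static const struct model_dev *registered[MAX_DEVS];
static size_t nr_registered;

static int is_registered(const struct model_dev *dev)
{
	size_t i;

	for (i = 0; i < nr_registered; i++)
		if (registered[i] == dev)
			return 1;
	return 0;
}

/* A child cannot be registered before its parent. */
static int model_register(const struct model_dev *dev)
{
	if (nr_registered == MAX_DEVS)
		return -1;
	if (dev->parent && !is_registered(dev->parent))
		return -1;
	registered[nr_registered++] = dev;
	return 0;
}

/* Suspend order: reverse registration order, so children go first. */
static size_t suspend_order(const struct model_dev **out)
{
	size_t i;

	for (i = 0; i < nr_registered; i++)
		out[i] = registered[nr_registered - 1 - i];
	return nr_registered;
}

/* Resume order: registration order, so parents go first. */
static size_t resume_order(const struct model_dev **out)
{
	size_t i;

	for (i = 0; i < nr_registered; i++)
		out[i] = registered[i];
	return nr_registered;
}
```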


Suspending Devices
------------------
Suspending a given device is done in several phases.  Suspending the
system always includes every phase, executing calls for every device
before the next phase begins.  Not all busses or classes support all
these callbacks; and not all drivers use all the callbacks.

Generally, different callbacks are used depending on whether the system is
going to the standby or memory sleep state ("suspend-to-RAM") or it is going to
be hibernated ("suspend-to-disk").

If the system goes to the standby or memory sleep state the phases are seen by
driver notifications issued in this order:

    1   bus->pm.prepare(dev) is called after tasks are frozen and it is supposed
        to call the device driver's ->pm.prepare() method.

        The purpose of this method is mainly to prevent new children of the
        device from being registered after it has returned.  It also may be used
        to generally prepare the device for the upcoming system transition, but
        it should not put the device into a low power state.

    2   class->pm.suspend(dev) is called if dev is associated with a class that
        has such a method.  It may invoke the device driver's ->pm.suspend()
        method, unless type->pm.suspend(dev) or bus->pm.suspend() does that.

    3   type->pm.suspend(dev) is called if dev is associated with a device type
        that has such a method.  It may invoke the device driver's
        ->pm.suspend() method, unless class->pm.suspend(dev) or
        bus->pm.suspend() does that.

    4   bus->pm.suspend(dev) is called, if implemented.  It usually calls the
        device driver's ->pm.suspend() method.

        This call should generally quiesce the device so that it doesn't do any
        I/O after the call has returned.  It also may save the device registers
        and put it into the appropriate low power state, depending on the bus
        type the device is on.

    5   bus->pm.suspend_noirq(dev) is called, if implemented.  It may call the
        device driver's ->pm.suspend_noirq() method, depending on the bus type
        in question.

        This method is invoked after device interrupts have been suspended,
        which means that the driver's interrupt handler will not be called
        while it is running.  It should save the values of the device's
        registers that weren't saved previously and finally put the device into
        the appropriate low power state.

        The majority of subsystems and device drivers need not implement this
        method.  However, bus types allowing devices to share interrupt vectors,
        like PCI, generally need to use it to prevent interrupt handling issues
        from happening during suspend.

At the end of those phases, drivers should normally have stopped all I/O
transactions (DMA, IRQs), saved enough state that they can re-initialize
or restore previous state (as needed by the hardware), and placed the
device into a low-power state.  On many platforms they will also gate off
one or more clock sources; sometimes they will also switch off power
supplies, or reduce voltages.  [Drivers supporting runtime PM may already have
performed some or all of the steps needed to prepare for the upcoming system
state transition.]

If device_may_wakeup(dev) returns true, the device should be prepared for
generating hardware wakeup signals when the system is in the sleep state to
trigger a system wakeup event.  For example, enable_irq_wake() might identify
GPIO signals hooked up to a switch or other external hardware, and
pci_enable_wake() does something similar for the PCI PME signal.
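
A suspend callback following this advice might look roughly like the sketch
below.  To keep the fragment buildable in user space, struct device,
device_may_wakeup(), enable_irq_wake() and disable_irq_wake() are replaced by
simplified stand-ins with the same call shape (the real ones come from the
kernel headers and do considerably more); struct foo_button and the foo_*
callbacks are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for struct device, reduced to the two wakeup flags (assumption). */
struct device {
	bool can_wakeup;
	bool should_wakeup;
	void *driver_data;
};

static bool device_may_wakeup(struct device *dev)
{
	return dev->can_wakeup && dev->should_wakeup;
}

/* Stubs that only record which IRQ is armed for wakeup in this model. */
static int wake_irq = -1;

static int enable_irq_wake(unsigned int irq)
{
	wake_irq = (int)irq;
	return 0;
}

static int disable_irq_wake(unsigned int irq)
{
	if (wake_irq == (int)irq)
		wake_irq = -1;
	return 0;
}

/* Hypothetical driver state for a button wired to an interrupt line. */
struct foo_button {
	unsigned int irq;
	bool wake_armed;
};

static int foo_button_suspend(struct device *dev)
{
	struct foo_button *btn = dev->driver_data;
	int err;

	btn->wake_armed = false;
	if (device_may_wakeup(dev)) {
		err = enable_irq_wake(btn->irq);
		if (err)
			return err;	/* a failure here aborts the suspend */
		btn->wake_armed = true;
	}
	return 0;
}

static int foo_button_resume(struct device *dev)
{
	struct foo_button *btn = dev->driver_data;

	if (btn->wake_armed) {
		disable_irq_wake(btn->irq);
		btn->wake_armed = false;
	}
	return 0;
}
```

Remembering in wake_armed whether the interrupt was armed keeps the resume
path balanced even if user space changes the "wakeup" setting while the system
is asleep or between the two calls.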

If a driver (or subsystem) fails its suspend method, the system won't enter the
desired low power state; it will resume all the devices it has suspended so
far.
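
The failure handling can be sketched as a loop that unwinds on the first
error.  The code below is a user-space model of that behavior for a single
phase, assuming a plain array of devices in registration order; the names are
invented and the real PM core handles several phases and more state.

```c
#include <errno.h>
#include <stddef.h>

struct model_dev {
	int (*suspend)(struct model_dev *dev);
	int (*resume)(struct model_dev *dev);
	int suspended;	/* 1 while the device is suspended */
	int resumes;	/* number of resume calls seen so far */
};

/* Sample callbacks used to exercise the model. */
static int ok_suspend(struct model_dev *dev)
{
	(void)dev;
	return 0;
}

static int busy_suspend(struct model_dev *dev)
{
	(void)dev;
	return -EBUSY;
}

static int count_resume(struct model_dev *dev)
{
	dev->resumes++;
	return 0;
}

/*
 * devs[] is in registration order.  Suspend from the end of the array; if a
 * callback fails, resume what was already suspended and return the error.
 */
static int model_suspend_all(struct model_dev **devs, size_t n)
{
	size_t i = n;
	int err = 0;

	while (i > 0) {
		struct model_dev *dev = devs[i - 1];

		err = dev->suspend ? dev->suspend(dev) : 0;
		if (err)
			break;
		dev->suspended = 1;
		i--;
	}
	if (!err)
		return 0;

	/* devs[i] .. devs[n - 1] were suspended; resume them parent-first. */
	for (; i < n; i++) {
		if (devs[i]->resume)
			devs[i]->resume(devs[i]);
		devs[i]->suspended = 0;
	}
	return err;
}
```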


Hibernation Phases
------------------
Hibernating the system is more complicated than putting it into the standby or
memory sleep state, because it involves creating a system image and saving it.
Therefore there are more phases of hibernation and special device PM methods are
used in this case.

First, it is necessary to prepare the system for creating a hibernation image.
This is similar to putting the system into the standby or memory sleep state,
although it generally doesn't require that devices be put into low power states
(that is not even desirable at this point).  Driver notifications are then
issued in the following order:

    1   bus->pm.prepare(dev) is called after tasks have been frozen and enough
        memory has been freed.

    2   class->pm.freeze(dev) is called if implemented.  It may invoke the
        device driver's ->pm.freeze() method, unless type->pm.freeze(dev) or
        bus->pm.freeze() does that.

    3   type->pm.freeze(dev) is called if implemented.  It may invoke the device
        driver's ->pm.freeze() method, unless class->pm.freeze(dev) or
        bus->pm.freeze() does that.

    4   bus->pm.freeze(dev) is called, if implemented.  It usually calls the
        device driver's ->pm.freeze() method.

    5   bus->pm.freeze_noirq(dev) is called, if implemented.  It may call the
        device driver's ->pm.freeze_noirq() method, depending on the bus type
        in question.

The difference between ->pm.freeze() and the corresponding ->pm.suspend() (and
similarly for the "noirq" variants) is that the former should avoid preparing
devices to trigger system wakeup events and putting devices into low power
states, although they generally have to save the values of device registers
so that it's possible to restore them during system resume.

Second, after the system image has been created, the functionality of devices
has to be restored so that the image can be saved.  That is similar to resuming
devices after the system has been woken up from the standby or memory sleep
state, which is described below, and causes the following device notifications
to be issued:

    1   bus->pm.thaw_noirq(dev), if implemented; may call the device driver's
        ->pm.thaw_noirq() method, depending on the bus type in question.

    2   bus->pm.thaw(dev), if implemented; usually calls the device driver's
        ->pm.thaw() method.

    3   type->pm.thaw(dev), if implemented; may call the device driver's
        ->pm.thaw() method if not called by the bus type or class.

    4   class->pm.thaw(dev), if implemented; may call the device driver's
        ->pm.thaw() method if not called by the bus type or device type.

    5   bus->pm.complete(dev), if implemented; may call the device driver's
        ->pm.complete() method.

Generally, the role of the ->pm.thaw() methods (including the "noirq" variants)
is to bring the device back to the fully functional state, so that it may be
used for saving the image, if necessary.  The role of bus->pm.complete() is to
reverse whatever bus->pm.prepare() did (likewise for the analogous device driver
callbacks).

After the image has been saved, the devices need to be prepared for putting the
system into the low power state.  That is analogous to suspending them before
putting the system into the standby or memory sleep state and involves the
following device notifications:

    1   bus->pm.prepare(dev).

    2   class->pm.poweroff(dev), if implemented; may invoke the device driver's
        ->pm.poweroff() method if not called by the bus type or device type.

    3   type->pm.poweroff(dev), if implemented; may invoke the device driver's
        ->pm.poweroff() method if not called by the bus type or device class.

    4   bus->pm.poweroff(dev), if implemented; usually calls the device driver's
        ->pm.poweroff() method (if not called by the device class or type).

    5   bus->pm.poweroff_noirq(dev), if implemented; may call the device
        driver's ->pm.poweroff_noirq() method, depending on the bus type
        in question.

The difference between ->pm.poweroff() and the corresponding ->pm.suspend() (and
analogously for the "noirq" variants) is that the former need not save the
device's registers.  Still, they should prepare the device for triggering
system wakeup events if necessary and finally put it into the appropriate low
power state.
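
The callback sequences described so far can be summarized in a small lookup
table.  The table below is merely a condensed restatement of the lists above,
written as user-space C so that it can be checked mechanically; the enum, the
array and the helper are invented for this summary and do not exist in the
kernel.  Within the middle phase of each "going down" stage, the class, device
type and bus type callbacks are tried in that order, as described above.

```c
#include <stddef.h>
#include <string.h>

enum pm_stage {
	STAGE_SUSPEND,	/* entering standby or suspend-to-RAM */
	STAGE_FREEZE,	/* before the hibernation image is created */
	STAGE_THAW,	/* after the image has been created */
	STAGE_POWEROFF,	/* after the image has been saved */
};

/* dev_pm_ops members in the order the phases run, NULL-terminated. */
static const char *const stage_callbacks[][4] = {
	[STAGE_SUSPEND] = { "prepare", "suspend", "suspend_noirq", NULL },
	[STAGE_FREEZE] = { "prepare", "freeze", "freeze_noirq", NULL },
	[STAGE_THAW] = { "thaw_noirq", "thaw", "complete", NULL },
	[STAGE_POWEROFF] = { "prepare", "poweroff", "poweroff_noirq", NULL },
};

/* Position of a callback within a stage, or -1 if the stage does not use it. */
static int callback_position(enum pm_stage stage, const char *name)
{
	int i;

	for (i = 0; stage_callbacks[stage][i]; i++)
		if (strcmp(stage_callbacks[stage][i], name) == 0)
			return i;
	return -1;
}
```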


Device Low Power (suspend) States
---------------------------------
Device low-power states aren't standard.  One device might only handle
"on" and "off", while another might support a dozen different versions of
"on" (how many engines are active?), plus a state that gets back to "on"
faster than from a full "off".

Some busses define rules about what different suspend states mean.  PCI
gives one example:  after the suspend sequence completes, a non-legacy
PCI device may not perform DMA or issue IRQs, and any wakeup events it
issues would be issued through the PME# bus signal.  Plus, there are
several PCI-standard device states, some of which are optional.

In contrast, integrated system-on-chip processors often use IRQs as the
wakeup event sources (so drivers would call enable_irq_wake) and might
be able to treat DMA completion as a wakeup event (sometimes DMA can stay
active too, it'd only be the CPU and some peripherals that sleep).

Some details here may be platform-specific.  Systems may have devices that
can be fully active in certain sleep states, such as an LCD display that's
refreshed using DMA while most of the system is sleeping lightly ... and
its frame buffer might even be updated by a DSP or other non-Linux CPU while
the Linux control processor stays idle.

Moreover, the specific actions taken may depend on the target system state.
One target system state might allow a given device to be very operational;
another might require a hard shut down with re-initialization on resume.
And two different target systems might use the same device in different
ways; the aforementioned LCD might be active in one product's "standby",
but a different product using the same SOC might work differently.


Resuming Devices
----------------
Resuming is done in multiple phases, much like suspending, with all
devices processing each phase's calls before the next phase begins.

Again, however, different callbacks are used depending on whether the system is
waking up from the standby or memory sleep state ("suspend-to-RAM") or from
hibernation ("suspend-to-disk").

If the system is waking up from the standby or memory sleep state, the phases
are seen by driver notifications issued in this order:

    1   bus->pm.resume_noirq(dev) is called, if implemented.  It may call the
        device driver's ->pm.resume_noirq() method, depending on the bus type in
        question.

        The role of this method is to perform actions that need to be performed
        before device drivers' interrupt handlers are allowed to be invoked.  If
        the given bus type permits devices to share interrupt vectors, like PCI,
        this method should bring the device and its driver into a state in which
        the driver can recognize if the device is the source of incoming
        interrupts, if any, and handle them correctly.

        For example, the PCI bus type's ->pm.resume_noirq() puts the device into
        the full power state (D0 in the PCI terminology) and restores the
        standard configuration registers of the device.  Then, it calls the
        device driver's ->pm.resume_noirq() method to perform device-specific
        actions needed at this stage of resume.

    2   bus->pm.resume(dev) is called, if implemented.  It usually calls the
        device driver's ->pm.resume() method.

        This call should generally bring the device back to the working
        state, so that it can do I/O as requested after the call has returned.
        However, it may be more convenient to use the device class or device
        type ->pm.resume() for this purpose, in which case the bus type's
        ->pm.resume() method need not be implemented at all.

    3   type->pm.resume(dev) is called, if implemented.  It may invoke the
        device driver's ->pm.resume() method, unless class->pm.resume(dev) or
        bus->pm.resume() does that.

        For devices that are not associated with any bus type or device class
        this method plays the role of bus->pm.resume().

    4   class->pm.resume(dev) is called, if implemented.  It may invoke the
        device driver's ->pm.resume() method, unless bus->pm.resume(dev) or
        type->pm.resume() does that.

        For devices that are not associated with any bus type or device type
        this method plays the role of bus->pm.resume().

    5   bus->pm.complete(dev) is called, if implemented.  It is supposed to
        invoke the device driver's ->pm.complete() method.

        The role of this method is to reverse whatever bus->pm.prepare(dev)
        (or the driver's ->pm.prepare()) did during suspend, if necessary.

At the end of those phases, drivers should normally be as functional as
they were before suspending:  I/O can be performed using DMA and IRQs, and
the relevant clocks are gated on.  In principle the device need not be
"fully on"; it might be in a runtime low-power/suspend state during suspend and
the resume callbacks may try to restore that state, but that need not be
desirable from the user's point of view.  In fact, there are multiple reasons
why it's better to always put devices into the "fully working" state in the
system sleep resume callbacks and they are discussed in more detail in
Documentation/power/runtime_pm.txt.

However, the details here may again be platform-specific.  For example,
some systems support multiple "run" states, and the mode in effect at
the end of resume might not be the one which preceded suspension.
That means the availability of certain clocks or power supplies may have
changed, which could easily affect how a driver works.

Drivers need to be able to handle hardware which has been reset since the
suspend methods were called, for example by complete reinitialization.
This may be the hardest part, and the one most protected by NDA'd documents
and chip errata.  It's simplest if the hardware state hasn't changed since
the suspend was carried out, but that can't be guaranteed (in fact, it usually
is not the case).
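
One common way of coping with this is to treat resume as "check, then restore":
look for a sign that the device lost its configuration, reinitialize it if so,
and rewrite the state saved at suspend time in either case.  The sketch below
models that idea in user space with an array standing in for device registers.
The register layout, the FOO_CTRL_ENABLE bit and all foo_* names are
hypothetical and serve only to make the pattern concrete.

```c
#include <stdint.h>

#define FOO_CTRL_ENABLE 0x1u	/* hypothetical "device configured" bit */

/* Hypothetical device; regs[] stands in for memory-mapped registers. */
struct foo_dev {
	uint32_t regs[2];	/* [0] = control, [1] = mode */
	uint32_t saved_mode;	/* copy taken at suspend time */
};

/* Suspend side: remember the state that has to survive. */
static void foo_save_state(struct foo_dev *foo)
{
	foo->saved_mode = foo->regs[1];
}

/*
 * Resume side: returns 1 if the device had been reset and was reinitialized,
 * 0 if its configuration was still in place.
 */
static int foo_restore_state(struct foo_dev *foo)
{
	int was_reset = !(foo->regs[0] & FOO_CTRL_ENABLE);

	if (was_reset)
		foo->regs[0] = FOO_CTRL_ENABLE;	/* reinitialization path */
	foo->regs[1] = foo->saved_mode;		/* always rewrite saved state */
	return was_reset;
}
```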

Drivers must also be prepared to notice that the device has been removed
while the system was powered off, whenever that's physically possible.
PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses
where common Linux platforms will see such removal.  Details of how drivers
will notice and handle such removals are currently bus-specific, and often
involve a separate thread.
510
511
Rafael J. Wysocki624f6ec2010-03-26 23:53:42 +0100512Resume From Hibernation
513-----------------------
514Resuming from hibernation is, again, more complicated than resuming from a sleep
515state in which the contents of main memory are preserved, because it requires
516a system image to be loaded into memory and the pre-hibernation memory contents
517to be restored before control can be passed back to the image kernel.

In principle, the image might be loaded into memory and the pre-hibernation
memory contents might be restored by the boot loader. For this purpose,
however, the boot loader would need to know the image kernel's entry point and
there's no protocol defined for passing that information to boot loaders. As
a workaround, the boot loader loads a fresh instance of the kernel, called the
boot kernel, into memory and passes control to it in the usual way. Then, the
boot kernel reads the hibernation image, restores the pre-hibernation memory
contents and passes control to the image kernel. Thus two different kernels
are involved in resuming from hibernation, and in general they may differ in
more than the roles they play in this operation. The boot kernel may be
completely different from the image kernel: not only its configuration, but
also its version may be different. The consequences of this are important to
device drivers and their subsystems (bus types, device classes and device
types) too.

Namely, to be able to load the hibernation image into memory, the boot kernel
needs to include at least the subset of device drivers allowing it to access the
storage medium containing the image, although it generally doesn't need to
include all of the drivers included in the image kernel. After the image has
been loaded the devices handled by those drivers need to be prepared for passing
control back to the image kernel. This is very similar to the preparation of
devices for creating a hibernation image described above. In fact, it is done
in the same way, with the help of the ->pm.prepare(), ->pm.freeze() and
->pm.freeze_noirq() callbacks, but only for device drivers included in the boot
kernel (whose versions may generally be different from the versions of the
analogous drivers from the image kernel).

Should the restoration of the pre-hibernation memory contents fail, the boot
kernel would carry out the procedure of "thawing" devices described above, using
the ->pm.thaw_noirq(), ->pm.thaw(), and ->pm.complete() callbacks provided by
subsystems and device drivers. This, however, is a very rare condition. Most
often the pre-hibernation memory contents are restored successfully and control
is passed to the image kernel that is now responsible for bringing the system
back to the working state.
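
The boot kernel's two possible paths can be sketched with a small userspace
mock. It only records which callback phases would run; the enum, the trace
structure and mock_boot_kernel_resume() are hypothetical names, and loading
and restoring the image are reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Callback phases named in the text; the enum itself is hypothetical. */
enum mock_phase {
	MOCK_PREPARE, MOCK_FREEZE, MOCK_FREEZE_NOIRQ,
	MOCK_THAW_NOIRQ, MOCK_THAW, MOCK_COMPLETE,
};

#define MOCK_MAX_PHASES 6

struct mock_trace {
	enum mock_phase seen[MOCK_MAX_PHASES];
	size_t count;
};

/* Record that one phase was carried out. */
void mock_run_phase(struct mock_trace *t, enum mock_phase p)
{
	assert(t && t->count < MOCK_MAX_PHASES);
	t->seen[t->count++] = p;
}

/*
 * Returns true if control would be passed to the image kernel, false if the
 * boot kernel had to thaw its devices and keep running.
 */
bool mock_boot_kernel_resume(struct mock_trace *t, bool image_restored)
{
	/* Prepare devices for handing control to the image kernel. */
	mock_run_phase(t, MOCK_PREPARE);
	mock_run_phase(t, MOCK_FREEZE);
	mock_run_phase(t, MOCK_FREEZE_NOIRQ);

	if (image_restored)
		return true;

	/* Rare failure path: undo the freeze in the boot kernel. */
	mock_run_phase(t, MOCK_THAW_NOIRQ);
	mock_run_phase(t, MOCK_THAW);
	mock_run_phase(t, MOCK_COMPLETE);
	return false;
}
```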

To achieve this goal, among other things, the image kernel restores the
pre-hibernation functionality of devices. This operation is analogous to the
resuming of devices after waking up from the memory sleep state, although it
involves a different set of device notifications:

    1  bus->pm.restore_noirq(dev), if implemented; may call the device driver's
       ->pm.restore_noirq() method, depending on the bus type in question.

    2  bus->pm.restore(dev), if implemented; usually calls the device driver's
       ->pm.restore() method.

    3  type->pm.restore(dev), if implemented; may call the device driver's
       ->pm.restore() method if not called by the bus type or class.

    4  class->pm.restore(dev), if implemented; may call the device driver's
       ->pm.restore() method if not called by the bus type or device type.

    5  bus->pm.complete(dev), if implemented; may call the device driver's
       ->pm.complete() method.
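
For a single device, the "if implemented" checks and the order of the five
notifications above can be mocked in userspace as follows. This is only a
sketch: struct mock_pm_ops and struct mock_dev are hypothetical stand-ins,
return values and error handling are left out, and how the notifications are
interleaved across different devices is not modelled.

```c
#include <assert.h>
#include <stddef.h>

struct mock_dev;

/* Hypothetical subset of PM callbacks; return values are left out. */
struct mock_pm_ops {
	void (*restore_noirq)(struct mock_dev *dev);
	void (*restore)(struct mock_dev *dev);
	void (*complete)(struct mock_dev *dev);
};

struct mock_dev {
	const struct mock_pm_ops *bus_pm;	/* NULL if not implemented */
	const struct mock_pm_ops *type_pm;
	const struct mock_pm_ops *class_pm;
	char log[8];	/* one letter per notification, in call order */
	size_t nlog;
};

void mock_log(struct mock_dev *dev, char c)
{
	assert(dev && dev->nlog < sizeof(dev->log));
	dev->log[dev->nlog++] = c;
}

/* Example callbacks that only record that they ran. */
void mock_bus_restore_noirq(struct mock_dev *dev) { mock_log(dev, 'n'); }
void mock_bus_restore(struct mock_dev *dev) { mock_log(dev, 'b'); }
void mock_type_restore(struct mock_dev *dev) { mock_log(dev, 't'); }
void mock_class_restore(struct mock_dev *dev) { mock_log(dev, 'c'); }
void mock_bus_complete(struct mock_dev *dev) { mock_log(dev, 'd'); }

/* Issue notifications 1-5 for one device, skipping missing callbacks. */
void mock_restore_device(struct mock_dev *dev)
{
	assert(dev);
	if (dev->bus_pm && dev->bus_pm->restore_noirq)
		dev->bus_pm->restore_noirq(dev);	/* 1 */
	if (dev->bus_pm && dev->bus_pm->restore)
		dev->bus_pm->restore(dev);		/* 2 */
	if (dev->type_pm && dev->type_pm->restore)
		dev->type_pm->restore(dev);		/* 3 */
	if (dev->class_pm && dev->class_pm->restore)
		dev->class_pm->restore(dev);		/* 4 */
	if (dev->bus_pm && dev->bus_pm->complete)
		dev->bus_pm->complete(dev);		/* 5 */
}
```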

The roles of the ->pm.restore_noirq() and ->pm.restore() callbacks are analogous
to the roles of the corresponding resume callbacks, but they must assume that
the device may have been accessed before by the boot kernel. Consequently, the
state of the device before they are called may be different from the state of it
right prior to calling the resume callbacks. That difference usually doesn't
matter, so the majority of device drivers can set their resume and restore
callback pointers to the same routine. Nevertheless, different callback
pointers are used in case there is a situation where it actually matters.
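
A minimal userspace mock of the common case, in which one routine serves as
both the resume and the restore callback, might look like this. The structures
and mock_bring_up() are hypothetical; the point is only that the routine
programs the device without relying on the state it finds, so it does not
matter whether a boot kernel touched the device first.

```c
#include <assert.h>
#include <stdint.h>

struct mock_dev {
	uint32_t hw_reg;	/* hypothetical hardware register */
	uint32_t wanted;	/* configuration the driver wants */
};

/* Hypothetical subset of the PM callbacks. */
struct mock_pm_ops {
	int (*resume)(struct mock_dev *dev);
	int (*restore)(struct mock_dev *dev);
};

/*
 * Programs the device from scratch without relying on its current state,
 * so it works whether or not a boot kernel accessed the device.
 */
int mock_bring_up(struct mock_dev *dev)
{
	assert(dev);
	dev->hw_reg = dev->wanted;
	return 0;
}

/* Both pointers refer to the same routine. */
const struct mock_pm_ops mock_ops = {
	.resume = mock_bring_up,
	.restore = mock_bring_up,
};
```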


System Devices
--------------
System devices follow a slightly different API, which can be found in

    include/linux/sysdev.h
    drivers/base/sys.c

System devices will only be suspended with interrupts disabled, and after
all other devices have been suspended. On resume, they will be resumed
before any other devices, and also with interrupts disabled.

That is, when the non-boot CPUs are all offline and IRQs are disabled on the
remaining online CPU, then the sysdev_driver.suspend() phase is carried out, and
the system enters a sleep state (or a hibernation image is created). During
resume (or after the image has been created) the sysdev_driver.resume() phase
is carried out, IRQs are enabled on the only online CPU, the non-boot CPUs are
enabled and that is followed by the "early resume" phase (in which the "noirq"
callbacks provided by subsystems and device drivers are invoked).
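
The ordering constraints described above can be written down as assertions in
a userspace mock. Everything here (struct mock_system and the mock_*
functions) is hypothetical and merely restates the sequence from the text; it
does not reproduce the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

struct mock_system {
	unsigned int total_cpus;
	unsigned int online_cpus;
	bool irqs_enabled;
	unsigned int sysdev_suspends;
	unsigned int sysdev_resumes;
};

/* System devices are suspended with one CPU online and IRQs disabled. */
void mock_sysdev_suspend(struct mock_system *s)
{
	assert(s->online_cpus == 1 && !s->irqs_enabled);
	s->sysdev_suspends++;
}

/* They are resumed under the same conditions, before anything else. */
void mock_sysdev_resume(struct mock_system *s)
{
	assert(s->online_cpus == 1 && !s->irqs_enabled);
	s->sysdev_resumes++;
}

void mock_enter_sleep(struct mock_system *s)
{
	assert(s && s->total_cpus >= 1);

	s->online_cpus = 1;		/* non-boot CPUs taken offline */
	s->irqs_enabled = false;	/* IRQs off on the remaining CPU */
	mock_sysdev_suspend(s);

	/* The sleep state is entered (or the image is created) here. */

	mock_sysdev_resume(s);
	s->irqs_enabled = true;
	s->online_cpus = s->total_cpus;	/* non-boot CPUs enabled again */

	/* The "early resume" phase ("noirq" callbacks) would follow. */
}
```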

Code to actually enter and exit the system-wide low power state sometimes
involves hardware details that are only known to the boot firmware, and
may leave a CPU running software (from SRAM or flash memory) that monitors
the system and manages its wakeup sequence.


Power Management Notifiers
--------------------------
As stated in Documentation/power/notifiers.txt, there are some operations that
cannot be carried out by the power management callbacks discussed above, because
carrying them out at these points would be too late or too early. To handle
these cases, subsystems and device drivers may register power management
notifiers that are called before tasks are frozen and after they have been
thawed.

Generally speaking, the PM notifiers are suitable for performing actions that
either require user space to be available, or at least won't interfere with user
space in a harmful way.

For details refer to Documentation/power/notifiers.txt.
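
The timing of the notifiers relative to the freezing of tasks can be pictured
with the userspace mock below. It does not use the kernel's notifier
interface; the event names, struct mock_state and the idea of caching data
that needs user space are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical events: before tasks are frozen, after they are thawed. */
enum mock_pm_event { MOCK_PM_PREPARE, MOCK_PM_POST };

struct mock_state {
	bool tasks_frozen;
	bool data_cached;
	bool cached_during_sleep;	/* snapshot taken while frozen */
	unsigned int bad_calls;		/* notifier ran with tasks frozen */
};

/* Hypothetical notifier: caches something that needs user space help. */
void mock_pm_notifier(struct mock_state *st, enum mock_pm_event ev)
{
	assert(st);
	if (st->tasks_frozen)
		st->bad_calls++;

	if (ev == MOCK_PM_PREPARE)
		st->data_cached = true;		/* user space still runs */
	else
		st->data_cached = false;	/* cache no longer needed */
}

void mock_sleep_transition(struct mock_state *st)
{
	assert(st);
	mock_pm_notifier(st, MOCK_PM_PREPARE);
	st->tasks_frozen = true;

	/* Device callbacks and the sleep itself would happen here. */
	st->cached_during_sleep = st->data_cached;

	st->tasks_frozen = false;
	mock_pm_notifier(st, MOCK_PM_POST);
}
```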


Runtime Power Management
========================
Many devices are able to dynamically power down while the system is still
running. This feature is useful for devices that are not being used, and
can offer significant power savings on a running system. These devices
often support a range of runtime power states, which might use names such
as "off", "sleep", "idle", "active", and so on. Those states will in some
cases (like PCI) be partially constrained by a bus the device uses, and will
usually include hardware states that are also used in system sleep states.

Note, however, that a system-wide power transition can be started while some
devices are in low power states due to runtime power management. The system
sleep PM callbacks should generally recognize such situations and react to them
appropriately, but the recommended actions to be taken in such cases are
subsystem-specific.

In some cases the decision may be made at the subsystem level while in some
other cases the device driver may be left to decide. In some cases it may be
desirable to leave a suspended device in that state during system-wide power
transition, but in some other cases the device ought to be put back into the
full power state, for example to be configured for system wakeup or so that its
system wakeup capability can be disabled. That all depends on the hardware
and the design of the subsystem and device driver in question.
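
One possible policy, shown here only as a userspace mock, is to leave a
runtime-suspended device alone unless its wakeup setting has to change. The
structure, its fields and mock_system_suspend() are hypothetical, and real
subsystems may well choose differently.

```c
#include <assert.h>
#include <stdbool.h>

struct mock_dev {
	bool runtime_suspended;
	bool wakeup_armed;	/* what the hardware is set up to do now */
	bool wakeup_wanted;	/* what the system sleep requires */
	unsigned int power_ups;	/* times the device was put to full power */
};

/* Hypothetical system sleep callback for a possibly suspended device. */
void mock_system_suspend(struct mock_dev *dev)
{
	assert(dev);

	/* Nothing to change: leave the device in its low-power state. */
	if (dev->runtime_suspended && dev->wakeup_armed == dev->wakeup_wanted)
		return;

	if (dev->runtime_suspended) {
		/* Assumed: full power is needed to change the setting. */
		dev->runtime_suspended = false;
		dev->power_ups++;
	}

	dev->wakeup_armed = dev->wakeup_wanted;
	/* A real driver would now put the device into a sleep state. */
}
```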

During system-wide resume from a sleep state it's better to put devices into
the full power state, as explained in Documentation/power/runtime_pm.txt. Refer
to that document for more information regarding this particular issue as well as
for information on the device runtime power management framework in general.