Device Power Management


Device power management encompasses two areas - the ability to save
state and transition a device to a low-power state when the system is
entering a low-power state; and the ability to transition a device to
a low-power state while the system is running (and independently of
any other power management activity).


Methods

The methods to suspend and resume devices reside in struct bus_type:

struct bus_type {
	...
	int (*suspend)(struct device * dev, pm_message_t state);
	int (*resume)(struct device * dev);
};

Each bus driver is responsible for implementing these methods,
translating the call into a bus-specific request and forwarding the
call to the bus-specific drivers. For example, PCI drivers implement
suspend() and resume() methods in struct pci_driver. The PCI core is
simply responsible for translating the pointers to PCI-specific ones
and calling the low-level driver.
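The forwarding described above can be sketched in plain C. This is a
user-space model, not kernel code: the "foo" bus, its types, and the
two void pointers in struct device are hypothetical stand-ins (a real
bus would convert the pointers with its own helpers), and pm_message_t
is reduced to a single field.

```c
/* Hypothetical stand-ins for the kernel types; only the shape matters. */
typedef struct { int event; } pm_message_t;

struct device {
	void *driver_data;	/* assumed: points at the bus-specific driver */
	void *bus_data;		/* assumed: points at the bus-specific device */
};

struct foo_dev {
	int saved_state;
	int powered;
};

struct foo_driver {
	int (*suspend)(struct foo_dev *fdev, pm_message_t state);
	int (*resume)(struct foo_dev *fdev);
};

/* Bus-level methods: translate the generic pointers, forward the call. */
static int foo_bus_suspend(struct device *dev, pm_message_t state)
{
	struct foo_driver *drv = dev->driver_data;
	struct foo_dev *fdev = dev->bus_data;

	if (drv && drv->suspend)
		return drv->suspend(fdev, state);
	return 0;	/* no driver method: nothing to do */
}

static int foo_bus_resume(struct device *dev)
{
	struct foo_driver *drv = dev->driver_data;
	struct foo_dev *fdev = dev->bus_data;

	if (drv && drv->resume)
		return drv->resume(fdev);
	return 0;
}

/* A low-level driver implementing the bus-specific methods. */
static int foo_widget_suspend(struct foo_dev *fdev, pm_message_t state)
{
	(void)state;
	fdev->saved_state = 42;	/* pretend to save hardware state */
	fdev->powered = 0;
	return 0;
}

static int foo_widget_resume(struct foo_dev *fdev)
{
	fdev->powered = 1;
	return 0;
}
```

The driver only ever sees struct foo_dev; the generic struct device
stops at the bus layer, which is what lets a bus supply default
routines when a driver provides none.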

This is done to a) ease transition to the new power management methods
and leverage the existing PM code in various bus drivers; b) allow
buses to implement generic and default PM routines for devices; and c)
make the flow of execution obvious to the reader.


System Power Management

When the system enters a low-power state, the device tree is walked in
a depth-first fashion to transition each device into a low-power
state. The ordering of the device tree is guaranteed by the order in
which devices get registered - children are never registered before
their ancestors, and devices are placed at the back of the list when
registered. By walking the list in reverse order, we are guaranteed to
suspend devices in the proper order.
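The ordering argument can be illustrated with a small user-space
model. The array-backed list, the function names, and the suspend_log
used to record the order are all assumptions made for the example;
the kernel keeps its own list.

```c
#include <stddef.h>

#define MAX_DEVS 8

struct device {
	const char *name;
	int suspended;
};

static struct device *dev_list[MAX_DEVS];
static size_t dev_count;

/* Records the order in which devices were suspended. */
static const char *suspend_log[MAX_DEVS];
static size_t suspend_logged;

/* New devices always go at the back, so parents precede their children. */
static int device_register(struct device *dev)
{
	if (dev_count == MAX_DEVS)
		return -1;
	dev_list[dev_count++] = dev;
	return 0;
}

/* Walk from the back so every child is suspended before its parent. */
static void suspend_all(void)
{
	size_t i;

	for (i = dev_count; i-- > 0; ) {
		dev_list[i]->suspended = 1;
		suspend_log[suspend_logged++] = dev_list[i]->name;
	}
}
```

Registering a bus, then a bridge on it, then a device behind the
bridge yields a suspend order of device, bridge, bus.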

Devices are suspended once with interrupts enabled. Drivers are
expected to stop I/O transactions, save device state, and place the
device into a low-power state. Drivers may sleep, allocate memory,
etc. at will.

Some devices are broken and will inevitably have problems powering
down or disabling themselves with interrupts enabled. In these
special cases, their drivers may return -EAGAIN. This will put the
device on a list to be taken care of later. When interrupts are
disabled, before we enter the low-power state, those drivers are
called again to put their devices to sleep.

On resume, the devices that returned -EAGAIN will be called to power
themselves back on with interrupts disabled. Once interrupts have been
re-enabled, the rest of the drivers will be called to resume their
devices. On resume, a driver is responsible for powering each device
back on, restoring state, and re-enabling I/O transactions for that
device.
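The suspend half of the -EAGAIN handling described above can be
sketched as a two-pass loop. This is a user-space model under stated
assumptions: irqs_enabled stands in for the real interrupt state,
needs_irqs_off marks a hypothetical "broken" device, and the fixed-size
deferred array replaces whatever list the PM core actually keeps.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_DEVS 8

struct device {
	int needs_irqs_off;	/* hypothetical: models a "broken" device */
	int powered;
};

static int irqs_enabled = 1;	/* stand-in for the real interrupt state */

/* Driver method: refuse to power down while interrupts are enabled. */
static int dev_suspend(struct device *dev)
{
	if (dev->needs_irqs_off && irqs_enabled)
		return -EAGAIN;
	dev->powered = 0;
	return 0;
}

/* Returns 0 on success, or the first error other than a deferral. */
static int suspend_devices(struct device **devs, size_t n)
{
	struct device *deferred[MAX_DEVS];
	size_t i, ndeferred = 0;
	int ret;

	if (n > MAX_DEVS)
		return -EINVAL;

	/* First pass: interrupts enabled, list walked in reverse. */
	for (i = n; i-- > 0; ) {
		ret = dev_suspend(devs[i]);
		if (ret == -EAGAIN)
			deferred[ndeferred++] = devs[i];
		else if (ret)
			return ret;
	}

	/* Second pass: interrupts disabled, deferred devices only. */
	irqs_enabled = 0;
	for (i = 0; i < ndeferred; i++) {
		ret = dev_suspend(deferred[i]);
		if (ret)
			return ret;
	}
	return 0;
}
```

Resume would mirror this: the deferred devices first, with interrupts
still disabled, then everything else once interrupts are back on.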

System devices follow a slightly different API, which can be found in

	include/linux/sysdev.h
	drivers/base/sys.c

System devices will only be suspended with interrupts disabled, and
after all other devices have been suspended. On resume, they will be
resumed before any other devices, and also with interrupts disabled.


Runtime Power Management

Many devices are able to dynamically power down while the system is
still running. This feature is useful for devices that are not being
used, and can offer significant power savings on a running system.

In each device's directory, there is a 'power' directory, which
contains at least a 'state' file. Reading from this file displays what
power state the device is currently in. Writing to this file initiates
a transition to the specified power state, which must be a decimal
number in the range 1-3, inclusive; or 0 for 'On'.

The PM core will call the ->suspend() method in the bus_type object
that the device belongs to if the specified state is not 0, or
->resume() if it is.

Nothing will happen if the specified state is the same state the
device is currently in.

If the device is already in a low-power state, and the specified state
is another, but different, low-power state, the ->resume() method will
first be called to power the device back on, then ->suspend() will be
called again with the new state.
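The three rules above reduce to a short decision function. The sketch
below is a user-space model: set_power_state(), the call counters, and
the use of -EINVAL for an out-of-range value are assumptions for
illustration, not the PM core's actual code.

```c
#include <errno.h>

struct device {
	int power_state;	/* 0 = On, 1-3 = low-power states */
	int suspend_calls;	/* counters so the call sequence is visible */
	int resume_calls;
};

/* Stand-ins for the bus_type ->suspend() and ->resume() methods. */
static int bus_suspend(struct device *dev, int state)
{
	(void)state;
	dev->suspend_calls++;
	return 0;
}

static int bus_resume(struct device *dev)
{
	dev->resume_calls++;
	return 0;
}

/* Models a write of 'state' to the device's power/state file. */
static int set_power_state(struct device *dev, int state)
{
	int ret;

	if (state < 0 || state > 3)
		return -EINVAL;
	if (state == dev->power_state)
		return 0;	/* same state: nothing happens */

	/* Leaving a low-power state always goes through ->resume(). */
	if (dev->power_state != 0) {
		ret = bus_resume(dev);
		if (ret)
			return ret;
		dev->power_state = 0;
	}

	/* Entering a low-power state goes through ->suspend(). */
	if (state != 0) {
		ret = bus_suspend(dev, state);
		if (ret)
			return ret;
		dev->power_state = state;
	}
	return 0;
}
```

Moving a device from state 2 to state 3 therefore costs one
->resume() and one ->suspend(), while writing 0 costs only a
->resume().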

The driver is responsible for saving the working state of the device
and putting it into the low-power state specified. If this succeeds,
the driver returns 0, and the device's power_state field is updated.

The driver must take care to know whether or not it is able to
properly resume the device, including all steps of reinitialization
necessary. (This is the hardest part, and the one most protected by
NDA'd documents).

The driver must also take care not to suspend a device that is
currently in use. It is the driver's responsibility to provide its own
exclusion mechanisms.

The runtime power transition happens with interrupts enabled. If a
device cannot support being powered down with interrupts enabled, it
may return -EAGAIN (as it would during a system power management
transition), but it will _not_ be called again, and the transition
will fail.

There is currently no way to know what states a device or driver
supports a priori. This will change in the future.

pm_message_t meaning

pm_message_t has two fields: event ("major") and flags. If a driver
does not know an event code, it aborts the request and returns an
error. Some drivers may need to deal with special cases based on the
actual type of suspend operation being done at the system level. This
is why there are flags.

Event codes are:

ON -- no need to do anything except special cases like broken
HW.

# NOTIFICATION -- pretty much same as ON?

FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from
scratch. That probably means stop accepting upstream requests, the
actual policy of what to do with them being specific to a given
driver. It's acceptable for a network driver to just drop packets
while a block driver is expected to block the queue so no request is
lost. (Use IDE as an example of how to do that). FREEZE requires no
power state change, and it's expected for drivers to be able to
quickly transition back to operating state.

SUSPEND -- like FREEZE, but also put hardware into low-power state. If
there is a need to distinguish several levels of sleep, an additional
flag is probably the best way to do that.

Transitions are only from a resumed state to a suspended state, never
between 2 suspended states. (ON -> FREEZE or ON -> SUSPEND can happen,
FREEZE -> SUSPEND or SUSPEND -> FREEZE cannot).
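A driver's suspend method that follows the event codes above might be
shaped like the sketch below. The EVENT_* names and values, the local
pm_message_t definition, and the choice of -EINVAL for an unknown
event are assumptions made so the example compiles on its own; check
the kernel headers for the real definitions.

```c
#include <errno.h>

/* Hypothetical event codes mirroring ON, FREEZE and SUSPEND. */
enum { EVENT_ON, EVENT_FREEZE, EVENT_SUSPEND };

/* Local model of the two-field message described in the text. */
typedef struct {
	int event;
	int flags;
} pm_message_t;

struct device {
	int io_stopped;
	int low_power;
};

static int example_suspend(struct device *dev, pm_message_t state)
{
	/* Look only at the event; flags are ignored on purpose. */
	switch (state.event) {
	case EVENT_ON:
		return 0;		/* nothing to do */
	case EVENT_FREEZE:
		dev->io_stopped = 1;	/* quiesce, keep power state */
		return 0;
	case EVENT_SUSPEND:
		dev->io_stopped = 1;	/* like FREEZE ... */
		dev->low_power = 1;	/* ... plus low-power state */
		return 0;
	default:
		return -EINVAL;		/* unknown event: abort request */
	}
}
```

Because transitions only start from the resumed state, the function
never has to handle a device that is already frozen being asked to
suspend.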

All events are:

[NOTE NOTE NOTE: If you are a driver author, you should not care; you
should only look at event, and ignore flags.]

#Prepare for suspend -- userland is still running but we are going to
#enter suspend state. This gives drivers a chance to load firmware from
#disk and store it in memory, or do other activities that require
#operating userland, ability to kmalloc GFP_KERNEL, etc... All of these
#are forbidden once the suspend dance is started. event = ON, flags =
#PREPARE_TO_SUSPEND

APM standby -- prepare for APM event. Quiesce devices to make life
easier for APM BIOS. event = FREEZE, flags = APM_STANDBY

APM suspend -- same as APM_STANDBY, but we should probably avoid
spinning down disks. event = FREEZE, flags = APM_SUSPEND

System halt, reboot -- quiesce devices to make life easier for BIOS. event
= FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT

System shutdown -- at least disks need to be spun down, or data may be
lost. Quiesce devices, just to make life easier for BIOS. event =
FREEZE, flags = SYSTEM_SHUTDOWN

Kexec -- turn off DMAs and put hardware into some state where new
kernel can take over. event = FREEZE, flags = KEXEC

Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake
may need to be enabled on some devices. This actually has at least 3
subtypes: the system can reboot, enter S4, or enter S5 at the end of
swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT,
SYSTEM_SHUTDOWN, SYSTEM_S4

Suspend to ram -- put devices into low power state. event = SUSPEND,
flags = SUSPEND_TO_RAM

Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put
devices into low power mode, but you must be able to reinitialize
device from scratch in resume method. This has two flavors: it is done
once on the suspending kernel and once on the resuming kernel. event =
FREEZE, flags = DURING_SUSPEND or DURING_RESUME

Device detach requested from /sys -- deinitialize device; probably same as
SYSTEM_SHUTDOWN, I do not understand this one too much. Probably event
= FREEZE, flags = DEV_DETACH.

#These are not really events sent:
#
#System fully on -- device is working normally; this is probably never
#passed to suspend() method... event = ON, flags = 0
#
#Ready after resume -- userland is now running, again. Time to free any
#memory you ate during prepare to suspend... event = ON, flags =
#READY_AFTER_RESUME
#

Driver Detach Power Management

The kernel now supports the ability to place a device in a low-power
state when it is detached from its driver, which happens when its
module is removed.

Each device contains a 'detach_state' file in its sysfs directory
which can be used to control this state. Reading from this file
displays what the current detach state is set to. This is 0 (On) by
default. A user may write a positive integer value to this file in the
range of 1-4 inclusive.

A value of 1-3 will indicate the device should be placed in that
low-power state, which will cause ->suspend() to be called for that
device. A value of 4 indicates that the device should be shut down, so
->shutdown() will be called for that device.
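The mapping from detach_state values to driver methods can be sketched
as follows. This is a user-space model: the function names, the fields
that record which method would have run, and the acceptance of 0 to
restore the default are assumptions, not the driver core's code.

```c
#include <errno.h>

struct device {
	int detach_state;	/* 0 = On (default), 1-3 low-power, 4 = shutdown */
	int suspended_to;	/* records the state a ->suspend() call would get */
	int shut_down;		/* records that ->shutdown() would have run */
};

/* Models a write to the 'detach_state' sysfs file. */
static int store_detach_state(struct device *dev, int value)
{
	/* Assumption: 0 is accepted so the default can be restored. */
	if (value < 0 || value > 4)
		return -EINVAL;
	dev->detach_state = value;
	return 0;
}

/* Models what happens when the device is detached from its driver. */
static void device_detach(struct device *dev)
{
	if (dev->detach_state == 4)
		dev->shut_down = 1;			/* ->shutdown() */
	else if (dev->detach_state >= 1)
		dev->suspended_to = dev->detach_state;	/* ->suspend() */
	/* detach_state == 0: device is left on */
}
```

With the default of 0 nothing is called at detach time, which matches
the behaviour before this feature existed.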

The driver is responsible for reinitializing the device when the
module is re-inserted during its ->probe() (or equivalent) method.
The driver core will not call any extra functions when binding the
device to the driver.