Linus Torvalds 1da177e 2005-04-16 15:20:36 -0700
Device Power Management


Device power management encompasses two areas - the ability to save
state and transition a device to a low-power state when the system is
entering a low-power state; and the ability to transition a device to
a low-power state while the system is running (and independently of
any other power management activity).


Methods

The methods to suspend and resume devices reside in struct bus_type:

struct bus_type {
	...
	int (*suspend)(struct device * dev, pm_message_t state);
	int (*resume)(struct device * dev);
};

Each bus driver is responsible for implementing these methods,
translating the call into a bus-specific request and forwarding the
call to the bus-specific drivers. For example, PCI drivers implement
suspend() and resume() methods in struct pci_driver. The PCI core is
simply responsible for translating the pointers to PCI-specific ones
and calling the low-level driver.

This is done to a) ease transition to the new power management methods
and leverage the existing PM code in various bus drivers; b) allow
buses to implement generic and default PM routines for devices, and c)
make the flow of execution obvious to the reader.
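
The forwarding described above can be sketched in user-space C. This
is a simplified model, not the kernel code: pm_message_t is reduced to
an int, the structures are cut down to the fields the example needs,
and the demo_* and sample_* names are hypothetical, invented here for
illustration.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
typedef int pm_message_t;

struct device {
	int power_state;
};

struct bus_type {
	int (*suspend)(struct device *dev, pm_message_t state);
	int (*resume)(struct device *dev);
};

/* Hypothetical bus-specific device and driver, loosely modelled on PCI. */
struct demo_dev;

struct demo_driver {
	int (*suspend)(struct demo_dev *ddev, pm_message_t state);
	int (*resume)(struct demo_dev *ddev);
};

struct demo_dev {
	struct device dev;          /* embedded generic device */
	struct demo_driver *driver; /* bound low-level driver, may be NULL */
};

/* Recover the bus-specific device from the embedded generic one. */
static struct demo_dev *to_demo_dev(struct device *dev)
{
	return (struct demo_dev *)((char *)dev - offsetof(struct demo_dev, dev));
}

/* Bus-level methods: translate the pointer, then call the driver. */
static int demo_bus_suspend(struct device *dev, pm_message_t state)
{
	struct demo_dev *ddev = to_demo_dev(dev);

	if (ddev->driver && ddev->driver->suspend)
		return ddev->driver->suspend(ddev, state);
	return 0; /* no driver method: nothing to do */
}

static int demo_bus_resume(struct device *dev)
{
	struct demo_dev *ddev = to_demo_dev(dev);

	if (ddev->driver && ddev->driver->resume)
		return ddev->driver->resume(ddev);
	return 0;
}

static struct bus_type demo_bus = {
	.suspend = demo_bus_suspend,
	.resume  = demo_bus_resume,
};

/* A sample low-level driver that only records the requested state. */
static int sample_suspend(struct demo_dev *ddev, pm_message_t state)
{
	ddev->dev.power_state = state;
	return 0;
}

static int sample_resume(struct demo_dev *ddev)
{
	ddev->dev.power_state = 0;
	return 0;
}

static struct demo_driver sample_driver = {
	.suspend = sample_suspend,
	.resume  = sample_resume,
};
```

A caller only ever sees struct device and struct bus_type: calling
demo_bus.suspend(&d.dev, 3) on a struct demo_dev d bound to
sample_driver ends up in sample_suspend() with the bus-specific
pointer.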


System Power Management

When the system enters a low-power state, the device tree is walked in
a depth-first fashion to transition each device into a low-power
state. The ordering of the device tree is guaranteed by the order in
which devices get registered - children are never registered before
their ancestors, and devices are placed at the back of the list when
registered. By walking the list in reverse order, we are guaranteed to
suspend devices in the proper order.
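
The ordering argument above can be illustrated with a small user-space
model. It is a sketch under stated assumptions, not the driver core:
the fixed-size array, device_register() rejecting a child whose parent
is unknown, and suspend_all() are all simplifications made up for this
example.

```c
#include <stddef.h>

#define MAX_DEVICES 16

struct device {
	const char *name;
	struct device *parent; /* NULL for a root device */
	int suspended;
};

/* Registration order doubles as the suspend ordering. */
static struct device *dev_list[MAX_DEVICES];
static size_t dev_count;

/*
 * Append to the back of the list. As an assumption of this sketch, a
 * child whose parent has not been registered yet is refused, which
 * enforces "children are never registered before their ancestors".
 */
static int device_register(struct device *dev)
{
	size_t i;
	int parent_found = (dev->parent == NULL);

	for (i = 0; i < dev_count && !parent_found; i++)
		parent_found = (dev_list[i] == dev->parent);
	if (!parent_found || dev_count == MAX_DEVICES)
		return -1;
	dev_list[dev_count++] = dev;
	return 0;
}

/*
 * Walk the list back to front, so every child is suspended before its
 * parent. order[] receives the devices in the order they were
 * suspended; the return value is the number of devices handled.
 */
static size_t suspend_all(struct device **order)
{
	size_t i, n = 0;

	for (i = dev_count; i-- > 0; ) {
		dev_list[i]->suspended = 1;
		order[n++] = dev_list[i];
	}
	return n;
}
```

Registering a bus, then a bridge on that bus, then a device behind the
bridge, and calling suspend_all() visits them in the opposite order:
device, bridge, bus.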

Devices are suspended once with interrupts enabled. Drivers are
expected to stop I/O transactions, save device state, and place the
device into a low-power state. Drivers may sleep, allocate memory,
etc. at will.

Some devices are broken and will inevitably have problems powering
down or disabling themselves with interrupts enabled. In these
special cases, their drivers may return -EAGAIN. This will put the
device on a list to be taken care of later. When interrupts are
disabled, before we enter the low-power state, their drivers are
called again to put their devices to sleep.
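
The -EAGAIN deferral on the suspend path might look roughly like the
following user-space model. It is only a sketch: the irqs_enabled
argument is a hypothetical way of telling the sample drivers which
pass they are in (the real suspend() method takes no such argument),
and the deferred flag stands in for the list mentioned above.

```c
#include <errno.h>
#include <stddef.h>

struct device {
	/* irqs_enabled is an assumption of this sketch, see above. */
	int (*suspend)(struct device *dev, int irqs_enabled);
	int suspended;
	int deferred; /* set when the driver returned -EAGAIN */
};

/*
 * Pass 1 runs with interrupts enabled and remembers every device
 * whose driver asks to be retried. Pass 2 models the point where
 * interrupts are disabled and calls only the deferred drivers again.
 * Both passes walk the array back to front (reverse registration
 * order).
 */
static int suspend_devices(struct device **devs, size_t n)
{
	size_t i;
	int err;

	for (i = n; i-- > 0; ) {
		err = devs[i]->suspend(devs[i], 1);
		if (err == -EAGAIN)
			devs[i]->deferred = 1;
		else if (err)
			return err;
	}
	for (i = n; i-- > 0; ) {
		if (!devs[i]->deferred)
			continue;
		err = devs[i]->suspend(devs[i], 0);
		if (err)
			return err;
	}
	return 0;
}

/* A well-behaved driver: suspends on the first call. */
static int normal_suspend(struct device *dev, int irqs_enabled)
{
	(void)irqs_enabled;
	dev->suspended = 1;
	return 0;
}

/* A "broken" device: can only power down with interrupts disabled. */
static int broken_suspend(struct device *dev, int irqs_enabled)
{
	if (irqs_enabled)
		return -EAGAIN;
	dev->suspended = 1;
	return 0;
}
```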

On resume, the devices that returned -EAGAIN will be called to power
themselves back on with interrupts disabled. Once interrupts have been
re-enabled, the rest of the drivers will be called to resume their
devices. On resume, a driver is responsible for powering back on each
device, restoring state, and re-enabling I/O transactions for that
device.

System devices follow a slightly different API, which can be found in

	include/linux/sysdev.h
	drivers/base/sys.c

System devices will only be suspended with interrupts disabled, and
after all other devices have been suspended. On resume, they will be
resumed before any other devices, and also with interrupts disabled.


Runtime Power Management

Many devices are able to dynamically power down while the system is
still running. This feature is useful for devices that are not being
used, and can offer significant power savings on a running system.

In each device's directory, there is a 'power' directory, which
contains at least a 'state' file. Reading from this file displays what
power state the device is currently in. Writing to this file initiates
a transition to the specified power state, which must be a decimal in
the range 1-3, inclusive; or 0 for 'On'.

The PM core will call the ->suspend() method in the bus_type object
that the device belongs to if the specified state is not 0, or
->resume() if it is.

Nothing will happen if the specified state is the same state the
device is currently in.

If the device is already in a low-power state, and the specified state
is another, but different, low-power state, the ->resume() method will
first be called to power the device back on, then ->suspend() will be
called again with the new state.

The driver is responsible for saving the working state of the device
and putting it into the low-power state specified. If this was
successful, it returns 0, and the device's power_state field is
updated.
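
The transition rules above can be condensed into one function. This is
a user-space sketch under stated assumptions, not the PM core: the
methods hang directly off a cut-down struct device instead of going
through bus_type, the state is a plain int rather than pm_message_t,
and set_power_state() and the demo_* callbacks are hypothetical names.

```c
#include <errno.h>

struct device {
	int power_state; /* 0 = on, 1-3 = low-power states */
	int (*suspend)(struct device *dev, int state);
	int (*resume)(struct device *dev);
};

/* Models a write of 'state' to the device's power/state file. */
static int set_power_state(struct device *dev, int state)
{
	int err;

	if (state < 0 || state > 3)
		return -EINVAL;
	if (state == dev->power_state)
		return 0; /* already there: nothing happens */

	/* Leaving a low-power state always goes through resume first. */
	if (dev->power_state != 0) {
		err = dev->resume(dev);
		if (err)
			return err;
		dev->power_state = 0;
	}

	/* A non-zero target then needs a suspend with the new state. */
	if (state != 0) {
		err = dev->suspend(dev, state);
		if (err)
			return err;
		dev->power_state = state; /* updated only on success */
	}
	return 0;
}

/* Sample callbacks that just count how often they are called. */
static int calls_suspend, calls_resume;

static int demo_suspend(struct device *dev, int state)
{
	(void)dev;
	(void)state;
	calls_suspend++;
	return 0;
}

static int demo_resume(struct device *dev)
{
	(void)dev;
	calls_resume++;
	return 0;
}
```

Moving a device from state 2 to state 3 therefore costs one resume
call followed by one suspend call, while writing the current state
again calls neither.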

The driver must take care to know whether or not it is able to
properly resume the device, including all steps of reinitialization
necessary. (This is the hardest part, and the one most protected by
NDA'd documents).

The driver must also take care not to suspend a device that is
currently in use. It is the driver's responsibility to provide its own
exclusion mechanisms.

The runtime power transition happens with interrupts enabled. If a
device cannot support being powered down with interrupts enabled, it
may return -EAGAIN (as it would during a system power management
transition), but it will _not_ be called again, and the transaction
will fail.

There is currently no way to know what states a device or driver
supports a priori. This will change in the future.

pm_message_t meaning

pm_message_t has two fields: event ("major") and flags. If a driver
does not recognize the event code, it aborts the request and returns
an error. Some drivers may need to deal with special cases based on
the actual type of suspend operation being done at the system level.
This is why there are flags.

Event codes are:

ON -- no need to do anything except special cases like broken
HW.

# NOTIFICATION -- pretty much same as ON?

FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from
scratch. That probably means stop accepting upstream requests, the
actual policy of what to do with them being specific to a given
driver. It's acceptable for a network driver to just drop packets
while a block driver is expected to block the queue so no request is
lost. (Use IDE as an example on how to do that). FREEZE requires no
power state change, and it's expected for drivers to be able to
quickly transition back to operating state.

SUSPEND -- like FREEZE, but also put hardware into low-power state. If
there's a need to distinguish several levels of sleep, an additional
flag is probably the best way to do that.
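
A driver's handling of the event codes above could be sketched as
follows. This is a user-space illustration, not kernel code: the
pm_message_t layout shown is a stand-in built from the two fields
named in this document, the DEMO_EVENT_* constants and their values
are assumptions, and struct demo_hw is a made-up device.

```c
#include <errno.h>

/* Stand-ins for the event codes; the numeric values are assumptions. */
enum {
	DEMO_EVENT_ON,
	DEMO_EVENT_FREEZE,
	DEMO_EVENT_SUSPEND
};

/* Stand-in for pm_message_t with the two fields described above. */
typedef struct {
	int event;          /* "major" code: what the driver must do */
	unsigned int flags; /* why it happens; ignored by this driver */
} pm_message_t;

/* Hypothetical hardware state tracked by the sample driver. */
struct demo_hw {
	int dma_running;
	int irq_enabled;
	int low_power;
};

/* Common part of FREEZE and SUSPEND: stop DMA and interrupts. */
static void demo_quiesce(struct demo_hw *hw)
{
	hw->dma_running = 0;
	hw->irq_enabled = 0;
}

static int demo_suspend(struct demo_hw *hw, pm_message_t state)
{
	switch (state.event) {
	case DEMO_EVENT_ON:
		return 0; /* nothing to do */
	case DEMO_EVENT_FREEZE:
		demo_quiesce(hw); /* no power state change */
		return 0;
	case DEMO_EVENT_SUSPEND:
		demo_quiesce(hw);
		hw->low_power = 1; /* like FREEZE, plus low-power state */
		return 0;
	default:
		return -EINVAL; /* unknown event: abort the request */
	}
}
```

The switch looks only at state.event and never at state.flags, and it
refuses any event code it does not recognize.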

Transitions are only from a resumed state to a suspended state, never
between two suspended states. (ON -> FREEZE or ON -> SUSPEND can happen,
FREEZE -> SUSPEND or SUSPEND -> FREEZE can not).

All events are:

[NOTE NOTE NOTE: If you are a driver author, you should not care; you
should only look at event, and ignore flags.]

#Prepare for suspend -- userland is still running but we are going to
#enter suspend state. This gives drivers chance to load firmware from
#disk and store it in memory, or do other activities that require
#operating userland, ability to kmalloc GFP_KERNEL, etc... All of these
#are forbidden once the suspend dance is started. event = ON, flags =
#PREPARE_TO_SUSPEND

Apm standby -- prepare for APM event. Quiesce devices to make life
easier for APM BIOS. event = FREEZE, flags = APM_STANDBY

Apm suspend -- same as APM_STANDBY, but we should probably avoid
spinning down disks. event = FREEZE, flags = APM_SUSPEND

System halt, reboot -- quiesce devices to make life easier for BIOS. event
= FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT

System shutdown -- at least disks need to be spun down, or data may be
lost. Quiesce devices, just to make life easier for BIOS. event =
FREEZE, flags = SYSTEM_SHUTDOWN

Kexec -- turn off DMAs and put hardware into some state where new
kernel can take over. event = FREEZE, flags = KEXEC

Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake
may need to be enabled on some devices. This actually has at least 3
subtypes: the system can reboot, enter S4, or enter S5 at the end of
swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT,
SYSTEM_SHUTDOWN, SYSTEM_S4

Suspend to ram -- put devices into low power state. event = SUSPEND,
flags = SUSPEND_TO_RAM

Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put
devices into low power mode, but you must be able to reinitialize
device from scratch in resume method. This has two flavors: it's done
once on the suspending kernel and once on the resuming kernel. event =
FREEZE, flags = DURING_SUSPEND or DURING_RESUME

Device detach requested from /sys -- deinitialize device; probably same as
SYSTEM_SHUTDOWN, I do not understand this one too much. Probably event
= FREEZE, flags = DEV_DETACH.

#These are not really events sent:
#
#System fully on -- device is working normally; this is probably never
#passed to suspend() method... event = ON, flags = 0
#
#Ready after resume -- userland is now running, again. Time to free any
#memory you ate during prepare to suspend... event = ON, flags =
#READY_AFTER_RESUME
#