Rafael J. Wysocki | 5e928f7 | 2009-08-18 23:38:32 +0200 | 1 | Run-time Power Management Framework for I/O Devices |
| 2 | |
| 3 | (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc. |
| 4 | |
| 5 | 1. Introduction |
| 6 | |
| 7 | Support for run-time power management (run-time PM) of I/O devices is provided |
| 8 | at the power management core (PM core) level by means of: |
| 9 | |
| 10 | * The power management workqueue pm_wq in which bus types and device drivers can |
| 11 | put their PM-related work items. It is strongly recommended that pm_wq be |
| 12 | used for queuing all work items related to run-time PM, because this allows |
| 13 | them to be synchronized with system-wide power transitions (suspend to RAM, |
| 14 | hibernation and resume from system sleep states). pm_wq is declared in |
| 15 | include/linux/pm_runtime.h and defined in kernel/power/main.c. |
| 16 | |
| 17 | * A number of run-time PM fields in the 'power' member of 'struct device' (which |
| 18 | is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can |
| 19 | be used for synchronizing run-time PM operations with one another. |
| 20 | |
| 21 | * Three device run-time PM callbacks in 'struct dev_pm_ops' (defined in |
| 22 | include/linux/pm.h). |
| 23 | |
| 24 | * A set of helper functions defined in drivers/base/power/runtime.c that can be |
| 25 | used for carrying out run-time PM operations in such a way that the |
| 26 | synchronization between them is taken care of by the PM core. Bus types and |
| 27 | device drivers are encouraged to use these functions. |
| 28 | |
| 29 | The run-time PM callbacks present in 'struct dev_pm_ops', the device run-time PM |
| 30 | fields of 'struct dev_pm_info' and the core helper functions provided for |
| 31 | run-time PM are described below. |
| 32 | |
| 33 | 2. Device Run-time PM Callbacks |
| 34 | |
| 35 | There are three device run-time PM callbacks defined in 'struct dev_pm_ops': |
| 36 | |
| 37 | struct dev_pm_ops { |
| 38 | ... |
| 39 | int (*runtime_suspend)(struct device *dev); |
| 40 | int (*runtime_resume)(struct device *dev); |
| 41 | void (*runtime_idle)(struct device *dev); |
| 42 | ... |
| 43 | }; |
| 44 | |
| 45 | The ->runtime_suspend() callback is executed by the PM core for the bus type of |
| 46 | the device being suspended. The bus type's callback is then _entirely_ |
| 47 | _responsible_ for handling the device as appropriate, which may, but need not, |
| 48 | include executing the device driver's own ->runtime_suspend() callback (from the |
| 49 | PM core's point of view it is not necessary to implement a ->runtime_suspend() |
| 50 | callback in a device driver as long as the bus type's ->runtime_suspend() knows |
| 51 | what to do to handle the device). |
| 52 | |
| 53 | * Once the bus type's ->runtime_suspend() callback has completed successfully |
| 54 | for a given device, the PM core regards the device as suspended, which need |
| 55 | not mean that the device has been put into a low power state. It is |
| 56 | supposed to mean, however, that the device will not process data and will |
| 57 | not communicate with the CPU(s) and RAM until its bus type's |
| 58 | ->runtime_resume() callback is executed for it. The run-time PM status of |
| 59 | a device after successful execution of its bus type's ->runtime_suspend() |
| 60 | callback is 'suspended'. |
| 61 | |
| 62 | * If the bus type's ->runtime_suspend() callback returns -EBUSY or -EAGAIN, |
| 63 | the device's run-time PM status is supposed to be 'active', which means that |
| 64 | the device _must_ be fully operational afterwards. |
| 65 | |
| 66 | * If the bus type's ->runtime_suspend() callback returns an error code |
| 67 | different from -EBUSY or -EAGAIN, the PM core regards this as a fatal |
| 68 | error and will refuse to run the helper functions described in Section 4 |
| 69 | for the device, until the status of it is directly set either to 'active' |
| 70 | or to 'suspended' (the PM core provides special helper functions for this |
| 71 | purpose). |
| 72 | |
| 73 | In particular, if the driver requires remote wake-up capability for proper |
| 74 | functioning and device_may_wakeup() returns 'false' for the device, then |
| 75 | ->runtime_suspend() should return -EBUSY. On the other hand, if |
| 76 | device_may_wakeup() returns 'true' for the device and the device is put |
| 77 | into a low power state during the execution of its bus type's |
| 78 | ->runtime_suspend(), it is expected that remote wake-up (i.e. a hardware mechanism |
| 79 | allowing the device to request a change of its power state, such as PCI PME) |
| 80 | will be enabled for the device. Generally, remote wake-up should be enabled |
| 81 | for all input devices put into a low power state at run time. |
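
The three outcomes of ->runtime_suspend() listed above can be sketched as a
small userspace model. This is not kernel code: 'struct suspend_model',
'model_apply_suspend_result()' and the enum names are hypothetical and exist
only to illustrate how the return value maps to the run-time PM status.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model; none of these names are kernel identifiers. */
enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

struct suspend_model {
	enum model_status status;
	int runtime_error;	/* nonzero: the helpers refuse to run */
};

/*
 * Apply the result of a bus type's ->runtime_suspend() callback to the
 * model device, following the three cases described in the text.
 */
void model_apply_suspend_result(struct suspend_model *dev, int ret)
{
	/* ->runtime_suspend() is only run for 'active' devices. */
	assert(dev && dev->status == MODEL_ACTIVE);

	if (ret == 0) {
		/* Success: the device is regarded as suspended. */
		dev->status = MODEL_SUSPENDED;
	} else if (ret == -EBUSY || ret == -EAGAIN) {
		/* Not now: the device must remain fully operational. */
		dev->status = MODEL_ACTIVE;
	} else {
		/* Fatal: blocked until the status is set directly. */
		dev->runtime_error = ret;
	}
}
```

In this model a driver that needs remote wake-up but finds it unavailable
would pass -EBUSY, which leaves the status 'active' and records no error.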
| 82 | |
| 83 | The ->runtime_resume() callback is executed by the PM core for the bus type of |
| 84 | the device being woken up. The bus type's callback is then _entirely_ |
| 85 | _responsible_ for handling the device as appropriate, which may, but need not, |
| 86 | include executing the device driver's own ->runtime_resume() callback (from the |
| 87 | PM core's point of view it is not necessary to implement a ->runtime_resume() |
| 88 | callback in a device driver as long as the bus type's ->runtime_resume() knows |
| 89 | what to do to handle the device). |
| 90 | |
| 91 | * Once the bus type's ->runtime_resume() callback has completed successfully, |
| 92 | the PM core regards the device as fully operational, which means that the |
| 93 | device _must_ be able to complete I/O operations as needed. The run-time |
| 94 | PM status of the device is then 'active'. |
| 95 | |
| 96 | * If the bus type's ->runtime_resume() callback returns an error code, the PM |
| 97 | core regards this as a fatal error and will refuse to run the helper |
| 98 | functions described in Section 4 for the device, until its status is |
| 99 | directly set either to 'active' or to 'suspended' (the PM core provides |
| 100 | special helper functions for this purpose). |
| 101 | |
| 102 | The ->runtime_idle() callback is executed by the PM core for the bus type of |
| 103 | a given device whenever the device appears to be idle, which is indicated to the |
| 104 | PM core by two counters, the device's usage counter and the counter of 'active' |
| 105 | children of the device. |
| 106 | |
| 107 | * If either of these counters is decreased using a helper function provided by |
| 108 | the PM core and it turns out to be equal to zero, the other counter is |
| 109 | checked. If that counter also is equal to zero, the PM core executes the |
| 110 | device bus type's ->runtime_idle() callback (with the device as an |
| 111 | argument). |
| 112 | |
| 113 | The action performed by a bus type's ->runtime_idle() callback is totally |
| 114 | dependent on the bus type in question, but the expected and recommended action |
| 115 | is to check if the device can be suspended (i.e. if all of the conditions |
| 116 | necessary for suspending the device are satisfied) and to queue up a suspend |
| 117 | request for the device in that case. |
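
The two-counter idle check described above can be sketched in userspace as
follows. The names 'struct idle_model', 'model_put()' and
'model_child_inactive()' are hypothetical, locking is omitted, and the idle
callback is replaced by a stand-in that only counts how often it is invoked.

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names are kernel identifiers. */
struct idle_model {
	int usage_count;	/* users currently holding the device active */
	int child_count;	/* children whose status is 'active' */
	int idle_calls;		/* how often the idle callback was invoked */
};

/* Stand-in for a bus type's ->runtime_idle(); here it only counts calls. */
void model_runtime_idle(struct idle_model *dev)
{
	dev->idle_calls++;
}

/* Drop one usage reference; notify idle if both counters are now zero. */
void model_put(struct idle_model *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0 && dev->child_count == 0)
		model_runtime_idle(dev);
}

/* A child left the 'active' state; the same check from the other side. */
void model_child_inactive(struct idle_model *dev)
{
	assert(dev->child_count > 0);
	if (--dev->child_count == 0 && dev->usage_count == 0)
		model_runtime_idle(dev);
}
```

Whichever counter reaches zero last triggers the notification, so the order
in which users and children go away does not matter.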
| 118 | |
| 119 | The helper functions provided by the PM core, described in Section 4, guarantee |
| 120 | that the following constraints are met with respect to the bus type's run-time |
| 121 | PM callbacks: |
| 122 | |
| 123 | (1) The callbacks are mutually exclusive (e.g. it is forbidden to execute |
| 124 | ->runtime_suspend() in parallel with ->runtime_resume() or with another |
| 125 | instance of ->runtime_suspend() for the same device) with the exception that |
| 126 | ->runtime_suspend() or ->runtime_resume() can be executed in parallel with |
| 127 | ->runtime_idle() (although ->runtime_idle() will not be started while any |
| 128 | of the other callbacks is being executed for the same device). |
| 129 | |
| 130 | (2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active' |
| 131 | devices (i.e. the PM core will only execute ->runtime_idle() or |
| 132 | ->runtime_suspend() for the devices the run-time PM status of which is |
| 133 | 'active'). |
| 134 | |
| 135 | (3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device |
| 136 | the usage counter of which is equal to zero _and_ either the counter of |
| 137 | 'active' children of which is equal to zero, or the 'power.ignore_children' |
| 138 | flag of which is set. |
| 139 | |
| 140 | (4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the |
| 141 | PM core will only execute ->runtime_resume() for the devices the run-time |
| 142 | PM status of which is 'suspended'). |
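
Constraints (2), (3) and (4) amount to a predicate on the device's status and
counters. The following userspace sketch expresses it directly; the names
'struct gate_model', 'model_may_run()' and the enums are hypothetical, and
constraint (1), mutual exclusion, is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel identifiers. */
enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };
enum model_cb { CB_IDLE, CB_SUSPEND, CB_RESUME };

struct gate_model {
	enum model_status status;
	int usage_count;
	int child_count;
	bool ignore_children;
};

/* Return true if constraints (2)-(4) allow running callback 'cb'. */
bool model_may_run(const struct gate_model *dev, enum model_cb cb)
{
	assert(dev);

	if (cb == CB_RESUME)
		return dev->status == MODEL_SUSPENDED;		/* (4) */

	if (dev->status != MODEL_ACTIVE)			/* (2) */
		return false;
	if (dev->usage_count != 0)				/* (3) */
		return false;
	return dev->child_count == 0 || dev->ignore_children;	/* (3) */
}
```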
| 143 | |
| 144 | Additionally, the helper functions provided by the PM core obey the following |
| 145 | rules: |
| 146 | |
| 147 | * If ->runtime_suspend() is about to be executed or there's a pending request |
| 148 | to execute it, ->runtime_idle() will not be executed for the same device. |
| 149 | |
| 150 | * A request to execute or to schedule the execution of ->runtime_suspend() |
| 151 | will cancel any pending requests to execute ->runtime_idle() for the same |
| 152 | device. |
| 153 | |
| 154 | * If ->runtime_resume() is about to be executed or there's a pending request |
| 155 | to execute it, the other callbacks will not be executed for the same device. |
| 156 | |
| 157 | * A request to execute ->runtime_resume() will cancel any pending or |
| 158 | scheduled requests to execute the other callbacks for the same device. |
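
The rules above give the requests an effective priority: resume over suspend
over idle. A minimal userspace sketch of that ordering follows, assuming a
single pending-request slot per device. The names 'struct req_model' and
'model_submit()' are hypothetical, the use of -EAGAIN for a blocked request is
an arbitrary choice made for the model, and timer-based scheduling is left
out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model; none of these names are kernel identifiers. */
enum model_req { REQ_NONE, REQ_IDLE, REQ_SUSPEND, REQ_RESUME };

struct req_model {
	enum model_req pending;	/* at most one pending request */
};

/*
 * Try to record a new request.  Returns 0 if 'req' is now the pending
 * request, or -EAGAIN (chosen arbitrarily for this model) if a
 * higher-priority request blocks it.
 */
int model_submit(struct req_model *dev, enum model_req req)
{
	assert(dev);

	switch (req) {
	case REQ_RESUME:
		/* Resume cancels any pending idle or suspend request. */
		dev->pending = REQ_RESUME;
		return 0;
	case REQ_SUSPEND:
		if (dev->pending == REQ_RESUME)
			return -EAGAIN;
		/* Suspend cancels a pending idle request. */
		dev->pending = REQ_SUSPEND;
		return 0;
	case REQ_IDLE:
		if (dev->pending == REQ_SUSPEND || dev->pending == REQ_RESUME)
			return -EAGAIN;
		dev->pending = REQ_IDLE;
		return 0;
	default:
		return -EINVAL;
	}
}
```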
| 159 | |
| 160 | 3. Run-time PM Device Fields |
| 161 | |
| 162 | The following device run-time PM fields are present in 'struct dev_pm_info', as |
| 163 | defined in include/linux/pm.h: |
| 164 | |
| 165 | struct timer_list suspend_timer; |
| 166 | - timer used for scheduling (delayed) suspend request |
| 167 | |
| 168 | unsigned long timer_expires; |
| 169 | - timer expiration time, in jiffies (if this is different from zero, the |
| 170 | timer is running and will expire at that time, otherwise the timer is not |
| 171 | running) |
| 172 | |
| 173 | struct work_struct work; |
| 174 | - work structure used for queuing up requests (i.e. work items in pm_wq) |
| 175 | |
| 176 | wait_queue_head_t wait_queue; |
| 177 | - wait queue used if any of the helper functions needs to wait for another |
| 178 | one to complete |
| 179 | |
| 180 | spinlock_t lock; |
| 181 | - lock used for synchronization |
| 182 | |
| 183 | atomic_t usage_count; |
| 184 | - the usage counter of the device |
| 185 | |
| 186 | atomic_t child_count; |
| 187 | - the count of 'active' children of the device |
| 188 | |
| 189 | unsigned int ignore_children; |
| 190 | - if set, the value of child_count is ignored (but still updated) |
| 191 | |
| 192 | unsigned int disable_depth; |
| 193 | - used for disabling the helper functions (they work normally if this is |
| 194 | equal to zero); the initial value of it is 1 (i.e. run-time PM is |
| 195 | initially disabled for all devices) |
| 196 | |
| 197 | unsigned int runtime_error; |
| 198 | - if set, there was a fatal error (one of the callbacks returned an error code |
| 199 | as described in Section 2), so the helper functions will not work until |
| 200 | this flag is cleared; this is the error code returned by the failing |
| 201 | callback |
| 202 | |
| 203 | unsigned int idle_notification; |
| 204 | - if set, ->runtime_idle() is being executed |
| 205 | |
| 206 | unsigned int request_pending; |
| 207 | - if set, there's a pending request (i.e. a work item queued up into pm_wq) |
| 208 | |
| 209 | enum rpm_request request; |
| 210 | - type of request that's pending (valid if request_pending is set) |
| 211 | |
| 212 | unsigned int deferred_resume; |
| 213 | - set if ->runtime_resume() is about to be run while ->runtime_suspend() is |
| 214 | being executed for that device and it is not practical to wait for the |
| 215 | suspend to complete; means "start a resume as soon as you've suspended" |
| 216 | |
| 217 | enum rpm_status runtime_status; |
| 218 | - the run-time PM status of the device; this field's initial value is |
| 219 | RPM_SUSPENDED, which means that each device is initially regarded by the |
| 220 | PM core as 'suspended', regardless of its real hardware status |
| 221 | |
| 222 | All of the above fields are members of the 'power' member of 'struct device'. |
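
The role of 'disable_depth' can be illustrated with a userspace sketch. It
assumes, as the field's name suggests, that disabling increments the depth and
enabling decrements it, so that nested disable/enable pairs balance. The names
'struct depth_model' and the 'model_*()' functions are hypothetical, and
'model_helper()' stands for any helper that refuses to run while run-time PM
is disabled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model; none of these names are kernel identifiers. */
struct depth_model {
	unsigned int disable_depth;	/* helpers run only when this is 0 */
};

void model_init(struct depth_model *dev)
{
	dev->disable_depth = 1;		/* run-time PM starts out disabled */
}

void model_disable(struct depth_model *dev)
{
	dev->disable_depth++;
}

void model_enable(struct depth_model *dev)
{
	assert(dev->disable_depth > 0);	/* an unbalanced enable is a bug */
	dev->disable_depth--;
}

/* Stand-in for any helper that is gated on disable_depth. */
int model_helper(const struct depth_model *dev)
{
	return dev->disable_depth ? -EAGAIN : 0;
}
```

With this scheme two callers can each disable and re-enable run-time PM
without knowing about one another; the helpers only start working again once
the last of them has called the enable function.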
| 223 | |
| 224 | 4. Run-time PM Device Helper Functions |
| 225 | |
| 226 | The following run-time PM helper functions are defined in |
| 227 | drivers/base/power/runtime.c and include/linux/pm_runtime.h: |
| 228 | |
| 229 | void pm_runtime_init(struct device *dev); |
| 230 | - initialize the device run-time PM fields in 'struct dev_pm_info' |
| 231 | |
| 232 | void pm_runtime_remove(struct device *dev); |
| 233 | - make sure that the run-time PM of the device will be disabled after |
| 234 | removing the device from the device hierarchy |
| 235 | |
| 236 | int pm_runtime_idle(struct device *dev); |
| 237 | - execute ->runtime_idle() for the device's bus type; returns 0 on success |
| 238 | or error code on failure, where -EINPROGRESS means that ->runtime_idle() |
| 239 | is already being executed |
| 240 | |
| 241 | int pm_runtime_suspend(struct device *dev); |
| 242 | - execute ->runtime_suspend() for the device's bus type; returns 0 on |
| 243 | success, 1 if the device's run-time PM status was already 'suspended', or |
| 244 | error code on failure, where -EAGAIN or -EBUSY means it is safe to attempt |
| 245 | to suspend the device again in future |
| 246 | |
| 247 | int pm_runtime_resume(struct device *dev); |
| 248 | - execute ->runtime_resume() for the device's bus type; returns 0 on |
| 249 | success, 1 if the device's run-time PM status was already 'active' or |
| 250 | error code on failure, where -EAGAIN means it may be safe to attempt to |
| 251 | resume the device again in future, but 'power.runtime_error' should be |
| 252 | checked additionally |
| 253 | |
| 254 | int pm_request_idle(struct device *dev); |
| 255 | - submit a request to execute ->runtime_idle() for the device's bus type |
| 256 | (the request is represented by a work item in pm_wq); returns 0 on success |
| 257 | or error code if the request has not been queued up |
| 258 | |
| 259 | int pm_schedule_suspend(struct device *dev, unsigned int delay); |
| 260 | - schedule the execution of ->runtime_suspend() for the device's bus type |
| 261 | in future, where 'delay' is the time to wait before queuing up a suspend |
| 262 | work item in pm_wq, in milliseconds (if 'delay' is zero, the work item is |
| 263 | queued up immediately); returns 0 on success, 1 if the device's run-time |
| 264 | PM status was already 'suspended', or error code if the request |
| 265 | hasn't been scheduled (or queued up if 'delay' is 0); if the execution of |
| 266 | ->runtime_suspend() is already scheduled and not yet expired, the new |
| 267 | value of 'delay' will be used as the time to wait |
| 268 | |
| 269 | int pm_request_resume(struct device *dev); |
| 270 | - submit a request to execute ->runtime_resume() for the device's bus type |
| 271 | (the request is represented by a work item in pm_wq); returns 0 on |
| 272 | success, 1 if the device's run-time PM status was already 'active', or |
| 273 | error code if the request hasn't been queued up |
| 274 | |
| 275 | void pm_runtime_get_noresume(struct device *dev); |
| 276 | - increment the device's usage counter |
| 277 | |
| 278 | int pm_runtime_get(struct device *dev); |
| 279 | - increment the device's usage counter, run pm_request_resume(dev) and |
| 280 | return its result |
| 281 | |
| 282 | int pm_runtime_get_sync(struct device *dev); |
| 283 | - increment the device's usage counter, run pm_runtime_resume(dev) and |
| 284 | return its result |
| 285 | |
| 286 | void pm_runtime_put_noidle(struct device *dev); |
| 287 | - decrement the device's usage counter |
| 288 | |
| 289 | int pm_runtime_put(struct device *dev); |
| 290 | - decrement the device's usage counter, run pm_request_idle(dev) and return |
| 291 | its result |
| 292 | |
| 293 | int pm_runtime_put_sync(struct device *dev); |
| 294 | - decrement the device's usage counter, run pm_runtime_idle(dev) and return |
| 295 | its result |
| 296 | |
| 297 | void pm_runtime_enable(struct device *dev); |
| 298 | - enable the run-time PM helper functions to run the device bus type's |
| 299 | run-time PM callbacks described in Section 2 |
| 300 | |
| 301 | int pm_runtime_disable(struct device *dev); |
| 302 | - prevent the run-time PM helper functions from running the device bus |
| 303 | type's run-time PM callbacks, make sure that all of the pending run-time |
| 304 | PM operations on the device are either completed or canceled; returns |
| 305 | 1 if there was a resume request pending and it was necessary to execute |
| 306 | ->runtime_resume() for the device's bus type to satisfy that request, |
| 307 | otherwise 0 is returned |
| 308 | |
| 309 | void pm_suspend_ignore_children(struct device *dev, bool enable); |
| 310 | - set/unset the power.ignore_children flag of the device |
| 311 | |
| 312 | int pm_runtime_set_active(struct device *dev); |
| 313 | - clear the device's 'power.runtime_error' flag, set the device's run-time |
| 314 | PM status to 'active' and update its parent's counter of 'active' |
| 315 | children as appropriate (it is only valid to use this function if |
| 316 | 'power.runtime_error' is set or 'power.disable_depth' is greater than |
| 317 | zero); it will fail and return an error code if the device has a parent |
| 318 | which is not active and the 'power.ignore_children' flag of which is unset |
| 319 | |
| 320 | void pm_runtime_set_suspended(struct device *dev); |
| 321 | - clear the device's 'power.runtime_error' flag, set the device's run-time |
| 322 | PM status to 'suspended' and update its parent's counter of 'active' |
| 323 | children as appropriate (it is only valid to use this function if |
| 324 | 'power.runtime_error' is set or 'power.disable_depth' is greater than |
| 325 | zero) |
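
Because pm_schedule_suspend() takes its delay in milliseconds while
'power.timer_expires' is kept in jiffies, with zero reserved for "timer not
running", the expiration time has to be converted and kept different from
zero. The userspace sketch below shows one way to do that. 'MODEL_HZ',
'model_msecs_to_ticks()' and 'model_timer_expires()' are hypothetical names,
the tick rate is an assumed value, and nudging a wrapped result to 1 is an
assumption about how the invariant could be preserved, not a statement about
the PM core's code.

```c
#include <assert.h>

#define MODEL_HZ 250ULL	/* assumed tick rate; real kernels configure HZ */

/* Round milliseconds up to whole ticks at MODEL_HZ. */
unsigned long model_msecs_to_ticks(unsigned int ms)
{
	return (unsigned long)((ms * MODEL_HZ + 999ULL) / 1000ULL);
}

/*
 * Compute a value for a timer_expires-like field.  Zero means "timer not
 * running", so a sum that wraps around to zero is nudged to 1.
 */
unsigned long model_timer_expires(unsigned long now, unsigned int delay_ms)
{
	unsigned long expires;

	assert(delay_ms > 0);	/* a zero delay queues the work item directly */
	expires = now + model_msecs_to_ticks(delay_ms);
	return expires ? expires : 1;
}
```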
| 326 | |
| 327 | It is safe to execute the following helper functions from interrupt context: |
| 328 | |
| 329 | pm_request_idle() |
| 330 | pm_schedule_suspend() |
| 331 | pm_request_resume() |
| 332 | pm_runtime_get_noresume() |
| 333 | pm_runtime_get() |
| 334 | pm_runtime_put_noidle() |
| 335 | pm_runtime_put() |
| 336 | pm_suspend_ignore_children() |
| 337 | pm_runtime_set_active() |
| 338 | pm_runtime_set_suspended() |
| 339 | pm_runtime_enable() |
| 340 | |
| 341 | 5. Run-time PM Initialization, Device Probing and Removal |
| 342 | |
| 343 | Initially, the run-time PM is disabled for all devices, which means that the |
| 344 | majority of the run-time PM helper functions described in Section 4 will return |
| 345 | -EAGAIN until pm_runtime_enable() is called for the device. |
| 346 | |
| 347 | In addition to that, the initial run-time PM status of all devices is |
| 348 | 'suspended', but it need not reflect the actual physical state of the device. |
| 349 | Thus, if the device is initially active (i.e. it is able to process I/O), its |
| 350 | run-time PM status must be changed to 'active', with the help of |
| 351 | pm_runtime_set_active(), before pm_runtime_enable() is called for the device. |
| 352 | |
| 353 | However, if the device has a parent and the parent's run-time PM is enabled, |
| 354 | calling pm_runtime_set_active() for the device will affect the parent, unless |
| 355 | the parent's 'power.ignore_children' flag is set. Namely, in that case the |
| 356 | parent won't be able to suspend at run time, using the PM core's helper |
| 357 | functions, as long as the child's status is 'active', even if the child's |
| 358 | run-time PM is still disabled (i.e. pm_runtime_enable() hasn't been called for |
| 359 | the child yet or pm_runtime_disable() has been called for it). For this reason, |
| 360 | once pm_runtime_set_active() has been called for the device, pm_runtime_enable() |
| 361 | should be called for it too as soon as reasonably possible or its run-time PM |
| 362 | status should be changed back to 'suspended' with the help of |
| 363 | pm_runtime_set_suspended(). |
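
The effect of a child's status on its parent can be sketched in userspace as
follows. 'struct tree_model' and the 'model_*()' functions are hypothetical
stand-ins for the status-setting helpers, -EBUSY is an assumed error code
(the text above only says that an error code is returned), and locking is
omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel identifiers. */
enum model_status { MODEL_SUSPENDED, MODEL_ACTIVE };

struct tree_model {
	struct tree_model *parent;
	enum model_status status;
	int child_count;		/* number of 'active' children */
	bool ignore_children;
};

/* Mark a device 'active' while its run-time PM is still disabled. */
int model_set_active(struct tree_model *dev)
{
	struct tree_model *p;

	assert(dev);
	p = dev->parent;
	if (dev->status == MODEL_ACTIVE)
		return 0;
	if (p) {
		/* Fail if the parent is not active and does not ignore children. */
		if (p->status != MODEL_ACTIVE && !p->ignore_children)
			return -EBUSY;	/* assumed error code */
		p->child_count++;	/* updated even if ignore_children is set */
	}
	dev->status = MODEL_ACTIVE;
	return 0;
}

/* Mark a device 'suspended' and release its hold on the parent. */
void model_set_suspended(struct tree_model *dev)
{
	assert(dev);
	if (dev->status == MODEL_SUSPENDED)
		return;
	if (dev->parent)
		dev->parent->child_count--;
	dev->status = MODEL_SUSPENDED;
}

/* The parent may suspend only if no 'active' child blocks it. */
bool model_parent_may_suspend(const struct tree_model *p)
{
	return p->child_count == 0 || p->ignore_children;
}
```

In the model, a child marked 'active' blocks its parent until it is marked
'suspended' again, regardless of whether run-time PM has been enabled for the
child, which is why the window between the two calls should be kept short.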
| 364 | |
| 365 | If the default initial run-time PM status of the device (i.e. 'suspended') |
| 366 | reflects the actual state of the device, its bus type's or its driver's |
| 367 | ->probe() callback will likely need to wake it up using one of the PM core's |
| 368 | helper functions described in Section 4. In that case, pm_runtime_resume() |
| 369 | should be used. Of course, for this purpose the device's run-time PM has to be |
| 370 | enabled earlier by calling pm_runtime_enable(). |
| 371 | |
| 372 | If the device bus type's or driver's ->probe() or ->remove() callback runs |
| 373 | pm_runtime_suspend() or pm_runtime_idle() or their asynchronous counterparts, |
| 374 | they will fail, returning -EAGAIN, because the device's usage counter is |
| 375 | incremented by the core before executing ->probe() and ->remove(). Still, it |
| 376 | may be desirable to suspend the device as soon as ->probe() or ->remove() has |
| 377 | finished, so the PM core uses pm_runtime_put_sync() to invoke the device bus |
| 378 | type's ->runtime_idle() callback at that time. |