mmc: msm_sdcc: fix race conditions in runtime PM
What is the race condition? (NOTE: RPM stands for Runtime Power Management)
1. SDCC is in RPM_SUSPENDED state when SDCC platform suspend gets
triggered, and then the system goes into suspend.
2. During platform resume, SDCC platform resume is triggered
which blindly sets the pending_resume flag in SDCC host structure.
3. When a new MMC transfer request arrives, the MMC block driver calls
mmc_claim_host(), which internally calls msmsdcc_enable().
4. msmsdcc_enable() checks whether the following 3 conditions are all true:
    1. host->sdcc_suspended == true
    2. host->pending_resume == true
    3. pm_runtime_suspended() == false
But as the SDCC state is RPM_SUSPENDED, the 3rd condition is not satisfied,
so msmsdcc_enable() does not clear the host->pending_resume flag and simply
calls pm_runtime_get_sync() to resume the SDCC. Once SDCC is resumed,
host->sdcc_suspended = false, runtime_status = RPM_ACTIVE but
host->pending_resume is still set.
5. After the RPM idle timeout, SDCC runtime suspend gets triggered, which
calls the SDCC driver's runtime suspend callback (msmsdcc_runtime_suspend).
Note that before executing the RPM suspend callback, the SDCC RPM status
is set to RPM_SUSPENDING.
6. When the SDCC driver's runtime suspend callback executes, it sets
host->sdcc_suspended to true. But before the RPM framework sets the
SDCC RPM status to RPM_SUSPENDED, a new MMC transfer request comes in on
another active CPU core. It first calls msmsdcc_enable(), and all 3 of
the conditions below are true:
    1. host->sdcc_suspended == true
    2. host->pending_resume == true
    3. pm_runtime_suspended() == false (RPM status is RPM_SUSPENDING)
As these conditions are satisfied, msmsdcc_enable() does not call
pm_runtime_get_sync(); instead it just calls pm_runtime_get_noresume() and
msmsdcc_runtime_resume(). This means that even after execution of
msmsdcc_enable(), the SDCC RPM status is either RPM_SUSPENDING or
RPM_SUSPENDED.
7. RPM suspend framework on 1st core sets the SDCC RPM status to
RPM_SUSPENDED once SDCC runtime suspend callback returns.
8. Once the MMC transfer request is completed on the other core, it calls
msmsdcc_disable(). This function calls pm_runtime_put_sync(), which
returns -EAGAIN as the RPM status is already RPM_SUSPENDED.
This error gets returned to the MMC layer, so the MMC layer thinks that
SDCC is still enabled and skips the call to msmsdcc_enable() until
msmsdcc_disable() succeeds.
9. Note that when msmsdcc_disable() returned the error, the RPM
usage_counter was already 0, so the next call to msmsdcc_disable()
decrements the RPM usage_counter to -1 (remember msmsdcc_enable() was
skipped).
10. Now a new MMC request comes in and first calls msmsdcc_enable(),
which calls pm_runtime_get_sync(). This sets the usage_counter to 0
and resumes the SDCC, as it was already suspended. After this the
RPM status is RPM_ACTIVE.
11. Once the MMC request is processed, it calls msmsdcc_disable(), which
calls pm_runtime_put_sync(). This decrements the RPM usage counter to -1
and skips scheduling the runtime suspend callback, as the RPM usage
counter is not 0. The RPM status remains RPM_ACTIVE.
12. From now on, steps 10 and 11 repeat for every new MMC transfer
request and SDCC stays in RPM_ACTIVE state forever.
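The sequence above can be replayed with a small user-space model. This is
only a sketch under stated assumptions: every model_* name and MODEL_*
constant is hypothetical, the RPM core is reduced to a status field and a
counter, and the two CPU cores are serialized into one ordered replay. It
is not the msm_sdcc driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the driver. */
enum model_rpm_status {
	MODEL_RPM_ACTIVE,
	MODEL_RPM_SUSPENDING,
	MODEL_RPM_SUSPENDED,
};

#define MODEL_EAGAIN 11

struct model_host {
	enum model_rpm_status status;	/* stands in for RPM runtime_status */
	int usage_count;		/* stands in for RPM usage counter */
	bool sdcc_suspended;
	bool pending_resume;
};

/* Pre-fix msmsdcc_enable() as described in steps 4 and 6 (assumed shape). */
static void model_enable_old(struct model_host *h)
{
	if (h->sdcc_suspended && h->pending_resume &&
	    h->status != MODEL_RPM_SUSPENDED) {
		h->pending_resume = false;
		h->usage_count++;		/* pm_runtime_get_noresume() */
		h->sdcc_suspended = false;	/* msmsdcc_runtime_resume() */
		return;				/* RPM status left untouched */
	}
	h->usage_count++;			/* pm_runtime_get_sync() */
	h->status = MODEL_RPM_ACTIVE;
	h->sdcc_suspended = false;
}

/* msmsdcc_disable(): models only the pm_runtime_put_sync() return value. */
static int model_disable(struct model_host *h)
{
	if (--h->usage_count != 0)
		return 0;		/* counter not 0: no suspend attempt */
	if (h->status != MODEL_RPM_ACTIVE)
		return -MODEL_EAGAIN;	/* already suspending or suspended */
	return 0;			/* suspend follows after idle timeout */
}

/* RPM core has marked the device suspending and run the driver callback. */
static void model_suspend_callback(struct model_host *h)
{
	assert(h->usage_count == 0);
	h->status = MODEL_RPM_SUSPENDING;
	h->sdcc_suspended = true;
}

/* Replays steps 1-9; returns what the first disable after the race saw. */
static int model_replay_race(struct model_host *h)
{
	int err;

	h->status = MODEL_RPM_SUSPENDED;	/* steps 1-2 */
	h->usage_count = 0;
	h->sdcc_suspended = true;
	h->pending_resume = true;

	model_enable_old(h);		/* steps 3-4: flag stays set */
	model_disable(h);		/* request done, counter back to 0 */
	model_suspend_callback(h);	/* steps 5-6, first core */
	model_enable_old(h);		/* step 6, other core: noresume path */
	h->status = MODEL_RPM_SUSPENDED;	/* step 7 */
	err = model_disable(h);		/* step 8: fails with -EAGAIN */
	model_disable(h);		/* step 9: counter drops to -1 */
	return err;
}
```

After model_replay_race(), each further model_enable_old() and
model_disable() pair moves the counter between 0 and -1 while the status
stays MODEL_RPM_ACTIVE, which mirrors the stuck state described above.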
How is this race condition fixed?
The above race arises because host->pending_resume stays set (sticky)
after it is no longer valid. The following changes ensure that
pending_resume gets set and cleared at the appropriate places:
1. In SDCC platform resume callback, set the host->pending_resume flag
only if RPM status is RPM_ACTIVE.
2. Clear the pending_resume flag once msmsdcc_runtime_resume() is
completed.
3. In msmsdcc_enable(), if the pending_resume flag is set, skip calling
pm_runtime_get_sync() and instead call msmsdcc_runtime_resume() directly,
because the RPM status is RPM_ACTIVE.
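The three changes can be sketched in the same kind of user-space model.
This is an illustration under assumptions, not the patch itself: all
fix_* names are hypothetical, the RPM core is reduced to a status field
and a counter, and the usage counter is assumed to be taken with
pm_runtime_get_noresume() on the pending_resume path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the driver. */
enum fix_rpm_status {
	FIX_RPM_ACTIVE,
	FIX_RPM_SUSPENDING,
	FIX_RPM_SUSPENDED,
};

struct fix_host {
	enum fix_rpm_status status;	/* stands in for RPM runtime_status */
	int usage_count;		/* stands in for RPM usage counter */
	bool sdcc_suspended;
	bool pending_resume;
};

/* Change 2: the flag is cleared once the runtime resume work is done. */
static void fix_runtime_resume(struct fix_host *h)
{
	h->sdcc_suspended = false;
	h->pending_resume = false;
}

/* Change 1: platform resume sets the flag only if RPM status is active. */
static void fix_platform_resume(struct fix_host *h)
{
	if (h->status == FIX_RPM_ACTIVE)
		h->pending_resume = true;
}

/* Change 3: with the flag set, resume directly instead of get_sync. */
static void fix_enable(struct fix_host *h)
{
	h->usage_count++;	/* get_noresume (assumed) or get_sync */
	if (h->pending_resume) {
		/* Invariant stated in the commit message, not proven here. */
		assert(h->status == FIX_RPM_ACTIVE);
		fix_runtime_resume(h);
		return;
	}
	h->status = FIX_RPM_ACTIVE;	/* effect of pm_runtime_get_sync() */
	fix_runtime_resume(h);
}
```

In this model, a platform resume that finds the device in
FIX_RPM_SUSPENDED leaves pending_resume clear, so the first request after
system resume takes the normal get_sync path and no stale flag survives
into a later runtime suspend.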
In addition, this patch adds WARN() messages in case of failures in
msmsdcc_enable() and msmsdcc_disable() functions to capture more details
in error conditions.
CRs-Fixed: 373338
Change-Id: I50d1bd63480c668dd2b83f01567f912661f0c606
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>