sched: hmp: Ensure that best_cluster() never returns NULL

Under certain conditions, especially when thermal conditions are changing,
group_will_fit() may return 0 for every cluster in the system, leaving
best_cluster() with nothing to return but NULL. This can result in
crashes such as this one:

        CPU 0                    |               CPU 1
====================================================================
select_best_cpu()                |
 -> env.rtg = rtgA               |
    rtgA.pref_cluster=C_big      |
                                 |   set_pref_cluster() for rtgA
                                 |     -> best_cluster()
                                 |        C_little doesn't fit
                                 |
                                 |   IRQ: thermal mitigation
                                 |   C_big capacity now less
                                 |   than C_little capacity
                                 |
                                 |     -> best_cluster() continues
                                 |        C_big doesn't fit
                                 |   set_pref_cluster() sets
                                 |   rtgA.pref_cluster = NULL
                                 |
select_least_power_cluster()     |
  -> cluster_first_cpu()         |
     -> BUG()                    |

Adding lock protection around accesses to the group's preferred cluster
would be expensive and would defeat the purpose of using RCU to protect
access to the related_thread_group structure. Therefore, ensure that
best_cluster() can never return NULL. In the worst case, we'll select the
wrong cluster for a related_thread_group's demand, but this should be
corrected at the next tick or wakeup. Locking would still have led to the
same momentary wrong decision, only at additional expense.
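
As a rough illustration of the idea (not the actual patch), the fallback
can be sketched in plain C as below. The struct layout, the simplified
group_will_fit() signature and the choice of the last cluster as the
fallback are all assumptions made for this sketch; the real scheduler
code uses its own types and its own fallback cluster.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the scheduler's per-cluster data. */
struct cluster {
	int id;
	unsigned long capacity;
};

/* Simplified, hypothetical fit check: non-zero if the demand fits. */
static int group_will_fit(const struct cluster *c, unsigned long demand)
{
	return c->capacity >= demand;
}

/*
 * Return the first cluster that fits the demand. If no cluster fits,
 * fall back to the last cluster instead of returning NULL, so callers
 * can always dereference the result. The fallback choice is arbitrary
 * in this sketch.
 */
static const struct cluster *
best_cluster(const struct cluster *clusters, size_t n, unsigned long demand)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		if (group_will_fit(&clusters[i], demand))
			return &clusters[i];
	}

	return &clusters[n - 1];
}
```

With this shape, a concurrent reader may briefly see a cluster that is
not ideal for the group's demand, but never a NULL pointer.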

Also, don't set the preferred cluster to NULL when colocation is disabled.

Change-Id: Id3f514b149add9b3ed33d104fa6a9bd57bec27e2
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>