net: Reset NAPI bit if IPI failed

During hotplug if an RPS CPU goes offline,
then there is a possibility that the IPI
delivery to the RPS core might fail, this
happens in the cases when unruly drivers
use netif_rx API in the wrong context.

This happens due to two reasons

a) Firstly using netif_rx API in non preemptive
context leads to enough latencies that the IPI
delivery might fail to an RPS core. This is because
the softIRQ trigger will become unpredictable.

b) by using netif_rx it  becomes an architectural
issue where we are trying to do two things in two
different contexts. We set the NAPI bit in context
and sent the IPI in other context. Now since the
context switch is allowed, the remote CPU is allowed
to go finish its hotplug.

If there was no context switch in the first place,
which typically happens by either using the correct
version of netif_rx or switching to NAPI framework,
then the remote CPU is not allowed to go to CPU DOWN
state. This is by design since hotplug framework causes
 the remote dying CPU to wait until atleast one context
switch happens on all other CPUS. If preemption is
disabled then the dying CPU has to wait until preemption
is enabled and a context switch happens.

This patch catches these unruly drivers and handles
IPI misses by clearing NAPI sate on remote RPS CPUs

Please refere here for more documentation on hotplug
and preemption cases https://lwn.net/Articles/569686/

CRs-Fixed: 2062245
Change-Id: I072f91bdb4d7e444e3624e8e010ef1b66a67b1ed
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
1 file changed