sfc: Workaround flush failures on Falcon B0
Under certain conditions a PHY may backpressure Falcon B0
in such a way that flushes timeout. In normal circumstances
the phy poller would fix the PHY, and the flush could complete.
But efx_nic_flush_queues() is always called after efx_stop_all(),
so the poller has been stopped. Even if this weren't the case,
how long would we have to wait for the poller to fix this? And
several callers of efx_nic_flush_queues() are about to reset
the device anyway - so we don't need to do anything.
Work around this bug by scheduling a reset. Ensure that the
MAC is never rewired back into the datapath before the reset
runs (we already ignore all rx events anyway).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index 0319000..d1a1d32 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -27,6 +27,7 @@
#include "nic.h"
#include "mcdi.h"
+#include "workarounds.h"
/**************************************************************************
*
@@ -556,10 +557,18 @@
BUG_ON(efx->port_enabled);
rc = efx_nic_flush_queues(efx);
- if (rc)
+ if (rc && EFX_WORKAROUND_7803(efx)) {
+ /* Schedule a reset to recover from the flush failure. The
+ * descriptor caches reference memory we're about to free,
+ * but falcon_reconfigure_mac_wrapper() won't reconnect
+ * the MACs because of the pending reset. */
+ EFX_ERR(efx, "Resetting to recover from flush failure\n");
+ efx_schedule_reset(efx, RESET_TYPE_ALL);
+ } else if (rc) {
EFX_ERR(efx, "failed to flush queues\n");
- else
+ } else {
EFX_LOG(efx, "successfully flushed all queues\n");
+ }
efx_for_each_channel(channel, efx) {
EFX_LOG(channel->efx, "shut down chan %d\n", channel->channel);