[SCSI] ipr: fix eeh recovery for 64-bit adapters

In some scenarios, an EEH error can take a long time to be detected,
since the driver issues an MMIO read only after a device reset command
times out and we try to reset the adapter. This patch adds some code in
ipr_cancel_op() to read a hardware register so we detect the error
earlier in case the op is being aborted because of a timeout caused by a
frozen adapter slot.

Another problem in such scenarios is that in __ipr_eh_host_reset() we
change the dump state flag from WAIT_FOR_DUMP to GET_DUMP, and the flag
is later changed from GET_DUMP to READ_DUMP in
ipr_reset_restore_cfg_space(). However, if ipr_reset_restore_cfg_space()
has already been called by the PCI EEH code by the time the SCSI error
handling calls __ipr_eh_host_reset(), we end up with the flag in an
inconsistent state. This patch also prevents this problem.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

commit: a92fa25c63a788758bd52e9123504d133210c8b7
author: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Mon Jan 16 19:30:25 2012 -0200
committer: James Bottomley <JBottomley@Parallels.com> Sat Feb 18 08:33:13 2012 -0600
tree: bec201c46aa5ae21f2006bb87ad744741b35e241
parent: 7fbd764881a5f9dc81a378293b7a74227fcc04ed
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 67b169b..b538f08 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c

@@ -4613,11 +4613,13 @@
 	ENTER;
 	ioa_cfg = (struct ipr_ioa_cfg *) scsi_cmd->device->host->hostdata;
 
-	dev_err(&ioa_cfg->pdev->dev,
-		"Adapter being reset as a result of error recovery.\n");
+	if (!ioa_cfg->in_reset_reload) {
+		dev_err(&ioa_cfg->pdev->dev,
+			"Adapter being reset as a result of error recovery.\n");
 
-	if (WAIT_FOR_DUMP == ioa_cfg->sdt_state)
-		ioa_cfg->sdt_state = GET_DUMP;
+		if (WAIT_FOR_DUMP == ioa_cfg->sdt_state)
+			ioa_cfg->sdt_state = GET_DUMP;
+	}
 
 	rc = ipr_reset_reload(ioa_cfg, IPR_SHUTDOWN_ABBREV);
 
@@ -4907,7 +4909,7 @@
 	struct ipr_ioa_cfg *ioa_cfg;
 	struct ipr_resource_entry *res;
 	struct ipr_cmd_pkt *cmd_pkt;
-	u32 ioasc;
+	u32 ioasc, int_reg;
 	int op_found = 0;
 
 	ENTER;
@@ -4920,7 +4922,17 @@
 	 */
 	if (ioa_cfg->in_reset_reload || ioa_cfg->ioa_is_dead)
 		return FAILED;
-	if (!res || !ipr_is_gscsi(res))
+	if (!res)
+		return FAILED;
+
+	/*
+	 * If we are aborting a timed out op, chances are that the timeout was caused
+	 * by a still not detected EEH error. In such cases, reading a register will
+	 * trigger the EEH recovery infrastructure.
+	 */
+	int_reg = readl(ioa_cfg->regs.sense_interrupt_reg);
+
+	if (!ipr_is_gscsi(res))
 		return FAILED;
 
 	list_for_each_entry(ipr_cmd, &ioa_cfg->pending_q, queue) {
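
The dump state problem described in the commit message can be illustrated with a small user-space model. This is only a sketch, not driver code: `struct ioa_model`, the `model_*` helpers, `run_late_eh_reset()` and the `INACTIVE` state are hypothetical names introduced here, and the model assumes that `in_reset_reload` is already set when the PCI EEH code has started its own reset. It shows how an unguarded update in the host reset path can leave the flag at GET_DUMP after the GET_DUMP to READ_DUMP step has already run, and how the `in_reset_reload` check avoids that.

```c
#include <stdbool.h>

/* States named in the commit message; INACTIVE is a hypothetical placeholder. */
enum sdt_state { INACTIVE, WAIT_FOR_DUMP, GET_DUMP, READ_DUMP };

/* Hypothetical stand-in for the two fields of struct ipr_ioa_cfg used here. */
struct ioa_model {
	enum sdt_state sdt_state;
	bool in_reset_reload;
};

/* Models the GET_DUMP -> READ_DUMP step attributed to ipr_reset_restore_cfg_space(). */
static void model_restore_cfg_space(struct ioa_model *ioa)
{
	if (ioa->sdt_state == GET_DUMP)
		ioa->sdt_state = READ_DUMP;
}

/* Models the flag update in __ipr_eh_host_reset(), with or without the new guard. */
static void model_eh_host_reset(struct ioa_model *ioa, bool guarded)
{
	if (guarded && ioa->in_reset_reload)
		return;	/* a reset is already in flight: leave the flag alone */

	if (ioa->sdt_state == WAIT_FOR_DUMP)
		ioa->sdt_state = GET_DUMP;
}

/*
 * Ordering from the commit message: the EEH-driven reset restores config
 * space first, and the SCSI error handler's host reset arrives afterwards.
 */
static enum sdt_state run_late_eh_reset(bool guarded)
{
	struct ioa_model ioa = {
		.sdt_state = WAIT_FOR_DUMP,
		.in_reset_reload = true,	/* assumption: set by the EEH-driven reset */
	};

	model_restore_cfg_space(&ioa);		/* no effect: state is WAIT_FOR_DUMP */
	model_eh_host_reset(&ioa, guarded);	/* late call from SCSI error handling */

	return ioa.sdt_state;
}
```

In this model, `run_late_eh_reset(false)` ends in GET_DUMP with no later step left to advance it, while `run_late_eh_reset(true)` leaves the state at WAIT_FOR_DUMP.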