Blame - Documentation/scsi/scsi_eh.txt - kernel/msm-4.19

SCSI EH
======================================

This document describes SCSI midlayer error handling infrastructure.
Please refer to Documentation/scsi/scsi_mid_low_api.txt for more
information regarding SCSI midlayer.

TABLE OF CONTENTS

[1] How SCSI commands travel through the midlayer and to EH
    [1-1] struct scsi_cmnd
    [1-2] How do scmd's get completed?
        [1-2-1] Completing a scmd w/ scsi_done
        [1-2-2] Completing a scmd w/ timeout
    [1-3] Asynchronous command aborts
    [1-4] How EH takes over
[2] How SCSI EH works
    [2-1] EH through fine-grained callbacks
        [2-1-1] Overview
        [2-1-2] Flow of scmds through EH
        [2-1-3] Flow of control
    [2-2] EH through transportt->eh_strategy_handler()
        [2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions
        [2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions
        [2-2-3] Things to consider


[1] How SCSI commands travel through the midlayer and to EH

[1-1] struct scsi_cmnd

Each SCSI command is represented with struct scsi_cmnd (== scmd). A
scmd has two list_head's to link itself into lists. The two are
scmd->list and scmd->eh_entry. The former is used for the free list or
the per-device allocated scmd list and is not of much interest to this
EH discussion. The latter is used for completion and EH lists and,
unless otherwise stated, scmds are always linked using scmd->eh_entry
in this discussion.


[1-2] How do scmd's get completed?

Once LLDD gets hold of a scmd, either the LLDD will complete the
command by calling the scsi_done callback passed from midlayer when
invoking hostt->queuecommand() or the block layer will time it out.


[1-2-1] Completing a scmd w/ scsi_done

For all non-EH commands, scsi_done() is the completion callback. It
just calls blk_complete_request() to delete the block layer timer and
raise SCSI_SOFTIRQ.

SCSI_SOFTIRQ handler scsi_softirq calls scsi_decide_disposition() to
determine what to do with the command. scsi_decide_disposition()
looks at the scmd->result value and sense data to make that decision.

 - SUCCESS
        scsi_finish_command() is invoked for the command. The
        function does some maintenance chores and then calls
        scsi_io_completion() to finish the I/O.
        scsi_io_completion() then notifies the block layer on
        the completed request by calling blk_end_request and
        friends or figures out what to do with the remainder
        of the data in case of an error.

 - NEEDS_RETRY
 - ADD_TO_MLQUEUE
        scmd is requeued to blk queue.

 - otherwise
        scsi_eh_scmd_add(scmd) is invoked for the command. See
        [1-4] for details of this function.
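
The disposition handling above can be sketched as a small userspace
model. This is not the kernel code: the enum names are hypothetical
stand-ins for the midlayer's disposition codes, and the returned value
only names the function that the text says is invoked next.

```c
#include <assert.h>

/* Hypothetical stand-ins for the values scsi_decide_disposition() returns. */
enum disposition {
	DISP_SUCCESS,
	DISP_NEEDS_RETRY,
	DISP_ADD_TO_MLQUEUE,
	DISP_FAILED		/* represents every "otherwise" case */
};

/* What the softirq handler does next, named after the text above. */
enum next_action {
	ACT_FINISH_COMMAND,	/* scsi_finish_command() */
	ACT_REQUEUE,		/* scmd goes back to the blk queue */
	ACT_EH_SCMD_ADD		/* scsi_eh_scmd_add(scmd) */
};

static enum next_action softirq_next_action(enum disposition d)
{
	switch (d) {
	case DISP_SUCCESS:
		return ACT_FINISH_COMMAND;
	case DISP_NEEDS_RETRY:
	case DISP_ADD_TO_MLQUEUE:
		return ACT_REQUEUE;
	default:
		/* Anything else is handed to error handling. */
		return ACT_EH_SCMD_ADD;
	}
}
```

In this model, assert(softirq_next_action(DISP_FAILED) ==
ACT_EH_SCMD_ADD) holds, and both requeue dispositions map to the same
action.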


[1-2-2] Completing a scmd w/ timeout

The timeout handler is scsi_times_out(). When a timeout occurs, this
function

 1. invokes optional hostt->eh_timed_out() callback. Return value can
    be one of

    - BLK_EH_HANDLED
        This indicates that eh_timed_out() dealt with the timeout.
        The command is passed back to the block layer and completed
        via __blk_complete_request().

        NOTE After returning BLK_EH_HANDLED the SCSI layer is
        assumed to be finished with the command, and no other
        functions from the SCSI layer will be called. So this
        should typically only be returned if the eh_timed_out()
        handler raced with normal completion.

    - BLK_EH_RESET_TIMER
        This indicates that more time is required to finish the
        command. Timer is restarted. This action is counted as a
        retry and only allowed scmd->allowed + 1(!) times. Once the
        limit is reached, action for BLK_EH_NOT_HANDLED is taken instead.

    - BLK_EH_NOT_HANDLED
        eh_timed_out() callback did not handle the command.
        Step #2 is taken.

 2. scsi_abort_command() is invoked to schedule an asynchronous abort.
    Asynchronous aborts are not invoked for commands for which the
    SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
    had already been aborted once, and this is a retry which failed),
    or when the EH deadline has expired. In these cases Step #3 is taken.

 3. scsi_eh_scmd_add(scmd) is invoked for the
    command. See [1-4] for more information.
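
The three steps above can be modeled in userspace as follows. This is
a sketch of the decision order only, not the body of scsi_times_out():
the enum names, the struct and its fields are hypothetical, and the
separate timer_resets counter is an assumption made to express the
"scmd->allowed + 1 times" limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the eh_timed_out() return values. */
enum timed_out_ret { RET_HANDLED, RET_RESET_TIMER, RET_NOT_HANDLED };

enum timeout_action {
	TO_COMPLETE,		/* hand back to the block layer */
	TO_RESTART_TIMER,	/* give the command more time */
	TO_ASYNC_ABORT,		/* step 2: scsi_abort_command() */
	TO_EH_SCMD_ADD		/* step 3: scsi_eh_scmd_add() */
};

struct timeout_model {
	int timer_resets;		/* assumed counter of timer restarts */
	int allowed;			/* models scmd->allowed */
	bool abort_scheduled;		/* models SCSI_EH_ABORT_SCHEDULED */
	bool eh_deadline_expired;	/* host-wide in reality, a flag here */
};

static enum timeout_action on_timeout(struct timeout_model *c,
				      enum timed_out_ret ret)
{
	assert(c->allowed >= 0);

	if (ret == RET_HANDLED)
		return TO_COMPLETE;

	if (ret == RET_RESET_TIMER && c->timer_resets < c->allowed + 1) {
		c->timer_resets++;
		return TO_RESTART_TIMER;
	}

	/* NOT_HANDLED, or RESET_TIMER once the limit is reached. */
	if (c->abort_scheduled || c->eh_deadline_expired)
		return TO_EH_SCMD_ADD;

	c->abort_scheduled = true;
	return TO_ASYNC_ABORT;
}
```

With allowed == 1, two RET_RESET_TIMER returns restart the timer, the
third falls through to an asynchronous abort, and a later timeout of the
same command goes to scsi_eh_scmd_add() because the abort flag is
already set.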

[1-3] Asynchronous command aborts

After a timeout occurs a command abort is scheduled from
scsi_abort_command(). If the abort is successful the command
will either be retried (if the number of retries is not exhausted)
or terminated with DID_TIME_OUT.
Otherwise scsi_eh_scmd_add() is invoked for the command.
See [1-4] for more information.

[1-4] How EH takes over

scmds enter EH via scsi_eh_scmd_add(), which does the following.

 1. Links scmd->eh_entry to shost->eh_cmd_q

 2. Sets SHOST_RECOVERY bit in shost->shost_state

 3. Increments shost->host_failed

 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed

As can be seen above, once any scmd is added to shost->eh_cmd_q,
SHOST_RECOVERY shost_state bit is turned on. This prevents any new
scmd from being issued from blk queue to the host; eventually, all
scmds on the host either complete normally, fail and get added to
eh_cmd_q, or time out and get added to shost->eh_cmd_q.

If all scmds either complete or fail, the number of in-flight scmds
becomes equal to the number of failed scmds - i.e. shost->host_busy ==
shost->host_failed. This wakes up SCSI EH thread. So, once woken up,
SCSI EH thread can expect that all in-flight commands have failed and
are linked on shost->eh_cmd_q.
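
The wake-up condition can be illustrated with a userspace model of the
counters involved. The struct and function names are hypothetical, the
queue is reduced to a length counter, and the wake-up check on normal
completion is this sketch's reading of the paragraph above rather than a
copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

struct shost_model {
	int host_busy;		/* commands in flight, failed ones included */
	int host_failed;	/* commands handed to EH */
	int eh_cmd_q_len;	/* stands in for the shost->eh_cmd_q list */
	bool recovery;		/* models the SHOST_RECOVERY state */
	bool eh_woken;		/* set once the EH thread would be woken */
};

/* Models steps 1-4 of scsi_eh_scmd_add(). */
static void model_eh_scmd_add(struct shost_model *h)
{
	assert(h->host_failed < h->host_busy);
	h->eh_cmd_q_len++;
	h->recovery = true;
	h->host_failed++;
	if (h->host_busy == h->host_failed)
		h->eh_woken = true;
}

/* Models a command that completes normally while others have failed. */
static void model_complete(struct shost_model *h)
{
	assert(h->host_busy > h->host_failed);
	h->host_busy--;
	if (h->recovery && h->host_busy == h->host_failed)
		h->eh_woken = true;
}
```

Starting from three in-flight commands, one failure does not wake the
EH thread; it is woken only after the remaining two have either
completed or failed.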

Note that this does not mean lower layers are quiescent. If a LLDD
completed a scmd with error status, the LLDD and lower layers are
assumed to forget about the scmd at that point. However, if a scmd
has timed out, unless hostt->eh_timed_out() made lower layers forget
about the scmd, which currently no LLDD does, the command is still
active as far as lower layers are concerned and completion could
occur at any time. Of course, all such completions are ignored as the
timer has already expired.

We'll talk about how SCSI EH takes actions to abort - make LLDD
forget about - timed out scmds later.


[2] How SCSI EH works

LLDD's can implement SCSI EH actions in one of the following two
ways.

 - Fine-grained EH callbacks
        LLDD can implement fine-grained EH callbacks and let SCSI
        midlayer drive error handling and call appropriate callbacks.
        This will be discussed further in [2-1].

 - eh_strategy_handler() callback
        This is one big callback which should perform the whole error
        handling. As such, it should do all chores the SCSI midlayer
        performs during recovery. This will be discussed in [2-2].

Once recovery is complete, SCSI EH resumes normal operation by
calling scsi_restart_operations(), which

 1. Checks if door locking is needed and locks door.

 2. Clears SHOST_RECOVERY shost_state bit

 3. Wakes up waiters on shost->host_wait. This occurs if someone
    calls scsi_block_when_processing_errors() on the host.
    (QUESTION why is it needed? All operations will be blocked
    anyway after it reaches blk queue.)

 4. Restarts the queues of all devices on the host.


[2-1] EH through fine-grained callbacks

[2-1-1] Overview

If eh_strategy_handler() is not present, SCSI midlayer takes charge
of driving error handling. EH has two goals: to make LLDD, host and
device forget about timed out scmds, and to make them ready for new
commands. A scmd is said to be recovered if the scmd is forgotten by
lower layers and lower layers are ready to process or fail the scmd
again.

To achieve these goals, EH performs recovery actions with increasing
severity. Some actions are performed by issuing SCSI commands and
others are performed by invoking one of the following fine-grained
hostt EH callbacks. Callbacks may be omitted and omitted ones are
considered to fail always.

    int (* eh_abort_handler)(struct scsi_cmnd *);
    int (* eh_device_reset_handler)(struct scsi_cmnd *);
    int (* eh_bus_reset_handler)(struct scsi_cmnd *);
    int (* eh_host_reset_handler)(struct scsi_cmnd *);

Higher-severity actions are taken only when lower-severity actions
cannot recover some of the failed scmds. Also, note that failure of the
highest-severity action means EH failure and results in offlining of
all unrecovered devices.

During recovery, the following rules are followed

 - Recovery actions are performed on failed scmds on the to do list,
   eh_work_q. If a recovery action succeeds for a scmd, recovered
   scmds are removed from eh_work_q.

   Note that a single recovery action on a scmd can recover multiple
   scmds. e.g. resetting a device recovers all failed scmds on the
   device.

 - Higher severity actions are taken iff eh_work_q is not empty after
   lower severity actions are complete.

 - EH reuses failed scmds to issue commands for recovery. For
   timed-out scmds, SCSI EH ensures that LLDD forgets about a scmd
   before reusing it for EH commands.

When a scmd is recovered, the scmd is moved from eh_work_q to EH
local eh_done_q using scsi_eh_finish_cmd(). After all scmds are
recovered (eh_work_q is empty), scsi_eh_flush_done_q() is invoked to
either retry or error-finish (notify upper layer of failure) recovered
scmds.

A scmd is retried iff its sdev is still online (not offlined during
EH), REQ_FAILFAST is not set and ++scmd->retries is less than
scmd->allowed.
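
The retry condition can be written as a single predicate. The sketch
below is a userspace model with a hypothetical struct; the comparison
follows the wording of the paragraph above ("less than") and should be
checked against the kernel source before relying on the exact boundary.

```c
#include <assert.h>
#include <stdbool.h>

struct retry_model {
	int retries;		/* models scmd->retries */
	int allowed;		/* models scmd->allowed */
	bool failfast;		/* models REQ_FAILFAST being set */
	bool sdev_online;	/* false if the sdev was offlined during EH */
};

/*
 * Returns true if the command would be retried. The retry counter is
 * incremented only when the first two conditions hold, because &&
 * short-circuits from left to right.
 */
static bool eh_should_retry(struct retry_model *c)
{
	assert(c->retries >= 0 && c->allowed >= 0);
	return c->sdev_online && !c->failfast && ++c->retries < c->allowed;
}
```

A command on an offlined device is never retried and its counter is
left untouched, which follows from the evaluation order in the
expression.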


[2-1-2] Flow of scmds through EH

 1. Error completion / time out
    ACTION: scsi_eh_scmd_add() is invoked for scmd
        - add scmd to shost->eh_cmd_q
        - set SHOST_RECOVERY
        - shost->host_failed++
    LOCKING: shost->host_lock

 2. EH starts
    ACTION: move all scmds to EH's local eh_work_q. shost->eh_cmd_q
            is cleared.
    LOCKING: shost->host_lock (not strictly necessary, just for
             consistency)

 3. scmd recovered
    ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
        - scsi_setup_cmd_retry()
        - move from local eh_work_q to local eh_done_q
    LOCKING: none
    CONCURRENCY: at most one thread per separate eh_work_q to
                 keep queue manipulation lockless

 4. EH completes
    ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper
            layer of failure. May be called concurrently but must have
            no more than one thread per separate eh_work_q to
            manipulate the queue locklessly
        - scmd is removed from eh_done_q and scmd->eh_entry is cleared
        - if retry is necessary, scmd is requeued using
          scsi_queue_insert()
        - otherwise, scsi_finish_command() is invoked for scmd
        - zero shost->host_failed
    LOCKING: queue or finish function performs appropriate locking


[2-1-3] Flow of control

EH through fine-grained callbacks starts from scsi_unjam_host().

<<scsi_unjam_host>>

    1. Lock shost->host_lock, splice_init shost->eh_cmd_q into local
       eh_work_q and unlock host_lock. Note that shost->eh_cmd_q is
       cleared by this action.

    2. Invoke scsi_eh_get_sense.

    <<scsi_eh_get_sense>>

        This action is taken for each error-completed
        (!SCSI_EH_CANCEL_CMD) command without valid sense data. Most
        SCSI transports/LLDDs automatically acquire sense data on
        command failures (autosense). Autosense is recommended for
        performance reasons and as sense information could get out of
        sync between occurrence of CHECK CONDITION and this action.

        Note that if autosense is not supported, scmd->sense_buffer
        contains invalid sense data when error-completing the scmd
        with scsi_done(). scsi_decide_disposition() always returns
        FAILED in such cases thus invoking SCSI EH. When the scmd
        reaches here, sense data is acquired and
        scsi_decide_disposition() is called again.

        1. Invoke scsi_request_sense() which issues REQUEST_SENSE
           command. If it fails, no action is taken. Note that taking
           no action causes higher-severity recovery to be taken for
           the scmd.

        2. Invoke scsi_decide_disposition() on the scmd

           - SUCCESS
                scmd->retries is set to scmd->allowed preventing
                scsi_eh_flush_done_q() from retrying the scmd and
                scsi_eh_finish_cmd() is invoked.

           - NEEDS_RETRY
                scsi_eh_finish_cmd() invoked

           - otherwise
                No action.

    3. If !list_empty(&eh_work_q), invoke scsi_eh_abort_cmds().

    <<scsi_eh_abort_cmds>>

        This action is taken for each timed out command when
        no_async_abort is enabled in the host template.
        hostt->eh_abort_handler() is invoked for each scmd. The
        handler returns SUCCESS if it has succeeded in making LLDD and
        all related hardware forget about the scmd.

        If a timed out scmd is successfully aborted and the sdev is
        either offline or ready, scsi_eh_finish_cmd() is invoked for
        the scmd. Otherwise, the scmd is left in eh_work_q for
        higher-severity actions.

        Note that both offline and ready status mean that the sdev is
        ready to process new scmds, where processing also implies
        immediate failing; thus, if a sdev is in one of the two
        states, no further recovery action is needed.

        Device readiness is tested using scsi_eh_tur() which issues
        TEST_UNIT_READY command. Note that the scmd must have been
        aborted successfully before reusing it for TEST_UNIT_READY.

    4. If !list_empty(&eh_work_q), invoke scsi_eh_ready_devs()

    <<scsi_eh_ready_devs>>

        This function takes four increasingly more severe measures to
        make failed sdevs ready for new commands.

        1. Invoke scsi_eh_stu()

        <<scsi_eh_stu>>

            For each sdev which has failed scmds with valid sense data
            of which scsi_check_sense()'s verdict is FAILED,
            START_STOP_UNIT command is issued w/ start=1. Note that
            as we explicitly choose error-completed scmds, it is known
            that lower layers have forgotten about the scmd and we can
            reuse it for STU.

            If STU succeeds and the sdev is either offline or ready,
            all failed scmds on the sdev are EH-finished with
            scsi_eh_finish_cmd().

            NOTE If hostt->eh_abort_handler() isn't implemented or
            failed, we may still have timed out scmds at this point
            and STU doesn't make lower layers forget about those
            scmds. Yet, this function EH-finishes all scmds on the sdev
            if STU succeeds, leaving lower layers in an inconsistent
            state. It seems that STU action should be taken only when
            a sdev has no timed out scmd.

        2. If !list_empty(&eh_work_q), invoke scsi_eh_bus_device_reset().

        <<scsi_eh_bus_device_reset>>

            This action is very similar to scsi_eh_stu() except that,
            instead of issuing STU, hostt->eh_device_reset_handler()
            is used. Also, as we're not issuing SCSI commands and
            resetting clears all scmds on the sdev, there is no need
            to choose error-completed scmds.

        3. If !list_empty(&eh_work_q), invoke scsi_eh_bus_reset()

        <<scsi_eh_bus_reset>>

            hostt->eh_bus_reset_handler() is invoked for each channel
            with failed scmds. If bus reset succeeds, all failed
            scmds on all ready or offline sdevs on the channel are
            EH-finished.

        4. If !list_empty(&eh_work_q), invoke scsi_eh_host_reset()

        <<scsi_eh_host_reset>>

            This is the last resort. hostt->eh_host_reset_handler()
            is invoked. If host reset succeeds, all failed scmds on
            all ready or offline sdevs on the host are EH-finished.

        5. If !list_empty(&eh_work_q), invoke scsi_eh_offline_sdevs()

        <<scsi_eh_offline_sdevs>>

            Take all sdevs which still have unrecovered scmds offline
            and EH-finish the scmds.

    5. Invoke scsi_eh_flush_done_q().

    <<scsi_eh_flush_done_q>>

        At this point all scmds are recovered (or given up) and
        put on eh_done_q by scsi_eh_finish_cmd(). This function
        flushes eh_done_q by either retrying or notifying upper
        layer of failure of the scmds.
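
The escalation described in this section can be condensed into a
userspace model. It is a deliberately simplified sketch, not
scsi_unjam_host(): the stage names and the struct are hypothetical,
each modeled command simply records which stage would recover it, and
the fact that one recovery action can recover several scmds at once is
ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the recovery stages, in increasing severity. */
enum eh_stage {
	STAGE_GET_SENSE, STAGE_ABORT, STAGE_STU, STAGE_DEVICE_RESET,
	STAGE_BUS_RESET, STAGE_HOST_RESET, STAGE_OFFLINE, STAGE_COUNT
};

struct scmd_model {
	enum eh_stage recovered_by;	/* stage that recovers this command */
	bool on_done_q;			/* must be false on entry */
};

/*
 * Runs stages until the modeled eh_work_q is empty and returns the last
 * stage that had to run, or -1 if there was nothing to do. The offline
 * stage gives up on every remaining command and EH-finishes it.
 */
static int unjam_host_model(struct scmd_model *cmds, size_t n)
{
	size_t work_q_len = n;
	int last_stage = -1;

	for (int stage = STAGE_GET_SENSE;
	     stage < STAGE_COUNT && work_q_len > 0; stage++) {
		last_stage = stage;
		for (size_t i = 0; i < n; i++) {
			if (cmds[i].on_done_q)
				continue;
			if (stage == STAGE_OFFLINE ||
			    (int)cmds[i].recovered_by == stage) {
				cmds[i].on_done_q = true;	/* EH-finish */
				work_q_len--;
			}
		}
	}
	assert(work_q_len == 0);	/* everything ends up on eh_done_q */
	return last_stage;
}
```

The loop condition mirrors the repeated "If !list_empty(&eh_work_q)"
checks: a more severe stage runs only if an earlier one left work
behind.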


[2-2] EH through transportt->eh_strategy_handler()

transportt->eh_strategy_handler() is invoked in the place of
scsi_unjam_host() and it is responsible for the whole recovery process.
On completion, the handler should have made lower layers forget about
all failed scmds and left them either ready for new commands or
offline. Also, it should perform SCSI EH maintenance chores to maintain
integrity of SCSI midlayer. IOW, of the steps described in [2-1-2], all
steps except for #1 must be implemented by eh_strategy_handler().


[2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions

The following conditions are true on entry to the handler.

 - Each failed scmd's eh_flags field is set appropriately.

 - Each failed scmd is linked on shost->eh_cmd_q by scmd->eh_entry.

 - SHOST_RECOVERY is set.

 - shost->host_failed == shost->host_busy


[2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions

The following conditions must be true on exit from the handler.

 - shost->host_failed is zero.

 - Each scmd is in such a state that scsi_setup_cmd_retry() on the
   scmd doesn't make any difference.

 - shost->eh_cmd_q is cleared.

 - Each scmd->eh_entry is cleared.

 - Either scsi_queue_insert() or scsi_finish_command() is called on
   each scmd. Note that the handler is free to use scmd->retries and
   ->allowed to limit the number of retries.
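
The exit conditions lend themselves to a checklist. The following
userspace sketch expresses them as a predicate over a hypothetical
model of the host and its commands. It reads "either ... or" as exactly
one of the two calls per scmd, which is an assumption, and it does not
model the scsi_setup_cmd_retry() condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct shost_exit_model {
	int host_failed;	/* models shost->host_failed */
	size_t eh_cmd_q_len;	/* number of entries left on eh_cmd_q */
};

struct scmd_exit_model {
	bool eh_entry_linked;	/* scmd->eh_entry still on a list */
	bool requeued;		/* scsi_queue_insert() was called */
	bool finished;		/* scsi_finish_command() was called */
};

/* Returns true if the modeled state satisfies the conditions above. */
static bool strategy_handler_exit_ok(const struct shost_exit_model *h,
				     const struct scmd_exit_model *cmds,
				     size_t n)
{
	assert(h != NULL && (cmds != NULL || n == 0));

	if (h->host_failed != 0 || h->eh_cmd_q_len != 0)
		return false;

	for (size_t i = 0; i < n; i++) {
		if (cmds[i].eh_entry_linked)
			return false;
		/* Exactly one of requeue or finish (assumed reading). */
		if (cmds[i].requeued == cmds[i].finished)
			return false;
	}
	return true;
}
```

A handler that retries one command and finishes another passes the
check, while one that leaves host_failed non-zero or forgets to dispose
of a command does not.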


[2-2-3] Things to consider

 - Know that timed out scmds are still active on lower layers. Make
   lower layers forget about them before doing anything else with
   those scmds.

 - For consistency, when accessing/modifying shost data structure,
   grab shost->host_lock.

 - On completion, each failed sdev must have forgotten about all
   active scmds.

 - On completion, each failed sdev must be ready for new commands or
   offline.


--
Tejun Heo
htejun@gmail.com
11th September 2005