SCSI EH
======================================

This document describes SCSI midlayer error handling infrastructure.
Please refer to Documentation/scsi/scsi_mid_low_api.txt for more
information regarding SCSI midlayer.

TABLE OF CONTENTS

[1] How SCSI commands travel through the midlayer and to EH
    [1-1] struct scsi_cmnd
    [1-2] How do scmd's get completed?
        [1-2-1] Completing a scmd w/ scsi_done
        [1-2-2] Completing a scmd w/ timeout
    [1-3] How EH takes over
[2] How SCSI EH works
    [2-1] EH through fine-grained callbacks
        [2-1-1] Overview
        [2-1-2] Flow of scmds through EH
        [2-1-3] Flow of control
    [2-2] EH through transportt->eh_strategy_handler()
        [2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions
        [2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions
        [2-2-3] Things to consider


[1] How SCSI commands travel through the midlayer and to EH

[1-1] struct scsi_cmnd

Each SCSI command is represented by struct scsi_cmnd (== scmd). A scmd
has two list_heads to link itself into lists: scmd->list and
scmd->eh_entry. The former is used for the free list or the per-device
list of allocated scmds and is not of much interest to this EH
discussion. The latter is used for completion and EH lists; unless
otherwise stated, scmds are always linked using scmd->eh_entry in this
discussion.
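To make the two-link arrangement concrete, here is a small user-space sketch. It re-implements a minimal circular list instead of using <linux/list.h>, and struct scmd below is a hypothetical stand-in that keeps only an id and the two links. It shows one scmd sitting on two lists at once and container_of() recovering the scmd from its eh_entry member.

```c
#include <stddef.h>

/* Minimal circular doubly linked list; a simplified user-space model,
 * not the kernel's <linux/list.h>. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Insert n just before head h, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily trimmed scmd: an id plus the two list links. */
struct scmd {
    int id;
    struct list_head list;      /* free list / per-device allocated list */
    struct list_head eh_entry;  /* completion and EH lists */
};

/* Count the entries linked on a list. */
static int list_count(struct list_head *q)
{
    int n = 0;
    for (struct list_head *p = q->next; p != q; p = p->next)
        n++;
    return n;
}

/* Id of the first scmd on a list whose entries are linked via eh_entry. */
static int first_eh_id(struct list_head *q)
{
    return container_of(q->next, struct scmd, eh_entry)->id;
}
```

Because each list uses a different member, an scmd can stay on its per-device list while independently being moved between completion and EH lists.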


[1-2] How do scmd's get completed?

Once an LLDD gets hold of a scmd, either the LLDD completes the command
by calling the scsi_done callback passed from the midlayer when
invoking hostt->queuecommand(), or the SCSI midlayer times it out.


[1-2-1] Completing a scmd w/ scsi_done

For all non-EH commands, scsi_done() is the completion callback. It
does the following.

 1. Delete the timeout timer. If that fails, the timer has already
    expired and the timeout handler is going to finish the command;
    just return.

 2. Link the scmd to the per-cpu scsi_done_q using scmd->eh_entry.

 3. Raise SCSI_SOFTIRQ.
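The three steps can be modelled in user space as follows. This is a hedged sketch, not the kernel implementation: the timer is reduced to a boolean, the per-cpu scsi_done_q to a singly linked FIFO, and raising SCSI_SOFTIRQ to setting a flag; all model_* names are hypothetical. In the kernel, deleting the timer is atomic with respect to the timer firing, which this single-threaded model does not try to reproduce.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_cmd {
    bool timer_pending;           /* timeout timer still armed */
    struct model_cmd *next_done;  /* stands in for the eh_entry link */
};

struct model_cpu {
    struct model_cmd *done_head;  /* per-cpu scsi_done_q (FIFO) */
    struct model_cmd **done_tail;
    bool softirq_raised;          /* stands in for raising SCSI_SOFTIRQ */
};

static void model_cpu_init(struct model_cpu *c)
{
    c->done_head = NULL;
    c->done_tail = &c->done_head;
    c->softirq_raised = false;
}

/* Returns true only if the timer was still pending when deleted. */
static bool model_del_timer(struct model_cmd *cmd)
{
    bool was_pending = cmd->timer_pending;
    cmd->timer_pending = false;
    return was_pending;
}

/* Returns true if the command was queued for softirq completion. */
static bool model_scsi_done(struct model_cpu *cpu, struct model_cmd *cmd)
{
    /* 1. Timer already fired: the timeout path owns the command. */
    if (!model_del_timer(cmd))
        return false;
    /* 2. Append to the per-cpu done queue. */
    cmd->next_done = NULL;
    *cpu->done_tail = cmd;
    cpu->done_tail = &cmd->next_done;
    /* 3. Ask for the softirq to run. */
    cpu->softirq_raised = true;
    return true;
}
```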

The SCSI_SOFTIRQ handler, scsi_softirq, calls scsi_decide_disposition()
to determine what to do with the command. scsi_decide_disposition()
looks at the scmd->result value and sense data to decide.

 - SUCCESS
    scsi_finish_command() is invoked for the command. The function
    does some maintenance chores and notifies completion by calling
    the scmd->done() callback, which, for fs requests, is the HLD
    completion callback: sd:sd_rw_intr, sr:rw_intr, st:st_intr.

 - NEEDS_RETRY
 - ADD_TO_MLQUEUE
    scmd is requeued to the blk queue.

 - otherwise
    scsi_eh_scmd_add(scmd, 0) is invoked for the command. See [1-3]
    for details of this function.
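As a rough sketch, the dispatch on the disposition reduces to a switch. The enum values and action names are assumptions made for illustration; the kernel defines its own constants for these dispositions.

```c
/* Hypothetical dispositions, named after the ones in the text. */
enum disposition { SUCCESS, NEEDS_RETRY, ADD_TO_MLQUEUE, FAILED };

/* What the softirq path does next with the scmd. */
enum action { ACT_FINISH, ACT_REQUEUE, ACT_ENTER_EH };

static enum action softirq_dispatch(enum disposition d)
{
    switch (d) {
    case SUCCESS:
        return ACT_FINISH;    /* scsi_finish_command() -> scmd->done() */
    case NEEDS_RETRY:
    case ADD_TO_MLQUEUE:
        return ACT_REQUEUE;   /* back onto the blk queue */
    default:
        return ACT_ENTER_EH;  /* scsi_eh_scmd_add(scmd, 0) */
    }
}
```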


[1-2-2] Completing a scmd w/ timeout

The timeout handler is scsi_times_out(). When a timeout occurs, this
function

 1. invokes the optional hostt->eh_timed_out() callback. The return
    value can be one of

    - EH_HANDLED
        This indicates that eh_timed_out() dealt with the timeout. The
        scmd is passed to __scsi_done() and thus linked into the
        per-cpu scsi_done_q. Normal command completion described in
        [1-2-1] follows.

    - EH_RESET_TIMER
        This indicates that more time is required to finish the
        command. The timer is restarted. This action is counted as a
        retry and is only allowed scmd->allowed + 1(!) times. Once the
        limit is reached, the action for EH_NOT_HANDLED is taken
        instead.

        NOTE This action is racy, as the LLDD could finish the scmd
        after the timeout has expired but before the timer is added
        back. In such cases, scsi_done() would think that a timeout
        has occurred and return without doing anything. The
        completion is lost and the command will time out again.

    - EH_NOT_HANDLED
        This is the same as when the eh_timed_out() callback doesn't
        exist. Step #2 is taken.

 2. invokes scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) for the
    command. See [1-3] for more information.
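The decision logic of the timeout path can be sketched like this. The names (struct to_cmd, model_times_out, the outcome enum) are hypothetical, the optional callback is a plain function pointer that may be NULL, and the retry test is written so that EH_RESET_TIMER is honoured scmd->allowed + 1 times before falling back to the EH_NOT_HANDLED action, as described above.

```c
#include <stddef.h>

enum eh_timer_return { EH_NOT_HANDLED, EH_HANDLED, EH_RESET_TIMER };
enum timeout_outcome { TO_COMPLETE, TO_REARM, TO_ENTER_EH };

struct to_cmd {
    int retries;
    int allowed;
    /* optional hostt->eh_timed_out(); NULL when the LLDD has none */
    enum eh_timer_return (*eh_timed_out)(struct to_cmd *);
};

static enum timeout_outcome model_times_out(struct to_cmd *cmd)
{
    enum eh_timer_return rtn = EH_NOT_HANDLED;

    if (cmd->eh_timed_out)
        rtn = cmd->eh_timed_out(cmd);

    switch (rtn) {
    case EH_HANDLED:
        return TO_COMPLETE;   /* __scsi_done(): normal completion path */
    case EH_RESET_TIMER:
        /* Counted as a retry; retries 0..allowed re-arm the timer,
         * so this succeeds allowed + 1 times in total. */
        if (cmd->retries++ <= cmd->allowed)
            return TO_REARM;
        /* limit reached: handle as EH_NOT_HANDLED */
        /* fallthrough */
    case EH_NOT_HANDLED:
    default:
        return TO_ENTER_EH;   /* scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) */
    }
}
```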


[1-3] How EH takes over

scmds enter EH via scsi_eh_scmd_add(), which does the following.

 1. Turns on scmd->eh_eflags as requested. It's 0 for error
    completions and SCSI_EH_CANCEL_CMD for timeouts.

 2. Links scmd->eh_entry to shost->eh_cmd_q.

 3. Sets the SHOST_RECOVERY bit in shost->shost_state.

 4. Increments shost->host_failed.

 5. Wakes up the SCSI EH thread if shost->host_busy == shost->host_failed.

As can be seen above, once any scmd is added to shost->eh_cmd_q, the
SHOST_RECOVERY shost_state bit is turned on. This prevents any new
scmd from being issued from the blk queue to the host; eventually, all
scmds on the host either complete normally, fail and get added to
shost->eh_cmd_q, or time out and get added to shost->eh_cmd_q.

If all scmds either complete or fail, the number of in-flight scmds
becomes equal to the number of failed scmds, i.e. shost->host_busy ==
shost->host_failed. This wakes up the SCSI EH thread. So, once woken
up, the SCSI EH thread can expect that all in-flight commands have
failed and are linked on shost->eh_cmd_q.
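A minimal model of scsi_eh_scmd_add() and of the wake-up condition follows. It tracks only counters and flags (linking onto eh_cmd_q is omitted), SCSI_EH_CANCEL_CMD is given an arbitrary value, and model_complete_ok() is a hypothetical helper standing for a normal completion that drops host_busy. As in the text, failed scmds stay counted in host_busy, which is what makes host_busy == host_failed mean that everything still in flight has failed.

```c
#include <stdbool.h>

#define SCSI_EH_CANCEL_CMD 0x1   /* value assumed for this model */

struct model_host {
    bool recovery;     /* SHOST_RECOVERY */
    int host_busy;     /* in-flight scmds, failed ones included */
    int host_failed;   /* scmds handed to EH */
    bool eh_woken;     /* EH thread woken up */
};

struct eh_cmd { int eh_eflags; };

/* Sketch of scsi_eh_scmd_add(); shost->host_lock is assumed held. */
static void model_eh_scmd_add(struct model_host *h, struct eh_cmd *cmd,
                              int eh_flag)
{
    cmd->eh_eflags |= eh_flag;           /* 1. 0 or SCSI_EH_CANCEL_CMD */
    /* 2. link onto shost->eh_cmd_q: not modelled */
    h->recovery = true;                  /* 3. stop issuing new scmds */
    h->host_failed++;                    /* 4. */
    if (h->host_busy == h->host_failed)  /* 5. all in-flight scmds failed */
        h->eh_woken = true;
}

/* Hypothetical: a normal completion while recovery is pending. */
static void model_complete_ok(struct model_host *h)
{
    h->host_busy--;
    if (h->recovery && h->host_busy == h->host_failed)
        h->eh_woken = true;
}
```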

Note that this does not mean lower layers are quiescent. If an LLDD
completed a scmd with error status, the LLDD and lower layers are
assumed to have forgotten about the scmd at that point. However, if a
scmd has timed out, unless hostt->eh_timed_out() made lower layers
forget about the scmd, which currently no LLDD does, the command is
still active as far as lower layers are concerned and completion could
occur at any time. Of course, all such completions are ignored as the
timer has already expired.

We'll talk later about how SCSI EH takes actions to abort (make the
LLDD forget about) timed-out scmds.


[2] How SCSI EH works

LLDDs can implement SCSI EH actions in one of the following two
ways.

 - Fine-grained EH callbacks
    LLDD can implement fine-grained EH callbacks and let SCSI
    midlayer drive error handling and call appropriate callbacks.
    This will be discussed further in [2-1].

 - eh_strategy_handler() callback
    This is one big callback which should perform the whole error
    handling. As such, it should do all the chores the SCSI midlayer
    performs during recovery. This will be discussed in [2-2].

Once recovery is complete, SCSI EH resumes normal operation by
calling scsi_restart_operations(), which

 1. Checks if door locking is needed and locks the door.

 2. Clears the SHOST_RECOVERY shost_state bit.

 3. Wakes up waiters on shost->host_wait. This occurs if someone
    calls scsi_block_when_processing_errors() on the host.
    (QUESTION why is it needed? All operations will be blocked
    anyway after it reaches blk queue.)

 4. Kicks the queues of all devices on the host.


[2-1] EH through fine-grained callbacks

[2-1-1] Overview

If eh_strategy_handler() is not present, SCSI midlayer takes charge
of driving error handling. EH has two goals: make the LLDD, host and
device forget about timed-out scmds, and make them ready for new
commands. A scmd is said to be recovered if the scmd is forgotten by
lower layers and lower layers are ready to process or fail the scmd
again.

To achieve these goals, EH performs recovery actions with increasing
severity. Some actions are performed by issuing SCSI commands and
others are performed by invoking one of the following fine-grained
hostt EH callbacks. Callbacks may be omitted; omitted callbacks are
treated as always failing.

    int (* eh_abort_handler)(struct scsi_cmnd *);
    int (* eh_device_reset_handler)(struct scsi_cmnd *);
    int (* eh_bus_reset_handler)(struct scsi_cmnd *);
    int (* eh_host_reset_handler)(struct scsi_cmnd *);

Higher-severity actions are taken only when lower-severity actions
cannot recover some of the failed scmds. Also, note that failure of
the highest-severity action means EH failure and results in all
unrecovered devices being taken offline.
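The escalation rule, including the convention that an omitted callback counts as a failure, can be sketched as below. The struct is a trimmed, hypothetical host template; EH_SUCCESS/EH_FAILED stand in for the midlayer's own return codes. The sketch deliberately leaves out the steps that issue SCSI commands (REQUEST_SENSE, START_STOP_UNIT, TEST_UNIT_READY) and the per-device and per-channel bookkeeping.

```c
#include <stddef.h>

struct model_cmnd;  /* opaque in this sketch */

typedef int (*eh_handler_t)(struct model_cmnd *);

/* Trimmed, hypothetical host template: only the four EH callbacks. */
struct model_hostt {
    eh_handler_t eh_abort_handler;
    eh_handler_t eh_device_reset_handler;
    eh_handler_t eh_bus_reset_handler;
    eh_handler_t eh_host_reset_handler;
};

enum { EH_FAILED = 0, EH_SUCCESS = 1 };  /* assumed return codes */

/* An omitted (NULL) callback is treated as always failing. */
static int call_or_fail(eh_handler_t fn, struct model_cmnd *cmd)
{
    return fn ? fn(cmd) : EH_FAILED;
}

/* Escalate from least to most severe. Returns the step (1..4) that
 * succeeded, or 0 if all failed (the device would be offlined). */
static int escalate(const struct model_hostt *t, struct model_cmnd *cmd)
{
    const eh_handler_t ladder[] = {
        t->eh_abort_handler,
        t->eh_device_reset_handler,
        t->eh_bus_reset_handler,
        t->eh_host_reset_handler,
    };
    for (int i = 0; i < 4; i++)
        if (call_or_fail(ladder[i], cmd) == EH_SUCCESS)
            return i + 1;
    return 0;
}
```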

During recovery, the following rules are followed

 - Recovery actions are performed on failed scmds on the to-do list,
   eh_work_q. If a recovery action succeeds for a scmd, recovered
   scmds are removed from eh_work_q.

   Note that a single recovery action on a scmd can recover multiple
   scmds. For example, resetting a device recovers all failed scmds on
   the device.

 - Higher-severity actions are taken iff eh_work_q is not empty after
   lower-severity actions are complete.

 - EH reuses failed scmds to issue commands for recovery. For
   timed-out scmds, SCSI EH ensures that the LLDD forgets about a scmd
   before reusing it for EH commands.

When a scmd is recovered, the scmd is moved from eh_work_q to EH's
local eh_done_q using scsi_eh_finish_cmd(). After all scmds are
recovered (eh_work_q is empty), scsi_eh_flush_done_q() is invoked to
either retry or error-finish (notify the upper layer of failure)
recovered scmds.

scmds are retried iff their sdev is still online (not offlined during
EH), REQ_FAILFAST is not set and ++scmd->retries is less than
scmd->allowed.
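The retry condition translates directly into a predicate. This sketch uses hypothetical parameter names. Because of && short-circuiting, the retry counter is incremented only when the device is online and failfast is not set. It also shows why setting scmd->retries to scmd->allowed (as scsi_eh_get_sense does on SUCCESS, see [2-1-3]) suppresses the retry.

```c
#include <stdbool.h>

/* Retry iff the sdev is online, failfast is not set, and the
 * pre-incremented retry count is still below the allowed limit. */
static bool eh_should_retry(bool sdev_online, bool failfast,
                            int *retries, int allowed)
{
    return sdev_online && !failfast && ++*retries < allowed;
}
```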


[2-1-2] Flow of scmds through EH

 1. Error completion / time out
    ACTION: scsi_eh_scmd_add() is invoked for scmd
        - set scmd->eh_eflags
        - add scmd to shost->eh_cmd_q
        - set SHOST_RECOVERY
        - shost->host_failed++
    LOCKING: shost->host_lock

 2. EH starts
    ACTION: move all scmds to EH's local eh_work_q. shost->eh_cmd_q
        is cleared.
    LOCKING: shost->host_lock (not strictly necessary, just for
        consistency)

 3. scmd recovered
    ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
        - shost->host_failed--
        - clear scmd->eh_eflags
        - scsi_setup_cmd_retry()
        - move from local eh_work_q to local eh_done_q
    LOCKING: none

 4. EH completes
    ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper
        layer of failure.
        - scmd is removed from eh_done_q and scmd->eh_entry is cleared
        - if retry is necessary, scmd is requeued using
          scsi_queue_insert()
        - otherwise, scsi_finish_command() is invoked for scmd
    LOCKING: queue or finish function performs appropriate locking
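The four steps can be traced with a toy state machine. Everything here is hypothetical (enum where, struct flow_host, the step*_ helpers); it records only which list a scmd is on and how shost->host_failed moves, and omits locking and scsi_setup_cmd_retry(). It illustrates that host_failed returns to zero exactly when every failed scmd has been EH-finished.

```c
#include <stdbool.h>

/* Which list a failed scmd is currently on, or its final fate. */
enum where { NOWHERE, EH_CMD_Q, EH_WORK_Q, EH_DONE_Q, REQUEUED, FINISHED };

struct flow_cmd { enum where at; int eh_eflags; };
struct flow_host { int host_failed; bool recovery; };

/* 1. Error completion / time out: scsi_eh_scmd_add(). */
static void step1_fail(struct flow_host *h, struct flow_cmd *c, int flag)
{
    c->eh_eflags = flag;
    c->at = EH_CMD_Q;
    h->recovery = true;
    h->host_failed++;
}

/* 2. EH starts: everything on eh_cmd_q moves to the local eh_work_q. */
static void step2_eh_starts(struct flow_cmd *cmds, int n)
{
    for (int i = 0; i < n; i++)
        if (cmds[i].at == EH_CMD_Q)
            cmds[i].at = EH_WORK_Q;
}

/* 3. scmd recovered: scsi_eh_finish_cmd(). */
static void step3_recovered(struct flow_host *h, struct flow_cmd *c)
{
    h->host_failed--;
    c->eh_eflags = 0;
    c->at = EH_DONE_Q;
}

/* 4. EH completes: flush eh_done_q by retrying or finishing. */
static void step4_flush(struct flow_cmd *cmds, int n,
                        bool (*retry)(struct flow_cmd *))
{
    for (int i = 0; i < n; i++)
        if (cmds[i].at == EH_DONE_Q)
            cmds[i].at = retry(&cmds[i]) ? REQUEUED : FINISHED;
}
```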


[2-1-3] Flow of control

EH through fine-grained callbacks starts from scsi_unjam_host().

<<scsi_unjam_host>>

    1. Lock shost->host_lock, splice_init shost->eh_cmd_q into local
       eh_work_q and unlock host_lock. Note that shost->eh_cmd_q is
       cleared by this action.

    2. Invoke scsi_eh_get_sense.

    <<scsi_eh_get_sense>>

        This action is taken for each error-completed
        (!SCSI_EH_CANCEL_CMD) command without valid sense data. Most
        SCSI transports/LLDDs automatically acquire sense data on
        command failures (autosense). Autosense is recommended for
        performance reasons and because sense information could get
        out of sync between the occurrence of CHECK CONDITION and this
        action.

        Note that if autosense is not supported, scmd->sense_buffer
        contains invalid sense data when error-completing the scmd
        with scsi_done(). scsi_decide_disposition() always returns
        FAILED in such cases, thus invoking SCSI EH. When the scmd
        reaches here, sense data is acquired and
        scsi_decide_disposition() is called again.

        1. Invoke scsi_request_sense() which issues REQUEST_SENSE
           command. If it fails, no action is taken. Note that taking
           no action causes higher-severity recovery to be taken for
           the scmd.

        2. Invoke scsi_decide_disposition() on the scmd

           - SUCCESS
                scmd->retries is set to scmd->allowed, preventing
                scsi_eh_flush_done_q() from retrying the scmd, and
                scsi_eh_finish_cmd() is invoked.

           - NEEDS_RETRY
                scsi_eh_finish_cmd() is invoked.

           - otherwise
                No action.

    3. If !list_empty(&eh_work_q), invoke scsi_eh_abort_cmds().

    <<scsi_eh_abort_cmds>>

        This action is taken for each timed-out command.
        hostt->eh_abort_handler() is invoked for each scmd. The
        handler returns SUCCESS if it has succeeded in making the LLDD
        and all related hardware forget about the scmd.

        If a timed-out scmd is successfully aborted and the sdev is
        either offline or ready, scsi_eh_finish_cmd() is invoked for
        the scmd. Otherwise, the scmd is left in eh_work_q for
        higher-severity actions.

        Note that both offline and ready status mean that the sdev is
        ready to process new scmds, where processing also implies
        immediate failing; thus, if a sdev is in one of the two
        states, no further recovery action is needed.

        Device readiness is tested using scsi_eh_tur() which issues
        TEST_UNIT_READY command. Note that the scmd must have been
        aborted successfully before reusing it for TEST_UNIT_READY.

    4. If !list_empty(&eh_work_q), invoke scsi_eh_ready_devs()

    <<scsi_eh_ready_devs>>

        This function takes four increasingly severe measures to make
        failed sdevs ready for new commands.

        1. Invoke scsi_eh_stu()

        <<scsi_eh_stu>>

            For each sdev which has failed scmds with valid sense data
            of which scsi_check_sense()'s verdict is FAILED,
            START_STOP_UNIT command is issued w/ start=1. Note that
            as we explicitly choose error-completed scmds, it is known
            that lower layers have forgotten about the scmd and we can
            reuse it for STU.

            If STU succeeds and the sdev is either offline or ready,
            all failed scmds on the sdev are EH-finished with
            scsi_eh_finish_cmd().

            NOTE If hostt->eh_abort_handler() isn't implemented or
            failed, we may still have timed-out scmds at this point
            and STU doesn't make lower layers forget about those
            scmds. Yet, this function EH-finishes all scmds on the
            sdev if STU succeeds, leaving lower layers in an
            inconsistent state. It seems that STU action should be
            taken only when a sdev has no timed-out scmd.

        2. If !list_empty(&eh_work_q), invoke scsi_eh_bus_device_reset().

        <<scsi_eh_bus_device_reset>>

            This action is very similar to scsi_eh_stu() except that,
            instead of issuing STU, hostt->eh_device_reset_handler()
            is used. Also, as we're not issuing SCSI commands and
            resetting clears all scmds on the sdev, there is no need
            to choose error-completed scmds.

        3. If !list_empty(&eh_work_q), invoke scsi_eh_bus_reset()

        <<scsi_eh_bus_reset>>

            hostt->eh_bus_reset_handler() is invoked for each channel
            with failed scmds. If bus reset succeeds, all failed
            scmds on all ready or offline sdevs on the channel are
            EH-finished.

        4. If !list_empty(&eh_work_q), invoke scsi_eh_host_reset()

        <<scsi_eh_host_reset>>

            This is the last resort. hostt->eh_host_reset_handler()
            is invoked. If host reset succeeds, all failed scmds on
            all ready or offline sdevs on the host are EH-finished.

        5. If !list_empty(&eh_work_q), invoke scsi_eh_offline_sdevs()

        <<scsi_eh_offline_sdevs>>

            Take all sdevs which still have unrecovered scmds offline
            and EH-finish the scmds.

    5. Invoke scsi_eh_flush_done_q().

    <<scsi_eh_flush_done_q>>

        At this point all scmds are recovered (or given up) and
        put on eh_done_q by scsi_eh_finish_cmd(). This function
        flushes eh_done_q by either retrying or notifying upper
        layer of failure of the scmds.
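The overall shape of scsi_unjam_host() is a ladder: each stage runs only while eh_work_q is non-empty, and whatever survives the last stage is offlined before the done queue is flushed. The toy model below captures just that control flow. The stage callbacks, struct unjam_cmd and MAX_CMDS are hypothetical; real stages also issue SCSI commands, check device readiness and operate per device or per channel.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CMDS 8

struct unjam_cmd {
    int sdev;         /* device index */
    bool timed_out;   /* SCSI_EH_CANCEL_CMD was set */
    bool recovered;   /* moved to eh_done_q */
    bool offlined;    /* given up: sdev taken offline */
};

struct unjam_ctx {
    struct unjam_cmd cmds[MAX_CMDS];
    int n;
};

/* A stage may mark some scmds recovered (remove them from eh_work_q). */
typedef void (*stage_fn)(struct unjam_ctx *);

static bool work_q_empty(const struct unjam_ctx *c)
{
    for (int i = 0; i < c->n; i++)
        if (!c->cmds[i].recovered)
            return false;
    return true;
}

/* Like scsi_eh_offline_sdevs(): EH-finish whatever is left, as failed. */
static void offline_rest(struct unjam_ctx *c)
{
    for (int i = 0; i < c->n; i++)
        if (!c->cmds[i].recovered) {
            c->cmds[i].offlined = true;
            c->cmds[i].recovered = true;
        }
}

/* Run stages in order (get_sense, abort, stu, device reset, bus reset,
 * host reset) while work remains; returns how many stages ran. */
static int unjam(struct unjam_ctx *c, const stage_fn *stages, int nstages)
{
    int ran = 0;
    for (int i = 0; i < nstages && !work_q_empty(c); i++) {
        stages[i](c);
        ran++;
    }
    if (!work_q_empty(c))
        offline_rest(c);
    /* scsi_eh_flush_done_q() would now retry or finish every scmd. */
    return ran;
}
```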


[2-2] EH through transportt->eh_strategy_handler()

transportt->eh_strategy_handler() is invoked in place of
scsi_unjam_host() and it is responsible for the whole recovery
process. On completion, the handler should have made lower layers
forget about all failed scmds and left them either ready for new
commands or offline. Also, it should perform SCSI EH maintenance
chores to maintain the integrity of the SCSI midlayer. In other words,
of the steps described in [2-1-2], all steps except for #1 must be
implemented by eh_strategy_handler().


[2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions

The following conditions are true on entry to the handler.

 - Each failed scmd's eh_eflags field is set appropriately.

 - Each failed scmd is linked on shost->eh_cmd_q by scmd->eh_entry.

 - SHOST_RECOVERY is set.

 - shost->host_failed == shost->host_busy


[2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions

The following conditions must be true on exit from the handler.

 - shost->host_failed is zero.

 - Each scmd's eh_eflags field is cleared.

 - Each scmd is in such a state that scsi_setup_cmd_retry() on the
   scmd doesn't make any difference.

 - shost->eh_cmd_q is cleared.

 - Each scmd->eh_entry is cleared.

 - Either scsi_queue_insert() or scsi_finish_command() is called on
   each scmd. Note that the handler is free to use scmd->retries and
   ->allowed to limit the number of retries.
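A strategy handler author might want to check these exit conditions in a debug build. The sketch below is hypothetical: it works on a plain snapshot struct rather than on the real midlayer structures, and it does not model the scsi_setup_cmd_retry() condition.

```c
#include <stdbool.h>

/* Hypothetical snapshot of one scmd after the handler returns. */
struct post_cmd {
    int eh_eflags;
    bool eh_entry_linked;     /* scmd->eh_entry still on some list */
    bool queued_or_finished;  /* scsi_queue_insert() or scsi_finish_command() called */
};

/* Hypothetical snapshot of the host after the handler returns. */
struct post_host {
    int host_failed;
    bool eh_cmd_q_empty;
};

/* True iff the host-level and per-scmd exit conditions all hold. */
static bool strategy_exit_ok(const struct post_host *h,
                             const struct post_cmd *cmds, int n)
{
    if (h->host_failed != 0 || !h->eh_cmd_q_empty)
        return false;
    for (int i = 0; i < n; i++) {
        if (cmds[i].eh_eflags != 0 || cmds[i].eh_entry_linked ||
            !cmds[i].queued_or_finished)
            return false;
    }
    return true;
}
```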


[2-2-3] Things to consider

 - Know that timed out scmds are still active on lower layers. Make
   lower layers forget about them before doing anything else with
   those scmds.

 - For consistency, when accessing/modifying shost data structure,
   grab shost->host_lock.

 - On completion, each failed sdev must have forgotten about all
   active scmds.

 - On completion, each failed sdev must be ready for new commands or
   offline.


--
Tejun Heo
htejun@gmail.com
11th September 2005