Blame - Documentation/scsi/scsi_eh.txt - kernel/msm-4.9

Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	1
				2	SCSI EH
				3	======================================
				4
				5	This document describes SCSI midlayer error handling infrastructure.
				6	Please refer to Documentation/scsi/scsi_mid_low_api.txt for more
				7	information regarding SCSI midlayer.
				8
				9	TABLE OF CONTENTS
				10
				11	[1] How SCSI commands travel through the midlayer and to EH
				12	[1-1] struct scsi_cmnd
				13	[1-2] How do scmd's get completed?
				14	[1-2-1] Completing a scmd w/ scsi_done
				15	[1-2-2] Completing a scmd w/ timeout
				16	[1-3] Asynchronous command aborts
					[1-4] How EH takes over
				17	[2] How SCSI EH works
				18	[2-1] EH through fine-grained callbacks
				19	[2-1-1] Overview
				20	[2-1-2] Flow of scmds through EH
				21	[2-1-3] Flow of control
Christoph Hellwig	9227c33	2006-04-01 19:21:04 +0200	[diff] [blame]	22	[2-2] EH through transportt->eh_strategy_handler()
				23	[2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions
				24	[2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	25	[2-2-3] Things to consider
				26
				27
				28	[1] How SCSI commands travel through the midlayer and to EH
				29
				30	[1-1] struct scsi_cmnd
				31
				32	Each SCSI command is represented with struct scsi_cmnd (== scmd). A
				33	scmd has two list_head's to link itself into lists. The two are
				34	scmd->list and scmd->eh_entry. The former is used for free list or
				35	per-device allocated scmd list and not of much interest to this EH
				36	discussion. The latter is used for completion and EH lists and unless
				37	otherwise stated scmds are always linked using scmd->eh_entry in this
				38	discussion.
				39
				40
				41	[1-2] How do scmd's get completed?
				42
				43	Once LLDD gets hold of a scmd, either the LLDD will complete the
				44	command by calling scsi_done callback passed from midlayer when
Hannes Reinecke	6ad5550	2013-11-11 13:44:57 +0100	[diff] [blame^]	45	invoking hostt->queuecommand() or the block layer will time it out.
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	46
				47
				48	[1-2-1] Completing a scmd w/ scsi_done
				49
				50	For all non-EH commands, scsi_done() is the completion callback. It
Hannes Reinecke	6ad5550	2013-11-11 13:44:57 +0100	[diff] [blame^]	51	just calls blk_complete_request() to delete the block layer timer and
				52	raise SCSI_SOFTIRQ.
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	53
				54	SCSI_SOFTIRQ handler scsi_softirq calls scsi_decide_disposition() to
				55	determine what to do with the command. scsi_decide_disposition()
				56	looks at the scmd->result value and sense data to determine what to do
				57	with the command.
				58
				59	- SUCCESS
				60	scsi_finish_command() is invoked for the command. The
Hannes Reinecke	6ad5550	2013-11-11 13:44:57 +0100	[diff] [blame^]	61	function does some maintenance chores and then calls
				62	scsi_io_completion() to finish the I/O.
				63	scsi_io_completion() then notifies the block layer on
				64	the completed request by calling blk_end_request and
				65	friends or figures out what to do with the remainder
				66	of the data in case of an error.
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	67
				68	- NEEDS_RETRY
				69	- ADD_TO_MLQUEUE
				70	scmd is requeued to blk queue.
				71
				72	- otherwise
				73	scsi_eh_scmd_add(scmd, 0) is invoked for the command. See
Matt LaPlante	5d3f083	2006-11-30 05:21:10 +0100	[diff] [blame]	74	[1-4] for details of this function.
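
The dispatch described above can be sketched as a small user-space
model. This is only an illustration: the enum names below are
hypothetical stand-ins chosen for this sketch, not the kernel's own
definitions, and the real softirq handler does more bookkeeping.

```c
/* Hypothetical stand-ins for the dispositions named in the text. */
enum disposition {
	DISP_SUCCESS,
	DISP_NEEDS_RETRY,
	DISP_ADD_TO_MLQUEUE,
	DISP_FAILED		/* represents every "otherwise" value */
};

/* What the completion path does next, per the list above. */
enum next_step {
	STEP_FINISH_COMMAND,	/* scsi_finish_command() */
	STEP_REQUEUE,		/* scmd goes back to the blk queue */
	STEP_ENTER_EH		/* scsi_eh_scmd_add() */
};

static enum next_step route_completion(enum disposition d)
{
	switch (d) {
	case DISP_SUCCESS:
		return STEP_FINISH_COMMAND;
	case DISP_NEEDS_RETRY:
	case DISP_ADD_TO_MLQUEUE:
		return STEP_REQUEUE;
	default:
		return STEP_ENTER_EH;
	}
}
```

Only the "otherwise" branch hands the command to EH; the other
dispositions never touch the EH lists.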
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	75
				76
				77	[1-2-2] Completing a scmd w/ timeout
				78
				79	The timeout handler is scsi_times_out(). When a timeout occurs, this
				80	function
				81
Stefan Richter	8c0ae656	2005-11-05 01:35:05 +0100	[diff] [blame]	82	1. invokes optional hostt->eh_timed_out() callback. Return value can
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	83	be one of
				84
Hannes Reinecke	6ad5550	2013-11-11 13:44:57 +0100	[diff] [blame^]	85	- BLK_EH_HANDLED
				86	This indicates that eh_timed_out() dealt with the timeout.
				87	The command is passed back to the block layer and completed
				88	via __blk_complete_request().
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	89
Hannes Reinecke	6ad5550	2013-11-11 13:44:57 +0100	[diff] [blame^]	90	NOTE After returning BLK_EH_HANDLED the SCSI layer is
				91	assumed to be finished with the command, and no other
				92	functions from the SCSI layer will be called. So this
				93	should typically only be returned if the eh_timed_out()
				94	handler raced with normal completion.
				95
				96	- BLK_EH_RESET_TIMER
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	97	This indicates that more time is required to finish the
				98	command. Timer is restarted. This action is counted as a
				99	retry and only allowed scmd->allowed + 1(!) times. Once the
Hannes Reinecke	6ad5550	2013-11-11 13:44:57 +0100	[diff] [blame^]	100	limit is reached, action for BLK_EH_NOT_HANDLED is taken instead.
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	101
Hannes Reinecke	6ad5550	2013-11-11 13:44:57 +0100	[diff] [blame^]	102	- BLK_EH_NOT_HANDLED
				103	eh_timed_out() callback did not handle the command.
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	104	Step #2 is taken.
				105
Hannes Reinecke	6ad5550	2013-11-11 13:44:57 +0100	[diff] [blame^]	106	2. If the host supports asynchronous aborts (that is, no_async_abort
				107	is not set in the host template) scsi_abort_command() is invoked
				108	to schedule an asynchronous abort. See [1-3] for more information.
				109	If scheduling the abort fails, Step #3 is taken.
				110
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	111	3. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
				112	command. See [1-4] for more information.
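
The decision sequence of the timeout handler can be sketched as
follows. This is a simplified user-space model, not the kernel code.
It assumes that an asynchronous abort is attempted only when
no_async_abort is not set, it treats a missing eh_timed_out() callback
like a "not handled" result, and it leaves out the retry limit on
timer restarts. All type and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the BLK_EH_* return values. */
enum tmo_result { TMO_HANDLED, TMO_RESET_TIMER, TMO_NOT_HANDLED };

enum tmo_action {
	ACT_COMPLETE,		/* hand back to the block layer */
	ACT_RESTART_TIMER,	/* give the command more time */
	ACT_ASYNC_ABORT,	/* abort scheduled, EH not entered yet */
	ACT_ENTER_EH		/* scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) */
};

struct tmo_ctx {
	bool has_eh_timed_out;		/* optional callback present? */
	enum tmo_result cb_result;	/* what the callback returned */
	bool no_async_abort;		/* host template setting */
	bool abort_scheduled_ok;	/* did scheduling the abort work? */
};

static enum tmo_action handle_timeout(const struct tmo_ctx *c)
{
	enum tmo_result r = c->has_eh_timed_out ? c->cb_result
						: TMO_NOT_HANDLED;

	if (r == TMO_HANDLED)
		return ACT_COMPLETE;
	if (r == TMO_RESET_TIMER)
		return ACT_RESTART_TIMER;

	/* Step 2: try an asynchronous abort when the host allows it. */
	if (!c->no_async_abort && c->abort_scheduled_ok)
		return ACT_ASYNC_ABORT;

	/* Step 3: fall back to the EH thread. */
	return ACT_ENTER_EH;
}
```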
				113
Hannes Reinecke	6ad5550	2013-11-11 13:44:57 +0100	[diff] [blame^]	114	[1-3] Asynchronous command aborts
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	115
Hannes Reinecke	6ad5550	2013-11-11 13:44:57 +0100	[diff] [blame^]	116	After a timeout occurs a command abort is scheduled from
				117	scsi_abort_command(). If the abort is successful the command
				118	will either be retried (if the number of retries is not exhausted)
				119	or terminated with DID_TIME_OUT.
				120	Otherwise scsi_eh_scmd_add() is invoked for the command.
				121	See [1-4] for more information.
				122
				123	[1-4] How EH takes over
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	124
				125	scmds enter EH via scsi_eh_scmd_add(), which does the following.
				126
				127	1. Turns on scmd->eh_eflags as requested. It's 0 for error
				128	completions and SCSI_EH_CANCEL_CMD for timeouts.
				129
				130	2. Links scmd->eh_entry to shost->eh_cmd_q
				131
				132	3. Sets SHOST_RECOVERY bit in shost->shost_state
				133
				134	4. Increments shost->host_failed
				135
				136	5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
				137
				138	As can be seen above, once any scmd is added to shost->eh_cmd_q,
				139	SHOST_RECOVERY shost_state bit is turned on. This prevents any new
				140	scmd from being issued from blk queue to the host; eventually, all
				141	scmds on the host either complete normally, fail and get added to
				142	shost->eh_cmd_q, or time out and get added to shost->eh_cmd_q.
				143
				144	If all scmds either complete or fail, the number of in-flight scmds
				145	becomes equal to the number of failed scmds - i.e. shost->host_busy ==
				146	shost->host_failed. This wakes up SCSI EH thread. So, once woken up,
				147	SCSI EH thread can expect that all in-flight commands have failed and
				148	are linked on shost->eh_cmd_q.
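
The wake-up condition can be illustrated with a small counter model.
It is a sketch under simplifying assumptions: the struct and function
names are hypothetical, locking is omitted, and a normal completion is
modelled as simply decrementing the in-flight counter.

```c
#include <stdbool.h>

struct host_model {
	unsigned int host_busy;		/* commands in flight */
	unsigned int host_failed;	/* commands handed to EH */
	unsigned int eh_cmd_q_len;	/* length of eh_cmd_q */
	bool in_recovery;		/* stands in for SHOST_RECOVERY */
};

/* Models scsi_eh_scmd_add(); returns true if EH would be woken. */
static bool model_eh_scmd_add(struct host_model *h)
{
	h->eh_cmd_q_len++;
	h->in_recovery = true;
	h->host_failed++;
	return h->host_busy == h->host_failed;
}

/* Models a normal completion; returns true if EH would be woken. */
static bool model_complete_ok(struct host_model *h)
{
	if (h->host_busy > h->host_failed)
		h->host_busy--;
	return h->in_recovery && h->host_busy == h->host_failed;
}
```

With three commands in flight, one failure followed by one normal
completion and a second failure leaves host_busy == host_failed == 2,
and only the last event wakes the EH thread.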
				149
				150	Note that this does not mean lower layers are quiescent. If a LLDD
				151	completed a scmd with error status, the LLDD and lower layers are
				152	assumed to forget about the scmd at that point. However, if a scmd
Stefan Richter	8c0ae656	2005-11-05 01:35:05 +0100	[diff] [blame]	153	has timed out, unless hostt->eh_timed_out() made lower layers forget
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	154	about the scmd, which currently no LLDD does, the command is still
				155	active as far as lower layers are concerned and completion could
				156	occur at any time. Of course, all such completions are ignored as the
				157	timer has already expired.
				158
				159	We'll talk about how SCSI EH takes actions to abort - make LLDD
				160	forget about - timed out scmds later.
				161
				162
				163	[2] How SCSI EH works
				164
				165	LLDD's can implement SCSI EH actions in one of the following two
				166	ways.
				167
				168	- Fine-grained EH callbacks
				169	LLDD can implement fine-grained EH callbacks and let SCSI
				170	midlayer drive error handling and call appropriate callbacks.
Matt LaPlante	fff9289	2006-10-03 22:47:42 +0200	[diff] [blame]	171	This will be discussed further in [2-1].
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	172
				173	- eh_strategy_handler() callback
				174	This is one big callback which should perform the whole error
				175	handling. As such, it should do all the chores the SCSI midlayer
				176	performs during recovery. This will be discussed in [2-2].
				177
				178	Once recovery is complete, SCSI EH resumes normal operation by
				179	calling scsi_restart_operations(), which
				180
				181	1. Checks if door locking is needed and locks door.
				182
				183	2. Clears SHOST_RECOVERY shost_state bit
				184
				185	3. Wakes up waiters on shost->host_wait. This occurs if someone
				186	calls scsi_block_when_processing_errors() on the host.
				187	(QUESTION why is it needed? All operations will be blocked
				188	anyway after it reaches blk queue.)
				189
				190	4. Kicks the queues of all devices on the host.
				191
				192
				193	[2-1] EH through fine-grained callbacks
				194
				195	[2-1-1] Overview
				196
				197	If eh_strategy_handler() is not present, SCSI midlayer takes charge
				198	of driving error handling. EH's goals are two - make LLDD, host and
				199	device forget about timed out scmds and make them ready for new
				200	commands. A scmd is said to be recovered if the scmd is forgotten by
				201	lower layers and lower layers are ready to process or fail the scmd
				202	again.
				203
				204	To achieve these goals, EH performs recovery actions with increasing
Matt LaPlante	2fe0ae7	2006-10-03 22:50:39 +0200	[diff] [blame]	205	severity. Some actions are performed by issuing SCSI commands and
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	206	others are performed by invoking one of the following fine-grained
				207	hostt EH callbacks. Callbacks may be omitted and omitted ones are
				208	considered to fail always.
				209
				210	int (* eh_abort_handler)(struct scsi_cmnd *);
				211	int (* eh_device_reset_handler)(struct scsi_cmnd *);
				212	int (* eh_bus_reset_handler)(struct scsi_cmnd *);
				213	int (* eh_host_reset_handler)(struct scsi_cmnd *);
				214
				215	Higher-severity actions are taken only when lower-severity actions
				216	cannot recover some of failed scmds. Also, note that failure of the
				217	highest-severity action means EH failure and results in offlining of
				218	all unrecovered devices.
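
The rule that omitted callbacks count as failures, combined with
escalation by severity, can be sketched as below. This is a
deliberately reduced model for a single command: the result codes,
struct names and sample handlers are hypothetical, and the real EH
works on whole lists and issues SCSI commands between callbacks.

```c
#include <stddef.h>

/* Hypothetical result codes for this model; not the kernel's values. */
enum { MODEL_FAILED = 0, MODEL_SUCCESS = 1 };

struct cmd_model { int id; };
typedef int (*eh_handler_t)(struct cmd_model *);

/* Mirrors the four callbacks listed above, lowest severity first. */
struct eh_callbacks {
	eh_handler_t abort_handler;
	eh_handler_t device_reset_handler;
	eh_handler_t bus_reset_handler;
	eh_handler_t host_reset_handler;
};

/* Sample handlers, for illustration only. */
static int sample_fail(struct cmd_model *cmd) { (void)cmd; return MODEL_FAILED; }
static int sample_ok(struct cmd_model *cmd) { (void)cmd; return MODEL_SUCCESS; }

/*
 * Returns the index (0..3) of the first handler that succeeds,
 * or -1 if every handler is omitted or fails (EH failure).
 */
static int escalate(const struct eh_callbacks *cb, struct cmd_model *cmd)
{
	const eh_handler_t ladder[] = {
		cb->abort_handler,
		cb->device_reset_handler,
		cb->bus_reset_handler,
		cb->host_reset_handler,
	};
	size_t i;

	for (i = 0; i < sizeof(ladder) / sizeof(ladder[0]); i++) {
		/* An omitted (NULL) callback is treated as a failure. */
		if (ladder[i] != NULL && ladder[i](cmd) == MODEL_SUCCESS)
			return (int)i;
	}
	return -1;
}
```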
				219
				220	During recovery, the following rules are followed
				221
				222	- Recovery actions are performed on failed scmds on the to do list,
				223	eh_work_q. If a recovery action succeeds for a scmd, recovered
				224	scmds are removed from eh_work_q.
				225
				226	Note that a single recovery action on a scmd can recover multiple
				227	scmds, e.g. resetting a device recovers all failed scmds on the
				228	device.
				229
				230	- Higher severity actions are taken iff eh_work_q is not empty after
				231	lower severity actions are complete.
				232
				233	- EH reuses failed scmds to issue commands for recovery. For
				234	timed-out scmds, SCSI EH ensures that LLDD forgets about a scmd
				235	before reusing it for EH commands.
				236
				237	When a scmd is recovered, the scmd is moved from eh_work_q to EH
				238	local eh_done_q using scsi_eh_finish_cmd(). After all scmds are
				239	recovered (eh_work_q is empty), scsi_eh_flush_done_q() is invoked to
				240	either retry or error-finish (notify upper layer of failure) recovered
				241	scmds.
				242
				243	A scmd is retried iff its sdev is still online (not offlined during
				244	EH), REQ_FAILFAST is not set and ++scmd->retries is less than
				245	scmd->allowed.
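
The retry condition reads naturally as a short-circuit expression.
The following sketch follows the wording above literally; the struct
and its field names are hypothetical, and the comparison operator is
taken from this text rather than from any particular kernel version.

```c
#include <stdbool.h>

struct retry_model {
	bool sdev_online;	/* device not offlined during EH */
	bool failfast;		/* stands in for REQ_FAILFAST */
	int retries;
	int allowed;
};

/*
 * Returns true if the command would be retried. The retry counter is
 * only incremented when the first two conditions already hold.
 */
static bool should_retry(struct retry_model *m)
{
	return m->sdev_online && !m->failfast &&
	       (++m->retries < m->allowed);
}
```

Because of the pre-increment, a command with allowed == 3 is retried
twice by this rule and error-finished on the third evaluation.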
				246
				247
				248	[2-1-2] Flow of scmds through EH
				249
				250	1. Error completion / time out
				251	ACTION: scsi_eh_scmd_add() is invoked for scmd
				252	- set scmd->eh_eflags
				253	- add scmd to shost->eh_cmd_q
				254	- set SHOST_RECOVERY
				255	- shost->host_failed++
				256	LOCKING: shost->host_lock
				257
				258	2. EH starts
				259	ACTION: move all scmds to EH's local eh_work_q. shost->eh_cmd_q
				260	is cleared.
				261	LOCKING: shost->host_lock (not strictly necessary, just for
				262	consistency)
				263
				264	3. scmd recovered
				265	ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
				266	- shost->host_failed--
				267	- clear scmd->eh_eflags
				268	- scsi_setup_cmd_retry()
				269	- move from local eh_work_q to local eh_done_q
				270	LOCKING: none
				271
				272	4. EH completes
				273	ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper
				274	layer of failure.
				275	- scmd is removed from eh_done_q and scmd->eh_entry is cleared
				276	- if retry is necessary, scmd is requeued using
				277	scsi_queue_insert()
				278	- otherwise, scsi_finish_command() is invoked for scmd
				279	LOCKING: queue or finish function performs appropriate locking
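
The four steps above can be followed with a small state model that
tracks only which list a scmd is on and the counters that change. It
is a sketch: names are hypothetical, locking is left out, and the
retry setup in step 3 is not modelled.

```c
/* Which list the scmd is linked on through eh_entry. */
enum scmd_where { ON_NONE, ON_EH_CMD_Q, ON_EH_WORK_Q, ON_EH_DONE_Q };

struct scmd_flow { enum scmd_where where; int eh_eflags; };
struct shost_flow { int host_failed; int in_recovery; };

/* Step 1: error completion or timeout. */
static void flow_add(struct shost_flow *sh, struct scmd_flow *sc, int eflags)
{
	sc->eh_eflags = eflags;
	sc->where = ON_EH_CMD_Q;
	sh->in_recovery = 1;
	sh->host_failed++;
}

/* Step 2: EH starts and takes the command onto its local work list. */
static void flow_start(struct scmd_flow *sc)
{
	if (sc->where == ON_EH_CMD_Q)
		sc->where = ON_EH_WORK_Q;
}

/* Step 3: the command is recovered (EH-finished). */
static void flow_finish(struct shost_flow *sh, struct scmd_flow *sc)
{
	sh->host_failed--;
	sc->eh_eflags = 0;
	sc->where = ON_EH_DONE_Q;
}

/* Step 4: the done list is flushed; retry or failure happens here. */
static void flow_flush(struct scmd_flow *sc)
{
	sc->where = ON_NONE;
}
```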
				280
				281
				282	[2-1-3] Flow of control
				283
				284	EH through fine-grained callbacks starts from scsi_unjam_host().
				285
				286	<<scsi_unjam_host>>
				287
				288	1. Lock shost->host_lock, splice_init shost->eh_cmd_q into local
				289	eh_work_q and unlock host_lock. Note that shost->eh_cmd_q is
				290	cleared by this action.
				291
				292	2. Invoke scsi_eh_get_sense.
				293
				294	<<scsi_eh_get_sense>>
				295
				296	This action is taken for each error-completed
				297	(!SCSI_EH_CANCEL_CMD) command without valid sense data. Most
				298	SCSI transports/LLDDs automatically acquire sense data on
				299	command failures (autosense). Autosense is recommended for
				300	performance reasons and as sense information could get out of
Lucas De Marchi	25985ed	2011-03-30 22:57:33 -0300	[diff] [blame]	301	sync between occurrence of CHECK CONDITION and this action.
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	302
				303	Note that if autosense is not supported, scmd->sense_buffer
				304	contains invalid sense data when error-completing the scmd
				305	with scsi_done(). scsi_decide_disposition() always returns
				306	FAILED in such cases thus invoking SCSI EH. When the scmd
				307	reaches here, sense data is acquired and
				308	scsi_decide_disposition() is called again.
				309
				310	1. Invoke scsi_request_sense() which issues REQUEST_SENSE
				311	command. If it fails, no action is taken. Note that taking no
				312	action causes higher-severity recovery to be taken for the scmd.
				313
				314	2. Invoke scsi_decide_disposition() on the scmd
				315
				316	- SUCCESS
				317	scmd->retries is set to scmd->allowed preventing
				318	scsi_eh_flush_done_q() from retrying the scmd and
				319	scsi_eh_finish_cmd() is invoked.
				320
				321	- NEEDS_RETRY
				322	scsi_eh_finish_cmd() invoked
				323
				324	- otherwise
				325	No action.
				326
				327	3. If !list_empty(&eh_work_q), invoke scsi_eh_abort_cmds().
				328
				329	<<scsi_eh_abort_cmds>>
				330
Hannes Reinecke	6ad5550	2013-11-11 13:44:57 +0100	[diff] [blame^]	331	This action is taken for each timed out command when
				332	no_async_abort is enabled in the host template.
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	333	hostt->eh_abort_handler() is invoked for each scmd. The
				334	handler returns SUCCESS if it has succeeded in making LLDD and
				335	all related hardware forget about the scmd.
				336
				337	If a timed out scmd is successfully aborted and the sdev is
				338	either offline or ready, scsi_eh_finish_cmd() is invoked for
				339	the scmd. Otherwise, the scmd is left in eh_work_q for
				340	higher-severity actions.
				341
				342	Note that both offline and ready status mean that the sdev is
				343	ready to process new scmds, where processing also implies
				344	immediate failing; thus, if a sdev is in one of the two
				345	states, no further recovery action is needed.
				346
				347	Device readiness is tested using scsi_eh_tur() which issues
				348	TEST_UNIT_READY command. Note that the scmd must have been
				349	aborted successfully before reusing it for TEST_UNIT_READY.
				350
				351	4. If !list_empty(&eh_work_q), invoke scsi_eh_ready_devs()
				352
				353	<<scsi_eh_ready_devs>>
				354
				355	This function takes four increasingly more severe measures to
				356	make failed sdevs ready for new commands.
				357
				358	1. Invoke scsi_eh_stu()
				359
				360	<<scsi_eh_stu>>
				361
				362	For each sdev which has failed scmds with valid sense data
				363	of which scsi_check_sense()'s verdict is FAILED,
				364	START_STOP_UNIT command is issued w/ start=1. Note that
				365	as we explicitly choose error-completed scmds, it is known
				366	that lower layers have forgotten about the scmd and we can
				367	reuse it for STU.
				368
				369	If STU succeeds and the sdev is either offline or ready,
				370	all failed scmds on the sdev are EH-finished with
				371	scsi_eh_finish_cmd().
				372
				373	NOTE If hostt->eh_abort_handler() isn't implemented or
				374	failed, we may still have timed out scmds at this point
				375	and STU doesn't make lower layers forget about those
				376	scmds. Yet, this function EH-finishes all scmds on the sdev
				377	if STU succeeds leaving lower layers in an inconsistent
				378	state. It seems that STU action should be taken only when
				379	a sdev has no timed out scmd.
				380
				381	2. If !list_empty(&eh_work_q), invoke scsi_eh_bus_device_reset().
				382
				383	<<scsi_eh_bus_device_reset>>
				384
				385	This action is very similar to scsi_eh_stu() except that,
				386	instead of issuing STU, hostt->eh_device_reset_handler()
				387	is used. Also, as we're not issuing SCSI commands and
				388	resetting clears all scmds on the sdev, there is no need
				389	to choose error-completed scmds.
				390
				391	3. If !list_empty(&eh_work_q), invoke scsi_eh_bus_reset()
				392
				393	<<scsi_eh_bus_reset>>
				394
				395	hostt->eh_bus_reset_handler() is invoked for each channel
				396	with failed scmds. If bus reset succeeds, all failed
				397	scmds on all ready or offline sdevs on the channel are
				398	EH-finished.
				399
				400	4. If !list_empty(&eh_work_q), invoke scsi_eh_host_reset()
				401
				402	<<scsi_eh_host_reset>>
				403
				404	This is the last resort. hostt->eh_host_reset_handler()
				405	is invoked. If host reset succeeds, all failed scmds on
				406	all ready or offline sdevs on the host are EH-finished.
				407
				408	5. If !list_empty(&eh_work_q), invoke scsi_eh_offline_sdevs()
				409
				410	<<scsi_eh_offline_sdevs>>
				411
				412	Take all sdevs which still have unrecovered scmds offline
				413	and EH-finish the scmds.
				414
				415	5. Invoke scsi_eh_flush_done_q().
				416
				417	<<scsi_eh_flush_done_q>>
				418
				419	At this point all scmds are recovered (or given up) and
				420	put on eh_done_q by scsi_eh_finish_cmd(). This function
				421	flushes eh_done_q by either retrying or notifying upper
				422	layer of failure of the scmds.
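
The overall shape of the control flow above is a ladder: each stage
runs only while the work list is non-empty, and whatever is left at
the end is given up by offlining. The sketch below models the work
list as a plain counter. The stage type, the sample stages and the
result struct are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/*
 * A stage is told how many scmds are still on the work list and
 * returns how many of them it recovered.
 */
typedef unsigned int (*stage_fn)(unsigned int remaining);

struct unjam_result {
	unsigned int recovered;	/* EH-finished by a recovery stage */
	unsigned int offlined;	/* given up, sdev taken offline */
	size_t stages_run;
};

/* Sample stages, for illustration only. */
static unsigned int recover_none(unsigned int remaining) { (void)remaining; return 0; }
static unsigned int recover_one(unsigned int remaining) { return remaining > 0 ? 1 : 0; }
static unsigned int recover_all(unsigned int remaining) { return remaining; }

static struct unjam_result model_unjam(unsigned int failed,
				       const stage_fn *stages, size_t n)
{
	struct unjam_result r = { 0, 0, 0 };
	unsigned int work = failed;
	size_t i;

	/* Run stages in order of severity while work remains. */
	for (i = 0; i < n && work > 0; i++) {
		unsigned int rec = stages[i](work);

		if (rec > work)
			rec = work;
		work -= rec;
		r.recovered += rec;
		r.stages_run++;
	}

	/* Last step: anything still unrecovered is offlined. */
	r.offlined = work;
	return r;
}
```

In both outcomes every scmd ends up on the done list, which is why
the final flush can treat recovered and given-up commands alike.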
				423
				424
Christoph Hellwig	9227c33	2006-04-01 19:21:04 +0200	[diff] [blame]	425	[2-2] EH through transportt->eh_strategy_handler()
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	426
Christoph Hellwig	9227c33	2006-04-01 19:21:04 +0200	[diff] [blame]	427	transportt->eh_strategy_handler() is invoked in the place of
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	428	scsi_unjam_host() and it is responsible for the whole recovery process.
				429	On completion, the handler should have made lower layers forget about
				430	all failed scmds and left them either ready for new commands or offline.
				431	Also, it should perform SCSI EH maintenance chores to maintain integrity
				432	of SCSI midlayer. IOW, of the steps described in [2-1-2], all steps
				433	except for #1 must be implemented by eh_strategy_handler().
				434
				435
Christoph Hellwig	9227c33	2006-04-01 19:21:04 +0200	[diff] [blame]	436	[2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	437
				438	The following conditions are true on entry to the handler.
				439
				440	- Each failed scmd's eh_eflags field is set appropriately.
				441
				442	- Each failed scmd is linked on shost->eh_cmd_q by scmd->eh_entry.
				443
				444	- SHOST_RECOVERY is set.
				445
				446	- shost->host_failed == shost->host_busy
				447
				448
Christoph Hellwig	9227c33	2006-04-01 19:21:04 +0200	[diff] [blame]	449	[2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions
Tejun Heo	70c83e1	2005-09-11 09:37:19 +0900	[diff] [blame]	450
				451	The following conditions must be true on exit from the handler.
				452
				453	- shost->host_failed is zero.
				454
				455	- Each scmd's eh_eflags field is cleared.
				456
				457	- Each scmd is in such a state that scsi_setup_cmd_retry() on the
				458	scmd doesn't make any difference.
				459
				460	- shost->eh_cmd_q is cleared.
				461
				462	- Each scmd->eh_entry is cleared.
				463
				464	- Either scsi_queue_insert() or scsi_finish_command() is called on
				465	each scmd. Note that the handler is free to use scmd->retries and
				466	->allowed to limit the number of retries.
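
The exit conditions above lend themselves to a checklist. The sketch
below expresses most of them as a predicate over a simplified model.
The structs are hypothetical, and the condition about
scsi_setup_cmd_retry() is not modelled because it depends on command
internals this document does not describe.

```c
#include <stdbool.h>
#include <stddef.h>

struct scmd_post {
	int eh_eflags;			/* must be cleared */
	bool on_eh_entry;		/* still linked through eh_entry? */
	bool requeued_or_finished;	/* queue-insert or finish was called */
};

struct shost_post {
	unsigned int host_failed;	/* must be zero */
	size_t eh_cmd_q_len;		/* must be zero (list cleared) */
};

/* Returns true if the modelled post-conditions all hold. */
static bool post_conditions_hold(const struct shost_post *sh,
				 const struct scmd_post *cmds, size_t n)
{
	size_t i;

	if (sh->host_failed != 0 || sh->eh_cmd_q_len != 0)
		return false;

	for (i = 0; i < n; i++) {
		if (cmds[i].eh_eflags != 0 || cmds[i].on_eh_entry ||
		    !cmds[i].requeued_or_finished)
			return false;
	}
	return true;
}
```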
				467
				468
				469	[2-2-3] Things to consider
				470
				471	- Know that timed out scmds are still active on lower layers. Make
				472	lower layers forget about them before doing anything else with
				473	those scmds.
				474
				475	- For consistency, when accessing/modifying shost data structure,
				476	grab shost->host_lock.
				477
				478	- On completion, each failed sdev must have forgotten about all
				479	active scmds.
				480
				481	- On completion, each failed sdev must be ready for new commands or
				482	offline.
				483
				484
				485	--
				486	Tejun Heo
				487	htejun@gmail.com
				488	11th September 2005