PCI Error Recovery
------------------
May 31, 2005

Current document maintainer:
Linas Vepstas <linas@austin.ibm.com>


Some PCI bus controllers are able to detect certain "hard" PCI errors
on the bus, such as parity errors on the data and address buses, as
well as SERR and PERR errors. These chipsets are then able to disable
I/O to/from the affected device, so that, for example, a bad DMA
address doesn't end up corrupting system memory. These same chipsets
are also able to reset the affected PCI device, and return it to
working condition. This document describes a generic API for
performing error recovery.

The core idea is that after a PCI error has been detected, there must
be a way for the kernel to coordinate with all affected device drivers
so that the PCI card can be made operational again, possibly after
performing a full electrical #RST of the PCI card. The API below
provides a generic way for device drivers to be notified of PCI
errors, and to be notified of, and respond to, a reset sequence.

Preliminary sketch of API, cut-n-pasted-n-modified email from
Ben Herrenschmidt, circa 5 April 2005

The error recovery API support is exposed to the driver in the form of
a structure of function pointers pointed to by a new field in struct
pci_driver. The absence of this pointer in pci_driver denotes a
"non-aware" driver; behaviour on these is platform dependent.
Platforms like ppc64 can try to simulate PCI hotplug remove/add.

The definition of "pci_error_token" is not covered here. It is based on
Seto's work on the synchronous error detection. We still need to define
functions for extracting information out of an opaque error token. This
is separate from this API.

This structure has the form:

struct pci_error_handlers
{
	int (*error_detected)(struct pci_dev *dev, pci_error_token error);
	int (*mmio_enabled)(struct pci_dev *dev);
	int (*resume)(struct pci_dev *dev);
	int (*link_reset)(struct pci_dev *dev);
	int (*slot_reset)(struct pci_dev *dev);
};

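As a sketch only, here is how a hypothetical driver "foo" might fill in this structure. The pci_dev and pci_error_token stand-ins, the numeric values of the result codes, and all foo_* names are assumptions made so the example compiles in userspace; in the kernel, a pointer to such a structure would hang off struct pci_driver.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-ins (assumptions, not kernel definitions).
 * pci_error_token is opaque and not defined by this API; an int is assumed. */
typedef int pci_error_token;
struct pci_dev { const char *name; };

/* Result codes named in this document; the numeric values are assumed. */
enum {
	PCIERR_RESULT_CAN_RECOVER,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
	PCIERR_RESULT_RECOVERED,
};

struct pci_error_handlers {
	int (*error_detected)(struct pci_dev *dev, pci_error_token error);
	int (*mmio_enabled)(struct pci_dev *dev);
	int (*resume)(struct pci_dev *dev);
	int (*link_reset)(struct pci_dev *dev);
	int (*slot_reset)(struct pci_dev *dev);
};

/* Hypothetical driver "foo": always asks for a slot reset. */
static int foo_error_detected(struct pci_dev *dev, pci_error_token error)
{
	(void)error; /* opaque; no accessors are defined yet */
	printf("%s: error detected, quiescing\n", dev->name);
	return PCIERR_RESULT_NEED_RESET;
}

static int foo_slot_reset(struct pci_dev *dev)
{
	printf("%s: slot reset, re-initializing hardware\n", dev->name);
	return PCIERR_RESULT_RECOVERED;
}

static int foo_resume(struct pci_dev *dev)
{
	printf("%s: restarting normal I/O\n", dev->name);
	return 0; /* no result code is taken into account for resume() */
}

/* mmio_enabled and link_reset are left NULL, i.e. unsupported. */
static const struct pci_error_handlers foo_err_handlers = {
	.error_detected = foo_error_detected,
	.slot_reset     = foo_slot_reset,
	.resume         = foo_resume,
};
```

Leaving unused callbacks NULL is what marks the corresponding features as unsupported.
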
A driver doesn't have to implement all of these callbacks. The
only mandatory one is error_detected(). If a callback is not
implemented, the corresponding feature is considered unsupported.
For example, if mmio_enabled() and resume() aren't there, then the
driver is assumed not to do any direct recovery and to require a
reset. If link_reset() is not implemented, the card is assumed not
to care about link resets; in that case, if recovery is supported,
the core can attempt recovery (but it will not call slot_reset()
unless it really did reset the slot). If slot_reset() is not
supported, link_reset() can be called instead on a slot reset.

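One possible reading of these rules, sketched as core-side helpers that check for NULL callbacks. The core_* and sample_link_reset names are hypothetical, and treating a driver with neither slot_reset() nor link_reset() as recovered is an assumption of this sketch, not something the API defines.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions); result code values are assumed. */
typedef int pci_error_token;
struct pci_dev { int id; };
enum {
	PCIERR_RESULT_CAN_RECOVER,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
	PCIERR_RESULT_RECOVERED,
};

struct pci_error_handlers {
	int (*error_detected)(struct pci_dev *dev, pci_error_token error);
	int (*mmio_enabled)(struct pci_dev *dev);
	int (*resume)(struct pci_dev *dev);
	int (*link_reset)(struct pci_dev *dev);
	int (*slot_reset)(struct pci_dev *dev);
};

/* No mmio_enabled() and no resume(): no direct recovery, a reset is needed. */
static bool core_driver_needs_reset(const struct pci_error_handlers *h)
{
	return h->mmio_enabled == NULL && h->resume == NULL;
}

/* After a link reset: a missing link_reset() means the card doesn't care. */
static int core_notify_link_reset(const struct pci_error_handlers *h,
				  struct pci_dev *dev)
{
	if (h->link_reset)
		return h->link_reset(dev);
	return PCIERR_RESULT_RECOVERED;
}

/* After a slot reset: fall back to link_reset() if slot_reset() is missing.
 * "Neither callback" ends up as recovered here (an assumption). */
static int core_notify_slot_reset(const struct pci_error_handlers *h,
				  struct pci_dev *dev)
{
	if (h->slot_reset)
		return h->slot_reset(dev);
	return core_notify_link_reset(h, dev);
}

/* Sample callback used to exercise the fallback. */
static int sample_link_reset(struct pci_dev *dev)
{
	(void)dev;
	return PCIERR_RESULT_NEED_RESET;
}
```
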
The first call will always be:

1) error_detected()

Error detected. This is sent once after an error has been detected. At
this point, the device might not be accessible anymore, depending on the
platform (the slot will be isolated on ppc64). The driver may already
have "noticed" the error because of a failing IO, but this is the proper
"synchronisation point": it gives the driver a chance to clean up and to
wait for pending work (timers, etc.) to complete. It can take
semaphores, schedule, and so on; everything but touch the device.
Within this function and after it returns, the driver shouldn't do any
new IOs. Called in task context. This is sort of a "quiesce" point. See
the note about interrupts at the end of this document.

Result codes:
	- PCIERR_RESULT_CAN_RECOVER:
		Driver returns this if it thinks it might be able to recover
		the HW by just banging IOs, or if it wants to be given
		a chance to extract some diagnostic information (see
		below).
	- PCIERR_RESULT_NEED_RESET:
		Driver returns this if it thinks it can't recover unless the
		slot is reset.
	- PCIERR_RESULT_DISCONNECT:
		Return this if driver thinks it won't recover at all
		(this will detach the driver? or just leave it
		dangling? to be decided)

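A minimal sketch of an error_detected() that quiesces and then picks a result code. The foo_priv fields, foo_drain_pending() and the "can_poke_hw" capability flag are hypothetical; a real driver would cancel timers and flush work in the drain step.

```c
#include <stdbool.h>

/* Userspace stand-ins (assumptions); result code values are assumed. */
typedef int pci_error_token;
enum {
	PCIERR_RESULT_CAN_RECOVER,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
	PCIERR_RESULT_RECOVERED,
};

/* Hypothetical per-device state of driver "foo". */
struct foo_priv {
	bool io_blocked;   /* once set, no new IOs are issued */
	int  pending_work; /* stand-in for timers and queued work */
	bool can_poke_hw;  /* assumed: device recoverable by banging IOs */
};

struct pci_dev { struct foo_priv *priv; };

/* Wait for pending work without touching the device. The generic API
 * allows sleeping here; the ppc64 implementation assumes drivers don't. */
static void foo_drain_pending(struct foo_priv *p)
{
	p->pending_work = 0;
}

static int foo_error_detected(struct pci_dev *dev, pci_error_token error)
{
	struct foo_priv *p = dev->priv;

	(void)error;          /* opaque token; no accessors defined yet */
	p->io_blocked = true; /* quiesce: no new IOs from here on */
	foo_drain_pending(p);

	return p->can_poke_hw ? PCIERR_RESULT_CAN_RECOVER
			      : PCIERR_RESULT_NEED_RESET;
}
```
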
So at this point, we have called error_detected() for all drivers
on the segment that had the error. On ppc64, the slot is isolated. What
happens now typically depends on the result from the drivers. If all
drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we would
re-enable IOs on the slot (or do nothing special if the platform doesn't
isolate slots) and call 2). If not, and we can reset slots, we go to 4).
If neither, we have a dead slot. If it's a hotplug slot, we might
"simulate" reset by triggering HW unplug/replug though.

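The decision above, sketched as platform-side C. The next_step names are hypothetical, and the sketch leaves out the hotplug unplug/replug fallback and platforms that cannot re-enable IOs without a reset.

```c
#include <stdbool.h>
#include <stddef.h>

/* Result codes named in this document; the numeric values are assumed. */
enum {
	PCIERR_RESULT_CAN_RECOVER,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
	PCIERR_RESULT_RECOVERED,
};

/* Hypothetical names for what the platform does next. */
enum next_step {
	STEP_MMIO_ENABLED, /* re-enable IOs, then call 2) */
	STEP_SLOT_RESET,   /* reset the slot, then call 4) */
	STEP_DEAD_SLOT,    /* nothing left to try */
};

/* results[i] is what error_detected() returned for driver i on the segment. */
static enum next_step after_error_detected(const int *results, size_t n,
					   bool can_reset_slot)
{
	bool all_can_recover = true;

	for (size_t i = 0; i < n; i++)
		if (results[i] != PCIERR_RESULT_CAN_RECOVER)
			all_can_recover = false;

	if (all_can_recover)
		return STEP_MMIO_ENABLED;
	if (can_reset_slot)
		return STEP_SLOT_RESET;
	return STEP_DEAD_SLOT;
}
```

A single driver answering anything other than PCIERR_RESULT_CAN_RECOVER is enough to push the whole segment towards a reset.
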
>>> Current ppc64 implementation assumes that a device driver will
>>> not schedule or take a semaphore in this routine; the current ppc64
>>> implementation uses one kernel thread to notify all devices;
>>> thus, if one device sleeps/schedules, all devices are affected.
>>> Doing better requires complex multi-threaded logic in the error
>>> recovery implementation (e.g. waiting for all notification threads
>>> to "join" before proceeding with recovery.) This seems excessively
>>> complex and not worth implementing.

>>> The current ppc64 implementation doesn't much care if the device
>>> attempts I/O at this point, or not. I/Os will fail, returning
>>> a value of 0xff on read, and writes will be dropped. If the device
>>> driver attempts more than 10K I/Os to a frozen adapter, the platform
>>> will assume that the device driver has gone into an infinite loop,
>>> and it will panic the kernel.

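Because a frozen slot reads back as all ones, a driver polling loop can be bounded so it neither spins forever nor piles up I/Os against a frozen adapter. This is a sketch on a simulated device; fake_dev, foo_read_status(), the READY bit and the poll limit are all hypothetical, and the check assumes 0xff is never a legitimate status value for this device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated device: while the slot is frozen, reads return
 * all ones (0xff for the 8-bit register used here). */
struct fake_dev {
	bool    frozen;
	uint8_t status;
};

static uint8_t foo_read_status(const struct fake_dev *d)
{
	return d->frozen ? 0xff : d->status;
}

#define FOO_STATUS_READY 0x01 /* hypothetical "ready" bit */
#define FOO_MAX_POLLS    100  /* hypothetical bound on polling reads */

/* Returns 0 when ready, -1 on timeout, -2 if the slot looks frozen. */
static int foo_wait_ready(const struct fake_dev *d)
{
	for (int i = 0; i < FOO_MAX_POLLS; i++) {
		uint8_t v = foo_read_status(d);

		if (v == 0xff)
			return -2; /* stop issuing I/O; wait for error_detected() */
		if (v & FOO_STATUS_READY)
			return 0;
	}
	return -1;
}
```
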
2) mmio_enabled()

This is the "early recovery" call. IOs are allowed again, but DMA is
not (hrm... to be discussed, I prefer not), with some restrictions. This
is NOT a callback for the driver to start operations again, only to
peek/poke at the device, extract diagnostic information, if any, and
possibly do things like trigger a device-local reset or some such,
but not restart operations. This is sent if all drivers on a segment
agree that they can try to recover and no automatic link reset was
performed by the HW. If the platform can't just re-enable IOs without
a slot reset or a link reset, it doesn't call this callback and goes
directly to 3) or 4). All IOs should be done _synchronously_ from
within this callback. Errors triggered by them will be returned via
the normal pci_check_whatever() API; no new error_detected() callback
will be issued due to an error happening here. However, such an error
might cause IOs to be re-blocked for the whole segment, and thus
invalidate the recovery that other devices on the same segment might
have done, forcing the whole segment into one of the next states,
that is, link reset or slot reset.

Result codes:
	- PCIERR_RESULT_RECOVERED
		Driver returns this if it thinks the device is fully
		functional and thinks it is ready to start
		normal driver operations again. There is no
		guarantee that the driver will actually be
		allowed to proceed, as another driver on the
		same segment might have failed and thus triggered a
		slot reset on platforms that support it.

	- PCIERR_RESULT_NEED_RESET
		Driver returns this if it thinks the device is not
		recoverable in its current state and it needs a slot
		reset to proceed.

	- PCIERR_RESULT_DISCONNECT
		Same as above. Total failure: no recovery even after
		reset; driver dead. (To be defined more precisely)

>>> The current ppc64 implementation does not implement this callback.

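On platforms that do call mmio_enabled(), a driver might read out diagnostics synchronously and try a device-local reset without restarting operations. This is a simulated sketch: the err_status register, the FOO_ERR_FATAL bit and the local-reset behaviour are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Result codes named in this document; the numeric values are assumed. */
enum {
	PCIERR_RESULT_CAN_RECOVER,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
	PCIERR_RESULT_RECOVERED,
};

/* Hypothetical simulated device with one error-status register. */
struct pci_dev {
	uint32_t err_status;     /* diagnostic register (hypothetical) */
	bool     local_reset_ok; /* whether a device-local reset clears errors */
	bool     io_running;     /* normal operations; restarted only in resume() */
};

#define FOO_ERR_FATAL 0x80000000u /* hypothetical "fatal" bit */

static int foo_mmio_enabled(struct pci_dev *dev)
{
	/* Synchronous register access only: read out diagnostics. */
	uint32_t status = dev->err_status;

	fprintf(stderr, "foo: err_status=0x%08x\n", (unsigned int)status);

	if (status & FOO_ERR_FATAL)
		return PCIERR_RESULT_NEED_RESET;

	/* Non-fatal: try a device-local reset, but don't restart operations. */
	if (dev->local_reset_ok) {
		dev->err_status = 0;
		return PCIERR_RESULT_RECOVERED;
	}
	return PCIERR_RESULT_NEED_RESET;
}
```
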
3) link_reset()

This is called after the link has been reset. This is typically
a PCI Express specific state at this point and is done whenever a
non-fatal error has been detected that can be "solved" by resetting
the link. This call informs the driver of the reset and the driver
should check if the device appears to be in working condition.
This function acts a bit like 2) mmio_enabled(), in that the driver
is not supposed to restart normal driver I/O operations right away.
Instead, it should just "probe" the device to check its recoverability
status. If all is right, then the core will call resume() once all
drivers have ack'd link_reset().

Result codes:
	(identical to mmio_enabled)

>>> The current ppc64 implementation does not implement this callback.

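A probe-only link_reset() could be as small as checking that the device answers with the expected ID. In this sketch the vendor_id field stands in for a config-space read and FOO_VENDOR_ID is a made-up value; reads from a device that does not respond typically come back as all ones (0xffff).

```c
#include <stdint.h>

/* Result codes named in this document; the numeric values are assumed. */
enum {
	PCIERR_RESULT_CAN_RECOVER,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
	PCIERR_RESULT_RECOVERED,
};

/* Hypothetical simulated device: vendor_id is what a config-space read
 * of the Vendor ID register currently returns. */
struct pci_dev { uint16_t vendor_id; };

#define FOO_VENDOR_ID 0x1234 /* hypothetical vendor ID */

static int foo_link_reset(struct pci_dev *dev)
{
	/* Probe only; normal I/O restarts later, in resume(). */
	if (dev->vendor_id != FOO_VENDOR_ID)
		return PCIERR_RESULT_NEED_RESET; /* e.g. 0xffff: no response */
	return PCIERR_RESULT_RECOVERED;
}
```
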
4) slot_reset()

This is called after the slot has been soft or hard reset by the
platform. A soft reset consists of asserting the adapter #RST line
and then restoring the PCI BARs and PCI configuration header. If the
platform supports PCI hotplug, then it might instead perform a hard
reset by toggling power on the slot off/on. This call gives drivers
the chance to re-initialize the hardware (re-download firmware, etc.),
but drivers shouldn't restart normal I/O processing operations at
this point. (See the note about interrupts; interrupts aren't guaranteed
to be delivered until the resume() callback has been called.) If all
device drivers report success on this callback, the platform will call
resume() to complete the error handling and let the driver restart
normal I/O processing.

A driver can still return a critical failure for this function if
it can't get the device operational after reset. If the platform
previously tried a soft reset, it might now try a hard reset (power
cycle) and then call slot_reset() again. If the device still can't
be recovered, there is nothing more that can be done; the platform
will typically report a "permanent failure" in such a case. The
device will be considered "dead" in this case.

Result codes:
	- PCIERR_RESULT_DISCONNECT
		Same as above.

>>> The current ppc64 implementation does not try a power-cycle reset
>>> if the driver returned PCIERR_RESULT_DISCONNECT. However, it should.

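The soft-then-hard retry described above could look roughly like this on the platform side. Everything here is simulated and hypothetical (platform_reset_slot(), platform_recover_slot(), the heals_after counter); real resets would assert #RST or toggle slot power.

```c
#include <stdbool.h>

/* Result codes named in this document; the numeric values are assumed. */
enum {
	PCIERR_RESULT_CAN_RECOVER,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
	PCIERR_RESULT_RECOVERED,
};

/* Hypothetical simulated device: it works again after heals_after resets. */
struct pci_dev {
	int heals_after;
	int resets_seen;
};

enum reset_kind { RESET_SOFT, RESET_HARD };

/* Soft: assert #RST, then restore BARs and config header.
 * Hard: toggle slot power off/on (needs hotplug). Only simulated here. */
static void platform_reset_slot(struct pci_dev *dev, enum reset_kind kind)
{
	(void)kind;
	dev->resets_seen++;
}

/* Hypothetical driver slot_reset(): re-initializes the HW if it can. */
static int foo_slot_reset(struct pci_dev *dev)
{
	return dev->resets_seen >= dev->heals_after ? PCIERR_RESULT_RECOVERED
						    : PCIERR_RESULT_DISCONNECT;
}

/* Returns true if the device recovered, false for "permanent failure". */
static bool platform_recover_slot(struct pci_dev *dev, bool has_hotplug)
{
	platform_reset_slot(dev, RESET_SOFT);
	if (foo_slot_reset(dev) == PCIERR_RESULT_RECOVERED)
		return true;

	/* Soft reset failed: retry with a power cycle if the slot allows it. */
	if (has_hotplug) {
		platform_reset_slot(dev, RESET_HARD);
		if (foo_slot_reset(dev) == PCIERR_RESULT_RECOVERED)
			return true;
	}
	return false; /* device is considered dead */
}
```
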
5) resume()

This is called if all drivers on the segment have returned
PCIERR_RESULT_RECOVERED from one of the 3 previous callbacks.
That basically tells the driver to restart activity: everything
is back up and running. No result code is taken into account here. If
a new error happens, it will start a new error handling process.

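A sketch of the driver side of resume(): requests that arrived while recovery was in progress were deferred, and resume() lifts the block and flushes them. The struct fields, foo_submit() and the deferral policy are hypothetical choices for illustration.

```c
#include <stdbool.h>

/* Hypothetical simulated driver state. */
struct pci_dev {
	bool io_blocked; /* set by error_detected() */
	int  queued;     /* requests deferred during recovery */
	int  submitted;  /* requests sent to the hardware */
};

/* Normal submission path: defer requests while recovery is in progress. */
static bool foo_submit(struct pci_dev *dev)
{
	if (dev->io_blocked) {
		dev->queued++;
		return false;
	}
	dev->submitted++;
	return true;
}

static int foo_resume(struct pci_dev *dev)
{
	dev->io_blocked = false;
	dev->submitted += dev->queued; /* restart: flush deferred requests */
	dev->queued = 0;
	return 0; /* the return value is not taken into account */
}
```
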
That's it. I think this covers all the possibilities. The way those
callbacks are called is platform policy. A platform with no slot reset
capability, for example, may want to just "ignore" drivers that can't
recover (disconnect them) and try to let other cards on the same segment
recover. Keep in mind that in most real-life cases, though, there will
be only one driver per segment.

Now, a note about interrupts. If you get an interrupt and your
device is dead or has been isolated, there is a problem :)

After much thinking, I decided to leave that to the platform. That is,
the recovery API only specifies that:

 - There is no guarantee that interrupt delivery can proceed from any
   device on the segment starting from the error detection and until the
   resume() callback is sent, at which point interrupts are expected to be
   fully operational.

 - There is no guarantee that interrupt delivery is stopped; that is, a
   driver that gets an interrupt after detecting an error, or that detects
   an error within the interrupt handler such that it prevents proper
   ack'ing of the interrupt (and thus removal of the source), should just
   return IRQ_NOTHANDLED. It's up to the platform to deal with that
   condition, typically by masking the IRQ source for the duration of
   the error handling. It is expected that the platform "knows" which
   interrupts are routed to error-management capable slots and can deal
   with temporarily disabling that IRQ number during error processing (this
   isn't terribly complex). That means some IRQ latency for other devices
   sharing the interrupt, but there is simply no other way. High end
   platforms aren't supposed to share interrupts between many devices
   anyway :)

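A handler following these rules might look like the sketch below. It uses a local foo_irqreturn_t stand-in so it compiles in userspace; in the kernel the return type is irqreturn_t and the "not handled" value is spelled IRQ_NONE. The foo_dev fields are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's irqreturn_t (kernel spelling: IRQ_NONE). */
typedef enum { IRQ_NOTHANDLED, IRQ_HANDLED } foo_irqreturn_t;

/* Hypothetical device state. */
struct foo_dev {
	bool     in_error;   /* set by error_detected(), cleared by resume() */
	uint32_t irq_status; /* stand-in for the interrupt status register */
};

static foo_irqreturn_t foo_interrupt(int irq, void *data)
{
	struct foo_dev *d = data;

	(void)irq;
	/* Recovery in progress: don't touch the device; the platform is
	 * expected to mask the source for the duration. */
	if (d->in_error)
		return IRQ_NOTHANDLED;

	/* All ones: the device likely stopped responding; it can't be acked. */
	if (d->irq_status == 0xffffffffu)
		return IRQ_NOTHANDLED;

	/* Nothing pending: not ours (the line may be shared). */
	if (d->irq_status == 0)
		return IRQ_NOTHANDLED;

	d->irq_status = 0; /* ack, which removes the source */
	return IRQ_HANDLED;
}
```
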

Revised: 31 May 2005 Linas Vepstas <linas@austin.ibm.com>