The MSI Driver Guide HOWTO
Tom L Nguyen tom.l.nguyen@intel.com
10/03/2003
Revised Feb 12, 2004 by Martine Silbermann
email: Martine.Silbermann@hp.com
Revised Jun 25, 2004 by Tom L Nguyen

1. About this guide

This guide describes the basics of Message Signaled Interrupts (MSI),
the advantages of using MSI over traditional interrupt mechanisms,
and how to enable your driver to use MSI or MSI-X. Also included is
a Frequently Asked Questions (FAQ) section.

1.1 Terminology

PCI devices can be single-function or multi-function. In either case,
when this text talks about enabling or disabling MSI on a "device
function," it is referring to one specific PCI device and function and
not to all functions on a PCI device (unless the PCI device has only
one function).

2. Copyright 2003 Intel Corporation

3. What is MSI/MSI-X?

Message Signaled Interrupt (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, is optional for PCI devices but
required for PCI Express devices. MSI enables a device function
to request service by sending an Inbound Memory Write on its PCI bus to
the FSB as a Message Signaled Interrupt transaction. Because MSI is
generated in the form of a Memory Write, all transaction conditions,
such as a Retry, Master-Abort, Target-Abort or normal completion, are
supported.

A PCI device that supports MSI must also support the pin IRQ assertion
interrupt mechanism to provide backward compatibility for systems that
do not support MSI. In systems which support MSI, the bus driver is
responsible for initializing the message address and message data of
the device function's MSI/MSI-X capability structure during initial
device configuration.

An MSI capable device function indicates MSI support by implementing
the MSI/MSI-X capability structure in its PCI capability list. The
device function may implement both the MSI capability structure and
the MSI-X capability structure; however, the bus driver should not
enable both.

The MSI capability structure contains the Message Control register,
the Message Address register and the Message Data register. These
registers provide the bus driver control over MSI. The Message Control
register indicates the MSI capability supported by the device. The
Message Address register specifies the target address and the Message
Data register specifies the characteristics of the message. To request
service, the device function writes the content of the Message Data
register to the target address. The device and its software driver
are prohibited from writing to these registers.

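As a rough illustration of what the Message Control register reports,
the userspace sketch below decodes its fields from a raw 16-bit value.
The bit positions follow the MSI capability layout in the PCI Local Bus
Specification; the macro, struct and function names (MSI_CTRL_*,
msi_ctrl_info, decode_msi_control) are hypothetical and are not kernel
APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of the 16-bit MSI Message Control register (names are local). */
#define MSI_CTRL_ENABLE    0x0001u /* bit 0: MSI enabled */
#define MSI_CTRL_MMC_MASK  0x000eu /* bits 3:1: Multiple Message Capable */
#define MSI_CTRL_MME_MASK  0x0070u /* bits 6:4: Multiple Message Enable */
#define MSI_CTRL_64BIT     0x0080u /* bit 7: 64-bit address capable */
#define MSI_CTRL_PVM       0x0100u /* bit 8: per-vector masking capable */

struct msi_ctrl_info {
	bool enabled;
	bool addr64;
	bool per_vector_mask;
	unsigned int vectors_capable; /* 1, 2, 4, 8, 16 or 32 */
	unsigned int vectors_enabled; /* 1, 2, 4, 8, 16 or 32 */
};

/* Decode a raw Message Control value into readable fields. */
static struct msi_ctrl_info decode_msi_control(uint16_t ctrl)
{
	struct msi_ctrl_info info;
	unsigned int mmc = (ctrl & MSI_CTRL_MMC_MASK) >> 1;
	unsigned int mme = (ctrl & MSI_CTRL_MME_MASK) >> 4;

	/* Encodings 6 and 7 are reserved, so reject them in this sketch. */
	assert(mmc <= 5 && mme <= 5);

	info.enabled = (ctrl & MSI_CTRL_ENABLE) != 0;
	info.addr64 = (ctrl & MSI_CTRL_64BIT) != 0;
	info.per_vector_mask = (ctrl & MSI_CTRL_PVM) != 0;
	info.vectors_capable = 1u << mmc; /* encoded as a power of two */
	info.vectors_enabled = 1u << mme;
	return info;
}
```

For instance, decode_msi_control(0x0085) describes a function with MSI
enabled, 64-bit addressing, four vectors requested by the hardware and
one vector enabled.
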
The MSI-X capability structure is an optional extension to MSI. It
uses an independent and separate capability structure. There are
some key advantages to implementing the MSI-X capability structure
over the MSI capability structure, as described below.

 - Support a larger maximum number of vectors per function.

 - Provide the ability for system software to configure
   each vector with an independent message address and message
   data, specified by a table that resides in Memory Space.

 - MSI and MSI-X both support per-vector masking. Per-vector
   masking is an optional extension of MSI but a required
   feature for MSI-X. Per-vector masking provides the kernel the
   ability to mask/unmask a single MSI while running its
   interrupt service routine. If per-vector masking is
   not supported, then the device driver should provide the
   hardware/software synchronization to ensure that the device
   generates MSI only when the driver wants it to do so.

4. Why use MSI?

MSI simplifies board design by allowing board designers to remove
out-of-band interrupt routing. MSI is another step towards a
legacy-free environment.

Due to increasing pressure on chipset and processor packages to
reduce pin count, the need for interrupt pins is expected to
diminish over time. Devices, due to pin constraints, may implement
messages to increase performance.

PCI Express endpoints use INTx emulation (in-band messages) instead
of IRQ pin assertion. Using INTx emulation requires interrupt
sharing among devices connected to the same node (PCI bridge) while
MSI is unique (non-shared) and does not require BIOS configuration
support. As a result, the PCI Express technology requires MSI
support for better interrupt performance.

Using MSI enables the device functions to support two or more
vectors, which can be configured to target different CPUs to
increase scalability.

5. Configuring a driver to use MSI/MSI-X

By default, the kernel does not enable MSI/MSI-X on devices that
support this capability. The CONFIG_PCI_MSI kernel option
must be selected to enable MSI/MSI-X support.

5.1 Including MSI/MSI-X support into the kernel

To allow MSI/MSI-X capable device drivers to selectively enable
MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
below), the VECTOR based scheme needs to be enabled by setting
CONFIG_PCI_MSI during kernel config.

Since the target of the inbound message is the local APIC,
CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.

5.2 Configuring for MSI support

Due to the non-contiguous fashion in vector assignment of the
existing Linux kernel, this version does not support multiple
messages regardless of whether a device function is capable of
supporting more than one vector. Enabling MSI on a device function's
MSI capability structure requires a device driver to call the
function pci_enable_msi() explicitly.

5.2.1 API pci_enable_msi

int pci_enable_msi(struct pci_dev *dev)

With this new API, a device driver that wants to have MSI
enabled on its device function must call this API to enable MSI.
A successful call will initialize the MSI capability structure
with ONE vector, regardless of whether a device function is
capable of supporting multiple messages. This vector replaces the
pre-assigned dev->irq with a new MSI vector. To avoid a conflict
between the newly assigned vector and the existing pre-assigned
vector, a device driver must call this API before calling
request_irq().

5.2.2 API pci_disable_msi

void pci_disable_msi(struct pci_dev *dev)

This API should always be used to undo the effect of pci_enable_msi()
when a device driver is unloading. This API restores dev->irq with
the pre-assigned IOAPIC vector and switches a device's interrupt
mode to PCI pin-irq assertion/INTx emulation mode.

Note that a device driver should always call free_irq() on the MSI vector
that it has done request_irq() on before calling this API. Failure to do
so results in a BUG_ON() and the device will be left with MSI enabled,
leaking its vector.

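The call ordering described in 5.2.1 and 5.2.2 can be sketched with a
small userspace model. Everything below is a stand-in: mock_pci_dev and
the mock_* functions are hypothetical names that only track state, and
they do not call or reproduce the real kernel implementations. In
particular, this mock rejects a wrong call order with an error code so
that the rule is visible, whereas the text above says the kernel hits a
BUG_ON() when a vector is disabled while still requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the few pci_dev fields discussed above. */
struct mock_pci_dev {
	unsigned int irq;        /* value a driver would pass to request_irq() */
	unsigned int legacy_irq; /* pre-assigned IOAPIC vector, saved on enable */
	bool msi_enabled;
	bool irq_requested;
};

/* Models pci_enable_msi(): save dev->irq, then replace it with one vector. */
static int mock_enable_msi(struct mock_pci_dev *dev, unsigned int msi_vector)
{
	assert(dev != NULL);
	if (dev->msi_enabled || dev->irq_requested)
		return -1; /* must happen before request_irq() */
	dev->legacy_irq = dev->irq;
	dev->irq = msi_vector;
	dev->msi_enabled = true;
	return 0;
}

/* Models only the bookkeeping side of request_irq() and free_irq(). */
static void mock_request_irq(struct mock_pci_dev *dev)
{
	dev->irq_requested = true;
}

static void mock_free_irq(struct mock_pci_dev *dev)
{
	dev->irq_requested = false;
}

/* Models pci_disable_msi(): restore dev->irq once the vector is freed. */
static int mock_disable_msi(struct mock_pci_dev *dev)
{
	assert(dev != NULL);
	if (!dev->msi_enabled)
		return 0;
	if (dev->irq_requested)
		return -1; /* free_irq() was skipped */
	dev->irq = dev->legacy_irq;
	dev->msi_enabled = false;
	return 0;
}
```

A driver following the guide would therefore go through enable,
request, free, disable, in that order, and would read dev->irq only
after the enable step.
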
5.2.3 MSI mode vs. legacy mode diagram

The diagram below shows the events which switch the interrupt
mode on the MSI-capable device function between MSI mode and
PIN-IRQ assertion mode.

 ------------   pci_enable_msi    ------------------------
|            | <=============== |                        |
| MSI MODE   |                  | PIN-IRQ ASSERTION MODE |
|            | ===============> |                        |
 ------------  pci_disable_msi    ------------------------

Figure 1. MSI Mode vs. Legacy Mode

In Figure 1, a device operates by default in legacy mode. Legacy
in this context means PCI pin-irq assertion or PCI-Express INTx
emulation. A successful MSI request (using pci_enable_msi()) switches
a device's interrupt mode to MSI mode. A pre-assigned IOAPIC vector
stored in dev->irq will be saved by the PCI subsystem and a newly
assigned MSI vector will replace dev->irq.

To return to its default mode, a device driver should always call
pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
device driver should always call free_irq() on the MSI vector it has
done request_irq() on before calling pci_disable_msi(). Failure to do
so results in a BUG_ON() and the device will be left with MSI enabled,
leaking its vector. Otherwise, the PCI subsystem restores a device's
dev->irq with a pre-assigned IOAPIC vector and marks the released
MSI vector as unused.

Once the vector is marked as unused, there is no guarantee that the PCI
subsystem will reserve this MSI vector for a device. Depending on
the availability of current PCI vector resources and the number of
MSI/MSI-X requests from other drivers, this MSI vector may be
re-assigned.

For the case where the PCI subsystem re-assigns this MSI vector to
another driver, a request to switch back to MSI mode may result
in being assigned a different MSI vector or a failure if no more
vectors are available.

5.3 Configuring for MSI-X support

Due to the ability of the system software to configure each vector of
the MSI-X capability structure with an independent message address
and message data, the non-contiguous fashion in vector assignment of
the existing Linux kernel has no impact on supporting multiple
messages on an MSI-X capable device function. Enabling MSI-X on
a device function's MSI-X capability structure requires its device
driver to call the function pci_enable_msix() explicitly.

The function pci_enable_msix(), once invoked, enables either
all or nothing, depending on the current availability of PCI vector
resources. If the PCI vector resources are available for the number
of vectors requested by a device driver, this function will configure
the MSI-X table of the MSI-X capability structure of a device with
requested messages. For example, a device may be capable of
supporting a maximum of 32 vectors while its software driver
typically requests only 4 vectors. It is recommended that the device
driver call this function once during the initialization phase of
the device driver.

Unlike the function pci_enable_msi(), the function pci_enable_msix()
does not replace the pre-assigned IOAPIC dev->irq with a new MSI
vector because the PCI subsystem writes the 1:1 vector-to-entry mapping
into the 'vector' field of each element of the array passed as the
second argument. Note that the pre-assigned IOAPIC dev->irq is valid
only if the device operates in PIN-IRQ assertion mode. In MSI-X mode,
any attempt by the device driver to use dev->irq to request interrupt
service may result in unpredictable behavior.

For each MSI-X vector granted, a device driver is responsible for calling
other functions like request_irq(), enable_irq(), etc. to enable
this vector with its corresponding interrupt service handler. It is
a device driver's choice to assign all vectors with the same
interrupt service handler or each vector with a unique interrupt
service handler.

5.3.1 Handling MMIO address space of MSI-X Table

The PCI 3.0 specification notes, as an implementation note, that the
MMIO address space for a device's MSI-X structure should be isolated
so that the software system can set different pages for controlling
accesses to the MSI-X structure. The implementation of MSI support
requires the PCI subsystem, not a device driver, to maintain full
control of the MSI-X table/MSI-X PBA (Pending Bit Array) and MMIO
address space of the MSI-X table/MSI-X PBA. A device driver is
prohibited from requesting the MMIO address space of the MSI-X
table/MSI-X PBA. Otherwise, the PCI subsystem will fail to enable
MSI-X on its hardware device when it calls the function
pci_enable_msix().

5.3.2 Handling MSI-X allocation

Determining the number of MSI-X vectors allocated to a function is
dependent on the number of MSI capable devices and MSI-X capable
devices populated in the system. The policy of allocating MSI-X
vectors to a function is defined as the following:

# of MSI-X vectors allocated to a function = (x - y)/z where

x = The number of available PCI vector resources by the time
    the device driver calls pci_enable_msix(). The PCI vector
    resources are the sum of the number of unassigned vectors
    (new) and the number of released vectors when any MSI/MSI-X
    device driver switches its hardware device back to a legacy
    mode or is hot-removed. The number of unassigned vectors
    may exclude some vectors reserved, as defined in parameter
    NR_HP_RESERVED_VECTORS, for the case where the system is
    capable of supporting hot-add/hot-remove operations. Users
    may change the value defined in NR_HP_RESERVED_VECTORS to
    meet their specific needs.

y = The number of MSI capable devices populated in the system.
    This policy ensures that each MSI capable device has its
    vector reserved to avoid the case where some MSI-X capable
    drivers may attempt to claim all available vector resources.

z = The number of MSI-X capable devices populated in the system.
    This policy ensures that maximum (x - y) is distributed
    evenly among MSI-X capable devices.

Note that the PCI subsystem scans y and z during a bus enumeration.
When the PCI subsystem completes configuring MSI/MSI-X capability
structure of a device as requested by its device driver, y or z is
decremented accordingly.

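The allocation policy above is simple arithmetic, sketched here as a
userspace helper. The function name msix_vectors_per_function is
hypothetical; the use of integer division and the clamp to zero when
x does not exceed y are assumptions of this sketch, not statements
about the kernel's code.

```c
#include <assert.h>

/*
 * (x - y) / z from the policy above:
 *   x = available PCI vector resources
 *   y = number of MSI capable devices
 *   z = number of MSI-X capable devices
 */
static unsigned int msix_vectors_per_function(unsigned int x,
					      unsigned int y,
					      unsigned int z)
{
	assert(z > 0); /* the formula is undefined without MSI-X devices */
	if (x <= y)
		return 0; /* nothing left after reserving one per MSI device */
	return (x - y) / z; /* integer division: assumed rounding down */
}
```

With x = 100 available vectors, y = 4 MSI capable devices and z = 3
MSI-X capable devices, each MSI-X function would be allotted
(100 - 4) / 3 = 32 vectors under these assumptions.
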
5.3.3 Handling MSI-X shortages

For the case where fewer MSI-X vectors are allocated to a function
than requested, the function pci_enable_msix() will return the
maximum number of MSI-X vectors available to the caller. A device
driver may re-send its request with a number of vectors less than or
equal to the value returned. For example, if a device driver requests
5 vectors, but the number of available vectors is 3, a value of 3 will
be returned as a result of the pci_enable_msix() call. A function could
be designed for its driver to use only 3 MSI-X table entries in
different combinations such as ABC--, A-B-C, A--CB, etc. Note that this
patch does not support multiple entries with the same vector. Any
attempt by a device driver to use 5 MSI-X table entries with 3 vectors
as ABBCC, AABCC, BCCBA, etc. will cause the function pci_enable_msix()
to fail. Below are the reasons why supporting multiple
entries with the same vector is an undesirable solution.

 - The PCI subsystem cannot determine the entry that
   generated the message to mask/unmask MSI while handling
   software driver ISR. Attempting to walk through all MSI-X
   table entries (2048 max) to mask/unmask any matching vector
   is an undesirable solution.

 - Walking through all MSI-X table entries (2048 max) to handle
   SMP affinity of any matching vector is an undesirable solution.

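The "re-send with fewer vectors" behaviour described above can be
sketched as a retry loop. The code below is a userspace model only:
mock_enable_msix is a hypothetical stand-in that mimics the documented
return convention (zero on success, the number of available vectors on
a shortage, a negative value when none are available), and the
'minimum' parameter is an assumption added to show a driver giving up
when too few vectors remain.

```c
#include <assert.h>

/* Mimics the return convention of pci_enable_msix() described above. */
static int mock_enable_msix(int available, int nvec)
{
	if (available <= 0)
		return -1;        /* no vector available: failure */
	if (nvec > available)
		return available; /* shortage: report what is available */
	return 0;                 /* all requested vectors enabled */
}

/*
 * Ask for 'wanted' vectors and retry with the reported count on a
 * shortage. Returns the number of vectors enabled, or -1 on failure.
 */
static int enable_msix_with_retry(int available, int wanted, int minimum)
{
	int nvec = wanted;
	int rc;

	assert(minimum >= 1 && wanted >= minimum);
	for (;;) {
		rc = mock_enable_msix(available, nvec);
		if (rc == 0)
			return nvec; /* all nvec vectors were enabled */
		if (rc < 0 || rc < minimum)
			return -1;   /* failure, or too few to be useful */
		nvec = rc;           /* shortage: retry with fewer vectors */
	}
}
```

In the example from the text, a driver asking for 5 vectors when 3 are
available would end up with 3 after one retry.
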
5.3.4 API pci_enable_msix

int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)

This API enables a device driver to request the PCI subsystem
to enable MSI-X messages on its hardware device. Depending on
the availability of PCI vector resources, the PCI subsystem enables
either all or none of the requested vectors.

Argument 'dev' points to the device (pci_dev) structure.

Argument 'entries' is a pointer to an array of msix_entry structs.
The number of entries is indicated in argument 'nvec'.
struct msix_entry is defined in /driver/pci/msi.h:

struct msix_entry {
	u16	vector;	/* written by the kernel: allocated vector */
	u16	entry;	/* set by the driver: MSI-X table entry */
};

A device driver is responsible for initializing the field 'entry' of
each element with a unique entry supported by the MSI-X table.
Otherwise, -EINVAL will be returned as a result. A successful return of
zero indicates the PCI subsystem completed initializing each of the
requested entries of the MSI-X table with message address and message
data. Last but not least, the PCI subsystem will write the 1:1
vector-to-entry mapping into the field 'vector' of each element. A
device driver is responsible for keeping track of allocated MSI-X
vectors in its internal data structure.

A return of zero indicates that the requested number of MSI-X vectors
was successfully allocated. A return of greater than zero indicates
an MSI-X vector shortage. A return of less than zero indicates
a failure. This failure may be a result of duplicate entries
specified in the second argument, of no available vector, or of
failing to initialize MSI-X table entries.

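How a driver might prepare the 'entries' array can be sketched in
userspace as follows. The struct mirrors the layout shown above, with
u16 spelled as uint16_t. The helper names fill_msix_entries and
check_msix_entries, and the 'table_size' parameter, are hypothetical;
the check only illustrates the uniqueness rule stated above and is not
the kernel's validation code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the layout shown above. */
struct msix_entry {
	uint16_t vector; /* written by the kernel: allocated vector */
	uint16_t entry;  /* set by the driver: MSI-X table entry */
};

/* Use table entries 0..nvec-1, one per requested vector. */
static void fill_msix_entries(struct msix_entry *entries, int nvec)
{
	assert(entries != NULL && nvec > 0);
	for (int i = 0; i < nvec; i++) {
		entries[i].entry = (uint16_t)i;
		entries[i].vector = 0; /* filled in on success */
	}
}

/* Return 0 if every entry is in range and unique, -EINVAL otherwise. */
static int check_msix_entries(const struct msix_entry *entries, int nvec,
			      int table_size)
{
	assert(entries != NULL);
	for (int i = 0; i < nvec; i++) {
		if (entries[i].entry >= table_size)
			return -EINVAL; /* not supported by the table */
		for (int j = i + 1; j < nvec; j++)
			if (entries[j].entry == entries[i].entry)
				return -EINVAL; /* duplicate entry */
	}
	return 0;
}
```

After a successful enable call, a driver would read entries[i].vector
for each element and pass that value to request_irq(), rather than
using dev->irq.
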
5.3.5 API pci_disable_msix

void pci_disable_msix(struct pci_dev *dev)

This API should always be used to undo the effect of pci_enable_msix()
when a device driver is unloading. Note that a device driver should
always call free_irq() on all MSI-X vectors it has done request_irq()
on before calling this API. Failure to do so results in a BUG_ON() and
the device will be left with MSI-X enabled, leaking its vectors.

5.3.6 MSI-X mode vs. legacy mode diagram

The diagram below shows the events which switch the interrupt
mode on the MSI-X capable device function between MSI-X mode and
PIN-IRQ assertion mode (legacy).

 ------------  pci_enable_msix(,,n)  ------------------------
|            |   <===============   |                        |
| MSI-X MODE |                      | PIN-IRQ ASSERTION MODE |
|            |   ===============>   |                        |
 ------------    pci_disable_msix    ------------------------

Figure 2. MSI-X Mode vs. Legacy Mode

In Figure 2, a device operates by default in legacy mode. A
successful MSI-X request (using pci_enable_msix()) switches a
device's interrupt mode to MSI-X mode. A pre-assigned IOAPIC vector
stored in dev->irq will be saved by the PCI subsystem; however,
unlike MSI mode, the PCI subsystem will not replace dev->irq with
an assigned MSI-X vector because the PCI subsystem already writes the
1:1 vector-to-entry mapping into the field 'vector' of each element
specified in the second argument.

To return to its default mode, a device driver should always call
pci_disable_msix() to undo the effect of pci_enable_msix(). Note that
a device driver should always call free_irq() on all MSI-X vectors it
has done request_irq() on before calling pci_disable_msix(). Failure
to do so results in a BUG_ON() and the device will be left with MSI-X
enabled, leaking its vectors. Otherwise, the PCI subsystem switches a
device function's interrupt mode from MSI-X mode to legacy mode and
marks all allocated MSI-X vectors as unused.

Once the vectors are marked as unused, there is no guarantee that the
PCI subsystem will reserve these MSI-X vectors for a device. Depending
on the availability of current PCI vector resources and the number of
MSI/MSI-X requests from other drivers, these MSI-X vectors may be
re-assigned.

For the case where the PCI subsystem re-assigned these MSI-X vectors
to other drivers, a request to switch back to MSI-X mode may result
in being assigned another set of MSI-X vectors or a failure if no
more vectors are available.

5.4 Handling function implementing both MSI and MSI-X capabilities

For the case where a function implements both MSI and MSI-X
capabilities, the PCI subsystem enables a device to run either in MSI
mode or MSI-X mode but not both. A device driver determines whether it
wants MSI or MSI-X enabled on its hardware device. Once a device
driver requests MSI, for example, it is prohibited from requesting
MSI-X; in other words, a device driver is not permitted to ping-pong
between MSI mode and MSI-X mode at run time.

5.5 Hardware requirements for MSI/MSI-X support

MSI/MSI-X support requires support from both system hardware and
individual hardware device functions.

5.5.1 System hardware support

Since the target of the MSI address is the local APIC of a CPU, enabling
MSI/MSI-X support in the Linux kernel is dependent on whether existing
system hardware supports local APIC. Users should verify that their
system supports local APIC operation by testing that it runs when
CONFIG_X86_LOCAL_APIC=y.

In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
however, in a UP environment, users must manually set
CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
CONFIG_PCI_MSI enables the VECTOR based scheme and the option for
MSI-capable device drivers to selectively enable MSI/MSI-X.

Note that the CONFIG_X86_IO_APIC setting is irrelevant because MSI/MSI-X
vectors are newly allocated at runtime and MSI/MSI-X support does not
depend on BIOS support. This key independence enables MSI/MSI-X
support on future IOxAPIC-free platforms.

5.5.2 Device hardware support

The hardware device function supports MSI by indicating the
MSI/MSI-X capability structure on its PCI capability list. By
default, this capability structure will not be initialized by
the kernel to enable MSI during the system boot. In other words,
the device function is running in its default pin assertion mode.
Note that in many cases hardware supporting MSI has bugs,
which may result in system hangs. The software driver of specific
MSI-capable hardware is responsible for deciding whether to call
pci_enable_msi() or not. A return of zero indicates the kernel
successfully initialized the MSI/MSI-X capability structure of the
device function. The device function is then running in MSI/MSI-X mode.

5.6 How to tell whether MSI/MSI-X is enabled on device function

At the driver level, a return of zero from the function call of
pci_enable_msi()/pci_enable_msix() indicates to a device driver that
its device function is initialized successfully and ready to run in
MSI/MSI-X mode.

At the user level, users can use the command 'cat /proc/interrupts'
to display the vectors allocated for devices and their interrupt
MSI/MSI-X modes ("PCI-MSI"/"PCI-MSI-X"). The output below shows that
MSI mode is enabled on a SCSI Adaptec 39320D Ultra320 controller.

           CPU0       CPU1
  0:     324639          0    IO-APIC-edge  timer
  1:       1186          0    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
 12:       2797          0    IO-APIC-edge  i8042
 14:       6543          0    IO-APIC-edge  ide0
 15:          1          0    IO-APIC-edge  ide1
169:          0          0   IO-APIC-level  uhci-hcd
185:          0          0   IO-APIC-level  uhci-hcd
193:        138         10         PCI-MSI  aic79xx
201:         30          0         PCI-MSI  aic79xx
225:         30          0   IO-APIC-level  aic7xxx
233:         30          0   IO-APIC-level  aic7xxx
NMI:          0          0
LOC:     324553     325068
ERR:          0
MIS:          0

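For scripts or test tools that want to check the interrupt mode
programmatically, the tags quoted above can be matched per line. This
is a minimal userspace sketch, assuming /proc/interrupts uses exactly
the "PCI-MSI" and "PCI-MSI-X" strings shown in this guide; the enum and
the function name irq_mode_of_line are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum irq_mode { IRQ_MODE_OTHER, IRQ_MODE_MSI, IRQ_MODE_MSIX };

/* Classify one line of /proc/interrupts by its interrupt-type tag. */
static enum irq_mode irq_mode_of_line(const char *line)
{
	assert(line != NULL);
	/* Test the longer tag first: "PCI-MSI" is a prefix of "PCI-MSI-X". */
	if (strstr(line, "PCI-MSI-X") != NULL)
		return IRQ_MODE_MSIX;
	if (strstr(line, "PCI-MSI") != NULL)
		return IRQ_MODE_MSI;
	return IRQ_MODE_OTHER;
}
```

Applied to the listing above, the two aic79xx lines (193 and 201) are
classified as MSI and every other line as neither MSI nor MSI-X.
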
6. MSI quirks

Several PCI chipsets or devices are known to not support MSI.
The PCI stack provides 3 possible levels of MSI disabling:
* on a single device
* on all devices behind a specific bridge
* globally

6.1. Disabling MSI on a single device

Under some circumstances, it might be required to disable MSI on a
single device. This may be achieved either by not calling
pci_enable_msi() at all, or by setting the pci_dev->no_msi flag
beforehand (most of the time in a quirk).

6.2. Disabling MSI below a bridge

The vast majority of MSI quirks are required by PCI bridges not
being able to route MSI between buses. In this case, MSI has to be
disabled on all devices behind this bridge. This is achieved by setting
the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the bridge
subordinate bus. There is no need to set the same flag on bridges that
are below the broken bridge. When pci_enable_msi() is called to enable
MSI on a device, pci_msi_supported() takes care of checking the NO_MSI
flag in all parent buses of the device.

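The parent-bus check described above amounts to walking up the bus
hierarchy until a flagged bus or the root is reached. The sketch below
models that walk in userspace; mock_pci_bus, MOCK_BUS_FLAGS_NO_MSI and
mock_msi_allowed are hypothetical stand-ins and do not reproduce the
kernel's pci_msi_supported() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_BUS_FLAGS_NO_MSI 0x1u /* stand-in for PCI_BUS_FLAGS_NO_MSI */

/* Minimal stand-in for a bus node in the PCI hierarchy. */
struct mock_pci_bus {
	struct mock_pci_bus *parent; /* NULL at the root bus */
	unsigned int bus_flags;
};

/* True if no bus between the device's bus and the root forbids MSI. */
static bool mock_msi_allowed(const struct mock_pci_bus *bus)
{
	assert(bus != NULL);
	for (; bus != NULL; bus = bus->parent)
		if (bus->bus_flags & MOCK_BUS_FLAGS_NO_MSI)
			return false;
	return true;
}
```

Because the walk visits every ancestor, flagging only the subordinate
bus of the broken bridge is enough to cover all buses below it.
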
Some bridges allow MSI support to be enabled or disabled dynamically
by changing some bits in their PCI configuration space (especially
the Hypertransport chipsets such as the nVidia nForce and Serverworks
HT2000). It may then be required to update the NO_MSI flag on the
corresponding devices in the sysfs hierarchy. To enable MSI support
on device "0000:00:0e", do:

	echo 1 > /sys/bus/pci/devices/0000:00:0e/msi_bus

To disable MSI support, echo 0 instead of 1. Note that it should be
used with caution since changing this value might break interrupts.

6.3. Disabling MSI globally

Some extreme cases may require disabling MSI globally on the system.
For now, the only known case is a Serverworks PCI-X chipset (MSI is
not supported on several buses that are not all connected to the
chipset in the Linux PCI hierarchy). In the vast majority of other
cases, disabling only behind a specific bridge is enough.

For debugging purposes, the user may also pass pci=nomsi on the kernel
command-line to explicitly disable MSI globally. But, once the
appropriate quirks are added to the kernel, this option should not be
required anymore.

6.4. Finding why MSI cannot be enabled on a device

Assuming that MSI is not enabled on a device, you should look at
dmesg to find messages that quirks may output when disabling MSI
on some devices, some bridges or even globally.
Then, lspci -t gives the list of bridges above a device. Reading
/sys/bus/pci/devices/0000:00:0e/msi_bus will tell you whether MSI
is enabled (1) or disabled (0). If 0 is found in the msi_bus file of
any bridge above the device, MSI cannot be enabled.

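The manual check above can be sketched as two small userspace helpers:
one reads a single msi_bus file, the other applies the "any 0 above the
device blocks MSI" rule to the values collected for the bridges found
with lspci -t. The function names are hypothetical, and treating an
unreadable file (-1) as "unknown, not blocking" is an assumption of
this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Return 1 or 0 as read from an msi_bus file, or -1 if unreadable. */
static int read_msi_bus(const char *path)
{
	FILE *f;
	int c;

	assert(path != NULL);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	c = fgetc(f); /* the file holds a single '0' or '1' character */
	fclose(f);
	if (c == '1')
		return 1;
	if (c == '0')
		return 0;
	return -1;
}

/* Return 0 if any bridge above the device reports 0, otherwise 1. */
static int msi_bus_chain_allows_msi(const int *values, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (values[i] == 0)
			return 0;
	return 1;
}
```

A caller would fill the array with read_msi_bus() results for each
bridge path, for example the /sys/bus/pci/devices/0000:00:0e/msi_bus
file mentioned above, and then pass it to msi_bus_chain_allows_msi().
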
7. FAQ

Q1. Are there any limitations on using the MSI?

A1. If the PCI device supports MSI and conforms to the
specification and the platform supports the APIC local bus,
then using MSI should work.

Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
AMD processors)? In P3 IPI's are transmitted on the APIC local
bus and in P4 and Xeon they are transmitted on the system
bus. Are there any implications with this?

A2. MSI support enables a PCI device to send an inbound
memory write (0xfeexxxxx as target address) on its PCI bus
directly to the FSB. Since the message address has a
redirection hint bit cleared, it should work.

Q3. The target address 0xfeexxxxx will be translated by the
Host Bridge into an interrupt message. Are there any
limitations on the chipsets such as Intel 8xx, Intel e7xxx,
or VIA?

A3. If these chipsets support an inbound memory write with
target address set as 0xfeexxxxx, in conformance with PCI
specification 2.3 or later, then it should work.

Q4. From the driver point of view, if the MSI is lost because
of errors occurring during inbound memory write, then it may
wait forever. Is there a mechanism for it to recover?

A4. Since the target of the transaction is an inbound memory
write, all transaction termination conditions (Retry,
Master-Abort, Target-Abort, or normal completion) are
supported. A device sending an MSI must abide by all the PCI
rules and conditions regarding that inbound memory write. So,
if a retry is signaled it must retry, etc. We believe that
the recommendation for Abort is also a retry (refer to PCI
specification 2.3 or later).