===============================================
The irq_domain interrupt number mapping library
===============================================

The current design of the Linux kernel uses a single large number
space where each separate IRQ source is assigned a different number.
This is simple when there is only one interrupt controller, but in
systems with multiple interrupt controllers the kernel must ensure
that each one gets assigned non-overlapping allocations of Linux
IRQ numbers.

The number of interrupt controllers registered as unique irqchips
shows a rising tendency: for example, subdrivers of different kinds
such as GPIO controllers avoid reimplementing the same callback
mechanisms as the IRQ core system by modelling their interrupt
handlers as irqchips, i.e. in effect cascading interrupt controllers.

Here the interrupt numbers lose all correspondence to hardware
interrupt numbers: whereas in the past, IRQ numbers could be chosen
so that they matched the hardware IRQ line into the root interrupt
controller (i.e. the component actually firing the interrupt line to
the CPU), nowadays this number is just a number.

For this reason we need a mechanism to separate controller-local
interrupt numbers, called hardware IRQs, from Linux IRQ numbers.

The irq_alloc_desc*() and irq_free_desc*() APIs provide allocation of
irq numbers, but they don't provide any support for reverse mapping of
the controller-local IRQ (hwirq) number into the Linux IRQ number
space.

The irq_domain library adds mapping between hwirq and IRQ numbers on
top of the irq_alloc_desc*() API. An irq_domain to manage mapping is
preferred over interrupt controller drivers open coding their own
reverse mapping scheme.

irq_domain also implements translation from an abstract irq_fwspec
structure to hwirq numbers (Device Tree and ACPI GSI so far), and can
be easily extended to support other IRQ topology data sources.

irq_domain usage
================

An interrupt controller driver creates and registers an irq_domain by
calling one of the irq_domain_add_*() functions (each mapping method
has a different allocator function, more on that later). The function
will return a pointer to the irq_domain on success. The caller must
provide the allocator function with an irq_domain_ops structure.

In most cases, the irq_domain will begin empty without any mappings
between hwirq and IRQ numbers. Mappings are added to the irq_domain
by calling irq_create_mapping() which accepts the irq_domain and a
hwirq number as arguments. If a mapping for the hwirq doesn't already
exist then it will allocate a new Linux irq_desc, associate it with
the hwirq, and call the .map() callback so the driver can perform any
required hardware setup.

When an interrupt is received, the irq_find_mapping() function should
be used to find the Linux IRQ number from the hwirq number.

The irq_create_mapping() function must be called *at least once*
before any call to irq_find_mapping(), otherwise the descriptor will
not have been allocated.
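
The create-before-find contract can be illustrated with a small
userspace model. This is only a sketch and not the kernel
implementation: the ``toy_*`` names are hypothetical, the reverse map is
assumed to be a fixed-size array, and Linux IRQ numbers are simply
handed out from a counter. It shows that a lookup returns 0 until a
mapping has been created, and that the ``.map()`` callback runs only for
the first creation of a given hwirq.

.. code-block:: c

    #include <stddef.h>

    #define TOY_NR_HWIRQS 8

    struct toy_domain;

    /* Hypothetical stand-in for struct irq_domain_ops; only .map is modelled. */
    struct toy_domain_ops {
        int (*map)(struct toy_domain *d, unsigned int virq, unsigned long hwirq);
    };

    struct toy_domain {
        const struct toy_domain_ops *ops;
        unsigned int next_virq;               /* next free Linux IRQ number */
        unsigned int revmap[TOY_NR_HWIRQS];   /* hwirq -> IRQ, 0 means unmapped */
        unsigned int map_calls;               /* how often .map() has run */
    };

    /* Models irq_find_mapping(): a pure lookup, 0 if nothing is mapped. */
    unsigned int toy_find_mapping(const struct toy_domain *d, unsigned long hwirq)
    {
        return hwirq < TOY_NR_HWIRQS ? d->revmap[hwirq] : 0;
    }

    /* Models irq_create_mapping(): reuse an existing mapping or make one. */
    unsigned int toy_create_mapping(struct toy_domain *d, unsigned long hwirq)
    {
        unsigned int virq;

        if (hwirq >= TOY_NR_HWIRQS)
            return 0;
        virq = d->revmap[hwirq];
        if (virq)
            return virq;              /* already mapped: .map() is not re-run */

        virq = d->next_virq;
        if (d->ops && d->ops->map && d->ops->map(d, virq, hwirq))
            return 0;                 /* setup failed, nothing is recorded */
        d->next_virq++;
        d->revmap[hwirq] = virq;
        return virq;
    }

    /* Example .map() callback: a real driver would set up hardware here. */
    int toy_map(struct toy_domain *d, unsigned int virq, unsigned long hwirq)
    {
        (void)virq;
        (void)hwirq;
        d->map_calls++;
        return 0;
    }

    const struct toy_domain_ops toy_ops = { .map = toy_map };

With a domain whose ``next_virq`` starts at 32, looking up hwirq 3 before
any mapping exists yields 0; after one ``toy_create_mapping()`` call both
functions return 32 for that hwirq.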

If the driver has the Linux IRQ number or the irq_data pointer, and
needs to know the associated hwirq number (such as in the irq_chip
callbacks) then it can be directly obtained from irq_data->hwirq.

Types of irq_domain mappings
============================

There are several mechanisms available for reverse mapping from hwirq
to Linux irq, and each mechanism uses a different allocation function.
Which reverse map type should be used depends on the use case. Each
of the reverse map types is described below:

Linear
------

::

    irq_domain_add_linear()
    irq_domain_create_linear()

The linear reverse map maintains a fixed size table indexed by the
hwirq number. When a hwirq is mapped, an irq_desc is allocated for
the hwirq, and the IRQ number is stored in the table.

The Linear map is a good choice when the maximum number of hwirqs is
fixed and relatively small (fewer than about 256). The advantages of
this map are fixed time lookup for IRQ numbers, and irq_descs are only
allocated for in-use IRQs. The disadvantage is that the table must be
as large as the largest possible hwirq number.

irq_domain_add_linear() and irq_domain_create_linear() are functionally
equivalent, except for the type of the first argument: the former
accepts an Open Firmware specific 'struct device_node', while the latter
accepts a more general abstraction 'struct fwnode_handle'.

The majority of drivers should use the linear map.

Tree
----

::

    irq_domain_add_tree()
    irq_domain_create_tree()

The irq_domain maintains a radix tree map from hwirq numbers to Linux
IRQs. When a hwirq is mapped, an irq_desc is allocated and the
hwirq is used as the lookup key for the radix tree.

The tree map is a good choice if the hwirq number can be very large
since it doesn't need to allocate a table as large as the largest
hwirq number. The disadvantage is that hwirq to IRQ number lookup
time depends on how many entries are in the tree.

irq_domain_add_tree() and irq_domain_create_tree() are functionally
equivalent, except for the type of the first argument: the former
accepts an Open Firmware specific 'struct device_node', while the latter
accepts a more general abstraction 'struct fwnode_handle'.

Very few drivers should need this mapping.
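
The trade-off of a sparse map can be sketched in userspace C. Note that
this sketch deliberately swaps the kernel's radix tree for a sorted
array with binary search, purely to keep the example short; the
``toy_*`` names and the capacity of 16 entries are assumptions. What it
shares with the tree map is that storage grows with the number of
mapped hwirqs rather than with the largest hwirq value, and that lookup
cost depends on how many entries exist.

.. code-block:: c

    #include <stddef.h>

    #define TOY_MAX_ENTRIES 16

    struct toy_entry {
        unsigned long hwirq;   /* lookup key */
        unsigned int virq;     /* Linux IRQ number, never 0 */
    };

    struct toy_sparse_map {
        struct toy_entry entries[TOY_MAX_ENTRIES];  /* kept sorted by hwirq */
        size_t count;
    };

    /* Binary search; returns 0 when the hwirq has no mapping. */
    unsigned int toy_sparse_find(const struct toy_sparse_map *m,
                                 unsigned long hwirq)
    {
        size_t lo = 0, hi = m->count;

        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;

            if (m->entries[mid].hwirq == hwirq)
                return m->entries[mid].virq;
            if (m->entries[mid].hwirq < hwirq)
                lo = mid + 1;
            else
                hi = mid;
        }
        return 0;
    }

    /* Sorted insert; returns 0 on success, -1 if full, duplicate or virq 0. */
    int toy_sparse_insert(struct toy_sparse_map *m, unsigned long hwirq,
                          unsigned int virq)
    {
        size_t pos = 0, i;

        if (virq == 0 || m->count == TOY_MAX_ENTRIES ||
            toy_sparse_find(m, hwirq))
            return -1;
        while (pos < m->count && m->entries[pos].hwirq < hwirq)
            pos++;
        for (i = m->count; i > pos; i--)
            m->entries[i] = m->entries[i - 1];
        m->entries[pos].hwirq = hwirq;
        m->entries[pos].virq = virq;
        m->count++;
        return 0;
    }

Two entries with keys as far apart as 0x08800 and 0x81808 occupy two
slots here, whereas a table indexed by hwirq would need more than half a
million.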

No Map
------

::

    irq_domain_add_nomap()

The No Map mapping is to be used when the hwirq number is
programmable in the hardware. In this case it is best to program the
Linux IRQ number into the hardware itself so that no mapping is
required. Calling irq_create_direct_mapping() will allocate a Linux
IRQ number and call the .map() callback so that the driver can program
the Linux IRQ number into the hardware.

Most drivers cannot use this mapping.
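
As a rough userspace illustration of why no reverse map is needed, the
sketch below assumes a hypothetical controller with one programmable
"vector" register per interrupt source. The ``toy_*`` names and the
register layout are invented for the example. Once the Linux IRQ number
has been written into the register, the value the hardware reports at
interrupt time already is the Linux IRQ number.

.. code-block:: c

    #define TOY_NR_SOURCES 4

    /* Hypothetical controller: one programmable vector register per source. */
    struct toy_nomap_ctrl {
        unsigned int vector_reg[TOY_NR_SOURCES];  /* 0 means not programmed */
        unsigned int next_virq;                   /* next free Linux IRQ number */
    };

    /*
     * Models the combined effect of irq_create_direct_mapping() and the
     * driver's .map() callback: allocate a Linux IRQ number and program
     * it into the hardware.
     */
    unsigned int toy_direct_map(struct toy_nomap_ctrl *c, unsigned int source)
    {
        if (source >= TOY_NR_SOURCES)
            return 0;
        if (!c->vector_reg[source])
            c->vector_reg[source] = c->next_virq++;
        return c->vector_reg[source];
    }

    /* Interrupt entry: the register content is used as the IRQ number as is. */
    unsigned int toy_nomap_dispatch(const struct toy_nomap_ctrl *c,
                                    unsigned int source)
    {
        return source < TOY_NR_SOURCES ? c->vector_reg[source] : 0;
    }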

Legacy
------

::

    irq_domain_add_simple()
    irq_domain_add_legacy()
    irq_domain_add_legacy_isa()

The Legacy mapping is a special case for drivers that already have a
range of irq_descs allocated for the hwirqs. It is used when the
driver cannot be immediately converted to use the linear mapping. For
example, many embedded system board support files use a set of #defines
for IRQ numbers that are passed to struct device registrations. In that
case the Linux IRQ numbers cannot be dynamically assigned and the legacy
mapping should be used.

The legacy map assumes a contiguous range of IRQ numbers has already
been allocated for the controller and that the IRQ number can be
calculated by adding a fixed offset to the hwirq number, and
vice versa. The disadvantage is that it requires the interrupt
controller to manage IRQ allocations and it requires an irq_desc to be
allocated for every hwirq, even if it is unused.

The legacy map should only be used if fixed IRQ mappings must be
supported. For example, ISA controllers would use the legacy map for
mapping Linux IRQs 0-15 so that existing ISA drivers get the correct IRQ
numbers.
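
The fixed-offset arithmetic behind the legacy map is simple enough to
write down directly. The following is a userspace sketch with
hypothetical ``toy_*`` names, not kernel code. It treats 0 as "no IRQ",
which is why the example range used with it starts at Linux IRQ 64
rather than at 0 as an ISA controller would.

.. code-block:: c

    struct toy_legacy_domain {
        unsigned int first_irq;    /* first Linux IRQ of the pre-allocated range */
        unsigned int first_hwirq;  /* first hwirq covered by the range */
        unsigned int size;         /* number of consecutive interrupts */
    };

    /* hwirq -> Linux IRQ: add the fixed offset; 0 if outside the range. */
    unsigned int toy_legacy_hwirq_to_irq(const struct toy_legacy_domain *d,
                                         unsigned int hwirq)
    {
        if (hwirq < d->first_hwirq || hwirq - d->first_hwirq >= d->size)
            return 0;
        return d->first_irq + (hwirq - d->first_hwirq);
    }

    /* Linux IRQ -> hwirq: subtract the offset; -1 if outside the range. */
    int toy_legacy_irq_to_hwirq(const struct toy_legacy_domain *d,
                                unsigned int irq, unsigned int *hwirq)
    {
        if (irq < d->first_irq || irq - d->first_irq >= d->size)
            return -1;
        *hwirq = d->first_hwirq + (irq - d->first_irq);
        return 0;
    }

For a range of 32 interrupts starting at Linux IRQ 64 and hwirq 0, hwirq
5 corresponds to Linux IRQ 69 and the reverse calculation gives hwirq 5
again.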

Most users of legacy mappings should use irq_domain_add_simple() which
will use a legacy domain only if an IRQ range is supplied by the
system and will otherwise use a linear domain mapping. The semantics
of this call are such that if an IRQ range is specified then
descriptors will be allocated on-the-fly for it, and if no range is
specified it will fall through to irq_domain_add_linear() which means
*no* irq descriptors will be allocated.

A typical use case for simple domains is where an irqchip provider
is supporting both dynamic and static IRQ assignments.

In order to avoid ending up in a situation where a linear domain is
used and no descriptor gets allocated it is very important to make sure
that the driver using the simple domain calls irq_create_mapping()
before any irq_find_mapping(), since the latter works on its own only
in the static IRQ assignment case.
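
The difference between the two cases can be modelled in userspace as
follows. This is a sketch under stated assumptions: the ``toy_*`` names
are hypothetical, a ``first_irq`` of 0 stands for "no IRQ range
supplied", and dynamic IRQ numbers come from a simple counter.

.. code-block:: c

    #define TOY_SIZE 8

    struct toy_simple_domain {
        unsigned int next_virq;         /* next dynamically allocated IRQ */
        unsigned int revmap[TOY_SIZE];  /* hwirq -> Linux IRQ, 0 = unmapped */
    };

    /*
     * first_irq > 0:  static range, every hwirq is mapped up front.
     * first_irq == 0: dynamic case, the domain starts out empty.
     */
    void toy_simple_init(struct toy_simple_domain *d, unsigned int first_irq,
                         unsigned int dyn_base)
    {
        unsigned int i;

        d->next_virq = dyn_base;
        for (i = 0; i < TOY_SIZE; i++)
            d->revmap[i] = first_irq ? first_irq + i : 0;
    }

    /* Pure lookup, in the role of irq_find_mapping(). */
    unsigned int toy_simple_find(const struct toy_simple_domain *d,
                                 unsigned int hwirq)
    {
        return hwirq < TOY_SIZE ? d->revmap[hwirq] : 0;
    }

    /* Create-or-reuse, in the role of irq_create_mapping(). */
    unsigned int toy_simple_create(struct toy_simple_domain *d,
                                   unsigned int hwirq)
    {
        if (hwirq >= TOY_SIZE)
            return 0;
        if (!d->revmap[hwirq])
            d->revmap[hwirq] = d->next_virq++;
        return d->revmap[hwirq];
    }

A driver that always calls the create function first behaves the same in
both cases: with a static range it gets the pre-assigned number back,
and without one it gets a freshly allocated number.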

Hierarchy IRQ domain
--------------------

On some architectures, there may be multiple interrupt controllers
involved in delivering an interrupt from the device to the target CPU.
Let's look at a typical interrupt delivery path on x86 platforms::

    Device --> IOAPIC --> Interrupt remapping controller --> Local APIC --> CPU

There are three interrupt controllers involved:

1) IOAPIC controller
2) Interrupt remapping controller
3) Local APIC controller

To support such a hardware topology and make the software architecture
match the hardware architecture, an irq_domain data structure is built
for each interrupt controller and those irq_domains are organized into
a hierarchy. When building the irq_domain hierarchy, the irq_domain
nearer the device is the child and the irq_domain nearer the CPU is the
parent. So a hierarchy structure as below will be built for the example
above::

    CPU Vector irq_domain (root irq_domain to manage CPU vectors)
            ^
            |
    Interrupt Remapping irq_domain (manage irq_remapping entries)
            ^
            |
    IOAPIC irq_domain (manage IOAPIC delivery entries/pins)

There are four major interfaces for using hierarchy irq_domains:

1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt
   controller related resources to deliver these interrupts.
2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller
   related resources associated with these interrupts.
3) irq_domain_activate_irq(): activate interrupt controller hardware to
   deliver the interrupt.
4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
   to stop delivering the interrupt.

The following changes are needed to support hierarchy irq_domain:

1) a new field 'parent' is added to struct irq_domain; it's used to
   maintain irq_domain hierarchy information.
2) a new field 'parent_data' is added to struct irq_data; it's used to
   build hierarchy irq_data to match hierarchy irq_domains. The irq_data
   is used to store irq_domain pointer and hardware irq number.
3) new callbacks are added to struct irq_domain_ops to support hierarchy
   irq_domain operations.
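
The relationship between the 'parent' and 'parent_data' fields can be
pictured with a reduced userspace model. The ``toy_*`` structures below
are hypothetical and keep only the fields named above; they are not the
kernel definitions. One Linux IRQ has one irq_data per level, each
holding that level's domain pointer and hwirq, linked towards the root.

.. code-block:: c

    #include <stddef.h>

    struct toy_irq_domain {
        const char *name;
        struct toy_irq_domain *parent;     /* domain nearer the CPU, or NULL */
    };

    struct toy_irq_data {
        unsigned int irq;                  /* Linux IRQ, same at every level */
        unsigned long hwirq;               /* number local to this domain */
        struct toy_irq_domain *domain;
        struct toy_irq_data *parent_data;  /* irq_data in the parent domain */
    };

    /* Walk towards the root and return the irq_data owned by 'domain'. */
    struct toy_irq_data *toy_get_irq_data(struct toy_irq_data *d,
                                          const struct toy_irq_domain *domain)
    {
        for (; d; d = d->parent_data)
            if (d->domain == domain)
                return d;
        return NULL;
    }

    /* Number of levels an interrupt passes through. */
    size_t toy_depth(const struct toy_irq_data *d)
    {
        size_t n = 0;

        for (; d; d = d->parent_data)
            n++;
        return n;
    }

For a three-level stack such as MSI, GICv2m and GICv2, the same Linux
IRQ can carry a different hwirq at each level (for example 0x81808 in
the MSI domain and 0x63 in the two domains below it).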

With support of hierarchy irq_domain and hierarchy irq_data ready, an
irq_domain structure is built for each interrupt controller, and an
irq_data structure is allocated for each irq_domain associated with an
IRQ. Now we could go one step further to support stacked (hierarchy)
irq_chip. That is, an irq_chip is associated with each irq_data along
the hierarchy. A child irq_chip may implement a required action by
itself or by cooperating with its parent irq_chip.

With stacked irq_chip, an interrupt controller driver only needs to
deal with the hardware managed by itself and may ask for services from
its parent irq_chip when needed. So we could achieve a much cleaner
software architecture.
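
A child chip cooperating with its parent might look roughly like the
following userspace model. Everything prefixed ``toy_`` is hypothetical,
and the "hardware" is just one mask register per level with hwirq
assumed to be below 32. The kernel offers helpers in the same spirit,
such as irq_chip_mask_parent(), but this sketch does not reproduce
their implementation.

.. code-block:: c

    #include <stddef.h>

    struct toy_irq_data;

    struct toy_irq_chip {
        const char *name;
        void (*irq_mask)(struct toy_irq_data *d);
    };

    struct toy_irq_data {
        unsigned long hwirq;               /* assumed < 32 in this model */
        const struct toy_irq_chip *chip;
        struct toy_irq_data *parent_data;
        unsigned int *mask_reg;            /* this level's simulated register */
    };

    /* Forward the request to the parent level, if there is one. */
    void toy_chip_mask_parent(struct toy_irq_data *d)
    {
        struct toy_irq_data *p = d->parent_data;

        if (p && p->chip && p->chip->irq_mask)
            p->chip->irq_mask(p);
    }

    /* Root chip: masks the line in its own hardware only. */
    void toy_root_mask(struct toy_irq_data *d)
    {
        *d->mask_reg |= 1u << d->hwirq;
    }

    /* Child chip: handles its own register, then asks the parent. */
    void toy_child_mask(struct toy_irq_data *d)
    {
        *d->mask_reg |= 1u << d->hwirq;
        toy_chip_mask_parent(d);
    }

    const struct toy_irq_chip toy_root_chip = { "root", toy_root_mask };
    const struct toy_irq_chip toy_child_chip = { "child", toy_child_mask };

Each chip touches only the register it owns; the child never needs to
know how the parent implements masking.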

For an interrupt controller driver to support hierarchy irq_domain, it
needs to:

1) Implement irq_domain_ops.alloc and irq_domain_ops.free
2) Optionally implement irq_domain_ops.activate and
   irq_domain_ops.deactivate.
3) Optionally implement an irq_chip to manage the interrupt controller
   hardware.
4) There is no need to implement irq_domain_ops.map and
   irq_domain_ops.unmap; they are unused with hierarchy irq_domain.

Hierarchy irq_domain is in no way x86 specific, and is heavily used to
support other architectures, such as ARM, ARM64 etc.

Debugging
=========

If you switch on CONFIG_IRQ_DOMAIN_DEBUG (which depends on
CONFIG_IRQ_DOMAIN and CONFIG_DEBUG_FS), you will find a new file in
your debugfs mount point, called irq_domain_mapping. This file
contains a live snapshot of all the IRQ domains in the system::

    name              mapped  linear-max  direct-max  devtree-node
    pl061                  8           8           0  /smb/gpio@e0080000
    pl061                  8           8           0  /smb/gpio@e1050000
    pMSI                   0           0           0  /interrupt-controller@e1101000/v2m@e0080000
    MSI                   37           0           0  /interrupt-controller@e1101000/v2m@e0080000
    GICv2m                37           0           0  /interrupt-controller@e1101000/v2m@e0080000
    GICv2                448         448           0  /interrupt-controller@e1101000

It also iterates over the interrupts to display their mapping in the
domains, and makes the domain stacking visible::

    irq    hwirq    chip name  chip data           active  type    domain
       1   0x00019  GICv2      0xffff00000916bfd8    *     LINEAR  GICv2
       2   0x0001d  GICv2      0xffff00000916bfd8          LINEAR  GICv2
       3   0x0001e  GICv2      0xffff00000916bfd8    *     LINEAR  GICv2
       4   0x0001b  GICv2      0xffff00000916bfd8    *     LINEAR  GICv2
       5   0x0001a  GICv2      0xffff00000916bfd8          LINEAR  GICv2
    [...]
      96   0x81808  MSI        0x          (null)          RADIX   MSI
      96+  0x00063  GICv2m     0xffff8003ee116980          RADIX   GICv2m
      96+  0x00063  GICv2      0xffff00000916bfd8          LINEAR  GICv2
      97   0x08800  MSI        0x          (null)    *     RADIX   MSI
      97+  0x00064  GICv2m     0xffff8003ee116980    *     RADIX   GICv2m
      97+  0x00064  GICv2      0xffff00000916bfd8    *     LINEAR  GICv2

Here, interrupts 1-5 are only using a single domain, while 96 and 97
are built out of a stack of three domains, each level performing a
particular function.