/*
 * VFIO Mediated devices
 *
 * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
 * Author: Neo Jia <cjia@nvidia.com>
 *         Kirti Wankhede <kwankhede@nvidia.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */

Virtual Function I/O (VFIO) Mediated devices[1]
===============================================

The number of use cases for virtualizing DMA devices that do not have built-in
SR-IOV capability is increasing. Previously, to virtualize such devices,
developers had to create their own management interfaces and APIs, and then
integrate them with user space software. To simplify integration with user space
software, we have identified common requirements and a unified management
interface for such devices.

The VFIO driver framework provides unified APIs for direct device access. It is
an IOMMU/device-agnostic framework for exposing direct device access to user
space in a secure, IOMMU-protected environment. This framework is used for
multiple devices, such as GPUs, network adapters, and compute accelerators. With
direct device access, virtual machines or user space applications have direct
access to the physical device. This framework is reused for mediated devices.

The mediated core driver provides a common interface for mediated device
management that can be used by drivers of different devices. This module
provides a generic interface to perform these operations:

* Create and destroy a mediated device
* Add a mediated device to and remove it from a mediated bus driver
* Add a mediated device to and remove it from an IOMMU group

The mediated core driver also provides an interface to register a bus driver.
For example, the VFIO mdev driver (vfio_mdev.ko) is a mediated bus driver that
is designed for mediated devices and supports VFIO APIs. This bus driver adds a
mediated device to and removes it from a VFIO group.

The following high-level block diagram shows the main components and interfaces
in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM
devices as examples, because these devices are the first devices to use this
module.

     +---------------+
     |               |
     | +-----------+ |  mdev_register_driver() +--------------+
     | |           | +<------------------------+              |
     | |  mdev     | |                         |              |
     | |  bus      | +------------------------>+ vfio_mdev.ko |<-> VFIO user
     | |  driver   | |     probe()/remove()    |              |    APIs
     | |           | |                         +--------------+
     | +-----------+ |
     |               |
     |  MDEV CORE    |
     |   MODULE      |
     |   mdev.ko     |
     | +-----------+ |  mdev_register_device() +--------------+
     | |           | +<------------------------+              |
     | |           | |                         |  nvidia.ko   |<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | | Physical  | |
     | |  device   | |  mdev_register_device() +--------------+
     | | interface | +<------------------------+              |
     | |           | |                         |  i915.ko     |<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | |           | |
     | |           | |  mdev_register_device() +--------------+
     | |           | +<------------------------+              |
     | |           | |                         | ccw_device.ko|<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | +-----------+ |
     +---------------+


Registration Interfaces
=======================

The mediated core driver provides the following types of registration
interfaces:

* Registration interface for a mediated bus driver
* Physical device driver interface

Registration Interface for a Mediated Bus Driver
------------------------------------------------

The registration interface for a mediated bus driver provides the following
structure to represent a mediated device's driver:

     /*
      * struct mdev_driver [2] - Mediated device's driver
      * @name: driver name
      * @probe: called when new device created
      * @remove: called when device removed
      * @driver: device driver structure
      */
     struct mdev_driver {
          const char *name;
          int  (*probe)  (struct device *dev);
          void (*remove) (struct device *dev);
          struct device_driver driver;
     };

A mediated bus driver for mdev should use this structure in the function calls
to register and unregister itself with the core driver:

* Register:

     extern int mdev_register_driver(struct mdev_driver *drv,
                                     struct module *owner);

* Unregister:

     extern void mdev_unregister_driver(struct mdev_driver *drv);

The mediated bus driver is responsible for adding mediated devices to the VFIO
group when devices are bound to the driver and removing mediated devices from
the VFIO group when devices are unbound from the driver.


Physical Device Driver Interface
--------------------------------

The physical device driver interface provides the mdev_parent_ops[3] structure
to define the APIs to manage work in the mediated core driver that is related
to the physical device.

The attribute-group fields in the mdev_parent_ops structure are as follows:

* dev_attr_groups: attributes of the parent device
* mdev_attr_groups: attributes of the mediated device
* supported_type_groups: attributes to define supported configurations

The functions in the mdev_parent_ops structure are as follows:

* create: allocate basic resources in a driver for a mediated device
* remove: free resources in a driver when a mediated device is destroyed

The callbacks in the mdev_parent_ops structure are as follows:

* open: open callback of mediated device
* release: close callback of mediated device
* ioctl: ioctl callback of mediated device
* read: read emulation callback
* write: write emulation callback
* mmap: mmap emulation callback
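
The read and write callbacks let the vendor driver emulate device registers in
software. The user space C sketch below models only that idea: struct demo_mdev,
struct demo_parent_ops, demo_read(), demo_write() and the 8-byte loopback
register space are hypothetical names chosen for illustration. They are not the
kernel's mdev_parent_ops signatures, which operate on a struct mdev_device and
user space buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical emulated register space: 8 bytes, like one small I/O region. */
#define DEMO_REGION_SIZE 8

struct demo_mdev {
	unsigned char regs[DEMO_REGION_SIZE];
};

/* Hypothetical callback table, loosely modelled on the read/write members. */
struct demo_parent_ops {
	ssize_t (*read)(struct demo_mdev *mdev, char *buf, size_t count,
			size_t pos);
	ssize_t (*write)(struct demo_mdev *mdev, const char *buf, size_t count,
			 size_t pos);
};

/* Read emulation: copy bytes out of the emulated registers. */
static ssize_t demo_read(struct demo_mdev *mdev, char *buf, size_t count,
			 size_t pos)
{
	assert(mdev != NULL && buf != NULL);
	if (pos >= DEMO_REGION_SIZE)
		return -1;			/* access outside the region */
	if (count > DEMO_REGION_SIZE - pos)
		count = DEMO_REGION_SIZE - pos;	/* clamp to the region end */
	memcpy(buf, &mdev->regs[pos], count);
	return (ssize_t)count;
}

/* Write emulation: store bytes so that a later read returns them. */
static ssize_t demo_write(struct demo_mdev *mdev, const char *buf,
			  size_t count, size_t pos)
{
	assert(mdev != NULL && buf != NULL);
	if (pos >= DEMO_REGION_SIZE)
		return -1;
	if (count > DEMO_REGION_SIZE - pos)
		count = DEMO_REGION_SIZE - pos;
	memcpy(&mdev->regs[pos], buf, count);
	return (ssize_t)count;
}

static const struct demo_parent_ops demo_ops = {
	.read = demo_read,
	.write = demo_write,
};
```

The caller only ever goes through the function pointers in demo_ops, which is
the same indirection that lets the mdev core stay independent of each vendor
driver's emulation logic.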

A driver should use the mdev_parent_ops structure in the function call to
register itself with the mdev core driver:

     extern int mdev_register_device(struct device *dev,
                                     const struct mdev_parent_ops *ops);

However, the mdev_parent_ops structure is not required in the function call
that a driver should use to unregister itself with the mdev core driver:

     extern void mdev_unregister_device(struct device *dev);


Mediated Device Management Interface Through sysfs
==================================================

The management interface through sysfs enables user space software, such as
libvirt, to query and configure mediated devices in a hardware-agnostic fashion.
This management interface provides flexibility to the underlying physical
device's driver to support features such as:

* Mediated device hot plug
* Multiple mediated devices in a single virtual machine
* Multiple mediated devices from different physical devices

Links in the mdev_bus Class Directory
-------------------------------------
The /sys/class/mdev_bus/ directory contains links to devices that are registered
with the mdev core driver.
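
A management tool could discover the registered parent devices by enumerating
that directory. The following is a minimal user space C sketch that assumes only
POSIX opendir() and readdir(); list_mdev_parents() is a hypothetical helper name.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Print the name of every entry in dir (for example "/sys/class/mdev_bus")
 * to out, one per line, and return how many were found. Returns -1 if the
 * directory cannot be opened. The "." and ".." entries are skipped.
 */
static int list_mdev_parents(const char *dir, FILE *out)
{
	DIR *d;
	struct dirent *ent;
	int count = 0;

	assert(dir != NULL && out != NULL);
	d = opendir(dir);
	if (!d)
		return -1;
	while ((ent = readdir(d)) != NULL) {
		if (strcmp(ent->d_name, ".") == 0 ||
		    strcmp(ent->d_name, "..") == 0)
			continue;
		fprintf(out, "%s\n", ent->d_name);
		count++;
	}
	closedir(d);
	return count;
}
```

Calling list_mdev_parents("/sys/class/mdev_bus", stdout) would print one name
per registered parent device, or return -1 if the directory does not exist.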

Directories and Files Under the sysfs for Each Physical Device
--------------------------------------------------------------

  |- [parent physical device]
  |--- Vendor-specific-attributes [optional]
  |--- [mdev_supported_types]
  |     |--- [<type-id>]
  |     |   |--- create
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]
  |     |--- [<type-id>]
  |     |   |--- create
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]
  |     |--- [<type-id>]
  |          |--- create
  |          |--- name
  |          |--- available_instances
  |          |--- device_api
  |          |--- description
  |          |--- [devices]

* [mdev_supported_types]

  The list of currently supported mediated device types and their details.

  [<type-id>], device_api, and available_instances are mandatory attributes
  that should be provided by the vendor driver.

* [<type-id>]

  The [<type-id>] name is created by adding the device driver string as a prefix
  to the string provided by the vendor driver. The format of this name is as
  follows:

      sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name);

  Outside of the core mdev code, use mdev_parent_dev(mdev) to obtain the parent
  device.
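
The same composition can be reproduced in user space with snprintf(), which also
reports truncation. In this sketch build_type_id() is a hypothetical helper, and
the inputs "mtty" and "2" assume that the sample driver's driver string is
"mtty" and its group name is "2", which would yield the mtty-2 type that appears
in the sample output later in this document.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a [<type-id>] name as "<driver string>-<vendor group name>".
 * Returns 0 on success, or -1 if the result does not fit in buf.
 */
static int build_type_id(char *buf, size_t len, const char *driver_string,
			 const char *group_name)
{
	int n;

	assert(buf != NULL && driver_string != NULL && group_name != NULL);
	n = snprintf(buf, len, "%s-%s", driver_string, group_name);
	/* snprintf returns the length it wanted to write, excluding the NUL. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```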

* device_api

  This attribute should show which device API is being created, for example,
  "vfio-pci" for a PCI device.

* available_instances

  This attribute should show the number of devices of type <type-id> that can be
  created.
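
A management tool might check this value before writing to create. The user
space C sketch below assumes that the attribute holds a single non-negative
decimal number optionally followed by a newline; read_available_instances() is a
hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read a non-negative decimal integer from a sysfs-style attribute file such
 * as .../mdev_supported_types/<type-id>/available_instances.
 * Returns the value, or -1 on any open or parse error.
 */
static long read_available_instances(const char *path)
{
	char line[64];
	char *end;
	long val;
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	val = strtol(line, &end, 10);
	/* Reject empty input, trailing junk, overflow and negative values. */
	if (errno != 0 || end == line || (*end != '\n' && *end != '\0') ||
	    val < 0)
		return -1;
	return val;
}
```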

* [devices]

  This directory contains links to the devices of type <type-id> that have been
  created.

* name

  This attribute should show a human-readable name. This is an optional
  attribute.

* description

  This attribute should show a brief description of the type's features. This
  is an optional attribute.

Directories and Files Under the sysfs for Each mdev Device
----------------------------------------------------------

  |- [parent phy device]
  |--- [$MDEV_UUID]
         |--- remove
         |--- mdev_type {link to its type}
         |--- vendor-specific-attributes [optional]

* remove (write only)

  Writing '1' to the 'remove' file destroys the mdev device. The vendor driver
  can fail the remove() callback if that device is active and the vendor driver
  doesn't support hot unplug.

  Example:

      # echo 1 > /sys/bus/mdev/devices/$MDEV_UUID/remove
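
The same write can be issued from a C program instead of the shell. This is a
user space sketch; write_sysfs_attr() is a hypothetical helper. Because stdio
buffers the data, an error returned by the kernel for the write (for example, if
the vendor driver fails the removal) would typically surface at fclose(), so
both calls are checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write value to a sysfs-style attribute file, for example "1" to
 * /sys/bus/mdev/devices/<uuid>/remove. Returns 0 on success, -1 on error.
 */
static int write_sysfs_attr(const char *path, const char *value)
{
	FILE *f;
	size_t len;
	int ret = 0;

	assert(path != NULL && value != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	len = strlen(value);
	if (fwrite(value, 1, len, f) != len)
		ret = -1;
	/* The buffered data is flushed here, so check for a late failure. */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```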

Mediated Device Hot Plug
------------------------

Mediated devices can be created and assigned at runtime. The procedure to hot
plug a mediated device is the same as the procedure to hot plug a PCI device.

Translation APIs for Mediated Devices
=====================================

The following APIs are provided for translating user PFNs to host PFNs in a
VFIO driver:

     extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
                               int npage, int prot, unsigned long *phys_pfn);

     extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
                                 int npage);

These functions call back into the back-end IOMMU module by using the pin_pages
and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently,
these callbacks are supported in the TYPE1 IOMMU module. To enable them for
other IOMMU back-end modules, such as the PPC64 sPAPR module, those modules need
to provide these two callback functions.
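
Both functions take an array of user PFNs plus a count, so a vendor driver that
needs to access a user buffer first splits the address range into page frame
numbers. The arithmetic is sketched below in user space C, assuming 4 KiB pages
(DEMO_PAGE_SHIFT of 12); fill_user_pfns() is a hypothetical helper and is not
part of the VFIO API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Fill user_pfn[] with the page frame numbers covering
 * [user_addr, user_addr + len). Returns the number of entries written,
 * or -1 if len is 0 or max_pfns is too small. Overflow of
 * user_addr + len is not checked in this sketch.
 */
static int fill_user_pfns(unsigned long user_addr, unsigned long len,
			  unsigned long *user_pfn, int max_pfns)
{
	unsigned long first, last, pfn;
	int n = 0;

	assert(user_pfn != NULL && max_pfns >= 0);
	if (len == 0)
		return -1;
	first = user_addr >> DEMO_PAGE_SHIFT;
	last = (user_addr + len - 1) >> DEMO_PAGE_SHIFT;	/* inclusive */
	if (last - first + 1 > (unsigned long)max_pfns)
		return -1;
	for (pfn = first; pfn <= last; pfn++)
		user_pfn[n++] = pfn;
	return n;
}
```

The resulting array and count line up with the user_pfn and npage arguments of
vfio_pin_pages(); the matching vfio_unpin_pages() call takes the same array and
count.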

Using the Sample Code
=====================

mtty.c in the samples/vfio-mdev/ directory is a sample driver that demonstrates
how to use the mediated device framework.

The sample driver creates an mdev device that simulates a serial port over a PCI
card.

1. Build and load the mtty.ko module.

   This step creates a dummy device, /sys/devices/virtual/mtty/mtty/.

   Files in this device directory in sysfs are similar to the following:

      # tree /sys/devices/virtual/mtty/mtty/
      /sys/devices/virtual/mtty/mtty/
      |-- mdev_supported_types
      |   |-- mtty-1
      |   |   |-- available_instances
      |   |   |-- create
      |   |   |-- device_api
      |   |   |-- devices
      |   |   `-- name
      |   `-- mtty-2
      |       |-- available_instances
      |       |-- create
      |       |-- device_api
      |       |-- devices
      |       `-- name
      |-- mtty_dev
      |   `-- sample_mtty_dev
      |-- power
      |   |-- autosuspend_delay_ms
      |   |-- control
      |   |-- runtime_active_time
      |   |-- runtime_status
      |   `-- runtime_suspended_time
      |-- subsystem -> ../../../../class/mtty
      `-- uevent

2. Create a mediated device by using the dummy device that you created in the
   previous step.

      # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \
          /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create
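
The string written to create is a UUID, which then names the new device under
/sys/bus/mdev/devices/, as the later steps show. A tool might validate the
string before writing it. The C sketch below checks only the textual
8-4-4-4-12 hexadecimal layout (not the UUID version or variant bits);
is_uuid_string() is a hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true if s has the form xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (hex). */
static bool is_uuid_string(const char *s)
{
	size_t i;

	assert(s != NULL);
	if (strlen(s) != 36)
		return false;
	for (i = 0; i < 36; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (s[i] != '-')	/* group separators */
				return false;
		} else if (!isxdigit((unsigned char)s[i])) {
			return false;
		}
	}
	return true;
}
```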

3. Add parameters to qemu-kvm.

      -device vfio-pci,\
       sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001

4. Boot the VM.

   In the Linux guest VM, with no hardware on the host, the device appears
   as follows:

      # lspci -s 00:05.0 -xxvv
      00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550])
              Subsystem: Device 4348:3253
              Physical Slot: 5
              Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
              Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
              Interrupt: pin A routed to IRQ 10
              Region 0: I/O ports at c150 [size=8]
              Region 1: I/O ports at c158 [size=8]
              Kernel driver in use: serial
      00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00
      10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00
      20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32
      30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00

   In the Linux guest VM, dmesg output for the device is as follows:

      serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
      0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A
      0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A

5. In the Linux guest VM, check the serial ports.

      # setserial -g /dev/ttyS*
      /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
      /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10
      /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10

6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or
   /dev/ttyS2 with hardware flow control disabled.

7. Type data on the minicom terminal or send data to the terminal emulation
   program and read the data.

   Data is looped back from the host's mtty driver.

8. Destroy the mediated device that you created.

      # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove

References
==========

[1] See Documentation/vfio.txt for more information on VFIO.
[2] struct mdev_driver in include/linux/mdev.h
[3] struct mdev_parent_ops in include/linux/mdev.h
[4] struct vfio_iommu_driver_ops in include/linux/vfio.h