Peter P Waskiewicz Jr | a093bf0 | 2007-06-28 20:45:47 -0700 | [diff] [blame] | 1 | |
HOWTO for multiqueue network device support
===========================================

Section 1: Base driver requirements for implementing multiqueue support
Section 2: Qdisc support for multiqueue devices
Section 3: Brief howto using PRIO and RR for multiqueue devices


Intro: Kernel support for multiqueue devices
--------------------------------------------

Kernel support for multiqueue devices consists only of an API that the
netdevice layer presents for base drivers to implement. This feature is part
of the core networking stack, and all network devices run on the
multiqueue-aware stack. If a base driver has only one queue, these changes
are transparent to that driver.


Section 1: Base driver requirements for implementing multiqueue support
-----------------------------------------------------------------------

Base drivers are required to use the new alloc_etherdev_mq() or
alloc_netdev_mq() functions to allocate the subqueues for the device. The
underlying kernel API takes care of allocating and freeing the subqueue
memory, and of recording in the netdev where the queues exist in memory.

The base driver also needs to manage each queue individually, much as it
manages the global netdev->queue_lock today. Base drivers should therefore
use the netif_{start|stop|wake}_subqueue() functions to manage each queue
while the device is operational. netdev->queue_lock is still used when the
device comes online or when it is completely shut down (unregister_netdev(),
etc.).
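
The per-queue flow control described above can be modelled outside the
kernel. The sketch below is a userspace illustration, not driver code: the
struct layout, the ring size, and the model_* helpers are hypothetical
stand-ins for the kernel's netif_{start|stop|wake}_subqueue() calls. It shows
the usual pattern in which the transmit path stops only the queue whose ring
has filled, and the completion path wakes that queue again.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_QUEUES 4	/* hypothetical number of Tx queues */
#define MODEL_RING_SIZE  2	/* hypothetical descriptors per ring */

/* Per-queue state; the kernel keeps its own per-subqueue state. */
struct model_subqueue {
	bool stopped;		/* true while the stack must not send here */
	unsigned int in_use;	/* descriptors currently owned by hardware */
};

struct model_netdev {
	struct model_subqueue sq[MODEL_NUM_QUEUES];
};

/* Stand-in for netif_stop_subqueue(): stop one queue only. */
static void model_stop_subqueue(struct model_netdev *dev, unsigned int q)
{
	assert(q < MODEL_NUM_QUEUES);
	dev->sq[q].stopped = true;
}

/* Stand-in for netif_wake_subqueue(): allow one queue to send again. */
static void model_wake_subqueue(struct model_netdev *dev, unsigned int q)
{
	assert(q < MODEL_NUM_QUEUES);
	dev->sq[q].stopped = false;
}

/*
 * Transmit path: returns false if the queue is stopped. When the ring
 * for this queue fills, only this queue is stopped.
 */
static bool model_xmit(struct model_netdev *dev, unsigned int q)
{
	assert(q < MODEL_NUM_QUEUES);
	if (dev->sq[q].stopped)
		return false;
	dev->sq[q].in_use++;
	if (dev->sq[q].in_use == MODEL_RING_SIZE)
		model_stop_subqueue(dev, q);
	return true;
}

/* Completion path: reclaim one descriptor and wake the queue if stopped. */
static void model_tx_complete(struct model_netdev *dev, unsigned int q)
{
	assert(q < MODEL_NUM_QUEUES);
	if (dev->sq[q].in_use == 0)
		return;
	dev->sq[q].in_use--;
	if (dev->sq[q].stopped)
		model_wake_subqueue(dev, q);
}
```

In this model a full ring on queue 1 leaves queues 0, 2 and 3 free to keep
transmitting, which is the point of managing subqueues individually.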

Finally, the base driver should indicate that it is a multiqueue device by
adding the feature flag NETIF_F_MULTI_QUEUE to the netdev->features bitmap
on device initialization. Below is an example from e1000:

#ifdef CONFIG_E1000_MQ
	/* Advertise multiqueue support only on these MAC types. */
	if ((adapter->hw.mac.type == e1000_82571) ||
	    (adapter->hw.mac.type == e1000_82572) ||
	    (adapter->hw.mac.type == e1000_80003es2lan))
		netdev->features |= NETIF_F_MULTI_QUEUE;
#endif


Section 2: Qdisc support for multiqueue devices
-----------------------------------------------

Currently two qdiscs support multiqueue devices: sch_prio and a new
round-robin qdisc, sch_rr. The qdisc is responsible for classifying skbs into
bands and queues, and stores the queue mapping in skb->queue_mapping. The
base driver uses this field to determine which queue to send the skb to.
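
On the driver side, using skb->queue_mapping amounts to indexing the Tx ring
array with it. The following is a minimal userspace sketch, not code from any
driver: struct model_skb, struct model_adapter, model_select_ring() and
model_start_xmit() are hypothetical names, and only the queue_mapping field
comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NUM_TX_RINGS 4	/* hypothetical ring count */

struct model_skb {
	unsigned short queue_mapping;	/* set by the qdisc when classifying */
};

struct model_tx_ring {
	unsigned int queued;		/* packets placed on this ring so far */
};

struct model_adapter {
	struct model_tx_ring tx_ring[MODEL_NUM_TX_RINGS];
};

/* Pick the ring the qdisc chose; NULL if the mapping is out of range. */
static struct model_tx_ring *model_select_ring(struct model_adapter *a,
					       const struct model_skb *skb)
{
	assert(a != NULL && skb != NULL);
	if (skb->queue_mapping >= MODEL_NUM_TX_RINGS)
		return NULL;
	return &a->tx_ring[skb->queue_mapping];
}

/* Model of a transmit entry point: 0 on success, -1 on a bad mapping. */
static int model_start_xmit(struct model_adapter *a,
			    const struct model_skb *skb)
{
	struct model_tx_ring *ring = model_select_ring(a, skb);

	if (ring == NULL)
		return -1;
	ring->queued++;
	return 0;
}
```

The range check is a defensive choice in this sketch; whether a real driver
needs one depends on how it registers its queue count.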

sch_rr has been added for hardware that doesn't want scheduling policies from
software, so it's a straight round-robin qdisc. It uses the same syntax and
classification priomap that sch_prio uses, so it should be intuitive to
configure for people who've used sch_prio.

In order to utilize the multiqueue features of the qdiscs, the network
device layer needs to enable multiple queue support. This can be done by
selecting NETDEVICES_MULTIQUEUE under Drivers.

The PRIO qdisc naturally plugs into a multiqueue device. If
NETDEVICES_MULTIQUEUE is selected, then on qdisc load, the number of
bands requested is compared to the number of queues on the hardware. If they
are equal, it sets up a one-to-one mapping between the queues and bands. If
they're not equal, the qdisc will not load. RR behaves the same way. Once
the association is made, any skb that is classified will have
skb->queue_mapping set, which allows the driver to queue skbs properly
across multiple queues.


Section 3: Brief howto using PRIO and RR for multiqueue devices
---------------------------------------------------------------

The userspace command 'tc', part of the iproute2 package, is used to configure
qdiscs. To add the PRIO qdisc to your network device, assuming the device is
called eth0, run the following command:

# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue

This will create 4 bands, 0 being highest priority, and associate those bands
with the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping
would look like:

band 0 => queue 0
band 1 => queue 1
band 2 => queue 2
band 3 => queue 3

Traffic will begin flowing through each queue if your TOS values assign
traffic across the various bands. For example, ssh traffic will always try to
go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
so it will be sent out queue 0. ICMP traffic (pings) falls into the "normal"
traffic classification, which is band 1. Therefore pings will be sent out
queue 1 on the NIC.
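
The path from packet priority to hardware queue in the example above can be
traced with a small lookup. This is a userspace sketch under stated
assumptions, not the kernel's classifier: the priomap values are assumed to
be the PRIO defaults as documented in the tc-prio manual page, the priority
numbers 6 (interactive) and 0 (best effort) are assumed to follow the Linux
TC_PRIO_* convention, and the model_* names are hypothetical.

```c
#include <assert.h>

#define MODEL_PRIO_MAX 15	/* priorities 0..15 index the priomap */

/* Assumed default priomap (Linux priority -> band) for PRIO. */
static const unsigned char model_priomap[MODEL_PRIO_MAX + 1] = {
	1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Map a Linux priority to a band; only the low four bits are used. */
static unsigned int model_prio_to_band(unsigned int priority)
{
	return model_priomap[priority & MODEL_PRIO_MAX];
}

/* With the one-to-one multiqueue mapping, band N goes to queue N. */
static unsigned int model_band_to_queue(unsigned int band,
					unsigned int num_queues)
{
	assert(band < num_queues);
	return band;
}
```

Under these assumptions, interactive ssh traffic (priority 6) lands in band 0
and therefore queue 0, while a ping with default TOS (priority 0) lands in
band 1 and queue 1, matching the description above.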

Note the use of the multiqueue keyword. This is only available in versions of
iproute2 that support multiqueue networking devices; if it is omitted when
loading a qdisc onto a multiqueue device, the qdisc will load and operate the
same as if it were loaded onto a single-queue device (i.e. it sends all
traffic to queue 0).

An alternative way to allocate bands is to use the multiqueue option and
specify 0 bands. In that case, the qdisc allocates as many bands as the
device reports queues, and then comes online.
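
The band-count rules from Section 2 and from the paragraphs above reduce to
a short decision. The sketch below is a userspace model and not the qdisc
source: model_resolve_bands(), model_queue_for_band() and the -1 return
convention are hypothetical, and the usual validation of the band count for
the non-multiqueue case is left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide how many bands the qdisc gets at load time.
 * Returns the band count, or -1 if the qdisc must refuse to load.
 */
static int model_resolve_bands(int requested_bands, int hw_queues,
			       bool multiqueue)
{
	assert(requested_bands >= 0 && hw_queues >= 1);

	if (!multiqueue)
		return requested_bands;	/* treated as a single-queue device */
	if (requested_bands == 0)
		return hw_queues;	/* 0 bands: take the count from the device */
	if (requested_bands != hw_queues)
		return -1;		/* mismatch: the qdisc does not load */
	return requested_bands;		/* one-to-one band/queue mapping */
}

/* Queue used for a band: band N -> queue N, or queue 0 without multiqueue. */
static int model_queue_for_band(int band, bool multiqueue)
{
	assert(band >= 0);
	return multiqueue ? band : 0;
}
```

For a device with 4 Tx queues, requesting 4 bands or 0 bands with multiqueue
both yield 4 bands, requesting 3 bands with multiqueue is refused, and
omitting multiqueue keeps the requested band count but sends every band to
queue 0.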

The behavior of tc filters remains the same: they override TOS priority
classification.


Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>