Virtual Routing and Forwarding (VRF)
====================================
The VRF device combined with ip rules provides the ability to create virtual
routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the
Linux network stack. One use case is the multi-tenancy problem where each
tenant has its own unique routing tables and, at the very least, needs a
different default gateway.

Processes can be "VRF aware" by binding a socket to the VRF device. Packets
through the socket then use the routing table associated with the VRF
device. An important feature of the VRF device implementation is that it
impacts only Layer 3 and above, so L2 tools (e.g., LLDP) are not affected
(i.e., they do not need to be run in each VRF). The design also allows
the use of higher priority ip rules (Policy Based Routing, PBR) to take
precedence over the VRF device rules, directing specific traffic as desired.

In addition, VRF devices allow VRFs to be nested within namespaces. For
example, network namespaces provide separation of network interfaces at L1
(Layer 1 separation), VLANs on the interfaces within a namespace provide
L2 separation, and then VRF devices provide L3 separation.

Design
------
A VRF device is created with an associated route table. Network interfaces
are then enslaved to a VRF device:

    +-----------------------------+
    |           vrf-blue          |  ===> route table 10
    +-----------------------------+
       |        |            |
    +------+ +------+     +-------------+
    | eth1 | | eth2 | ... |    bond1    |
    +------+ +------+     +-------------+
                             |       |
                         +------+ +------+
                         | eth8 | | eth9 |
                         +------+ +------+

Packets received on an enslaved device are switched to the VRF device
using an rx_handler, which gives the impression that packets flow through
the VRF device. Similarly, on egress, routing rules are used to send packets
to the VRF device driver before they are sent out the actual interface. This
allows tcpdump on a VRF device to capture all packets into and out of the
VRF as a whole.[1] Similarly, netfilter [2] and tc rules can be applied
using the VRF device to specify rules that apply to the VRF domain as a whole.

[1] Packets in the forwarded state do not flow through the device, so those
    packets are not seen by tcpdump. This limitation will be revisited in a
    future release.

[2] Iptables on ingress is limited to NF_INET_PRE_ROUTING only, with skb->dev
    set to the real ingress device, and egress is limited to
    NF_INET_POST_ROUTING. This limitation will be revisited in a future
    release.


Setup
-----
1. VRF device is created with an association to a FIB table.
   e.g., ip link add vrf-blue type vrf table 10
         ip link set dev vrf-blue up

2. Rules are added that send lookups to the associated FIB table when the
   iif or oif is the VRF device. e.g.,
       ip rule add oif vrf-blue table 10
       ip rule add iif vrf-blue table 10

   Set the default route for the table (and hence default route for the VRF).
   e.g., ip route add table 10 prohibit default

3. Enslave L3 interfaces to a VRF device.
   e.g., ip link set dev eth1 master vrf-blue

   Local and connected routes for enslaved devices are automatically moved to
   the table associated with the VRF device. Any additional routes depending
   on the enslaved device will need to be reinserted following the enslavement.

4. Additional VRF routes are added to the associated table.
   e.g., ip route add table 10 ...


Applications
------------
Applications that are to work within a VRF need to bind their socket to the
VRF device:

    setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1);

or specify the output device using cmsg and IP_PKTINFO.
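As a minimal sketch of both options, assuming glibc on Linux, the helpers
below wrap the SO_BINDTODEVICE call and build an IP_PKTINFO control message.
The names vrf_bind_socket and vrf_set_pktinfo are hypothetical and are not
part of any kernel or libc API. The interface index would normally be
obtained with if_nametoindex() on the VRF device name.

```c
#define _GNU_SOURCE             /* exposes struct in_pktinfo on glibc */
#include <errno.h>
#include <net/if.h>             /* IFNAMSIZ */
#include <netinet/in.h>         /* struct in_pktinfo, IP_PKTINFO */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: bind socket sd to a VRF device such as "vrf-blue".
 * Returns 0 on success, -1 with errno set on failure. */
int vrf_bind_socket(int sd, const char *dev)
{
    size_t len = strlen(dev);

    /* The name plus its trailing NUL must fit in IFNAMSIZ bytes. */
    if (len == 0 || len >= IFNAMSIZ) {
        errno = EINVAL;
        return -1;
    }
    /* Same call as in the text; the length includes the trailing NUL. */
    return setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev,
                      (socklen_t)(len + 1));
}

/* Hypothetical helper: attach an IP_PKTINFO control message to msg so that
 * sendmsg() uses the device with index ifindex as the output device.
 * cbuf must be aligned for struct cmsghdr and hold at least
 * CMSG_SPACE(sizeof(struct in_pktinfo)) bytes. The caller still fills in
 * msg_name and msg_iov. Returns 0 on success, -1 with errno set. */
int vrf_set_pktinfo(struct msghdr *msg, void *cbuf, size_t cbuflen,
                    unsigned int ifindex)
{
    struct in_pktinfo pi;
    struct cmsghdr *cmsg;

    if (cbuflen < CMSG_SPACE(sizeof(pi))) {
        errno = EINVAL;
        return -1;
    }

    memset(cbuf, 0, CMSG_SPACE(sizeof(pi)));
    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof(pi));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_IP;
    cmsg->cmsg_type = IP_PKTINFO;
    cmsg->cmsg_len = CMSG_LEN(sizeof(pi));

    memset(&pi, 0, sizeof(pi));
    pi.ipi_ifindex = (int)ifindex;      /* index of the VRF device */
    memcpy(CMSG_DATA(cmsg), &pi, sizeof(pi));
    return 0;
}
```

A caller would typically fill msg_name and msg_iov, call vrf_set_pktinfo()
with the index of the VRF device, and then call sendmsg(). Declaring the
control buffer inside a union with a struct cmsghdr member is one way to get
the required alignment.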


Limitations
-----------
The VRF device currently works only for IPv4. Support for IPv6 is under
development.

The index of the original ingress interface is not available via cmsg. This
will be addressed soon.