| Virtual Routing and Forwarding (VRF) |
| ==================================== |
| The VRF device combined with ip rules provides the ability to create virtual |
| routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the |
| Linux network stack. One use case is the multi-tenancy problem where each |
| tenant has its own routing tables and, at the very least, needs a |
| different default gateway. |
| |
| Processes can be "VRF aware" by binding a socket to the VRF device. Packets |
| sent through the socket then use the routing table associated with the VRF |
| device. An important feature of the VRF device implementation is that it |
| impacts only Layer 3 and above, so L2 tools (e.g., LLDP) are not affected |
| (i.e., they do not need to be run in each VRF). The design also allows |
| the use of higher priority ip rules (Policy Based Routing, PBR) to take |
| precedence over the VRF device rules, directing specific traffic as desired. |
| |
| In addition, VRF devices allow VRFs to be nested within namespaces. For |
| example, network namespaces provide separation of network interfaces at L1 |
| (Layer 1 separation), VLANs on the interfaces within a namespace provide |
| L2 separation, and VRF devices then provide L3 separation. |
| |
| Design |
| ------ |
| A VRF device is created with an associated route table. Network interfaces |
| are then enslaved to a VRF device: |
| |
|     +-----------------------------+ |
|     |           vrf-blue          |  ===> route table 10 |
|     +-----------------------------+ |
|         |        |               | |
|     +------+ +------+     +-------------+ |
|     | eth1 | | eth2 | ... |    bond1    | |
|     +------+ +------+     +-------------+ |
|                              |        | |
|                          +------+ +------+ |
|                          | eth8 | | eth9 | |
|                          +------+ +------+ |
| |
| Packets received on an enslaved device are switched to the VRF device |
| using an rx_handler, which gives the impression that packets flow through |
| the VRF device. Similarly, on egress, routing rules are used to send packets |
| to the VRF device driver before they are sent out the actual interface. This |
| allows tcpdump on a VRF device to capture all packets into and out of the |
| VRF as a whole.[1] Similarly, netfilter [2] and tc rules can be applied |
| using the VRF device to specify rules that apply to the VRF domain as a whole. |
| |
| [1] Packets in the forwarded state do not flow through the device, so those |
| packets are not seen by tcpdump. Will revisit this limitation in a |
| future release. |
| |
| [2] Iptables on ingress is limited to NF_INET_PRE_ROUTING only, with skb->dev |
|     set to the real ingress device; egress is limited to NF_INET_POST_ROUTING. |
|     Will revisit this limitation in a future release. |
| |
| |
| Setup |
| ----- |
| 1. VRF device is created with an association to a FIB table. |
|    e.g., ip link add vrf-blue type vrf table 10 |
|          ip link set dev vrf-blue up |
| |
| 2. Rules are added that send lookups to the associated FIB table when the |
|    iif or oif is the VRF device. e.g., |
|        ip ru add oif vrf-blue table 10 |
|        ip ru add iif vrf-blue table 10 |
| |
|    Set the default route for the table (and hence default route for the VRF). |
|    e.g., ip route add table 10 prohibit default |
| |
| 3. Enslave L3 interfaces to a VRF device. |
|    e.g., ip link set dev eth1 master vrf-blue |
| |
|    Local and connected routes for enslaved devices are automatically moved to |
|    the table associated with the VRF device. Any additional routes depending |
|    on the enslaved device will need to be reinserted following the enslavement. |
| |
| 4. Additional VRF routes are added to the associated table. |
|    e.g., ip route add table 10 ... |
| |
| |
| Applications |
| ------------ |
| Applications that are to work within a VRF need to bind their socket to the |
| VRF device: |
| |
| setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1); |
| |
| or specify the output device using cmsg and IP_PKTINFO. |
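| |
| A minimal sketch of both approaches, assuming a Linux system with glibc. |
| The helper names bind_to_vrf() and send_via_ifindex() are hypothetical and |
| not part of any kernel or libc API; the caller is assumed to look up the |
| VRF device index itself (e.g., with if_nametoindex()). |
| |

```c
#define _GNU_SOURCE		/* for struct in_pktinfo and IFNAMSIZ */
#include <errno.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: bind an existing socket to a VRF device by name
 * (e.g. "vrf-blue"). Returns 0 on success, -1 with errno set on failure.
 */
static int bind_to_vrf(int sd, const char *dev)
{
	size_t len = strlen(dev) + 1;	/* include the trailing NUL */

	if (len > IFNAMSIZ) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, (socklen_t)len);
}

/*
 * Hypothetical helper: send one datagram on an unbound socket, naming the
 * output device per message through an IP_PKTINFO control message.
 * ifindex is assumed to be the index of the VRF device.
 */
static ssize_t send_via_ifindex(int sd, const struct sockaddr_in *dst,
				const void *buf, size_t len,
				unsigned int ifindex)
{
	/* The union keeps the control buffer aligned for struct cmsghdr. */
	union {
		char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
		struct cmsghdr align;
	} ctl;
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct in_pktinfo pi;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	memset(&pi, 0, sizeof(pi));
	memset(&msg, 0, sizeof(msg));

	pi.ipi_ifindex = (int)ifindex;	/* output device for this message */

	msg.msg_name = (void *)dst;
	msg.msg_namelen = sizeof(*dst);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = IPPROTO_IP;
	cmsg->cmsg_type = IP_PKTINFO;
	cmsg->cmsg_len = CMSG_LEN(sizeof(pi));
	memcpy(CMSG_DATA(cmsg), &pi, sizeof(pi));

	return sendmsg(sd, &msg, 0);
}
```

| |
| A UDP sender might, for example, call bind_to_vrf(sd, "vrf-blue") once |
| after socket(), or leave the socket unbound and pass |
| if_nametoindex("vrf-blue") to send_via_ifindex() for each datagram. |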
| |
| |
| Limitations |
| ----------- |
| The VRF device currently works only for IPv4. Support for IPv6 is under |
| development. |
| |
| The index of the original ingress interface is not available via cmsg. This |
| will be addressed soon. |