| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd"> |
| <HTML><HEAD><TITLE>How bridge/ebtables/iptables interaction works</TITLE> |
| <META http-equiv=Content-Type content="text/html; charset=iso-8859-15"> |
| <STYLE type=text/css>H1 { |
| FONT: bold 25pt Times, serif; TEXT-ALIGN: center; TEXT-DECORATION: underline |
| } |
| P { |
| FONT: 20pt Times, serif |
| } |
| LI { |
| MARGIN-BOTTOM: 2em; FONT: 22pt 'Times New Roman', serif |
| } |
| PRE { |
| FONT: 18pt Courier, monospace |
| } |
| .statement { |
| TEXT-DECORATION: underline |
| } |
| .section { |
| FONT: bold 22pt Times |
| } |
| .case { |
| FONT-STYLE: italic |
| } |
| </STYLE> |
| |
| <META content="MSHTML 6.00.2505.0" name=GENERATOR></HEAD> |
| <BODY> |
| <H1>How bridge/ebtables/iptables interaction works</H1> |
| |
| <P class=section>1. How frames traverse the <EM>ebtables</EM> chains:</P> |
| <P>This section only considers <EM>ebtables</EM>, _not_ <EM>iptables</EM>.</P> |
| <PRE> |
| Route |
| ^ |
| | |
| I +--------+ Bridge +----------+ +-------+ +-----------+ O |
| N->|BROUTING|-------->|PREROUTING|----->[BRIDGING]---->|FORWARD| ---->|POSTROUTING|-->U |
| +--------+ +----------+ [DECISION] +-------+ +-----------+ T |
| | ^ |
| v | |
| +-----+ +----------+ |
| |INPUT| |OUTPUT (2)| |
| +-----+ +----------+ |
| | ^ |
| | | |
| | +----------+ |
| | +OUTPUT (1)+ |
| | +----------+ |
| | ^ |
| +------->Local Process---------+ |
| </PRE> |
| <P> |
| First thing to keep in mind is that we are talking about the ethernet layer here, |
| so the OSI layer 2. A packet destined for the local computer according to the bridge |
| (which works on the ethernet layer) isn't necessarily destined for the local computer |
| according to the ip layer. That's how routing works (MAC destination is the router, ip |
| destination is the actual box you want to communicate with).</P> |
| <P> |
| <EM>Ebtables</EM> currently has three tables: filter, nat and broute. The filter table has a |
| FORWARD, INPUT and OUTPUT chain. The nat table has a PREROUTING, OUTPUT and POSTROUTING chain. |
| The broute table has the BROUTING chain. In the figure the filter OUTPUT chain has (2) |
| appended and the nat OUTPUT chain has (1) appended. So these two OUTPUT chains are not |
| the same (and have a different intended use).</P> |
| <P> |
| When a nic enslaved to a bridge receives a frame, the frame will first go through the BROUTING |
| chain. In this special chain one can choose whether to route or bridge frames. The default |
| is bridging and we will assume the decision in this chain is 'bridge'. So, next the frame |
| passes through the PREROUTING chain. This chain is intended for you to be able to alter the |
| destination MAC address of |
| frames (DNAT). If the frame passes this chain, the bridging code will decide where the |
| frame should be sent. The bridge does this by looking at the destination MAC address, it |
| doesn't care about the OSI layer 3 addresses (e.g. ip address). Note that frames coming in |
| on non-forwarding ports of a bridge will not be seen by <EM>ebtables</EM>, not even by the BROUTING |
| chain.</P> |
| <P> |
| If the bridge decides the frame is for the bridging computer, the frame will go through the |
| INPUT chain. In this chain you can filter frames destined for the bridge box. After passing |
| the INPUT chain, the frame will be given to the code on layer 3 (i.e. it will be passed up), |
| e.g. to the ip code. So, a routed ip packet will go through the <EM>ebtables</EM> INPUT chain, not |
| through the <EM>ebtables</EM> FORWARD chain. This is logical.</P> |
| <P> |
| Else the frame should possibly be sent onto another side of the bridge. If it should, the |
| frame will go through the FORWARD chain and the POSTROUTING chain. In the FORWARD chain one |
| can filter frames that will be bridged, the POSTROUTING chain is intended to be able to |
| change the MAC source address (SNAT).</P> |
| <P> |
| Frames that originate from the bridge box itself will go, after the bridging decision, through the |
| nat OUTPUT chain, through the filter OUTPUT chain and the POSTROUTING chain. The |
| nat OUTPUT chain allows you to alter the destination MAC address and the filter OUTPUT chain |
| allows you to filter frames originating from the bridge box. Note that the nat OUTPUT chain is |
| traversed after the bridging decision, so actually too late. We should change this. The POSTROUTING |
| chain is the same one as described above. Note that it is also possible for routed frames to go |
| through these chains, this is when the destination device is a logical bridge device.</P> |
| <P class=section> |
| 2. A machine used as a bridge and a router (not a brouter):</P> |
| <P> |
| It's possible to see a single ip packet pass the PREROUTING, INPUT, nat OUTPUT, filter OUTPUT |
| and POSTROUTING <EM>ebtables</EM> chains.</P> |
| <P> |
| This can happen when the bridge is also used as a router. The ethernet frame(s) containing that |
| ip packet will have the bridge's destination MAC address, while the destination ip address is not |
| that of the bridge. Including the <EM>iptables</EM> chains, this is how the ip packet runs through the |
| bridge/router (eb=ebtables , ip=iptables ):</P> |
| <PRE>ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->send packet</PRE> |
| <P> |
| This assumes that the routing decision sends the packet to a bridge interface. If the routing |
| decision sends the packet to a physical network card, this is what happens:</P> |
| <PRE>ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ipPOSTROUTING->send packet</PRE> |
| <P> |
| What is obviously "asymmetric" here is that the <EM>iptables</EM> PREROUTING chain is traversed before |
| the <EM>ebtables</EM> INPUT chain, however this can not be helped. See the next section.</P> |
| <P class=section> |
| 3. DNATing bridged packets:</P> |
| <P> |
| Take an ip packet received by the bridge, it enters the bridge code. Lets assume we want to do |
| some ip DNAT on it. Changing the destination address of the packet (ip address and MAC address) |
| has to happen before the bridge code decides what to do with the packet. The bridge code can decide |
| to bridge it (if the destination MAC address is on another side of the bridge), flood it over all |
| the forwarding bridge ports (the position of the box with the destination MAC is unknown to the bridge), |
| give it to the higher protocol code (here, the ip code) if the destination MAC address is that of the |
| bridge, or ignore it (the destination MAC address is located on the same side of the bridge).</P> |
| <P> |
| So, this ip DNAT has to happen very early in the bridge code. Namely before the bridge code |
| actually does anything. This is at the same place as where the <EM>ebtables</EM> PREROUTING chain will |
| be traversed (for the same reason).</P> |
| <P class=section> |
| 4. Chain traversal for bridged ip packets:</P> |
| <P> |
| A bridged packet never enters any network code above layer 2. So a bridged ip packet will never |
| enter the ip code. Therefore all <EM>iptables</EM> chains will be traversed while the ip packet is in the |
| bridge code. The chain traversal will look like this:</P> |
| <PRE> |
| ebPREROUTING->ipPREROUTING->ebFORWARD->ipFORWARD->ebPOSTROUTING->ipPOSTROUTING</PRE> |
| <P> |
| Once again note that there is a certain form of asymmetry here that cannot be helped.</P> |
| <P class=section> |
| 5. Using a bridge port in <EM>iptables</EM> rules:</P> |
| <P> |
| The wish to be able to use physical devices belonging to a bridge (bridge ports) in <EM>iptables</EM> rules |
| is valid. It's necessary to prevent spoofing attacks. Say br0 has ports eth0 and eth1. If <EM>iptables</EM> |
| rules can only use br0 there's no way of knowing when a box on the eth0 side changes it's source ip |
| address to that of a box on the eth1 side, except by looking at the MAC source address (and then |
| still...). With the current bridge/iptables patch (0.0.6 or later) you can use eth0 and eth1 in your |
| <EM>iptables</EM> rules and therefore catch these attempts.</P> |
| <P class=case> |
| 1. <EM>iptables</EM> wants to use bridge ports:<P> |
| <P> |
| To make this possible the <EM>iptables</EM> chains have to be traversed after the bridge code decided where |
| the frame needs to be sent (eth0, eth1, both or none). This has some impact on the scheme presented |
| in section 2 (so, we are looking at routed traffic here). It actually looks like this:</P> |
| <PRE> |
| ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ebOUTPUT(1)->ebOUTPUT(2)->ipPOSTROUTING->ebPOSTROUTING->send packet</PRE> |
| <P> |
| Note that this is the work of the br-nf patch. If one does not compile the br-nf code into the kernel, |
| the chains will be traversed as shown below. However, then one can only use br0, not eth0/eth1 to |
| filter.</P> |
| <PRE>ebPREROUTING->ebINPUT->ipPREROUTING->ipFORWARD->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->send packet</PRE> |
| <P> |
| Notice that ipPREROUTING is now in the natural position in the chain list and too far to be able to change |
| the bridging decision. More precise: ipPREROUTING is now traversed while the packet is in the ip code.</P> |
| <P class=case> |
| 2. IP DNAT for locally generated packets (so in the <EM>iptables</EM> nat OUTPUT chain):</P> |
| <P> |
| The 'normal' way locally generated packets would go through the chains looks like this:</P> |
| <PRE> |
| ipOUTPUT(1)->ipOUTPUT(2)->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING</PRE> |
| <P> |
| From the section 5.1 we know that this actually looks like this:</P> |
| <PRE> |
| ipOUTPUT(1)->ipOUTPUT(2)->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->ipPOSTROUTING</PRE> |
| <P> |
| Here we denote by ipOUTPUT(1) (resp. ipOUTPUT(2)) the <EM>iptables</EM> nat (resp. filter) OUTPUT chain. Note that |
| the ipOUTPUT(1) chain is traversed while the packet is in the ip code, while the ipOUTPUT(2) chain is traversed when |
| the packet has entered the bridge code. This makes it possible to do DNAT to another device in ipOUTPUT(1) and lets |
| one use the bridge ports in the ipOUTPUT(2) chain.</P> |
| <P class=section> |
| 6. Two possible ways for frames/packets to pass through the <EM>iptables</EM> PREROUTING, FORWARD and POSTROUTING |
| chains:</P> |
| <P> |
| With the br-nf patch there are 2 ways a frame/packet can pass through the 3 given <EM>iptables</EM> |
| chains. The first way is when the frame is bridged, so the <EM>iptables</EM> chains are called by the bridge code. |
| The second way is when the packet is routed. So special care has to be taken to distinguish between those |
| two, especially in the <EM>iptables</EM> FORWARD chain. Here's an example of strange things to look out for:</P> |
| <P> |
| Consider the following situation (my personal setup)</P> |
| <PRE> |
| +-----------------+ |
| | cable modem | |
| +-------+---------+ |
| | |
| | |
| eth0|IP via DHCP from ISP |
| +-------+---------+ |
| |bridge/router/fw | |
| +--+-----------+--+ |
| eth1| 172.16.1.1|eth2 |
| | (br0) | |
| | | |
| 172.16.1.4| |172.16.1.2 |
| +----------+---+ +--+------------+ |
| |test computer/| | desktop | |
| |backup server | +---------------+ |
| +--------------+</PRE> |
| <P> |
| With this setup I can test the bridge+ebtables+iptables code while having access to the internet from all |
| three computers. The default gateway for 172.16.1.2 and 172.16.1.4 is 172.16.1.1. 172.16.1.1 is the bridge |
| interface br0 with ports eth1 and eth2.</P> |
| <P class=case>More details:</P> |
| <P> |
| The idea is that traffic between 172.16.1.4 and 172.16.2 is bridged, while the rest is routed, using |
| masquerading. Here's the "script" I use at bootup for the bridge/router:</P> |
| <PRE> |
| iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -d 172.16.1.0/24 -j ACCEPT |
| iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -j MASQUERADE |
| insmod ebtables |
| insmod ebtable_filter |
| insmod ebtable_nat |
| insmod ebt_nat |
| insmod ebt_log |
| insmod ebt_arp |
| insmod ebt_ip |
| insmod br_db |
| brctl addbr br0 |
| brctl stp br0 off |
| brctl addif br0 eth1 |
| brctl addif br0 eth2 |
| ifconfig eth1 0 0.0.0.0 |
| ifconfig eth2 0 0.0.0.0 |
| ifconfig br0 172.16.1.1 netmask 255.255.255.0 up |
| echo '1' > /proc/sys/net/ipv4/ip_forward</PRE> |
| <P> |
| The catch is in the first line. Because the <EM>iptables</EM> code gets executed for both bridged packets and routed |
| packets we need to make a distinction between the two. We don't really want the bridged packets to be |
| masqueraded. If we omit the first line then everything will work too, but things will happen differently. |
| Let's say 172.16.1.2 pings 172.16.1.4. The bridge receives the ping request and will transmit it through its eth1 |
| port after first masquerading the ip address. So the packet's source ip address will now be 172.16.1.1 and |
| 172.16.1.4 will respond to the bridge. Masquerading will change the ip destination of this response from |
| 172.16.1.1 to 172.16.1.4. Everything works fine. But it's better not to have this behaviour. Thus, we use the |
| first line of the script to avoid this. Note that if I wanted to filter the connections to and from the |
| internet, I would certainly need the first line so I don't filter the local connections as well.</P> |
| <P class=section> |
| 7. ip DNAT in the <EM>iptables</EM> PREROUTING chain on frames/packets entering on a bridge port:</P> |
| <P>Through some groovy play it is assured that (see /net/bridge/br_netfilter.c) DNAT'ed packets that after DNAT'ing |
| have the same output device as the input device they came on (the logical bridge device which we like to call br0) |
| will be bridged, not routed. So they will go through the <EM>ebtables</EM> FORWARD chain. All other DNAT'ed packets will be |
| routed, so won't go through the <EM>ebtables</EM> FORWARD chain, will go through the <EM>ebtables</EM> INPUT chain and might go |
| through the <EM>ebtables</EM> OUTPUT chain.</P> |
| <P class=section> |
| 8. using the mac module extension for <EM>iptables</EM>:</P> |
| <P>The side effect explained here occurs when the br-nf code is compiled in the kernel, the ip packet is routed and the out device |
| for that packet is a logical bridge. The side effect is encountered when filtering on the mac source in the |
| <EM>iptables</EM> FORWARD chains. As should be clear from earlier sections, the traversal of the <EM>iptables</EM> FORWARD chains |
| is postponed until the packet is in the bridge code. This is done so one can filter on the bridge port out device. This has a |
| side effect on the MAC source address, because the ip code will have changed the MAC source address to the MAC address of the bridge. |
| It is therefore impossible, in the <EM>iptables</EM> FORWARD chains, to filter on the MAC source address of the computer sending |
| the packet in question to the bridge/router. If you really need to filter on this MAC source address, you should do it in the nat |
| PREROUTING chain. Agreed, very ugly, but making it possible to filter on the real MAC source address in the FORWARD chains would |
| involve a very dirty hack and is probably not worth it.</P> |
| <P> |
| Released under the GPL.</P> |
| <P> |
| Bart De Schuymer.</P> |
| <P> |
| Last updated June 2nd, 2002.</P> |
| </BODY></HTML> |