<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML><HEAD><TITLE>How bridge/ebtables/iptables interaction works</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-15">
<STYLE type=text/css>H1 {
	FONT: bold 25pt Times, serif; TEXT-ALIGN: center; TEXT-DECORATION: underline
}
P {
	FONT: 20pt Times, serif
}
LI {
	MARGIN-BOTTOM: 2em; FONT: 22pt 'Times New Roman', serif
}
PRE {
	FONT: 18pt Courier, monospace
}
.statement {
	TEXT-DECORATION: underline
}
.section {
	FONT: bold 22pt Times
}
.case {
	FONT-STYLE: italic
}
</STYLE>

<META content="MSHTML 6.00.2505.0" name=GENERATOR></HEAD>
<BODY>
<H1>How bridge/ebtables/iptables interaction works</H1>
<P class=section>1. How frames traverse the <EM>ebtables</EM> chains:</P>
<P>This section only considers <EM>ebtables</EM>, _not_ <EM>iptables</EM>.</P>
<PRE>
      Route
        ^
        |
I  +--------+ Bridge  +----------+                     +-------+      +-----------+   O
N->|BROUTING|-------->|PREROUTING|----->[BRIDGING]---->|FORWARD| ---->|POSTROUTING|-->U
   +--------+         +----------+      [DECISION]     +-------+      +-----------+   T
                                             |                              ^
                                             v                              |
                                          +-----+                      +----------+
                                          |INPUT|                      |OUTPUT (2)|
                                          +-----+                      +----------+
                                             |                              ^
                                             |                              |
                                             |                         +----------+
                                             |                         |OUTPUT (1)|
                                             |                         +----------+
                                             |                              ^
                                             +------->Local Process---------+
</PRE>
<P>
The first thing to keep in mind is that we are talking about the Ethernet layer here,
that is, OSI layer 2. A packet destined for the local computer according to the bridge
(which works on the Ethernet layer) isn't necessarily destined for the local computer
according to the IP layer. That's how routing works: the MAC destination is the router, while the IP
destination is the actual box you want to communicate with.</P>
<P>
<EM>Ebtables</EM> currently has three tables: filter, nat and broute. The filter table has the
FORWARD, INPUT and OUTPUT chains. The nat table has the PREROUTING, OUTPUT and POSTROUTING chains.
The broute table has the BROUTING chain. In the figure, the filter OUTPUT chain is labelled (2)
and the nat OUTPUT chain is labelled (1). These two OUTPUT chains are not
the same (and have different intended uses).</P>
<P>
When a NIC enslaved to a bridge receives a frame, the frame first goes through the BROUTING
chain. In this special chain you can choose whether to route or bridge frames. The default
is bridging, and we will assume the decision in this chain is 'bridge'. Next, the frame
passes through the PREROUTING chain, which is intended for altering the
destination MAC address of
frames (DNAT). If the frame passes this chain, the bridging code decides where the
frame should be sent. The bridge does this by looking at the destination MAC address; it
doesn't care about the OSI layer 3 addresses (e.g. the IP address). Note that frames coming in
on non-forwarding ports of a bridge will not be seen by <EM>ebtables</EM>, not even by the BROUTING
chain.</P>
<P>
If the bridge decides the frame is for the bridging computer, the frame goes through the
INPUT chain. In this chain you can filter frames destined for the bridge box. After passing
the INPUT chain, the frame is given to the code on layer 3 (i.e. it is passed up),
e.g. to the IP code. So, a routed IP packet goes through the <EM>ebtables</EM> INPUT chain, not
through the <EM>ebtables</EM> FORWARD chain. This is logical: on the Ethernet layer the frame is
addressed to the bridge box itself.</P>
<P>
Otherwise the frame may have to be sent out on another side of the bridge. If so, the
frame goes through the FORWARD chain and the POSTROUTING chain. In the FORWARD chain you
can filter frames that will be bridged; the POSTROUTING chain is intended for
changing the MAC source address (SNAT).</P>
<P>
Frames that originate from the bridge box itself go, after the bridging decision, through the
nat OUTPUT chain, the filter OUTPUT chain and the POSTROUTING chain. The
nat OUTPUT chain allows you to alter the destination MAC address and the filter OUTPUT chain
allows you to filter frames originating from the bridge box. Note that the nat OUTPUT chain is
traversed after the bridging decision, which is actually too late for a changed destination MAC
address to influence that decision. We should change this. The POSTROUTING
chain is the same one as described above. Note that it is also possible for routed frames to go
through these chains; this happens when the destination device is a logical bridge device.</P>
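<P>
The traversal described in this section can be summarised as a small Python sketch. It is only a
model of the figure above, not kernel code: the function name <EM>ebtables_path</EM> and its
argument values are hypothetical, and it assumes that every chain lets the frame pass.</P>

```python
def ebtables_path(origin, broute_decision="bridge", bridging_decision="forward"):
    """Return the ebtables chains a frame passes, following the figure above.

    origin            -- "port": frame received on a bridge port
                         "local": frame sent by the bridge box itself
                         (or routed out through a logical bridge device)
    broute_decision   -- "bridge" or "route" (only used for origin "port")
    bridging_decision -- "local", "forward" or "ignore" (only for origin "port")

    All names and values are illustrative assumptions; every chain is
    assumed to accept the frame.
    """
    if origin == "local":
        # nat OUTPUT is OUTPUT (1) and filter OUTPUT is OUTPUT (2) in the figure.
        return ["nat OUTPUT", "filter OUTPUT", "nat POSTROUTING"]
    if origin != "port":
        raise ValueError("origin must be 'port' or 'local'")

    path = ["broute BROUTING"]
    if broute_decision == "route":
        # The frame leaves the bridge path here and is routed instead.
        return path
    if broute_decision != "bridge":
        raise ValueError("broute_decision must be 'bridge' or 'route'")

    path.append("nat PREROUTING")
    if bridging_decision == "local":
        path.append("filter INPUT")          # then passed up to layer 3
    elif bridging_decision == "forward":
        path += ["filter FORWARD", "nat POSTROUTING"]
    elif bridging_decision != "ignore":
        raise ValueError("unknown bridging_decision")
    return path


if __name__ == "__main__":
    for args in (("port",), ("port", "bridge", "local"), ("local",)):
        print(args, "->", " -> ".join(ebtables_path(*args)))
```

<P>
A routed IP packet corresponds to the "local" bridging decision on the way in, which is why it
shows up in the INPUT chain and never in the FORWARD chain.</P>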
<P class=section>
2. A machine used as a bridge and a router (not a brouter):</P>
<P>
It's possible to see a single IP packet pass the PREROUTING, INPUT, nat OUTPUT, filter OUTPUT
and POSTROUTING <EM>ebtables</EM> chains.</P>
<P>
This can happen when the bridge is also used as a router. The Ethernet frame(s) containing that
IP packet will have the bridge's MAC address as destination, while the destination IP address is not
that of the bridge. Including the <EM>iptables</EM> chains, this is how the IP packet runs through the
bridge/router (eb=ebtables, ip=iptables):</P>
<PRE>ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->send packet</PRE>
<P>
This assumes that the routing decision sends the packet to a bridge interface. If the routing
decision sends the packet to a physical network card, this is what happens:</P>
<PRE>ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ipPOSTROUTING->send packet</PRE>
<P>
What is obviously "asymmetric" here is that the <EM>iptables</EM> PREROUTING chain is traversed before
the <EM>ebtables</EM> INPUT chain; however, this cannot be helped. See the next section.</P>
<P class=section>
3. DNATing bridged packets:</P>
<P>
Take an IP packet received by the bridge: it enters the bridge code. Let's assume we want to do
some IP DNAT on it. Changing the destination address of the packet (IP address and MAC address)
has to happen before the bridge code decides what to do with the packet. The bridge code can decide
to bridge it (if the destination MAC address is on another side of the bridge), flood it over all
the forwarding bridge ports (if the position of the box with the destination MAC is unknown to the bridge),
give it to the higher protocol code (here, the IP code) if the destination MAC address is that of the
bridge, or ignore it (if the destination MAC address is located on the same side of the bridge).</P>
<P>
So, this IP DNAT has to happen very early in the bridge code, namely before the bridge code
actually does anything. This is the same place where the <EM>ebtables</EM> PREROUTING chain is
traversed (for the same reason).</P>
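<P>
The four outcomes listed above can be sketched as a simple lookup. This is an illustration under
stated assumptions, not the kernel's bridge code: <EM>bridging_decision</EM>, <EM>bridge_macs</EM> and
<EM>fdb</EM> (a table of learned MAC addresses per port) are hypothetical names, the MAC addresses
are made up, and broadcast/multicast handling is left out.</P>

```python
def bridging_decision(dst_mac, in_port, bridge_macs, fdb):
    """Decide what a bridge does with a frame, based only on its destination MAC.

    dst_mac     -- destination MAC address of the frame
    in_port     -- port the frame arrived on, e.g. "eth0"
    bridge_macs -- set of MAC addresses owned by the bridge itself
    fdb         -- dict mapping learned MAC addresses to the port they live on

    Returns (action, out_port); out_port is only set for "bridge".
    All names are illustrative assumptions.
    """
    if dst_mac in bridge_macs:
        return ("local", None)       # hand the frame to the higher protocol code
    out_port = fdb.get(dst_mac)
    if out_port is None:
        return ("flood", None)       # position unknown: send on all forwarding ports
    if out_port == in_port:
        return ("ignore", None)      # destination is on the side the frame came from
    return ("bridge", out_port)      # destination is on another side of the bridge


if __name__ == "__main__":
    # Hypothetical addresses, only to show why DNAT must come first.
    bridge_macs = {"00:00:00:00:00:01"}
    fdb = {"00:00:00:00:00:02": "eth0", "00:00:00:00:00:04": "eth1"}

    before = bridging_decision("00:00:00:00:00:01", "eth0", bridge_macs, fdb)
    # Suppose DNAT rewrote the destination MAC before the decision is taken:
    after = bridging_decision("00:00:00:00:00:04", "eth0", bridge_macs, fdb)
    print(before, after)
```

<P>
Because the decision depends only on the destination MAC address, a DNAT that runs after this
function has been called can no longer change the outcome.</P>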
<P class=section>
4. Chain traversal for bridged IP packets:</P>
<P>
A bridged packet never enters any network code above layer 2. So a bridged IP packet will never
enter the IP code. Therefore all <EM>iptables</EM> chains are traversed while the IP packet is in the
bridge code. The chain traversal looks like this:</P>
<PRE>
ebPREROUTING->ipPREROUTING->ebFORWARD->ipFORWARD->ebPOSTROUTING->ipPOSTROUTING</PRE>
<P>
Once again, note that there is a certain form of asymmetry here that cannot be helped.</P>
<P class=section>
5. Using a bridge port in <EM>iptables</EM> rules:</P>
<P>
The wish to be able to use physical devices belonging to a bridge (bridge ports) in <EM>iptables</EM> rules
is valid. It's necessary to prevent spoofing attacks. Say br0 has ports eth0 and eth1. If <EM>iptables</EM>
rules can only use br0, there's no way of knowing when a box on the eth0 side changes its source IP
address to that of a box on the eth1 side, except by looking at the MAC source address (and then
still...). With the current bridge/iptables patch (0.0.6 or later) you can use eth0 and eth1 in your
<EM>iptables</EM> rules and therefore catch these attempts.</P>
<P class=case>
1. <EM>iptables</EM> wants to use bridge ports:</P>
<P>
To make this possible, the <EM>iptables</EM> chains have to be traversed after the bridge code has decided where
the frame needs to be sent (eth0, eth1, both or none). This has some impact on the scheme presented
in section 2 (so, we are looking at routed traffic here). It actually looks like this:</P>
<PRE>
ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ebOUTPUT(1)->ebOUTPUT(2)->ipPOSTROUTING->ebPOSTROUTING->send packet</PRE>
<P>
Note that this is the work of the br-nf patch. If the br-nf code is not compiled into the kernel,
the chains are traversed as shown below. However, then one can only use br0, not eth0/eth1, to
filter.</P>
<PRE>ebPREROUTING->ebINPUT->ipPREROUTING->ipFORWARD->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->send packet</PRE>
<P>
Notice that ipPREROUTING is now in its natural position in the chain list and comes too late to be able to change
the bridging decision. More precisely: ipPREROUTING is now traversed while the packet is in the IP code.</P>
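<P>
To see at a glance what the br-nf patch changes for routed traffic, the two chain orders above can be
compared mechanically. The sketch below only copies the two listings from this section into Python
lists; the helper names <EM>precedes</EM> and <EM>moved_chains</EM> are hypothetical.</P>

```python
# Chain orders for routed traffic leaving through a bridge device, copied from
# the two listings in this section (eb = ebtables, ip = iptables).
WITH_BR_NF = [
    "ebPREROUTING", "ipPREROUTING", "ebINPUT", "ipFORWARD",
    "ebOUTPUT(1)", "ebOUTPUT(2)", "ipPOSTROUTING", "ebPOSTROUTING",
]
WITHOUT_BR_NF = [
    "ebPREROUTING", "ebINPUT", "ipPREROUTING", "ipFORWARD",
    "ipPOSTROUTING", "ebOUTPUT(1)", "ebOUTPUT(2)", "ebPOSTROUTING",
]


def precedes(order, first, second):
    """True if chain `first` is traversed before chain `second` in `order`."""
    return order.index(first) < order.index(second)


def moved_chains(order_a, order_b):
    """Chains that sit at a different position in the two orders."""
    return [chain for chain in order_a if order_a.index(chain) != order_b.index(chain)]


if __name__ == "__main__":
    # With br-nf, ipPREROUTING runs before the frame reaches ebINPUT.
    print(precedes(WITH_BR_NF, "ipPREROUTING", "ebINPUT"))
    # Without br-nf, it runs afterwards, inside the IP code.
    print(precedes(WITHOUT_BR_NF, "ipPREROUTING", "ebINPUT"))
    print(moved_chains(WITH_BR_NF, WITHOUT_BR_NF))
```

<P>
The same chains appear in both lists; only their order differs, which is exactly what decides whether
an <EM>iptables</EM> rule can still see the bridge port or influence the bridging decision.</P>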
<P class=case>
2. IP DNAT for locally generated packets (so in the <EM>iptables</EM> nat OUTPUT chain):</P>
<P>
The 'normal' way locally generated packets would go through the chains looks like this:</P>
<PRE>
ipOUTPUT(1)->ipOUTPUT(2)->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING</PRE>
<P>
From section 5.1 we know that this actually looks like this:</P>
<PRE>
ipOUTPUT(1)->ipOUTPUT(2)->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->ipPOSTROUTING</PRE>
<P>
Here ipOUTPUT(1) denotes the <EM>iptables</EM> nat OUTPUT chain and ipOUTPUT(2) the <EM>iptables</EM> filter OUTPUT chain. Note that
the ipOUTPUT(1) chain is traversed while the packet is in the IP code, whereas the ipOUTPUT(2) chain is traversed once
the packet has entered the bridge code. This makes it possible to do DNAT to another device in ipOUTPUT(1) and lets
one use the bridge ports in the ipOUTPUT(2) chain.</P>
<P class=section>
6. Two possible ways for frames/packets to pass through the <EM>iptables</EM> PREROUTING, FORWARD and POSTROUTING
chains:</P>
<P>
With the br-nf patch there are two ways a frame/packet can pass through the three given <EM>iptables</EM>
chains. The first way is when the frame is bridged, so the <EM>iptables</EM> chains are called by the bridge code.
The second way is when the packet is routed. So special care has to be taken to distinguish between those
two, especially in the <EM>iptables</EM> FORWARD chain. Here's an example of strange things to look out for:</P>
<P>
Consider the following situation (my personal setup):</P>
<PRE>
        +-----------------+
        |   cable modem   |
        +-------+---------+
                |
                |
            eth0|IP via DHCP from ISP
        +-------+---------+
        |bridge/router/fw |
        +--+-----------+--+
       eth1| 172.16.1.1|eth2
           |   (br0)   |
           |           |
 172.16.1.4|           |172.16.1.2
+----------+---+    +--+------------+
|test computer/|    |    desktop    |
|backup server |    +---------------+
+--------------+</PRE>
<P>
With this setup I can test the bridge+ebtables+iptables code while having access to the internet from all
three computers. The default gateway for 172.16.1.2 and 172.16.1.4 is 172.16.1.1. 172.16.1.1 is the address of the bridge
interface br0, which has ports eth1 and eth2.</P>
<P class=case>More details:</P>
<P>
The idea is that traffic between 172.16.1.4 and 172.16.1.2 is bridged, while the rest is routed, using
masquerading. Here's the "script" I use at bootup for the bridge/router:</P>
<PRE>
iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -d 172.16.1.0/24 -j ACCEPT
iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -j MASQUERADE
# load the modules used by this setup
insmod ebtables
insmod ebtable_filter
insmod ebtable_nat
insmod ebt_nat
insmod ebt_log
insmod ebt_arp
insmod ebt_ip
insmod br_db
# create bridge br0 with ports eth1 and eth2, spanning tree off
brctl addbr br0
brctl stp br0 off
brctl addif br0 eth1
brctl addif br0 eth2
# the ports get no IP address of their own; br0 gets 172.16.1.1
ifconfig eth1 0 0.0.0.0
ifconfig eth2 0 0.0.0.0
ifconfig br0 172.16.1.1 netmask 255.255.255.0 up
# enable IP forwarding so the box also routes
echo '1' > /proc/sys/net/ipv4/ip_forward</PRE>
<P>
The catch is in the first line. Because the <EM>iptables</EM> code gets executed for both bridged packets and routed
packets, we need to make a distinction between the two. We don't really want the bridged packets to be
masqueraded. If we omit the first line everything will still work, but things will happen differently.
Let's say 172.16.1.2 pings 172.16.1.4. The bridge receives the ping request and transmits it through its eth1
port after first masquerading the source IP address. So the packet's source IP address will now be 172.16.1.1 and
172.16.1.4 will respond to the bridge. Masquerading will then change the IP destination of this response from
172.16.1.1 to 172.16.1.2. Everything works fine, but it's better not to have this behaviour. Thus, we use the
first line of the script to avoid it. Note that if I wanted to filter the connections to and from the
internet, I would certainly need the first line so I don't filter the local connections as well.</P>
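<P>
The effect of the first line can be illustrated with a small first-match model of the two nat
POSTROUTING rules from the script. This is a sketch, not how <EM>iptables</EM> is implemented: the names
<EM>RULES</EM> and <EM>nat_postrouting_target</EM> are hypothetical, the outside address 192.0.2.10 is
made up, and the chain policy is assumed to be ACCEPT.</P>

```python
import ipaddress

LAN = ipaddress.ip_network("172.16.1.0/24")

# (source network, destination network or None for "any", target),
# in the same order as the two iptables lines of the script.
RULES = [
    (LAN, LAN, "ACCEPT"),        # bridged LAN-to-LAN traffic: leave untouched
    (LAN, None, "MASQUERADE"),   # everything else coming from the LAN
]


def nat_postrouting_target(src, dst, rules=RULES):
    """Return the target of the first rule matching src/dst (first match wins).

    Falls back to "ACCEPT", the assumed policy of the chain.
    """
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for src_net, dst_net, target in rules:
        if src_ip in src_net and (dst_net is None or dst_ip in dst_net):
            return target
    return "ACCEPT"


if __name__ == "__main__":
    # Ping between the two LAN boxes, with and without the first line.
    print(nat_postrouting_target("172.16.1.2", "172.16.1.4"))
    print(nat_postrouting_target("172.16.1.2", "172.16.1.4", RULES[1:]))
    # Traffic towards an outside host (hypothetical address).
    print(nat_postrouting_target("172.16.1.2", "192.0.2.10"))
```

<P>
With both rules in place only traffic leaving the 172.16.1.0/24 network is masqueraded; dropping the
first rule makes the bridged ping match the MASQUERADE rule as well, which is the behaviour described above.</P>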
<P class=section>
7. IP DNAT in the <EM>iptables</EM> PREROUTING chain on frames/packets entering on a bridge port:</P>
<P>Through some groovy play (see /net/bridge/br_netfilter.c), it is ensured that DNAT'ed packets whose output device
after DNAT'ing is the same as the input device they came in on (the logical bridge device, which we like to call br0)
are bridged, not routed. So they go through the <EM>ebtables</EM> FORWARD chain. All other DNAT'ed packets are
routed: they won't go through the <EM>ebtables</EM> FORWARD chain, will go through the <EM>ebtables</EM> INPUT chain and might go
through the <EM>ebtables</EM> OUTPUT chain.</P>
<P>
Released under the GPL.</P>
<P>
Bart De Schuymer.</P>
<P>
Last updated the 19th May 2002.</P>
</BODY></HTML>