<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">
				2	<HTML><HEAD><TITLE>How bridge/ebtables/iptables interaction works</TITLE>
				3	<META http-equiv=Content-Type content="text/html; charset=iso-8859-15">
				4	<STYLE type=text/css>H1 {
				5	FONT: bold 25pt Times, serif; TEXT-ALIGN: center; TEXT-DECORATION: underline
				6	}
				7	P {
				8	FONT: 20pt Times, serif
				9	}
				10	LI {
				11	MARGIN-BOTTOM: 2em; FONT: 22pt 'Times New Roman', serif
				12	}
				13	PRE {
				14	FONT: 18pt Courier, monospace
				15	}
				16	.statement {
				17	TEXT-DECORATION: underline
				18	}
				19	.section {
				20	FONT: bold 22pt Times
				21	}
				22	.case {
				23	FONT-STYLE: italic
				24	}
				25	</STYLE>
				26
				27	<META content="MSHTML 6.00.2505.0" name=GENERATOR></HEAD>
				28	<BODY>
				29	<H1>How bridge/ebtables/iptables interaction works</H1>
				30
				31	<P class=section>1. How frames traverse the <EM>ebtables</EM> chains:</P>
				32	<P>This section only considers <EM>ebtables</EM>, _not_ <EM>iptables</EM>.</P>
				33	<PRE>
     Route
       ^
       |
I  +--------+ Bridge  +----------+                     +-------+     +-----------+   O
N->|BROUTING|-------->|PREROUTING|----->[BRIDGING]---->|FORWARD|---->|POSTROUTING|-->U
   +--------+         +----------+      [DECISION]     +-------+     +-----------+   T
                                            |                              ^
                                            v                              |
                                         +-----+                      +----------+
                                         |INPUT|                      |OUTPUT (2)|
                                         +-----+                      +----------+
                                            |                              ^
                                            |                              |
                                            |                         +----------+
                                            |                         |OUTPUT (1)|
                                            |                         +----------+
                                            |                              ^
                                            +------->Local Process---------+
				52	</PRE>
				53	<P>
The first thing to keep in mind is that we are talking about the ethernet layer here,
i.e. OSI layer 2. A frame destined for the local computer according to the bridge
(which works on the ethernet layer) isn't necessarily destined for the local computer
according to the ip layer. That's how routing works: the MAC destination is the router, while the ip
destination is the actual box you want to communicate with.</P>
				59	<P>
				60	<EM>Ebtables</EM> currently has three tables: filter, nat and broute. The filter table has a
				61	FORWARD, INPUT and OUTPUT chain. The nat table has a PREROUTING, OUTPUT and POSTROUTING chain.
				62	The broute table has the BROUTING chain. In the figure the filter OUTPUT chain has (2)
				63	appended and the nat OUTPUT chain has (1) appended. So these two OUTPUT chains are not
				64	the same (and have a different intended use).</P>
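<P>As a rough illustration of the three tables described above, the following shell sketch lists
their chains. The run helper and the APPLY variable are hypothetical conveniences, not part of
<EM>ebtables</EM>: by default the commands are only printed, so the sketch can be tried without root
privileges.</P>

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# List the chains of each of the three ebtables tables named in the text.
list_ebtables_tables() {
    for table in filter nat broute; do
        run ebtables -t "$table" -L
    done
}

list_ebtables_tables
```

<P>Running the sketch with APPLY=1 as root would execute the listing commands for real, assuming
the ebtables modules are loaded.</P>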
				65	<P>
When a nic enslaved to a bridge receives a frame, the frame first goes through the BROUTING
chain. In this special chain one can choose whether to route or bridge frames. The default
is bridging, and we will assume the decision in this chain is 'bridge'. Next, the frame
passes through the PREROUTING chain, which is intended for altering the destination MAC
address of frames (DNAT). If the frame passes this chain, the bridging code decides where the
frame should be sent. The bridge does this by looking at the destination MAC address; it
doesn't care about the OSI layer 3 addresses (e.g. the ip address). Note that frames coming in
on non-forwarding ports of a bridge will not be seen by <EM>ebtables</EM>, not even by the BROUTING
chain.</P>
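<P>A minimal sketch of rules for these two chains might look as follows. The port name and the MAC
addresses are placeholders, and the run helper (dry-run unless APPLY=1) is a hypothetical
convenience. The sketch relies on the convention from the <EM>ebtables</EM> documentation that, in the
broute table, the DROP target means "route this frame" and ACCEPT means "bridge it".</P>

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Route (instead of bridge) IPv4 frames arriving on bridge port $1.
# In the broute table, DROP means "hand the frame to the routing code".
broute_ipv4() {
    run ebtables -t broute -A BROUTING -p IPv4 -i "$1" -j DROP
}

# MAC DNAT: rewrite destination MAC $1 to $2 before the bridging decision.
mac_dnat() {
    run ebtables -t nat -A PREROUTING -d "$1" -j dnat --to-destination "$2"
}

# Example with placeholder values.
broute_ipv4 eth0
mac_dnat 00:11:22:33:44:55 00:aa:bb:cc:dd:ee
```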
				76	<P>
If the bridge decides the frame is for the bridging computer, the frame will go through the
INPUT chain. In this chain you can filter frames destined for the bridge box. After passing
the INPUT chain, the frame will be given to the code on layer 3 (i.e. it will be passed up),
e.g. to the ip code. So, a routed ip packet will go through the <EM>ebtables</EM> INPUT chain, not
through the <EM>ebtables</EM> FORWARD chain. This is logical: on the ethernet layer, the frame carrying a
routed packet is addressed to the bridge box itself.</P>
				82	<P>
Otherwise the frame may have to be sent onto another side of the bridge. If so, the
frame will go through the FORWARD chain and the POSTROUTING chain. In the FORWARD chain one
can filter frames that will be bridged, while the POSTROUTING chain is intended for
changing the MAC source address (SNAT).</P>
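<P>The two uses just described could be sketched like this. The MAC addresses and the port name are
placeholders, and the run helper (dry-run unless APPLY=1) is a hypothetical convenience rather
than anything provided by <EM>ebtables</EM>.</P>

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Filter: drop bridged frames coming from source MAC $1.
drop_forwarded_from() {
    run ebtables -t filter -A FORWARD -s "$1" -j DROP
}

# MAC SNAT: frames leaving through port $1 with source MAC $2 get source MAC $3.
mac_snat() {
    run ebtables -t nat -A POSTROUTING -o "$1" -s "$2" -j snat --to-source "$3"
}

# Example with placeholder values.
drop_forwarded_from 00:11:22:33:44:55
mac_snat eth1 00:11:22:33:44:66 00:aa:bb:cc:dd:ee
```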
				87	<P>
Frames that originate from the bridge box itself will go, after the bridging decision, through the
nat OUTPUT chain, through the filter OUTPUT chain and the POSTROUTING chain. The
nat OUTPUT chain allows you to alter the destination MAC address and the filter OUTPUT chain
allows you to filter frames originating from the bridge box. Note that the nat OUTPUT chain is
traversed after the bridging decision, which is actually too late for a changed destination MAC
address to influence that decision. This should be changed. The POSTROUTING
chain is the same one as described above. Note that it is also possible for routed frames to go
through these chains; this happens when the destination device is a logical bridge device.</P>
				95	<P class=section>
				96	2. A machine used as a bridge and a router (not a brouter):</P>
				97	<P>
				98	It's possible to see a single ip packet pass the PREROUTING, INPUT, nat OUTPUT, filter OUTPUT
				99	and POSTROUTING <EM>ebtables</EM> chains.</P>
				100	<P>
This can happen when the bridge is also used as a router. The ethernet frame(s) containing that
ip packet will have the bridge's MAC address as destination, while the destination ip address is not
that of the bridge. Including the <EM>iptables</EM> chains, this is how the ip packet runs through the
bridge/router (eb=ebtables, ip=iptables):</P>
				105	<PRE>ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->send packet</PRE>
				106	<P>
				107	This assumes that the routing decision sends the packet to a bridge interface. If the routing
				108	decision sends the packet to a physical network card, this is what happens:</P>
				109	<PRE>ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ipPOSTROUTING->send packet</PRE>
				110	<P>
What is obviously "asymmetric" here is that the <EM>iptables</EM> PREROUTING chain is traversed before
the <EM>ebtables</EM> INPUT chain; however, this cannot be helped. See the next section.</P>
				113	<P class=section>
				114	3. DNATing bridged packets:</P>
				115	<P>
Take an ip packet received by the bridge; it enters the bridge code. Let's assume we want to do
some ip DNAT on it. Changing the destination address of the packet (ip address and MAC address)
has to happen before the bridge code decides what to do with the packet. The bridge code can decide
to bridge it (if the destination MAC address is on another side of the bridge), flood it over all
the forwarding bridge ports (if the position of the box with the destination MAC is unknown to the bridge),
give it to the higher protocol code (here, the ip code) if the destination MAC address is that of the
bridge, or ignore it (if the destination MAC address is located on the same side of the bridge).</P>
				123	<P>
So, this ip DNAT has to happen very early in the bridge code, namely before the bridge code
actually does anything. This is the same place where the <EM>ebtables</EM> PREROUTING chain is
traversed (for the same reason).</P>
				127	<P class=section>
				128	4. Chain traversal for bridged ip packets:</P>
				129	<P>
				130	A bridged packet never enters any network code above layer 2. So a bridged ip packet will never
				131	enter the ip code. Therefore all <EM>iptables</EM> chains will be traversed while the ip packet is in the
				132	bridge code. The chain traversal will look like this:</P>
				133	<PRE>
				134	ebPREROUTING->ipPREROUTING->ebFORWARD->ipFORWARD->ebPOSTROUTING->ipPOSTROUTING</PRE>
				135	<P>
				136	Once again note that there is a certain form of asymmetry here that cannot be helped.</P>
				137	<P class=section>
				138	5. Using a bridge port in <EM>iptables</EM> rules:</P>
				139	<P>
The wish to be able to use physical devices belonging to a bridge (bridge ports) in <EM>iptables</EM> rules
is valid: it is necessary to prevent spoofing attacks. Say br0 has ports eth0 and eth1. If <EM>iptables</EM>
rules can only use br0, there's no way of knowing when a box on the eth0 side changes its source ip
address to that of a box on the eth1 side, except by looking at the MAC source address (and even then
it isn't certain). With the current bridge/iptables patch (0.0.6 or later) you can use eth0 and eth1 in your
<EM>iptables</EM> rules and therefore catch these attempts.</P>
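<P>A sketch of such an anti-spoofing rule is shown here. The address 10.0.0.2 stands for a
hypothetical box on the eth1 side, the choice of the FORWARD chain is an assumption, and the run
helper (dry-run unless APPLY=1) is a hypothetical convenience. The default form uses the bridge
port directly with -i, as the patch described here allows. Later kernels expose bridge ports through
the physdev match instead, which the sketch emits when USE_PHYSDEV=1 is set.</P>

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Drop packets entering through bridge port $1 that claim source ip $2,
# where $2 belongs to a box known to live behind a different port.
antispoof_rule() {
    if [ "${USE_PHYSDEV:-0}" = "1" ]; then
        # Form for kernels where bridge ports are matched with physdev.
        run iptables -A FORWARD -m physdev --physdev-in "$1" -s "$2" -j DROP
    else
        # Form for the bridge/iptables patch described in the text.
        run iptables -A FORWARD -i "$1" -s "$2" -j DROP
    fi
}

# Example: 10.0.0.2 (placeholder) sits on the eth1 side, so it must not show up on eth0.
antispoof_rule eth0 10.0.0.2
```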
				146	<P class=case>
1. <EM>iptables</EM> wants to use bridge ports:</P>
				148	<P>
				149	To make this possible the <EM>iptables</EM> chains have to be traversed after the bridge code decided where
				150	the frame needs to be sent (eth0, eth1, both or none). This has some impact on the scheme presented
				151	in section 2 (so, we are looking at routed traffic here). It actually looks like this:</P>
				152	<PRE>
				153	ebPREROUTING->ipPREROUTING->ebINPUT->ipFORWARD->ebOUTPUT(1)->ebOUTPUT(2)->ipPOSTROUTING->ebPOSTROUTING->send packet</PRE>
				154	<P>
				155	Note that this is the work of the br-nf patch. If one does not compile the br-nf code into the kernel,
				156	the chains will be traversed as shown below. However, then one can only use br0, not eth0/eth1 to
				157	filter.</P>
				158	<PRE>ebPREROUTING->ebINPUT->ipPREROUTING->ipFORWARD->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->send packet</PRE>
				159	<P>
Notice that ipPREROUTING is now in the natural position in the chain list, which is too late to be able to change
the bridging decision. More precisely: ipPREROUTING is now traversed while the packet is in the ip code.</P>
				162	<P class=case>
				163	2. IP DNAT for locally generated packets (so in the <EM>iptables</EM> nat OUTPUT chain):</P>
				164	<P>
				165	The 'normal' way locally generated packets would go through the chains looks like this:</P>
				166	<PRE>
				167	ipOUTPUT(1)->ipOUTPUT(2)->ipPOSTROUTING->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING</PRE>
				168	<P>
From section 5.1 we know that this actually looks like this:</P>
				170	<PRE>
				171	ipOUTPUT(1)->ipOUTPUT(2)->ebOUTPUT(1)->ebOUTPUT(2)->ebPOSTROUTING->ipPOSTROUTING</PRE>
				172	<P>
				173	Here we denote by ipOUTPUT(1) (resp. ipOUTPUT(2)) the <EM>iptables</EM> nat (resp. filter) OUTPUT chain. Note that
				174	the ipOUTPUT(1) chain is traversed while the packet is in the ip code, while the ipOUTPUT(2) chain is traversed when
				175	the packet has entered the bridge code. This makes it possible to do DNAT to another device in ipOUTPUT(1) and lets
				176	one use the bridge ports in the ipOUTPUT(2) chain.</P>
				177	<P class=section>
6. Two possible ways for frames/packets to pass through the <EM>iptables</EM> PREROUTING, FORWARD and POSTROUTING
chains:</P>
				180	<P>
				181	With the br-nf patch there are 2 ways a frame/packet can pass through the 3 given <EM>iptables</EM>
				182	chains. The first way is when the frame is bridged, so the <EM>iptables</EM> chains are called by the bridge code.
				183	The second way is when the packet is routed. So special care has to be taken to distinguish between those
				184	two, especially in the <EM>iptables</EM> FORWARD chain. Here's an example of strange things to look out for:</P>
				185	<P>
				186	Consider the following situation (my personal setup)</P>
				187	<PRE>
          +-----------------+
          |   cable modem   |
          +-------+---------+
                  |
                  |
              eth0|IP via DHCP from ISP
          +-------+---------+
          |bridge/router/fw |
          +--+-----------+--+
         eth1| 172.16.1.1|eth2
             |   (br0)   |
             |           |
   172.16.1.4|           |172.16.1.2
  +----------+---+    +--+------------+
  |test computer/|    |    desktop    |
  |backup server |    +---------------+
  +--------------+</PRE>
				205	<P>
				206	With this setup I can test the bridge+ebtables+iptables code while having access to the internet from all
				207	three computers. The default gateway for 172.16.1.2 and 172.16.1.4 is 172.16.1.1. 172.16.1.1 is the bridge
				208	interface br0 with ports eth1 and eth2.</P>
				209	<P class=case>More details:</P>
				210	<P>
The idea is that traffic between 172.16.1.4 and 172.16.1.2 is bridged, while the rest is routed, using
masquerading. Here's the "script" I use at bootup for the bridge/router:</P>
				213	<PRE>
				214	iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -d 172.16.1.0/24 -j ACCEPT
				215	iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -j MASQUERADE
				216	insmod ebtables
				217	insmod ebtable_filter
				218	insmod ebtable_nat
				219	insmod ebt_nat
				220	insmod ebt_log
				221	insmod ebt_arp
				222	insmod ebt_ip
				223	insmod br_db
				224	brctl addbr br0
				225	brctl stp br0 off
				226	brctl addif br0 eth1
				227	brctl addif br0 eth2
				228	ifconfig eth1 0 0.0.0.0
				229	ifconfig eth2 0 0.0.0.0
				230	ifconfig br0 172.16.1.1 netmask 255.255.255.0 up
				231	echo '1' > /proc/sys/net/ipv4/ip_forward</PRE>
				232	<P>
The catch is in the first line. Because the <EM>iptables</EM> code gets executed for both bridged packets and routed
packets, we need to make a distinction between the two. We don't really want the bridged packets to be
masqueraded. If we omit the first line everything will still work, but things will happen differently.
Let's say 172.16.1.2 pings 172.16.1.4. The bridge receives the ping request and will transmit it through its eth1
port after first masquerading the source ip address. So the packet's source ip address will now be 172.16.1.1 and
172.16.1.4 will respond to the bridge. Masquerading will change the ip destination of this response from
172.16.1.1 back to 172.16.1.2. Everything works fine, but it's better not to have this behaviour. Thus, we use the
first line of the script to avoid it. Note that if I wanted to filter the connections to and from the
internet, I would certainly need the first line so I don't filter the local connections as well.</P>
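<P>An alternative sketch, which is not the script used above, is to restrict masquerading to packets
that leave through the outside interface. This assumes that bridged packets never have eth0 as their
output device in the <EM>iptables</EM> POSTROUTING chain, which should hold for the setup drawn above but has not
been verified here. The run helper (dry-run unless APPLY=1) is a hypothetical convenience.</P>

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Masquerade only packets from subnet $1 that leave through interface $2.
# Packets bridged between the inside ports do not match -o $2 and stay untouched.
masquerade_routed_only() {
    run iptables -t nat -A POSTROUTING -s "$1" -o "$2" -j MASQUERADE
}

# Example using the addresses and interface names from the setup above.
masquerade_routed_only 172.16.1.0/24 eth0
```

<P>The script's own approach, an explicit ACCEPT for inside-to-inside traffic placed before the
MASQUERADE rule, does not depend on that assumption.</P>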
				242	<P class=section>
7. ip DNAT in the <EM>iptables</EM> PREROUTING chain on frames/packets entering on a bridge port:</P>
<P>The code in /net/bridge/br_netfilter.c makes sure that DNAT'ed packets which, after DNAT'ing,
have the same output device as the input device they came in on (the logical bridge device, which we like to call br0)
are bridged, not routed. So they will go through the <EM>ebtables</EM> FORWARD chain. All other DNAT'ed packets are
routed: they won't go through the <EM>ebtables</EM> FORWARD chain, will go through the <EM>ebtables</EM> INPUT chain and might go
through the <EM>ebtables</EM> OUTPUT chain.</P>
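<P>To make the first case concrete, here is a sketch of a DNAT rule that would redirect a connection
between two boxes on the inside network. The addresses come from the setup in the previous section and
port 80 is an arbitrary example. The run helper (dry-run unless APPLY=1) is a hypothetical
convenience. A packet from 172.16.1.2 to 172.16.1.1 port 80 would enter on a port of br0, be rewritten to
172.16.1.4, and then have br0 as its output device too, so it should be bridged.</P>

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# ip DNAT in the iptables nat PREROUTING chain:
# tcp packets for $1 port $2 are redirected to $3, same port.
dnat_to_host() {
    run iptables -t nat -A PREROUTING -p tcp -d "$1" --dport "$2" \
        -j DNAT --to-destination "$3:$2"
}

# Example: redirect web traffic aimed at the bridge box to the test computer.
dnat_to_host 172.16.1.1 80 172.16.1.4
```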
				249	<P>
				250	Released under the GPL.</P>
				251	<P>
				252	Bart De Schuymer.</P>
				253	<P>
				254	Last updated the 19th May 2002.</P>
				255	</BODY></HTML>