\documentstyle[12pt,twoside]{article}
\def\TITLE{Tunnels over IP}
\input preamble
\begin{center}
\Large\bf Tunnels over IP in Linux-2.2
\end{center}


\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm March 17, 1999
\end{center}

\vspace{5mm}

\tableofcontents

\section{Instead of introduction: micro-FAQ.}

\begin{itemize}

\item
Q: In linux-2.0.36 I used:
\begin{verbatim}
ifconfig tunl1 10.0.0.1 pointopoint 193.233.7.65
\end{verbatim}
to create a tunnel. It does not work in 2.2.0!

A: You are right, it does not work. The command above is now split
into two commands. First,
\begin{verbatim}
ip tunnel add MY-TUNNEL mode ipip remote 193.233.7.65
\end{verbatim}
creates a tunnel device named \verb|MY-TUNNEL|. Then you may configure
it with:
\begin{verbatim}
ifconfig MY-TUNNEL 10.0.0.1
\end{verbatim}
Of course, if you prefer the name \verb|tunl1| to \verb|MY-TUNNEL|,
you may still use it.

\item
Q: In linux-2.0.36 I used:
\begin{verbatim}
ifconfig tunl0 10.0.0.1
route add -net 10.0.0.0 gw 193.233.7.65 dev tunl0
\end{verbatim}
to tunnel the network 10.0.0.0 via router 193.233.7.65. It does not
work in 2.2.0! Moreover, \verb|route| prints a strange error, something like
``network unreachable'', and afterwards I find an odd direct route
to 10.0.0.0 via \verb|tunl0| in the routing table.

A: Yes. In 2.2 the rule that a {\em normal} gateway must reside on a directly
connected network has no exceptions. You may tell the kernel that
this particular route is {\em abnormal}:
\begin{verbatim}
ifconfig tunl0 10.0.0.1 netmask 255.255.255.255
ip route add 10.0.0.0/8 via 193.233.7.65 dev tunl0 onlink
\end{verbatim}
Note the keyword \verb|onlink|. It is the magic key that tells the kernel
not to check whether the gateway address is consistent with the device.
After this explanation you have probably already guessed another way
to cheat the kernel:
\begin{verbatim}
ifconfig tunl0 10.0.0.1 netmask 255.255.255.255
route add -host 193.233.7.65 dev tunl0
route add -net 10.0.0.0 netmask 255.0.0.0 gw 193.233.7.65
route del -host 193.233.7.65 dev tunl0
\end{verbatim}
Well, if you like such tricks, nobody can forbid you to use them.
Just do not forget
that the host 193.233.7.65 is unreachable between \verb|route add| and
\verb|route del|.

\item
Q: In 2.0.36 I used to load the \verb|tunnel| device module and the \verb|ipip| module.
I cannot find any \verb|tunnel| module in 2.2!

A: Linux-2.2 has a single module, \verb|ipip|, for both directions of tunneling
and for all IPIP tunnel devices.

\item
Q: \verb|traceroute| does not work over the tunnel! Well, wait... It works,
but it skips some number of hops.

A: Yes. By default the tunnel driver copies the \verb|ttl| value from
the inner packet to the outer one. This means that the path traversed by tunneled
packets to the other endpoint is not hidden. If you dislike this, or if you
are going to use a routing protocol that expects packets
with ttl 1 to reach the peering host (e.g.\ RIP, OSPF or EBGP),
and you are not afraid of
tunnel loops, you may append the option \verb|ttl 64| when creating the tunnel
with \verb|ip tunnel add|.

\item
Q: ... Well, that is the end of the list of things 2.0 was able to do.

\end{itemize}
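
The first two answers above amount to a mechanical translation. A minimal
POSIX shell sketch of it follows. It only {\em prints} the 2.2-style commands
and changes nothing on the host. The helper names \verb|new_tunnel_cmds| and
\verb|new_route_cmd| are hypothetical, and \verb|ipip| mode is assumed, as in
the examples.
\begin{verbatim}
#!/bin/sh
# Sketch only: print (do not run) 2.2-style equivalents of two
# 2.0-era idioms. Helper names are hypothetical.

# 2.0: ifconfig NAME LOCAL pointopoint REMOTE
new_tunnel_cmds() {
    name=$1 local_addr=$2 remote=$3
    echo "ip tunnel add $name mode ipip remote $remote"
    echo "ifconfig $name $local_addr"
}

# 2.0: route add -net NET gw GW dev DEV, with GW not on a connected net
new_route_cmd() {
    prefix=$1 gw=$2 dev=$3
    echo "ip route add $prefix via $gw dev $dev onlink"
}

new_tunnel_cmds tunl1 10.0.0.1 193.233.7.65
new_route_cmd 10.0.0.0/8 193.233.7.65 tunl0
\end{verbatim}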

\paragraph{Summary of differences between 2.2 and 2.0.}

\begin{itemize}

\item {\bf In 2.0} you could compile the tunnel device into the kernel
and get a set of 4 devices, \verb|tunl0| ... \verb|tunl3|, or,
alternatively, compile it as a module and load another instance of the module
for each new tunnel. Also, the \verb|ipip| module was necessary
to receive tunneled packets.

{\bf 2.2} has {\em one\/} module, \verb|ipip|. Loading it gives you the base
tunnel device \verb|tunl0|, and further tunnels may be created with the command
\verb|ip tunnel add|. These new devices may have arbitrary names.


\item {\bf In 2.0} you set the remote tunnel endpoint address with
the command \verb|ifconfig| ... \verb|pointopoint A|.

{\bf In 2.2} this command has the same semantics on all
interfaces. It does not set the tunnel endpoint;
it sets the address of the peering host, which is directly reachable
via this tunnel
rather than via the Internet. The actual tunnel endpoint address \verb|A|
should be set with \verb|ip tunnel add ... remote A|.

\item {\bf In 2.0} you created tunnel routes with the command:
\begin{verbatim}
route add -net 10.0.0.0 gw A dev tunl0
\end{verbatim}

{\bf 2.2} interprets this command identically for all device
kinds: the gateway must be directly reachable via this tunnel
rather than via the Internet. You may still use \verb|ip route add ... onlink|
to override this behaviour.

\end{itemize}


\section{Tunnel setup: basics}

The standard Linux-2.2 kernel supports three flavors of tunnels,
listed in the following table:
\vspace{2mm}

\begin{tabular}{lll}
\vrule depth 0.8ex width 0pt\relax
Mode & Description & Base device \\
ipip & IP over IP & tunl0 \\
sit & IPv6 over IP & sit0 \\
gre & ANY over GRE over IP & gre0
\end{tabular}

\vspace{2mm}

\noindent All kinds of tunnels are created with one command:
\begin{verbatim}
ip tunnel add <NAME> mode <MODE> [ local <S> ] [ remote <D> ]
\end{verbatim}

This command creates a new tunnel device named \verb|<NAME>|.
\verb|<NAME>| is an arbitrary string; in particular,
it may even be \verb|eth0|. The remaining parameters set
various tunnel characteristics.

\begin{itemize}

\item
\verb|mode <MODE>| sets the tunnel mode. Three modes are currently available:
\verb|ipip|, \verb|sit| and \verb|gre|.

\item
\verb|remote <D>| sets the remote endpoint of the tunnel to the IP
address \verb|<D>|.
\item
\verb|local <S>| sets a fixed local address for tunneled
packets. It must be an address on another interface of this host.

\end{itemize}
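
The optional parts of this syntax can be illustrated with a small shell helper
that assembles an \verb|ip tunnel add| command line and prints it without
executing it. The name \verb|tunnel_add_cmd| is hypothetical. An empty argument
stands for an omitted \verb|local| or \verb|remote|.
\begin{verbatim}
#!/bin/sh
# Sketch: build (print, not run) an "ip tunnel add" command line.
# An empty LOCAL or REMOTE argument means "omitted" (wildcard), so the
# keyword is left out. The helper name is hypothetical.
tunnel_add_cmd() {
    name=$1 mode=$2 src=$3 dst=$4
    case $mode in
        ipip|sit|gre) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    cmd="ip tunnel add $name mode $mode"
    [ -n "$src" ] && cmd="$cmd local $src"
    [ -n "$dst" ] && cmd="$cmd remote $dst"
    echo "$cmd"
}

tunnel_add_cmd Cisco gre 10.10.14.1 10.10.13.2
tunnel_add_cmd nbma0 ipip "" ""
\end{verbatim}
The second call omits both endpoints, so only the name and the mode remain.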

\let\thefootnote\oldthefootnote

Both \verb|remote| and \verb|local| may be omitted. In this case we
say that they are zero, or wildcard. Two tunnels of the same mode cannot
have the same \verb|remote| and \verb|local|. In particular, this means
that the base device, or fallback tunnel, cannot be replicated.\footnote{
This restriction is relaxed for keyed GRE tunnels.}
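
The uniqueness rule can be modelled as a simple lookup over the existing
tunnels. In this sketch the input format and the name \verb|tunnel_clash| are
assumptions made for illustration: one \verb|NAME MODE LOCAL REMOTE| line per
tunnel, with \verb|0.0.0.0| for a wildcard. The keyed GRE exception from the
footnote is ignored.
\begin{verbatim}
#!/bin/sh
# Sketch: report an existing tunnel that already uses the same
# (mode, local, remote) triple. Tunnels are read from stdin as
# "NAME MODE LOCAL REMOTE" lines, 0.0.0.0 meaning wildcard.
# Models the rule only; it does not query the kernel.
tunnel_clash() {
    new_mode=$1 new_local=$2 new_remote=$3
    while read -r name mode lcl rmt; do
        if [ "$mode" = "$new_mode" ] && [ "$lcl" = "$new_local" ] &&
           [ "$rmt" = "$new_remote" ]; then
            echo "$name"     # name of the conflicting tunnel
            return 0
        fi
    done
    return 1                 # no conflict found
}

printf 'tunl0 ipip 0.0.0.0 0.0.0.0\n' | tunnel_clash ipip 0.0.0.0 0.0.0.0
\end{verbatim}
The example prints \verb|tunl0|: a second wildcard \verb|ipip| tunnel would
duplicate the base device.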

Tunnels are divided into two classes: {\bf pointopoint} tunnels, which
have a non-wildcard \verb|remote| address and deliver all packets
to this destination, and {\bf NBMA} (i.e.\ Non-Broadcast Multi-Access) tunnels,
which have no \verb|remote|. In particular, base devices (e.g.\ \verb|tunl0|)
are NBMA, because they have neither \verb|remote| nor
\verb|local| addresses.


After a tunnel device is created, you should configure it as you would
any other device. Of course, tunnel configuration has
some peculiarities, because tunnels work over the existing Internet
routing infrastructure and at the same time create new virtual links,
which change this infrastructure. The danger that a careless
tunnel setup will produce tunnel loops,
collapse routing, or flood the network with an exponentially
growing number of tunneled fragments is very real.


Protocol setup on pointopoint tunnels does not differ from the configuration
of other devices. You should set a protocol address with \verb|ifconfig|
and add routes with the \verb|route| utility.

NBMA tunnels are different. To route something via an NBMA tunnel
you have to tell the driver where it should deliver packets.
The only way to do this is to create special routes with a gateway
address pointing to the desired endpoint, e.g.\
\begin{verbatim}
ip route add 10.0.0.0/24 via <A> dev tunl0 onlink
\end{verbatim}
It is important to use the option \verb|onlink|; otherwise the
kernel will refuse to create a route via a gateway that is not directly
reachable over the device \verb|tunl0|. With IPv6 the situation is much simpler:
when you start the device \verb|sit0|, it automatically configures itself
with all IPv4 addresses mapped into the IPv6 space, so that the whole IPv4
Internet is {\em really reachable} via \verb|sit0|! Excellent: the command
\begin{verbatim}
ip route add 3FFE::/16 via ::193.233.7.65 dev sit0
\end{verbatim}
will route \verb|3FFE::/16| via \verb|sit0|, sending all packets
destined to this prefix to 193.233.7.65.
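
The gateway \verb|::193.233.7.65| is written in the IPv4-compatible form. The
32-bit IPv4 address fills the low-order bits of an otherwise zero IPv6 address.
The hypothetical helper below prints the same address in plain hexadecimal
notation, which shows where the IPv4 bits end up. It assumes a well-formed
dotted quad.
\begin{verbatim}
#!/bin/sh
# Sketch: print the IPv4-compatible IPv6 address ::a.b.c.d in hex form.
# Assumes a valid dotted quad; no other validation is done.
v4compat_hex() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split a.b.c.d into $1..$4
    IFS=$old_ifs
    # Two 16-bit groups: (a<<8|b) and (c<<8|d).
    printf '::%x:%x\n' $(( ($1 << 8) | $2 )) $(( ($3 << 8) | $4 ))
}

v4compat_hex 193.233.7.65
\end{verbatim}
For 193.233.7.65 this prints \verb|::c1e9:741|, which is the same address as
\verb|::193.233.7.65|.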

\section{Tunnel setup: options}

The command \verb|ip tunnel add| has several additional options.
\begin{itemize}

\item \verb|ttl N| --- set a fixed TTL \verb|N| on tunneled packets.
\verb|N| is a number in the range 1--255. 0 is a special value
meaning that packets inherit the TTL value of the inner packet.
The default value is \verb|inherit|.

\item \verb|tos T| --- set a fixed TOS \verb|T| on tunneled packets.
The default value is \verb|inherit|.

\item \verb|dev DEV| --- bind the tunnel to the device \verb|DEV|, so that
tunneled packets are routed only via this device and cannot
escape to another device when the route to the endpoint changes.

\item \verb|nopmtudisc| --- disable Path MTU Discovery on this tunnel.
It is enabled by default. Note that a fixed ttl is incompatible
with this option: tunnels with a fixed ttl always perform PMTU discovery.

\end{itemize}
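
The \verb|ttl| rules above fit in a few lines. The sketch below models them
only; it is not kernel code. \verb|outer_ttl| and \verb|check_pmtu_opts| are
hypothetical names. A tunnel TTL of 0 stands for \verb|inherit|, and PMTU
discovery is passed as 1 (the default) or 0 (\verb|nopmtudisc|).
\begin{verbatim}
#!/bin/sh
# Sketch of the ttl/nopmtudisc rules described above (a model only).

# Outer TTL of a tunneled packet; tunnel TTL 0 means "inherit".
outer_ttl() {
    tunnel_ttl=$1 inner_ttl=$2
    if [ "$tunnel_ttl" -eq 0 ]; then
        echo "$inner_ttl"      # copied from the encapsulated packet
    else
        echo "$tunnel_ttl"     # fixed value from "ttl N"
    fi
}

# Reject "ttl N" (N != 0) together with "nopmtudisc".
# pmtudisc: 1 = enabled (default), 0 = nopmtudisc given.
check_pmtu_opts() {
    tunnel_ttl=$1 pmtudisc=$2
    if [ "$tunnel_ttl" -ne 0 ] && [ "$pmtudisc" -eq 0 ]; then
        echo "fixed ttl requires path MTU discovery" >&2
        return 1
    fi
    return 0
}
\end{verbatim}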

\verb|ipip| and \verb|sit| tunnels have no further options. \verb|gre|
tunnels are more complicated:

\begin{itemize}

\item \verb|key K| --- use keyed GRE with key \verb|K|. \verb|K| is
either a number or an IP-address-like dotted quad.

\item \verb|csum| --- checksum tunneled packets.

\item \verb|seq| --- serialize packets.
\begin{NB}
I think this option does not
work. At least, I did not test it, did not debug it and
do not even understand how it is supposed to work or for what
purpose Cisco planned to use it.
\end{NB}

\end{itemize}
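
A dotted quad is just another spelling of a 32-bit number, so \verb|key 1.2.3.4|
and \verb|key 16909060| should denote the same key. This assumes the number is
read most significant byte first, in network order. A sketch of the conversion
in both directions follows; the helper names are hypothetical.
\begin{verbatim}
#!/bin/sh
# Sketch: convert a GRE key between dotted-quad and decimal forms.
# Assumption: the dotted quad is read most significant byte first.
key_to_number() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split a.b.c.d into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

key_to_quad() {
    n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

key_to_number 1.2.3.4
key_to_quad 16909060
\end{verbatim}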


Actually, these GRE options can be set separately for the input and
output directions by prefixing the corresponding keyword with the letter
\verb|i| or \verb|o|. E.g.\ \verb|icsum| tells the kernel to accept only
packets with a correct checksum, and \verb|ocsum| means that
our host will calculate and send checksums.
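
Put differently, an unprefixed keyword applies to both directions at once. The
hypothetical helper below only prints the per-direction spelling of an option.
The \verb|ikey|/\verb|okey| spelling for options that take an argument follows
the same prefix rule.
\begin{verbatim}
#!/bin/sh
# Sketch: expand an unprefixed GRE option into its per-direction forms.
# "key K" keeps its argument for both directions.
expand_gre_opt() {
    case $1 in
        key)      echo "ikey $2 okey $2" ;;
        csum|seq) echo "i$1 o$1" ;;
        *)        echo "$*" ;;   # already direction-specific or unknown
    esac
}

expand_gre_opt csum
expand_gre_opt key 1.2.3.4
\end{verbatim}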

The command \verb|ip tunnel add| is not the only operation
that can be performed on tunnels. Naturally, you can get a short help page
with:
\begin{verbatim}
ip tunnel help
\end{verbatim}

Besides that, you can view the list of installed tunnels with the command:
\begin{verbatim}
ip tunnel ls
\end{verbatim}
You can also look at statistics:
\begin{verbatim}
ip -s tunnel ls Cisco
\end{verbatim}
where \verb|Cisco| is the name of a tunnel device. The command
\begin{verbatim}
ip tunnel del Cisco
\end{verbatim}
destroys the tunnel \verb|Cisco|. And, finally,
\begin{verbatim}
ip tunnel change Cisco mode sit local ME remote HE ttl 32
\end{verbatim}
changes its parameters.

\section{Differences between 2.2 and 2.0 tunnels revisited.}

Now we can discuss more subtle differences between tunneling in 2.0
and 2.2.

\begin{itemize}

\item In 2.0 all tunneled packets were received promiscuously
as soon as you loaded the \verb|ipip| module. 2.2 tries to select the best
matching tunnel device, and the packet appears to be received on it. E.g.\ if the host
receives an \verb|ipip| packet from host \verb|D| destined to our
local address \verb|S|, the kernel searches for matching tunnels
in the following order:

\begin{tabular}{ll}
1 & \verb|remote| is \verb|D| and \verb|local| is \verb|S| \\
2 & \verb|remote| is \verb|D| and \verb|local| is wildcard \\
3 & \verb|remote| is wildcard and \verb|local| is \verb|S| \\
4 & \verb|tunl0|
\end{tabular}

If a tunnel exists but is not in the \verb|UP| state, it is ignored.
Note that if \verb|tunl0| is \verb|UP|, it receives all IPIP packets
not claimed by more specific tunnels.
Be careful: this means that, without carefully installed firewall rules,
anyone on the Internet may inject into your network arbitrary packets with
source addresses indistinguishable from local ones. It is a good idea
to design tunnels so as to enforce maximal route symmetry
and to enable the reverse path filter (the \verb|rp_filter| sysctl option) on
tunnel devices.

\item In 2.2 you can monitor and debug tunnels with \verb|tcpdump|.
E.g.\ \verb|tcpdump| \verb|-i Cisco| \verb|-nvv| will dump the packets
that the kernel outputs via the tunnel \verb|Cisco| and the packets received on it,
as seen from the kernel's viewpoint.

\end{itemize}
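
To make the lookup order concrete, here is a POSIX shell model of it. It
reproduces only the four rules and the \verb|UP| check listed above. It is not
meant to reflect how the kernel stores or searches tunnels. The input format
(\verb|NAME LOCAL REMOTE STATE|, with \verb|0.0.0.0| for a wildcard) and the
name \verb|pick_tunnel| are inventions for this illustration.
\begin{verbatim}
#!/bin/sh
# Model of the receive-side tunnel lookup described above.
# stdin: "NAME LOCAL REMOTE STATE" lines, 0.0.0.0 = wildcard,
#        STATE is UP or DOWN.
# args:  D (outer source), S (outer destination = our address).
# Prints the chosen device, or nothing if no tunnel accepts the packet.
pick_tunnel() {
    d=$1 s=$2
    tunnels=$(cat)
    # Steps 1-3, as "LOCAL REMOTE" pairs in priority order.
    for want in "$s $d" "0.0.0.0 $d" "$s 0.0.0.0"; do
        match=$(printf '%s\n' "$tunnels" |
            while read -r name lcl rmt state; do
                if [ "$state" = UP ] && [ "$lcl $rmt" = "$want" ]; then
                    echo "$name"
                    break
                fi
            done)
        if [ -n "$match" ]; then
            echo "$match"
            return 0
        fi
    done
    # Step 4: the base device, if it is up.
    printf '%s\n' "$tunnels" |
        while read -r name lcl rmt state; do
            if [ "$name" = tunl0 ] && [ "$state" = UP ]; then
                echo tunl0
            fi
        done
}
\end{verbatim}
With \verb|tunl0| up, the last step accepts every packet that no specific
tunnel claims. That is the injection risk described above.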


\section{Linux and Cisco IOS tunnels.}

Among other tunnel types, Cisco IOS supports IPIP and GRE.
Essentially, the Cisco setup is a subset of the options available on Linux.
Let us consider the simplest example:

\begin{verbatim}
interface Tunnel0
 tunnel mode gre ip
 tunnel source 10.10.14.1
 tunnel destination 10.10.13.2
\end{verbatim}


This command set translates to:

\begin{verbatim}
ip tunnel add Tunnel0 \
    mode gre \
    local 10.10.14.1 \
    remote 10.10.13.2
\end{verbatim}

Any questions? No questions.
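
For longer configurations the translation can be mechanized. The sketch below
reads one IOS interface stanza on standard input and prints the matching
\verb|ip tunnel add| line. It understands only the three \verb|tunnel| keywords
of the example. It also assumes that \verb|tunnel source| is given as an
address; IOS also accepts an interface name there, which this sketch does not
handle. \verb|ios_to_ip| is a hypothetical name.
\begin{verbatim}
#!/bin/sh
# Sketch: translate the tunnel part of one IOS "interface TunnelN"
# stanza (stdin) into an "ip tunnel add" command line (printed only).
# Other IOS keywords are ignored.
ios_to_ip() {
    name= mode= src= dst=
    while read -r w1 w2 w3 rest; do
        case "$w1 $w2" in
            "interface "*)        name=$w2 ;;
            "tunnel mode")        mode=$w3 ;;   # "gre ip" -> gre
            "tunnel source")      src=$w3 ;;
            "tunnel destination") dst=$w3 ;;
        esac
    done
    case $mode in
        ipip|gre) ;;
        *) echo "unsupported or missing tunnel mode: $mode" >&2; return 1 ;;
    esac
    cmd="ip tunnel add $name mode $mode"
    [ -n "$src" ] && cmd="$cmd local $src"
    [ -n "$dst" ] && cmd="$cmd remote $dst"
    echo "$cmd"
}
\end{verbatim}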

\section{Interaction of IPIP tunnels and DVMRP.}

DVMRP exploits IPIP tunnels to route multicasts via the Internet.
\verb|mrouted| automatically creates the
IPIP tunnels listed in its configuration file.
From the kernel and user viewpoints there is no difference between
tunnels created this way and tunnels created by \verb|ip tunnel|.
I.e.\ if \verb|mrouted| has created a tunnel, it may be used to
route unicast packets, provided appropriate routes are added.
And vice versa: if the administrator has already created a tunnel,
\verb|mrouted| will reuse it when it requests a DVMRP
tunnel with the same local and remote addresses.

Do not be surprised if your manually configured tunnel is
destroyed when \verb|mrouted| exits.


\section{Broadcast GRE ``tunnels''.}

It is possible to set \verb|remote| for a GRE tunnel to a multicast
address. Such a tunnel becomes a {\bf broadcast} tunnel (though the word
tunnel is not quite appropriate in this case; it is rather a virtual network).
\begin{verbatim}
ip tunnel add Universe mode gre local 193.233.7.65 \
    remote 224.66.66.66 ttl 16
ip addr add 10.0.0.1/16 dev Universe
ip link set Universe up
\end{verbatim}
This tunnel is a true broadcast network, and broadcast packets are
sent to the multicast group 224.66.66.66. By default such a tunnel starts
to resolve both IP and IPv6 addresses via ARP/NDISC. As a result,
if multicast routing is supported in the surrounding network, all GRE nodes
will find one another automatically and form a virtual Ethernet-like
broadcast network. If multicast routing does not work, this is unpleasant
but not a fatal flaw: the tunnel becomes an NBMA rather than a broadcast network.
You may disable dynamic ARPing with:
\begin{verbatim}
echo 0 > /proc/sys/net/ipv4/neigh/Universe/mcast_solicit
\end{verbatim}
and add the required information to the ARP table manually:
\begin{verbatim}
ip neigh add 10.0.0.2 lladdr 128.6.190.2 dev Universe nud permanent
\end{verbatim}
In this case packets sent to 10.0.0.2 will be encapsulated in GRE
and sent to 128.6.190.2. Address resolution can also be made easier
with methods typical for other NBMA networks, e.g.\ by starting the user-level
\verb|arpd| daemon, which maintains a database of hosts attached
to the GRE virtual network, or by querying a
dedicated ARP or NHRP server.


Actually, such a setup is the most natural one for tunneling.
It is flexible, scalable and easily manageable, so
it is strongly recommended for GRE tunnels instead of the ugly
hack with NBMA mode and the \verb|onlink| modifier. Unfortunately,
for historical reasons broadcast mode is not supported by IPIP tunnels,
but this will probably change in the future.
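
Together with the pointopoint/NBMA distinction from the basics section, a
tunnel's kind can be read off its \verb|remote| value. The sketch below encodes
that reading. The name \verb|tunnel_kind| is hypothetical, and a multicast
remote is taken to be 224.0.0.0--239.255.255.255. Because the text says IPIP
has no broadcast mode, the sketch rejects a multicast remote on non-GRE tunnels
rather than guessing what happens.
\begin{verbatim}
#!/bin/sh
# Sketch: classify a tunnel by mode and "remote", following the text.
#   wildcard remote (empty or 0.0.0.0)   -> nbma
#   multicast remote on a gre tunnel     -> broadcast
#   any other remote                     -> pointopoint
tunnel_kind() {
    mode=$1 remote=$2
    case $remote in
        ""|0.0.0.0) echo nbma; return 0 ;;
    esac
    first_octet=${remote%%.*}
    if [ "$first_octet" -ge 224 ] && [ "$first_octet" -le 239 ]; then
        if [ "$mode" = gre ]; then
            echo broadcast
        else
            echo "broadcast mode needs mode gre" >&2
            return 1
        fi
    else
        echo pointopoint
    fi
}

tunnel_kind gre 224.66.66.66
\end{verbatim}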



\section{Traffic control issues.}

Tunnels are devices, hence all the power of Linux traffic control
applies to them. The simplest (and in practice the most useful)
example is limiting tunnel bandwidth. The following command:
\begin{verbatim}
tc qdisc add dev tunl0 root tbf \
    rate 128Kbit burst 4K limit 10K
\end{verbatim}
will limit tunneled traffic to 128Kbit/s with a maximal burst size of 4 Kbytes
and will queue no more than 10 Kbytes.
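
A back-of-the-envelope check shows what these numbers mean for latency. A full
queue drains at the configured rate, so the worst-case queueing delay is
roughly the queue limit in bits divided by the rate. The sketch below ignores
burst tokens. It assumes that \verb|128Kbit| means 128000 bit/s and that
\verb|10K| means 10240 bytes; check the \verb|tc| documentation for the unit
conventions of your version. \verb|tbf_max_delay_ms| is a hypothetical name.
\begin{verbatim}
#!/bin/sh
# Sketch: worst-case queueing delay of a full TBF queue, in ms
# (integer arithmetic, burst tokens ignored):
#   delay_ms = limit_bytes * 8 * 1000 / rate_bits_per_second
tbf_max_delay_ms() {
    limit_bytes=$1 rate_bps=$2
    echo $(( limit_bytes * 8 * 1000 / rate_bps ))
}

# "limit 10K" (assumed 10*1024 bytes) at "rate 128Kbit" (assumed 128000 bit/s)
tbf_max_delay_ms $((10 * 1024)) 128000
\end{verbatim}
Under these assumptions a full 10K queue adds about 640 ms of delay. This is
worth keeping in mind before raising \verb|limit|.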

However, remember that tunnels are {\em virtual} devices
implemented in software, and true queue management is impossible for them
simply because they have no queues. Instead, it is better to create classes
on the real physical interfaces and to map tunneled packets to them.
In the general case of dynamic routing you should create such classes
on all outgoing interfaces or, alternatively,
use the option \verb|dev DEV| to bind the tunnel to a fixed physical device.
In the latter case packets will be routed only via the specified device,
and you need to set up the corresponding classes only on it.
You pay for this convenience, though:
if routing changes, your tunnel will fail.

Suppose that the CBQ class \verb|1:ABC| has been created on device \verb|eth0|
specifically for the tunnel \verb|Cisco| with endpoints \verb|S| and \verb|D|.
Now you can select IPIP packets with addresses \verb|S| and \verb|D|
with some classifier and map them to class \verb|1:ABC|. E.g.\
this is easy to do with the \verb|rsvp| classifier:
\begin{verbatim}
tc filter add dev eth0 pref 100 proto ip rsvp \
    session D ipproto ipip filter S \
    classid 1:ABC
\end{verbatim}

If you want a more detailed classification of the sub-flows
transmitted via the tunnel, you can build a CBQ subtree
rooted at \verb|1:ABC| and attach to its root a set of rules that parse
IPIP packets more deeply.

\end{document}