\documentstyle[12pt,twoside]{article}
\def\TITLE{Tunnels over IP}
\input preamble
\begin{center}
\Large\bf Tunnels over IP in Linux-2.2
\end{center}


\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm March 17, 1999
\end{center}

\vspace{5mm}

\tableofcontents


\section{Instead of an introduction: micro-FAQ.}

\begin{itemize}

\item
Q: In linux-2.0.36 I used:
\begin{verbatim}
    ifconfig tunl1 10.0.0.1 pointopoint 193.233.7.65
\end{verbatim}
to create a tunnel. It does not work in 2.2.0!

A: You are right, it does not work. The command above is now split into two commands.
\begin{verbatim}
    ip tunnel add MY-TUNNEL mode ipip remote 193.233.7.65
\end{verbatim}
creates a tunnel device named \verb|MY-TUNNEL|. Now you may configure
it with:
\begin{verbatim}
    ifconfig MY-TUNNEL 10.0.0.1
\end{verbatim}
Certainly, if you prefer the name \verb|tunl1| to \verb|MY-TUNNEL|,
you may still use it.

\item
Q: In linux-2.0.36 I used:
\begin{verbatim}
    ifconfig tunl0 10.0.0.1
    route add -net 10.0.0.0 gw 193.233.7.65 dev tunl0
\end{verbatim}
to tunnel net 10.0.0.0 via router 193.233.7.65. It does not
work in 2.2.0! Moreover, \verb|route| prints a funny error along the lines of
``network unreachable'', and afterwards I found a strange direct route
to 10.0.0.0 via \verb|tunl0| in the routing table.

A: Yes, in 2.2 the rule that a {\em normal} gateway must reside on a directly
connected network has no exceptions. You may tell the kernel that
this particular route is {\em abnormal}:
\begin{verbatim}
    ifconfig tunl0 10.0.0.1 netmask 255.255.255.255
    ip route add 10.0.0.0/8 via 193.233.7.65 dev tunl0 onlink
\end{verbatim}
Note the keyword \verb|onlink|: it is the magic key that orders the kernel
not to check the gateway address for consistency.
Probably, after this explanation you have already guessed another method
to cheat the kernel:
\begin{verbatim}
    ifconfig tunl0 10.0.0.1 netmask 255.255.255.255
    route add -host 193.233.7.65 dev tunl0
    route add -net 10.0.0.0 netmask 255.0.0.0 gw 193.233.7.65
    route del -host 193.233.7.65 dev tunl0
\end{verbatim}
Well, if you like such tricks, nobody can prohibit you from using them.
Only do not forget
that between \verb|route add| and \verb|route del| the host 193.233.7.65 is
unreachable.

\item
Q: In 2.0.36 I used to load the \verb|tunnel| device module and the \verb|ipip| module.
I cannot find any \verb|tunnel| in 2.2!

A: Linux-2.2 has a single module, \verb|ipip|, for both directions of tunneling
and for all IPIP tunnel devices.

\item
Q: \verb|traceroute| does not work over a tunnel! Well, stop... It works,
only it skips some number of hops.

A: Yes. By default the tunnel driver copies the \verb|ttl| value from
the inner packet to the outer one. This means that the path traversed by tunneled
packets to the other endpoint is not hidden. If you dislike this, or if you
are going to use some routing protocol that expects packets
with ttl 1 to reach the peering host (e.g.\ RIP, OSPF or EBGP),
and you are not afraid of
tunnel loops, you may append the option \verb|ttl 64| when creating the tunnel
with \verb|ip tunnel add|.

\item
Q: ... Well, the list of things that 2.0 was able to do ends here.

\end{itemize}

\paragraph{Summary of differences between 2.2 and 2.0.}

\begin{itemize}

\item {\bf In 2.0} you could compile the tunnel device into the kernel
and get a set of 4 devices \verb|tunl0| ... \verb|tunl3| or,
alternatively, compile it as a module and load a new module
for each new tunnel. In addition, the module \verb|ipip| was necessary
to receive tunneled packets.

{\bf 2.2} has {\em one\/} module, \verb|ipip|. Loading it gives you the base
tunnel device \verb|tunl0|; other tunnels may be created with the command
\verb|ip tunnel add|. These new devices may have arbitrary names.


\item {\bf In 2.0} you set the remote tunnel endpoint address with
the command \verb|ifconfig| ... \verb|pointopoint A|.

{\bf In 2.2} this command has the same semantics on all
interfaces: it sets not the tunnel endpoint
but the address of the peering host, which is directly reachable
via this tunnel
rather than via the Internet. The actual tunnel endpoint address \verb|A|
should be set with \verb|ip tunnel add ... remote A|.

\item {\bf In 2.0} you created tunnel routes with the command:
\begin{verbatim}
    route add -net 10.0.0.0 gw A dev tunl0
\end{verbatim}

{\bf 2.2} interprets this command in the same way for all device
kinds, and the gateway is required to be directly reachable via this tunnel
rather than via the Internet. You may still use \verb|ip route add ... onlink|
to override this behaviour.

\end{itemize}


\section{Tunnel setup: basics}

The standard Linux-2.2 kernel supports three flavors of tunnels,
listed in the following table:
\vspace{2mm}

\begin{tabular}{lll}
\vrule depth 0.8ex width 0pt\relax
Mode & Description & Base device \\
ipip & IP over IP & tunl0 \\
sit & IPv6 over IP & sit0 \\
gre & ANY over GRE over IP & gre0
\end{tabular}

\vspace{2mm}

\noindent All kinds of tunnels are created with one command:
\begin{verbatim}
    ip tunnel add <NAME> mode <MODE> [ local <S> ] [ remote <D> ]
\end{verbatim}

This command creates a new tunnel device with the name \verb|<NAME>|.
\verb|<NAME>| is an arbitrary string; in particular,
it may even be \verb|eth0|. The rest of the parameters set
various tunnel characteristics.

\begin{itemize}

\item
\verb|mode <MODE>| sets the tunnel mode. Three modes are currently available:
\verb|ipip|, \verb|sit| and \verb|gre|.

\item
\verb|remote <D>| sets the remote endpoint of the tunnel to the IP
address \verb|<D>|.
\item
\verb|local <S>| sets a fixed local address for tunneled
packets. It must be an address on another interface of this host.

\end{itemize}

\let\thefootnote\oldthefootnote

Both \verb|remote| and \verb|local| may be omitted. In this case we
say that they are zero or wildcard. Two tunnels of the same mode cannot
have the same \verb|remote| and \verb|local|. In particular, this means
that the base device or fallback tunnel cannot be replicated.\footnote{
This restriction is relaxed for keyed GRE tunnels.}

Tunnels are divided into two classes: {\bf pointopoint} tunnels, which
have a non-wildcard \verb|remote| address and deliver all packets
to this destination, and {\bf NBMA} (i.e.\ Non-Broadcast Multi-Access) tunnels,
which have no \verb|remote|. In particular, base devices (e.g.\ \verb|tunl0|)
are NBMA, because they have neither \verb|remote| nor
\verb|local| addresses.


After a tunnel device is created, you should configure it as you would
any other device. Certainly, the configuration of tunnels has
some peculiarities related to the fact that they work over the existing Internet
routing infrastructure and simultaneously create new virtual links,
which change this infrastructure. The danger that an insufficiently careful
tunnel setup will result in the formation of tunnel loops,
a collapse of routing, or flooding of the network with an exponentially
growing number of tunneled fragments is very real.


Protocol setup on pointopoint tunnels does not differ from the configuration
of other devices. You should set a protocol address with \verb|ifconfig|
and add routes with the \verb|route| utility.
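
As an illustration, a complete pointopoint IPIP setup might look as follows.
This is only a sketch: the tunnel name \verb|MY-TUNNEL| and the endpoint
193.233.7.65 are borrowed from the micro-FAQ, while the peer address 10.0.0.2
and the routed network 10.1.0.0/16 are assumptions made up for the example.
\begin{verbatim}
    # create the tunnel; the remote endpoint is reached via the Internet
    ip tunnel add MY-TUNNEL mode ipip remote 193.233.7.65
    # protocol address; 10.0.0.2 (assumed) is the peer reached via the tunnel
    ifconfig MY-TUNNEL 10.0.0.1 pointopoint 10.0.0.2 up
    # route an (assumed) remote network through the peer
    route add -net 10.1.0.0 netmask 255.255.0.0 gw 10.0.0.2
\end{verbatim}
Note that \verb|pointopoint| here names the peer inside the tunnel,
whereas the tunnel endpoint itself is given only to \verb|ip tunnel add|.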

NBMA tunnels are different. To route something via an NBMA tunnel
you have to tell the driver where it should deliver packets.
The only way to do this is to create special routes with a gateway
address pointing to the desired endpoint, e.g.\
\begin{verbatim}
    ip route add 10.0.0.0/24 via <A> dev tunl0 onlink
\end{verbatim}
It is important to use the option \verb|onlink|; otherwise the
kernel will refuse the request to create a route via a gateway that is not directly
reachable over device \verb|tunl0|. With IPv6 the situation is much simpler:
when you start device \verb|sit0|, it automatically configures itself
with all IPv4 addresses mapped to IPv6 space, so that the whole IPv4
Internet is {\em really reachable} via \verb|sit0|! Excellent, the command
\begin{verbatim}
    ip route add 3FFE::/16 via ::193.233.7.65 dev sit0
\end{verbatim}
will route \verb|3FFE::/16| via \verb|sit0|, sending all packets
destined for this prefix to 193.233.7.65.

\section{Tunnel setup: options}

The command \verb|ip tunnel add| has several additional options.
\begin{itemize}

\item \verb|ttl N| --- set a fixed TTL \verb|N| on tunneled packets.
\verb|N| is a number in the range 1--255. 0 is a special value
meaning that packets inherit the TTL value.
The default value is \verb|inherit|.

\item \verb|tos T| --- set a fixed TOS \verb|T| on tunneled packets.
The default value is \verb|inherit|.

\item \verb|dev DEV| --- bind the tunnel to device \verb|DEV|, so that
tunneled packets will be routed only via this device and will
not be able to escape to another device when the route to the endpoint changes.

\item \verb|nopmtudisc| --- disable Path MTU Discovery on this tunnel.
It is enabled by default. Note that a fixed ttl is incompatible
with this option: tunnels with a fixed ttl always perform pmtu discovery.

\end{itemize}
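
For instance, the options above might be combined as in the following sketch.
The tunnel name and the remote address are reused from the micro-FAQ;
the device \verb|eth0| is an assumption and should be replaced by whatever
interface actually leads to the endpoint.
\begin{verbatim}
    # fixed outer TTL, TOS copied from the inner packet,
    # tunneled packets allowed to leave only via eth0 (assumed)
    ip tunnel add MY-TUNNEL mode ipip remote 193.233.7.65 \
        ttl 64 tos inherit dev eth0
\end{verbatim}
Since the ttl is fixed here, \verb|nopmtudisc| is deliberately not used.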

\verb|ipip| and \verb|sit| tunnels have no further options. \verb|gre|
tunnels are more complicated:

\begin{itemize}

\item \verb|key K| --- use keyed GRE with key \verb|K|. \verb|K| is
either a number or an IP address-like dotted quad.

\item \verb|csum| --- checksum tunneled packets.

\item \verb|seq| --- serialize packets.
\begin{NB}
I think this option does not
work. At least, I did not test it, did not debug it and
do not even understand how it is supposed to work and for what
purpose Cisco planned to use it.
\end{NB}

\end{itemize}


Actually, these GRE options can be set separately for the input and
output directions by prefixing the corresponding keywords with the letter
\verb|i| or \verb|o|. For example, \verb|icsum| orders the kernel to accept only
packets with a correct checksum, and \verb|ocsum| means that
our host will calculate and send the checksum.
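
A keyed GRE tunnel with checksumming in the output direction only could be
created roughly like this. The addresses are those of the Cisco example
elsewhere in this document; the key value 1.2.3.4 is purely illustrative.
\begin{verbatim}
    # key applies to both directions; checksums are sent but not required
    ip tunnel add Cisco mode gre \
        local 10.10.14.1 remote 10.10.13.2 \
        key 1.2.3.4 ocsum
\end{verbatim}
The peer must be configured with the same key, otherwise its packets
will not match this tunnel.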

\verb|ip tunnel add| is not the only operation
that can be performed on tunnels. Certainly, you may get a short help page
with:
\begin{verbatim}
    ip tunnel help
\end{verbatim}

Besides that, you may view the list of installed tunnels with the command:
\begin{verbatim}
    ip tunnel ls
\end{verbatim}
You may also look at statistics:
\begin{verbatim}
    ip -s tunnel ls Cisco
\end{verbatim}
where \verb|Cisco| is the name of the tunnel device. The command
\begin{verbatim}
    ip tunnel del Cisco
\end{verbatim}
destroys the tunnel \verb|Cisco|. And, finally,
\begin{verbatim}
    ip tunnel change Cisco mode sit local ME remote HE ttl 32
\end{verbatim}
changes its parameters.

\section{Differences between 2.2 and 2.0 tunnels revisited.}

Now we can discuss the more subtle differences between tunneling in 2.0
and 2.2.

\begin{itemize}

\item In 2.0 all tunneled packets were received promiscuously
as soon as you loaded the module \verb|ipip|. 2.2 tries to select the best
tunnel device, and the packet appears to have been received on it. For example, if the host
receives an \verb|ipip| packet from host \verb|D| destined for our
local address \verb|S|, the kernel searches for matching tunnels
in this order:

\begin{tabular}{ll}
1 & \verb|remote| is \verb|D| and \verb|local| is \verb|S| \\
2 & \verb|remote| is \verb|D| and \verb|local| is wildcard \\
3 & \verb|remote| is wildcard and \verb|local| is \verb|S| \\
4 & \verb|tunl0|
\end{tabular}

If a tunnel exists but is not in the \verb|UP| state, it is ignored.
Note that if \verb|tunl0| is \verb|UP| it receives all IPIP packets
not claimed by more specific tunnels.
Be careful: this means that without carefully installed firewall rules
anyone on the Internet may inject into your network packets with
source addresses indistinguishable from local ones. It is not a bad idea
to design tunnels in a way that enforces maximal route symmetry
and to enable the reverse path filter (the \verb|rp_filter| sysctl option) on
tunnel devices.

\item In 2.2 you can monitor and debug tunnels with \verb|tcpdump|.
For example, \verb|tcpdump| \verb|-i Cisco| \verb|-nvv| will dump the packets
that the kernel sends via tunnel \verb|Cisco| and the packets received on it,
from the kernel's viewpoint.

\end{itemize}
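
The two recommendations above might be applied as sketched below.
The tunnel name \verb|Cisco| is the one used in the text; the physical
device \verb|eth0| is an assumption. Depending on the kernel version, the
\verb|all| entry in the same directory may also have to be set before the
per-device value takes effect.
\begin{verbatim}
    # enable the reverse path filter on the tunnel device
    echo 1 > /proc/sys/net/ipv4/conf/Cisco/rp_filter
    # decapsulated traffic, as seen on the tunnel device
    tcpdump -i Cisco -nvv
    # the encapsulated IPIP packets (IP protocol 4) on the wire
    tcpdump -i eth0 -n ip proto 4
\end{verbatim}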


\section{Linux and Cisco IOS tunnels.}

Among other tunnel types, Cisco IOS supports IPIP and GRE.
Essentially, the Cisco setup is a subset of the options available for Linux.
Let us consider the simplest example:

\begin{verbatim}
interface Tunnel0
 tunnel mode gre ip
 tunnel source 10.10.14.1
 tunnel destination 10.10.13.2
\end{verbatim}


This command set translates to:

\begin{verbatim}
    ip tunnel add Tunnel0 \
        mode gre \
        local 10.10.14.1 \
        remote 10.10.13.2
\end{verbatim}

Any questions? No questions.

\section{Interaction of IPIP tunnels and DVMRP.}

DVMRP exploits IPIP tunnels to route multicasts via the Internet.
\verb|mrouted| automatically creates the
IPIP tunnels listed in its configuration file.
From the kernel and user viewpoints there are no differences between
tunnels created in this way and tunnels created by \verb|ip tunnel|.
I.e.\ if \verb|mrouted| has created some tunnel, it may be used to
route unicast packets, provided appropriate routes are added.
And vice versa, if the administrator has already created a tunnel,
it will be reused by \verb|mrouted| if it requests a DVMRP
tunnel with the same local and remote addresses.

Do not be surprised if your manually configured tunnel is
destroyed when \verb|mrouted| exits.


\section{Broadcast GRE ``tunnels''.}

It is possible to set \verb|remote| for a GRE tunnel to a multicast
address. Such a tunnel becomes a {\bf broadcast} tunnel (though the word
tunnel is not quite appropriate in this case; it is rather a virtual network).
\begin{verbatim}
    ip tunnel add Universe mode gre local 193.233.7.65 \
                           remote 224.66.66.66 ttl 16
    ip addr add 10.0.0.1/16 dev Universe
    ip link set Universe up
\end{verbatim}
This tunnel is a true broadcast network, and broadcast packets are
sent to the multicast group 224.66.66.66. By default such a tunnel starts
to resolve both IP and IPv6 addresses via ARP/NDISC, so that
if multicast routing is supported in the surrounding network, all GRE nodes
will find one another automatically and will form a virtual Ethernet-like
broadcast network. If multicast routing does not work, it is an unpleasant
but not fatal flaw: the tunnel becomes an NBMA rather than a broadcast network.
You may disable dynamic ARPing with:
\begin{verbatim}
    echo 0 > /proc/sys/net/ipv4/neigh/Universe/mcast_solicit
\end{verbatim}
and add the required information to the ARP tables manually:
\begin{verbatim}
    ip neigh add 10.0.0.2 lladdr 128.6.190.2 dev Universe nud permanent
\end{verbatim}
In this case packets sent to 10.0.0.2 will be encapsulated in GRE
and sent to 128.6.190.2. It is possible to facilitate address resolution
using methods typical for other NBMA networks, e.g.\ to start a user-level
\verb|arpd| daemon, which will maintain a database of hosts attached
to the GRE virtual network, or to ask a
dedicated ARP or NHRP server for the information.
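
For completeness, the matching configuration on the second node might look
like the sketch below. It assumes that the host 128.6.190.2 from the example
above owns the virtual address 10.0.0.2 and uses the same device name and
multicast group; none of this is prescribed by the kernel.
\begin{verbatim}
    ip tunnel add Universe mode gre local 128.6.190.2 \
                           remote 224.66.66.66 ttl 16
    ip addr add 10.0.0.2/16 dev Universe
    ip link set Universe up
    # needed only if multicast routing between the nodes does not work
    ip neigh add 10.0.0.1 lladdr 193.233.7.65 dev Universe nud permanent
\end{verbatim}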


Actually, such a setup is the most natural one for tunneling:
it is really flexible, scalable and easily manageable, so
it is strongly recommended for GRE tunnels instead of the ugly
hack with NBMA mode and the \verb|onlink| modifier. Unfortunately,
for historical reasons broadcast mode is not supported by IPIP tunnels,
but this will probably change in the future.



\section{Traffic control issues.}

Tunnels are devices, hence all the power of Linux traffic control
applies to them. The simplest (and the most useful in practice)
example is limiting tunnel bandwidth. The following command:
\begin{verbatim}
    tc qdisc add dev tunl0 root tbf \
        rate 128Kbit burst 4K limit 10K
\end{verbatim}
will limit tunneled traffic to 128~Kbit/s with a maximal burst size of 4~Kbytes
and will queue no more than 10~Kbytes.

However, you should remember that tunnels are {\em virtual} devices
implemented in software, and true queue management is impossible for them
simply because they have no queues. Instead, it is better to create classes
on real physical interfaces and to map tunneled packets to them.
In the general case of dynamic routing you should create such classes
on all outgoing interfaces or, alternatively,
use the option \verb|dev DEV| to bind the tunnel to a fixed physical device.
In the latter case packets will be routed only via the specified device
and you need to set up the corresponding classes only on it.
You have to pay for this convenience, though:
if routing changes, your tunnel will fail.

Suppose that CBQ class \verb|1:ABC| has been created on device \verb|eth0|
specially for tunnel \verb|Cisco| with endpoints \verb|S| and \verb|D|.
Now you can select IPIP packets with addresses \verb|S| and \verb|D|
with some classifier and map them to class \verb|1:ABC|. For example,
this is easy to do with the \verb|rsvp| classifier:
\begin{verbatim}
    tc filter add dev eth0 pref 100 proto ip rsvp \
        session D ipproto ipip filter S \
        classid 1:ABC
\end{verbatim}
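
The class \verb|1:ABC| assumed above could have been created along the
following lines. This is a rough sketch, not a tuned configuration:
the 10Mbit link bandwidth, the 128Kbit class rate and the remaining CBQ
parameters are illustrative assumptions and must be adapted to the real link.
\begin{verbatim}
    # root CBQ qdisc on the physical device (10Mbit link assumed)
    tc qdisc add dev eth0 root handle 1: cbq \
        bandwidth 10Mbit avpkt 1000 cell 8
    # bounded class for the tunnel, limited to 128Kbit
    tc class add dev eth0 parent 1: classid 1:ABC cbq \
        bandwidth 10Mbit rate 128Kbit weight 12Kbit prio 5 \
        allot 1514 cell 8 maxburst 20 avpkt 1000 bounded
\end{verbatim}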

If you want to make a more detailed classification of sub-flows
transmitted via the tunnel, you can build a CBQ subtree
rooted at \verb|1:ABC| and attach to the subroot a set of rules that parse
IPIP packets more deeply.

\end{document}