Blame - Documentation/networking/openvswitch.txt - kernel/msm-4.9

blob: b3b9ac61d29d8751baff13446f800a2c9bc9e9fb

Jesse Gross	ccb1352	2011-10-25 19:26:31 -0700		1	Open vSwitch datapath developer documentation
				2	=============================================
				3
				4	The Open vSwitch kernel module allows flexible userspace control over
				5	flow-level packet processing on selected network devices. It can be
				6	used to implement a plain Ethernet switch, network device bonding,
				7	VLAN processing, network access control, flow-based network control,
				8	and so on.
				9
				10	The kernel module implements multiple "datapaths" (analogous to
				11	bridges), each of which can have multiple "vports" (analogous to ports
				12	within a bridge). Each datapath also has associated with it a "flow
				13	table" that userspace populates with "flows" that map from keys based
				14	on packet headers and metadata to sets of actions. The most common
				15	action forwards the packet to another vport; other actions are also
				16	implemented.
				17
				18	When a packet arrives on a vport, the kernel module processes it by
				19	extracting its flow key and looking it up in the flow table. If there
				20	is a matching flow, it executes the associated actions. If there is
				21	no match, it queues the packet to userspace for processing (as part of
				22	its processing, userspace will likely set up a flow to handle further
				23	packets of the same type entirely in-kernel).
				24
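The lookup-then-upcall path described above can be sketched as a toy
model. This is only an illustration, not the kernel's implementation:
the Datapath class, the extract_key() helper, the dict-based packet,
and the "output:2" action string are all hypothetical names chosen for
this example.

```python
from collections import deque


def extract_key(in_port, packet):
    # Hypothetical flow key: the ingress vport plus a few header fields
    # that are assumed to have been parsed already.
    return (in_port, packet["eth_src"], packet["eth_dst"], packet["eth_type"])


class Datapath:
    """Toy datapath: a flow table plus a queue of packets for userspace."""

    def __init__(self):
        self.flow_table = {}    # flow key -> list of actions
        self.upcalls = deque()  # (key, packet) pairs queued on a miss

    def receive(self, in_port, packet):
        key = extract_key(in_port, packet)
        actions = self.flow_table.get(key)
        if actions is None:
            # No matching flow: hand the packet and its key to userspace.
            self.upcalls.append((key, packet))
        return actions
```

In this model, userspace would pop an entry from upcalls, decide on the
actions, and store them in flow_table under the key it was given, so
that later packets with the same key are handled without another upcall.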
				25
				26	Flow key compatibility
				27	----------------------
				28
				29	Network protocols evolve over time. New protocols become important
				30	and existing protocols lose their prominence. For the Open vSwitch
				31	kernel module to remain relevant, it must be possible for newer
				32	versions to parse additional protocols as part of the flow key. It
				33	might even be desirable, someday, to drop support for parsing
				34	protocols that have become obsolete. Therefore, the Netlink interface
				35	to Open vSwitch is designed to allow carefully written userspace
				36	applications to work with any version of the flow key, past or future.
				37
				38	To support this forward and backward compatibility, whenever the
				39	kernel module passes a packet to userspace, it also passes along the
				40	flow key that it parsed from the packet. Userspace then extracts its
				41	own notion of a flow key from the packet and compares it against the
				42	kernel-provided version:
				43
				44	- If userspace's notion of the flow key for the packet matches the
				45	kernel's, then nothing special is necessary.
				46
				47	- If the kernel's flow key includes more fields than the userspace
				48	version of the flow key, for example if the kernel decoded IPv6
				49	headers but userspace stopped at the Ethernet type (because it
				50	does not understand IPv6), then again nothing special is
				51	necessary. Userspace can still set up a flow in the usual way,
				52	as long as it uses the kernel-provided flow key to do it.
				53
				54	- If the userspace flow key includes more fields than the
				55	kernel's, for example if userspace decoded an IPv6 header but
				56	the kernel stopped at the Ethernet type, then userspace can
				57	forward the packet manually, without setting up a flow in the
				58	kernel. This case is bad for performance because every packet
				59	that the kernel considers part of the flow must go to userspace,
				60	but the forwarding behavior is correct. (If userspace can
				61	determine that the values of the extra fields would not affect
				62	forwarding behavior, then it could set up a flow anyway.)
				63
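The three cases above can be condensed into a small decision function.
This is a simplified sketch: it models a flow key as a dict from
attribute name to value and compares only which attributes are present.
The function name and the returned strings are hypothetical.

```python
def plan_flow_setup(kernel_key, userspace_key):
    """Decide how userspace should handle a packet passed up by the kernel.

    Both keys are dicts mapping attribute names to values.
    """
    kernel_attrs = set(kernel_key)
    userspace_attrs = set(userspace_key)
    if userspace_attrs <= kernel_attrs:
        # Either both sides parsed the same fields, or the kernel parsed
        # more. In both cases a flow can be installed, as long as the
        # kernel-provided key is the one used.
        return ("install_flow", kernel_key)
    # Userspace parsed fields the kernel does not know about: forward the
    # packet manually instead of installing a flow.
    return ("forward_in_userspace", None)
```
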
				64	How flow keys evolve over time is important to making this work, so
				65	the following sections go into detail.
				66
				67
				68	Flow key format
				69	---------------
				70
				71	A flow key is passed over a Netlink socket as a sequence of Netlink
				72	attributes. Some attributes represent packet metadata, defined as any
				73	information about a packet that cannot be extracted from the packet
				74	itself, e.g. the vport on which the packet was received. Most
				75	attributes, however, are extracted from headers within the packet,
				76	e.g. source and destination addresses from Ethernet, IP, or TCP
				77	headers.
				78
				79	The <linux/openvswitch.h> header file defines the exact format of the
				80	flow key attributes. For informal explanatory purposes here, we write
				81	them as comma-separated strings, with parentheses indicating arguments
				82	and nesting. For example, the following could represent a flow key
				83	corresponding to a TCP packet that arrived on vport 1:
				84
				85	in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
				86	eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=6, tos=0,
				87	frag=no), tcp(src=49163, dst=80)
				88
				89	Often we ellipsize arguments not important to the discussion, e.g.:
				90
				91	in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
				92
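To make the informal notation concrete, the following sketch renders a
nested list of attributes in the comma-separated form used here. The
list-of-pairs representation and the render() function are assumptions
made for illustration; they are not the Netlink encoding defined in
<linux/openvswitch.h>.

```python
def render(attrs):
    """Render a list of (name, value) attribute pairs in the informal notation.

    A value may be a list (nested attributes), a dict (named arguments),
    or a scalar.
    """
    parts = []
    for name, value in attrs:
        if isinstance(value, list):
            inner = render(value)
        elif isinstance(value, dict):
            inner = ", ".join(f"{k}={v}" for k, v in value.items())
        else:
            inner = str(value)
        parts.append(f"{name}({inner})")
    return ", ".join(parts)
```

For example, render([("in_port", 1), ("tcp", {"src": 49163, "dst": 80})])
yields "in_port(1), tcp(src=49163, dst=80)".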
				93
Andy Zhou	03f0d91	2013-08-07 20:01:00 -0700		94	Wildcarded flow key format
				95	--------------------------
				96
				97	A wildcarded flow is described with two sequences of Netlink attributes
				98	passed over the Netlink socket: a flow key, exactly as described above, and an
				99	optional corresponding flow mask.
				100
				101	A wildcarded flow can represent a group of exact match flows. Each '1' bit
				102	in the mask specifies an exact match with the corresponding bit in the flow key.
				103	A '0' bit specifies a don't care bit, which will match either a '1' or '0' bit
				104	of an incoming packet. Using wildcarded flows can improve the flow setup rate
				105	by reducing the number of new flows that the user space program must process.
				106
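The mask semantics above reduce to a bitwise comparison. As a minimal
sketch, assume a single 32-bit field (an IPv4 address) held as an
integer; the helper names ip() and masked_match() are hypothetical.

```python
import ipaddress


def ip(text):
    # Convert dotted-quad notation to a 32-bit integer.
    return int(ipaddress.IPv4Address(text))


def masked_match(packet_bits, key_bits, mask_bits):
    # '1' bits in the mask must match exactly; '0' bits are don't care.
    return (packet_bits & mask_bits) == (key_bits & mask_bits)
```
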
				107	Support for the mask Netlink attribute is optional for both the kernel and user
				108	space program. The kernel can ignore the mask attribute, installing an exact
				109	match flow, or reduce the number of don't care bits in the kernel to less than
				110	what was specified by the user space program. In this case, variations in bits
				111	that the kernel does not implement will simply result in additional flow setups.
				112	The kernel module will also work with user space programs that neither support
				113	nor supply flow mask attributes.
				114
				115	Since the kernel may ignore or modify wildcard bits, it can be difficult for
				116	the userspace program to know exactly what matches are installed. There are
				117	two possible approaches: reactively install flows as they miss the kernel
				118	flow table (and therefore not attempt to determine wildcard changes at all)
				119	or use the kernel's response messages to determine the installed wildcards.
				120
				121	When interacting with userspace, the kernel should maintain the match portion
				122	of the key exactly as originally installed. This provides a handle to
				123	identify the flow for all future operations. However, when reporting the
				124	mask of an installed flow, the mask should include any restrictions imposed
				125	by the kernel.
				126
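One way to picture how a kernel might narrow a requested mask is shown
below. This is a hypothetical model, not the kernel's logic: it assumes
a single 32-bit field and a made-up wildcardable_bits value describing
which bits this particular kernel is able to treat as don't care.

```python
FIELD_BITS = 32
ALL_ONES = (1 << FIELD_BITS) - 1


def install_mask(requested_mask, wildcardable_bits):
    """Return the mask that would be reported for the installed flow.

    Any bit the kernel cannot wildcard is forced to '1' (exact match),
    so the result never has more don't care bits than were requested.
    """
    forced_exact = ~wildcardable_bits & ALL_ONES
    return (requested_mask | forced_exact) & ALL_ONES
```
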
				127	The behavior when using overlapping wildcarded flows is undefined. It is the
				128	responsibility of the user space program to ensure that any incoming packet
				129	can match at most one flow, wildcarded or not. The current implementation
				130	performs best-effort detection of overlapping wildcarded flows and may reject
				131	some but not all of them. However, this behavior may change in future versions.
				132
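Userspace can check for overlap itself before installing a flow. For a
single field held as an integer, two wildcarded flows overlap exactly
when their keys agree on every bit that both masks care about. The
sketch below illustrates that test; it is not the kernel's best-effort
detection, and the function name is hypothetical.

```python
def flows_overlap(key_a, mask_a, key_b, mask_b):
    """Return True if some packet could match both wildcarded flows."""
    common = mask_a & mask_b  # bits that both flows match exactly
    return (key_a & common) == (key_b & common)
```

For instance, a flow matching 172.18.0.0 with mask 0xFFFF0000 overlaps a
flow matching 172.18.0.52 with mask 0xFFFFFFFF, because a packet to
172.18.0.52 would match both.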
				133
Joe Stringer	74ed7ab	2015-01-21 16:42:52 -0800		134	Unique flow identifiers
				135	-----------------------
				136
				137	An alternative to using the original match portion of a key as the handle for
				138	flow identification is a unique flow identifier, or "UFID". UFIDs are optional
				139	for both the kernel and user space program.
				140
				141	User space programs that support UFID are expected to provide it during flow
				142	setup in addition to the flow, then refer to the flow using the UFID for all
				143	future operations. The kernel is not required to index flows by the original
				144	flow key if a UFID is specified.
				145
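The idea of using a UFID as the handle can be sketched with a table that
indexes a flow by its UFID when one is supplied and by its key otherwise.
The FlowTable class, its method names, and the 16-byte UFID value used
in the usage note are assumptions for illustration only.

```python
class FlowTable:
    """Toy flow table whose handle is the UFID if given, else the key."""

    def __init__(self):
        self._flows = {}  # handle -> (key, actions)

    def put(self, key, actions, ufid=None):
        handle = ufid if ufid is not None else key
        self._flows[handle] = (key, actions)
        return handle

    def get(self, handle):
        return self._flows.get(handle)

    def delete(self, handle):
        # Returns True if a flow was removed.
        return self._flows.pop(handle, None) is not None
```

A flow installed with ufid=bytes(range(16)) is then retrieved and
deleted through that UFID; looking it up by its original key finds
nothing, mirroring the statement that the kernel need not index such
flows by key.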
				146
Jesse Gross	ccb1352	2011-10-25 19:26:31 -0700		147	Basic rule for evolving flow keys
				148	---------------------------------
				149
				150	Some care is needed to really maintain forward and backward
				151	compatibility for applications that follow the rules listed under
				152	"Flow key compatibility" above.
				153
				154	The basic rule is obvious:
				155
				156	------------------------------------------------------------------
				157	New network protocol support must only supplement existing flow
				158	key attributes. It must not change the meaning of already defined
				159	flow key attributes.
				160	------------------------------------------------------------------
				161
				162	This rule does have less-obvious consequences so it is worth working
				163	through a few examples. Suppose, for example, that the kernel module
				164	did not already implement VLAN parsing. Instead, it just interpreted
				165	the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the
				166	packet. The flow key for any packet with an 802.1Q header would look
				167	essentially like this, ignoring metadata:
				168
				169	eth(...), eth_type(0x8100)
				170
				171	Naively, to add VLAN support, it makes sense to add a new "vlan" flow
				172	key attribute to contain the VLAN tag, then continue to decode the
				173	encapsulated headers beyond the VLAN tag using the existing field
Leo Alterman	efaac3b	2012-07-20 14:51:07 -0700		174	definitions. With this change, a TCP packet in VLAN 10 would have a
Jesse Gross	ccb1352	2011-10-25 19:26:31 -0700		175	flow key much like this:
				176
				177	eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)
				178
				179	But this change would negatively affect a userspace application that
				180	has not been updated to understand the new "vlan" flow key attribute.
				181	The application could, following the flow compatibility rules above,
				182	ignore the "vlan" attribute that it does not understand and therefore
				183	assume that the flow contained IP packets. This is a bad assumption
				184	(the flow only contains IP packets if one parses and skips over the
				185	802.1Q header) and it could cause the application's behavior to change
				186	across kernel versions even though it follows the compatibility rules.
				187
				188	The solution is to use a set of nested attributes. This is, for
				189	example, why 802.1Q support uses nested attributes. A TCP packet in
				190	VLAN 10 is actually expressed as:
				191
				192	eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
				193	ip(proto=6, ...), tcp(...))
				194
				195	Notice how the "eth_type", "ip", and "tcp" flow key attributes are
				196	nested inside the "encap" attribute. Thus, an application that does
				197	not understand the "vlan" key will not see any of those attributes
				198	and therefore will not misinterpret them. (Also, the outer eth_type
				199	is still 0x8100, not changed to 0x0800.)
				200
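The difference between the naive and the nested layout can be seen by
modelling an application that was written before "vlan" and "encap"
existed and that skips top-level attributes it does not recognize. The
data layout and the visible_to() helper are hypothetical.

```python
OLD_APP_KNOWS = {"eth", "eth_type", "ip", "tcp"}


def visible_to(known, key):
    """Names of the top-level attributes an application would act on."""
    return [name for name, _ in key if name in known]


# Naive layout: "vlan" is just another top-level attribute.
naive_key = [
    ("eth", "..."),
    ("vlan", {"vid": 10, "pcp": 0}),
    ("eth_type", 0x0800),
    ("ip", {"proto": 6}),
    ("tcp", "..."),
]

# Nested layout: the inner headers are hidden inside "encap".
nested_key = [
    ("eth", "..."),
    ("eth_type", 0x8100),
    ("vlan", {"vid": 10, "pcp": 0}),
    ("encap", [("eth_type", 0x0800), ("ip", {"proto": 6}), ("tcp", "...")]),
]
```

With the naive key the old application sees eth, eth_type, ip, and tcp
and wrongly concludes that the flow is plain IP. With the nested key it
sees only eth and eth_type(0x8100), which is what it saw before VLAN
support was added.
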
				201	Handling malformed packets
				202	--------------------------
				203
				204	Don't drop packets in the kernel for malformed protocol headers, bad
				205	checksums, etc. This would prevent userspace from implementing a
				206	simple Ethernet switch that forwards every packet.
				207
				208	Instead, in such a case, include an attribute with "empty" content.
				209	It doesn't matter if the empty content could be valid protocol values,
				210	as long as those values are rarely seen in practice, because userspace
				211	can always arrange for all packets with those values to be sent to
				212	userspace and handle them individually.
				213
				214	For example, consider a packet that contains an IP header that
				215	indicates protocol 6 for TCP, but which is truncated just after the IP
				216	header, so that the TCP header is missing. The flow key for this
				217	packet would include a tcp attribute with all-zero src and dst, like
				218	this:
				219
				220	eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)
				221
				222	As another example, consider a packet with an Ethernet type of 0x8100,
				223	indicating that a VLAN TCI should follow, but which is truncated just
				224	after the Ethernet type. The flow key for this packet would include
				225	an all-zero-bits vlan and an empty encap attribute, like this:
				226
				227	eth(...), eth_type(0x8100), vlan(0), encap()
				228
				229	Unlike a TCP packet with source and destination ports 0, an
				230	all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
				231	VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
				232	attribute expressly to allow this situation to be distinguished.
				233	Thus, the flow key in this second example unambiguously indicates a
				234	missing or malformed VLAN TCI.
				235
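The VLAN case can be sketched as follows. The example assumes an
untagged-length Ethernet header (12 bytes of addresses followed by a
2-byte Ethernet type) and uses 0x1000 for the CFI bit, which is its
position within the 16-bit TCI. The vlan_attr() function and its return
convention are hypothetical, not the kernel's code.

```python
VLAN_CFI = 0x1000  # CFI bit within the 16-bit TCI


def vlan_attr(frame):
    """Return the value of the vlan attribute for a raw frame, or None.

    None means no vlan attribute applies (not 802.1Q, or no Ethernet
    type at all). 0 means the TCI is missing because the frame was
    truncated.
    """
    if len(frame) < 14:
        return None
    eth_type = int.from_bytes(frame[12:14], "big")
    if eth_type != 0x8100:
        return None
    if len(frame) < 16:
        return 0  # truncated after the Ethernet type: all-zero-bits vlan
    tci = int.from_bytes(frame[14:16], "big")
    return tci | VLAN_CFI  # CFI set marks a TCI that was really present
```

Because a TCI that was actually read always has the CFI bit set in the
attribute, a value of 0 cannot be confused with a valid all-zero TCI,
which is reported as 0x1000.
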
				236	Other rules
				237	-----------
				238
				239	The other rules for flow keys are much less subtle:
				240
				241	- Duplicate attributes are not allowed at a given nesting level.
				242
				243	- Ordering of attributes is not significant.
				244
				245	- When the kernel sends a given flow key to userspace, it always
				246	composes it the same way. This allows userspace to hash and
				247	compare entire flow keys that it may not be able to fully
				248	interpret.
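
Two of these rules lend themselves to a short sketch: rejecting
duplicate attributes at any nesting level, and comparing serialized keys
as opaque bytes without interpreting them. The list-of-pairs
representation and both function names are assumptions for illustration.

```python
def has_duplicates(attrs):
    """True if any nesting level of (name, value) pairs repeats a name.

    A value that is a list is treated as a nested set of attributes.
    """
    names = [name for name, _ in attrs]
    if len(names) != len(set(names)):
        return True
    return any(isinstance(v, list) and has_duplicates(v) for _, v in attrs)


def count_flows(raw_keys):
    """Count distinct flows by treating each serialized key as opaque bytes.

    This relies on the kernel composing a given flow key the same way
    every time.
    """
    return len(set(raw_keys))
```

Note that the same attribute name may appear at different nesting
levels, as "eth_type" does inside and outside "encap"; only repeats
within one level are rejected.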