Netdev features mess and how to get out from it alive
=====================================================

Author:
	Michał Mirosław <mirq-linux@rere.qmqm.pl>



 Part I: Feature sets
======================

Long gone are the days when a network card would just take and give packets
verbatim. Today's devices add multiple features and bugs (read: offloads)
that relieve an OS of various tasks like generating and checking checksums,
splitting packets, and classifying them. Those capabilities and their state
are commonly referred to as netdev features in the Linux kernel world.

There are currently three sets of features relevant to the driver, and
one used internally by the network core:

 1. netdev->hw_features set contains features whose state may be changed
    (enabled or disabled) for a particular device at the user's request.
    This set should be initialized in the ndo_init callback and not
    changed later.

 2. netdev->features set contains features which are currently enabled
    for a device. This should be changed only by the network core or in
    the error paths of the ndo_set_features callback.

 3. netdev->vlan_features set contains features whose state is inherited
    by child VLAN devices (it limits the netdev->features set of those
    devices). This is currently used for all VLAN devices whether tags
    are stripped or inserted in hardware or software.

 4. netdev->wanted_features set contains the feature set requested by the
    user. This set is filtered by the ndo_fix_features callback whenever it
    or some device-specific conditions change. This set is internal to
    the networking core and should not be referenced in drivers.
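
As a rough illustration of how the driver-visible sets relate, the sketch
below models them in userspace with plain bitmasks. Everything in it is an
assumption made for the example: struct model_netdev, the MODEL_F_* bit
values and both helper functions are hypothetical stand-ins rather than the
kernel's definitions, and the VLAN helper reduces the real inheritance logic
to a simple intersection.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
typedef uint64_t model_features_t;

#define MODEL_F_SG      (1ULL << 0)	/* scatter-gather */
#define MODEL_F_IP_CSUM (1ULL << 1)	/* transmit checksumming */
#define MODEL_F_TSO     (1ULL << 2)	/* TCP segmentation offload */
#define MODEL_F_RXCSUM  (1ULL << 3)	/* receive checksumming */

struct model_netdev {
	model_features_t hw_features;		/* what the user may toggle */
	model_features_t features;		/* what is enabled right now */
	model_features_t vlan_features;		/* limit for child VLAN devices */
	model_features_t wanted_features;	/* core only; drivers keep out */
};

/* What an ndo_init-style hook might do: fill in the sets once. */
static int model_ndo_init(struct model_netdev *dev)
{
	dev->hw_features = MODEL_F_SG | MODEL_F_IP_CSUM |
			   MODEL_F_TSO | MODEL_F_RXCSUM;

	/* Start with TSO off; the user can enable it later. */
	dev->features = dev->hw_features & ~MODEL_F_TSO;

	/* Assumed quirk: no receive checksumming on tagged frames. */
	dev->vlan_features = MODEL_F_SG | MODEL_F_IP_CSUM | MODEL_F_TSO;

	return 0;
}

/* Simplified view of the limit a child VLAN device inherits. */
static model_features_t model_vlan_child_limit(const struct model_netdev *dev)
{
	return dev->features & dev->vlan_features;
}
```

In this model every enabled bit is also user-changeable, but that is a
property of the example, not a rule stated above.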



 Part II: Controlling enabled features
=======================================

When the current feature set (netdev->features) is to be changed, a new set
is calculated and filtered by calling the ndo_fix_features callback
and netdev_fix_features(). If the resulting set differs from the current
set, it is passed to the ndo_set_features callback and (if the callback
returns success) replaces the value stored in netdev->features.
A NETDEV_FEAT_CHANGE notification is issued after that whenever the current
set might have changed.
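
The recalculation described above can be sketched as a small userspace
model. This is not the kernel's code: struct model_netdev, the MODEL_F_*
bits, model_core_fix() and demo_set_features() are hypothetical, and the way
the starting set is formed (bits outside hw_features keep their state, the
rest follow wanted_features) is an assumption of the model.

```c
#include <stdint.h>

/* Hypothetical userspace model; names and bit values are illustrative. */
typedef uint64_t model_features_t;

#define MODEL_F_SG  (1ULL << 0)
#define MODEL_F_TSO (1ULL << 1)

struct model_netdev {
	model_features_t hw_features;
	model_features_t features;
	model_features_t wanted_features;
	model_features_t programmed;	/* what the pretend hardware was told */
	int notifications;		/* NETDEV_FEAT_CHANGE-style events */
	/* Optional callbacks; a missing one counts as success. */
	model_features_t (*fix_features)(struct model_netdev *dev,
					 model_features_t features);
	int (*set_features)(struct model_netdev *dev,
			    model_features_t features);
};

/* Stand-in for limits imposed by the core: here, TSO requires SG. */
static model_features_t model_core_fix(model_features_t features)
{
	if ((features & MODEL_F_TSO) && !(features & MODEL_F_SG))
		features &= ~MODEL_F_TSO;
	return features;
}

/* Example driver callback: just remember the requested set. */
static int demo_set_features(struct model_netdev *dev,
			     model_features_t features)
{
	dev->programmed = features;
	return 0;
}

/* Returns nonzero when the current set might have changed. */
static int model_update_features(struct model_netdev *dev)
{
	model_features_t features =
		(dev->features & ~dev->hw_features) | dev->wanted_features;
	int err = 0;

	/* Filter: driver callback first, then the core's own limits. */
	if (dev->fix_features)
		features = dev->fix_features(dev, features);
	features = model_core_fix(features);

	if (features == dev->features)
		return 0;

	if (dev->set_features)
		err = dev->set_features(dev, features);
	if (!err)
		dev->features = features;

	/* Notify even on failure: an error path may have touched the set. */
	dev->notifications++;
	return 1;
}
```

Requesting TSO without SG leaves this model unchanged, because the filtered
set equals the current one and the set_features callback is never reached.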

The following events trigger recalculation:
 1. registration of the device, after ndo_init has returned success
 2. the user requests a change in the state of features
 3. netdev_update_features() is called

ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
are treated as always returning success.

A driver that wants to trigger recalculation must do so by calling
netdev_update_features() while holding rtnl_lock. This should not be done
from ndo_*_features callbacks. netdev->features should not be modified by
the driver directly; a driver influences it through the set returned by
the ndo_fix_features callback (and, as noted in Part I, in the error paths
of ndo_set_features).



 Part III: Implementation hints
================================

 * ndo_fix_features:

All dependencies between features should be resolved here. The resulting
set can be reduced further by limitations imposed by the networking core
(as coded in netdev_fix_features()). For this reason it is safer to disable
a feature when its dependencies are not met instead of forcing the
dependency on.

This callback should not modify hardware or driver state (it should be
stateless). It can be called multiple times between successive
ndo_set_features calls.

The callback must not alter features contained in the NETIF_F_SOFT_FEATURES
or NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED, but
care must be taken as the change won't affect already configured VLANs.
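
A minimal sketch of the "disable rather than force" advice, written as a
userspace model. The MODEL_F_* values and model_fix_features() are
hypothetical, the device constraint is invented for the example, and the
real callback also receives the device pointer, which is left out here.

```c
#include <stdint.h>

/* Hypothetical bit values for illustration; not the kernel's definitions. */
typedef uint64_t model_features_t;

#define MODEL_F_SG      (1ULL << 0)
#define MODEL_F_HW_CSUM (1ULL << 1)
#define MODEL_F_TSO     (1ULL << 2)

/*
 * Assumed constraint of an imaginary device: TSO only works while
 * scatter-gather and transmit checksumming are both enabled.
 *
 * The result depends only on the argument and no hardware or driver
 * state is touched, so repeated calls are harmless.
 */
static model_features_t model_fix_features(model_features_t features)
{
	const model_features_t tso_deps = MODEL_F_SG | MODEL_F_HW_CSUM;

	/* Drop the dependent feature; do not force its dependencies on. */
	if ((features & MODEL_F_TSO) && (features & tso_deps) != tso_deps)
		features &= ~MODEL_F_TSO;

	return features;
}
```

Forcing MODEL_F_SG on instead would be fragile in this model: a later filter
could remove it again and leave TSO enabled without its dependency.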

 * ndo_set_features:

The hardware should be reconfigured to match the passed feature set. The
set should not be altered unless some error condition happens that can't
be reliably detected in ndo_fix_features. In this case, the callback
should update netdev->features to match the resulting hardware state.
Errors returned are not (and cannot be) propagated anywhere except dmesg.
(Note: a successful return is zero; >0 means silent error.)
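
The following userspace sketch shows one way such a callback could be
shaped, including the error path. It is only a model: struct model_netdev,
its pretend hardware fields, the MODEL_F_* bits and the tso_broken switch
are all hypothetical, and -5 merely stands in for an errno-style value.

```c
#include <stdint.h>

/* Hypothetical model; names and bit values are illustrative only. */
typedef uint64_t model_features_t;

#define MODEL_F_RXCSUM (1ULL << 0)
#define MODEL_F_TSO    (1ULL << 1)

struct model_netdev {
	model_features_t features;	/* currently enabled set */
	int rx_csum_on;			/* pretend hardware register */
	int tso_on;			/* pretend hardware register */
	int tso_broken;			/* failure only visible at set time */
};

/* Returns 0 on success, a negative value on failure. */
static int model_set_features(struct model_netdev *dev,
			      model_features_t features)
{
	model_features_t changed = dev->features ^ features;

	/* Only touch the parts whose feature bit actually flipped. */
	if (changed & MODEL_F_RXCSUM)
		dev->rx_csum_on = !!(features & MODEL_F_RXCSUM);

	if (changed & MODEL_F_TSO) {
		if ((features & MODEL_F_TSO) && dev->tso_broken) {
			/*
			 * Error path: record what the hardware actually
			 * ended up with, then report the failure.
			 */
			dev->features = features & ~MODEL_F_TSO;
			return -5;	/* stand-in for -EIO */
		}
		dev->tso_on = !!(features & MODEL_F_TSO);
	}

	return 0;
}
```

On success the model leaves dev->features alone, since storing the new set
is the job of the caller, not of the callback.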



 Part IV: Features
===================

For the current list of features, see include/linux/netdev_features.h.
This section describes the semantics of some of them.

 * Transmit checksumming

For a complete description, see the comments near the top of
include/linux/skbuff.h.

Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that the device can fill in a TCP/UDP-like checksum anywhere in the
packets, whatever headers there might be.

 * Transmit TCP segmentation offload

NETIF_F_TSO_ECN means that the hardware can properly split packets with the
CWR bit set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6
(NETIF_F_TSO6).

 * Transmit DMA from high memory

On platforms where this is relevant, NETIF_F_HIGHDMA signals that
ndo_start_xmit can handle skbs with frags in high memory.

 * Transmit scatter-gather

These features indicate that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG for paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST for
chained skbs (skb->next/prev list).

 * Software features

Features contained in NETIF_F_SOFT_FEATURES are features of the networking
stack. A driver should not change its behaviour based on them.

 * LLTX driver (deprecated for hardware drivers)

NETIF_F_LLTX should be set in drivers that implement their own locking in
the transmit path or don't need locking at all (e.g. software tunnels).
In ndo_start_xmit, it is recommended to use a try_lock and return
NETDEV_TX_LOCKED when taking the spin lock fails. The locking should also
properly protect against other callbacks (the rules you need to find out).

Don't use it for new drivers.

 * netns-local device

NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
network namespaces (e.g. loopback).

Don't use it in drivers.

 * VLAN challenged

NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
headers. Some drivers set this because the cards can't handle the bigger MTU.
[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
VLANs. This may not be useful, though.]

 * rx-fcs

This requests that the NIC append the Ethernet Frame Check Sequence (FCS)
to the end of the skb data. This allows sniffers and other tools to
read the CRC recorded by the NIC on receipt of the packet.

 * rx-all

This requests that the NIC receive all possible frames, including errored
frames (such as bad FCS, etc). This can be helpful when sniffing a link with
bad packets on it. Some NICs may receive more packets if also put into normal
PROMISC mode.