Network Devices, the Kernel, and You!


Introduction
============
The following is a random collection of documentation regarding
network devices.

struct net_device allocation rules
==================================
Network device structures need to persist even after the module is
unloaded and must be allocated with kmalloc. If the device has registered
successfully, it will be freed on last use by free_netdev. This is
required to handle the pathological case cleanly
(example: rmmod mydriver </sys/class/net/myeth/mtu )

There are routines in net_init.c to handle the common cases of
alloc_etherdev, alloc_netdev. These reserve extra space for driver
private data which gets freed when the network device is freed. If
separately allocated data is attached to the network device
(netdev_priv(dev)) then it is up to the module exit handler to free that.
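
The ownership rule above can be illustrated with a small user-space
model. This is only a sketch: every model_* name is hypothetical, the
32-byte alignment is an assumption, and malloc/free stand in for the
kernel allocator. It shows a private area that shares one allocation
with the device structure, plus a separately allocated buffer that the
exit path has to release itself.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for this sketch; not the kernel's definitions. */
#define MODEL_ALIGN 32u
#define MODEL_ALIGN_UP(x) \
	(((x) + (size_t)MODEL_ALIGN - 1) & ~((size_t)MODEL_ALIGN - 1))

struct model_netdev {
	char name[16];
	unsigned int mtu;
};

struct model_priv {
	unsigned long *ring;	/* separately allocated, not freed with the device */
};

/* One allocation holds the device structure followed by the private area. */
static struct model_netdev *model_alloc_netdev(size_t sizeof_priv)
{
	size_t total = MODEL_ALIGN_UP(sizeof(struct model_netdev)) + sizeof_priv;

	return calloc(1, total);
}

/* The private area starts at the first aligned offset past the structure. */
static void *model_netdev_priv(struct model_netdev *dev)
{
	return (char *)dev + MODEL_ALIGN_UP(sizeof(struct model_netdev));
}

/* Freeing the device releases the co-allocated private area with it. */
static void model_free_netdev(struct model_netdev *dev)
{
	free(dev);
}

/* Exit path: release what the driver attached, then the device itself. */
static void model_exit(struct model_netdev *dev)
{
	struct model_priv *priv = model_netdev_priv(dev);

	free(priv->ring);
	model_free_netdev(dev);
}
```

In the model, forgetting the free(priv->ring) call would leak the buffer,
because model_free_netdev() only knows about the single combined
allocation.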

MTU
===
Each network device has a Maximum Transfer Unit. The MTU does not
include any link layer protocol overhead. Upper layer protocols must
not pass a socket buffer (skb) to a device to transmit with more data
than the MTU. For example, on Ethernet, if the standard MTU of 1500
bytes is used, the actual skb will contain up to 1514 bytes because of
the Ethernet header. Devices should allow for the 4 byte VLAN header
as well.
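
The transmit-side arithmetic can be written out as a short C sketch.
The constant and function names are local to this example and are
assumptions, not kernel symbols; the values are the standard Ethernet
header (14 bytes) and 802.1Q tag (4 bytes) sizes.

```c
#include <stddef.h>

/* Values for standard Ethernet; names are local to this sketch. */
enum {
	SKETCH_ETH_HLEN  = 14,	/* destination MAC + source MAC + EtherType */
	SKETCH_VLAN_HLEN = 4,	/* 802.1Q tag */
};

/* Largest frame a driver can be handed for transmit at a given MTU. */
static size_t sketch_max_tx_len(size_t mtu, int vlan_tagged)
{
	return mtu + SKETCH_ETH_HLEN + (vlan_tagged ? SKETCH_VLAN_HLEN : 0);
}
```

With an MTU of 1500 this gives 1514 bytes untagged and 1518 bytes with
a VLAN tag.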

Segmentation Offload (GSO, TSO) is an exception to this rule. The
upper layer protocol may pass a large socket buffer to the device
transmit routine, and the device will break that up into separate
packets based on the current MTU.
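
As a rough illustration of how one large buffer maps onto MTU-sized
packets, the sketch below counts the resulting segments. The function
name and the hdr_len parameter are hypothetical; hdr_len is the size of
the network and transport headers that count against the MTU (for
example 40 bytes for IPv4 plus TCP without options).

```c
#include <stddef.h>

/*
 * Number of wire packets needed for payload_len bytes when each packet
 * may carry at most (mtu - hdr_len) bytes of payload.
 * Returns 0 if the headers leave no room for payload.
 */
static size_t sketch_gso_segs(size_t payload_len, size_t mtu, size_t hdr_len)
{
	size_t mss;

	if (hdr_len >= mtu)
		return 0;
	mss = mtu - hdr_len;
	return (payload_len + mss - 1) / mss;	/* round up */
}
```

For a 14600 byte payload, an MTU of 1500 and 40 bytes of headers, each
segment carries 1460 bytes and the result is 10 segments.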

MTU is symmetrical and applies both to receive and transmit. A device
must be able to receive at least the maximum size packet allowed by
the MTU. A network device may use the MTU as a mechanism to size receive
buffers, but the device should allow packets with a VLAN header. With
the standard Ethernet MTU of 1500 bytes, the device should allow up to
1518 byte packets (1500 + 14 header + 4 tag). The device may drop,
truncate, or pass up oversize packets, but dropping them is preferred.
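
The receive-side limit and the preferred drop policy can be sketched in
the same way. All names below are local to the example and are
assumptions, not kernel symbols.

```c
#include <stdbool.h>
#include <stddef.h>

enum {
	RX_ETH_HLEN  = 14,	/* Ethernet header */
	RX_VLAN_HLEN = 4,	/* 802.1Q tag */
};

/* Largest frame the device should accept at this MTU, VLAN tag included. */
static size_t sketch_max_rx_len(size_t mtu)
{
	return mtu + RX_ETH_HLEN + RX_VLAN_HLEN;
}

/* Preferred policy from the text: drop anything larger than that. */
static bool sketch_rx_should_drop(size_t frame_len, size_t mtu)
{
	return frame_len > sketch_max_rx_len(mtu);
}
```

A driver that sizes its receive buffers from the MTU could use the
first helper for the buffer length and the second in its receive path.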


struct net_device synchronization rules
=======================================
ndo_open:
	Synchronization: rtnl_lock() semaphore.
	Context: process

ndo_stop:
	Synchronization: rtnl_lock() semaphore.
	Context: process
	Note: netif_running() is guaranteed false

ndo_do_ioctl:
	Synchronization: rtnl_lock() semaphore.
	Context: process

ndo_get_stats:
	Synchronization: dev_base_lock rwlock.
	Context: nominally process, but don't sleep inside an rwlock

ndo_start_xmit:
	Synchronization: __netif_tx_lock spinlock.

	When the driver sets NETIF_F_LLTX in dev->features this will be
	called without holding netif_tx_lock. In this case the driver
	has to lock by itself when needed. It is recommended to use a
	trylock for this and return NETDEV_TX_LOCKED when the trylock
	fails. The locking there should also properly protect against
	set_rx_mode. Note that the use of NETIF_F_LLTX is deprecated.
	Don't use it for new drivers.

	Context: Process with BHs disabled or BH (timer),
	         will be called with interrupts disabled by netconsole.

	Return codes:
	o NETDEV_TX_OK: everything OK.
	o NETDEV_TX_BUSY: cannot transmit packet, try later.
	  Usually a bug; it means queue start/stop flow control is broken
	  in the driver. Note: the driver must NOT put the skb in its DMA
	  ring.
	o NETDEV_TX_LOCKED: locking failed, please retry quickly.
	  Only valid when NETIF_F_LLTX is set.
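
The remark that NETDEV_TX_BUSY usually indicates broken flow control
can be illustrated with a user-space model of a transmit ring. This is
a sketch under assumptions: the model_* names, the ring size and the
"stopped" flag are hypothetical stand-ins for a driver's ring state and
the queue stop/wake calls.

```c
#include <stdbool.h>

/* Local stand-ins for the return codes named above. */
enum model_tx { MODEL_TX_OK, MODEL_TX_BUSY };

#define MODEL_RING_SIZE 4

struct model_txq {
	int  in_flight;	/* descriptors currently owned by the hardware */
	bool stopped;	/* models the stopped state of the TX queue */
};

/* Transmit path: stop the queue as soon as the ring fills up. */
static enum model_tx model_start_xmit(struct model_txq *q)
{
	if (q->in_flight == MODEL_RING_SIZE) {
		/* Only reachable if the queue was not stopped in time. */
		q->stopped = true;
		return MODEL_TX_BUSY;	/* the packet is NOT queued */
	}

	q->in_flight++;
	if (q->in_flight == MODEL_RING_SIZE)
		q->stopped = true;	/* no room for another packet */
	return MODEL_TX_OK;
}

/* Completion path: reclaim descriptors and restart a stopped queue. */
static void model_tx_complete(struct model_txq *q, int done)
{
	if (done > q->in_flight)
		done = q->in_flight;
	q->in_flight -= done;
	if (q->stopped && q->in_flight < MODEL_RING_SIZE)
		q->stopped = false;
}
```

Because the queue is stopped when the last free slot is consumed, a
caller that respects the stopped flag never reaches the
MODEL_TX_BUSY branch.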

ndo_tx_timeout:
	Synchronization: netif_tx_lock spinlock; all TX queues frozen.
	Context: BHs disabled
	Notes: netif_queue_stopped() is guaranteed true

ndo_set_rx_mode:
	Synchronization: netif_addr_lock spinlock.
	Context: BHs disabled

struct napi_struct synchronization rules
========================================
napi->poll:
	Synchronization: NAPI_STATE_SCHED bit in napi->state. Device
		driver's ndo_stop method will invoke napi_disable() on
		all NAPI instances which will do a sleeping poll on the
		NAPI_STATE_SCHED napi->state bit, waiting for all pending
		NAPI activity to cease.
	Context: softirq
	         will be called with interrupts disabled by netconsole.
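
The role of the NAPI_STATE_SCHED bit as an ownership flag can be
modelled in user space with a C11 atomic flag. This is a sketch, not
the kernel implementation: the model_* names are hypothetical, and the
disable helper spins where the text above describes a sleeping poll.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct model_napi {
	atomic_flag sched;	/* models the NAPI_STATE_SCHED bit */
};

/* Claim the instance; only the caller that sets the bit may run poll. */
static bool model_napi_schedule_prep(struct model_napi *n)
{
	return !atomic_flag_test_and_set(&n->sched);
}

/* Poll finished: release the bit so the instance can be scheduled again. */
static void model_napi_complete(struct model_napi *n)
{
	atomic_flag_clear(&n->sched);
}

/*
 * Disable: wait until no poll owns the bit, then take it and keep it,
 * so later scheduling attempts fail until the instance is re-enabled.
 */
static void model_napi_disable(struct model_napi *n)
{
	while (atomic_flag_test_and_set(&n->sched))
		;	/* wait for the current owner to complete */
}
```

In the model, a second scheduling attempt fails while a poll is
pending, and after model_napi_disable() returns no further poll can be
claimed.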