Network Devices, the Kernel, and You!


Introduction
============
The following is a random collection of documentation regarding
network devices.

struct net_device allocation rules
==================================
Network device structures need to persist even after the module is unloaded
and must be allocated with kmalloc. If the device has registered successfully,
it will be freed on last use by free_netdev. This is required to handle the
pathological case cleanly (example: rmmod mydriver </sys/class/net/myeth/mtu ).

There are routines in net_init.c to handle the common cases of
alloc_etherdev and alloc_netdev. These reserve extra space for driver
private data, which gets freed when the network device is freed. If
separately allocated data is attached to the network device
(netdev_priv(dev)), then it is up to the module exit handler to free that.
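
The "extra space for driver private data" idea can be illustrated with a
userspace sketch. This is not the kernel implementation: struct model_netdev
and every model_* function below are hypothetical names invented for this
example, and calloc/free stand in for the kernel allocator. The sketch only
shows the case where the private area is embedded in the same allocation as
the device structure, so one free releases both.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net_device; not the kernel structure. */
struct model_netdev {
	unsigned int mtu;
	size_t priv_size;
};

/* Round the header size up so the private area is suitably aligned. */
static size_t model_priv_offset(void)
{
	size_t align = _Alignof(max_align_t);

	return (sizeof(struct model_netdev) + align - 1) / align * align;
}

/* One zeroed allocation holds the device structure plus sizeof_priv bytes. */
static struct model_netdev *model_alloc_netdev(size_t sizeof_priv)
{
	struct model_netdev *dev = calloc(1, model_priv_offset() + sizeof_priv);

	if (dev)
		dev->priv_size = sizeof_priv;
	return dev;
}

/* Analogue of netdev_priv(): the private area follows the structure. */
static void *model_netdev_priv(struct model_netdev *dev)
{
	return (char *)dev + model_priv_offset();
}

/* Freeing the device also releases the embedded private area. */
static void model_free_netdev(struct model_netdev *dev)
{
	free(dev);
}
```

Anything the driver allocates separately and merely points to from the
private area is outside this single allocation, which is why the text above
leaves freeing it to the module exit handler.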

MTU
===
Each network device has a Maximum Transfer Unit. The MTU does not
include any link layer protocol overhead. Upper layer protocols must
not pass a socket buffer (skb) to a device to transmit with more data
than the MTU. For example, on Ethernet with the standard MTU of 1500
bytes, the actual skb will contain up to 1514 bytes because of the
Ethernet header. Devices should allow for the 4 byte VLAN header as well.

Segmentation Offload (GSO, TSO) is an exception to this rule. The
upper layer protocol may pass a large socket buffer to the device
transmit routine, and the device will break that up into separate
packets based on the current MTU.

MTU is symmetrical and applies both to receive and transmit. A device
must be able to receive at least the maximum size packet allowed by
the MTU. A network device may use the MTU as a mechanism to size receive
buffers, but the device should allow packets with a VLAN header. With
the standard Ethernet MTU of 1500 bytes, the device should allow up to
1518 byte packets (1500 + 14 header + 4 tag). The device may drop,
truncate, or pass up oversize packets, but dropping oversize packets
is preferred.
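
The frame size arithmetic above can be written out as a small userspace
sketch. The MODEL_* constants and model_* functions are hypothetical names
for this example only; the values 14 and 4 are the Ethernet header and VLAN
tag sizes quoted in the text. Segmentation offload is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values as used in the text above: 14 byte Ethernet header, 4 byte VLAN tag. */
#define MODEL_ETH_HLEN  14
#define MODEL_VLAN_HLEN 4

/* Largest skb the stack may hand to the device for a given MTU (no GSO/TSO). */
static size_t model_max_tx_len(size_t mtu)
{
	return mtu + MODEL_ETH_HLEN;
}

/* Largest frame the receive path should accept, allowing one VLAN tag. */
static size_t model_max_rx_len(size_t mtu)
{
	return mtu + MODEL_ETH_HLEN + MODEL_VLAN_HLEN;
}

/* Preferred policy from the text: drop anything larger than the limit. */
static bool model_rx_should_drop(size_t frame_len, size_t mtu)
{
	return frame_len > model_max_rx_len(mtu);
}
```

With an MTU of 1500, model_max_tx_len() gives 1514 and model_max_rx_len()
gives 1518, matching the figures above.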


struct net_device synchronization rules
=======================================
dev->open:
	Synchronization: rtnl_lock() semaphore.
	Context: process

dev->stop:
	Synchronization: rtnl_lock() semaphore.
	Context: process
	Note1: netif_running() is guaranteed false
	Note2: dev->poll() is guaranteed to be stopped

dev->do_ioctl:
	Synchronization: rtnl_lock() semaphore.
	Context: process

dev->get_stats:
	Synchronization: dev_base_lock rwlock.
	Context: nominally process, but don't sleep inside an rwlock

dev->hard_start_xmit:
	Synchronization: netif_tx_lock spinlock.

	When the driver sets NETIF_F_LLTX in dev->features, this will be
	called without holding netif_tx_lock. In this case the driver
	has to do its own locking when needed. It is recommended to use a
	try lock for this and return NETDEV_TX_LOCKED when the spin lock
	cannot be acquired. The locking there should also properly protect
	against set_multicast_list. Note that the use of NETIF_F_LLTX is
	deprecated. Don't use it for new drivers.

	Context: Process with BHs disabled or BH (timer),
	         will be called with interrupts disabled by netconsole.

	Return codes:
	o NETDEV_TX_OK everything ok.
	o NETDEV_TX_BUSY Cannot transmit packet, try later.
	  Usually a bug, means queue start/stop flow control is broken in
	  the driver. Note: the driver must NOT put the skb in its DMA ring.
	o NETDEV_TX_LOCKED Locking failed, please retry quickly.
	  Only valid when NETIF_F_LLTX is set.
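
The remark that NETDEV_TX_BUSY usually indicates broken queue start/stop
flow control can be illustrated with a userspace model. Everything named
model_* or MODEL_* is hypothetical and exists only for this sketch; no
locking is modelled, and the comments only point at the kernel helpers
(netif_stop_queue(), netif_wake_queue()) that play the corresponding role
in a real driver. The assumption is a fixed-size TX ring where the driver
stops the queue as soon as the last slot is taken.

```c
#include <stdbool.h>

#define MODEL_RING_SIZE 4

enum model_tx_ret { MODEL_TX_OK, MODEL_TX_BUSY };

/* Hypothetical driver state: a TX ring fill count and a queue-stopped flag. */
struct model_txq {
	unsigned int in_flight;
	bool stopped;
};

/* Transmit path: queue one packet, stop the queue once the ring is full. */
static enum model_tx_ret model_start_xmit(struct model_txq *q)
{
	if (q->in_flight == MODEL_RING_SIZE)
		return MODEL_TX_BUSY;	/* flow control failed; packet not queued */

	q->in_flight++;
	if (q->in_flight == MODEL_RING_SIZE)
		q->stopped = true;	/* analogue of netif_stop_queue() */
	return MODEL_TX_OK;
}

/* Completion path: reclaim one slot and restart the queue. */
static void model_tx_complete(struct model_txq *q)
{
	if (q->in_flight > 0)
		q->in_flight--;
	q->stopped = false;		/* analogue of netif_wake_queue() */
}

/* Stack side: only call into the driver while the queue is running. */
static bool model_stack_send(struct model_txq *q)
{
	if (q->stopped)
		return false;
	return model_start_xmit(q) == MODEL_TX_OK;
}
```

As long as the caller respects the stopped flag, as model_stack_send() does,
the busy branch in model_start_xmit() is never reached. Hitting it means the
driver failed to stop the queue in time, which is the bug the text describes.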

dev->tx_timeout:
	Synchronization: netif_tx_lock spinlock.
	Context: BHs disabled
	Notes: netif_queue_stopped() is guaranteed true

dev->set_multicast_list:
	Synchronization: netif_tx_lock spinlock.
	Context: BHs disabled

struct napi_struct synchronization rules
========================================
napi->poll:
	Synchronization: NAPI_STATE_SCHED bit in napi->state. The device
		driver's dev->stop method will invoke napi_disable() on
		all NAPI instances, which will do a sleeping poll on the
		NAPI_STATE_SCHED napi->state bit, waiting for all pending
		NAPI activity to cease.
	Context: softirq
	         will be called with interrupts disabled by netconsole.
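
The role of the NAPI_STATE_SCHED bit as an ownership flag can be sketched in
userspace with a C11 atomic_flag. This is only a model: struct model_napi
and the model_* functions are hypothetical names, and the disable path here
makes a single attempt, whereas the text above describes napi_disable() as
sleeping and polling the bit until pending activity has ceased.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for napi_struct: only the SCHED bit is modelled. */
struct model_napi {
	atomic_flag sched;
};

/* Claim the SCHED bit; true means the caller now owns the poll. */
static bool model_napi_schedule_prep(struct model_napi *n)
{
	return !atomic_flag_test_and_set(&n->sched);
}

/* Poll finished: release the bit so the instance can be scheduled again. */
static void model_napi_complete(struct model_napi *n)
{
	atomic_flag_clear(&n->sched);
}

/*
 * One attempt of the disable path. Per the text above, the real routine
 * sleeps and retries until the bit is free; this model returns false instead.
 */
static bool model_napi_try_disable(struct model_napi *n)
{
	return !atomic_flag_test_and_set(&n->sched);
}
```

Once the disable path holds the bit, further scheduling attempts fail, so no
new poll can start while the device is being stopped.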