Documentation for /proc/sys/net/*
        (c) 1999                Terrehon Bowden <terrehon@pacbell.net>
                                Bodo Bauer <bb@ricochet.net>
        (c) 2000                Jorge Nerin <comandante@zaralinux.com>
        (c) 2009                Shen Feng <shen@cn.fujitsu.com>

For general info and legal blurb, please look in README.

==============================================================

This file contains the documentation for the sysctl files in
/proc/sys/net

The interface to the networking parts of the kernel is located in
/proc/sys/net. The following table shows all possible subdirectories. You may
see only some of them, depending on your kernel's configuration.


Table : Subdirectories in /proc/sys/net
..............................................................................
 Directory Content             Directory  Content
 core      General parameter   appletalk  Appletalk protocol
 unix      Unix domain sockets netrom     NET/ROM
 802       E802 protocol       ax25       AX25
 ethernet  Ethernet protocol   rose       X.25 PLP layer
 ipv4      IP version 4        x25        X.25 protocol
 ipx       IPX                 token-ring IBM token ring
 bridge    Bridging            decnet     DEC net
 ipv6      IP version 6        tipc       TIPC
..............................................................................

1. /proc/sys/net/core - Network core options
-------------------------------------------------------

bpf_jit_enable
--------------

This enables the Berkeley Packet Filter Just in Time (JIT) compiler.
Currently supported on the x86_64 architecture, bpf_jit provides a framework
to speed up packet filtering, such as the filtering used by tcpdump/libpcap.
Values :
        0 - disable the JIT (default value)
        1 - enable the JIT
        2 - enable the JIT and ask the compiler to emit traces on kernel log.

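Each entry in this file is a plain file under /proc/sys, so a value such as
bpf_jit_enable can be read like any other file. The following is a minimal
sketch, not part of the kernel tree: the helper names read_sysctl() and
read_sysctl_int() are hypothetical, and the mapping from a dotted name to a
path assumes that no path component itself contains a dot.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the text of a sysctl given in dotted form, e.g. 'net.core.dev_weight'.

    Assumption: every dot separates two path components.
    """
    path = Path(root).joinpath(*name.split("."))
    return path.read_text().strip()


def read_sysctl_int(name, root="/proc/sys"):
    """Read a sysctl that holds a single integer, such as bpf_jit_enable."""
    return int(read_sysctl(name, root))
```

On a system that exposes the entry, read_sysctl_int("net.core.bpf_jit_enable")
would return 0, 1 or 2 as listed above. The root argument exists only so the
helper can be pointed at a different directory tree for testing.
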
bpf_jit_harden
--------------

This enables hardening for the Berkeley Packet Filter Just in Time compiler.
Hardening is supported by the eBPF JIT backends. Enabling it costs some
performance, but can mitigate JIT spraying.
Values :
        0 - disable JIT hardening (default value)
        1 - enable JIT hardening for unprivileged users only
        2 - enable JIT hardening for all users

bpf_jit_limit
-------------

This enforces a global limit on memory allocations for the BPF JIT
compiler. Once the limit has been surpassed, unprivileged JIT requests
are rejected. bpf_jit_limit contains the value of the global limit
in bytes.

dev_weight
----------

The maximum number of packets that the kernel can handle on a NAPI interrupt.
It is a per-CPU variable.
Default: 64

default_qdisc
-------------

The default queuing discipline to use for network devices. This allows
overriding the default of pfifo_fast with an alternative. Because the default
queuing discipline is created without additional parameters, this setting is
best suited to queuing disciplines that work well without configuration, like
stochastic fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel).
Don't use queuing disciplines like Hierarchical Token Bucket or Deficit Round
Robin, which require setting up classes and bandwidths. Note that physical
multiqueue interfaces still use mq as root qdisc, which in turn uses this
default for its leaves. Virtual devices (e.g. lo or veth) ignore this setting
and instead default to noqueue.
Default: pfifo_fast

busy_read
---------
Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
Approximate time in microseconds (us) to busy loop waiting for packets on the
device queue.
This sets the default value of the SO_BUSY_POLL socket option.
Can be set or overridden per socket by setting socket option SO_BUSY_POLL,
which is the preferred method of enabling. If you need to enable the feature
globally via sysctl, a value of 50 is recommended.
Will increase power usage.
Default: 0 (off)

busy_poll
---------
Low latency busy poll timeout for poll and select. (needs CONFIG_NET_RX_BUSY_POLL)
Approximate time in microseconds (us) to busy loop waiting for events.
Recommended value depends on the number of sockets you poll on.
For several sockets 50, for several hundreds 100.
For more than that you probably want to use epoll.
Note that only sockets with SO_BUSY_POLL set will be busy polled,
so you want to either selectively set SO_BUSY_POLL on those sockets or set
the net.core.busy_read sysctl globally.
Will increase power usage.
Default: 0 (off)

rmem_default
------------

The default setting of the socket receive buffer in bytes.

rmem_max
--------

The maximum receive socket buffer size in bytes.

tstamp_allow_data
-----------------
Allow processes to receive tx timestamps looped together with the original
packet contents. If disabled, transmit timestamp requests from unprivileged
processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set.
Default: 1 (on)


wmem_default
------------

The default setting (in bytes) of the socket send buffer.

wmem_max
--------

The maximum send socket buffer size in bytes.

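The effect of these buffer limits can be observed from user space by asking
for a buffer size with SO_RCVBUF or SO_SNDBUF and reading back what the
kernel granted. The sketch below is an illustration only; the function name
effective_buffer_sizes() is hypothetical. On Linux the value read back is
usually larger than the request, because the kernel reserves room for
bookkeeping overhead, and a request above rmem_max or wmem_max is expected
to be capped at that limit.

```python
import socket


def effective_buffer_sizes(requested):
    """Ask for `requested` bytes for both buffers; return (rcvbuf, sndbuf) granted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        sndbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcvbuf, sndbuf
```

Calling effective_buffer_sizes() with a very large request and comparing the
result with the contents of rmem_max and wmem_max is one way to check which
limit is in force on a given machine.
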
message_burst and message_cost
------------------------------

These parameters are used to limit the warning messages written to the kernel
log from the networking code. They enforce a rate limit to prevent a
denial-of-service attack through log flooding. A higher message_cost factor
results in fewer messages being written. message_burst controls when messages
will be dropped. The default settings limit warning messages to one every five
seconds.

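The general idea of a cost/burst rate limit can be shown with a toy model.
This is not the kernel's implementation: the class name MessageLimiter is
hypothetical, and the sketch assumes that the cost is a window length in
seconds and that the burst is the number of messages allowed per window.
Timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class MessageLimiter:
    """Toy window/burst limiter; a simplified model, not the kernel code."""

    def __init__(self, cost_seconds, burst):
        self.cost_seconds = cost_seconds  # window length (assumed meaning)
        self.burst = burst                # messages allowed per window
        self.window_start = None
        self.printed = 0
        self.dropped = 0

    def allow(self, now):
        """Return True if a message arriving at time `now` may be logged."""
        if self.window_start is None or now - self.window_start >= self.cost_seconds:
            # A new window begins: reset the per-window counter.
            self.window_start = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.dropped += 1
        return False


limiter = MessageLimiter(cost_seconds=5, burst=1)
decisions = [limiter.allow(t) for t in (0, 1, 4, 5, 6, 10)]
```

With one message allowed per five-second window, only the messages arriving
at times 0, 5 and 10 are logged in this model and the other three are dropped.
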
warnings
--------

This sysctl is now unused.

This was used to control console messages from the networking stack that
occur because of problems on the network like duplicate address or bad
checksums.

These messages are now emitted at KERN_DEBUG and can generally be enabled
and controlled by the dynamic_debug facility.

netdev_budget
-------------

Maximum number of packets taken from all interfaces in one polling cycle (NAPI
poll). In one polling cycle, interfaces which are registered for polling are
probed in a round-robin manner.

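The interaction between a per-cycle budget and round-robin probing can be
sketched with a small simulation. This is a toy model under stated
assumptions, not the kernel's polling loop, which may apply further rules
that are not modelled here. The function name poll_cycle() and the interface
names are hypothetical; weight stands for the per-visit limit (dev_weight)
and budget for the per-cycle limit (netdev_budget).

```python
from collections import deque


def poll_cycle(pending, budget, weight):
    """Simulate one polling cycle.

    pending: dict mapping interface name -> packets waiting
    budget:  total packets that may be taken in this cycle
    weight:  packets that may be taken per visit to one interface
    Returns a dict mapping interface name -> packets handled.
    """
    handled = {name: 0 for name in pending}
    remaining = dict(pending)
    ring = deque(name for name, count in pending.items() if count > 0)
    while ring and budget > 0:
        name = ring.popleft()
        work = min(weight, remaining[name], budget)
        handled[name] += work
        remaining[name] -= work
        budget -= work
        if remaining[name] > 0:
            ring.append(name)  # still has packets: goes to the back of the ring
    return handled
```

For example, poll_cycle({"eth0": 100, "eth1": 10}, budget=100, weight=64)
visits eth0, then eth1, then eth0 again, and stops once the budget is spent,
leaving part of eth0's queue for the next cycle.
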
netdev_max_backlog
------------------

Maximum number of packets queued on the INPUT side when the interface
receives packets faster than the kernel can process them.

netdev_rss_key
--------------

RSS (Receive Side Scaling) enabled drivers use a 40-byte host key that is
randomly generated.
Some user space might need to gather its content even if drivers do not
provide ethtool -x support yet.

myhost:~# cat /proc/sys/net/core/netdev_rss_key
84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)

The file contains nul bytes if no driver ever called the netdev_rss_key_fill()
function.
Note:
/proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
but most drivers only use 40 bytes of it.

myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
    0:      0     1     2     3     4     5     6     7
RSS hash key:
84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89

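User space that gathers the key has to turn the colon-separated hex text into
bytes. The following sketch shows one way to do that. The helper names are
hypothetical, and the default length of 40 reflects the note above that most
drivers only use 40 bytes of the key.

```python
def parse_rss_key(text):
    """Convert 'aa:bb:cc...' text, as printed by netdev_rss_key, into bytes."""
    return bytes.fromhex(text.strip().replace(":", ""))


def key_is_unset(key):
    """True when the key is all nul bytes, i.e. no driver has filled it yet."""
    return not any(key)


def driver_key(key, length=40):
    """Return the leading `length` bytes of the key."""
    return key[:length]
```

A caller would read /proc/sys/net/core/netdev_rss_key, pass the text to
parse_rss_key(), and check key_is_unset() before relying on the result.
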
netdev_tstamp_prequeue
----------------------

If set to 0, RX packet timestamps can be sampled after RPS processing, when
the target CPU processes packets. This might delay timestamps somewhat, but
permits the load to be distributed across several CPUs.

If set to 1 (default), timestamps are sampled as soon as possible, before
queueing.

optmem_max
----------

Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
of struct cmsghdr structures with appended data.

2. /proc/sys/net/unix - Parameters for Unix domain sockets
-------------------------------------------------------

There is only one file in this directory.
unix_dgram_qlen limits the maximum number of datagrams queued in a Unix domain
socket's buffer. It will not take effect unless the PF_UNIX flag is specified.


3. /proc/sys/net/ipv4 - IPV4 settings
-------------------------------------------------------
Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
descriptions of these entries.


4. Appletalk
-------------------------------------------------------

The /proc/sys/net/appletalk directory holds the Appletalk configuration data
when Appletalk is loaded. The configurable parameters are:

aarp-expiry-time
----------------

The amount of time we keep an ARP entry before expiring it. Used to age out
old hosts.

aarp-resolve-time
-----------------

The amount of time we will spend trying to resolve an Appletalk address.

aarp-retransmit-limit
---------------------

The number of times we will retransmit a query before giving up.

aarp-tick-time
--------------

Controls the rate at which expired entries are checked for.

The directory /proc/net/appletalk holds the list of active Appletalk sockets
on a machine.

The fields indicate the DDP type, the local address (in network:node format),
the remote address, the size of the transmit pending queue, the size of the
received queue (bytes waiting for applications to read), the state and the uid
owning the socket.

/proc/net/atalk_iface lists all the interfaces configured for Appletalk. It
shows the name of the interface, its Appletalk address, the network range on
that address (or network number for phase 1 networks), and the status of the
interface.

/proc/net/atalk_route lists each known network route. It lists the target
(network) that the route leads to, the router (may be directly connected), the
route flags, and the device the route is using.


5. IPX
-------------------------------------------------------

The IPX protocol has no tunable values in /proc/sys/net.

The IPX protocol does, however, provide /proc/net/ipx. This lists each IPX
socket giving the local and remote addresses in Novell format (that is
network:node:port). In accordance with the strange Novell tradition,
everything but the port is in hex. Not_Connected is displayed for sockets that
are not tied to a specific remote address. The Tx and Rx queue sizes indicate
the number of bytes pending for transmission and reception. The state
indicates the state the socket is in and the uid is the owning uid of the
socket.

The /proc/net/ipx_interface file lists all IPX interfaces. For each interface
it gives the network number, the node number, and indicates if the network is
the primary network. It also indicates which device it is bound to (or
Internal for internal networks) and the Frame Type if appropriate. Linux
supports 802.3, 802.2, 802.2 SNAP and DIX (Blue Book) Ethernet framing for
IPX.

The /proc/net/ipx_route table holds a list of IPX routes. For each route it
gives the destination network, the router node (or Directly) and the network
address of the router (or Connected) for internal networks.

6. TIPC
-------------------------------------------------------

tipc_rmem
---------

The TIPC protocol now has a tunable for the receive memory, similar to the
tcp_rmem - i.e. a vector of 3 INTEGERs: (min, default, max)

    # cat /proc/sys/net/tipc/tipc_rmem
    4252725 34021800 68043600
    #

The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
are scaled (shifted) versions of that same value. Note that the min value
is not at this point in time used in any meaningful way, but the triplet is
preserved in order to be consistent with things like tcp_rmem.

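The triplet can be parsed into named fields, which also makes the shifted
relationship visible. The sketch below uses the sample output shown above;
the names RmemTriplet and parse_rmem() are hypothetical. In that sample the
default equals the max shifted right by 1 bit and the min equals the max
shifted right by 4 bits. Other kernels may use different shift amounts.

```python
from typing import NamedTuple


class RmemTriplet(NamedTuple):
    minimum: int
    default: int
    maximum: int


def parse_rmem(text):
    """Parse 'min default max' as printed by tipc_rmem into an RmemTriplet."""
    minimum, default, maximum = (int(field) for field in text.split())
    return RmemTriplet(minimum, default, maximum)


# Sample values copied from the example output in this section.
sample = parse_rmem("4252725 34021800 68043600")
default_matches = (sample.maximum >> 1) == sample.default
minimum_matches = (sample.maximum >> 4) == sample.minimum
```

Both comparisons hold for the sample, which is consistent with the statement
that the default and min values are shifted versions of the max.
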
named_timeout
-------------

TIPC name table updates are distributed asynchronously in a cluster, without
any form of transaction handling. This means that different race scenarios are
possible. One such scenario is that a name withdrawal sent out by one node and
received by another node may arrive after a second, overlapping name
publication has already been accepted from a third node, although the
conflicting updates may originally have been issued in the correct sequential
order.
If named_timeout is nonzero, failed topology updates will be placed on a defer
queue until another event arrives that clears the error, or until the timeout
expires. Value is in milliseconds.