Blame - Documentation/networking/tcp.txt - kernel/msm

9d7bcfc  Stephen Hemminger  2005-06-23 12:22:36 -0700  lines 1-2, 4-69
1da177e  Linus Torvalds     2005-04-16 15:20:36 -0700  lines 3, 70-106

TCP protocol
============

Last updated: 21 June 2005

Contents
========

- Congestion control
- How the new TCP output machine [nyi] works

Congestion control
==================

The following variables are used in the tcp_sock for congestion control:
snd_cwnd        The size of the congestion window
snd_ssthresh    Slow start threshold. We are in slow start if
                snd_cwnd is less than this.
snd_cwnd_cnt    A counter used to slow down the rate of increase
                once we exceed slow start threshold.
snd_cwnd_clamp  This is the maximum size that snd_cwnd can grow to.
snd_cwnd_stamp  Timestamp for when congestion window last validated.
snd_cwnd_used   Used as a highwater mark for how much of the
                congestion window is in use. It is used to adjust
                snd_cwnd down when the link is limited by the
                application rather than the network.

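As a rough illustration of how these fields interact, here is a simplified
userspace model of a Reno-style window increase on each ACK. The names
struct cc_state and cc_on_ack are hypothetical and exist only for this
sketch; it is not kernel code, and the real congestion control modules
handle more cases.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the congestion fields of tcp_sock. */
struct cc_state {
	uint32_t snd_cwnd;       /* congestion window, in segments */
	uint32_t snd_ssthresh;   /* slow start threshold */
	uint32_t snd_cwnd_cnt;   /* ACKs counted since the last linear increase */
	uint32_t snd_cwnd_clamp; /* upper bound for snd_cwnd */
};

/* Model of a Reno-style increase, called once per ACK of new data. */
static void cc_on_ack(struct cc_state *s)
{
	if (s->snd_cwnd < s->snd_ssthresh) {
		/* Slow start: grow by one segment per ACK. */
		if (s->snd_cwnd < s->snd_cwnd_clamp)
			s->snd_cwnd++;
		return;
	}

	/* Past the threshold: grow by one segment per snd_cwnd ACKs. */
	if (++s->snd_cwnd_cnt >= s->snd_cwnd) {
		if (s->snd_cwnd < s->snd_cwnd_clamp)
			s->snd_cwnd++;
		s->snd_cwnd_cnt = 0;
	}
}
```

Starting from snd_cwnd = 2 and snd_ssthresh = 4, two ACKs bring the window
to 4 in slow start; after that, snd_cwnd_cnt has to count four more ACKs
before the window grows to 5.
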
As of 2.6.13, Linux supports pluggable congestion control algorithms.
A congestion control mechanism can be registered through functions in
tcp_cong.c. The functions used by the congestion control mechanism are
registered by passing a tcp_congestion_ops struct to
tcp_register_congestion_control. At a minimum, name, ssthresh,
cong_avoid and min_cwnd must be valid.

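The "must be valid" rule can be pictured with a small userspace model.
Everything here is hypothetical: struct cc_ops is a reduced stand-in for
tcp_congestion_ops, the callback signatures are simplified (the real ones
operate on a tcp_sock), and cc_ops_validate only mimics the kind of check a
registration function might perform.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced model of tcp_congestion_ops for illustration. */
struct cc_ops {
	const char *name;
	uint32_t (*ssthresh)(uint32_t cwnd);
	void (*cong_avoid)(uint32_t *cwnd, uint32_t ssthresh);
	uint32_t (*min_cwnd)(uint32_t ssthresh);
};

/* Reject an ops table that lacks any of the required members. */
static int cc_ops_validate(const struct cc_ops *ops)
{
	if (!ops || !ops->name || !ops->ssthresh ||
	    !ops->cong_avoid || !ops->min_cwnd)
		return -EINVAL;
	return 0;
}

/* Example callbacks: halve on loss, never go below two segments. */
static uint32_t demo_ssthresh(uint32_t cwnd)
{
	return cwnd / 2 > 2 ? cwnd / 2 : 2;
}

static void demo_cong_avoid(uint32_t *cwnd, uint32_t ssthresh)
{
	if (*cwnd < ssthresh)
		(*cwnd)++;
}

static uint32_t demo_min_cwnd(uint32_t ssthresh)
{
	return ssthresh / 2;
}

static const struct cc_ops demo_ops = {
	.name       = "demo",
	.ssthresh   = demo_ssthresh,
	.cong_avoid = demo_cong_avoid,
	.min_cwnd   = demo_min_cwnd,
};
```

A table with any of the four members left unset is rejected with -EINVAL,
while demo_ops passes the check.
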
Private data for a congestion control mechanism is stored in tp->ca_priv.
tcp_ca(tp) returns a pointer to this space. This space is preallocated, so it
is important to check that your private data fits in it. Alternatively,
space can be allocated elsewhere and a pointer to it stored here.

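One way to make the size check hard to forget is to do it at build time.
The sketch below is a userspace model, not kernel code: struct sock_model,
model_ca and struct demo_priv are hypothetical, and the size of the private
area (16 words) is an assumption that should be checked against the kernel
headers you build against.

```c
#include <stdint.h>

/* Assumed size of the preallocated area, in 32-bit words. */
#define CA_PRIV_WORDS 16

/* Hypothetical stand-in for the part of tcp_sock that holds ca_priv. */
struct sock_model {
	uint32_t ca_priv[CA_PRIV_WORDS];
};

/* Models tcp_ca(tp): return the private area as an untyped pointer. */
static void *model_ca(struct sock_model *tp)
{
	return tp->ca_priv;
}

/* Private state of a hypothetical algorithm. */
struct demo_priv {
	uint32_t base_rtt;
	uint32_t min_rtt;
	uint32_t samples;
};

/* Fail the build if the private state outgrows the preallocated area. */
_Static_assert(sizeof(struct demo_priv) <=
	       sizeof(((struct sock_model *)0)->ca_priv),
	       "demo_priv does not fit in ca_priv");
```

With this in place, growing struct demo_priv beyond the preallocated area
stops the build instead of silently overwriting neighbouring fields.
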
There are currently three kinds of congestion control algorithms. The
simplest ones are derived from TCP Reno (highspeed, scalable) and just
provide an alternative congestion window calculation. More complex
ones like BIC try to look at other events to provide better
heuristics. There are also round trip time based algorithms like
Vegas and Westwood+.

Good TCP congestion control is a complex problem because the algorithm
needs to maintain fairness and performance. Please review current
research and RFCs before developing new modules.

The congestion control mechanism that is used is determined by the
setting of the sysctl net.ipv4.tcp_congestion_control.
The default congestion control will be the last one registered (LIFO);
so if you build everything as modules, the default will be reno. If you
build with the defaults from Kconfig, then BIC will be built in (not a module)
and it will end up the default.

If you really want a particular default value then you will need
to set it with the sysctl. If you use a sysctl, the module will be autoloaded
if needed and you will get the expected protocol. If you ask for an
unknown congestion method, then the sysctl attempt will fail.

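The sysctl is exposed as a file under /proc/sys, so the current setting can
be inspected from a program. The helper below is a minimal sketch;
read_sysctl is a hypothetical name, and error handling is reduced to a
single failure code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read a one-line sysctl value through its /proc/sys file.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_sysctl(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline */
	return 0;
}
```

Calling read_sysctl("/proc/sys/net/ipv4/tcp_congestion_control", name,
sizeof(name)) fills name with the algorithm currently selected. Writing a
name to the same file as root changes the setting.
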
If you remove a TCP congestion control module, then you will get the next
available one. Since reno cannot be built as a module, and cannot be
deleted, it will always be available.

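The LIFO default and the fallback on removal can be modelled with a small
registry. This is a userspace sketch under stated assumptions: cc_list,
cc_register, cc_unregister and cc_default are hypothetical names, and a
fixed array stands in for whatever list the kernel actually keeps.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CC 8

/* Hypothetical registry: slot 0 is the oldest entry, the last is the default. */
static const char *cc_list[MAX_CC] = { "reno" }; /* reno is always present */
static size_t cc_count = 1;

/* Register an algorithm; the most recent registration becomes the default. */
static int cc_register(const char *name)
{
	if (cc_count == MAX_CC)
		return -1;
	cc_list[cc_count++] = name;
	return 0;
}

/* Remove an algorithm by name; reno (slot 0) can never be removed. */
static int cc_unregister(const char *name)
{
	for (size_t i = 1; i < cc_count; i++) {
		if (strcmp(cc_list[i], name) == 0) {
			memmove(&cc_list[i], &cc_list[i + 1],
				(cc_count - i - 1) * sizeof(cc_list[0]));
			cc_count--;
			return 0;
		}
	}
	return -1;
}

/* The default is the last algorithm registered (LIFO). */
static const char *cc_default(void)
{
	return cc_list[cc_count - 1];
}
```

Registering "bic" and then "vegas" makes "vegas" the default; removing
"vegas" falls back to "bic", and removing "bic" falls back to "reno", which
the model refuses to remove.
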
How the new TCP output machine [nyi] works
==========================================

Data is kept on a single queue. The skb->users flag tells us if the frame is
one that has been queued already. To add a frame we append it to the end. Ack
processing walks down the list from the start.

We keep a set of control flags

    sk->tcp_pend_event

        TCP_PEND_ACK        Ack needed
        TCP_ACK_NOW         Needed now
        TCP_WINDOW          Window update check
        TCP_WINZERO         Zero probing

    sk->transmit_queue      The transmission frame begin
    sk->transmit_new        First new frame pointer
    sk->transmit_end        Where to add frames

    sk->tcp_last_tx_ack     Last ack seen
    sk->tcp_dup_ack         Dup ack count for fast retransmit

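Since this design is marked [nyi], the following is only a userspace sketch
of how the three transmit pointers listed above could cooperate on a single
queue. struct frame, struct out_queue and the three functions are
hypothetical names, and sequence ranges are an assumed way to decide when a
frame is fully acknowledged.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame: covers sequence space [seq, end_seq). */
struct frame {
	uint32_t seq;
	uint32_t end_seq;
	struct frame *next;
};

/* Models the three transmit pointers. */
struct out_queue {
	struct frame *transmit_queue; /* oldest unacknowledged frame */
	struct frame *transmit_new;   /* first frame not yet sent */
	struct frame *transmit_end;   /* last frame, where new ones are added */
};

/* Add a frame at the end of the queue. */
static void queue_frame(struct out_queue *q, struct frame *f)
{
	f->next = NULL;
	if (q->transmit_end)
		q->transmit_end->next = f;
	else
		q->transmit_queue = f;
	if (!q->transmit_new)
		q->transmit_new = f;
	q->transmit_end = f;
}

/* Mark the next unsent frame as sent; returns it, or NULL if none is pending. */
static struct frame *send_next(struct out_queue *q)
{
	struct frame *f = q->transmit_new;

	if (f)
		q->transmit_new = f->next;
	return f;
}

/* Walk from the start and drop every sent frame fully covered by ack. */
static int ack_frames(struct out_queue *q, uint32_t ack)
{
	int freed = 0;

	while (q->transmit_queue && q->transmit_queue != q->transmit_new &&
	       (int32_t)(ack - q->transmit_queue->end_seq) >= 0) {
		q->transmit_queue = q->transmit_queue->next;
		freed++;
	}
	if (!q->transmit_queue)
		q->transmit_end = NULL;
	return freed;
}
```

The ack walk stops at transmit_new, so frames that were queued but never
sent are not released even if the acknowledgement number covers them.
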
Frames are queued for output by tcp_write. We do our best to send the frames
off immediately if possible, but otherwise we queue them and compute the body
checksum during the copy.

When a write is done we try to clear any pending events and piggyback them.
If the window is full we queue full sized frames. On the first timeout in
zero window we split this.

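Piggybacking pending events on a write might look like the sketch below.
The flag names come from the sk->tcp_pend_event list earlier in this
section; the bit values, struct out_hdr and piggyback_events are
assumptions made for illustration, since the design is not implemented.

```c
#include <stdint.h>

/* Flag names are from the text; the bit values are assumptions. */
enum {
	TCP_PEND_ACK = 1u << 0, /* Ack needed */
	TCP_ACK_NOW  = 1u << 1, /* Needed now */
	TCP_WINDOW   = 1u << 2, /* Window update check */
	TCP_WINZERO  = 1u << 3, /* Zero probing */
};

/* Hypothetical description of what an outgoing frame carries. */
struct out_hdr {
	int carries_ack;
	int carries_window;
};

/*
 * Called when a data frame is about to go out: fold pending ACK and
 * window-update work into it and return the events still pending.
 */
static uint32_t piggyback_events(uint32_t pend, struct out_hdr *h)
{
	if (pend & (TCP_PEND_ACK | TCP_ACK_NOW)) {
		h->carries_ack = 1;
		pend &= ~(uint32_t)(TCP_PEND_ACK | TCP_ACK_NOW);
	}
	if (pend & TCP_WINDOW) {
		h->carries_window = 1;
		pend &= ~(uint32_t)TCP_WINDOW;
	}
	return pend; /* TCP_WINZERO, if set, stays pending */
}
```

In this model a pending ack rides along with the data frame and its flag is
cleared, while zero-window probing is left for the timer path.
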
On a timer we walk the retransmit list to send any retransmits, update the
backoff timers, etc. A change of route table stamp causes the header to be
changed and recomputed. We add any new TCP level headers and finish the
checksum again before sending.