Blame - Documentation/this_cpu_ops.txt - kernel/msm-4.19


===================
this_cpu operations
===================

:Author: Christoph Lameter, August 4th, 2014
:Author: Pranith Kumar, Aug 2nd, 2014

this_cpu operations are a way of optimizing access to per cpu
variables associated with the currently executing processor. This is
done through the use of segment registers (or a dedicated register where
the cpu permanently stores the beginning of the per cpu area for a
specific processor).

this_cpu operations add a per cpu variable offset to the processor
specific per cpu base and encode that operation in the instruction
operating on the per cpu variable.

This means that there are no atomicity issues between the calculation of
the offset and the operation on the data. Therefore it is not
necessary to disable preemption or interrupts to ensure that the
processor is not changed between the calculation of the address and
the operation on the data.

Read-modify-write operations are of particular interest. Frequently
processors have special lower latency instructions that can operate
without the typical synchronization overhead, but still provide some
sort of relaxed atomicity guarantees. The x86, for example, can execute
RMW (Read Modify Write) instructions like inc/dec/cmpxchg without the
lock prefix and the associated latency penalty.

Access to the variable without the lock prefix is not synchronized but
synchronization is not necessary since we are dealing with per cpu
data specific to the currently executing processor. Only the current
processor should be accessing that variable and therefore there are no
concurrency issues with other processors in the system.

Please note that accesses by remote processors to a per cpu area are
exceptional situations and may impact performance and/or correctness
(remote write operations) of local RMW operations via this_cpu_*.

The main use of the this_cpu operations has been to optimize counter
operations.

The following this_cpu() operations with implied preemption protection
are defined. These operations can be used without worrying about
preemption and interrupts::

	this_cpu_read(pcp)
	this_cpu_write(pcp, val)
	this_cpu_add(pcp, val)
	this_cpu_and(pcp, val)
	this_cpu_or(pcp, val)
	this_cpu_add_return(pcp, val)
	this_cpu_xchg(pcp, nval)
	this_cpu_cmpxchg(pcp, oval, nval)
	this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
	this_cpu_sub(pcp, val)
	this_cpu_inc(pcp)
	this_cpu_dec(pcp)
	this_cpu_sub_return(pcp, val)
	this_cpu_inc_return(pcp)
	this_cpu_dec_return(pcp)


Inner working of this_cpu operations
------------------------------------

On x86 the fs: or the gs: segment registers contain the base of the
per cpu area. It is then possible to simply use the segment override
to relocate a per cpu relative address to the proper per cpu area for
the processor. So the relocation to the per cpu base is encoded in the
instruction via a segment register prefix.

For example::

	DEFINE_PER_CPU(int, x);
	int z;

	z = this_cpu_read(x);

results in a single instruction::

	mov eax, gs:[x]

instead of a sequence that calculates the address and then fetches
from that address, which is what the per cpu operations do. Before
this_cpu_ops such a sequence also required preempt disable/enable to
prevent the kernel from moving the thread to a different processor
while the calculation is performed.

Consider the following this_cpu operation::

	this_cpu_inc(x)

The above results in the following single instruction (no lock prefix!)::

	inc gs:[x]

instead of the following operations required if there is no segment
register::

	int *y;
	int cpu;

	cpu = get_cpu();
	y = per_cpu_ptr(&x, cpu);
	(*y)++;
	put_cpu();

Note that these operations can only be used on per cpu data that is
reserved for a specific processor. Without disabling preemption in the
surrounding code this_cpu_inc() will only guarantee that one of the
per cpu counters is correctly incremented. However, there is no
guarantee that the OS will not move the process directly before or
after the this_cpu instruction is executed. In general this means that
the values of the individual counters for each processor are
meaningless. The sum of all the per cpu counters is the only value
that is of interest.

Per cpu variables are used for performance reasons. Bouncing cache
lines can be avoided if multiple processors concurrently go through
the same code paths. Since each processor has its own per cpu
variables no concurrent cache line updates take place. The price that
has to be paid for this optimization is the need to add up the per cpu
counters when the value of a counter is needed.
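
The counter pattern described above can be sketched in plain userspace C.
This is only a simulation under stated assumptions: an array slot per
simulated CPU stands in for a real per cpu variable, and the names
SIM_NR_CPUS, sim_hits, sim_this_cpu_inc() and sim_hits_sum() are
hypothetical, not kernel APIs.

```c
#include <assert.h>

#define SIM_NR_CPUS 4	/* hypothetical CPU count for this simulation */

/* One slot per simulated CPU; stands in for DEFINE_PER_CPU(long, hits). */
static long sim_hits[SIM_NR_CPUS];

/* Stand-in for this_cpu_inc(hits) executed on the given simulated CPU. */
static void sim_this_cpu_inc(int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	sim_hits[cpu]++;
}

/* Only the sum over all slots is meaningful, so a reader adds them up. */
static long sim_hits_sum(void)
{
	long sum = 0;

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sum += sim_hits[cpu];
	return sum;
}
```

Each increment touches only the slot of the CPU it runs on, so writers
never share a slot. The cost shows up in sim_hits_sum(), which has to
visit every slot to produce the counter value.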


Special operations
------------------

::

	y = this_cpu_ptr(&x)

Takes the offset of a per cpu variable (&x !) and returns the address
of the per cpu variable that belongs to the currently executing
processor. this_cpu_ptr avoids multiple steps that the common
get_cpu/put_cpu sequence requires. No processor number is
available. Instead, the offset of the local per cpu area is simply
added to the per cpu offset.

Note that this operation is usually used in a code segment when
preemption has been disabled. The pointer is then used to
access local per cpu data in a critical section. When preemption
is re-enabled this pointer is usually no longer useful since it may
no longer point to per cpu data of the current processor.


Per cpu variables and offsets
-----------------------------

Per cpu variables have offsets to the beginning of the per cpu
area. They do not have addresses although they look like that in the
code. Offsets cannot be directly dereferenced. The offset must be
added to a base pointer of a per cpu area of a processor in order to
form a valid address.
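
The offset-plus-base rule can be illustrated with a small userspace
simulation. It is a sketch under assumptions, not the kernel
implementation: each simulated CPU owns a byte array as its per cpu
area, and the names sim_area, sim_x_offset and the sim_per_cpu_*()
helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_NR_CPUS   2		/* hypothetical number of simulated CPUs */
#define SIM_AREA_SIZE 64	/* hypothetical size of each per cpu area */

/* One private area per simulated CPU. */
static unsigned char sim_area[SIM_NR_CPUS][SIM_AREA_SIZE];

/* The "per cpu variable" x is only an offset into an area, not an address. */
static const size_t sim_x_offset = 16;

/* Base of the chosen CPU's area plus the offset forms a valid address. */
static unsigned char *sim_per_cpu_addr(size_t offset, int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	assert(offset + sizeof(int) <= SIM_AREA_SIZE);
	return &sim_area[cpu][offset];
}

/* Store an int at the relocated address of the given simulated CPU. */
static void sim_per_cpu_write(size_t offset, int cpu, int val)
{
	memcpy(sim_per_cpu_addr(offset, cpu), &val, sizeof(val));
}

/* Load an int from the relocated address of the given simulated CPU. */
static int sim_per_cpu_read(size_t offset, int cpu)
{
	int val;

	memcpy(&val, sim_per_cpu_addr(offset, cpu), sizeof(val));
	return val;
}
```

The same offset yields a different address for every simulated CPU,
which is why the offset alone cannot be dereferenced.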

Therefore the use of x or &x outside of the context of per cpu
operations is invalid and will generally be treated like a NULL
pointer dereference.

::

	DEFINE_PER_CPU(int, x);

In the context of per cpu operations the above implies that x is a per
cpu variable. Most this_cpu operations take a per cpu variable.

::

	int __percpu *p = &x;

&x and hence p is the offset of a per cpu variable. this_cpu_ptr()
takes the offset of a per cpu variable which makes this look a bit
strange.


Operations on a field of a per cpu structure
--------------------------------------------

Let's say we have a percpu structure::

	struct s {
		int n, m;
	};

	DEFINE_PER_CPU(struct s, p);


Operations on these fields are straightforward::

	this_cpu_inc(p.m)

	z = this_cpu_cmpxchg(p.m, 0, 1);


If we have an offset to struct s::

	struct s __percpu *ps = &p;

	this_cpu_dec(ps->m);

	z = this_cpu_inc_return(ps->n);


The calculation of the pointer may require the use of this_cpu_ptr()
if we do not make use of this_cpu ops later to manipulate fields::

	struct s *pp;

	pp = this_cpu_ptr(&p);

	pp->m--;

	z = pp->n++;


Variants of this_cpu ops
------------------------

this_cpu ops are interrupt safe. Some architectures do not support
these per cpu local operations. In that case the operation must be
replaced by code that disables interrupts, then does the operations
that are guaranteed to be atomic and then re-enables interrupts. Doing
so is expensive. If there are other reasons why the scheduler cannot
change the processor we are executing on then there is no reason to
disable interrupts. For that purpose the following __this_cpu operations
are provided.

These operations have no guarantee against concurrent interrupts or
preemption. If a per cpu variable is not used in an interrupt context
and the scheduler cannot preempt, then they are safe. If any interrupts
still occur while an operation is in progress and if the interrupt too
modifies the variable, then RMW actions cannot be guaranteed to be
safe::

	__this_cpu_read(pcp)
	__this_cpu_write(pcp, val)
	__this_cpu_add(pcp, val)
	__this_cpu_and(pcp, val)
	__this_cpu_or(pcp, val)
	__this_cpu_add_return(pcp, val)
	__this_cpu_xchg(pcp, nval)
	__this_cpu_cmpxchg(pcp, oval, nval)
	__this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
	__this_cpu_sub(pcp, val)
	__this_cpu_inc(pcp)
	__this_cpu_dec(pcp)
	__this_cpu_sub_return(pcp, val)
	__this_cpu_inc_return(pcp)
	__this_cpu_dec_return(pcp)


For example, __this_cpu_inc(x) will increment x and will not fall back
to code that disables interrupts on platforms that cannot accomplish
atomicity through address relocation and a Read-Modify-Write operation
in the same instruction.


&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
--------------------------------------------

The first operation takes the offset and forms an address and then
adds the offset of the n field. This may result in two add
instructions emitted by the compiler.

The second one first adds the two offsets and then does the
relocation. IMHO the second form looks cleaner and needs fewer
parentheses. The second form also is consistent with the way
this_cpu_read() and friends are used.
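
The difference in evaluation order can be written out as plain address
arithmetic. This is a userspace sketch, not kernel code: cpu_base and
pp_offset are hypothetical stand-ins for the per cpu base and the per
cpu offset, and the field m is used instead of n because n sits at
offset 0 in this struct and would hide the second addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct s {
	int n, m;
};

/* &this_cpu_ptr(pp)->m: relocate the object first, then add the field. */
static uintptr_t sim_relocate_then_field(uintptr_t cpu_base, uintptr_t pp_offset)
{
	uintptr_t object = cpu_base + pp_offset;	/* first add */

	return object + offsetof(struct s, m);		/* second add */
}

/* this_cpu_ptr(&pp->m): add the two offsets first, then relocate once. */
static uintptr_t sim_field_then_relocate(uintptr_t cpu_base, uintptr_t pp_offset)
{
	uintptr_t field_offset = pp_offset + offsetof(struct s, m);

	return cpu_base + field_offset;			/* single relocation */
}
```

Both helpers return the same address. They differ only in which
addition involves the per cpu base.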


Remote access to per cpu data
------------------------------

Per cpu data structures are designed to be used by one cpu exclusively.
If you use the variables as intended, this_cpu_ops() are guaranteed to
be "atomic" as no other CPU has access to these data structures.

There are special cases where you might need to access per cpu data
structures remotely. It is usually safe to do a remote read access
and that is frequently done to summarize counters. Remote write access
is something which could be problematic because this_cpu ops do not
have lock semantics. A remote write may interfere with a this_cpu
RMW operation.
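
The interference can be made concrete with a deterministic userspace
simulation. It is a sketch under assumptions: the unlocked increment is
split by hand into its load and store steps, and the remote store is
injected between them. The names sim_counter and sim_unlocked_inc() are
hypothetical.

```c
#include <assert.h>

/* Simulated per cpu counter owned by the "local" CPU. */
static int sim_counter;

/*
 * Local RMW increment without lock semantics, written as separate
 * load and store steps. If remote_writes is nonzero, a remote CPU
 * stores remote_val between the two steps. Returns the final value.
 */
static int sim_unlocked_inc(int remote_writes, int remote_val)
{
	int tmp = sim_counter;			/* local load */

	if (remote_writes)
		sim_counter = remote_val;	/* remote store lands here */

	sim_counter = tmp + 1;			/* local store overwrites it */
	return sim_counter;
}
```

Starting from 5, a call such as sim_unlocked_inc(1, 100) ends with the
counter at 6: the remote value 100 is silently lost because the local
store is based on the value read before the remote write.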

Remote write accesses to percpu data structures are highly discouraged
unless absolutely necessary. Please consider using an IPI to wake up
the remote CPU and perform the update to its per cpu area.

To access a per cpu data structure remotely, typically the per_cpu_ptr()
function is used::


	DEFINE_PER_CPU(struct data, datap);

	struct data *p = per_cpu_ptr(&datap, cpu);

This makes it explicit that we are getting ready to access a percpu
area remotely.

You can also do the following to convert the datap offset to an address::

	struct data *p = this_cpu_ptr(&datap);

but passing pointers calculated via this_cpu_ptr to other cpus is
unusual and should be avoided.

Remote accesses are typically only for reading the status of another
cpu's per cpu data. Write accesses can cause unique problems due to the
relaxed synchronization requirements for this_cpu operations.

One example that illustrates some concerns with write operations is
the following scenario that occurs because two per cpu variables
share a cache-line but the relaxed synchronization is applied to
only one process updating the cache-line.

Consider the following example::


	struct test {
		atomic_t a;
		int b;
	};

	DEFINE_PER_CPU(struct test, onecacheline);

There is some concern about what would happen if the field 'a' is updated
remotely from one processor and the local processor would use this_cpu ops
to update field b. Care should be taken that such simultaneous accesses to
data within the same cache line are avoided. Also costly synchronization
may be necessary. IPIs are generally recommended in such scenarios instead
of a remote write to the per cpu area of another processor.
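
Whether two fields share a cache line can be checked from their
offsets. The sketch below is userspace C under loud assumptions: a
64 byte cache line, a structure that starts on a cache line boundary,
and a plain int standing in for atomic_t. SIM_CACHE_LINE, struct
sim_test and sim_same_cache_line() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_CACHE_LINE 64	/* assumed cache line size in bytes */

/* Userspace stand-in for struct test; int a replaces atomic_t a. */
struct sim_test {
	int a;
	int b;
};

/*
 * Returns nonzero if two byte offsets fall into the same cache line,
 * assuming the enclosing object starts on a cache line boundary.
 */
static int sim_same_cache_line(size_t off1, size_t off2)
{
	return off1 / SIM_CACHE_LINE == off2 / SIM_CACHE_LINE;
}
```

With these assumptions, the offsets of a and b land in the same line,
so a remote update of a and a local this_cpu update of b would touch
the same cache line.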

Even in cases where the remote writes are rare, please bear in
mind that a remote write will evict the cache line from the processor
that most likely will access it. If the processor wakes up and finds a
missing local cache line of a per cpu area, its performance and hence
the wake up times will be affected.