Blame - Documentation/atomic_ops.txt - kernel/msm-4.9

blob: c9d1cacb439590a02dbfb1282cbe0d3abe3a9b0d [file] [log] [blame]

Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	1	Semantics and Behavior of Atomic and
				2	Bitmask Operations
				3
				4	David S. Miller
				5
				6	This document is intended to serve as a guide to Linux port
				7	maintainers on how to implement atomic counter, bitops, and spinlock
				8	interfaces properly.
				9
Paul E. McKenney	1f7870d	2014-10-19 12:05:22 -0700	[diff] [blame]	10	The atomic_t type should be defined as a signed integer and
				11	the atomic_long_t type as a signed long integer. Also, they should
				12	be made opaque such that any kind of cast to a normal C integer type
				13	will fail. Something like the following should suffice:
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	14
Nikanth Karthikesan	72eef0f	2011-05-26 16:25:13 -0700	[diff] [blame]	15	typedef struct { int counter; } atomic_t;
Paul E. McKenney	1f7870d	2014-10-19 12:05:22 -0700	[diff] [blame]	16	typedef struct { long counter; } atomic_long_t;
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	17
Matti Linnanvuori	8d7b52d	2007-10-16 23:30:08 -0700	[diff] [blame]	18	Historically, counter has been declared volatile. This is now discouraged.
				19	See Documentation/volatile-considered-harmful.txt for the complete rationale.
				20
Grant Grundler	1a2142b	2007-10-16 23:29:28 -0700	[diff] [blame]	21	local_t is very similar to atomic_t. If the counter is per CPU and only
				22	updated by one CPU, local_t is probably more appropriate. Please see
				23	Documentation/local_ops.txt for the semantics of local_t.
				24
Matti Linnanvuori	8d7b52d	2007-10-16 23:30:08 -0700	[diff] [blame]	25	The first operations to implement for atomic_t's are the initializers and
				26	plain reads.
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	27
				28	#define ATOMIC_INIT(i) { (i) }
				29	#define atomic_set(v, i) ((v)->counter = (i))
				30
				31	The first macro is used in definitions, such as:
				32
				33	static atomic_t my_counter = ATOMIC_INIT(1);
				34
Matti Linnanvuori	8d7b52d	2007-10-16 23:30:08 -0700	[diff] [blame]	35	The initializer is atomic in that the return values of the atomic operations
				36	are guaranteed to be correct reflecting the initialized value if the
				37	initializer is used before runtime. If the initializer is used at runtime, a
				38	proper implicit or explicit read memory barrier is needed before reading the
				39	value with atomic_read from another thread.
				40
Paul E. McKenney	1f7870d	2014-10-19 12:05:22 -0700	[diff] [blame]	41	As with all of the atomic_ interfaces, replace the leading "atomic_"
				42	with "atomic_long_" to operate on atomic_long_t.
				43
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	44	The second interface can be used at runtime, as in:
				45
				46	struct foo { atomic_t counter; };
				47	...
				48
				49	struct foo *k;
				50
				51	k = kmalloc(sizeof(*k), GFP_KERNEL);
				52	if (!k)
				53	return -ENOMEM;
				54	atomic_set(&k->counter, 0);
				55
Matti Linnanvuori	8d7b52d	2007-10-16 23:30:08 -0700	[diff] [blame]	56	The setting is atomic in that the return values of the atomic operations by
				57	all threads are guaranteed to be correct reflecting either the value that has
				58	been set with this operation or set with another operation. A proper implicit
				59	or explicit memory barrier is needed before the value set with the operation
				60	is guaranteed to be readable with atomic_read from another thread.
				61
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	62	Next, we have:
				63
				64	#define atomic_read(v) ((v)->counter)
				65
Matti Linnanvuori	8d7b52d	2007-10-16 23:30:08 -0700	[diff] [blame]	66	which simply reads the counter value currently visible to the calling thread.
				67	The read is atomic in that the return value is guaranteed to be one of the
				68	values initialized or modified with the interface operations if a proper
				69	implicit or explicit memory barrier is used after possible runtime
				70	initialization by any other thread and the value is modified only with the
				71	interface operations. atomic_read does not guarantee that the runtime
				72	initialization by any other thread is visible yet, so the user of the
				73	interface must take care of that with a proper implicit or explicit memory
				74	barrier.
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	75
Matti Linnanvuori	8d7b52d	2007-10-16 23:30:08 -0700	[diff] [blame]	76	* WARNING: atomic_read() and atomic_set() DO NOT IMPLY BARRIERS! *
				77
				78	Some architectures may choose to use the volatile keyword, barriers, or inline
				79	assembly to guarantee some degree of immediacy for atomic_read() and
				80	atomic_set(). This is not uniformly guaranteed, and may change in the future,
				81	so all users of atomic_t should treat atomic_read() and atomic_set() as simple
				82	C statements that may be reordered or optimized away entirely by the compiler
				83	or processor, and explicitly invoke the appropriate compiler and/or memory
				84	barrier for each use case. Failure to do so will result in code that may
				85	suddenly break when used with different architectures or compiler
				86	optimizations, or even changes in unrelated code which changes how the
				87	compiler optimizes the section accessing atomic_t variables.
				88
				89	* YOU HAVE BEEN WARNED! *
				90
Paul E. McKenney	182dd4b	2011-11-22 10:55:12 -0800	[diff] [blame]	91	Properly aligned pointers, longs, ints, and chars (and unsigned
				92	equivalents) may be atomically loaded from and stored to in the same
				93	sense as described for atomic_read() and atomic_set(). The ACCESS_ONCE()
				94	macro should be used to prevent the compiler from using optimizations
				95	that might otherwise optimize accesses out of existence on the one hand,
				96	or that might create unsolicited accesses on the other.
				97
				98	For example consider the following code:
				99
				100	while (a > 0)
				101	do_something();
				102
				103	If the compiler can prove that do_something() does not store to the
				104	variable a, then the compiler is within its rights transforming this to
				105	the following:
				106
				107	tmp = a;
				108	if (a > 0)
				109	for (;;)
				110	do_something();
				111
				112	If you don't want the compiler to do this (and you probably don't), then
				113	you should use something like the following:
				114
				115	while (ACCESS_ONCE(a) < 0)
				116	do_something();
				117
				118	Alternatively, you could place a barrier() call in the loop.
				119
				120	For another example, consider the following code:
				121
				122	tmp_a = a;
				123	do_something_with(tmp_a);
				124	do_something_else_with(tmp_a);
				125
				126	If the compiler can prove that do_something_with() does not store to the
				127	variable a, then the compiler is within its rights to manufacture an
				128	additional load as follows:
				129
				130	tmp_a = a;
				131	do_something_with(tmp_a);
				132	tmp_a = a;
				133	do_something_else_with(tmp_a);
				134
				135	This could fatally confuse your code if it expected the same value
				136	to be passed to do_something_with() and do_something_else_with().
				137
				138	The compiler would be likely to manufacture this additional load if
				139	do_something_with() was an inline function that made very heavy use
				140	of registers: reloading from variable a could save a flush to the
				141	stack and later reload. To prevent the compiler from attacking your
				142	code in this manner, write the following:
				143
				144	tmp_a = ACCESS_ONCE(a);
				145	do_something_with(tmp_a);
				146	do_something_else_with(tmp_a);
				147
				148	For a final example, consider the following code, assuming that the
				149	variable a is set at boot time before the second CPU is brought online
				150	and never changed later, so that memory barriers are not needed:
				151
				152	if (a)
				153	b = 9;
				154	else
				155	b = 42;
				156
				157	The compiler is within its rights to manufacture an additional store
				158	by transforming the above code into the following:
				159
				160	b = 42;
				161	if (a)
				162	b = 9;
				163
				164	This could come as a fatal surprise to other code running concurrently
				165	that expected b to never have the value 42 if a was zero. To prevent
				166	the compiler from doing this, write something like:
				167
				168	if (a)
				169	ACCESS_ONCE(b) = 9;
				170	else
				171	ACCESS_ONCE(b) = 42;
				172
				173	Don't even -think- about doing this without proper use of memory barriers,
				174	locks, or atomic operations if variable a can change at runtime!
				175
				176	* WARNING: ACCESS_ONCE() DOES NOT IMPLY A BARRIER! *
				177
Matti Linnanvuori	8d7b52d	2007-10-16 23:30:08 -0700	[diff] [blame]	178	Now, we move onto the atomic operation interfaces typically implemented with
				179	the help of assembly code.
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	180
				181	void atomic_add(int i, atomic_t *v);
				182	void atomic_sub(int i, atomic_t *v);
				183	void atomic_inc(atomic_t *v);
				184	void atomic_dec(atomic_t *v);
				185
				186	These four routines add and subtract integral values to/from the given
				187	atomic_t value. The first two routines pass explicit integers by
				188	which to make the adjustment, whereas the latter two use an implicit
				189	adjustment value of "1".
				190
				191	One very important aspect of these two routines is that they DO NOT
				192	require any explicit memory barriers. They need only perform the
				193	atomic_t counter update in an SMP safe manner.
				194
				195	Next, we have:
				196
				197	int atomic_inc_return(atomic_t *v);
				198	int atomic_dec_return(atomic_t *v);
				199
				200	These routines add 1 and subtract 1, respectively, from the given
				201	atomic_t and return the new counter value after the operation is
				202	performed.
				203
Paul E. McKenney	daf1aab	2015-02-02 08:08:25 -0800	[diff] [blame]	204	Unlike the above routines, it is required that these primitives
				205	include explicit memory barriers that are performed before and after
				206	the operation. It must be done such that all memory operations before
				207	and after the atomic operation calls are strongly ordered with respect
				208	to the atomic operation itself.
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	209
				210	For example, it should behave as if a smp_mb() call existed both
				211	before and after the atomic operation.
				212
				213	If the atomic instructions used in an implementation provide explicit
				214	memory barrier semantics which satisfy the above requirements, that is
				215	fine as well.
				216
				217	Let's move on:
				218
				219	int atomic_add_return(int i, atomic_t *v);
				220	int atomic_sub_return(int i, atomic_t *v);
				221
				222	These behave just like atomic_{inc,dec}_return() except that an
				223	explicit counter adjustment is given instead of the implicit "1".
				224	This means that like atomic_{inc,dec}_return(), the memory barrier
				225	semantics are required.
				226
				227	Next:
				228
				229	int atomic_inc_and_test(atomic_t *v);
				230	int atomic_dec_and_test(atomic_t *v);
				231
				232	These two routines increment and decrement by 1, respectively, the
				233	given atomic counter. They return a boolean indicating whether the
				234	resulting counter value was zero or not.
				235
Paul E. McKenney	daf1aab	2015-02-02 08:08:25 -0800	[diff] [blame]	236	Again, these primitives provide explicit memory barrier semantics around
				237	the atomic operation.
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	238
				239	int atomic_sub_and_test(int i, atomic_t *v);
				240
				241	This is identical to atomic_dec_and_test() except that an explicit
Paul E. McKenney	daf1aab	2015-02-02 08:08:25 -0800	[diff] [blame]	242	decrement is given instead of the implicit "1". This primitive must
				243	provide explicit memory barrier semantics around the operation.
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	244
				245	int atomic_add_negative(int i, atomic_t *v);
				246
Paul E. McKenney	daf1aab	2015-02-02 08:08:25 -0800	[diff] [blame]	247	The given increment is added to the given atomic counter value. A boolean
				248	is return which indicates whether the resulting counter value is negative.
				249	This primitive must provide explicit memory barrier semantics around
				250	the operation.
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	251
Nick Piggin	8426e1f	2005-11-13 16:07:25 -0800	[diff] [blame]	252	Then:
Nick Piggin	4a6dae6	2005-11-13 16:07:24 -0800	[diff] [blame]	253
Matti Linnanvuori	8d7b52d	2007-10-16 23:30:08 -0700	[diff] [blame]	254	int atomic_xchg(atomic_t *v, int new);
				255
				256	This performs an atomic exchange operation on the atomic variable v, setting
				257	the given new value. It returns the old value that the atomic variable v had
				258	just before the operation.
				259
Paul E. McKenney	daf1aab	2015-02-02 08:08:25 -0800	[diff] [blame]	260	atomic_xchg must provide explicit memory barriers around the operation.
Richard Braun	7e8b1e7	2012-12-13 11:07:32 +0100	[diff] [blame]	261
Nick Piggin	4a6dae6	2005-11-13 16:07:24 -0800	[diff] [blame]	262	int atomic_cmpxchg(atomic_t *v, int old, int new);
				263
				264	This performs an atomic compare exchange operation on the atomic value v,
				265	with the given old and new values. Like all atomic_xxx operations,
				266	atomic_cmpxchg will only satisfy its atomicity semantics as long as all
				267	other accesses of *v are performed through atomic_xxx operations.
				268
Will Deacon	ed2de9f	2015-07-16 16:10:06 +0100	[diff] [blame]	269	atomic_cmpxchg must provide explicit memory barriers around the operation,
				270	although if the comparison fails then no memory ordering guarantees are
				271	required.
Nick Piggin	4a6dae6	2005-11-13 16:07:24 -0800	[diff] [blame]	272
				273	The semantics for atomic_cmpxchg are the same as those defined for 'cas'
				274	below.
				275
Nick Piggin	8426e1f	2005-11-13 16:07:25 -0800	[diff] [blame]	276	Finally:
				277
				278	int atomic_add_unless(atomic_t *v, int a, int u);
				279
				280	If the atomic value v is not equal to u, this function adds a to v, and
				281	returns non zero. If v is equal to u then it returns zero. This is done as
				282	an atomic operation.
				283
Paul E. McKenney	daf1aab	2015-02-02 08:08:25 -0800	[diff] [blame]	284	atomic_add_unless must provide explicit memory barriers around the
				285	operation unless it fails (returns 0).
Nick Piggin	8426e1f	2005-11-13 16:07:25 -0800	[diff] [blame]	286
				287	atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)
				288
Nick Piggin	4a6dae6	2005-11-13 16:07:24 -0800	[diff] [blame]	289
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	290	If a caller requires memory barrier semantics around an atomic_t
				291	operation which does not return a value, a set of interfaces are
				292	defined which accomplish this:
				293
Peter Zijlstra	1b15611	2014-03-13 19:00:35 +0100	[diff] [blame]	294	void smp_mb__before_atomic(void);
				295	void smp_mb__after_atomic(void);
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	296
Peter Zijlstra	1b15611	2014-03-13 19:00:35 +0100	[diff] [blame]	297	For example, smp_mb__before_atomic() can be used like so:
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	298
				299	obj->dead = 1;
Peter Zijlstra	1b15611	2014-03-13 19:00:35 +0100	[diff] [blame]	300	smp_mb__before_atomic();
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	301	atomic_dec(&obj->ref_count);
				302
Michael Hayes	a0ebb3f	2006-06-26 18:27:35 +0200	[diff] [blame]	303	It makes sure that all memory operations preceding the atomic_dec()
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	304	call are strongly ordered with respect to the atomic counter
Michael Hayes	a0ebb3f	2006-06-26 18:27:35 +0200	[diff] [blame]	305	operation. In the above example, it guarantees that the assignment of
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	306	"1" to obj->dead will be globally visible to other cpus before the
				307	atomic counter decrement.
				308
Peter Zijlstra	1b15611	2014-03-13 19:00:35 +0100	[diff] [blame]	309	Without the explicit smp_mb__before_atomic() call, the
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	310	implementation could legally allow the atomic counter update visible
				311	to other cpus before the "obj->dead = 1;" assignment.
				312
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	313	A missing memory barrier in the cases where they are required by the
Michael Hayes	a0ebb3f	2006-06-26 18:27:35 +0200	[diff] [blame]	314	atomic_t implementation above can have disastrous results. Here is
				315	an example, which follows a pattern occurring frequently in the Linux
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	316	kernel. It is the use of atomic counters to implement reference
				317	counting, and it works such that once the counter falls to zero it can
Michael Hayes	a0ebb3f	2006-06-26 18:27:35 +0200	[diff] [blame]	318	be guaranteed that no other entity can be accessing the object:
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	319
Figo.zhang	4764e28	2009-06-16 15:33:51 -0700	[diff] [blame]	320	static void obj_list_add(struct obj obj, struct list_head head)
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	321	{
				322	obj->active = 1;
Figo.zhang	4764e28	2009-06-16 15:33:51 -0700	[diff] [blame]	323	list_add(&obj->list, head);
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	324	}
				325
				326	static void obj_list_del(struct obj *obj)
				327	{
				328	list_del(&obj->list);
				329	obj->active = 0;
				330	}
				331
				332	static void obj_destroy(struct obj *obj)
				333	{
				334	BUG_ON(obj->active);
				335	kfree(obj);
				336	}
				337
				338	struct obj obj_list_peek(struct list_head head)
				339	{
				340	if (!list_empty(head)) {
				341	struct obj *obj;
				342
				343	obj = list_entry(head->next, struct obj, list);
				344	atomic_inc(&obj->refcnt);
				345	return obj;
				346	}
				347	return NULL;
				348	}
				349
				350	void obj_poke(void)
				351	{
				352	struct obj *obj;
				353
				354	spin_lock(&global_list_lock);
				355	obj = obj_list_peek(&global_list);
				356	spin_unlock(&global_list_lock);
				357
				358	if (obj) {
				359	obj->ops->poke(obj);
				360	if (atomic_dec_and_test(&obj->refcnt))
				361	obj_destroy(obj);
				362	}
				363	}
				364
				365	void obj_timeout(struct obj *obj)
				366	{
				367	spin_lock(&global_list_lock);
				368	obj_list_del(obj);
				369	spin_unlock(&global_list_lock);
				370
				371	if (atomic_dec_and_test(&obj->refcnt))
				372	obj_destroy(obj);
				373	}
				374
				375	(This is a simplification of the ARP queue management in the
				376	generic neighbour discover code of the networking. Olaf Kirch
				377	found a bug wrt. memory barriers in kfree_skb() that exposed
				378	the atomic_t memory barrier requirements quite clearly.)
				379
				380	Given the above scheme, it must be the case that the obj->active
				381	update done by the obj list deletion be visible to other processors
				382	before the atomic counter decrement is performed.
				383
				384	Otherwise, the counter could fall to zero, yet obj->active would still
				385	be set, thus triggering the assertion in obj_destroy(). The error
				386	sequence looks like this:
				387
				388	cpu 0 cpu 1
				389	obj_poke() obj_timeout()
				390	obj = obj_list_peek();
				391	... gains ref to obj, refcnt=2
				392	obj_list_del(obj);
				393	obj->active = 0 ...
				394	... visibility delayed ...
				395	atomic_dec_and_test()
				396	... refcnt drops to 1 ...
				397	atomic_dec_and_test()
				398	... refcount drops to 0 ...
				399	obj_destroy()
				400	BUG() triggers since obj->active
				401	still seen as one
				402	obj->active update visibility occurs
				403
				404	With the memory barrier semantics required of the atomic_t operations
				405	which return values, the above sequence of memory visibility can never
				406	happen. Specifically, in the above case the atomic_dec_and_test()
				407	counter decrement would not become globally visible until the
				408	obj->active update does.
				409
				410	As a historical note, 32-bit Sparc used to only allow usage of
Francis Galiegue	a33f322	2010-04-23 00:08:02 +0200	[diff] [blame]	411	24-bits of its atomic_t type. This was because it used 8 bits
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	412	as a spinlock for SMP safety. Sparc32 lacked a "compare and swap"
				413	type instruction. However, 32-bit Sparc has since been moved over
				414	to a "hash table of spinlocks" scheme, that allows the full 32-bit
				415	counter to be realized. Essentially, an array of spinlocks are
				416	indexed into based upon the address of the atomic_t being operated
				417	on, and that lock protects the atomic operation. Parisc uses the
				418	same scheme.
				419
				420	Another note is that the atomic_t operations returning values are
				421	extremely slow on an old 386.
				422
				423	We will now cover the atomic bitmask operations. You will find that
				424	their SMP and memory barrier semantics are similar in shape and scope
				425	to the atomic_t ops above.
				426
				427	Native atomic bit operations are defined to operate on objects aligned
				428	to the size of an "unsigned long" C data type, and are least of that
				429	size. The endianness of the bits within each "unsigned long" are the
				430	native endianness of the cpu.
				431
Michael Hayes	a0ebb3f	2006-06-26 18:27:35 +0200	[diff] [blame]	432	void set_bit(unsigned long nr, volatile unsigned long *addr);
				433	void clear_bit(unsigned long nr, volatile unsigned long *addr);
				434	void change_bit(unsigned long nr, volatile unsigned long *addr);
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	435
				436	These routines set, clear, and change, respectively, the bit number
				437	indicated by "nr" on the bit mask pointed to by "ADDR".
				438
				439	They must execute atomically, yet there are no implicit memory barrier
				440	semantics required of these interfaces.
				441
Michael Hayes	a0ebb3f	2006-06-26 18:27:35 +0200	[diff] [blame]	442	int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
				443	int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
				444	int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	445
				446	Like the above, except that these routines return a boolean which
				447	indicates whether the changed bit was set _BEFORE_ the atomic bit
				448	operation.
				449
				450	WARNING! It is incredibly important that the value be a boolean,
				451	ie. "0" or "1". Do not try to be fancy and save a few instructions by
				452	declaring the above to return "long" and just returning something like
				453	"old_val & mask" because that will not work.
				454
				455	For one thing, this return value gets truncated to int in many code
				456	paths using these interfaces, so on 64-bit if the bit is set in the
				457	upper 32-bits then testers will never see that.
				458
				459	One great example of where this problem crops up are the thread_info
				460	flag operations. Routines such as test_and_set_ti_thread_flag() chop
				461	the return value into an int. There are other places where things
				462	like this occur as well.
				463
				464	These routines, like the atomic_t counter operations returning values,
Paul E. McKenney	daf1aab	2015-02-02 08:08:25 -0800	[diff] [blame]	465	must provide explicit memory barrier semantics around their execution.
				466	All memory operations before the atomic bit operation call must be
				467	made visible globally before the atomic bit operation is made visible.
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	468	Likewise, the atomic bit operation must be visible globally before any
				469	subsequent memory operation is made visible. For example:
				470
				471	obj->dead = 1;
				472	if (test_and_set_bit(0, &obj->flags))
				473	/* ... */;
				474	obj->killed = 1;
				475
Michael Hayes	a0ebb3f	2006-06-26 18:27:35 +0200	[diff] [blame]	476	The implementation of test_and_set_bit() must guarantee that
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	477	"obj->dead = 1;" is visible to cpus before the atomic memory operation
				478	done by test_and_set_bit() becomes visible. Likewise, the atomic
				479	memory operation done by test_and_set_bit() must become visible before
				480	"obj->killed = 1;" is visible.
				481
				482	Finally there is the basic operation:
				483
				484	int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);
				485
				486	Which returns a boolean indicating if bit "nr" is set in the bitmask
				487	pointed to by "addr".
				488
Peter Zijlstra	1b15611	2014-03-13 19:00:35 +0100	[diff] [blame]	489	If explicit memory barriers are required around {set,clear}_bit() (which do
				490	not return a value, and thus does not need to provide memory barrier
				491	semantics), two interfaces are provided:
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	492
Peter Zijlstra	1b15611	2014-03-13 19:00:35 +0100	[diff] [blame]	493	void smp_mb__before_atomic(void);
				494	void smp_mb__after_atomic(void);
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	495
				496	They are used as follows, and are akin to their atomic_t operation
				497	brothers:
				498
				499	/* All memory operations before this call will
				500	* be globally visible before the clear_bit().
				501	*/
Peter Zijlstra	1b15611	2014-03-13 19:00:35 +0100	[diff] [blame]	502	smp_mb__before_atomic();
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	503	clear_bit( ... );
				504
				505	/* The clear_bit() will be visible before all
				506	* subsequent memory operations.
				507	*/
Peter Zijlstra	1b15611	2014-03-13 19:00:35 +0100	[diff] [blame]	508	smp_mb__after_atomic();
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	509
Nick Piggin	2633357	2007-10-18 03:06:39 -0700	[diff] [blame]	510	There are two special bitops with lock barrier semantics (acquire/release,
				511	same as spinlocks). These operate in the same way as their non-_lock/unlock
				512	postfixed variants, except that they are to provide acquire/release semantics,
				513	respectively. This means they can be used for bit_spin_trylock and
				514	bit_spin_unlock type operations without specifying any more barriers.
				515
				516	int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
				517	void clear_bit_unlock(unsigned long nr, unsigned long *addr);
				518	void __clear_bit_unlock(unsigned long nr, unsigned long *addr);
				519
				520	The __clear_bit_unlock version is non-atomic, however it still implements
				521	unlock barrier semantics. This can be useful if the lock itself is protecting
				522	the other bits in the word.
				523
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	524	Finally, there are non-atomic versions of the bitmask operations
				525	provided. They are used in contexts where some other higher-level SMP
				526	locking scheme is being used to protect the bitmask, and thus less
				527	expensive non-atomic operations may be used in the implementation.
				528	They have names similar to the above bitmask operation interfaces,
				529	except that two underscores are prefixed to the interface name.
				530
				531	void __set_bit(unsigned long nr, volatile unsigned long *addr);
				532	void __clear_bit(unsigned long nr, volatile unsigned long *addr);
				533	void __change_bit(unsigned long nr, volatile unsigned long *addr);
				534	int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
				535	int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
				536	int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr);
				537
				538	These non-atomic variants also do not require any special memory
				539	barrier semantics.
				540
Paul E. McKenney	daf1aab	2015-02-02 08:08:25 -0800	[diff] [blame]	541	The routines xchg() and cmpxchg() must provide the same exact
				542	memory-barrier semantics as the atomic and bit operations returning
				543	values.
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	544
Boqun Feng	8456799	2015-08-26 19:52:46 +0800	[diff] [blame]	545	Note: If someone wants to use xchg(), cmpxchg() and their variants,
				546	linux/atomic.h should be included rather than asm/cmpxchg.h, unless
				547	the code is in arch/* and can take care of itself.
				548
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	549	Spinlocks and rwlocks have memory barrier expectations as well.
				550	The rule to follow is simple:
				551
				552	1) When acquiring a lock, the implementation must make it globally
				553	visible before any subsequent memory operation.
				554
				555	2) When releasing a lock, the implementation must make it such that
				556	all previous memory operations are globally visible before the
				557	lock release.
				558
				559	Which finally brings us to _atomic_dec_and_lock(). There is an
				560	architecture-neutral version implemented in lib/dec_and_lock.c,
				561	but most platforms will wish to optimize this in assembler.
				562
				563	int _atomic_dec_and_lock(atomic_t atomic, spinlock_t lock);
				564
				565	Atomically decrement the given counter, and if will drop to zero
				566	atomically acquire the given spinlock and perform the decrement
				567	of the counter to zero. If it does not drop to zero, do nothing
				568	with the spinlock.
				569
				570	It is actually pretty simple to get the memory barrier correct.
				571	Simply satisfy the spinlock grab requirements, which is make
				572	sure the spinlock operation is globally visible before any
				573	subsequent memory operation.
				574
				575	We can demonstrate this operation more clearly if we define
				576	an abstract atomic operation:
				577
				578	long cas(long *mem, long old, long new);
				579
				580	"cas" stands for "compare and swap". It atomically:
				581
				582	1) Compares "old" with the value currently at "mem".
				583	2) If they are equal, "new" is written to "mem".
				584	3) Regardless, the current value at "mem" is returned.
				585
				586	As an example usage, here is what an atomic counter update
				587	might look like:
				588
				589	void example_atomic_inc(long *counter)
				590	{
				591	long old, new, ret;
				592
				593	while (1) {
				594	old = *counter;
				595	new = old + 1;
				596
				597	ret = cas(counter, old, new);
				598	if (ret == old)
				599	break;
				600	}
				601	}
				602
				603	Let's use cas() in order to build a pseudo-C atomic_dec_and_lock():
				604
				605	int _atomic_dec_and_lock(atomic_t atomic, spinlock_t lock)
				606	{
				607	long old, new, ret;
				608	int went_to_zero;
				609
				610	went_to_zero = 0;
				611	while (1) {
				612	old = atomic_read(atomic);
				613	new = old - 1;
				614	if (new == 0) {
				615	went_to_zero = 1;
				616	spin_lock(lock);
				617	}
				618	ret = cas(atomic, old, new);
				619	if (ret == old)
				620	break;
				621	if (went_to_zero) {
				622	spin_unlock(lock);
				623	went_to_zero = 0;
				624	}
				625	}
				626
				627	return went_to_zero;
				628	}
				629
				630	Now, as far as memory barriers go, as long as spin_lock()
				631	strictly orders all subsequent memory operations (including
				632	the cas()) with respect to itself, things will be fine.
				633
Michael Hayes	a0ebb3f	2006-06-26 18:27:35 +0200	[diff] [blame]	634	Said another way, _atomic_dec_and_lock() must guarantee that
Linus Torvalds	1da177e	2005-04-16 15:20:36 -0700	[diff] [blame]	635	a counter dropping to zero is never made visible before the
				636	spinlock being acquired.
				637
				638	Note that this also means that for the case where the counter
				639	is not dropping to zero, there are no memory ordering
				640	requirements.