==============================================
LLVM Atomic Instructions and Concurrency Guide
==============================================

.. contents::
   :local:

Introduction
============

LLVM supports instructions which are well-defined in the presence of threads and
asynchronous signals.

The atomic instructions are designed specifically to provide readable IR and
optimized code generation for the following:

* The C++11 ``<atomic>`` header. (`C++11 draft available here
  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
  <http://www.open-std.org/jtc1/sc22/wg14/>`_.)

* Proper semantics for Java-style memory, for both ``volatile`` and regular
  shared variables. (`Java Specification
  <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_)

* gcc-compatible ``__sync_*`` builtins. (`Description
  <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_)

* Other scenarios with atomic semantics, including ``static`` variables with
  non-trivial constructors in C++.

Atomic and volatile in the IR are orthogonal; "volatile" is the C/C++ volatile,
which ensures that every volatile load and store happens and is performed in the
stated order. A couple of examples: if a SequentiallyConsistent store is
immediately followed by another SequentiallyConsistent store to the same
address, the first store can be erased. This transformation is not allowed for a
pair of volatile stores. On the other hand, a non-volatile non-atomic load can
be moved across a volatile load freely, but not an Acquire load.

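The two store sequences contrasted above can be sketched in C11, assuming a
compiler that provides ``<stdatomic.h>``. ``atomic_store`` defaults to
``memory_order_seq_cst``, so clang typically lowers it to a ``store atomic ...
seq_cst``; the variable and function names are hypothetical. Under the rule
above, the optimizer is permitted (though not required) to drop the first
atomic store, while both volatile stores must be emitted, in order.

.. code-block:: c

  #include <stdatomic.h>

  /* Hypothetical globals used only for this illustration. */
  atomic_int shared_atomic;
  volatile int shared_volatile;

  void two_seq_cst_stores(void) {
      /* Two seq_cst stores to the same address with nothing in between:
         the first one may be erased by the optimizer. */
      atomic_store(&shared_atomic, 1);
      atomic_store(&shared_atomic, 2);
  }

  void two_volatile_stores(void) {
      /* Volatile: both stores must happen, in this order. */
      shared_volatile = 1;
      shared_volatile = 2;
  }

Within a single thread the final values are the same either way; the
difference is only in what another observer may see.
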
This document is intended as a guide for anyone writing a frontend for LLVM or
working on optimization passes for LLVM, explaining how to deal with
instructions with special semantics in the presence of concurrency. It is not
intended to be a precise guide to the semantics; the details can get extremely
complicated and unreadable, and are not usually necessary.

.. _Optimization outside atomic:

Optimization outside atomic
===========================

The basic ``'load'`` and ``'store'`` allow a variety of optimizations, but can
lead to undefined results in a concurrent environment; see `NotAtomic`_. This
section describes the one optimizer restriction which applies in concurrent
environments. It gets a more extended description because any optimization
dealing with stores needs to be aware of it.

From the optimizer's point of view, the rule is that if there are not any
instructions with atomic ordering involved, concurrency does not matter, with
one exception: if a variable might be visible to another thread or signal
handler, a store cannot be inserted along a path where it might not execute
otherwise. Take the following example:

.. code-block:: c

  /* C code, for readability; run through clang -O2 -S -emit-llvm to get
     equivalent IR */
  int x;
  void f(int* a) {
    for (int i = 0; i < 100; i++) {
      if (a[i])
        x += 1;
    }
  }

The following is equivalent in non-concurrent situations:

.. code-block:: c

  int x;
  void f(int* a) {
    int xtemp = x;
    for (int i = 0; i < 100; i++) {
      if (a[i])
        xtemp += 1;
    }
    x = xtemp;
  }

However, LLVM is not allowed to transform the former to the latter: the latter
stores to ``x`` even when every ``a[i]`` is zero, so it could indirectly
introduce undefined behavior if another thread can access ``x`` at the same
time. (This example is particularly of interest because before the concurrency
model was implemented, LLVM would perform this transformation.)

Note that speculative loads are allowed; a load which is part of a race returns
``undef``, but does not have undefined behavior.

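One way to keep the benefit of promoting ``x`` to a local without inventing a
store is to remember whether the original loop would have stored at all. The
sketch below is a hypothetical illustration of that idea (``f_promoted`` and
``stored`` are made-up names, and this is not a description of what any
particular LLVM pass emits). It loads ``x`` unconditionally, which is a
speculative load and therefore allowed, but writes ``x`` only on paths where
``f`` also wrote it.

.. code-block:: c

  int x;

  /* Same single-threaded behavior as f(), with x promoted to a local. */
  void f_promoted(int *a) {
      int xtemp = x;      /* speculative load: allowed even if racy */
      int stored = 0;     /* would the original loop have stored to x? */
      for (int i = 0; i < 100; i++) {
          if (a[i]) {
              xtemp += 1;
              stored = 1;
          }
      }
      if (stored)         /* store only where f() would have stored */
          x = xtemp;
  }

When every ``a[i]`` is zero, this version never writes ``x``, which is exactly
the property the unconditional ``x = xtemp`` lacks.
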
Atomic instructions
===================

For cases where simple loads and stores are not sufficient, LLVM provides
various atomic instructions. The exact guarantees provided depend on the
ordering; see `Atomic orderings`_.

``load atomic`` and ``store atomic`` provide the same basic functionality as
non-atomic loads and stores, but provide additional guarantees in situations
where threads and signals are involved.

``cmpxchg`` and ``atomicrmw`` are essentially like an atomic load followed by an
atomic store (where the store is conditional for ``cmpxchg``), but no other
memory operation can happen on any thread between the load and store.

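In C11 terms, ``atomic_fetch_add`` is the kind of operation that maps onto
``atomicrmw add``, and ``atomic_compare_exchange_strong`` onto ``cmpxchg``;
clang typically lowers them that way, though the exact IR depends on the
compiler and target. The sketch below (function names hypothetical) shows the
load/store pairing described above: each call observes the old value and
writes the new one as a single indivisible step.

.. code-block:: c

  #include <stdatomic.h>
  #include <stdbool.h>

  /* Read-modify-write: returns the value before the add. */
  int bump(atomic_int *p) {
      return atomic_fetch_add(p, 1);
  }

  /* Compare-and-swap: stores `desired` only if *p still equals
     `expected`. The value actually found is reported via *observed. */
  bool swap_if(atomic_int *p, int expected, int desired, int *observed) {
      int e = expected;
      bool ok = atomic_compare_exchange_strong(p, &e, desired);
      *observed = e;  /* unchanged on success, current value on failure */
      return ok;
  }
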
A ``fence`` provides Acquire and/or Release ordering which is not part of
another operation; it is normally used along with Monotonic memory operations.
A Monotonic load followed by an Acquire fence is roughly equivalent to an
Acquire load, and a Monotonic store following a Release fence is roughly
equivalent to a Release store. SequentiallyConsistent fences behave as both
an Acquire and a Release fence, and offer some additional complicated
guarantees; see the C++11 standard for details.

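The fence pairing described above can be sketched with C11 fences and relaxed
accesses (``memory_order_relaxed`` corresponds to Monotonic). This is a
single-producer, single-consumer sketch; ``payload``, ``ready``, ``publish``
and ``try_consume`` are hypothetical names. The Release fence before the
relaxed store of ``ready`` plays the role of a Release store, and the Acquire
fence after the relaxed load plays the role of an Acquire load, so a consumer
that sees ``ready == 1`` also sees the plain write to ``payload``.

.. code-block:: c

  #include <stdatomic.h>
  #include <stdbool.h>

  int payload;          /* ordinary (NotAtomic) data */
  atomic_int ready;     /* flag accessed only with Monotonic operations */

  void publish(int value) {
      payload = value;                                /* plain store */
      atomic_thread_fence(memory_order_release);      /* Release fence */
      atomic_store_explicit(&ready, 1, memory_order_relaxed);
  }

  bool try_consume(int *out) {
      if (atomic_load_explicit(&ready, memory_order_relaxed) != 1)
          return false;
      atomic_thread_fence(memory_order_acquire);      /* Acquire fence */
      *out = payload;   /* ordered after the producer's plain store */
      return true;
  }
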
Frontends generating atomic instructions generally need to be aware of the
target to some degree; atomic instructions are guaranteed to be lock-free, and
therefore an instruction which is wider than the target natively supports can be
impossible to generate.

.. _Atomic orderings:

Atomic orderings
================

In order to achieve a balance between performance and necessary guarantees,
there are six levels of atomicity. They are listed in order of strength; each
level includes all the guarantees of the previous level except for
Acquire/Release. (See also `LangRef Ordering <LangRef.html#ordering>`_.)

.. _NotAtomic:

NotAtomic
---------

NotAtomic is the obvious one: a load or store which is not atomic. (This isn't
really a level of atomicity, but is listed here for comparison.) This is
essentially a regular load or store. If there is a race on a given memory
location, loads from that location return undef.

Relevant standard
  This is intended to match shared variables in C/C++, and to be used in any
  other context where memory access is necessary, and a race is impossible. (The
  precise definition is in `LangRef Memory Model <LangRef.html#memmodel>`_.)

Notes for frontends
  The rule is essentially that all memory accessed with basic loads and stores
  by multiple threads should be protected by a lock or other synchronization;
  otherwise, you are likely to run into undefined behavior. If your frontend is
  for a "safe" language like Java, use Unordered to load and store any shared
  variable. Note that NotAtomic volatile loads and stores are not properly
  atomic; do not try to use them as a substitute. (Per the C/C++ standards,
  volatile does provide some limited guarantees around asynchronous signals, but
  atomics are generally a better solution.)

Notes for optimizers
  Introducing loads to shared variables along a codepath where they would not
  otherwise exist is allowed; introducing stores to shared variables is not. See
  `Optimization outside atomic`_.

Notes for code generation
  The one interesting restriction here is that it is not allowed to write to
  bytes outside of the bytes relevant to a store. This is mostly relevant to
  unaligned stores: it is not allowed in general to convert an unaligned store
  into two aligned stores of the same width as the unaligned store. Backends are
  also expected to generate an i8 store as an i8 store, and not an instruction
  which writes to surrounding bytes. (If you are writing a backend for an
  architecture which cannot satisfy these restrictions and cares about
  concurrency, please send an email to llvm-dev.)

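To see why an i8 store must not be widened, consider the sketch below. It is a
deterministic, single-threaded simulation rather than real concurrent code: the
hypothetical ``store_byte_widened`` models a lowering that loads the
surrounding four bytes, patches one, and writes all four back, and its
``other``/``other_value`` parameters stand in for a second thread storing to a
neighboring byte between that load and store. The neighbor's store is silently
lost, even though the program never wrote to that byte.

.. code-block:: c

  #include <stdint.h>
  #include <string.h>

  /* Four adjacent bytes, e.g. char fields owned by different threads. */
  typedef struct { uint8_t b[4]; } bytes4;

  /* Correct lowering: an i8 store writes exactly one byte. */
  void store_byte(bytes4 *m, int i, uint8_t v) {
      m->b[i] = v;
  }

  /* Hypothetical illegal lowering: read-modify-write of the whole word. */
  void store_byte_widened(bytes4 *m, int i, uint8_t v,
                          int other, uint8_t other_value) {
      uint8_t word[4];
      memcpy(word, m->b, sizeof word);  /* load all four bytes */
      m->b[other] = other_value;        /* simulated store by another thread */
      word[i] = v;                      /* patch our byte in the stale copy */
      memcpy(m->b, word, sizeof word);  /* write back: clobbers m->b[other] */
  }
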
Unordered
---------

Unordered is the lowest level of atomicity. It essentially guarantees that races
produce somewhat sane results instead of having undefined behavior. It also
guarantees that the operation is lock-free, so it does not depend on the data
being part of a special atomic structure or depend on a separate per-process
global lock. Note that code generation will fail for unsupported atomic
operations; if you need such an operation, use explicit locking.

Relevant standard
  This is intended to match the Java memory model for shared variables.

Notes for frontends
  This cannot be used for synchronization, but is useful for Java and other
  "safe" languages which need to guarantee that the generated code never
  exhibits undefined behavior. Note that this guarantee is cheap on common
  platforms for loads and stores of a native width, but can be expensive or
  unavailable for wider accesses, like a 64-bit store on ARM. (A frontend for
  Java or other "safe" languages would normally split a 64-bit store on ARM
  into two 32-bit unordered stores.)

Notes for optimizers
  In terms of the optimizer, this prohibits any transformation that transforms a
  single load into multiple loads, transforms a store into multiple stores,
  narrows a store, or stores a value which would not be stored otherwise. Some
  examples of unsafe optimizations are narrowing an assignment into a bitfield,
  rematerializing a load, and turning loads and stores into a memcpy
  call. Reordering unordered operations is safe, though, and optimizers should
  take advantage of that because unordered operations are common in languages
  that need them.

Notes for code generation
  These operations are required to be atomic in the sense that if you use
  unordered loads and unordered stores, a load cannot see a value which was
  never stored. A normal load or store instruction is usually sufficient, but
  note that an unordered load or store cannot be split into multiple
  instructions (or an instruction which does multiple memory operations, like
  ``LDRD`` on ARM without LPAE, or not naturally-aligned ``LDRD`` on LPAE ARM).

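C has no direct spelling for Unordered, so the sketch below uses
``memory_order_relaxed`` (Monotonic) accesses as a stand-in; that is stronger
than necessary but still illustrates the frontend strategy mentioned above. A
hypothetical Java-style ``long`` field is stored as two independent 32-bit
halves, each of which is a single atomic access. A racing reader may observe a
mix of an old and a new half (Java permits such tearing for non-volatile
``long``), but each half is always a value that some store actually wrote.

.. code-block:: c

  #include <stdatomic.h>
  #include <stdint.h>

  /* Hypothetical 64-bit field split into two independently atomic halves. */
  typedef struct {
      _Atomic uint32_t lo;
      _Atomic uint32_t hi;
  } split_long;

  void store_long(split_long *f, uint64_t v) {
      /* Two separate 32-bit stores instead of one 64-bit store. */
      atomic_store_explicit(&f->lo, (uint32_t)v, memory_order_relaxed);
      atomic_store_explicit(&f->hi, (uint32_t)(v >> 32), memory_order_relaxed);
  }

  uint64_t load_long(split_long *f) {
      /* Each half is read atomically; the pair as a whole is not. */
      uint32_t lo = atomic_load_explicit(&f->lo, memory_order_relaxed);
      uint32_t hi = atomic_load_explicit(&f->hi, memory_order_relaxed);
      return ((uint64_t)hi << 32) | lo;
  }
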
Monotonic
---------

Monotonic is the weakest level of atomicity that can be used in synchronization
primitives, although it does not provide any general synchronization. It
essentially guarantees that if you take all the operations affecting a specific
address, a consistent ordering exists.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
  standards for the exact definition.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution. The
  guarantees in terms of synchronization are very weak, so make sure these are
  only used in a pattern which you know is correct. Generally, these would
  either be used for atomic operations which do not protect other memory (like
  an atomic counter), or along with a ``fence``.

Notes for optimizers
  In terms of the optimizer, this can be treated as a read+write on the relevant
  memory location (and alias analysis will take advantage of that). In addition,
  it is legal to reorder non-atomic and Unordered loads around Monotonic
  loads. CSE/DSE and a few other optimizations are allowed, but Monotonic
  operations are unlikely to be used in ways which would make those
  optimizations useful.

Notes for code generation
  Code generation is essentially the same as that for unordered for loads and
  stores. No fences are required. ``cmpxchg`` and ``atomicrmw`` are required
  to appear as a single operation.

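The atomic-counter pattern mentioned in the frontend notes can be sketched in
C11 with ``memory_order_relaxed``, which corresponds to Monotonic. The counter
does not guard any other memory, so nothing beyond the per-address total order
is needed: concurrent increments are never lost, but reading the counter says
nothing about the state of other variables. The names ``hits``,
``record_hit`` and ``hit_count`` are hypothetical.

.. code-block:: c

  #include <stdatomic.h>

  atomic_long hits;   /* statistics counter; protects no other data */

  void record_hit(void) {
      /* Typically lowered to a single atomicrmw add ... monotonic. */
      atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
  }

  long hit_count(void) {
      return atomic_load_explicit(&hits, memory_order_relaxed);
  }
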
Acquire
-------

Acquire provides a barrier of the sort necessary to acquire a lock to access
other memory with normal loads and stores.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
  used for C++11/C11 ``memory_order_consume``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. It is
  also possible to move stores from before an Acquire load or read-modify-write
  operation to after it, and move non-Acquire loads from before an Acquire
  operation to after it.

Notes for code generation
  Architectures with weak memory ordering (essentially everything relevant today
  except x86 and SPARC) require some sort of fence to maintain the Acquire
  semantics. The precise fences required vary widely by architecture, but for
  a simple implementation, most architectures provide a barrier which is strong
  enough for everything (``dmb`` on ARM, ``sync`` on PowerPC, etc.). Putting
  such a fence after the equivalent Monotonic operation is sufficient to
  maintain Acquire semantics for a memory operation.

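Because Acquire only means something when paired with Release, the smallest
useful example involves both. The C11 sketch below (names hypothetical)
publishes a plain ``int`` with a Release store to a flag and reads it back
after an Acquire load of that flag; ``memory_order_release`` and
``memory_order_acquire`` correspond to the LLVM orderings of the same names.
The consumer reads ``data`` with an ordinary load, and only after observing
``flag == 1``.

.. code-block:: c

  #include <stdatomic.h>
  #include <stdbool.h>

  int data;           /* plain shared data */
  atomic_int flag;    /* 0 = empty, 1 = data is ready */

  void produce(int value) {
      data = value;
      /* Release: the store to data cannot move after this store. */
      atomic_store_explicit(&flag, 1, memory_order_release);
  }

  bool consume(int *out) {
      /* Acquire: the load of data below cannot move before this load. */
      if (atomic_load_explicit(&flag, memory_order_acquire) != 1)
          return false;
      *out = data;
      return true;
  }

If either side were weakened to ``memory_order_relaxed``, the plain read of
``data`` would become a data race.
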
Release
-------

Release is similar to Acquire, but with a barrier of the sort necessary to
release a lock.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_release``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Release only provides a semantic guarantee when paired with an Acquire
  operation.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. It is
  also possible to move loads from after a Release store or read-modify-write
  operation to before it, and move non-Release stores from after a Release
  operation to before it.

Notes for code generation
  See the section on Acquire; a fence before the relevant operation is usually
  sufficient for Release. Note that a store-store fence is not sufficient to
  implement Release semantics; store-store fences are generally not exposed to
  IR because they are extremely difficult to use correctly.

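A lock is the canonical Acquire/Release pairing: taking the lock needs an
Acquire read-modify-write so the critical section cannot move above it, and
releasing it needs a Release store so the critical section cannot move below
it. The C11 test-and-set spinlock below is a minimal sketch (names are
hypothetical, and it has no backoff or fairness); the counter it protects is
accessed with plain NotAtomic loads and stores.

.. code-block:: c

  #include <stdatomic.h>

  typedef struct { atomic_flag held; } spinlock;

  void spin_lock(spinlock *l) {
      /* Acquire RMW: spin until we flip the flag from clear to set. */
      while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
          ;
  }

  void spin_unlock(spinlock *l) {
      /* Release store: publishes everything written while holding the lock. */
      atomic_flag_clear_explicit(&l->held, memory_order_release);
  }

  int counter;   /* plain data, only touched while holding the lock */

  void locked_increment(spinlock *l) {
      spin_lock(l);
      counter += 1;
      spin_unlock(l);
  }
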
AcquireRelease
--------------

AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
barrier (for fences and operations which both read and write memory).

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_acq_rel``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation, and vice versa.

Notes for optimizers
  In general, optimizers should treat this like a nothrow call; the possible
  optimizations are usually not interesting.

Notes for code generation
  This operation has Acquire and Release semantics; see the sections on Acquire
  and Release.

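A typical use of an AcquireRelease read-modify-write is dropping a reference
count: the Release half makes each owner's earlier writes visible to whichever
thread ends up freeing the object, and the Acquire half makes that last owner
see all of those writes before it frees. The C11 sketch below uses
``memory_order_acq_rel`` for the decrement; the ``object`` type and its
functions are hypothetical, and a common alternative is a Release decrement
followed by an Acquire fence only in the freeing branch.

.. code-block:: c

  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdlib.h>

  typedef struct {
      atomic_int refs;
      int value;
  } object;

  object *object_new(int value) {
      object *o = malloc(sizeof *o);
      if (!o)
          return NULL;
      atomic_init(&o->refs, 1);
      o->value = value;
      return o;
  }

  void object_retain(object *o) {
      /* Taking another reference needs no ordering: Monotonic suffices. */
      atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
  }

  /* Returns true if this call dropped the last reference and freed o. */
  bool object_release(object *o) {
      if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_acq_rel) == 1) {
          free(o);
          return true;
      }
      return false;
  }
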
SequentiallyConsistent
----------------------

SequentiallyConsistent (``seq_cst`` in IR) provides Acquire semantics for loads
and Release semantics for stores. Additionally, it guarantees that a total
ordering exists between all SequentiallyConsistent operations.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
  the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.

Notes for frontends
  If a frontend is exposing atomic operations, these are much easier to reason
  about for the programmer than other kinds of operations, and using them is
  generally a practical performance tradeoff.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. For
  SequentiallyConsistent loads and stores, the same reorderings are allowed as
  for Acquire loads and Release stores, except that SequentiallyConsistent
  operations may not be reordered with respect to one another.

Notes for code generation
  SequentiallyConsistent loads minimally require the same barriers as Acquire
  operations and SequentiallyConsistent stores require Release
  barriers. Additionally, the code generator must enforce ordering between
  SequentiallyConsistent stores followed by SequentiallyConsistent loads. This
  is usually done by emitting either a full fence before the loads or a full
  fence after the stores; which is preferred varies by architecture.

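The store-followed-by-load requirement is what makes Dekker-style flag
handshakes work. In the C11 sketch below (names hypothetical), each of two
threads announces its intent with a SequentiallyConsistent store and then
checks the other thread's flag with a SequentiallyConsistent load. Because all
of these operations fall into one total order, the two threads cannot both see
the other's flag still clear. With only Release stores and Acquire loads, each
load could be satisfied before the other thread's store became visible, and
both threads could proceed.

.. code-block:: c

  #include <stdatomic.h>
  #include <stdbool.h>

  atomic_bool wants[2];   /* wants[i]: thread i wants to enter */

  /* Called by thread `self` (0 or 1). Returns true if it may proceed.
     With seq_cst, both threads may be refused, but never both admitted. */
  bool try_enter(int self) {
      atomic_store(&wants[self], true);        /* seq_cst store */
      return !atomic_load(&wants[1 - self]);   /* seq_cst load  */
  }

  void leave(int self) {
      atomic_store(&wants[self], false);
  }
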
Atomics and IR optimization
===========================

Predicates for optimizer writers to query:

* ``isSimple()``: A load or store which is not volatile or atomic. This is
  what, for example, memcpyopt would check for operations it might transform.

* ``isUnordered()``: A load or store which is not volatile and at most
  Unordered. This would be checked, for example, by LICM before hoisting an
  operation.

* ``mayReadFromMemory()``/``mayWriteToMemory()``: Existing predicates, but note
  that they return true for any operation which is volatile or at least
  Monotonic.

* ``isStrongerThan`` / ``isAtLeastOrStrongerThan``: These are predicates on
  orderings. They can be useful for passes that are aware of atomics, for
  example to do DSE across a single atomic access, but not across a
  release-acquire pair (see MemoryDependencyAnalysis for an example of this).

* Alias analysis: Note that AA will return ModRef for anything Acquire or
  Release, and for the address accessed by any Monotonic operation.

				374
Bill Wendling	2908947	2012-06-29 09:00:01 +0000	[diff] [blame]	375	* Alias analysis: Note that AA will return ModRef for anything Acquire or
				376	Release, and for the address accessed by any Monotonic operation.
				377
				378	To support optimizing around atomic operations, make sure you are using the
				379	right predicates; everything should work if that is done. If your pass should
				380	optimize some atomic operations (Unordered operations in particular), make sure
				381	it doesn't replace an atomic load or store with a non-atomic operation.
				382
				383	Some examples of how optimizations interact with various kinds of atomic
				384	operations:
				385
				386	* ``memcpyopt``: An atomic operation cannot be optimized into part of a
				387	memcpy/memset, including unordered loads/stores. It can pull operations
				388	across some atomic operations.
				389
				390	* LICM: Unordered loads/stores can be moved out of a loop. It just treats
				391	monotonic operations like a read+write to a memory location, and anything
				392	stricter than that like a nothrow call.
				393
				394	* DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores can
				395	be DSE'ed in some cases, but it's tricky to reason about, and not especially
Robin Morisset	e83f59e	2014-10-03 01:04:20 +0000	[diff] [blame]	396	important. It is possible in some case for DSE to operate across a stronger
				397	atomic operation, but it is fairly tricky. DSE delegates this reasoning to
				398	MemoryDependencyAnalysis (which is also used by other passes like GVN).
Bill Wendling	2908947	2012-06-29 09:00:01 +0000	[diff] [blame]	399
				400	* Folding a load: Any atomic load from a constant global can be constant-folded,
				401	because it cannot be observed. Similar reasoning allows scalarrepl with
				402	atomic loads and stores.
				403
Atomics and Codegen
===================

Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
On architectures which use barrier instructions for all atomic ordering (like
ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if
``setInsertFencesForAtomic()`` was used.

The MachineMemOperand for all atomic operations is currently marked as volatile;
this is not correct in the IR sense of volatile, but CodeGen handles anything
marked volatile very conservatively. This should get fixed at some point.

Common architectures have some way of representing at least a pointer-sized
lock-free ``cmpxchg``; such an operation can be used to implement all the other
atomic operations which can be represented in IR up to that size. Backends are
expected to implement all those operations, but not operations which cannot be
implemented in a lock-free manner. It is expected that backends will give an
error when given an operation which cannot be implemented. (The LLVM code
generator is not very helpful here at the moment, but hopefully that will
change.)

On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
fences generate an ``MFENCE``, other fences do not cause any code to be
generated. ``cmpxchg`` uses the ``LOCK CMPXCHG`` instruction. ``atomicrmw xchg``
uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all
other ``atomicrmw`` operations generate a loop with ``LOCK CMPXCHG``. Depending
on the users of the result, some ``atomicrmw`` operations can be translated into
operations like ``LOCK AND``, but that does not work in general.

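The ``LOCK CMPXCHG`` loop used for the remaining ``atomicrmw`` operations has
the same shape as a compare-and-swap loop written in C11. The sketch below
implements a hypothetical ``fetch_nand`` (C11 has no ``atomic_fetch_nand``,
although LLVM IR has ``atomicrmw nand``): read the current value, compute the
new one, and retry the ``cmpxchg`` until no other thread has changed the value
in between. It illustrates the technique and is not the code LLVM emits.

.. code-block:: c

  #include <stdatomic.h>

  /* Atomically performs *p = ~(*p & mask) and returns the old value. */
  unsigned fetch_nand(atomic_uint *p, unsigned mask) {
      unsigned old = atomic_load_explicit(p, memory_order_relaxed);
      /* On failure, compare_exchange_weak writes the current value into
         old, so the next iteration recomputes from fresh data. */
      while (!atomic_compare_exchange_weak(p, &old, ~(old & mask)))
          ;
      return old;
  }
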
On ARM (before v8), MIPS, and many other RISC architectures, Acquire, Release,
and SequentiallyConsistent semantics require barrier instructions for every such
operation. Loads and stores generate normal instructions. ``cmpxchg`` and
``atomicrmw`` can be represented using a loop with LL/SC-style instructions
which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
on ARM, etc.).

It is often easiest for backends to use AtomicExpandPass to lower some of the
atomic constructs. Here are some lowerings it can do:

* cmpxchg -> loop with load-linked/store-conditional
  by overriding ``shouldExpandAtomicCmpXchgInIR()``, ``emitLoadLinked()``,
  ``emitStoreConditional()``
* large loads/stores -> ll-sc/cmpxchg
  by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()``
* strong atomic accesses -> monotonic accesses + fences
  by using ``setInsertFencesForAtomic()`` and overriding ``emitLeadingFence()``
  and ``emitTrailingFence()``
* atomic rmw -> loop with cmpxchg or load-linked/store-conditional
  by overriding ``expandAtomicRMWInIR()``

For an example of all of these, look at the ARM backend.