|  | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" | 
|  | "http://www.w3.org/TR/html4/strict.dtd"> | 
|  | <html> | 
|  | <head> | 
|  | <title>LLVM Atomic Instructions and Concurrency Guide</title> | 
|  | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | 
|  | <link rel="stylesheet" href="llvm.css" type="text/css"> | 
|  | </head> | 
|  | <body> | 
|  |  | 
|  | <h1> | 
|  | LLVM Atomic Instructions and Concurrency Guide | 
|  | </h1> | 
|  |  | 
|  | <ol> | 
|  | <li><a href="#introduction">Introduction</a></li> | 
|  | <li><a href="#outsideatomic">Optimization outside atomic</a></li> | 
|  | <li><a href="#atomicinst">Atomic instructions</a></li> | 
|  | <li><a href="#ordering">Atomic orderings</a></li> | 
|  | <li><a href="#iropt">Atomics and IR optimization</a></li> | 
|  | <li><a href="#codegen">Atomics and Codegen</a></li> | 
|  | </ol> | 
|  |  | 
|  | <div class="doc_author"> | 
|  | <p>Written by Eli Friedman</p> | 
|  | </div> | 
|  |  | 
|  | <!-- *********************************************************************** --> | 
|  | <h2> | 
|  | <a name="introduction">Introduction</a> | 
|  | </h2> | 
|  | <!-- *********************************************************************** --> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>Historically, LLVM has not had very strong support for concurrency; some | 
|  | minimal intrinsics were provided, and <code>volatile</code> was used in some | 
|  | cases to achieve rough semantics in the presence of concurrency.  However, this | 
|  | is changing; there are now new instructions which are well-defined in the | 
|  | presence of threads and asynchronous signals, and the model for existing | 
|  | instructions has been clarified in the IR.</p> | 
|  |  | 
|  | <p>The atomic instructions are designed specifically to provide readable IR and | 
|  | optimized code generation for the following:</p> | 
|  | <ul> | 
|  | <li>The new C++0x <code><atomic></code> header. | 
|  | (<a href="http://www.open-std.org/jtc1/sc22/wg21/">C++0x draft available here</a>.) | 
|  | (<a href="http://www.open-std.org/jtc1/sc22/wg14/">C1x draft available here</a>)</li> | 
|  | <li>Proper semantics for Java-style memory, for both <code>volatile</code> and | 
|  | regular shared variables. | 
|  | (<a href="http://java.sun.com/docs/books/jls/third_edition/html/memory.html">Java Specification</a>)</li> | 
|  | <li>gcc-compatible <code>__sync_*</code> builtins. | 
|  | (<a href="http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html">Description</a>)</li> | 
|  | <li>Other scenarios with atomic semantics, including <code>static</code> | 
|  | variables with non-trivial constructors in C++.</li> | 
|  | </ul> | 
|  |  | 
|  | <p>Atomic and volatile in the IR are orthogonal; "volatile" is the C/C++ | 
|  | volatile, which ensures that every volatile load and store happens and is | 
|  | performed in the stated order.  A couple examples: if a | 
|  | SequentiallyConsistent store is immediately followed by another | 
|  | SequentiallyConsistent store to the same address, the first store can | 
|  | be erased. This transformation is not allowed for a pair of volatile | 
|  | stores. On the other hand, a non-volatile non-atomic load can be moved | 
|  | across a volatile load freely, but not an Acquire load.</p> | 
|  |  | 
|  | <p>This document is intended to provide a guide to anyone either writing a | 
|  | frontend for LLVM or working on optimization passes for LLVM with a guide | 
|  | for how to deal with instructions with special semantics in the presence of | 
|  | concurrency.  This is not intended to be a precise guide to the semantics; | 
|  | the details can get extremely complicated and unreadable, and are not | 
|  | usually necessary.</p> | 
|  |  | 
|  | </div> | 
|  |  | 
|  | <!-- *********************************************************************** --> | 
|  | <h2> | 
|  | <a name="outsideatomic">Optimization outside atomic</a> | 
|  | </h2> | 
|  | <!-- *********************************************************************** --> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>The basic <code>'load'</code> and <code>'store'</code> allow a variety of | 
|  | optimizations, but can lead to undefined results in a concurrent environment; | 
|  | see <a href="#o_nonatomic">NonAtomic</a>. This section specifically goes | 
|  | into the one optimizer restriction which applies in concurrent environments, | 
|  | which gets a bit more of an extended description because any optimization | 
|  | dealing with stores needs to be aware of it.</p> | 
|  |  | 
|  | <p>From the optimizer's point of view, the rule is that if there | 
|  | are not any instructions with atomic ordering involved, concurrency does | 
|  | not matter, with one exception: if a variable might be visible to another | 
|  | thread or signal handler, a store cannot be inserted along a path where it | 
|  | might not execute otherwise.  Take the following example:</p> | 
|  |  | 
|  | <pre> | 
|  | /* C code, for readability; run through clang -O2 -S -emit-llvm to get | 
|  | equivalent IR */ | 
|  | int x; | 
|  | void f(int* a) { | 
|  | for (int i = 0; i < 100; i++) { | 
|  | if (a[i]) | 
|  | x += 1; | 
|  | } | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | <p>The following is equivalent in non-concurrent situations:</p> | 
|  |  | 
|  | <pre> | 
|  | int x; | 
|  | void f(int* a) { | 
|  | int xtemp = x; | 
|  | for (int i = 0; i < 100; i++) { | 
|  | if (a[i]) | 
|  | xtemp += 1; | 
|  | } | 
|  | x = xtemp; | 
|  | } | 
|  | </pre> | 
|  |  | 
|  | <p>However, LLVM is not allowed to transform the former to the latter: it could | 
|  | indirectly introduce undefined behavior if another thread can access x at | 
|  | the same time. (This example is particularly of interest because before the | 
|  | concurrency model was implemented, LLVM would perform this | 
|  | transformation.)</p> | 
|  |  | 
|  | <p>Note that speculative loads are allowed; a load which | 
|  | is part of a race returns <code>undef</code>, but does not have undefined | 
|  | behavior.</p> | 
|  |  | 
|  |  | 
|  | </div> | 
|  |  | 
|  | <!-- *********************************************************************** --> | 
|  | <h2> | 
|  | <a name="atomicinst">Atomic instructions</a> | 
|  | </h2> | 
|  | <!-- *********************************************************************** --> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>For cases where simple loads and stores are not sufficient, LLVM provides | 
|  | various atomic instructions. The exact guarantees provided depend on the | 
|  | ordering; see <a href="#ordering">Atomic orderings</a></p> | 
|  |  | 
|  | <p><code>load atomic</code> and <code>store atomic</code> provide the same | 
|  | basic functionality as non-atomic loads and stores, but provide additional | 
|  | guarantees in situations where threads and signals are involved.</p> | 
|  |  | 
|  | <p><code>cmpxchg</code> and <code>atomicrmw</code> are essentially like an | 
|  | atomic load followed by an atomic store (where the store is conditional for | 
|  | <code>cmpxchg</code>), but no other memory operation can happen on any thread | 
|  | between the load and store.  Note that LLVM's cmpxchg does not provide quite | 
|  | as many options as the C++0x version.</p> | 
|  |  | 
|  | <p>A <code>fence</code> provides Acquire and/or Release ordering which is not | 
|  | part of another operation; it is normally used along with Monotonic memory | 
|  | operations.  A Monotonic load followed by an Acquire fence is roughly | 
|  | equivalent to an Acquire load.</p> | 
|  |  | 
|  | <p>Frontends generating atomic instructions generally need to be aware of the | 
|  | target to some degree; atomic instructions are guaranteed to be lock-free, | 
|  | and therefore an instruction which is wider than the target natively supports | 
|  | can be impossible to generate.</p> | 
|  |  | 
|  | </div> | 
|  |  | 
|  | <!-- *********************************************************************** --> | 
|  | <h2> | 
|  | <a name="ordering">Atomic orderings</a> | 
|  | </h2> | 
|  | <!-- *********************************************************************** --> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>In order to achieve a balance between performance and necessary guarantees, | 
|  | there are six levels of atomicity. They are listed in order of strength; | 
|  | each level includes all the guarantees of the previous level except for | 
|  | Acquire/Release. (See also <a href="LangRef.html#ordering">LangRef</a>.)</p> | 
|  |  | 
|  | <!-- ======================================================================= --> | 
|  | <h3> | 
|  | <a name="o_notatomic">NotAtomic</a> | 
|  | </h3> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>NotAtomic is the obvious, a load or store which is not atomic. (This isn't | 
|  | really a level of atomicity, but is listed here for comparison.) This is | 
|  | essentially a regular load or store. If there is a race on a given memory | 
|  | location, loads from that location return undef.</p> | 
|  |  | 
|  | <dl> | 
|  | <dt>Relevant standard</dt> | 
|  | <dd>This is intended to match shared variables in C/C++, and to be used | 
|  | in any other context where memory access is necessary, and | 
|  | a race is impossible. (The precise definition is in | 
|  | <a href="LangRef.html#memmodel">LangRef</a>.) | 
|  | <dt>Notes for frontends</dt> | 
|  | <dd>The rule is essentially that all memory accessed with basic loads and | 
|  | stores by multiple threads should be protected by a lock or other | 
|  | synchronization; otherwise, you are likely to run into undefined | 
|  | behavior. If your frontend is for a "safe" language like Java, | 
|  | use Unordered to load and store any shared variable.  Note that NotAtomic | 
|  | volatile loads and stores are not properly atomic; do not try to use | 
|  | them as a substitute. (Per the C/C++ standards, volatile does provide | 
|  | some limited guarantees around asynchronous signals, but atomics are | 
|  | generally a better solution.) | 
|  | <dt>Notes for optimizers</dt> | 
|  | <dd>Introducing loads to shared variables along a codepath where they would | 
|  | not otherwise exist is allowed; introducing stores to shared variables | 
|  | is not. See <a href="#outsideatomic">Optimization outside | 
|  | atomic</a>.</dd> | 
|  | <dt>Notes for code generation</dt> | 
|  | <dd>The one interesting restriction here is that it is not allowed to write | 
|  | to bytes outside of the bytes relevant to a store.  This is mostly | 
|  | relevant to unaligned stores: it is not allowed in general to convert | 
|  | an unaligned store into two aligned stores of the same width as the | 
|  | unaligned store. Backends are also expected to generate an i8 store | 
|  | as an i8 store, and not an instruction which writes to surrounding | 
|  | bytes.  (If you are writing a backend for an architecture which cannot | 
|  | satisfy these restrictions and cares about concurrency, please send an | 
|  | email to llvmdev.)</dd> | 
|  | </dl> | 
|  |  | 
|  | </div> | 
|  |  | 
|  |  | 
|  | <!-- ======================================================================= --> | 
|  | <h3> | 
|  | <a name="o_unordered">Unordered</a> | 
|  | </h3> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>Unordered is the lowest level of atomicity. It essentially guarantees that | 
|  | races produce somewhat sane results instead of having undefined behavior. | 
|  | It also guarantees the operation to be lock-free, so it do not depend on | 
|  | the data being part of a special atomic structure or depend on a separate | 
|  | per-process global lock.  Note that code generation will fail for | 
|  | unsupported atomic operations; if you need such an operation, use explicit | 
|  | locking.</p> | 
|  |  | 
|  | <dl> | 
|  | <dt>Relevant standard</dt> | 
|  | <dd>This is intended to match the Java memory model for shared | 
|  | variables.</dd> | 
|  | <dt>Notes for frontends</dt> | 
|  | <dd>This cannot be used for synchronization, but is useful for Java and | 
|  | other "safe" languages which need to guarantee that the generated | 
|  | code never exhibits undefined behavior. Note that this guarantee | 
|  | is cheap on common platforms for loads of a native width, but can | 
|  | be expensive or unavailable for wider loads, like a 64-bit store | 
|  | on ARM. (A frontend for Java or other "safe" languages would normally | 
|  | split a 64-bit store on ARM into two 32-bit unordered stores.) | 
|  | <dt>Notes for optimizers</dt> | 
|  | <dd>In terms of the optimizer, this prohibits any transformation that | 
|  | transforms a single load into multiple loads, transforms a store | 
|  | into multiple stores, narrows a store, or stores a value which | 
|  | would not be stored otherwise.  Some examples of unsafe optimizations | 
|  | are narrowing an assignment into a bitfield, rematerializing | 
|  | a load, and turning loads and stores into a memcpy call. Reordering | 
|  | unordered operations is safe, though, and optimizers should take | 
|  | advantage of that because unordered operations are common in | 
|  | languages that need them.</dd> | 
|  | <dt>Notes for code generation</dt> | 
|  | <dd>These operations are required to be atomic in the sense that if you | 
|  | use unordered loads and unordered stores, a load cannot see a value | 
|  | which was never stored.  A normal load or store instruction is usually | 
|  | sufficient, but note that an unordered load or store cannot | 
|  | be split into multiple instructions (or an instruction which | 
|  | does multiple memory operations, like <code>LDRD</code> on ARM).</dd> | 
|  | </dl> | 
|  |  | 
|  | </div> | 
|  |  | 
|  | <!-- ======================================================================= --> | 
|  | <h3> | 
|  | <a name="o_monotonic">Monotonic</a> | 
|  | </h3> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>Monotonic is the weakest level of atomicity that can be used in | 
|  | synchronization primitives, although it does not provide any general | 
|  | synchronization. It essentially guarantees that if you take all the | 
|  | operations affecting a specific address, a consistent ordering exists. | 
|  |  | 
|  | <dl> | 
|  | <dt>Relevant standard</dt> | 
|  | <dd>This corresponds to the C++0x/C1x <code>memory_order_relaxed</code>; | 
|  | see those standards for the exact definition. | 
|  | <dt>Notes for frontends</dt> | 
|  | <dd>If you are writing a frontend which uses this directly, use with caution. | 
|  | The guarantees in terms of synchronization are very weak, so make | 
|  | sure these are only used in a pattern which you know is correct. | 
|  | Generally, these would either be used for atomic operations which | 
|  | do not protect other memory (like an atomic counter), or along with | 
|  | a <code>fence</code>.</dd> | 
|  | <dt>Notes for optimizers</dt> | 
|  | <dd>In terms of the optimizer, this can be treated as a read+write on the | 
|  | relevant memory location (and alias analysis will take advantage of | 
|  | that). In addition, it is legal to reorder non-atomic and Unordered | 
|  | loads around Monotonic loads. CSE/DSE and a few other optimizations | 
|  | are allowed, but Monotonic operations are unlikely to be used in ways | 
|  | which would make those optimizations useful.</dd> | 
|  | <dt>Notes for code generation</dt> | 
|  | <dd>Code generation is essentially the same as that for unordered for loads | 
|  | and stores.  No fences are required.  <code>cmpxchg</code> and | 
|  | <code>atomicrmw</code> are required to appear as a single operation.</dd> | 
|  | </dl> | 
|  |  | 
|  | </div> | 
|  |  | 
|  | <!-- ======================================================================= --> | 
|  | <h3> | 
|  | <a name="o_acquire">Acquire</a> | 
|  | </h3> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>Acquire provides a barrier of the sort necessary to acquire a lock to access | 
|  | other memory with normal loads and stores. | 
|  |  | 
|  | <dl> | 
|  | <dt>Relevant standard</dt> | 
|  | <dd>This corresponds to the C++0x/C1x <code>memory_order_acquire</code>. It | 
|  | should also be used for C++0x/C1x <code>memory_order_consume</code>. | 
|  | <dt>Notes for frontends</dt> | 
|  | <dd>If you are writing a frontend which uses this directly, use with caution. | 
|  | Acquire only provides a semantic guarantee when paired with a Release | 
|  | operation.</dd> | 
|  | <dt>Notes for optimizers</dt> | 
|  | <dd>Optimizers not aware of atomics can treat this like a nothrow call. | 
|  | It is also possible to move stores from before an Acquire load | 
|  | or read-modify-write operation to after it, and move non-Acquire | 
|  | loads from before an Acquire operation to after it.</dd> | 
|  | <dt>Notes for code generation</dt> | 
|  | <dd>Architectures with weak memory ordering (essentially everything relevant | 
|  | today except x86 and SPARC) require some sort of fence to maintain | 
|  | the Acquire semantics.  The precise fences required varies widely by | 
|  | architecture, but for a simple implementation, most architectures provide | 
|  | a barrier which is strong enough for everything (<code>dmb</code> on ARM, | 
|  | <code>sync</code> on PowerPC, etc.).  Putting such a fence after the | 
|  | equivalent Monotonic operation is sufficient to maintain Acquire | 
|  | semantics for a memory operation.</dd> | 
|  | </dl> | 
|  |  | 
|  | </div> | 
|  |  | 
|  | <!-- ======================================================================= --> | 
|  | <h3> | 
|  | <a name="o_acquire">Release</a> | 
|  | </h3> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>Release is similar to Acquire, but with a barrier of the sort necessary to | 
|  | release a lock. | 
|  |  | 
|  | <dl> | 
|  | <dt>Relevant standard</dt> | 
|  | <dd>This corresponds to the C++0x/C1x <code>memory_order_release</code>.</dd> | 
|  | <dt>Notes for frontends</dt> | 
|  | <dd>If you are writing a frontend which uses this directly, use with caution. | 
|  | Release only provides a semantic guarantee when paired with a Acquire | 
|  | operation.</dd> | 
|  | <dt>Notes for optimizers</dt> | 
|  | <dd>Optimizers not aware of atomics can treat this like a nothrow call. | 
|  | It is also possible to move loads from after a Release store | 
|  | or read-modify-write operation to before it, and move non-Release | 
|  | stores from after an Release operation to before it.</dd> | 
|  | <dt>Notes for code generation</dt> | 
|  | <dd>See the section on Acquire; a fence before the relevant operation is | 
|  | usually sufficient for Release. Note that a store-store fence is not | 
|  | sufficient to implement Release semantics; store-store fences are | 
|  | generally not exposed to IR because they are extremely difficult to | 
|  | use correctly.</dd> | 
|  | </dl> | 
|  |  | 
|  | </div> | 
|  |  | 
|  | <!-- ======================================================================= --> | 
|  | <h3> | 
|  | <a name="o_acqrel">AcquireRelease</a> | 
|  | </h3> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>AcquireRelease (<code>acq_rel</code> in IR) provides both an Acquire and a | 
|  | Release barrier (for fences and operations which both read and write memory). | 
|  |  | 
|  | <dl> | 
|  | <dt>Relevant standard</dt> | 
|  | <dd>This corresponds to the C++0x/C1x <code>memory_order_acq_rel</code>. | 
|  | <dt>Notes for frontends</dt> | 
|  | <dd>If you are writing a frontend which uses this directly, use with caution. | 
|  | Acquire only provides a semantic guarantee when paired with a Release | 
|  | operation, and vice versa.</dd> | 
|  | <dt>Notes for optimizers</dt> | 
|  | <dd>In general, optimizers should treat this like a nothrow call; the | 
|  | the possible optimizations are usually not interesting.</dd> | 
|  | <dt>Notes for code generation</dt> | 
|  | <dd>This operation has Acquire and Release semantics; see the sections on | 
|  | Acquire and Release.</dd> | 
|  | </dl> | 
|  |  | 
|  | </div> | 
|  |  | 
|  | <!-- ======================================================================= --> | 
|  | <h3> | 
|  | <a name="o_seqcst">SequentiallyConsistent</a> | 
|  | </h3> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>SequentiallyConsistent (<code>seq_cst</code> in IR) provides | 
|  | Acquire semantics for loads and Release semantics for | 
|  | stores. Additionally, it guarantees that a total ordering exists | 
|  | between all SequentiallyConsistent operations. | 
|  |  | 
|  | <dl> | 
|  | <dt>Relevant standard</dt> | 
|  | <dd>This corresponds to the C++0x/C1x <code>memory_order_seq_cst</code>, | 
|  | Java volatile, and the gcc-compatible <code>__sync_*</code> builtins | 
|  | which do not specify otherwise. | 
|  | <dt>Notes for frontends</dt> | 
|  | <dd>If a frontend is exposing atomic operations, these are much easier to | 
|  | reason about for the programmer than other kinds of operations, and using | 
|  | them is generally a practical performance tradeoff.</dd> | 
|  | <dt>Notes for optimizers</dt> | 
|  | <dd>Optimizers not aware of atomics can treat this like a nothrow call. | 
|  | For SequentiallyConsistent loads and stores, the same reorderings are | 
|  | allowed as for Acquire loads and Release stores, except that | 
|  | SequentiallyConsistent operations may not be reordered.</dd> | 
|  | <dt>Notes for code generation</dt> | 
|  | <dd>SequentiallyConsistent loads minimally require the same barriers | 
|  | as Acquire operations and SequentiallyConsistent stores require | 
|  | Release barriers. Additionally, the code generator must enforce | 
|  | ordering between SequentiallyConsistent stores followed by | 
|  | SequentiallyConsistent loads. This is usually done by emitting | 
|  | either a full fence before the loads or a full fence after the | 
|  | stores; which is preferred varies by architecture.</dd> | 
|  | </dl> | 
|  |  | 
|  | </div> | 
|  |  | 
|  | </div> | 
|  |  | 
|  | <!-- *********************************************************************** --> | 
|  | <h2> | 
|  | <a name="iropt">Atomics and IR optimization</a> | 
|  | </h2> | 
|  | <!-- *********************************************************************** --> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>Predicates for optimizer writers to query: | 
|  | <ul> | 
|  | <li>isSimple(): A load or store which is not volatile or atomic.  This is | 
|  | what, for example, memcpyopt would check for operations it might | 
|  | transform.</li> | 
|  | <li>isUnordered(): A load or store which is not volatile and at most | 
|  | Unordered. This would be checked, for example, by LICM before hoisting | 
|  | an operation.</li> | 
|  | <li>mayReadFromMemory()/mayWriteToMemory(): Existing predicate, but note | 
|  | that they return true for any operation which is volatile or at least | 
|  | Monotonic.</li> | 
|  | <li>Alias analysis: Note that AA will return ModRef for anything Acquire or | 
|  | Release, and for the address accessed by any Monotonic operation.</li> | 
|  | </ul> | 
|  |  | 
|  | <p>To support optimizing around atomic operations, make sure you are using | 
|  | the right predicates; everything should work if that is done.  If your | 
|  | pass should optimize some atomic operations (Unordered operations in | 
|  | particular), make sure it doesn't replace an atomic load or store with | 
|  | a non-atomic operation.</p> | 
|  |  | 
|  | <p>Some examples of how optimizations interact with various kinds of atomic | 
|  | operations: | 
|  | <ul> | 
|  | <li>memcpyopt: An atomic operation cannot be optimized into part of a | 
|  | memcpy/memset, including unordered loads/stores.  It can pull operations | 
|  | across some atomic operations. | 
|  | <li>LICM: Unordered loads/stores can be moved out of a loop.  It just treats | 
|  | monotonic operations like a read+write to a memory location, and anything | 
|  | stricter than that like a nothrow call. | 
|  | <li>DSE: Unordered stores can be DSE'ed like normal stores.  Monotonic stores | 
|  | can be DSE'ed in some cases, but it's tricky to reason about, and not | 
|  | especially important. | 
|  | <li>Folding a load: Any atomic load from a constant global can be | 
|  | constant-folded, because it cannot be observed.  Similar reasoning allows | 
|  | scalarrepl with atomic loads and stores. | 
|  | </ul> | 
|  |  | 
|  | </div> | 
|  |  | 
|  | <!-- *********************************************************************** --> | 
|  | <h2> | 
|  | <a name="codegen">Atomics and Codegen</a> | 
|  | </h2> | 
|  | <!-- *********************************************************************** --> | 
|  |  | 
|  | <div> | 
|  |  | 
|  | <p>Atomic operations are represented in the SelectionDAG with | 
|  | <code>ATOMIC_*</code> opcodes.  On architectures which use barrier | 
|  | instructions for all atomic ordering (like ARM), appropriate fences are | 
|  | split out as the DAG is built.</p> | 
|  |  | 
|  | <p>The MachineMemOperand for all atomic operations is currently marked as | 
|  | volatile; this is not correct in the IR sense of volatile, but CodeGen | 
|  | handles anything marked volatile very conservatively.  This should get | 
|  | fixed at some point.</p> | 
|  |  | 
|  | <p>Common architectures have some way of representing at least a pointer-sized | 
|  | lock-free <code>cmpxchg</code>; such an operation can be used to implement | 
|  | all the other atomic operations which can be represented in IR up to that | 
|  | size.  Backends are expected to implement all those operations, but not | 
|  | operations which cannot be implemented in a lock-free manner.  It is | 
|  | expected that backends will give an error when given an operation which | 
|  | cannot be implemented.  (The LLVM code generator is not very helpful here | 
|  | at the moment, but hopefully that will change.)</p> | 
|  |  | 
|  | <p>The implementation of atomics on LL/SC architectures (like ARM) is currently | 
|  | a bit of a mess; there is a lot of copy-pasted code across targets, and | 
|  | the representation is relatively unsuited to optimization (it would be nice | 
|  | to be able to optimize loops involving cmpxchg etc.).</p> | 
|  |  | 
|  | <p>On x86, all atomic loads generate a <code>MOV</code>. | 
|  | SequentiallyConsistent stores generate an <code>XCHG</code>, other stores | 
|  | generate a <code>MOV</code>. SequentiallyConsistent fences generate an | 
|  | <code>MFENCE</code>, other fences do not cause any code to be generated. | 
|  | cmpxchg uses the <code>LOCK CMPXCHG</code> instruction. | 
|  | <code>atomicrmw xchg</code> uses <code>XCHG</code>, | 
|  | <code>atomicrmw add</code> and <code>atomicrmw sub</code> use | 
|  | <code>XADD</code>, and all other <code>atomicrmw</code> operations generate | 
|  | a loop with <code>LOCK CMPXCHG</code>.  Depending on the users of the | 
|  | result, some <code>atomicrmw</code> operations can be translated into | 
|  | operations like <code>LOCK AND</code>, but that does not work in | 
|  | general.</p> | 
|  |  | 
|  | <p>On ARM, MIPS, and many other RISC architectures, Acquire, Release, and | 
|  | SequentiallyConsistent semantics require barrier instructions | 
|  | for every such operation. Loads and stores generate normal instructions. | 
|  | <code>cmpxchg</code> and <code>atomicrmw</code> can be represented using | 
|  | a loop with LL/SC-style instructions which take some sort of exclusive | 
|  | lock on a cache line  (<code>LDREX</code> and <code>STREX</code> on | 
|  | ARM, etc.). At the moment, the IR does not provide any way to represent a | 
|  | weak <code>cmpxchg</code> which would not require a loop.</p> | 
|  | </div> | 
|  |  | 
|  | <!-- *********************************************************************** --> | 
|  |  | 
|  | <hr> | 
|  | <address> | 
|  | <a href="http://jigsaw.w3.org/css-validator/check/referer"><img | 
|  | src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a> | 
|  | <a href="http://validator.w3.org/check/referer"><img | 
|  | src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a> | 
|  |  | 
|  | <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br> | 
|  | Last modified: $Date: 2011-08-09 02:07:00 -0700 (Tue, 09 Aug 2011) $ | 
|  | </address> | 
|  |  | 
|  | </body> | 
|  | </html> |