<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <title>&lt;atomic&gt; design</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
</head>

<body>
<div id="menu">
  <div>
    <a href="http://llvm.org/">LLVM Home</a>
  </div>

  <div class="submenu">
    <label>libc++ Info</label>
    <a href="/index.html">About</a>
  </div>

  <div class="submenu">
    <label>Quick Links</label>
    <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">cfe-dev</a>
    <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits">cfe-commits</a>
    <a href="http://llvm.org/bugs/">Bug Reports</a>
    <a href="http://llvm.org/svn/llvm-project/libcxx/trunk/">Browse SVN</a>
    <a href="http://llvm.org/viewvc/llvm-project/libcxx/trunk/">Browse ViewVC</a>
  </div>
</div>

<div id="content">
<!--*********************************************************************-->
<h1>&lt;atomic&gt; design</h1>
<!--*********************************************************************-->

<p>
The <tt>&lt;atomic&gt;</tt> header is one of the headers most closely coupled
to the compiler. Ideally, invoking any function from <tt>&lt;atomic&gt;</tt>
should result in highly optimized assembly being inserted directly into your
application ... assembly that is not otherwise representable by higher-level C
or C++ expressions. The design of the libc++ <tt>&lt;atomic&gt;</tt> header
started with this goal in mind. A secondary but still very important goal is
that the compiler should have to do minimal work to facilitate the
implementation of <tt>&lt;atomic&gt;</tt>. Without this second goal,
practically speaking, the libc++ <tt>&lt;atomic&gt;</tt> header would be
doomed to be a barely supported, second-class citizen on almost every
platform.
</p>

<p>Goals:</p>

<blockquote><ul>
<li>Optimal code generation for atomic operations</li>
<li>Minimal effort for the compiler to achieve goal 1 on any given platform</li>
<li>Conformance to the C++0X draft standard</li>
</ul></blockquote>

<p>
The purpose of this document is to inform compiler writers of what they need to
do to enable a high-performance libc++ <tt>&lt;atomic&gt;</tt> with minimal
effort.
</p>

<h2>The minimal work that must be done for a conforming <tt>&lt;atomic&gt;</tt></h2>

<p>
The only "atomic" operations that must actually be lock-free in
<tt>&lt;atomic&gt;</tt> are represented by the following compiler intrinsics:
</p>

<blockquote><pre>
__atomic_flag__
__atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    __atomic_flag__ result = *obj;
    *obj = desr;
    return result;
}

void
__atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    *obj = desr;
}
</pre></blockquote>

<p>
Where:
</p>

<blockquote><ul>
<li>
If <tt>__has_feature(__atomic_flag)</tt> evaluates to 1 in the preprocessor,
then the compiler must define <tt>__atomic_flag__</tt> (e.g. as a typedef to
<tt>int</tt>).
</li>
<li>
If <tt>__has_feature(__atomic_flag)</tt> evaluates to 0 in the preprocessor,
then the library defines <tt>__atomic_flag__</tt> as a typedef to <tt>bool</tt>.
</li>
<li>
<p>
To communicate that the above intrinsics are available, the compiler must
arrange for <tt>__has_feature</tt> to return 1 when fed the intrinsic name
appended with an '_' and the mangled type name of <tt>__atomic_flag__</tt>.
</p>
<p>
For example, if <tt>__atomic_flag__</tt> is <tt>unsigned int</tt>:
</p>
<blockquote><pre>
__has_feature(__atomic_flag) == 1
__has_feature(__atomic_exchange_seq_cst_j) == 1
__has_feature(__atomic_store_seq_cst_j) == 1

typedef unsigned int __atomic_flag__;

unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int)
{
    // ...
}

void __atomic_store_seq_cst(unsigned int volatile*, unsigned int)
{
    // ...
}
</pre></blockquote>
</li>
</ul></blockquote>

<p>
That's it! Compiler writers do the above and you've got a fully conforming
(though sub-par-performing) <tt>&lt;atomic&gt;</tt> header!
</p>

<h2>Recommended work for a higher-performance <tt>&lt;atomic&gt;</tt></h2>

<p>
It would be good if the above intrinsics worked with all integral types plus
<tt>void*</tt>. Because this may not be possible to do in a lock-free manner for
all integral types on all platforms, a compiler must communicate each type that
an intrinsic works with. For example, if <tt>__atomic_exchange_seq_cst</tt>
works for all types except <tt>long long</tt> and
<tt>unsigned long long</tt>, then:
</p>

<blockquote><pre>
__has_feature(__atomic_exchange_seq_cst_b) == 1   // bool
__has_feature(__atomic_exchange_seq_cst_c) == 1   // char
__has_feature(__atomic_exchange_seq_cst_a) == 1   // signed char
__has_feature(__atomic_exchange_seq_cst_h) == 1   // unsigned char
__has_feature(__atomic_exchange_seq_cst_Ds) == 1  // char16_t
__has_feature(__atomic_exchange_seq_cst_Di) == 1  // char32_t
__has_feature(__atomic_exchange_seq_cst_w) == 1   // wchar_t
__has_feature(__atomic_exchange_seq_cst_s) == 1   // short
__has_feature(__atomic_exchange_seq_cst_t) == 1   // unsigned short
__has_feature(__atomic_exchange_seq_cst_i) == 1   // int
__has_feature(__atomic_exchange_seq_cst_j) == 1   // unsigned int
__has_feature(__atomic_exchange_seq_cst_l) == 1   // long
__has_feature(__atomic_exchange_seq_cst_m) == 1   // unsigned long
__has_feature(__atomic_exchange_seq_cst_Pv) == 1  // void*
</pre></blockquote>

<p>
Note that only the <tt>__has_feature</tt> flag is decorated with the argument
type. The name of the compiler intrinsic is not decorated, but instead works
like a C++ overloaded function.
</p>
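
<p>
For example, a compiler providing the exchange intrinsic for both <tt>int</tt>
and <tt>unsigned int</tt> would announce each type separately but declare a
single overloaded name (an illustrative sketch of the scheme just described):
</p>

<blockquote><pre>
__has_feature(__atomic_exchange_seq_cst_i) == 1  // int
__has_feature(__atomic_exchange_seq_cst_j) == 1  // unsigned int

int          __atomic_exchange_seq_cst(int volatile* obj, int desr);
unsigned int __atomic_exchange_seq_cst(unsigned int volatile* obj, unsigned int desr);
</pre></blockquote>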

<p>
Additionally, there are other intrinsics besides
<tt>__atomic_exchange_seq_cst</tt> and <tt>__atomic_store_seq_cst</tt>. They
are optional, but if the compiler can generate faster code than the library
provides, then clients will benefit from the compiler writer's expertise and
knowledge of the targeted platform.
</p>

<p>
Below is the complete list of <i>sequentially consistent</i> intrinsics and
their library implementations. Template syntax is used to indicate the desired
overloading for integral and <tt>void*</tt> types. The template does not
represent a requirement that the intrinsic operate on <em>any</em> type!
</p>

<blockquote><pre>
T is one of: bool, char, signed char, unsigned char, short, unsigned short,
             int, unsigned int, long, unsigned long,
             long long, unsigned long long, char16_t, char32_t, wchar_t, void*

template &lt;class T&gt;
T
__atomic_load_seq_cst(T const volatile* obj)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    return *obj;
}

template &lt;class T&gt;
void
__atomic_store_seq_cst(T volatile* obj, T desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    *obj = desr;
}

template &lt;class T&gt;
T
__atomic_exchange_seq_cst(T volatile* obj, T desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj = desr;
    return r;
}

template &lt;class T&gt;
bool
__atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    if (std::memcmp(const_cast&lt;T*&gt;(obj), exp, sizeof(T)) == 0)
    {
        std::memcpy(const_cast&lt;T*&gt;(obj), &amp;desr, sizeof(T));
        return true;
    }
    std::memcpy(exp, const_cast&lt;T*&gt;(obj), sizeof(T));
    return false;
}

template &lt;class T&gt;
bool
__atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    if (std::memcmp(const_cast&lt;T*&gt;(obj), exp, sizeof(T)) == 0)
    {
        std::memcpy(const_cast&lt;T*&gt;(obj), &amp;desr, sizeof(T));
        return true;
    }
    std::memcpy(exp, const_cast&lt;T*&gt;(obj), sizeof(T));
    return false;
}

T is one of: char, signed char, unsigned char, short, unsigned short,
             int, unsigned int, long, unsigned long,
             long long, unsigned long long, char16_t, char32_t, wchar_t

template &lt;class T&gt;
T
__atomic_fetch_add_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj += operand;
    return r;
}

template &lt;class T&gt;
T
__atomic_fetch_sub_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj -= operand;
    return r;
}

template &lt;class T&gt;
T
__atomic_fetch_and_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj &amp;= operand;
    return r;
}

template &lt;class T&gt;
T
__atomic_fetch_or_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj |= operand;
    return r;
}

template &lt;class T&gt;
T
__atomic_fetch_xor_seq_cst(T volatile* obj, T operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    T r = *obj;
    *obj ^= operand;
    return r;
}

void*
__atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    void* r = *obj;
    (char*&amp;)(*obj) += operand;
    return r;
}

void*
__atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand)
{
    unique_lock&lt;mutex&gt; _(some_mutex);
    void* r = *obj;
    (char*&amp;)(*obj) -= operand;
    return r;
}

void __atomic_thread_fence_seq_cst()
{
    unique_lock&lt;mutex&gt; _(some_mutex);
}

void __atomic_signal_fence_seq_cst()
{
    unique_lock&lt;mutex&gt; _(some_mutex);
}
</pre></blockquote>

<p>
One should consult the (currently draft)
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf">C++ standard</a>
for the detailed definitions of these operations. For example,
<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> is allowed to fail
spuriously while <tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> is
not.
</p>
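
<p>
The practical consequence is that the weak form is meant for retry loops, where
a spurious failure merely costs one more iteration. A minimal sketch of
hypothetical client code (assuming the <tt>int</tt> overload of the intrinsic
is available):
</p>

<blockquote><pre>
// Atomically increment *obj with the weak compare-exchange. On failure
// (spurious or real), 'expected' is refreshed from *obj and the loop retries.
void atomic_increment(int volatile* obj)
{
    int expected = *obj;
    while (!__atomic_compare_exchange_weak_seq_cst_seq_cst(obj, &amp;expected, expected + 1))
        ;
}
</pre></blockquote>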

<p>
If on your platform the lock-free definition of
<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> would be the same as
<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt>, you may omit the
<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> intrinsic without a
performance cost. The library will prefer your implementation of
<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> over its own
definition for implementing
<tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt>. That is, the library
will arrange for <tt>__atomic_compare_exchange_weak_seq_cst_seq_cst</tt> to call
<tt>__atomic_compare_exchange_strong_seq_cst_seq_cst</tt> if you supply an
intrinsic for the strong version but not the weak.
</p>
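
<p>
A minimal sketch of how this fallback could look (for illustration only; not
the actual libc++ source), shown for <tt>int</tt>:
</p>

<blockquote><pre>
#if !__has_feature(__atomic_compare_exchange_weak_seq_cst_seq_cst_i) &amp;&amp; \
     __has_feature(__atomic_compare_exchange_strong_seq_cst_seq_cst_i)
// No weak intrinsic announced: implement the weak name with the strong
// intrinsic, which is always at least as good as the weak form.
inline bool
__atomic_compare_exchange_weak_seq_cst_seq_cst(int volatile* obj, int* exp, int desr)
{
    return __atomic_compare_exchange_strong_seq_cst_seq_cst(obj, exp, desr);
}
#endif
</pre></blockquote>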

<h2>Taking advantage of weaker memory synchronization</h2>

<p>
So far all of the intrinsics presented require a <em>sequentially
consistent</em> memory ordering. That is, no loads or stores can move across
the operation (just as if the library had locked that internal mutex). But
<tt>&lt;atomic&gt;</tt> supports weaker memory ordering operations. In all,
there are six memory orderings (listed here from strongest to weakest):
</p>

<blockquote><pre>
memory_order_seq_cst
memory_order_acq_rel
memory_order_release
memory_order_acquire
memory_order_consume
memory_order_relaxed
</pre></blockquote>

<p>
(See the
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf">C++ standard</a>
for the detailed definitions of each of these orderings.)
</p>

<p>
On some platforms, the compiler vendor can offer some or even all of the above
intrinsics at one or more weaker levels of memory synchronization. This might
mean, for example, not issuing an <tt>mfence</tt> instruction on x86.
</p>
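
<p>
For instance, on x86 a sequentially consistent store needs a trailing fence,
while a release store does not (an illustrative sketch; the exact code
generation is the vendor's choice):
</p>

<blockquote><pre>
void __atomic_store_seq_cst(int volatile* obj, int desr);  // mov, then mfence
void __atomic_store_release(int volatile* obj, int desr);  // a plain mov suffices
</pre></blockquote>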

<p>
If the compiler does not offer a given operation at a given memory ordering
level, the library will automatically attempt to call the same operation at the
next strongest memory ordering. This continues up to <tt>seq_cst</tt>, and if
that doesn't exist either, the library takes over and does the job with a
<tt>mutex</tt>. This is a compile-time search &amp; selection operation. At run
time, the application will see only the few inlined assembly instructions for
the selected intrinsic.
</p>
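
<p>
A minimal sketch of that selection for an <tt>acquire</tt> load of <tt>int</tt>
(a hypothetical helper for illustration; not the actual libc++ source):
</p>

<blockquote><pre>
inline int __choose_load_acquire(int const volatile* obj)
{
#if __has_feature(__atomic_load_acquire_i)
    return __atomic_load_acquire(obj);       // exact match
#elif __has_feature(__atomic_load_seq_cst_i)
    return __atomic_load_seq_cst(obj);       // next strongest: still correct
#else
    unique_lock&lt;mutex&gt; _(some_mutex);        // library mutex fallback
    return *obj;
#endif
}
</pre></blockquote>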

<p>
Each intrinsic is appended with the 7-letter name of the memory ordering it
addresses. For example, a <tt>load</tt> with <tt>relaxed</tt> ordering is
defined by:
</p>

<blockquote><pre>
T __atomic_load_relaxed(const volatile T* obj);
</pre></blockquote>

<p>
And announced with:
</p>

<blockquote><pre>
__has_feature(__atomic_load_relaxed_b) == 1  // bool
__has_feature(__atomic_load_relaxed_c) == 1  // char
__has_feature(__atomic_load_relaxed_a) == 1  // signed char
...
</pre></blockquote>

<p>
The <tt>__atomic_compare_exchange_strong</tt> and
<tt>__atomic_compare_exchange_weak</tt> intrinsics are parameterized on two
memory orderings. The first ordering applies when the operation returns
<tt>true</tt> and the second ordering applies when the operation returns
<tt>false</tt>.
</p>
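
<p>
For example, a strong compare-exchange with <tt>acquire</tt> ordering on success
and <tt>relaxed</tt> ordering on failure follows the naming scheme above:
</p>

<blockquote><pre>
__has_feature(__atomic_compare_exchange_strong_acquire_relaxed_i) == 1  // int

template &lt;class T&gt;
bool
__atomic_compare_exchange_strong_acquire_relaxed(T volatile* obj, T* exp, T desr);
</pre></blockquote>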

<p>
Not every memory ordering is appropriate for every operation. <tt>exchange</tt>
and the <tt>fetch_<i>op</i></tt> operations support all six. But <tt>load</tt>
supports only <tt>relaxed</tt>, <tt>consume</tt>, <tt>acquire</tt>, and
<tt>seq_cst</tt>, while <tt>store</tt> supports only <tt>relaxed</tt>,
<tt>release</tt>, and <tt>seq_cst</tt>. The <tt>compare_exchange</tt>
operations support the following 16 combinations out of the possible 36:
</p>

<blockquote><pre>
relaxed_relaxed
consume_relaxed
consume_consume
acquire_relaxed
acquire_consume
acquire_acquire
release_relaxed
release_consume
release_acquire
acq_rel_relaxed
acq_rel_consume
acq_rel_acquire
seq_cst_relaxed
seq_cst_consume
seq_cst_acquire
seq_cst_seq_cst
</pre></blockquote>

<p>
Again, the compiler supplies intrinsics only for the strongest orderings where
it can make a difference. The library takes care of calling the weakest
supplied intrinsic that is at least as strong as the ordering the customer
asked for.
</p>

</div>
</body>
</html>