<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>


<chapter id="hg-manual" xreflabel="Helgrind: thread error detector">
  <title>Helgrind: a thread error detector</title>

<para>To use this tool, you must specify
<computeroutput>--tool=helgrind</computeroutput> on the Valgrind
command line.</para>



<sect1 id="hg-manual.overview" xreflabel="Overview">
<title>Overview</title>

<para>Helgrind is a Valgrind tool for detecting synchronisation errors
in C, C++ and Fortran programs that use the POSIX pthreads
threading primitives.</para>

<para>The main abstractions in POSIX pthreads are: a set of threads
sharing a common address space, thread creation, thread joining,
thread exit, mutexes (locks), condition variables (inter-thread event
notifications), reader-writer locks, and semaphores.</para>

<para>Helgrind is aware of all these abstractions and tracks their
effects as accurately as it can. Currently it does not correctly
handle pthread barriers and pthread spinlocks, although it will not
object if you use them. On x86 and amd64 platforms, it understands
and partially handles implicit locking arising from the use of the
LOCK instruction prefix.
</para>

<para>Helgrind can detect three classes of errors, which are discussed
in detail in the next three sections:</para>

<orderedlist>
 <listitem>
  <para><link linkend="hg-manual.api-checks">
        Misuses of the POSIX pthreads API.</link></para>
 </listitem>
 <listitem>
  <para><link linkend="hg-manual.lock-orders">
        Potential deadlocks arising from lock
        ordering problems.</link></para>
 </listitem>
 <listitem>
  <para><link linkend="hg-manual.data-races">
        Data races -- accessing memory without adequate locking.
        </link></para>
 </listitem>
</orderedlist>

<para>Following those is a section containing
<link linkend="hg-manual.effective-use">
hints and tips on how to get the best out of Helgrind.</link>
</para>

<para>Then there is a
<link linkend="hg-manual.options">summary of command-line
options.</link>
</para>

<para>Finally, there is
<link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind
could be improved.</link>
</para>

</sect1>




<sect1 id="hg-manual.api-checks" xreflabel="API Checks">
<title>Detected errors: Misuses of the POSIX pthreads API</title>

<para>Helgrind intercepts calls to many POSIX pthreads functions, and
is therefore able to report on various common problems. Although
these are unglamorous errors, their presence can lead to undefined
program behaviour and hard-to-find bugs later in execution. The
detected errors are:</para>

<itemizedlist>
 <listitem><para>unlocking an invalid mutex</para></listitem>
 <listitem><para>unlocking a not-locked mutex</para></listitem>
 <listitem><para>unlocking a mutex held by a different
                 thread</para></listitem>
 <listitem><para>destroying an invalid or a locked mutex</para></listitem>
 <listitem><para>recursively locking a non-recursive mutex</para></listitem>
 <listitem><para>deallocation of memory that contains a
                 locked mutex</para></listitem>
 <listitem><para>passing mutex arguments to functions expecting
                 reader-writer lock arguments, and vice
                 versa</para></listitem>
 <listitem><para>when a POSIX pthread function fails with an
                 error code that must be handled</para></listitem>
 <listitem><para>when a thread exits whilst still holding locked
                 locks</para></listitem>
 <listitem><para>calling <computeroutput>pthread_cond_wait</computeroutput>
                 with a not-locked mutex, or one locked by a different
                 thread</para></listitem>
</itemizedlist>

<para>Checks pertaining to the validity of mutexes are generally also
performed for reader-writer locks.</para>

<para>Various kinds of this-can't-possibly-happen events are also
reported. These usually indicate bugs in the system threading
library.</para>

<para>Reported errors always contain a primary stack trace indicating
where the error was detected. They may also contain auxiliary stack
traces giving additional information. In particular, most errors
relating to mutexes will also tell you where that mutex first came to
Helgrind's attention (the "<computeroutput>was first observed
at</computeroutput>" part), so you have a chance of figuring out which
mutex it is referring to. For example:</para>

<programlisting><![CDATA[
Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
   at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
   by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
   by 0x40079B: main (tc09_bad_unlock.c:50)
  Lock at 0x7FEFFFA90 was first observed
   at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
   by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
   by 0x40079B: main (tc09_bad_unlock.c:50)
]]></programlisting>

<para>Helgrind has a way of summarising thread identities, as
evidenced here by the text "<computeroutput>Thread
#1</computeroutput>". This is so that it can speak about threads and
sets of threads without overwhelming you with details. See
<link linkend="hg-manual.data-races.errmsgs">below</link>
for more information on interpreting error messages.</para>
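
<para>As an illustration of the kind of code that leads to a report
like the one above, here is a minimal sketch, not taken from the
Helgrind test suite. The function name
<computeroutput>unlock_not_locked</computeroutput> is hypothetical.
The sketch assumes an error-checking mutex
(<computeroutput>PTHREAD_MUTEX_ERRORCHECK</computeroutput>) so that
the faulty unlock returns an error code instead of invoking undefined
behaviour; whether and how Helgrind words its report may depend on
the version in use.</para>

<programlisting><![CDATA[
#include <errno.h>     /* EPERM: the value a caller can compare against */
#include <pthread.h>

/* Hypothetical helper: unlocks a mutex that was never locked and
   returns whatever pthread_mutex_unlock returned. */
int unlock_not_locked ( void )
{
   pthread_mutexattr_t attr;
   pthread_mutex_t     mx;
   int                 r;

   pthread_mutexattr_init(&attr);
   /* Assumption: use the error-checking type, for which POSIX requires
      the bad unlock to fail with an error code. */
   pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
   pthread_mutex_init(&mx, &attr);
   pthread_mutexattr_destroy(&attr);

   r = pthread_mutex_unlock(&mx);   /* bug: mx is not locked */

   pthread_mutex_destroy(&mx);
   return r;
}
]]></programlisting>

<para>With an error-checking mutex the call returns
<computeroutput>EPERM</computeroutput>, which is also an instance of
"a POSIX pthread function fails with an error code that must be
handled".</para>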

</sect1>




<sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders">
<title>Detected errors: Inconsistent Lock Orderings</title>

<para>In this section, and in general, to "acquire" a lock simply
means to lock that lock, and to "release" a lock means to unlock
it.</para>

<para>Helgrind monitors the order in which threads acquire locks.
This allows it to detect potential deadlocks which could arise from
the formation of cycles of locks. Detecting such inconsistencies is
useful because, whilst actual deadlocks are fairly obvious, potential
deadlocks may never be discovered during testing and could later lead
to hard-to-diagnose in-service failures.</para>

<para>The simplest example of such a problem is as
follows.</para>

<itemizedlist>
 <listitem><para>Imagine some shared resource R, which, for whatever
  reason, is guarded by two locks, L1 and L2, which must both be held
  when R is accessed.</para>
 </listitem>
 <listitem><para>Suppose a thread acquires L1, then L2, and proceeds
  to access R. The implication of this is that all threads in the
  program must acquire the two locks in the order first L1 then L2.
  Not doing so risks deadlock.</para>
 </listitem>
 <listitem><para>The deadlock could happen if two threads -- call them
  T1 and T2 -- both want to access R. Suppose T1 acquires L1 first,
  and T2 acquires L2 first. Then T1 tries to acquire L2, and T2 tries
  to acquire L1, but those locks are both already held. So T1 and T2
  become deadlocked.</para>
 </listitem>
</itemizedlist>
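
<para>The scenario above can be sketched in code. This is an
illustrative example only: the function names are hypothetical, and
both orderings are exercised from a single thread so that the program
cannot actually deadlock while still acquiring the locks in
inconsistent orders.</para>

<programlisting><![CDATA[
#include <pthread.h>

static pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t L2 = PTHREAD_MUTEX_INITIALIZER;
static int R = 0;                 /* the shared resource */

/* Acquires L1 then L2: this establishes the order "L1 before L2". */
static void access_in_order ( void )
{
   pthread_mutex_lock(&L1);
   pthread_mutex_lock(&L2);
   R++;
   pthread_mutex_unlock(&L2);
   pthread_mutex_unlock(&L1);
}

/* Acquires L2 then L1: this contradicts the order seen earlier. */
static void access_reversed ( void )
{
   pthread_mutex_lock(&L2);
   pthread_mutex_lock(&L1);
   R++;
   pthread_mutex_unlock(&L1);
   pthread_mutex_unlock(&L2);
}

/* Hypothetical driver: runs both orderings one after the other. */
int run_both_orders ( void )
{
   access_in_order();
   access_reversed();
   return R;
}
]]></programlisting>

<para>The program runs to completion, since only one thread is
involved. The inconsistency is nevertheless present: had
<computeroutput>access_in_order</computeroutput> and
<computeroutput>access_reversed</computeroutput> been run by two
different threads at the same time, they could have deadlocked.</para>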

<para>Helgrind builds a directed graph indicating the order in which
locks have been acquired in the past. When a thread acquires a new
lock, the graph is updated, and then checked to see if it now contains
a cycle. The presence of a cycle indicates a potential deadlock involving
the locks in the cycle.</para>

<para>In simple situations, where the cycle only contains two locks,
Helgrind will show where the required order was established:</para>

<programlisting><![CDATA[
Thread #1: lock order "0x7FEFFFAB0 before 0x7FEFFFA80" violated
   at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
   by 0x40081F: main (tc13_laog1.c:24)
  Required order was established by acquisition of lock at 0x7FEFFFAB0
   at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
   by 0x400748: main (tc13_laog1.c:17)
  followed by a later acquisition of lock at 0x7FEFFFA80
   at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
   by 0x400773: main (tc13_laog1.c:18)
]]></programlisting>

<para>When there are more than two locks in the cycle, the error is
equally serious. However, at present Helgrind does not show the locks
involved, so as to avoid flooding you with information. That could be
fixed in future. Here is an example involving a cycle
of five locks from a naive implementation of the famous Dining
Philosophers problem
(see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>).
In this case Helgrind has detected that all 5 philosophers could
simultaneously pick up their left fork and then deadlock whilst
waiting to pick up their right forks.</para>

<programlisting><![CDATA[
Thread #6: lock order "0x6010C0 before 0x601160" violated
   at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
   by 0x4007C0: dine (tc14_laog_dinphils.c:19)
   by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178)
   by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so)
   by 0x51054CC: clone (in /lib64/libc-2.5.so)
]]></programlisting>

</sect1>




<sect1 id="hg-manual.data-races" xreflabel="Data Races">
<title>Detected errors: Data Races</title>

<para>A data race happens, or could happen, when two threads
access a shared memory location without using suitable locks to
ensure single-threaded access. Such missing locking can cause
obscure timing-dependent bugs. Ensuring programs are race-free is
one of the central difficulties of threaded programming.</para>

<para>Reliably detecting races is a difficult problem, and most
of Helgrind's internals are devoted to dealing with it.
As a consequence this section is somewhat long and involved.
We begin with a simple example.</para>


<sect2 id="hg-manual.data-races.example" xreflabel="Simple Race">
<title>A Simple Data Race</title>

<para>About the simplest possible example of a race is as follows. In
this program, it is impossible to know what the value
of <computeroutput>var</computeroutput> is at the end of the program.
Is it 2? Or 1?</para>

<programlisting><![CDATA[
#include <pthread.h>

int var = 0;

void* child_fn ( void* arg ) {
   var++; /* Unprotected relative to parent */ /* this is line 6 */
   return NULL;
}

int main ( void ) {
   pthread_t child;
   pthread_create(&child, NULL, child_fn, NULL);
   var++; /* Unprotected relative to child */ /* this is line 13 */
   pthread_join(child, NULL);
   return 0;
}
]]></programlisting>

<para>The problem is there is nothing to
stop <computeroutput>var</computeroutput> being updated simultaneously
by both threads. A correct program would
protect <computeroutput>var</computeroutput> with a lock of type
<computeroutput>pthread_mutex_t</computeroutput>, which is acquired
before each access and released afterwards. Helgrind's output for
this program is:</para>

<programlisting><![CDATA[
Thread #1 is the program's root thread

Thread #2 was created
   at 0x510548E: clone (in /lib64/libc-2.5.so)
   by 0x4E2F305: do_clone (in /lib64/libpthread-2.5.so)
   by 0x4E2F7C5: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
   by 0x4C23870: pthread_create@* (hg_intercepts.c:198)
   by 0x4005F1: main (simple_race.c:12)

Possible data race during write of size 4 at 0x601034
   at 0x4005F2: main (simple_race.c:13)
  Old state: shared-readonly by threads #1, #2
  New state: shared-modified by threads #1, #2
  Reason:    this thread, #1, holds no consistent locks
  Location 0x601034 has never been protected by any lock
]]></programlisting>

<para>This is quite a lot of detail for an apparently simple error.
The last clause is the main error message. It says there is a race as
a result of a write of size 4 (bytes), at 0x601034, which is
presumably the address of <computeroutput>var</computeroutput>,
happening in function <computeroutput>main</computeroutput> at line 13
in the program.</para>

<para>Note that it is purely by chance that the race is
reported for the parent thread's access. It could equally have been
reported instead for the child's access, at line 6. The error will
only be reported for one of the locations, since neither the parent
nor child is, by itself, incorrect. It is only when both access
<computeroutput>var</computeroutput> without a lock that an error
exists.</para>
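
<para>For comparison, here is a minimal sketch of the corrected
program, in which every access to
<computeroutput>var</computeroutput> is made with a mutex held. The
names <computeroutput>var_mx</computeroutput> and
<computeroutput>run_locked_increments</computeroutput> are
hypothetical and chosen for illustration only.</para>

<programlisting><![CDATA[
#include <pthread.h>

static int var = 0;
static pthread_mutex_t var_mx = PTHREAD_MUTEX_INITIALIZER; /* guards var */

static void* child_fn ( void* arg )
{
   (void)arg;
   pthread_mutex_lock(&var_mx);
   var++;                         /* protected relative to parent */
   pthread_mutex_unlock(&var_mx);
   return NULL;
}

/* Hypothetical driver: returns the final value of var, or -1 if the
   child thread could not be created. */
int run_locked_increments ( void )
{
   pthread_t child;
   int       result;

   if (pthread_create(&child, NULL, child_fn, NULL) != 0)
      return -1;

   pthread_mutex_lock(&var_mx);
   var++;                         /* protected relative to child */
   pthread_mutex_unlock(&var_mx);

   pthread_join(child, NULL);

   pthread_mutex_lock(&var_mx);
   result = var;
   pthread_mutex_unlock(&var_mx);
   return result;
}
]]></programlisting>

<para>Because both increments happen with
<computeroutput>var_mx</computeroutput> held, they cannot overlap, and
the final value is always 2.</para>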

<para>The error message shows some other interesting details. The
sections below explain them. Here we merely note their presence:</para>

<itemizedlist>
 <listitem><para>Helgrind maintains some kind of state machine for the
  memory location in question, hence the "<computeroutput>Old
  state:</computeroutput>" and "<computeroutput>New
  state:</computeroutput>" lines.</para>
 </listitem>
 <listitem><para>Helgrind keeps track of which threads have accessed
  the location: "<computeroutput>threads #1, #2</computeroutput>".
  Before printing the main error message, it prints the creation
  points of these two threads, so you can see which threads it is
  referring to.</para>
 </listitem>
 <listitem><para>Helgrind tries to provide an explanation of why the
  race exists: "<computeroutput>Location 0x601034 has never been
  protected by any lock</computeroutput>".</para>
 </listitem>
</itemizedlist>

<para>Understanding the memory state machine is central to
understanding Helgrind's race-detection algorithm. The next three
subsections explain this.</para>

</sect2>


<sect2 id="hg-manual.data-races.memstates" xreflabel="Memory States">
<title>Helgrind's Memory State Machine</title>

<para>Helgrind tracks the state of every byte of memory used by your
program. There are a number of states, but only three are
interesting:</para>

<itemizedlist>
 <listitem><para>Exclusive: memory in this state is regarded as owned
  exclusively by one particular thread. That thread may read and
  write it without a lock. Even in highly threaded programs, the
  majority of locations never leave the Exclusive state, since most
  data is thread-private.</para>
 </listitem>
 <listitem><para>Shared-Readonly: memory in this state is regarded as
  shared by multiple threads. In this state, any thread may read the
  memory without a lock, reflecting the fact that readonly data may
  safely be shared between threads without locking.</para>
 </listitem>
 <listitem><para>Shared-Modified: memory in this state is regarded as
  shared by multiple threads, at least one of which has written to it.
  All participating threads must hold at least one lock in common when
  accessing the memory. If no such lock exists, Helgrind reports a
  race error.</para>
 </listitem>
</itemizedlist>

<para>Let's review the simple example above with this in mind. When
the program starts, <computeroutput>var</computeroutput> is not in any
of these states. Either the parent or child thread gets to its
<computeroutput>var++</computeroutput> first, and
thereby gets Exclusive ownership of the location.</para>

<para>The later-running thread now arrives at
its <computeroutput>var++</computeroutput> statement. It first reads
the existing value from memory.
Because <computeroutput>var</computeroutput> is currently marked as
owned exclusively by the other thread, its state is changed to
shared-readonly by both threads.</para>

<para>This same thread adds one to the value it has and stores it back
in <computeroutput>var</computeroutput>. This causes another state
change, this time to the shared-modified state. Because Helgrind has
also been tracking which threads hold which locks, it can see that
<computeroutput>var</computeroutput> is in shared-modified state but
no lock has been used to consistently protect it. Hence a race is
reported exactly at the transition from shared-readonly to
shared-modified.</para>

<para>The essence of the algorithm is this. Helgrind keeps track of
each memory location that has been accessed by more than one thread.
For each such location it incrementally infers the set of locks which
have consistently been used to protect that location. If the
location's lockset becomes empty, and at some point one of the threads
attempts to write to it, a race is then reported.</para>
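
<para>The state machine and lockset inference described above can be
sketched in a few lines of C. This is a deliberately simplified model
for illustration and is not Helgrind's implementation: the type
<computeroutput>Shadow</computeroutput>, the function
<computeroutput>shadow_access</computeroutput>, the representation of
a lockset as a bitmask with one bit per lock, and the extra
<computeroutput>ST_NEW</computeroutput> state for never-accessed
memory are all assumptions made here.</para>

<programlisting><![CDATA[
#include <stdbool.h>
#include <stdint.h>

typedef enum { ST_NEW, ST_EXCL, ST_SHRO, ST_SHMOD } State;

typedef struct {
   State    state;    /* current state of the location             */
   int      owner;    /* owning thread, meaningful in ST_EXCL only */
   uint32_t lockset;  /* candidate locks, one bit per lock         */
} Shadow;

/* Records one access by thread 'tid', made while holding the locks in
   'held'.  Returns true if the access would be reported as a possible
   race under this simplified model. */
bool shadow_access ( Shadow* s, int tid, uint32_t held, bool is_write )
{
   switch (s->state) {
      case ST_NEW:
         /* First ever access: the accessing thread becomes the owner. */
         s->state = ST_EXCL;
         s->owner = tid;
         return false;
      case ST_EXCL:
         if (tid == s->owner)
            return false;           /* owner needs no lock */
         /* First access by another thread: start the candidate lockset
            from the locks held right now. */
         s->lockset = held;
         s->state   = is_write ? ST_SHMOD : ST_SHRO;
         break;
      case ST_SHRO:
         s->lockset &= held;        /* keep only consistently held locks */
         if (is_write)
            s->state = ST_SHMOD;
         break;
      case ST_SHMOD:
         s->lockset &= held;
         break;
   }
   /* Shared, written to, and no lock consistently held: report. */
   return s->state == ST_SHMOD && s->lockset == 0;
}
]]></programlisting>

<para>Replaying the simple example through this model, with thread 1
reaching its <computeroutput>var++</computeroutput> first and no locks
held, gives the sequence described above: thread 1's read and write
leave the location Exclusive, thread 2's read moves it to
shared-readonly, and thread 2's write moves it to shared-modified with
an empty lockset, at which point
<computeroutput>shadow_access</computeroutput> returns true. If both
threads hold the same lock for every access, the lockset never becomes
empty and nothing is reported.</para>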

<para>This technique is known as "lockset inference" and was
introduced in: "Eraser: A Dynamic Data Race Detector for Multithreaded
Programs" (Stefan Savage, Michael Burrows, Greg Nelson, Patrick
Sobalvarro and Thomas Anderson, ACM Transactions on Computer Systems,
15(4):391-411, November 1997).</para>

<para>Lockset inference has since been widely implemented, studied and
extended. Helgrind incorporates several refinements aimed at avoiding
the high false error rate that naive versions of the algorithm suffer
from. A
<link linkend="hg-manual.data-races.summary">summary of the complete
algorithm used by Helgrind</link> is presented below. First, however,
it is important to understand details of transitions pertaining to the
Exclusive-ownership state.</para>

</sect2>



<sect2 id="hg-manual.data-races.exclusive" xreflabel="Excl Transfers">
<title>Transfers of Exclusive Ownership Between Threads</title>

<para>As presented, the algorithm is far too strict. It reports many
errors in perfectly correct, widely used parallel programming
constructions, for example, using child worker threads and worker
thread pools.</para>

<para>To avoid these false errors, we must refine the algorithm so
that it keeps memory in an Exclusive ownership state in cases where it
would otherwise decay into a shared-readonly or shared-modified state.
Recall that Exclusive ownership is special in that it grants the
owning thread the right to access memory without use of any locks. In
order to support worker-thread and worker-thread-pool idioms, we will
allow threads to steal exclusive ownership of memory from other
threads under certain circumstances.</para>

<para>Here's an example. Imagine a parent thread creates child
threads to do units of work. For each unit of work, the parent
allocates a work buffer, fills it in, and creates the child thread,
handing it a pointer to the buffer. The child reads/writes the buffer
and eventually exits, and the waiting parent then extracts the results
from the buffer:</para>

<programlisting><![CDATA[
typedef ... Buffer;

pthread_t child;
Buffer    buf;

/* ---- Parent ---- */                          /* ---- Child ---- */

/* parent writes workload into buf */
pthread_create( &child, NULL, child_fn, &buf );

/* parent does not read */                      void* child_fn ( void* arg ) {
/* or write buf */                                 Buffer* buf = arg;
                                                   /* read/write buf */
                                                   return NULL;
                                                }

pthread_join ( child, NULL );
/* parent reads results from buf */
]]></programlisting>

<para>Although <computeroutput>buf</computeroutput> is accessed by
both threads, neither uses locks, yet the program is race-free. The
essential observation is that the child's creation and exit create
synchronisation events between it and the parent. These force the
child's accesses to <computeroutput>buf</computeroutput> to happen
after the parent initialises <computeroutput>buf</computeroutput>, and
before the parent reads the results
from <computeroutput>buf</computeroutput>.</para>

<para>To model this, Helgrind allows the child to steal, from the
parent, exclusive ownership of any memory exclusively owned by the
parent before the pthread_create call. Similarly, once the parent's
pthread_join call returns, it can steal back ownership of memory
exclusively owned by the child. In this way ownership
of <computeroutput>buf</computeroutput> is transferred from parent to
child and back, so the basic algorithm does not report any races
despite the absence of any locking.</para>
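
<para>A complete, compilable version of the worker idiom might look
like the following sketch. The layout of
<computeroutput>Buffer</computeroutput>, the doubling "work" done by
the child, and the function name
<computeroutput>run_worker</computeroutput> are all assumptions made
for illustration.</para>

<programlisting><![CDATA[
#include <pthread.h>

/* Hypothetical work buffer: one input and one result. */
typedef struct {
   int input;
   int result;
} Buffer;

static void* child_fn ( void* arg )
{
   Buffer* buf = arg;
   /* The child reads and writes buf without any lock. */
   buf->result = buf->input * 2;
   return NULL;
}

/* Hypothetical driver: hands a buffer to a child thread and returns
   the result it computed, or -1 on failure. */
int run_worker ( int input )
{
   Buffer    buf;
   pthread_t child;

   /* Parent writes the workload into buf before creating the child. */
   buf.input  = input;
   buf.result = 0;

   if (pthread_create(&child, NULL, child_fn, &buf) != 0)
      return -1;

   /* Parent neither reads nor writes buf while the child runs. */

   if (pthread_join(child, NULL) != 0)
      return -1;

   /* Child has exited: parent reads the results from buf. */
   return buf.result;
}
]]></programlisting>

<para>The parent touches <computeroutput>buf</computeroutput> only
before <computeroutput>pthread_create</computeroutput> and after
<computeroutput>pthread_join</computeroutput>, so ownership can pass
to the child and back as described above.</para>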

<para>Note that the child may only steal memory owned by the parent
prior to the pthread_create call. If the child attempts to read or
write memory which is also accessed by the parent in between the
pthread_create and pthread_join calls, an error is still
reported.</para>

<para>This technique was introduced with the name "thread lifetime
segments" in "Runtime Checking of Multithreaded Applications with
Visual Threads" (Jerry J. Harrow, Jr, Proceedings of the 7th
International SPIN Workshop on Model Checking of Software, Stanford,
California, USA, August 2000, LNCS 1885, pp331--342). Helgrind
implements an extended version of it. Specifically, Helgrind allows
transfer of exclusive ownership in the following situations:</para>

<itemizedlist>
 <listitem><para>At thread creation: a child can acquire ownership of
  memory held exclusively by the parent prior to the child's
  creation.</para>
 </listitem>
 <listitem><para>At thread joining: the joiner (thread not exiting)
  can acquire ownership of memory held exclusively by the joinee
  (thread that is exiting) at the point it exited.</para>
 </listitem>
 <listitem><para>At condition variable signals and broadcasts: a
  thread Tw which completes a pthread_cond_wait call as a result of
  a signal or broadcast on the same condition variable by some other
  thread Ts, may acquire ownership of memory held exclusively by
  Ts prior to the pthread_cond_signal/broadcast
  call.</para>
 </listitem>
 <listitem><para>At semaphore post (sem_post) calls: a thread Tw
  which completes a sem_wait call as a result of a sem_post call
  on the same semaphore by some other thread Tp, may acquire
  ownership of memory held exclusively by Tp prior to the sem_post
  call.</para>
 </listitem>
</itemizedlist>
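
<para>As an illustration of the semaphore case, here is a minimal
sketch in which the posting thread writes a value with no lock held
and the waiting thread reads it only after its
<computeroutput>sem_wait</computeroutput> call completes. The names
<computeroutput>payload</computeroutput>,
<computeroutput>seen</computeroutput>,
<computeroutput>waiter</computeroutput> and
<computeroutput>run_sem_handoff</computeroutput> are hypothetical, and
the sketch assumes a platform that supports unnamed POSIX semaphores
(<computeroutput>sem_init</computeroutput>).</para>

<programlisting><![CDATA[
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t ready;     /* posted once payload has been written */
static int   payload;   /* written by the poster (Tp), no lock  */
static int   seen;      /* written by the waiter (Tw), no lock  */

static void* waiter ( void* arg )
{
   (void)arg;
   /* Retry if the wait is interrupted by a signal handler. */
   while (sem_wait(&ready) == -1 && errno == EINTR)
      continue;
   seen = payload;      /* read only after sem_wait has completed */
   return NULL;
}

/* Hypothetical driver: passes 'value' to the waiter through payload
   and returns what the waiter saw, or -1 on failure. */
int run_sem_handoff ( int value )
{
   pthread_t tw;

   if (sem_init(&ready, 0, 0) != 0)
      return -1;
   if (pthread_create(&tw, NULL, waiter, NULL) != 0) {
      sem_destroy(&ready);
      return -1;
   }

   payload = value;     /* Tp's last access, before the post */
   sem_post(&ready);

   pthread_join(tw, NULL);
   sem_destroy(&ready);
   return seen;         /* read after the waiter has exited */
}
]]></programlisting>

<para>Here <computeroutput>payload</computeroutput> passes from the
poster to the waiter at the semaphore, and
<computeroutput>seen</computeroutput> passes from the waiter back to
the poster at the join, so no mutex is needed for either
variable.</para>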

</sect2>



<sect2 id="hg-manual.data-races.re-excl" xreflabel="Re-Excl Transfers">
<title>Restoration of Exclusive Ownership</title>

<para>Another common idiom is to partition the lifetime of the program
as a whole into several distinct phases. In some of those phases, a
memory location may be accessed by multiple threads and so require
locking. In other phases only one thread exists and so can access the
memory without locking. For example:</para>

<programlisting><![CDATA[
int var = 0;                                    /* shared variable */
pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER; /* guard for var */
pthread_t child;

/* ---- Parent ---- */                          /* ---- Child ---- */

var += 1; /* no lock used */

pthread_create( &child, NULL, child_fn, NULL );

                                                void* child_fn ( void* uu ) {
pthread_mutex_lock(&mx);                           pthread_mutex_lock(&mx);
var += 2;                                          var += 3;
pthread_mutex_unlock(&mx);                         pthread_mutex_unlock(&mx);
                                                   return NULL;
                                                }

pthread_join ( child, NULL );

var += 4; /* no lock used */
]]></programlisting>

<para>This program is correct, but using only the mechanisms described
so far, Helgrind would report an error at
<computeroutput>var += 4</computeroutput>. This is because, by that
point, <computeroutput>var</computeroutput> is marked as being in the
state "shared-modified and protected by the
lock <computeroutput>mx</computeroutput>", but is being accessed
without locking. Really, what we want is
for <computeroutput>var</computeroutput> to return to the parent
thread's exclusive ownership after the child thread has exited.</para>

<para>To make this possible, for every memory location Helgrind also keeps
track of all the threads that have accessed that location
-- its threadset. When a thread Tquitter joins back to Tstayer,
Helgrind examines the threadsets of all memory in shared-modified or
shared-readonly state. In each such threadset, if Tquitter is
mentioned, it is removed and replaced by Tstayer. If, as a result, a
threadset becomes a singleton set containing Tstayer, then the
location's state is changed to belongs-exclusively-to-Tstayer.</para>

<para>In our example, the result is exactly as we desire:
<computeroutput>var</computeroutput> is reacquired exclusively by the
parent after the child exits.</para>
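
<para>The phased example can be written out as a compilable sketch.
The function name <computeroutput>run_phases</computeroutput> is
hypothetical; the variable and lock names follow the example
above.</para>

<programlisting><![CDATA[
#include <pthread.h>

static int var = 0;                                     /* shared variable */
static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;  /* guard for var   */

static void* child_fn ( void* uu )
{
   (void)uu;
   pthread_mutex_lock(&mx);
   var += 3;
   pthread_mutex_unlock(&mx);
   return NULL;
}

/* Hypothetical driver: returns the final value of var, or -1 if the
   child thread could not be created. */
int run_phases ( void )
{
   pthread_t child;

   /* Phase 1: only the parent exists, so no lock is used. */
   var = 0;
   var += 1;

   if (pthread_create(&child, NULL, child_fn, NULL) != 0)
      return -1;

   /* Phase 2: two threads exist, so var is accessed under mx. */
   pthread_mutex_lock(&mx);
   var += 2;
   pthread_mutex_unlock(&mx);

   pthread_join(child, NULL);

   /* Phase 3: the child has exited, so again no lock is used. */
   var += 4;
   return var;
}
]]></programlisting>

<para>Whichever thread wins the lock first in the middle phase, the
final value is 1 + 2 + 3 + 4 = 10.</para>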

<para>More generally, when a group of threads merges back to a single
thread via a cascade of pthread_join calls, any memory shared by the
group (or a subset of it) ends up being owned exclusively by the sole
surviving thread. This significantly enhances Helgrind's flexibility,
since it means that each memory location may make arbitrarily many
transitions between exclusive and shared ownership. Furthermore, a
different lock may protect the location during each period of shared
ownership.</para>

</sect2>



<sect2 id="hg-manual.data-races.summary" xreflabel="Race Det Summary">
<title>A Summary of the Race Detection Algorithm</title>

<para>Helgrind looks for memory locations which are accessed by more
than one thread. For each such location, Helgrind records which of
the program's locks were held by the accessing thread at the time of
each access. The hope is to discover that there is indeed at least
one lock which is consistently used by all threads to protect that
location. If no such lock can be found, then there is apparently no
consistent locking strategy being applied for that location, and so a
possible data race might result. Helgrind accordingly reports an
error.</para>

<para>In practice this discipline is far too simplistic, and is
unusable since it reports many races in some widely used and
known-correct programming disciplines. Helgrind's checking therefore
incorporates many refinements to this basic idea, and can be
summarised as follows:</para>

<para>The following thread events are intercepted and monitored:</para>

<itemizedlist>
 <listitem><para>thread creation and exiting (pthread_create,
  pthread_join, pthread_exit)</para>
 </listitem>
 <listitem>
  <para>lock acquisition and release (pthread_mutex_lock,
  pthread_mutex_unlock, pthread_rwlock_rdlock,
  pthread_rwlock_wrlock,
  pthread_rwlock_unlock)</para>
 </listitem>
 <listitem>
  <para>inter-thread event notifications (pthread_cond_wait,
  pthread_cond_signal, pthread_cond_broadcast,
  sem_wait, sem_post)</para>
 </listitem>
</itemizedlist>

<para>Memory allocation and deallocation events are intercepted and
monitored:</para>

<itemizedlist>
 <listitem>
  <para>malloc/new/free/delete and variants</para>
 </listitem>
 <listitem>
  <para>stack allocation and deallocation</para>
 </listitem>
</itemizedlist>

<para>All memory accesses are intercepted and monitored.</para>
633
<para>By observing the above events, Helgrind can infer certain
aspects of the program's locking discipline. Programs which adhere to
the following rules are considered to be acceptable:
</para>

<itemizedlist>
  <listitem>
    <para>A thread may allocate memory, and write initial values into
    it, without locking. That thread is regarded as owning the memory
    exclusively.</para>
  </listitem>
  <listitem>
    <para>A thread may read and write memory which it owns exclusively,
    without locking.</para>
  </listitem>
  <listitem>
    <para>Memory which is owned exclusively by one thread may be read by
    that thread and others without locking. However, in this situation
    no thread may do unlocked writes to the memory (except for the owner
    thread's initializing write).</para>
  </listitem>
  <listitem>
    <para>Memory which is shared between multiple threads, one or more
    of which writes to it, must be protected by a lock which is
    correctly acquired and released by all threads accessing the
    memory.</para>
  </listitem>
</itemizedlist>
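
<para>As an illustration of the last rule, here is a minimal sketch
(not taken from the Helgrind sources) in which several threads update
one shared counter and every access is made while holding the same
mutex. The names <computeroutput>worker</computeroutput> and
<computeroutput>run_locked_counter</computeroutput> are hypothetical,
chosen only for this example.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS  10000

/* Shared between threads and written by all of them, so every
   access is made while holding counter_lock. */
static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Hypothetical entry point: returns the final count, or -1 if a
   thread could not be created. */
long run_locked_counter(void)
{
    pthread_t tid[NUM_THREADS];
    int started = 0;
    long result;

    pthread_mutex_lock(&counter_lock);
    counter = 0;
    pthread_mutex_unlock(&counter_lock);

    while (started < NUM_THREADS &&
           pthread_create(&tid[started], NULL, worker, NULL) == 0)
        started++;

    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    pthread_mutex_lock(&counter_lock);
    result = counter;
    pthread_mutex_unlock(&counter_lock);

    return started == NUM_THREADS ? result : -1;
}
```
]]></programlisting>

<para>Because the one lock is held for every read and write of
<computeroutput>counter</computeroutput>, a consistent lock exists for
that location, which is the property the rules above ask for.</para>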

<para>Any violation of this discipline will cause an error to be reported.
However, two exemptions apply:</para>

<itemizedlist>
  <listitem>
    <para>A thread Y can acquire exclusive ownership of memory
    previously owned exclusively by a different thread X, provided that
    X's last access and Y's first access are separated by one of the
    following synchronization events:</para>
    <itemizedlist>
      <listitem><para>X creates thread Y</para></listitem>
      <listitem><para>X joins back to Y</para></listitem>
      <listitem><para>X uses a condition variable to signal Y, and Y is
        waiting for that event</para></listitem>
      <listitem><para>Y completes a semaphore wait as a result of X signalling
        on that same semaphore</para></listitem>
    </itemizedlist>
    <para>
      This refinement allows Helgrind to correctly track the ownership
      state of inter-thread buffers used in the worker-thread and
      worker-thread-pool concurrent programming idioms (styles).</para>
  </listitem>
  <listitem>
    <para>Similarly, if thread Y joins back to thread X, memory
    exclusively owned by Y becomes exclusively owned by X instead.
    Also, memory that has been shared only by X and Y becomes
    exclusively owned by X. More generally, memory that has been shared
    by X, Y and some arbitrary other set S of threads is re-marked as
    shared by X and S. Hence, under the right circumstances, memory
    shared amongst multiple threads, all of which join into just one,
    can revert to the exclusive ownership state.</para>
    <para>
      In effect, each memory location may make arbitrarily many
      transitions between exclusive and shared ownership. Furthermore, a
      different lock may protect the location during each period of shared
      ownership. This significantly enhances the flexibility of the
      algorithm.</para>
  </listitem>
</itemizedlist>
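
<para>The create and join exemptions can be pictured with a small
sketch, given here under the assumption that the parent leaves the
buffer alone while the child is running. The function names
<computeroutput>double_all</computeroutput> and
<computeroutput>run_handoff</computeroutput> are hypothetical. No
lock is used at any point; ownership of the buffer passes from parent
to child at pthread_create and back again at pthread_join.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>
#include <stddef.h>

#define BUF_LEN 8

/* Runs in the child. The buffer was handed over at pthread_create,
   so the child reads and writes it without locking. */
static void *double_all(void *arg)
{
    int *buf = arg;
    for (int i = 0; i < BUF_LEN; i++)
        buf[i] *= 2;
    return NULL;
}

/* Hypothetical entry point: returns the sum of the buffer after the
   child has finished, or -1 on failure. */
int run_handoff(void)
{
    int buf[BUF_LEN];
    pthread_t child;
    int sum = 0;

    /* The parent owns buf exclusively: unlocked initialisation. */
    for (int i = 0; i < BUF_LEN; i++)
        buf[i] = i + 1;

    if (pthread_create(&child, NULL, double_all, buf) != 0)
        return -1;

    /* The parent does not touch buf between create and join. */

    if (pthread_join(child, NULL) != 0)
        return -1;

    /* After the join, ownership is back with the parent. */
    for (int i = 0; i < BUF_LEN; i++)
        sum += buf[i];
    return sum;
}
```
]]></programlisting>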

<para>The ownership state, accessing thread-set and related lock-set
for each memory location are tracked at 8-bit granularity. This means
the algorithm is precise even for 16- and 8-bit memory
accesses.</para>

<para>Helgrind correctly handles reader-writer locks in this
framework. Locations shared between multiple threads can be protected
during reads by locks held in either read-mode or write-mode, but can
only be protected during writes by locks held in write-mode. Normal
POSIX mutexes are treated as if they are reader-writer locks which are
only ever held in write-mode.</para>
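
<para>A minimal sketch of the reader-writer case follows. It assumes
a small table that one thread updates in write-mode while several
others inspect it in read-mode; the names
<computeroutput>writer</computeroutput>,
<computeroutput>reader</computeroutput> and
<computeroutput>run_rwlock_demo</computeroutput> are hypothetical.</para>

<programlisting><![CDATA[
```c
/* Ask for the POSIX 2008 interfaces if nothing else was requested. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <pthread.h>
#include <stddef.h>

#define SLOTS   4
#define ROUNDS  1000
#define READERS 3

static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
static int table[SLOTS];

/* Writes happen only with the lock held in write-mode. */
static void *writer(void *arg)
{
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        pthread_rwlock_wrlock(&table_lock);
        for (int i = 0; i < SLOTS; i++)
            table[i]++;
        pthread_rwlock_unlock(&table_lock);
    }
    return NULL;
}

/* Reads hold the lock in read-mode; each reader counts how often it
   sees slots that disagree with slot 0. */
static void *reader(void *arg)
{
    long *mismatches = arg;
    for (int r = 0; r < ROUNDS; r++) {
        pthread_rwlock_rdlock(&table_lock);
        for (int i = 1; i < SLOTS; i++)
            if (table[i] != table[0])
                (*mismatches)++;
        pthread_rwlock_unlock(&table_lock);
    }
    return NULL;
}

/* Hypothetical entry point: returns the total number of mismatches
   seen by the readers, or -1 if a thread could not be created. */
long run_rwlock_demo(void)
{
    pthread_t wtid, rtid[READERS];
    long mismatches[READERS] = {0};
    long total = 0;
    int readers_started = 0;
    int writer_started;

    pthread_rwlock_wrlock(&table_lock);
    for (int i = 0; i < SLOTS; i++)
        table[i] = 0;
    pthread_rwlock_unlock(&table_lock);

    writer_started = (pthread_create(&wtid, NULL, writer, NULL) == 0);
    while (readers_started < READERS &&
           pthread_create(&rtid[readers_started], NULL, reader,
                          &mismatches[readers_started]) == 0)
        readers_started++;

    if (writer_started)
        pthread_join(wtid, NULL);
    for (int i = 0; i < readers_started; i++) {
        pthread_join(rtid[i], NULL);
        total += mismatches[i];
    }

    if (!writer_started || readers_started != READERS)
        return -1;
    return total;
}
```
]]></programlisting>

<para>Since the writer changes all slots inside one write-mode
critical section, readers holding the lock in read-mode should never
observe a partially updated table.</para>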

<para>Helgrind correctly handles POSIX mutexes for which recursive
locking is allowed.</para>
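
<para>For reference, a recursive mutex is requested through a mutex
attribute object. The sketch below is only an illustration; the names
<computeroutput>descend</computeroutput> and
<computeroutput>run_recursive_lock</computeroutput> are hypothetical.
One thread acquires the same mutex several times, releasing it once
per acquisition.</para>

<programlisting><![CDATA[
```c
/* Ask for the POSIX 2008 interfaces if nothing else was requested. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <pthread.h>

static pthread_mutex_t rec_lock;
static int visits;

/* Each level of recursion re-acquires rec_lock in the same thread. */
static void descend(int depth)
{
    pthread_mutex_lock(&rec_lock);
    visits++;
    if (depth > 0)
        descend(depth - 1);
    pthread_mutex_unlock(&rec_lock);
}

/* Hypothetical entry point: returns the number of nested
   acquisitions performed, or -1 on failure. */
int run_recursive_lock(void)
{
    pthread_mutexattr_t attr;
    int rc;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&rec_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return -1;

    visits = 0;
    descend(3);            /* depths 3, 2, 1 and 0 */

    pthread_mutex_destroy(&rec_lock);
    return visits;
}
```
]]></programlisting>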

<para>Helgrind only partially handles x86 and amd64 memory access
instructions preceded by a LOCK prefix. Writes are correctly handled,
by pretending that the LOCK prefix implies acquisition and release of
a magic "bus hardware lock" mutex before and after the instruction.
This unfortunately requires subsequent reads from such locations to
also use a LOCK prefix, which is not required by the real hardware.
Helgrind does not offer any equivalent handling for atomic sequences
on PowerPC/POWER platforms created by the use of lwarx/stwcx
instructions.</para>
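
<para>The following sketch shows one way such accesses can arise from
C code. It assumes a GCC-compatible compiler and uses the
<computeroutput>__sync_fetch_and_add</computeroutput> builtin, which on
x86 and amd64 is typically compiled to a LOCK-prefixed instruction;
whether a given compiler really emits one should be checked in the
generated code. The names <computeroutput>bump</computeroutput>,
<computeroutput>read_hits</computeroutput> and
<computeroutput>run_atomic_counter</computeroutput> are hypothetical.
Reads go through the same builtin, in line with the restriction
described above.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS  10000

static long hits;

/* Atomic read: adding zero returns the current value. */
static long read_hits(void)
{
    return __sync_fetch_and_add(&hits, 0);
}

/* Atomic increment, with no pthread mutex involved. */
static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++)
        __sync_fetch_and_add(&hits, 1);
    return NULL;
}

/* Hypothetical entry point: returns the final count, or -1 if a
   thread could not be created. */
long run_atomic_counter(void)
{
    pthread_t tid[NUM_THREADS];
    int started = 0;

    hits = 0;   /* no other thread exists yet */

    while (started < NUM_THREADS &&
           pthread_create(&tid[started], NULL, bump, NULL) == 0)
        started++;

    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return started == NUM_THREADS ? read_hits() : -1;
}
```
]]></programlisting>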

</sect2>



<sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages">
<title>Interpreting Race Error Messages</title>

<para>Helgrind's race detection algorithm collects a lot of
information, and tries to present it in a helpful way when a race is
detected. Here's an example:</para>

<programlisting><![CDATA[
Thread #2 was created
   at 0x510548E: clone (in /lib64/libc-2.5.so)
   by 0x4E2F305: do_clone (in /lib64/libpthread-2.5.so)
   by 0x4E2F7C5: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
   by 0x4C23870: pthread_create@* (hg_intercepts.c:198)
   by 0x400CEF: main (tc17_sembar.c:195)

// And the same for threads #3, #4 and #5 -- omitted for conciseness

Possible data race during read of size 4 at 0x602174
   at 0x400BE5: gomp_barrier_wait (tc17_sembar.c:122)
   by 0x400C44: child (tc17_sembar.c:161)
   by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178)
   by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so)
   by 0x51054CC: clone (in /lib64/libc-2.5.so)
  Old state: shared-modified by threads #2, #3, #4, #5
  New state: shared-modified by threads #2, #3, #4, #5
  Reason:    this thread, #2, holds no consistent locks
  Last consistently used lock for 0x602174 was first observed
   at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
   by 0x4009E4: gomp_barrier_init (tc17_sembar.c:46)
   by 0x400CBC: main (tc17_sembar.c:192)
]]></programlisting>

<para>Helgrind first announces the creation points of any threads
referenced in the error message. This is so it can speak concisely
about threads and sets of threads without repeatedly printing their
creation point call stacks. Each thread is only ever announced once,
the first time it appears in any Helgrind error message.</para>

<para>The main error message begins at the text
"<computeroutput>Possible data race during read</computeroutput>".
At the start is information you would expect to see -- address and
size of the racing access, whether a read or a write, and the call
stack at the point it was detected.</para>

<para>More interesting is the state transition caused by this access.
This memory is already in the shared-modified state, and up to now has
been consistently protected by at least one lock. However, the thread
making the access in question (thread #2, here) does not hold any
locks in common with those held during all previous accesses to the
location -- "no consistent locks", in other words.</para>

<para>Finally, Helgrind shows the lock which has protected this
location in all previous accesses. (If there is more than one, only
one is shown.) This can be a useful hint, because it typically shows
the lock that the programmers intended to use to protect the location,
but in this case forgot.</para>

<para>Here are some more examples of race reports. This is not an
exhaustive list of combinations, but should give you some insight into
how to interpret the output.</para>

<programlisting><![CDATA[
Possible data race during write ...
  Old state: shared-readonly by threads #1, #2, #3
  New state: shared-modified by threads #1, #2, #3
  Reason:    this thread, #3, holds no consistent locks
  Location ... has never been protected by any lock
]]></programlisting>

<para>The location is shared by 3 threads, all of which have been
reading it without locking ("has never been protected by any lock").
Now one of them is writing it. Regardless of whether the writer has a
lock or not, this is still an error, because the write races against
the previously observed reads.</para>

<programlisting><![CDATA[
Possible data race during read ...
  Old state: shared-modified by threads #1, #2, #3
  New state: shared-modified by threads #1, #2, #3
  Reason:    this thread, #3, holds no consistent locks
  Last consistently used lock for ... was first observed ...
]]></programlisting>

<para>The location is shared by 3 threads, all of which have been
reading and writing it while (as required) holding at least one lock
in common. Now it is being read without that lock being held. In the
"Last consistently used lock" part, Helgrind offers its best guess as
to the identity of the lock that should have been used.</para>

<programlisting><![CDATA[
Possible data race during write ...
  Old state: owned exclusively by thread #4
  New state: shared-modified by threads #4, #5
  Reason:    this thread, #5, holds no locks at all
]]></programlisting>

<para>A location that has so far been accessed exclusively by thread
#4 has now been written by thread #5, without use of any lock. This
can be a sign that the programmer did not consider the possibility of
the location being shared between threads, or, alternatively, forgot
to use the appropriate lock.</para>

<para>Note that thread #4 exclusively owns the location, and so has
the right to access it without holding a lock. However, this message
does not say that thread #4 is not using a lock for this location.
Indeed, it could be using a lock for the location because it intends
to make it available to other threads, one of which is thread #5 --
and thread #5 has forgotten to use the lock.</para>

<para>Also, this message implies that Helgrind did not see any
synchronisation event between threads #4 and #5 that would have
allowed #5 to acquire exclusive ownership from #4. See
<link linkend="hg-manual.data-races.exclusive">above</link>
for a discussion of transfers of exclusive ownership states between
threads.</para>

</sect2>


</sect1>

<sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use">
<title>Hints and Tips for Effective Use of Helgrind</title>

<para>Helgrind can be very helpful in finding and resolving
threading-related problems. Like all sophisticated tools, it is most
effective when you understand how to play to its strengths.</para>

<para>Helgrind will be less effective when you merely throw an
existing threaded program at it and try to make sense of any reported
errors. It will be more effective if you design threaded programs
from the start in a way that helps Helgrind verify correctness. The
same is true for finding memory errors with Memcheck, but applies more
here, because thread checking is a harder problem. Consequently it is
much easier to write a correct program for which Helgrind falsely
reports (threading) errors than it is to write a correct program for
which Memcheck falsely reports (memory) errors.</para>

<para>With that in mind, here are some tips, listed most important first,
for getting reliable results and avoiding false errors. The first two
are critical. Any violations of them will swamp you with huge numbers
of false data-race errors.</para>


<orderedlist>

  <listitem>
    <para>Make sure your application, and all the libraries it uses,
    use the POSIX threading primitives. Helgrind needs to be able to
    see all events pertaining to thread creation, exit, locking and
    other synchronisation events. To do so it intercepts many POSIX
    pthread_ functions.</para>

    <para>Do not roll your own threading primitives (mutexes, etc)
    from combinations of the Linux futex syscall, counters and the
    like. These throw Helgrind's internal models of what is going on
    way off course and will give bogus results.</para>

    <para>Also, do not reimplement existing POSIX abstractions using
    other POSIX abstractions. For example, don't build your own
    semaphore routines or reader-writer locks from POSIX mutexes and
    condition variables. Instead use POSIX reader-writer locks and
    semaphores directly, since Helgrind supports them directly.</para>

    <para>Helgrind directly supports the following POSIX threading
    abstractions: mutexes, reader-writer locks, condition variables
    (but see below), and semaphores. Currently spinlocks and barriers
    are not supported, although they could be in future. A prototype
    "safe" implementation of barriers, based on semaphores, is
    available: please contact the Valgrind authors for details.</para>

    <para>At the time of writing, the following popular Linux packages
    are known to implement their own threading primitives:</para>

    <itemizedlist>
      <listitem><para>Qt version 4.X. Qt 3.X is fine, but not 4.X.
      Helgrind contains partial direct support for Qt 4.X threading,
      but this is not yet in a usable state. Assistance from folks
      knowledgeable in Qt 4 threading internals would be
      appreciated.</para></listitem>

      <listitem><para>Runtime support library for GNU OpenMP (part of
      GCC), at least GCC versions 4.2 and 4.3. With some minor effort
      of modifying the GNU OpenMP runtime support sources, it is
      possible to use Helgrind on GNU OpenMP compiled codes. Please
      contact the Valgrind authors for details.</para></listitem>
    </itemizedlist>
  </listitem>

  <listitem>
    <para>Avoid memory recycling. If you can't avoid it, you must
    tell Helgrind what is going on via the VALGRIND_HG_CLEAN_MEMORY
    client request
    (in <computeroutput>helgrind.h</computeroutput>).</para>

    <para>Helgrind is aware of standard memory allocation and
    deallocation that occurs via malloc/free/new/delete and from entry
    and exit of stack frames. In particular, when memory is
    deallocated via free, delete, or function exit, Helgrind considers
    that memory clean, so when it is eventually reallocated, its
    history is irrelevant.</para>

    <para>However, it is common practice to implement memory recycling
    schemes. In these, memory to be freed is not handed to
    free/delete, but instead put into a pool of free buffers to be
    handed out again as required. The problem is that Helgrind has no
    way to know that such memory is logically no longer in use, and
    its history is irrelevant. Hence you must make that explicit,
    using the VALGRIND_HG_CLEAN_MEMORY client request to specify the
    relevant address ranges. It's easiest to put these requests into
    the pool manager code, and use them either when memory is returned
    to the pool, or is allocated from it.</para>
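
    <para>A minimal sketch of such a pool manager is shown below. It
    is an illustration only: the pool design and the names
    <computeroutput>pool_get</computeroutput> and
    <computeroutput>pool_put</computeroutput> are hypothetical, and the
    header is assumed to be installed as
    <computeroutput>valgrind/helgrind.h</computeroutput>. If the header
    cannot be found, the sketch falls back to a do-nothing definition
    so that it still compiles. The request is issued when a buffer is
    taken out of the pool, after the pool has finished touching
    it.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>
#include <stdlib.h>

/* Use the real client request when the header is available. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef VALGRIND_HG_CLEAN_MEMORY
/* Assumption: header not found, so substitute a no-op. */
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) ((void)(start), (void)(len))
#endif

#define BUF_SIZE 256

/* Free buffers are chained through their own first bytes. */
typedef struct pool_node { struct pool_node *next; } pool_node;

static pool_node *free_list = NULL;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hand out a recycled buffer if there is one, else allocate. */
void *pool_get(void)
{
    pool_node *n;

    pthread_mutex_lock(&pool_lock);
    n = free_list;
    if (n != NULL)
        free_list = n->next;
    pthread_mutex_unlock(&pool_lock);

    if (n == NULL)
        return malloc(BUF_SIZE);

    /* The buffer is logically new: discard its access history. */
    VALGRIND_HG_CLEAN_MEMORY(n, BUF_SIZE);
    return n;
}

/* Return a buffer to the pool instead of freeing it. */
void pool_put(void *buf)
{
    pool_node *n = buf;

    pthread_mutex_lock(&pool_lock);
    n->next = free_list;
    free_list = n;
    pthread_mutex_unlock(&pool_lock);
}
```
]]></programlisting>

    <para>Outside Valgrind the client request has no effect on the
    program's behaviour, so the same binary can be run natively and
    under Helgrind.</para>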
  </listitem>

  <listitem>
    <para>Avoid POSIX condition variables. If you can, use POSIX
    semaphores (sem_t, sem_post, sem_wait) to do inter-thread event
    signalling. Semaphores with an initial value of zero are
    particularly useful for this.</para>

    <para>Helgrind handles POSIX condition variables only partially.
    This is because Helgrind can see inter-thread
    dependencies between a pthread_cond_wait call and a
    pthread_cond_signal/broadcast call only if the waiting thread
    actually gets to the rendezvous first (so that it actually calls
    pthread_cond_wait). It can't see dependencies between the threads
    if the signaller arrives first. In the latter case, POSIX
    guidelines imply that the associated boolean condition still
    provides an inter-thread synchronisation event, but one which is
    invisible to Helgrind.</para>

    <para>The result of Helgrind missing some inter-thread
    synchronisation events is to cause it to report false positives.
    That's because missing such events reduces the extent to which it
    can transfer exclusive memory ownership between threads. So
    memory may end up in a shared-modified state when that was not
    intended by the application programmers.</para>

    <para>The root cause of this loss of synchronisation information
    is particularly hard to understand, so an example is helpful. It
    was discussed at length by Arndt Muehlenfeld ("Runtime Race
    Detection in Multi-Threaded Programs", Dissertation, TU Graz,
    Austria). The canonical POSIX-recommended usage scheme for
    condition variables is as follows:</para>

<programlisting><![CDATA[
b   is a Boolean condition, which is False most of the time
cv  is a condition variable
mx  is its associated mutex

Signaller:                             Waiter:

lock(mx)                               lock(mx)
b = True                               while (b == False)
signal(cv)                                wait(cv,mx)
unlock(mx)                             unlock(mx)
]]></programlisting>

    <para>Assume <computeroutput>b</computeroutput> is False most of
    the time. If the waiter arrives at the rendezvous first, it
    enters its while-loop, waits for the signaller to signal, and
    eventually proceeds. Helgrind sees the signal, notes the
    dependency, and all is well.</para>

    <para>If the signaller arrives
    first, <computeroutput>b</computeroutput> is set to True, and the
    signal disappears into nowhere. When the waiter later arrives, it
    does not enter its while-loop and simply carries on. But even in
    this case, the waiter code following the while-loop cannot execute
    until the signaller sets <computeroutput>b</computeroutput> to
    True. Hence there is still the same inter-thread dependency, but
    this time it is through an arbitrary in-memory condition, and
    Helgrind cannot see it.</para>

    <para>By comparison, Helgrind's detection of inter-thread
    dependencies caused by semaphore operations is believed to be
    exactly correct.</para>

    <para>As far as I know, a solution to this problem that does not
    require source-level annotation of condition-variable wait loops
    is beyond the current state of the art.</para>
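
    <para>The semaphore-based alternative recommended at the start of
    this tip can be sketched as follows. This is a minimal
    illustration assuming a Linux system with unnamed POSIX
    semaphores; the names <computeroutput>producer</computeroutput>
    and <computeroutput>run_sem_handoff</computeroutput> are
    hypothetical. The semaphore starts at zero, so the post is not
    lost even if the signalling thread arrives first.</para>

<programlisting><![CDATA[
```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t ready;     /* initial value zero: nothing to consume yet */
static int message;

/* Signaller: write the data, then post the semaphore. */
static void *producer(void *arg)
{
    (void)arg;
    message = 42;
    sem_post(&ready);
    return NULL;
}

/* Hypothetical entry point: returns the value received from the
   producer, or -1 on failure. */
int run_sem_handoff(void)
{
    pthread_t tid;
    int seen = -1;
    int rc;

    if (sem_init(&ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, producer, NULL) != 0) {
        sem_destroy(&ready);
        return -1;
    }

    /* Waiter: retry only if the wait was interrupted by a signal. */
    do {
        rc = sem_wait(&ready);
    } while (rc != 0 && errno == EINTR);

    if (rc == 0)
        seen = message;   /* ordered after the producer's write */

    pthread_join(tid, NULL);
    sem_destroy(&ready);
    return seen;
}
```
]]></programlisting>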
  </listitem>

  <listitem>
    <para>Make sure you are using a supported Linux distribution. At
    present, Helgrind only properly supports x86-linux and amd64-linux
    with glibc-2.3 or later. The latter restriction means we only
    support glibc's NPTL threading implementation. The old
    LinuxThreads implementation is not supported.</para>

    <para>Unsupported targets may work to varying degrees. In
    particular ppc32-linux and ppc64-linux running NPTL should work,
    but you will get false race errors because Helgrind does not know
    how to properly handle atomic instruction sequences created using
    the lwarx/stwcx instructions.</para>
  </listitem>

  <listitem>
    <para>Round up all finished threads using pthread_join. Avoid
    detaching threads: don't create threads in the detached state, and
    don't call pthread_detach on existing threads.</para>

    <para>Using pthread_join to round up finished threads provides a
    clear synchronisation point that both Helgrind and programmers can
    see. This synchronisation point allows Helgrind to adjust its
    memory ownership
    models <link linkend="hg-manual.data-races.exclusive">as described
    extensively above</link>, which helps Helgrind produce more
    accurate error reports.</para>

    <para>If you don't call pthread_join on a thread, Helgrind has no
    way to know when it finishes, relative to any significant
    synchronisation points for other threads in the program. So it
    assumes that the thread lingers indefinitely and can potentially
    interfere indefinitely with the memory state of the program. It
    has every right to assume that -- after all, it might really be
    the case that, for scheduling reasons, the exiting thread did run
    very slowly in the last stages of its life.</para>
  </listitem>

  <listitem>
    <para>Perform thread debugging (with Helgrind) and memory
    debugging (with Memcheck) together.</para>

    <para>Helgrind tracks the state of memory in detail, and memory
    management bugs in the application are liable to cause confusion.
    In extreme cases, applications which do many invalid reads and
    writes (particularly to freed memory) have been known to crash
    Helgrind. So, ideally, you should make your application
    Memcheck-clean before using Helgrind.</para>

    <para>It may be impossible to make your application Memcheck-clean
    unless you first remove threading bugs. In particular, it may be
    difficult to remove all reads and writes to freed memory in
    multithreaded C++ destructor sequences at program termination.
    So, ideally, you should make your application Helgrind-clean
    before using Memcheck.</para>

    <para>Since this circularity is obviously unresolvable, at least
    bear in mind that Memcheck and Helgrind are to some extent
    complementary, and you may need to use them together.</para>
  </listitem>

  <listitem>
    <para>POSIX requires that implementations of standard I/O (printf,
    fprintf, fwrite, fread, etc) are thread safe. Unfortunately GNU
    libc implements this by using internal locking primitives that
    Helgrind is unable to intercept. Consequently Helgrind generates
    many false race reports when you use these functions.</para>

    <para>Helgrind attempts to hide these errors using the standard
    Valgrind error-suppression mechanism. So, at least for simple
    test cases, you don't see any. Nevertheless, some may slip
    through, so this is something to be aware of.</para>
  </listitem>

  <listitem>
    <para>Helgrind's error checks do not work properly inside the
    system threading library itself
    (<computeroutput>libpthread.so</computeroutput>), and it usually
    observes large numbers of (false) errors in there. Valgrind's
    suppression system then filters these out, so you should not see
    them.</para>

    <para>If you see any race errors reported
    where <computeroutput>libpthread.so</computeroutput> or
    <computeroutput>ld.so</computeroutput> is the object associated
    with the innermost stack frame, please file a bug report at
    http://www.valgrind.org.</para>
  </listitem>

</orderedlist>

</sect1>




<sect1 id="hg-manual.options" xreflabel="Helgrind Options">
<title>Helgrind Options</title>

<para>The following end-user options are available:</para>

<!-- start of xi:include in the manpage -->
<variablelist id="hg.opts.list">

  <varlistentry id="opt.happens-before" xreflabel="--happens-before">
    <term>
      <option><![CDATA[--happens-before=none|threads|all
      [default: all] ]]></option>
    </term>
    <listitem>
      <para>Helgrind always regards locks as the basis for
      inter-thread synchronisation. However, by default, before
      reporting a race error, Helgrind will also check whether
      certain other kinds of inter-thread synchronisation events
      happened. It may be that if such events took place, then no
      race really occurred, and so no error needs to be reported.
      See <link linkend="hg-manual.data-races.exclusive">above</link>
      for a discussion of transfers of exclusive ownership states
      between threads.
      </para>
      <para>With <varname>--happens-before=all</varname>, the
      following events are regarded as sources of synchronisation:
      thread creation/joinage, condition variable
      signal/broadcast/waits, and semaphore posts/waits.
      </para>
      <para>With <varname>--happens-before=threads</varname>, only
      thread creation/joinage events are regarded as sources of
      synchronisation.
      </para>
      <para>With <varname>--happens-before=none</varname>, no events
      (apart, of course, from locking) are regarded as sources of
      synchronisation.
      </para>
      <para>Changing this setting from the default will increase your
      false-error rate but give little or no gain. The only advantage
      is that <option>--happens-before=threads</option> and
      <option>--happens-before=none</option> should make Helgrind
      less and less sensitive to the scheduling of threads, and hence
      the output more and more repeatable across runs.
      </para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.trace-addr" xreflabel="--trace-addr">
    <term>
      <option><![CDATA[--trace-addr=0xXXYYZZ
      ]]></option> and
      <option><![CDATA[--trace-level=0|1|2 [default: 1]
      ]]></option>
    </term>
    <listitem>
      <para>Requests that Helgrind produce a log of all state changes
      to location 0xXXYYZZ. This can be helpful in tracking down
      tricky races. <varname>--trace-level</varname> controls the
      verbosity of the log. At the default setting (1), a one-line
      summary is printed for each state change. At level 2 a
      complete stack trace is printed for each state change.</para>
    </listitem>
  </varlistentry>

</variablelist>
<!-- end of xi:include in the manpage -->

<!-- start of xi:include in the manpage -->
<para>In addition, the following debugging options are available for
Helgrind:</para>

<variablelist id="hg.debugopts.list">

  <varlistentry id="opt.trace-malloc" xreflabel="--trace-malloc">
    <term>
      <option><![CDATA[--trace-malloc=no|yes [no]
      ]]></option>
    </term>
    <listitem>
      <para>Show all client malloc (etc) and free (etc) requests.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.gen-vcg" xreflabel="--gen-vcg">
    <term>
      <option><![CDATA[--gen-vcg=no|yes|yes-w-vts [no]
      ]]></option>
    </term>
    <listitem>
      <para>At exit, write to stderr a dump of the happens-before
      graph computed by Helgrind, in a format suitable for the VCG
      graph visualisation tool. A suitable command line is:</para>
      <para><computeroutput>valgrind --tool=helgrind
      --gen-vcg=yes my_app 2&gt;&amp;1
      | grep xxxxxx | sed "s/xxxxxx//g"
      | xvcg -</computeroutput></para>
      <para>With <varname>--gen-vcg=yes</varname>, the basic
      happens-before graph is shown. With
      <varname>--gen-vcg=yes-w-vts</varname>, the vector timestamp
      for each node is also shown.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.cmp-race-err-addrs"
                xreflabel="--cmp-race-err-addrs">
    <term>
      <option><![CDATA[--cmp-race-err-addrs=no|yes [no]
      ]]></option>
    </term>
    <listitem>
      <para>Controls whether or not race (data) addresses should be
      taken into account when removing duplicates of race errors.
      With <varname>--cmp-race-err-addrs=no</varname>, two otherwise
      identical race errors will be considered to be the same even if
      their race addresses differ.
      With <varname>--cmp-race-err-addrs=yes</varname> they will be
      considered different. This is provided to help make certain
      regression tests work reliably.</para>
    </listitem>
  </varlistentry>
1230
  <varlistentry id="opt.tc-sanity-flags" xreflabel="--tc-sanity-flags">
    <term>
      <option><![CDATA[--tc-sanity-flags=<XXXXX> (X = 0|1) [00000]
    ]]></option>
    </term>
    <listitem>
      <para>Run extensive sanity checks on Helgrind's internal
      data structures at events defined by the bitstring, as
      follows:</para>
      <para><computeroutput>10000 </computeroutput>after changes to
      the lock order acquisition graph</para>
      <para><computeroutput>01000 </computeroutput>after every client
      memory access (NB: not currently used)</para>
      <para><computeroutput>00100 </computeroutput>after every client
      memory range permission setting of 256 bytes or greater</para>
      <para><computeroutput>00010 </computeroutput>after every client
      lock or unlock event</para>
      <para><computeroutput>00001 </computeroutput>after every client
      thread creation or join event</para>
      <para>Note that these checks make Helgrind run very slowly,
      often to the point of being completely unusable.</para>
    </listitem>
  </varlistentry>

</variablelist>
<!-- end of xi:include in the manpage -->


</sect1>

<sect1 id="hg-manual.todolist" xreflabel="To Do List">
<title>A To-Do List for Helgrind</title>

<para>The following is a list of loose ends which should be tidied up
at some point.</para>

<itemizedlist>
  <listitem><para>Track which mutexes are associated with which
    condition variables, and emit a warning if this becomes
    inconsistent.</para>
  </listitem>
  <listitem><para>For lock order errors, print the complete lock
    cycle, rather than only doing so for cycles of size 2, as at
    present.</para>
  </listitem>
  <listitem><para>Document the VALGRIND_HG_CLEAN_MEMORY client
    request.</para>
  </listitem>
  <listitem><para>Possibly add a client request to forcibly transfer
    ownership of memory from one thread to another.  Requires further
    consideration.</para>
  </listitem>
  <listitem><para>Add a new client request that marks an address range
    as being "shared-modified with empty lockset" (the error state),
    and describe how to use it.</para>
  </listitem>
  <listitem><para>Document races caused by gcc's thread-unsafe code
    generation for speculative stores.  In the interim see
    <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html</computeroutput>
    and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
    </para>
  </listitem>
  <listitem><para>Don't update the lock-order graph, and don't check
    for errors, when a "try"-style lock operation happens (e.g.
    pthread_mutex_trylock).  Such calls do not add any real
    restrictions to the locking order, since they can always fail to
    acquire the lock, in which case the caller goes off and does Plan
    B (presumably it will have a Plan B).  Doing such checks could
    generate false lock-order errors and confuse users.</para>
  </listitem>
  <listitem><para>Performance can be very poor.  Slowdowns on the
    order of 100:1 are not unusual.  There is considerable scope for
    performance improvements, though.
    </para>
  </listitem>

</itemizedlist>
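
<para>The "try"-style locking item above can be illustrated with a
minimal sketch.  All names in it (<computeroutput>lock_a</computeroutput>,
<computeroutput>lock_b</computeroutput>,
<computeroutput>add_or_defer</computeroutput>,
<computeroutput>flush_deferred</computeroutput>) are hypothetical and
exist only for this example.  One function holds
<computeroutput>lock_a</computeroutput> and merely tries
<computeroutput>lock_b</computeroutput>, falling back to a Plan B if
the attempt fails.  The other takes the two locks in the opposite
order, blocking on both.  Because the try operation never blocks, the
two functions cannot deadlock against each other, yet a checker that
treats the try operation as an ordinary acquisition could report a
lock-order inconsistency here.</para>

<programlisting><![CDATA[
#include <pthread.h>

/* Hypothetical locks and data, for illustration only. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int shared_total = 0;   /* protected by lock_b */
static int deferred     = 0;   /* protected by lock_a */

/* Holds lock_a, then only *tries* to take lock_b.  Returns 1 if the
   update was applied directly, or 0 if Plan B (deferring the work)
   was taken because lock_b was busy. */
static int add_or_defer(int amount)
{
    int applied = 0;
    pthread_mutex_lock(&lock_a);
    if (pthread_mutex_trylock(&lock_b) == 0) {
        shared_total += amount;
        pthread_mutex_unlock(&lock_b);
        applied = 1;
    } else {
        deferred += amount;    /* Plan B: never blocks on lock_b */
    }
    pthread_mutex_unlock(&lock_a);
    return applied;
}

/* Takes the locks in the opposite order (lock_b, then lock_a),
   blocking on both.  This is safe alongside add_or_defer, because
   add_or_defer never waits for lock_b while holding lock_a. */
static void flush_deferred(void)
{
    pthread_mutex_lock(&lock_b);
    pthread_mutex_lock(&lock_a);
    shared_total += deferred;
    deferred = 0;
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
}
]]></programlisting>

<para>A program would typically call
<computeroutput>add_or_defer</computeroutput> from its worker threads
and <computeroutput>flush_deferred</computeroutput> from time to time
to apply whatever work was deferred.</para>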

</sect1>

</chapter>