<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">


<chapter id="hg-manual" xreflabel="Helgrind: thread error detector">
  <title>Helgrind: a thread error detector</title>

<para>To use this tool, you must specify
<computeroutput>--tool=helgrind</computeroutput> on the Valgrind
command line.</para>
12
13
14
15
sewardj572feb72007-11-09 23:59:49 +000016<sect1 id="hg-manual.overview" xreflabel="Overview">
sewardjb4112022007-11-09 22:49:28 +000017<title>Overview</title>
18
<para>Helgrind is a Valgrind tool for detecting synchronisation errors
in C, C++ and Fortran programs that use the POSIX pthreads
threading primitives.</para>

<para>The main abstractions in POSIX pthreads are: a set of threads
sharing a common address space, thread creation, thread joining,
thread exit, mutexes (locks), condition variables (inter-thread event
notifications), reader-writer locks, and semaphores.</para>

<para>Helgrind is aware of all these abstractions and tracks their
effects as accurately as it can.  Currently it does not correctly
handle pthread barriers and pthread spinlocks, although it will not
object if you use them.  On x86 and amd64 platforms, it understands
and partially handles implicit locking arising from the use of the
LOCK instruction prefix.
</para>
35
sewardj572feb72007-11-09 23:59:49 +000036<para>Helgrind can detect three classes of errors, which are discussed
sewardjb4112022007-11-09 22:49:28 +000037in detail in the next three sections:</para>
38
39<orderedlist>
40 <listitem>
sewardj572feb72007-11-09 23:59:49 +000041 <para><link linkend="hg-manual.api-checks">
sewardjb4112022007-11-09 22:49:28 +000042 Misuses of the POSIX pthreads API.</link></para>
43 </listitem>
44 <listitem>
sewardj572feb72007-11-09 23:59:49 +000045 <para><link linkend="hg-manual.lock-orders">
sewardjb4112022007-11-09 22:49:28 +000046 Potential deadlocks arising from lock
47 ordering problems.</link></para>
48 </listitem>
49 <listitem>
sewardj572feb72007-11-09 23:59:49 +000050 <para><link linkend="hg-manual.data-races">
sewardjb4112022007-11-09 22:49:28 +000051 Data races -- accessing memory without adequate locking.
52 </link></para>
53 </listitem>
54</orderedlist>
55
56<para>Following those is a section containing
sewardj572feb72007-11-09 23:59:49 +000057<link linkend="hg-manual.effective-use">
58hints and tips on how to get the best out of Helgrind.</link>
sewardjb4112022007-11-09 22:49:28 +000059</para>
60
61<para>Then there is a
sewardj572feb72007-11-09 23:59:49 +000062<link linkend="hg-manual.options">summary of command-line
sewardjb4112022007-11-09 22:49:28 +000063options.</link>
64</para>
65
66<para>Finally, there is
sewardj572feb72007-11-09 23:59:49 +000067<link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind
sewardjb4112022007-11-09 22:49:28 +000068could be improved.</link>
69</para>
70
71</sect1>
72
73
74
75
sewardj572feb72007-11-09 23:59:49 +000076<sect1 id="hg-manual.api-checks" xreflabel="API Checks">
sewardjb4112022007-11-09 22:49:28 +000077<title>Detected errors: Misuses of the POSIX pthreads API</title>
78
<para>Helgrind intercepts calls to many POSIX pthreads functions, and
is therefore able to report on various common problems.  Although
these are unglamorous errors, their presence can lead to undefined
program behaviour and hard-to-find bugs later in execution.  The
detected errors are:</para>
84
85<itemizedlist>
86 <listitem><para>unlocking an invalid mutex</para></listitem>
87 <listitem><para>unlocking a not-locked mutex</para></listitem>
88 <listitem><para>unlocking a mutex held by a different
89 thread</para></listitem>
90 <listitem><para>destroying an invalid or a locked mutex</para></listitem>
91 <listitem><para>recursively locking a non-recursive mutex</para></listitem>
92 <listitem><para>deallocation of memory that contains a
93 locked mutex</para></listitem>
94 <listitem><para>passing mutex arguments to functions expecting
95 reader-writer lock arguments, and vice
96 versa</para></listitem>
97 <listitem><para>when a POSIX pthread function fails with an
98 error code that must be handled</para></listitem>
99 <listitem><para>when a thread exits whilst still holding locked
100 locks</para></listitem>
101 <listitem><para>calling <computeroutput>pthread_cond_wait</computeroutput>
102 with a not-locked mutex, or one locked by a different
103 thread</para></listitem>
104</itemizedlist>
105
106<para>Checks pertaining to the validity of mutexes are generally also
107performed for reader-writer locks.</para>
108
109<para>Various kinds of this-can't-possibly-happen events are also
110reported. These usually indicate bugs in the system threading
111library.</para>
112
113<para>Reported errors always contain a primary stack trace indicating
114where the error was detected. They may also contain auxiliary stack
115traces giving additional information. In particular, most errors
116relating to mutexes will also tell you where that mutex first came to
sewardj572feb72007-11-09 23:59:49 +0000117Helgrind's attention (the "<computeroutput>was first observed
sewardjb4112022007-11-09 22:49:28 +0000118at</computeroutput>" part), so you have a chance of figuring out which
119mutex it is referring to. For example:</para>
120
<programlisting><![CDATA[
Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
   at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
   by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
   by 0x40079B: main (tc09_bad_unlock.c:50)
  Lock at 0x7FEFFFA90 was first observed
   at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
   by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
   by 0x40079B: main (tc09_bad_unlock.c:50)
]]></programlisting>
131
sewardj572feb72007-11-09 23:59:49 +0000132<para>Helgrind has a way of summarising thread identities, as
sewardjb4112022007-11-09 22:49:28 +0000133evidenced here by the text "<computeroutput>Thread
134#1</computeroutput>". This is so that it can speak about threads and
135sets of threads without overwhelming you with details. See
sewardj572feb72007-11-09 23:59:49 +0000136<link linkend="hg-manual.data-races.errmsgs">below</link>
sewardjb4112022007-11-09 22:49:28 +0000137for more information on interpreting error messages.</para>
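
<para>As a rough illustration of the "unlocking a not-locked mutex"
case, the following sketch performs such an unlock.  It is not the
<computeroutput>tc09_bad_unlock.c</computeroutput> test program named
in the output above.  The function
name <computeroutput>unlock_unheld_mutex</computeroutput> is invented
for this example, and an error-checking mutex
(<computeroutput>PTHREAD_MUTEX_ERRORCHECK</computeroutput>) is assumed
so that the bad unlock yields an error return instead of undefined
behaviour:</para>

<programlisting><![CDATA[
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: unlocks a mutex that the caller never locked.
   Returns the error code produced by pthread_mutex_unlock. */
int unlock_unheld_mutex ( void ) {
   pthread_mutexattr_t attr;
   pthread_mutex_t     mx;
   int                 r;

   pthread_mutexattr_init(&attr);
   /* Error-checking mutexes report misuse rather than leaving it undefined. */
   pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
   pthread_mutex_init(&mx, &attr);
   pthread_mutexattr_destroy(&attr);

   r = pthread_mutex_unlock(&mx);   /* bug: mx is not locked */

   pthread_mutex_destroy(&mx);
   return r;
}
]]></programlisting>

<para>With this mutex type the call is expected to fail with
<computeroutput>EPERM</computeroutput>.  A program that ignores the
return value would never notice the mistake, which is the kind of
situation these API checks are meant to expose.</para>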
138
139</sect1>
140
141
142
143
sewardj572feb72007-11-09 23:59:49 +0000144<sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders">
sewardjb4112022007-11-09 22:49:28 +0000145<title>Detected errors: Inconsistent Lock Orderings</title>
146
147<para>In this section, and in general, to "acquire" a lock simply
148means to lock that lock, and to "release" a lock means to unlock
149it.</para>
150
sewardj572feb72007-11-09 23:59:49 +0000151<para>Helgrind monitors the order in which threads acquire locks.
sewardjb4112022007-11-09 22:49:28 +0000152This allows it to detect potential deadlocks which could arise from
153the formation of cycles of locks. Detecting such inconsistencies is
154useful because, whilst actual deadlocks are fairly obvious, potential
155deadlocks may never be discovered during testing and could later lead
156to hard-to-diagnose in-service failures.</para>
157
158<para>The simplest example of such a problem is as
159follows.</para>
160
161<itemizedlist>
162 <listitem><para>Imagine some shared resource R, which, for whatever
163 reason, is guarded by two locks, L1 and L2, which must both be held
164 when R is accessed.</para>
165 </listitem>
166 <listitem><para>Suppose a thread acquires L1, then L2, and proceeds
167 to access R. The implication of this is that all threads in the
168 program must acquire the two locks in the order first L1 then L2.
169 Not doing so risks deadlock.</para>
170 </listitem>
171 <listitem><para>The deadlock could happen if two threads -- call them
172 T1 and T2 -- both want to access R. Suppose T1 acquires L1 first,
173 and T2 acquires L2 first. Then T1 tries to acquire L2, and T2 tries
174 to acquire L1, but those locks are both already held. So T1 and T2
175 become deadlocked.</para>
176 </listitem>
177</itemizedlist>
178
sewardj572feb72007-11-09 23:59:49 +0000179<para>Helgrind builds a directed graph indicating the order in which
sewardjb4112022007-11-09 22:49:28 +0000180locks have been acquired in the past. When a thread acquires a new
181lock, the graph is updated, and then checked to see if it now contains
182a cycle. The presence of a cycle indicates a potential deadlock involving
183the locks in the cycle.</para>
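
<para>The bookkeeping described above can be sketched roughly as
follows.  This is a simplified illustration, not Helgrind's actual data
structure: locks are assumed to be identified by small integers,
<computeroutput>MAX_LOCKS</computeroutput> is an arbitrary bound, and
the names <computeroutput>order_graph_reset</computeroutput>,
<computeroutput>reaches</computeroutput> and
<computeroutput>note_acquisition</computeroutput> are invented for the
example.  An edge from a to b records that some thread acquired b while
already holding a.</para>

<programlisting><![CDATA[
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 16   /* arbitrary bound for this sketch */

/* edge[a][b] is true if some thread acquired lock b while holding lock a. */
static bool edge[MAX_LOCKS][MAX_LOCKS];

void order_graph_reset ( void ) {
   memset(edge, 0, sizeof edge);
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reaches ( int from, int to, bool* seen ) {
   if (from == to) return true;
   seen[from] = true;
   for (int next = 0; next < MAX_LOCKS; next++) {
      if (edge[from][next] && !seen[next] && reaches(next, to, seen))
         return true;
   }
   return false;
}

/* Record that 'acquired' was taken while 'held' was already held.
   Returns true if the new edge closes a cycle, that is, if a path
   from 'acquired' back to 'held' was already on record. */
bool note_acquisition ( int held, int acquired ) {
   bool seen[MAX_LOCKS] = { false };
   bool cycle = reaches(acquired, held, seen);
   edge[held][acquired] = true;
   return cycle;
}
]]></programlisting>

<para>In this sketch, recording "1 before 2" and later "2 before 1"
makes the second call return true, which corresponds to the two-lock
situation described above.  Longer cycles are found in the same way
because the search follows chains of edges.</para>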
184
185<para>In simple situations, where the cycle only contains two locks,
sewardj572feb72007-11-09 23:59:49 +0000186Helgrind will show where the required order was established:</para>
sewardjb4112022007-11-09 22:49:28 +0000187
<programlisting><![CDATA[
Thread #1: lock order "0x7FEFFFAB0 before 0x7FEFFFA80" violated
   at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
   by 0x40081F: main (tc13_laog1.c:24)
  Required order was established by acquisition of lock at 0x7FEFFFAB0
   at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
   by 0x400748: main (tc13_laog1.c:17)
  followed by a later acquisition of lock at 0x7FEFFFA80
   at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
   by 0x400773: main (tc13_laog1.c:18)
]]></programlisting>
199
<para>When there are more than two locks in the cycle, the error is
equally serious.  However, at present Helgrind does not show the locks
involved, so as to avoid flooding you with information.  That could be
fixed in future.  Here is an example involving a cycle
of five locks from a naive implementation of the famous Dining
Philosophers problem
(see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>).
In this case Helgrind has detected that all 5 philosophers could
simultaneously pick up their left fork and then deadlock whilst
waiting to pick up their right forks.</para>
210
<programlisting><![CDATA[
Thread #6: lock order "0x6010C0 before 0x601160" violated
   at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
   by 0x4007C0: dine (tc14_laog_dinphils.c:19)
   by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178)
   by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so)
   by 0x51054CC: clone (in /lib64/libc-2.5.so)
]]></programlisting>
219
220</sect1>
221
222
223
224
sewardj572feb72007-11-09 23:59:49 +0000225<sect1 id="hg-manual.data-races" xreflabel="Data Races">
sewardjb4112022007-11-09 22:49:28 +0000226<title>Detected errors: Data Races</title>
227
228<para>A data race happens, or could happen, when two threads
229access a shared memory location without using suitable locks to
230ensure single-threaded access. Such missing locking can cause
231obscure timing dependent bugs. Ensuring programs are race-free is
232one of the central difficulties of threaded programming.</para>
233
<para>Reliably detecting races is a difficult problem, and most
of Helgrind's internals are devoted to dealing with it.
As a consequence this section is somewhat long and involved.
We begin with a simple example.</para>
238
239
sewardj572feb72007-11-09 23:59:49 +0000240<sect2 id="hg-manual.data-races.example" xreflabel="Simple Race">
sewardjb4112022007-11-09 22:49:28 +0000241<title>A Simple Data Race</title>
242
243<para>About the simplest possible example of a race is as follows. In
244this program, it is impossible to know what the value
245of <computeroutput>var</computeroutput> is at the end of the program.
246Is it 2 ? Or 1 ?</para>
247
<programlisting><![CDATA[
#include <pthread.h>

int var = 0;

void* child_fn ( void* arg ) {
   var++; /* Unprotected relative to parent */ /* this is line 6 */
   return NULL;
}

int main ( void ) {
   pthread_t child;
   pthread_create(&child, NULL, child_fn, NULL);
   var++; /* Unprotected relative to child */ /* this is line 13 */
   pthread_join(child, NULL);
   return 0;
}
]]></programlisting>
266
267<para>The problem is there is nothing to
268stop <computeroutput>var</computeroutput> being updated simultaneously
269by both threads. A correct program would
270protect <computeroutput>var</computeroutput> with a lock of type
271<computeroutput>pthread_mutex_t</computeroutput>, which is acquired
sewardj572feb72007-11-09 23:59:49 +0000272before each access and released afterwards. Helgrind's output for
sewardjb4112022007-11-09 22:49:28 +0000273this program is:</para>
274
<programlisting><![CDATA[
Thread #1 is the program's root thread

Thread #2 was created
   at 0x510548E: clone (in /lib64/libc-2.5.so)
   by 0x4E2F305: do_clone (in /lib64/libpthread-2.5.so)
   by 0x4E2F7C5: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
   by 0x4C23870: pthread_create@* (hg_intercepts.c:198)
   by 0x4005F1: main (simple_race.c:12)

Possible data race during write of size 4 at 0x601034
   at 0x4005F2: main (simple_race.c:13)
  Old state: shared-readonly by threads #1, #2
  New state: shared-modified by threads #1, #2
  Reason:    this thread, #1, holds no consistent locks
  Location 0x601034 has never been protected by any lock
]]></programlisting>
292
293<para>This is quite a lot of detail for an apparently simple error.
294The last clause is the main error message. It says there is a race as
295a result of a write of size 4 (bytes), at 0x601034, which is
296presumably the address of <computeroutput>var</computeroutput>,
297happening in function <computeroutput>main</computeroutput> at line 13
298in the program.</para>
299
300<para>Note that it is purely by chance that the race is
301reported for the parent thread's access. It could equally have been
302reported instead for the child's access, at line 6. The error will
303only be reported for one of the locations, since neither the parent
304nor child is, by itself, incorrect. It is only when both access
305<computeroutput>var</computeroutput> without a lock that an error
306exists.</para>
307
308<para>The error message shows some other interesting details. The
309sections below explain them. Here we merely note their presence:</para>
310
<itemizedlist>
 <listitem><para>Helgrind maintains some kind of state machine for the
  memory location in question, hence the "<computeroutput>Old
  state:</computeroutput>" and "<computeroutput>New
  state:</computeroutput>" lines.</para>
 </listitem>
 <listitem><para>Helgrind keeps track of which threads have accessed
  the location: "<computeroutput>threads #1, #2</computeroutput>".
  Before printing the main error message, it prints the creation
  points of these two threads, so you can see which threads it is
  referring to.</para>
 </listitem>
 <listitem><para>Helgrind tries to provide an explanation of why the
  race exists: "<computeroutput>Location 0x601034 has never been
  protected by any lock</computeroutput>".</para>
 </listitem>
</itemizedlist>
328
329<para>Understanding the memory state machine is central to
sewardj572feb72007-11-09 23:59:49 +0000330understanding Helgrind's race-detection algorithm. The next three
sewardjb4112022007-11-09 22:49:28 +0000331subsections explain this.</para>
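
<para>For comparison, a version of the simple example with the
locking that the text calls for might look like the sketch below.  The
names <computeroutput>var_mx</computeroutput> and
<computeroutput>run_locked_increment</computeroutput> are invented
here, and the parent's code is wrapped in a function that returns the
final value so that the effect is easy to check:</para>

<programlisting><![CDATA[
#include <pthread.h>

static int var = 0;
static pthread_mutex_t var_mx = PTHREAD_MUTEX_INITIALIZER; /* guards var */

static void* child_fn ( void* arg ) {
   (void)arg;
   pthread_mutex_lock(&var_mx);
   var++;                         /* protected relative to parent */
   pthread_mutex_unlock(&var_mx);
   return NULL;
}

/* Hypothetical driver: runs parent and child once, returns final var. */
int run_locked_increment ( void ) {
   pthread_t child;
   var = 0;                       /* child does not exist yet */
   pthread_create(&child, NULL, child_fn, NULL);
   pthread_mutex_lock(&var_mx);
   var++;                         /* protected relative to child */
   pthread_mutex_unlock(&var_mx);
   pthread_join(child, NULL);
   return var;                    /* child has exited */
}
]]></programlisting>

<para>Because both increments happen while
<computeroutput>var_mx</computeroutput> is held, the final value is
always 2, whichever thread runs first.</para>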
332
333</sect2>
334
335
sewardj572feb72007-11-09 23:59:49 +0000336<sect2 id="hg-manual.data-races.memstates" xreflabel="Memory States">
337<title>Helgrind's Memory State Machine</title>
sewardjb4112022007-11-09 22:49:28 +0000338
sewardj572feb72007-11-09 23:59:49 +0000339<para>Helgrind tracks the state of every byte of memory used by your
sewardjb4112022007-11-09 22:49:28 +0000340program. There are a number of states, but only three are
341interesting:</para>
342
343<itemizedlist>
344 <listitem><para>Exclusive: memory in this state is regarded as owned
345 exclusively by one particular thread. That thread may read and
346 write it without a lock. Even in highly threaded programs, the
347 majority of locations never leave the Exclusive state, since most
348 data is thread-private.</para>
349 </listitem>
350 <listitem><para>Shared-Readonly: memory in this state is regarded as
351 shared by multiple threads. In this state, any thread may read the
352 memory without a lock, reflecting the fact that readonly data may
353 safely be shared between threads without locking.</para>
354 </listitem>
355 <listitem><para>Shared-Modified: memory in this state is regarded as
356 shared by multiple threads, at least one of which has written to it.
357 All participating threads must hold at least one lock in common when
sewardj572feb72007-11-09 23:59:49 +0000358 accessing the memory. If no such lock exists, Helgrind reports a
sewardjb4112022007-11-09 22:49:28 +0000359 race error.</para>
360 </listitem>
361</itemizedlist>
362
<para>Let's review the simple example above with this in mind.  When
the program starts, <computeroutput>var</computeroutput> is not in any
of these states.  Either the parent or child thread gets to its
<computeroutput>var++</computeroutput> first, and thereby
gets Exclusive ownership of the location.</para>
368
369<para>The later-running thread now arrives at
370its <computeroutput>var++</computeroutput> statement. It first reads
371the existing value from memory.
372Because <computeroutput>var</computeroutput> is currently marked as
373owned exclusively by the other thread, its state is changed to
374shared-readonly by both threads.</para>
375
376<para>This same thread adds one to the value it has and stores it back
377in <computeroutput>var</computeroutput>. This causes another state
sewardj572feb72007-11-09 23:59:49 +0000378change, this time to the shared-modified state. Because Helgrind has
sewardjb4112022007-11-09 22:49:28 +0000379also been tracking which threads hold which locks, it can see that
380<computeroutput>var</computeroutput> is in shared-modified state but
381no lock has been used to consistently protect it. Hence a race is
382reported exactly at the transition from shared-readonly to
383shared-modified.</para>
384
sewardj572feb72007-11-09 23:59:49 +0000385<para>The essence of the algorithm is this. Helgrind keeps track of
sewardjb4112022007-11-09 22:49:28 +0000386each memory location that has been accessed by more than one thread.
387For each such location it incrementally infers the set of locks which
388have consistently been used to protect that location. If the
389location's lockset becomes empty, and at some point one of the threads
390attempts to write to it, a race is then reported.</para>
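
<para>The three states and the lockset refinement can be sketched as
below.  This is a deliberately simplified model in the spirit of the
description above, not Helgrind's implementation: locks are assumed to
be numbered so that a set of locks fits in a 32-bit mask, thread
identities are plain integers, ownership transfers are ignored, and
all type and function names are invented for the example.</para>

<programlisting><![CDATA[
#include <stdbool.h>
#include <stdint.h>

typedef enum { ST_NEW, ST_EXCLUSIVE, ST_SHARED_RO, ST_SHARED_MOD } MemState;

typedef struct {
   MemState state;
   int      owner;     /* meaningful only in ST_EXCLUSIVE */
   uint32_t lockset;   /* candidate locks, one bit per lock */
} Shadow;

void shadow_init ( Shadow* s ) {
   s->state   = ST_NEW;
   s->owner   = -1;
   s->lockset = 0;
}

/* Record an access by thread 'tid' holding the locks in 'held'.
   Returns true if the access should be reported as a possible race. */
bool shadow_access ( Shadow* s, int tid, uint32_t held, bool is_write ) {
   switch (s->state) {
      case ST_NEW:
         s->state = ST_EXCLUSIVE;
         s->owner = tid;
         return false;
      case ST_EXCLUSIVE:
         if (tid == s->owner) return false;
         /* A second thread has arrived: start inferring the lockset. */
         s->lockset = held;
         s->state   = is_write ? ST_SHARED_MOD : ST_SHARED_RO;
         break;
      case ST_SHARED_RO:
         s->lockset &= held;      /* keep only locks held every time */
         if (is_write) s->state = ST_SHARED_MOD;
         break;
      case ST_SHARED_MOD:
         s->lockset &= held;
         break;
   }
   return s->state == ST_SHARED_MOD && s->lockset == 0;
}
]]></programlisting>

<para>Feeding this model the accesses of the simple example (a read
and a write by one thread, then a read and a write by the other, with
no locks held) reports a race on the final write, at the transition
from shared-readonly to shared-modified.</para>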
391
392<para>This technique is known as "lockset inference" and was
393introduced in: "Eraser: A Dynamic Data Race Detector for Multithreaded
394Programs" (Stefan Savage, Michael Burrows, Greg Nelson, Patrick
395Sobalvarro and Thomas Anderson, ACM Transactions on Computer Systems,
39615(4):391-411, November 1997).</para>
397
398<para>Lockset inference has since been widely implemented, studied and
sewardj572feb72007-11-09 23:59:49 +0000399extended. Helgrind incorporates several refinements aimed at avoiding
sewardjb4112022007-11-09 22:49:28 +0000400the high false error rate that naive versions of the algorithm suffer
401from. A
sewardj572feb72007-11-09 23:59:49 +0000402<link linkend="hg-manual.data-races.summary">summary of the complete
403algorithm used by Helgrind</link> is presented below. First, however,
sewardjb4112022007-11-09 22:49:28 +0000404it is important to understand details of transitions pertaining to the
405Exclusive-ownership state.</para>
406
407</sect2>
408
409
410
sewardj572feb72007-11-09 23:59:49 +0000411<sect2 id="hg-manual.data-races.exclusive" xreflabel="Excl Transfers">
sewardjb4112022007-11-09 22:49:28 +0000412<title>Transfers of Exclusive Ownership Between Threads</title>
413
414<para>As presented, the algorithm is far too strict. It reports many
415errors in perfectly correct, widely used parallel programming
416constructions, for example, using child worker threads and worker
417thread pools.</para>
418
419<para>To avoid these false errors, we must refine the algorithm so
420that it keeps memory in an Exclusive ownership state in cases where it
421would otherwise decay into a shared-readonly or shared-modified state.
422Recall that Exclusive ownership is special in that it grants the
423owning thread the right to access memory without use of any locks. In
424order to support worker-thread and worker-thread-pool idioms, we will
425allow threads to steal exclusive ownership of memory from other
426threads under certain circumstances.</para>
427
428<para>Here's an example. Imagine a parent thread creates child
429threads to do units of work. For each unit of work, the parent
430allocates a work buffer, fills it in, and creates the child thread,
431handing it a pointer to the buffer. The child reads/writes the buffer
432and eventually exits, and the waiting parent then extracts the results
433from the buffer:</para>
434
<programlisting><![CDATA[
typedef ... Buffer;

pthread_t child;
Buffer buf;

/* ---- Parent ---- */                          /* ---- Child ---- */

/* parent writes workload into buf */
pthread_create( &child, NULL, child_fn, &buf );

/* parent does not read */                      void* child_fn ( void* arg ) {
/* or write buf */                                 /* read/write buf via arg */
                                                   return NULL;
                                                }

pthread_join ( child, NULL );
/* parent reads results from buf */
]]></programlisting>
453
454<para>Although <computeroutput>buf</computeroutput> is accessed by
455both threads, neither uses locks, yet the program is race-free. The
456essential observation is that the child's creation and exit create
457synchronisation events between it and the parent. These force the
458child's accesses to <computeroutput>buf</computeroutput> to happen
459after the parent initialises <computeroutput>buf</computeroutput>, and
460before the parent reads the results
461from <computeroutput>buf</computeroutput>.</para>
462
sewardj572feb72007-11-09 23:59:49 +0000463<para>To model this, Helgrind allows the child to steal, from the
sewardjb4112022007-11-09 22:49:28 +0000464parent, exclusive ownership of any memory exclusively owned by the
465parent before the pthread_create call. Similarly, once the parent's
466pthread_join call returns, it can steal back ownership of memory
467exclusively owned by the child. In this way ownership
468of <computeroutput>buf</computeroutput> is transferred from parent to
469child and back, so the basic algorithm does not report any races
470despite the absence of any locking.</para>
471
472<para>Note that the child may only steal memory owned by the parent
473prior to the pthread_create call. If the child attempts to read or
474write memory which is also accessed by the parent in between the
475pthread_create and pthread_join calls, an error is still
476reported.</para>
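
<para>A compilable rendering of the worker-buffer idiom might look as
follows.  The layout of <computeroutput>Buffer</computeroutput>, the
doubling "workload" and the name
<computeroutput>run_worker</computeroutput> are assumptions made for
this sketch only:</para>

<programlisting><![CDATA[
#include <pthread.h>

typedef struct { int input; int result; } Buffer;   /* hypothetical layout */

static void* child_fn ( void* arg ) {
   Buffer* buf = arg;
   buf->result = buf->input * 2;   /* child reads and writes buf, no lock */
   return NULL;
}

/* Hypothetical driver: hands one unit of work to a child thread. */
int run_worker ( int input ) {
   pthread_t child;
   Buffer    buf;

   buf.input  = input;             /* parent writes workload into buf */
   buf.result = 0;
   pthread_create(&child, NULL, child_fn, &buf);
   /* parent neither reads nor writes buf here */
   pthread_join(child, NULL);
   return buf.result;              /* parent reads results from buf */
}
]]></programlisting>

<para>No mutex appears anywhere, yet every access
to <computeroutput>buf</computeroutput> is ordered by the creation and
joining of the child.</para>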
477
<para>This technique was introduced with the name "thread lifetime
segments" in "Runtime Checking of Multithreaded Applications with
Visual Threads" (Jerry J. Harrow, Jr, Proceedings of the 7th
International SPIN Workshop on Model Checking of Software, Stanford,
California, USA, August 2000, LNCS 1885, pp331--342).  Helgrind
implements an extended version of it.  Specifically, Helgrind allows
transfer of exclusive ownership in the following situations:</para>
485
486<itemizedlist>
487 <listitem><para>At thread creation: a child can acquire ownership of
488 memory held exclusively by the parent prior to the child's
489 creation.</para>
490 </listitem>
491 <listitem><para>At thread joining: the joiner (thread not exiting)
492 can acquire ownership of memory held exclusively by the joinee
493 (thread that is exiting) at the point it exited.</para>
494 </listitem>
495 <listitem><para>At condition variable signallings and broadcasts. A
496 thread Tw which completes a pthread_cond_wait call as a result of
497 a signal or broadcast on the same condition variable by some other
498 thread Ts, may acquire ownership of memory held exclusively by
499 Ts prior to the pthread_cond_signal/broadcast
500 call.</para>
501 </listitem>
 <listitem><para>At semaphore post (sem_post) calls.  A thread Tw
  which completes a sem_wait call as a result of a sem_post call
  on the same semaphore by some other thread Tp, may acquire
  ownership of memory held exclusively by Tp prior to the sem_post
  call.</para>
 </listitem>
508</itemizedlist>
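
<para>The semaphore case in the list above could arise in a program
shaped like the following sketch.  All names are invented for the
example, and an unnamed POSIX semaphore created with
<computeroutput>sem_init</computeroutput> is assumed to be available
on the platform:</para>

<programlisting><![CDATA[
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static int   message;    /* handed from poster (Tp) to waiter (Tw), no mutex */
static int   received;
static sem_t ready;

static void* waiter_fn ( void* arg ) {
   (void)arg;
   /* Completes only after the sem_post below; retry if interrupted. */
   while (sem_wait(&ready) == -1 && errno == EINTR)
      continue;
   received = message + 1;   /* Tw now uses the data written by Tp */
   return NULL;
}

/* Hypothetical driver: the calling thread plays the role of Tp. */
int run_sem_handoff ( int value ) {
   pthread_t waiter;
   sem_init(&ready, 0, 0);   /* process-private, initial count 0 */
   pthread_create(&waiter, NULL, waiter_fn, NULL);
   message = value;          /* written by Tp before the sem_post */
   sem_post(&ready);
   pthread_join(waiter, NULL);
   sem_destroy(&ready);
   return received;
}
]]></programlisting>

<para>Here <computeroutput>message</computeroutput> is written after
the waiter has been created, so thread creation alone does not order
the accesses.  The ordering comes from the sem_post and sem_wait
pair.</para>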
509
510</sect2>
511
512
513
sewardj572feb72007-11-09 23:59:49 +0000514<sect2 id="hg-manual.data-races.re-excl" xreflabel="Re-Excl Transfers">
sewardjb4112022007-11-09 22:49:28 +0000515<title>Restoration of Exclusive Ownership</title>
516
517<para>Another common idiom is to partition the lifetime of the program
518as a whole into several distinct phases. In some of those phases, a
519memory location may be accessed by multiple threads and so require
520locking. In other phases only one thread exists and so can access the
521memory without locking. For example:</para>
522
<programlisting><![CDATA[
int             var = 0;                         /* shared variable */
pthread_mutex_t mx  = PTHREAD_MUTEX_INITIALIZER; /* guard for var */
pthread_t       child;

/* ---- Parent ---- */                          /* ---- Child ---- */

var += 1; /* no lock used */

pthread_create( &child, NULL, child_fn, NULL );

                                                void* child_fn ( void* uu ) {
pthread_mutex_lock(&mx);                           pthread_mutex_lock(&mx);
var += 2;                                          var += 3;
pthread_mutex_unlock(&mx);                         pthread_mutex_unlock(&mx);
                                                   return NULL;
                                                }

pthread_join ( child, NULL );

var += 4; /* no lock used */
]]></programlisting>
544
545<para>This program is correct, but using only the mechanisms described
sewardj572feb72007-11-09 23:59:49 +0000546so far, Helgrind would report an error at
sewardjb4112022007-11-09 22:49:28 +0000547<computeroutput>var += 4</computeroutput>. This is because, by that
548point, <computeroutput>var</computeroutput> is marked as being in the
549state "shared-modified and protected by the
550lock <computeroutput>mx</computeroutput>", but is being accessed
551without locking. Really, what we want is
552for <computeroutput>var</computeroutput> to return to the parent
553thread's exclusive ownership after the child thread has exited.</para>
554
<para>To make this possible, for every memory location Helgrind also keeps
track of all the threads that have accessed that location
-- its threadset.  When a thread Tquitter joins back to Tstayer,
Helgrind examines the threadsets of all memory in shared-modified or
shared-readonly state.  In each such threadset, if Tquitter is
mentioned, it is removed and replaced by Tstayer.  If, as a result, a
threadset becomes a singleton set containing Tstayer, then the
location's state is changed to belongs-exclusively-to-Tstayer.</para>
563
564<para>In our example, the result is exactly as we desire:
565<computeroutput>var</computeroutput> is reacquired exclusively by the
566parent after the child exits.</para>
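
<para>The join-time threadset update can be sketched as below.  This
is an illustration under simplifying assumptions, not Helgrind's code:
thread identities are small integers used as bit positions in a 32-bit
mask, and the names <computeroutput>SharedLoc</computeroutput> and
<computeroutput>on_join</computeroutput> are invented for the
example.</para>

<programlisting><![CDATA[
#include <stdbool.h>
#include <stdint.h>

/* Threadset as a bitmask: bit t set means thread t has accessed the
   location. */
typedef struct {
   uint32_t threadset;
   bool     exclusive;   /* true once a single thread owns the location */
   int      owner;       /* valid only when exclusive is true */
} SharedLoc;

/* Apply the effect of 'quitter' joining back to 'stayer'. */
void on_join ( SharedLoc* loc, int quitter, int stayer ) {
   uint32_t qbit = UINT32_C(1) << quitter;
   uint32_t sbit = UINT32_C(1) << stayer;

   if (loc->exclusive || !(loc->threadset & qbit))
      return;                              /* quitter is not mentioned */

   /* Remove the quitter and replace it by the stayer. */
   loc->threadset = (loc->threadset & ~qbit) | sbit;

   if (loc->threadset == sbit) {           /* singleton set: only stayer */
      loc->exclusive = true;
      loc->owner     = stayer;
   }
}
]]></programlisting>

<para>Applied once per join, this also covers a cascade of joins: the
location stays shared until the last other thread in its threadset has
been folded into the survivor.</para>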
567
568<para>More generally, when a group of threads merges back to a single
569thread via a cascade of pthread_join calls, any memory shared by the
570group (or a subset of it) ends up being owned exclusively by the sole
sewardj572feb72007-11-09 23:59:49 +0000571surviving thread. This significantly enhances Helgrind's flexibility,
sewardjb4112022007-11-09 22:49:28 +0000572since it means that each memory location may make arbitrarily many
573transitions between exclusive and shared ownership. Furthermore, a
574different lock may protect the location during each period of shared
575ownership.</para>
576
577</sect2>
578
579
580
sewardj572feb72007-11-09 23:59:49 +0000581<sect2 id="hg-manual.data-races.summary" xreflabel="Race Det Summary">
sewardjb4112022007-11-09 22:49:28 +0000582<title>A Summary of the Race Detection Algorithm</title>
583
sewardj572feb72007-11-09 23:59:49 +0000584<para>Helgrind looks for memory locations which are accessed by more
585than one thread. For each such location, Helgrind records which of
sewardjb4112022007-11-09 22:49:28 +0000586the program's locks were held by the accessing thread at the time of
587each access. The hope is to discover that there is indeed at least
588one lock which is consistently used by all threads to protect that
589location. If no such lock can be found, then there is apparently no
590consistent locking strategy being applied for that location, and so a
sewardj572feb72007-11-09 23:59:49 +0000591possible data race might result. Helgrind accordingly reports an
sewardjb4112022007-11-09 22:49:28 +0000592error.</para>
593
594<para>In practice this discipline is far too simplistic, and is
595unusable since it reports many races in some widely used and
sewardj572feb72007-11-09 23:59:49 +0000596known-correct programming disciplines. Helgrind's checking therefore
sewardjb4112022007-11-09 22:49:28 +0000597incorporates many refinements to this basic idea, and can be
598summarised as follows:</para>
599
600<para>The following thread events are intercepted and monitored:</para>
601
602<itemizedlist>
603 <listitem><para>thread creation and exiting (pthread_create,
604 pthread_join, pthread_exit)</para>
605 </listitem>
606 <listitem>
607 <para>lock acquisition and release (pthread_mutex_lock,
608 pthread_mutex_unlock, pthread_rwlock_rdlock,
609 pthread_rwlock_wrlock,
610 pthread_rwlock_unlock)</para>
611 </listitem>
612 <listitem>
613 <para>inter-thread event notifications (pthread_cond_wait,
614 pthread_cond_signal, pthread_cond_broadcast,
615 sem_wait, sem_post)</para>
616 </listitem>
617</itemizedlist>
618
619<para>Memory allocation and deallocation events are intercepted and
620monitored:</para>
621
622<itemizedlist>
623 <listitem>
624 <para>malloc/new/free/delete and variants</para>
625 </listitem>
626 <listitem>
627 <para>stack allocation and deallocation</para>
628 </listitem>
629</itemizedlist>
630
631<para>All memory accesses are intercepted and monitored.</para>
632
sewardj572feb72007-11-09 23:59:49 +0000633<para>By observing the above events, Helgrind can infer certain
sewardjb4112022007-11-09 22:49:28 +0000634aspects of the program's locking discipline. Programs which adhere to
635the following rules are considered to be acceptable:
636</para>
637
638<itemizedlist>
639 <listitem>
640 <para>A thread may allocate memory, and write initial values into
641 it, without locking. That thread is regarded as owning the memory
642 exclusively.</para>
643 </listitem>
644 <listitem>
645 <para>A thread may read and write memory which it owns exclusively,
646 without locking.</para>
647 </listitem>
648 <listitem>
649 <para>Memory which is owned exclusively by one thread may be read by
650 that thread and others without locking. However, in this situation
651 no thread may do unlocked writes to the memory (except for the owner
652 thread's initializing write).</para>
653 </listitem>
654 <listitem>
655 <para>Memory which is shared between multiple threads, one or more
656 of which writes to it, must be protected by a lock which is
657 correctly acquired and released by all threads accessing the
658 memory.</para>
659 </listitem>
660</itemizedlist>
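<para>As an illustration of the last rule, here is a minimal sketch of a
location that is written by several threads and is always accessed with
the same mutex held. The names <computeroutput>counter</computeroutput>,
<computeroutput>counter_mx</computeroutput> and
<computeroutput>run_locked_counter</computeroutput> are hypothetical and
chosen only for this example.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state: every access to counter holds counter_mx. */
static pthread_mutex_t counter_mx = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void *add_many(void *arg)
{
    long n = *(long *)arg;   /* read-only after the parent's initial write */
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_mx);
        counter++;
        pthread_mutex_unlock(&counter_mx);
    }
    return NULL;
}

/* Runs two workers that each add n, and returns the final count. */
long run_locked_counter(long n)
{
    pthread_t a, b;
    long result;

    pthread_mutex_lock(&counter_mx);
    counter = 0;
    pthread_mutex_unlock(&counter_mx);

    pthread_create(&a, NULL, add_many, &n);
    pthread_create(&b, NULL, add_many, &n);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    pthread_mutex_lock(&counter_mx);
    result = counter;
    pthread_mutex_unlock(&counter_mx);
    return result;
}
]]></programlisting>

<para>Because <computeroutput>counter_mx</computeroutput> is held on
every access, there is always one lock common to all accesses, which is
what the rules above ask for.</para>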
661
662<para>Any violation of this discipline will cause an error to be reported.
663However, two exemptions apply:</para>
664
665<itemizedlist>
666 <listitem>
667 <para>A thread Y can acquire exclusive ownership of memory
668 previously owned exclusively by a different thread X providing
669 X's last access and Y's first access are separated by one of the
670 following synchronization events:</para>
671 <itemizedlist>
672 <listitem><para>X creates thread Y</para></listitem>
673 <listitem><para>X joins back to Y</para></listitem>
674 <listitem><para>X uses a condition-variable to signal at Y, and Y is
675 waiting for that event</para></listitem>
676 <listitem><para>Y completes a semaphore wait as a result of X signalling
677 on that same semaphore</para></listitem>
678 </itemizedlist>
679 <para>
sewardj572feb72007-11-09 23:59:49 +0000680 This refinement allows Helgrind to correctly track the ownership
sewardjb4112022007-11-09 22:49:28 +0000681 state of inter-thread buffers used in the worker-thread and
682 worker-thread-pool concurrent programming idioms (styles).</para>
683 </listitem>
684 <listitem>
685 <para>Similarly, if thread Y joins back to thread X, memory
686 exclusively owned by Y becomes exclusively owned by X instead.
687 Also, memory that has been shared only by X and Y becomes
688 exclusively owned by X. More generally, memory that has been shared
689 by X, Y and some arbitrary other set S of threads is re-marked as
690 shared by X and S. Hence, under the right circumstances, memory
691 shared amongst multiple threads, all of which join into just one,
692 can revert to the exclusive ownership state.</para>
693 <para>
694 In effect, each memory location may make arbitrarily many
695 transitions between exclusive and shared ownership. Furthermore, a
696 different lock may protect the location during each period of shared
697 ownership. This significantly enhances the flexibility of the
698 algorithm.</para>
699 </listitem>
700</itemizedlist>
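<para>The create and join exemptions can be sketched as follows. This is
an illustrative example only: the buffer
<computeroutput>items</computeroutput> and the functions
<computeroutput>double_items</computeroutput> and
<computeroutput>run_handoff</computeroutput> are hypothetical names. The
buffer is never protected by a lock. Instead, it is handed from parent to
child at thread creation and handed back at the join.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>

#define NITEMS 4

/* Hypothetical buffer passed from parent to child and back, unlocked. */
static int items[NITEMS];

static void *double_items(void *unused)
{
    (void)unused;
    /* The parent's last access happened before pthread_create, so the
       child can take over exclusive ownership. */
    for (size_t i = 0; i < NITEMS; i++)
        items[i] *= 2;
    return NULL;
}

/* Returns the sum of the doubled items. */
int run_handoff(void)
{
    pthread_t child;
    int sum = 0;

    /* The parent owns the buffer exclusively while initialising it. */
    for (size_t i = 0; i < NITEMS; i++)
        items[i] = (int)i + 1;

    pthread_create(&child, NULL, double_items, NULL);
    /* The parent does not touch items[] until the join completes. */
    pthread_join(child, NULL);

    /* After the join, the buffer is exclusively owned by the parent again. */
    for (size_t i = 0; i < NITEMS; i++)
        sum += items[i];
    return sum;
}
]]></programlisting>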
701
702<para>The ownership state, accessing thread-set and related lock-set
703for each memory location are tracked at 8-bit granularity. This means
704the algorithm is precise even for 16- and 8-bit memory
705accesses.</para>
706
sewardj572feb72007-11-09 23:59:49 +0000707<para>Helgrind correctly handles reader-writer locks in this
sewardjb4112022007-11-09 22:49:28 +0000708framework. Locations shared between multiple threads can be protected
709during reads by locks held in either read-mode or write-mode, but can
710only be protected during writes by locks held in write-mode. Normal
711POSIX mutexes are treated as if they are reader-writer locks which are
712only ever held in write-mode.</para>
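<para>A minimal sketch of this rule, using hypothetical names
(<computeroutput>cfg_value</computeroutput>,
<computeroutput>cfg_lock</computeroutput>): reads take the lock in
read-mode, writes take it in write-mode.</para>

<programlisting><![CDATA[
/* Ask for the POSIX 2008 API; harmless if it is already enabled. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared setting protected by a reader-writer lock. */
static pthread_rwlock_t cfg_lock = PTHREAD_RWLOCK_INITIALIZER;
static int cfg_value = 0;

/* Reads only need the lock in read-mode. */
int cfg_read(void)
{
    int v;
    pthread_rwlock_rdlock(&cfg_lock);
    v = cfg_value;
    pthread_rwlock_unlock(&cfg_lock);
    return v;
}

/* Writes must hold the lock in write-mode. */
void cfg_write(int v)
{
    pthread_rwlock_wrlock(&cfg_lock);
    cfg_value = v;
    pthread_rwlock_unlock(&cfg_lock);
}

static void *writer(void *arg)
{
    cfg_write(*(int *)arg);
    return NULL;
}

/* Writes v from a second thread, then reads it back in the caller. */
int run_rwlock_demo(int v)
{
    pthread_t t;
    pthread_create(&t, NULL, writer, &v);
    pthread_join(t, NULL);
    return cfg_read();
}
]]></programlisting>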
713
sewardj572feb72007-11-09 23:59:49 +0000714<para>Helgrind correctly handles POSIX mutexes for which recursive
sewardjb4112022007-11-09 22:49:28 +0000715locking is allowed.</para>
716
<para>Helgrind only partially handles x86 and amd64 memory access
instructions preceded by a LOCK prefix. Writes are handled correctly,
by pretending that the LOCK prefix implies acquisition and release of
a magic "bus hardware lock" mutex before and after the instruction.
This unfortunately requires subsequent reads from such locations to
also use a LOCK prefix, which is not required by the real hardware.
Helgrind does not offer any equivalent handling for atomic sequences
on PowerPC/POWER platforms created by the use of lwarx/stwcx
instructions.</para>
726
727</sect2>
728
729
730
sewardj572feb72007-11-09 23:59:49 +0000731<sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages">
sewardjb4112022007-11-09 22:49:28 +0000732<title>Interpreting Race Error Messages</title>
733
sewardj572feb72007-11-09 23:59:49 +0000734<para>Helgrind's race detection algorithm collects a lot of
sewardjb4112022007-11-09 22:49:28 +0000735information, and tries to present it in a helpful way when a race is
736detected. Here's an example:</para>
737
<programlisting><![CDATA[
Thread #2 was created
   at 0x510548E: clone (in /lib64/libc-2.5.so)
   by 0x4E2F305: do_clone (in /lib64/libpthread-2.5.so)
   by 0x4E2F7C5: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
   by 0x4C23870: pthread_create@* (hg_intercepts.c:198)
   by 0x400CEF: main (tc17_sembar.c:195)

// And the same for threads #3, #4 and #5 -- omitted for conciseness

Possible data race during read of size 4 at 0x602174
   at 0x400BE5: gomp_barrier_wait (tc17_sembar.c:122)
   by 0x400C44: child (tc17_sembar.c:161)
   by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178)
   by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so)
   by 0x51054CC: clone (in /lib64/libc-2.5.so)
  Old state: shared-modified by threads #2, #3, #4, #5
  New state: shared-modified by threads #2, #3, #4, #5
  Reason: this thread, #2, holds no consistent locks
  Last consistently used lock for 0x602174 was first observed
   at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
   by 0x4009E4: gomp_barrier_init (tc17_sembar.c:46)
   by 0x400CBC: main (tc17_sembar.c:192)
]]></programlisting>
762
sewardj572feb72007-11-09 23:59:49 +0000763<para>Helgrind first announces the creation points of any threads
sewardjb4112022007-11-09 22:49:28 +0000764referenced in the error message. This is so it can speak concisely
765about threads and sets of threads without repeatedly printing their
766creation point call stacks. Each thread is only ever announced once,
sewardj572feb72007-11-09 23:59:49 +0000767the first time it appears in any Helgrind error message.</para>
sewardjb4112022007-11-09 22:49:28 +0000768
769<para>The main error message begins at the text
770"<computeroutput>Possible data race during read</computeroutput>".
771At the start is information you would expect to see -- address and
772size of the racing access, whether a read or a write, and the call
773stack at the point it was detected.</para>
774
775<para>More interesting is the state transition caused by this access.
776This memory is already in the shared-modified state, and up to now has
777been consistently protected by at least one lock. However, the thread
778making the access in question (thread #2, here) does not hold any
779locks in common with those held during all previous accesses to the
780location -- "no consistent locks", in other words.</para>
781
sewardj572feb72007-11-09 23:59:49 +0000782<para>Finally, Helgrind shows the lock which has protected this
sewardjb4112022007-11-09 22:49:28 +0000783location in all previous accesses. (If there is more than one, only
784one is shown). This can be a useful hint, because it typically shows
785the lock that the programmers intended to use to protect the location,
786but in this case forgot.</para>
787
<para>Here are some more examples of race reports. This is not an
exhaustive list of combinations, but should give you some insight into
how to interpret the output.</para>
791
<programlisting><![CDATA[
Possible data race during write ...
  Old state: shared-readonly by threads #1, #2, #3
  New state: shared-modified by threads #1, #2, #3
  Reason: this thread, #3, holds no consistent locks
  Location ... has never been protected by any lock
]]></programlisting>
799
800<para>The location is shared by 3 threads, all of which have been
801reading it without locking ("has never been protected by any lock").
802Now one of them is writing it. Regardless of whether the writer has a
803lock or not, this is still an error, because the write races against
804the previously observed reads.</para>
805
<programlisting><![CDATA[
Possible data race during read ...
  Old state: shared-modified by threads #1, #2, #3
  New state: shared-modified by threads #1, #2, #3
  Reason: this thread, #3, holds no consistent locks
  Last consistently used lock for ... was first observed ...
]]></programlisting>
813
814<para>The location is shared by 3 threads, all of which have been
815reading and writing it while (as required) holding at least one lock
816in common. Now it is being read without that lock being held. In the
sewardj572feb72007-11-09 23:59:49 +0000817"Last consistently used lock" part, Helgrind offers its best guess as
sewardjb4112022007-11-09 22:49:28 +0000818to the identity of the lock that should have been used.</para>
819
<programlisting><![CDATA[
Possible data race during write ...
  Old state: owned exclusively by thread #4
  New state: shared-modified by threads #4, #5
  Reason: this thread, #5, holds no locks at all
]]></programlisting>
826
827<para>A location that has so far been accessed exclusively by thread
828#4 has now been written by thread #5, without use of any lock. This
829can be a sign that the programmer did not consider the possibility of
830the location being shared between threads, or, alternatively, forgot
831to use the appropriate lock.</para>
832
833<para>Note that thread #4 exclusively owns the location, and so has
834the right to access it without holding a lock. However, this message
835does not say that thread #4 is not using a lock for this location.
836Indeed, it could be using a lock for the location because it intends
837to make it available to other threads, one of which is thread #5 --
838and thread #5 has forgotten to use the lock.</para>
839
sewardj572feb72007-11-09 23:59:49 +0000840<para>Also, this message implies that Helgrind did not see any
sewardjb4112022007-11-09 22:49:28 +0000841synchronisation event between threads #4 and #5 that would have
842allowed #5 to acquire exclusive ownership from #4. See
sewardj572feb72007-11-09 23:59:49 +0000843<link linkend="hg-manual.data-races.exclusive">above</link>
sewardjb4112022007-11-09 22:49:28 +0000844for a discussion of transfers of exclusive ownership states between
845threads.</para>
846
847</sect2>
848
849
850</sect1>
851
sewardj572feb72007-11-09 23:59:49 +0000852<sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use">
853<title>Hints and Tips for Effective Use of Helgrind</title>
sewardjb4112022007-11-09 22:49:28 +0000854
sewardj572feb72007-11-09 23:59:49 +0000855<para>Helgrind can be very helpful in finding and resolving
sewardjb4112022007-11-09 22:49:28 +0000856threading-related problems. Like all sophisticated tools, it is most
857effective when you understand how to play to its strengths.</para>
858
sewardj572feb72007-11-09 23:59:49 +0000859<para>Helgrind will be less effective when you merely throw an
sewardjb4112022007-11-09 22:49:28 +0000860existing threaded program at it and try to make sense of any reported
861errors. It will be more effective if you design threaded programs
sewardj572feb72007-11-09 23:59:49 +0000862from the start in a way that helps Helgrind verify correctness. The
sewardjb4112022007-11-09 22:49:28 +0000863same is true for finding memory errors with Memcheck, but applies more
864here, because thread checking is a harder problem. Consequently it is
sewardj572feb72007-11-09 23:59:49 +0000865much easier to write a correct program for which Helgrind falsely
sewardjb4112022007-11-09 22:49:28 +0000866reports (threading) errors than it is to write a correct program for
867which Memcheck falsely reports (memory) errors.</para>
868
869<para>With that in mind, here are some tips, listed most important first,
870for getting reliable results and avoiding false errors. The first two
871are critical. Any violations of them will swamp you with huge numbers
872of false data-race errors.</para>
873
874
875<orderedlist>
876
877 <listitem>
    <para>Make sure your application, and all the libraries it uses,
    use the POSIX threading primitives. Helgrind needs to be able to
    see all events pertaining to thread creation, exit, locking and
    other synchronisation events. To do so it intercepts many POSIX
    pthread_ functions.</para>
883
884 <para>Do not roll your own threading primitives (mutexes, etc)
885 from combinations of the Linux futex syscall, counters and wotnot.
sewardj572feb72007-11-09 23:59:49 +0000886 These throw Helgrind's internal what's-going-on models way off
sewardjb4112022007-11-09 22:49:28 +0000887 course and will give bogus results.</para>
888
889 <para>Also, do not reimplement existing POSIX abstractions using
890 other POSIX abstractions. For example, don't build your own
891 semaphore routines or reader-writer locks from POSIX mutexes and
892 condition variables. Instead use POSIX reader-writer locks and
sewardj572feb72007-11-09 23:59:49 +0000893 semaphores directly, since Helgrind supports them directly.</para>
sewardjb4112022007-11-09 22:49:28 +0000894
sewardj572feb72007-11-09 23:59:49 +0000895 <para>Helgrind directly supports the following POSIX threading
sewardjb4112022007-11-09 22:49:28 +0000896 abstractions: mutexes, reader-writer locks, condition variables
897 (but see below), and semaphores. Currently spinlocks and barriers
898 are not supported, although they could be in future. A prototype
899 "safe" implementation of barriers, based on semaphores, is
900 available: please contact the Valgrind authors for details.</para>
901
902 <para>At the time of writing, the following popular Linux packages
903 are known to implement their own threading primitives:</para>
904
905 <itemizedlist>
906 <listitem><para>Qt version 4.X. Qt 3.X is fine, but not 4.X.
sewardj572feb72007-11-09 23:59:49 +0000907 Helgrind contains partial direct support for Qt 4.X threading,
sewardjb4112022007-11-09 22:49:28 +0000908 but this is not yet in a usable state. Assistance from folks
909 knowledgeable in Qt 4 threading internals would be
910 appreciated.</para></listitem>
911
912 <listitem><para>Runtime support library for GNU OpenMP (part of
913 GCC), at least GCC versions 4.2 and 4.3. With some minor effort
914 of modifying the GNU OpenMP runtime support sources, it is
sewardj572feb72007-11-09 23:59:49 +0000915 possible to use Helgrind on GNU OpenMP compiled codes. Please
sewardjb4112022007-11-09 22:49:28 +0000916 contact the Valgrind authors for details.</para></listitem>
917 </itemizedlist>
918 </listitem>
919
920 <listitem>
    <para>Avoid memory recycling. If you can't avoid it, you must
    tell Helgrind what is going on via the VALGRIND_HG_CLEAN_MEMORY
    client request
    (in <computeroutput>helgrind.h</computeroutput>).</para>
sewardjb4112022007-11-09 22:49:28 +0000925
sewardj572feb72007-11-09 23:59:49 +0000926 <para>Helgrind is aware of standard memory allocation and
sewardjb4112022007-11-09 22:49:28 +0000927 deallocation that occurs via malloc/free/new/delete and from entry
928 and exit of stack frames. In particular, when memory is
sewardj572feb72007-11-09 23:59:49 +0000929 deallocated via free, delete, or function exit, Helgrind considers
sewardjb4112022007-11-09 22:49:28 +0000930 that memory clean, so when it is eventually reallocated, its
931 history is irrelevant.</para>
932
    <para>However, it is common practice to implement memory recycling
    schemes. In these, memory to be freed is not handed to
    free/delete, but instead put into a pool of free buffers to be
    handed out again as required. The problem is that Helgrind has no
    way to know that such memory is logically no longer in use, and
    its history is irrelevant. Hence you must make that explicit,
    using the VALGRIND_HG_CLEAN_MEMORY client request to specify the
    relevant address ranges. It's easiest to put these requests into
    the pool manager code, and use them either when memory is returned
    to the pool, or is allocated from it.</para>
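    <para>A minimal sketch of such a pool manager follows. The pool
    itself (<computeroutput>pool_get</computeroutput>,
    <computeroutput>pool_put</computeroutput>,
    <computeroutput>BUF_SIZE</computeroutput>) is hypothetical. The
    sketch assumes the header is installed as
    <computeroutput>valgrind/helgrind.h</computeroutput> and that the
    request takes a start address and a length. Where the header is
    not available, it falls back to a no-op so that the code still
    builds.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: the Valgrind headers live under <valgrind/...>. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef VALGRIND_HG_CLEAN_MEMORY
/* Fallback so the sketch builds where the Valgrind headers are absent. */
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) ((void)(start), (void)(len))
#endif

#define BUF_SIZE 256

/* Hypothetical pool: a singly linked free list guarded by pool_mx. */
struct free_buf { struct free_buf *next; };
static struct free_buf *free_list = NULL;
static pthread_mutex_t pool_mx = PTHREAD_MUTEX_INITIALIZER;

/* Hands out a recycled buffer if one exists, else a fresh one. */
void *pool_get(void)
{
    struct free_buf *b;

    pthread_mutex_lock(&pool_mx);
    b = free_list;
    if (b != NULL)
        free_list = b->next;
    pthread_mutex_unlock(&pool_mx);

    if (b == NULL)
        return malloc(BUF_SIZE);

    /* Recycled buffer: tell Helgrind its access history no longer applies. */
    VALGRIND_HG_CLEAN_MEMORY(b, BUF_SIZE);
    return b;
}

/* Returns a buffer to the pool instead of calling free(). */
void pool_put(void *buf)
{
    struct free_buf *b = buf;

    pthread_mutex_lock(&pool_mx);
    b->next = free_list;
    free_list = b;
    pthread_mutex_unlock(&pool_mx);
}
]]></programlisting>

    <para>Here the request is issued when a buffer is allocated from
    the pool. Issuing it when the buffer is returned would serve the
    same purpose.</para>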
943 </listitem>
944
945 <listitem>
946 <para>Avoid POSIX condition variables. If you can, use POSIX
947 semaphores (sem_t, sem_post, sem_wait) to do inter-thread event
948 signalling. Semaphores with an initial value of zero are
949 particularly useful for this.</para>
950
sewardj572feb72007-11-09 23:59:49 +0000951 <para>Helgrind only partially correctly handles POSIX condition
952 variables. This is because Helgrind can see inter-thread
sewardjb4112022007-11-09 22:49:28 +0000953 dependencies between a pthread_cond_wait call and a
954 pthread_cond_signal/broadcast call only if the waiting thread
955 actually gets to the rendezvous first (so that it actually calls
956 pthread_cond_wait). It can't see dependencies between the threads
957 if the signaller arrives first. In the latter case, POSIX
958 guidelines imply that the associated boolean condition still
959 provides an inter-thread synchronisation event, but one which is
sewardj572feb72007-11-09 23:59:49 +0000960 invisible to Helgrind.</para>
sewardjb4112022007-11-09 22:49:28 +0000961
sewardj572feb72007-11-09 23:59:49 +0000962 <para>The result of Helgrind missing some inter-thread
sewardjb4112022007-11-09 22:49:28 +0000963 synchronisation events is to cause it to report false positives.
964 That's because missing such events reduces the extent to which it
965 can transfer exclusive memory ownership between threads. So
966 memory may end up in a shared-modified state when that was not
967 intended by the application programmers.</para>
968
969 <para>The root cause of this synchronisation lossage is
970 particularly hard to understand, so an example is helpful. It was
971 discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
972 in Multi-Threaded Programs", Dissertation, TU Graz, Austria). The
973 canonical POSIX-recommended usage scheme for condition variables
974 is as follows:</para>
975
<programlisting><![CDATA[
b is a Boolean condition, which is False most of the time
cv is a condition variable
mx is its associated mutex

Signaller:                             Waiter:

lock(mx)                               lock(mx)
b = True                               while (b == False)
signal(cv)                                wait(cv,mx)
unlock(mx)                             unlock(mx)
]]></programlisting>
988
989 <para>Assume <computeroutput>b</computeroutput> is False most of
990 the time. If the waiter arrives at the rendezvous first, it
991 enters its while-loop, waits for the signaller to signal, and
sewardj572feb72007-11-09 23:59:49 +0000992 eventually proceeds. Helgrind sees the signal, notes the
sewardjb4112022007-11-09 22:49:28 +0000993 dependency, and all is well.</para>
994
995 <para>If the signaller arrives
996 first, <computeroutput>b</computeroutput> is set to true, and the
997 signal disappears into nowhere. When the waiter later arrives, it
998 does not enter its while-loop and simply carries on. But even in
999 this case, the waiter code following the while-loop cannot execute
1000 until the signaller sets <computeroutput>b</computeroutput> to
1001 True. Hence there is still the same inter-thread dependency, but
1002 this time it is through an arbitrary in-memory condition, and
sewardj572feb72007-11-09 23:59:49 +00001003 Helgrind cannot see it.</para>
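    <para>Rendered in C, the scheme above might look roughly like
    this. It is a sketch only:
    <computeroutput>run_condvar_demo</computeroutput> is a
    hypothetical driver, and which thread reaches the rendezvous first
    depends on scheduling, so either of the two cases described above
    may occur on a given run.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool b = false;            /* the Boolean condition */

static void *signaller(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&mx);
    b = true;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mx);
    return NULL;
}

static void waiter(void)
{
    pthread_mutex_lock(&mx);
    /* If the signaller got here first, b is already true and the
       loop body, and hence pthread_cond_wait, never runs. */
    while (!b)
        pthread_cond_wait(&cv, &mx);
    pthread_mutex_unlock(&mx);
}

/* Runs one signaller against one waiter; returns the final value of b. */
bool run_condvar_demo(void)
{
    pthread_t t;
    bool seen;

    pthread_mutex_lock(&mx);
    b = false;
    pthread_mutex_unlock(&mx);

    pthread_create(&t, NULL, signaller, NULL);
    waiter();
    pthread_join(t, NULL);

    pthread_mutex_lock(&mx);
    seen = b;
    pthread_mutex_unlock(&mx);
    return seen;
}
]]></programlisting>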
sewardjb4112022007-11-09 22:49:28 +00001004
sewardj572feb72007-11-09 23:59:49 +00001005 <para>By comparison, Helgrind's detection of inter-thread
sewardjb4112022007-11-09 22:49:28 +00001006 dependencies caused by semaphore operations is believed to be
1007 exactly correct.</para>
1008
1009 <para>As far as I know, a solution to this problem that does not
1010 require source-level annotation of condition-variable wait loops
1011 is beyond the current state of the art.</para>
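    <para>For comparison, here is a minimal sketch of the same kind of
    event signalling done with a POSIX semaphore whose initial value
    is zero. The names <computeroutput>ready</computeroutput>,
    <computeroutput>payload</computeroutput> and
    <computeroutput>run_sem_handoff</computeroutput> are hypothetical.
    The sketch assumes unnamed semaphores
    (<computeroutput>sem_init</computeroutput>) are available, as they
    are on Linux.</para>

<programlisting><![CDATA[
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t ready;     /* initial value 0: nothing has been posted yet */
static int payload;     /* handed from producer to consumer, unlocked */

static void *producer(void *arg)
{
    payload = *(int *)arg;   /* written before the post */
    sem_post(&ready);
    return NULL;
}

/* Returns the value the consumer saw, or -1 if sem_init failed. */
int run_sem_handoff(int value)
{
    pthread_t t;
    int seen;

    if (sem_init(&ready, 0, 0) != 0)
        return -1;

    pthread_create(&t, NULL, producer, &value);

    /* Retry if the wait is interrupted by a signal handler. */
    while (sem_wait(&ready) == -1 && errno == EINTR)
        continue;

    seen = payload;          /* read only after the wait completes */
    pthread_join(t, NULL);
    sem_destroy(&ready);
    return seen;
}
]]></programlisting>

    <para>Whichever thread arrives first, the consumer's
    <computeroutput>sem_wait</computeroutput> completes only as a
    result of the producer's
    <computeroutput>sem_post</computeroutput>, so the dependency is
    always expressed through a call that Helgrind intercepts.</para>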
1012 </listitem>
1013
1014 <listitem>
1015 <para>Make sure you are using a supported Linux distribution. At
sewardj572feb72007-11-09 23:59:49 +00001016 present, Helgrind only properly supports x86-linux and amd64-linux
sewardjb4112022007-11-09 22:49:28 +00001017 with glibc-2.3 or later. The latter restriction means we only
1018 support glibc's NPTL threading implementation. The old
1019 LinuxThreads implementation is not supported.</para>
1020
    <para>Unsupported targets may work to varying degrees. In
    particular ppc32-linux and ppc64-linux running NPTL should work,
    but you will get false race errors because Helgrind does not know
    how to properly handle atomic instruction sequences created using
    the lwarx/stwcx instructions.</para>
1026 </listitem>
1027
1028 <listitem>
1029 <para>Round up all finished threads using pthread_join. Avoid
1030 detaching threads: don't create threads in the detached state, and
1031 don't call pthread_detach on existing threads.</para>
1032
1033 <para>Using pthread_join to round up finished threads provides a
sewardj572feb72007-11-09 23:59:49 +00001034 clear synchronisation point that both Helgrind and programmers can
1035 see. This synchronisation point allows Helgrind to adjust its
sewardjb4112022007-11-09 22:49:28 +00001036 memory ownership
sewardj572feb72007-11-09 23:59:49 +00001037 models <link linkend="hg-manual.data-races.exclusive">as described
1038 extensively above</link>, which helps Helgrind produce more
sewardjb4112022007-11-09 22:49:28 +00001039 accurate error reports.</para>
1040
sewardj572feb72007-11-09 23:59:49 +00001041 <para>If you don't call pthread_join on a thread, Helgrind has no
sewardjb4112022007-11-09 22:49:28 +00001042 way to know when it finishes, relative to any significant
1043 synchronisation points for other threads in the program. So it
1044 assumes that the thread lingers indefinitely and can potentially
1045 interfere indefinitely with the memory state of the program. It
1046 has every right to assume that -- after all, it might really be
1047 the case that, for scheduling reasons, the exiting thread did run
1048 very slowly in the last stages of its life.</para>
1049 </listitem>
1050
1051 <listitem>
sewardj572feb72007-11-09 23:59:49 +00001052 <para>Perform thread debugging (with Helgrind) and memory
sewardjb4112022007-11-09 22:49:28 +00001053 debugging (with Memcheck) together.</para>
1054
sewardj572feb72007-11-09 23:59:49 +00001055 <para>Helgrind tracks the state of memory in detail, and memory
sewardjb4112022007-11-09 22:49:28 +00001056 management bugs in the application are liable to cause confusion.
1057 In extreme cases, applications which do many invalid reads and
1058 writes (particularly to freed memory) have been known to crash
sewardj572feb72007-11-09 23:59:49 +00001059 Helgrind. So, ideally, you should make your application
1060 Memcheck-clean before using Helgrind.</para>
sewardjb4112022007-11-09 22:49:28 +00001061
1062 <para>It may be impossible to make your application Memcheck-clean
1063 unless you first remove threading bugs. In particular, it may be
1064 difficult to remove all reads and writes to freed memory in
1065 multithreaded C++ destructor sequences at program termination.
sewardj572feb72007-11-09 23:59:49 +00001066 So, ideally, you should make your application Helgrind-clean
sewardjb4112022007-11-09 22:49:28 +00001067 before using Memcheck.</para>
1068
1069 <para>Since this circularity is obviously unresolvable, at least
sewardj572feb72007-11-09 23:59:49 +00001070 bear in mind that Memcheck and Helgrind are to some extent
sewardjb4112022007-11-09 22:49:28 +00001071 complementary, and you may need to use them together.</para>
1072 </listitem>
1073
1074 <listitem>
1075 <para>POSIX requires that implementations of standard I/O (printf,
1076 fprintf, fwrite, fread, etc) are thread safe. Unfortunately GNU
1077 libc implements this by using internal locking primitives that
sewardj572feb72007-11-09 23:59:49 +00001078 Helgrind is unable to intercept. Consequently Helgrind generates
sewardjb4112022007-11-09 22:49:28 +00001079 many false race reports when you use these functions.</para>
1080
sewardj572feb72007-11-09 23:59:49 +00001081 <para>Helgrind attempts to hide these errors using the standard
sewardjb4112022007-11-09 22:49:28 +00001082 Valgrind error-suppression mechanism. So, at least for simple
1083 test cases, you don't see any. Nevertheless, some may slip
1084 through. Just something to be aware of.</para>
1085 </listitem>
1086
1087 <listitem>
sewardj572feb72007-11-09 23:59:49 +00001088 <para>Helgrind's error checks do not work properly inside the
sewardjb4112022007-11-09 22:49:28 +00001089 system threading library itself
1090 (<computeroutput>libpthread.so</computeroutput>), and it usually
1091 observes large numbers of (false) errors in there. Valgrind's
1092 suppression system then filters these out, so you should not see
1093 them.</para>
1094
1095 <para>If you see any race errors reported
1096 where <computeroutput>libpthread.so</computeroutput> or
1097 <computeroutput>ld.so</computeroutput> is the object associated
1098 with the innermost stack frame, please file a bug report at
1099 http://www.valgrind.org.</para>
1100 </listitem>
1101
1102</orderedlist>
1103
1104</sect1>
1105
1106
1107
1108
sewardj572feb72007-11-09 23:59:49 +00001109<sect1 id="hg-manual.options" xreflabel="Helgrind Options">
1110<title>Helgrind Options</title>
sewardjb4112022007-11-09 22:49:28 +00001111
1112<para>The following end-user options are available:</para>
1113
1114<!-- start of xi:include in the manpage -->
sewardj572feb72007-11-09 23:59:49 +00001115<variablelist id="hg.opts.list">
sewardjb4112022007-11-09 22:49:28 +00001116
1117 <varlistentry id="opt.happens-before" xreflabel="--happens-before">
1118 <term>
1119 <option><![CDATA[--happens-before=none|threads|all
1120 [default: all] ]]></option>
1121 </term>
1122 <listitem>
sewardj572feb72007-11-09 23:59:49 +00001123 <para>Helgrind always regards locks as the basis for
sewardjb4112022007-11-09 22:49:28 +00001124 inter-thread synchronisation. However, by default, before
sewardj572feb72007-11-09 23:59:49 +00001125 reporting a race error, Helgrind will also check whether
sewardjb4112022007-11-09 22:49:28 +00001126 certain other kinds of inter-thread synchronisation events
1127 happened. It may be that if such events took place, then no
1128 race really occurred, and so no error needs to be reported.
sewardj572feb72007-11-09 23:59:49 +00001129 See <link linkend="hg-manual.data-races.exclusive">above</link>
sewardjb4112022007-11-09 22:49:28 +00001130 for a discussion of transfers of exclusive ownership states
1131 between threads.
1132 </para>
1133 <para>With <varname>--happens-before=all</varname>, the
1134 following events are regarded as sources of synchronisation:
1135 thread creation/joinage, condition variable
1136 signal/broadcast/waits, and semaphore posts/waits.
1137 </para>
1138 <para>With <varname>--happens-before=threads</varname>, only
1139 thread creation/joinage events are regarded as sources of
1140 synchronisation.
1141 </para>
1142 <para>With <varname>--happens-before=none</varname>, no events
1143 (apart, of course, from locking) are regarded as sources of
1144 synchronisation.
1145 </para>
1146 <para>Changing this setting from the default will increase your
1147 false-error rate but give little or no gain. The only advantage
1148 is that <option>--happens-before=threads</option> and
sewardj572feb72007-11-09 23:59:49 +00001149 <option>--happens-before=none</option> should make Helgrind
sewardjb4112022007-11-09 22:49:28 +00001150 less and less sensitive to the scheduling of threads, and hence
1151 the output more and more repeatable across runs.
1152 </para>
1153 </listitem>
1154 </varlistentry>
1155
1156 <varlistentry id="opt.trace-addr" xreflabel="--trace-addr">
1157 <term>
1158 <option><![CDATA[--trace-addr=0xXXYYZZ
1159 ]]></option> and
1160 <option><![CDATA[--trace-level=0|1|2 [default: 1]
1161 ]]></option>
1162 </term>
1163 <listitem>
      <para>Requests that Helgrind produce a log of all state changes
      to location 0xXXYYZZ. This can be helpful in tracking down
      tricky races. <varname>--trace-level</varname> controls the
      verbosity of the log. At the default setting (1), a one-line
      summary is printed for each state change. At level 2 a
      complete stack trace is printed for each state change.</para>
1170 </listitem>
1171 </varlistentry>
1172
1173</variablelist>
1174<!-- end of xi:include in the manpage -->
1175
1176<!-- start of xi:include in the manpage -->
1177<para>In addition, the following debugging options are available for
sewardj572feb72007-11-09 23:59:49 +00001178Helgrind:</para>
sewardjb4112022007-11-09 22:49:28 +00001179
sewardj572feb72007-11-09 23:59:49 +00001180<variablelist id="hg.debugopts.list">
sewardjb4112022007-11-09 22:49:28 +00001181
1182 <varlistentry id="opt.trace-malloc" xreflabel="--trace-malloc">
1183 <term>
1184 <option><![CDATA[--trace-malloc=no|yes [no]
1185 ]]></option>
1186 </term>
1187 <listitem>
1188 <para>Show all client malloc (etc) and free (etc) requests.</para>
1189 </listitem>
1190 </varlistentry>
1191
1192 <varlistentry id="opt.gen-vcg" xreflabel="--gen-vcg">
1193 <term>
1194 <option><![CDATA[--gen-vcg=no|yes|yes-w-vts [no]
1195 ]]></option>
1196 </term>
1197 <listitem>
1198 <para>At exit, write to stderr a dump of the happens-before
sewardj572feb72007-11-09 23:59:49 +00001199 graph computed by Helgrind, in a format suitable for the VCG
sewardjb4112022007-11-09 22:49:28 +00001200 graph visualisation tool. A suitable command line is:</para>
sewardj572feb72007-11-09 23:59:49 +00001201 <para><computeroutput>valgrind --tool=helgrind
sewardjb4112022007-11-09 22:49:28 +00001202 --gen-vcg=yes my_app 2&gt;&amp;1
1203 | grep xxxxxx | sed "s/xxxxxx//g"
1204 | xvcg -</computeroutput></para>
1205 <para>With <varname>--gen-vcg=yes</varname>, the basic
1206 happens-before graph is shown. With
1207 <varname>--gen-vcg=yes-w-vts</varname>, the vector timestamp
1208 for each node is also shown.</para>
1209 </listitem>
1210 </varlistentry>
1211
1212 <varlistentry id="opt.cmp-race-err-addrs"
1213 xreflabel="--cmp-race-err-addrs">
1214 <term>
1215 <option><![CDATA[--cmp-race-err-addrs=no|yes [no]
1216 ]]></option>
1217 </term>
1218 <listitem>
      <para>Controls whether or not race (data) addresses should be
      taken into account when removing duplicates of race errors.
      With <varname>--cmp-race-err-addrs=no</varname>, two otherwise
      identical race errors will be considered to be the same even if
      their race addresses differ. With
      <varname>--cmp-race-err-addrs=yes</varname> they will be
      considered different. This is provided to help make certain
      regression tests work reliably.</para>
1227 </listitem>
1228 </varlistentry>
1229
1230 <varlistentry id="opt.tc-sanity-flags" xreflabel="--tc-sanity-flags">
1231 <term>
1232 <option><![CDATA[--tc-sanity-flags=<XXXXX> (X = 0|1) [00000]
1233 ]]></option>
1234 </term>
1235 <listitem>
1236 <para>Run extensive sanity checks on Helgrind's internal
1237 data structures at events defined by the bitstring, as
1238 follows:</para>
1239 <para><computeroutput>10000 </computeroutput>after changes to
1240 the lock order acquisition graph</para>
1241 <para><computeroutput>01000 </computeroutput>after every client
1242 memory access (NB: not currently used)</para>
1243 <para><computeroutput>00100 </computeroutput>after every client
1244 memory range permission setting of 256 bytes or greater</para>
1245 <para><computeroutput>00010 </computeroutput>after every client
1246 lock or unlock event</para>
1247 <para><computeroutput>00001 </computeroutput>after every client
1248 thread creation or joinage event</para>
1249 <para>Note that these checks will make Helgrind run very slowly,
1250 often to the point of being completely unusable.</para>
1251 </listitem>
1252 </varlistentry>
1253
1254</variablelist>
1255<!-- end of xi:include in the manpage -->
1256
1257
1258</sect1>
1259
1260<sect1 id="hg-manual.todolist" xreflabel="To Do List">
1261<title>A To-Do List for Helgrind</title>
1262
1263<para>The following is a list of loose ends which should be tidied up
1264some time.</para>
1265
1266<itemizedlist>
1267 <listitem><para>Track which mutexes are associated with which
1268 condition variables, and emit a warning if this becomes
1269 inconsistent.</para>
1270 </listitem>
1271 <listitem><para>For lock order errors, print the complete lock
1272 cycle, rather than only doing so for size-2 cycles as at
1273 present.</para>
1274 </listitem>
1275 <listitem><para>Document the VALGRIND_HG_CLEAN_MEMORY client
1276 request.</para>
1277 </listitem>
1278 <listitem><para>Possibly add a client request to forcibly transfer
1279 ownership of memory from one thread to another. This requires further
1280 consideration.</para>
1281 </listitem>
1282 <listitem><para>Add a new client request that marks an address range
1283 as being "shared-modified with empty lockset" (the error state),
1284 and describe how to use it.</para>
1285 </listitem>
1286 <listitem><para>Document races caused by gcc's thread-unsafe code
1287 generation for speculative stores. In the interim see
1288 <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
1289 </computeroutput>
1290 and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
1291 </para>
1292 </listitem>
1293 <listitem><para>Don't update the lock-order graph, and don't check
1294 for errors, when a "try"-style lock operation happens (e.g.
1295 pthread_mutex_trylock). Such calls do not add any real
1296 restrictions to the locking order, since they can always fail to
1297 acquire the lock, resulting in the caller going off and doing Plan
1298 B (presumably it will have a Plan B). Doing such checks could
1299 generate false lock-order errors and confuse users.</para>
1300 </listitem>
1301 <listitem><para>Performance can be very poor. Slowdowns on the
1302 order of 100:1 are not unusual. There is considerable scope for
1303 performance improvements, though.
1304 </para>
1305 </listitem>
1306
1307</itemizedlist>
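<para>The reasoning behind the "try"-style lock item in the list above
can be sketched in C. This is a minimal illustration, not code from
Helgrind: the names <computeroutput>lock_a</computeroutput>,
<computeroutput>lock_b</computeroutput>,
<computeroutput>shared_counter</computeroutput> and
<computeroutput>update_with_fallback</computeroutput> are hypothetical.
The function holds one mutex and then only tries to take a second one.
Because <computeroutput>pthread_mutex_trylock</computeroutput> never
blocks, this thread cannot be the waiting party in a deadlock over the
second lock, so the attempt arguably imposes no real ordering between
the two locks.</para>
<programlisting><![CDATA[
```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical locks and data, for illustration only. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter = 0;

/*
 * Hold lock_a, then *try* to take lock_b.
 * Returns 1 if lock_b was acquired and the counter was updated,
 *         0 if lock_b was busy and the fallback ("Plan B") ran,
 *        -1 on any other error from pthread_mutex_trylock.
 */
int update_with_fallback(void)
{
    int result;
    int rc;

    pthread_mutex_lock(&lock_a);

    rc = pthread_mutex_trylock(&lock_b);   /* never blocks */
    if (rc == 0) {
        /* Plan A: both locks held, do the update. */
        shared_counter += 1;
        pthread_mutex_unlock(&lock_b);
        result = 1;
    } else if (rc == EBUSY) {
        /* Plan B: lock_b is held elsewhere; skip the update
           instead of waiting for it. */
        result = 0;
    } else {
        result = -1;
    }

    pthread_mutex_unlock(&lock_a);
    return result;
}
```
]]></programlisting>
<para>A second thread that takes <computeroutput>lock_b</computeroutput>
and then blocks on <computeroutput>lock_a</computeroutput> cannot
deadlock against this function, since the function always releases
<computeroutput>lock_a</computeroutput> without waiting for
<computeroutput>lock_b</computeroutput>. A checker that records a
"lock_a before lock_b" edge for the try operation could therefore report
a lock-order error that does not correspond to a possible deadlock.</para>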
1308
1309</sect1>
1310
1311</chapter>