 <?xml version="1.0"?> <!-- -*- sgml -*- -->
 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
           "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
 [ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>


 <chapter id="hg-manual" xreflabel="Helgrind: thread error detector">
   <title>Helgrind: a thread error detector</title>

 <para>To use this tool, you must specify
 <computeroutput>--tool=helgrind</computeroutput> on the Valgrind
 command line.</para>


 <sect1 id="hg-manual.overview" xreflabel="Overview">
 <title>Overview</title>

 <para>Helgrind is a Valgrind tool for detecting synchronisation errors
 in C, C++ and Fortran programs that use the POSIX pthreads
 threading primitives.</para>

 <para>The main abstractions in POSIX pthreads are: a set of threads
 sharing a common address space, thread creation, thread joining,
 thread exit, mutexes (locks), condition variables (inter-thread event
 notifications), reader-writer locks, and semaphores.</para>

 <para>Helgrind is aware of all these abstractions and tracks their
 effects as accurately as it can.  Currently it does not correctly
 handle pthread barriers and pthread spinlocks, although it will not
 object if you use them.  On x86 and amd64 platforms, it understands
 and partially handles implicit locking arising from the use of the
 LOCK instruction prefix.
 </para>

 <para>Helgrind can detect three classes of errors, which are discussed
 in detail in the next three sections:</para>

 <orderedlist>
  <listitem>
   <para><link linkend="hg-manual.api-checks">
         Misuses of the POSIX pthreads API.</link></para>
  </listitem>
  <listitem>
   <para><link linkend="hg-manual.lock-orders">
         Potential deadlocks arising from lock
         ordering problems.</link></para>
  </listitem>
  <listitem>
   <para><link linkend="hg-manual.data-races">
         Data races -- accessing memory without adequate locking.
         </link></para>
  </listitem>
 </orderedlist>

 <para>Following those is a section containing
 <link linkend="hg-manual.effective-use">
 hints and tips on how to get the best out of Helgrind.</link>
 </para>

 <para>Then there is a
 <link linkend="hg-manual.options">summary of command-line
 options.</link>
 </para>

 <para>Finally, there is
 <link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind
 could be improved.</link>
 </para>

 </sect1>


 <sect1 id="hg-manual.api-checks" xreflabel="API Checks">
 <title>Detected errors: Misuses of the POSIX pthreads API</title>

 <para>Helgrind intercepts calls to many POSIX pthreads functions, and
 is therefore able to report on various common problems.  Although
 these are unglamourous errors, their presence can lead to undefined
 program behaviour and hard-to-find bugs later in execution.  The
 detected errors are:</para>

 <itemizedlist>
  <listitem><para>unlocking an invalid mutex</para></listitem>
  <listitem><para>unlocking a not-locked mutex</para></listitem>
  <listitem><para>unlocking a mutex held by a different
                  thread</para></listitem>
  <listitem><para>destroying an invalid or a locked mutex</para></listitem>
  <listitem><para>recursively locking a non-recursive mutex</para></listitem>
  <listitem><para>deallocation of memory that contains a
                  locked mutex</para></listitem>
  <listitem><para>passing mutex arguments to functions expecting
                  reader-writer lock arguments, and vice
                  versa</para></listitem>
  <listitem><para>when a POSIX pthread function fails with an
                  error code that must be handled</para></listitem>
  <listitem><para>when a thread exits whilst still holding locked
                  locks</para></listitem>
  <listitem><para>calling <computeroutput>pthread_cond_wait</computeroutput>
                  with a not-locked mutex, or one locked by a different
                  thread</para></listitem>
 </itemizedlist>
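
 <para>Some of these misuses can also be caught by the threading
 library itself, which is a useful cross-check.  The following sketch
 relies only on standard POSIX behaviour, not on Helgrind: a mutex of
 type <computeroutput>PTHREAD_MUTEX_ERRORCHECK</computeroutput> makes
 <computeroutput>pthread_mutex_unlock</computeroutput> return
 <computeroutput>EPERM</computeroutput> when the caller does not hold
 the lock, and makes a recursive
 <computeroutput>pthread_mutex_lock</computeroutput> return
 <computeroutput>EDEADLK</computeroutput> instead of hanging.  The
 helper names <computeroutput>init_errorcheck</computeroutput>,
 <computeroutput>unlock_not_locked_result</computeroutput> and
 <computeroutput>relock_result</computeroutput> are hypothetical, and
 error checking of the setup calls is omitted for brevity.</para>

 <programlisting><![CDATA[
 #define _XOPEN_SOURCE 700
 #include <errno.h>
 #include <pthread.h>

 /* Hypothetical helper: initialise an error-checking mutex. */
 static void init_errorcheck ( pthread_mutex_t* mx ) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(mx, &attr);
    pthread_mutexattr_destroy(&attr);
 }

 /* Unlock a mutex that nobody has locked; POSIX requires EPERM. */
 int unlock_not_locked_result ( void ) {
    pthread_mutex_t mx;
    init_errorcheck(&mx);
    int rc = pthread_mutex_unlock(&mx);
    pthread_mutex_destroy(&mx);
    return rc;
 }

 /* Lock the same non-recursive mutex twice; POSIX requires EDEADLK. */
 int relock_result ( void ) {
    pthread_mutex_t mx;
    init_errorcheck(&mx);
    pthread_mutex_lock(&mx);
    int rc = pthread_mutex_lock(&mx);   /* second lock fails, does not block */
    pthread_mutex_unlock(&mx);
    pthread_mutex_destroy(&mx);
    return rc;
 }
 ]]></programlisting>

 <para>A program that checks these return codes sees the mistake at the
 point it happens, whereas with the default mutex type the same calls
 have undefined behaviour.</para>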

 <para>Checks pertaining to the validity of mutexes are generally also
 performed for reader-writer locks.</para>

 <para>Various kinds of this-can't-possibly-happen events are also
 reported.  These usually indicate bugs in the system threading
 library.</para>

 <para>Reported errors always contain a primary stack trace indicating
 where the error was detected.  They may also contain auxiliary stack
 traces giving additional information.  In particular, most errors
 relating to mutexes will also tell you where that mutex first came to
 Helgrind's attention (the "<computeroutput>was first observed
 at</computeroutput>" part), so you have a chance of figuring out which
 mutex it is referring to.  For example:</para>

 <programlisting><![CDATA[
 Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
    at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
    by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
    by 0x40079B: main (tc09_bad_unlock.c:50)
   Lock at 0x7FEFFFA90 was first observed
    at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
    by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
    by 0x40079B: main (tc09_bad_unlock.c:50)
 ]]></programlisting>

 <para>Helgrind has a way of summarising thread identities, as
 evidenced here by the text "<computeroutput>Thread
 #1</computeroutput>".  This is so that it can speak about threads and
 sets of threads without overwhelming you with details.  See
 <link linkend="hg-manual.data-races.errmsgs">below</link>
 for more information on interpreting error messages.</para>

 </sect1>


 <sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders">
 <title>Detected errors: Inconsistent Lock Orderings</title>

 <para>In this section, and in general, to "acquire" a lock simply
 means to lock that lock, and to "release" a lock means to unlock
 it.</para>

 <para>Helgrind monitors the order in which threads acquire locks.
 This allows it to detect potential deadlocks which could arise from
 the formation of cycles of locks.  Detecting such inconsistencies is
 useful because, whilst actual deadlocks are fairly obvious, potential
 deadlocks may never be discovered during testing and could later lead
 to hard-to-diagnose in-service failures.</para>

 <para>The simplest example of such a problem is as
 follows.</para>

 <itemizedlist>
  <listitem><para>Imagine some shared resource R, which, for whatever
   reason, is guarded by two locks, L1 and L2, which must both be held
   when R is accessed.</para>
  </listitem>
  <listitem><para>Suppose a thread acquires L1, then L2, and proceeds
   to access R.  The implication of this is that all threads in the
   program must acquire the two locks in the order first L1 then L2.
   Not doing so risks deadlock.</para>
  </listitem>
  <listitem><para>The deadlock could happen if two threads -- call them
   T1 and T2 -- both want to access R.  Suppose T1 acquires L1 first,
   and T2 acquires L2 first.  Then T1 tries to acquire L2, and T2 tries
   to acquire L1, but those locks are both already held.  So T1 and T2
   become deadlocked.</para>
  </listitem>
 </itemizedlist>
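
 <para>A common way to rule out this hazard is to give the locks a
 single global order and have every thread acquire them in that order.
 The sketch below assumes a hypothetical convention of ordering by
 address; <computeroutput>lock_pair</computeroutput>,
 <computeroutput>unlock_pair</computeroutput> and the other names are
 illustrative, not part of pthreads.  Although the two workers name the
 locks in opposite orders, both actually acquire them lowest address
 first, so the cycle described above cannot form.</para>

 <programlisting><![CDATA[
 #include <pthread.h>
 #include <stdint.h>

 static pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER;
 static pthread_mutex_t L2 = PTHREAD_MUTEX_INITIALIZER;
 static long R = 0;                      /* resource guarded by L1 and L2 */

 /* Acquire two distinct locks, lower address first, whatever the
    argument order. */
 static void lock_pair ( pthread_mutex_t* a, pthread_mutex_t* b ) {
    if ((uintptr_t)a > (uintptr_t)b) { pthread_mutex_t* t = a; a = b; b = t; }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
 }

 static void unlock_pair ( pthread_mutex_t* a, pthread_mutex_t* b ) {
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
 }

 static void* worker ( void* arg ) {
    int reversed = *(int*)arg;           /* 0: names L1,L2   1: names L2,L1 */
    for (int i = 0; i < 10000; i++) {
       if (reversed) lock_pair(&L2, &L1); else lock_pair(&L1, &L2);
       R++;                              /* R accessed with both locks held */
       unlock_pair(&L1, &L2);
    }
    return NULL;
 }

 /* Run two workers that name the locks in opposite orders. */
 long run_ordered_workers ( void ) {
    pthread_t t1, t2;
    int fwd = 0, rev = 1;
    R = 0;
    pthread_create(&t1, NULL, worker, &fwd);
    pthread_create(&t2, NULL, worker, &rev);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return R;                            /* 20000: no update was lost */
 }
 ]]></programlisting>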

 <para>Helgrind builds a directed graph indicating the order in which
 locks have been acquired in the past.  When a thread acquires a new
 lock, the graph is updated, and then checked to see if it now contains
 a cycle.  The presence of a cycle indicates a potential deadlock involving
 the locks in the cycle.</para>
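
 <para>The following sketch models this check with an adjacency matrix
 over a small, fixed set of lock numbers.  It is a simplification for
 illustration, not Helgrind's implementation:
 <computeroutput>MAX_LOCKS</computeroutput>, the integer lock IDs and
 <computeroutput>record_acquisition</computeroutput> are assumptions.
 Acquiring lock <computeroutput>acq</computeroutput> while holding lock
 <computeroutput>held</computeroutput> adds the edge held -> acq, and
 that edge closes a cycle exactly when acq can already reach held.</para>

 <programlisting><![CDATA[
 #include <stdbool.h>
 #include <string.h>

 #define MAX_LOCKS 16   /* assumption: lock IDs are 0 .. MAX_LOCKS-1 */

 /* edge[a][b]: lock b has been acquired while lock a was held */
 static bool edge[MAX_LOCKS][MAX_LOCKS];

 /* Depth-first search for a path from 'from' to 'to'. */
 static bool reaches ( int from, int to, bool visited[] ) {
    if (from == to) return true;
    visited[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++)
       if (edge[from][next] && !visited[next] && reaches(next, to, visited))
          return true;
    return false;
 }

 /* Record that 'acq' was locked while 'held' was held.  Returns true if
    the new edge creates a cycle, i.e. a potential deadlock. */
 bool record_acquisition ( int held, int acq ) {
    bool visited[MAX_LOCKS];
    if (held == acq)                /* recursive locking is a separate check */
       return false;
    memset(visited, 0, sizeof visited);
    bool cycle = reaches(acq, held, visited);
    edge[held][acq] = true;
    return cycle;
 }
 ]]></programlisting>

 <para>For example, <computeroutput>record_acquisition(0, 1)</computeroutput>
 followed later by <computeroutput>record_acquisition(1, 0)</computeroutput>
 reports a cycle, which corresponds to the L1/L2 scenario described
 earlier.</para>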

 <para>In simple situations, where the cycle only contains two locks,
 Helgrind will show where the required order was established:</para>

 <programlisting><![CDATA[
 Thread #1: lock order "0x7FEFFFAB0 before 0x7FEFFFA80" violated
    at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
    by 0x40081F: main (tc13_laog1.c:24)
   Required order was established by acquisition of lock at 0x7FEFFFAB0
    at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
    by 0x400748: main (tc13_laog1.c:17)
   followed by a later acquisition of lock at 0x7FEFFFA80
    at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
    by 0x400773: main (tc13_laog1.c:18)
 ]]></programlisting>

 <para>When there are more than two locks in the cycle, the error is
 equally serious.  However, at present Helgrind does not show the locks
 involved, so as to avoid flooding you with information.  That could be
 fixed in future.  For example, here is a report involving a cycle
 of five locks from a naive implementation of the famous Dining
 Philosophers problem
 (see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>).
 In this case Helgrind has detected that all 5 philosophers could
 simultaneously pick up their left fork and then deadlock whilst
 waiting to pick up their right forks.</para>

 <programlisting><![CDATA[
 Thread #6: lock order "0x6010C0 before 0x601160" violated
    at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388)
    by 0x4007C0: dine (tc14_laog_dinphils.c:19)
    by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178)
    by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so)
    by 0x51054CC: clone (in /lib64/libc-2.5.so)
 ]]></programlisting>

 </sect1>


 <sect1 id="hg-manual.data-races" xreflabel="Data Races">
 <title>Detected errors: Data Races</title>

 <para>A data race happens, or could happen, when two threads
 access a shared memory location without using suitable locks to
 ensure single-threaded access.  Such missing locking can cause
 obscure timing-dependent bugs.  Ensuring programs are race-free is
 one of the central difficulties of threaded programming.</para>

 <para>Reliably detecting races is a difficult problem, and most
 of Helgrind's internals are devoted to dealing with it.
 As a consequence this section is somewhat long and involved.
 We begin with a simple example.</para>


 <sect2 id="hg-manual.data-races.example" xreflabel="Simple Race">
 <title>A Simple Data Race</title>

 <para>About the simplest possible example of a race is as follows.  In
 this program, it is impossible to know what the value
 of <computeroutput>var</computeroutput> is at the end of the program.
 Is it 2?  Or 1?</para>

 <programlisting><![CDATA[
 #include <pthread.h>

 int var = 0;

 void* child_fn ( void* arg ) {
    var++; /* Unprotected relative to parent */ /* this is line 6 */
    return NULL;
 }

 int main ( void ) {
    pthread_t child;
    pthread_create(&child, NULL, child_fn, NULL);
    var++; /* Unprotected relative to child */ /* this is line 13 */
    pthread_join(child, NULL);
    return 0;
 }
 ]]></programlisting>

 <para>The problem is there is nothing to
 stop <computeroutput>var</computeroutput> being updated simultaneously
 by both threads.  A correct program would
 protect <computeroutput>var</computeroutput> with a lock of type
 <computeroutput>pthread_mutex_t</computeroutput>, which is acquired
 before each access and released afterwards.  Helgrind's output for
 this program is:</para>

 <programlisting><![CDATA[
 Thread #1 is the program's root thread

 Thread #2 was created
    at 0x510548E: clone (in /lib64/libc-2.5.so)
    by 0x4E2F305: do_clone (in /lib64/libpthread-2.5.so)
    by 0x4E2F7C5: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
    by 0x4C23870: pthread_create@* (hg_intercepts.c:198)
    by 0x4005F1: main (simple_race.c:12)

 Possible data race during write of size 4 at 0x601034
    at 0x4005F2: main (simple_race.c:13)
   Old state: shared-readonly by threads #1, #2
   New state: shared-modified by threads #1, #2
   Reason:    this thread, #1, holds no consistent locks
   Location 0x601034 has never been protected by any lock
 ]]></programlisting>

 <para>This is quite a lot of detail for an apparently simple error.
 The last clause is the main error message.  It says there is a race as
 a result of a write of size 4 (bytes), at 0x601034, which is
 presumably the address of <computeroutput>var</computeroutput>,
 happening in function <computeroutput>main</computeroutput> at line 13
 in the program.</para>

 <para>Note that it is purely by chance that the race is
 reported for the parent thread's access.  It could equally have been
 reported instead for the child's access, at line 6.  The error will
 only be reported for one of the locations, since neither the parent
 nor child is, by itself, incorrect.  It is only when both access
 <computeroutput>var</computeroutput> without a lock that an error
 exists.</para>
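
 <para>For comparison, here is a sketch of the corrected program
 described above, in which each access to
 <computeroutput>var</computeroutput> happens while holding a
 <computeroutput>pthread_mutex_t</computeroutput>.  The lock name
 <computeroutput>var_lock</computeroutput> is an illustrative choice,
 and <computeroutput>main</computeroutput> has been renamed
 <computeroutput>run_fixed</computeroutput> so that it can be called as
 an ordinary function.  Whichever thread runs first, the result is
 2.</para>

 <programlisting><![CDATA[
 #include <pthread.h>

 int var = 0;
 pthread_mutex_t var_lock = PTHREAD_MUTEX_INITIALIZER;  /* guards var */

 void* child_fn ( void* arg ) {
    (void)arg;
    pthread_mutex_lock(&var_lock);
    var++;                          /* protected relative to parent */
    pthread_mutex_unlock(&var_lock);
    return NULL;
 }

 /* Stands in for main() in the original example. */
 int run_fixed ( void ) {
    pthread_t child;
    pthread_create(&child, NULL, child_fn, NULL);
    pthread_mutex_lock(&var_lock);
    var++;                          /* protected relative to child */
    pthread_mutex_unlock(&var_lock);
    pthread_join(child, NULL);
    return var;
 }
 ]]></programlisting>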

 <para>The error message shows some other interesting details.  The
 sections below explain them.  Here we merely note their presence:</para>

 <itemizedlist>
  <listitem><para>Helgrind maintains some kind of state machine for the
   memory location in question, hence the "<computeroutput>Old
   state:</computeroutput>" and "<computeroutput>New
   state:</computeroutput>" lines.</para>
  </listitem>
  <listitem><para>Helgrind keeps track of which threads have accessed
   the location: "<computeroutput>threads #1, #2</computeroutput>".
   Before printing the main error message, it prints the creation
   points of these two threads, so you can see which threads it is
   referring to.</para>
  </listitem>
  <listitem><para>Helgrind tries to provide an explanation of why the
   race exists: "<computeroutput>Location 0x601034 has never been
   protected by any lock</computeroutput>".</para>
  </listitem>
 </itemizedlist>

 <para>Understanding the memory state machine is central to
 understanding Helgrind's race-detection algorithm.  The next three
 subsections explain this.</para>

 </sect2>


 <sect2 id="hg-manual.data-races.memstates" xreflabel="Memory States">
 <title>Helgrind's Memory State Machine</title>

 <para>Helgrind tracks the state of every byte of memory used by your
 program.  There are a number of states, but only three are
 interesting:</para>

 <itemizedlist>
  <listitem><para>Exclusive: memory in this state is regarded as owned
   exclusively by one particular thread.  That thread may read and
   write it without a lock.  Even in highly threaded programs, the
   majority of locations never leave the Exclusive state, since most
   data is thread-private.</para>
  </listitem>
  <listitem><para>Shared-Readonly: memory in this state is regarded as
   shared by multiple threads.  In this state, any thread may read the
   memory without a lock, reflecting the fact that readonly data may
   safely be shared between threads without locking.</para>
  </listitem>
  <listitem><para>Shared-Modified: memory in this state is regarded as
   shared by multiple threads, at least one of which has written to it.
   All participating threads must hold at least one lock in common when
   accessing the memory.  If no such lock exists, Helgrind reports a
   race error.</para>
  </listitem>
 </itemizedlist>

 <para>Let's review the simple example above with this in mind.  When
 the program starts, <computeroutput>var</computeroutput> is not in any
 of these states.  Either the parent or child thread gets to its
 <computeroutput>var++</computeroutput> first, and thereby
 gets Exclusive ownership of the location.</para>

 <para>The later-running thread now arrives at
 its <computeroutput>var++</computeroutput> statement.  It first reads
 the existing value from memory.
 Because <computeroutput>var</computeroutput> is currently marked as
 owned exclusively by the other thread, its state is changed to
 shared-readonly by both threads.</para>

 <para>This same thread adds one to the value it has and stores it back
 in <computeroutput>var</computeroutput>.  This causes another state
 change, this time to the shared-modified state.  Because Helgrind has
 also been tracking which threads hold which locks, it can see that
 <computeroutput>var</computeroutput> is in shared-modified state but
 no lock has been used to consistently protect it.  Hence a race is
 reported exactly at the transition from shared-readonly to
 shared-modified.</para>
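
 <para>The walk-through above can be replayed against a deliberately
 small model of the state machine.  The type and function names are
 hypothetical, thread IDs are assumed to be small integers so that the
 set of threads which have accessed a location fits in a bitmask, and
 locksets and the ownership-transfer refinements described later are
 left out.  In this model the race in the example would be reported at
 the access for which <computeroutput>access_loc</computeroutput> first
 returns <computeroutput>SHARED_MOD</computeroutput>.</para>

 <programlisting><![CDATA[
 #include <stdbool.h>

 /* Hypothetical model of one memory location. */
 typedef enum { NEW, EXCLUSIVE, SHARED_RO, SHARED_MOD } MemState;

 typedef struct {
    MemState state;
    int      owner;      /* meaningful in EXCLUSIVE */
    unsigned threadset;  /* bit t set if thread t has accessed it */
 } Loc;

 /* Apply one access by thread 'tid' and return the resulting state. */
 MemState access_loc ( Loc* loc, int tid, bool is_write ) {
    unsigned me = 1u << tid;
    switch (loc->state) {
       case NEW:                         /* first access: take ownership */
          loc->state = EXCLUSIVE;
          loc->owner = tid;
          break;
       case EXCLUSIVE:                   /* owner needs no lock */
          if (tid != loc->owner) {
             loc->threadset = (1u << loc->owner) | me;
             loc->state = is_write ? SHARED_MOD : SHARED_RO;
          }
          break;
       case SHARED_RO:                   /* unlocked reads are fine */
          loc->threadset |= me;
          if (is_write)
             loc->state = SHARED_MOD;
          break;
       case SHARED_MOD:                  /* needs a consistent lock */
          loc->threadset |= me;
          break;
    }
    return loc->state;
 }
 ]]></programlisting>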

 <para>The essence of the algorithm is this.  Helgrind keeps track of
 each memory location that has been accessed by more than one thread.
 For each such location it incrementally infers the set of locks which
 have consistently been used to protect that location.  If the
 location's lockset becomes empty, and at some point one of the threads
 attempts to write to it, a race is then reported.</para>

 <para>This technique is known as "lockset inference" and was
 introduced in: "Eraser: A Dynamic Data Race Detector for Multithreaded
 Programs" (Stefan Savage, Michael Burrows, Greg Nelson, Patrick
 Sobalvarro and Thomas Anderson, ACM Transactions on Computer Systems,
 15(4):391-411, November 1997).</para>
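
 <para>In the style of the Eraser algorithm, the inference for one
 shared location can be sketched as a running intersection of locksets.
 This is an illustration rather than Helgrind's code: locks are assumed
 to be numbered 0 to 31 so that a set of locks fits in a bitmask, and
 the names <computeroutput>SharedLoc</computeroutput>,
 <computeroutput>start_sharing</computeroutput> and
 <computeroutput>refine</computeroutput> are hypothetical.</para>

 <programlisting><![CDATA[
 #include <stdbool.h>
 #include <stdint.h>

 #define ALL_LOCKS 0xFFFFFFFFu   /* assumption: bit i represents lock i */

 typedef struct {
    uint32_t lockset;            /* locks held on every access so far */
 } SharedLoc;

 /* Called when a location first becomes shared: every lock is a candidate. */
 void start_sharing ( SharedLoc* loc ) {
    loc->lockset = ALL_LOCKS;
 }

 /* Narrow the candidates to the locks 'held' by the accessing thread.
    Returns true if a race would be reported: the location is in the
    shared-modified state and no lock has protected every access. */
 bool refine ( SharedLoc* loc, uint32_t held, bool shared_modified ) {
    loc->lockset &= held;
    return shared_modified && loc->lockset == 0;
 }
 ]]></programlisting>

 <para>For instance, if one access holds locks A and B, the next holds
 only B and a third holds only A, every access was made under some lock,
 yet no single lock protected all of them.  The lockset becomes empty,
 so a race is reported once the location is in the shared-modified
 state.</para>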

 <para>Lockset inference has since been widely implemented, studied and
 extended.  Helgrind incorporates several refinements aimed at avoiding
 the high false error rate that naive versions of the algorithm suffer
 from.  A
 <link linkend="hg-manual.data-races.summary">summary of the complete
 algorithm used by Helgrind</link> is presented below.  First, however,
 it is important to understand details of transitions pertaining to the
 Exclusive-ownership state.</para>

 </sect2>


 <sect2 id="hg-manual.data-races.exclusive" xreflabel="Excl Transfers">
 <title>Transfers of Exclusive Ownership Between Threads</title>

 <para>As presented, the algorithm is far too strict.  It reports many
 errors in perfectly correct, widely used parallel programming
 constructions, for example, using child worker threads and worker
 thread pools.</para>

 <para>To avoid these false errors, we must refine the algorithm so
 that it keeps memory in an Exclusive ownership state in cases where it
 would otherwise decay into a shared-readonly or shared-modified state.
 Recall that Exclusive ownership is special in that it grants the
 owning thread the right to access memory without use of any locks.  In
 order to support worker-thread and worker-thread-pool idioms, we will
 allow threads to steal exclusive ownership of memory from other
 threads under certain circumstances.</para>

 <para>Here's an example.  Imagine a parent thread creates child
 threads to do units of work.  For each unit of work, the parent
 allocates a work buffer, fills it in, and creates the child thread,
 handing it a pointer to the buffer.  The child reads/writes the buffer
 and eventually exits, and the waiting parent then extracts the results
 from the buffer:</para>

 <programlisting><![CDATA[
 typedef ... Buffer;

 pthread_t child;
 Buffer    buf;

 /* ---- Parent ---- */                          /* ---- Child ---- */

 /* parent writes workload into buf */
 pthread_create( &child, NULL, child_fn, &buf );

 /* parent does not read */                      void* child_fn ( void* arg ) {
 /* or write buf */                                 Buffer* buf = arg;
                                                    /* read/write *buf */
                                                    return NULL;
                                                 }

 pthread_join ( child, NULL );
 /* parent reads results from buf */
 ]]></programlisting>

 <para>Although <computeroutput>buf</computeroutput> is accessed by
 both threads, neither uses locks, yet the program is race-free.  The
 essential observation is that the child's creation and exit create
 synchronisation events between it and the parent.  These force the
 child's accesses to <computeroutput>buf</computeroutput> to happen
 after the parent initialises <computeroutput>buf</computeroutput>, and
 before the parent reads the results
 from <computeroutput>buf</computeroutput>.</para>

 <para>To model this, Helgrind allows the child to steal, from the
 parent, exclusive ownership of any memory exclusively owned by the
 parent before the pthread_create call.  Similarly, once the parent's
 pthread_join call returns, it can steal back ownership of memory
 exclusively owned by the child.  In this way ownership
 of <computeroutput>buf</computeroutput> is transferred from parent to
 child and back, so the basic algorithm does not report any races
 despite the absence of any locking.</para>
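
 <para>A runnable version of the worker-thread idiom might look like the
 following sketch.  The <computeroutput>Buffer</computeroutput> layout
 (four integers that the child doubles) and the function names are
 invented for illustration.  No lock is used: the parent's and child's
 accesses to <computeroutput>buf</computeroutput> are separated only by
 <computeroutput>pthread_create</computeroutput> and
 <computeroutput>pthread_join</computeroutput>, which is the ownership
 transfer just described.</para>

 <programlisting><![CDATA[
 #include <pthread.h>

 /* Hypothetical workload: the child doubles each element. */
 typedef struct { int data[4]; } Buffer;

 static void* child_fn ( void* arg ) {
    Buffer* buf = arg;
    for (int i = 0; i < 4; i++)
       buf->data[i] *= 2;           /* child owns buf exclusively here */
    return NULL;
 }

 /* Returns the sum of the results the parent reads back. */
 int run_worker ( void ) {
    pthread_t child;
    Buffer    buf;
    for (int i = 0; i < 4; i++)
       buf.data[i] = i + 1;         /* parent writes workload: 1 2 3 4 */

    pthread_create(&child, NULL, child_fn, &buf);
    /* parent does not touch buf until the join */
    pthread_join(child, NULL);

    int sum = 0;
    for (int i = 0; i < 4; i++)
       sum += buf.data[i];          /* parent reads results: 2 4 6 8 */
    return sum;
 }
 ]]></programlisting>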

 <para>Note that the child may only steal memory owned by the parent
 prior to the pthread_create call.  If the child attempts to read or
 write memory which is also accessed by the parent in between the
 pthread_create and pthread_join calls, an error is still
 reported.</para>

 <para>This technique was introduced with the name "thread lifetime
 segments" in "Runtime Checking of Multithreaded Applications with
 Visual Threads" (Jerry J. Harrow, Jr, Proceedings of the 7th
 International SPIN Workshop on Model Checking of Software, Stanford,
 California, USA, August 2000, LNCS 1885, pp. 331--342).  Helgrind
 implements an extended version of it.  Specifically, Helgrind allows
 transfer of exclusive ownership in the following situations:</para>

 <itemizedlist>
  <listitem><para>At thread creation: a child can acquire ownership of
   memory held exclusively by the parent prior to the child's
   creation.</para>
  </listitem>
  <listitem><para>At thread joining: the joiner (thread not exiting)
   can acquire ownership of memory held exclusively by the joinee
   (thread that is exiting) at the point it exited.</para>
  </listitem>
  <listitem><para>At condition variable signallings and broadcasts.  A
   thread Tw which completes a pthread_cond_wait call as a result of
   a signal or broadcast on the same condition variable by some other
   thread Ts, may acquire ownership of memory held exclusively by
   Ts prior to the pthread_cond_signal/broadcast
   call.</para>
  </listitem>
  <listitem><para>At semaphore posts (sem_post calls).  A thread Tw
   which completes a sem_wait call as a result of a sem_post call
   on the same semaphore by some other thread Tp, may acquire
   ownership of memory held exclusively by Tp prior to the sem_post
   call.</para>
  </listitem>
 </itemizedlist>
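
 <para>The condition-variable case can be sketched as a one-slot
 mailbox.  The message type <computeroutput>Msg</computeroutput>, the
 variable <computeroutput>box</computeroutput> and the function names
 are hypothetical.  The signalling thread (Ts) fills in a freshly
 allocated message without a lock, while it owns it exclusively, and
 then publishes the pointer under the mutex; the waiting thread (Tw)
 takes the pointer under the mutex and reads the message afterwards
 without a lock.  The <computeroutput>while</computeroutput> loop is the
 usual guard against spurious wakeups and against a signal sent before
 the waiter starts waiting; in that second case Tw never blocks in
 <computeroutput>pthread_cond_wait</computeroutput>, and the hand-over
 is ordered by the mutex alone.</para>

 <programlisting><![CDATA[
 #include <pthread.h>
 #include <stdlib.h>

 /* Hypothetical message passed from Ts to Tw. */
 typedef struct { int a, b; } Msg;

 static pthread_mutex_t mx  = PTHREAD_MUTEX_INITIALIZER;
 static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
 static Msg*            box = NULL;          /* guarded by mx */

 static void* signaller ( void* arg ) {      /* thread Ts */
    (void)arg;
    Msg* m = malloc(sizeof *m);
    if (m == NULL) abort();
    m->a = 20;                               /* written without a lock: */
    m->b = 22;                               /* Ts owns *m exclusively  */
    pthread_mutex_lock(&mx);
    box = m;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mx);
    return NULL;
 }

 /* The caller acts as Tw: it waits for the message, then reads it unlocked. */
 int wait_for_message ( void ) {
    pthread_t ts;
    pthread_create(&ts, NULL, signaller, NULL);

    pthread_mutex_lock(&mx);
    while (box == NULL)
       pthread_cond_wait(&cv, &mx);
    Msg* m = box;
    box = NULL;
    pthread_mutex_unlock(&mx);

    int result = m->a + m->b;                /* Tw now owns *m */
    free(m);
    pthread_join(ts, NULL);
    return result;
 }
 ]]></programlisting>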

 </sect2>


 <sect2 id="hg-manual.data-races.re-excl" xreflabel="Re-Excl Transfers">
 <title>Restoration of Exclusive Ownership</title>

 <para>Another common idiom is to partition the lifetime of the program
 as a whole into several distinct phases.  In some of those phases, a
 memory location may be accessed by multiple threads and so require
 locking.  In other phases only one thread exists and so can access the
 memory without locking.  For example:</para>

 <programlisting><![CDATA[
 int             var = 0;                         /* shared variable */
 pthread_mutex_t mx  = PTHREAD_MUTEX_INITIALIZER; /* guard for var */
 pthread_t       child;

 /* ---- Parent ---- */                          /* ---- Child ---- */

 var += 1; /* no lock used */

 pthread_create( &child, NULL, child_fn, NULL );

                                                 void* child_fn ( void* uu ) {
 pthread_mutex_lock(&mx);                           pthread_mutex_lock(&mx);
 var += 2;                                          var += 3;
 pthread_mutex_unlock(&mx);                         pthread_mutex_unlock(&mx);
                                                    return NULL;
                                                 }

 pthread_join ( child, NULL );

 var += 4; /* no lock used */
 ]]></programlisting>

 <para>This program is correct, but using only the mechanisms described
 so far, Helgrind would report an error at
 <computeroutput>var += 4</computeroutput>.  This is because, by that
 point, <computeroutput>var</computeroutput> is marked as being in the
 state "shared-modified and protected by the
 lock <computeroutput>mx</computeroutput>", but is being accessed
 without locking.  Really, what we want is
 for <computeroutput>var</computeroutput> to return to the parent
 thread's exclusive ownership after the child thread has exited.</para>

 <para>To make this possible, for every memory location Helgrind also keeps
 track of all the threads that have accessed that location
 -- its threadset.  When a thread Tquitter joins back to Tstayer,
 Helgrind examines the threadsets of all memory in shared-modified or
 shared-readonly state.  In each such threadset, if Tquitter is
 mentioned, it is removed and replaced by Tstayer.  If, as a result, a
 threadset becomes a singleton set containing Tstayer, then the
 location's state is changed to belongs-exclusively-to-Tstayer.</para>

 <para>In our example, the result is exactly as we desire:
 <computeroutput>var</computeroutput> is reacquired exclusively by the
 parent after the child exits.</para>
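
 <para>The join-time rewriting can be sketched with threadsets
 represented as bitmasks.  The representation, the state names and
 <computeroutput>rewrite_on_join</computeroutput> are assumptions for
 illustration only; Helgrind's real data structures differ.  The sketch
 also moves memory owned exclusively by the quitting thread to the
 staying thread, as described for thread joining earlier.</para>

 <programlisting><![CDATA[
 /* Hypothetical per-location record. */
 typedef enum { NEW, EXCLUSIVE, SHARED_RO, SHARED_MOD } MemState;

 typedef struct {
    MemState state;
    int      owner;       /* meaningful in EXCLUSIVE */
    unsigned threadset;   /* bit t set if thread t has accessed it */
 } Loc;

 /* Called after thread 'quitter' has been joined by thread 'stayer'. */
 void rewrite_on_join ( Loc* loc, int quitter, int stayer ) {
    unsigned q = 1u << quitter, s = 1u << stayer;

    if (loc->state == EXCLUSIVE) {
       if (loc->owner == quitter)
          loc->owner = stayer;             /* joinee's memory passes to joiner */
       return;
    }
    if (loc->state != SHARED_RO && loc->state != SHARED_MOD)
       return;
    if (loc->threadset & q)                /* replace Tquitter by Tstayer */
       loc->threadset = (loc->threadset & ~q) | s;
    if (loc->threadset == s) {             /* only Tstayer remains */
       loc->state = EXCLUSIVE;
       loc->owner = stayer;
    }
 }
 ]]></programlisting>

 <para>With the parent as thread 1 and the child as thread 2, a location
 shared by both reverts to exclusive ownership by thread 1 after the
 join; a location also shared with a third thread stays shared until
 that thread is joined too.</para>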

 <para>More generally, when a group of threads merges back to a single
 thread via a cascade of pthread_join calls, any memory shared by the
 group (or a subset of it) ends up being owned exclusively by the sole
 surviving thread.  This significantly enhances Helgrind's flexibility,
 since it means that each memory location may make arbitrarily many
 transitions between exclusive and shared ownership.  Furthermore, a
 different lock may protect the location during each period of shared
 ownership.</para>

 </sect2>


 <sect2 id="hg-manual.data-races.summary" xreflabel="Race Det Summary">
 <title>A Summary of the Race Detection Algorithm</title>

 <para>Helgrind looks for memory locations which are accessed by more
 than one thread.  For each such location, Helgrind records which of
 the program's locks were held by the accessing thread at the time of
 each access.  The hope is to discover that there is indeed at least
 one lock which is consistently used by all threads to protect that
 location.  If no such lock can be found, then there is apparently no
 consistent locking strategy being applied for that location, and so a
 possible data race might result.  Helgrind accordingly reports an
 error.</para>

 <para>In practice this basic scheme is far too simplistic, and is
 unusable because it reports many races in some widely used and
 known-correct programming idioms.  Helgrind's checking therefore
 incorporates many refinements to this basic idea, and can be
 summarised as follows:</para>

 <para>The following thread events are intercepted and monitored:</para>

 <itemizedlist>
  <listitem><para>thread creation and exiting (pthread_create,
            pthread_join, pthread_exit)</para>
  </listitem>
  <listitem>
   <para>lock acquisition and release (pthread_mutex_lock,
         pthread_mutex_unlock, pthread_rwlock_rdlock,
         pthread_rwlock_wrlock,
         pthread_rwlock_unlock)</para>
  </listitem>
  <listitem>
   <para>inter-thread event notifications (pthread_cond_wait,
         pthread_cond_signal, pthread_cond_broadcast,
         sem_wait, sem_post)</para>
  </listitem>
 </itemizedlist>

 <para>Memory allocation and deallocation events are intercepted and
 monitored:</para>

 <itemizedlist>
  <listitem>
   <para>malloc/new/free/delete and variants</para>
  </listitem>
  <listitem>
   <para>stack allocation and deallocation</para>
  </listitem>
 </itemizedlist>

 <para>All memory accesses are intercepted and monitored.</para>

 <para>By observing the above events, Helgrind can infer certain
 aspects of the program's locking discipline.  Programs which adhere to
 the following rules are considered to be acceptable:
 </para>

 <itemizedlist>
  <listitem>
   <para>A thread may allocate memory, and write initial values into
   it, without locking.  That thread is regarded as owning the memory
   exclusively.</para>
  </listitem>
  <listitem>
   <para>A thread may read and write memory which it owns exclusively,
   without locking.</para>
  </listitem>
  <listitem>
   <para>Memory which is owned exclusively by one thread may be read by
   that thread and others without locking.  However, in this situation
   no thread may do unlocked writes to the memory (except for the owner
   thread's initializing write).</para>
  </listitem>
  <listitem>
   <para>Memory which is shared between multiple threads, one or more
   of which writes to it, must be protected by a lock which is
   correctly acquired and released by all threads accessing the
   memory.</para>
  </listitem>
 </itemizedlist>

 <para>Any violation of this discipline will cause an error to be reported.
 However, two exemptions apply:</para>

 <itemizedlist>
  <listitem>
   <para>A thread Y can acquire exclusive ownership of memory
   previously owned exclusively by a different thread X providing
   X's last access and Y's first access are separated by one of the
   following synchronization events:</para>
   <itemizedlist>
    <listitem><para>X creates thread Y</para></listitem>
    <listitem><para>X joins back to Y</para></listitem>
    <listitem><para>X uses a condition-variable to signal at Y, and Y is
    waiting for that event</para></listitem>
    <listitem><para>Y completes a semaphore wait as a result of X signalling
    on that same semaphore</para></listitem>
   </itemizedlist>
   <para>
   This refinement allows Helgrind to correctly track the ownership
   state of inter-thread buffers used in the worker-thread and
   worker-thread-pool concurrent programming idioms (styles).</para>
  </listitem>
  <listitem>
   <para>Similarly, if thread Y joins back to thread X, memory
   exclusively owned by Y becomes exclusively owned by X instead.
   Also, memory that has been shared only by X and Y becomes
   exclusively owned by X.  More generally, memory that has been shared
   by X, Y and some arbitrary other set S of threads is re-marked as
   shared by X and S.  Hence, under the right circumstances, memory
   shared amongst multiple threads, all of which join into just one,
   can revert to the exclusive ownership state.</para>
   <para>
   In effect, each memory location may make arbitrarily many
   transitions between exclusive and shared ownership.  Furthermore, a
   different lock may protect the location during each period of shared
   ownership.  This significantly enhances the flexibility of the
   algorithm.</para>
  </listitem>
 </itemizedlist>

 <para>The ownership state, accessing thread-set and related lock-set
 for each memory location are tracked at 8-bit granularity.  This means
 the algorithm is precise even for 16- and 8-bit memory
 accesses.</para>

 <para>Helgrind correctly handles reader-writer locks in this
 framework.  Locations shared between multiple threads can be protected
 during reads by locks held in either read-mode or write-mode, but can
 only be protected during writes by locks held in write-mode.  Normal
 POSIX mutexes are treated as if they are reader-writer locks which are
 only ever held in write-mode.</para>
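
 <para>The reader-writer rule can be sketched by describing the locks a
 thread holds with two bitmasks: locks held in any mode, and locks held
 in write-mode.  A read narrows the location's lockset with the first
 mask, a write with the second.  The type <computeroutput>Held</computeroutput>
 and the function <computeroutput>refine_rw</computeroutput> are
 hypothetical; in this model a plain mutex is a lock that, when held,
 always appears in both masks.</para>

 <programlisting><![CDATA[
 #include <stdbool.h>
 #include <stdint.h>

 /* Hypothetical description of the locks a thread holds; bit i = lock i. */
 typedef struct {
    uint32_t any_mode;    /* held in read-mode or write-mode */
    uint32_t write_mode;  /* held in write-mode (a subset of any_mode) */
 } Held;

 /* Narrow a shared location's lockset for one access.  Reads may be
    protected by locks held in either mode; writes only by locks held
    in write-mode. */
 uint32_t refine_rw ( uint32_t lockset, Held h, bool is_write ) {
    return lockset & (is_write ? h.write_mode : h.any_mode);
 }
 ]]></programlisting>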

 <para>Helgrind correctly handles POSIX mutexes for which recursive
 locking is allowed.</para>

 <para>Helgrind only partially handles x86 and amd64 memory access
 instructions preceded by a LOCK prefix.  Writes are correctly handled,
 by pretending that the LOCK prefix implies acquisition and release of
 a magic "bus hardware lock" mutex before and after the instruction.
 This unfortunately requires subsequent reads from such locations to
 also use a LOCK prefix, which is not required by the real hardware.
 Helgrind does not offer any equivalent handling for atomic sequences
 on PowerPC/POWER platforms created by the use of lwarx/stwcx
 instructions.</para>

 </sect2>


 <sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages">
 <title>Interpreting Race Error Messages</title>

 <para>Helgrind's race detection algorithm collects a lot of
 information, and tries to present it in a helpful way when a race is
 detected.  Here's an example:</para>

 <programlisting><![CDATA[
 Thread #2 was created
    at 0x510548E: clone (in /lib64/libc-2.5.so)
    by 0x4E2F305: do_clone (in /lib64/libpthread-2.5.so)
    by 0x4E2F7C5: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so)
    by 0x4C23870: pthread_create@* (hg_intercepts.c:198)
    by 0x400CEF: main (tc17_sembar.c:195)

 // And the same for threads #3, #4 and #5 -- omitted for conciseness

 Possible data race during read of size 4 at 0x602174
    at 0x400BE5: gomp_barrier_wait (tc17_sembar.c:122)
    by 0x400C44: child (tc17_sembar.c:161)
    by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178)
    by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so)
    by 0x51054CC: clone (in /lib64/libc-2.5.so)
   Old state: shared-modified by threads #2, #3, #4, #5
   New state: shared-modified by threads #2, #3, #4, #5
   Reason:    this thread, #2, holds no consistent locks
   Last consistently used lock for 0x602174 was first observed
    at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
    by 0x4009E4: gomp_barrier_init (tc17_sembar.c:46)
    by 0x400CBC: main (tc17_sembar.c:192)
 ]]></programlisting>

 <para>Helgrind first announces the creation points of any threads
 referenced in the error message.  This is so it can speak concisely
 about threads and sets of threads without repeatedly printing their
 creation point call stacks.  Each thread is only ever announced once,
 the first time it appears in any Helgrind error message.</para>

 <para>The main error message begins at the text
 "<computeroutput>Possible data race during read</computeroutput>".
 At the start is information you would expect to see -- address and
 size of the racing access, whether a read or a write, and the call
 stack at the point it was detected.</para>

 <para>More interesting is the state transition caused by this access.
 This memory is already in the shared-modified state, and up to now has
 been consistently protected by at least one lock.  However, the thread
 making the access in question (thread #2, here) does not hold any
 locks in common with those held during all previous accesses to the
 location -- "no consistent locks", in other words.</para>

 <para>Finally, Helgrind shows the lock which has protected this
 location in all previous accesses.  (If there is more than one, only
 one is shown.)  This can be a useful hint, because it typically shows
 the lock that the programmers intended to use to protect the location,
 but in this case forgot to use.</para>
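
 <para>The following sketch shows a small program that could provoke a report of this shape. All names are hypothetical. Both worker threads always update <computeroutput>balance</computeroutput> while holding <computeroutput>mx</computeroutput>. However, <computeroutput>peek_balance</computeroutput> deliberately reads it with no lock held, which is the "holds no consistent locks" situation. Whether Helgrind reports it on a given run depends on thread scheduling. <computeroutput>read_balance</computeroutput> shows the fix: take the same lock as the writers.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static int balance;                  /* intended to be guarded by mx */

static void *depositor(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&mx);
        balance++;                   /* always done with mx held */
        pthread_mutex_unlock(&mx);
    }
    return NULL;
}

/* Deliberately buggy: reads balance without holding mx. */
static int peek_balance(void)
{
    return balance;
}

/* Fixed version: holds the same lock as the writers. */
static int read_balance(void)
{
    pthread_mutex_lock(&mx);
    int v = balance;
    pthread_mutex_unlock(&mx);
    return v;
}

/* Stores the racy snapshot in *peeked and returns the final total. */
int race_demo(int *peeked)
{
    pthread_t a, c;
    pthread_create(&a, NULL, depositor, NULL);
    pthread_create(&c, NULL, depositor, NULL);
    *peeked = peek_balance();        /* races with the depositors */
    pthread_join(a, NULL);
    pthread_join(c, NULL);
    return read_balance();
}
```
]]></programlisting>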

 <para>Here are some more examples of race reports.  This is not an
 exhaustive list of combinations, but it should give you some insight
 into how to interpret the output.</para>

 <programlisting><![CDATA[
 Possible data race during write ...
   Old state: shared-readonly by threads #1, #2, #3
   New state: shared-modified by threads #1, #2, #3
   Reason:    this thread, #3, holds no consistent locks
   Location ... has never been protected by any lock
 ]]></programlisting>

 <para>The location is shared by 3 threads, all of which have been
 reading it without locking ("has never been protected by any lock").
 Now one of them is writing it.  Regardless of whether the writer has a
 lock or not, this is still an error, because the write races against
 the previously observed reads.</para>

 <programlisting><![CDATA[
 Possible data race during read ...
   Old state: shared-modified by threads #1, #2, #3
   New state: shared-modified by threads #1, #2, #3
   Reason:    this thread, #3, holds no consistent locks
   Last consistently used lock for ... was first observed ...
 ]]></programlisting>

 <para>The location is shared by 3 threads, all of which have been
 reading and writing it while (as required) holding at least one lock
 in common.  Now it is being read without that lock being held.  In the
 "Last consistently used lock" part, Helgrind offers its best guess as
 to the identity of the lock that should have been used.</para>

 <programlisting><![CDATA[
 Possible data race during write ...
   Old state: owned exclusively by thread #4
   New state: shared-modified by threads #4, #5
   Reason:    this thread, #5, holds no locks at all
 ]]></programlisting>

 <para>A location that has so far been accessed exclusively by thread
 #4 has now been written by thread #5, without use of any lock.  This
 can be a sign that the programmer did not consider the possibility of
 the location being shared between threads, or, alternatively, forgot
 to use the appropriate lock.</para>

 <para>Note that thread #4 exclusively owns the location, and so has
 the right to access it without holding a lock.  However, this message
 does not say that thread #4 is not using a lock for this location.
 Indeed, it could be using a lock for the location because it intends
 to make it available to other threads, one of which is thread #5 --
 and thread #5 has forgotten to use the lock.</para>

 <para>Also, this message implies that Helgrind did not see any
 synchronisation event between threads #4 and #5 that would have
 allowed #5 to acquire exclusive ownership from #4.  See
 <link linkend="hg-manual.data-races.exclusive">above</link>
 for a discussion of transfers of exclusive ownership states between
 threads.</para>
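
 <para>By contrast, this sketch hands a buffer from the main thread to a worker and back without any lock. All names are hypothetical. The only synchronisation comes from <computeroutput>pthread_create</computeroutput> and <computeroutput>pthread_join</computeroutput>. Under the model described above, these are the kind of events that allow exclusive ownership to transfer between threads, so no race would be expected to be reported here.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>

/* Hypothetical buffer handed from the main thread to a worker and back. */
static int handoff_buf[8];

static void *handoff_worker(void *arg)
{
    (void)arg;
    /* No lock: the worker has sole use of the buffer until it exits. */
    for (int i = 0; i < 8; i++)
        handoff_buf[i] *= 2;
    return NULL;
}

int handoff_demo(void)
{
    for (int i = 0; i < 8; i++)      /* main thread owns the buffer */
        handoff_buf[i] = i;

    pthread_t t;
    pthread_create(&t, NULL, handoff_worker, NULL);  /* ownership to t */
    pthread_join(t, NULL);                           /* and back to main */

    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += handoff_buf[i];
    return sum;
}
```
]]></programlisting>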

 </sect2>


 </sect1>

 <sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use">
 <title>Hints and Tips for Effective Use of Helgrind</title>

 <para>Helgrind can be very helpful in finding and resolving
 threading-related problems.  Like all sophisticated tools, it is most
 effective when you understand how to play to its strengths.</para>

 <para>Helgrind will be less effective when you merely throw an
 existing threaded program at it and try to make sense of any reported
 errors.  It will be more effective if you design threaded programs
 from the start in a way that helps Helgrind verify correctness.  The
 same is true for finding memory errors with Memcheck, but it applies
 even more strongly here, because thread checking is a harder problem.
 Consequently it is
 much easier to write a correct program for which Helgrind falsely
 reports (threading) errors than it is to write a correct program for
 which Memcheck falsely reports (memory) errors.</para>

 <para>With that in mind, here are some tips, listed most important first,
 for getting reliable results and avoiding false errors.  The first two
 are critical.  Any violations of them will swamp you with huge numbers
 of false data-race errors.</para>


 <orderedlist>

   <listitem>
     <para>Make sure your application, and all the libraries it uses,
     use the POSIX threading primitives.  Helgrind needs to be able to
     see all events pertaining to thread creation, exit, locking and
     other synchronisation events.  To do so it intercepts many POSIX
     pthread_ functions.</para>

     <para>Do not roll your own threading primitives (mutexes, etc)
     from combinations of the Linux futex syscall, counters and wotnot.
     These throw Helgrind's internal what's-going-on models way off
     course and will give bogus results.</para>

     <para>Also, do not reimplement existing POSIX abstractions using
     other POSIX abstractions.  For example, don't build your own
     semaphore routines or reader-writer locks from POSIX mutexes and
     condition variables.  Use POSIX reader-writer locks and
     semaphores instead, since Helgrind supports them directly.</para>

     <para>Helgrind directly supports the following POSIX threading
     abstractions: mutexes, reader-writer locks, condition variables
     (but see below), and semaphores.  Currently spinlocks and barriers
     are not supported, although they could be in future.  A prototype
     "safe" implementation of barriers, based on semaphores, is
     available: please contact the Valgrind authors for details.</para>

     <para>At the time of writing, the following popular Linux packages
     are known to implement their own threading primitives:</para>

     <itemizedlist>
       <listitem><para>Qt version 4.X.  Qt 3.X is fine.
       Helgrind contains partial direct support for Qt 4.X threading,
       but this is not yet in a usable state.  Assistance from folks
       knowledgeable in Qt 4 threading internals would be
       appreciated.</para></listitem>

       <listitem><para>Runtime support library for GNU OpenMP (part of
       GCC), at least GCC versions 4.2 and 4.3.  With some minor
       modifications to the GNU OpenMP runtime support sources, it is
       possible to use Helgrind on code compiled with GNU OpenMP.  Please
       contact the Valgrind authors for details.</para></listitem>
     </itemizedlist>
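
 <para>The sketch below shows what it means to use a POSIX abstraction directly. It limits a hypothetical resource to two concurrent users with a counting <computeroutput>sem_t</computeroutput>. The alternative would be a home-made semaphore built from a mutex, a condition variable and a counter. The statistics counters exist only to check the limit and are guarded by an ordinary mutex. All names are hypothetical.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical resource with room for at most 2 concurrent users. */
static sem_t slots;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_use, max_in_use;

static void *slot_user(void *arg)
{
    (void)arg;
    sem_wait(&slots);                 /* take a slot, blocking if none free */

    pthread_mutex_lock(&stats_lock);
    if (++in_use > max_in_use)
        max_in_use = in_use;
    pthread_mutex_unlock(&stats_lock);

    pthread_mutex_lock(&stats_lock);
    in_use--;
    pthread_mutex_unlock(&stats_lock);

    sem_post(&slots);                 /* give the slot back */
    return NULL;
}

/* Runs 6 threads through a 2-slot semaphore; returns peak concurrency. */
int slots_demo(void)
{
    pthread_t t[6];
    sem_init(&slots, 0, 2);           /* not shared between processes */
    for (int i = 0; i < 6; i++)
        pthread_create(&t[i], NULL, slot_user, NULL);
    for (int i = 0; i < 6; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&slots);
    return max_in_use;
}
```
]]></programlisting>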
   </listitem>

   <listitem>
     <para>Avoid memory recycling.  If you can't avoid it, you must
     tell Helgrind what is going on via the VALGRIND_HG_CLEAN_MEMORY
     client request
     (in <computeroutput>helgrind.h</computeroutput>).</para>

     <para>Helgrind is aware of standard memory allocation and
     deallocation that occurs via malloc/free/new/delete and from entry
     and exit of stack frames.  In particular, when memory is
     deallocated via free, delete, or function exit, Helgrind considers
     that memory clean, so when it is eventually reallocated, its
     history is irrelevant.</para>

     <para>However, it is common practice to implement memory recycling
     schemes.  In these, memory to be freed is not handed to
     free/delete, but instead put into a pool of free buffers to be
     handed out again as required.  The problem is that Helgrind has no
     way to know that such memory is logically no longer in use, and
     that its history is irrelevant.  Hence you must make that explicit,
     using the VALGRIND_HG_CLEAN_MEMORY client request to specify the
     relevant address ranges.  It's easiest to put these requests into
     the pool manager code, and use them either when memory is returned
     to the pool or when it is allocated from it.</para>
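
 <para>A minimal sketch of such a pool manager follows. The pool layout, sizes and function names are hypothetical. Only <computeroutput>VALGRIND_HG_CLEAN_MEMORY(start, len)</computeroutput> comes from <computeroutput>helgrind.h</computeroutput>. The sketch cleans each buffer as it is returned to the pool. When the Valgrind headers are not installed, it falls back to a no-op macro. That fallback and the <computeroutput>valgrind/helgrind.h</computeroutput> include path are assumptions about a typical installation.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>
#include <stddef.h>

/* Use Helgrind's client request if available, else a no-op (assumption). */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef VALGRIND_HG_CLEAN_MEMORY
#  define VALGRIND_HG_CLEAN_MEMORY(addr, len) ((void)(addr), (void)(len))
#endif

#define POOL_BUFS 4
#define BUF_SIZE  64

/* Hypothetical fixed-size buffer pool with a LIFO free list. */
static unsigned char storage[POOL_BUFS][BUF_SIZE];
static void *free_list[POOL_BUFS];
static int n_free;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

void pool_init(void)
{
    for (int i = 0; i < POOL_BUFS; i++)
        free_list[i] = storage[i];
    n_free = POOL_BUFS;
}

/* Returns a buffer, or NULL if the pool is empty. */
void *pool_get(void)
{
    void *p = NULL;
    pthread_mutex_lock(&pool_lock);
    if (n_free > 0)
        p = free_list[--n_free];
    pthread_mutex_unlock(&pool_lock);
    return p;
}

void pool_put(void *p)
{
    /* Declare the buffer's access history irrelevant, as free() would. */
    VALGRIND_HG_CLEAN_MEMORY(p, BUF_SIZE);
    pthread_mutex_lock(&pool_lock);
    free_list[n_free++] = p;
    pthread_mutex_unlock(&pool_lock);
}
```
]]></programlisting>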
   </listitem>

   <listitem>
     <para>Avoid POSIX condition variables.  If you can, use POSIX
     semaphores (sem_t, sem_post, sem_wait) to do inter-thread event
     signalling.  Semaphores with an initial value of zero are
     particularly useful for this.</para>

     <para>Helgrind handles POSIX condition variables only partially
     correctly.  This is because Helgrind can see inter-thread
     dependencies between a pthread_cond_wait call and a
     pthread_cond_signal/broadcast call only if the waiting thread
     actually gets to the rendezvous first (so that it actually calls
     pthread_cond_wait).  It can't see dependencies between the threads
     if the signaller arrives first.  In the latter case, POSIX
     guidelines imply that the associated boolean condition still
     provides an inter-thread synchronisation event, but one which is
     invisible to Helgrind.</para>

     <para>Missing some inter-thread synchronisation events causes
     Helgrind to report false positives.  That is because missing such
     events reduces the extent to which it can transfer exclusive memory
     ownership between threads, so memory may end up in a
     shared-modified state when that was not intended by the
     application programmers.</para>

     <para>The root cause of this synchronisation lossage is
     particularly hard to understand, so an example is helpful.  It was
     discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
     in Multi-Threaded Programs", Dissertation, TU Graz, Austria).  The
     canonical POSIX-recommended usage scheme for condition variables
     is as follows:</para>

 <programlisting><![CDATA[
 b   is a Boolean condition, which is False most of the time
 cv  is a condition variable
 mx  is its associated mutex

 Signaller:                             Waiter:

 lock(mx)                               lock(mx)
 b = True                               while (b == False)
 signal(cv)                                wait(cv,mx)
 unlock(mx)                             unlock(mx)
 ]]></programlisting>

     <para>Assume <computeroutput>b</computeroutput> is False most of
     the time.  If the waiter arrives at the rendezvous first, it
     enters its while-loop, waits for the signaller to signal, and
     eventually proceeds.  Helgrind sees the signal, notes the
     dependency, and all is well.</para>

     <para>If the signaller arrives
     first, <computeroutput>b</computeroutput> is set to True, and the
     signal disappears into nowhere.  When the waiter later arrives, it
     does not enter its while-loop and simply carries on.  But even in
     this case, the waiter code following the while-loop cannot execute
     until the signaller sets <computeroutput>b</computeroutput> to
     True.  Hence there is still the same inter-thread dependency, but
     this time it is through an arbitrary in-memory condition, and
     Helgrind cannot see it.</para>
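
 <para>The same scheme written out in C may make the two orderings easier to see. This is a sketch: <computeroutput>payload</computeroutput> and the function names are hypothetical. The signaller writes <computeroutput>payload</computeroutput> before taking the lock. POSIX permits this, because the mutex still orders that write before the waiter's later read. When the signaller runs first, the waiter never calls <computeroutput>pthread_cond_wait</computeroutput>. The only dependency is then through <computeroutput>b</computeroutput>, which is the case Helgrind cannot see.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>
#include <stdbool.h>

/* b, cv and mx play the same roles as in the pseudocode above. */
static bool b = false;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static int payload;                 /* hypothetical data published via b */

static void *signaller(void *arg)
{
    (void)arg;
    payload = 42;                   /* written before the lock is taken */
    pthread_mutex_lock(&mx);
    b = true;
    pthread_cond_signal(&cv);       /* lost if nobody is waiting yet */
    pthread_mutex_unlock(&mx);
    return NULL;
}

/* Runs the waiter side in the calling thread; returns payload. */
int condvar_demo(void)
{
    pthread_t s;
    pthread_create(&s, NULL, signaller, NULL);

    pthread_mutex_lock(&mx);
    while (!b)                      /* skipped if the signaller got here first */
        pthread_cond_wait(&cv, &mx);
    pthread_mutex_unlock(&mx);

    int v = payload;                /* code following the while-loop */
    pthread_join(s, NULL);
    return v;
}
```
]]></programlisting>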

     <para>By comparison, Helgrind's detection of inter-thread
     dependencies caused by semaphore operations is believed to be
     exactly correct.</para>

     <para>As far as I know, a solution to this problem that does not
     require source-level annotation of condition-variable wait loops
     is beyond the current state of the art.</para>
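
 <para>For comparison, this sketch builds the same one-shot event on a POSIX semaphore with an initial value of zero, as suggested at the start of this item. The names are hypothetical. Whichever thread arrives first, the dependency is carried by the <computeroutput>sem_post</computeroutput>/<computeroutput>sem_wait</computeroutput> pair rather than by a plain memory location.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical one-shot event: a semaphore that starts at zero. */
static sem_t ready;
static int result;

static void *producer(void *arg)
{
    (void)arg;
    result = 7;              /* produce the data... */
    sem_post(&ready);        /* ...then announce it */
    return NULL;
}

int sem_signal_demo(void)
{
    sem_init(&ready, 0, 0);  /* 0: the event has not happened yet */
    pthread_t p;
    pthread_create(&p, NULL, producer, NULL);

    /* Blocks until the post if the consumer arrives first; returns at
       once if the producer has already posted. */
    sem_wait(&ready);
    int v = result;

    pthread_join(p, NULL);
    sem_destroy(&ready);
    return v;
}
```
]]></programlisting>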
   </listitem>

   <listitem>
     <para>Make sure you are using a supported Linux distribution.  At
     present, Helgrind only properly supports x86-linux and amd64-linux
     with glibc-2.3 or later.  The latter restriction means we only
     support glibc's NPTL threading implementation.  The old
     LinuxThreads implementation is not supported.</para>

     <para>Unsupported targets may work to varying degrees.  In
     particular, ppc32-linux and ppc64-linux running NPTL should work,
     but you will get false race errors because Helgrind does not know
     how to properly handle atomic instruction sequences created using
     the lwarx/stwcx instructions.</para>
   </listitem>

   <listitem>
     <para>Round up all finished threads using pthread_join.  Avoid
     detaching threads: don't create threads in the detached state, and
     don't call pthread_detach on existing threads.</para>

     <para>Using pthread_join to round up finished threads provides a
     clear synchronisation point that both Helgrind and programmers can
     see.  This synchronisation point allows Helgrind to adjust its
     memory ownership
     models <link linkend="hg-manual.data-races.exclusive">as described
     extensively above</link>, which helps Helgrind produce more
     accurate error reports.</para>

     <para>If you don't call pthread_join on a thread, Helgrind has no
     way to know when it finishes, relative to any significant
     synchronisation points for other threads in the program.  So it
     assumes that the thread lingers indefinitely and can potentially
     interfere indefinitely with the memory state of the program.  It
     has every right to assume that -- after all, it might really be
     the case that, for scheduling reasons, the exiting thread did run
     very slowly in the last stages of its life.</para>
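
 <para>This sketch shows the recommended pattern, using hypothetical names. Threads are created joinable, which is the default when no attributes are given. Every thread is then rounded up with <computeroutput>pthread_join</computeroutput> before the main thread reads the results it produced.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>

#define N_WORKERS 4

/* Hypothetical per-thread result slots; each worker writes only its own. */
static int results[N_WORKERS];

static void *join_worker(void *arg)
{
    int id = *(int *)arg;
    results[id] = id * id;
    return NULL;
}

/* Creates joinable threads and joins every one before reading results. */
int join_demo(void)
{
    pthread_t t[N_WORKERS];
    int ids[N_WORKERS];

    for (int i = 0; i < N_WORKERS; i++) {
        ids[i] = i;
        pthread_create(&t[i], NULL, join_worker, &ids[i]);
    }
    for (int i = 0; i < N_WORKERS; i++)
        pthread_join(t[i], NULL);   /* clear sync point for each thread */

    int sum = 0;
    for (int i = 0; i < N_WORKERS; i++)
        sum += results[i];
    return sum;
}
```
]]></programlisting>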
   </listitem>

   <listitem>
     <para>Perform thread debugging (with Helgrind) and memory
     debugging (with Memcheck) together.</para>

     <para>Helgrind tracks the state of memory in detail, and memory
     management bugs in the application are liable to cause confusion.
     In extreme cases, applications which do many invalid reads and
     writes (particularly to freed memory) have been known to crash
     Helgrind.  So, ideally, you should make your application
     Memcheck-clean before using Helgrind.</para>

     <para>It may be impossible to make your application Memcheck-clean
     unless you first remove threading bugs.  In particular, it may be
     difficult to remove all reads and writes to freed memory in
     multithreaded C++ destructor sequences at program termination.
     So, ideally, you should make your application Helgrind-clean
     before using Memcheck.</para>

     <para>Since this circularity is obviously unresolvable, at least
     bear in mind that Memcheck and Helgrind are to some extent
     complementary, and you may need to use them together.</para>
   </listitem>

   <listitem>
     <para>POSIX requires that implementations of standard I/O (printf,
     fprintf, fwrite, fread, etc) are thread safe.  Unfortunately GNU
     libc implements this by using internal locking primitives that
     Helgrind is unable to intercept.  Consequently Helgrind generates
     many false race reports when you use these functions.</para>

     <para>Helgrind attempts to hide these errors using the standard
     Valgrind error-suppression mechanism.  So, at least for simple
     test cases, you don't see any.  Nevertheless, some may slip
     through.  Just something to be aware of.</para>
   </listitem>

   <listitem>
     <para>Helgrind's error checks do not work properly inside the
     system threading library itself
     (<computeroutput>libpthread.so</computeroutput>), and it usually
     observes large numbers of (false) errors in there.  Valgrind's
     suppression system then filters these out, so you should not see
     them.</para>

     <para>If you see any race errors reported
     where <computeroutput>libpthread.so</computeroutput> or
     <computeroutput>ld.so</computeroutput> is the object associated
     with the innermost stack frame, please file a bug report at
     http://www.valgrind.org.</para>
   </listitem>

 </orderedlist>

 </sect1>


 <sect1 id="hg-manual.options" xreflabel="Helgrind Options">
 <title>Helgrind Options</title>

 <para>The following end-user options are available:</para>

 <!-- start of xi:include in the manpage -->
 <variablelist id="hg.opts.list">

   <varlistentry id="opt.happens-before" xreflabel="--happens-before">
     <term>
       <option><![CDATA[--happens-before=none|threads|all
       [default: all] ]]></option>
     </term>
     <listitem>
       <para>Helgrind always regards locks as the basis for
        inter-thread synchronisation.  However, by default, before
        reporting a race error, Helgrind will also check whether
        certain other kinds of inter-thread synchronisation events
        happened.  It may be that if such events took place, then no
        race really occurred, and so no error needs to be reported.
        See <link linkend="hg-manual.data-races.exclusive">above</link>
        for a discussion of transfers of exclusive ownership states
        between threads.
       </para>
       <para>With <varname>--happens-before=all</varname>, the
        following events are regarded as sources of synchronisation:
        thread creation/joinage, condition variable
        signal/broadcast/waits, and semaphore posts/waits.
       </para>
       <para>With <varname>--happens-before=threads</varname>, only
        thread creation/joinage events are regarded as sources of
        synchronisation.
       </para>
       <para>With <varname>--happens-before=none</varname>, no events
        (apart, of course, from locking) are regarded as sources of
        synchronisation.
       </para>
       <para>Changing this setting from the default will increase your
        false-error rate but give little or no gain.  The only advantage
        is that <option>--happens-before=threads</option> and
        <option>--happens-before=none</option> should make Helgrind
        less and less sensitive to the scheduling of threads, and hence
        the output more and more repeatable across runs.
       </para>
     </listitem>
   </varlistentry>

   <varlistentry id="opt.trace-addr" xreflabel="--trace-addr">
     <term>
       <option><![CDATA[--trace-addr=0xXXYYZZ
       ]]></option> and
       <option><![CDATA[--trace-level=0|1|2 [default: 1]
       ]]></option>
     </term>
     <listitem>
       <para>Requests that Helgrind produce a log of all state changes
       to location 0xXXYYZZ.  This can be helpful in tracking down
       tricky races.  <varname>--trace-level</varname> controls the
       verbosity of the log.  At the default setting (1), a one-line
       summary is printed for each state change.  At level 2 a
       complete stack trace is printed for each state change.</para>
     </listitem>
   </varlistentry>

 </variablelist>
 <!-- end of xi:include in the manpage -->

 <!-- start of xi:include in the manpage -->
 <para>In addition, the following debugging options are available for
 Helgrind:</para>

 <variablelist id="hg.debugopts.list">

   <varlistentry id="opt.trace-malloc" xreflabel="--trace-malloc">
     <term>
       <option><![CDATA[--trace-malloc=no|yes [no]
       ]]></option>
     </term>
     <listitem>
       <para>Show all client malloc (etc) and free (etc) requests.</para>
     </listitem>
   </varlistentry>

   <varlistentry id="opt.gen-vcg" xreflabel="--gen-vcg">
     <term>
       <option><![CDATA[--gen-vcg=no|yes|yes-w-vts [no]
       ]]></option>
     </term>
     <listitem>
       <para>At exit, write to stderr a dump of the happens-before
         graph computed by Helgrind, in a format suitable for the VCG
         graph visualisation tool.  A suitable command line is:</para>
       <para><computeroutput>valgrind --tool=helgrind
         --gen-vcg=yes my_app 2&gt;&amp;1
         | grep xxxxxx | sed "s/xxxxxx//g"
         | xvcg -</computeroutput></para>
       <para>With <varname>--gen-vcg=yes</varname>, the basic
         happens-before graph is shown.  With
         <varname>--gen-vcg=yes-w-vts</varname>, the vector timestamp
         for each node is also shown.</para>
     </listitem>
   </varlistentry>

   <varlistentry id="opt.cmp-race-err-addrs"
                 xreflabel="--cmp-race-err-addrs">
     <term>
       <option><![CDATA[--cmp-race-err-addrs=no|yes [no]
       ]]></option>
     </term>
     <listitem>
       <para>Controls whether or not race (data) addresses should be
         taken into account when removing duplicates of race errors.
         With <varname>--cmp-race-err-addrs=no</varname>, two otherwise
         identical race errors will be considered to be the same even if
         their race addresses differ.  With
         <varname>--cmp-race-err-addrs=yes</varname> they will be
         considered different.  This is provided to help make certain
         regression tests work reliably.</para>
     </listitem>
   </varlistentry>

   <varlistentry id="opt.tc-sanity-flags" xreflabel="--tc-sanity-flags">
     <term>
       <option><![CDATA[--tc-sanity-flags=<XXXXX> (X = 0|1) [00000]
       ]]></option>
     </term>
     <listitem>
       <para>Run extensive sanity checks on Helgrind's internal
         data structures at events defined by the bitstring, as
         follows:</para>
       <para><computeroutput>10000 </computeroutput>after changes to
         the lock order acquisition graph</para>
       <para><computeroutput>01000 </computeroutput>after every client
         memory access (NB: not currently used)</para>
       <para><computeroutput>00100 </computeroutput>after every client
         memory range permission setting of 256 bytes or greater</para>
       <para><computeroutput>00010 </computeroutput>after every client
         lock or unlock event</para>
       <para><computeroutput>00001 </computeroutput>after every client
         thread creation or joinage event</para>
       <para>Note that these will make Helgrind run very slowly, often to
         the point of being completely unusable.</para>
     </listitem>
   </varlistentry>

 </variablelist>
 <!-- end of xi:include in the manpage -->


 </sect1>

 <sect1 id="hg-manual.todolist" xreflabel="To Do List">
 <title>A To-Do List for Helgrind</title>

 <para>The following is a list of loose ends which should be tidied up
 some time.</para>

 <itemizedlist>
   <listitem><para>Track which mutexes are associated with which
     condition variables, and emit a warning if this becomes
     inconsistent.</para>
   </listitem>
   <listitem><para>For lock order errors, print the complete lock
     cycle, rather than doing so only for size-2 cycles as at
     present.</para>
   </listitem>
   <listitem><para>Document the VALGRIND_HG_CLEAN_MEMORY client
     request.</para>
   </listitem>
   <listitem><para>Possibly a client request to forcibly transfer
     ownership of memory from one thread to another.  Requires further
     consideration.</para>
   </listitem>
   <listitem><para>Add a new client request that marks an address range
     as being "shared-modified with empty lockset" (the error state),
     and describe how to use it.</para>
   </listitem>
   <listitem><para>Document races caused by gcc's thread-unsafe code
     generation for speculative stores.  In the interim see
     <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
     </computeroutput>
     and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
     </para>
   </listitem>
   <listitem><para>Don't update the lock-order graph, and don't check
     for errors, when a "try"-style lock operation happens (eg
     pthread_mutex_trylock).  Such calls do not add any real
     restrictions to the locking order, since they can always fail to
     acquire the lock, resulting in the caller going off and doing Plan
     B (presumably it will have a Plan B).  Doing such checks could
     generate false lock-order errors and confuse users.</para>
   </listitem>
   <listitem><para> Performance can be very poor.  Slowdowns on the
     order of 100:1 are not unusual.  There is quite some scope for
     performance improvements, though.
     </para>
   </listitem>

 </itemizedlist>
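
 <para>This hypothetical sketch illustrates the "Plan B" style described in the trylock item above. <computeroutput>try_both</computeroutput> takes <computeroutput>b_lock</computeroutput> and then only tries for <computeroutput>a_lock</computeroutput>. If that fails, it backs off instead of blocking. This acquisition order therefore cannot by itself cause a deadlock, even if other code takes <computeroutput>a_lock</computeroutput> first.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>

static pthread_mutex_t a_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t b_lock = PTHREAD_MUTEX_INITIALIZER;

/* Holds b_lock, then only *tries* for a_lock.  Returns 1 if both locks
   were obtained (and released again), 0 if it had to fall back. */
int try_both(void)
{
    pthread_mutex_lock(&b_lock);
    if (pthread_mutex_trylock(&a_lock) == 0) {
        /* ... work that needs both locks ... */
        pthread_mutex_unlock(&a_lock);
        pthread_mutex_unlock(&b_lock);
        return 1;
    }
    /* Plan B: a_lock is busy, so release b_lock rather than wait. */
    pthread_mutex_unlock(&b_lock);
    return 0;
}
```
]]></programlisting>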

 </sect1>

 </chapter>