| <?xml version="1.0"?> <!-- -*- sgml -*- --> |
| <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" |
| "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" |
| [ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]> |
| |
| |
| <chapter id="hg-manual" xreflabel="Helgrind: thread error detector"> |
| <title>Helgrind: a thread error detector</title> |
| |
| <para>To use this tool, you must specify |
| <computeroutput>--tool=helgrind</computeroutput> on the Valgrind |
| command line.</para> |
| |
| |
| |
| |
| <sect1 id="hg-manual.overview" xreflabel="Overview"> |
| <title>Overview</title> |
| |
| <para>Helgrind is a Valgrind tool for detecting synchronisation errors |
| in C, C++ and Fortran programs that use the POSIX pthreads |
| threading primitives.</para> |
| |
| <para>The main abstractions in POSIX pthreads are: a set of threads |
| sharing a common address space, thread creation, thread joining, |
| thread exit, mutexes (locks), condition variables (inter-thread event |
| notifications), reader-writer locks, and semaphores.</para> |
| |
| <para>Helgrind is aware of all these abstractions and tracks their |
| effects as accurately as it can. Currently it does not correctly |
| handle pthread barriers and pthread spinlocks, although it will not |
| object if you use them. On x86 and amd64 platforms, it understands |
| and partially handles implicit locking arising from the use of the |
| LOCK instruction prefix. |
| </para> |
| |
| <para>Helgrind can detect three classes of errors, which are discussed |
| in detail in the next three sections:</para> |
| |
| <orderedlist> |
| <listitem> |
| <para><link linkend="hg-manual.api-checks"> |
| Misuses of the POSIX pthreads API.</link></para> |
| </listitem> |
| <listitem> |
| <para><link linkend="hg-manual.lock-orders"> |
| Potential deadlocks arising from lock |
| ordering problems.</link></para> |
| </listitem> |
| <listitem> |
| <para><link linkend="hg-manual.data-races"> |
| Data races -- accessing memory without adequate locking. |
| </link></para> |
| </listitem> |
| </orderedlist> |
| |
| <para>Following those is a section containing |
| <link linkend="hg-manual.effective-use"> |
| hints and tips on how to get the best out of Helgrind.</link> |
| </para> |
| |
| <para>Then there is a |
| <link linkend="hg-manual.options">summary of command-line |
| options.</link> |
| </para> |
| |
| <para>Finally, there is |
| <link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind |
| could be improved.</link> |
| </para> |
| |
| </sect1> |
| |
| |
| |
| |
| <sect1 id="hg-manual.api-checks" xreflabel="API Checks"> |
| <title>Detected errors: Misuses of the POSIX pthreads API</title> |
| |
| <para>Helgrind intercepts calls to many POSIX pthreads functions, and |
| is therefore able to report on various common problems. Although |
| these are unglamourous errors, their presence can lead to undefined |
| program behaviour and hard-to-find bugs later in execution. The |
| detected errors are:</para> |
| |
| <itemizedlist> |
| <listitem><para>unlocking an invalid mutex</para></listitem> |
| <listitem><para>unlocking a not-locked mutex</para></listitem> |
| <listitem><para>unlocking a mutex held by a different |
| thread</para></listitem> |
| <listitem><para>destroying an invalid or a locked mutex</para></listitem> |
| <listitem><para>recursively locking a non-recursive mutex</para></listitem> |
| <listitem><para>deallocation of memory that contains a |
| locked mutex</para></listitem> |
| <listitem><para>passing mutex arguments to functions expecting |
| reader-writer lock arguments, and vice |
| versa</para></listitem> |
| <listitem><para>when a POSIX pthread function fails with an |
| error code that must be handled</para></listitem> |
| <listitem><para>when a thread exits whilst still holding one or |
| more locks</para></listitem> |
| <listitem><para>calling <computeroutput>pthread_cond_wait</computeroutput> |
| with a not-locked mutex, or one locked by a different |
| thread</para></listitem> |
| </itemizedlist> |
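<para>Two of these misuses can also be made visible without Helgrind, by asking the threading library itself to check. The sketch below assumes a POSIX system and uses an error-checking mutex (<computeroutput>PTHREAD_MUTEX_ERRORCHECK</computeroutput>). For that mutex type, POSIX specifies that unlocking a mutex the caller does not hold fails with <computeroutput>EPERM</computeroutput>, and relocking a mutex the caller already holds fails with <computeroutput>EDEADLK</computeroutput>. The helper names are hypothetical, and the check applies only to mutexes deliberately created with that type.</para>

<programlisting><![CDATA[
```c
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_ERRORCHECK on strict compilers */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialise mx as an error-checking (and therefore non-recursive) mutex. */
static void init_errorcheck(pthread_mutex_t* mx) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(mx, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
    (void)rc;
}

/* "unlocking a not-locked mutex": returns the error code reported. */
int unlock_not_locked(void) {
    pthread_mutex_t mx;
    init_errorcheck(&mx);
    int rc = pthread_mutex_unlock(&mx);   /* nobody holds mx: expect EPERM */
    pthread_mutex_destroy(&mx);
    return rc;
}

/* "recursively locking a non-recursive mutex": returns the error code. */
int relock_non_recursive(void) {
    pthread_mutex_t mx;
    init_errorcheck(&mx);
    pthread_mutex_lock(&mx);
    int rc = pthread_mutex_lock(&mx);     /* caller already holds mx: expect EDEADLK */
    pthread_mutex_unlock(&mx);
    pthread_mutex_destroy(&mx);
    return rc;
}
```
]]></programlisting>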
| |
| <para>Checks pertaining to the validity of mutexes are generally also |
| performed for reader-writer locks.</para> |
| |
| <para>Various kinds of this-can't-possibly-happen events are also |
| reported. These usually indicate bugs in the system threading |
| library.</para> |
| |
| <para>Reported errors always contain a primary stack trace indicating |
| where the error was detected. They may also contain auxiliary stack |
| traces giving additional information. In particular, most errors |
| relating to mutexes will also tell you where that mutex first came to |
| Helgrind's attention (the "<computeroutput>was first observed |
| at</computeroutput>" part), so you have a chance of figuring out which |
| mutex it is referring to. For example:</para> |
| |
| <programlisting><![CDATA[ |
| Thread #1 unlocked a not-locked lock at 0x7FEFFFA90 |
|    at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492) |
|    by 0x40073A: nearly_main (tc09_bad_unlock.c:27) |
|    by 0x40079B: main (tc09_bad_unlock.c:50) |
|   Lock at 0x7FEFFFA90 was first observed |
|    at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326) |
|    by 0x40071F: nearly_main (tc09_bad_unlock.c:23) |
|    by 0x40079B: main (tc09_bad_unlock.c:50) |
| ]]></programlisting> |
| |
| <para>Helgrind has a way of summarising thread identities, as |
| evidenced here by the text "<computeroutput>Thread |
| #1</computeroutput>". This is so that it can speak about threads and |
| sets of threads without overwhelming you with details. See |
| <link linkend="hg-manual.data-races.errmsgs">below</link> |
| for more information on interpreting error messages.</para> |
| |
| </sect1> |
| |
| |
| |
| |
| <sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders"> |
| <title>Detected errors: Inconsistent Lock Orderings</title> |
| |
| <para>In this section, and in general, to "acquire" a lock simply |
| means to lock that lock, and to "release" a lock means to unlock |
| it.</para> |
| |
| <para>Helgrind monitors the order in which threads acquire locks. |
| This allows it to detect potential deadlocks which could arise from |
| the formation of cycles of locks. Detecting such inconsistencies is |
| useful because, whilst actual deadlocks are fairly obvious, potential |
| deadlocks may never be discovered during testing and could later lead |
| to hard-to-diagnose in-service failures.</para> |
| |
| <para>The simplest example of such a problem is as |
| follows.</para> |
| |
| <itemizedlist> |
| <listitem><para>Imagine some shared resource R, which, for whatever |
| reason, is guarded by two locks, L1 and L2, which must both be held |
| when R is accessed.</para> |
| </listitem> |
| <listitem><para>Suppose a thread acquires L1, then L2, and proceeds |
| to access R. The implication of this is that all threads in the |
| program must acquire the two locks in the order first L1 then L2. |
| Not doing so risks deadlock.</para> |
| </listitem> |
| <listitem><para>The deadlock could happen if two threads -- call them |
| T1 and T2 -- both want to access R. Suppose T1 acquires L1 first, |
| and T2 acquires L2 first. Then T1 tries to acquire L2, and T2 tries |
| to acquire L1, but those locks are both already held. So T1 and T2 |
| become deadlocked.</para> |
| </listitem> |
| </itemizedlist> |
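<para>The usual cure is to make every thread acquire the locks in one fixed order. The following is a minimal sketch, assuming POSIX threads. The hypothetical helper <computeroutput>lock_pair</computeroutput> always takes the lock with the lower address first, so T1 and T2 can name L1 and L2 in opposite orders and still cannot deadlock. Ordering by address is merely one common convention. Any global order that all threads respect works equally well.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t L2 = PTHREAD_MUTEX_INITIALIZER;
static long R = 0;            /* shared resource: both L1 and L2 must be held */

/* Acquire two distinct locks in one global order (ascending address),
   whatever order the caller names them in. */
static void lock_pair(pthread_mutex_t* a, pthread_mutex_t* b) {
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t* tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t* a, pthread_mutex_t* b) {
    pthread_mutex_unlock(a);  /* release order does not matter for deadlock */
    pthread_mutex_unlock(b);
}

static void* t1_fn(void* arg) {           /* T1 names the locks as L1, L2 */
    long n = *(const long*)arg;
    for (long i = 0; i < n; i++) {
        lock_pair(&L1, &L2);
        R++;
        unlock_pair(&L1, &L2);
    }
    return NULL;
}

static void* t2_fn(void* arg) {           /* T2 names them as L2, L1 */
    long n = *(const long*)arg;
    for (long i = 0; i < n; i++) {
        lock_pair(&L2, &L1);
        R++;
        unlock_pair(&L2, &L1);
    }
    return NULL;
}

/* Runs T1 and T2 to completion and returns how often R was updated. */
long run_t1_t2(long iters) {
    pthread_t t1, t2;
    R = 0;
    pthread_create(&t1, NULL, t1_fn, &iters);
    pthread_create(&t2, NULL, t2_fn, &iters);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return R;
}
```
]]></programlisting>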
| |
| <para>Helgrind builds a directed graph indicating the order in which |
| locks have been acquired in the past. When a thread acquires a new |
| lock, the graph is updated, and then checked to see if it now contains |
| a cycle. The presence of a cycle indicates a potential deadlock involving |
| the locks in the cycle.</para> |
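<para>The cycle check can be modelled in a few lines of C. This is a simplified sketch rather than Helgrind's actual data structure. Locks are hypothetical small integers and the graph is an adjacency matrix. The function <computeroutput>laog_acquire</computeroutput> (a name chosen for this sketch) receives the locks the thread already holds, plus the lock being acquired. Adding an edge from a held lock to the new lock closes a cycle exactly when the new lock can already reach the held one.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 16   /* arbitrary bound for this sketch */

/* edge[a][b]: some thread acquired lock b while already holding lock a */
static bool edge[MAX_LOCKS][MAX_LOCKS];

void laog_reset(void) {
    memset(edge, 0, sizeof edge);
}

/* Depth-first search: can `to` be reached from `from` along recorded edges? */
static bool reachable(int from, int to, bool seen[MAX_LOCKS]) {
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++)
        if (edge[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/* A thread holding held[0..nheld-1] acquires lock `lk`.  Records the new
   ordering edges and returns true if any of them closes a cycle. */
bool laog_acquire(const int* held, int nheld, int lk) {
    bool cycle = false;
    assert(lk >= 0 && lk < MAX_LOCKS);
    for (int i = 0; i < nheld; i++) {
        if (held[i] == lk)
            continue;                 /* recursive acquisition: no new order */
        bool seen[MAX_LOCKS] = { false };
        if (reachable(lk, held[i], seen))
            cycle = true;             /* "lk before held[i]" was already implied */
        edge[held[i]][lk] = true;
    }
    return cycle;
}
```
]]></programlisting>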
| |
| <para>In simple situations, where the cycle only contains two locks, |
| Helgrind will show where the required order was established:</para> |
| |
| <programlisting><![CDATA[ |
| Thread #1: lock order "0x7FEFFFAB0 before 0x7FEFFFA80" violated |
|    at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388) |
|    by 0x40081F: main (tc13_laog1.c:24) |
|   Required order was established by acquisition of lock at 0x7FEFFFAB0 |
|    at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388) |
|    by 0x400748: main (tc13_laog1.c:17) |
|   followed by a later acquisition of lock at 0x7FEFFFA80 |
|    at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388) |
|    by 0x400773: main (tc13_laog1.c:18) |
| ]]></programlisting> |
| |
| <para>When there are more than two locks in the cycle, the error is |
| equally serious. However, at present Helgrind does not show the locks |
| involved, so as to avoid flooding you with information. That could be |
| fixed in future. Here is an example involving a cycle |
| of five locks from a naive implementation of the famous Dining |
| Philosophers problem |
| (see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>). |
| In this case Helgrind has detected that all five philosophers could |
| simultaneously pick up their left fork and then deadlock whilst |
| waiting to pick up their right forks.</para> |
| |
| <programlisting><![CDATA[ |
| Thread #6: lock order "0x6010C0 before 0x601160" violated |
|    at 0x4C23C91: pthread_mutex_lock (hg_intercepts.c:388) |
|    by 0x4007C0: dine (tc14_laog_dinphils.c:19) |
|    by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178) |
|    by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so) |
|    by 0x51054CC: clone (in /lib64/libc-2.5.so) |
| ]]></programlisting> |
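<para>One standard remedy for this particular cycle is to give the forks a global order. Each philosopher picks up the lower-numbered of its two forks first, so the last philosopher reaches for fork 0 before fork 4, and the five acquisitions can no longer form a cycle. The sketch below is not the code in <computeroutput>tc14_laog_dinphils.c</computeroutput>. It is a separate illustration, assuming POSIX threads and using hypothetical names, that counts meals so that a run can be checked.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N_PHIL 5

static pthread_mutex_t fork_mx[N_PHIL];
static pthread_mutex_t meals_mx = PTHREAD_MUTEX_INITIALIZER;
static long meals;           /* guarded by meals_mx */
static int rounds;           /* set before the threads are created */

static void* dine(void* arg) {
    int i = (int)(intptr_t)arg;
    int left = i, right = (i + 1) % N_PHIL;
    /* Lower-numbered fork first: philosopher 4 takes fork 0 before fork 4,
       so the lock-order graph over the forks never contains a cycle. */
    int first = left < right ? left : right;
    int second = left < right ? right : left;
    for (int r = 0; r < rounds; r++) {
        pthread_mutex_lock(&fork_mx[first]);
        pthread_mutex_lock(&fork_mx[second]);
        pthread_mutex_lock(&meals_mx);   /* always taken last: consistent order */
        meals++;
        pthread_mutex_unlock(&meals_mx);
        pthread_mutex_unlock(&fork_mx[second]);
        pthread_mutex_unlock(&fork_mx[first]);
    }
    return NULL;
}

/* Runs every philosopher for n_rounds and returns the total meals eaten. */
long run_dinner(int n_rounds) {
    pthread_t phil[N_PHIL];
    rounds = n_rounds;
    meals = 0;
    for (int i = 0; i < N_PHIL; i++) {
        int rc = pthread_mutex_init(&fork_mx[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_PHIL; i++)
        pthread_create(&phil[i], NULL, dine, (void*)(intptr_t)i);
    for (int i = 0; i < N_PHIL; i++)
        pthread_join(phil[i], NULL);
    for (int i = 0; i < N_PHIL; i++)
        pthread_mutex_destroy(&fork_mx[i]);
    return meals;
}
```
]]></programlisting>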
| |
| </sect1> |
| |
| |
| |
| |
| <sect1 id="hg-manual.data-races" xreflabel="Data Races"> |
| <title>Detected errors: Data Races</title> |
| |
| <para>A data race happens, or could happen, when two threads |
| access a shared memory location without using suitable locks to |
| ensure single-threaded access. Such missing locking can cause |
| obscure timing-dependent bugs. Ensuring programs are race-free is |
| one of the central difficulties of threaded programming.</para> |
| |
| <para>Reliably detecting races is a difficult problem, and most |
| of Helgrind's internals are devoted to dealing with it. |
| As a consequence this section is somewhat long and involved. |
| We begin with a simple example.</para> |
| |
| |
| <sect2 id="hg-manual.data-races.example" xreflabel="Simple Race"> |
| <title>A Simple Data Race</title> |
| |
| <para>About the simplest possible example of a race is as follows. In |
| this program, it is impossible to know what the value |
| of <computeroutput>var</computeroutput> is at the end of the program. |
| Is it 2? Or 1?</para> |
| |
| <programlisting><![CDATA[ |
| #include <pthread.h> |
| |
| int var = 0; |
| |
| void* child_fn ( void* arg ) { |
|    var++; /* Unprotected relative to parent */ /* this is line 6 */ |
|    return NULL; |
| } |
| |
| int main ( void ) { |
|    pthread_t child; |
|    pthread_create(&child, NULL, child_fn, NULL); |
|    var++; /* Unprotected relative to child */ /* this is line 13 */ |
|    pthread_join(child, NULL); |
|    return 0; |
| } |
| ]]></programlisting> |
| |
| <para>The problem is that nothing |
| stops <computeroutput>var</computeroutput> from being updated simultaneously |
| by both threads. A correct program would |
| protect <computeroutput>var</computeroutput> with a lock of type |
| <computeroutput>pthread_mutex_t</computeroutput>, which is acquired |
| before each access and released afterwards. Helgrind's output for |
| this program is:</para> |
| |
| <programlisting><![CDATA[ |
| Thread #1 is the program's root thread |
| |
| Thread #2 was created |
|    at 0x510548E: clone (in /lib64/libc-2.5.so) |
|    by 0x4E2F305: do_clone (in /lib64/libpthread-2.5.so) |
|    by 0x4E2F7C5: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so) |
|    by 0x4C23870: pthread_create@* (hg_intercepts.c:198) |
|    by 0x4005F1: main (simple_race.c:12) |
| |
| Possible data race during write of size 4 at 0x601034 |
|    at 0x4005F2: main (simple_race.c:13) |
|   Old state: shared-readonly by threads #1, #2 |
|   New state: shared-modified by threads #1, #2 |
|   Reason: this thread, #1, holds no consistent locks |
|   Location 0x601034 has never been protected by any lock |
| ]]></programlisting> |
| |
| <para>This is quite a lot of detail for an apparently simple error. |
| The last block of output is the main error message. It says there is a race as |
| a result of a write of size 4 (bytes), at 0x601034, which is |
| presumably the address of <computeroutput>var</computeroutput>, |
| happening in function <computeroutput>main</computeroutput> at line 13 |
| in the program.</para> |
| |
| <para>Note that it is purely by chance that the race is |
| reported for the parent thread's access. It could equally have been |
| reported instead for the child's access, at line 6. The error will |
| only be reported for one of the locations, since neither the parent |
| nor child is, by itself, incorrect. It is only when both access |
| <computeroutput>var</computeroutput> without a lock that an error |
| exists.</para> |
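<para>For comparison, here is a sketch of the correct version described above, assuming POSIX threads. Both increments of <computeroutput>var</computeroutput> are bracketed by the same <computeroutput>pthread_mutex_t</computeroutput>, so the final value is always 2, and the set of locks consistently protecting <computeroutput>var</computeroutput> never becomes empty. The body of <computeroutput>main</computeroutput> has been moved into a function with the hypothetical name <computeroutput>run_fixed_race</computeroutput> so that its result can be checked.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>

static int var = 0;
static pthread_mutex_t var_mx = PTHREAD_MUTEX_INITIALIZER;   /* guards var */

static void* child_fn(void* arg) {
    (void)arg;
    pthread_mutex_lock(&var_mx);
    var++;                            /* protected: child's access */
    pthread_mutex_unlock(&var_mx);
    return NULL;
}

/* The former body of main; returns the final value of var. */
int run_fixed_race(void) {
    pthread_t child;
    int rc = pthread_create(&child, NULL, child_fn, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_mutex_lock(&var_mx);
    var++;                            /* protected: parent's access */
    pthread_mutex_unlock(&var_mx);
    pthread_join(child, NULL);        /* after the join the child no longer runs */
    return var;                       /* 2 for a single call */
}
```
]]></programlisting>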
| |
| <para>The error message shows some other interesting details. The |
| sections below explain them. Here we merely note their presence:</para> |
| |
| <itemizedlist> |
| <listitem><para>Helgrind maintains some kind of state machine for the |
| memory location in question, hence the "<computeroutput>Old |
| state:</computeroutput>" and "<computeroutput>New |
| state:</computeroutput>" lines.</para> |
| </listitem> |
| <listitem><para>Helgrind keeps track of which threads have accessed |
| the location: "<computeroutput>threads #1, #2</computeroutput>". |
| Before printing the main error message, it prints the creation |
| points of these two threads, so you can see which threads it is |
| referring to.</para> |
| </listitem> |
| <listitem><para>Helgrind tries to provide an explanation of why the |
| race exists: "<computeroutput>Location 0x601034 has never been |
| protected by any lock</computeroutput>".</para> |
| </listitem> |
| </itemizedlist> |
| |
| <para>Understanding the memory state machine is central to |
| understanding Helgrind's race-detection algorithm. The next three |
| subsections explain this.</para> |
| |
| </sect2> |
| |
| |
| <sect2 id="hg-manual.data-races.memstates" xreflabel="Memory States"> |
| <title>Helgrind's Memory State Machine</title> |
| |
| <para>Helgrind tracks the state of every byte of memory used by your |
| program. There are a number of states, but only three are |
| interesting:</para> |
| |
| <itemizedlist> |
| <listitem><para>Exclusive: memory in this state is regarded as owned |
| exclusively by one particular thread. That thread may read and |
| write it without a lock. Even in highly threaded programs, the |
| majority of locations never leave the Exclusive state, since most |
| data is thread-private.</para> |
| </listitem> |
| <listitem><para>Shared-Readonly: memory in this state is regarded as |
| shared by multiple threads. In this state, any thread may read the |
| memory without a lock, reflecting the fact that readonly data may |
| safely be shared between threads without locking.</para> |
| </listitem> |
| <listitem><para>Shared-Modified: memory in this state is regarded as |
| shared by multiple threads, at least one of which has written to it. |
| All participating threads must hold at least one lock in common when |
| accessing the memory. If no such lock exists, Helgrind reports a |
| race error.</para> |
| </listitem> |
| </itemizedlist> |
| |
| <para>Let's review the simple example above with this in mind. When |
| the program starts, <computeroutput>var</computeroutput> is not in any |
| of these states. Either the parent or child thread gets to its |
| <computeroutput>var++</computeroutput> first, and thereby |
| gets Exclusive ownership of the location.</para> |
| |
| <para>The later-running thread now arrives at |
| its <computeroutput>var++</computeroutput> statement. It first reads |
| the existing value from memory. |
| Because <computeroutput>var</computeroutput> is currently marked as |
| owned exclusively by the other thread, its state is changed to |
| shared-readonly by both threads.</para> |
| |
| <para>This same thread adds one to the value it has and stores it back |
| in <computeroutput>var</computeroutput>. This causes another state |
| change, this time to the shared-modified state. Because Helgrind has |
| also been tracking which threads hold which locks, it can see that |
| <computeroutput>var</computeroutput> is in shared-modified state but |
| no lock has been used to consistently protect it. Hence a race is |
| reported exactly at the transition from shared-readonly to |
| shared-modified.</para> |
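<para>The walk-through above can be replayed with a deliberately simplified model of the three states. This sketch is not Helgrind's implementation. It ignores locksets and the ownership transfers described later, identifies threads by hypothetical small integers (fewer than 32), and stores each threadset as a bitmask. In the replay, thread 1 reads and writes <computeroutput>var</computeroutput> (Exclusive). Thread 2 then reads it (Shared-Readonly) and writes it (Shared-Modified).</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ST_NEW, ST_EXCLUSIVE, ST_SHARED_RO, ST_SHARED_MOD } MemState;

typedef struct {
    MemState state;
    int      owner;     /* meaningful only in ST_EXCLUSIVE */
    unsigned threads;   /* threadset: bit t set once thread t has accessed */
} Loc;

/* Applies one access by thread `tid` and returns the resulting state. */
MemState access_loc(Loc* loc, int tid, bool is_write) {
    assert(tid >= 0 && tid < 32);
    unsigned bit = 1u << tid;
    switch (loc->state) {
    case ST_NEW:                 /* first access: the accessor owns it */
        loc->state = ST_EXCLUSIVE;
        loc->owner = tid;
        loc->threads = bit;
        break;
    case ST_EXCLUSIVE:           /* the owner may read and write freely */
        if (tid != loc->owner) {
            loc->threads |= bit;
            loc->state = is_write ? ST_SHARED_MOD : ST_SHARED_RO;
        }
        break;
    case ST_SHARED_RO:           /* any write makes it shared-modified */
        loc->threads |= bit;
        if (is_write)
            loc->state = ST_SHARED_MOD;
        break;
    case ST_SHARED_MOD:          /* stays here; locksets decide about races */
        loc->threads |= bit;
        break;
    }
    return loc->state;
}
```
]]></programlisting>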
| |
| <para>The essence of the algorithm is this. Helgrind keeps track of |
| each memory location that has been accessed by more than one thread. |
| For each such location it incrementally infers the set of locks which |
| have consistently been used to protect that location. If the |
| location's lockset becomes empty, and at some point one of the threads |
| attempts to write to it, a race is then reported.</para> |
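<para>The lockset part of the algorithm is a running intersection. The following sketch illustrates it under the same simplifying assumptions: hypothetical names, and locks numbered from 0, few enough that a lockset fits in an <computeroutput>unsigned</computeroutput> bitmask. The candidate set for a shared location starts as "every lock" and is intersected with the locks held at each access. A race is flagged once the set is empty and the location has been written.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned LockSet;       /* bit k set: lock k is in the set */
#define ALL_LOCKS (~0u)

typedef struct {
    LockSet candidates;         /* locks held at every shared access so far */
    bool    written;            /* written since the location became shared? */
} SharedLoc;

void shared_loc_init(SharedLoc* v) {
    v->candidates = ALL_LOCKS;  /* nothing observed yet: any lock might be the guard */
    v->written = false;
}

/* One access to a shared location by a thread holding the locks in `held`.
   Returns true if this access would be reported as a race. */
bool on_shared_access(SharedLoc* v, LockSet held, bool is_write) {
    assert(v != NULL);
    v->candidates &= held;      /* keep only locks that were held this time too */
    if (is_write)
        v->written = true;
    return v->candidates == 0 && v->written;
}
```
]]></programlisting>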
| |
| <para>This technique is known as "lockset inference" and was |
| introduced in: "Eraser: A Dynamic Data Race Detector for Multithreaded |
| Programs" (Stefan Savage, Michael Burrows, Greg Nelson, Patrick |
| Sobalvarro and Thomas Anderson, ACM Transactions on Computer Systems, |
| 15(4):391-411, November 1997).</para> |
| |
| <para>Lockset inference has since been widely implemented, studied and |
| extended. Helgrind incorporates several refinements aimed at avoiding |
| the high false error rate that naive versions of the algorithm suffer |
| from. A |
| <link linkend="hg-manual.data-races.summary">summary of the complete |
| algorithm used by Helgrind</link> is presented below. First, however, |
| it is important to understand details of transitions pertaining to the |
| Exclusive-ownership state.</para> |
| |
| </sect2> |
| |
| |
| |
| <sect2 id="hg-manual.data-races.exclusive" xreflabel="Excl Transfers"> |
| <title>Transfers of Exclusive Ownership Between Threads</title> |
| |
| <para>As presented, the algorithm is far too strict. It reports many |
| errors in perfectly correct, widely used parallel programming |
| constructions, for example, using child worker threads and worker |
| thread pools.</para> |
| |
| <para>To avoid these false errors, we must refine the algorithm so |
| that it keeps memory in an Exclusive ownership state in cases where it |
| would otherwise decay into a shared-readonly or shared-modified state. |
| Recall that Exclusive ownership is special in that it grants the |
| owning thread the right to access memory without use of any locks. In |
| order to support worker-thread and worker-thread-pool idioms, we will |
| allow threads to steal exclusive ownership of memory from other |
| threads under certain circumstances.</para> |
| |
| <para>Here's an example. Imagine a parent thread creates child |
| threads to do units of work. For each unit of work, the parent |
| allocates a work buffer, fills it in, and creates the child thread, |
| handing it a pointer to the buffer. The child reads/writes the buffer |
| and eventually exits, and the waiting parent then extracts the results |
| from the buffer:</para> |
| |
| <programlisting><![CDATA[ |
| typedef ... Buffer; |
| |
| pthread_t child; |
| Buffer    buf; |
| |
| /* ---- Parent ---- */              /* ---- Child ---- */ |
| |
| /* parent writes workload into buf */ |
| pthread_create( &child, NULL, child_fn, &buf ); |
| |
| /* parent does not read */          void* child_fn ( void* arg ) { |
| /* or write buf */                     Buffer* buf = arg; |
|                                        /* read/write *buf */ |
|                                        return NULL; |
|                                     } |
| |
| pthread_join( child, NULL ); |
| /* parent reads results from buf */ |
| ]]></programlisting> |
| |
| <para>Although <computeroutput>buf</computeroutput> is accessed by |
| both threads, neither uses locks, yet the program is race-free. The |
| essential observation is that the child's creation and exit create |
| synchronisation events between it and the parent. These force the |
| child's accesses to <computeroutput>buf</computeroutput> to happen |
| after the parent initialises <computeroutput>buf</computeroutput>, and |
| before the parent reads the results |
| from <computeroutput>buf</computeroutput>.</para> |
| |
| <para>To model this, Helgrind allows the child to steal, from the |
| parent, exclusive ownership of any memory exclusively owned by the |
| parent before the pthread_create call. Similarly, once the parent's |
| pthread_join call returns, it can steal back ownership of memory |
| exclusively owned by the child. In this way ownership |
| of <computeroutput>buf</computeroutput> is transferred from parent to |
| child and back, so the basic algorithm does not report any races |
| despite the absence of any locking.</para> |
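<para>A runnable version of this idiom might look like the following sketch, assuming POSIX threads. The <computeroutput>Buffer</computeroutput> layout and the function names are hypothetical stand-ins chosen for this illustration. What matters is that the parent touches <computeroutput>buf</computeroutput> only before <computeroutput>pthread_create</computeroutput> and after <computeroutput>pthread_join</computeroutput>, so no lock is needed.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>

#define N_ITEMS 8

typedef struct {               /* hypothetical work buffer */
    int data[N_ITEMS];         /* workload, written by the parent */
    int sum;                   /* result, written by the child */
} Buffer;

static void* child_fn(void* arg) {
    Buffer* buf = arg;         /* the child now has sole use of *buf */
    int s = 0;
    for (int i = 0; i < N_ITEMS; i++) {
        buf->data[i] *= 2;
        s += buf->data[i];
    }
    buf->sum = s;
    return NULL;
}

int run_worker(void) {
    Buffer buf;
    pthread_t child;
    for (int i = 0; i < N_ITEMS; i++)
        buf.data[i] = i;                       /* parent writes workload into buf */
    int rc = pthread_create(&child, NULL, child_fn, &buf);
    assert(rc == 0);
    /* parent does not read or write buf here */
    rc = pthread_join(child, NULL);
    assert(rc == 0);
    (void)rc;
    return buf.sum;                            /* parent reads results from buf */
}
```
]]></programlisting>

<para>With the workload 0 to 7, the child doubles each item, so <computeroutput>run_worker</computeroutput> returns 56.</para>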
| |
| <para>Note that the child may only steal memory owned by the parent |
| prior to the pthread_create call. If the child attempts to read or |
| write memory which is also accessed by the parent in between the |
| pthread_create and pthread_join calls, an error is still |
| reported.</para> |
| |
| <para>This technique was introduced with the name "thread lifetime |
| segments" in "Runtime Checking of Multithreaded Applications with |
| Visual Threads" (Jerry J. Harrow, Jr, Proceedings of the 7th |
| International SPIN Workshop on Model Checking of Software, Stanford, |
| California, USA, August 2000, LNCS 1885, pp. 331--342). Helgrind |
| implements an extended version of it. Specifically, Helgrind allows |
| transfer of exclusive ownership in the following situations:</para> |
| |
| <itemizedlist> |
| <listitem><para>At thread creation: a child can acquire ownership of |
| memory held exclusively by the parent prior to the child's |
| creation.</para> |
| </listitem> |
| <listitem><para>At thread joining: the joiner (thread not exiting) |
| can acquire ownership of memory held exclusively by the joinee |
| (thread that is exiting) at the point it exited.</para> |
| </listitem> |
| <listitem><para>At condition variable signallings and broadcasts. A |
| thread Tw which completes a pthread_cond_wait call as a result of |
| a signal or broadcast on the same condition variable by some other |
| thread Ts, may acquire ownership of memory held exclusively by |
| Ts prior to the pthread_cond_signal/broadcast |
| call.</para> |
| </listitem> |
| <listitem><para>At semaphore posts (sem_post calls). A thread Tw |
| which completes a sem_wait call as a result of a sem_post call |
| on the same semaphore by some other thread Tp, may acquire |
| ownership of memory held exclusively by Tp prior to the sem_post |
| call.</para> |
| </listitem> |
| </itemizedlist> |
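<para>The condition-variable case covers handoffs such as the one in the following sketch (POSIX threads assumed; names hypothetical). The producer fills <computeroutput>payload</computeroutput> without holding any lock and then signals. The consumer reads <computeroutput>payload</computeroutput> without a lock once it has seen <computeroutput>ready</computeroutput> set. Only the <computeroutput>ready</computeroutput> flag is protected by the mutex.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>

#define N_PAYLOAD 4

static int payload[N_PAYLOAD];                 /* handed from producer to consumer */
static int ready = 0;                          /* guarded by mx */
static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

static void* producer(void* arg) {
    (void)arg;
    for (int i = 0; i < N_PAYLOAD; i++)
        payload[i] = 10 * (i + 1);             /* no lock: producer owns payload */
    pthread_mutex_lock(&mx);
    ready = 1;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mx);
    return NULL;                               /* payload is not touched again here */
}

/* Consumer side; returns the sum of the payload it received. */
int consume(void) {
    pthread_t p;
    ready = 0;
    int rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_mutex_lock(&mx);
    while (!ready)                             /* loop guards against spurious wakeups */
        pthread_cond_wait(&cv, &mx);
    pthread_mutex_unlock(&mx);
    int sum = 0;
    for (int i = 0; i < N_PAYLOAD; i++)
        sum += payload[i];                     /* no lock: consumer now owns payload */
    pthread_join(p, NULL);
    return sum;
}
```
]]></programlisting>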
| |
| </sect2> |
| |
| |
| |
| <sect2 id="hg-manual.data-races.re-excl" xreflabel="Re-Excl Transfers"> |
| <title>Restoration of Exclusive Ownership</title> |
| |
| <para>Another common idiom is to partition the lifetime of the program |
| as a whole into several distinct phases. In some of those phases, a |
| memory location may be accessed by multiple threads and so require |
| locking. In other phases only one thread exists and so can access the |
| memory without locking. For example:</para> |
| |
| <programlisting><![CDATA[ |
| int var = 0;                                    /* shared variable */ |
| pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER; /* guard for var */ |
| pthread_t child; |
| |
| /* ---- Parent ---- */              /* ---- Child ---- */ |
| |
| var += 1; /* no lock used */ |
| |
| pthread_create( &child, NULL, child_fn, NULL ); |
| |
|                                     void* child_fn ( void* uu ) { |
| pthread_mutex_lock(&mx);               pthread_mutex_lock(&mx); |
| var += 2;                              var += 3; |
| pthread_mutex_unlock(&mx);             pthread_mutex_unlock(&mx); |
|                                        return NULL; |
|                                     } |
| |
| pthread_join( child, NULL ); |
| |
| var += 4; /* no lock used */ |
| ]]></programlisting> |
| |
| <para>This program is correct, but using only the mechanisms described |
| so far, Helgrind would report an error at |
| <computeroutput>var += 4</computeroutput>. This is because, by that |
| point, <computeroutput>var</computeroutput> is marked as being in the |
| state "shared-modified and protected by the |
| lock <computeroutput>mx</computeroutput>", but is being accessed |
| without locking. Really, what we want is |
| for <computeroutput>var</computeroutput> to return to the parent |
| thread's exclusive ownership after the child thread has exited.</para> |
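<para>The same phased program can be written out as a runnable sketch, assuming POSIX threads. The name <computeroutput>run_phases</computeroutput> is hypothetical and stands for the parent's code. Only the middle phase, while the child is alive, needs <computeroutput>mx</computeroutput>. The final value is 1 + 2 + 3 + 4 = 10, whichever thread performs its locked update first.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>

static int var = 0;                                     /* shared variable */
static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;  /* guard for var */

static void* child_fn(void* uu) {
    (void)uu;
    pthread_mutex_lock(&mx);
    var += 3;
    pthread_mutex_unlock(&mx);
    return NULL;
}

int run_phases(void) {
    pthread_t child;
    var = 0;
    var += 1;                        /* phase 1: only the parent exists; no lock */
    int rc = pthread_create(&child, NULL, child_fn, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_mutex_lock(&mx);         /* phase 2: two threads; lock required */
    var += 2;
    pthread_mutex_unlock(&mx);
    pthread_join(child, NULL);
    var += 4;                        /* phase 3: child has exited; no lock */
    return var;
}
```
]]></programlisting>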
| |
| <para>To make this possible, for every memory location Helgrind also keeps |
| track of all the threads that have accessed that location |
| -- its threadset. When a thread Tquitter joins back to Tstayer, |
| Helgrind examines the threadsets of all memory in shared-modified or |
| shared-readonly state. In each such threadset, if Tquitter is |
| mentioned, it is removed and replaced by Tstayer. If, as a result, a |
| threadset becomes a singleton set containing Tstayer, then the |
| location's state is changed to belongs-exclusively-to-Tstayer.</para> |
| |
| <para>In our example, the result is exactly as we desire: |
| <computeroutput>var</computeroutput> is reacquired exclusively by the |
| parent after the child exits.</para> |
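<para>The join-time rewrite can be sketched as follows. Again this is a simplified model with hypothetical names. Threads are small integers, threadsets are bitmasks, and only the fields needed here are tracked. Memory owned exclusively by the quitting thread passes to the staying thread. In every shared threadset that mentions the quitter, the quitter is replaced by the stayer, and a threadset left containing only the stayer reverts to Exclusive.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

typedef enum { EXCLUSIVE, SHARED_RO, SHARED_MOD } State;

typedef struct {
    State    state;
    int      owner;     /* meaningful only when state == EXCLUSIVE */
    unsigned threads;   /* threadset: bit t set if thread t has accessed it */
} Loc;

/* Thread `quitter` has just been joined by thread `stayer`. */
void on_join(Loc* locs, size_t n, int quitter, int stayer) {
    assert(quitter >= 0 && quitter < 32 && stayer >= 0 && stayer < 32);
    assert(quitter != stayer);
    unsigned q = 1u << quitter, s = 1u << stayer;
    for (size_t i = 0; i < n; i++) {
        Loc* l = &locs[i];
        if (l->state == EXCLUSIVE) {
            if (l->owner == quitter) {     /* joinee's private memory passes to joiner */
                l->owner = stayer;
                l->threads = s;
            }
            continue;
        }
        if (l->threads & q)                /* replace Tquitter by Tstayer */
            l->threads = (l->threads & ~q) | s;
        if (l->threads == s) {             /* only Tstayer remains: exclusive again */
            l->state = EXCLUSIVE;
            l->owner = stayer;
        }
    }
}
```
]]></programlisting>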
| |
| <para>More generally, when a group of threads merges back to a single |
| thread via a cascade of pthread_join calls, any memory shared by the |
| group (or a subset of it) ends up being owned exclusively by the sole |
| surviving thread. This significantly enhances Helgrind's flexibility, |
| since it means that each memory location may make arbitrarily many |
| transitions between exclusive and shared ownership. Furthermore, a |
| different lock may protect the location during each period of shared |
| ownership.</para> |
| |
| </sect2> |
| |
| |
| |
| <sect2 id="hg-manual.data-races.summary" xreflabel="Race Det Summary"> |
| <title>A Summary of the Race Detection Algorithm</title> |
| |
| <para>Helgrind looks for memory locations which are accessed by more |
| than one thread. For each such location, Helgrind records which of |
| the program's locks were held by the accessing thread at the time of |
| each access. The hope is to discover that there is indeed at least |
| one lock which is consistently used by all threads to protect that |
| location. If no such lock can be found, then there is apparently no |
| consistent locking strategy being applied for that location, and so a |
| possible data race might result. Helgrind accordingly reports an |
| error.</para> |
| |
| <para>In practice this basic scheme is far too simplistic: it is |
| unusable because it reports many races in some widely used and |
| known-correct programming idioms. Helgrind's checking therefore |
| incorporates many refinements to this basic idea, and can be |
| summarised as follows:</para> |
| |
| <para>The following thread events are intercepted and monitored:</para> |
| |
| <itemizedlist> |
| <listitem><para>thread creation and exiting (pthread_create, |
| pthread_join, pthread_exit)</para> |
| </listitem> |
| <listitem> |
| <para>lock acquisition and release (pthread_mutex_lock, |
| pthread_mutex_unlock, pthread_rwlock_rdlock, |
| pthread_rwlock_wrlock, |
| pthread_rwlock_unlock)</para> |
| </listitem> |
| <listitem> |
| <para>inter-thread event notifications (pthread_cond_wait, |
| pthread_cond_signal, pthread_cond_broadcast, |
| sem_wait, sem_post)</para> |
| </listitem> |
| </itemizedlist> |
| |
| <para>Memory allocation and deallocation events are intercepted and |
| monitored:</para> |
| |
| <itemizedlist> |
| <listitem> |
| <para>malloc/new/free/delete and variants</para> |
| </listitem> |
| <listitem> |
| <para>stack allocation and deallocation</para> |
| </listitem> |
| </itemizedlist> |
| |
| <para>All memory accesses are intercepted and monitored.</para> |
| |
| <para>By observing the above events, Helgrind can infer certain |
| aspects of the program's locking discipline. Programs which adhere to |
| the following rules are considered to be acceptable: |
| </para> |
| |
| <itemizedlist> |
| <listitem> |
| <para>A thread may allocate memory, and write initial values into |
| it, without locking. That thread is regarded as owning the memory |
| exclusively.</para> |
| </listitem> |
| <listitem> |
| <para>A thread may read and write memory which it owns exclusively, |
| without locking.</para> |
| </listitem> |
| <listitem> |
| <para>Memory which is owned exclusively by one thread may be read by |
| that thread and others without locking. However, in this situation |
| no thread may do unlocked writes to the memory (except for the owner |
| thread's initializing write).</para> |
| </listitem> |
| <listitem> |
| <para>Memory which is shared between multiple threads, one or more |
| of which writes to it, must be protected by a lock which is |
| correctly acquired and released by all threads accessing the |
| memory.</para> |
| </listitem> |
| </itemizedlist> |
| |
| <para>Any violation of this discipline will cause an error to be reported. |
| However, two exemptions apply:</para> |
| |
| <itemizedlist> |
| <listitem> |
| <para>A thread Y can acquire exclusive ownership of memory |
| previously owned exclusively by a different thread X, provided |
| X's last access and Y's first access are separated by one of the |
| following synchronisation events:</para> |
| <itemizedlist> |
| <listitem><para>X creates thread Y</para></listitem> |
| <listitem><para>X joins back to Y</para></listitem> |
| <listitem><para>X uses a condition-variable to signal at Y, and Y is |
| waiting for that event</para></listitem> |
| <listitem><para>Y completes a semaphore wait as a result of X signalling |
| on that same semaphore</para></listitem> |
| </itemizedlist> |
| <para> |
| This refinement allows Helgrind to correctly track the ownership |
| state of inter-thread buffers used in the worker-thread and |
| worker-thread-pool concurrent programming idioms (styles).</para> |
| </listitem> |
| <listitem> |
| <para>Similarly, if thread Y joins back to thread X, memory |
| exclusively owned by Y becomes exclusively owned by X instead. |
| Also, memory that has been shared only by X and Y becomes |
| exclusively owned by X. More generally, memory that has been shared |
| by X, Y and some arbitrary other set S of threads is re-marked as |
| shared by X and S. Hence, under the right circumstances, memory |
| shared amongst multiple threads, all of which join into just one, |
| can revert to the exclusive ownership state.</para> |
| <para> |
| In effect, each memory location may make arbitrarily many |
| transitions between exclusive and shared ownership. Furthermore, a |
| different lock may protect the location during each period of shared |
| ownership. This significantly enhances the flexibility of the |
| algorithm.</para> |
| </listitem> |
| </itemizedlist> |
| |
| <para>The ownership state, accessing thread-set and related lock-set |
| for each memory location are tracked at 8-bit granularity. This means |
| the algorithm is precise even for 16- and 8-bit memory |
| accesses.</para> |
| |
| <para>Helgrind correctly handles reader-writer locks in this |
| framework. Locations shared between multiple threads can be protected |
| during reads by locks held in either read-mode or write-mode, but can |
| only be protected during writes by locks held in write-mode. Normal |
| POSIX mutexes are treated as if they are reader-writer locks which are |
| only ever held in write-mode.</para> |
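<para>The reader-writer rule amounts to choosing which held locks count for a given access. The following is a minimal sketch with hypothetical names and bitmask locksets. A read may be protected by locks held in either mode, but a write only by locks held in write mode. A plain mutex always appears in the write-mode set. The result would then be intersected into the location's lockset, as in the basic algorithm.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned LockSet;   /* bit k set: lock k is in the set */

/* held_rd: locks the thread holds in read mode; held_wr: locks it holds in
   write mode (plain mutexes always go here).  Returns the locks that may
   count as protecting this access. */
LockSet protecting_locks(LockSet held_rd, LockSet held_wr, bool is_write) {
    /* Assumption of this sketch: a lock is held in only one mode at a time. */
    assert((held_rd & held_wr) == 0);
    return is_write ? held_wr : (held_rd | held_wr);
}
```
]]></programlisting>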
| |
| <para>Helgrind correctly handles POSIX mutexes for which recursive |
| locking is allowed.</para> |
| |
| <para>Helgrind only partially handles x86 and amd64 memory access |
| instructions preceded by a LOCK prefix. Writes are correctly handled, |
| by pretending that the LOCK prefix implies acquisition and release of |
| a magic "bus hardware lock" mutex before and after the instruction. |
| This unfortunately requires subsequent reads from such locations to |
| also use a LOCK prefix, which is not required by the real hardware. |
| Helgrind does not offer any equivalent handling for atomic sequences |
| on PowerPC/POWER platforms created by the use of lwarx/stwcx |
| instructions.</para> |
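<para>For illustration, here is the kind of code this concerns, written with C11 atomics. It is a sketch that assumes POSIX threads and a C11 compiler providing <computeroutput>stdatomic.h</computeroutput>. On x86 and amd64, compilers typically implement <computeroutput>atomic_fetch_add</computeroutput> with a LOCK-prefixed instruction, but <computeroutput>atomic_load</computeroutput> with an ordinary load. By the description above, the load inside <computeroutput>bump</computeroutput>, which can run while other threads are still adding, is therefore the kind of read Helgrind may treat as unprotected, even though the program is correct. The exact instructions emitted depend on the compiler and target.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define N_THREADS 4
#define N_BUMPS   1000

static atomic_int counter;

static void* bump(void* arg) {
    (void)arg;
    for (int i = 0; i < N_BUMPS; i++)
        atomic_fetch_add(&counter, 1);    /* typically LOCK-prefixed on x86/amd64 */
    int seen = atomic_load(&counter);     /* typically a plain load; others may still be adding */
    assert(seen >= N_BUMPS);              /* at least this thread's own increments */
    (void)seen;
    return NULL;
}

/* Runs N_THREADS incrementers and returns the final count. */
int run_counter(void) {
    pthread_t t[N_THREADS];
    atomic_store(&counter, 0);
    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, bump, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&counter);
}
```
]]></programlisting>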
| |
| </sect2> |
| |
| |
| |
| <sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages"> |
| <title>Interpreting Race Error Messages</title> |
| |
| <para>Helgrind's race detection algorithm collects a lot of |
| information, and tries to present it in a helpful way when a race is |
| detected. Here's an example:</para> |
| |
| <programlisting><![CDATA[ |
| Thread #2 was created |
| at 0x510548E: clone (in /lib64/libc-2.5.so) |
| by 0x4E2F305: do_clone (in /lib64/libpthread-2.5.so) |
| by 0x4E2F7C5: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.5.so) |
| by 0x4C23870: pthread_create@* (hg_intercepts.c:198) |
| by 0x400CEF: main (tc17_sembar.c:195) |
| |
| // And the same for threads #3, #4 and #5 -- omitted for conciseness |
| |
| Possible data race during read of size 4 at 0x602174 |
| at 0x400BE5: gomp_barrier_wait (tc17_sembar.c:122) |
| by 0x400C44: child (tc17_sembar.c:161) |
| by 0x4C25DF7: mythread_wrapper (hg_intercepts.c:178) |
| by 0x4E2F09D: start_thread (in /lib64/libpthread-2.5.so) |
| by 0x51054CC: clone (in /lib64/libc-2.5.so) |
| Old state: shared-modified by threads #2, #3, #4, #5 |
| New state: shared-modified by threads #2, #3, #4, #5 |
| Reason: this thread, #2, holds no consistent locks |
| Last consistently used lock for 0x602174 was first observed |
| at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326) |
| by 0x4009E4: gomp_barrier_init (tc17_sembar.c:46) |
| by 0x400CBC: main (tc17_sembar.c:192) |
| ]]></programlisting> |
| |
| <para>Helgrind first announces the creation points of any threads |
| referenced in the error message. This is so it can speak concisely |
| about threads and sets of threads without repeatedly printing their |
| creation point call stacks. Each thread is only ever announced once, |
| the first time it appears in any Helgrind error message.</para> |
| |
| <para>The main error message begins at the text |
| "<computeroutput>Possible data race during read</computeroutput>". |
| At the start is information you would expect to see -- address and |
| size of the racing access, whether a read or a write, and the call |
| stack at the point it was detected.</para> |
| |
| <para>More interesting is the state transition caused by this access. |
| This memory is already in the shared-modified state, and up to now has |
| been consistently protected by at least one lock. However, the thread |
| making the access in question (thread #2, here) does not hold any |
| locks in common with those held during all previous accesses to the |
| location -- "no consistent locks", in other words.</para> |
| |
| <para>Finally, Helgrind shows the lock which has protected this |
| location in all previous accesses. (If there is more than one, only |
| one is shown.) This can be a useful hint, because it typically shows |
| the lock that the programmers intended to use to protect the location, |
| but in this case forgot to take.</para> |
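<para>Code along the following lines could produce a report of this kind. It is a hypothetical sketch, not the <computeroutput>tc17_sembar.c</computeroutput> test shown above, and all names are made up. Every access to <computeroutput>counter</computeroutput> holds <computeroutput>counter_mx</computeroutput> except the one in <computeroutput>peek_counter_buggy</computeroutput>. If that function were called while the incrementing threads are running, one would expect Helgrind to report that read as holding "no consistent locks", with <computeroutput>counter_mx</computeroutput> as the last consistently used lock. The fix is simply to take the same lock, as <computeroutput>peek_counter</computeroutput> does. The driver uses only the fixed accessor while other threads are active.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared counter and the lock intended to protect it. */
static int counter = 0;
static pthread_mutex_t counter_mx = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&counter_mx);
        counter++;                       /* protected by counter_mx */
        pthread_mutex_unlock(&counter_mx);
    }
    return NULL;
}

/* BUG: reads counter without counter_mx.  Called while increment()
   threads are running, this is the kind of access reported above. */
int peek_counter_buggy(void)
{
    return counter;
}

/* Fixed version: takes the same lock as every other access. */
int peek_counter(void)
{
    pthread_mutex_lock(&counter_mx);
    int v = counter;
    pthread_mutex_unlock(&counter_mx);
    return v;
}

/* Runs four incrementing threads, samples the counter safely while
   they run, and returns the final value after joining them. */
int counter_demo(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, increment, NULL);
    int sample = peek_counter();         /* anywhere from 0 to 4000 */
    (void)sample;
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return peek_counter();
}
]]></programlisting>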
| |
| <para>Here are some more examples of race reports. This is not an |
| exhaustive list of combinations, but should give you some insight into |
| how to interpret the output.</para> |
| |
| <programlisting><![CDATA[ |
| Possible data race during write ... |
| Old state: shared-readonly by threads #1, #2, #3 |
| New state: shared-modified by threads #1, #2, #3 |
| Reason: this thread, #3, holds no consistent locks |
| Location ... has never been protected by any lock |
| ]]></programlisting> |
| |
| <para>The location is shared by 3 threads, all of which have been |
| reading it without locking ("has never been protected by any lock"). |
| Now one of them is writing it. Regardless of whether the writer has a |
| lock or not, this is still an error, because the write races against |
| the previously observed reads.</para> |
| |
| <programlisting><![CDATA[ |
| Possible data race during read ... |
| Old state: shared-modified by threads #1, #2, #3 |
| New state: shared-modified by threads #1, #2, #3 |
| Reason: this thread, #3, holds no consistent locks |
| Last consistently used lock for ... was first observed ... |
| ]]></programlisting> |
| |
| <para>The location is shared by 3 threads, all of which have been |
| reading and writing it while (as required) holding at least one lock |
| in common. Now it is being read without that lock being held. In the |
| "Last consistently used lock" part, Helgrind offers its best guess as |
| to the identity of the lock that should have been used.</para> |
| |
| <programlisting><![CDATA[ |
| Possible data race during write ... |
| Old state: owned exclusively by thread #4 |
| New state: shared-modified by threads #4, #5 |
| Reason: this thread, #5, holds no locks at all |
| ]]></programlisting> |
| |
| <para>A location that has so far been accessed exclusively by thread |
| #4 has now been written by thread #5, without use of any lock. This |
| can be a sign that the programmer did not consider the possibility of |
| the location being shared between threads, or, alternatively, forgot |
| to use the appropriate lock.</para> |
| |
| <para>Note that thread #4 exclusively owns the location, and so has |
| the right to access it without holding a lock. However, this message |
| does not say that thread #4 is not using a lock for this location. |
| Indeed, it could be using a lock for the location because it intends |
| to make it available to other threads, one of which is thread #5 -- |
| and thread #5 has forgotten to use the lock.</para> |
| |
| <para>Also, this message implies that Helgrind did not see any |
| synchronisation event between threads #4 and #5 that would have |
| allowed #5 to acquire exclusive ownership from #4. See |
| <link linkend="hg-manual.data-races.exclusive">above</link> |
| for a discussion of transfers of exclusive ownership states between |
| threads.</para> |
| |
| </sect2> |
| |
| |
| </sect1> |
| |
| <sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use"> |
| <title>Hints and Tips for Effective Use of Helgrind</title> |
| |
| <para>Helgrind can be very helpful in finding and resolving |
| threading-related problems. Like all sophisticated tools, it is most |
| effective when you understand how to play to its strengths.</para> |
| |
| <para>Helgrind will be less effective when you merely throw an |
| existing threaded program at it and try to make sense of any reported |
| errors. It will be more effective if you design threaded programs |
| from the start in a way that helps Helgrind verify correctness. The |
| same is true for finding memory errors with Memcheck, but applies more |
| here, because thread checking is a harder problem. Consequently it is |
| much easier to write a correct program for which Helgrind falsely |
| reports (threading) errors than it is to write a correct program for |
| which Memcheck falsely reports (memory) errors.</para> |
| |
| <para>With that in mind, here are some tips, listed most important first, |
| for getting reliable results and avoiding false errors. The first two |
| are critical. Any violations of them will swamp you with huge numbers |
| of false data-race errors.</para> |
| |
| |
| <orderedlist> |
| |
| <listitem> |
| <para>Make sure your application, and all the libraries it uses, |
| use the POSIX threading primitives. Helgrind needs to be able to |
| see all events pertaining to thread creation, exit, locking and |
| other synchronisation events. To do so it intercepts many POSIX |
| pthread_ functions.</para> |
| |
| <para>Do not roll your own threading primitives (mutexes, etc) |
| from combinations of the Linux futex syscall, counters and wotnot. |
| These throw Helgrind's internal what's-going-on models way off |
| course and will give bogus results.</para> |
| |
| <para>Also, do not reimplement existing POSIX abstractions using |
| other POSIX abstractions. For example, don't build your own |
| semaphore routines or reader-writer locks from POSIX mutexes and |
| condition variables. Instead, use POSIX reader-writer locks and |
| semaphores, which Helgrind supports directly.</para> |
| |
| <para>Helgrind directly supports the following POSIX threading |
| abstractions: mutexes, reader-writer locks, condition variables |
| (but see below), and semaphores. Currently spinlocks and barriers |
| are not supported, although they could be in future. A prototype |
| "safe" implementation of barriers, based on semaphores, is |
| available: please contact the Valgrind authors for details.</para> |
| |
| <para>At the time of writing, the following popular Linux packages |
| are known to implement their own threading primitives:</para> |
| |
| <itemizedlist> |
| <listitem><para>Qt version 4.X. Qt 3.X is fine, but not 4.X. |
| Helgrind contains partial direct support for Qt 4.X threading, |
| but this is not yet in a usable state. Assistance from folks |
| knowledgeable in Qt 4 threading internals would be |
| appreciated.</para></listitem> |
| |
| <listitem><para>Runtime support library for GNU OpenMP (part of |
| GCC), at least GCC versions 4.2 and 4.3. With some minor |
| modifications to the GNU OpenMP runtime support sources, it is |
| possible to use Helgrind on code compiled with GNU OpenMP. Please |
| contact the Valgrind authors for details.</para></listitem> |
| </itemizedlist> |
| </listitem> |
| |
| <listitem> |
| <para>Avoid memory recycling. If you can't avoid it, you must |
| tell Helgrind what is going on via the VALGRIND_HG_CLEAN_MEMORY |
| client request |
| (in <computeroutput>helgrind.h</computeroutput>).</para> |
| |
| <para>Helgrind is aware of standard memory allocation and |
| deallocation that occurs via malloc/free/new/delete and from entry |
| and exit of stack frames. In particular, when memory is |
| deallocated via free, delete, or function exit, Helgrind considers |
| that memory clean, so when it is eventually reallocated, its |
| history is irrelevant.</para> |
| |
| <para>However, it is common practice to implement memory recycling |
| schemes. In these, memory to be freed is not handed to |
| free/delete, but is instead put into a pool of free buffers to be |
| handed out again as required. The problem is that Helgrind has no |
| way to know that such memory is logically no longer in use, and that |
| its history is irrelevant. Hence you must make that explicit, |
| using the VALGRIND_HG_CLEAN_MEMORY client request to specify the |
| relevant address ranges. It's easiest to put these requests into |
| the pool manager code, and to issue them either when memory is |
| returned to the pool or when it is allocated from it.</para> |
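<para>Here is a minimal sketch of a pool manager that follows this advice. The pool itself (<computeroutput>struct buf</computeroutput>, <computeroutput>pool_get</computeroutput>, <computeroutput>pool_put</computeroutput>) is hypothetical. <computeroutput>VALGRIND_HG_CLEAN_MEMORY</computeroutput> is the client request named above. The sketch assumes the header is installed as <computeroutput>valgrind/helgrind.h</computeroutput>, and it defines a no-op fallback so that it also builds where the header is absent. In this version the request is issued when a buffer is returned to the pool.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stdlib.h>

/* Use the real client request when the Valgrind headers are installed
   (assumed path valgrind/helgrind.h); otherwise fall back to a no-op. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef VALGRIND_HG_CLEAN_MEMORY
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) ((void)(start), (void)(len))
#endif

#define BUF_SIZE 256

/* Hypothetical pool of fixed-size buffers, recycled via a free list. */
struct buf {
    struct buf *next;          /* free-list link, guarded by pool_mx */
    char data[BUF_SIZE];       /* payload handed out to users */
};

static struct buf *free_list = NULL;
static pthread_mutex_t pool_mx = PTHREAD_MUTEX_INITIALIZER;

/* Take a buffer from the pool, or malloc a new one if it is empty. */
struct buf *pool_get(void)
{
    pthread_mutex_lock(&pool_mx);
    struct buf *b = free_list;
    if (b != NULL)
        free_list = b->next;
    pthread_mutex_unlock(&pool_mx);
    if (b == NULL)
        b = malloc(sizeof *b);
    return b;
}

/* Return a buffer to the pool instead of calling free(). */
void pool_put(struct buf *b)
{
    /* The payload's history is now irrelevant: tell Helgrind before
       the buffer can be handed to a different thread. */
    VALGRIND_HG_CLEAN_MEMORY(b->data, sizeof b->data);
    pthread_mutex_lock(&pool_mx);
    b->next = free_list;
    free_list = b;
    pthread_mutex_unlock(&pool_mx);
}
]]></programlisting>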
| </listitem> |
| |
| <listitem> |
| <para>Avoid POSIX condition variables. If you can, use POSIX |
| semaphores (sem_t, sem_post, sem_wait) to do inter-thread event |
| signalling. Semaphores with an initial value of zero are |
| particularly useful for this.</para> |
| |
| <para>Helgrind only partially handles POSIX condition |
| variables. This is because Helgrind can see inter-thread |
| dependencies between a pthread_cond_wait call and a |
| pthread_cond_signal/broadcast call only if the waiting thread |
| actually gets to the rendezvous first (so that it actually calls |
| pthread_cond_wait). It can't see dependencies between the threads |
| if the signaller arrives first. In the latter case, POSIX |
| guidelines imply that the associated boolean condition still |
| provides an inter-thread synchronisation event, but one which is |
| invisible to Helgrind.</para> |
| |
| <para>The result of Helgrind missing some inter-thread |
| synchronisation events is to cause it to report false positives. |
| That's because missing such events reduces the extent to which it |
| can transfer exclusive memory ownership between threads. So |
| memory may end up in a shared-modified state when that was not |
| intended by the application programmers.</para> |
| |
| <para>The root cause of this synchronisation lossage is |
| particularly hard to understand, so an example is helpful. It was |
| discussed at length by Arndt Muehlenfeld ("Runtime Race Detection |
| in Multi-Threaded Programs", Dissertation, TU Graz, Austria). The |
| canonical POSIX-recommended usage scheme for condition variables |
| is as follows:</para> |
| |
| <programlisting><![CDATA[ |
| b is a Boolean condition, which is False most of the time |
| cv is a condition variable |
| mx is its associated mutex |
| |
| Signaller: Waiter: |
| |
| lock(mx) lock(mx) |
| b = True while (b == False) |
| signal(cv) wait(cv,mx) |
| unlock(mx) unlock(mx) |
| ]]></programlisting> |
| |
| <para>Assume <computeroutput>b</computeroutput> is False most of |
| the time. If the waiter arrives at the rendezvous first, it |
| enters its while-loop, waits for the signaller to signal, and |
| eventually proceeds. Helgrind sees the signal, notes the |
| dependency, and all is well.</para> |
| |
| <para>If the signaller arrives |
| first, <computeroutput>b</computeroutput> is set to True, and the |
| signal disappears into nowhere. When the waiter later arrives, it |
| does not enter its while-loop and simply carries on. But even in |
| this case, the waiter code following the while-loop cannot execute |
| until the signaller sets <computeroutput>b</computeroutput> to |
| True. Hence there is still the same inter-thread dependency, but |
| this time it is through an arbitrary in-memory condition, and |
| Helgrind cannot see it.</para> |
| |
| <para>By comparison, Helgrind's detection of inter-thread |
| dependencies caused by semaphore operations is believed to be |
| exactly correct.</para> |
| |
| <para>As far as I know, a solution to this problem that does not |
| require source-level annotation of condition-variable wait loops |
| is beyond the current state of the art.</para> |
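<para>For comparison, here is a sketch of the same rendezvous built on a POSIX semaphore with an initial value of zero, as recommended at the start of this item. The names are hypothetical. The sketch assumes Linux, where unnamed semaphores created with <computeroutput>sem_init</computeroutput> are supported. There is no separate Boolean: the semaphore's count records whether the event has happened. If the signaller gets there first, <computeroutput>sem_wait</computeroutput> simply returns at once. Either way, the dependency between the two threads passes through a semaphore operation rather than through an arbitrary in-memory condition.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical data handed from the signaller to the waiter. */
static int payload = 0;
/* Initial value 0 means "the event has not happened yet". */
static sem_t ready;

static void *signaller(void *arg)
{
    (void)arg;
    payload = 123;        /* produce the data ...                  */
    sem_post(&ready);     /* ... then record that the event occurred */
    return NULL;
}

static void *waiter(void *arg)
{
    int *out = arg;
    sem_wait(&ready);     /* returns at once if already posted */
    *out = payload;       /* ordered after the signaller's write */
    return NULL;
}

/* Runs one rendezvous and returns the value the waiter observed. */
int semaphore_rendezvous_demo(void)
{
    pthread_t s, w;
    int seen = 0;

    sem_init(&ready, 0, 0);
    pthread_create(&w, NULL, waiter, &seen);
    pthread_create(&s, NULL, signaller, NULL);
    pthread_join(s, NULL);
    pthread_join(w, NULL);
    sem_destroy(&ready);
    return seen;
}
]]></programlisting>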
| </listitem> |
| |
| <listitem> |
| <para>Make sure you are using a supported Linux distribution. At |
| present, Helgrind only properly supports x86-linux and amd64-linux |
| with glibc-2.3 or later. The latter restriction means we only |
| support glibc's NPTL threading implementation. The old |
| LinuxThreads implementation is not supported.</para> |
| |
| <para>Unsupported targets may work to varying degrees. In |
| particular, ppc32-linux and ppc64-linux running NPTL should work, |
| but you will get false race errors because Helgrind does not know |
| how to properly handle atomic instruction sequences created using |
| the lwarx/stwcx instructions.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Round up all finished threads using pthread_join. Avoid |
| detaching threads: don't create threads in the detached state, and |
| don't call pthread_detach on existing threads.</para> |
| |
| <para>Using pthread_join to round up finished threads provides a |
| clear synchronisation point that both Helgrind and programmers can |
| see. This synchronisation point allows Helgrind to adjust its |
| memory ownership |
| models <link linkend="hg-manual.data-races.exclusive">as described |
| extensively above</link>, which helps Helgrind produce more |
| accurate error reports.</para> |
| |
| <para>If you don't call pthread_join on a thread, Helgrind has no |
| way to know when it finishes, relative to any significant |
| synchronisation points for other threads in the program. So it |
| assumes that the thread lingers indefinitely and can potentially |
| interfere indefinitely with the memory state of the program. It |
| has every right to assume that -- after all, it might really be |
| the case that, for scheduling reasons, the exiting thread did run |
| very slowly in the last stages of its life.</para> |
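<para>The function below is a sketch of this advice. It uses hypothetical names and is not taken from the Valgrind distribution. It creates its workers with default attributes, so they are joinable. It reads each worker's result only after the corresponding <computeroutput>pthread_join</computeroutput>. Thread creation and joinage are the synchronisation points through which each <computeroutput>struct job</computeroutput> passes to its worker and back to the creating thread. For that reason the results are written and read without a lock.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>

#define NWORKERS 4

/* Hypothetical work item: each worker owns exactly one job. */
struct job {
    int first, count;
    long result;
};

static void *sum_range(void *arg)
{
    struct job *j = arg;
    long s = 0;
    for (int i = j->first; i < j->first + j->count; i++)
        s += i;
    j->result = s;   /* no lock: only this thread touches *j until joined */
    return NULL;
}

/* Sums 0 .. NWORKERS*per_worker - 1 using joinable (not detached)
   threads, reading each result only after pthread_join. */
long parallel_sum(int per_worker)
{
    pthread_t t[NWORKERS];
    struct job jobs[NWORKERS];

    for (int k = 0; k < NWORKERS; k++) {
        jobs[k].first = k * per_worker;
        jobs[k].count = per_worker;
        jobs[k].result = 0;
        /* NULL attributes: the thread is created joinable. */
        pthread_create(&t[k], NULL, sum_range, &jobs[k]);
    }

    long total = 0;
    for (int k = 0; k < NWORKERS; k++) {
        pthread_join(t[k], NULL);   /* synchronisation point Helgrind sees */
        total += jobs[k].result;
    }
    return total;
}
]]></programlisting>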
| </listitem> |
| |
| <listitem> |
| <para>Perform thread debugging (with Helgrind) and memory |
| debugging (with Memcheck) together.</para> |
| |
| <para>Helgrind tracks the state of memory in detail, and memory |
| management bugs in the application are liable to cause confusion. |
| In extreme cases, applications which do many invalid reads and |
| writes (particularly to freed memory) have been known to crash |
| Helgrind. So, ideally, you should make your application |
| Memcheck-clean before using Helgrind.</para> |
| |
| <para>It may be impossible to make your application Memcheck-clean |
| unless you first remove threading bugs. In particular, it may be |
| difficult to remove all reads and writes to freed memory in |
| multithreaded C++ destructor sequences at program termination. |
| So, ideally, you should make your application Helgrind-clean |
| before using Memcheck.</para> |
| |
| <para>Since this circularity is obviously unresolvable, at least |
| bear in mind that Memcheck and Helgrind are to some extent |
| complementary, and you may need to use them together.</para> |
| </listitem> |
| |
| <listitem> |
| <para>POSIX requires that implementations of standard I/O (printf, |
| fprintf, fwrite, fread, etc) are thread safe. Unfortunately GNU |
| libc implements this by using internal locking primitives that |
| Helgrind is unable to intercept. Consequently Helgrind generates |
| many false race reports when you use these functions.</para> |
| |
| <para>Helgrind attempts to hide these errors using the standard |
| Valgrind error-suppression mechanism. So, at least for simple |
| test cases, you don't see any. Nevertheless, some may slip |
| through. Just something to be aware of.</para> |
| </listitem> |
| |
| <listitem> |
| <para>Helgrind's error checks do not work properly inside the |
| system threading library itself |
| (<computeroutput>libpthread.so</computeroutput>), and it usually |
| observes large numbers of (false) errors in there. Valgrind's |
| suppression system then filters these out, so you should not see |
| them.</para> |
| |
| <para>If you see any race errors reported |
| where <computeroutput>libpthread.so</computeroutput> or |
| <computeroutput>ld.so</computeroutput> is the object associated |
| with the innermost stack frame, please file a bug report at |
| http://www.valgrind.org.</para> |
| </listitem> |
| |
| </orderedlist> |
| |
| </sect1> |
| |
| |
| |
| |
| <sect1 id="hg-manual.options" xreflabel="Helgrind Options"> |
| <title>Helgrind Options</title> |
| |
| <para>The following end-user options are available:</para> |
| |
| <!-- start of xi:include in the manpage --> |
| <variablelist id="hg.opts.list"> |
| |
| <varlistentry id="opt.happens-before" xreflabel="--happens-before"> |
| <term> |
| <option><![CDATA[--happens-before=none|threads|all |
| [default: all] ]]></option> |
| </term> |
| <listitem> |
| <para>Helgrind always regards locks as the basis for |
| inter-thread synchronisation. However, by default, before |
| reporting a race error, Helgrind will also check whether |
| certain other kinds of inter-thread synchronisation events |
| happened. It may be that if such events took place, then no |
| race really occurred, and so no error needs to be reported. |
| See <link linkend="hg-manual.data-races.exclusive">above</link> |
| for a discussion of transfers of exclusive ownership states |
| between threads. |
| </para> |
| <para>With <varname>--happens-before=all</varname>, the |
| following events are regarded as sources of synchronisation: |
| thread creation/joinage, condition variable |
| signal/broadcast/waits, and semaphore posts/waits. |
| </para> |
| <para>With <varname>--happens-before=threads</varname>, only |
| thread creation/joinage events are regarded as sources of |
| synchronisation. |
| </para> |
| <para>With <varname>--happens-before=none</varname>, no events |
| (apart, of course, from locking) are regarded as sources of |
| synchronisation. |
| </para> |
| <para>Changing this setting from the default will increase your |
| false-error rate but give little or no gain. The only advantage |
| is that <option>--happens-before=threads</option> and |
| <option>--happens-before=none</option> should make Helgrind |
| less and less sensitive to the scheduling of threads, and hence |
| the output more and more repeatable across runs. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry id="opt.trace-addr" xreflabel="--trace-addr"> |
| <term> |
| <option><![CDATA[--trace-addr=0xXXYYZZ |
| ]]></option> and |
| <option><![CDATA[--trace-level=0|1|2 [default: 1] |
| ]]></option> |
| </term> |
| <listitem> |
| <para>Requests that Helgrind produce a log of all state changes |
| to location 0xXXYYZZ. This can be helpful in tracking down |
| tricky races. <varname>--trace-level</varname> controls the |
| verbosity of the log. At the default setting (1), a one-line |
| summary is printed for each state change. At level 2 a |
| complete stack trace is printed for each state change.</para> |
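<para>One way to find a value for <computeroutput>--trace-addr</computeroutput> is to have the program print the address of the variable you suspect. The sketch below uses a hypothetical variable named <computeroutput>suspect_flag</computeroutput>. It assumes a C library whose <computeroutput>%p</computeroutput> conversion prints a <computeroutput>0x</computeroutput>-prefixed hexadecimal value, as glibc does. It also assumes that the address is the same in the traced run as in the run that printed it. Comparing the printed value across two runs under Valgrind is a simple way to check this.</para>

<programlisting><![CDATA[
#include <stdio.h>

/* Hypothetical variable suspected of being involved in a race. */
int suspect_flag = 0;

/* Writes "--trace-addr=<address of suspect_flag>" into buf and returns
   the number of characters snprintf wanted to write. */
int format_trace_addr(char *buf, size_t len)
{
    return snprintf(buf, len, "--trace-addr=%p", (void *)&suspect_flag);
}
]]></programlisting>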
| </listitem> |
| </varlistentry> |
| |
| </variablelist> |
| <!-- end of xi:include in the manpage --> |
| |
| <!-- start of xi:include in the manpage --> |
| <para>In addition, the following debugging options are available for |
| Helgrind:</para> |
| |
| <variablelist id="hg.debugopts.list"> |
| |
| <varlistentry id="opt.trace-malloc" xreflabel="--trace-malloc"> |
| <term> |
| <option><![CDATA[--trace-malloc=no|yes [no] |
| ]]></option> |
| </term> |
| <listitem> |
| <para>Show all client malloc (etc) and free (etc) requests.</para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry id="opt.gen-vcg" xreflabel="--gen-vcg"> |
| <term> |
| <option><![CDATA[--gen-vcg=no|yes|yes-w-vts [no] |
| ]]></option> |
| </term> |
| <listitem> |
| <para>At exit, write to stderr a dump of the happens-before |
| graph computed by Helgrind, in a format suitable for the VCG |
| graph visualisation tool. A suitable command line is:</para> |
| <para><computeroutput>valgrind --tool=helgrind |
| --gen-vcg=yes my_app 2>&1 |
| | grep xxxxxx | sed "s/xxxxxx//g" |
| | xvcg -</computeroutput></para> |
| <para>With <varname>--gen-vcg=yes</varname>, the basic |
| happens-before graph is shown. With |
| <varname>--gen-vcg=yes-w-vts</varname>, the vector timestamp |
| for each node is also shown.</para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry id="opt.cmp-race-err-addrs" |
| xreflabel="--cmp-race-err-addrs"> |
| <term> |
| <option><![CDATA[--cmp-race-err-addrs=no|yes [no] |
| ]]></option> |
| </term> |
| <listitem> |
| <para>Controls whether or not race (data) addresses should be |
| taken into account when removing duplicates of race errors. |
| With <varname>--cmp-race-err-addrs=no</varname>, two otherwise |
| identical race errors will be considered to be the same if |
| their race addresses differ. With |
| <varname>--cmp-race-err-addrs=yes</varname> they will be |
| considered different. This is provided to help make certain |
| regression tests work reliably.</para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry id="opt.hg-sanity-flags" xreflabel="--hg-sanity-flags"> |
| <term> |
| <option><![CDATA[--hg-sanity-flags=<XXXXXX> (X = 0|1) [000000] |
| ]]></option> |
| </term> |
| <listitem> |
| <para>Run extensive sanity checks on Helgrind's internal |
| data structures at events defined by the bitstring, as |
| follows:</para> |
| <para><computeroutput>100000 </computeroutput>at every query |
| to the happens-before graph</para> |
| <para><computeroutput>010000 </computeroutput>after changes to |
| the lock order acquisition graph</para> |
| <para><computeroutput>001000 </computeroutput>after every client |
| memory access (NB: not currently used)</para> |
| <para><computeroutput>000100 </computeroutput>after every client |
| memory range permission setting of 256 bytes or greater</para> |
| <para><computeroutput>000010 </computeroutput>after every client |
| lock or unlock event</para> |
| <para><computeroutput>000001 </computeroutput>after every client |
| thread creation or joinage event</para> |
| <para>Note these will make Helgrind run very slowly, often to |
| the point of being completely unusable.</para> |
| </listitem> |
| </varlistentry> |
| |
| </variablelist> |
| <!-- end of xi:include in the manpage --> |
| |
| |
| </sect1> |
| |
| <sect1 id="hg-manual.todolist" xreflabel="To Do List"> |
| <title>A To-Do List for Helgrind</title> |
| |
| <para>The following is a list of loose ends which should be tidied up |
| some time.</para> |
| |
| <itemizedlist> |
| <listitem><para>Track which mutexes are associated with which |
| condition variables, and emit a warning if this becomes |
| inconsistent.</para> |
| </listitem> |
| <listitem><para>For lock order errors, print the complete lock |
| cycle, rather than only doing so for size-2 cycles as at |
| present.</para> |
| </listitem> |
| <listitem><para>Document the VALGRIND_HG_CLEAN_MEMORY client |
| request.</para> |
| </listitem> |
| <listitem><para>Possibly a client request to forcibly transfer |
| ownership of memory from one thread to another. Requires further |
| consideration.</para> |
| </listitem> |
| <listitem><para>Add a new client request that marks an address range |
| as being "shared-modified with empty lockset" (the error state), |
| and describe how to use it.</para> |
| </listitem> |
| <listitem><para>Document races caused by gcc's thread-unsafe code |
| generation for speculative stores. In the interim see |
| <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html |
| </computeroutput> |
| and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>. |
| </para> |
| </listitem> |
| <listitem><para>Don't update the lock-order graph, and don't check |
| for errors, when a "try"-style lock operation happens (eg |
| pthread_mutex_trylock). Such calls do not add any real |
| restrictions to the locking order, since they can always fail to |
| acquire the lock, resulting in the caller going off and doing Plan |
| B (presumably it will have a Plan B). Doing such checks could |
| generate false lock-order errors and confuse users.</para> |
| </listitem> |
| <listitem><para> Performance can be very poor. Slowdowns on the |
| order of 100:1 are not unusual. There is quite some scope for |
| performance improvements, though. |
| </para> |
| </listitem> |
| |
| </itemizedlist> |
| |
| </sect1> |
| |
| </chapter> |