<html>
| 2 | <head> |
| 3 | <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> |
| 4 | <title>7. Helgrind: a thread error detector</title> |
| 5 | <link rel="stylesheet" type="text/css" href="vg_basic.css"> |
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="index.html" title="Valgrind Documentation">
| 8 | <link rel="up" href="manual.html" title="Valgrind User Manual"> |
| 9 | <link rel="prev" href="cl-manual.html" title="6. Callgrind: a call-graph generating cache and branch prediction profiler"> |
| 10 | <link rel="next" href="drd-manual.html" title="8. DRD: a thread error detector"> |
| 11 | </head> |
| 12 | <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> |
| 13 | <div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr> |
| 14 | <td width="22px" align="center" valign="middle"><a accesskey="p" href="cl-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td> |
| 15 | <td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td> |
<td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Home"></a></td>
| 17 | <th align="center" valign="middle">Valgrind User Manual</th> |
| 18 | <td width="22px" align="center" valign="middle"><a accesskey="n" href="drd-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td> |
| 19 | </tr></table></div> |
| 20 | <div class="chapter"> |
| 21 | <div class="titlepage"><div><div><h1 class="title"> |
| 22 | <a name="hg-manual"></a>7. Helgrind: a thread error detector</h1></div></div></div> |
| 23 | <div class="toc"> |
| 24 | <p><b>Table of Contents</b></p> |
| 25 | <dl class="toc"> |
| 26 | <dt><span class="sect1"><a href="hg-manual.html#hg-manual.overview">7.1. Overview</a></span></dt> |
| 27 | <dt><span class="sect1"><a href="hg-manual.html#hg-manual.api-checks">7.2. Detected errors: Misuses of the POSIX pthreads API</a></span></dt> |
| 28 | <dt><span class="sect1"><a href="hg-manual.html#hg-manual.lock-orders">7.3. Detected errors: Inconsistent Lock Orderings</a></span></dt> |
| 29 | <dt><span class="sect1"><a href="hg-manual.html#hg-manual.data-races">7.4. Detected errors: Data Races</a></span></dt> |
| 30 | <dd><dl> |
| 31 | <dt><span class="sect2"><a href="hg-manual.html#hg-manual.data-races.example">7.4.1. A Simple Data Race</a></span></dt> |
| 32 | <dt><span class="sect2"><a href="hg-manual.html#hg-manual.data-races.algorithm">7.4.2. Helgrind's Race Detection Algorithm</a></span></dt> |
| 33 | <dt><span class="sect2"><a href="hg-manual.html#hg-manual.data-races.errmsgs">7.4.3. Interpreting Race Error Messages</a></span></dt> |
| 34 | </dl></dd> |
| 35 | <dt><span class="sect1"><a href="hg-manual.html#hg-manual.effective-use">7.5. Hints and Tips for Effective Use of Helgrind</a></span></dt> |
| 36 | <dt><span class="sect1"><a href="hg-manual.html#hg-manual.options">7.6. Helgrind Command-line Options</a></span></dt> |
| 37 | <dt><span class="sect1"><a href="hg-manual.html#hg-manual.monitor-commands">7.7. Helgrind Monitor Commands</a></span></dt> |
| 38 | <dt><span class="sect1"><a href="hg-manual.html#hg-manual.client-requests">7.8. Helgrind Client Requests</a></span></dt> |
| 39 | <dt><span class="sect1"><a href="hg-manual.html#hg-manual.todolist">7.9. A To-Do List for Helgrind</a></span></dt> |
| 40 | </dl> |
| 41 | </div> |
| 42 | <p>To use this tool, you must specify |
| 43 | <code class="option">--tool=helgrind</code> on the Valgrind |
| 44 | command line.</p> |
| 45 | <div class="sect1"> |
| 46 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 47 | <a name="hg-manual.overview"></a>7.1. Overview</h2></div></div></div> |
| 48 | <p>Helgrind is a Valgrind tool for detecting synchronisation errors |
| 49 | in C, C++ and Fortran programs that use the POSIX pthreads |
| 50 | threading primitives.</p> |
| 51 | <p>The main abstractions in POSIX pthreads are: a set of threads |
| 52 | sharing a common address space, thread creation, thread joining, |
| 53 | thread exit, mutexes (locks), condition variables (inter-thread event |
| 54 | notifications), reader-writer locks, spinlocks, semaphores and |
| 55 | barriers.</p> |
| 56 | <p>Helgrind can detect three classes of errors, which are discussed |
| 57 | in detail in the next three sections:</p> |
| 58 | <div class="orderedlist"><ol class="orderedlist" type="1"> |
| 59 | <li class="listitem"><p><a class="link" href="hg-manual.html#hg-manual.api-checks" title="7.2. Detected errors: Misuses of the POSIX pthreads API"> |
| 60 | Misuses of the POSIX pthreads API.</a></p></li> |
| 61 | <li class="listitem"><p><a class="link" href="hg-manual.html#hg-manual.lock-orders" title="7.3. Detected errors: Inconsistent Lock Orderings"> |
| 62 | Potential deadlocks arising from lock |
| 63 | ordering problems.</a></p></li> |
| 64 | <li class="listitem"><p><a class="link" href="hg-manual.html#hg-manual.data-races" title="7.4. Detected errors: Data Races"> |
| 65 | Data races -- accessing memory without adequate locking |
| 66 | or synchronisation</a>. |
| 67 | </p></li> |
| 68 | </ol></div> |
| 69 | <p>Problems like these often result in unreproducible, |
| 70 | timing-dependent crashes, deadlocks and other misbehaviour, and |
| 71 | can be difficult to find by other means.</p> |
| 72 | <p>Helgrind is aware of all the pthread abstractions and tracks |
| 73 | their effects as accurately as it can. On x86 and amd64 platforms, it |
| 74 | understands and partially handles implicit locking arising from the |
| 75 | use of the LOCK instruction prefix. On PowerPC/POWER and ARM |
| 76 | platforms, it partially handles implicit locking arising from |
| 77 | load-linked and store-conditional instruction pairs. |
| 78 | </p> |
| 79 | <p>Helgrind works best when your application uses only the POSIX |
| 80 | pthreads API. However, if you want to use custom threading |
| 81 | primitives, you can describe their behaviour to Helgrind using the |
| 82 | <code class="varname">ANNOTATE_*</code> macros defined |
| 83 | in <code class="varname">helgrind.h</code>.</p> |
<p>Helgrind also provides <a class="xref" href="manual-core.html#manual-core.xtree" title="2.9. Execution Trees">Execution Trees</a> memory
| 85 | profiling using the command line |
| 86 | option <code class="computeroutput">--xtree-memory</code> and the monitor command |
| 87 | <code class="computeroutput">xtmemory</code>.</p> |
<p>Following the three sections on detected errors is a section containing
| 89 | <a class="link" href="hg-manual.html#hg-manual.effective-use" title="7.5. Hints and Tips for Effective Use of Helgrind"> |
| 90 | hints and tips on how to get the best out of Helgrind.</a> |
| 91 | </p> |
| 92 | <p>Then there is a |
| 93 | <a class="link" href="hg-manual.html#hg-manual.options" title="7.6. Helgrind Command-line Options">summary of command-line |
| 94 | options.</a> |
| 95 | </p> |
| 96 | <p>Finally, there is |
| 97 | <a class="link" href="hg-manual.html#hg-manual.todolist" title="7.9. A To-Do List for Helgrind">a brief summary of areas in which Helgrind |
| 98 | could be improved.</a> |
| 99 | </p> |
| 100 | </div> |
| 101 | <div class="sect1"> |
| 102 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 103 | <a name="hg-manual.api-checks"></a>7.2. Detected errors: Misuses of the POSIX pthreads API</h2></div></div></div> |
| 104 | <p>Helgrind intercepts calls to many POSIX pthreads functions, and |
| 105 | is therefore able to report on various common problems. Although |
these are unglamorous errors, their presence can lead to undefined
| 107 | program behaviour and hard-to-find bugs later on. The detected errors |
| 108 | are:</p> |
| 109 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 110 | <li class="listitem"><p>unlocking an invalid mutex</p></li> |
| 111 | <li class="listitem"><p>unlocking a not-locked mutex</p></li> |
| 112 | <li class="listitem"><p>unlocking a mutex held by a different |
| 113 | thread</p></li> |
| 114 | <li class="listitem"><p>destroying an invalid or a locked mutex</p></li> |
| 115 | <li class="listitem"><p>recursively locking a non-recursive mutex</p></li> |
| 116 | <li class="listitem"><p>deallocation of memory that contains a |
| 117 | locked mutex</p></li> |
| 118 | <li class="listitem"><p>passing mutex arguments to functions expecting |
| 119 | reader-writer lock arguments, and vice |
| 120 | versa</p></li> |
| 121 | <li class="listitem"><p>when a POSIX pthread function fails with an |
| 122 | error code that must be handled</p></li> |
| 123 | <li class="listitem"><p>when a thread exits whilst still holding locked |
| 124 | locks</p></li> |
| 125 | <li class="listitem"><p>calling <code class="function">pthread_cond_wait</code> |
| 126 | with a not-locked mutex, an invalid mutex, |
| 127 | or one locked by a different |
| 128 | thread</p></li> |
| 129 | <li class="listitem"><p>inconsistent bindings between condition |
| 130 | variables and their associated mutexes</p></li> |
| 131 | <li class="listitem"><p>invalid or duplicate initialisation of a pthread |
| 132 | barrier</p></li> |
| 133 | <li class="listitem"><p>initialisation of a pthread barrier on which threads |
| 134 | are still waiting</p></li> |
| 135 | <li class="listitem"><p>destruction of a pthread barrier object which was |
| 136 | never initialised, or on which threads are still |
| 137 | waiting</p></li> |
| 138 | <li class="listitem"><p>waiting on an uninitialised pthread |
| 139 | barrier</p></li> |
| 140 | <li class="listitem"><p>for all of the pthreads functions that Helgrind |
| 141 | intercepts, an error is reported, along with a stack |
| 142 | trace, if the system threading library routine returns |
| 143 | an error code, even if Helgrind itself detected no |
| 144 | error</p></li> |
| 145 | </ul></div> |
| 146 | <p>Checks pertaining to the validity of mutexes are generally also |
| 147 | performed for reader-writer locks.</p> |
| 148 | <p>Various kinds of this-can't-possibly-happen events are also |
| 149 | reported. These usually indicate bugs in the system threading |
| 150 | library.</p> |
| 151 | <p>Reported errors always contain a primary stack trace indicating |
| 152 | where the error was detected. They may also contain auxiliary stack |
| 153 | traces giving additional information. In particular, most errors |
| 154 | relating to mutexes will also tell you where that mutex first came to |
| 155 | Helgrind's attention (the "<code class="computeroutput">was first observed |
| 156 | at</code>" part), so you have a chance of figuring out which |
| 157 | mutex it is referring to. For example:</p> |
| 158 | <pre class="programlisting"> |
| 159 | Thread #1 unlocked a not-locked lock at 0x7FEFFFA90 |
| 160 | at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492) |
| 161 | by 0x40073A: nearly_main (tc09_bad_unlock.c:27) |
| 162 | by 0x40079B: main (tc09_bad_unlock.c:50) |
| 163 | Lock at 0x7FEFFFA90 was first observed |
| 164 | at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326) |
| 165 | by 0x40071F: nearly_main (tc09_bad_unlock.c:23) |
| 166 | by 0x40079B: main (tc09_bad_unlock.c:50) |
| 167 | </pre> |
| 168 | <p>Helgrind has a way of summarising thread identities, as |
| 169 | you see here with the text "<code class="computeroutput">Thread |
| 170 | #1</code>". This is so that it can speak about threads and |
| 171 | sets of threads without overwhelming you with details. See |
| 172 | <a class="link" href="hg-manual.html#hg-manual.data-races.errmsgs" title="7.4.3. Interpreting Race Error Messages">below</a> |
| 173 | for more information on interpreting error messages.</p> |
| 174 | </div> |
| 175 | <div class="sect1"> |
| 176 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 177 | <a name="hg-manual.lock-orders"></a>7.3. Detected errors: Inconsistent Lock Orderings</h2></div></div></div> |
| 178 | <p>In this section, and in general, to "acquire" a lock simply |
| 179 | means to lock that lock, and to "release" a lock means to unlock |
| 180 | it.</p> |
| 181 | <p>Helgrind monitors the order in which threads acquire locks. |
| 182 | This allows it to detect potential deadlocks which could arise from |
| 183 | the formation of cycles of locks. Detecting such inconsistencies is |
| 184 | useful because, whilst actual deadlocks are fairly obvious, potential |
| 185 | deadlocks may never be discovered during testing and could later lead |
| 186 | to hard-to-diagnose in-service failures.</p> |
| 187 | <p>The simplest example of such a problem is as |
| 188 | follows.</p> |
| 189 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 190 | <li class="listitem"><p>Imagine some shared resource R, which, for whatever |
| 191 | reason, is guarded by two locks, L1 and L2, which must both be held |
| 192 | when R is accessed.</p></li> |
| 193 | <li class="listitem"><p>Suppose a thread acquires L1, then L2, and proceeds |
| 194 | to access R. The implication of this is that all threads in the |
| 195 | program must acquire the two locks in the order first L1 then L2. |
| 196 | Not doing so risks deadlock.</p></li> |
| 197 | <li class="listitem"><p>The deadlock could happen if two threads -- call them |
| 198 | T1 and T2 -- both want to access R. Suppose T1 acquires L1 first, |
| 199 | and T2 acquires L2 first. Then T1 tries to acquire L2, and T2 tries |
| 200 | to acquire L1, but those locks are both already held. So T1 and T2 |
| 201 | become deadlocked.</p></li> |
| 202 | </ul></div> |
| 203 | <p>Helgrind builds a directed graph indicating the order in which |
| 204 | locks have been acquired in the past. When a thread acquires a new |
| 205 | lock, the graph is updated, and then checked to see if it now contains |
| 206 | a cycle. The presence of a cycle indicates a potential deadlock involving |
| 207 | the locks in the cycle.</p> |
| 208 | <p>In general, Helgrind will choose two locks involved in the cycle |
| 209 | and show you how their acquisition ordering has become inconsistent. |
| 210 | It does this by showing the program points that first defined the |
| 211 | ordering, and the program points which later violated it. Here is a |
| 212 | simple example involving just two locks:</p> |
| 213 | <pre class="programlisting"> |
| 214 | Thread #1: lock order "0x7FF0006D0 before 0x7FF0006A0" violated |
| 215 | |
| 216 | Observed (incorrect) order is: acquisition of lock at 0x7FF0006A0 |
| 217 | at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494) |
| 218 | by 0x400825: main (tc13_laog1.c:23) |
| 219 | |
| 220 | followed by a later acquisition of lock at 0x7FF0006D0 |
| 221 | at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494) |
| 222 | by 0x400853: main (tc13_laog1.c:24) |
| 223 | |
| 224 | Required order was established by acquisition of lock at 0x7FF0006D0 |
| 225 | at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494) |
| 226 | by 0x40076D: main (tc13_laog1.c:17) |
| 227 | |
| 228 | followed by a later acquisition of lock at 0x7FF0006A0 |
| 229 | at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494) |
| 230 | by 0x40079B: main (tc13_laog1.c:18) |
| 231 | </pre> |
| 232 | <p>When there are more than two locks in the cycle, the error is |
| 233 | equally serious. However, at present Helgrind does not show the locks |
| 234 | involved, sometimes because that information is not available, but |
| 235 | also so as to avoid flooding you with information. For example, a |
| 236 | naive implementation of the famous Dining Philosophers problem |
| 237 | involves a cycle of five locks |
| 238 | (see <code class="computeroutput">helgrind/tests/tc14_laog_dinphils.c</code>). |
| 239 | In this case Helgrind has detected that all 5 philosophers could |
| 240 | simultaneously pick up their left fork and then deadlock whilst |
| 241 | waiting to pick up their right forks.</p> |
| 242 | <pre class="programlisting"> |
| 243 | Thread #6: lock order "0x80499A0 before 0x8049A00" violated |
| 244 | |
| 245 | Observed (incorrect) order is: acquisition of lock at 0x8049A00 |
| 246 | at 0x40085BC: pthread_mutex_lock (hg_intercepts.c:495) |
| 247 | by 0x80485B4: dine (tc14_laog_dinphils.c:18) |
| 248 | by 0x400BDA4: mythread_wrapper (hg_intercepts.c:219) |
| 249 | by 0x39B924: start_thread (pthread_create.c:297) |
| 250 | by 0x2F107D: clone (clone.S:130) |
| 251 | |
| 252 | followed by a later acquisition of lock at 0x80499A0 |
| 253 | at 0x40085BC: pthread_mutex_lock (hg_intercepts.c:495) |
| 254 | by 0x80485CD: dine (tc14_laog_dinphils.c:19) |
| 255 | by 0x400BDA4: mythread_wrapper (hg_intercepts.c:219) |
| 256 | by 0x39B924: start_thread (pthread_create.c:297) |
| 257 | by 0x2F107D: clone (clone.S:130) |
| 258 | </pre> |
| 259 | </div> |
| 260 | <div class="sect1"> |
| 261 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 262 | <a name="hg-manual.data-races"></a>7.4. Detected errors: Data Races</h2></div></div></div> |
| 263 | <p>A data race happens, or could happen, when two threads access a |
| 264 | shared memory location without using suitable locks or other |
| 265 | synchronisation to ensure single-threaded access. Such missing |
locking can cause obscure timing-dependent bugs. Ensuring programs
| 267 | are race-free is one of the central difficulties of threaded |
| 268 | programming.</p> |
| 269 | <p>Reliably detecting races is a difficult problem, and most |
| 270 | of Helgrind's internals are devoted to dealing with it. |
| 271 | We begin with a simple example.</p> |
| 272 | <div class="sect2"> |
| 273 | <div class="titlepage"><div><div><h3 class="title"> |
| 274 | <a name="hg-manual.data-races.example"></a>7.4.1. A Simple Data Race</h3></div></div></div> |
| 275 | <p>About the simplest possible example of a race is as follows. In |
| 276 | this program, it is impossible to know what the value |
| 277 | of <code class="computeroutput">var</code> is at the end of the program. |
Is it 2? Or 1?</p>
| 279 | <pre class="programlisting"> |
#include &lt;pthread.h&gt;

int var = 0;

void* child_fn ( void* arg ) {
   var++; /* Unprotected relative to parent */ /* this is line 6 */
   return NULL;
}

int main ( void ) {
   pthread_t child;
   pthread_create(&amp;child, NULL, child_fn, NULL);
   var++; /* Unprotected relative to child */ /* this is line 13 */
   pthread_join(child, NULL);
   return 0;
}
| 296 | </pre> |
<p>The problem is that there is nothing to
| 298 | stop <code class="varname">var</code> being updated simultaneously |
| 299 | by both threads. A correct program would |
| 300 | protect <code class="varname">var</code> with a lock of type |
| 301 | <code class="function">pthread_mutex_t</code>, which is acquired |
| 302 | before each access and released afterwards. Helgrind's output for |
| 303 | this program is:</p> |
| 304 | <pre class="programlisting"> |
| 305 | Thread #1 is the program's root thread |
| 306 | |
| 307 | Thread #2 was created |
| 308 | at 0x511C08E: clone (in /lib64/libc-2.8.so) |
| 309 | by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so) |
| 310 | by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so) |
| 311 | by 0x4C299D4: pthread_create@* (hg_intercepts.c:214) |
| 312 | by 0x400605: main (simple_race.c:12) |
| 313 | |
| 314 | Possible data race during read of size 4 at 0x601038 by thread #1 |
| 315 | Locks held: none |
| 316 | at 0x400606: main (simple_race.c:13) |
| 317 | |
| 318 | This conflicts with a previous write of size 4 by thread #2 |
| 319 | Locks held: none |
| 320 | at 0x4005DC: child_fn (simple_race.c:6) |
| 321 | by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194) |
| 322 | by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so) |
| 323 | by 0x511C0CC: clone (in /lib64/libc-2.8.so) |
| 324 | |
| 325 | Location 0x601038 is 0 bytes inside global var "var" |
| 326 | declared at simple_race.c:3 |
| 327 | </pre> |
| 328 | <p>This is quite a lot of detail for an apparently simple error. |
The clause beginning "<code class="computeroutput">Possible data race</code>" is the main error message. It says there is a race as
| 330 | a result of a read of size 4 (bytes), at 0x601038, which is the |
| 331 | address of <code class="computeroutput">var</code>, happening in |
| 332 | function <code class="computeroutput">main</code> at line 13 in the |
| 333 | program.</p> |
| 334 | <p>Two important parts of the message are:</p> |
| 335 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 336 | <li class="listitem"> |
| 337 | <p>Helgrind shows two stack traces for the error, not one. By |
| 338 | definition, a race involves two different threads accessing the |
| 339 | same location in such a way that the result depends on the relative |
| 340 | speeds of the two threads.</p> |
| 341 | <p> |
| 342 | The first stack trace follows the text "<code class="computeroutput">Possible |
| 343 | data race during read of size 4 ...</code>" and the |
| 344 | second trace follows the text "<code class="computeroutput">This conflicts with |
| 345 | a previous write of size 4 ...</code>". Helgrind is |
| 346 | usually able to show both accesses involved in a race. At least |
| 347 | one of these will be a write (since two concurrent, unsynchronised |
| 348 | reads are harmless), and they will of course be from different |
| 349 | threads.</p> |
| 350 | <p>By examining your program at the two locations, you should be |
| 351 | able to get at least some idea of what the root cause of the |
| 352 | problem is. For each location, Helgrind shows the set of locks |
| 353 | held at the time of the access. This often makes it clear which |
| 354 | thread, if any, failed to take a required lock. In this example |
| 355 | neither thread holds a lock during the access.</p> |
| 356 | </li> |
| 357 | <li class="listitem"> |
| 358 | <p>For races which occur on global or stack variables, Helgrind |
| 359 | tries to identify the name and defining point of the variable. |
| 360 | Hence the text "<code class="computeroutput">Location 0x601038 is 0 bytes inside |
| 361 | global var "var" declared at simple_race.c:3</code>".</p> |
| 362 | <p>Showing names of stack and global variables carries no |
| 363 | run-time overhead once Helgrind has your program up and running. |
| 364 | However, it does require Helgrind to spend considerable extra time |
| 365 | and memory at program startup to read the relevant debug info. |
| 366 | Hence this facility is disabled by default. To enable it, you need |
| 367 | to give the <code class="varname">--read-var-info=yes</code> option to |
| 368 | Helgrind.</p> |
| 369 | </li> |
| 370 | </ul></div> |
| 371 | <p>The following section explains Helgrind's race detection |
| 372 | algorithm in more detail.</p> |
| 373 | </div> |
| 374 | <div class="sect2"> |
| 375 | <div class="titlepage"><div><div><h3 class="title"> |
| 376 | <a name="hg-manual.data-races.algorithm"></a>7.4.2. Helgrind's Race Detection Algorithm</h3></div></div></div> |
| 377 | <p>Most programmers think about threaded programming in terms of |
| 378 | the basic functionality provided by the threading library (POSIX |
| 379 | Pthreads): thread creation, thread joining, locks, condition |
| 380 | variables, semaphores and barriers.</p> |
| 381 | <p>The effect of using these functions is to impose |
| 382 | constraints upon the order in which memory accesses can |
| 383 | happen. This implied ordering is generally known as the |
| 384 | "happens-before relation". Once you understand the happens-before |
| 385 | relation, it is easy to see how Helgrind finds races in your code. |
| 386 | Fortunately, the happens-before relation is itself easy to understand, |
| 387 | and is by itself a useful tool for reasoning about the behaviour of |
| 388 | parallel programs. We now introduce it using a simple example.</p> |
| 389 | <p>Consider first the following buggy program:</p> |
| 390 | <pre class="programlisting"> |
Parent thread:                         Child thread:

int var;

// create child thread
pthread_create(...)
var = 20;                              var = 10;
                                       exit

// wait for child
pthread_join(...)
printf("%d\n", var);
| 403 | </pre> |
| 404 | <p>The parent thread creates a child. Both then write different |
| 405 | values to some variable <code class="computeroutput">var</code>, and the |
| 406 | parent then waits for the child to exit.</p> |
| 407 | <p>What is the value of <code class="computeroutput">var</code> at the |
| 408 | end of the program, 10 or 20? We don't know. The program is |
| 409 | considered buggy (it has a race) because the final value |
| 410 | of <code class="computeroutput">var</code> depends on the relative rates |
| 411 | of progress of the parent and child threads. If the parent is fast |
| 412 | and the child is slow, then the child's assignment may happen later, |
| 413 | so the final value will be 10; and vice versa if the child is faster |
| 414 | than the parent.</p> |
<p>The relative rate of progress of parent versus child is not something
| 416 | the programmer can control, and will often change from run to run. |
| 417 | It depends on factors such as the load on the machine, what else is |
| 418 | running, the kernel's scheduling strategy, and many other factors.</p> |
| 419 | <p>The obvious fix is to use a lock to |
| 420 | protect <code class="computeroutput">var</code>. It is however |
| 421 | instructive to consider a somewhat more abstract solution, which is to |
| 422 | send a message from one thread to the other:</p> |
| 423 | <pre class="programlisting"> |
Parent thread:                         Child thread:

int var;

// create child thread
pthread_create(...)
var = 20;
// send message to child
                                       // wait for message to arrive
                                       var = 10;
                                       exit

// wait for child
pthread_join(...)
printf("%d\n", var);
| 439 | </pre> |
| 440 | <p>Now the program reliably prints "10", regardless of the speed of |
| 441 | the threads. Why? Because the child's assignment cannot happen until |
| 442 | after it receives the message. And the message is not sent until |
| 443 | after the parent's assignment is done.</p> |
| 444 | <p>The message transmission creates a "happens-before" dependency |
| 445 | between the two assignments: <code class="computeroutput">var = 20;</code> |
| 446 | must now happen-before <code class="computeroutput">var = 10;</code>. |
| 447 | And so there is no longer a race |
| 448 | on <code class="computeroutput">var</code>. |
| 449 | </p> |
| 450 | <p>Note that it's not significant that the parent sends a message |
| 451 | to the child. Sending a message from the child (after its assignment) |
| 452 | to the parent (before its assignment) would also fix the problem, causing |
| 453 | the program to reliably print "20".</p> |
| 454 | <p>Helgrind's algorithm is (conceptually) very simple. It monitors all |
accesses to memory locations. If a location (in this example,
<code class="computeroutput">var</code>)
is accessed by two different threads, Helgrind checks to see if the
| 458 | two accesses are ordered by the happens-before relation. If so, |
| 459 | that's fine; if not, it reports a race.</p> |
| 460 | <p>It is important to understand that the happens-before relation |
| 461 | creates only a partial ordering, not a total ordering. An example of |
a total ordering is comparison of numbers: for any two numbers
<code class="computeroutput">x</code> and
<code class="computeroutput">y</code>,
<code class="computeroutput">x</code> is either less than, equal to, or
greater than
<code class="computeroutput">y</code>. A partial ordering is like a
total ordering, but it can also express the concept that two elements
are neither equal, less nor greater, but merely unordered with respect
to each other.</p>
| 471 | <p>In the fixed example above, we say that |
| 472 | <code class="computeroutput">var = 20;</code> "happens-before" |
| 473 | <code class="computeroutput">var = 10;</code>. But in the original |
| 474 | version, they are unordered: we cannot say that either happens-before |
| 475 | the other.</p> |
| 476 | <p>What does it mean to say that two accesses from different |
| 477 | threads are ordered by the happens-before relation? It means that |
| 478 | there is some chain of inter-thread synchronisation operations which |
| 479 | cause those accesses to happen in a particular order, irrespective of |
| 480 | the actual rates of progress of the individual threads. This is a |
| 481 | required property for a reliable threaded program, which is why |
| 482 | Helgrind checks for it.</p> |
| 483 | <p>The happens-before relations created by standard threading |
| 484 | primitives are as follows:</p> |
| 485 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 486 | <li class="listitem"><p>When a mutex is unlocked by thread T1 and later (or |
| 487 | immediately) locked by thread T2, then the memory accesses in T1 |
| 488 | prior to the unlock must happen-before those in T2 after it acquires |
| 489 | the lock.</p></li> |
| 490 | <li class="listitem"><p>The same idea applies to reader-writer locks, |
| 491 | although with some complication so as to allow correct handling of |
| 492 | reads vs writes.</p></li> |
<li class="listitem"><p>When a condition variable (CV) is signalled by
| 494 | thread T1 and some other thread T2 is thereby released from a wait |
| 495 | on the same CV, then the memory accesses in T1 prior to the |
| 496 | signalling must happen-before those in T2 after it returns from the |
| 497 | wait. If no thread was waiting on the CV then there is no |
| 498 | effect.</p></li> |
| 499 | <li class="listitem"><p>If instead T1 broadcasts on a CV, then all of the |
| 500 | waiting threads, rather than just one of them, acquire a |
| 501 | happens-before dependency on the broadcasting thread at the point it |
| 502 | did the broadcast.</p></li> |
| 503 | <li class="listitem"><p>A thread T2 that continues after completing sem_wait |
| 504 | on a semaphore that thread T1 posts on, acquires a happens-before |
dependence on the posting thread, a bit like the dependencies caused
by mutex unlock-lock pairs. However, since a semaphore can be posted
| 507 | on many times, it is unspecified from which of the post calls the |
| 508 | wait call gets its happens-before dependency.</p></li> |
| 509 | <li class="listitem"><p>For a group of threads T1 .. Tn which arrive at a |
| 510 | barrier and then move on, each thread after the call has a |
| 511 | happens-after dependency from all threads before the |
| 512 | barrier.</p></li> |
| 513 | <li class="listitem"><p>A newly-created child thread acquires an initial |
| 514 | happens-after dependency on the point where its parent created it. |
| 515 | That is, all memory accesses performed by the parent prior to |
| 516 | creating the child are regarded as happening-before all the accesses |
| 517 | of the child.</p></li> |
| 518 | <li class="listitem"><p>Similarly, when an exiting thread is reaped via a |
| 519 | call to <code class="function">pthread_join</code>, once the call returns, the |
| 520 | reaping thread acquires a happens-after dependency relative to all memory |
| 521 | accesses made by the exiting thread.</p></li> |
| 522 | </ul></div> |
| 523 | <p>In summary: Helgrind intercepts the above listed events, and builds a |
| 524 | directed acyclic graph representing the collective happens-before |
| 525 | dependencies. It also monitors all memory accesses.</p> |
| 526 | <p>If a location is accessed by two different threads, but Helgrind |
| 527 | cannot find any path through the happens-before graph from one access |
| 528 | to the other, then it reports a race.</p> |
| 529 | <p>There are a couple of caveats:</p> |
| 530 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 531 | <li class="listitem"><p>Helgrind doesn't check for a race in the case where |
| 532 | both accesses are reads. That would be silly, since concurrent |
| 533 | reads are harmless.</p></li> |
| 534 | <li class="listitem"><p>Two accesses are considered to be ordered by the |
| 535 | happens-before dependency even through arbitrarily long chains of |
| 536 | synchronisation events. For example, if T1 accesses some location |
| 537 | L, and then <code class="function">pthread_cond_signals</code> T2, which later |
| 538 | <code class="function">pthread_cond_signals</code> T3, which then accesses L, then |
| 539 | a suitable happens-before dependency exists between the first and second |
| 540 | accesses, even though it involves two different inter-thread |
| 541 | synchronisation events.</p></li> |
| 542 | </ul></div> |
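<p>As a minimal sketch of the thread-creation and
<code class="function">pthread_join</code> rules listed above, the following
C fragment shares two plain variables between a parent and a child with no
locking at all. The names <code class="computeroutput">input</code>,
<code class="computeroutput">output</code>,
<code class="computeroutput">child</code> and
<code class="computeroutput">run_create_join_demo</code> are hypothetical and
chosen only for illustration. Under the rules above, both accesses to each
variable are ordered, so no race should be reported.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Plain shared data: no mutex and no atomics (hypothetical names). */
static int input;   /* written by the parent before pthread_create */
static int output;  /* written by the child, read by the parent after join */

static void *child(void *arg) {
    (void)arg;
    /* Ordered after the parent's write to 'input' by the creation edge. */
    output = input * 2;
    return NULL;
}

/* Runs one parent/child exchange and returns the child's result. */
int run_create_join_demo(int value) {
    pthread_t tid;
    int r;

    input = value;                      /* happens-before all child accesses */
    r = pthread_create(&tid, NULL, child, NULL);
    assert(r == 0);
    r = pthread_join(tid, NULL);        /* child's accesses happen-before this */
    assert(r == 0);
    (void)r;                            /* silence unused warning under NDEBUG */
    return output;                      /* ordered after the child's write */
}
```

<p>If the parent instead read <code class="computeroutput">output</code>
before calling <code class="function">pthread_join</code>, there would be no
path in the happens-before graph between the child's write and the parent's
read, and a race report would be expected.</p>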
| 543 | </div> |
| 544 | <div class="sect2"> |
| 545 | <div class="titlepage"><div><div><h3 class="title"> |
| 546 | <a name="hg-manual.data-races.errmsgs"></a>7.4.3. Interpreting Race Error Messages</h3></div></div></div> |
| 547 | <p>Helgrind's race detection algorithm collects a lot of |
| 548 | information, and tries to present it in a helpful way when a race is |
| 549 | detected. Here's an example:</p> |
| 550 | <pre class="programlisting"> |
| 551 | Thread #2 was created |
| 552 | at 0x511C08E: clone (in /lib64/libc-2.8.so) |
| 553 | by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so) |
| 554 | by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so) |
| 555 | by 0x4C299D4: pthread_create@* (hg_intercepts.c:214) |
| 556 | by 0x4008F2: main (tc21_pthonce.c:86) |
| 557 | |
| 558 | Thread #3 was created |
| 559 | at 0x511C08E: clone (in /lib64/libc-2.8.so) |
| 560 | by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so) |
| 561 | by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so) |
| 562 | by 0x4C299D4: pthread_create@* (hg_intercepts.c:214) |
| 563 | by 0x4008F2: main (tc21_pthonce.c:86) |
| 564 | |
| 565 | Possible data race during read of size 4 at 0x601070 by thread #3 |
| 566 | Locks held: none |
| 567 | at 0x40087A: child (tc21_pthonce.c:74) |
| 568 | by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194) |
| 569 | by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so) |
| 570 | by 0x511C0CC: clone (in /lib64/libc-2.8.so) |
| 571 | |
| 572 | This conflicts with a previous write of size 4 by thread #2 |
| 573 | Locks held: none |
| 574 | at 0x400883: child (tc21_pthonce.c:74) |
| 575 | by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194) |
| 576 | by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so) |
| 577 | by 0x511C0CC: clone (in /lib64/libc-2.8.so) |
| 578 | |
| 579 | Location 0x601070 is 0 bytes inside local var "unprotected2" |
| 580 | declared at tc21_pthonce.c:51, in frame #0 of thread 3 |
| 581 | </pre> |
| 582 | <p>Helgrind first announces the creation points of any threads |
| 583 | referenced in the error message. This is so it can speak concisely |
| 584 | about threads without repeatedly printing their creation point call |
| 585 | stacks. Each thread is only ever announced once, the first time it |
| 586 | appears in any Helgrind error message.</p> |
| 587 | <p>The main error message begins at the text |
| 588 | "<code class="computeroutput">Possible data race during read</code>". At |
| 589 | the start is information you would expect to see -- address and size |
| 590 | of the racing access, whether a read or a write, and the call stack at |
| 591 | the point it was detected.</p> |
| 592 | <p>A second call stack is presented starting at the text |
| 593 | "<code class="computeroutput">This conflicts with a previous |
| 594 | write</code>". This shows a previous access which also |
| 595 | accessed the stated address, and which is believed to be racing |
| 596 | against the access in the first call stack. Note that this second |
| 597 | call stack is limited to a maximum of 8 entries to limit the |
| 598 | memory usage.</p> |
| 599 | <p>Finally, Helgrind may attempt to give a description of the |
| 600 | raced-on address in source level terms. In this example, it |
| 601 | identifies it as a local variable, shows its name, declaration point, |
| 602 | and in which frame (of the first call stack) it lives. Note that this |
| 603 | information is only shown when <code class="varname">--read-var-info=yes</code> |
| 604 | is specified on the command line. That's because reading the DWARF3 |
| 605 | debug information in enough detail to capture variable type and |
| 606 | location information makes Helgrind much slower at startup, and also |
| 607 | requires considerable amounts of memory, for large programs. |
| 608 | </p> |
| 609 | <p>Once you have your two call stacks, how do you find the root |
| 610 | cause of the race?</p> |
| 611 | <p>The first thing to do is examine the source locations referred |
| 612 | to by each call stack. They should both show an access to the same |
| 613 | location, or variable.</p> |
| 614 | <p>Now figure out how that location should have been made |
| 615 | thread-safe:</p> |
| 616 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 617 | <li class="listitem"><p>Perhaps the location was intended to be protected by |
| 618 | a mutex? If so, you need to lock and unlock the mutex at both |
| 619 | access points, even if one of the accesses is reported to be a read. |
| 620 | Did you perhaps forget the locking at one or other of the accesses? |
| 621 | To help you do this, Helgrind shows the set of locks held by each |
| 622 | thread at the time it accessed the raced-on location.</p></li> |
| 623 | <li class="listitem"> |
| 624 | <p>Alternatively, perhaps you intended to use some |
| 625 | other scheme to make it safe, such as signalling on a condition |
| 626 | variable. In all such cases, try to find a synchronisation event |
| 627 | (or a chain thereof) which separates the earlier-observed access (as |
| 628 | shown in the second call stack) from the later-observed access (as |
| 629 | shown in the first call stack). In other words, try to find |
| 630 | evidence that the earlier access "happens-before" the later access. |
| 631 | See the previous subsection for an explanation of the happens-before |
| 632 | relation.</p> |
| 633 | <p> |
| 634 | The fact that Helgrind is reporting a race means it did not observe |
| 635 | any happens-before relation between the two accesses. If |
| 636 | Helgrind is working correctly, it should also be the case that you |
| 637 | cannot find any such relation, even on detailed inspection |
| 638 | of the source code. Hopefully, though, your inspection of the code |
| 639 | will show where the missing synchronisation operation(s) should have |
| 640 | been.</p> |
| 641 | </li> |
| 642 | </ul></div> |
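<p>The mutex advice above can be sketched as follows. This is an illustrative
fragment, not code from the Helgrind test suite: the names
<code class="computeroutput">counter</code>,
<code class="computeroutput">counter_mx</code>,
<code class="computeroutput">counter_add</code>,
<code class="computeroutput">counter_get</code> and
<code class="computeroutput">run_locked_counter_demo</code> are hypothetical.
The point it shows is that the read path takes the same lock as the write
path, so both access points appear with a non-empty lock set.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Shared counter and the mutex intended to protect it (hypothetical names). */
static pthread_mutex_t counter_mx = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Write access: lock, modify, unlock. */
static void counter_add(int delta) {
    pthread_mutex_lock(&counter_mx);
    counter += delta;
    pthread_mutex_unlock(&counter_mx);
}

/* Read access: takes the same lock, even though it only reads. */
static int counter_get(void) {
    int v;
    pthread_mutex_lock(&counter_mx);
    v = counter;
    pthread_mutex_unlock(&counter_mx);
    return v;
}

static void *worker(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        counter_add(1);
        (void)counter_get();
    }
    return NULL;
}

/* Starts two workers, waits for both, and returns the final count. */
int run_locked_counter_demo(int per_thread) {
    pthread_t a, b;
    int r;

    counter = 0;  /* safe: no worker thread exists yet */
    r = pthread_create(&a, NULL, worker, &per_thread);
    assert(r == 0);
    r = pthread_create(&b, NULL, worker, &per_thread);
    assert(r == 0);
    r = pthread_join(a, NULL);
    assert(r == 0);
    r = pthread_join(b, NULL);
    assert(r == 0);
    (void)r;
    return counter_get();
}
```

<p>Removing the lock and unlock calls from
<code class="computeroutput">counter_get</code> alone would leave the read
unprotected. A report of the kind shown earlier would then be expected, with
"Locks held: none" for the reading thread.</p>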
| 643 | </div> |
| 644 | </div> |
| 645 | <div class="sect1"> |
| 646 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 647 | <a name="hg-manual.effective-use"></a>7.5. Hints and Tips for Effective Use of Helgrind</h2></div></div></div> |
| 648 | <p>Helgrind can be very helpful in finding and resolving |
| 649 | threading-related problems. Like all sophisticated tools, it is most |
| 650 | effective when you understand how to play to its strengths.</p> |
| 651 | <p>Helgrind will be less effective when you merely throw an |
| 652 | existing threaded program at it and try to make sense of any reported |
| 653 | errors. It will be more effective if you design threaded programs |
| 654 | from the start in a way that helps Helgrind verify correctness. The |
| 655 | same is true for finding memory errors with Memcheck, but applies more |
| 656 | here, because thread checking is a harder problem. Consequently it is |
| 657 | much easier to write a correct program for which Helgrind falsely |
| 658 | reports (threading) errors than it is to write a correct program for |
| 659 | which Memcheck falsely reports (memory) errors.</p> |
| 660 | <p>With that in mind, here are some tips, listed most important first, |
| 661 | for getting reliable results and avoiding false errors. The first two |
| 662 | are critical. Any violations of them will swamp you with huge numbers |
| 663 | of false data-race errors.</p> |
| 664 | <div class="orderedlist"><ol class="orderedlist" type="1"> |
| 665 | <li class="listitem"> |
| 666 | <p>Make sure your application, and all the libraries it uses, |
| 667 | use the POSIX threading primitives. Helgrind needs to be able to |
| 668 | see all events pertaining to thread creation, exit, locking and |
| 669 | other synchronisation events. To do so it intercepts many POSIX |
| 670 | pthreads functions.</p> |
| 671 | <p>Do not roll your own threading primitives (mutexes, etc) |
| 672 | from combinations of the Linux futex syscall, atomic counters, etc. |
| 673 | These throw Helgrind's internal what's-going-on models |
| 674 | way off course and will give bogus results.</p> |
| 675 | <p>Also, do not reimplement existing POSIX abstractions using |
| 676 | other POSIX abstractions. For example, don't build your own |
| 677 | semaphore routines or reader-writer locks from POSIX mutexes and |
| 678 | condition variables. Instead use POSIX reader-writer locks and |
| 679 | semaphores directly, since Helgrind supports them directly.</p> |
| 680 | <p>Helgrind directly supports the following POSIX threading |
| 681 | abstractions: mutexes, reader-writer locks, condition variables |
| 682 | (but see below), semaphores and barriers. Currently spinlocks |
| 683 | are not supported, although they could be in future.</p> |
| 684 | <p>At the time of writing, the following popular Linux packages |
| 685 | are known to implement their own threading primitives:</p> |
| 686 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 687 | <li class="listitem"><p>Qt version 4.X. Qt 3.X is harmless in that it |
| 688 | only uses POSIX pthreads primitives. Unfortunately Qt 4.X |
| 689 | has its own implementation of mutexes (QMutex) and thread reaping. |
| 690 | Helgrind 3.4.x contains direct support |
| 691 | for Qt 4.X threading, which is experimental but is believed to |
| 692 | work fairly well. A side effect of supporting Qt 4 directly is |
| 693 | that Helgrind can be used to debug KDE4 applications. As this |
| 694 | is an experimental feature, we would particularly appreciate |
| 695 | feedback from folks who have used Helgrind to successfully debug |
| 696 | Qt 4 and/or KDE4 applications.</p></li> |
| 697 | <li class="listitem"> |
| 698 | <p>Runtime support library for GNU OpenMP (part of |
| 699 | GCC), at least for GCC versions 4.2 and 4.3. The GNU OpenMP runtime |
| 700 | library (<code class="filename">libgomp.so</code>) constructs its own |
| 701 | synchronisation primitives using combinations of atomic memory |
| 702 | instructions and the futex syscall, which causes total chaos in |
| 703 | Helgrind since it cannot "see" those.</p> |
| 704 | <p>Fortunately, this can be solved using a configuration-time |
| 705 | option (for GCC). Rebuild GCC from source, and configure using |
| 706 | <code class="varname">--disable-linux-futex</code>. |
| 707 | This makes libgomp.so use the standard |
| 708 | POSIX threading primitives instead. Note that this was tested |
| 709 | using GCC 4.2.3 and has not been re-tested using more recent GCC |
| 710 | versions. We would appreciate hearing about any successes or |
| 711 | failures with more recent versions.</p> |
| 712 | </li> |
| 713 | </ul></div> |
| 714 | <p>If you must implement your own threading primitives, there |
| 715 | is a set of client request macros |
| 716 | in <code class="computeroutput">helgrind.h</code> to help you |
| 717 | describe your primitives to Helgrind. You should be able to |
| 718 | mark up mutexes, condition variables, etc, without difficulty. |
| 719 | </p> |
| 720 | <p> |
| 721 | It is also possible to mark up the effects of thread-safe |
| 722 | reference counting using the |
| 723 | <code class="computeroutput">ANNOTATE_HAPPENS_BEFORE</code>, |
| 724 | <code class="computeroutput">ANNOTATE_HAPPENS_AFTER</code> and |
| 725 | <code class="computeroutput">ANNOTATE_HAPPENS_BEFORE_FORGET_ALL</code> |
| 726 | macros. Thread-safe reference counting using an atomically |
| 727 | incremented/decremented refcount variable causes Helgrind |
| 728 | problems because a one-to-zero transition of the reference count |
| 729 | means the accessing thread has exclusive ownership of the |
| 730 | associated resource (normally, a C++ object) and can therefore |
| 731 | access it (normally, to run its destructor) without locking. |
| 732 | Helgrind doesn't understand this, and markup is essential to |
| 733 | avoid false positives. |
| 734 | </p> |
| 735 | <p> |
| 736 | Here are recommended guidelines for marking up thread safe |
| 737 | reference counting in C++. You only need to mark up your |
| 738 | release methods -- the ones which decrement the reference count. |
| 739 | Given a class like this: |
| 740 | </p> |
| 741 | <pre class="programlisting"> |
| 742 | class MyClass { |
| 743 | unsigned int mRefCount; |
| 744 | |
| 745 | void Release ( void ) { |
| 746 | unsigned int newCount = atomic_decrement(&mRefCount); |
| 747 | if (newCount == 0) { |
| 748 | delete this; |
| 749 | } |
| 750 | } |
| 751 | }; |
| 752 | </pre> |
| 753 | <p> |
| 754 | the release method should be marked up as follows: |
| 755 | </p> |
| 756 | <pre class="programlisting"> |
| 757 | void Release ( void ) { |
| 758 | unsigned int newCount = atomic_decrement(&mRefCount); |
| 759 | if (newCount == 0) { |
| 760 | ANNOTATE_HAPPENS_AFTER(&mRefCount); |
| 761 | ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&mRefCount); |
| 762 | delete this; |
| 763 | } else { |
| 764 | ANNOTATE_HAPPENS_BEFORE(&mRefCount); |
| 765 | } |
| 766 | } |
| 767 | </pre> |
| 768 | <p> |
| 769 | There are a number of complex, mostly-theoretical objections to |
| 770 | this scheme. From a theoretical standpoint it appears to be |
| 771 | impossible to devise a markup scheme which is completely correct |
| 772 | in the sense of guaranteeing to remove all false races. The |
| 773 | proposed scheme however works well in practice. |
| 774 | </p> |
| 775 | </li> |
| 776 | <li class="listitem"> |
| 777 | <p>Avoid memory recycling. If you can't avoid it, you must |
| 778 | tell Helgrind what is going on via the |
| 779 | <code class="function">VALGRIND_HG_CLEAN_MEMORY</code> client request (in |
| 780 | <code class="computeroutput">helgrind.h</code>).</p> |
| 781 | <p>Helgrind is aware of standard heap memory allocation and |
| 782 | deallocation that occurs via |
| 783 | <code class="function">malloc</code>/<code class="function">free</code>/<code class="function">new</code>/<code class="function">delete</code> |
| 784 | and from entry and exit of stack frames. In particular, when memory is |
| 785 | deallocated via <code class="function">free</code>, <code class="function">delete</code>, |
| 786 | or function exit, Helgrind considers that memory clean, so when it is |
| 787 | eventually reallocated, its history is irrelevant.</p> |
| 788 | <p>However, it is common practice to implement memory recycling |
| 789 | schemes. In these, memory to be freed is not handed to |
| 790 | <code class="function">free</code>/<code class="function">delete</code>, but instead put |
| 791 | into a pool of free buffers to be handed out again as required. The |
| 792 | problem is that Helgrind has no |
| 793 | way to know that such memory is logically no longer in use, and |
| 794 | its history is irrelevant. Hence you must make that explicit, |
| 795 | using the <code class="function">VALGRIND_HG_CLEAN_MEMORY</code> client request |
| 796 | to specify the relevant address ranges. It's easiest to put these |
| 797 | requests into the pool manager code, and use them either when memory is |
| 798 | returned to the pool, or is allocated from it.</p> |
| 799 | </li> |
| 800 | <li class="listitem"> |
| 801 | <p>Avoid POSIX condition variables. If you can, use POSIX |
| 802 | semaphores (<code class="function">sem_t</code>, <code class="function">sem_post</code>, |
| 803 | <code class="function">sem_wait</code>) to do inter-thread event signalling. |
| 804 | Semaphores with an initial value of zero are particularly useful for |
| 805 | this.</p> |
| 806 | <p>Helgrind handles POSIX condition variables only partially |
| 807 | correctly. This is because Helgrind can see inter-thread |
| 808 | dependencies between a <code class="function">pthread_cond_wait</code> call and a |
| 809 | <code class="function">pthread_cond_signal</code>/<code class="function">pthread_cond_broadcast</code> |
| 810 | call only if the waiting thread actually gets to the rendezvous first |
| 811 | (so that it actually calls |
| 812 | <code class="function">pthread_cond_wait</code>). It can't see dependencies |
| 813 | between the threads if the signaller arrives first. In the latter case, |
| 814 | POSIX guidelines imply that the associated boolean condition still |
| 815 | provides an inter-thread synchronisation event, but one which is |
| 816 | invisible to Helgrind.</p> |
| 817 | <p>The result of Helgrind missing some inter-thread |
| 818 | synchronisation events is to cause it to report false positives. |
| 819 | </p> |
| 820 | <p>The root cause of this synchronisation lossage is |
| 821 | particularly hard to understand, so an example is helpful. It was |
| 822 | discussed at length by Arndt Muehlenfeld ("Runtime Race Detection |
| 823 | in Multi-Threaded Programs", Dissertation, TU Graz, Austria). The |
| 824 | canonical POSIX-recommended usage scheme for condition variables |
| 825 | is as follows:</p> |
| 826 | <pre class="programlisting"> |
| 827 | b is a Boolean condition, which is False most of the time |
| 828 | cv is a condition variable |
| 829 | mx is its associated mutex |
| 830 | |
| 831 | Signaller: Waiter: |
| 832 | |
| 833 | lock(mx) lock(mx) |
| 834 | b = True while (b == False) |
| 835 | signal(cv) wait(cv,mx) |
| 836 | unlock(mx) unlock(mx) |
| 837 | </pre> |
| 838 | <p>Assume <code class="computeroutput">b</code> is False most of |
| 839 | the time. If the waiter arrives at the rendezvous first, it |
| 840 | enters its while-loop, waits for the signaller to signal, and |
| 841 | eventually proceeds. Helgrind sees the signal, notes the |
| 842 | dependency, and all is well.</p> |
| 843 | <p>If the signaller arrives |
| 844 | first, <code class="computeroutput">b</code> is set to true, and the |
| 845 | signal disappears into nowhere. When the waiter later arrives, it |
| 846 | does not enter its while-loop and simply carries on. But even in |
| 847 | this case, the waiter code following the while-loop cannot execute |
| 848 | until the signaller sets <code class="computeroutput">b</code> to |
| 849 | True. Hence there is still the same inter-thread dependency, but |
| 850 | this time it is through an arbitrary in-memory condition, and |
| 851 | Helgrind cannot see it.</p> |
| 852 | <p>By comparison, Helgrind's detection of inter-thread |
| 853 | dependencies caused by semaphore operations is believed to be |
| 854 | exactly correct.</p> |
| 855 | <p>As far as I know, a solution to this problem that does not |
| 856 | require source-level annotation of condition-variable wait loops |
| 857 | is beyond the current state of the art.</p> |
| 858 | </li> |
| 859 | <li class="listitem"><p>Make sure you are using a supported Linux distribution. At |
| 860 | present, Helgrind only properly supports glibc-2.3 or later. This |
| 861 | in turn means we only support glibc's NPTL threading |
| 862 | implementation. The old LinuxThreads implementation is not |
| 863 | supported.</p></li> |
| 864 | <li class="listitem"><p>If your application is using thread local variables, |
| 865 | Helgrind might report false positive race conditions on these |
| 866 | variables, even though they are very probably race free. On Linux, you can |
| 867 | use <code class="option">--sim-hints=deactivate-pthread-stack-cache-via-hack</code> |
| 868 | to avoid such false positive error messages |
| 869 | (see <a class="xref" href="manual-core.html#opt.sim-hints">--sim-hints</a>). |
| 870 | </p></li> |
| 871 | <li class="listitem"> |
| 872 | <p>Round up all finished threads using |
| 873 | <code class="function">pthread_join</code>. Avoid |
| 874 | detaching threads: don't create threads in the detached state, and |
| 875 | don't call <code class="function">pthread_detach</code> on existing threads.</p> |
| 876 | <p>Using <code class="function">pthread_join</code> to round up finished |
| 877 | threads provides a clear synchronisation point that both Helgrind and |
| 878 | programmers can see. If you don't call |
| 879 | <code class="function">pthread_join</code> on a thread, Helgrind has no way to |
| 880 | know when it finishes, relative to any |
| 881 | significant synchronisation points for other threads in the program. So |
| 882 | it assumes that the thread lingers indefinitely and can potentially |
| 883 | interfere indefinitely with the memory state of the program. It |
| 884 | has every right to assume that -- after all, it might really be |
| 885 | the case that, for scheduling reasons, the exiting thread did run |
| 886 | very slowly in the last stages of its life.</p> |
| 887 | </li> |
| 888 | <li class="listitem"> |
| 889 | <p>Perform thread debugging (with Helgrind) and memory |
| 890 | debugging (with Memcheck) together.</p> |
| 891 | <p>Helgrind tracks the state of memory in detail, and memory |
| 892 | management bugs in the application are liable to cause confusion. |
| 893 | In extreme cases, applications which do many invalid reads and |
| 894 | writes (particularly to freed memory) have been known to crash |
| 895 | Helgrind. So, ideally, you should make your application |
| 896 | Memcheck-clean before using Helgrind.</p> |
| 897 | <p>It may be impossible to make your application Memcheck-clean |
| 898 | unless you first remove threading bugs. In particular, it may be |
| 899 | difficult to remove all reads and writes to freed memory in |
| 900 | multithreaded C++ destructor sequences at program termination. |
| 901 | So, ideally, you should make your application Helgrind-clean |
| 902 | before using Memcheck.</p> |
| 903 | <p>Since this circularity is obviously unresolvable, at least |
| 904 | bear in mind that Memcheck and Helgrind are to some extent |
| 905 | complementary, and you may need to use them together.</p> |
| 906 | </li> |
| 907 | <li class="listitem"> |
| 908 | <p>POSIX requires that implementations of standard I/O |
| 909 | (<code class="function">printf</code>, <code class="function">fprintf</code>, |
| 910 | <code class="function">fwrite</code>, <code class="function">fread</code>, etc) are thread |
| 911 | safe. Unfortunately GNU libc implements this by using internal locking |
| 912 | primitives that Helgrind is unable to intercept. Consequently Helgrind |
| 913 | generates many false race reports when you use these functions.</p> |
| 914 | <p>Helgrind attempts to hide these errors using the standard |
| 915 | Valgrind error-suppression mechanism. So, at least for simple |
| 916 | test cases, you don't see any. Nevertheless, some may slip |
| 917 | through. Just something to be aware of.</p> |
| 918 | </li> |
| 919 | <li class="listitem"> |
| 920 | <p>Helgrind's error checks do not work properly inside the |
| 921 | system threading library itself |
| 922 | (<code class="computeroutput">libpthread.so</code>), and it usually |
| 923 | observes large numbers of (false) errors in there. Valgrind's |
| 924 | suppression system then filters these out, so you should not see |
| 925 | them.</p> |
| 926 | <p>If you see any race errors reported |
| 927 | where <code class="computeroutput">libpthread.so</code> or |
| 928 | <code class="computeroutput">ld.so</code> is the object associated |
| 929 | with the innermost stack frame, please file a bug report at |
| 930 | <a class="ulink" href="http://www.valgrind.org/" target="_top">http://www.valgrind.org/</a>. |
| 931 | </p> |
| 932 | </li> |
| 933 | </ol></div> |
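<p>Tip 3 above recommends POSIX semaphores over condition variables for
inter-thread event signalling. A minimal sketch of that pattern follows,
using a semaphore with an initial value of zero. The names
<code class="computeroutput">ready</code>,
<code class="computeroutput">payload</code>,
<code class="computeroutput">producer</code> and
<code class="computeroutput">run_semaphore_handoff</code> are hypothetical.
Unlike the condition-variable scheme, the dependency does not rely on which
thread reaches the rendezvous first: the post is remembered by the semaphore,
so the waiter still passes through <code class="function">sem_wait</code>.</p>

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t ready;   /* starts at zero: "payload not yet available" */
static int payload;   /* plain data handed from producer to consumer */

static void *producer(void *arg) {
    payload = *(int *)arg;  /* write the data first ... */
    sem_post(&ready);       /* ... then publish it with a post */
    return NULL;
}

/* Hands 'value' from a producer thread to the caller via the semaphore. */
int run_semaphore_handoff(int value) {
    pthread_t tid;
    int r, seen;

    r = sem_init(&ready, 0, 0);  /* not shared between processes, count 0 */
    assert(r == 0);
    r = pthread_create(&tid, NULL, producer, &value);
    assert(r == 0);

    /* Wait for the post, retrying if interrupted by a signal. */
    while ((r = sem_wait(&ready)) == -1 && errno == EINTR)
        continue;
    assert(r == 0);

    seen = payload;  /* ordered after the producer's write by post/wait */

    r = pthread_join(tid, NULL);  /* tip 5: reap the thread explicitly */
    assert(r == 0);
    r = sem_destroy(&ready);
    assert(r == 0);
    (void)r;
    return seen;
}
```

<p>The sketch also follows the <code class="function">pthread_join</code>
tip: the producer is joined rather than detached, which gives Helgrind a
clear point after which the thread can no longer touch
<code class="computeroutput">payload</code>.</p>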
| 934 | </div> |
| 935 | <div class="sect1"> |
| 936 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 937 | <a name="hg-manual.options"></a>7.6. Helgrind Command-line Options</h2></div></div></div> |
| 938 | <p>The following end-user options are available:</p> |
| 939 | <div class="variablelist"> |
| 940 | <a name="hg.opts.list"></a><dl class="variablelist"> |
| 941 | <dt> |
| 942 | <a name="opt.free-is-write"></a><span class="term"> |
| 943 | <code class="option">--free-is-write=no|yes |
| 944 | [default: no] </code> |
| 945 | </span> |
| 946 | </dt> |
| 947 | <dd> |
| 948 | <p>When enabled (not the default), Helgrind treats freeing of |
| 949 | heap memory as if the memory was written immediately before |
| 950 | the free. This exposes races where memory is referenced by |
| 951 | one thread, and freed by another, but there is no observable |
| 952 | synchronisation event to ensure that the reference happens |
| 953 | before the free. |
| 954 | </p> |
| 955 | <p>This functionality is new in Valgrind 3.7.0, and is |
| 956 | regarded as experimental. It is not enabled by default |
| 957 | because its interaction with custom memory allocators is not |
| 958 | well understood at present. User feedback is welcomed. |
| 959 | </p> |
| 960 | </dd> |
| 961 | <dt> |
| 962 | <a name="opt.track-lockorders"></a><span class="term"> |
| 963 | <code class="option">--track-lockorders=no|yes |
| 964 | [default: yes] </code> |
| 965 | </span> |
| 966 | </dt> |
| 967 | <dd><p>When enabled (the default), Helgrind performs lock order |
| 968 | consistency checking. For some buggy programs, the large number |
| 969 | of lock order errors reported can become annoying, particularly |
| 970 | if you're only interested in race errors. You may therefore find |
| 971 | it helpful to disable lock order checking.</p></dd> |
| 972 | <dt> |
| 973 | <a name="opt.history-level"></a><span class="term"> |
| 974 | <code class="option">--history-level=none|approx|full |
| 975 | [default: full] </code> |
| 976 | </span> |
| 977 | </dt> |
| 978 | <dd> |
| 979 | <p><code class="option">--history-level=full</code> (the default) causes |
| 980 | Helgrind to collect enough information about "old" accesses that |
| 981 | it can produce two stack traces in a race report -- both the |
| 982 | stack trace for the current access, and the trace for the |
| 983 | older, conflicting access. To limit memory usage, stack traces for |
| 984 | "old" accesses are limited to a maximum of 8 entries, even if the |
| 985 | <code class="option">--num-callers</code> value is larger.</p> |
| 986 | <p>Collecting such information is expensive in both speed and |
| 987 | memory, particularly for programs that do many inter-thread |
| 988 | synchronisation events (locks, unlocks, etc). Without such |
| 989 | information, it is more difficult to track down the root |
| 990 | causes of races. Nonetheless, you may not need it in |
| 991 | situations where you just want to check for the presence or |
| 992 | absence of races, for example, when doing regression testing |
| 993 | of a previously race-free program.</p> |
| 994 | <p><code class="option">--history-level=none</code> is the opposite |
| 995 | extreme. It causes Helgrind not to collect any information |
| 996 | about previous accesses. This can be dramatically faster |
| 997 | than <code class="option">--history-level=full</code>.</p> |
| 998 | <p><code class="option">--history-level=approx</code> provides a |
| 999 | compromise between these two extremes. It causes Helgrind to |
| 1000 | show a full trace for the later access, and approximate |
| 1001 | information regarding the earlier access. This approximate |
| 1002 | information consists of two stacks, and the earlier access is |
| 1003 | guaranteed to have occurred somewhere between program points |
| 1004 | denoted by the two stacks. This is not as useful as showing |
| 1005 | the exact stack for the previous access |
| 1006 | (as <code class="option">--history-level=full</code> does), but it is |
| 1007 | better than nothing, and it is almost as fast as |
| 1008 | <code class="option">--history-level=none</code>.</p> |
| 1009 | </dd> |
| 1010 | <dt> |
| 1011 | <a name="opt.conflict-cache-size"></a><span class="term"> |
| 1012 | <code class="option">--conflict-cache-size=N |
| 1013 | [default: 1000000] </code> |
| 1014 | </span> |
| 1015 | </dt> |
| 1016 | <dd> |
| 1017 | <p>This flag only has any effect |
| 1018 | at <code class="option">--history-level=full</code>.</p> |
| 1019 | <p>Information about "old" conflicting accesses is stored in |
| 1020 | a cache of limited size, with LRU-style management. This is |
| 1021 | necessary because it isn't practical to store a stack trace |
| 1022 | for every single memory access made by the program. |
| 1023 | Historical information on not recently accessed locations is |
| 1024 | periodically discarded, to free up space in the cache.</p> |
| 1025 | <p>This option controls the size of the cache, in terms of the |
| 1026 | number of different memory addresses for which |
| 1027 | conflicting access information is stored. If you find that |
| 1028 | Helgrind is showing race errors with only one stack instead of |
| 1029 | the expected two stacks, try increasing this value.</p> |
| 1030 | <p>The minimum value is 10,000 and the maximum is 30,000,000 |
| 1031 | (thirty times the default value). Increasing the value by 1 |
| 1032 | increases Helgrind's memory requirement by very roughly 100 |
| 1033 | bytes, so the maximum value will easily eat up three extra |
| 1034 | gigabytes or so of memory.</p> |
| 1035 | </dd> |
| 1036 | <dt> |
| 1037 | <a name="opt.check-stack-refs"></a><span class="term"> |
| 1038 | <code class="option">--check-stack-refs=no|yes |
| 1039 | [default: yes] </code> |
| 1040 | </span> |
| 1041 | </dt> |
| 1042 | <dd><p> |
| 1043 | By default Helgrind checks all data memory accesses made by your |
| 1044 | program. This flag enables you to skip checking for accesses |
| 1045 | to thread stacks (local variables). This can improve |
| 1046 | performance, but comes at the cost of missing races on |
| 1047 | stack-allocated data. |
| 1048 | </p></dd> |
| 1049 | <dt> |
| 1050 | <a name="opt.ignore-thread-creation"></a><span class="term"> |
| 1051 | <code class="option">--ignore-thread-creation=<yes|no> |
| 1052 | [default: no]</code> |
| 1053 | </span> |
| 1054 | </dt> |
| 1055 | <dd> |
| 1056 | <p> |
| 1057 | Controls whether all activities during thread creation should be |
| 1058 | ignored. This is enabled by default only on Solaris. |
| 1059 | Solaris provides higher throughput, parallelism and scalability than |
| 1060 | other operating systems, at the cost of more fine-grained locking |
| 1061 | activity. For example, when a thread is created under glibc, just |
| 1062 | one big lock is used for all thread setup, whereas Solaris libc |
| 1063 | uses several fine-grained locks and the creator thread resumes its |
| 1064 | activities as soon as possible, leaving, for example, the stack and TLS |
| 1065 | setup sequence to the created thread. |
| 1066 | This confuses Helgrind, which assumes a false ordering between the |
| 1067 | creator and created threads, so many types of race conditions in the |
| 1068 | application would not be reported. |
| 1069 | To prevent such false ordering, this command line option is set to |
| 1070 | <code class="computeroutput">yes</code> by default on Solaris. |
| 1071 | All activity (loads, stores, client requests) is therefore ignored |
| 1072 | during:</p> |
| 1073 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 1074 | <li class="listitem"><p> |
| 1075 | pthread_create() call in the creator thread |
| 1076 | </p></li> |
| 1077 | <li class="listitem"><p> |
| 1078 | thread creation phase (stack and TLS setup) in the created thread |
| 1079 | </p></li> |
| 1080 | </ul></div> |
| 1081 | <p> |
| 1082 | Also, new memory allocated during thread creation is untracked; |
| 1083 | that is, race reporting is suppressed there. DRD does the same thing |
| 1084 | implicitly. This is necessary because Solaris libc caches many objects |
| 1085 | and reuses them for different threads and that confuses |
| 1086 | Helgrind.</p> |
| 1087 | </dd> |
| 1088 | </dl> |
| 1089 | </div> |
| 1090 | </div> |
| 1091 | <div class="sect1"> |
| 1092 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 1093 | <a name="hg-manual.monitor-commands"></a>7.7. Helgrind Monitor Commands</h2></div></div></div> |
| 1094 | <p>The Helgrind tool provides monitor commands handled by Valgrind's |
| 1095 | built-in gdbserver (see <a class="xref" href="manual-core-adv.html#manual-core-adv.gdbserver-commandhandling" title="3.2.5. Monitor command handling by the Valgrind gdbserver">Monitor command handling by the Valgrind gdbserver</a>). |
| 1096 | </p> |
| 1097 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 1098 | <li class="listitem"> |
| 1099 | <p><code class="varname">info locks [lock_addr]</code> shows the list of locks |
| 1100 | and their status. If <code class="varname">lock_addr</code> is given, only shows |
| 1101 | the lock located at this address. </p> |
| 1102 | <p> |
| 1103 | In the following example, Helgrind knows about one lock. This |
| 1104 | lock is located at the guest address <code class="varname">ga |
| 1105 | 0x8049a20</code>. The lock kind is <code class="varname">rdwr</code> |
| 1106 | indicating a reader-writer lock. Other possible lock kinds |
| 1107 | are <code class="varname">nonRec</code> (simple mutex, non recursive) |
| 1108 | and <code class="varname">mbRec</code> (simple mutex, possibly recursive). |
| 1109 | The lock kind is then followed by the list of threads holding the |
| 1110 | lock. In the example below, <code class="varname">R1:thread #6 tid 3</code> |
| 1111 | indicates that Helgrind thread #6 has acquired (once, as the |
| 1112 | counter following the letter R is one) the lock in read mode. The |
| 1113 | Helgrind thread number is incremented for each started thread. The |
| 1114 | presence of 'tid 3' indicates that thread #6 has not exited |
| 1115 | yet and is Valgrind tid 3. If a thread has terminated, then |
| 1116 | this is indicated with 'tid (exited)'. |
| 1117 | </p> |
| 1118 | <pre class="programlisting"> |
| 1119 | (gdb) monitor info locks |
| 1120 | Lock ga 0x8049a20 { |
| 1121 | kind rdwr |
| 1122 | { R1:thread #6 tid 3 } |
| 1123 | } |
| 1124 | (gdb) |
| 1125 | </pre> |
| 1126 | <p> If you give the option <code class="varname">--read-var-info=yes</code>, |
| 1127 | then more information will be provided about the lock location, such as |
| 1128 | the global variable or the heap block that contains the lock: |
| 1129 | </p> |
| 1130 | <pre class="programlisting"> |
| 1131 | Lock ga 0x8049a20 { |
| 1132 | Location 0x8049a20 is 0 bytes inside global var "s_rwlock" |
| 1133 | declared at rwlock_race.c:17 |
| 1134 | kind rdwr |
| 1135 | { R1:thread #3 tid 3 } |
| 1136 | } |
| 1137 | </pre> |
| 1138 | </li> |
| 1139 | <li class="listitem"> |
| 1140 | <p><code class="varname">accesshistory <addr> [<len>]</code> |
| 1141 | shows the access history recorded for <len> (default 1) bytes |
| 1142 | starting at <addr>. For each recorded access that overlaps |
| 1143 | with the given range, <code class="varname">accesshistory</code> shows the operation |
| 1144 | type (read or write), the address and size read or written, the Helgrind |
| 1145 | thread number and Valgrind tid that did the operation, and the locks held |
| 1146 | by the thread at the time of the operation. |
| 1147 | The oldest access is shown first and the most recent access last. |
| 1148 | </p> |
| 1149 | <p> |
| 1150 | In the following example, we first see a recorded write of 4 bytes by |
| 1151 | thread #7 that modified the given 2-byte range. |
| 1152 | The second recorded write is the most recent one: thread #9 |
| 1153 | modified the same 2 bytes as part of a 4-byte write operation. |
| 1154 | The list of locks held by each thread at the time of the write operation |
| 1155 | is also shown. |
| 1156 | </p> |
| 1157 | <pre class="programlisting"> |
| 1158 | (gdb) monitor accesshistory 0x8049D8A 2 |
| 1159 | write of size 4 at 0x8049D88 by thread #7 tid 3 |
| 1160 | ==6319== Locks held: 2, at address 0x8049D8C (and 1 that can't be shown) |
| 1161 | ==6319== at 0x804865F: child_fn1 (locked_vs_unlocked2.c:29) |
| 1162 | ==6319== by 0x400AE61: mythread_wrapper (hg_intercepts.c:234) |
| 1163 | ==6319== by 0x39B924: start_thread (pthread_create.c:297) |
| 1164 | ==6319== by 0x2F107D: clone (clone.S:130) |
| 1165 | |
| 1166 | write of size 4 at 0x8049D88 by thread #9 tid 2 |
| 1167 | ==6319== Locks held: 2, at addresses 0x8049DA4 0x8049DD4 |
| 1168 | ==6319== at 0x804877B: child_fn2 (locked_vs_unlocked2.c:45) |
| 1169 | ==6319== by 0x400AE61: mythread_wrapper (hg_intercepts.c:234) |
| 1170 | ==6319== by 0x39B924: start_thread (pthread_create.c:297) |
| 1171 | ==6319== by 0x2F107D: clone (clone.S:130) |
| 1172 | |
| 1173 | </pre> |
| 1174 | </li> |
| 1175 | </ul></div> |
| 1176 | </div> |
| 1177 | <div class="sect1"> |
| 1178 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 1179 | <a name="hg-manual.client-requests"></a>7.8. Helgrind Client Requests</h2></div></div></div> |
| 1180 | <p>The following client requests are defined in |
| 1181 | <code class="filename">helgrind.h</code>. See that file for exact details of their |
| 1182 | arguments.</p> |
| 1183 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 1184 | <li class="listitem"> |
| 1185 | <p><code class="function">VALGRIND_HG_CLEAN_MEMORY</code></p> |
| 1186 | <p>This makes Helgrind forget everything it knows about a |
| 1187 | specified memory range. This is particularly useful for memory |
| 1188 | allocators that wish to recycle memory.</p> |
| 1189 | </li> |
| 1190 | <li class="listitem"><p><code class="function">ANNOTATE_HAPPENS_BEFORE</code></p></li> |
| 1191 | <li class="listitem"><p><code class="function">ANNOTATE_HAPPENS_AFTER</code></p></li> |
| 1192 | <li class="listitem"><p><code class="function">ANNOTATE_NEW_MEMORY</code></p></li> |
| 1193 | <li class="listitem"><p><code class="function">ANNOTATE_RWLOCK_CREATE</code></p></li> |
| 1194 | <li class="listitem"><p><code class="function">ANNOTATE_RWLOCK_DESTROY</code></p></li> |
| 1195 | <li class="listitem"><p><code class="function">ANNOTATE_RWLOCK_ACQUIRED</code></p></li> |
| 1196 | <li class="listitem"> |
| 1197 | <p><code class="function">ANNOTATE_RWLOCK_RELEASED</code></p> |
| 1198 | <p>These are used to describe to Helgrind the behaviour of |
| 1199 | custom (non-POSIX) synchronisation primitives, which it otherwise |
| 1200 | has no way to understand. See comments |
| 1201 | in <code class="filename">helgrind.h</code> for further |
| 1202 | documentation.</p> |
| 1203 | </li> |
| 1204 | </ul></div> |
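<p>As a rough illustration of how the happens-before annotations might be
applied, the sketch below marks the hand-off point of a one-shot mailbox
built on a C11 atomic flag. The <code class="varname">mailbox</code> type and
its two functions are hypothetical, and the include path
<code class="filename">valgrind/helgrind.h</code> is an assumption about where the
header is installed; when it is not found, the macros fall back to no-ops so
that the example still compiles. Only the single-argument forms of
<code class="function">ANNOTATE_HAPPENS_BEFORE</code> and
<code class="function">ANNOTATE_HAPPENS_AFTER</code> are used.</p>

```c
#include <assert.h>
#include <stdatomic.h>

/* Use the real macros when helgrind.h is available; otherwise define
   no-op fallbacks. The include path is an assumption. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef ANNOTATE_HAPPENS_BEFORE
#  define ANNOTATE_HAPPENS_BEFORE(obj) ((void)(obj))
#  define ANNOTATE_HAPPENS_AFTER(obj)  ((void)(obj))
#endif

/* Hypothetical one-shot mailbox: one producer publishes a value,
   one consumer polls for it. */
struct mailbox {
    int payload;        /* plain data, ordered only by the flag */
    atomic_int ready;   /* 0 = empty, 1 = payload is valid */
};

void mailbox_publish(struct mailbox *mb, int value)
{
    mb->payload = value;
    /* Tell Helgrind that everything done so far by this thread
       happens before whoever later "receives" on &mb->ready. */
    ANNOTATE_HAPPENS_BEFORE(&mb->ready);
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Returns 1 and fills *out if a value was published, else 0. */
int mailbox_try_consume(struct mailbox *mb, int *out)
{
    if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0)
        return 0;
    /* Pair with the annotation in mailbox_publish. */
    ANNOTATE_HAPPENS_AFTER(&mb->ready);
    *out = mb->payload;
    return 1;
}
```

<p>The annotations take the address of the flag as the tag that links the
two sides. This sketch only covers the ordering of
<code class="varname">payload</code>; consult the comments in
<code class="filename">helgrind.h</code> before relying on it for a real
primitive.</p>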
| 1205 | </div> |
| 1206 | <div class="sect1"> |
| 1207 | <div class="titlepage"><div><div><h2 class="title" style="clear: both"> |
| 1208 | <a name="hg-manual.todolist"></a>7.9. A To-Do List for Helgrind</h2></div></div></div> |
| 1209 | <p>The following is a list of loose ends which should be tidied up |
| 1210 | some time.</p> |
| 1211 | <div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "> |
| 1212 | <li class="listitem"><p>For lock order errors, print the complete lock |
| 1213 | cycle, rather than only doing so for size-2 cycles, as at |
| 1214 | present.</p></li> |
| 1215 | <li class="listitem"><p>The conflicting access mechanism sometimes |
| 1216 | mysteriously fails to show the conflicting access' stack, even |
| 1217 | when provided with unbounded storage for conflicting access info. |
| 1218 | This should be investigated.</p></li> |
| 1219 | <li class="listitem"><p>Document races caused by GCC's thread-unsafe code |
| 1220 | generation for speculative stores. In the interim see |
| 1221 | <code class="computeroutput">http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html |
| 1222 | </code> |
| 1223 | and <code class="computeroutput">http://lkml.org/lkml/2007/10/24/673</code>. |
| 1224 | </p></li> |
| 1225 | <li class="listitem"><p>Don't update the lock-order graph, and don't check |
| 1226 | for errors, when a "try"-style lock operation happens (e.g. |
| 1227 | <code class="function">pthread_mutex_trylock</code>). Such calls do not add any real |
| 1228 | restrictions to the locking order, since they can always fail to |
| 1229 | acquire the lock, resulting in the caller going off and doing Plan |
| 1230 | B (presumably it will have a Plan B). Doing such checks could |
| 1231 | generate false lock-order errors and confuse users.</p></li> |
| 1232 | <li class="listitem"><p> Performance can be very poor. Slowdowns on the |
| 1233 | order of 100:1 are not unusual. There is limited scope for |
| 1234 | performance improvements. |
| 1235 | </p></li> |
| 1236 | </ul></div> |
| 1237 | </div> |
| 1238 | </div> |
| 1239 | <div> |
| 1240 | <br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer"> |
| 1241 | <tr> |
| 1242 | <td rowspan="2" width="40%" align="left"> |
| 1243 | <a accesskey="p" href="cl-manual.html"><< 6. Callgrind: a call-graph generating cache and branch prediction profiler</a> </td> |
| 1244 | <td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td> |
| 1245 | <td rowspan="2" width="40%" align="right"> <a accesskey="n" href="drd-manual.html">8. DRD: a thread error detector >></a> |
| 1246 | </td> |
| 1247 | </tr> |
| 1248 | <tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr> |
| 1249 | </table> |
| 1250 | </div> |
| 1251 | </body> |
| 1252 | </html> |