Doc updates for 3.8.0.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@12838 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/docs/xml/manual-core-adv.xml b/docs/xml/manual-core-adv.xml
index d3c43e8..f30b7d5 100644
--- a/docs/xml/manual-core-adv.xml
+++ b/docs/xml/manual-core-adv.xml
@@ -728,10 +728,10 @@
the upper part starts with an <computeroutput>y</computeroutput>
and has an <computeroutput>h</computeroutput> before the shadow postfix.
</para>
-<para>The special presentation of the AVX shadow registers is due
-to the fact that GDB retrieves independently the lower and upper half
-of the <computeroutput>ymm</computeroutput> registers. GDB however
-does not know that the shadow half registers have to be shown combined.
+<para>The special presentation of the AVX shadow registers is due to
+the fact that GDB independently retrieves the lower and upper half of
+the <computeroutput>ymm</computeroutput> registers. GDB does not,
+however, know that the shadow half registers have to be shown combined.
</para>
</sect2>
@@ -1716,7 +1716,8 @@
<function>malloc</function> etc safely from within wrappers.
</para>
-<para>The above comments are true for {x86,amd64,ppc32,arm}-linux. On
+<para>The above comments are true for {x86,amd64,ppc32,arm,mips32,s390}-linux.
+On
ppc64-linux function wrapping is more fragile due to the (arguably
poorly designed) ppc64-linux ABI. This mandates the use of a shadow
stack which tracks entries/exits of both wrapper and replacement
@@ -1727,7 +1728,8 @@
possible to a limited depth, beyond which Valgrind has to abort the
run. This depth is currently 16 calls.</para>
-<para>For all platforms ({x86,amd64,ppc32,ppc64,arm}-linux) all the above
+<para>For all platforms ({x86,amd64,ppc32,ppc64,arm,mips32,s390}-linux)
+all the above
comments apply on a per-thread basis. In other words, wrapping is
thread-safe: each thread must individually observe the above
restrictions, but there is no need for any kind of inter-thread
diff --git a/docs/xml/manual-core.xml b/docs/xml/manual-core.xml
index fe11c17..109ad06 100644
--- a/docs/xml/manual-core.xml
+++ b/docs/xml/manual-core.xml
@@ -1436,18 +1436,19 @@
<option><![CDATA[--redzone-size=<number> [default: depends on the tool] ]]></option>
</term>
<listitem>
- <para> Valgrind's <function>malloc, realloc,</function> etc, add padding
- blocks before and after each block allocated for the client. Such padding
- blocks are called redzones.
- The default value for the redzone size depends on the tool.
- For example, Memcheck adds and protects a minimum of 16 bytes before and
- after each block allocated by the client to detect block overrun or
- underrun.
+ <para> Valgrind's <function>malloc, realloc,</function> etc., add
+ padding blocks before and after each heap block allocated by the
+ program being run. Such padding blocks are called redzones. The
+ default value for the redzone size depends on the tool. For
+ example, Memcheck adds and protects a minimum of 16 bytes before
+ and after each block allocated by the client. This allows it to
+ detect block underruns or overruns of up to 16 bytes.
</para>
- <para>Increasing the redzone size allows to detect more cases of
- blocks overrun or underrun. Decreasing the redzone size will
- reduce the memory needed by Valgrind but reduces the chance to
- detect block overrun/underrun.</para>
+ <para>Increasing the redzone size makes it possible to detect
+ overruns of larger distances, but increases the amount of memory
+ used by Valgrind. Decreasing the redzone size will reduce the
+ memory needed by Valgrind but also reduces the chances of
+ detecting over/underruns, so is not recommended.</para>
</listitem>
</varlistentry>
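As a usage sketch of the option documented above (the program name `./myprog` is illustrative), the redzone around each heap block could be enlarged like this:

```shell
# Hypothetical program name. Enlarge the padding around each heap block
# from Memcheck's 16-byte minimum to 128 bytes, so overruns or underruns
# of up to 128 bytes can be detected, at the cost of extra memory.
valgrind --tool=memcheck --redzone-size=128 ./myprog
```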
@@ -1463,7 +1464,7 @@
<!-- start of xi:include in the manpage -->
<para id="uncommon.opts.para">These options apply to all tools, as they
affect certain obscure workings of the Valgrind core. Most people won't
-need to use these.</para>
+need to use them.</para>
<variablelist id="uncommon.opts.list">
@@ -1514,14 +1515,14 @@
takes advantage of this observation, limiting the overhead of
checking to code which is likely to be JIT generated.</para>
- <para>Some architectures (including ppc32, ppc64 and ARM) require
- programs which create code at runtime to flush the instruction
- cache in between code generation and first use. Valgrind
- observes and honours such instructions. Hence, on ppc32/Linux,
- ppc64/Linux and ARM/Linux, Valgrind always provides complete, transparent
- support for self-modifying code. It is only on platforms such as
- x86/Linux, AMD64/Linux, x86/Darwin and AMD64/Darwin
- that you need to use this option.</para>
+ <para>Some architectures (including ppc32, ppc64, ARM and MIPS)
+ require programs which create code at runtime to flush the
+ instruction cache in between code generation and first use.
+ Valgrind observes and honours such instructions. Hence, on
+ ppc32/Linux, ppc64/Linux, ARM/Linux and MIPS/Linux, Valgrind
+ always provides complete, transparent support for self-modifying
+ code. It is only on platforms such as x86/Linux, AMD64/Linux,
+ x86/Darwin and AMD64/Darwin that you need to use this option.</para>
</listitem>
</varlistentry>
@@ -1693,33 +1694,39 @@
<option><![CDATA[--fair-sched=<no|yes|try> [default: no] ]]></option>
</term>
- <listitem> <para>The <option>--fair-sched</option> controls the
- locking mechanism used by Valgrind to serialise thread
- execution. The locking mechanism differs in the way the threads
- are scheduled, giving a different trade-off between fairness and
- performance. For more details about the Valgrind thread
- serialisation principle and its impact on performance and thread
- scheduling, see <xref linkend="manual-core.pthreads_perf_sched"/>.
+ <listitem> <para>The <option>--fair-sched</option> option controls
+ the locking mechanism used by Valgrind to serialise thread
+ execution. The locking mechanism controls the way the threads
+ are scheduled, and different settings give different trade-offs
+ between fairness and performance. For more details about the
+ Valgrind thread serialisation scheme and its impact on
+ performance and thread scheduling, see
+ <xref linkend="manual-core.pthreads_perf_sched"/>.
<itemizedlist>
<listitem> <para>The value <option>--fair-sched=yes</option>
- activates a fair scheduling. Basically, if multiple threads are
+ activates a fair scheduler. In short, if multiple threads are
ready to run, the threads will be scheduled in a round robin
fashion. This mechanism is not available on all platforms or
- linux versions. If not available,
+ Linux versions. If not available,
using <option>--fair-sched=yes</option> will cause Valgrind to
terminate with an error.</para>
+ <para>You may find this setting improves overall
+ responsiveness if you are running an interactive
+ multithreaded program, for example a web browser, on
+ Valgrind.</para>
</listitem>
<listitem> <para>The value <option>--fair-sched=try</option>
- activates the fair scheduling if available on the
- platform. Otherwise, it will automatically fallback
+ activates fair scheduling if available on the
+ platform. Otherwise, it will automatically fall back
to <option>--fair-sched=no</option>.</para>
</listitem>
<listitem> <para>The value <option>--fair-sched=no</option> activates
- a scheduling mechanism which does not guarantee fairness
- between threads ready to run.</para>
+ a scheduler which does not guarantee fairness
+ between threads ready to run, but which in general gives the
+ highest performance.</para>
</listitem>
</itemizedlist>
</para></listitem>
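The trade-offs listed above suggest requesting fair scheduling opportunistically; a minimal sketch (program name illustrative):

```shell
# Ask for the fair (round-robin) scheduler where the platform supports
# it, silently falling back to the default pipe based lock elsewhere.
valgrind --fair-sched=try ./my-threaded-app
```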
@@ -1813,10 +1820,10 @@
<option><![CDATA[--soname-synonyms=syn1=pattern1,syn2=pattern2,...]]></option>
</term>
<listitem>
- <para>When a shared library is loaded, Valgrind examines if some
- functions of this library must be replaced or wrapped.
- For example, memcheck is replacing the malloc related
- functions (malloc, free, calloc, ...).
+ <para>When a shared library is loaded, Valgrind checks for
+ functions in the library that must be replaced or wrapped.
+ For example, Memcheck replaces all malloc related
+ functions (malloc, free, calloc, ...) with its own versions.
Such replacements are done by default only in shared libraries whose
soname matches a predefined soname pattern (e.g.
<varname>libc.so*</varname> on linux).
@@ -1826,7 +1833,7 @@
<option>--soname-synonyms</option> to specify one additional
synonym pattern, giving flexibility in the replacement. </para>
- <para> Currently, this flexibility is only allowed for the
+ <para>Currently, this flexibility is only allowed for the
malloc related functions, using the
synonym <varname>somalloc</varname>. This synonym is usable for
all tools doing standard replacement of malloc related functions
@@ -1859,6 +1866,14 @@
that a NONE pattern will match the main executable and any
shared library having no soname. </para>
</listitem>
+
+ <listitem>
+ <para>To run a "default" Firefox build for Linux, in which
+ JEMalloc is linked into the main executable,
+ use <option>--soname-synonyms=somalloc=NONE</option>.
+ </para>
+ </listitem>
+
</itemizedlist>
</listitem>
</varlistentry>
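The Firefox case in the list above can be sketched as a command line (the executable path is illustrative):

```shell
# JEMalloc is linked into the firefox executable itself, which has no
# soname, so the NONE pattern directs malloc replacement at the main
# executable rather than at a shared library.
valgrind --soname-synonyms=somalloc=NONE ./firefox
```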
@@ -1985,79 +2000,89 @@
<sect2 id="manual-core.pthreads_perf_sched" xreflabel="Scheduling and Multi-Thread Performance">
<title>Scheduling and Multi-Thread Performance</title>
-<para>A thread executes some code only when it holds the lock. After
-executing a certain nr of instructions, the running thread will release
-the lock. All threads ready to run will compete to acquire the lock.</para>
+<para>A thread executes code only when it holds the above-mentioned
+lock. After executing some number of instructions, the running thread
+will release the lock. All threads ready to run will then compete to
+acquire the lock.</para>
-<para>The option <option>--fair-sched</option> controls the locking mechanism
-used to serialise the thread execution.</para>
+<para>The <option>--fair-sched</option> option controls the locking mechanism
+used to serialise thread execution.</para>
-<para> The default pipe based locking
-(<option>--fair-sched=no</option>) is available on all platforms. The
-pipe based locking does not guarantee fairness between threads : it is
-very well possible that the thread that has just released the lock
-gets it back directly. When using the pipe based locking, different
-execution of the same multithreaded application might give very different
-thread scheduling.</para>
+<para>The default pipe based locking mechanism
+(<option>--fair-sched=no</option>) is available on all
+platforms. Pipe based locking does not guarantee fairness between
+threads: it is quite likely that a thread that has just released the
+lock reacquires it immediately, even though other threads are ready to
+run. When using pipe based locking, different runs of the same
+multithreaded application might give very different thread
+scheduling.</para>
-<para> The futex based locking is available on some platforms.
-If available, it is activated by <option>--fair-sched=yes</option> or
-<option>--fair-sched=try</option>. The futex based locking ensures
-fairness between threads : if multiple threads are ready to run, the lock
-will be given to the thread which first requested the lock. Note that a thread
-which is blocked in a system call (e.g. in a blocking read system call) has
-not (yet) requested the lock: such a thread requests the lock only after the
-system call is finished.</para>
+<para>An alternative locking mechanism, based on futexes, is available
+on some platforms. If available, it is activated
+by <option>--fair-sched=yes</option> or
+<option>--fair-sched=try</option>. Futex based locking ensures
+fairness (round-robin scheduling) between threads: if multiple threads
+are ready to run, the lock will be given to the thread which first
+requested the lock. Note that a thread which is blocked in a system
+call (e.g. in a blocking read system call) has not (yet) requested the
+lock: such a thread requests the lock only after the system call is
+finished.</para>
-<para> The fairness of the futex based locking ensures a better reproducibility
-of the thread scheduling for different executions of a multithreaded
-application. This fairness/better reproducibility is particularly
-interesting when using Helgrind or DRD.</para>
+<para> The fairness of the futex based locking produces better
+reproducibility of thread scheduling for different executions of a
+multithreaded application. This better reproducibility is particularly
+helpful when using Helgrind or DRD.</para>
-<para> The Valgrind thread serialisation implies that only one thread
-is running at a time. On a multiprocessor/multicore system, the
+<para>Valgrind's use of thread serialisation implies that only one
+thread at a time may run. On a multiprocessor/multicore system, the
running thread is assigned to one of the CPUs by the OS kernel
-scheduler. When a thread acquires the lock, sometimes the thread will
+scheduler. When a thread acquires the lock, sometimes the thread will
be assigned to the same CPU as the thread that just released the
-lock. Sometimes, the thread will be assigned to another CPU. When
-using the pipe based locking, the thread that just acquired the lock
-will often be scheduled on the same CPU as the thread that just
-released the lock. With the futex based mechanism, the thread that
+lock. Sometimes, the thread will be assigned to another CPU. When
+using pipe based locking, the thread that just acquired the lock
+will usually be scheduled on the same CPU as the thread that just
+released the lock. With the futex based mechanism, the thread that
just acquired the lock will more often be scheduled on another
-CPU. </para>
+CPU.</para>
-<para>The Valgrind thread serialisation and CPU assignment by the OS
-kernel scheduler can badly interact with the CPU frequency scaling
-available on many modern CPUs : to decrease power consumption, the
+<para>Valgrind's thread serialisation and CPU assignment by the OS
+kernel scheduler can interact badly with the CPU frequency scaling
+available on many modern CPUs. To decrease power consumption, the
frequency of a CPU or core is automatically decreased if the CPU/core
has not been used recently. If the OS kernel often assigns the thread
-which just acquired the lock to another CPU/core, there is quite some
-chance that this CPU/core is currently at a low frequency. The
-frequency of this CPU will be increased after some time. However,
-during this time, the (only) running thread will have run at a low
-frequency. Once this thread has run during some time, it will release
-the lock. Another thread will acquire this lock, and might be
-scheduled again on another CPU whose clock frequency was decreased in
-the meantime.</para>
+which just acquired the lock to another CPU/core, it is quite likely
+that this CPU/core is currently at a low frequency. The frequency of
+this CPU will be increased after some time. However, during this
+time, the (only) running thread will have run at the low frequency.
+Once this thread has run for some time, it will release the lock.
+Another thread will acquire this lock, and might be scheduled again on
+another CPU whose clock frequency was decreased in the
+meantime.</para>
-<para>The futex based locking causes threads to more often switch of
-CPU/core. So, if CPU frequency scaling is activated, the futex based
-locking might decrease significantly (up to 50% degradation has been
-observed) the performance of a multithreaded app running under
-Valgrind. The pipe based locking also somewhat interacts badly with
-CPU frequency scaling. Up to 10..20% performance degradation has been
-observed. </para>
+<para>The futex based locking causes threads to change CPUs/cores more
+often. So, if CPU frequency scaling is activated, the futex based
+locking might significantly decrease the performance of a
+multithreaded app running under Valgrind. Performance losses of up
+to 50% have been observed, as compared to running on a machine for
+which CPU frequency scaling has been disabled. The pipe based
+locking scheme also interacts badly with CPU frequency scaling, with
+performance losses in the range 10..20% having been
+observed.</para>
-<para>To avoid this performance degradation, you can indicate to the
-kernel that all CPUs/cores should always run at maximum clock
-speed. Depending on your linux distribution, CPU frequency scaling
-might be controlled using a graphical interface or using command line
+<para>To avoid such performance degradation, you should indicate to
+the kernel that all CPUs/cores should always run at maximum clock
+speed. Depending on your Linux distribution, CPU frequency scaling
+may be controlled using a graphical interface or with command line tools
such as
<computeroutput>cpufreq-selector</computeroutput> or
-<computeroutput>cpufreq-set</computeroutput>. You might also indicate to the
-OS scheduler to run a Valgrind process on a specific (fixed) CPU using the
-<computeroutput>taskset</computeroutput> command : running on a fixed
-CPU should ensure that this specific CPU keeps a high frequency clock speed.
+<computeroutput>cpufreq-set</computeroutput>.
+</para>
+
+<para>An alternative way to avoid these problems is to tell the
+OS scheduler to tie a Valgrind process to a specific (fixed) CPU using the
+<computeroutput>taskset</computeroutput> command. This should ensure
+that the selected CPU does not fall below its maximum frequency
+setting so long as any thread of the program has work to do.
</para>
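A minimal sketch of the two workarounds just described (the CPU number and program name are illustrative; <computeroutput>cpufreq-set</computeroutput> typically requires root privileges):

```shell
# Keep CPU 0 at maximum clock speed by selecting the "performance"
# governor (repeat with -c 1, -c 2, ... for the other CPUs/cores).
sudo cpufreq-set -c 0 -g performance

# Alternatively, tie the whole Valgrind process to one fixed CPU, so
# that CPU stays busy and keeps its high clock frequency.
taskset -c 0 valgrind --tool=helgrind ./my-threaded-app
```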
</sect2>
@@ -2202,11 +2227,10 @@
instructions. If the translator encounters these, Valgrind will
generate a SIGILL when the instruction is executed. Apart from
that, on x86 and amd64, essentially all instructions are supported,
- up to and including SSE4.2 in 64-bit mode and SSSE3 in 32-bit mode.
- Some exceptions: SSE4.2 AES instructions are not supported in
- 64-bit mode, and 32-bit mode does in fact support the bare minimum
- SSE4 instructions to needed to run programs on MacOSX 10.6 on
- 32-bit targets.
+ up to and including AVX and AES in 64-bit mode and SSSE3 in 32-bit
+ mode. 32-bit mode does in fact support the bare minimum SSE4
+ instructions needed to run programs on MacOSX 10.6 on 32-bit
+ targets.
</para>
</listitem>
@@ -2262,7 +2286,7 @@
large amount of administrative information maintained behind the
scenes. Another cause is that Valgrind dynamically translates the
original executable. Translated, instrumented code is 12-18 times
- larger than the original so you can easily end up with 100+ MB of
+ larger than the original so you can easily end up with 150+ MB of
translations when running (eg) a web browser.</para>
</listitem>
diff --git a/docs/xml/vg-entities.xml b/docs/xml/vg-entities.xml
index 1b5adb1..d5532d6 100644
--- a/docs/xml/vg-entities.xml
+++ b/docs/xml/vg-entities.xml
@@ -2,12 +2,12 @@
<!ENTITY vg-jemail "julian@valgrind.org">
<!ENTITY vg-vemail "valgrind@valgrind.org">
<!ENTITY cl-email "Josef.Weidendorfer@gmx.de">
-<!ENTITY vg-lifespan "2000-2011">
+<!ENTITY vg-lifespan "2000-2012">
<!-- valgrind release + version stuff -->
<!ENTITY rel-type "Release">
-<!ENTITY rel-version "3.7.0">
-<!ENTITY rel-date "2 November 2011">
+<!ENTITY rel-version "3.8.0">
+<!ENTITY rel-date "XX August 2012">
<!-- where the docs are installed -->
<!ENTITY vg-docs-path "$INSTALL/share/doc/valgrind/html/index.html">