Minor edits to the cache documentation.
git-svn-id: svn://svn.valgrind.org/valgrind/trunk@180 a5019735-40e9-0310-863c-91ae7b9d1cf9
diff --git a/coregrind/docs/manual.html b/coregrind/docs/manual.html
index 4da22db..daaa153 100644
--- a/coregrind/docs/manual.html
+++ b/coregrind/docs/manual.html
@@ -518,10 +518,9 @@
</li><br><p>
<li><code>--cachesim=no</code> [default]<br>
- <code>--cachesim=yes</code>
- <p>When enabled, turns off memory checking, and turns on cache profiling.
- Cache profiling is described in detail in <a href="#cache">Section 7</a>.
- </li><p>
+ <code>--cachesim=yes</code> <p>When enabled, turns off memory
+ checking, and turns on cache profiling. Cache profiling is
+ described in detail in <a href="#cache">Section 7</a>. </li><p>
</ul>
There are also some options for debugging Valgrind itself. You
@@ -1799,45 +1798,56 @@
The three steps are:
<ol>
- <li>Generate a cache simulator for your machine's cache configuration with
- `vg_cachegen' and recompile Valgrind with <code>make install</code>.
- Valgrind comes with a default simulator, but it is unlikely to be correct
- for your system, so you should generate a simulator yourself.</li>
- <li>Run your program with <code>valgrind --cachesim=yes</code> in front of
- the normal command line invocation. When the program finishes, Valgrind
- will print summary cache statistics. It also collects line-by-line
- information in a file <code>cachegrind.out</code>.</li>
- <li>Generate a function-by-function summary, and possibly annotate source
- files with 'vg_annotate'. Source files to annotate can be specified
- manually, or manually on the command line, or "interesting" source files
- can be annotated automatically with the <code>--auto=yes</code> option.
- You can annotate C/C++ files or assembly language files equally
- easily.</li>
+ <li>Generate a cache simulator for your machine's cache
+ configuration with the supplied <code>vg_cachegen</code>
+ program, and recompile Valgrind with <code>make install</code>.
+ <p>
+ The default settings are for an AMD Athlon, and you will get
+ useful information with the defaults, so you can skip this step
+ if you want. Nevertheless, for accurate cache profiles you will
+        need to use <code>vg_cachegen</code> to customise
+ <code>cachegrind</code> for your system.
+ <p>
+ This step only needs to be done once, unless you are interested
+ in simulating different cache configurations (eg. first
+ concentrating on instruction cache misses, then on data cache
+ misses).
+ </li>
+ <p>
+ <li>Run your program with <code>cachegrind</code> in front of the
+ normal command line invocation. When the program finishes,
+ Valgrind will print summary cache statistics. It also collects
+ line-by-line information in a file <code>cachegrind.out</code>.
+ <p>
+ This step should be done every time you want to collect
+ information about a new program, a changed program, or about the
+ same program with different input.
+ </li>
+ <p>
+  <li>Generate a function-by-function summary, and possibly annotate
+      source files with <code>vg_annotate</code>.  Source files to
+      annotate can be specified manually on the command line, or
+      "interesting" source files can be annotated automatically with
+      the <code>--auto=yes</code> option.  You can annotate C/C++
+      files or assembly language files equally easily.
+      <p>
+      This step can be performed as many times as you like for each
+      Step 2.  You may want to do multiple annotations showing
+      different information each time.
+  </li>
</ol>
-<a href="#generate">Step 1</a> only needs to be done once, unless you are
-interested in simulating different cache configurations (eg. first
-concentrating on instruction cache misses, then on data cache misses).<p>
-
-<a href="#profile">Step 2</a> should be done every time you want to collect
-information about a new program, a changed program, or about the same program
-with different input.<p>
-
-<a href="#annotate">Step 3</a> can be performed as many times as you like for
-each Step 2; you may want to do multiple annotations showing different
-information each time.<p>
-
The steps are described in detail in the following sections.<p>
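The three steps above amount to a short sequence of command lines. The helper below is a hypothetical sketch, not part of Valgrind: <code>myprog</code> and <code>input.txt</code> are made-up names, and only the commands the manual describes (<code>cachegrind</code>, <code>vg_annotate</code>, <code>--auto=yes</code>) are used.

```python
# Sketch of the cachegrind workflow described above.  "myprog" and
# "input.txt" are hypothetical; cachegrind and vg_annotate are the
# scripts the manual describes.

def cachegrind_commands(prog, args=(), auto_annotate=True):
    """Return the step-2 and step-3 command lines as strings.
    (Step 1, vg_cachegen, only needs running once per cache config.)"""
    profile = " ".join(["cachegrind", prog, *args])  # step 2: profile the run
    annotate = "vg_annotate"                         # step 3: summarise cachegrind.out
    if auto_annotate:
        annotate += " --auto=yes"                    # annotate "interesting" files
    return profile, annotate

profile, annotate = cachegrind_commands("myprog", ["input.txt"])
print(profile)   # cachegrind myprog input.txt
print(annotate)  # vg_annotate --auto=yes
```

Step 3 can then be repeated with different <code>vg_annotate</code> options against the same <code>cachegrind.out</code>.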
<a name="generate"></a>
<h3>7.3 Generating a cache simulator</h3>
-Although Valgrind comes with a pre-generated cache simulator, it most likely
-won't match the cache configuration of your machine, so you should generate
-a new simulator.<p>
-You need to generate three files, one for each of the I1, D1 and L2 caches.
-For each cache, you need to know the:
+Although Valgrind comes with a pre-generated cache simulator, it most
+likely won't match the cache configuration of your machine, so you
+should generate a new simulator.<p>
+
+You need to generate three files, one for each of the I1, D1 and L2
+caches. For each cache, you need to know the:
<ul>
<li>Cache size (bytes);
<li>Line size (bytes);
@@ -1851,9 +1861,10 @@
<li><code>--L2=size,line_size,associativity</code>
</ul>
-You can specify one, two or all three caches per invocation of vg_cachegen. It
-checks that the configuration is sensible before generating the simulators; to
-see the allowed values, run <code>vg_cachegen -h</code>.<p>
+You can specify one, two or all three caches per invocation of
+<code>vg_cachegen</code>.  It checks that the configuration is sensible before
+generating the simulators; to see the allowed values, run
+<code>vg_cachegen -h</code>.<p>
An example invocation would be:
@@ -1861,37 +1872,43 @@
vg_cachegen --I1=65536,64,2 --D1=65536,64,2 --L2=262144,64,8
</code></blockquote>
-This simulates a machine with a 128KB split L1 2-way associative cache, and a
-256KB unified 8-way associative L2 cache. Both caches have 64B lines.<p>
+This simulates a machine with a 128KB split L1 2-way associative
+cache, and a 256KB unified 8-way associative L2 cache. Both caches
+have 64B lines.<p>
-If you don't know your cache configuration, you'll have to find it out.
-(Ideally vg_cachegen could auto-identify your cache configuration using the
-CPUID instruction, which could be done automatically during installation, and
-this whole step could be skipped...)<p>
+If you don't know your cache configuration, you'll have to find it
+out. (Ideally <code>vg_cachegen</code> could auto-identify your cache
+configuration using the CPUID instruction, which could be done
+automatically during installation, and this whole step could be
+skipped.)<p>
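The manual says only that <code>vg_cachegen</code> "checks that the configuration is sensible"; the sketch below shows the kind of checks one might expect (the exact rules here are assumptions, not vg_cachegen's actual behaviour). The two passing configurations are the D1 and L2 caches from the example invocation above.

```python
# Hypothetical sanity checks for a size,line_size,associativity triple.
# These particular rules are assumptions; see "vg_cachegen -h" for the
# real allowed values.

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def check_cache(size, line_size, assoc):
    """size and line_size in bytes, e.g. --D1=65536,64,2."""
    sets = size // (line_size * assoc)
    return (is_power_of_two(size)
            and is_power_of_two(line_size)
            and size % (line_size * assoc) == 0
            and is_power_of_two(sets))

print(check_cache(65536, 64, 2))    # True  (the example D1 cache)
print(check_cache(262144, 64, 8))   # True  (the example L2 cache)
print(check_cache(65536, 48, 2))    # False (line size not a power of two)
```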
<h3>7.4 Cache simulation specifics</h3>
-vg_cachegen only generates simulations for a machine with a split L1 cache and
-a unified L2 cache. This configuration is used for all x86-based machines we
-are aware of.<p>
+
+<code>vg_cachegen</code> only generates simulations for a machine with
+a split L1 cache and a unified L2 cache. This configuration is used
+for all (modern) x86-based machines we are aware of. Old Cyrix CPUs
+had a unified I and D L1 cache, but they are ancient history now.<p>
The more specific characteristics of the simulation are as follows.
<ul>
- <li>Write-allocate: when a write miss occurs, the block written to is brought
- into the D1 cache. Most modern caches have this property.</li><p>
+ <li>Write-allocate: when a write miss occurs, the block written to
+ is brought into the D1 cache. Most modern caches have this
+ property.</li><p>
- <li>Bit-selection hash function: the line(s) in the cache to which a memory
- block maps is chosen by the middle bits M--(M+N-1) of the byte address,
- where:
+ <li>Bit-selection hash function: the line(s) in the cache to which a
+      memory block maps are chosen by the middle bits M--(M+N-1) of the
+ byte address, where:
<ul>
<li> line size = 2^M bytes </li>
<li>(cache size / line size) = 2^N bytes</li>
</ul> </li><p>
- <li>Inclusive L2 cache: the L2 cache replicates all the entries of the L1
- cache. This is standard on Pentium chips, but AMD Athlons use an
- exclusive L2 cache that only holds blocks evicted from L1.</li><p>
+ <li>Inclusive L2 cache: the L2 cache replicates all the entries of
+ the L1 cache. This is standard on Pentium chips, but AMD
+ Athlons use an exclusive L2 cache that only holds blocks evicted
+ from L1. Ditto AMD Durons and most modern VIAs.</li><p>
</ul>
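The bit-selection rule above can be made concrete. This is a small sketch for the direct-mapped case, following the manual's definitions of M and N; the example address is made up.

```python
# Bit-selection hash: line size = 2^M bytes, (cache size / line size)
# = 2^N, and the set is chosen by address bits M..(M+N-1).

def set_index(addr, cache_size, line_size):
    m = line_size.bit_length() - 1                  # M = log2(line size)
    n = (cache_size // line_size).bit_length() - 1  # N = log2(cache size / line size)
    return (addr >> m) & ((1 << n) - 1)             # middle bits M..(M+N-1)

# 65536B cache with 64B lines: M = 6, N = 10 (1024 lines).
print(set_index(0x12345, 65536, 64))     # -> 141
print(set_index(64, 65536, 64))          # -> 1
print(set_index(64 + 65536, 65536, 64))  # -> 1 (aliases with the address above)
```

The last two lines show why the hash matters: two addresses exactly one cache size apart land in the same set and so contend for the same line(s).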
Other noteworthy behaviour:
@@ -1924,14 +1941,18 @@
<a name="profile"></a>
<h3>7.5 Profiling programs</h3>
-Cache profiling is enabled by using the <code>--cachesim=yes</code> option to
-Valgrind. This automatically turns off Valgrind's memory checking functions,
-since the cache simulation is slow enough already, and you probably don't want
-to do both at once.<p>
-To gather cache profiling information about the program <code>ls -l<code, type:
+Cache profiling is enabled by using the <code>--cachesim=yes</code>
+option to the <code>valgrind</code> shell script. Alternatively, it
+is probably more convenient to use the <code>cachegrind</code> script.
+This automatically turns off Valgrind's memory checking functions,
+since the cache simulation is slow enough already, and you probably
+don't want to do both at once.
+<p>
+To gather cache profiling information about the program <code>ls
+-l</code>, type:
-<blockquote><code>valgrind --cachesim=yes ls -l</code></blockquote>
+<blockquote><code>cachegrind ls -l</code></blockquote>
The program will execute (slowly). Upon completion, summary statistics
that look like this will be printed:
@@ -1967,30 +1988,36 @@
<h3>7.6 Output file</h3>
-As well as printing summary information, Valgrind also writes line-by-line
-cache profiling information to a file named <code>cachegrind.out</code> . This
-file is human-readable, but is best interpreted by the accompanying program
-vg_annotate, described in the next section.<p>
+As well as printing summary information, Cachegrind also writes
+line-by-line cache profiling information to a file named
+<code>cachegrind.out</code>. This file is human-readable, but is best
+interpreted by the accompanying program <code>vg_annotate</code>,
+described in the next section.
+<p>
Things to note about the <code>cachegrind.out</code> file:
<ul>
- <li>It is written every time <code>valgrind --cachesim=yes</code> is run; it
- will automatically overwrite any existing <code>cachegrind.out<code/> in
- the current directory.</li>
- <li>It can be quite large: <code>ls -l</code> generates a file of about
- 350KB; browsing a few files and web pages with Konqueror generates a file
- of around 10MB.</li>
+ <li>It is written every time <code>valgrind --cachesim=yes</code> or
+ <code>cachegrind</code> is run, and will overwrite any existing
+ <code>cachegrind.out</code> in the current directory.</li>
+ <p>
+ <li>It can be huge: <code>ls -l</code> generates a file of about
+ 350KB. Browsing a few files and web pages with a Konqueror
+ built with full debugging information generates a file
+      of around 15MB.</li>
</ul>
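Because each run clobbers <code>cachegrind.out</code>, it can be worth renaming the old file before re-profiling. The numbered-backup scheme below is purely an assumption for illustration, not a Valgrind feature; it just picks a free name given the files already present.

```python
# Hypothetical helper: choose a backup name for an old cachegrind.out
# so the next cachegrind run does not overwrite it.  The ".1", ".2"
# suffix scheme is an assumption, not something Valgrind does itself.

def backup_name(existing, name="cachegrind.out"):
    """`existing` is the set of file names already in the directory."""
    n = 1
    while f"{name}.{n}" in existing:
        n += 1
    return f"{name}.{n}"

print(backup_name({"cachegrind.out"}))                      # -> cachegrind.out.1
print(backup_name({"cachegrind.out", "cachegrind.out.1"}))  # -> cachegrind.out.2
```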
<a name="annotate"></a>
<h3>7.7 Annotating C/C++ programs</h3>
-Before using vg_annotate, it is worth widening your window to be at least
-120-characters wide if possible, as the output lines can be quite long.<p>
+Before using <code>vg_annotate</code>, it is worth widening your
+window to be at least 120-characters wide if possible, as the output
+lines can be quite long.
+<p>
To get a function-by-function summary, run <code>vg_annotate</code> in
-directory containing a <code>cachegrind.out</code> file. The output looks like
-this:
+a directory containing a <code>cachegrind.out</code> file.  The output
+looks like this:
<pre>
--------------------------------------------------------------------------------
@@ -2073,12 +2100,13 @@
shown" line (and can be changed with the <code>--sort</code> option).
</li><p>
- <li>Threshold: vg_annotate by default omits functions that cause very low
- numbers of misses to avoid drowing you in information. In this case,
- vg_annotate shows summaries the functions that account for 99% of the
- <code>Ir</code> counts; <code>Ir</code> is chosen as the treshold event
- since it is the primary sort event. The threshold can be adjusted with
- the <code>--threshold</code> option.</li><p>
+ <li>Threshold: <code>vg_annotate</code> by default omits functions
+ that cause very low numbers of misses to avoid drowning you in
+      information.  In this case, <code>vg_annotate</code> shows summaries
+      of the functions that account for 99% of the <code>Ir</code> counts;
+ <code>Ir</code> is chosen as the threshold event since it is the
+ primary sort event. The threshold can be adjusted with the
+ <code>--threshold</code> option.</li><p>
<li>Chosen for annotation: names of files specified manually for annotation;
in this case none.</li><p>
@@ -2090,14 +2118,15 @@
Then follows summary statistics for the whole program. These are similar
to the summary provided when running <code>valgrind --cachesim=yes</code>.<p>
-Then follows function-by-function statistics. Each function is identified by a
-<code>file_name:function_name</code> pair. If a column contains only a
-`.' it means the function never performs that event (eg. the third row shows
-that <code>strcmp()</code> contains no instructions that write to memory). The
-name <code>???</code> is used if the the file name and/or function name could
-not be determined from debugging information. (If most of the entries have the
-form <code>???:???</code> the program probably wasn't compiled with
-<code>-g</code>.)<p>
+Then follows function-by-function statistics. Each function is
+identified by a <code>file_name:function_name</code> pair. If a column
+contains only a dot it means the function never performs
+that event (eg. the third row shows that <code>strcmp()</code>
+contains no instructions that write to memory). The name
+<code>???</code> is used if the file name and/or function name
+could not be determined from debugging information.  If most of the
+entries have the form <code>???:???</code> the program probably wasn't
+compiled with <code>-g</code>.<p>
It is worth noting that functions will come from three types of source files:
<ol>
@@ -2111,12 +2140,13 @@
</li>
</ol>
-There are two ways to annotate source files -- by choosing them manually, or
-with the <code>--auto=yes</code> option. To do it manually, just
-specify the filenames as arguments to vg_annotate. For example, the output from
-running <code>vg_annotate concord.c</code> for our example produces the same
-output as above followed by an annotated version of <code>concord.c</code>, a
-section of which looks like:
+There are two ways to annotate source files -- by choosing them
+manually, or with the <code>--auto=yes</code> option. To do it
+manually, just specify the filenames as arguments to
+<code>vg_annotate</code>. For example, the output from running
+<code>vg_annotate concord.c</code> for our example produces the same
+output as above followed by an annotated version of
+<code>concord.c</code>, a section of which looks like:
<pre>
--------------------------------------------------------------------------------
@@ -2211,15 +2241,16 @@
<h3>7.8 Annotating assembler programs</h3>
-Valgrind can annotate assembler programs too, or annotate the assembler
-generated for your C program. Sometimes this is useful for understanding what
-is really happening when an interesting line of C code is translated into
-multiple instructions.<p>
+
+Valgrind can annotate assembler programs too, or annotate the
+assembler generated for your C program. Sometimes this is useful for
+understanding what is really happening when an interesting line of C
+code is translated into multiple instructions.<p>
To do this, you just need to assemble your <code>.s</code> files with
-assembler-level debug information. gcc doesn't do this, but you can use GNU as
-with the <code>--gstabs</code> option to generate object files with this
-information, eg:
+assembler-level debug information. gcc doesn't do this, but you can
+use the GNU assembler with the <code>--gstabs</code> option to
+generate object files with this information, eg:
<blockquote><code>as --gstabs foo.s</code></blockquote>
@@ -2227,7 +2258,7 @@
programs.
-<h3>7.9 vg_annotate options</h3>
+<h3>7.9 <code>vg_annotate</code> options</h3>
<ul>
<li><code>-h, --help</code></li><p>
<li><code>-v, --version</code><p>
@@ -2398,11 +2429,13 @@
<ul>
<li>Use CPUID instruction to auto-identify cache configuration during
installation. This would save the user from having to know their cache
- configuration and using vg_cachegen.</li><p>
+      configuration and using <code>vg_cachegen</code>.</li>
+ <p>
<li>Program start-up/shut-down calls a lot of functions that aren't
interesting and just complicate the output. Would be nice to exclude
- these somehow.</li><p>
- <li>Handle files with >65535 lines</li><p>
+ these somehow.</li>
+ <p>
+ <li>Handle files with more than 65535 lines.</li><p>
</ul>
<hr width="100%">
</body>