| <html> |
| <head> |
| <title>Massif: a heap profiler</title> |
| </head> |
| |
| <body> |
| <a name="ms-top"></a> |
| <h2>7 <b>Massif</b>: a heap profiler</h2> |
| |
| To use this tool, you must specify <code>--tool=massif</code> |
| on the Valgrind command line. |
| |
| <a name="spaceprof"></a> |
| <h3>7.1 Heap profiling</h3> |
| Massif is a heap profiler, i.e. it measures how much heap memory programs use. |
| In particular, it can give you information about: |
| <ul> |
| <li>Heap blocks; |
| <li>Heap administration blocks; |
| <li>Stack sizes. |
| </ul> |
| |
| Heap profiling is useful to help you reduce the amount of memory your program |
| uses. On modern machines with virtual memory, this provides the following |
| benefits: |
| <ul> |
| <li>It can speed up your program -- a smaller program will interact better |
| with your machine's caches, avoid paging, and so on. |
| |
| <li>If your program uses lots of memory, it will reduce the chance that it |
| exhausts your machine's swap space. |
| </ul> |
| |
| Also, there are certain space leaks that aren't detected by traditional |
| leak-checkers, such as Memcheck's. That's because the memory isn't ever |
| actually lost -- a pointer remains to it -- but it's not in use. Programs |
| that have leaks like this can unnecessarily increase the amount of memory |
| they are using over time. |
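| <p> |
| As a minimal sketch of such a leak, consider the C fragment below. All the |
| names in it (<code>record</code>, <code>history</code>, |
| <code>process_item()</code>, <code>history_length()</code>) are hypothetical |
| and chosen only for illustration. Every block stays reachable through the |
| <code>history</code> list, so a traditional leak checker has nothing to |
| report, yet nothing ever uses the old records again. |
| <pre> |
| #include &lt;stdlib.h&gt; |
| #include &lt;string.h&gt; |
| |
| /* Hypothetical record type: each processed item is kept on a list. */ |
| struct record { |
|     struct record *next; |
|     char payload[256]; |
| }; |
| |
| /* Head of the list; this pointer keeps every block reachable. */ |
| static struct record *history = NULL; |
| |
| /* Process one item and (needlessly) remember it forever. |
|    Returns 0 on success, -1 if malloc() fails. */ |
| int process_item(const char *text) |
| { |
|     struct record *r = malloc(sizeof *r); |
|     if (r == NULL) |
|         return -1; |
|     strncpy(r-&gt;payload, text, sizeof(r-&gt;payload) - 1); |
|     r-&gt;payload[sizeof(r-&gt;payload) - 1] = '\0'; |
|     r-&gt;next = history;   /* the record is never unlinked or freed */ |
|     history = r; |
|     return 0; |
| } |
| |
| /* Number of records still held; grows by one per processed item. */ |
| size_t history_length(void) |
| { |
|     size_t n = 0; |
|     for (const struct record *r = history; r != NULL; r = r-&gt;next) |
|         n++; |
|     return n; |
| } |
| </pre> |
| If <code>process_item()</code> were called in a long-running loop, one would |
| expect the heap use attributed to its <code>malloc()</code> call to keep |
| growing for as long as the program runs. |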
| <p> |
| |
| |
| <a name="whyuse_heapprof"></a> |
| <h3>7.2 Why Use a Heap Profiler?</h3> |
| |
| Everybody knows how useful time profilers are for speeding up programs. They |
| are particularly useful because people are notoriously bad at predicting where |
| the bottlenecks in their programs are. |
| <p> |
| But the story is different for heap profilers. Some programming languages, |
| particularly lazy functional languages like <a |
| href="http://www.haskell.org">Haskell</a>, have quite sophisticated heap |
| profilers. But there are few tools as powerful for profiling C and C++ |
| programs. |
| <p> |
| Why is this? Maybe it's because C and C++ programmers must think that |
| they know where the memory is being allocated. After all, you can see all the |
| calls to <code>malloc()</code> and <code>new</code> and <code>new[]</code>, |
| right? But, in a big program, do you really know which heap allocations are |
| being executed, how many times, and how large each allocation is? Can you give |
| even a vague estimate of the memory footprint for your program? Do you know |
| this for all the libraries your program uses? What about administration bytes |
| required by the heap allocator to track heap blocks -- have you thought about |
| them? What about the stack? If you are unsure about any of these things, |
| maybe you should think about heap profiling. |
| <p> |
| Massif can tell you these things. |
| <p> |
| Or maybe it's because it's relatively easy to add basic heap profiling |
| functionality into a program, to tell you how many bytes you have allocated for |
| certain objects, or similar. But this information is often limited to simple |
| totals for the whole program's execution. What about space usage at |
| different points in the program's execution, for example? And reimplementing |
| heap profiling code for each project is a pain. |
| <p> |
| Massif can save you this effort. |
| <p> |
| |
| |
| <a name="overview"></a> |
| <h3>7.3 Overview</h3> |
| First off, as for normal Valgrind use, you probably want to compile with |
| debugging info (the <code>-g</code> flag). But, as opposed to Memcheck, |
| you probably <b>do</b> want to turn optimisation on, since you should profile |
| your program as it will be normally run. |
| <p> |
| Then, run your program with <code>valgrind --tool=massif</code> in front of the |
| normal command line invocation. When the program finishes, Massif will print |
| summary space statistics. It also creates a graph representing the program's |
| heap usage in a file called <code>massif.<i>pid</i>.ps</code>, which can |
| be read by any PostScript viewer, such as Ghostview. |
| <p> |
| It also puts detailed information about heap consumption in a file called |
| <code>massif.<i>pid</i>.txt</code> (text format) or |
| <code>massif.<i>pid</i>.html</code> (HTML format), where |
| <code><i>pid</i></code> is the program's process id. |
| <p> |
| |
| |
| <a name="basicresults"></a> |
| <h3>7.4 Basic Results of Profiling</h3> |
| |
| To gather heap profiling information about the program <code>prog</code>, |
| type: |
| <p> |
| <blockquote> |
| <code>valgrind --tool=massif prog</code> |
| </blockquote> |
| <p> |
| The program will execute (slowly). Upon completion, summary statistics |
| that look like this will be printed: |
| |
| <pre> |
| ==27519== Total spacetime: 2,258,106 ms.B |
| ==27519== heap: 24.0% |
| ==27519== heap admin: 2.2% |
| ==27519== stack(s): 73.7% |
| </pre> |
| |
| All measurements are done in <i>spacetime</i>, i.e. space (in bytes) multiplied |
| by time (in milliseconds). Note that because Massif slows a program down a |
| lot, the actual spacetime figure is fairly meaningless; it's the relative |
| values that are interesting. |
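| <p> |
| To make the unit concrete, here is a small sketch in C of how a spacetime |
| figure in ms.B could be accumulated from a series of censuses. The |
| <code>census</code> structure and the <code>spacetime()</code> function are |
| hypothetical, and the rule used (each byte count is assumed to hold until the |
| next census) is a simplifying assumption for illustration, not a description |
| of Massif's internals. |
| <pre> |
| #include &lt;stddef.h&gt; |
| |
| /* One census: when it was taken and how many bytes were live. */ |
| struct census { |
|     unsigned long time_ms; |
|     unsigned long bytes; |
| }; |
| |
| /* Approximate spacetime in ms.B: the byte count seen at each census is |
|    assumed to stay constant until the following census. */ |
| unsigned long long spacetime(const struct census *c, size_t n) |
| { |
|     unsigned long long total = 0; |
|     for (size_t i = 0; i + 1 &lt; n; i++) { |
|         unsigned long long dt = c[i + 1].time_ms - c[i].time_ms; |
|         total += dt * c[i].bytes; |
|     } |
|     return total; |
| } |
| </pre> |
| For example, censuses of 1000 bytes at 0 ms, 3000 bytes at 10 ms and 0 bytes |
| at 30 ms give 10 * 1000 + 20 * 3000 = 70,000 ms.B under this rule. |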
| <p> |
| Which entries you see in the breakdown depends on the command line options |
| given. The above example measures all the possible parts of memory: |
| <ul> |
| <li>Heap: the spacetime used by blocks allocated on the heap, via <code>malloc()</code>, |
| <code>new</code> and <code>new[]</code>. |
| <p> |
| <li>Heap admin: each heap block allocated requires some administration data, |
| which lets the allocator track certain things about the block. It is easy |
| to forget about this, and if your program allocates lots of small blocks, |
| it can add up. This value is an estimate of the space required for this |
| administration data. |
| <p> |
| <li>Stack(s): the spacetime used by the program's stack(s). (Threaded programs |
| can have multiple stacks.) This includes signal handler stacks. |
| <p> |
| </ul> |
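| <p> |
| The way heap admin bytes add up for small blocks can be sketched with a little |
| arithmetic in C. The two functions below are hypothetical helpers; they simply |
| assume a fixed admin cost per block, which is the kind of estimate described |
| above, and say nothing about how a real allocator lays out its data. |
| <pre> |
| /* Estimated admin bytes: a fixed per-block cost times the block count. */ |
| unsigned long heap_admin_estimate(unsigned long blocks, |
|                                   unsigned long admin_per_block) |
| { |
|     return blocks * admin_per_block; |
| } |
| |
| /* Admin bytes as a percentage of payload bytes plus admin bytes. */ |
| double heap_admin_percent(unsigned long blocks, unsigned long block_size, |
|                           unsigned long admin_per_block) |
| { |
|     double admin = (double)heap_admin_estimate(blocks, admin_per_block); |
|     double payload = (double)blocks * (double)block_size; |
|     if (admin + payload == 0.0) |
|         return 0.0;   /* nothing allocated, nothing to report */ |
|     return 100.0 * admin / (admin + payload); |
| } |
| </pre> |
| With an assumed cost of 8 admin bytes per block, one million live 16-byte |
| blocks would carry about 8,000,000 admin bytes, roughly a third of the heap |
| space they occupy in total. |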
| <p> |
| |
| |
| <a name="graphs"></a> |
| <h3>7.5 Spacetime Graphs</h3> |
| As well as printing summary information, Massif also creates a file |
| representing a spacetime graph, <code>massif.<i>pid</i>.hp</code>. |
| From this it produces a graph in a file called <code>massif.<i>pid</i>.ps</code>, which can be |
| viewed in a PostScript viewer. |
| <p> |
| Massif uses a program called <code>hp2ps</code> to convert the raw data into |
| the PostScript graph. It's distributed with Massif, but came originally |
| from the <a href="http://haskell.cs.yale.edu/ghc/">Glasgow Haskell |
| Compiler</a>. You shouldn't need to worry about this at all. However, if |
| the graph creation fails for any reason, Massif will tell you, and will leave |
| behind a file named <code>massif.<i>pid</i>.hp</code>, containing the raw |
| heap profiling data. |
| <p> |
| Here's an example graph:<br> |
| <img src="date.gif" alt="spacetime graph"> |
| <p> |
| The graph is broken into several bands. Most bands represent a single line of |
| your program that does some heap allocation; each such band represents all |
| the allocations and deallocations done from that line. Up to twenty bands are |
| shown; less significant allocation sites are merged into "other" and/or "OTHER" |
| bands. The accompanying text/HTML file produced by Massif has more detail |
| about these heap allocation bands. Then there are single bands for the |
| stack(s) and heap admin bytes. |
| <p> |
| Note: it's the height of a band that's important. Don't let the ups and downs |
| caused by other bands confuse you. For example, the |
| <code>read_alias_file</code> band in the example has the same height all the |
| time it's in existence. |
| <p> |
| The triangles on the x-axis show each point at which a memory census was taken. |
| These aren't necessarily evenly spread; Massif only takes a census when |
| memory is allocated or deallocated. The time on the x-axis is wallclock |
| time, which is not ideal because you can get different graphs for different |
| executions of the same program, due to random OS delays. But it's not too |
| bad, and it becomes less of a problem the longer a program runs. |
| <p> |
| Massif takes censuses at an appropriate timescale; censuses take place less |
| frequently as the program runs for longer. There is no point having more |
| than 100-200 censuses on a single graph. |
| <p> |
| The graphs give a good overview of where your program's space use comes from, |
| and how that varies over time. The accompanying text/HTML file gives a lot |
| more information about heap use. |
| |
| <a name="detailsofheap"></a> |
| <h3>7.6 Details of Heap Allocations</h3> |
| |
| The text/HTML file contains information to help interpret the heap bands of the |
| graph. It also contains a lot of extra information about heap allocations that you don't see in the graph. |
| <p> |
| Here's part of the information that accompanies the above graph. |
| |
| <hr> |
| == 0 ===========================<br> |
| Heap allocation functions accounted for 50.8% of measured spacetime<br> |
| <p> |
| Called from: |
| <ul> |
| <li><a name="a401767D1"></a><a href="#b401767D1">22.1%</a>: 0x401767D0: _nl_intern_locale_data (in /lib/i686/libc-2.3.2.so) |
| <li><a name="a4017C394"></a><a href="#b4017C394"> 8.6%</a>: 0x4017C393: read_alias_file (in /lib/i686/libc-2.3.2.so) |
| |
| <li><i>(several entries omitted)</i> |
| |
| <li>and 6 other insignificant places</li> |
| </ul> |
| <hr> |
| The first part shows the total spacetime due to heap allocations, and the |
| places in the program where most memory was allocated (nb: if this program had |
| been compiled with <code>-g</code>, actual line numbers would be given). These |
| places are sorted, from most significant to least, and correspond to the bands |
| seen in the graph. Insignificant sites (accounting for less than 0.5% of total |
| spacetime) are omitted. |
| <p> |
| That alone can be useful, but often isn't enough. What if one of these |
| functions was called from several different places in the program? Which one |
| of these is responsible for most of the memory used? For |
| <code>_nl_intern_locale_data()</code>, this question is answered by clicking on |
| the <a href="#b401767D1">22.1%</a> link, which takes us to the following part |
| of the file. |
| |
| <hr> |
| <p>== 1 ===========================<br> |
| <a name="b401767D1"></a>Context accounted for <a href="#a401767D1">22.1%</a> of measured spacetime<br> |
| 0x401767D0: _nl_intern_locale_data (in /lib/i686/libc-2.3.2.so)<br> |
| <p> |
| Called from: |
| <ul> |
| <li><a name="a40176F96"></a><a href="#b40176F96">22.1%</a>: 0x40176F95: _nl_load_locale_from_archive (in /lib/i686/libc-2.3.2.so) |
| </ul> |
| <hr> |
| |
| At this level, we can see all the places from which |
| <code>_nl_intern_locale_data()</code> was called such that it allocated |
| memory at 0x401767D0. (We can click on the top <a href="#a40176F96">22.1%</a> |
| link to go back to the parent entry.) At this level, we have moved beyond the |
| information presented in the graph. In this case, it is only called from one |
| place. We can again follow the link for more detail, moving to the following |
| part of the file. |
| |
| <hr> |
| <p>== 2 ===========================<br> |
| <a name="b40176F96"></a>Context accounted for <a href="#a40176F96">22.1%</a> of measured spacetime<br> |
| 0x401767D0: _nl_intern_locale_data (in /lib/i686/libc-2.3.2.so)<br> |
| 0x40176F95: _nl_load_locale_from_archive (in /lib/i686/libc-2.3.2.so)<br> |
| <p> |
| Called from: |
| <ul> |
| <li><a name="a40176185"></a>22.1%: 0x40176184: _nl_find_locale (in /lib/i686/libc-2.3.2.so) |
| </ul> |
| <hr> |
| |
| In this way we can dig deeper into the call stack, to work out exactly what |
| sequence of calls led to some memory being allocated. At this point, with a |
| call depth of 3, the information runs out (thus the address of the child entry, |
| 0x40176184, isn't a link). We could rerun the program with a greater |
| <code>--depth</code> value if we wanted more information. |
| <p> |
| Sometimes you will get a code location like this: |
| <ul> |
| <li>30.8% : 0xFFFFFFFF: ??? |
| </ul> |
| The code address isn't really 0xFFFFFFFF -- that's impossible. This is what |
| Massif does when it can't work out what the real code address is. |
| <p> |
| Massif produces this information in a plain text file by default, or HTML with |
| the <code>--format=html</code> option. The plain text version obviously |
| doesn't have the links, but a similar effect can be achieved by searching on |
| the code addresses. (In Vim, the '*' and '#' searches are ideal for this.) |
| |
| |
| <a name="massifoptions"></a> |
| <h3>7.7 Massif options</h3> |
| |
| Massif-specific options are: |
| |
| <ul> |
| <li><code>--heap=no</code><br> |
| <code>--heap=yes</code> [default]<br> |
| When enabled, profile heap usage in detail. Without it, the |
| <code>massif.<i>pid</i>.txt</code> or |
| <code>massif.<i>pid</i>.html</code> will be very short. |
| <p> |
| <li><code>--heap-admin=<i>n</i></code> [default: 8]<br> |
| The number of admin bytes per block to use. This can only be an |
| estimate of the average, since it may vary. The allocator used by |
| <code>glibc</code> requires somewhere between 4 and 15 bytes per block, |
| depending on various factors. It also requires admin space for freed |
| blocks, although Massif does not count this. |
| <p> |
| <li><code>--stacks=no</code><br> |
| <code>--stacks=yes</code> [default]<br> |
| When enabled, include stack(s) in the profile. Threaded programs can |
| have multiple stacks. |
| <p> |
| <li><code>--depth=<i>n</i></code> [default: 3]<br> |
| Depth of call chains to present in the detailed heap information. |
| Increasing it will give more information, but Massif will run the program |
| more slowly, using more memory, and produce a bigger |
| <code>.txt</code>/<code>.hp</code> file. |
| <p> |
| <li><code>--alloc-fn=<i>name</i></code><br> |
| Specify a function that allocates memory. This is useful for functions |
| that are wrappers to <code>malloc()</code>, which can fill up the context |
| information uselessly (and give very uninformative bands on the graph). |
| Functions specified will be ignored in contexts, i.e. treated as though |
| they were <code>malloc()</code>. This option can be specified multiple |
| times on the command line, to name multiple functions. |
| <p> |
| <li><code>--format=text</code> [default]<br> |
| <code>--format=html</code><br> |
| Produce the detailed heap information in text or HTML format. The file |
| suffix used will be either <code>.txt</code> or <code>.html</code>. |
| <p> |
| </ul> |
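| <p> |
| As an illustration of when <code>--alloc-fn</code> helps, here is a sketch of |
| a typical <code>malloc()</code> wrapper in C. The names <code>xmalloc()</code>, |
| <code>make_name_buffer()</code> and <code>make_id_table()</code> are |
| hypothetical. Both helpers allocate through the same wrapper, so the |
| innermost allocation site is the single <code>malloc()</code> call inside |
| <code>xmalloc()</code>, which says little about who really wanted the memory. |
| <pre> |
| #include &lt;stdio.h&gt; |
| #include &lt;stdlib.h&gt; |
| |
| /* Hypothetical wrapper: allocate or abort, so callers never see NULL. */ |
| void *xmalloc(size_t size) |
| { |
|     void *p = malloc(size); |
|     if (p == NULL) { |
|         fprintf(stderr, "out of memory (%lu bytes)\n", (unsigned long)size); |
|         exit(EXIT_FAILURE); |
|     } |
|     return p; |
| } |
| |
| /* Two distinct allocation sites that both go through xmalloc(). */ |
| char *make_name_buffer(void) |
| { |
|     return xmalloc(64); |
| } |
| |
| int *make_id_table(size_t n) |
| { |
|     return xmalloc(n * sizeof(int)); |
| } |
| </pre> |
| Running such a program as |
| <code>valgrind --tool=massif --alloc-fn=xmalloc prog</code> asks Massif to |
| treat <code>xmalloc()</code> as though it were <code>malloc()</code>, so the |
| two helpers should then be reported as separate allocation sites. |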
| |
| <a name="accuracy"></a> |
| <h3>7.8 Accuracy</h3> |
| The information should be pretty accurate. Some approximations made might |
| cause some allocation contexts to be attributed with less memory than they |
| actually allocated, but the amounts should be minuscule. |
| <p> |
| The heap admin spacetime figure is an approximation, as described above. If |
| anyone knows how to improve its accuracy, please let us know. |
| |
| </body> |
| </html> |
| |