Namhyung Kim | dd30920 | 2015-04-22 15:33:45 +0900

Overhead calculation
--------------------
The overhead can be shown in two columns, 'Children' and 'Self', when
perf collects callchains.  The 'self' overhead is simply calculated by
adding all period values of the entry - usually a function (symbol).
This is the value that perf shows traditionally, and the sum of all
the 'self' overhead values should be 100%.

The 'children' overhead is calculated by adding all period values of
the child functions so that it can show the total overhead of the
higher level functions even if they don't directly execute much.
'Children' here means functions that are called from another (parent)
function.

It might be confusing that the sum of all the 'children' overhead
values exceeds 100%, since each of them is already an accumulation of
the 'self' overhead of its child functions.  But with this enabled,
users can find which function has the most overhead even if samples
are spread over the children.

Consider the following example with three functions:

-----------------------
void foo(void) {
    /* do something */
}

void bar(void) {
    /* do something */
    foo();
}

int main(void) {
    bar();
    return 0;
}
-----------------------

In this case 'foo' is a child of 'bar', and 'bar' is an immediate
child of 'main' so 'foo' also is a child of 'main'.  In other words,
'main' is a parent of 'foo' and 'bar', and 'bar' is a parent of 'foo'.

Suppose all samples are recorded in 'foo' and 'bar' only.  When they
are recorded with callchains, the usual (self-overhead-only) output of
perf report will look something like this:

----------------------------------
Overhead  Symbol
........  .....................
  60.00%  foo
          |
          --- foo
              bar
              main
              __libc_start_main

  40.00%  bar
          |
          --- bar
              main
              __libc_start_main
----------------------------------

When the --children option is enabled, the 'self' overhead values of
child functions (i.e. 'foo' and 'bar') are added to the parents to
calculate the 'children' overhead.  In this case the report could be
displayed as:

-------------------------------------------
Children      Self  Symbol
........  ........  ....................
 100.00%     0.00%  __libc_start_main
          |
          --- __libc_start_main

 100.00%     0.00%  main
          |
          --- main
              __libc_start_main

 100.00%    40.00%  bar
          |
          --- bar
              main
              __libc_start_main

  60.00%    60.00%  foo
          |
          --- foo
              bar
              main
              __libc_start_main
-------------------------------------------

In the above output, the 'self' overhead of 'foo' (60%) was added to
the 'children' overhead of 'bar', 'main' and '\_\_libc_start_main'.
Likewise, the 'self' overhead of 'bar' (40%) was added to the
'children' overhead of 'main' and '\_\_libc_start_main'.

So '\_\_libc_start_main' and 'main' are shown first since they have
the same (100%) 'children' overhead (even though they have zero 'self'
overhead) and they are the parents of 'foo' and 'bar'.

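As a rough illustration (this is not perf's actual implementation), the
accumulation described above can be sketched in C.  The names 'sample',
'entry', 'report', 'find_or_add', 'add_sample' and 'pct', as well as the
fixed-size arrays, are hypothetical.  The sketch assumes that each
callchain is ordered from the sampled function up to the root and that
no symbol appears twice in one chain.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS  16
#define MAX_DEPTH 8

/* One sample: callchain from the sampled function (index 0) to the root. */
struct sample {
    const char *chain[MAX_DEPTH];
    int depth;
    unsigned long period;
};

/* Accumulated period values for one symbol. */
struct entry {
    const char *sym;
    unsigned long self;     /* periods sampled directly in this symbol */
    unsigned long children; /* periods of every chain containing this symbol */
};

struct report {
    struct entry entries[MAX_SYMS];
    int nr;
    unsigned long total;    /* sum of all sample periods */
};

/* Look up the entry for sym, creating it if needed (NULL when full). */
static struct entry *find_or_add(struct report *r, const char *sym)
{
    for (int i = 0; i < r->nr; i++)
        if (strcmp(r->entries[i].sym, sym) == 0)
            return &r->entries[i];
    if (r->nr == MAX_SYMS)
        return NULL;
    r->entries[r->nr] = (struct entry){ .sym = sym };
    return &r->entries[r->nr++];
}

/* Credit the period to the leaf's 'self' and to every symbol's 'children'. */
static void add_sample(struct report *r, const struct sample *s)
{
    assert(s->depth >= 1 && s->depth <= MAX_DEPTH);
    r->total += s->period;
    for (int i = 0; i < s->depth; i++) {
        struct entry *e = find_or_add(r, s->chain[i]);
        if (!e)
            continue;
        if (i == 0)
            e->self += s->period;
        e->children += s->period;
    }
}

/* Convert an accumulated period into a percentage of the total. */
static double pct(unsigned long part, unsigned long total)
{
    return total ? 100.0 * part / total : 0.0;
}
```

Feeding this sketch the two callchains from the example (period 60 for
the chain starting at 'foo', period 40 for the chain starting at 'bar')
gives 'children' values of 100%, 100%, 100% and 60% for
'\_\_libc_start_main', 'main', 'bar' and 'foo', and 'self' values of
0%, 0%, 40% and 60%, matching the report above.
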
Since v3.16 the 'children' overhead is shown by default and the output
is sorted by its values.  The 'children' overhead can be disabled by
specifying the --no-children option on the command line or by adding
'report.children = false' or 'top.children = false' to the perf config
file.
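
For instance, a perf config file that turns the 'children' column off
for both commands might contain a fragment like the one below.  This is
a sketch built from the two variable names above; the file is assumed
here to be the per-user ~/.perfconfig.

```
[report]
	children = false

[top]
	children = false
```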